Dimensions

Dimension                               Variable
# Samples                               m
# Layers (input layer excluded)         L
# Units in Input Layer                  n^[0]
# Units in Hidden Layer l               n^[l]
# Units in Output Layer / # Classes     n^[L]

Constants

Constant                Symbol
Learning Rate           α
Regularization Factor   λ

Matrices

Notation, equation, dimensions, and layers:

Input             X  (given)                                          n^[0] × m         (global)
Output            Y  (given)                                          n^[L] × m         (global)

Feedforward:
Weight            W^[l]  (given / calculated)                         n^[l] × n^[l−1]   l = 1 … L
Bias              b^[l]  (given / calculated)                         n^[l] × 1         l = 1 … L
Input             A^[0] = X                                           n^[0] × m         l = 0
Weighted Input    Z^[l] = W^[l] A^[l−1] + b^[l]                       n^[l] × m         l = 1 … L
Activation        A^[l] = g^[l](Z^[l])                                n^[l] × m         l = 1 … L
Predicted Output  Ŷ = A^[L]                                           n^[L] × m         l = L

Backpropagation:
Loss Function     CE:  L(Ŷ, Y) = −(Y ∘ log Ŷ + (1−Y) ∘ log(1−Ŷ))     n^[L] × m         l = L
(CE or MSE)       MSE: L(Ŷ, Y) = (Y − Ŷ)^∘2
Cost Function     J = (1/m) Σ L(Ŷ, Y) + (λ/2m) Σ_l Σ (W^[l])^∘2      (scalar)          (global)
Optimization      min over W, b of J(W, b)                            (scalar)          (global)
Output Error      dZ^[L] = A^[L] − Y                                  n^[L] × m         l = L
Hidden Error      dZ^[l] = (W^[l+1])^T dZ^[l+1] ∘ g'^[l](Z^[l])      n^[l] × m         l = 1 … L−1
Weight Update     dW^[l] = (1/m) dZ^[l] (A^[l−1])^T + (λ/m) W^[l]    n^[l] × n^[l−1]   l = 1 … L
(Gradient Descent) W^[l] := W^[l] − α dW^[l]
Bias Update       db^[l] = (1/m) Σ_cols dZ^[l]                        n^[l] × 1         l = 1 … L
(Gradient Descent) b^[l] := b^[l] − α db^[l]
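The feedforward and backpropagation equations above can be sketched directly in NumPy. This is a minimal sketch under stated assumptions: the layer sizes, the random data, and the choice of a sigmoid activation with cross-entropy loss (which gives the output error dZ^[L] = A^[L] − Y) are all illustrative, and the λ regularization term is omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n = 8, [3, 4, 2]        # m samples; layer sizes n[0] (input) ... n[L] (output)
L = len(n) - 1             # number of layers, input layer excluded

X = rng.standard_normal((n[0], m))           # input, n[0] x m
Y = (rng.random((n[L], m)) < 0.5) * 1.0      # output, n[L] x m

# Parameters, indexed 1..L (index 0 unused, matching the layer numbering)
W = [None] + [rng.standard_normal((n[l], n[l - 1])) for l in range(1, L + 1)]
b = [None] + [np.zeros((n[l], 1)) for l in range(1, L + 1)]

# Feedforward: Z[l] = W[l] A[l-1] + b[l],  A[l] = g(Z[l])
A = [X] + [None] * L
Z = [None] * (L + 1)
for l in range(1, L + 1):
    Z[l] = W[l] @ A[l - 1] + b[l]
    A[l] = sigmoid(Z[l])
assert A[L].shape == (n[L], m)               # predicted output Y_hat

# Backpropagation
dZ = [None] * (L + 1)
dW = [None] * (L + 1)
db = [None] * (L + 1)
dZ[L] = A[L] - Y                             # output error (CE + sigmoid)
for l in range(L, 0, -1):
    dW[l] = dZ[l] @ A[l - 1].T / m           # n[l] x n[l-1], same shape as W[l]
    db[l] = dZ[l].sum(axis=1, keepdims=True) / m
    if l > 1:                                # hidden error; g'(Z) = A(1-A) for sigmoid
        dZ[l - 1] = (W[l].T @ dZ[l]) * A[l - 1] * (1 - A[l - 1])

# Gradient-descent update
alpha = 0.1                                  # learning rate (arbitrary choice)
for l in range(1, L + 1):
    W[l] -= alpha * dW[l]
    b[l] -= alpha * db[l]
```

Note how every gradient has the same dimensions as the quantity it differentiates, which makes the shape column of the table a useful debugging checklist.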

Details

For the sigmoid activation,

  g(z) = 1 / (1 + e^(−z)),   g'(z) = g(z)(1 − g(z))

and for the two loss functions (element-wise),

  CE:   ∂L/∂Ŷ = −Y / Ŷ + (1 − Y) / (1 − Ŷ)
  MSE:  ∂L/∂Ŷ = −2(Y − Ŷ)
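The sigmoid identity g'(z) = g(z)(1 − g(z)) can be verified numerically against a central finite difference; this check is an illustration, not part of the original table, and the grid of test points and step size h are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Analytic derivative: g'(z) = g(z) * (1 - g(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-4.0, 4.0, 9)
h = 1e-6
# Central difference approximates g'(z) to O(h^2)
finite_diff = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
assert np.allclose(sigmoid_prime(z), finite_diff, atol=1e-8)
```

The same finite-difference pattern works for checking any hand-derived gradient in the table.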

Chain Rule

The hidden error follows from the output error by the chain rule: for l = L−1, …, 1,

  dZ^[l] = ∂J/∂Z^[l] = (∂Z^[l+1]/∂A^[l])^T (∂J/∂Z^[l+1]) ∘ ∂A^[l]/∂Z^[l]
         = (W^[l+1])^T dZ^[l+1] ∘ g'^[l](Z^[l])

Weight / Bias Update (Gradient Descent)

Each gradient-descent step moves the parameters against their gradients:

  W^[l] := W^[l] − α dW^[l],   b^[l] := b^[l] − α db^[l]   for l = 1 … L
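The effect of this update rule is easiest to see in one dimension. A minimal sketch, using an assumed toy objective J(w) = (w − 3)^2 and an arbitrary learning rate α = 0.1:

```python
# Gradient descent on J(w) = (w - 3)^2, whose gradient is dJ/dw = 2(w - 3).
alpha = 0.1
w = 0.0
for _ in range(100):
    dw = 2.0 * (w - 3.0)     # dJ/dw at the current w
    w -= alpha * dw          # update rule: w := w - alpha * dw
assert abs(w - 3.0) < 1e-6   # converges to the minimizer w = 3
```

Each step shrinks the distance to the minimizer by the factor 1 − 2α; the matrix-form updates for W^[l] and b^[l] apply the same rule element-wise.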

Examples

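As an illustration, here is a small end-to-end run combining the equations above. The XOR data, the single hidden layer of 4 sigmoid units, the learning rate, the iteration count, and the random seed are all assumptions made for this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: n[0] = 2 inputs, m = 4 samples, n[L] = 1 output class
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)      # 2 x 4
Y = np.array([[0, 1, 1, 0]], dtype=float)      # 1 x 4
m = X.shape[1]

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((4, 2)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))
alpha = 1.0

def cost(Y_hat):
    # Cross-entropy cost: J = -(1/m) sum(Y log Y_hat + (1-Y) log(1-Y_hat))
    return -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)).sum() / m

costs = []
for _ in range(2000):
    A1 = sigmoid(W1 @ X + b1)                  # feedforward
    A2 = sigmoid(W2 @ A1 + b2)
    costs.append(cost(A2))
    dZ2 = A2 - Y                               # output error
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)         # hidden error (chain rule)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1 -= alpha * dW1; b1 -= alpha * db1       # gradient-descent updates
    W2 -= alpha * dW2; b2 -= alpha * db2

assert costs[-1] < costs[0]                    # the cost decreases over training
```

A single hidden layer suffices here because XOR is not linearly separable but becomes separable in the hidden layer's feature space.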

Remarks

  • M^[l−1] is the matrix of the previous layer, M^[l+1] is that of the next layer; otherwise W, b, Z, A implicitly refer to the current layer l
  • g is the activation function (e.g. sigmoid, tanh, ReLU)
  • ∘ is the element-wise (Hadamard) product
  • ^∘ is the element-wise power
  • Σ is the matrix's sum of elements
  • dM = ∂J/∂M is the matrix derivative
  • Variations:
    1. All matrices transposed, matrix multiplications in reverse order (row vectors instead of column vectors)
    2. W and b combined into one parameter matrix θ
    3. No λ (regularization) term in J
