Below is a diagram illustrating the architecture of the Deep Learning ANN used in this case study.
I am using two hidden layers with five neurons each and one output layer with a single neuron. Can you change these numbers? Yes, you can change both the number of hidden layers and the number of neurons in each layer.
Finally, select the combination that yields the highest accuracy. This is how an ANN model is tuned.
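To make this trial-and-error tuning concrete, here is a minimal sketch of such a search loop. It assumes a Keras regression setup; the `build_model` helper, the candidate grids, and the `X_train`/`X_test`/`y_train`/`y_test` variables are hypothetical placeholders, not code from the case study itself.

```python
# Hypothetical tuning loop: try a few (hidden layers, neurons) combinations
# and keep the configuration that gives the best score on held-out data.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(hidden_layers, neurons, input_dim=7):
    model = Sequential()
    # First hidden layer must declare the number of input predictors
    model.add(Dense(units=neurons, input_dim=input_dim,
                    kernel_initializer='normal', activation='relu'))
    # Remaining hidden layers infer their input size automatically
    for _ in range(hidden_layers - 1):
        model.add(Dense(units=neurons, kernel_initializer='normal',
                        activation='relu'))
    # Single output neuron (regression-style output assumed here)
    model.add(Dense(units=1, kernel_initializer='normal'))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

best_score, best_config = float('inf'), None
for hidden_layers in (1, 2, 3):          # candidate numbers of hidden layers
    for neurons in (5, 10, 15):          # candidate neurons per layer
        model = build_model(hidden_layers, neurons)
        model.fit(X_train, y_train, batch_size=20, epochs=50, verbose=0)
        score = model.evaluate(X_test, y_test, verbose=0)  # lower loss = better
        if score < best_score:
            best_score, best_config = score, (hidden_layers, neurons)

print('Best configuration (hidden layers, neurons):', best_config)
```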
In the code snippet we use the Sequential module from the Keras library to create a sequence of ANN layers stacked one after another. Each layer is defined using the Dense module of Keras, where we specify aspects such as the number of neurons in the layer, the weight initialization technique used in the network, and the activation function applied to each neuron in that layer.
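Here is a minimal sketch of the model described above, built with the Sequential and Dense modules. The layer sizes and parameter values come from the text (two hidden layers of five neurons, seven predictors, one output neuron); the output layer's lack of an activation is an assumption suitable for a regression target.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# First hidden layer: 5 neurons, expects 7 input predictors
model.add(Dense(units=5, input_dim=7, kernel_initializer='normal',
                activation='relu'))
# Second hidden layer: 5 neurons; input size is inferred from the layer before
model.add(Dense(units=5, kernel_initializer='normal', activation='relu'))
# Output layer: a single neuron producing the prediction
model.add(Dense(units=1, kernel_initializer='normal'))
```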
Understanding the hyperparameters in the code snippets below:
- units=5: This creates a layer with five neurons. Each of the five neurons receives all of the input values; for example, the values of 'Age' are passed to all five neurons, and similarly for every other column.
- input_dim=7: This tells the first layer to expect seven predictors in the input data. Notice that the second Dense layer does not specify this value, because the Sequential model passes that information along to the next layers automatically.
- kernel_initializer='normal': Before the neurons start their computations, some algorithm has to decide the initial value of each weight. This parameter specifies that algorithm. You can choose from values such as 'normal' or 'glorot_uniform'.
- activation='relu': This specifies the activation function used for the calculations inside each neuron. You can choose values such as 'relu', 'tanh', or 'sigmoid'.
- batch_size=20: This specifies how many rows are passed to the network in one go, after which the error (SSE) is calculated and the network starts adjusting its weights based on those errors.
When all the rows have been passed in batches of 20 rows each, as specified by this parameter, we call that one epoch, or one full data cycle. This is also known as mini-batch gradient descent. A small batch_size makes the ANN look at the data slowly, e.g. 2 or 4 rows at a time, which could lead to overfitting, whereas a large value like 20 or 50 rows at a time makes the ANN look at the data fast, which could lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning. A sketch of how batch_size and epochs are used at training time follows this list.
- epochs=50: The same activity of adjusting weights continues 50 times, as specified by this parameter. In simple terms, the ANN looks at the full training data 50 times and adjusts its weights.
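As a hedged sketch of how these two parameters come together at training time: the 'adam' optimizer and mean-squared-error loss are assumptions, since the text does not name them, and X_train/y_train stand in for your prepared predictor matrix (seven columns) and target vector.

```python
# Compile and train the model defined earlier.
# optimizer/loss are assumed values; swap in whatever the case study uses.
model.compile(optimizer='adam', loss='mean_squared_error')

# batch_size=20: weights are adjusted after every 20 rows (mini-batch
# gradient descent); epochs=50: the full training data is seen 50 times.
model.fit(X_train, y_train, batch_size=20, epochs=50, verbose=1)
```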