The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. Repeated k-fold cross-validation improves this estimate: the cross-validation procedure is simply repeated multiple times and the mean result across all folds from all runs is reported. This mean is expected to be a more accurate estimate of the true, unknown underlying mean performance of the model on the dataset, and its precision can be quantified with the standard error.
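As a minimal sketch of repeated k-fold evaluation with scikit-learn, the snippet below runs 10-fold cross-validation three times and reports the mean score with its standard error. The model (`LogisticRegression`) and the synthetic dataset are illustrative placeholders, not prescribed by the text:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Placeholder data and model for illustration only.
X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(max_iter=1000)

# 10-fold cross-validation repeated 3 times -> 30 scores in total.
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

mean = scores.mean()
# Standard error of the mean across all folds and repeats.
sem = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"accuracy: {mean:.3f} +/- {sem:.3f} (standard error)")
```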
The k-fold technique works as follows (a code sketch follows the list):
1. Pick a number of folds, k. Usually k is 5 or 10, but you can choose any number smaller than the number of samples in the dataset.
2. Split the dataset into k equal (or nearly equal) parts, called folds.
3. Choose k – 1 folds as the training set. The remaining fold will be the test set.
4. Train the model on the training set. On each iteration of cross-validation, train a new model, independent of the model trained on the previous iteration.
5. Validate on the test set.
6. Save the result of the validation.
7. Repeat steps 3 – 6 until every fold has served as the test set (k iterations in total). In the end, you will have validated the model on every fold.
8. To get the final score, average the results saved in step 6.
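The sketch below walks through these steps manually with scikit-learn's `KFold` splitter; again, the model and data are illustrative placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Placeholder data for illustration only.
X, y = make_classification(n_samples=500, random_state=42)

# Steps 1-2: pick k and split the dataset into k folds.
kf = KFold(n_splits=5, shuffle=True, random_state=1)
fold_scores = []

for train_idx, test_idx in kf.split(X):
    # Step 3: k - 1 folds form the training set, the remaining fold the test set.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Step 4: train a fresh model on each iteration.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Steps 5-6: validate on the held-out fold and save the result.
    fold_scores.append(model.score(X_test, y_test))

# Step 8: the final score is the average over all k folds.
print(f"mean accuracy over {kf.get_n_splits()} folds: {np.mean(fold_scores):.3f}")
```

Note that a new `LogisticRegression` is constructed inside the loop, so each fold's model is trained from scratch, as step 4 requires.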