K-fold cross-validation

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. Repeated k-fold cross-validation improves this estimate by running the cross-validation procedure multiple times and reporting the mean result across all folds from all runs. This mean is expected to be a more accurate estimate of the true, unknown underlying mean performance of the model on the dataset, and its uncertainty can be quantified with the standard error.
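
As a minimal sketch of repeated k-fold, assuming scikit-learn is available (the synthetic dataset and logistic regression model here are illustrative placeholders, not part of the original text):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Illustrative synthetic dataset; substitute your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

model = LogisticRegression(max_iter=1000)

# 10 folds repeated 3 times gives 30 scores in total.
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

# Mean across all folds of all runs, plus the standard error of that mean.
print("Accuracy: %.3f (SE %.3f)" % (scores.mean(), scores.std() / np.sqrt(len(scores))))
```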

The k-fold technique works as follows (a minimal code sketch follows the list):

  1. Pick a number of folds, k. Usually k is 5 or 10, but you can choose any number less than the number of samples in the dataset.
  2. Split the dataset into k equal (if possible) parts, called folds.
  3. Choose k – 1 folds as the training set; the remaining fold will be the test set.
  4. Train the model on the training set. On each iteration of cross-validation, train a new model, independent of the model trained on the previous iteration.
  5. Validate on the test set.
  6. Save the result of the validation.
  7. Repeat steps 3 – 6 k times, each time using a different fold as the test set. In the end, the model will have been validated on every fold.
  8. To get the final score, average the results saved in step 6.
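
The loop below is a minimal from-scratch sketch of these steps, assuming scikit-learn's KFold for the splitting; the model, dataset, and accuracy metric are placeholders for whatever you are evaluating:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Illustrative synthetic dataset; substitute your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

kf = KFold(n_splits=5, shuffle=True, random_state=1)   # steps 1-2: pick k and split into folds
scores = []

for train_idx, test_idx in kf.split(X):                # steps 3 and 7: iterate, one fold held out each time
    model = LogisticRegression(max_iter=1000)          # step 4: a fresh model on every iteration
    model.fit(X[train_idx], y[train_idx])              # step 4: train on the k - 1 training folds
    preds = model.predict(X[test_idx])                 # step 5: validate on the held-out fold
    scores.append(accuracy_score(y[test_idx], preds))  # step 6: save the result

print("Mean accuracy: %.3f" % np.mean(scores))         # step 8: average the fold scores
```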
