4.7 Cross-Validation

Slides

Extra resources

In the lesson we talked about iterators and generators in Python. You can read more about them here:

Notes

Cross-validation refers to evaluating the same model on different subsets of a dataset and computing the average and spread of the evaluation metric across those subsets. This method is applied in the parameter tuning step, which is the process of selecting the best parameter values for a model.

In this algorithm, the full training dataset is divided into k partitions (folds). We train the model on k-1 of the partitions and evaluate it on the remaining one, repeating the process until the model has been evaluated on all k folds. Finally, we calculate the average of the evaluation metric over all the folds.
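For example, with k = 3 and a six-row dataset, each row lands in the validation fold exactly once. A toy sketch (the indices shown assume no shuffling):

```python
import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=3)

# Each iteration holds one fold out for validation
# and uses the remaining k-1 folds for training.
for train_idx, val_idx in kfold.split(np.arange(6)):
    print("train:", train_idx, "validate:", val_idx)

# train: [2 3 4 5] validate: [0 1]
# train: [0 1 4 5] validate: [2 3]
# train: [0 1 2 3] validate: [4 5]
```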

In general, if the dataset is large, we should use the hold-out validation strategy. On the other hand, if the dataset is small, or if we want to know the standard deviation of the model's performance across different folds, we can use the cross-validation approach.
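For comparison, a hold-out split evaluates the model once on a single fixed validation set, with no averaging across folds. A minimal sketch using sklearn's train_test_split (the synthetic data and the 80/20 split size are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) > 0.5).astype(int)

# Single 80/20 hold-out split: one train set, one validation set,
# one evaluation -- no averaging across folds.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1
)
```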

Libraries, classes and methods:

  • KFold(n_splits=k, shuffle=s, random_state=x) - sklearn.model_selection class for performing cross-validation with k folds, a boolean s that decides whether to shuffle, and a random state x (see the sketch after this list)
  • KFold.split(x) - KFold method for generating the train/validation index splits of the dataset x, according to the parameters set when the KFold object was constructed
  • for i in tqdm() - tqdm is a library for showing the progress of each iteration i in a for loop
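
Putting these pieces together, here is a minimal sketch of a cross-validation loop. The synthetic data, the logistic regression model, and the AUC metric are all assumptions for illustration; the actual notebook may use different features, a different model, and a different metric.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from tqdm.auto import tqdm

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

n_splits = 5
kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)

scores = []
# tqdm wraps the iterator to show a progress bar; total is passed
# explicitly because kfold.split() is a generator with no length.
for train_idx, val_idx in tqdm(kfold.split(X), total=n_splits):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]

    model = LogisticRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict_proba(X_val)[:, 1]

    scores.append(roc_auc_score(y_val, y_pred))

# Average metric and its spread across the k folds.
print(f"AUC = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

In the parameter tuning step, this whole loop is repeated once per candidate parameter value, and the value with the best average score across the folds is selected.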

The code of this project is available in this Jupyter notebook.

Add notes from the video (PRs are welcome)

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.

Navigation