sampling
Tansu Dasli edited this page Sep 22, 2023 · 8 revisions
- understanding the population characteristics
- controlling randomness
- other key points: bias, cost, time, and representativeness
- repeating the experiment (costly) vs bootstrapping (resampling the existing sample with replacement to get a distribution of a statistic, then computing stats such as a confidence interval from that distribution)
- selecting a sampling technique
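- the sample variance (computed around the sample mean) tends to be smaller than the population variance: it underestimates it on average, though not in every sample. To compensate for the gap, divide by N-1 instead of N, where N is the sample size (Bessel's correction)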
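A minimal Python sketch of the bootstrapping and N-1 points above, using only the standard library. The helper names (`bootstrap_means`, `percentile_interval`, `variance`, `mean_sample_variance`) and the small data set are made up for illustration; the stdlib functions `statistics.variance` (divides by N-1) and `statistics.pvariance` (divides by N) serve as a cross-check.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` WITH replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, alpha=0.05):
    """Simple percentile interval taken from a list of bootstrap statistics."""
    ordered = sorted(values)
    lo = ordered[int((alpha / 2) * (len(ordered) - 1))]
    hi = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return lo, hi

def variance(sample, ddof=1):
    """Variance dividing by N - ddof; ddof=1 is Bessel's correction (N-1)."""
    mean = statistics.fmean(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - ddof)

def mean_sample_variance(population, n, ddof, trials=2000, seed=0):
    """Average variance over many random samples of size n (drawn with replacement)."""
    rng = random.Random(seed)
    return statistics.fmean([variance(rng.choices(population, k=n), ddof) for _ in range(trials)])

if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9]  # made-up sample
    print("bootstrap 95% interval for the mean:", percentile_interval(bootstrap_means(data)))
    print("divide by N:  ", variance(data, ddof=0))
    print("divide by N-1:", variance(data, ddof=1))
    population = list(range(10))
    print("true population variance:", statistics.pvariance(population))
    print("avg over samples, divide by N:  ", mean_sample_variance(population, 5, ddof=0))
    print("avg over samples, divide by N-1:", mean_sample_variance(population, 5, ddof=1))
```

The simulation shows the gap the N-1 note is about: averaged over many samples, dividing by N lands below the population variance, while dividing by N-1 lands close to it.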
sampling is used in two main areas
| area | where sampling happens |
|---|---|
| population | data gathering |
| model | train-test split |
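On the model side, a train-test split is sampling without replacement: shuffle the rows, then cut. A minimal sketch assuming plain Python lists; `split_train_test`, the 0.2 default and the fixed seed are illustrative choices, not from this page (for real projects scikit-learn provides `sklearn.model_selection.train_test_split`).

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) without replacement."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

if __name__ == "__main__":
    train, test = split_train_test(range(10), test_fraction=0.3)
    print(len(train), len(test))  # 7 3
```

Fixing the seed keeps the split reproducible, which matters when comparing models on the same test set.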
sampling techniques
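| technique | how | note |
|---|---|---|
| random | random picks, every member has an equal chance | |
| convenience | easy picks | not random, so prone to bias |
| cluster | group into clusters, then random picks | select only from some clusters |
| stratified | group into strata, then random picks | select from every stratum |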
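A minimal sketch of the four techniques in the table, assuming the population is a Python list and that groups (clusters or strata) are defined by a key function. All function names and the toy "people in cities" population are hypothetical. Cluster sampling is shown both as one-stage (keep chosen clusters whole) and two-stage (random picks inside chosen clusters), since both variants are common.

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, rng):
    # random: every member has the same chance of being picked (no replacement)
    return rng.sample(population, k)

def convenience_sample(population, k):
    # convenience: take whatever is easiest to reach, here simply the first k
    # items (not random, so the result can be biased)
    return list(population[:k])

def group_by(population, key):
    # split the population into groups (clusters or strata) using `key`
    groups = defaultdict(list)
    for item in population:
        groups[key(item)].append(item)
    return groups

def cluster_sample(population, key, n_clusters, rng, per_cluster=None):
    # cluster: randomly choose only SOME clusters; keep each chosen cluster whole
    # (one-stage) or, if per_cluster is given, random picks inside it (two-stage)
    groups = group_by(population, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    picked = []
    for c in chosen:
        members = groups[c]
        picked.extend(members if per_cluster is None else rng.sample(members, per_cluster))
    return picked

def stratified_sample(population, key, per_stratum, rng):
    # stratified: random picks from EVERY stratum
    groups = group_by(population, key)
    return [item for s in sorted(groups) for item in rng.sample(groups[s], per_stratum)]

if __name__ == "__main__":
    rng = random.Random(0)
    # made-up population: 5 people in each of 4 cities
    population = [(city, i) for city in "ABCD" for i in range(5)]
    by_city = lambda p: p[0]
    print(simple_random_sample(population, 4, rng))
    print(convenience_sample(population, 4))
    print(cluster_sample(population, by_city, 2, rng))
    print(stratified_sample(population, by_city, 1, rng))
```

The difference between the last two is visible in the output: the cluster sample covers only two cities, while the stratified sample covers all four.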
note
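- cross-validation, like bootstrapping, is a resampling method: it partitions the data into folds without replacement, whereas bootstrapping draws with replacement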
- ensembling can also be built on resampling: bagging (bootstrap aggregating) trains each model of the ensemble on a different bootstrap resample
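A minimal sketch contrasting the resampling schemes in these notes, assuming data is addressed by index. `k_fold_indices`, `bootstrap_indices` and `bagged_mean_predictor` are hypothetical helpers; the bagging "model" is a toy that just predicts the mean of its resample, standing in for a real learner.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # a partition, i.e. no replacement
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_indices(n, seed=0):
    """One bootstrap resample: n indices drawn WITH replacement, plus the out-of-bag rest."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def bagged_mean_predictor(y, n_models=25, seed=0):
    """Toy bagging: each 'model' predicts the mean of its own bootstrap
    resample of y; the ensemble prediction is the average of those models."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        resample = [y[rng.randrange(len(y))] for _ in range(len(y))]
        predictions.append(sum(resample) / len(resample))
    return sum(predictions) / len(predictions)

if __name__ == "__main__":
    for train, test in k_fold_indices(6, 3):
        print("train", train, "test", test)
    print(bootstrap_indices(6))
    print(bagged_mean_predictor([1.0, 2.0, 3.0, 4.0]))
```

In k-fold every index is tested exactly once; in a bootstrap resample some indices repeat and others are left out-of-bag, which is what gives each bagged model a slightly different view of the data.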