sampling

Tansu Dasli edited this page Sep 22, 2023 · 8 revisions

sampling is about

  • understanding the characteristics of a population
  • controlling randomness

Other key points are bias, cost, time, and representativeness.

handling cost is about

  • repeating the experiment (which is costly) vs bootstrapping (resampling the existing sample with replacement to get a distribution, then calculating statistics from it)
  • selecting a sampling technique
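
The bootstrapping idea above can be sketched with the standard library alone. This is a minimal illustration, not a definitive implementation: the function name `bootstrap_means`, the toy `sample` values, and the percentile interval are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` with replacement; return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # n draws WITH replacement
        means.append(statistics.fmean(resample))
    return means

# Hypothetical measurements from a single (costly) experiment.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

means = sorted(bootstrap_means(sample))
# Rough 95% percentile interval for the mean, read off the sorted resample means.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]

print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"bootstrap 95% interval ~ ({lower:.2f}, {upper:.2f})")
```

The experiment is run once; the spread of the statistic comes from the resamples rather than from repeating the experiment.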

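dividing by N-1 instead of N?

  • the sample variance computed with divisor N tends to underestimate the population variance, because deviations are measured from the sample mean, which sits closer to the sample points than the population mean does. Dividing by N-1 compensates for this gap.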
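
A small simulation can make the N-1 correction visible. This is a sketch under stated assumptions: the population is synthetic (standard normal draws), and the sample size `n` and number of `trials` are arbitrary choices for illustration. `statistics.pvariance` divides by N and `statistics.variance` divides by N-1.

```python
import random
import statistics

rng = random.Random(42)

# Synthetic population (assumption for this example).
population = [rng.gauss(0, 1) for _ in range(10_000)]
pop_var = statistics.pvariance(population)

n = 5          # small samples make the bias easy to see
trials = 2_000
biased, unbiased = [], []
for _ in range(trials):
    s = rng.sample(population, n)
    biased.append(statistics.pvariance(s))   # divides by N
    unbiased.append(statistics.variance(s))  # divides by N - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)

print(f"population variance      = {pop_var:.3f}")
print(f"average with divisor N   = {mean_biased:.3f}")
print(f"average with divisor N-1 = {mean_unbiased:.3f}")
```

Averaged over many samples, the divisor-N estimate should land near (N-1)/N of the population variance, while the divisor-(N-1) estimate should land near the population variance itself.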

sampling in two main areas

 population    | data gathering
 model         | train-test split
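
On the model side, a train-test split is itself a random sample of the rows. The sketch below assumes a hypothetical helper `split_train_test` and a toy `rows` list; libraries such as scikit-learn provide their own version of this operation.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows`, then cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))          # stand-in for a dataset
train, test = split_train_test(rows, test_fraction=0.2)

print("train:", train)
print("test: ", test)
```

Every row ends up in exactly one of the two parts, and fixing `seed` makes the split reproducible.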

sampling techniques

   random       | every unit has an equal chance of being picked
   convenience  | easy picks
   cluster      | group, then random picks | select only from some clusters
   stratified   | group, then random picks | select from every stratum
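
The difference between the last two rows is easiest to see in code. In this minimal sketch the helper names (`group_by`, `stratified_sample`, `cluster_sample`) and the toy `units` grouped by city are assumptions made for illustration.

```python
import random
from collections import defaultdict

def group_by(units, key):
    """Collect units into groups keyed by key(unit)."""
    groups = defaultdict(list)
    for u in units:
        groups[key(u)].append(u)
    return groups

def stratified_sample(units, key, per_group, rng):
    """Pick `per_group` units at random from EVERY group (stratum)."""
    picked = []
    for members in group_by(units, key).values():
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked

def cluster_sample(units, key, n_clusters, per_group, rng):
    """Pick `n_clusters` groups at random, then pick units ONLY from those."""
    groups = group_by(units, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    picked = []
    for name in chosen:
        members = groups[name]
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked

rng = random.Random(1)
# Toy population: 4 cities with 25 units each.
units = [(city, i) for city in ("A", "B", "C", "D") for i in range(25)]
by_city = lambda u: u[0]

simple = rng.sample(units, 8)                                   # simple random
strat = stratified_sample(units, by_city, per_group=2, rng=rng)
clust = cluster_sample(units, by_city, n_clusters=2, per_group=4, rng=rng)

print("stratified covers:", sorted({u[0] for u in strat}))
print("cluster covers:   ", sorted({u[0] for u in clust}))
```

All three samples have 8 units here, but the stratified sample touches all four cities while the cluster sample touches only two.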

note

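  • cross validation is a resampling method, like bootstrapping (it splits the data without replacement, whereas bootstrapping draws with replacement)
  • ensembling a model (e.g. bagging) also relies on resampling the training data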
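
To make the relationship in the note concrete, the sketch below generates index splits for k-fold cross validation next to one bootstrap resample. The function names `k_fold_indices` and `bootstrap_indices` are hypothetical, and this is only one simple way to build such splits.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_indices(n, seed=0):
    """Draw n indices WITH replacement; unused indices are 'out of bag'."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    out_of_bag = [j for j in range(n) if j not in chosen]
    return train, out_of_bag

splits = list(k_fold_indices(10, k=5))
boot_train, boot_oob = bootstrap_indices(10)

for train, test in splits:
    print("cv   train:", sorted(train), "test:", sorted(test))
print("boot train:", sorted(boot_train), "out of bag:", boot_oob)
```

Both reuse one dataset many times. Cross validation partitions it so no index repeats within a split, while a bootstrap resample may contain repeats; bagging-style ensembles train each member on such a resample.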