KDD2009

Example code and data for "Practical Data Science with R, 2nd Edition" by Nina Zumel and John Mount, Manning 2019.

The code and data in this directory support examples from:

  • Chapter 5: Choosing and Evaluating Models
  • Chapter 6: Using Memorization Methods

A workspace containing most of the results has been saved as KDD2009.Rdata and can be loaded in R with the command:

load('KDD2009.Rdata')

(Note: you will have to reload libraries such as ROCR to perform some of the steps.)

6-2-2013

Data from: http://www.sigkdd.org/kdd-cup-2009-customer-relationship-prediction

Downloaded:

  $ shasum *
  e43a38e3477e38b354943519954b719ec7623c2f  orange_small_train.data.zip
  8274d23235630717659898900b7f74092ff339ad  orange_small_train_appetency.labels.txt
  ec2de79844657fb892ec9047e6304c12b296ff68  orange_small_train_churn.labels.txt
  4cd2d7c9b20fd3638883a91a2fed6a03a4d5d015  orange_small_train_upselling.labels.txt

Data to support examples in the chapter on memorization methods in "Practical Data Science with R" ( http://www.practicaldatascience.com ).

Load data:

  unzip orange_small_train.data.zip
  gzip -9 orange_small_train.data
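
After these two commands, the directory should hold `orange_small_train.data.gz` in place of the unzipped file, since `gzip` replaces its input by default. As an optional sanity check, the sketch below tests the compressed file's integrity and prints its first line. `gz_header` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the book): gz_header <file.gz>
# Verifies the gzip file's integrity, then prints its first line.
gz_header() {
  # gzip -t exits non-zero if the compressed file is damaged.
  gzip -t "$1" || return 1
  # Decompress to stdout and keep only the first line.
  gzip -dc "$1" | head -n 1
}

# Usage: skipped when the compressed file is not in the current directory.
if [ -f orange_small_train.data.gz ]; then
  gz_header orange_small_train.data.gz
fi
```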

See KDDmodels.Rmd for examples and details, and KDD2009vtreat.Rmd for a newer vtreat-based demonstration.