Machine Learning - Project 1
Tshtsh_club: Marie Anselmet, Sofia Dandjee, Héloïse Monnet
- Make sure that Python >= 3.7 and NumPy >= 1.16 are installed.
- Download the train and test data sets from the Kaggle competition dataset, and put train.csv and test.csv into a data\ folder.
- Go to the script\ folder and run run.py. You will get submission.csv for Kaggle in the submission\ folder.
cd script
python run.py
- sigmoid: Sigmoid function.
- load_csv_data: Loads data from a csv file.
- compute_f1_score, predict_accuracy: Compute the F1 score and the accuracy of a prediction.
- predict_labels: Generates class predictions for a linear or a logistic regression.
- build_k_indices, cross_validation: Generate the training and validation data for cross-validation.
- classify: Converts the labels of a label vector from (-1,1) to (0,1), for use with the logistic regression.
- batch_iter: Generates a mini-batch for a dataset.
- create_csv_submission: Creates a csv output file for submission to Kaggle.
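As an illustration of three of the helpers listed above (the sigmoid, the label conversion, and the cross-validation index split), here is a minimal NumPy sketch. The function names, signatures, and the seeding scheme are assumptions made for this example; they are not taken from the project's code.

```python
import numpy as np

def sigmoid_sketch(t):
    """Logistic function 1 / (1 + exp(-t)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(t, dtype=float)))

def to_zero_one_sketch(y):
    """Map labels from {-1, 1} to {0, 1}."""
    return (np.asarray(y) + 1) // 2

def k_fold_indices_sketch(n_samples, k_fold, seed):
    """Shuffle the row indices and cut them into k_fold equally sized folds.

    Rows left over when n_samples is not divisible by k_fold are dropped.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_samples)
    fold_size = n_samples // k_fold
    return perm[: fold_size * k_fold].reshape(k_fold, fold_size)

def split_fold_sketch(k_indices, k):
    """Return (train_indices, validation_indices) when fold k is held out."""
    val = k_indices[k]
    train = np.delete(k_indices, k, axis=0).ravel()
    return train, val
```

In a k-fold loop, each value of k in range(k_fold) would be held out once, and the validation scores averaged.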
- compute_loss: Computes the mean squared error (MSE) loss for linear regression.
- logistic_loss: Computes the negative log-likelihood loss for the logistic regression.
- reg_logistic_loss: Computes the regularized negative log-likelihood loss for the logistic regression.
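The three losses above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the MSE is taken with a 1/(2N) factor, the logistic loss expects labels in (0,1) and is summed rather than averaged, and the penalty is lambda_ times the squared L2 norm of w. The function names are hypothetical.

```python
import numpy as np

def mse_loss_sketch(y, tx, w):
    """MSE loss, assuming the 1 / (2N) convention."""
    e = y - tx @ w
    return e @ e / (2 * len(y))

def logistic_loss_sketch(y, tx, w):
    """Negative log-likelihood for labels y in {0, 1}.

    log(1 + exp(z)) is computed with logaddexp to stay finite for large z.
    """
    z = tx @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

def reg_logistic_loss_sketch(y, tx, w, lambda_):
    """Logistic loss plus an L2 penalty lambda_ * ||w||^2 (assumed form)."""
    return logistic_loss_sketch(y, tx, w) + lambda_ * (w @ w)
```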
- compute_gradient: Computes the gradient for gradient descent on the linear regression.
- logistic_gradient: Computes the gradient for gradient descent on the logistic regression.
- reg_logistic_gradient: Computes the gradient for gradient descent on the regularized logistic regression.
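For reference, a minimal sketch of the three gradients listed above. It assumes an MSE loss with a 1/(2N) factor, a summed logistic loss with labels in (0,1), and an L2 penalty of lambda_ times the squared norm of w; the project may use different conventions, and the function names here are hypothetical.

```python
import numpy as np

def mse_gradient_sketch(y, tx, w):
    """Gradient of the 1/(2N) * ||y - tx w||^2 loss with respect to w."""
    e = y - tx @ w
    return -tx.T @ e / len(y)

def logistic_gradient_sketch(y, tx, w):
    """Gradient of the summed negative log-likelihood, labels in {0, 1}."""
    # Sigmoid written via tanh so that large |tx @ w| does not overflow.
    p = 0.5 * (1.0 + np.tanh(0.5 * (tx @ w)))
    return tx.T @ (p - y)

def reg_logistic_gradient_sketch(y, tx, w, lambda_):
    """Logistic gradient plus the gradient of lambda_ * ||w||^2."""
    return logistic_gradient_sketch(y, tx, w) + 2.0 * lambda_ * w
```

A finite-difference comparison against the corresponding loss is a cheap way to check such gradients before using them in a descent loop.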
- get_jet_samples: Splits the input data according to their jet values.
- clean_data, standardize: Standardize the data, and remove undefined values and features with a zero standard deviation.
- augment_data, build_model_data, build_poly_all_features: Augment the data by building polynomial features.
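The preprocessing steps above (grouping rows by jet value, standardizing while dropping constant features, and polynomial expansion) could look roughly like this. The function names, the jet_col parameter, and the exact layout of the polynomial features are assumptions for illustration only; the handling of undefined values is left out because the text does not specify it.

```python
import numpy as np

def split_by_jet_sketch(x, jet_col):
    """Return {jet value: row indices}, given the column holding the jet number."""
    jets = x[:, jet_col].astype(int)
    return {j: np.where(jets == j)[0] for j in np.unique(jets)}

def standardize_sketch(x):
    """Center and scale each feature, dropping features with zero std.

    Returns the standardized matrix and the boolean mask of kept columns.
    """
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    keep = std > 0
    return (x[:, keep] - mean[keep]) / std[keep], keep

def build_poly_sketch(x, degree):
    """Stack a bias column with x, x^2, ..., x^degree (element-wise powers)."""
    bias = np.ones((x.shape[0], 1))
    powers = [x ** d for d in range(1, degree + 1)]
    return np.hstack([bias] + powers)
```

The mask returned by standardize_sketch would need to be applied to the test set as well, so that training and test matrices keep the same columns.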
- least_squares_GD: Linear regression using gradient descent.
- least_squares_SGD: Linear regression using stochastic gradient descent.
- least_squares: Least squares regression using normal equations.
- ridge_regression: Ridge regression using normal equations.
- logistic_regression: Logistic regression using stochastic gradient descent.
- reg_logistic_regression: Regularized logistic regression using stochastic gradient descent.
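To make the difference between the closed-form and the iterative methods above concrete, here is a small sketch of ridge regression through the normal equations and of plain gradient descent on the MSE loss. The names, signatures, and the scaling of the penalty (lambda_ added directly to the diagonal) are assumptions; the project's functions may scale lambda_ differently.

```python
import numpy as np

def ridge_sketch(y, tx, lambda_):
    """Solve (tx^T tx + lambda_ * I) w = tx^T y; lambda_ = 0 gives least squares."""
    d = tx.shape[1]
    a = tx.T @ tx + lambda_ * np.eye(d)
    # solve() is preferred over forming an explicit inverse.
    return np.linalg.solve(a, tx.T @ y)

def gradient_descent_sketch(y, tx, initial_w, max_iters, gamma):
    """Gradient descent on the 1/(2N) * ||y - tx w||^2 loss with step size gamma."""
    w = np.asarray(initial_w, dtype=float).copy()
    for _ in range(max_iters):
        e = y - tx @ w
        grad = -tx.T @ e / len(y)
        w = w - gamma * grad
    return w
```

The stochastic variants follow the same update, but compute the gradient on a mini-batch of rows instead of the full data set.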
Script to produce the same .csv predictions used in the best submission on the Kaggle platform.