Term: Fall 2016
- Data link (Courseworks login required)
- Data description
- Contributor's name: Jaime Gacitua
- Project title: GBM to Recommend song lyrics
- Project summary: A model is proposed to predict song lyrics given song features. A brief description of the model is below.
We have a dictionary of 5000 words and 2700 songs.
We have a bag-of-words matrix that, for every song, counts how many times each dictionary word appears in its lyrics.
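As a minimal illustration, the bag-of-words matrix has one row per song and one column per dictionary word. The sketch below assumes the lyrics are already tokenized into lowercase words; the function name `bag_of_words` and the toy vocabulary are hypothetical and not part of the project code.

```python
from collections import Counter


def bag_of_words(songs, vocabulary):
    """Return a songs x vocabulary matrix of word counts (list of lists).

    songs: list of token lists, one per song (assumed already tokenized).
    vocabulary: list of dictionary words; tokens outside it are ignored.
    """
    matrix = []
    for tokens in songs:
        counts = Counter(tokens)
        # Counter returns 0 for words that never occur in the song.
        matrix.append([counts[word] for word in vocabulary])
    return matrix


vocab = ["love", "night", "road"]  # hypothetical 3-word dictionary
songs = [["love", "love", "night"], ["road", "sun"]]
print(bag_of_words(songs, vocab))  # [[2, 1, 0], [0, 0, 1]]
```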
For every song, we also have access to multiple audio features.
The 19 features chosen to predict words are the following:
- tempo.median
- tempo.var
- tatums.median
- tatums.var
- loudness.median
- loudness.var
- duration
- timbre: median of each of the 12 timbre dimensions (12 features in total)
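The list above adds up to 19 inputs: median and variance for tempo, tatums and loudness (6), duration (1), and 12 timbre medians. A sketch of how such a vector could be assembled for one song follows. It assumes, hypothetically, that tempo, tatums and loudness are available as per-song numeric series and timbre as a list of 12-dimensional segment vectors; it uses the sample variance, which may differ from the variance the project actually computed.

```python
import statistics


def song_features(tempo, tatums, loudness, duration, timbre):
    """Build the 19-feature vector for one song (hypothetical input layout).

    tempo, tatums, loudness: numeric series for the song (at least 2 values each).
    duration: song length, a single number.
    timbre: list of 12-dimensional timbre vectors, one per segment.
    """
    features = []
    for series in (tempo, tatums, loudness):
        # Median and sample variance of each series: 6 features.
        features.append(statistics.median(series))
        features.append(statistics.variance(series))
    features.append(duration)  # 1 feature
    # Column-wise median over segments: 12 timbre features.
    for dim in range(12):
        features.append(statistics.median([seg[dim] for seg in timbre]))
    return features


vec = song_features(
    tempo=[120, 122, 124],
    tatums=[1, 3],
    loudness=[-10, -8, -6],
    duration=200,
    timbre=[[1] * 12, [2] * 12, [3] * 12],
)
print(len(vec))  # 19
```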
The word matrix is converted into a binary matrix.
- If a word is present in a song, the value is 1; otherwise, it is 0.
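The binarization step amounts to replacing every positive count with 1. A minimal sketch on a plain list-of-lists matrix (the function name `binarize` is hypothetical):

```python
def binarize(count_matrix):
    """Map word counts to presence indicators: 1 if count > 0, else 0."""
    return [[1 if count > 0 else 0 for count in row] for row in count_matrix]


print(binarize([[2, 1, 0], [0, 0, 3]]))  # [[1, 1, 0], [0, 0, 1]]
```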
For each column (word) of the binary matrix, a generalized boosted model (GBM) was fitted with a Bernoulli response.
- The 19 features are the input, and the (0-1) word column is the output.
- In total, 5000 GBMs are fitted, one per word.
Parameters were tuned using cross-validation.
- The error is calculated using the sum of ranks that the model assigns to the words actually present in each song (lower is better).
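The exact normalization of the sum-of-ranks score is not given here, so the sketch below assumes one plausible form: for each song, rank all dictionary words by predicted probability (rank 1 = most probable), take the mean rank of the words that actually appear, and divide by (V + 1) / 2, the expected mean rank under a random ordering of V words. A random model then scores about 1 and lower is better. Function names are hypothetical.

```python
def normalized_rank_sum(probs, present):
    """Score one song under the assumed normalization (lower is better).

    probs: predicted probability for each dictionary word.
    present: 0/1 indicators of which words actually occur in the song.
    """
    V = len(probs)
    order = sorted(range(V), key=lambda w: -probs[w])  # most probable first
    rank = [0] * V
    for position, w in enumerate(order, start=1):
        rank[w] = position
    true_ranks = [rank[w] for w in range(V) if present[w]]
    if not true_ranks:
        raise ValueError("song contains no dictionary words")
    return (sum(true_ranks) / len(true_ranks)) / ((V + 1) / 2)


def average_rank_score(prob_matrix, binary_matrix):
    """Average the per-song scores over all songs (e.g. one CV fold)."""
    scores = [normalized_rank_sum(p, y) for p, y in zip(prob_matrix, binary_matrix)]
    return sum(scores) / len(scores)


print(normalized_rank_sum([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0]))
```

In this toy case the two true words receive ranks 1 and 2, so the mean rank 1.5 is divided by 2.5.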
The model trains in around 30 minutes.
The best average sum of ranks achieved was 0.229; the simplest model reaching that result used n = 100 trees and a depth of 8.
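Choosing "the simplest model" for the best score can be expressed as a small selection rule over the cross-validated grid. The sketch below assumes complexity is compared by number of trees first and depth second, and that scores within a tolerance `tol` of the best count as equal; both choices, the function name, and the scores in the usage line are hypothetical.

```python
def simplest_best(cv_scores, tol=1e-3):
    """Pick the least complex (n_trees, depth) whose CV score is within tol of the best.

    cv_scores: dict mapping (n_trees, depth) -> average rank score (lower is better).
    """
    best = min(cv_scores.values())
    candidates = [cfg for cfg, score in cv_scores.items() if score <= best + tol]
    return min(candidates)  # tuples compare by n_trees first, then depth


# Hypothetical grid scores, for illustration only.
scores = {(50, 4): 0.31, (100, 8): 0.25, (200, 8): 0.25, (300, 10): 0.2495}
print(simplest_best(scores))  # (100, 8)
```

With `tol=0` the rule returns the single best-scoring configuration instead, here `(300, 10)`.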

Following suggestions by Rich FitzJohn (@richfitz), this folder is organized as follows:
proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/
Please see each subfolder for a README file.