Term: Fall 2016
- Data link (Courseworks login required)
- Data description
- Contributor's name: Jaime Gacitua
- Project title: GBM to Recommend Song Lyrics
- Project summary: A model is proposed to predict the words in a song's lyrics from its audio features. A brief description of the model follows.
We have a dictionary of 5,000 words and 2,700 songs.
We have a bag-of-words matrix that, for every song, counts how many times each word appears.
We also have a set of audio features for every song.
The 19 features chosen to predict words are the following:
- tempo.median
- tempo.var
- tatums.median
- tatums.var
- loudness.median
- loudness.var
- duration
- timbre (median of each of the 12 dimensions)
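
A minimal sketch of how the 19-element feature vector could be assembled for one song. It assumes a hypothetical per-song layout that is not taken from the project: tempo, tatums, and loudness as 1-D sequences of values, duration as a scalar, and timbre as an (n_segments x 12) array. The function name `song_features` is illustrative only.

```python
import numpy as np

# Hypothetical per-song layout (an assumption, not the project's data format):
# 1-D sequences for these fields, a scalar "duration",
# and a (n_segments, 12) "timbre" array.
SEQUENCE_FIELDS = ("tempo", "tatums", "loudness")


def song_features(song):
    """Return the 19-element feature vector for one song (a dict)."""
    feats = []
    for name in SEQUENCE_FIELDS:
        values = np.asarray(song[name], dtype=float)
        feats.append(np.median(values))      # <name>.median
        feats.append(np.var(values, ddof=1))  # <name>.var (sample variance)
    feats.append(float(song["duration"]))     # duration
    timbre = np.asarray(song["timbre"], dtype=float)
    feats.extend(np.median(timbre, axis=0))   # one median per timbre dimension
    return np.array(feats)                    # 3*2 + 1 + 12 = 19 values
```

Stacking the vectors for all songs, e.g. `np.vstack([song_features(s) for s in songs])`, gives the songs x 19 input matrix.
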
The word matrix is converted into a binary matrix.
- If a word is present in a song, the value is 1; otherwise, 0.
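
The binarization step is a single comparison; a minimal sketch in NumPy (the function name `binarize` is illustrative):

```python
import numpy as np


def binarize(counts):
    """Map a songs x words count matrix to 0/1 presence indicators."""
    # Any positive count means the word appears in the song.
    return (np.asarray(counts) > 0).astype(np.int8)
```

For example, `binarize([[0, 3, 1], [2, 0, 0]])` gives `[[0, 1, 1], [1, 0, 0]]`.
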
For each column (word) of the matrix, a Generalized Boosting Model (GBM) with a Bernoulli response was fitted.
- The 19 features are the inputs, and the binary (0/1) word column is the output.
- In total, 5,000 GBM models are fitted.
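
The one-model-per-word scheme can be sketched as below. This is not the project's code: scikit-learn's `GradientBoostingClassifier` (binomial deviance loss by default) is swapped in as a stand-in for the original GBM implementation, its `max_depth` only approximates the "depth" tuning parameter, and `learning_rate=0.1` is an assumed value. The helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def fit_word_models(X, Y, n_trees=100, depth=8, learning_rate=0.1):
    """Fit one boosted classifier per word column of the binary matrix Y."""
    models = []
    for j in range(Y.shape[1]):
        y = Y[:, j]
        if y.min() == y.max():
            # Word absent (or present) in every training song: the classifier
            # needs both classes, so keep the constant rate instead.
            models.append(float(y[0]))
            continue
        clf = GradientBoostingClassifier(
            n_estimators=n_trees, max_depth=depth, learning_rate=learning_rate
        )
        clf.fit(X, y)
        models.append(clf)
    return models


def predict_word_probs(models, X):
    """Return a songs x words matrix of predicted presence probabilities."""
    probs = np.empty((X.shape[0], len(models)))
    for j, m in enumerate(models):
        if isinstance(m, float):
            probs[:, j] = m
        else:
            probs[:, j] = m.predict_proba(X)[:, 1]  # column 1 = class "1"
    return probs
```

Since each word is fitted independently, the loop parallelizes trivially across columns, which matters at 5,000 models.
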
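Parameter tuning was done using cross-validation.
- The error is measured with the sum of ranks of the words that actually appear in each song, when all words are ranked by predicted probability (lower is better).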
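
A minimal sketch of a rank-based error. The README only says "sum of ranks", so the normalization here is an assumption: for each song, words are ranked by predicted probability (rank 1 = most likely), the ranks of the words actually present are averaged and divided by the vocabulary size, and the result is averaged over songs. Under this normalization a random ordering scores about 0.5. Function names are hypothetical.

```python
import numpy as np


def rank_matrix(probs):
    """Rank words within each song; rank 1 = highest predicted probability."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(probs.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, probs.shape[1] + 1)
    return ranks


def normalized_rank_sum(probs, truth):
    """Mean rank of present words / vocabulary size, averaged over songs."""
    ranks = rank_matrix(probs)
    truth = np.asarray(truth).astype(bool)
    n_words = ranks.shape[1]
    scores = [r[t].mean() / n_words for r, t in zip(ranks, truth) if t.any()]
    return float(np.mean(scores))
```

For example, `normalized_rank_sum([[0.9, 0.1, 0.5, 0.2]], [[1, 0, 1, 0]])` ranks the two present words 1st and 2nd, giving 1.5 / 4 = 0.375.
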
The full model (all 5,000 GBMs) trains in around 30 minutes.
The best average sum of ranks achieved was 0.229; the simplest model reaching that result used n = 100 trees and depth = 8.
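
The tuning procedure (cross-validate a grid of tree counts and depths, then keep the simplest configuration that reaches the best error) can be sketched generically. `fit_predict` and `score` are hypothetical caller-supplied callables; in practice they would wrap the per-word GBMs and the sum-of-ranks error. Treating "simplest" as fewest trees, then shallowest depth, is an assumption, as are the fold count and the `tol` parameter.

```python
from itertools import product

import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def cv_select(X, Y, fit_predict, score, n_trees_grid, depth_grid, k=5, tol=0.0):
    """Grid search by k-fold CV; lower score is better."""
    folds = kfold_indices(X.shape[0], k)
    results = {}
    for n_trees, depth in product(n_trees_grid, depth_grid):
        fold_scores = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            probs = fit_predict(X[train_idx], Y[train_idx], X[test_idx], n_trees, depth)
            fold_scores.append(score(probs, Y[test_idx]))
        results[(n_trees, depth)] = float(np.mean(fold_scores))
    best = min(results.values())
    # Simplest = fewest trees, then shallowest, among configs within tol of best.
    candidates = [cfg for cfg, s in results.items() if s <= best + tol]
    return best, min(candidates), results
```

A positive `tol` lets a cheaper model win when its error is practically indistinguishable from the best one.
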
Following suggestions by RICH FITZJOHN (@richfitz), this folder is organized as follows.
├── lib/
├── data/
├── doc/
├── figs/
└── output/
Please see each subfolder for a README file.