Project: Words 4 Music

Term: Fall 2016

Data link (Courseworks login required)
Data description
Contributor's name: Jaime Gacitua
Project title: GBM to Recommend song lyrics
Project summary: A model is proposed to predict song lyrics given song features. A brief description of the model is below.

We have a dictionary of 5000 words and 2700 songs.
We have a matrix that, for every song, indicates how many times each word appears (bag of words).
We also have access to multiple features for every song.
The 19 features chosen to predict words are the following (7 scalar summaries plus 12 timbre medians):
1. tempo.median
2. tempo.var
3. tatums.median
4. tatums.var
5. loudness.median
6. loudness.var
7. duration
8. timbre (median of each of the 12 dimensions, which contributes 12 features)
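As a minimal sketch of how the 19-element feature vector could be assembled for one song: the input layout (a dict with `tempo`, `tatums`, `loudness`, `duration` and `timbre` keys) and the function name `song_features` are hypothetical, and the use of the sample variance is an assumption, since the project's own preprocessing is not described here.

```python
from statistics import median, variance


def song_features(song):
    """Return the 19 features for one song, in the order listed above.

    `song` is a hypothetical dict: 'tempo', 'tatums' and 'loudness' are lists
    of floats, 'duration' is a float, and 'timbre' is a list of 12-element rows.
    """
    feats = []
    for key in ("tempo", "tatums", "loudness"):
        feats.append(median(song[key]))
        feats.append(variance(song[key]))  # sample variance (assumption)
    feats.append(song["duration"])
    # zip(*rows) transposes the timbre rows into 12 columns, one per dimension
    for dimension in zip(*song["timbre"]):
        feats.append(median(dimension))
    return feats


# Toy song with three segments, for illustration only
example = {
    "tempo": [120.0, 122.0, 118.0],
    "tatums": [0.25, 0.26, 0.24],
    "loudness": [-10.0, -12.0, -8.0],
    "duration": 215.0,
    "timbre": [[float(d + s) for d in range(12)] for s in range(3)],
}
features = song_features(example)
print(len(features))
```

The six median and variance values, the duration, and the 12 timbre medians together give the 19 inputs.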
The word matrix is converted into a binary matrix.
- If a word is present in a song, the value is 1. Otherwise, 0.
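The conversion from counts to presence indicators can be sketched as follows. The function name `binarize` and the toy matrix are illustrative only.

```python
def binarize(counts):
    """Map a songs x words count matrix to 0/1 presence indicators."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


# Two songs, three dictionary words (toy data)
counts = [[0, 3, 1],
          [2, 0, 0]]
binary = binarize(counts)
print(binary)
```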
For each column (word) of the matrix, a Generalized Boosted Model (GBM) is fitted with Bernoulli responses.
- The 19 features are the input, and the binary (0/1) word column is the output.
- In total, 5000 GBMs are fit, one per word.
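The one-model-per-word structure can be sketched as below. This is not the project's GBM code: `fit_base_rate` is a deliberately trivial stand-in that ignores the features and returns each word's empirical frequency, so that the example stays self-contained. In practice a boosted classifier with a Bernoulli loss would be passed in its place. All function names here are hypothetical.

```python
def fit_base_rate(X, y):
    """Stand-in for a boosted classifier (assumption): ignores X and
    always predicts the fraction of training songs containing the word."""
    rate = sum(y) / len(y)
    return lambda x: rate


def fit_per_word(X, Y, fit=fit_base_rate):
    """Fit one binary model per column of Y (songs x words)."""
    models = []
    for j in range(len(Y[0])):
        column = [row[j] for row in Y]  # 0/1 response for word j
        models.append(fit(X, column))
    return models


def predict_word_probs(models, x):
    """Return one predicted probability per dictionary word for song x."""
    return [model(x) for model in models]


# Toy data: 3 songs, 2 features, 3 words
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Y = [[1, 0, 1],
     [1, 1, 1],
     [0, 0, 1]]
models = fit_per_word(X, Y)
probs = predict_word_probs(models, [0.2, 0.3])
print(probs)
```

Because each word gets an independent model, the loop is easy to parallelize, which matters when there are 5000 columns.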
Parameter tuning was done using cross-validation.
- The error is calculated using the sum of ranks.
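The exact sum-of-ranks formula is not spelled out here, so the following is only one plausible reading, stated as an assumption: rank all dictionary words by predicted probability (rank 1 is the most likely), average the ranks of the words that actually occur in the song, and divide by the dictionary size so the score lies in (0, 1]. Lower is better. The function name and the tie handling (ties keep dictionary order) are illustrative choices.

```python
def normalized_rank_score(probs, present):
    """Mean rank of the words actually present, divided by dictionary size.

    probs:   predicted probability per word
    present: 0/1 indicator per word for the true lyrics
    Returns None when the song contains no dictionary words.
    """
    # Word indices sorted from most to least likely; sorted() is stable,
    # so tied words keep their dictionary order (assumption).
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    rank = {j: r for r, j in enumerate(order, start=1)}
    hits = [rank[j] for j, flag in enumerate(present) if flag == 1]
    if not hits:
        return None
    return (sum(hits) / len(hits)) / len(probs)


# Four-word toy dictionary; the song contains words 0 and 2
score = normalized_rank_score([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0])
print(score)
```

Averaging this score over the held-out songs of each cross-validation fold would give a single number per parameter setting to compare.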
The model trains in around 30 minutes.
The best average sum of ranks achieved was 0.229; the simplest model that reached it used n = 100 trees and depth = 8.

Following suggestions by Rich FitzJohn (@richfitz), this folder is organized as follows.

proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/

Please see each subfolder for a README file.
