Emma McKibbin | [email protected]
Spring 2022
This is Emma McKibbin's term project for LING 1340: Data Science for Linguists.
A Linguistic Look Inside Outsider Music
This project aims to identify linguistic characteristics common among songs in the genre of "outsider music," made by inexperienced or self-taught artists (e.g. "Tiptoe Through the Tulips with Me" by Tiny Tim). Are outsider lyrics as childlike, nonsensical, or repetitive as they're thought to be? What themes or lyrical distributions might distinguish outsider music from "insider," or popular, music?
Outsider musician names were scraped from this Wikipedia page, lyrics were scraped from Genius.com using the Genius API, and the comparison dataset was taken from Kaylin Pavlik's "50 Years of Pop Music" analysis.
Because of the nature of the data collection and cleaning, the dataset in its current form is very skewed in its distribution. Therefore, the planned analysis is incomplete, and much of this project focuses on the internal trends and skews of the outsider music dataset.
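One simple way to put a number on the "repetitive" question is a type-token ratio: the count of distinct words divided by the total word count, where lower values mean more repetition. The sketch below is illustrative only and is not taken from the project notebooks; the function name `type_token_ratio` and the tokenization rule are assumptions.

```python
import re

def type_token_ratio(lyrics):
    """Return distinct words / total words for a lyric string (0.0 if empty)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# A fully repeated line scores low; a line with no repeats scores 1.0.
repeated = type_token_ratio("la la la la")
varied = type_token_ratio("tiptoe through the tulips")
print(repeated, varied)
```

Because the ratio falls as texts get longer, comparisons are only meaningful between lyrics of similar length or after truncating to a fixed number of tokens.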
README.md
- Includes a short project summary, directory (how meta), and important links!
LICENSE.md
- Specifies the license for this repository: GNU General Public License v3.0
project_plan.md
- The initial project plan, including expectations for the data collection, cleaning, analysis, and presentation.
progress_report.md
- Updates on changes to the repository, from conception to completion.
presentation_4_21_22.pdf
- The project presentation slides, created while the project was in progress.
final_report.md
- The essay-style overview of the project's process and results.
0_wiki_musicians.py
- Script used to retrieve outsider musician names from the Wikipedia page.
0_wiki_musicians.txt
- Output of the script above. Manually edited to remove double quotes.
1_lyricsgenius_requests.ipynb
- Retrieves lyrics in JSON format from Genius.
2_load_json_to_df.ipynb
- Reads the lyrics from JSON form into a pandas DataFrame.
3_data_cleaning_and_exploration.ipynb
- Examines basic statistics of the DataFrame and cleans the lyric data.
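As a rough illustration of the JSON-to-DataFrame step, the sketch below flattens one artist's JSON into one row per song. It is not the notebook's actual code: the key names (`name`, `songs`, `title`, `lyrics`) are assumptions about the layout of the saved files, and the sample data is invented for the example.

```python
import json

def songs_to_rows(artist_json):
    """Flatten one artist's parsed JSON into a list of one-row-per-song dicts."""
    # Assumed layout: {"name": ..., "songs": [{"title": ..., "lyrics": ...}, ...]}
    artist = artist_json.get("name", "")
    rows = []
    for song in artist_json.get("songs", []):
        rows.append({
            "artist": artist,
            "title": song.get("title", ""),
            "lyrics": song.get("lyrics", ""),
        })
    return rows

# Hypothetical stand-in for the contents of one scraped JSON file.
sample = json.loads(
    '{"name": "Example Artist", "songs": ['
    '{"title": "Song A", "lyrics": "la la la"}, '
    '{"title": "Song B", "lyrics": "da da"}]}'
)
rows = songs_to_rows(sample)
print(len(rows))
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`, and the rows from several artist files can be concatenated before building the DataFrame.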
- Folder containing small snippets of the scraped data. I've refrained from publishing multiple samples so as not to violate any lyrics site copyrights.
Lyrics_CripsinGlover.json
- Example JSON of lyrics scraped from Genius.
- Folder containing saved images of graphs produced during the .ipynb analysis, to be included in the final presentation.
My guestbook is where classmates leave comments, questions, and suggestions about this repository!