podcast-data-lab/core-nlp-research

Natural Language Processing of Podcast Data

This repository contains research into podcasts using natural language processing.

Podcast data is vast and growing every day. Podcasts offer many data points for research, chiefly the audio files themselves, transcripts of the audio, podcast descriptions, and other metadata obtained from a podcast's RSS feed.

Phase 1: Named entity recognition of podcast and episode text descriptions

The first phase of this research deals with textual data: the descriptions of a podcast and its episodes, obtained from RSS feeds. Named entities are extracted from the descriptions and attached to the resulting podcast file.
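
The flow described above could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the README does not say which NER tool is used, so `naive_entities` is a hypothetical placeholder that treats runs of capitalized words as candidate entities. It is not a real NER model, and a trained one would be passed in through `extract`. The sample feed, the `annotate_feed` helper, and the shape of the output dictionary are likewise assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real feeds carry more fields and often HTML in descriptions.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <description>Weekly interviews hosted by Jane Doe in New York.</description>
    <item>
      <title>Episode 1</title>
      <description>We talk with Alan Turing about computing.</description>
    </item>
  </channel>
</rss>"""

# Placeholder heuristic: two or more consecutive capitalized words.
# This stands in for a real NER model and will miss or mislabel many entities.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def naive_entities(text):
    """Return sorted, de-duplicated candidate entity strings found in text."""
    return sorted(set(CANDIDATE.findall(text)))


def annotate_feed(xml_text, extract=naive_entities):
    """Parse an RSS 2.0 document and attach entities to the podcast and each episode."""
    channel = ET.fromstring(xml_text).find("channel")
    podcast = {
        "title": channel.findtext("title", default=""),
        # Entities from the podcast-level description.
        "entities": extract(channel.findtext("description", default="")),
        "episodes": [],
    }
    for item in channel.iter("item"):
        podcast["episodes"].append({
            "title": item.findtext("title", default=""),
            # Entities from this episode's description.
            "entities": extract(item.findtext("description", default="")),
        })
    return podcast


if __name__ == "__main__":
    print(annotate_feed(SAMPLE_FEED))
```

Passing the extractor as a parameter keeps the feed handling separate from the entity model, so the placeholder can be replaced without touching the parsing code.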

Research Notes