PyVideo Scrapers

Scrapy spiders that generate JSON files similar to those in the original pyvideo-data repo.

Python Version

Python 3.4+

Usage

Scraping YouTube playlist

After activating the virtual environment, run the following from inside the videodata directory:

scrapy runspider videodata/spiders/youtube_playlist.py \
    -a playlist_id=<playlist_id> \
    [-a api_key=<google_api_key>] \
    [-s OUTPUT_DIR=<output_root_directory>]

where:

playlist_id is the value of the list query parameter in the YouTube playlist URL (example: https://www.youtube.com/playlist?list=PLqtzN042QpfcOm_sOXxAixvNs9QWhhX5w)
google_api_key is a secret key for Google APIs (required only if the public API usage quota is exhausted). For details on obtaining an API key, see: https://support.google.com/cloud/answer/6158862
output_root_directory is the root directory where the scraping results will be stored (default: <current-working-directory>/scraped_data)
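
If you only have the playlist URL at hand, the following sketch shows one way to pull out the playlist_id and assemble the command line described above. It is not part of this repository: the helper names playlist_id_from_url and build_command are hypothetical, and the script only prints the command rather than running it.

```python
from urllib.parse import parse_qs, urlparse


def playlist_id_from_url(url):
    """Return the value of the `list` query parameter of a playlist URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("list")
    if not values:
        raise ValueError("no 'list' query parameter in URL: %s" % url)
    return values[0]


def build_command(playlist_id, api_key=None, output_dir=None):
    """Assemble the scrapy argument list (hypothetical helper, not in the repo)."""
    cmd = [
        "scrapy", "runspider", "videodata/spiders/youtube_playlist.py",
        "-a", "playlist_id=" + playlist_id,
    ]
    # Both remaining options are optional, as in the usage line above.
    if api_key:
        cmd += ["-a", "api_key=" + api_key]
    if output_dir:
        cmd += ["-s", "OUTPUT_DIR=" + output_dir]
    return cmd


url = "https://www.youtube.com/playlist?list=PLqtzN042QpfcOm_sOXxAixvNs9QWhhX5w"
print(" ".join(build_command(playlist_id_from_url(url))))
```

The returned list can be passed to subprocess.call if you prefer to launch the spider from a script; building it as a list avoids shell quoting issues with the key or the output path.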
