A web scraper built specifically for collecting score data from MyAnimeList, using the Playwright for Python library and SQLAlchemy

yafethtb/MyAnimeList-Score-Scraper

MyAnimeList-Score-Scraper

INTRO

  1. THE WHAT

This is a web scraper created specifically for scraping MyAnimeList score data. It uses a combination of Playwright for Python, BeautifulSoup, and SQLAlchemy to extract the HTML, transform it, and store the resulting data in a database.

  2. THE WHY

I like anime. I like data. I'm curious about which anime has the highest score on MyAnimeList, and about many other things in anime data. MyAnimeList has plenty of data to explore. So, why not?

  3. THE HOW

I use an "Extract, Transform, Load" (ETL) process to scrape the data from MyAnimeList and store it.

PARTS

I chose to separate this project into three files:

  1. playwright_scraper.py

This is where the main web scraper functionality lives. It consists of two functions:

  a. anime_season()
  
      This function classifies each anime into its airing season.

  b. playwright_scraper()
  
      The main function for extracting and transforming the data on each page. It accepts a MyAnimeList anime genre URL as its input.
      It then opens each page of the genre fed into it, checks that the page exists, and takes the page's HTML so that BeautifulSoup can parse it.
      BeautifulSoup then parses (transforms) the HTML into information such as the anime's name, average score, and genre(s).
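
The actual rule anime_season() uses is not shown here, so the following is only a minimal sketch of one plausible approach: mapping a premiere date to the usual Winter/Spring/Summer/Fall season by calendar quarter. The name `season_from_date`, its signature, and the month boundaries are assumptions for illustration, not the project's code.

```python
from datetime import date

# Assumed season order by calendar quarter (hypothetical, not taken from the project):
# Jan-Mar = Winter, Apr-Jun = Spring, Jul-Sep = Summer, Oct-Dec = Fall.
SEASON_BY_QUARTER = ("Winter", "Spring", "Summer", "Fall")


def season_from_date(premiere: date) -> str:
    """Return a label such as 'Spring 2021' for the given premiere date."""
    quarter = (premiere.month - 1) // 3  # 0 for Jan-Mar, 1 for Apr-Jun, and so on
    return f"{SEASON_BY_QUARTER[quarter]} {premiere.year}"
```

For example, `season_from_date(date(2021, 4, 3))` returns `"Spring 2021"` under these assumed boundaries.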

  2. modeler.py

The modeler is the file I use to create the database. I use SQLite as my database of choice and SQLAlchemy as the tool to create it.

  3. etl_process.py

This file automates the process of extracting, transforming, and loading data from the MyAnimeList website into the database. It consists of a variable that holds the current date (I use it to create a unique table name in SQLite), a list of URLs for the targeted genres, and a for loop that feeds each URL in the list into the playwright_scraper() function, which produces a dictionary of that genre's anime data. The dictionary is then used as input to populate the database through the modeler() function.
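
The loop described above might look roughly like the sketch below. It is not the project's code: `table_name_for`, `load_with_sqlalchemy`, `run_etl`, the column names, and the row shape (a list of dictionaries with `name`, `score`, and `genres` keys) are all assumptions made for illustration. The scraper and loader are passed in as plain callables so the loop can be tried without a browser or a network connection.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List

# One scraped record; the keys are assumed from the fields mentioned in this README.
Anime = Dict[str, object]


def table_name_for(run_date: date) -> str:
    """Build a date-stamped table name so each run gets its own table."""
    return f"anime_scores_{run_date:%Y_%m_%d}"


def load_with_sqlalchemy(db_url: str, table_name: str, rows: List[Anime]) -> None:
    """Hypothetical loader: create the table if needed, then insert the rows."""
    # Imported here so the rest of the sketch runs without SQLAlchemy installed.
    from sqlalchemy import Column, Float, Integer, MetaData, String, Table, create_engine

    engine = create_engine(db_url)  # e.g. "sqlite:///anime.db"
    metadata = MetaData()
    table = Table(
        table_name,
        metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String, nullable=False),
        Column("score", Float),
        Column("genres", String),  # assumed to be stored as a comma-separated string
    )
    metadata.create_all(engine)  # does nothing if the table already exists
    if rows:
        with engine.begin() as conn:  # commits on success, rolls back on error
            conn.execute(table.insert(), rows)


def run_etl(
    urls: Iterable[str],
    scrape: Callable[[str], List[Anime]],
    load: Callable[[str, List[Anime]], None],
    run_date: date,
) -> int:
    """Feed each genre URL to the scraper, load the result, and return the row count."""
    table_name = table_name_for(run_date)
    total = 0
    for url in urls:
        rows = scrape(url)      # extract and transform
        load(table_name, rows)  # load
        total += len(rows)
    return total
```

To write to a real SQLite file, the loader could be bound to a database URL with `functools.partial(load_with_sqlalchemy, "sqlite:///anime.db")` and passed as `load`, with `date.today()` as `run_date`. Keeping the scraper and loader as parameters is a design choice of this sketch, made so the loop can be exercised with stub functions.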
