Multilabel-motionpicture-genre-classifier

This is a multilabel motion picture genre classifier. It can classify a title into 19 genres based on its plot description.

A DistilRoberta-Base model was fine-tuned on IMDB movie and TV series descriptions. Around 25,000 descriptions were used for fine-tuning, and around 88% accuracy was achieved. The model was then converted to ONNX format for optimization and quantized with ONNX quantization. The model is deployed on HuggingFace Spaces and, as a web app, on Render.

Data

Selenium was used to scrape the data from the IMDB movie database. About 56,000 entries were scraped across multiple genres, and around 25,000 unique entries remained after processing and cleaning. Movie data and TV series data were scraped separately, then cleaned and joined into a single CSV file. You can get the combined data here
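As an illustration of the cleaning and joining step, here is a minimal sketch using only the Python standard library. The column names (`title`, `description`, `genres`) and the sample rows are made up for the example, and keeping the first row per unique description is an assumption, not necessarily what the project's notebooks do.

```python
import csv
import io


def merge_and_dedupe(sources, key="description"):
    """Concatenate rows from several CSV sources, keeping the first row per unique key."""
    seen = set()
    merged = []
    for source in sources:
        for row in csv.DictReader(source):
            # Collapse repeated whitespace so near-identical descriptions compare equal.
            text = " ".join(row[key].split())
            if not text or text.lower() in seen:
                continue  # skip empty descriptions and duplicates
            seen.add(text.lower())
            row[key] = text
            merged.append(row)
    return merged


# In-memory stand-ins for the separately scraped movie and TV series files (hypothetical rows).
movies = io.StringIO(
    "title,description,genres\n"
    "A,A heist goes wrong.,Crime|Thriller\n"
    "B,A heist  goes wrong.,Crime\n"
)
series = io.StringIO(
    "title,description,genres\n"
    "C,Two friends open a bakery.,Comedy\n"
    "D,,Drama\n"
)

rows = merge_and_dedupe([movies, series])
print([row["title"] for row in rows])
```

With real files, the `io.StringIO` objects would be replaced by opened CSV files, and the merged rows could be written back out with `csv.DictWriter`.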

Model

DistilRoberta-Base from the HuggingFace Hub was fine-tuned on the collected data using Fastai and Blurr. After fine-tuning, the model reached around 88% accuracy and a micro-averaged F1 score of 62%. The model was then converted to ONNX format and quantized so that it loads faster and is easier to deploy.
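The gap between the accuracy and the micro F1 score is typical for multilabel problems, where most labels are negative for any given title. The sketch below shows how the two metrics can differ on the same predictions. It assumes accuracy is counted per label (as fastai's `accuracy_multi` does); the README does not state which definition was used, and the toy labels are made up.

```python
def elementwise_accuracy(y_true, y_pred):
    """Fraction of individual label decisions (sample x label cells) that are correct."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return correct / total


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives and false negatives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two titles, five hypothetical genres, one-hot encoded per label.
y_true = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 1, 0]]

acc = elementwise_accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} micro_f1={f1:.3f}")
```

Correctly predicted negatives raise the per-label accuracy but do not enter the F1 computation, which is why F1 is usually the more informative number here.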

Model Deployment

The quantized model is deployed on HuggingFace Spaces. It was deployed using Gradio. You can see the model from here
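For a multilabel model, the deployed app has to turn one raw score per genre into a set of predicted genres. A common way to do this, sketched here under assumptions, is to apply a sigmoid to each score independently and keep the genres above a threshold. The function name, the 0.5 threshold, and the three sample genres are illustrative and are not taken from the project's code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def genre_confidences(logits, labels, threshold=0.5):
    """Map raw per-genre scores to {genre: probability}, keeping genres at or above threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


# Hypothetical subset of the 19 genres with made-up model outputs.
labels = ["Action", "Comedy", "Drama"]
logits = [2.0, -1.0, 0.0]

predicted = genre_confidences(logits, labels)
print(sorted(predicted))
```

Each genre is scored independently, so a title can receive several genres or none at all. A label-to-confidence dictionary like this is a convenient shape for a demo interface to display.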

Web Deployment

A web app was also built with the Flask framework and connected to the Spaces API. You can find the code in the flask-deployment branch. You can also see the live website here
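The connection between the web app and the Space can be pictured as a JSON POST request. The sketch below only builds such a request and reads a response body; it does not send anything. The URL is a placeholder, and the `{"data": [...]}` payload and response shape are assumptions about the Space's API, so check the Space's own API page and the flask-deployment branch for the actual endpoint and format.

```python
import json
from urllib import request

# Hypothetical placeholder; the real endpoint is not given in this README.
SPACE_API_URL = "https://example.invalid/predict"


def build_prediction_request(description, url=SPACE_API_URL):
    """Build (but do not send) a JSON POST request carrying one plot description."""
    payload = json.dumps({"data": [description]}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_output(response_body):
    """Return the first element of the "data" list in a JSON response body (assumed shape)."""
    return json.loads(response_body)["data"][0]


req = build_prediction_request("A detective hunts a killer in 1970s San Francisco.")
print(req.get_method(), req.full_url)
```

Inside a Flask view, the request could be sent with `request.urlopen(req, timeout=...)` and the parsed result passed to a template.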

Contributions

You are welcome to contribute to this project. You can fork the repository and submit a pull request, or submit an issue with suggestions for improvements.

Limitations

The IMDB movie descriptions are very short. More elaborate descriptions could diversify the training data and improve the model.

Future Work

In the future, I aim to improve the model by collecting more diverse data from different sources. Different model architectures can also be explored on this dataset. I also plan to collect data in different languages to make the model more robust.

sheikhDeep/multilable-motionpicture-genre-classifier
