This is the source code for a Python bot that pull random images from Wikipedia, colorize them with (@algorithmiaio) colorize machine learning algorithm, and post to Twitter (@WikiColorize)
For fun. Deep learning is good at those kinds of tasks, Algorithmia has this Colorization Demo that can run without cost, only adding their watermark, Twitter is the right place for those kinds of bots, so the stars aligned.
- Gets WikiMedia random file URL and return the main file and metadata via XPath using
lxml
- Checks if the WikiMedia file is an image (there is audio, PDF and other file formats that do not matter for this application there) - if not, go back to step 1
- Checks if it is black and white using
Pillow
to check if the image is monochrome - if not, go back to step 1 - (Optional Step) Tries to identify if the monochrome image looks like a scanned document using
pytesseract
OCR, if it does, go back to step 1. This step is here because ifTesseract
doesn't recognize the image as a document, it is more likely that it is a photo - Saves the photo and inputs it to Algorithmia Colorization API, that uses Deep Learning to try to guess the right colors for a B&W image (you can see how this colorization works in a technical level here), and returns the colorized photo
- Tweets the colorized photo (with metadata and the URL)
Trying to guess if the image is a photo/picture or a document is not optimal can be improved.
The colorization API is far from perfect for most images. However it is great to see state of the art in the field of automatic colorization with real data.