urlExpander is a Python package for quickly and thoroughly expanding shortened URLs.
urlExpander is intended for social media researchers who want to analyze links.
Analytics and ad-based services make such analysis difficult. Aside from collecting in-depth user engagement data, these services obfuscate the destination of the shortened URLs.
urlExpander was created to address this challenge in a scalable and robust manner. It does so by providing utility functions to convert Tweets into link datasets, filter for known link-shortening services (like bit.ly), resolve shortened links, and parse the title and meta description from webpages.
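To make the filtering step concrete, here is a minimal sketch of how links from known shortening services might be picked out of a dataset. It uses only the standard library; the names `KNOWN_SHORTENERS` and `is_short` are hypothetical and are not taken from urlExpander, and the domain set is a small illustrative subset rather than the package's actual list.

```python
from urllib.parse import urlparse

# Illustrative subset only (assumption); not the package's actual list.
KNOWN_SHORTENERS = {"bit.ly", "trib.al", "adf.ly"}


def is_short(url):
    """Return True if the URL's host is one of the known shorteners."""
    # .hostname is lowercased and excludes any port; it is None when the
    # URL has no scheme (e.g. "bit.ly/abc"), so fall back to "".
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host in KNOWN_SHORTENERS


links = ["https://bit.ly/2abc", "https://www.nytimes.com/some-article"]
short_links = [u for u in links if is_short(u)]
```

Filtering first means that only links that actually need expanding are sent over the network.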
This package differs from other approaches because it handles ad-based URLs (like adf.ly, lnx.lu, linkbucks.com, and adfoc.us) thanks to the Unshortenit library, and it also resolves redirects to defunct websites (like blacktolive.com). Most importantly, urlExpander offers multithreaded URL expansion.
The multithreaded URL expansion was created to overcome the bottleneck of mass link expansion through parallelization, minimizing HTTP requests, caching results, and chunking the input into smaller pieces.
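The four ideas above (parallel workers, fewer requests, a cache, and chunked input) can be sketched with the standard library alone. This is not urlExpander's implementation: `expand_all`, `chunked`, and the `resolve_fn` parameter are hypothetical names, and the resolver is passed in by the caller so the sketch makes no claim about how a single link is actually resolved.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def expand_all(urls, resolve_fn, chunksize=4, n_workers=2, cache_file=None):
    """Resolve `urls` with `resolve_fn`, returning results in input order."""
    # Load earlier results, if any, so repeated runs skip known links.
    cache = {}
    if cache_file and os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)

    # Minimize requests: drop duplicates (keeping order) and cached links.
    todo = [u for u in dict.fromkeys(urls) if u not in cache]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for chunk in chunked(todo, chunksize):
            # pool.map runs resolve_fn in parallel and preserves chunk order.
            for short, resolved in zip(chunk, pool.map(resolve_fn, chunk)):
                cache[short] = resolved
            # Persist after every chunk so an interrupted run can resume.
            if cache_file:
                with open(cache_file, "w") as f:
                    json.dump(cache, f)

    return [cache[u] for u in urls]
```

Writing the cache once per chunk rather than once per link keeps disk writes infrequent while bounding how much work is lost if the process stops partway through.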
pip install urlexpander
import urlexpander
urlexpander.expand('https://trib.al/xXI5ruM')
returns
'https://www.breitbart.com/video/2017/12/31/lindsey-graham-trump-just-cant-tweet-iran/'
The multithreaded function shines when given a massive list of URLs to unshorten:
resolved_links = urlexpander.multithread_expand(list_of_short_urls,
                                                chunksize=1280,
                                                n_workers=64,
                                                cache_file='tmp.json')
Check out this Jupyter Notebook for a more in-depth quickstart!
We'll generate a readthedocs shortly!
urlExpander was written by Leon Yin with contributions by Nicole Baram and Gregory Eady for the Social Media and Political Participation Lab at NYU.
Please cite urlExpander in your publications if it helps your research. Here is an example BibTeX entry:
@misc{leon_yin_2018_1345144,
author = {Leon Yin},
title = {SMAPPNYU/urlExpander: Initial release},
month = aug,
year = 2018,
doi = {10.5281/zenodo.1345144},
url = {https://doi.org/10.5281/zenodo.1345144}
}
Please also send us your work :)
urlExpander is being used in several forthcoming publications from the SMaPP Lab (and perhaps from you?). We'll keep a running tally here.