desalination_scraper

I'll post here any code I write for the Python scraper that expands Watson's corpus.

I'm using the following libraries:

http://scrapy.org/
http://code.google.com/p/pygoogle/ (depending on whether I can get some universal rules working to scrape meaningful text across domains)
https://python-docx.readthedocs.org/en/latest/ (for outputting Watson-friendly .docx files with headers)
http://www.unixuser.org/~euske/python/pdfminer/ (for extracting text from PDF files)
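
As a rough illustration of what "universal rules" for pulling meaningful text out of arbitrary pages might look like, here is a minimal sketch. It is not code from this repository: it uses only the standard library's `html.parser` as a stand-in for Scrapy selectors, and the names `ParagraphExtractor`, `extract_paragraphs`, `SKIP_TAGS`, and `MIN_WORDS` are hypothetical. The rule it assumes is simple: keep the text of `<p>` elements, ignore anything inside navigation or script containers, and drop paragraphs that are too short to be prose.

```python
from html.parser import HTMLParser

# Hypothetical names and thresholds; assumptions for illustration only.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
MIN_WORDS = 5  # assumed cutoff for "meaningful" text


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements outside boilerplate containers."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0  # > 0 while inside a skipped container
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self.flush()  # an unclosed <p> ends when the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)

    def flush(self):
        """Store the buffered paragraph if it is long enough, then reset."""
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if len(text.split()) >= MIN_WORDS:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False


def extract_paragraphs(html):
    """Return the list of paragraphs that pass the heuristic above."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing paragraph that was never closed
    return parser.paragraphs
```

Calling `extract_paragraphs(page_source)` on a page's HTML returns a list of strings, one per retained paragraph. The word-count cutoff is a blunt filter and would likely need tuning per domain; it is only meant to show the shape of a domain-independent rule.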
