A friendly GitHub crawler built with Scrapy.
- Install the requirements:

```shell
pip install -r requirement.txt
```
- Update the search URL as needed in `github/github/spiders/github-user.py`:

```python
def start_requests(self):
    urls = [
        "your search url here"
    ]
```
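The spider source is not shown here, so as a rough illustration, a GitHub user-search URL can be assembled like this (the helper name, query, and parameters are assumptions for the example, not values taken from this project):

```python
from urllib.parse import urlencode

def build_search_url(query, page=1):
    """Assemble a GitHub user-search URL (parameters are illustrative)."""
    params = {"q": query, "type": "users", "p": page}
    return "https://github.com/search?" + urlencode(params)

# Example: search for users whose profiles mention "scrapy"
url = build_search_url("scrapy")
```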
- To export results to CSV, set the following variables in `settings.py`:

```python
ITEM_PIPELINES = {
    'GithubCsvPipeline': 300,
}
```
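The pipeline's implementation isn't included here; as a hedged sketch, a Scrapy CSV pipeline usually looks like the following (the output file name and item fields are assumptions, not taken from this project):

```python
import csv

class GithubCsvPipeline:
    """Minimal sketch of a Scrapy item pipeline that writes items to a CSV file."""

    def open_spider(self, spider):
        # Output file name is an assumption for this sketch.
        self.file = open("github_users.csv", "w", newline="")
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # Write one row per scraped item; columns depend on the spider's items.
        self.writer.writerow(dict(item).values())
        return item

    def close_spider(self, spider):
        self.file.close()
```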
- To index results in Elasticsearch, set the following variables in `settings.py`:

```python
ELASTICSEARCH_HOST = ''
ELASTICSEARCH_PORT = 9200
ITEM_PIPELINES = {
    'GithubElasticsearchPipeline': 300,
}
```

Note: this option requires the index to already exist on the Elasticsearch server.
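Since the index must exist beforehand, it can be created with a PUT request whose body defines a mapping. The document fields below are hypothetical, not taken from this project; adjust them to match whatever the spider yields:

```python
import json

# Hypothetical mapping for crawled user documents.
index_body = {
    "mappings": {
        "properties": {
            "username": {"type": "keyword"},
            "profile_url": {"type": "keyword"},
            "bio": {"type": "text"},
        }
    }
}

# Serialized body, e.g. for:
#   curl -X PUT "localhost:9200/<index>" -H "Content-Type: application/json" -d "$payload"
payload = json.dumps(index_body)
```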
- To export results to a Google Sheet, set the following variables in `settings.py`:

```python
GOOGLE_SHEET = ""
ITEM_PIPELINES = {
    'github.pipeline.GithubExcelPipeline': 300,
}
```

- Store the Google API credentials in `utility/gsheets_credentials.json`.

Note: this option requires an existing Google Sheet shared as "Editable by anyone who has the link".
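`gsheets_credentials.json` is a Google service-account key file. As a sanity check before crawling, you can verify the downloaded file has the standard service-account structure (the key list below is the generic Google format; the helper itself is an assumption, not part of this project):

```python
import json

# Standard fields of a Google service-account key file.
REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
}

def check_credentials(path):
    """Return True if the service-account JSON contains the standard keys."""
    with open(path) as f:
        data = json.load(f)
    return REQUIRED_KEYS <= set(data)
```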
- Run the crawler:

```shell
cd github
scrapy crawl github-user-search
```