EricLondon/Ruby-Nokogiri-MongoDB-Crawler

A Ruby class that crawls a website using Nokogiri, a MongoDB database, and the MongoMapper ORM.

Usage:

  1. Read the blog post.
  2. Follow setup.readme.
  3. Use the class as shown in usage.rb:
# include crawler class
require './ng_crawl.rb'

# instantiate crawler class object
ngc = NG_Crawl.new 'http://example.com'

# recursively crawl unprocessed URLs
ngc.crawl

# output all scanned URLs
puts ngc.all_urls

# output all external URLs
puts ngc.all_urls_external
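
The usage above separates all scanned URLs from external ones, but this README does not show how NG_Crawl makes that distinction. As a rough illustration only, here is a minimal sketch that assumes a link counts as external when its host differs from the host of the start URL. The helper name `partition_links` is hypothetical and is not part of NG_Crawl; it uses only Ruby's standard `uri` library, so it runs without Nokogiri or MongoDB.

```ruby
require 'uri'

# Hypothetical helper, NOT part of NG_Crawl: resolve each href against the
# base URL, then split the results into internal and external links.
# Assumption: "external" means the link's host differs from the base host.
def partition_links(base_url, hrefs)
  base_host = URI.parse(base_url).host

  absolute = hrefs.map do |href|
    begin
      uri = URI.join(base_url, href)  # resolves relative hrefs against the base
      next nil unless uri.is_a?(URI::HTTP)  # skip mailto:, javascript:, etc.
      uri.fragment = nil              # treat /page#a and /page as the same URL
      uri.to_s
    rescue URI::Error
      nil                             # ignore hrefs that cannot be parsed
    end
  end

  # Drop skipped entries and duplicates, then split by host.
  absolute.compact.uniq.partition { |url| URI.parse(url).host == base_host }
end

internal, external = partition_links(
  'http://example.com/',
  ['/about', 'contact.html', 'http://other.example.org/page', 'mailto:me@example.com']
)
puts internal
puts external
```

In a real crawler the `hrefs` array would come from the parsed page (for example, the `href` attributes of the anchor elements Nokogiri finds), and the internal list would feed the queue of unprocessed URLs.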