
BIRDDOG

Simple Node.js website crawler that uses either a site's sitemap.xml or its ?SHOWXML output to crawl through the site, looking for blank pages and other bad status codes.
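As a rough illustration of the checks described above, the sketch below classifies a single crawled response. `classifyPage` and its rules are assumptions for illustration, not Birddog's actual logic: any status outside 2xx is treated as bad, and a body with no visible text once scripts, styles, and tags are removed is treated as blank. Birddog lists cheerio as a dependency and may inspect pages differently.

```javascript
// Hypothetical helper, not Birddog's actual code: decides whether a crawled
// page should be reported, given its HTTP status code and response body.
function classifyPage(statusCode, body) {
  // Assumption: anything outside the 2xx range counts as a bad status code.
  if (statusCode < 200 || statusCode >= 300) {
    return "bad-status";
  }
  // Remove <script>/<style> blocks and all remaining tags, then check whether
  // any visible text is left. Nothing left means the page is treated as blank.
  const text = String(body || "")
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]*>/g, "")
    .trim();
  return text.length === 0 ? "blank" : "ok";
}

console.log(classifyPage(404, "<p>Not found</p>")); // bad-status
console.log(classifyPage(200, "<html><body>  </body></html>")); // blank
console.log(classifyPage(200, "<h1>Welcome</h1>")); // ok
```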

Installation

To install the Node package dependencies, run this in the directory where Birddog is located:

npm install

Examples

Simplest

node birddog.js

Runs using the default options set in birddog.js.

Setting the Website URL

node birddog.js --url https://www.mandarinoriental.com

This looks for the site's sitemap.xml file and crawls the pages it lists.

Using the ?SHOWXML Output of a CDE-Based Site

node birddog.js --url https://fontainebleau.com/ --sitemap false

Using Abbreviated Options

node birddog.js --u https://www.mandarinoriental.com/ --s false

Using a Direct URL Path

node birddog.js --d https://fontainebleau.com/fontainebleau-miami-beach-xml-sitemap.xml

Use this when the site's XML sitemap isn't named sitemap.xml.
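The three ways of choosing what to crawl can be summarized in one function. This is a hypothetical sketch, not code from birddog.js: the name `resolveSourceUrl`, the precedence order (an explicitly supplied direct path first, then sitemap.xml, then ?SHOWXML), and the assumption that sitemap.xml lives at the site root are all illustrative. It also assumes `sitemap` has already been turned into a real boolean.

```javascript
// Hypothetical sketch of how the crawl's starting document could be chosen.
// Assumes an explicitly supplied --directpath/--d wins, then --sitemap/--s.
function resolveSourceUrl({ url, directpath, sitemap = true }) {
  if (directpath) {
    // Direct path: use the given XML sitemap exactly as supplied.
    return new URL(directpath).href;
  }
  if (sitemap) {
    // Standard sitemap: assume it sits at /sitemap.xml on the site root.
    return new URL("/sitemap.xml", url).href;
  }
  // CDE site: request the site URL with the ?SHOWXML query string.
  const showXml = new URL(url);
  showXml.search = "SHOWXML";
  return showXml.href;
}

console.log(resolveSourceUrl({ url: "https://www.mandarinoriental.com" }));
// https://www.mandarinoriental.com/sitemap.xml
console.log(resolveSourceUrl({ url: "https://fontainebleau.com/", sitemap: false }));
// https://fontainebleau.com/?SHOWXML
```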

Settings

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | https://sabreshospitality.com | The website URL to crawl. Has alias *-u* |
| directpath | string | https://sabreshospitality.com/sitemap.xml | The direct path of the XML sitemap to crawl. Only supports the [standard XML sitemap protocol](https://www.sitemaps.org/index.html). Has alias *-d* |
| sitemap | boolean | true | If true, uses sitemap.xml; if false, uses ?SHOWXML for CDE sites. Has alias *-s* |
| maxConnections | integer | 10 | Crawler.js option: size of the worker pool. Has alias *-m* |
| retries | number | 3 | Crawler.js option: number of retries if a request fails. Has alias *-r* |
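A sketch of how the settings above could be combined with command-line input. It assumes the arguments were already parsed by minimist (listed under Dependencies) into a plain object; `normalizeOptions`, `DEFAULTS`, and `ALIASES` are hypothetical names. The string-to-boolean step reflects the assumption that minimist returns `--s false` as the string "false" when `sitemap` is not declared as a boolean option.

```javascript
// Hypothetical sketch: merge minimist-style parsed arguments with the
// defaults from the Settings table, resolving the one-letter aliases.
const DEFAULTS = {
  url: "https://sabreshospitality.com",
  directpath: "https://sabreshospitality.com/sitemap.xml",
  sitemap: true,
  maxConnections: 10,
  retries: 3,
};

// One-letter aliases mapped to their full option names.
const ALIASES = { u: "url", d: "directpath", s: "sitemap", m: "maxConnections", r: "retries" };

function normalizeOptions(argv) {
  const options = { ...DEFAULTS };
  for (const [key, value] of Object.entries(argv)) {
    const name = ALIASES[key] || key;
    // Ignore anything that is not a known option (e.g. minimist's "_" array).
    if (name in DEFAULTS) {
      options[name] = value;
    }
  }
  // "--s false" may arrive as the string "false"; coerce it to a boolean.
  if (typeof options.sitemap === "string") {
    options.sitemap = options.sitemap !== "false";
  }
  // Numeric options may arrive as strings; parse them as whole numbers.
  options.maxConnections = parseInt(options.maxConnections, 10);
  options.retries = parseInt(options.retries, 10);
  return options;
}
```

Under these assumptions, `node birddog.js --u https://www.mandarinoriental.com/ --s false` would yield `url` set to that address, `sitemap` set to `false`, and the default `maxConnections` and `retries`.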

Dependencies

  1. cheerio v1.0.0-rc.2 -- Tiny, fast, and elegant implementation of core jQuery designed specifically for the server
  2. cli-spinner v0.2.8 -- Spinners for use in the terminal
  3. crawler v1.1.2 -- Crawler is a web spider written with Nodejs.
  4. minimist v0.0.8 -- Parse argument options
  5. request v2.83.0 -- Simplified HTTP request client.
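Birddog leaves concurrency and retries to crawler through the maxConnections and retries options. To show what those two settings control, here is a dependency-free sketch of a worker pool with retries. `runPool` is hypothetical and is not crawler's implementation; in particular it retries immediately, while crawler's own retry timing may differ.

```javascript
// Hypothetical sketch: run async tasks with at most `maxConnections` in
// flight, retrying each failed task up to `retries` extra times.
async function runPool(tasks, maxConnections = 10, retries = 3) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Try one task up to 1 + retries times; report success or the last error.
  async function attempt(task) {
    let lastError;
    for (let i = 0; i <= retries; i++) {
      try {
        return { ok: true, value: await task() };
      } catch (err) {
        lastError = err;
      }
    }
    return { ok: false, error: lastError };
  }

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await attempt(tasks[index]);
    }
  }

  // Start at most maxConnections workers, then wait for all of them.
  const workers = [];
  for (let w = 0; w < Math.min(maxConnections, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

Results keep the order of the input tasks, so a failed entry can be matched back to the URL that produced it.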

What Are Sitemaps?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. More info
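Under the sitemaps.org protocol, a sitemap is a `<urlset>` of `<url>` entries, each with a `<loc>` holding a page address, and a sitemap index is a `<sitemapindex>` whose `<sitemap>` entries point to further sitemap files. The sketch below extracts those `<loc>` values with regular expressions so that it runs without dependencies. `parseSitemap` is a hypothetical helper and the example.com URLs are placeholders; Birddog itself may parse the XML differently (for example with cheerio), and the ?SHOWXML format of CDE sites is not covered.

```javascript
// Hypothetical sketch: pull URLs out of a standard sitemap or sitemap index.
// Uses regular expressions instead of an XML parser to stay dependency-free.
function parseSitemap(xml) {
  // A <sitemapindex> lists further sitemap files rather than pages.
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  const locs = [];
  const locPattern = /<loc>\s*([\s\S]*?)\s*<\/loc>/gi;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    locs.push(unescapeXml(match[1]));
  }
  return { isIndex, locs };
}

// Sitemap values must be entity-escaped, so decode the five predefined XML
// entities. &amp; is decoded last to avoid double-decoding "&amp;lt;".
function unescapeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}
```

For a `<urlset>` containing `<loc>https://example.com/rooms?id=1&amp;lang=en</loc>`, this returns `isIndex: false` and the decoded URL `https://example.com/rooms?id=1&lang=en`. When `isIndex` is true, a crawler would fetch each listed sitemap and parse it the same way.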