wolf1996/search-engine

 
 


Web crawler and search engine

  1. Report (in Russian).
  2. Slides (in Russian).

Ranking functions:

  • BM25 (for whole content and headers only)
  • PageRank
  • Reference rating
  • Query position
  • Length of document
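
As an illustration of the first ranking function in the list, here is a minimal BM25 sketch. It is not taken from this repository: the function names (`tokenize`, `buildIndex`, `bm25Score`), the sample documents, the non-negative IDF variant, and the parameters `k1 = 1.2` and `b = 0.75` are all assumptions chosen for the example.

```javascript
// Minimal BM25 sketch. Every name and parameter here is hypothetical and
// chosen for illustration; the repository's actual implementation may differ.

// Split text into lowercase word tokens (Unicode-aware).
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Precompute per-document tokens, document frequencies, and average length.
function buildIndex(docs) {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map(); // term -> number of documents containing it
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const totalLength = tokenized.reduce((sum, t) => sum + t.length, 0);
  return { tokenized, docFreq, avgLength: totalLength / tokenized.length };
}

// Score one document against a query with BM25.
function bm25Score(index, docId, query, k1 = 1.2, b = 0.75) {
  const tokens = index.tokenized[docId];
  const n = index.tokenized.length;
  const termFreq = new Map();
  for (const t of tokens) termFreq.set(t, (termFreq.get(t) || 0) + 1);

  let score = 0;
  for (const term of new Set(tokenize(query))) {
    const tf = termFreq.get(term) || 0;
    if (tf === 0) continue; // term absent from this document
    const df = index.docFreq.get(term);
    // A common non-negative IDF variant (assumed here).
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
    // Longer-than-average documents are penalized in proportion to b.
    const lengthNorm = 1 - b + b * (tokens.length / index.avgLength);
    score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
  }
  return score;
}

// Hypothetical sample corpus.
const docs = [
  "web crawler and search engine",
  "a crawler downloads web pages",
  "ranking pages with bm25",
];
const index = buildIndex(docs);
const ranked = docs
  .map((text, id) => ({ id, score: bm25Score(index, id, "search engine") }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].id);
```

Scoring "headers only" could reuse the same functions on an index built from header text alone, with the two scores then combined. How this project weights them is not stated here.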

Architecture of the crawler:

[Figure: crawler architecture]

Search page:

[Figure: search page]

Performance (1.6 GHz i5 + SSD + 4 Mbit/s connection):

  • Indexing: ~50,000 pages per hour.
  • Search: ~0.1 s per query on a database with 1,000,000 indexed pages.
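
To put the indexing figure in context, the short calculation below converts it to pages per second and to the average page size the stated link could sustain. It is a back-of-the-envelope sketch, assuming that 1 Mbit = 10^6 bits, that the indexing rate includes downloading, and that the link is fully used. None of these is confirmed by the project.

```javascript
// Rough throughput arithmetic from the figures quoted above.
const pagesPerHour = 50000;     // stated indexing rate
const linkBitsPerSecond = 4e6;  // 4 Mbit/s, assuming 1 Mbit = 10^6 bits

const pagesPerSecond = pagesPerHour / 3600;
const bytesPerSecond = linkBitsPerSecond / 8;

// Upper bound on the average downloaded page size if the link were the bottleneck.
const maxAvgBytesPerPage = bytesPerSecond / pagesPerSecond;

console.log(pagesPerSecond.toFixed(1));      // 13.9
console.log(Math.round(maxAvgBytesPerPage)); // 36000
```

Under these assumptions the crawler handles about 14 pages per second, and the link allows at most about 36 kB per page on average.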

License

The source code is licensed under the MIT license.

The report and slides are not licensed (no rights are given to reproduce or modify this work).

Languages

  • JavaScript 90.0%
  • CSS 7.3%
  • HTML 2.7%