Web Crawler 3000 is your very own link-hungry web-surfing bot, powered by Go and Colly. It'll keep gobbling up links until there are none left on the internet… or until it realizes it can't quite visit everything.
Simply point it at a webpage and watch it go! It'll scrape, crawl, queue, dequeue, and consume links like a toddler with spaghetti. But don’t worry: it won't be ringing anyone’s doorbell (no mailto or tel links!) or chasing its own tail by revisiting links it has already seen.
- Queue-based BFS: None of that pesky recursion here! Our bot's got its priorities straight and only gobbles up links it hasn't tasted before.
- Absolute URL Master: Ever tried figuring out whether `../about` or `#contact` will lead to hidden treasure? Let Web Crawler 3000 handle the math.
- Selective Tastes: No "mailto" or "tel" links in this diet! We’re web crawlers, not phone crawlers.
Fire up your terminal and run:
go run main.go
Then, just give it a starting URL when it politely asks:
Enter the root link: https://example.com
Grab a coffee while Web Crawler 3000 works its way through every corner of the internet… okay, not every corner, but close enough. Links it finds will show up in your terminal, one after the other, so you know it’s not slacking off!
Here's how it works:
- Enqueue and Dequeue: We’ve got a queue, a map of visited links, and a dream.
- No Double-Dipping: Every link is stored in our `links` map, so we don’t get déjà vu.
- Filtered Crawling: Only valid HTTP/HTTPS links are allowed. No funny business.
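- Warning: While this bot loves the internet, it might break up with you if you point it at a site that keeps inventing new URLs forever. The visited map only stops repeat visits, not an endless supply of fresh links. It’s BFS, not “Find Forever and Survive.”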
- Ideal Use Case: Impress your friends by pretending you wrote a bot that could totally crawl all of Google if it wanted to.
- Real Use Case: Discovering that many websites out there are just slightly different versions of the same few links.
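The queue, the `links` map, and the dream described above can be sketched roughly like this. To keep the example runnable without a network or Colly, `fetchLinks` is a hypothetical stand-in for the real scraping step, and the tiny `site` map is invented test data. Treat it as an illustration of queue-based BFS, not the project's actual source.

```go
package main

import "fmt"

// crawl visits pages breadth-first starting at root and returns them in visit
// order. fetchLinks is a hypothetical stand-in for the scraper: given a page
// URL, it returns the absolute links found on that page.
func crawl(root string, fetchLinks func(string) []string) []string {
	links := map[string]bool{root: true} // every URL that has ever been enqueued
	queue := []string{root}
	var order []string

	for len(queue) > 0 {
		current := queue[0] // dequeue from the front
		queue = queue[1:]
		order = append(order, current)

		for _, next := range fetchLinks(current) {
			if links[next] {
				continue // already seen, so no double-dipping
			}
			links[next] = true
			queue = append(queue, next) // enqueue at the back
		}
	}
	return order
}

func main() {
	// Invented four-page site with cycles back to page a.
	site := map[string][]string{
		"https://example.com/a": {"https://example.com/b", "https://example.com/c"},
		"https://example.com/b": {"https://example.com/a", "https://example.com/d"},
		"https://example.com/c": {"https://example.com/d"},
		"https://example.com/d": {"https://example.com/a"},
	}
	fetch := func(page string) []string { return site[page] }

	for _, page := range crawl("https://example.com/a", fetch) {
		fmt.Println(page)
	}
}
```

Marking a URL as seen when it is enqueued, rather than when it is dequeued, keeps the same link from sitting in the queue twice. The loop ends here because the fake site has only four pages; a site that keeps producing new URLs would keep the queue fed indefinitely.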
This bot isn’t responsible for any existential crises caused by repetitive URLs or missing `robots.txt` files. Run responsibly!