
This microservice, developed using gRPC, crawls a website from a given URL to find its URL tree and paths.

nebipeker/Website-Crawler-Microservice

Website Crawler Microservice

This microservice crawls a website from a given URL and returns the URL tree and paths it finds.

Usage

To use the microservice, clients can make gRPC requests to the service using the defined service interface. Here is an example Python client:

import grpc
import website_crawler_pb2
import website_crawler_pb2_grpc

# Open a plaintext channel to the service on local port 50051.
channel = grpc.insecure_channel('localhost:50051')
stub = website_crawler_pb2_grpc.WebsiteCrawlerStub(channel)

# Ask the service to crawl the URL, then print the returned tree and paths.
url = 'https://example.com'
response = stub.crawl(website_crawler_pb2.CrawlRequest(url=url))
print(response)

The crawl method takes a CrawlRequest object containing a URL to crawl and returns a CrawlResponse object containing the URL tree and paths found.

Implementation

The microservice is implemented in Python using the following libraries:

  • gRPC: to define the service interface and handle client requests
  • Requests: to fetch HTML content from websites
  • BeautifulSoup: to parse HTML content and extract links
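
To illustrate the link-extraction step, here is a minimal sketch. It is not the repository's code: to stay self-contained it swaps in the standard library's html.parser for BeautifulSoup and takes the HTML as a string instead of fetching it with Requests. The names LinkCollector and extract_links are hypothetical, and restricting results to the same host is an assumption about how the crawler scopes a site.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute same-host links from html, in order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so each page counts once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

With BeautifulSoup, the collector class would typically be replaced by a find_all call over anchor tags; the URL resolution and filtering would stay the same.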

The implementation is divided into two main components:

  • The website_crawler.proto file defines the service interface using Protocol Buffers syntax. The file is used to generate gRPC stubs in Python.
  • The website_crawler_server.py file defines the service logic and handles client requests. The crawl method crawls the given URL and returns a CrawlResponse containing the URL tree and paths found.
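
The crawl logic itself is not shown in this README, so the following is only a sketch of one way it could work: a breadth-first traversal that records which URLs were first discovered on each page. The function name crawl_site, the get_links callable, the max_pages limit, and the shape of the returned tree and paths are all assumptions; the real CrawlResponse fields are defined in website_crawler.proto and may differ.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_site(start_url, get_links, max_pages=100):
    """Breadth-first crawl from start_url.

    get_links is any callable mapping a page URL to the list of absolute
    URLs linked from it (for example, a Requests + BeautifulSoup helper).
    Returns (tree, paths): tree maps each visited URL to the URLs first
    discovered on that page; paths is the sorted list of URL paths seen.
    """
    tree = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(tree) < max_pages:
        url = queue.popleft()
        children = []
        for link in get_links(url):
            if link not in seen:  # each URL appears under one parent only
                seen.add(link)
                children.append(link)
                queue.append(link)
        tree[url] = children
    paths = sorted({urlparse(u).path or "/" for u in seen})
    return tree, paths
```

Passing the link fetcher in as a callable keeps the traversal testable without network access: a test can supply a dictionary-backed function in place of real HTTP requests.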

Docker

The microservice is packaged in a Docker container to ensure consistent and portable deployment. The Dockerfile specifies the container environment and dependencies required for the microservice. To build and run the container, use the following commands:

docker build -t website-crawler .
docker run -p 50051:50051 website-crawler
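
After starting the container, it can be handy to confirm that the published port accepts connections before sending requests. The helper below is a hypothetical sketch using only the standard library; wait_for_port is not part of this repository, and a successful TCP connection shows only that something is listening, not that the gRPC service is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if the port is not accepting yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the container started above, the call would be wait_for_port("localhost", 50051). A client that already has a gRPC channel could instead wait on grpc.channel_ready_future(channel).result(timeout=...), which checks readiness at the gRPC level.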

Testing

The microservice is tested using the Pytest testing framework. The tests directory contains test files for each component of the microservice, including the gRPC service interface, the website crawler logic, and the Docker container. To run the tests, use the following command:

pytest

Deployment

The microservice can be deployed to a cloud provider like AWS, GCP, or Azure, or to a container orchestration platform like Kubernetes. To deploy the microservice, follow these steps:

  1. Build the Docker container as described above.
  2. Push the container to a container registry like Docker Hub or Amazon ECR.
  3. Use a cloud provider or container orchestration platform to deploy the container.

License

This microservice is licensed under the MIT License. See the LICENSE file for more details.
