This microservice crawls a website starting from a given URL and returns the URL tree and paths it finds.
Clients call the microservice over gRPC using the defined service interface. Here is an example Python client:

    import grpc

    import website_crawler_pb2
    import website_crawler_pb2_grpc

    # Open a plaintext (non-TLS) channel to the server on port 50051.
    channel = grpc.insecure_channel('localhost:50051')
    stub = website_crawler_pb2_grpc.WebsiteCrawlerStub(channel)

    # Ask the service to crawl a site and print the resulting tree and paths.
    url = 'https://example.com'
    response = stub.crawl(website_crawler_pb2.CrawlRequest(url=url))
    print(response)

The `crawl` method takes a `CrawlRequest` object containing a URL to crawl and returns a `CrawlResponse` object containing the URL tree and paths found.
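The fields of `CrawlResponse` are not shown here, so the following is only a sketch of what "URL tree and paths" could look like, assuming the tree is a nested mapping keyed by path segment. The helper names `build_url_tree` and `list_paths` are hypothetical and not part of the service.

```python
from urllib.parse import urlparse


def build_url_tree(urls):
    """Group URLs from one site into a nested dict keyed by path segment."""
    tree = {}
    for url in urls:
        # Split "/docs/api/" into ["docs", "api"], dropping empty segments.
        segments = [s for s in urlparse(url).path.split("/") if s]
        node = tree
        for segment in segments:
            # Reuse the existing child node or create an empty one.
            node = node.setdefault(segment, {})
    return tree


def list_paths(tree, prefix=""):
    """Flatten the nested dict into a depth-first list of paths."""
    paths = []
    for segment, children in sorted(tree.items()):
        path = f"{prefix}/{segment}"
        paths.append(path)
        paths.extend(list_paths(children, path))
    return paths
```

Under this assumption, a response for a small site would carry one tree plus the flat list of paths derived from it.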
The microservice is implemented in Python using the following libraries:
- gRPC: to define the service interface and handle client requests
- Requests: to fetch HTML content from websites
- BeautifulSoup: to parse HTML content and extract links
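To illustrate the link-extraction step, here is a minimal sketch that collects same-host links from an HTML string. It deliberately uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies; the names `LinkCollector` and `extract_internal_links` are hypothetical and not taken from the service code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_internal_links(html, base_url):
    """Return absolute same-host links found in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop fragments.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == base_host and absolute not in links:
            links.append(absolute)
    return links
```

With BeautifulSoup, the collector class would typically be replaced by a call such as `BeautifulSoup(html, "html.parser").find_all("a", href=True)`, with the page body coming from `requests.get(url).text`.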
The implementation is divided into two main components:
- The `website_crawler.proto` file defines the service interface using Protocol Buffers syntax. The file is used to generate gRPC stubs in Python.
- The `website_crawler_server.py` file defines the service logic and handles client requests. The `crawl` method crawls the given URL and returns a `CrawlResponse` containing the URL tree and paths found.
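The crawling logic inside the server is not shown here. One common way to implement it is a breadth-first traversal restricted to the starting host, sketched below. The function `crawl_site`, its `get_links` callback, and the `max_pages` limit are assumptions made for illustration; in the real service the callback would fetch a page and parse its links.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_site(start_url, get_links, max_pages=100):
    """Breadth-first crawl limited to the start URL's host.

    get_links(url) is a caller-supplied function returning the absolute
    URLs linked from that page (for example, fetch plus HTML parsing).
    Returns a dict mapping each visited URL to the same-host URLs that
    were first discovered from it.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    tree = {}
    while queue and len(tree) < max_pages:
        url = queue.popleft()
        children = []
        for link in get_links(url):
            # Skip external hosts and pages that are already scheduled.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
                children.append(link)
        tree[url] = children
    return tree
```

Passing the link source in as a function keeps the traversal testable without network access, and the `seen` set prevents cycles between pages from causing an endless crawl.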
The microservice is packaged in a Docker container to ensure consistent and portable deployment. The `Dockerfile` specifies the container environment and dependencies required for the microservice. To build and run the container, use the following commands:

    docker build -t website-crawler .
    docker run -p 50051:50051 website-crawler

The microservice is tested using the Pytest testing framework. The `tests` directory contains test files for each component of the microservice, including the gRPC service interface, the website crawler logic, and the Docker container. To run the tests, use the following command:

    pytest

The microservice can be deployed to a cloud provider like AWS, GCP, or Azure, or to a container orchestration platform like Kubernetes. To deploy the microservice, follow these steps:
- Build the Docker container as described above.
- Push the container to a container registry like Docker Hub or Amazon ECR.
- Use a cloud provider or container orchestration platform to deploy the container.
This microservice is licensed under the MIT License. See the LICENSE file for more details.