Web Scraper Script Added #281

Closed
wants to merge 1 commit into from
8 changes: 8 additions & 0 deletions Web Scraper/README.md
@@ -0,0 +1,8 @@
In this script, we use the `requests` library to send a GET request to the Python.org blogs page. We then use the `BeautifulSoup` library to parse the HTML content of the page.

We find the blog titles on the page by searching for `h2` elements with the class `blog-title`. We then print each title, numbered, and write the titles to a file named `blog_titles.txt`, one per line.

To run this script, first install the required libraries:

```bash
pip install requests beautifulsoup4
```
30 changes: 30 additions & 0 deletions Web Scraper/Web_Scraper.py
@@ -0,0 +1,30 @@
import requests
from bs4 import BeautifulSoup

# URL to scrape data from
URL = "https://www.python.org/blogs/"

# Send a GET request to the URL; fail fast on a timeout or an HTTP error status
response = requests.get(URL, timeout=10)
response.raise_for_status()

# Parse the webpage content using BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

# Find all the blog titles on the page
titles = soup.find_all('h2', class_='blog-title')

# Print each title found
print("Python.org Blog Titles:\n")
for i, title in enumerate(titles, start=1):
    print(f"{i}. {title.get_text(strip=True)}")

# Save the titles to a file
with open("blog_titles.txt", "w", encoding="utf-8") as file:
    for title in titles:
        file.write(title.get_text(strip=True) + "\n")

print("\nBlog titles saved to 'blog_titles.txt'.")
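The title extraction in this script can be exercised without touching the network. Below is a minimal sketch, assuming the `h2` / `blog-title` selector from the script above; whether the live page actually uses that class is not verified here. The `extract_titles` helper and the `SAMPLE_HTML` markup are hypothetical, written only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from python.org.
SAMPLE_HTML = """
<html><body>
  <h2 class="blog-title"> First post </h2>
  <h2 class="other">Not a blog title</h2>
  <h2 class="blog-title"><a href="/p/2">Second post</a></h2>
</body></html>
"""


def extract_titles(html):
    """Return the stripped text of every <h2 class="blog-title"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        h2.get_text(strip=True)
        for h2 in soup.find_all("h2", class_="blog-title")
    ]


titles = extract_titles(SAMPLE_HTML)

# An empty result usually means the selector does not match the page markup.
if not titles:
    print("No matching titles found; check the tag and class name.")

for i, title in enumerate(titles, start=1):
    print(f"{i}. {title}")
```

Keeping the parsing in a function that takes an HTML string lets the selector be checked against saved markup, and the empty-result message makes a mismatched selector visible instead of silently writing an empty file.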