Added Web Scraper Script #289

Merged
merged 2 commits into from
Oct 4, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -121,6 +121,8 @@ More information on contributing and the general code of conduct for discussion
| Weather GUI | [Weather GUI](https://github.com/DhanushNehru/Python-Scripts/tree/master/Weather%20GUI) | Displays information on the weather. |
| Website Blocker | [Website Blocker](https://github.com/DhanushNehru/Python-Scripts/tree/master/Website%20Blocker) | Downloads the website and loads it on your homepage in your local IP. |
| Website Cloner | [Website Cloner](https://github.com/DhanushNehru/Python-Scripts/tree/master/Website%20Cloner) | Clones any website and opens the site in your local IP. |
| Web Scraper | [Web Scraper](https://github.com/Charul00/Python-Scripts/tree/main/Web%20Scraper) | A Python script that scrapes blog titles from Python.org and saves them to a file. |
| Weight Converter | [Weight Converter](https://github.com/WatashiwaSid/Python-Scripts/tree/master/Weight%20Converter) | Simple GUI script to convert weight in different measurement units. |
| Wikipedia Data Extractor | [Wikipedia Data Extractor](https://github.com/DhanushNehru/Python-Scripts/tree/master/Wikipedia%20Data%20Extractor) | A simple Wikipedia data extractor script to get output in your IDE. |
| Word to PDF | [Word to PDF](https://github.com/DhanushNehru/Python-Scripts/tree/master/Word%20to%20PDF%20converter) | A Python script to convert an MS Word file to a PDF file. |
8 changes: 8 additions & 0 deletions Web Scraper/README.md
@@ -0,0 +1,8 @@
In this script, we use the `requests` library to send a GET request to the Python.org blogs page. We then use the `BeautifulSoup` library to parse the HTML content of the page.

We find the blog titles by searching for `h2` elements with the class `blog-title`, print each one, and save them all to a file named `blog_titles.txt`.
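
The lookup step can be tried without any network access. The sketch below is illustrative only: `SAMPLE_HTML` and `extract_titles` are hypothetical names that are not part of this script, and the markup is invented for the example, so it does not confirm that the live Python.org page uses the `blog-title` class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page (not copied from Python.org).
SAMPLE_HTML = """
<html><body>
  <h2 class="blog-title"> First post </h2>
  <h2 class="other">Not a blog title</h2>
  <h2 class="blog-title featured">Second post</h2>
</body></html>
"""


def extract_titles(html: str) -> list[str]:
    """Return the stripped text of every <h2> that carries the blog-title class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any element whose class list contains "blog-title",
    # so the heading with two classes is included as well.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="blog-title")]


titles = extract_titles(SAMPLE_HTML)
for i, title in enumerate(titles, start=1):
    print(f"{i}. {title}")
```

If the selector matches nothing on the real page, `find_all` returns an empty list rather than raising, so an empty result is worth checking for before the titles are written out.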

To run this script, first install the required libraries:

```bash
pip install requests beautifulsoup4
```
30 changes: 30 additions & 0 deletions Web Scraper/Web_Scraper.py
@@ -0,0 +1,30 @@
import requests
from bs4 import BeautifulSoup

# URL to scrape data from
URL = "https://www.python.org/blogs/"

# Send a GET request to the URL; time out instead of hanging, and stop on HTTP errors
response = requests.get(URL, timeout=10)
response.raise_for_status()

# Parse the webpage content using BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

# Find all the blog titles on the page
titles = soup.find_all('h2', class_='blog-title')

# Print each title found
print("Python.org Blog Titles:\n")
for i, title in enumerate(titles, start=1):
    print(f"{i}. {title.get_text(strip=True)}")

# Save the titles to a file
with open("blog_titles.txt", "w") as file:
    for title in titles:
        file.write(title.get_text(strip=True) + "\n")

print("\nBlog titles saved to 'blog_titles.txt'.")
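
As written, the save step creates `blog_titles.txt` even when no titles were found, and it relies on the platform's default text encoding. A minimal sketch of a more defensive variant follows, using only the standard library. The helper name `save_titles` and its return value are assumptions made for illustration and are not part of the merged script.

```python
from pathlib import Path


def save_titles(titles: list[str], path: str = "blog_titles.txt") -> int:
    """Write one title per line as UTF-8 and return how many were written.

    Nothing is written when the list is empty, so a previous good file is
    not replaced by an empty one.
    """
    # Drop blank entries and surrounding whitespace before writing.
    cleaned = [t.strip() for t in titles if t.strip()]
    if not cleaned:
        return 0
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

In the script above this could be called as `save_titles([t.get_text(strip=True) for t in titles])`, with the returned count deciding whether to print the "saved" message or a warning that the selector matched nothing.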