Skip to content

A robust Image Scraper that leverages OpenAI's GPT Chat Completions to determine the relevant HTML used to Scrape Images from websites.

License

Notifications You must be signed in to change notification settings

theSTremblay/GPT-Image-Scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPT-Image-Scraper

A robust Image Scraper that leverages OpenAI's GPT Chat Completions and Selenium to determine the relevant HTML and uses it to Scrape Images from websites.

Description

Broadly the scraper requires limited to no knowledge of HTML or CSS so if you code mostly backend REJOICE!

  • Here is an educational example- Scraping an image header from an nba box score. This gives an idea of the tkinter window and the popup that will show when you click on an image.
  • The files are saved to a directory called web_scrapings after the input is selected.
  • Go Bucks!

Example

There are two areas that require manual input at the start:

1: Terminal Input: When you navigate to the URL you are parsing you will need to click on the image and a popup will show you the html of the page- copy this and input it into the input in your terminal or IDE
2: Tkinter GUI Input: Will prompt you to describe the field you want to parse. The fields are optional but tell GPT what to look for more precisely- the better your description the better the output should be- you can play around with it. 

TODO: Educational, since the output is not always accurate. Working on testing it on more websites and testing more of the "page turning" functions. Still working on that

Getting Started

Dependencies

Python 3 - (3.10 or greater if troubleshooting)

Installation

These are the necessary libraries for this project written into pip commands:

pip install selenium
pip install requests

Installing

  • API_KEY in GPT_utils is needed before starting. It is what holds the OpenAI credentials. Making an Open AI account and troubleshooting can be found here:

https://platform.openai.com/docs/quickstart?context=python

The model I am using by default is gpt-3.5-turbo, there are other models like "gpt-4" you can take from the OpenAI website.

Executing program

  • Try Running from an IDE at first
  • Be aware of the two Inputs at the start of the program one in the terminal and the other in the tkinter popup- after that it should be automated and download the image to webscrapings

Considerations

  • This code leverage OpenAI's API - a paid platform
  • OpenAI's account and pricing models can be found here: https://openai.com/pricing

   😳   If you found any code helpful in the repo, drop a star sir. Helps out with the jobs sir

More-to-Come

I have run through a set of examples. I plan to be uploading gifs later and making bug fixes soon

The code may need to be made more robust I'll work on that after the inital commit

About

A robust Image Scraper that leverages OpenAI's GPT Chat Completions to determine the relevant HTML used to Scrape Images from websites.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages