
Collects image sets of various criteria off Bing Images


fcornelius/facefetch


facefetch

A script to batch download sets of images matching specified criteria from Bing Images and organize each set in its own subfolder. Although it was originally used to gather training data for a face recognizer, it can be used to collect any kind of image set.

Dependencies

facefetch is written in Python 3 and requires lxml and Beautiful Soup 4 for HTML parsing:

pip install lxml beautifulsoup4

Sample usage

git clone https://github.com/phoelix/facefetch.git
cd facefetch
mkdir {'Bud Spencer','Terrence Hill'}
./ffetch -n 10 --size large --type photo --face closeup --ftypes jpg --rename {dir}_{:02d}

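This results in two folders, each containing 10 JPEG files of large face close-ups, named according to the --rename format {dir}_{:02d}, i.e. the folder name followed by a two-digit running number:

[Image: files example]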
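The --rename value is a Python 3 format string. The snippet below is a minimal sketch of how a pattern like {dir}_{:02d} expands; format_name is a hypothetical helper, not part of ffetch, and both the starting value of the running number and how the file extension is appended are assumptions not covered here.

```python
def format_name(pattern: str, folder: str, number: int) -> str:
    """Expand a --rename style pattern (hypothetical helper).

    {dir} is filled by keyword with the subfolder name; an auto-numbered
    field such as {:d} or {:02d} receives the running number.
    """
    return pattern.format(number, dir=folder)


# Print the first three names for the "Bud Spencer" folder
for i in range(1, 4):
    print(format_name("{dir}_{:02d}", "Bud Spencer", i))
```

Mixing a named field ({dir}) with an automatically numbered one ({:02d}) is allowed by str.format, which is why a single positional argument plus one keyword argument is enough.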

Options

Because ffetch uses the name of each subdirectory as its search query, the only required option is -n, the number of images to download into each folder.
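As a rough illustration of how subfolder names can serve as queries, here is a minimal sketch using only the standard library. find_queries is a hypothetical name; it skips dot-prefixed folders as described in the Notes, but it is not ffetch's actual code.

```python
import os


def find_queries(path: str = ".") -> list[str]:
    """Return the sorted subfolder names under `path` to use as search queries."""
    with os.scandir(path) as entries:
        names = [
            entry.name
            for entry in entries
            # Only directories count; folders starting with '.' are ignored
            if entry.is_dir() and not entry.name.startswith(".")
        ]
    return sorted(names)
```

Run in the directory from the sample usage, this would yield ["Bud Spencer", "Terrence Hill"]; plain files in that directory are not treated as queries.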

Positional Arguments:

  path                  absolute path to the directory containing the
                        image folders. Defaults to the working directory

Optional Arguments:

  --help, -h            show help message
  -n                    number of images to download in each directory
  --size, -s            image size category. Values: all, small,
                        medium, large, xlarge. Defaults to medium
  --min, -m  [w h]      minimum image size in pixel dimensions (width height)
  --type, -t            image type. Values: photo, clipart,
                        lineart, anim
  --face, -f            images with faces. Values: closeup, portrait
  --ftypes, -T [[...]]  limit to image file types. Any combination of: jpg,
                        png, gif, tiff, bmp, svg
  --rename, -F          format string for renaming image files.
                        Accepts Python 3 str.format syntax; use {:d} for a
                        running number and {dir} for the subfolder name
  --verbose, -v         enable verbose output
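The option list reads like argparse help output. The following is a hypothetical argparse reconstruction of the documented interface, useful for seeing how the flags combine; it is not ffetch's source, and choices such as nargs="+" for --ftypes are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented options, not ffetch's code.
    # argparse adds --help/-h automatically.
    p = argparse.ArgumentParser(prog="ffetch")
    p.add_argument("path", nargs="?", default=".",
                   help="directory with image folders; defaults to working directory")
    p.add_argument("-n", type=int, required=True,
                   help="number of images to download in each directory")
    p.add_argument("--size", "-s", default="medium",
                   choices=["all", "small", "medium", "large", "xlarge"])
    p.add_argument("--min", "-m", type=int, nargs=2, metavar=("W", "H"),
                   help="minimum width and height in pixels")
    p.add_argument("--type", "-t", choices=["photo", "clipart", "lineart", "anim"])
    p.add_argument("--face", "-f", choices=["closeup", "portrait"])
    p.add_argument("--ftypes", "-T", nargs="+",
                   choices=["jpg", "png", "gif", "tiff", "bmp", "svg"])
    p.add_argument("--rename", "-F", help="str.format pattern, e.g. {dir}_{:02d}")
    p.add_argument("--verbose", "-v", action="store_true")
    return p
```

Parsing the arguments from the sample usage with build_parser().parse_args([...]) gives n=10, size="large", ftypes=["jpg"] and path=".", matching the defaults described above.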


Image sizes are defined in pixels as follows:

  • small: max 200x200
  • medium: 200x200 - 500x500
  • large: min 500x500
  • xlarge: min 1000x1000

Define a custom minimum size with --min (e.g. --min 300 400 for images at least 300 px wide and 400 px high).
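To make the size bounds concrete, here is a minimal sketch that checks a width x height pair against a size category and an optional --min pair. It assumes each bound applies to both dimensions, and that --size and --min are combined; whether ffetch checks sizes locally or leaves the filtering to Bing is not stated here, so matches_size is purely illustrative.

```python
# Pixel bounds (lower, upper) per size category, taken from the list above.
# Assumption: each bound applies to both width and height.
SIZE_BOUNDS = {
    "small": (None, 200),    # max 200x200
    "medium": (200, 500),    # 200x200 - 500x500
    "large": (500, None),    # min 500x500
    "xlarge": (1000, None),  # min 1000x1000
}


def matches_size(width, height, size="medium", min_wh=None):
    """Return True if a width x height image satisfies both size filters."""
    if size != "all":
        lo, hi = SIZE_BOUNDS[size]
        if lo is not None and (width < lo or height < lo):
            return False
        if hi is not None and (width > hi or height > hi):
            return False
    if min_wh is not None:
        # Custom minimum as given by --min W H
        min_w, min_h = min_wh
        if width < min_w or height < min_h:
            return False
    return True
```

Note that under these definitions every xlarge image also qualifies as large.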

Notes

  • Prefix folder names with '.' to have them ignored and not queried.
  • ffetch skips folders that already contain at least the requested number of images, so an aborted batch download can be resumed by running the same command again.
  • In verbose mode you will notice that some downloads fail with HTTP errors (403, 404, 502, ...). For each failed download, ffetch fetches an extra image, so that every subfolder ends up with exactly n images.
  • I chose Bing Images over Google Image Search because Google's result links are generated by JavaScript, which makes parsing slow and inefficient.
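The skip-and-replace behaviour from the notes above can be sketched as a simple loop: skip a folder that is already full, otherwise keep trying candidate URLs until n downloads succeed. fill_folder, candidates and download are hypothetical stand-ins; counting existing non-hidden files toward n is an assumption, and how ffetch actually obtains candidate URLs from Bing is not shown.

```python
import os


def fill_folder(folder, n, candidates, download):
    """Download into `folder` until it holds n images (hypothetical sketch).

    `candidates` is an iterable of image URLs, ideally more than n of them;
    `download(url, folder)` returns True on success and False on an HTTP error.
    """
    # Assumption: every non-hidden file already in the folder counts as an image
    have = sum(1 for f in os.listdir(folder) if not f.startswith("."))
    if have >= n:
        return have  # folder is already complete: skip it, which enables resuming
    for url in candidates:
        if have >= n:
            break
        if download(url, folder):
            have += 1
        # On failure, move on to the next candidate: one extra fetch per failure
    return have
```

If the candidates run out before n downloads succeed, the function returns a count below n, so a real implementation would need to request more results.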
