Home
Development of Wget2 has started, and everybody is invited to contribute, test,
and discuss.
The codebase is hosted in the 'wget2' branch of Wget's git repository and on
GitHub; the two are synced regularly.
Wget2 on Savannah (check out branch 'wget2' after cloning)
The idea is to have a fresh and maintainable codebase with features like
multithreaded downloads, HTTP2, OCSP, HSTS, Metalink, IDNA2008, Public Suffix
List, Multi-Proxies, Sitemaps, Atom/RSS Feeds, compression (gzip, deflate,
lzma, bzip2), support for local filenames, etc.
Some of these features have been built into Wget in the meantime, but
others are really hard to implement in the old codebase.
Most of the functionality is exposed via a library API (libwget), so that
external programs can make use of it. For example, have a look at
examples/print_css_urls.c: just a few lines of C to parse and print all
URLs from a CSS file.
Wget2 will remain its own executable, separate from Wget, so you can install
and test Wget2 without endangering your existing architecture and scripts.
- FTP(S) support
- WARC support
- Several Wget options are missing.
- API documentation incomplete
--force-css Treat input file as CSS. (default: off)
--force-sitemap Treat input file as Sitemap. (default: off)
--force-atom Treat input file as Atom Feed. (default: off)
--force-rss Treat input file as RSS Feed. (default: off)
--force-metalink Treat input file as Metalink. (default: off)
--metalink Parse and follow metalink files and don't save them (default: on)
--max-threads Max. concurrent download threads. (default: 5)
--gnutls-options Custom GnuTLS priority string. Interferes with --secure-protocol. (default: none)
--ocsp-stapling Use OCSP stapling to verify the server's certificate. (default: on)
--ocsp Use OCSP server access to verify server's certificate. (default: on)
--ocsp-file Set file for OCSP caching. (default: .wget_ocsp)
--http2 Use HTTP/2 protocol if possible. (default: on)
--input-encoding Character encoding of the file contents read with --input-file. (default: local encoding)
--cookie-suffixes Load public suffixes from file. They prevent 'supercookie' vulnerabilities.
--chunk-size Download large files in multithreaded chunks. (default: 0 (=off))
Example: wget --chunk-size=1M
--check-hostname Check the server's certificate's hostname. (default: on)
--dns-caching Caching of domain name lookups. (default: on)
--http-proxy Set HTTP proxy/proxies, overriding environment variables.
--https-proxy Set HTTPS proxy/proxies, overriding environment variables.
--tcp-fastopen Enable TCP Fast Open (TFO). (default: on)
--robots Respect robots.txt standard for recursive downloads. (default: on)
--random-file File to be used as source of random data.
--fsync-policy Use fsync() to wait for data being written to the physical layer. (default: off)
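
To illustrate what --chunk-size implies, here is a minimal Python sketch of how a file of known size could be split into byte ranges that separate threads then request. It is not Wget2's actual implementation: the helper names `parse_size` and `chunk_ranges` are hypothetical, and treating the `k`/`M`/`G` suffixes as powers of 1024 is an assumption.

```python
def parse_size(text: str) -> int:
    """Convert a size such as '1M' or '512k' to bytes.

    Assumption: suffixes are binary multiples (1k = 1024 bytes).
    """
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def chunk_ranges(file_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) pairs covering the file."""
    if file_size <= 0:
        return []
    if chunk_size <= 0:
        # 0 means chunking is off: the whole file is one range.
        return [(0, file_size - 1)]
    return [
        (start, min(start + chunk_size, file_size) - 1)
        for start in range(0, file_size, chunk_size)
    ]


if __name__ == "__main__":
    # Each pair maps onto one HTTP Range request header.
    for first, last in chunk_ranges(2_500_000, parse_size("1M")):
        print(f"Range: bytes={first}-{last}")
```

With --max-threads limiting concurrency, only that many of these ranges would be in flight at once; the last range is simply shorter when the file size is not a multiple of the chunk size.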
- new 'include' statement for config files, e.g. to load /etc/wget/conf.d/*.conf
- --input-file - (reading URLs from stdin) starts downloading with the first URL, so that slow URL generators can feed Wget2
- check HTTP 'ETag' to avoid parsing duplicates
- use HTTP 'Accept-Encoding': gzip, deflate, lzma, bzip2
- CLI string options can be set to NULL by prepending a --no-, e.g. --no-user-agent
- boolean CLI options can all be set to true or false
- $WGETRC is not read yet
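
The two option conventions in the list above (a --no- prefix resets a string option to NULL, and every boolean option can be set to true or false) can be sketched as follows. This is an illustrative Python model, not Wget2's parser: the function `parse_option` is hypothetical, and the accepted spellings of true and false are assumptions.

```python
# Assumed spellings for boolean values; Wget2 may accept a different set.
TRUE_WORDS = {"1", "on", "yes", "true"}
FALSE_WORDS = {"0", "off", "no", "false"}


def parse_option(arg: str, booleans: set[str]) -> tuple[str, object]:
    """Parse one '--name', '--name=value' or '--no-name' argument.

    Returns (name, value); value is a bool for boolean options and a
    string or None (standing in for NULL) for string options.
    """
    if not arg.startswith("--"):
        raise ValueError(f"not a long option: {arg!r}")
    name, sep, value = arg[2:].partition("=")
    negated = name.startswith("no-")
    if negated:
        name = name[3:]

    if name in booleans:
        if not sep:
            state = True  # a bare '--name' switches the option on
        elif value.lower() in TRUE_WORDS:
            state = True
        elif value.lower() in FALSE_WORDS:
            state = False
        else:
            raise ValueError(f"invalid boolean value: {value!r}")
        # A '--no-' prefix inverts the requested state.
        return name, state != negated

    if negated:
        return name, None  # string option reset to NULL
    if not sep:
        raise ValueError(f"option --{name} needs a value")
    return name, value


if __name__ == "__main__":
    known_booleans = {"robots", "http2"}
    for example in ("--no-user-agent", "--robots=off", "--no-http2"):
        print(example, "->", parse_option(example, known_booleans))
```

Modelling NULL as `None` keeps "unset" distinct from an empty string, which is the point of --no-user-agent: no value at all rather than an empty one.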
Option | Wget | Wget2 | Comment |
---|---|---|---|
--accept-regex | β | ||
--ask-password | β | ||
--auth-no-challenge | β | ||
--background | β | ||
--body-data | β | ||
--body-file | β | ||
--check-hostname | β | ||
--chunk-size | β | ||
--config | β | β | Same as --config-file, for compatibility with Wget1.x |
--config-file | β | ||
--convert-file-only | β | ||
--cookie-suffixes | β | ||
--dns-caching | β | ||
--exclude-directories | β | ||
--egd-file | β | β | A no-op for compatibility (GnuTLS can be compiled/configured to use EGD) |
--follow-ftp | β | ||
--metalink | β | ||
--force-atom | β | ||
--force-css | β | ||
--force-metalink | β | ||
--force-rss | β | ||
--force-sitemap | β | ||
--ftp-password | β | ||
--ftps-clear-data-connection | β | ||
--ftps-fallback-to-ftp | β | ||
--ftps-implicit | β | ||
--ftps-resume-ssl | β | ||
--ftp-user | β | ||
--glob | β | ||
--header | β | ||
--gnutls-options | β | ||
--http2 | β | ||
--http-proxy | β | ||
--https-proxy | β | ||
--if-modified-since | β | Wget2 uses If-Modified-Since when timestamping is turned on | |
--ignore-length | β | ||
--include-directories | β | ||
--input-encoding | β | ||
--input-metalink | β | (β) | Wget2 uses a combination of --input-file and --force-metalink |
--limit-rate | β | For Wget2 use a bandwidth limiter like trickle | |
--metalink-over-http | β | Wget2 does this automatically | |
--method | β | ||
--max-threads | β | ||
--netrc-file | β | Mainly for test code usage to test .netrc files | |
--ocsp | β | ||
--ocsp-file | β | ||
--ocsp-stapling | β | ||
--passive-ftp | β | ||
--preferred-location | β | Wget2 respects priorities and order of locations | |
--preserve-permissions | β | ||
--proxy-password | β | ||
--proxy-user | β | ||
--random-file | β | ||
--regex-type | β | ||
--rejected-log | β | ||
--reject-regex | β | ||
--relative | β | ||
--remove-listing | β | ||
--report-speed | β | ||
--retr-symlinks | β | ||
--retry-connrefused | β | ||
--robots | β | Wget1.x has a robots command but no option; -e robots=1 does the job | |
--show-progress | β | ||
--start-pos | β | ||
--tcp-fastopen | β | ||
--unlink | β | ||
--warc-cdx | β | ||
--warc-compression | β | ||
--warc-dedup | β | ||
--warc-digests | β | ||
--warc-file | β | ||
--warc-header | β | ||
--warc-keep-log | β | ||
--warc-max-size | β | ||
--warc-tempdir | β |
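
As a reading of two comments in the table above, a script written for Wget1.x could be adapted for Wget2 by rewriting its arguments. The sketch below is a hypothetical Python helper, not part of Wget2: `translate_args` is an invented name, it handles only the `--option=value` spelling, and it covers only --input-metalink and --metalink-over-http.

```python
def translate_args(args: list[str]) -> list[str]:
    """Rewrite a Wget1.x argument list for Wget2 (illustrative only)."""
    out: list[str] = []
    for arg in args:
        name, sep, value = arg.partition("=")
        if name == "--input-metalink" and sep:
            # Table comment: Wget2 uses --input-file plus --force-metalink.
            out += [f"--input-file={value}", "--force-metalink"]
        elif name == "--metalink-over-http":
            # Table comment: Wget2 does this automatically, so drop the flag.
            continue
        else:
            out.append(arg)  # everything else passes through unchanged
    return out


if __name__ == "__main__":
    old = ["--input-metalink=files.meta4", "--metalink-over-http", "--relative"]
    print(["wget2"] + translate_args(old))
```

Options that the table marks as unavailable in Wget2, such as --limit-rate, are deliberately left untouched here; they need a decision by the script author rather than a mechanical rewrite.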