Home
CleanLinks protects your private life by automatically detecting and skipping redirect pages that track you on your way to the link you really wanted. Tracking parameters (e.g. `utm_*` or `fbclid`) are also removed.
You can test the current (master) link cleaning code online.
We automatically detect embedded URLs, which are used either:
- when websites report your current URL, or
- when websites bring you to an intermediate page to track you and then redirect you to their destination.
These requests are then respectively dropped (we could also consider removing the query parameter containing the current URL) and redirected to the embedded URL.
- Plain embedded URL: http://www.foobar.com/track=ftp://gnu.org ➠ ftp://gnu.org/
- Base 64-encoded URL: http://example.com/aHR0cDovL3d3dy5nb29nbGUuY29t ➠ http://www.google.com
- JavaScript URL: javascript:window.open('http://somesite.com') ➠ http://somesite.com/
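The detection of these three cases can be sketched roughly as follows. This is a hypothetical helper, not CleanLinks' actual code; it scans a link for an embedded URL in plain, Base64-encoded, or `javascript:` form:

```javascript
// Hypothetical sketch of embedded-URL detection (not CleanLinks' real
// implementation): return the first embedded URL found in a link, or null.
function findEmbeddedUrl(link) {
  const url = new URL(link);

  // javascript: links — extract the first quoted http(s) URL, if any.
  if (url.protocol === 'javascript:') {
    const m = link.match(/['"](https?:\/\/[^'"]+)['"]/);
    return m ? m[1] : null;
  }

  let haystack = url.pathname + url.search;
  try { haystack = decodeURIComponent(haystack); } catch (e) { /* keep raw */ }

  // Plain embedded URL anywhere in the path or query.
  const plain = haystack.match(/(?:https?|ftp):\/\/\S+/);
  if (plain) return plain[0];

  // Base64-encoded URL: try to decode each path segment.
  for (const segment of url.pathname.split('/')) {
    const decoded = Buffer.from(segment, 'base64').toString('utf8');
    if (/^(?:https?|ftp):\/\//.test(decoded)) return decoded;
  }
  return null;
}
```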
CleanLinks has rules, which allow specifying which uses of embedded URLs are legitimate, and whitelisting those, i.e. not redirecting them. A typical example is a login page with a `?redirectUrl=` parameter that specifies where to go once the login is successful.
CleanLinks will break some websites and you will need to manually whitelist these URLs for them to work. This can be done easily via the popup from the CleanLinks toolbar icon.
Rules allow whitelisting some embedded URLs, and performing further cleaning actions, such as removing tracking parameters (e.g. `utm_*`) or rewriting a URL's path.
Different parts of a URL, using https://addons.mozilla.org/en-GB/firefox/addon/clean-links-webext/reviews/?score=5 as an example:
- `https`: protocol
- `org`: public suffix (usually the same as the top-level domain)
- `mozilla.org`: domain name
- `addons.`: subdomain
- `addons.mozilla.org`: fully-qualified domain name (FQDN)
- `/en-GB/firefox/addon/clean-links-webext/reviews/`: path
- `?score=5`: query
- `score`: parameter
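Most of these parts can be read directly with the standard URL API, available in both browsers and Node. Note that the public suffix vs. domain name split (`org` vs. `mozilla.org`) is not provided by this API; resolving it requires a public-suffix list:

```javascript
// Splitting the example URL with the standard URL API.
const url = new URL('https://addons.mozilla.org/en-GB/firefox/addon/clean-links-webext/reviews/?score=5');

console.log(url.protocol);                  // 'https:'             — protocol
console.log(url.hostname);                  // 'addons.mozilla.org' — FQDN
console.log(url.pathname);                  // '/en-GB/firefox/addon/clean-links-webext/reviews/'
console.log(url.search);                    // '?score=5'           — query
console.log(url.searchParams.get('score')); // '5'                  — parameter value
```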
For maximum privacy, rules are maintained and editable locally (with decent defaults distributed in the add-on). There are currently 4 types of actions. Each of these is a list of regular expressions.
- Remove query parameters: any query parameter matched by any expression in this list is removed, unless it is matched by a whitelist expression. For example, Facebook adds an `fbclid` parameter with a unique identifier to every outgoing link, e.g.: https://soundcloud.com/artist/track?fbclid=IwAR1eyii3yum_rNgxs7ym2SY4bsb8QtCVtpOb3hYQ9bYOR-oao7lCC1fI1tY
- Whitelist query parameters: any query parameter matched by any expression in this list is preserved as-is, even if it includes an embedded URL or is specified as a removable parameter. In particular, whitelisting a parameter that contains an embedded URL avoids redirecting (or removing) requests to the intermediate page. For example, the Stack Overflow login page contains the current page URL, to allow returning there once the user is logged in: https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2fquestions%2f32814161%2fhow-to-make-spoiler-text-in-github-wiki-pages
- Replace in URL path: any part of the URL path that is matched by any expression in this list is replaced with the specified replacement, or removed if no replacement is specified. For example, Amazon puts its tracking data directly in the URL path, as `/ref=some-value`: https://www.amazon.es/gp/product/B06Y1VKRXJ/ref=ppx_od_dt_b_asin_title_s00?ie=UTF8&psc=1
- Whitelist URL path (Allow URL embedded in path): embedded URLs are also allowed in the URL path, without causing the intermediate page to be skipped. This also prevents replacements from being performed in the URL's path. For example, the Web Archive puts the archived page URL in the path, so whitelisting the path allows https://web.archive.org/web/20200304112831/http://www.google.com/ to not redirect to google.com.
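The query-parameter actions can be sketched as below. This is an illustrative example, not CleanLinks' actual code; the rule shape (`remove` and `whitelist` as lists of regular-expression strings) is an assumption for the sketch:

```javascript
// Hypothetical sketch of the "remove" and "whitelist" query-parameter
// actions. `rules.remove` and `rules.whitelist` are assumed to be
// arrays of regex strings, as in the document's description.
function cleanQuery(link, rules) {
  const url = new URL(link);
  const whitelist = rules.whitelist.map(re => new RegExp(re));
  const remove = rules.remove.map(re => new RegExp(re));

  // Copy keys first: deleting while iterating searchParams is unsafe.
  for (const name of [...url.searchParams.keys()]) {
    if (whitelist.some(re => re.test(name))) continue;  // preserved as-is
    if (remove.some(re => re.test(name)))
      url.searchParams.delete(name);                    // tracking param dropped
  }
  return url.href;
}
```

A whitelist match wins over a remove match, mirroring the rule that whitelisted parameters are preserved even when they are also specified as removable.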
NB: query parameters are cleaned before embedded URLs are detected. If an embedded URL is found, it is then cleaned as well before being returned.
CleanLinks analyses and cleans your browser's requests before they leave the browser, except for JavaScript links, which are cleaned at the moment they are clicked.
At this stage, it can distinguish between 3 types of requests:
- top-level requests, which are the websites that are opened, and typically correspond to links clicked inside or outside of the browser.
- other requests, which are initiated by the website to load resources: scripts, images, iframes, etc.
- header redirects, which happen when a website issues a 30x response to send you from one location to the next. In this case we can clean the destination to which we are redirected.
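A request handler might dispatch on these three kinds roughly as follows. The field names follow the WebExtensions `webRequest` API's `details` object (`type`, `statusCode`); the dispatch logic itself is an illustrative assumption, not CleanLinks' actual code:

```javascript
// Hypothetical sketch: classify a webRequest `details` object into the
// three request kinds described above.
function classifyRequest(details) {
  if (details.statusCode >= 300 && details.statusCode < 400)
    return 'header-redirect';   // 30x Location redirect: clean the destination
  if (details.type === 'main_frame')
    return 'top-level';         // a page being opened, e.g. a clicked link
  return 'other';               // scripts, images, iframes, ...
}
```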
The BBC for example uses link shorteners: https://bbc.in/some-hash redirects (via a trib.al link shortener) to the following URL, with added tracking parameters: https://www.bbc.co.uk/news/article-id?at_custom2=facebook_page&at_custom1=%5Bpost+type%5D&at_campaign=64&at_medium=custom7&at_custom3=BBC+News
The default rules are available as a JSON file, and rule sets can be exported or imported from the CleanLinks settings, to allow backing up and restoring your rules.
Rules are stored per domain, hierarchically (thus right to left in the dot-separated domain parts).
These are detailed in the Rules section above.
For example, with the following rules:
```json
{
    ".org": {
        "actions": {
            ...
        },
        ".mozilla": {
            "actions": {
                ...
            }
        }
    }
}
```
- The rules in the 1st `actions` are applied to all websites with the top-level domain `.org`.
- The rules in the 2nd `actions` are applied to all websites of the domain `mozilla.org`, or subdomains thereof (e.g. `www.mozilla.org` and `addons.mozilla.org`).
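Looking up the rules for a given website then amounts to walking this tree right to left through the dot-separated domain parts, collecting every `actions` entry on the way. A minimal sketch, assuming the rule structure shown above (the function name is illustrative):

```javascript
// Hypothetical sketch of hierarchical rule lookup: walk the domain parts
// right to left (org, then mozilla, then addons, ...), collecting the
// "actions" of every matching level.
function collectActions(rules, hostname) {
  const actions = [];
  let node = rules;
  for (const part of hostname.split('.').reverse()) {
    node = node['.' + part];
    if (!node) break;                        // no more specific rules
    if (node.actions) actions.push(node.actions);
  }
  return actions;
}
```

With the example rules, `addons.mozilla.org` collects both the `.org` and the `.mozilla` actions, while a `.com` domain collects none.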