Consider removing wildcard option of the Timing-Allow-Origin header to prevent browser history leakage
#222
Doesn't
That indeed seems like a leak, similar to cache state leaks, until double-keying is applicable everywhere. As you stated, the solution seems to be the removal of third-party cookies. While all major browsers seem committed to taking that route, it'll take us a couple of years before we get there. |
The concern as it was discussed in PING is that the entire spec is, to the user, largely cost (the privacy risks mentioned above, among others) w/ no corresponding benefit (to the first approximation). Removing the wildcard would shift that balance. In other words, "let everyone access the feature" seems like the wrong place on the cost/benefit curve, for the user. |
Can you point me to the relevant discussion? |
Thanks! I'm still utterly unconvinced that adding friction is anything but privacy theatre. As @michaelkleber pointed out in the minutes, 3P providers will be motivated to enable timing and will not necessarily be deterred by the need to implement the extra 3 lines of code that server side origin mirroring requires. The main "feature" of it would be that it would break the great majority of current users. That doesn't seem like a worthy goal. Seems like the problem here can be rephrased as "Exposing detailed timing information for credentialed cross-origin resource fetches increases risk of cross-origin state leak". Mitigations should tackle this problem head on. A non-comprehensive list may include:
|
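For concreteness, the server-side "origin mirroring" mentioned above is roughly the following. This is a minimal sketch assuming a Node/Express-style server (the middleware and names are illustrative, not something the spec prescribes): reflect the requesting page's origin into the Timing-Allow-Origin response header instead of sending a wildcard.

```ts
import express from "express";

const app = express();

// Reflect the caller's origin into Timing-Allow-Origin instead of "*".
app.use((req, res, next) => {
  // Prefer the Origin header; fall back to the Referer's origin when present.
  const source = req.get("origin") ?? req.get("referer");
  if (source) {
    // A real deployment would validate the value before reflecting it.
    res.setHeader("Timing-Allow-Origin", new URL(source).origin);
  }
  next();
});

app.listen(3000);
```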
Also worth noting that any such mitigations would be a stop-gap measure until all browsers can eliminate 3P cookies, which is the eventual solution to this issue. |
I don't know how many sites this would break, and would be interested in numbers. But if the numbers show this to be a significant issue, it seems that you could just allow the v2 information only on non-wildcard use. Then nothing would break; sites targeting the current standard would get what they already get, and sites that want the new information could get it w/ a more narrow TAO. Additionally, it seems bad practice to automatically opt sites sending TAO * to share v1 features into sharing v2 features, which they may not (for privacy, security or other reasons) want to share. Removing '*' would at least reduce the scope of this problem
This sounds like a promising avenue. Can you say more about how this might be done, or how you see this possibly working out?
Seems like we agree that this is a good idea, so let's consider this a TODO and set it aside from the rest of the discussion. |
Maybe. But I don't think you want the stop-gap mitigation to rely on future restrictions which may or may not happen.
#221 seems like a completely distinct issue. If there is a problem there, it's completely unrelated to TAO.
That would be solved by double-key caching.
I agree that stop-gap mitigations are in order.
Sure, it adds friction for non-motivated sites. But how is adding friction a helpful goal here?
Not necessarily immediate user-visible breakage, but the numbers stated on this issue indicate 95% of currently timed resources will no longer be.
L2 has been shipping for the last ~5 years in Chrome and Firefox.
I don't think that adding more types of opt-ins is the way to go. If there are material risk differences with the addition of the L2 attributes, we should address those.
When a response is received, the browser should be aware of whether the request was sent with cookies, and whether the response has a Set-Cookie header. In terms of Fetch integration, HTTP-network-or-cache fetch (step 17) seems like a good integration point to add a flag which indicates whether cookies were sent.
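A rough sketch of how such a flag could be consumed when populating a timing entry; this is purely illustrative of the idea above (the types and the choice to suppress header sizes are assumptions, not spec text):

```ts
// Hypothetical bookkeeping a browser could record during HTTP-network-or-cache fetch.
interface ResponseTimingInfo {
  encodedBodySize: number;
  headerBytes: number;
  sentWithCookies: boolean;   // set if a Cookie header was attached to the request
  receivedSetCookie: boolean; // set if the response carried Set-Cookie
}

// When the exchange was credentialed, avoid exposing real header sizes, which would
// otherwise reveal whether a Set-Cookie header came back (the leak discussed here).
function reportedTransferSize(info: ResponseTimingInfo): number {
  if (info.sentWithCookies || info.receivedSetCookie) {
    return info.encodedBodySize; // or encodedBodySize plus a fixed constant
  }
  return info.encodedBodySize + info.headerBytes;
}
```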
Whether this is a good idea or not seems tightly coupled with the rest of the discussion. |
I don't think I understand you here; I listed things shipping now that would make origin mirroring difficult to impossible, and suggested more are coming. In those cases, removing '*' would be an actual restriction, not just friction.
I'm not following. That issue describes how resource timing v2 introduces a new privacy harm of sites being able to detect proxy use (extra troubling because proxies are often used to increase privacy). Having the wildcard makes that capability available in the common case.
If the feature is only safe / privacy preserving to ship on platforms that double-key caches, that's important information to include in the spec.
The goal isn't to add friction; the goal is to prevent specs from including capabilities that allow for broad disclosure of privacy-relevant behavior. Requiring the information leak to be narrowly tailored seems to remove the "copy-paste" foot-gun, and is a very reasonable compromise (especially since this feature doesn't benefit users at first hop).
One way to read this is as a 95% privacy improvement. If the claim is that this will break things, it would be good to have numbers showing that.
"The non-standardized version is already implemented, so it can't be changed", is not compatible with how horizontal review works. PING is trying to work with you to get the functionality you're saying is important working on sites you think its needed most.
Again, I think this could be a useful direction, but it hinges on what those restrictions are, and getting them into the mandatory parts of the spec.
I'm sorry but I'm not following. Are you saying the spec should, or shouldn't, eliminate the cookie header sizes from the reported values? |
Unless we can ensure those further restrictions on the wildcard make it into the mandatory parts of the spec, I don't think they address the concern.
If I'm a website that wants to discriminate against people using a proxy by detecting protocol changes, extra TAO restrictions won't prevent me from doing that.
No, it does not. The nefarious website can set up its own servers that use different protocols, set its TAO headers as it wishes, and use that for protocol inspection.
I'm saying that I think the WG should seriously consider that as a potential mitigation, that seems directly related to the risk. |
I'm really surprised by this response. Is it the WG's opinion that "minority cases" where browsers and sites are trying to improve privacy by reducing referrer information are not relevant / "super interesting"? Plus, referrer policy is not a "minority case"; it's built into the platform, supported on all sites. It would not be possible to origin mirror for any site with a referrer policy.
I'm still not following here. The claim in the issue is that resource timing v2 introduces new privacy harms, including detecting proxy usage. The ideal thing would be to remove these capabilities from the spec. However, the WG has identified these use cases as important in some cases and so worth supporting. So, the proposal in the issue is to at least reduce the scope of the harm, and reduce it from being the common case to being a far smaller set of narrow cases. We're trying to help you, with a proposal that allows the functionality to be applied in narrow cases, where it's most useful. I'm sincerely trying to understand the WG's position:
The position in the issue, as I understand it, and as it was discussed in PING, is that there is risk here (of which proxy detection is one example), and that removing the wildcard would reduce the scope of that risk. |
First, let me clarify that I speak as editor and WG chair, but not as "the WG". Being a co-chair of the group doesn't mean that I can speak for its members. The WG hasn't been consulted yet. I intend to raise this issue to the group on our upcoming call. Second, reducing referrer information can be interesting on its own, but I don't deem it relevant as a mitigation for the issue raised. Specifically - the issue is hinged on 3P cookies and single-keying of caches. Eliminating 3P cookies and double-keying the various browser caches is the ultimate solution here.
My position is that we should discuss this in #221, as the issue is completely unrelated to TAO restrictions. It is also my position that if you're trying to defend yourself from a specific attack, the mitigation should do something to prevent the attacker from doing its thing, and that unrelated restrictions won't help you much. |
Thank you @yoavweiss for taking a look and @snyderp for jumping in! Thanks, that's useful, but as far as I understand, this only discusses the HTTP cache, which addresses neither issue 1 (which requires a partitioned DNS cache) nor issue 3 (which requires partitioned sessions/sockets). A broader approach (similar to Chrome's Storage Isolation Project) was recently proposed here: privacycg/proposals#4 .
I don't know, but assuming it does, isn't
Please note that this number doesn't represent how often resource timing data is being actually collected.
Those mitigations look really good to me 👍 Fuzzing seems to be a good idea for many of the resource timing properties (as a stop-gap measure for all browsers that don't implement cache partitioning). Thinking a bit ahead here: isn't it possible to determine the non-fuzzed value by collecting enough fuzzed samples (see the sketch after this comment)? Are browsers doing fuzzing in any other API (I'd love to read up on the prior art)?
If restrictions will only be applied to responses that match specific criteria, wouldn't it make it easier to detect them? e.g. a fuzzed value would itself reveal that a response matched those criteria.
I agree that we should discuss this separately to keep this issue focused. |
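To illustrate the concern about recovering fuzzed values: if the noise added to a timing value is unbiased, averaging repeated samples converges back toward the true value. A self-contained simulation (the numbers are made up for illustration):

```ts
const trueDuration = 12.5; // ms: the value the fuzzing is meant to hide

// One fuzzed reading: the true value plus uniform noise in [-5, +5] ms.
function fuzzedSample(): number {
  return trueDuration + (Math.random() * 10 - 5);
}

// Collect many samples (e.g. across repeated fetches) and average them.
const samples = Array.from({ length: 10_000 }, fuzzedSample);
const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
console.log(mean.toFixed(2)); // ~12.5: the noise averages out
```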
Maybe. If there are many properties that turn on |
We (DuckDuckGo) will have another 50k website crawl soon, I'll try to get that data. |
I would be opposed to the proposed change because:
|
I looked into the use of
|
Since you're digging into that, from the ~12% that have |
|
I believe this is a bug in the implementation of RT by several browsers https://bugs.chromium.org/p/chromium/issues/detail?id=1039080 https://bugs.webkit.org/show_bug.cgi?id=205768 |
Is this study published anywhere / is there any way we can get a look at the data in more detail? Of the 1.1M 3rd-party requests mentioned above, what services are in them, e.g. sites users log into (social networks) vs. sites content is served from (image CDNs), etc.?
Is this a realistic option for many third-party tag providers? Consider a provider who hosts their common code on cloud storage and serves it via a dumb CDN; how would they set TAO to the relevant origin? More intelligent CDNs can enable this via config at the edge, but what happens when there's no referrer to reflect back in TAO, or when a referrer policy has stripped it? |
You've identified exactly the point of the issue. That the feature should be limited to sites looking to debug their own resources, and that allowing everyone to time everything from everywhere is the wrong point on the privacy / functionality curve. There is both utility and risk in this functionality, so taking steps to restrict (but not remove) its availability to a smaller set of parties is PING's suggestion for how to keep the functionality but reduce its privacy risk. |
Pete, that was not at all the PING suggestion (minutes here). Suggestions at that meeting included keeping wildcard but not making it the default; renaming it from * to unsafe-all; and cache partitioning. Dropping wildcard entirely was only presented as something that developers could overcome if they exerted more effort. Nobody at PING voiced any support for the position you just voiced, which involved creating cases in which it is impossible to measure timing. Please do not abuse your role by claiming your own novel opinion is some group consensus. |
|
I've opened a distinct PING process issue to capture @michaelkleber's concern / complaint and make sure it's addressed publicly, without clouding the discussion of the substance of this issue. w3cping/administrivia#28 |
@pes10k So you're suggesting sites shouldn't be able to measure in detail the performance of 3rd party resources they're including in their pages? |
Sites are welcome to measure whatever they'd like, using their own machinery and browsers. What's at issue here is what sites can use visitors' browsers to measure, and how broadly. Since @kdzwinel / PING has identified privacy risk involved in the measurements, it seems prudent to reduce the number and frequency of those measurements. No wildcards is one way of doing so. Something like an analytics / debugging mode would be another way of doing so (e.g. users who want to help the site by providing more information to it can signal they'd like to do so. Crawlers and other automation / measurement infrastructure might just always have such a mode on). |
Thanks for taking a look @andydavies! Those issues are definitely related, but I'm not quite sure if fixing them addresses our concern here. As far as I understand, after the bug is fixed,
Unfortunately not, although we are contemplating releasing raw crawl data. In the meantime we welcome everyone to double-check our findings using other sources. Perhaps HTTP Archive has similar data?
That's our (I did this review together with @dharb and @jdorweiler) recommendation. We are happy to discuss alternative mitigations (like the one proposed by @pes10k above). |
What about valid cases of wildcard usage? Mainly by common/shared CDNs such as Google Hosted Libraries where webfonts, css, and js libraries are loaded by several websites. Removing '*' would not allow those websites to measure latency on those resources. |
@marcelduran this feature isn't about whether sites can measure latency (there are already many many ways they can do so, like automation / bots); this issue is about whether sites should be able to use visitors' browsers to measure third-party latency, and subject those users to the corresponding privacy and security risks, w/o the users' knowledge or consent. (re: #222 (comment)) |
What do you mean by automation/bots? Lab measurements? Lab cannot replace real-user performance measurement. That's the whole point of web performance APIs. |
Sites interested in those measurements could replicate the same measurements by asking users "would you like us to help debug / monitor the performance of this site (possibly w/ compensation)? If so, here is a simple automation / puppeteer script you can run on your machine, etc." If the concern is that most users may not wish to participate in such measurements, that's part of the point of this issue; there is privacy and security risk in this feature. If sites would like users to take on those risks / costs, and have people volunteer their resources on sites' behalf, they should ask permission (the "analytics / debugging mode" proposal linked to in #222 (comment)) or otherwise incentivize them. |
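As a concrete illustration of the kind of opt-in script mentioned above, here is a minimal Puppeteer sketch a volunteering user could run themselves (the URL and output handling are illustrative):

```ts
import puppeteer from "puppeteer";

async function collectResourceTimings(url: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle0" });
  // Read Resource Timing entries from this (consenting) user's own browser session.
  const entries = await page.evaluate(() =>
    performance.getEntriesByType("resource").map((e) => e.toJSON())
  );
  await browser.close();
  return entries;
}

collectResourceTimings("https://example.com").then((entries) => {
  console.log(`collected ${entries.length} resource timing entries`);
});
```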
I agree with you that performance measurements may have privacy and security risks. However I don't think I agree with any of the proposed solutions here. For the proposal to remove "TAO *" I posted my thoughts in #222 (comment). For the proposal to gate it on user consent, I think it misses the point. Most users are not in a position to reason about the privacy/security risks of web platform features. Even if they were, asking them about this for every new website they visit would be a terrible user experience. And their response to such questions will likely be misinformed or driven by their annoyance at the prompts, so it would ultimately affect performance instrumentation more than anything. |
There's no need to rediscuss the proposed mitigation. It was discussed at a WG meeting and was deemed ineffective. There's a real issue here around the potential exposure of cookies being set, but we'd need another mitigation for it. Maybe we can start by eliminating header sizes as a stop-gap, and align the different opt-ins as part of L3? |
For folks who are not familiar with the process here, I’ll just note that in W3C process, all issues need to either be addressed to the initiators and WG’s mutual satisfaction, or decided by the director.
I'm not (in this comment) addressing the substance of whether the proposed mitigation is the right solution or not, only that the WG meeting and deeming the mitigation ineffective does not (in and of itself) mean the proposed mitigation does not warrant more discussion.
|
Can you point me to the portion of the process document that indicates that? What I found was the definition of Formally Addressing an issue: Are there portions of my response above or of the discussion in the WG minutes that a W3C reviewer would not consider "technically sound"? If so, it'd be great if you could point them out. On top of that, note that the WG is obligated to formally address issues when a specification is on its way to Proposed Recommendation, not before. As an aside, you may want to note that the role of wide review is to provide comments and point out issues in specifications, and not necessarily suggest and litigate specific solutions to those issues. With all that said, if for some reason you would like to re-litigate your proposed mitigation, we can arrange for you to present it to the WG in one of its upcoming meetings. |
My comment is not specific to wide review at all; it's just how W3C works in general, regarding objections. Any member may raise an objection, and elevate it to a formal objection, which triggers the process I mentioned.
https://www.w3.org/2019/Process-20190301/#FormalObjection
|
Are you raising a Formal Objection? |
No, I'm not, or at least not at this moment; I didn’t even file this issue (though I support it). I’m only noting that if someone / anyone doesn’t agree with how the WG resolves this issue, it can be elevated to a formal objection.
And so, “WG doesn’t agree with mitigation” isn’t a reason, in and of itself, to end discussion, since someone can raise a formal objection if they don’t agree with the WG
|
FWIW, some of the issues pointed out in this thread about |
Note that the current version of the spec uses constant numbers for the reported sizes. The whole discussion here is big; I'm not familiar with it enough to say whether it helps with the |
Hey PING folks! Maybe we can close this with the following:
Would that work for y'all? |
We crawled 50,000 websites and found that 95% of 1.1M third party requests using the 'Timing-Allow-Origin' header were using a wildcard. Wide usage of wildcard combined with the amount of detailed information that this API exposes about third party requests creates multiple opportunities for leaking user's browsing history.

1. domainLookupStart/domainLookupEnd properties (introduced in level 1) allow any website to extract some information from browser's DNS cache. In particular this can be used to detect a new private session (by checking if domainLookupStart !== domainLookupEnd for some popular services like google-analytics.com).
2. redirectEnd - redirectStart !== 0 (level 1) may leak information about user visiting given website in the past through browser's enforcement of HSTS (HSTS redirects being instant compared to the regular 30X redirects).
3. secureConnectionStart === 0 (level 1) can reveal information about a connection being reused, suggesting that user recently visited given website.
4. Many applications are set up in a way that new users are getting a 'set-cookie' header on response while users with cookies set are not getting that header. By observing size of the headers (transferSize - encodedBodySize) a website can learn if cookies were sent with a given third-party request or not.

It's worth noting that issues 1 and 3 can be mitigated by the user agent by double-keying of the caches. However, since this technique is not a W3C standard it doesn't address our concerns. Similarly, issue 4 can be mitigated by blocking third party cookies, but it's not a standard behavior.

To mitigate the above risks we suggest dropping wildcard functionality in the Timing-Allow-Origin header. This will force developers to list actual domains that they want to share this information with and greatly reduce the amount of domains that can be scanned using the above techniques. If there are cases where wildcard is required, developers will still be able to simulate it by setting the timing-allow-origin value based on the value of the request's referer header.

The other possible mitigation is to introduce randomness to the values returned by the API. As we understand those values are meant to be processed in bulk by website owners to uncover performance trends, there seems to be no need for those values to be always accurate or as precise as they are now.