Consider removing wildcard option of the Timing-Allow-Origin header to prevent browser history leakage #222

Open
kdzwinel opened this issue Feb 5, 2020 · 42 comments
Labels
privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on.

@kdzwinel

kdzwinel commented Feb 5, 2020

We crawled 50,000 websites and found that 95% of the 1.1M third-party requests using the 'Timing-Allow-Origin' header used a wildcard. Wide usage of the wildcard, combined with the amount of detailed information that this API exposes about third-party requests, creates multiple opportunities for leaking users' browsing history.

  1. Extracting DNS cache information

The domainLookupStart/domainLookupEnd properties (introduced in Level 1) allow any website to extract some information from the browser's DNS cache. In particular, this can be used to detect a new private session (by checking whether domainLookupStart !== domainLookupEnd for some popular services like google-analytics.com).

  2. Extracting HSTS information

redirectEnd - redirectStart !== 0 (Level 1) may leak information about the user having visited a given website in the past through the browser's enforcement of HSTS (HSTS redirects being instant compared to regular 30X redirects).

  3. Extracting reused connections

secureConnectionStart === 0 (Level 1) can reveal that a connection is being reused, suggesting that the user recently visited a given website.

  4. Extracting information about cookies being set

Many applications are set up so that new users get a 'set-cookie' header on the response while users who already have cookies set do not. By observing the size of the headers (transferSize - encodedBodySize), a website can learn whether cookies were sent with a given third-party request or not.
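
To make the four signals above concrete, here is a minimal sketch that derives them from a single resource timing entry. It assumes the entry has already been serialised to a plain dictionary keyed by the PerformanceResourceTiming attribute names; the helper name `history_signals` and the sample values are hypothetical, and actual browsers may report some of these fields differently.

```python
def history_signals(entry):
    """Derive the four signals from one serialised resource timing entry.

    `entry` is assumed to be a dict keyed by PerformanceResourceTiming
    attribute names. This helper is illustrative, not a browser API.
    """
    return {
        # 1. No time spent on DNS suggests the name was already cached.
        "dns_cached": entry["domainLookupStart"] == entry["domainLookupEnd"],
        # 2. Non-zero redirect time: the check the issue uses for HSTS probing.
        "redirected": entry["redirectEnd"] - entry["redirectStart"] != 0,
        # 3. secureConnectionStart of 0: the connection reuse check from the issue.
        "connection_reused": entry["secureConnectionStart"] == 0,
        # 4. Header size, which grows when the response carries Set-Cookie.
        "header_bytes": entry["transferSize"] - entry["encodedBodySize"],
    }


# Hypothetical entry for a small third-party pixel.
sample = {
    "domainLookupStart": 105.2,
    "domainLookupEnd": 105.2,
    "redirectStart": 0,
    "redirectEnd": 0,
    "secureConnectionStart": 0,
    "transferSize": 368,
    "encodedBodySize": 43,
}
signals = history_signals(sample)
```

Each signal is only a hint on its own; the concern raised here is that a wildcard lets any embedding page collect them for any opted-in third party.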

It's worth noting that issues 1 and 3 can be mitigated by the user agent by double-keying of the caches. However, since this technique is not a W3C standard it doesn't address our concerns. Similarly, issue 4 can be mitigated by blocking third party cookies, but it's not a standard behavior.

To mitigate the above risks, we suggest dropping wildcard functionality in the Timing-Allow-Origin header. This would force developers to list the actual domains they want to share this information with and greatly reduce the number of domains that can be scanned using the above techniques. If there are cases where a wildcard is required, developers would still be able to simulate it by setting the timing-allow-origin value based on the value of the request's referer header.
The other possible mitigation is to introduce randomness to the values returned by the API. As we understand it, those values are meant to be processed in bulk by website owners to uncover performance trends, so there seems to be no need for them to be always accurate or as precise as they are now.

@yoavweiss
Contributor

It's worth noting that issues 1 and 3 can be mitigated by the user agent by double-keying of the caches. However, since this technique is not a W3C standard it doesn't address our concerns.

whatwg/fetch#904

  2. Extracting HSTS information

Doesn't ACAO: * expose the same?

  4. Extracting information about cookies being set

That indeed seems like a leak, similar to cache state until double-keying is applicable everywhere.
At the same time, I don't see how increasing friction to TAO opt-in would somehow solve this.

As you stated, the solution seems to be the removal of third-party cookies. While all major browsers seem committed to take that route, it'll take us a couple of years before we get there.
If this is a major issue in the wild, we can discuss mitigations in the meantime.

@plehegar plehegar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Feb 10, 2020
@pes10k

pes10k commented Feb 10, 2020

At the same time, I don't see how increasing friction to TAO opt-in would somehow solve this.

The concern as it was discussed in PING is that the entire spec is, to the user, largely cost (the privacy risks mentioned above, among others) w/ no corresponding benefit (to the first approximation). Removing * (and possibly limiting the number of origins that could appear) would remove most of the cost to users, but still enable the functionality where it's needed and useful to most sites (folks debugging their own infrastructure, running automated tests on their own applications, etc).

In other words, let everyone access the feature seems like the wrong place on the cost/benefit curve, for the user.

@yoavweiss
Contributor

The concern as it was discussed in PING

Can you point me to the relevant discussion?

@yoavweiss
Contributor

Thanks!

I'm still utterly unconvinced that adding friction is anything but privacy theatre. As @michaelkleber pointed out in the minutes, 3P providers will be motivated to enable timing and will not necessarily be deterred by the need to implement the extra 3 lines of code that server side origin mirroring requires. The main "feature" of it would be that it would break the great majority of current users. That doesn't seem like a worthy goal.
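
For reference, the server-side origin mirroring mentioned above could look roughly like the sketch below. It is not taken from any particular server or CDN: the function name `mirrored_tao_headers` is hypothetical, request headers are assumed to arrive as a dict with canonical header names, and the `Vary` header is added on the assumption that shared caches sit in front of the server.

```python
from urllib.parse import urlsplit


def mirrored_tao_headers(request_headers):
    """Build response headers that mirror the embedding origin into TAO.

    Hypothetical helper: the origin is derived from the Referer header.
    Returns an empty dict when no usable Referer was sent.
    """
    parts = urlsplit(request_headers.get("Referer", ""))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return {}
    origin = f"{parts.scheme}://{parts.netloc}"
    # Vary keeps a shared cache from replaying one origin's header to another.
    return {"Timing-Allow-Origin": origin, "Vary": "Referer"}


with_referer = mirrored_tao_headers({"Referer": "https://shop.example/cart?id=1"})
without_referer = mirrored_tao_headers({})
```

The second call shows the limitation of this approach: when the browser sends no referrer, there is nothing to mirror and the response carries no Timing-Allow-Origin header at all.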

Seems like the problem here can be rephrased as "Exposing detailed timing information for credentialed cross-origin resource fetches increases risk of cross-origin state leak".

Mitigations should tackle this problem head on. A non-comprehensive list may include:

  • Apply extra restrictions when "actually credentialed" resources are involved - e.g. when the request contained a cookie or when the response has set one.
  • Eliminate the cookie header sizes from the reported values.
  • Fuzz the reported values
  • A mix of the above
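
As an illustration of the second bullet, the sketch below computes a reported transfer size that leaves Set-Cookie lines out of the header portion. It is a simplification under stated assumptions: header sizes are approximated as HTTP/1.1 text lines (name, colon, space, value, CRLF), which ignores HTTP/2 header compression, and both function names are hypothetical.

```python
def header_bytes(headers, skip=()):
    """Approximate HTTP/1.1 size of header lines, leaving out names in `skip`.

    Each line is counted as name + ": " + value + CRLF (4 extra bytes).
    """
    skipped = {name.lower() for name in skip}
    return sum(
        len(name.encode()) + len(value.encode()) + 4
        for name, value in headers
        if name.lower() not in skipped
    )


def reported_transfer_size(headers, encoded_body_size):
    """Body size plus header size, with Set-Cookie lines not counted."""
    return encoded_body_size + header_bytes(headers, skip=("set-cookie",))


# Hypothetical responses: a first-time visitor gets a cookie, a returning one does not.
new_visitor = [("Content-Type", "image/gif"), ("Set-Cookie", "id=abc123; Path=/")]
returning_visitor = [("Content-Type", "image/gif")]

size_new = reported_transfer_size(new_visitor, 43)
size_returning = reported_transfer_size(returning_visitor, 43)
```

With the cookie line excluded, both visitors produce the same reported size, so the difference used in item 4 of the original report disappears for this particular case.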

@yoavweiss
Contributor

Also worth noting that any such mitigations would be a stop gap measure until all browsers can eliminate 3P cookies, which is the eventual solution to this issue.

@pes10k

pes10k commented Feb 11, 2020

  1. There are many cases where origin mirroring isn't possible. The web is increasingly reducing referrer information. Firefox includes the option to disable, Brave doesn't reveal referrer information in almost all cases, referrer policy allows sites to say "don't send", Safari ITP restricts to eTLD+1 for some domains, etc. It seems likely that further restrictions on referrer are coming. In all these cases the ability of the site to mirror the origin is reduced or removed.
  2. Some of this leak is distinct from what 3p cookies show (#221, history leak from #3 above, etc.)
  3. Removing 3p cookies would help with some of this, but since the majority browser isn't planning on doing so for at least two years, "stop gap" measures would be very useful
  4. I think you underestimate how much friction a few lines of header code add, and how much it can help avoid accidental information exposure. Caching, CDNs, intermediate proxies, etc. all make origin mirroring non-trivial for common-case websites.

would break the great majority of current users

I don't know how many sites this would break, and would be interested in numbers. But if the numbers show this to be a significant issue, it seems that you could just only allow the v2 information on non-wildcard use. Then nothing would break; sites targeting the current standard would get what they already get, and sites that want the new information could get it w/ a more narrow TAO.

Additionally, it seems bad-practice to automatically opt-in sites sending TAO * to share v1 features, who may not (for privacy, security or other reasons) want to share v2 features. Removing '*' would at least reduce the scope of this problem

Apply extra restrictions when "actually credentialed" resources are involved

This sounds like a promising avenue. Can you say more about how this might be done, or how you see this possibly working out?

Eliminate the cookie header sizes from the reported values.

Seems like we agree that this is a good idea, so let's consider this TODO and set it aside from the rest of the discussion.

@yoavweiss
Contributor

  1. There are many cases where origin mirroring isn't possible. The web is increasingly reducing referrer information. Firefox includes the option to disable, Brave doesn't reveal referrer information in almost all cases, referrer policy allows sites to say "don't send", Safari ITP restricts to eTLD+1 for some domains, etc. It seems likely that further restrictions on referrer are coming. In all these cases the ability of the site to mirror the origin is reduced or removed.

Maybe. But I don't think you want the stop-gap mitigation to rely on future restrictions which may or may not happen.

2. Some of this leak is distinct from what 3p cookies shows (#221

#221 seems like a completely distinct issue. If there is a problem there, it's completely unrelated to TAO.

2. history leak from #3 above

That would be solved by double-key caching.

3. Removing 3p cookies would help with some of this, but since the majority browser isn't planning on doing so for at least two years, "stop gap" measures would be very useful

I agree that stop-gap mitigations are in order.

4. I think you underestimate how much friction a few lines of header code add, and how much it can help avoid accidental information exposure. Caching, CDNs, intermediate proxies, etc. all make origin mirroring non-trivial for common-case websites.

Sure, it adds friction for non-motivated sites. But how is adding friction a helpful goal here?

I don't know how many sites this would break, and would be interested in numbers

Not necessarily immediate user-visible breakage, but the numbers stated on this issue indicate 95% of currently timed resources will no longer be.

But if the numbers show this to be a significant issue, it seems that you could just only allow the v2 information on non-wildcard use. Then nothing would break; sites targeting the current standard would get what they already get, and sites that want the new information could get it w/ a more narrow TAO.

L2 has been shipping for the last ~5 years in Chrome and Firefox.

Additionally, it seems bad-practice to automatically opt-in sites sending TAO * to share v1 features, who may not (for privacy, security or other reasons) want to share v2 features. Removing '*' would at least reduce the scope of this problem

I don't think that adding more types of opt-ins is the way to go. If there are material risk differences with the addition of the L2 attributes, we should address those.

This sounds like a promising avenue. Can you say more about how this might be done, or how you see this possibly working out?

When a response is received, the browser should be aware of whether the request was sent with cookies, and whether the response has Set-Cookie in its headers. The browser can then apply different restrictions to those responses and how they are reported to Resource Timing.

In terms of Fetch integration, HTTP-network-or-cache fetch (step 17) seems like a good integration point to add a flag indicating whether cookies were sent.
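
A rough sketch of how such a flag could gate what gets reported is shown below. The thread does not say which restrictions would apply, so zeroing the size fields is purely an assumed policy for illustration, and `restrict_entry`, `SIZE_FIELDS` and the sample entry are hypothetical names rather than anything from the Fetch or Resource Timing specs.

```python
# Size attributes that the assumed policy hides for credentialed fetches.
SIZE_FIELDS = ("transferSize", "encodedBodySize", "decodedBodySize")


def restrict_entry(entry, cookies_sent, set_cookie_received):
    """Return a copy of `entry`, zeroing size fields for credentialed fetches.

    A fetch counts as "actually credentialed" here if the request carried
    cookies or the response set one. Zeroing sizes is an assumption made
    for this sketch, not a restriction defined anywhere in the thread.
    """
    if not (cookies_sent or set_cookie_received):
        return dict(entry)
    return {
        key: 0 if key in SIZE_FIELDS else value
        for key, value in entry.items()
    }


# Hypothetical serialised entry.
entry = {
    "name": "https://cdn.example/pixel.gif",
    "duration": 12.5,
    "transferSize": 368,
    "encodedBodySize": 43,
    "decodedBodySize": 43,
}
uncredentialed = restrict_entry(entry, cookies_sent=False, set_cookie_received=False)
credentialed = restrict_entry(entry, cookies_sent=True, set_cookie_received=False)
```

One caveat with any policy of this shape: if the restriction is applied only to credentialed fetches, the restricted values can themselves signal that credentials were involved.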

Seems like we agree that this is a good idea, so let's consider this TODO and set it aside from the rest of the discussion.

Whether this is a good idea or not seems tightly coupled with the rest of the discussion.

@pes10k

pes10k commented Feb 11, 2020

Maybe. But I don't think you want the stop-gap mitigation to rely on future restrictions which may or may not happen.

I don't think i understand you here; i listed things shipping now that would make origin mirroring difficult to impossible, and suggested more are coming. In those cases, removing * is not just friction, it's a concrete privacy improvement.

#221 seems like a completely distinct issue. If there is a problem there, it's completely unrelated to TAO.

I'm not following. That issue describes how resource timing v2 introduces a new privacy harm of sites being able to detect proxy use (extra troubling bc proxies are often used to increase privacy). Having * in TAO increases the scope of that harm.

That would be solved by double-key caching.

If the feature is only safe / privacy preserving to ship on platforms that double-key caches, that's important information to include in the spec.

Sure, it adds friction for non-motivated sites. But how is adding friction a helpful goal here?

The goal isn't to add friction, the goal is to prevent specs from including capabilities that allow for broad disclosure of privacy relevant behavior, when requiring the information leak to be narrowly tailored seems to remove the "copy-paste" foot-gun, and a very reasonable compromise (especially since this feature doesn't benefit users at first hop).

Not necessarily immediate user-visible breakage, but the numbers stated on this issue indicate 95% of currently timed resources will no longer be.

One way to read this is a 95% privacy improvement. If the claim is that this will break things, it would be good to have # showing that.

L2 has been shipping for the last ~5 years in Chrome and Firefox.

"The non-standardized version is already implemented, so it can't be changed" is not compatible with how horizontal review works. PING is trying to work with you to get the functionality you're saying is important working on the sites where you think it's needed most.

The browser can then apply different restrictions to those responses and how they are reported to Resource Timing

Again, I think this could be a useful direction, but it hinges on what those restrictions are, and getting them into the mandatory parts of the spec.

Whether this is a good idea or not seems tightly coupled with the rest of the discussion.

I'm sorry but i'm not following. Are you saying the spec should, or shouldn't, eliminate the cookie header sizes from the reported values?

@yoavweiss
Contributor

I don't think i understand you here; i listed things shipping now that would make origin mirroring difficult to impossible, and suggested more are coming. In those cases, removing * is not just friction, it's a concrete privacy improvement.

Unless we can ensure those further restrictions on Referer ship in all browsers very soon, the minority cases where it's already shipped are not super interesting.

I'm not following. That issue describes how resource timing v2 introduces a new privacy harm of sites being able to detect proxy use (extra troubling bc proxies are often used to increase privacy).

If I'm a website that wants to discriminate against people using a proxy by detecting protocol changes, extra TAO restrictions won't prevent me from doing that.

Having * in TAO increases the scope of that harm.

No, it does not. The nefarious website can setup its own servers that are using different protocols, setup their TAO headers as it wishes, and use that for protocol inspection.

I'm sorry but i'm not following. Are you saying the spec should, or shouldn't, eliminate the cookie header sizes from the reported values?

I'm saying that I think the WG should seriously consider that as a potential mitigation, that seems directly related to the risk.

@pes10k

pes10k commented Feb 11, 2020

Unless we can ensure those further restrictions on Referer ship in all browsers very soon, the minority cases where it's already shipped are not super interesting.

I'm really surprised by this response. Is it the WG's opinion that "minority cases" where browsers and sites are trying to improve privacy by reducing referrer information is not relevant / "super interesting"?

Plus, referrer policy is not a "minority case"; it's built into the platform, supported on all sites. It would not be possible to origin mirror for any site with a referrer policy.

If I'm a website that wants to discriminate against people using a proxy by detecting protocol changes, extra TAO restrictions won't prevent me from doing that.

No, it does not. The nefarious website can setup its own servers that are using different protocols, setup their TAO headers as it wishes, and use that for protocol inspection.

I'm still not following here. The claim in the issue is that resource timing v2 introduces new privacy harms, including detecting proxy usage. The ideal thing would be to remove these capabilities from the spec. However, the WG has identified these use cases as important in some cases and so worth supporting. So, the proposal in the issue is to at least reduce the scope of the harm, and reduce it from being the common case to at least being a far smaller set of narrow cases. We're trying to help you, with a proposal that allows the functionality to be applied in narrow cases, where it's most useful.

I'm sincerely trying to understand the WG's position:

  1. there is no harm here (that, for example, detecting proxy use is not considered a privacy harm)
  2. that there is a harm, but moving it from common case risk (i.e. *) to a dramatically reduced risk (i.e. small number of specified domains) is not a reasonable improvement
  3. That risk reduction is a useful strategy, but removing * doesn't actually reduce the amount of parties information will flow to?

The position in the issue, as i understand it, and as it was discussed in PING, is that there is risk here (of which proxy detection is one example), and that removing * both will have the practical effect of reducing the number of parties who get this information, and reduce the possibility of it flowing to unintended parties (i.e. * is a foot gun)

@yoavweiss
Contributor

I'm really surprised by this response. Is it the WG's opinion that "minority cases" where browsers and sites are trying to improve privacy by reducing referrer information is not relevant / "super interesting"?

First, let me clarify that I speak as editor and WG chair, but not as "the WG". Being a co-chair of the group doesn't mean that I can speak for its members.

The WG hasn't been consulted yet. I intend to raise this issue to the group on our upcoming call.

Second, reducing referrer information can be interesting on its own, but I don't deem it relevant as a mitigation for the issue raised.

Specifically - the issue is hinged on 3P cookies and single-keying of caches. Eliminating 3P cookies and double-keying the various browser caches is the ultimate solution here.
But, we need a stop-gap, as shipping the above would take us up to 2 years. If our stop-gap relies on future RP reductions that aren't yet in place (in the browsers that haven't yet limited 3P cookies and/or double keyed their caches), it won't be a very good stop gap.

I'm still not following here. The claim in the issue is that resource timing v2 introduces new privacy harms, including detecting proxy usage. The ideal thing would be to remove these capabilities from the spec. However, the WG has identified these use cases as important in some cases and so worth supporting. So, the proposal in the issue is to at least reduce the scope of the harm, and reduce it from being the common case to at least being a far smaller set of narrow cases. We're trying to help you, with a proposal that allows the functionality to be applied in narrow cases, where it's most useful.

I'm sincerely trying to understand the WG's position:

  1. there is no harm here (that, for example, detecting proxy use is not considered a privacy harm)
  2. that there is a harm, but moving it from common case risk (i.e. *) to a dramatically reduced risk (i.e. small number of specified domains) is not a reasonable improvement
  3. That risk reduction is a useful strategy, but removing * doesn't actually reduce the amount of parties information will flow to?

The position in the issue, as i understand it, and as it was discussed in PING, is that there is risk here (of which proxy detection is one example), and that removing * both will have the practical effect of reducing the number of parties who get this information, and reduce the possibility of it flowing to unintended parties (i.e. * is a foot gun)

My position is that we should discuss this in #221, as the issue is completely unrelated to TAO restrictions. It is also my position that if you're trying to defend yourself from a specific attack, the mitigation should do something to prevent the attacker from doing its thing, and that unrelated restrictions won't help you much.

@kdzwinel
Author

Thank you @yoavweiss for taking a look and @snyderp for jumping in!

whatwg/fetch#904

Thanks, that's useful, but as far as I understand, this only discusses the HTTP cache, which addresses neither issue 1 (requires a partitioned DNS cache) nor issue 3 (requires partitioned sessions/sockets). A broader approach (similar to Chrome's Storage Isolation Project) was recently proposed here: privacycg/proposals#4 .

Doesn't ACAO: * expose the same?

I don't know, but assuming it does, isn't TAO: * still increasing that leakage?

the numbers stated on this issue indicate 95% of currently timed resources will no longer be.

Please note that this number doesn't represent how often resource timing data is being actually collected.

Mitigations should tackle this problem head on.

Those mitigations look really good to me 👍 Fuzzing seems to be a good idea for many of the resource timing properties (as a stop-gap measure for all browsers that don't implement cache partitioning). Thinking a bit ahead here: isn't it possible to determine the non-fuzzed value by collecting enough fuzzed samples? Are browsers doing fuzzing in any other API (I'd love to read up on the prior art)?
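
The averaging question can be illustrated with a small simulation. This is a sketch under assumptions, not a description of how any browser fuzzes values: it assumes independent, zero-mean uniform noise is added on every read, and the names `fuzz`, `true_duration` and the noise range are made up for the example.

```python
import random
from statistics import mean


def fuzz(value, spread, rng):
    """Add zero-mean uniform noise in [-spread, +spread] to `value`."""
    return value + rng.uniform(-spread, spread)


rng = random.Random(0)   # fixed seed so the run is repeatable
true_duration = 42.0     # hypothetical timing value in milliseconds
spread = 10.0            # hypothetical noise range in milliseconds

# An observer who can trigger the same fetch many times collects many readings.
samples = [fuzz(true_duration, spread, rng) for _ in range(10_000)]
estimate = mean(samples)
worst_single_error = max(abs(s - true_duration) for s in samples)
```

Under these assumptions a single reading can be off by up to the full noise range, while the mean of many readings converges on the underlying value. Noise that is not independent per read (for example, a fixed offset per resource and origin) would not average out this way.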

When a response is received, the browser should be aware of whether the request was sent with cookies, and whether the response has Set-Cookie in its headers. The browser can then apply different restrictions to those responses and how they are reported to Resource Timing.

If restrictions will only be applied to responses that match specific criteria, wouldn't it make it easier to detect them? e.g. fuzzed transferSize would be trivial to detect if attacker has knowledge about the expected transferSize values.

My position is that we should discuss this [nextHopProtocol] in #221

I agree that we should discuss this separately to keep this issue focused.

@yoavweiss
Contributor

I don't know, but assuming it does, isn't TAO: * still increasing that leakage?

Maybe, if there are many properties that turn on TAO: * without having ACAO: * on the same resources. Since the recommendation is for any resource available on the internet to have ACAO: * enabled, and since TAO is more sensitive than ACAO, I doubt that's the case.

@kdzwinel
Author

kdzwinel commented Feb 14, 2020

If there are many properties that turn on TAO: * without having ACAO: * on the same

We (DuckDuckGo) will have another 50k website crawl soon, I'll try to get that data.

@npm1
Contributor

npm1 commented Feb 14, 2020

I would be opposed to the proposed change because:

  1. It would break too much Timing-Allow-Origin usage and there's no clear reasoning for the benefit of this breaking change other than 'people may not be aware of what they're opting into'. I do think that the fact that we've added additional power to TAO over time can be a problem for developers that are not aware of these changes. This is a hard problem, as having a new header for every new measurement does not seem desirable either.
  2. It goes against our current goal of aligning TAO as much as possible with CORS, and * is already supported. This means that if/when we make CORS imply TAO (see Make TAO a subset of CORS #178), we'll implicitly have TAO * support because there's already ACAO * support. Thus, there's no point in removing the standalone TAO *. That said, integration with CORS will likely also mean there will be some restrictions for credentialed resources, which I think is desirable. Perhaps we could instead work on implementing those restrictions now, instead of completely getting rid of TAO *?

@jdorweiler

I looked into the use of TAO: * and ACAO: * in 50k sites (5.4M requests), and the overlap between the two is only 10%.

  • third party requests seen: 5422241
  • third party requests with ACAO: *: 1442542 (26.6% of all)
  • third party requests with TAO: *: 1190192 (22% of all)
  • third party requests with both ACAO: * + TAO: *: 542026 (10% of all)
  • popularity of TAO: * among all TAO headers: 96%
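
The percentages above follow directly from the raw counts; the short calculation below reproduces them and also derives the share of requests that carry TAO: * without ACAO: *. Only the four counts come from the crawl figures listed above; the variable names and the `pct` helper are introduced here for illustration.

```python
total = 5_422_241      # third party requests seen
acao_wild = 1_442_542  # third party requests with ACAO: *
tao_wild = 1_190_192   # third party requests with TAO: *
both = 542_026         # third party requests with both wildcards


def pct(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Requests that expose timing via TAO: * but do not send ACAO: *.
tao_without_acao_wild = tao_wild - both

print("ACAO: * share:", pct(acao_wild, total))
print("TAO: * share:", pct(tao_wild, total))
print("both share:", pct(both, total))
print("TAO: * without ACAO: *:", tao_without_acao_wild, pct(tao_without_acao_wild, total))
print("TAO: * responses that also send ACAO: *:", pct(both, tao_wild))
```

The difference works out to 648,166 requests, roughly 12% of all third-party requests seen, and a little under half of the TAO: * responses also carry ACAO: *.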

@npm1
Contributor

npm1 commented Feb 18, 2020

Since you're digging into that, from the ~12% that have TAO * but not ACAO *, how often does this happen because they have ACAO with some other value, and how often is it that they're missing the ACAO header altogether?

@jdorweiler

  • 1.2% TAO: * and any non-wildcard ACAO
  • 10.7% TAO: * and no ACAO header

@andydavies

@kdzwinel

secureConnectionStart === 0 (level 1) can reveal information about a connection being reused suggesting that user recently visited given website.

I believe this is a bug in the implementation of RT by several browsers

https://bugs.chromium.org/p/chromium/issues/detail?id=1039080 https://bugs.webkit.org/show_bug.cgi?id=205768

@andydavies

@kdzwinel

Is this study published anywhere / is there any way we can get a look at the data in more detail?

Of the 1.1M 3rd-party requests mentioned above, what services are in them, e.g. sites users log into (social networks etc.) vs. sites content is served from (image CDNs etc.)?

This will force developers to list actual domains that they want to share this information with and greatly reduce amount of domains that can be scanned, using above techniques. If there are cases where wildcard is required developers will still be able to simulate it by setting timing-allow-origin value based on the value of the request's referer header.

Is this a realistic option for many third-party tag providers?

Consider a provider who hosts their common code on cloud storage and serves it via a dumb CDN, how would they set TAO to the relevant origin?

More intelligent CDNs can enable this via config at the edge but what happens when there's no referrer to reflect back in TAO, or vary on?

@pes10k

pes10k commented Feb 22, 2020

Is this a realistic option for many third-party tag providers?

More intelligent CDNs can enable this via config at the edge but what happens when there's no referrer to reflect back in TAO, or vary on?

You've identified exactly the point of the issue: that the feature should be limited to sites looking to debug their own resources, and that allowing everyone to time everything from everywhere is the wrong point on the privacy / functionality curve. There is both utility and risk in this functionality, so taking steps to restrict (but not remove) its availability to a smaller set of parties is PING's suggestion for how to keep the functionality but reduce its privacy risk.

@michaelkleber

Pete, that was not at all the PING suggestion (minutes here). Suggestions at that meeting included keeping wildcard but not making it the default; renaming it from * to unsafe-all; and cache partitioning. Dropping wildcard entirely was only presented as something that developers could overcome if they exerted more effort.

Nobody at PING voiced any support for the position you just voiced, which involved creating cases in which it is impossible to measure timing. Please do not abuse your role by claiming your own novel opinion is some group consensus.

@pes10k

pes10k commented Feb 22, 2020

@michaelkleber

  1. The minutes you linked to explicitly say "Our suggestion would be to drop the ability to set wildcard".
  2. I didn't open this issue, so i would kindly appreciate you not suggesting I'm putting words in other people's mouths.
  3. This issue (titled "Consider removing wildcard option") has been open as a PING issue for weeks now, linked to from the PING tracking issues repo.

@pes10k

pes10k commented Feb 22, 2020

I've opened a distinct PING process issue to capture @michaelkleber's concern / complaint and make sure it's addressed publicly, without clouding the discussion of the substance of this issue. w3cping/administrivia#28

@andydavies

That the feature should be limited to sites looking to debug their own resources, and that allowing everyone to time everything from everywhere is the wrong point on the privacy / functionality curve.

@pes10k So you're suggesting sites shouldn't be able to measure in detail the performance of 3rd party resources they're including in their pages?

@pes10k

pes10k commented Mar 2, 2020

Sites are welcome to measure whatever they'd like, using their own machinery and browsers. What's at issue here is what sites can use visitors' browsers to measure, and how broadly.

Since @kdzwinel / PING has identified privacy risk involved in the measurements, it seems prudent to reduce the number and frequency of those measurements.

No wild cards is one way of doing so. Something like an analytics / debugging mode would be another way of doing so (e.g. users who want to help the site by providing more information to the site, can signal they'd like to do so. Crawlers and other automation / measurement infrastructure might just always have such a mode on).

@kdzwinel
Author

kdzwinel commented Mar 3, 2020

I believe this is a bug in the implementation of RT by several browsers

Thanks for taking a look @andydavies! Those issues are definitely related, but I'm not quite sure that fixing them addresses our concern here. As far as I understand, after the bug is fixed, secureConnectionStart === fetchStart would still reveal a reused connection.

Is this study published anywhere / is there anyway we can get a look at the data in more detail?

Unfortunately not, although we are contemplating releasing raw crawl data. In the meantime we welcome everyone to double-check our findings using other sources. Perhaps HTTP Archive has similar data?

Nobody at PING voiced any support for the position you just voiced

That's our (I did this review together with @dharb and @jdorweiler) recommendation. We are happy to discuss alternative mitigations (like the one proposed by @pes10k above).

@marcelduran

What about valid cases of wildcard usage? Mainly by common/shared CDNs such as Google Hosted Libraries where webfonts, css, and js libraries are loaded by several websites. Removing '*' would not allow those websites to measure latency on those resources.

@pes10k

pes10k commented May 5, 2020

@marcelduran this feature isn't about whether sites can measure latency (there are already many many ways they can do so, like automation / bots); this issue is about whether sites should be able to use visitors' browsers to measure third-party latency, and subject those users to the corresponding privacy and security risks, w/o the users' knowledge or consent. (re: #222 (comment))

@npm1
Contributor

npm1 commented May 5, 2020

What do you mean by automation/bots? Lab measurements? Lab cannot replace real-user performance measurement. That's the whole point of web performance APIs.

@pes10k

pes10k commented May 5, 2020

Sites interested in those measurements could replicate the same measurements by asking users "would you like us to help debug / monitor the performance of this site (possibly w/ compensation)? If so, here is a simple automation / puppeteer script you can run on your machine, etc."

If the concern is that most users may not wish to participate in such measurements, that is part of the point of this issue; there is privacy and security risk in this feature. If sites would like users to take on those risks / costs, and have people volunteer their resources on sites' behalf, they should ask permission (the "analytics / debugging mode" proposal linked to in #222 (comment)) or otherwise incentivize them.

@npm1
Contributor

npm1 commented May 5, 2020

I agree with you that performance measurements may have privacy and security risks. However I don't think I agree with any of the proposed solutions here. For the proposal to remove "TAO *" I posted my thoughts in #222 (comment). For the proposal to gate it on user consent, I think it misses the point. Most users are not in a position to reason about the privacy/security risks of web platform features. Even if they were, asking them about this for every new website they visit would be a terrible user experience. And their response to such questions will likely be misinformed or driven by their annoyance at the prompts, so it would ultimately affect performance instrumentation more than anything.

@yoavweiss
Contributor

There's no need to rediscuss the proposed mitigation. It was discussed at a WG meeting and was deemed ineffective.

There's a real issue here around the potential exposure of cookies being set, but we'd need another mitigation for it.
One option is to eliminate header sizes from transferSize.
Another, which I like more, is to move transferSize (and other size metrics) away from TAO and onto CORP/CORS. If the resource has opted-in to be embeddable in another context, it's much safer to assume that exposing its size is fine. If we go that route, it'd have to happen as part of a broader "opt-in alignment".

Maybe we can start by eliminating header sizes as a stop-gap, and align the different opt-ins as part of L3?
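
For readers new to the cookie concern, a rough sketch of the inference that header sizes in transferSize make possible. The names `headerOverhead` and `mayHaveSetCookie`, the baseline value, and the 20-byte threshold are all assumptions for illustration; a real observer would have to calibrate them per resource.

```typescript
// Only the two PerformanceResourceTiming size fields this sketch needs.
interface SizeTiming {
  transferSize: number;
  encodedBodySize: number;
}

// Header bytes inferred from the two size fields.
// Returns null when transferSize is 0 (cache hit or failed TAO check),
// because nothing can be inferred in that case.
function headerOverhead(entry: SizeTiming): number | null {
  if (entry.transferSize === 0) return null;
  return entry.transferSize - entry.encodedBodySize;
}

// Hypothetical heuristic: if the headers are noticeably larger than a known
// baseline for a returning user, the response may carry a Set-Cookie header.
function mayHaveSetCookie(
  entry: SizeTiming,
  baselineOverhead: number,
  thresholdBytes = 20
): boolean {
  const overhead = headerOverhead(entry);
  return overhead !== null && overhead - baselineOverhead > thresholdBytes;
}
```

Dropping header sizes from transferSize would make `headerOverhead` uninformative, which is the effect the stop-gap is after.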

@pes10k

pes10k commented May 6, 2020 via email

@yoavweiss
Contributor

For folks who are not familiar with the process here, I’ll just note that in W3C process, all issues need to either be addressed to the initiators and WG’s mutual satisfaction, or decided by the director.

Can you point me to the portion of the process document that indicates that?

What I found was the definition of Formally Addressing an issue:
"A group has formally addressed an issue when it has sent a public, substantive response to the reviewer who raised the issue. A substantive response is expected to include rationale for decisions (e.g., a technical explanation, a pointer to charter scope, or a pointer to a requirements document). The adequacy of a response is measured against what a W3C reviewer would generally consider to be technically sound."

Are there portions of my response above or of the discussion in the WG minutes that a W3C reviewer would not consider "technically sound"? If so, it'd be great if you could point them out.

On top of that, note that the WG is obligated to formally address issues when a specification is on its way to Proposed Recommendation, not before.

As an aside, you may want to note that the role of wide review is to provide comments and point out issues in specifications, and not necessarily suggest and litigate specific solutions to those issues.

With all that said, if for some reason you would like to re-litigate your proposed mitigation, we can arrange for you to present it to the WG in one of its upcoming meetings.

@pes10k

pes10k commented May 6, 2020 via email

@yoavweiss
Contributor

Are you raising a Formal Objection?

@pes10k

pes10k commented May 6, 2020 via email

@rniwa

rniwa commented Aug 11, 2020

FWIW, some of the issues pointed out in this thread about transferSize are precisely why we don't expose this property in WebKit / Safari.

@samuelweiler samuelweiler added privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on. and removed privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. labels Jul 1, 2021
@noamr
Contributor

noamr commented Sep 29, 2021

Note that the current version of the spec uses constant numbers for transferSize, which can only be used to determine whether a response was brought from cache or from the network, but cannot be used to assess the size of the header.

The whole discussion here is long, and I'm not familiar enough with it to say whether this helps with the TAO: * issue.
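
A small model of that behaviour, as a sketch only. The fixed padding of 300 bytes and the cache-mode names are my reading of the current draft and should be treated as assumptions; `modelTransferSize` is a hypothetical name, not a browser API.

```typescript
// Assumed fixed stand-in for header bytes; check the spec text before relying on it.
const ASSUMED_HEADER_PADDING = 300;

// "" = fetched from the network, "local" = served from cache,
// "validated" = revalidated against the server (assumed names).
type CacheMode = "" | "local" | "validated";

// Models a transferSize that no longer depends on the real header size.
function modelTransferSize(cacheMode: CacheMode, encodedBodySize: number): number {
  if (cacheMode === "local") return 0;
  if (cacheMode === "validated") return ASSUMED_HEADER_PADDING;
  return encodedBodySize + ASSUMED_HEADER_PADDING;
}
```

Under this model, transferSize minus encodedBodySize is the same constant for every network response, so the presence of a Set-Cookie header no longer shows up in the difference; only cache versus network remains distinguishable.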

@yoavweiss
Contributor

Hey PING folks!

Maybe we can close this with the following:

Would that work for y'all?
