Reflect preload information (in ResourceTiming?) #303

Open
noamr opened this issue Oct 26, 2021 · 23 comments

@noamr
Contributor

noamr commented Oct 26, 2021

Right now it's impossible to know whether a resource was used as part of a preload.
Preloading the resource would generate the RT entry, and in some cases it would be reused without a new fetch (e.g. if it's in the image cache), and in some cases there would be a new fetch that would go to cache (with transferSize 0).

Suggesting to be explicit about it, and to say in a subsequent fetch whether it was served from network, service-worker, HTTP cache, or preload.
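A rough sketch of what that could look like to a consumer. The attribute name (deliverySource) and its values are placeholders for the idea above, not an existing API:

```js
// Hypothetical sketch: assumes RT entries gain a field (name TBD) saying where
// the response came from, e.g. "network" | "http-cache" | "service-worker" | "preload".
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.deliverySource === "preload") {
      // The subsequent fetch was satisfied by an earlier preload rather than
      // by a new network or cache fetch.
      console.log(`${entry.name} was served from a preload`);
    }
  }
}).observe({ type: "resource", buffered: true });
```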

@yoavweiss
Contributor

The downside of an explicit entry for the preload-reuse resource is that it's a change from what happens today, so it might be worthwhile to run it by RUM folks and see how much confusion it's likely to create.

/cc @nicjansma @andydavies @cliffcrocker

@noamr
Contributor Author

noamr commented Oct 26, 2021

> The downside of an explicit entry for the preload-reuse resource is that it's a change from what happens today, so it might be worthwhile to run it by RUM folks and see how much confusion it's likely to create.
>
> /cc @nicjansma @andydavies @cliffcrocker

It won't always be different from how it is today. It depends on the status of the cache at that moment. And in any case it would be a near-zero-time entry.

@noamr
Contributor Author

noamr commented Oct 26, 2021

Also, "what happens today" is different across browsers :)

If we go with a clear definition of preload (like here) and a clearer definition of what an RT entry represents (we said in the F2F that it should represent a fetch), then a preload and a preload-consume are two fetches, the second one reusing a cloned response from the first one, which would generate two resource timing entries (one with initiatorType "link" and one with initiatorType "img" or whichever resource), the second of which would be zero-ish.
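If that model were adopted, a consumer could pair the two entries by URL. A sketch under that assumption, not a description of current behavior:

```js
// Sketch only: assumes the "two entries per preload" model described above,
// which is not (consistently) what browsers do today.
const entries = performance.getEntriesByType("resource");
for (const preload of entries.filter((e) => e.initiatorType === "link")) {
  // The consuming fetch would show up as a second, near-zero entry for the same URL.
  const consume = entries.find((e) => e !== preload && e.name === preload.name);
  if (consume) {
    console.log(`${preload.name}: preloaded, then consumed as <${consume.initiatorType}>`);
  } else {
    console.log(`${preload.name}: preloaded but not (yet) consumed`);
  }
}
```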

@andydavies

> If we go with a clear definition of preload (like here) and a clearer definition of what an RT entry represents (we said in the F2F that it should represent a fetch), then a preload and a preload-consume are two fetches, the second one reusing a cloned response from the first one, which would generate two resource timing entries (one with initiatorType "link" and one with initiatorType "img" or whichever resource), the second of which would be zero-ish.

From an RT perspective I think preload should behave as fetches from the pre-parser do, as it's essentially the same behaviour - a resource is fetched ahead of time and used later; one is browser-initiated, the other author-initiated.

So if preload results in two RT entries then everything that's fetched by the pre-parser should also result in two entries - which I think would be very weird.

I think the RT entry for preload should match the network or HTTP cache fetch as it does for other resources, with the Reporting API used to flag unused preloads.

That won't provide information via RT that something was used, but it will flag up non-used ones.

@noamr
Contributor Author

noamr commented Oct 27, 2021

> If we go with a clear definition of preload (like here) and a clearer definition of what an RT entry represents (we said in the F2F that it should represent a fetch), then a preload and a preload-consume are two fetches, the second one reusing a cloned response from the first one, which would generate two resource timing entries (one with initiatorType "link" and one with initiatorType "img" or whichever resource), the second of which would be zero-ish.

> From an RT perspective I think preload should behave as fetches from the pre-parser do, as it's essentially the same behaviour - a resource is fetched ahead of time and used later; one is browser-initiated, the other author-initiated.

But the preparser has a substantial difference: you know the element that's going to use the resource, so that early fetching is not entirely observable - e.g. the entry has the element's initiatorType, which is information that isn't available to a preload.

> So if preload results in two RT entries then everything that's fetched by the pre-parser should also result in two entries - which I think would be very weird.

I don't think having two entries for preload should necessarily mean two entries for the preparser.

> I think the RT entry for preload should match the network or HTTP cache fetch as it does for other resources, with the Reporting API used to flag unused preloads.
>
> That won't provide information via RT that something was used, but it will flag up non-used ones.

Sure, that's also an option.

@noamr
Contributor Author

noamr commented Oct 30, 2021

Actually, based on the newer developments in the preload PR, I tend to agree with having some "unused preload" flag (maybe something like readyState on the link element?) and/or a preloadConsumedTime attribute on the RT entry.

@noamr
Contributor Author

noamr commented Nov 2, 2021

Proposal:

  • A preload link, once loaded, creates an RT entry immediately, but doesn't queue it to the PerformanceObservers.
  • Once the preload is consumed, it marks a preloadConsumedTime timestamp on the same entry, and the entry gets queued (maybe it somehow registers the consumer initiatorType).
  • The duration would be based on responseEnd like today, and the preloadConsumedTime would be (most likely) after that, though in some cases it can be before (if the resource is used before it's fully loaded).

This is in line with navigation timing, where the entry is created when we know about it, but only gets queued when its final attribute is set.

It also allows us to report unused preloads in RUM without adding a timeout - a RUM library or a document can check on its own whether we have preloaded links that don't match RT entries with corresponding preloadConsumedTime.

WDYT, @andydavies @yoavweiss
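A sketch of what observing this would look like under the proposal; preloadConsumedTime is the proposed attribute, not something that exists today:

```js
// Sketch of the proposal: the preload entry only reaches observers once it is
// consumed, at which point it carries a preloadConsumedTime timestamp (proposed).
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.initiatorType === "link" && entry.preloadConsumedTime) {
      // May be negative if the resource is used before it finished loading.
      const gap = entry.preloadConsumedTime - entry.responseEnd;
      console.log(`${entry.name}: consumed ${gap.toFixed(0)}ms after responseEnd`);
    }
  }
}).observe({ type: "resource", buffered: true });
```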

@yoavweiss
Contributor

I like it!
/cc @npm1 @philipwalton @pmeenan for their opinions.

@npm1
Contributor

npm1 commented Nov 10, 2021

The proposal seems reasonable to me, but it's unclear from the discussion in the thread what the current behavior is and why we need to change it. Can this be documented here?

@noamr
Contributor Author

noamr commented Nov 10, 2021

> The proposal seems reasonable to me, but it's unclear from the discussion in the thread what the current behavior is and why we need to change it. Can this be documented here?

The current behavior doesn't mark the time the preload was consumed, and you'd need to guess whether it was consumed based on the existence of another RT entry with the same URL. The idea was to give an extra indication about the use of preloads, which is a request that came up in the preload discussion at TPAC.

Note that this changes the current behavior in a minor way only - the attributes for the preloaded entry stay the same and a new one is added - the only actual change in behavior is that the preload entry is not queued until consumption.

@nicjansma
Contributor

@noamr I like your proposal. Two potential challenges though:

  • For HTTP Header-based Preloads, RUM doesn't know what those headers were (or if there were any). So RUM wouldn't be able to cross-check with the RT entries we saw to know which header-based Preloads weren't used.
  • If we're not going to queue to the PerformanceObservers until they're consumed, unused Preloads wouldn't ever have an RT entry? This would further "hide" data (the bytes/time they took) from ResourceTiming, which I'd like to avoid.

I generally don't like tying things to the load event, but maybe we could use that here? If a Preload doesn't get consumed by load, we can queue that entry right then, and maybe mark preloadConsumedTime=-1 or something to indicate that it was a Preload but not used? I expect most Preloads to be used by load, and most RUM to "consume" RT data after load.

@noamr
Contributor Author

noamr commented Apr 7, 2022

> @noamr I like your proposal. Two potential challenges though:
>
>   • For HTTP Header-based Preloads, RUM doesn't know what those headers were (or if there were any). So RUM wouldn't be able to cross-check with the RT entries we saw to know which header-based Preloads weren't used.

Suggesting that the initiator type of link headers would be link-header, the same way early hints get an early-hint initiator type.

>   • If we're not going to queue to the PerformanceObservers until they're consumed, unused Preloads wouldn't ever have an RT entry? This would further "hide" data (the bytes/time they took) from ResourceTiming, which I'd like to avoid.

They will have an RT entry, but it wouldn't be queued to PerformanceObservers.

@noamr
Contributor Author

noamr commented Apr 7, 2022

> I generally don't like tying things to the load event, but maybe we could use that here? If a Preload doesn't get consumed by load, we can queue that entry right then, and maybe mark preloadConsumedTime=-1 or something to indicate that it was a Preload but not used? I expect most Preloads to be used by load, and most RUM to "consume" RT data after load.

If the RT entries are there but not queued to observers, is this needed? Simply check performance.getEntries*** in your onload to see which preloads were not consumed.
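For example, assuming the proposed model where unconsumed preload entries sit in the timeline without a preloadConsumedTime:

```js
// Sketch under the proposed model: preload entries are in the timeline right
// away, but only get a preloadConsumedTime (proposed attribute) once consumed.
addEventListener("load", () => {
  const unconsumed = performance
    .getEntriesByType("resource")
    .filter((e) => e.initiatorType === "link" && !e.preloadConsumedTime);
  for (const entry of unconsumed) {
    console.warn(`Preload not consumed by onload: ${entry.name}`);
  }
});
```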

@noamr
Contributor Author

noamr commented Apr 7, 2022

I think the original idea of having an RT entry for consuming a resource, and differentiating it from the preload entry somehow would be more straightforward - e.g. having a delivery or cacheState attribute that exposes network / http-cache / service-worker / consume-preload or so.

@nicjansma
Contributor

> Suggesting that the initiator type of link headers would be link-header, the same way early hints get an early-hint initiator type.

There was the suggestion that RUM providers could cross-check their HTML-based <link rel="preload"> vs. the observed ResourceTiming entries, to understand Preloads that didn't get used. That wouldn't work for Headers-based Preloads, as we don't know what HTTP headers there were.

> They will have an RT entry, but it wouldn't be queued to PerformanceObservers.

Hm, I don't think we have any precedent for PerformanceObservers "seeing" a different view than performance.getEntries() though -- they've always been viewing the same list.

@noamr
Contributor Author

noamr commented Apr 7, 2022

> Suggesting that the initiator type of link headers would be link-header, the same way early hints get an early-hint initiator type.
>
> There was the suggestion that RUM providers could cross-check their HTML-based <link rel="preload"> vs. the observed ResourceTiming entries, to understand Preloads that didn't get used. That wouldn't work for Headers-based Preloads, as we don't know what HTTP headers there were.

Right. Suggesting that they could cross-check against preload entries in the RT buffer instead.

> They will have an RT entry, but it wouldn't be queued to PerformanceObservers.
>
> Hm, I don't think we have any precedent for PerformanceObservers "seeing" a different view than performance.getEntries() though -- they've always been viewing the same list.

They're seeing the same view, but don't necessarily get triggered for each entry.
The precedent is the navigation timing entry - it's available from the start, but only queued to the observers after the load event.

@nicjansma
Contributor

> If the RT entries are there but not queued to observers, is this needed? Simply check performance.getEntries*** in your onload to see which preloads were not consumed.

Ah yep, agreed that would work.

> Right. Suggesting that they could cross-check against preload entries in the RT buffer instead.

I see, you're suggesting RUM would:

  • Have a PerformanceObserver active, putting entries into poBuffer
  • At the time of beaconing, also get the PerformanceTimeline buffer ptBuffer = performance.getEntriesByType("resource")
  • Compare the lists. For anything in the ptBuffer that was .initiatorType="link" (or something new?), see if it's in the poBuffer. If yes, that Preload was consumed. If not, that Preload was wasted.
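A minimal sketch of that comparison, assuming the proposed (not shipped) behavior where unconsumed preload entries exist in the timeline but never reach observers:

```js
// poBuffer: everything the PerformanceObserver was actually notified about.
const poBuffer = [];
new PerformanceObserver((list) => poBuffer.push(...list.getEntries()))
  .observe({ type: "resource", buffered: true });

// At beaconing time: preload entries in the timeline that the observer never
// saw would (under the proposal) be the wasted ones.
function findWastedPreloads() {
  const ptBuffer = performance.getEntriesByType("resource");
  return ptBuffer.filter(
    (e) => e.initiatorType === "link" &&
           !poBuffer.some((seen) => seen.name === e.name)
  );
}
```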

I can see how that would work, but thinking through a few small gotchas:

  • Small amount of work required to compare the two arrays
  • We're trying to move folks away from using the PT buffer and only use POs, and this would redirect developers (at least RUM providers caring about Preloads) back to using PT
  • Slight risk of the getEntries*() buffer having been filled or cleared, so it's not always going to line up.
  • I'm just a little hesitant to have the PT and PO buffers have mismatched numbers of entries (unless it's well-documented)
  • A RUM provider that only uses PO (which is our recommendation) and doesn't make any changes for this could see fewer PO entries (and would be missing expensive unused Preloads)

> The precedent is the navigation timing entry - it's available from the start, but only queued to the observers after the load event.

That's fair. It's helpful for NT that it's the "only" entry in the NT PO buffer (for now), so I think it's a bit easier for developers to know that caveat.

I'm concerned that developers would get confused in general that the PO and PT buffers may be slightly different for Preload reasons. If we did this, we may want some additional guidance/notes in the spec.

Will ponder this a bit more though!

@andydavies

@noamr @nicjansma

Has there been any discussion of using the reporting-api to tackle this e.g. report unused preloads to an endpoint rather than having two performance entries? (did wonder whether bfcache reasons should adopt the same thing)

(Before I test this) do you know how the browser having to go to cache twice for the same resource is recorded in RT, e.g. an image gets evicted from memory on a low-powered device and the browser has to refetch it from cache when the visitor scrolls, or such like?
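Purely as an illustration of the Reporting API route mentioned above: ReportingObserver is a real API, but the "unused-preload" report type below does not exist - it's a placeholder for the idea, not a proposal of an actual shape.

```js
// Hypothetical sketch only: there is no "unused-preload" report type today.
const reportObserver = new ReportingObserver(
  (reports) => {
    for (const report of reports) {
      // Placeholder body shape; a real report would need to be specified.
      console.log("Unused preload reported:", report.body);
    }
  },
  { types: ["unused-preload"], buffered: true }
);
reportObserver.observe();
```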

@noamr
Contributor Author

noamr commented Apr 7, 2022

> @noamr @nicjansma
>
> Has there been any discussion of using the reporting-api to tackle this e.g. report unused preloads to an endpoint rather than having two performance entries? (did wonder whether bfcache reasons should adopt the same thing)

This is not just about unused preloads, it's also for preloads that are used a long time after they're loaded, to help optimize - maybe you didn't need to preload it because it's used very late anyway?

> (Before I test this) do you know how the browser having to go to cache twice for the same resource is recorded in RT, e.g. an image gets evicted from memory on a low-powered device and the browser has to refetch it from cache when the visitor scrolls, or such like?

Yes, retrieving from the HTTP cache is reflected, with transferSize 0. I suggested exposing an enum of where the resource is retrieved from in this comment, which would allow us to differentiate.
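For context, this is the heuristic that is observable today; note that cross-origin resources without Timing-Allow-Origin also report transferSize 0, which is part of why an explicit enum would help:

```js
// Today's heuristic: a fetch served from the HTTP cache typically shows up as
// an RT entry with transferSize 0 but a non-zero decodedBodySize.
// Caveat: cross-origin entries without Timing-Allow-Origin report 0 for both.
for (const entry of performance.getEntriesByType("resource")) {
  const likelyFromCache = entry.transferSize === 0 && entry.decodedBodySize > 0;
  console.log(`${entry.name}: ${likelyFromCache ? "likely HTTP cache" : "network (or opaque)"}`);
}
```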

@noamr
Contributor Author

noamr commented Apr 7, 2022

> If the RT entries are there but not queued to observers, is this needed? Simply check performance.getEntries*** in your onload to see which preloads were not consumed.
>
> Ah yep, agreed that would work.
>
> Right. Suggesting that they could cross-check against preload entries in the RT buffer instead.
>
> I see, you're suggesting RUM would:
>
>   • Have a PerformanceObserver active, putting entries into poBuffer
>   • At the time of beaconing, also get the PerformanceTimeline buffer ptBuffer = performance.getEntriesByType("resource")
>   • Compare the lists. For anything in the ptBuffer that was .initiatorType="link" (or something new?), see if it's in the poBuffer. If yes, that Preload was consumed. If not, that Preload was wasted.
>
> I can see how that would work, but thinking through a few small gotchas:
>
>   • Small amount of work required to compare the two arrays
>   • We're trying to move folks away from using the PT buffer and only use POs, and this would redirect developers (at least RUM providers caring about Preloads) back to using PT
>   • Slight risk of the getEntries*() buffer having been filled or cleared, so it's not always going to line up.
>   • I'm just a little hesitant to have the PT and PO buffers have mismatched numbers of entries (unless it's well-documented)
>   • A RUM provider that only uses PO (which is our recommendation) and doesn't make any changes for this could see fewer PO entries (and would be missing expensive unused Preloads)
>
> The precedent is the navigation timing entry - it's available from the start, but only queued to the observers after the load event.
>
> That's fair. It's helpful for NT that it's the "only" entry in the NT PO buffer (for now), so I think it's a bit easier for developers to know that caveat.
>
> I'm concerned that developers would get confused in general that the PO and PT buffers may be slightly different for Preload reasons. If we did this, we may want some additional guidance/notes in the spec.
>
> Will ponder this a bit more though!

I understand the concerns... the alternative would be to have an additional RT entry for the consume. I will present the options in one of the upcoming WG meetings; it would be good to hear more thoughts here in the meantime.

@andydavies

> (Before I test this) do you know how the browser having to go to cache twice for the same resource is recorded in RT, e.g. an image gets evicted from memory on a low-powered device and the browser has to refetch it from cache when the visitor scrolls, or such like?
>
> Yes, retrieving from the HTTP cache is reflected, with transferSize 0. I suggested exposing an enum of where the resource is retrieved from in #303 (comment), which would allow us to differentiate.

So if I load a page that has an image at the top, scroll to the bottom and the image gets evicted from the memory cache, then scroll back to the top and the browser reloads the image from cache, I'll get two RT entries - one potentially with a transferSize and one without?

@noamr
Contributor Author

noamr commented Apr 7, 2022

> (Before I test this) do you know how the browser having to go to cache twice for the same resource is recorded in RT, e.g. an image gets evicted from memory on a low-powered device and the browser has to refetch it from cache when the visitor scrolls, or such like?
>
> Yes, retrieving from the HTTP cache is reflected, with transferSize 0. I suggested exposing an enum of where the resource is retrieved from in #303 (comment), which would allow us to differentiate.

> So if I load a page that has an image at the top, scroll to the bottom and the image gets evicted from the memory cache, then scroll back to the top and the browser reloads the image from cache, I'll get two RT entries - one potentially with a transferSize and one without?

That is correct.

@noamr
Contributor Author

noamr commented Apr 14, 2022

Two alternatives: (feel free to propose more)

Separate Entry

  • Add a timing entry to the timeline when the resource is consumed
  • Very flexible
  • Works the same as entries representing resources fetched from the HTTP cache
  • Perhaps differentiate with a delivery property (from cache | network | service-worker | preload)

Additional Property

  • Add a preloadConsume timestamp to the existing preload entry
  • Less disruptive to current timelines
  • An entry might change after its creation, similar to how navigation timing entries may receive activationStart / loadEventStart after the entry is already available.
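A rough sketch of what each alternative could look like from a consumer's perspective; the attribute names are placeholders taken from the bullets above, and nothing here is specified:

```js
// Illustrative only - neither shape is specified; names are placeholders.

// Alternative 1 (separate entry): the consume produces its own RT entry,
// distinguished by a hypothetical "delivery" property.
const consumeEntries = performance
  .getEntriesByType("resource")
  .filter((e) => e.delivery === "preload");

// Alternative 2 (additional property): the preload entry itself gains a
// hypothetical preloadConsume timestamp once the resource is actually used.
const consumedPreloads = performance
  .getEntriesByType("resource")
  .filter((e) => e.initiatorType === "link" && e.preloadConsume > 0);
```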
