Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve RealImageLoader.execute to properly call async request #2205

Merged

Conversation

mlykotom
Copy link
Contributor

I was investigating the performance of rememberAsyncImagePainter and found out the AsyncImagePainter.onRemembered is the longest call inside ~0.25ms for each image.
image

When I was investigating more, I think the reason of this is that the RealImageLoader.execute request uses async call with dispatcher Main.immediate, but the coroutine is already called with Main.immediate. If I understand things correclty, this can't be called in parallel and therefore is called sequentially, hence the duration.

This should improve #1866 .

before

ScrollBenchmark_fullCompilation
coil.compose.rememberAsyncImagePainter%AverageMs   min     0,5,   median     0,6,   max     0,7
coil.compose.rememberAsyncImagePainter%Count       min   156,0,   median   173,0,   max   190,0
coil.compose.rememberAsyncImagePainter%Ms          min    81,0,   median    99,9,   max   128,2
timeToInitialDisplayMs                             min   525,2,   median   547,1,   max   579,6
frameDurationCpuMs                                 P50     10,5,   P90     31,0,   P95     42,1,   P99    315,3
frameOverrunMs                                     P50     -7,5,   P90     17,1,   P95     63,4,   P99  1 170,9
Traces: Iteration 0 1 2 3 4 5 6 7 8 9

after

ScrollBenchmark_fullCompilation
coil.compose.rememberAsyncImagePainter%AverageMs   min     0,3,   median     0,4,   max     0,5
coil.compose.rememberAsyncImagePainter%Count       min   154,0,   median   174,0,   max   184,0
coil.compose.rememberAsyncImagePainter%Ms          min    51,4,   median    70,6,   max    83,1
timeToInitialDisplayMs                             min   521,4,   median   563,9,   max   605,9
frameDurationCpuMs                                 P50      9,6,   P90     30,7,   P95     45,1,   P99    301,4
frameOverrunMs                                     P50     -8,1,   P90     22,8,   P95     52,3,   P99  1 172,8
Traces: Iteration 0 1 2 3 4 5 6 7 8 9

From the benchmarks, I can see that the frame duration was improved slightly for the scroll.

before : frameDurationCpuMs  P50     10,5,   P90     31,0,   P95     42,1,   P99    315,3
after    : frameDurationCpuMs  P50      9,6, (-9%)   P90     30,7, (-1%)   P95     45,1, (+6%)  P99    301,4 (-5%)

This solution is probably not sufficient as the dispatcher probably needs to be passed, or called somewhere else, or even this might be done differently all together.

I have experimented with this in 2.x branch, but I see a similar code is in 3.x.

@colinrtwhite
Copy link
Member

Thanks for taking a look at this! Though I'm a bit confused - won't async(Dispatchers.Main.immediate) always run sequentially (even if wrapped with withContext(Dispatchers.Default)) since Dispatchers.Main.immediate is a single threaded dispatcher?

Unfortunately I don't think we can use an non-main thread dispatcher here as it breaks returning images from the memory cache synchronously - this is the failing test.

Do you know if RealImageLoader.executeMain is taking up the bulk of the AsyncImagePainter.onRemembered time? There are some things that Coil needs to run on the main thread, but I'm curious if that's what's slowing it down at the moment or if it's something else.

@mlykotom
Copy link
Contributor Author

mlykotom commented Apr 17, 2024

I sort of expected something to break, but I was a bit in a rush and wanted to start the conversation going 😇

I pushed my wip code with experiments (ScrollBenchmark) that can produce the system trace and a method trace.
Here's the breakdown of the method trace
image

You can run the benchmark as well and get the method trace for analysis.
Given it's a method trace, we don't look at the timing information, because everything is skewed, but more on the ratio between sections.

@mlykotom
Copy link
Contributor Author

Okay, I tried a bit playing with this if something could be triggered earlier, but I'm not able to solve the memory cache blocking request. I thought there could be something running in parallel with the async call and await it at the apropriate time to basically have something like "if not ready, block". But I don't understand the details of when things are called.

@colinrtwhite
Copy link
Member

Interesting thanks for digging in! Looks like there are two main avenues to improve perf like you called out:

  • Speed up scope.launch in AsyncImagePainter.onRemembered: I'm not sure what we can do here given the bulk of the time looks to be coroutines library-related. The result of the code inside scope.launch should be pretty quick.
  • Don't block when ConstraintSizeResolver.size is called: I'm surprised that this is a large percentage of the runtime given SizeResolver.size should suspend instead of block while waiting for non-zero constraints. Do you know what could be the issue?

@mlykotom
Copy link
Contributor Author

  1. Part of the scope.launch in onRemembered is because of reading the state, which happens twice - request and imageLoader. I'm thinking whether the imageLoader has to be as a mutableStateOf. I don't see this used for compose directly, so this might be a small optimization.

  2. The other thing might also be to not use mutableStateOf for the request parameter. I don't see it being read anywhere in the project, is it a niche case when someone might want to listen to this outside? Could this be a MutableStateFlow, which is used for everyone and used as collectAsState() composable if anyone needs listening to this?

  3. Could the blocking nature of ConstraintSizeResolver.size be related to the fact it uses Dispatchers.Main.immediate.

@mlykotom
Copy link
Contributor Author

mlykotom commented Apr 23, 2024

I experimented with removing the State variables, there's some minor change in overall duration, but the FPS impact might be too noisy.

There's very minor API change.

Before:

AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:

AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

@mlykotom mlykotom force-pushed the mlykotom/perf-improvement-experiment branch from 926a817 to 42cf1a5 Compare April 24, 2024 09:15
@mlykotom
Copy link
Contributor Author

mlykotom commented Apr 24, 2024

I played a bit more with what could be improved and the async call within RealImageLoader.execute is only needed for Views. So if we tweak the the code to use that only for Views, we can get rid of some coroutines overhead.

Load detail image

Before (any tweaks)

ScrollBenchmark_fullCompilation
AsyncImagePainter.onRememberedAverage_µs   min   518.6,   median   671.3,   max   733.1
AsyncImagePainter.onRememberedCount        min     2.0,   median     2.0,   max     2.0
AsyncImagePainter.onRemembered_µs          min 1,037.1,   median 1,342.7,   max 1,466.3
rememberAsyncImagePainterAverage_µs        min 1,147.3,   median 1,497.2,   max 1,638.2
rememberAsyncImagePainterCount             min     1.0,   median     1.0,   max     1.0
rememberAsyncImagePainter_µs               min 1,147.3,   median 1,497.2,   max 1,638.2
timeToInitialDisplayMs                     min   217.2,   median   230.0,   max   272.5
frameDurationCpuMs                         P50      2.9,   P90     54.0,   P95     58.5,   P99     66.0
frameOverrunMs                             P50    -12.3,   P90     42.7,   P95     62.2,   P99     81.4
Traces: Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

After

ScrollBenchmark_fullCompilation
AsyncImagePainter.onRememberedAverage_µs   min   346.3,   median   402.9,   max   714.2
AsyncImagePainter.onRememberedCount        min     2.0,   median     2.0,   max     2.0
AsyncImagePainter.onRemembered_µs          min   692.5,   median   805.9,   max 1,428.5
rememberAsyncImagePainterAverage_µs        min   784.9,   median   942.1,   max 1,641.6
rememberAsyncImagePainterCount             min     1.0,   median     1.0,   max     1.0
rememberAsyncImagePainter_µs               min   784.9,   median   942.1,   max 1,641.6
timeToInitialDisplayMs                     min   198.0,   median   225.1,   max   236.8
frameDurationCpuMs                         P50      2.9,   P90     53.2,   P95     57.2,   P99     68.1
frameOverrunMs                             P50    -12.3,   P90     41.0,   P95     68.3,   P99     79.2
Traces: Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

This shows ~58% improvement for one image.

The results would be better taken with Microbenchmark, so it needs to be taken with a grain of salt.

Update benchmarks

Experiment removing mutableState

Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

Don't launch job for compose

before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3
Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1
before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3
@mlykotom mlykotom force-pushed the mlykotom/perf-improvement-experiment branch from d100dff to c8a8b45 Compare April 24, 2024 10:52
@colinrtwhite
Copy link
Member

colinrtwhite commented Apr 25, 2024

@mlykotom This is awesome, thanks! Do you want to merge this in with the scroll perf benchmark?

To improve scroll perf even more, I think if we offer an opt-in option for Compose (or maybe the default in 3.x?) we should be able to make executeMain thread agnostic (though, I think ConstraintsSizeResolve.size will still run on the UI thread). Currently, it only requires it to interact with views, lifecycles, and invoke listener methods, which aren't necessary for AsyncImage/AsyncImagePainter.

@mlykotom
Copy link
Contributor Author

Hey, sorry I was super busy this week and couldn't get to it.
I will clean up the PR to be in a submittable shape.

To improve scroll perf even more, I think if we offer an opt-in option for Compose (or maybe the default in 3.x?) we should be able to make executeMain thread agnostic (though, I think ConstraintsSizeResolve.size will still run on the UI thread). Currently, it only requires it to interact with views, lifecycles, and invoke listener methods, which aren't necessary for AsyncImage/AsyncImagePainter.

I think this sounds really great to think about the threads in 3.x, I would vote for having it default, but definitelly needs some thinking.

Comment on lines 261 to 263
val request
@Composable
get() = requestInternal.collectAsState().value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colinrtwhite what do you think about this change? Technically this changes the API, because now the request is only available from a composable, which wasn't before. I'm not sure how this could be done in a backwards compatible way 😬

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep it as a non-Composable since it's then the same as the other exposed properties (imageLoader and state). Though if there's a strong performance case then we should consider changing all the exposed properties to be @Composable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends what strong is. I think with the coroutines optimizations + the async, we can squeeze ~0.2ms for each rememberAsyncImagePainter = grid of 20 images = ~4ms if they're all composed at once.

I don't want to just come and introduce breaking changes, which is why I try to have this discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlykotom I think 0.2ms is probably a strong enough incentive to change the API. Though unfortunately, we can't make any breaking changes to the 2.x branch. We could make the changes on the 3.x (main) branch, however.

@@ -282,9 +288,9 @@ class AsyncImagePainter internal constructor(
}

@OptIn(ExperimentalCoroutinesApi::class)
override fun onRemembered() {
override fun onRemembered() = trace("AsyncImagePainter.onRemembered") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking whether the usage of onRemembered is used actually? This is manually called immediatelly in composition and then compose framework calls it again after the composition and it always just returns with the check on rememberScope != null.

I don't fully understand (yet) the child painter and it's relation to RememberObserver, but feels like it could call them differently as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure of Compose's behaviour, but I added this in on onRemembered since we want to stop any ongoing requests in onForgotten. This keeps the symmetry in case the painter is re-remembered after being forgotten.

The child painter is usually either a CrossfadePainter if crossfading is enabled or a DrawablePainter, which wraps the Drawable returned by the ImageLoader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current way how it's used it doesn't really need to call this in onRemembered. It's called directly from the rememberAsyncImagePainter and sets the rememberScope, when Compose actually iterates over the RememberObservables after the composition in the apply phase, it will skip doing anything, because it's already created.

The onRemembered should be only called once, because you can't really re-remember. If it gets out of the composition completely, then the onForgotten is called and next time a new instance is created.

I think this could optimized into starting the imageLoader.execute(updateRequest(it)).toState() whenever you set a new request and it's different from the previous and it doesn't have to observe the flow. This way you optimize the coroutines overhead for many people, but also can keep the backwards-compatible way of having the state.

Anyway, I'll revert this so the API stays the same

Comment on lines 311 to 312
requestInternal
.mapLatest { imageLoader.execute(updateRequest(it)).toState() }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this is actually just a var _request: ImageRequest?
The request is changed from outside right before calling onRemembered manually, which means it's actually up-to-date here.

I don't know what's the case for the nested painters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to read this as a state, but I can't remember the specifics. If I remember correctly there are cases were mutableStateOf(request) won't be equal to _request since it exists outside of the Composition. Also it's possible for request to change so we need to observe it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the _request is set everytime when the rememberAsyncImagePainter recomposes, it could use just setter and job?.cancel() without observing the flow. This could bring the benefit because most of the time the _request stays the same and therefore only needs one job that finishes and doesn't have to keep listening, but if it changes, then it just calls another job (or cancels it if existing).

@@ -197,7 +199,7 @@ private fun rememberAsyncImagePainter(
onState: ((State) -> Unit)?,
contentScale: ContentScale,
filterQuality: FilterQuality,
): AsyncImagePainter {
): AsyncImagePainter = trace("rememberAsyncImagePainter") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this trace add any overhead? Is it OK to ship to users?

Copy link
Contributor Author

@mlykotom mlykotom May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trace doesn't add any overhead (~6ns per call) and it's fine to ship to production. Compose adds bunch of custom sections as well. These usually serve for better understanding what's going on.
You can even use these sections from the benchmarks without enabling composition tracing to keep track of how long they take.

@@ -247,17 +251,19 @@ class AsyncImagePainter internal constructor(
internal var contentScale = ContentScale.Fit
internal var filterQuality = DefaultFilterQuality
internal var isPreview = false
internal val requestInternal = MutableStateFlow(request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the a performance benefit to reading request from a MutableStateFlow vs. a mutableStateOf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes, which is why I was trying a different options.

When mutableStateOf is used, in order to observe it you need the snapshotFlowOf, which adds bunch of coroutines overhead. The launch duration takes 17% of the onRemembers in comparison without it takes 9% (8% of the onRemembered difference).

image

image


/** The current [ImageLoader]. */
var imageLoader: ImageLoader by mutableStateOf(imageLoader)
var imageLoader: ImageLoader = imageLoader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep this as a mutableStateOf otherwise it violates @Stable's contract. It's possible to change the ImageLoader from rememberAsyncImagePainter and the composition should be notified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling it doesn't have to. The readers of this are in the onRemembered function, but aren't observed with snapshotFlow. This means that whatever variable is set there, it will be used.

This state is only needed if someone would like to observe imageLoader in composition, which I kind of highly doubt, but I know it's an API change.

Comment on lines 261 to 263
val request
@Composable
get() = requestInternal.collectAsState().value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep it as a non-Composable since it's then the same as the other exposed properties (imageLoader and state). Though if there's a strong performance case then we should consider changing all the exposed properties to be @Composable.

@@ -282,9 +288,9 @@ class AsyncImagePainter internal constructor(
}

@OptIn(ExperimentalCoroutinesApi::class)
override fun onRemembered() {
override fun onRemembered() = trace("AsyncImagePainter.onRemembered") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure of Compose's behaviour, but I added this in on onRemembered since we want to stop any ongoing requests in onForgotten. This keeps the symmetry in case the painter is re-remembered after being forgotten.

The child painter is usually either a CrossfadePainter if crossfading is enabled or a DrawablePainter, which wraps the Drawable returned by the ImageLoader.

Comment on lines 311 to 312
requestInternal
.mapLatest { imageLoader.execute(updateRequest(it)).toState() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to read this as a state, but I can't remember the specifics. If I remember correctly there are cases were mutableStateOf(request) won't be equal to _request since it exists outside of the Composition. Also it's possible for request to change so we need to observe it.

@mlykotom mlykotom force-pushed the mlykotom/perf-improvement-experiment branch from aabb461 to b1b9e25 Compare May 9, 2024 18:49
@mlykotom mlykotom force-pushed the mlykotom/perf-improvement-experiment branch from b1b9e25 to 1a81e8a Compare May 9, 2024 18:51
Copy link
Contributor Author

@mlykotom mlykotom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I reverted the breaking changes, so these can be done in a separate PR so the effect is smaller (~0.1ms per item). Still that gets 3ms in total when scrolling content.

before
ScrollBenchmark_fullCompilation
AsyncImagePainter.onRememberedAverage_µs   min    112.2,   median    112.2,   max    112.2
AsyncImagePainter.onRememberedCount        min    130.0,   median    130.0,   max    130.0
AsyncImagePainter.onRemembered_µs          min 14,586.9,   median 14,586.9,   max 14,586.9
rememberAsyncImagePainterAverage_µs        min    244.4,   median    244.4,   max    244.4
rememberAsyncImagePainterCount             min     65.0,   median     65.0,   max     65.0
rememberAsyncImagePainter_µs               min 15,888.1,   median 15,888.1,   max 15,888.1

after
ScrollBenchmark_fullCompilation
AsyncImagePainter.onRememberedAverage_µs   min     90.7,   median     90.7,   max     90.7
AsyncImagePainter.onRememberedCount        min    130.0,   median    130.0,   max    130.0
AsyncImagePainter.onRemembered_µs          min 11,785.6,   median 11,785.6,   max 11,785.6
rememberAsyncImagePainterAverage_µs        min    197.5,   median    197.5,   max    197.5
rememberAsyncImagePainterCount             min     65.0,   median     65.0,   max     65.0
rememberAsyncImagePainter_µs               min 12,839.0,   median 12,839.0,   max 12,839.0


/** The current [ImageLoader]. */
var imageLoader: ImageLoader by mutableStateOf(imageLoader)
var imageLoader: ImageLoader = imageLoader
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling it doesn't have to. The readers of this are in the onRemembered function, but aren't observed with snapshotFlow. This means that whatever variable is set there, it will be used.

This state is only needed if someone would like to observe imageLoader in composition, which I kind of highly doubt, but I know it's an API change.

Comment on lines 261 to 263
val request
@Composable
get() = requestInternal.collectAsState().value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends what strong is. I think with the coroutines optimizations + the async, we can squeeze ~0.2ms for each rememberAsyncImagePainter = grid of 20 images = ~4ms if they're all composed at once.

I don't want to just come and introduce breaking changes, which is why I try to have this discussion.

@@ -247,17 +251,19 @@ class AsyncImagePainter internal constructor(
internal var contentScale = ContentScale.Fit
internal var filterQuality = DefaultFilterQuality
internal var isPreview = false
internal val requestInternal = MutableStateFlow(request)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes, which is why I was trying a different options.

When mutableStateOf is used, in order to observe it you need the snapshotFlowOf, which adds bunch of coroutines overhead. The launch duration takes 17% of the onRemembers in comparison without it takes 9% (8% of the onRemembered difference).

image

image

@@ -282,9 +288,9 @@ class AsyncImagePainter internal constructor(
}

@OptIn(ExperimentalCoroutinesApi::class)
override fun onRemembered() {
override fun onRemembered() = trace("AsyncImagePainter.onRemembered") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current way how it's used it doesn't really need to call this in onRemembered. It's called directly from the rememberAsyncImagePainter and sets the rememberScope, when Compose actually iterates over the RememberObservables after the composition in the apply phase, it will skip doing anything, because it's already created.

The onRemembered should be only called once, because you can't really re-remember. If it gets out of the composition completely, then the onForgotten is called and next time a new instance is created.

I think this could optimized into starting the imageLoader.execute(updateRequest(it)).toState() whenever you set a new request and it's different from the previous and it doesn't have to observe the flow. This way you optimize the coroutines overhead for many people, but also can keep the backwards-compatible way of having the state.

Anyway, I'll revert this so the API stays the same

Comment on lines 311 to 312
requestInternal
.mapLatest { imageLoader.execute(updateRequest(it)).toState() }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the _request is set everytime when the rememberAsyncImagePainter recomposes, it could use just setter and job?.cancel() without observing the flow. This could bring the benefit because most of the time the _request stays the same and therefore only needs one job that finishes and doesn't have to keep listening, but if it changes, then it just calls another job (or cancels it if existing).

@colinrtwhite
Copy link
Member

@mlykotom Thanks for all the work on this! I'll take a look through the other optimizations this weekend. Unfortunately we can't make any binary incompatible changes to the 2.x branch, but we should consider incorporating some of them into 3.x (the main branch). I'll also upstream this change to main.

@colinrtwhite colinrtwhite merged commit bacaba0 into coil-kt:2.x May 10, 2024
6 of 7 checks passed
colinrtwhite pushed a commit that referenced this pull request May 10, 2024
* Improve RealImageLoader.execute to properly call async request

* Add experiments to get information about performance

* Revert fix

* Add comment what's blocking

* Fix spotless

* Tweak benchmarks + add traces

Update benchmarks

Experiment removing mutableState

Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

Don't launch job for compose

before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3

* Experiment removing Compose State

Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

* Don't launch job for compose

before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3

* Cleanup

* Fix dependency with version catalog

* Cleanup

* Revert API changes
colinrtwhite added a commit that referenced this pull request May 10, 2024
* Improve RealImageLoader.execute to properly call async request (#2205)

* Improve RealImageLoader.execute to properly call async request

* Add experiments to get information about performance

* Revert fix

* Add comment what's blocking

* Fix spotless

* Tweak benchmarks + add traces

Update benchmarks

Experiment removing mutableState

Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

Don't launch job for compose

before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3

* Experiment removing Compose State

Before:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 139.0,   max 146.0
AsyncImagePainter.onRememberedMs               min   9.3,   median  12.5,   max  16.3
frameDurationCpuMs                             P50    4.1,   P90    9.1,   P95   20.0,   P99   99.3

After:
AsyncImagePainter.onRememberedAverageMs        min   0.1,   median   0.1,   max   0.1
AsyncImagePainter.onRememberedCount            min 130.0,   median 141.0,   max 148.0
AsyncImagePainter.onRememberedMs               min   8.3,   median  10.5,   max  15.8
frameDurationCpuMs                             P50    4.4,   P90    8.6,   P95   12.0,   P99   95.1

* Don't launch job for compose

before
AsyncImagePainter.onRememberedAverage_µs       min     77.3,   median    101.5,   max    116.7
AsyncImagePainter.onRememberedCount            min    130.0,   median    130.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min 10,048.4,   median 13,880.2,   max 16,900.0
timeToInitialDisplayMs                         min    256.8,   median    281.7,   max    358.9
frameDurationCpuMs                             P50       5.1,   P90       9.2,   P95      13.3,   P99     117.0

after
AsyncImagePainter.onRememberedAverage_µs       min     68.9,   median     78.7,   max     91.1
AsyncImagePainter.onRememberedCount            min    128.0,   median    141.0,   max    146.0
AsyncImagePainter.onRemembered_µs              min  8,956.1,   median 11,163.1,   max 13,111.9
timeToInitialDisplayMs                         min    234.1,   median    276.5,   max    325.7
frameDurationCpuMs                             P50       4.5,   P90       8.9,   P95      15.5,   P99     120.3

* Cleanup

* Fix dependency with version catalog

* Cleanup

* Revert API changes

* Remove debug code.

* Formatting.

---------

Co-authored-by: Tomáš Mlynarič <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants