Implementing benchmarking metrics within status-mobile app #19047
Before anything, thanks for the initiative to start this discussion @siddarthkay! You touch on many interesting topics; I'll go over them lightly to share my perspective.
We could tell shadow-cljs to run the dev build with a bunch of extra optimizations to approximate what runs in prod builds. Best of all, I think, would be to decouple the collection of data from the device of origin. Ideally, the developer should decide whether they want to observe a dev build, an emulator, a real device, etc., trading precision for convenience while understanding the limitations of each environment.
The step of generating reports and making sense of time-series data is already solved in the industry, and I would certainly try not to reinvent this wheel. Existing tools will allow us to report p95, p99, medians, and standard deviations, generate dashboards, and so on. This is all nice because it means we can descope this effort.
Using a limited mobile device to view metrics is not ideal to me. Some metrics can be okay on-device, like UI FPS or CPU usage, but once we embed the concept of time series, mobile devices just don't fit the bill when the developer needs to understand why something is not performing well. Shaking the device would be a nice feature, but we could ignore this and focus on the developer side of things. We will be the main consumers and generators of those metrics, actively seeking improvements. For example, we could have Prometheus pulling data from the device(s) and use Grafana to create dashboards, which would let us see results in near real-time (a minimal sketch of a scrapeable endpoint follows). Therefore, I think it's viable to start without any UI in the app, or rather just a way to start and stop data collection, as you suggested.
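To make the Prometheus idea concrete, here's a minimal sketch of what the device side could look like. Everything in it is an assumption for illustration: the `MetricsRegistry` name, port 9100, and a raw `ServerSocket` standing in for a proper HTTP server. The point is only that a scrapeable endpoint can be tiny (Kotlin, since we'd likely end up in native-module territory anyway):

```kotlin
import java.net.ServerSocket
import java.util.concurrent.ConcurrentHashMap
import kotlin.concurrent.thread

// Hypothetical registry: the app's measurement code writes gauges here.
object MetricsRegistry {
    private val gauges = ConcurrentHashMap<String, Double>()

    fun set(name: String, value: Double) {
        gauges[name] = value
    }

    // Render all gauges in the Prometheus text exposition format.
    fun scrape(): String =
        gauges.entries.joinToString(separator = "\n", postfix = "\n") { (k, v) -> "$k $v" }
}

// Tiny scrape endpoint; on Android this would also need the INTERNET permission.
fun startMetricsServer(port: Int = 9100) = thread(isDaemon = true) {
    ServerSocket(port).use { server ->
        while (true) {
            server.accept().use { socket ->
                val body = MetricsRegistry.scrape().toByteArray()
                val header = "HTTP/1.1 200 OK\r\n" +
                    "Content-Type: text/plain; version=0.0.4\r\n" +
                    "Content-Length: ${body.size}\r\n\r\n"
                socket.getOutputStream().apply {
                    write(header.toByteArray())
                    write(body)
                    flush()
                }
            }
        }
    }
}
```

A Prometheus instance on the same network would then scrape `http://<device-ip>:9100/` at a fixed interval, and the Grafana side is pure configuration.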
TTFI is very important indeed, but a single data point for TTFI is kind of unreliable, so the concept of time is somewhat essential. I believe our in-house solution should always assume data could be consumed as a time series or aggregated into one point.
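As a toy illustration of why aggregation matters, assume we've collected TTFI from several runs; the helpers below are hypothetical and use a naive nearest-rank percentile:

```kotlin
// Naive percentile over a sorted list (illustrative, not production statistics).
fun percentile(sorted: List<Double>, p: Double): Double =
    sorted[((sorted.size - 1) * p).toInt()]

fun summarize(ttfiMillis: List<Double>): String {
    val sorted = ttfiMillis.sorted()
    return "n=${sorted.size}" +
        " mean=%.1fms".format(sorted.average()) +
        " p50=${percentile(sorted, 0.50)}ms" +
        " p95=${percentile(sorted, 0.95)}ms"
}

fun main() {
    // Six hypothetical TTFI runs; the 1204 ms outlier would dominate a
    // single-sample measurement but is put in context by the aggregate.
    val runs = listOf(812.0, 790.0, 1204.0, 835.0, 798.0, 901.0)
    println(summarize(runs))
}
```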
This is a bit nuanced to me. Profiling and benchmarking in non-prod builds is useful to give a sense of relative performance. Obviously, every result should be taken with a grain of salt and not as absolute truth, even when measuring in production (e.g. real devices have plenty of other things running simultaneously, and every user is different). I agree with you that results can be misleading, but it's mostly the dev's responsibility to know this, so I still think it's valuable to invest time in making tools like Flashlight and the Hermes profiler easier for everybody to use. Both tools worked well on my system, albeit Hermes was a bit inconvenient.
Problem
We need a common language and some useful benchmarking metrics so that we can measure performance across different user journeys on different platforms under various conditions.
Existing solutions
Performance Monitor -> built into React Native's dev menu
ref -> https://reactnative.dev/docs/debugging?js-debugger=new-debugger#performance-monitor
Provides the following key metrics: RAM usage, JavaScript heap size, number of views, UI FPS, and JS FPS.
This is a good tool, but its one limitation is that it only provides information on debug builds, and we are interested in those metrics on release builds.
The app goes through various compile-time optimisations, so what is a problem in a debug build may not be a problem in a release build.
Hence I think it's not fair to rely on these metrics from debug builds, since they may be misleading.
Hermes Profiler -> a profiler by the React Native team
ref -> https://reactnative.dev/docs/profile-hermes
The Hermes Profiler provides a good waterfall view, but we need to connect the device to a laptop to extract the traces, and it is sometimes hard to set up and measure with. This workflow is often not easy for everyone.
Flashlight library -> a standalone library to measure the performance of Android apps
ref -> https://github.com/bamlab/flashlight
A good thing about this library is that it tracks important metrics on release builds of the app.
The only limitation is that it uses an external C debugger to profile performance on an emulator (or a device connected via a cable).
The problem with emulators is that they differ subtly from real devices and still do not show the complete picture. The problem with connected devices is simply that it is (in my opinion) not an optimal UX for measuring performance, but it is still a doable approach in the absence of other solutions.
Proposal
We build an in-house solution, with the help of Android and iOS native modules, to measure and log the performance metrics we care about.
We would do this by providing a profiler toggle in settings, plus a floating button on all screens where we could turn it off. We only want to profile certain user journeys to measure the app's performance in those situations, not have profiling turned on by default. That said, it would be cool if we could turn it on at build time via a flag and turn it off in settings, so that we can measure onboarding performance. A sketch of what the Android side of such a module could look like follows.
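As a rough illustration (not a design commitment), the Android half could be a standard React Native native module. The package, the module name `PerfMetrics`, and its methods are hypothetical; only the `ReactContextBaseJavaModule` / `@ReactMethod` plumbing is real React Native API:

```kotlin
package im.status.perf // hypothetical package, for illustration only

import com.facebook.react.bridge.Promise
import com.facebook.react.bridge.ReactApplicationContext
import com.facebook.react.bridge.ReactContextBaseJavaModule
import com.facebook.react.bridge.ReactMethod

class PerfMetricsModule(reactContext: ReactApplicationContext) :
    ReactContextBaseJavaModule(reactContext) {

    @Volatile
    private var collecting = false

    override fun getName() = "PerfMetrics"

    // Wired to the settings toggle / floating button on the JS side.
    @ReactMethod
    fun startProfiling() {
        collecting = true
    }

    @ReactMethod
    fun stopProfiling() {
        collecting = false
    }

    // Hand collected data back to JS, e.g. to render in a basic UI or
    // to attach to the logs shared via the shake gesture.
    @ReactMethod
    fun dumpMetrics(promise: Promise) {
        promise.resolve("{}") // placeholder JSON payload
    }
}
```

The iOS counterpart would mirror this with `RCT_EXPORT_MODULE`, and the ClojureScript side would call these methods from the settings toggle and the floating button.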
We would need a basic UI in the app to view these metrics, and the ability to share them as part of our "shake device to share logs" feature so that they are easily exportable.
Since performance metrics are time-series data, we want to store these values with a timestamp and also have the ability to average each series out at the end. A sketch of that storage shape follows.
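A minimal sketch of the storage shape, assuming hypothetical `Sample` and `MetricSeries` types:

```kotlin
// One timestamped measurement.
data class Sample(val epochMillis: Long, val value: Double)

// A named series that can be read back raw (as a time series) or
// collapsed into one number ("average it out in the end").
class MetricSeries(val name: String) {
    private val samples = mutableListOf<Sample>()

    @Synchronized
    fun record(value: Double) {
        samples += Sample(System.currentTimeMillis(), value)
    }

    // Time-series view, e.g. for export alongside shared logs.
    @Synchronized
    fun asSeries(): List<Sample> = samples.toList()

    // Aggregated view; returns NaN if nothing was recorded.
    @Synchronized
    fun average(): Double = samples.map { it.value }.average()
}
```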
There are also other metrics we are interested in which are not time-series related.