Fine-grained sampling rate per checkpoint #156

Closed
ramboz opened this issue May 15, 2024 · 2 comments · Fixed by #159
Labels
enhancement New feature or request released

ramboz commented May 15, 2024

Is your feature request related to a problem? Please describe.

In the context of running experiments and hoping to converge quickly on a winner, we've seen over the last year that the default 1/100 sampling rate is not enough to reach statistical significance in 2 weeks for most customers. Since the start of the year, we've raised the sampling rate to 1/10 (a weight of 10 instead of 100) on pages that run an experiment, but this triggered a few issues:

  1. Lack of proper API: the RUM library does not offer an easy API to control this, and we had to modify the weight after its initialization, which leads to 2.
  2. Breaking session integrity: since we dynamically change the weight only on pages that run experiments, the change happens sometime in the eager phase, after some of the checkpoints have already fired (top, load, possibly others). Some sessions therefore miss those 2 events, which breaks the integrity of the event lifecycle for that session.
  3. Side effect on other checkpoints: once the experimentation checkpoint changes the weight, it's a global change that also impacts all other events happening afterwards… so we suddenly end up sampling block views, resource loads, media views at 1/10 as well which creates a lot of noise for no immediate gain for our use case (increase cost and noise, but no value gained).
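
Issues 2 and 3 can be illustrated with a toy model. This is a minimal sketch, not the RUM library's code: `createGlobalSampler`, `setWeight` and `checkpoint` are hypothetical names, and the checkpoint names after `experiment` are only illustrative. It records which weight each checkpoint would be sampled with when a single global weight is changed mid-page.

```js
// Hypothetical, simplified model: one weight shared by every checkpoint.
// This is NOT the RUM library's implementation.
function createGlobalSampler() {
  const state = { weight: 100 }; // default 1/100 sampling
  const log = [];
  return {
    // Changing the weight affects every checkpoint fired afterwards.
    setWeight(weight) { state.weight = weight; },
    // Record the weight in effect when the checkpoint fires.
    checkpoint(name) { log.push({ checkpoint: name, weight: state.weight }); },
    log,
  };
}

const rum = createGlobalSampler();
rum.checkpoint('top');        // already fired at 1/100
rum.checkpoint('load');       // already fired at 1/100
rum.setWeight(10);            // experiment plugin raises the rate in the eager phase
rum.checkpoint('experiment'); // the only checkpoint that needed 1/10
rum.checkpoint('viewblock');  // side effect: now also at 1/10
rum.checkpoint('viewmedia');  // side effect: now also at 1/10

console.table(rum.log);
```

The log shows `top` and `load` recorded at weight 100 while everything after the change is recorded at weight 10, which is the mismatch described in points 2 and 3.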

Describe the solution you'd like

Ideally, the sampleRUM object would expose an API to change the sampling rate for a specific checkpoint, so that only that one checkpoint (or a few, like experiment & convert) is impacted, and not the others.

Something along the lines:

sampleRUM('experimentation', { source: 'experiment-a', target: 'variant-1' }, 10);

and/or

sampleRUM.sampleAt('experiment', 10);
sampleRUM.sampleAt('convert', 10);
sampleRUM('experiment', { source: 'experiment-a', target: 'variant-1' });

The 2nd approach doesn't create tight coupling between the 2 events, and the experiments can decide to increase sampling for conversion so they align on the page that needed it without impacting pages that just have conversion with no experiments.
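
To make the second approach more concrete, here is a rough sketch of how per-checkpoint weights could behave. Everything in it is an assumption for illustration: `createSampleRUM`, its options, and the choice to reuse a single random roll per page view are hypothetical and not taken from the RUM library.

```js
// Hypothetical sketch of a per-checkpoint sampling API (not the actual library code).
function createSampleRUM({ defaultWeight = 100, random = Math.random, send = () => {} } = {}) {
  const overrides = new Map(); // checkpoint name -> custom weight
  const roll = random();       // one roll per page view, reused for every checkpoint

  function sampleRUM(checkpoint, data = {}) {
    // Use the custom weight if one was registered, otherwise the default.
    const weight = overrides.get(checkpoint) ?? defaultWeight;
    // A checkpoint is sent only if the page view is selected at that weight.
    if (roll * weight >= 1) return false;
    send({ checkpoint, weight, ...data });
    return true;
  }

  // Register a custom weight for a single checkpoint.
  sampleRUM.sampleAt = (checkpoint, weight) => {
    overrides.set(checkpoint, weight);
  };

  return sampleRUM;
}

// Usage with a fixed roll of 0.05 so the outcome is deterministic.
const sent = [];
const sampleRUM = createSampleRUM({ random: () => 0.05, send: (event) => sent.push(event) });
sampleRUM.sampleAt('experiment', 10);
sampleRUM.sampleAt('convert', 10);
sampleRUM('top'); // skipped at the default 1/100
sampleRUM('experiment', { source: 'experiment-a', target: 'variant-1' }); // sent at 1/10
console.log(sent);
```

Reusing one roll means a page view selected at 1/100 is also selected at 1/10, so sessions sampled at the default rate keep their full set of events while the extra sessions only contribute the overridden checkpoints. That is one possible design choice among others.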

Describe alternatives you've considered

  1. Setting a global object dynamically in the head.html, like window.RUM_SAMPLING_RATE = 10, before aem.js/lib-franklin.js is loaded so we can adjust the default sampling before the 1st events are triggered.
    • This leaks JS logic in the head and decouples from the "plugins" that actually require it… so makes instrumentation harder and leaks plugin details in the global code:
      <script>
      window.RUM_SAMPLING_RATE = document.head.querySelector('meta[name="experiment"]') ? 10 : 100;
      </script>
    • We still don't address the side effects on other checkpoints and only increase the cost and noise even further
  2. Resetting the sampling rate after the experiment checkpoint is fired
    • There is still no guarantee that other checkpoints haven't been fired with the custom sampling rate, so we don't fully address cost/noise issues
    • Since experiment works tightly coupled with convert, we actually also need to wait for the 1st conversion to happen otherwise we can't compute the winner
ramboz added the enhancement label May 15, 2024

ramboz commented May 29, 2024

After discussing with @trieloff, we decided to go with the first alternative for now.

trieloff closed this as completed Jun 5, 2024
ramboz added a commit that referenced this issue Aug 19, 2024
…ific use cases

This introduces a new global variable, `window.SAMPLE_PAGEVIEWS_AT_RATE`, that can be set before loading the library to increase the sampling rate for specific use cases that require more data to be collected for short-term reporting.
For instance:
- when running an experiment in a 2-week time frame and needing to achieve statistical significance even with low traffic
- when running a short-lived marketing campaign and wanting to collect enough data over a single weekend

## Usage

For instance:
```js
window.SAMPLE_PAGEVIEWS_AT_RATE = 'high';
```

Or in an HTML context:
```html
  <head>
    <meta name="experiment" content="Foo"/>
    <meta name="experiment-variants" content="/bar,/baz"/>
    <script>window.SAMPLE_PAGEVIEWS_AT_RATE = document.head.querySelector('meta[name="experiment"]') ? 'high' : null</script>
    <script src="/scripts/aem.js" type="module"></script>
    <script src="/scripts/scripts.js" type="module"></script>
    <link rel="stylesheet" href="/styles/styles.css">
  </head>
```
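
As a rough illustration of how a library could consume this flag, the sketch below maps it to a sampling weight. The mapping is an assumption inferred from this thread (`'high'` as 1/10, per the release notes, and a default of 1/100, per the issue description); `resolveWeight` and `isSelected` are hypothetical helpers, not functions from the library.

```js
// Hypothetical helpers, not the library's actual code.
// Assumption: 'high' means a weight of 10 (1/10); anything else keeps the default 100 (1/100).
function resolveWeight(globalScope) {
  return globalScope.SAMPLE_PAGEVIEWS_AT_RATE === 'high' ? 10 : 100;
}

// A page view is selected with probability 1/weight.
function isSelected(weight, random = Math.random) {
  return random() * weight < 1;
}

// In a browser the flag is read from window; elsewhere fall back to an empty object.
const scope = typeof window !== 'undefined' ? window : {};
const weight = resolveWeight(scope);
console.log(`sampling 1 in ${weight} page views`);
```

Because the flag is read once when the library loads, it has to be set before `aem.js` is evaluated, which is why the inline script comes first in the `<head>` example above.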
## Related Issues

Fix #156
adobe-bot pushed a commit that referenced this issue Aug 19, 2024
# [2.3.0](v2.2.0...v2.3.0) (2024-08-19)

### Features

* allow increasing the sampling rate up to 1/10 for specific use cases ([f96e713](f96e713))
* **minirum:** allow increasing the sampling rate up to 1/10 for specific use cases ([05f6da7](05f6da7)), closes [#156](#156)

🎉 This issue has been resolved in version 2.3.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
