[rush] Differentiate remote and local execution in telemetry. #4755

Open
wants to merge 4 commits into
base: main

Conversation

aramissennyeydd
Contributor

Summary

Fixes #4737. My goal is to address the data skew questions before we go ahead with #4680, which just adjusts the data skew.

Details

There is currently no good way to determine whether the telemetry for an operation was generated on the current machine or on a remote machine. This is likely to cause data skew, depending on how you ingest the Rush telemetry. Either:

  1. You restore duration from nonCachedDurationMs, which produces multiple events with the same duration (+/- a few milliseconds) if you emit events from each cobuild agent. That skews averages and other aggregates of your data.
  2. You calculate duration from startTimestampMs and endTimestampMs, which produces massive differences in the durations collected across your agents: every agent except the primary reports 0.05s, while the primary agent reports 15.00s. That also skews averages and other aggregates.
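To make the two failure modes concrete, here is a minimal sketch using the numbers from the list above. The `AgentReport` shape, the agent names, and the choice of three agents are assumptions for illustration only; just the three field names come from the Rush telemetry described here.

```typescript
// Hypothetical, simplified telemetry record for ONE operation as reported by one agent.
// Only nonCachedDurationMs, startTimestampMs and endTimestampMs are named in the PR text.
interface AgentReport {
  agent: string;
  startTimestampMs: number;
  endTimestampMs: number;
  nonCachedDurationMs: number;
}

// One 15.00s build on the primary agent, restored in 0.05s on two other agents.
const reports: AgentReport[] = [
  { agent: "primary", startTimestampMs: 0, endTimestampMs: 15000, nonCachedDurationMs: 15000 },
  { agent: "agent-2", startTimestampMs: 15100, endTimestampMs: 15150, nonCachedDurationMs: 15000 },
  { agent: "agent-3", startTimestampMs: 15200, endTimestampMs: 15250, nonCachedDurationMs: 15000 },
];

function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Approach 1: summing nonCachedDurationMs counts the same 15s build three times.
const totalNonCachedMs: number = reports.reduce(
  (sum, report) => sum + report.nonCachedDurationMs,
  0
);

// Approach 2: wall-clock durations mix one real build with two near-instant restores.
const wallClockMs: number[] = reports.map(
  (report) => report.endTimestampMs - report.startTimestampMs
);
const meanWallClockMs: number = mean(wallClockMs);

console.log(totalNonCachedMs); // 45000
console.log(wallClockMs.join(", ")); // 15000, 50, 50
console.log(Math.round(meanWallClockMs)); // 5033
```

In this sketch neither aggregate reflects the single 15s build that actually happened: the sum triples it, and the wall-clock mean lands near 5s.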

I propose a new wasExecutedOnThisMachine flag that monorepo maintainers can then use in their plugins to decide whether or not they want to process the given operation's data.
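As a rough sketch of how a plugin might consume the proposed flag, the helper below keeps only the operations marked as executed locally. The `OperationTelemetry` interface and the `selectLocalOperations` name are hypothetical stand-ins, not Rush's actual types; a real plugin would apply something like this to the operation results it receives when telemetry is flushed. Treating a missing flag as "not local" is also an assumption, and a plugin that must support older Rush versions may prefer the opposite default.

```typescript
// Hypothetical, simplified stand-in for one operation's telemetry entry.
interface OperationTelemetry {
  startTimestampMs?: number;
  endTimestampMs?: number;
  nonCachedDurationMs?: number;
  // The flag proposed in this PR.
  wasExecutedOnThisMachine?: boolean;
}

// Keep only operations explicitly flagged as executed on this machine.
// Assumption: entries without the flag are dropped rather than kept.
function selectLocalOperations(
  operationResults: Record<string, OperationTelemetry>
): Record<string, OperationTelemetry> {
  const local: Record<string, OperationTelemetry> = {};
  for (const name of Object.keys(operationResults)) {
    const result = operationResults[name];
    if (result.wasExecutedOnThisMachine === true) {
      local[name] = result;
    }
  }
  return local;
}

// Usage with made-up operation names.
const sample: Record<string, OperationTelemetry> = {
  "project-a (build)": { nonCachedDurationMs: 15000, wasExecutedOnThisMachine: true },
  "project-b (build)": { nonCachedDurationMs: 15000, wasExecutedOnThisMachine: false },
  "project-c (build)": { nonCachedDurationMs: 800 },
};

console.log(Object.keys(selectLocalOperations(sample)).join(", ")); // project-a (build)
```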

How it was tested

Tested in this repository, using the sharded-repo sandbox.

Impacted documentation

Anything where Rush describes writing your own telemetry plugin.

Comment on lines +853 to +855
wasExecutedOnThisMachine:
!operationResult.cobuildRunnerId ||
operationResult.cobuildRunnerId === cobuildConfiguration?.cobuildRunnerId,
Contributor

Strictly speaking, does a replay from cache count as "executed on this machine"?

Contributor Author

Open to other names here. Strictly speaking, it's kind of yes and kind of no, right? Cobuilds are intended to be state restores across machines, as though the operation was built on any given cobuild agent. But according to the data, that build wasn't executed on this machine, because the cobuild runner IDs from the state file and the machine's cobuild config don't match.
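For reference while discussing the name, the expression under review can be isolated as a small pure function and checked case by case. This is only a sketch: `computeWasExecutedOnThisMachine` and its parameter names are hypothetical, and the runner IDs are made up. The boolean logic mirrors the three diff lines quoted above.

```typescript
// Mirrors the reviewed expression:
//   !operationResult.cobuildRunnerId ||
//   operationResult.cobuildRunnerId === cobuildConfiguration?.cobuildRunnerId
function computeWasExecutedOnThisMachine(
  operationRunnerId: string | undefined,
  localRunnerId: string | undefined
): boolean {
  return !operationRunnerId || operationRunnerId === localRunnerId;
}

// No runner ID recorded on the operation: treated as local.
console.log(computeWasExecutedOnThisMachine(undefined, "runner-1")); // true
// Runner ID matches this machine's cobuild configuration: local.
console.log(computeWasExecutedOnThisMachine("runner-1", "runner-1")); // true
// Runner ID belongs to another agent: not local.
console.log(computeWasExecutedOnThisMachine("runner-2", "runner-1")); // false
// Operation has a runner ID but no cobuild configuration is present: not local.
console.log(computeWasExecutedOnThisMachine("runner-2", undefined)); // false
```

Note that any operation without a recorded runner ID evaluates to true, so if a plain cache replay leaves that ID unset, it would be reported as executed on this machine, which is the case the question above is probing.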

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[rush] duplicated cobuild telemetry leading to data skew