Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create separate worker usage data collection and move hardware emit there #1293
base: master
Are you sure you want to change the base?
Create separate worker usage data collection and move hardware emit there #1293
Changes from 5 commits
f02ecae
e57ce8d
e8abca3
51cb988
4d28be1
ffd2d75
21fe267
51f7207
e304034
24b4a84
a89107f
a9c526f
fa1e190
3628eb9
96e4267
5092606
1a52b18
7f8a165
d0dac1c
64cddbc
cde3ba4
822564b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to spawn a goroutine per worker? Why not ensure only 1 running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the hardware emitting is once per host, all other metrics will be worker-specific. (e.g activity poll response vs. decision poll response)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I see only w.collectHardwareUsage() which will just spawn bunch of data into the same scope. I would suggest separating hardware emitter and worker specific metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's current design, for each type of metrics based on their origin, I will create a separate gorountine for each of them. But they would be contained under a single workerusagecollector so that their result can be collected and sent in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a few things problematic about this goroutine closure
wg.Done()
will be called once goroutine forgo w.runHardwareCollector()
is started. It shouldn't be marked as done untilrunHardwareCollector()
terminates so should be moved thererunHardwareCollector
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might not be future proof and also if customer is running separate processes for decision and activity workers then we will not have the hardware usage of those hosts that only runs activity workers. we should also not create no-op
workerUsageCollector
s if only one of them will do the work.@Groxx what would be your recommendation for host level metric reporting on the client side? I would like to avoid global static variables but this use case probably requires one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tried Sync.Once before, but that would cause issues with unit testing as it will just wait indefinitely for this routine to stop while blocking all other goroutine from closing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can override it in the unit tests
in typical startup this would be set to
sync.Once
:in test code you can initialize this to a fake implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. I have implemented that in the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see
EmitOnce
being used in workerUsageCollector. We should only have one (singleton) instance ofworkerUsageCollector
which would be lazily created by the first worker instance. Rest of the workers would create a noOpUsageCollector. This lazy initialization logic should be hidden from workers. Worker just callsnewWorkerUsageCollector()
and that function should determine whether it's first time or not. Let's discuss offline if more clarification needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our usecase, only the hardware info are once per host collected. Other worker type (decision worker and activity worker) should have different workerUsageCollector as they track different task type behaviors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what type of information are you planning to collect per worker basis in this
workerUsageCollector
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tasklist backlog/poll response since decision and activity worker have their own pollers and that need to be scaled independently