Elaborate, document and propose remedies for "the 10k span problem" #80
Comments
One of the solutions discussed could be to drop spans. The question is where to do it?
|
One of the solutions discussed could be to drop spans. The question is
where to do it?
- on the Zipkin server side (with a rule system for example), this
could fix the UI issue
By server, I think you mean at query time, right? One tradeoff of dropping
at query time is that it assumes the only consumer of the API is the UI
(which isn't the case, even though it is the primary consumer).
Nested in the attached Google Doc is a slight variation, which is to drop or
simply collapse (make unrenderable) spans in the client-side JavaScript.
This is another option to help avoid overloading the UI, and it has the
advantage of not requiring a data model change or dropping data.
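To make the collapse option concrete, here is a minimal sketch of the idea, written in Python for brevity rather than the UI's JavaScript. It assumes each span is a dict with `id`, `parentId` and `name` keys, that the trace has a root with no parent, and that the function name `collapse_children` and the `max_children` threshold are made up for illustration. It is not how the Zipkin UI works today.

```python
from collections import defaultdict


def collapse_children(spans, max_children=3):
    """Build a render list showing at most max_children children per parent.

    Surplus siblings (and anything beneath them) are replaced by a single
    placeholder row. The input list is left untouched, so no data is dropped.
    Assumes every span's parent is present in the list, except for the root.
    """
    children = defaultdict(list)
    for span in spans:
        children[span.get("parentId")].append(span)

    def rows_for(parent_id):
        kids = children.get(parent_id, [])
        rows = list(kids[:max_children])
        hidden = len(kids) - max_children
        if hidden > 0:
            # Hypothetical placeholder row; "collapsed" counts direct children only.
            rows.append({
                "id": None,
                "parentId": parent_id,
                "name": f"{hidden} more child spans collapsed",
                "collapsed": hidden,
            })
        return rows

    rendered = []
    # Iterative pre-order walk, so a very deep trace cannot hit the recursion limit.
    stack = list(reversed(rows_for(None)))
    while stack:
        row = stack.pop()
        rendered.append(row)
        if row["id"] is not None:  # placeholder rows have no children to expand
            stack.extend(reversed(rows_for(row["id"])))
    return rendered
```

Because only the render list changes, other consumers of the API would still see every span.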
- on the collector side, this reduces the load but also introduces
complexity: how do we know this is a long trace?
To qualify what you've mentioned here, this is the case where you don't know
how many spans will be created across processes (for example, broadcast
messaging spans, which fork on receipt). There are also scenarios that create
a lot of spans in-process, and the local tracer could sample there without
coordination.
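For the in-process case, a rough sketch of what "sample without coordination" could look like is a per-trace span budget held by the local tracer. Everything here is hypothetical: the class name `LocalSpanBudget`, its methods and the default limit are illustrative and not part of any existing tracer API. The sketch is also not thread-safe, which a real tracer would need to address.

```python
from collections import Counter


class LocalSpanBudget:
    """Caps how many local spans one process records for a single trace."""

    def __init__(self, max_spans_per_trace=1000):
        self.max_spans_per_trace = max_spans_per_trace
        self._seen = Counter()    # spans recorded so far, per trace ID
        self.dropped = Counter()  # spans refused so far, per trace ID

    def should_record(self, trace_id):
        """Return True if a new local span for this trace fits in the budget."""
        if self._seen[trace_id] >= self.max_spans_per_trace:
            self.dropped[trace_id] += 1
            return False
        self._seen[trace_id] += 1
        return True

    def finish_trace(self, trace_id):
        """Forget the trace locally and return how many spans were dropped."""
        self._seen.pop(trace_id, None)
        return self.dropped.pop(trace_id, 0)
```

The dropped count could be attached to the enclosing span as a tag, so a reader of the trace knows it was truncated. This only works because the decision is local; it does not help with the broadcast case, where each consumer sees only its own spans.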
|
So one way to proceed from here could be to enumerate the different patterns
and a strategy for each. For example, 10k spans due to local spans, or
broadcast, or RPC, etc. I've created a Google Doc here that might help:
https://docs.google.com/document/d/1XkFGflrQP4wF8vqv-veFDE-t-V5iyH5bh9VXRYaOROg/edit
You can also look here for some text about common tracing patterns, the
summary of which might be helpful in elaborating:
https://drive.google.com/drive/u/0/folders/0B0tSnQT3uGdAUVVUcDA5d21rRWM
|
Traces with thousands of spans can be problematic. They can choke the UI (not just ours) and increase the operating costs of a tracing system. A number of scenarios can result in "the 10k span problem", such as broadcast messaging to an unbounded number of consumers, or buggy traced loops. Some workarounds are easier than others. For example, dropping local spans is easier than trying to coordinate message consumers so that they drop theirs.
This issue should clarify the major scenarios, known workarounds and remedies. Hopefully, it results in at least documentation and, ideally, in coding practices that defend against this.
Here are some breadcrumbs: