Is there a way to work with a structural causal model where nodes have multiple values? For use in root cause analysis. #1222

ronikobrosly · 2024-07-04T17:53:13Z

Take, for example, the microservice latency RCA demonstration in the DoWhy documentation (https://www.pywhy.org/dowhy/v0.8/example_notebooks/rca_microservice_architecture.html).

It's a fantastic example, but latency is just one of the "golden signals" used to determine service health. There are also status code counts (2xx, 4xx, 5xx) and traffic counts (how many users are visiting each microservice). Oftentimes in the observability space, we have all of this data at our disposal. Is there a way to incorporate these multiple data points for each node to do a more holistic root cause analysis? And if that's not available through the dowhy API itself, do you recommend a hacky approach to handle this? Thank you!

ronikobrosly · 2024-07-04T17:56:06Z

To clarify, I mean that at each time step or unique observation, and for each node, there could be multiple values (e.g., for node #1, in the first observation, the latency was 0.5 seconds, the CPU usage was 0.80, and some other metric was 1.25).

It seems to me this could be more holistic, but I'm sure it makes the math more challenging.

bloebp · 2024-07-05T15:18:42Z

Hi, there are different ways to think about this. One could simply see a node as multivariate (i.e., it has vectors of observations instead of single values), but DoWhy does not support this yet. Another perspective is to further 'unroll' the graph by incorporating the relationships between all these metrics. For instance, a metric like 'latency' in a called node might cause the latency of the calling node to change, while another metric like 'request count' works in exactly the opposite direction: the number of requests in a calling node causes the number of requests in the child node to increase, etc. So, optimally, we have a big graph where all these metrics are connected with each other. 'Latency' in a Website might be caused by 'Request' or 'CPU usage' of that website, etc.
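As a rough illustration of the 'unrolling' idea, here is a minimal sketch that turns a service call graph into a per-metric edge list, so that every node holds a single value again. The helper name `unroll_graph`, the `service.metric` naming scheme, and the three metrics are assumptions made for this example, not part of DoWhy. The within-service edges follow only the relationships mentioned above (requests and CPU usage affecting latency).

```python
def unroll_graph(call_edges):
    """Build per-metric causal edges from (caller, callee) pairs.

    Hypothetical helper for illustration only. Each node in the result is
    a single scalar metric named "<service>.<metric>".
    """
    edges = []
    services = sorted({service for pair in call_edges for service in pair})

    for caller, callee in call_edges:
        # Latency propagates upstream: a slow callee slows down its caller.
        edges.append((f"{callee}.latency", f"{caller}.latency"))
        # Request count propagates downstream: caller traffic drives callee traffic.
        edges.append((f"{caller}.requests", f"{callee}.requests"))

    for service in services:
        # Within one service, load and CPU usage may both affect latency.
        edges.append((f"{service}.requests", f"{service}.latency"))
        edges.append((f"{service}.cpu", f"{service}.latency"))

    return edges


# Example call graph (assumed): Website calls API, API calls Database.
call_edges = [("Website", "API"), ("API", "Database")]
edges = unroll_graph(call_edges)

for cause, effect in edges:
    print(f"{cause} -> {effect}")
```

The resulting edge list could then be handed to `networkx.DiGraph(edges)` and wrapped in a `dowhy.gcm.StructuralCausalModel`, with one data column per `service.metric` node. Whether the chosen edges are right for a given system is a modelling decision this sketch does not settle.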

ronikobrosly · 2024-07-05T15:32:56Z

Thanks so much @bloebp, this makes a lot of sense! If you don't mind, I have a follow-up question: Do you know of any implementation somewhere of structural models that handle multivariate/vector nodes?

bloebp · 2024-07-05T15:51:08Z

I am not aware of implementations that support that (i.e., a functional causal model per node that supports multivariate outputs). I think the general challenge for this is that the underlying model (e.g., a regression model in the case of additive noise models) needs to support this, and this means you need to restrict the types of models that are supported.

Multivariate support has been on my to-do list for a long time, but it is not straightforward to adjust the algorithms to the multivariate cases.
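To make the point about the underlying model concrete, here is a minimal sketch of what an additive noise mechanism with a vector-valued output could look like: each output dimension of the child gets its own least-squares line on a single parent, and the noise is the vector of residuals. This is not DoWhy's API; `fit_vector_anm`, the synthetic data, and the restriction to one parent with linear models are all assumptions for illustration.

```python
import random


def fit_vector_anm(parent, child_vectors):
    """Fit child[k] = slope[k] * parent + intercept[k] + noise[k] per dimension.

    Hypothetical helper: ordinary least squares with one scalar parent and a
    vector-valued child. Returns (slopes, intercepts, residuals), where
    residuals[i] is the noise vector recovered for observation i.
    """
    n = len(parent)
    dims = len(child_vectors[0])
    mean_x = sum(parent) / n
    var_x = sum((x - mean_x) ** 2 for x in parent)

    slopes, intercepts = [], []
    for k in range(dims):
        ys = [row[k] for row in child_vectors]
        mean_y = sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(parent, ys))
        slope = cov_xy / var_x
        slopes.append(slope)
        intercepts.append(mean_y - slope * mean_x)

    residuals = [
        [row[k] - (slopes[k] * x + intercepts[k]) for k in range(dims)]
        for x, row in zip(parent, child_vectors)
    ]
    return slopes, intercepts, residuals


# Synthetic data (assumed): one parent metric, a child node with two values.
rng = random.Random(0)
parent = [rng.uniform(0.0, 1.0) for _ in range(500)]
child = [
    [2.0 * x + rng.gauss(0.0, 0.1), -1.0 * x + 0.5 + rng.gauss(0.0, 0.1)]
    for x in parent
]

slopes, intercepts, residuals = fit_vector_anm(parent, child)
print("slopes:", slopes)
print("intercepts:", intercepts)
```

Fitting each dimension separately is the simplest choice, but it ignores any dependence between the noise components of one node, which is one of the places where the multivariate case gets harder than the scalar one.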

ronikobrosly · 2024-07-05T20:02:23Z

I appreciate your thoughts. Thank you @bloebp

ronikobrosly · 2024-07-06T12:44:08Z

I'd love to take a stab at incorporating this sort of approach into dowhy with you, if you're open to it @bloebp. I'd have to spend a bit of time learning your API more closely first, naturally.

bloebp · 2024-07-08T14:34:48Z

Yea, sure, that would be awesome! Let me know if you have any questions or get stuck; I believe there will be some subtle issues here and there. Also, feel free to message me on the PyWhy Discord directly.

ronikobrosly · 2024-07-09T14:24:29Z

Okay, fantastic @bloebp. I'll find you on Discord and reach out.

ronikobrosly added the question label (Further information is requested) Jul 4, 2024

ronikobrosly closed this as completed Jul 9, 2024
