Alert when Orka cluster is unavailable #3491

Open
finestructure opened this issue Nov 12, 2024 · 2 comments

@finestructure
Member

The Orka cluster underwent maintenance but we're not getting any alerts about it being unavailable. We need to put a mechanism in place for the Orka orchestrator to report back connectivity issues.

@finestructure finestructure self-assigned this Nov 12, 2024
@finestructure
Member Author

I think the only way this could work is if we add Prometheus metrics to the Orka driver and scrape it from the prod Prometheus.

@finestructure
Member Author

finestructure commented Nov 16, 2024

The problem with that is that the Orka orchestrator is spawned via GitLab per job, i.e. it is not a long-running, scrapable process. We'd have to have it push metrics to a push gateway.

We already do that for batch jobs like reconciliation and analysis, but that push gateway is local to the Docker Swarm network and currently not accessible from outside it.

We could make it available, but then we'd need to deal with some form of authentication. It's probably better/easier to bring up another push gateway next to the Orka orchestrator, push to it from the orchestrator, and scrape it from Prometheus.
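
As a rough sketch of what the push side could look like for a short-lived, per-job process: the snippet below renders two gauges in the Prometheus text exposition format and sends them to a Pushgateway with an HTTP `PUT` on its `/metrics/job/<job>/instance/<instance>` path. It is written in Python purely for illustration and is not the orchestrator's actual code. The metric names, the job and instance labels, and the gateway address are all hypothetical.

```python
import urllib.request
from urllib.parse import quote


def render_metrics(reachable: bool, now: float) -> str:
    """Render two gauges in the Prometheus text exposition format.

    Both metric names are hypothetical, chosen for this sketch only.
    """
    lines = [
        "# TYPE orka_cluster_reachable gauge",
        f"orka_cluster_reachable {1 if reachable else 0}",
        "# TYPE orka_last_push_timestamp_seconds gauge",
        f"orka_last_push_timestamp_seconds {int(now)}",
    ]
    # The exposition format requires a trailing newline.
    return "\n".join(lines) + "\n"


def push_url(gateway: str, job: str, instance: str) -> str:
    """Build the Pushgateway grouping-key URL.

    Assumes job and instance contain no "/" characters.
    """
    base = gateway.rstrip("/")
    return f"{base}/metrics/job/{quote(job, safe='')}/instance/{quote(instance, safe='')}"


def push(gateway: str, job: str, instance: str, body: str, timeout: float = 5.0) -> int:
    """PUT the metrics to the gateway, replacing the whole group; return the HTTP status."""
    request = urllib.request.Request(
        push_url(gateway, job, instance),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A job would call something like `push("http://localhost:9091", "orka_orchestrator", "runner-1", render_metrics(True, time.time()))` at the end of its connectivity check (after `import time`). Because a Pushgateway keeps serving the last pushed values until they are replaced or deleted, an alert would probably need to consider the age of the timestamp gauge as well as the reachability gauge.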

