Expand monitoring to include Tale launching #7
@Xarthisius This has been running on staging for several days now, and I'm seeing at least one error per day during the create call: occasionally 400 and 401, and more often 500. Looking at the staging Girder error log, the cause isn't immediately clear:
I've seen alerts at 4:19 AM and 9:18 AM today. The Girder error logs also have a surprising number of these:
and these
But since they are |
9:18 AM CDT?
Sorry, yes, 9:18 AM CDT was the alert time from check_mk.
Looks like a ValueError is raised during parsing
Relevant code: https://github.com/whole-tale/gwvolman/blob/master/gwvolman/utils.py#L80-L86. Either the payload was mangled somehow (expired token?) or the worker failed to connect back to Girder to get user info. I set up |
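To tell the two suspected causes apart in the logs, the user lookup could be wrapped so that a rejected token and a failed connection are reported differently. The following is only a minimal sketch, not the gwvolman code: `diagnose_user_lookup` and the `fetch_user` callable are hypothetical names, and the sketch assumes the lookup raises Python's built-in `ConnectionError` or `PermissionError`, which a real HTTP client may not do.

```python
from typing import Callable, Optional


def diagnose_user_lookup(fetch_user: Callable[[], Optional[dict]]) -> str:
    """Classify a user lookup made with a task's token.

    fetch_user is a hypothetical callable that asks the API for the
    current user and returns a dict, or None if no user matches.
    """
    try:
        user = fetch_user()
    except ConnectionError:
        # The worker could not reach the API at all.
        return "network"
    except PermissionError:
        # The API answered, but rejected the token.
        return "token"
    if not user:
        # The API answered with no user: token likely expired or mangled.
        return "token"
    return "ok"
```

Logging the returned label next to the ValueError would show whether the failures cluster around token problems or connectivity.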
Thank you. This happens so infrequently that we probably won't see anything until tomorrow AM. The problem does correct itself (i.e., during the next check, things are OK). The |
Got one. From check_mk:
Looking at the Flower dashboard, the payload looks OK:
I was able to curl with the token:
So this seems to point to a possible networking problem, which is unsurprising. I'm going to reduce the check interval to 10 minutes to see how pervasive this is.
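Since the failure clears by the next check, one way to measure how pervasive it is would be to have the check retry immediately and record whether a failure was transient or persistent. This is a rough sketch under stated assumptions, not the actual check_mk plugin: `run_check` and the `attempt` callable (which returns an HTTP status code) are hypothetical names.

```python
import time
from typing import Callable, List, Tuple


def run_check(
    attempt: Callable[[], int], retries: int = 2, delay: float = 0.0
) -> Tuple[str, List[int]]:
    """Run a check, retrying on failure, and classify the outcome.

    attempt is a hypothetical callable that performs one request and
    returns the HTTP status code. Returns a label and all statuses seen.
    """
    statuses: List[int] = []
    for i in range(retries + 1):
        if i:
            time.sleep(delay)  # wait before each retry
        try:
            status = attempt()
        except OSError:
            status = 0  # no HTTP response at all (e.g. connection failure)
        statuses.append(status)
        if 200 <= status < 300:
            # Success on the first try is "ok"; success after a failure
            # means the earlier error was transient.
            return ("ok" if i == 0 else "transient", statuses)
    return ("persistent", statuses)
```

For example, a check that sees a 500 followed by a 201 would be reported as `transient` with both status codes, which keeps the 400/401/500 breakdown visible while separating blips from real outages.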
Also received a 401 on POST instance at ~5 AM CDT this morning.
This is the kind of error that's reported most