Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check failed to run: execution expired #82

Open
danktec opened this issue Jul 19, 2017 · 13 comments
Open

Check failed to run: execution expired #82

danktec opened this issue Jul 19, 2017 · 13 comments

Comments

@danktec
Copy link

danktec commented Jul 19, 2017

We still see this issue occasionally even after upgrading to sensu 1.0, sensu-plugin-1.4.5 with http-2.5.0

Check failed to run: execution expired, [
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `write'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `output'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:29:in `block (2 levels) in <class:CLI>'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-http-2.5.0/bin/check-http.rb:242:in `rescue in run'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-http-2.5.0/bin/check-http.rb:237:in `run'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'\"]
 \n","status":2

This check took 15 seconds to run. Expected to have taken less than 1

command: "check-http.rb -u https://domain/path --response-code 302 -r"

@majormoses
Copy link
Member

@dhdanno I will try to re-produce locally

@majormoses
Copy link
Member

majormoses commented Jul 19, 2017

So far I have not been able to reproduce locally using ruby 2.3 or ruby 2.4 I am using this command:

$ time bundle exec ./bin/check-http.rb -u https://na1.cloudcruiser.com/ --response-code 302 -r
CheckHttp OK: 302, 154 bytes

real	0m0.324s
user	0m0.148s
sys	0m0.020s

I have not been able to get it to takes even 0.4 seconds, is your endpoint publicly that I can test against? Can you replicate this behavior outside of the context of sensu?

@danktec
Copy link
Author

danktec commented Jul 19, 2017

The checks we have run every 60 seconds but this issue only occurs once perhaps every few days or less... The system is running 30 of these checks to remote endpoints.
Unfortunately not publicly available... not sure if a delay in server response is related... trying to find that out now... might be worth simulating.

I'll see if i can replicate

@majormoses
Copy link
Member

Could it be something endpoint specific? Maybe try monitoring that endpoint and see if you can replicate. I have tried several thousand times now and have been unable to replicate anything that indicates an issue outside of the context of sensu. I need to upgrade my work env to use newer versions of this gem (I just looked and I am pinned on 0.2.1 so it's quite old) to see if I can replicate in the context of sensu.

@danktec
Copy link
Author

danktec commented Jul 20, 2017

I have simulated a slow nginx server with

tc qdisc add dev eth0 root netem delay 10000ms

And this successfully triggers the timeout... but i'm not seeing the error... the correct "CheckHttp CRITICAL: Request timed out" is returned.

I'm not sure if the endpoint being slow to respond is the real issue... It might be worth parallelizing this test into thousands of requests to see if any come back with the error.

@majormoses
Copy link
Member

hmm, unfortunately it's probably gonna be pretty hard to troubleshoot this. When I have some time I can try setting up a load test against some endpoint and see if I can replicate.

@dkorel-copperleaf
Copy link

dkorel-copperleaf commented Oct 10, 2017

I've been able to reproduce this, running the latest version of the check standalone against any non-responsive endpoint. The funny thing is, when i increase the timeout it seems to return properly.

Regular:

$ /opt/sensu/embedded/bin/ruby -v /tmp/check-http.rb -u http://8.8.8.8
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]

# wait 15 seconds...

Check failed to run: execution expired, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `write'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `output'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:29:in `block (2 levels) in <class:CLI>'", "/tmp/check-http.rb:242:in `rescue in run'", "/tmp/check-http.rb:237:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]

200 second timeout:

$ /opt/sensu/embedded/bin/ruby -v /tmp/check-http.rb -u http://8.8.8.8 -t 200
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]

# wait 200 seconds...

$ /opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:13: warning: instance variable @check_name not initialized
CheckHttp CRITICAL: Request error: Failed to open TCP connection to 8.8.8.8:80 (Connection timed out - connect(2) for "8.8.8.8" port 80)

@majormoses
Copy link
Member

majormoses commented Oct 11, 2017

Hmm that is indeed odd, can you re-produce this without using the embedded sensu ruby?

I tried this locally and got what I would have expected:

$ bundle exec ./bin/check-http.rb -u http://8.8.8.8 -t 15
CheckHttp CRITICAL: Request timed out

@danktec
Copy link
Author

danktec commented Oct 12, 2017

I installed bundler on a fresh system, and after managing to resolve dependencies, It seemed to work fine... so i guess it must be something with the ruby installation on the system

@majormoses
Copy link
Member

Interesting...

@johanek
Copy link
Contributor

johanek commented Jan 2, 2019

This error could occur if DNS resolution fails. Ruby Timeout.timeout() expires and obscures the reason for the failure. With default resolver config on Linux (timeout 5s, 2 attempts) and the default max_retries of 1 in Net::HTTP (Not configurable in ruby <2.5.0) you end up making 4 requests, so you need a timeout setting of >20seconds to get a name resolution error.

Is the Timeout.timeout() call at https://github.com/sensu-plugins/sensu-plugins-http/blob/master/bin/check-http.rb#L254 needed? All the Net::HTTP timeouts are being explicitly set, I think that all the "correct" Request timed out errors are coming from that.

@johanek
Copy link
Contributor

johanek commented Jan 2, 2019

Even without using Timeout.timeout(), a slow DNS failure will be caught by Net::HTTP's open_timeout. I'm going to raise a PR to better capture the Net::HTTP errors to at least give a clue as to where the issue may be.

But eventually it's down to the user's DNS timeout settings if name resolution failures will be visible in the check output. (options timeout:1 in resolv.conf)

@majormoses
Copy link
Member

I responded to the PR, I don't think we need it but removing it is a breaking change as it acts as a high level circuit breaker and supersedes the HTTP timeouts which are all currently configured to be the same value. There are some paths forward but all roads I can think of lead to a breaking change. See the linked PR for more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants