Context
When multiple hosts are provided to a client and a TransportError occurs, e.g. Faraday::ConnectionFailed, the request is passed on to be tried on the next available host. When all hosts have been tried, but there are more retry attempts remaining, all hosts are revived. This behavior seems pretty straightforward.
Problem
However, when the retry_on_status option is provided to the client, along with multiple hosts, all retries are attempted against the erroring host, and secondary hosts are never queried. In my understanding, this is because retry is called immediately, before the host connection can be killed.
Besides being somewhat unexpected, there is an added wrinkle: the retry count is scaled up for multi-host connections regardless of whether all of those connections are actually used. So with 2 hosts on a client and a retry_on_failure value of 3, a transport error is retried 6 times in total, alternating between host 1 and host 2, but for a status listed in retry_on_status, host 1 alone receives all 6 retries (7 requests in total) before the client gives up.
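To make the difference concrete, here is a toy model of the two retry paths as I understand them. It is only a sketch: ToyPool and perform_with_retries are hypothetical names that do not exist in the library, every request is assumed to fail, and the scaling of the retry count by the number of hosts is taken from the behavior described above rather than from the library source.

```ruby
# Toy model only: ToyPool and perform_with_retries are hypothetical names,
# not part of elasticsearch-transport.
class ToyPool
  def initialize(hosts)
    @hosts = hosts
    @dead = []
  end

  # Return the first host not marked dead; revive all hosts once every one is dead.
  def pick
    @dead.clear if @dead.size == @hosts.size
    (@hosts - @dead).first
  end

  def kill(host)
    @dead << host unless @dead.include?(host)
  end
end

# failure is :connection (host is marked dead before the retry)
# or :status (retry happens immediately, host is never marked dead).
def perform_with_retries(hosts, retry_on_failure:, failure:)
  pool = ToyPool.new(hosts)
  max_retries = retry_on_failure * hosts.size # assumed scaling, as described in this issue
  attempts = []
  tries = 0
  begin
    tries += 1
    host = pool.pick
    attempts << host
    raise "#{failure} failure on #{host}" # every request fails in this model
  rescue RuntimeError
    pool.kill(host) if failure == :connection
    retry if tries <= max_retries
  end
  attempts
end

hosts = ["foobar-us-east-1", "bazbat-us-east-2"]
connection_attempts = perform_with_retries(hosts, retry_on_failure: 3, failure: :connection)
status_attempts = perform_with_retries(hosts, retry_on_failure: 3, failure: :status)

puts "connection failures: #{connection_attempts.size} attempts across #{connection_attempts.uniq.inspect}"
puts "status failures:     #{status_attempts.size} attempts across #{status_attempts.uniq.inspect}"
```

In this model both runs make the same number of attempts, but only the connection-failure run ever reaches the second host, because the status path retries before the host is marked dead.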
Version
We are still way back on version 5.0.4, but this behavior appears to be the same all the way through 7.7.0.
Example
Here are logs from a client with two hosts, ["foobar-us-east-1", "bazbat-us-east-2"], retry_on_failure: 3, retry_on_status: [503]
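For reference, a client with these settings could be set up roughly as follows. This is a sketch rather than our exact code: the host names are the placeholders seen in the logs, and the commented-out lines assume the elasticsearch gem is installed.

```ruby
# Options as described in this issue; the host names are placeholders.
client_options = {
  hosts: ["foobar-us-east-1", "bazbat-us-east-2"],
  retry_on_failure: 3,
  retry_on_status: [503]
}

# With the elasticsearch gem installed, the client would be built roughly like this:
#   require "elasticsearch"
#   client = Elasticsearch::Client.new(client_options)
#   client.search(q: "*")

# Number of requests seen in the logs: retries scaled by host count, plus the first request.
expected_attempts = client_options[:retry_on_failure] * client_options[:hosts].size + 1
puts expected_attempts
```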
Expected behavior, and the current behavior of TransportErrors:
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 1 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 2 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 3 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 4 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 5 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 6 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 7 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"FATAL","msg":"[Faraday::ConnectionFailed] Cannot connect to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"} after 7 tries"}
Note how attempts alternate between foobar-us-east-1 and bazbat-us-east-2.
Current behavior of retry_on_status errors:
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 1 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 2 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 3 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 4 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 5 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 6 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 7 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"FATAL","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Cannot get response from http://foobar-us-east-1:9200/_search after 7 tries"}
Note how attempts are only made to foobar-us-east-1, and there are (retry_on_failure * number of hosts) + 1 of them, i.e. 3 * 2 + 1 = 7.