I just stumbled over this because of an issue with the Docker volume plugin we use while the CSI implementation of Docker Swarm matures: costela/docker-volume-hetzner#53.
I think this is great, but what we need is the ability to enforce a maximum retry count with exponential back-off per node, not per request, so that we don't run into a situation where a node gets stuck retrying again and again indefinitely.
I might be missing some info, but I don't think the retry logic that was recently added is enough here. We'd need a mechanism that rejects any requests (returning an error would be fine) for a while, until the rate limit has recovered.
Expected behavior
For the Docker volume plugin at https://github.com/costela/docker-volume-hetzner, we would need support from the hcloud-go library for backing off "globally", so that we don't run into situations where a node can't recover from being rate limited; a sketch of what we have in mind follows.
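As a rough illustration of what "globally" could mean here: a process-wide gate that every caller checks before issuing a request, and that any caller trips after hitting a rate-limit error. The `Gate` type and its methods are hypothetical, invented for this example; nothing like it exists in hcloud-go or the plugin today:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Gate is a hypothetical process-wide guard (not part of hcloud-go): every
// caller checks Allow before issuing an API request, and any caller that
// receives a rate-limit error calls Trip so that all callers back off.
type Gate struct {
	mu    sync.Mutex
	until time.Time
}

// Allow fails fast while a back-off window is active.
func (g *Gate) Allow() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if wait := time.Until(g.until); wait > 0 {
		return fmt.Errorf("rate limited, retry in %s", wait.Round(time.Second))
	}
	return nil
}

// Trip opens (or extends) a back-off window of duration d for all callers.
func (g *Gate) Trip(d time.Duration) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if until := time.Now().Add(d); until.After(g.until) {
		g.until = until
	}
}

func main() {
	var g Gate
	g.Trip(2 * time.Second)
	fmt.Println(g.Allow()) // rate limited, retry in 2s
}
```

The point is that one goroutine hitting the rate limit would trip the gate, and every other caller fails fast until the window passes, instead of each piling up its own retries.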
> I think this is great, but what we need is the ability to enforce a maximum retry count with exponential back-off per node, not per request, so that we don't run into a situation where a node gets stuck retrying again and again indefinitely.
By "node", are you referring to a server? I am not sure such a high-level retry mechanism should be owned by the API client. API errors should be handled by your app, where the retry decision is yours.
The retry mechanism we implement focuses on transport failures (TCP) and small API outages (HTTP). We did add a retry on rate limit errors, to be more resilient in case of rate limiting, by using an exponential back-off sleep time between retries; roughly along the lines of the sketch below.
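For illustration, exponential back-off means the sleep grows as `base * 2^attempt` between attempts. A minimal sketch, assuming a hypothetical `withBackoff` helper and an `errRateLimited` stand-in; hcloud-go's actual retry implementation, error types, and parameters may differ:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRateLimited is a stand-in for an API rate-limit error; hcloud-go
// signals rate limits through its own error codes.
var errRateLimited = errors.New("rate limited")

// withBackoff retries fn, sleeping base * 2^attempt between attempts.
// Illustrative only; not hcloud-go's actual policy or parameters.
func withBackoff(fn func() error, maxRetries int, base time.Duration) error {
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err = fn(); !errors.Is(err, errRateLimited) {
			return err // success, or an error that should not be retried
		}
		time.Sleep(base * time.Duration(1<<attempt))
	}
	return err
}

func main() {
	calls := 0
	err := withBackoff(func() error {
		calls++
		if calls < 3 {
			return errRateLimited // first two attempts are rate limited
		}
		return nil
	}, 5, 100*time.Millisecond)
	fmt.Println(calls, err) // 3 <nil>
}
```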
> We'd need a mechanism that rejects any requests (returning an error would be fine) for a while, until the rate limit has recovered.
Note that you can also use a cancellable context if you want to stop retrying after a certain amount of time.
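For example, `context.WithTimeout` bounds the total time a call may spend, internal retries included. A minimal sketch assuming hcloud-go v2; the token env var and server ID are placeholders:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	client := hcloud.NewClient(hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")))

	// Cancel the context after 30 seconds: once it expires, the client
	// stops retrying and the call returns with the context's error.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	server, _, err := client.Server.GetByID(ctx, 12345)
	if err != nil {
		log.Fatal(err)
	}
	if server == nil {
		log.Fatal("server not found")
	}
	log.Printf("server: %s", server.Name)
}
```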
Allowing users to configure the retry policy (e.g. whether to retry on rate limit or not) is something we were considering, but we chose to postpone it so we can gather more data on how to best handle users' use cases.
Could you update the issue title to better reflect your feature request?