
Max retries per client #535

Open
s4ke opened this issue Sep 19, 2024 · 1 comment

s4ke commented Sep 19, 2024

TL;DR

I just stumbled over this because of an issue with the Docker volume plugin we use while Docker Swarm's CSI implementation matures: costela/docker-volume-hetzner#53.

I think this is great, but what we need is the ability to enforce a maximum number of retries with exponential backoff per node rather than per request, so that we don't end up in a situation where a node keeps retrying over and over and eventually gets stuck.

I might be missing some info, but I don't think the retry logic that was recently added is enough here. We really need a mechanism that rejects any further requests (returning an error would be fine) for a while, until the rate limit has recovered.

Expected behavior

In the case of the docker volume plugin at https://github.com/costela/docker-volume-hetzner, we would need support from hcloud-go to give us a way to back off "globally", so that we don't run into situations where a node can't recover from being rate limited. A rough sketch of what I mean follows below.
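To make this concrete, the kind of "global" back off I have in mind looks roughly like the sketch below (app-level Go; the gate type and all names are hypothetical and not part of hcloud-go):

```go
package backoff

import (
	"errors"
	"sync"
	"time"
)

// ErrBackingOff is returned while the gate is closed.
var ErrBackingOff = errors.New("rate limited: rejecting requests until back off expires")

// Gate is a hypothetical client-wide guard: once tripped, every request
// is rejected with an error until the deadline passes, instead of each
// request retrying on its own.
type Gate struct {
	mu    sync.Mutex
	until time.Time
}

// Allow reports whether a request may currently be sent.
func (g *Gate) Allow() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if time.Now().Before(g.until) {
		return ErrBackingOff
	}
	return nil
}

// Trip closes the gate for d, e.g. after the API signalled a rate limit.
func (g *Gate) Trip(d time.Duration) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if deadline := time.Now().Add(d); deadline.After(g.until) {
		g.until = deadline
	}
}
```

Every request the plugin makes would call Allow first and call Trip whenever a rate limit error comes back, so the node as a whole stops hammering the API rather than backing off per request.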

jooola (Member) commented Oct 9, 2024

> I think this is great, but what we need is the ability to enforce a maximum number of retries with exponential backoff per node rather than per request, so that we don't end up in a situation where a node keeps retrying over and over and eventually gets stuck.

By "node", are you referring to a server? I am not sure such a high-level retry mechanism should be owned by the API client. API errors should be handled by your app, where the retry decision is yours.

The retry mechanism we implement focuses on transport failures (TCP) and short API outages (HTTP). We did add a retry on rate limit errors, to be more resilient against rate limiting, with an exponential backoff sleep time between retries.

> We really need a mechanism that rejects any further requests (returning an error would be fine) for a while, until the rate limit has recovered.

This is somewhat handled by the exponential backoff sleep time between retries. You should be able to tweak the retry settings yourself using https://pkg.go.dev/github.com/hetznercloud/hcloud-go/v2/hcloud#WithRetryOpts.
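For illustration, tweaking the retry settings could look roughly like this. This is only a sketch: WithRetryOpts is taken from the docs linked above, but the RetryOpts field names (BackoffFunc, MaxRetries) are my assumption and should be verified against pkg.go.dev:

```go
package main

import (
	"os"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	client := hcloud.NewClient(
		hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")),
		// Assumed field names, check pkg.go.dev before relying on them.
		hcloud.WithRetryOpts(hcloud.RetryOpts{
			// Double the sleep time between retries, starting at 1s.
			BackoffFunc: hcloud.ExponentialBackoff(2, time.Second),
			// Give up (and return the error) after 5 retries.
			MaxRetries: 5,
		}),
	)
	_ = client // use the client as usual; retries happen transparently
}
```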

Note that you can also use a cancellable context if you want to stop retrying after a certain amount of time. For example:
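```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	client := hcloud.NewClient(hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")))

	// Cancel the request, including any retry sleeps, after 30 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	servers, err := client.Server.All(ctx)
	if err != nil {
		// The error wraps context.DeadlineExceeded if the deadline was
		// hit while the client was still retrying.
		log.Fatal(err)
	}
	log.Printf("found %d servers", len(servers))
}
```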


Allowing users to configure the retry policy (e.g. whether to retry on rate limit errors) is something we have been considering, but we chose to postpone it until we have gathered more data on how to best handle users' use cases.

Could you update the issue title to better reflect your feature request?

@jooola jooola self-assigned this Oct 9, 2024