
Max retries per client #535

Open
s4ke opened this issue Sep 19, 2024 · 1 comment

s4ke commented Sep 19, 2024

TL;DR

I just stumbled over this because of an issue with the Docker volume plugin we use while Docker Swarm's CSI implementation matures: costela/docker-volume-hetzner#53.

I think this is great, but what we need is the ability to enforce a maximum number of retries with exponential backoff per node rather than per request, so that we don't end up in a situation where a node keeps retrying over and over and eventually gets stuck.

I might be missing some info, but I don't think the retry logic that was recently added is enough here. We really need a mechanism that rejects any further requests (returning an error would be fine) for a while, until the rate limit has recovered.

Expected behavior

In the case of the docker volume plugin at https://github.com/costela/docker-volume-hetzner, we would need support from hcloud-go to give us a way to back off "globally", so that we don't run into situations where a node can't recover from being rate limited. A rough sketch of what I mean follows below.
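To make this concrete, the kind of "global" back off I have in mind looks roughly like the sketch below (app-level Go; the gate type and all names are hypothetical and not part of hcloud-go):

```go
package backoff

import (
	"errors"
	"sync"
	"time"
)

// ErrBackingOff is returned while the gate is closed.
var ErrBackingOff = errors.New("rate limited: rejecting requests until back off expires")

// Gate is a hypothetical client-wide guard: once tripped, every request
// is rejected with an error until the deadline passes, instead of each
// request retrying on its own.
type Gate struct {
	mu    sync.Mutex
	until time.Time
}

// Allow reports whether a request may currently be sent.
func (g *Gate) Allow() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if time.Now().Before(g.until) {
		return ErrBackingOff
	}
	return nil
}

// Trip closes the gate for d, e.g. after the API signalled a rate limit.
func (g *Gate) Trip(d time.Duration) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if deadline := time.Now().Add(d); deadline.After(g.until) {
		g.until = deadline
	}
}
```

Every request the plugin makes would call Allow first and call Trip whenever a rate limit error comes back, so the node as a whole stops hammering the API rather than backing off per request.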

jooola (Member) commented Oct 9, 2024

> I think this is great, but what we need is the ability to enforce a maximum number of retries with exponential backoff per node rather than per request, so that we don't end up in a situation where a node keeps retrying over and over and eventually gets stuck.

By "node", are you referring to a server? I am not sure such a high-level retry mechanism should be owned by the API client. API errors should be handled by your app, where the retry decision is yours.

The retry mechanism we implement focuses on transport failures (TCP) and short API outages (HTTP). We did add a retry on rate limit errors, to be more resilient against rate limiting, with an exponential backoff sleep time between retries.

> We really need a mechanism that rejects any further requests (returning an error would be fine) for a while, until the rate limit has recovered.

This is somewhat handled by the exponential backoff sleep time between retries. You should be able to tweak the retry settings yourself using https://pkg.go.dev/github.com/hetznercloud/hcloud-go/v2/hcloud#WithRetryOpts.
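For illustration, tweaking the retry settings could look roughly like this. This is only a sketch: WithRetryOpts is taken from the docs linked above, but the RetryOpts field names (BackoffFunc, MaxRetries) are my assumption and should be verified against pkg.go.dev:

```go
package main

import (
	"os"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	client := hcloud.NewClient(
		hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")),
		// Assumed field names, check pkg.go.dev before relying on them.
		hcloud.WithRetryOpts(hcloud.RetryOpts{
			// Double the sleep time between retries, starting at 1s.
			BackoffFunc: hcloud.ExponentialBackoff(2, time.Second),
			// Give up (and return the error) after 5 retries.
			MaxRetries: 5,
		}),
	)
	_ = client // use the client as usual; retries happen transparently
}
```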

Note that you can also use a cancellable context if you want to stop retrying after a certain amount of time. For example:
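```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/hetznercloud/hcloud-go/v2/hcloud"
)

func main() {
	client := hcloud.NewClient(hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")))

	// Cancel the request, including any retry sleeps, after 30 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	servers, err := client.Server.All(ctx)
	if err != nil {
		// The error wraps context.DeadlineExceeded if the deadline was
		// hit while the client was still retrying.
		log.Fatal(err)
	}
	log.Printf("found %d servers", len(servers))
}
```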


Allowing users to configure the retry policy (e.g. whether to retry on rate limit errors) is something we have been considering, but we chose to postpone it until we have gathered more data on how to best handle users' use cases.

Could you update the issue title to better reflect your feature request?

@jooola jooola self-assigned this Oct 9, 2024