Question regarding handling of HTTP 429 with reactive stream #2104
Comments
The scroll handling is implemented in DefaultReactiveElasticsearchClient; if you want to implement this in a different way, you might want to derive from that class. Changing the scroll function so that it retries by default from where the flux broke would change the default behaviour/contract of a Flux.
Actually, up to now nobody has reported this. By the way, it seems strange to me that the cluster throttles a client for too many requests when doing a scrolled search.
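To make that contract concrete, here is a minimal, self-contained Reactor sketch (not from this thread; the source, the counter and the simulated 429 are made up for illustration): a retry operator resubscribes to the source Flux from the start, so elements that were already emitted are replayed.

```kotlin
import reactor.core.publisher.Flux
import reactor.util.retry.Retry
import java.time.Duration
import java.util.concurrent.atomic.AtomicInteger

fun main() {
    val subscriptions = AtomicInteger()

    // A stand-in source that fails on its third element during the first subscription only.
    val source = Flux.defer {
        val attempt = subscriptions.incrementAndGet()
        println("subscription #$attempt")
        Flux.range(1, 5).map { i ->
            if (i == 3 && attempt == 1) throw IllegalStateException("simulated 429")
            i
        }
    }

    // retryWhen resubscribes to the whole source: elements 1 and 2 are emitted again.
    source
        .retryWhen(Retry.backoff(3, Duration.ofMillis(100)))
        .doOnNext { println("got $it") }
        .blockLast()
}
```

Running this prints "got 1" and "got 2" twice, because the retry starts a completely new subscription rather than resuming after the failed element.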
Hi, it was not my intention to do a retry by default, because there may be other reasons for an error, like date parser exceptions that happened on some dates with 000 milliseconds (weird). The original idea was to separate those errors and only retry the ones related to HTTP that actually have a good chance of going right the next time.
Hi, sadly I was not able to really subclass it, because the Kotlin -> Java interop doesn't work that well there and I had to copy over nearly everything. In the end I made a new version, branched off 4.3.0, and implemented my logic directly in DefaultReactiveElasticsearchClient. Right now it's more or less quick and dirty and I will review it with my colleagues, but I already tried it in prod on a 5 million row report, and although we had more than 10 retries there were no broken streams, and each retry only happened for the one call that was made to the backend system. I think changing the method in which the webClient does the call, or a method that evaluates the result, could be even more beneficial, because it could be used by more than the scroll (we also got HTTP 429 for search and bulk). https://aws.amazon.com/de/premiumsupport/knowledge-center/opensearch-resolve-429-error/ There is also similar guidance from Elastic. I really would have liked the Elasticsearch client to be able to handle that automatically; maybe we even need to use the Elasticsearch client more directly in the future. You said you will use the Elasticsearch client library in the future, is this already part of v5?
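The commenter's actual code is not reproduced in this thread. As a rough, hypothetical sketch of the idea (retrying only the single failing request instead of the whole Flux), one could wrap each per-request Mono in a filtered backoff retry. The helper name and the assumption that the 429 surfaces as `WebClientResponseException.TooManyRequests` are illustrative only and depend on the client's error mapping.

```kotlin
import org.springframework.web.reactive.function.client.WebClientResponseException
import reactor.core.publisher.Mono
import reactor.util.retry.Retry
import java.time.Duration

// Hypothetical helper: wraps a single reactive call (for example one scroll request)
// in a backoff retry that only triggers on HTTP 429. Whether a 429 really surfaces as
// WebClientResponseException.TooManyRequests depends on how the client maps errors.
fun <T> retryOnTooManyRequests(call: Mono<T>): Mono<T> =
    call.retryWhen(
        Retry.backoff(10, Duration.ofSeconds(1))
            .maxBackoff(Duration.ofSeconds(20))
            .filter { it is WebClientResponseException.TooManyRequests }
    )
```

Because the retry is applied to the per-request Mono rather than to the outer Flux, only that one call is re-executed and already streamed results are not replayed.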
Thanks for the info and that code. The new reactive implementation will be in version 5, and hopefully already as an optional alternative in 4.4. But it is not yet complete; there are some bugs that need to be fixed in the new client and in Elasticsearch itself. This next reactive implementation will not be using the current WebClient-based client.
This change is sufficient to make the scroll work, and it will probably fix all the other problems as well. We had a problem in the beginning where we thought that the magic findBy methods were broken.
Hi,
we are using spring-data-elasticsearch version 4.3.0. Our backend is OpenSearch v1. Whenever we do too many requests, OpenSearch sends HTTP 429 to throttle the client and to tell us that it was too much. Our current approach is to retry, so we have something like this:
```kotlin
repository.mySearchFunctionWhichReturnsFluxOfEntity()
    .retryExponentialBackoff(20, Duration.ofSeconds(1), Duration.ofSeconds(20), false) {
        log.debug("I retried")
    }
    .doOnNext {
        // do something with the entity, in our case: add it to a report class
        // to be written to the file system
    }
```
Now ... my expectation was that the retry would only retry the failed scroll call, but it doesn't: it restarts the whole flux.
So instead of having 5M rows in our report we now have 25M rows, because it restarted 18 times, and the creation of the report took 7h instead of 45min.
I really don't want to use the Elasticsearch client library directly just to get full control over my search scroll and handle all of the retriable errors before items are added to the Flux, so I would like to know how this can be solved with the tools that spring-data-elasticsearch provides, or by configuring the underlying code.
I thought that this must occur quite often, so I think there must be a way to not restart the stream, or at least to restart it where it broke.
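For comparison with the snippet above, here is a hedged sketch of the same retry expressed with Reactor's core `Retry` API, narrowed to throttling errors. `isThrottlingError`, `withReportRetry` and the way the 429 is detected are assumptions, not part of spring-data-elasticsearch, and this still resubscribes to the whole Flux on each retry; it only narrows when a retry happens.

```kotlin
import org.slf4j.LoggerFactory
import reactor.core.publisher.Flux
import reactor.util.retry.Retry
import java.time.Duration

private val log = LoggerFactory.getLogger("ReportJob")

// Hypothetical predicate: how the 429 surfaces (exception type, status code, message)
// depends on the client and its error mapping, so this check is only illustrative.
fun isThrottlingError(ex: Throwable): Boolean = ex.message?.contains("429") == true

// Same intent as the retry in the issue, using Reactor's core Retry API with a filter.
// Note: on retry this still resubscribes to the whole Flux, i.e. the full search restarts.
fun <T> withReportRetry(results: Flux<T>): Flux<T> =
    results.retryWhen(
        Retry.backoff(20, Duration.ofSeconds(1))
            .maxBackoff(Duration.ofSeconds(20))
            .filter { ex -> isThrottlingError(ex) }
            .doBeforeRetry { signal -> log.debug("I retried, attempt {}", signal.totalRetries()) }
    )
```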