Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP lookup source max retries #90

Closed
wants to merge 3 commits into from
Closed

Conversation

Raz0r
Copy link

@Raz0r Raz0r commented Apr 23, 2024

Description

Lookup queries are now retried in case of IOException up to gid.connector.http.source.lookup.max-retries with a delay of gid.connector.http.source.lookup.request.retry-timeout-ms between retries. The default values are 3 retries and 1 second delay.

Resolves #66

PR Checklist

@@ -27,6 +27,10 @@
@Slf4j
public class JavaNetHttpPollingClient implements PollingClient<RowData> {

public static final String DEFAULT_REQUEST_MAX_RETRIES = "3";

public static final String DEFAULT_REQUEST_RETRY_TIMEOUT_MS = "1000";
Copy link
Contributor

@davidradl davidradl May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest calling this DEFAULT_REQUEST_RETRY_INTERVAL_MS as it is not a timeout. Also this name should mention lookup or polling client - so that it is obvious it does not apply to the sink client.

Same naming consideration for max retries.

I assume we will add something similar for the sink client

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Raz0r could you apply proposed changes?

@@ -27,6 +27,10 @@
@Slf4j
public class JavaNetHttpPollingClient implements PollingClient<RowData> {

public static final String DEFAULT_REQUEST_MAX_RETRIES = "3";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not be an int?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend keep all properties as Strings, becase they are passed as Properties<String,String> from Table factories.

Later we use them like this:

this.httpRequestMaxRetries = Integer.parseInt(
            options.getProperties().getProperty(
                HttpConnectorConfigConstants.LOOKUP_HTTP_MAX_RETRIES,
                DEFAULT_REQUEST_MAX_RETRIES
            )
        );

@@ -74,15 +96,43 @@ public Optional<RowData> pull(RowData lookupRow) {
}
}

// TODO Add Retry Policy And configure TimeOut from properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent of this comment is to make use of the Flink Retry strategies, which includes async.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that AsyncSink uses totally different, dedicated mechanism for retries => Flink native support.
And actually there is separate TODO for this in HttpSinkWriter.

What kind of Flink retry strategies you were here @davidradl ?

Copy link
Contributor

@davidradl davidradl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made some comments - I think we should have consistent retry mechanisms for sink sourcs async synch. There are Retry strategies in Flink e.g. AsyncRetryStrategies. I would hope we could make an IOException bubble up so that the retry strategies would retry .

In Flink I see existing similar options to those you are suggesting:

@JsonTypeName("RetryLookupOptions")
 public static class RetryLookupOptions {
     public static final String FIELD_NAME_RETRY_PREDICATE = "retry-predicate";
     public static final String FIELD_NAME_RETRY_STRATEGY = "retry-strategy";
     public static final String FIELD_NAME_RETRY_FIXED_DELAY = "fixed-delay";
     public static final String FIELD_NAME_RETRY_MAX_ATTEMPTS = "max-attempts";


@Raz0r
Copy link
Author

Raz0r commented May 6, 2024

Hey @davidradl thanks, didn't know about existing AsyncRetryStrategies, I will need some time to figure out how to use it correctly with making IOException bubble up.

@davidradl
Copy link
Contributor

davidradl commented May 8, 2024

@Raz0r See #94. As part of introducing caching there is a max-retries option that can be used

@kristoffSC kristoffSC changed the title HTTP source max retries HTTP lookup source max retries Jun 7, 2024
@Raz0r
Copy link
Author

Raz0r commented Oct 25, 2024

Closing this in favor of #129 and #120

@Raz0r Raz0r closed this Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP connect timed out error
3 participants