-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Request): add requestHandlerTimeoutSecs
and requestTimeoutSecs
options
#1560
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs tests.
// A request can have some request-specific timeouts defined. We check if they have been defined, and if so, | ||
// use those timeouts instead. However; if they aren't present, the defaults will be used. | ||
const requestTimeoutMillis = request?.requestTimeoutMillis ?? this.internalTimeoutMillis; | ||
const requestHandlerTimeoutMillis = request?.requestHandlerTimeoutMillis ?? this.requestHandlerTimeoutMillis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not
request.requestHandlerTimeoutSecs ??= this.requestHandlerTimeoutMillis / 1000;
[...]
await this._timeoutAndRetry(
() => source.markRequestHandled(request!),
request.requestHandlerTimeoutSecs * 1000,
`Marking request ${request.url} (${request.id}) as handled timed out after ${requestTimeoutMillis / 1e3} seconds.`,
);
This way we don't need those helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes true, nullish coalescing reassignment. I rarely use it, but this is definitely a good use-case
This is not done. Going to work on a better solution other than reassigning these values all over the place - it gets messy. These new request options need to be handled in |
If we want to persist the information (e.g. to make this work on platform even after migration), we should use the same approach as with
Isnt this all about basic crawler, the rest builds on top of it? I can imagine some protected method there (so we can override in extension classes) to calculate timeout for a given request instance, like |
@B4nan What about making it look like this: router.addHandler('label-a', async (ctx) => {
ctx.log.info('...');
}, { timeoutSecs: 5 }); |
@B4nan The problem (in my opinion) with the idea Szymon is proposing is that it would only allow the user to specify label-specific |
I would rather support something like this, feels more natural (and having options parameter after a callback one is a weird DX): router.addHandler('label-a', async ({ request, log }) => {
request.timeoutSecs = 5;
log.info('...');
}); (and this means those options would be part of But we can support both ways in the end if it makes sense. Keep mind mind that a request is labeled, so this above is also "label specific timeout", I dont see a difference with the Szymon's version in this regard. |
It's different because you could do something like this: await crawler.run([{ url: 'foo', label: 'TEST', timeoutSecs: 5 }, { url: 'foo', label: 'TEST', timeoutSecs: 10 }]) Same label, but different |
Yes, and one of that would have priority, two ways to do the same. (not that the proposal is about |
Yeah not talking specifically about |
Right, I would definitely do the request options first, then we can think about the router API. The initial goal here to me was to allow modifying this e.g. via router middleware - there you can label stuff as well as modify the timeouts, before the corrent request handler is picked and executed. |
This feature requires not only changes in the |
You lost me at OOP mess :D |
closes #1485
Adds
requestHandlerTimeoutSecs
andrequestTimeoutSecs
options to a request to allow for request-specific timeouts.If these optional parameters aren't provided in
RequestOptions
, the crawler will fallback to the timeouts provided in its configuration.