
fix: use total_seconds instead of seconds in the datetime #46

Merged (1 commit, Sep 20, 2024)

Conversation

kan-fu
Collaborator

@kan-fu kan-fu commented Sep 17, 2024

Using seconds is clearly a bug. Running

import datetime
print(datetime.timedelta(days=1, seconds=13599).seconds)
print(datetime.timedelta(days=1, seconds=13599).total_seconds())

will print 13599 and 99999.0 respectively. We want 99999 in our case.
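For context, the same pitfall shows up when timing a long-running operation by subtracting two datetimes (a minimal sketch; the variable names are illustrative, not from the PR):

```python
import datetime

start = datetime.datetime(2024, 9, 17, 0, 0, 0)
end = start + datetime.timedelta(days=1, seconds=13599)
elapsed = end - start

# .seconds only holds the time-of-day component (0..86399),
# silently dropping whole days:
print(elapsed.seconds)          # 13599
# .total_seconds() returns the full duration:
print(elapsed.total_seconds())  # 99999.0
```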

Also added more messages to the log. Now it looks like this:

Data quantity is greater than the row limit and will be downloaded in multiple pages.
Downloading time for the first page: 8 seconds
Estimated approx. 6 pages in total.
Estimated approx. 40 seconds to complete for the rest of the pages.

(50000 samples) Downloading page 2...
(100000 samples) Downloading page 3...
(150000 samples) Downloading page 4...
(200000 samples) Downloading page 5...
(250000 samples) Downloading page 6...
(259201 samples) Completed in 40 seconds.

The estimate is based on the response time for downloading the first page. The log makes it clear that the estimated 40 seconds is calculated as 8 seconds/page * (6 - 1) pages.
The estimated page count will still be inaccurate if the specified time span includes a date range that has no data. This is unavoidable, because it cannot be known before the request is made. I ran into this when running the code below in QA; this device has no data after 2019-11-27 in QA.

from onc import ONC

onc = ONC("XXX", production=False)
onc.getDirectByDevice(
    {
        "deviceCode": "BPR-Folger-59",
        "dateFrom": "2019-11-26T00:00:00.000Z",
        "dateTo": "2019-11-30T00:00:00.000Z",
        "rowLimit": 50000,
    },
    allPages=True,
)

Data quantity is greater than the row limit and will be downloaded in multiple pages.
Downloading time for the first page: 8 seconds
Estimated approx. 7 pages in total.
Estimated approx. 48 seconds to complete for the rest of the pages.

(50000 samples) Downloading page 2...
(86400 samples) Completed in 15 seconds.
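The mismatch between the 7-page estimate and the 2-page reality is consistent with the empty tail of the range, assuming this device reports roughly one sample per second (my assumption for illustration, not confirmed in the PR):

```python
import math

row_limit = 50000
# Requested span: 2019-11-26 to 2019-11-30, i.e. 4 days at ~1 sample/s:
span_rows = 4 * 86400

# Estimate derived from the full requested span:
estimated_pages = math.ceil(span_rows / row_limit)   # 7
# Only 2019-11-26 actually has data in QA (one day of samples):
actual_rows = 86400
actual_pages = math.ceil(actual_rows / row_limit)    # 2
print(estimated_pages, actual_pages)                 # 7 2
```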

The test failures in Actions are caused by the multiple-page bug on the server. It will be fixed in the next minor release on the server side.

@kan-fu kan-fu merged commit 476841f into main Sep 20, 2024
6 checks passed
@kan-fu kan-fu deleted the issue-45-fix-time-estimation branch September 20, 2024 17:21
Successfully merging this pull request may close these issues.

Estimation time when downloading scalardata in multiple pages is not near the actual case sometimes