Downloading datasets from public servers fails after some time #409

Open
hagenw opened this issue May 14, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@hagenw
Member

hagenw commented May 14, 2024

This was first reported in #389 (comment)

When downloading a dataset with anonymous access to Artifactory, the download fails after some time:

>>> import audb
>>> audb.__version__
'1.7.0'
>>> import audbackend
>>> audbackend.backend.Artifactory.get_authentication("https://artifactory.audeering.com/artifactory")
('anonymous', '')
>>> db = audb.load('cough-speech-sneeze', format='wav', verbose=True)
...
ConnectionError: HTTPSConnectionPool(host='audeering.jfrog.io', port=443): Max retries exceeded with url: /artifactory/api/storage/data-public/cough-speech-sneeze/media/42324baf-fe27-7828-f7bb-cbdf688aa80a/2.0.1/42324baf-fe27-7828-f7bb-cbdf688aa80a-2.0.1.zip (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x750d9b3eaad0>: Failed to resolve 'audeering.jfrog.io' ([Errno -3] Temporary failure in name resolution)"))
...

When I run the same download with my own credentials for Artifactory, it fails after the same amount of time, but with a different error message:

HTTPError: 403 Client Error: Forbidden for url: https://jfrog-prod-euc1-shared-frankfurt-main.s3.amazonaws.com/aol-a0jkltxoy0gz0/filestore/95/95ed792104abf9dd0acdccce9c958d38838abffb?X-Artifactory-username=hwierstorf&X-Artifactory-repoType=local&X-Artifactory-repositoryKey=data-public&X-Artifactory-packageType=maven&X-Artifactory-artifactPath=cough-speech-sneeze%2Fmedia%2F993cb076-fc28-c7c0-0ed8-255f17a1c064%2F2.0.1%2F993cb076-fc28-c7c0-0ed8-255f17a1c064-2.0.1.zip&X-Artifactory-projectKey=default&x-jf-traceId=5e89fb5fd1cc876c&response-content-disposition=attachment%3Bfilename...

When downloading the same or a larger dataset from our internal Artifactory server, the download does not fail.

When a failed download is restarted, it picks up where it left off and finishes. Using several workers also works in that case.
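Since a restarted download continues where it stopped, one possible stopgap is to restart the load call automatically. The following is only a minimal sketch: `load_with_restarts` and its parameters are hypothetical names that are not part of audb, and it assumes that simply calling the load function again is safe.

```python
import time


def load_with_restarts(load, max_attempts=5, wait_seconds=10.0, retry_on=(Exception,)):
    """Call load() again after a failure until it succeeds.

    Hypothetical helper, not part of audb.
    `load` is any zero-argument callable that performs the download.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load()
        except retry_on as error:
            if attempt == max_attempts:
                # Give up and surface the last error to the caller
                raise
            print(f"Attempt {attempt} failed ({error!r}), restarting ...")
            time.sleep(wait_seconds)


# Possible usage (assumes audb is installed and the server is reachable):
#
# import audb
# db = load_with_restarts(
#     lambda: audb.load('cough-speech-sneeze', format='wav', verbose=True)
# )
```

This only works around the symptom; it does not explain why the connection to the public server breaks in the first place.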

@hagenw hagenw added the bug Something isn't working label May 14, 2024
@hagenw
Member Author

hagenw commented May 14, 2024

@ChristianGeng any idea how we could track down what is wrong with the settings of the public Artifactory server, or if we need to change how we connect to it in order to avoid the error?

In general, it seems to me that we should try to find another solution for hosting public datasets.

@hagenw
Member Author

hagenw commented May 14, 2024

When re-running the same code, but using audb==1.6.5 and audbackend==1.0.2, the download does succeed.
But it also takes 30 minutes, instead of 10 minutes.
So the problem seems related to the changes we introduced in audeering/audbackend#222, where we use a requests.Session object to authenticate only once.

@ChristianGeng
Member

> When re-running the same code, but using audb==1.6.5 and audbackend==1.0.2, the download does succeed. But it also takes 30 minutes, instead of 10 minutes. So the problem seems related to the changes we introduced in audeering/audbackend#222, where we use a requests.Session object to authenticate only once.

Could it be an async issue, with _close getting called too early? requests.Session() is used with a context manager, so could it also be that the context manager closes the session on its own when the block exits?
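The context manager question can be checked in isolation. The sketch below is not audbackend's code; `TracingSession` is a hypothetical subclass that only counts calls to `close()`. It shows that leaving a `with` block calls `close()` on a `requests.Session`, without making any network request.

```python
import requests


class TracingSession(requests.Session):
    """Session that counts close() calls (for illustration only)."""

    def __init__(self):
        super().__init__()
        self.close_calls = 0

    def close(self):
        self.close_calls += 1
        super().close()


with TracingSession() as session:
    # Inside the block the session has not been closed yet
    calls_inside = session.close_calls

# Leaving the block triggers Session.__exit__, which calls close()
calls_after = session.close_calls

print(f"close() calls inside block: {calls_inside}, after block: {calls_after}")
```

If a session were created in one `with` block and then used by workers after that block has exited, they would be working with a session that was already closed. Whether that actually happens in audbackend is an assumption that would need to be verified against its code.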

@hagenw
Member Author

hagenw commented May 14, 2024

I have no clue. The only thing I can report is that those problems do not happen with our internal Artifactory server. So it must be a combination of the changes introduced in audbackend 2.0.0, e.g. requests.Session() and _close(), and how the Artifactory server at jfrog.io is configured.


2 participants