Compatibility with OVH object store #223

Open
pwl opened this issue Aug 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@pwl

pwl commented Aug 9, 2024

s3torchconnector version

1.2.4

s3torchconnectorclient version

1.2.4

AWS Region

n/a

Describe the running environment

Linux Manjaro (Arch-based) on a local machine.

What happened?

I tried connecting to a private OVH s3-compatible bucket with

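from s3torchconnector import S3MapDataset

# OVH S3-compatible endpoint and region for the private bucket
endpoint = "https://s3.gra.io.cloud.ovh.net/"
region = "gra"
s3_uri = "s3://<bucket-name>"
data = S3MapDataset.from_prefix(s3_uri=s3_uri, endpoint=endpoint, region=region)
# Indexing the dataset triggers the bucket listing, which is where the error is raised
print(data[0].read())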

But got the error below.

My credentials are defined via ~/.aws/config and ~/.aws/credentials. They seem to work, since aws s3 ls <s3_uri> correctly lists the contents of the bucket.

I'm not sure whether alternative S3 bucket providers are supported. The README doesn't say, so I assumed they are. I also don't know whether this is a bug in the OVH API or in the S3 connector.

Relevant log output

Traceback (most recent call last):
  File "xxx/data.py", line 40, in <module>
    print(data[0].read())
          ~~~~^^^
  File "/home/pawel/.cache/pypoetry/virtualenvs/esm-clustering-uWNKxUsg-py3.12/lib/python3.12/site-packages/s3torchconnector/s3map_dataset.py", line 144, in __getitem__
    return self._transform(self._get_object(i))
                           ^^^^^^^^^^^^^^^^^^^
  File "/home/pawel/.cache/pypoetry/virtualenvs/esm-clustering-uWNKxUsg-py3.12/lib/python3.12/site-packages/s3torchconnector/s3map_dataset.py", line 138, in _get_object
    bucket_key = self._dataset_bucket_key_pairs[i]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pawel/.cache/pypoetry/virtualenvs/esm-clustering-uWNKxUsg-py3.12/lib/python3.12/site-packages/s3torchconnector/s3map_dataset.py", line 56, in _dataset_bucket_key_pairs
    self._bucket_key_pairs = list(self._get_dataset_objects(self._get_client()))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pawel/.cache/pypoetry/virtualenvs/esm-clustering-uWNKxUsg-py3.12/lib/python3.12/site-packages/s3torchconnector/_s3_bucket_iterable.py", line 50, in __next__
    return next(self._list_stream)
           ^^^^^^^^^^^^^^^^^^^^^^^
s3torchconnectorclient._mountpoint_s3_client.S3Exception: Client error: Unknown response error: MetaRequestResult { response_status: 404, crt_error: Error(14343, "aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request"), error_response_headers: Some(Headers { inner: 0x7ff118016970 }), error_response_body: Some("<?xml version=\'1.0\' encoding=\'UTF-8\'?>\n<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><RequestId>tx6897cfc165de4d4d99d35-0066b62094</RequestId><Key>/</Key></Error>") }
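
The XML body embedded in the exception above can be pulled apart to see what the server actually complained about. The following is a minimal sketch using only the Python standard library; the helper name parse_s3_error is hypothetical, and the body string is copied from the log. The <Key>/</Key> element suggests the server treated the request as a lookup of an object named "/" rather than as a bucket listing, but that is one possible reading, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Error body copied from the traceback above (escapes removed).
error_body = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message>"
    "<RequestId>tx6897cfc165de4d4d99d35-0066b62094</RequestId>"
    "<Key>/</Key></Error>"
)


def parse_s3_error(body: str) -> dict:
    """Return the child elements of an S3-style <Error> document as a dict.

    Hypothetical helper for illustration; not part of s3torchconnector.
    """
    # Parse as bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(body.encode("utf-8"))
    return {child.tag: child.text for child in root}


fields = parse_s3_error(error_body)
print(fields["Code"], "for key", repr(fields["Key"]))
```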

pwl added the bug label Aug 9, 2024
@IsaevIlya
Contributor

Hello @pwl,
Thank you for reaching out and providing the detailed information about the issue you are facing. We appreciate your interest in using the S3 Connector for PyTorch.

Our project primarily focuses on optimizing access to Amazon S3. Maintaining compatibility with other S3-compatible storage systems is not our primary goal. We are open to accepting pull requests that improve compatibility with other storage providers, as long as they do not adversely affect the library's core functionality with Amazon S3.

Regarding your specific issue, we understand that you are encountering an error when trying to access your private OVH bucket using our library. Unfortunately, we cannot provide a definitive solution or workaround at this time, as the issue may stem from compatibility differences between OVH's implementation and the underlying CRT library we use for S3 access.

However, we suggest trying the following steps to further investigate the problem:

  • Configure the AWS CLI to use the CRT client (see flag description here).
  • Execute ls and cp commands using the AWS CLI to test the access to your OVH bucket. This may help identify if the issue is specific to our library or a more general compatibility problem with the CRT client.
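
The two steps above can be sketched as a small script that assembles the AWS CLI invocations. This is a hedged illustration, not a verified procedure: the link behind "here" is not preserved in this thread, so the sketch assumes the flag in question is the AWS CLI v2 S3 setting preferred_transfer_client. The function name crt_check_commands and the <object-key> placeholder are hypothetical; the endpoint and bucket URI are taken from the original report.

```python
import shlex
from typing import List


def crt_check_commands(bucket_uri: str, endpoint_url: str, object_key: str) -> List[List[str]]:
    """Build the CLI commands for the suggested CRT check (hypothetical helper)."""
    endpoint_args = ["--endpoint-url", endpoint_url]
    return [
        # Assumption: this is the setting the maintainer's link referred to.
        ["aws", "configure", "set", "default.s3.preferred_transfer_client", "crt"],
        # List the bucket through the custom endpoint.
        ["aws", "s3", "ls", bucket_uri, *endpoint_args],
        # Stream one object to stdout ("-") to exercise a GET as well.
        ["aws", "s3", "cp", f"{bucket_uri.rstrip('/')}/{object_key}", "-", *endpoint_args],
    ]


commands = crt_check_commands(
    "s3://<bucket-name>",
    "https://s3.gra.io.cloud.ovh.net/",
    "<object-key>",  # placeholder: substitute a key that exists in the bucket
)
for argv in commands:
    # Print shell-quoted commands; nothing is executed here.
    print(shlex.join(argv))
```

To actually run a command, each argv list could be passed to subprocess.run(argv, check=True). If ls and cp fail in the same way once the CRT client is selected, that would point toward a general CRT compatibility problem rather than something specific to this library.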

If you know that OVH's object store requires path-style access, you can consider building the latest version of our library from the main branch, as it includes support for path-style access.
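
For readers unfamiliar with the distinction, the sketch below shows how the two S3 addressing styles place the bucket name in the request URL. It is purely illustrative and uses only the standard library: the helper names, the bucket "my-bucket", and the key "data/file.bin" are hypothetical, and it does not show how the library itself is configured for path-style access, which this thread does not state.

```python
from urllib.parse import urlsplit, urlunsplit


def virtual_hosted_url(endpoint: str, bucket: str, key: str = "") -> str:
    """Virtual-hosted style: the bucket becomes part of the host name."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


def path_style_url(endpoint: str, bucket: str, key: str = "") -> str:
    """Path style: the bucket is the first segment of the URL path."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))


endpoint = "https://s3.gra.io.cloud.ovh.net/"  # endpoint from the original report
vhost = virtual_hosted_url(endpoint, "my-bucket", "data/file.bin")
path = path_style_url(endpoint, "my-bucket", "data/file.bin")
print(vhost)
print(path)
```

If a provider only resolves one of these forms, a client using the other form can end up with a request whose path the server interprets differently than intended.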

We appreciate your understanding that while we aim to provide a versatile library, our primary focus remains on optimizing Amazon S3 access. If you have further insights or suggestions regarding improving compatibility with OVH's S3-compatible storage, we welcome your contributions through pull requests or ongoing discussions.

Thank you for your patience and understanding.
