opendir() fails on same path that listdir() works on #80

Open
dargueta opened this issue Nov 3, 2020 · 2 comments
@dargueta

dargueta commented Nov 3, 2020

Not sure why, but opendir() apparently only works on the root directory. If you try to use it with a subdirectory that exists, you get a ResourceNotFound error.

>>> s3 = fs.open_fs("s3://my-bucket")

# The directory clearly exists...
>>> s3.listdir("/path/to/directory")
['foo.txt', 'bar.txt']

# ... but sad times if you try to open it
>>> root = s3.opendir("/path/to/directory")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1207, in opendir
    if not self.getbasic(path).is_dir:
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs/base.py", line 1525, in getbasic
    return self.getinfo(path, namespaces=["basic"])
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 441, in getinfo
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '/path/to/directory' not found

I've tried this both with and without the leading and trailing slashes, and it still breaks.

If I try this...

>>> s3 = fs.open_fs("s3://my-bucket/path/to/directory")

>>> s3.listdir("/")
['foo.txt', 'bar.txt']

# I can open files
>>> with s3.open('foo.txt', 'rb') as fd:
...     print(len(fd.read()))
36256176

So far so good. However, if I try using filterdir() it breaks even though I was just able to open a .txt file:

>>> list(s3.filterdir("*.txt"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 695, in scandir
    info = self.getinfo(path)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 451, in getinfo
    obj = self._get_object(path, _key)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 351, in _get_object
    return obj
  File "/Users/dargueta/.pyenv/versions/3.7.6/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/dargueta/.pyenv/versions/3.7.6/envs/gds/lib/python3.7/site-packages/fs_s3fs/_s3fs.py", line 183, in s3errors
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '*.txt' not found

I suspect something's wrong with the way SubFS is getting created.

(Duplicate of #8 but that was closed a long time ago with no resolution)

@desmoteo

desmoteo commented Nov 7, 2020

I think this is probably related to https://fs-s3fs.readthedocs.io/en/latest/#limitations, and to previously reported issues such as #62 and others. It can be tricky to use S3FS on buckets where files were previously created by other means, e.g. with boto3, because the presence of a file "foo/bar" does not imply the existence of a directory object "foo/" (an empty object with key "foo/"). S3FS requires such objects for some operations. If you try s3.makedir("/path/to/directory"), then listdir should work.

As an alternative, you could look at #60 and at https://github.com/mrk-its/s3fs.
Or write a script that crawls your bucket and creates all missing directories, I guess.
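
A rough sketch of the "find the missing directories" half of such a script, in plain Python with no S3 calls. The function name `missing_dir_markers` is made up for illustration, and it assumes you have already collected the bucket's keys into a list by some other means:

```python
from typing import Iterable, List


def missing_dir_markers(keys: Iterable[str]) -> List[str]:
    """Return directory-marker keys (ending in "/") implied by `keys` but not present.

    Hypothetical helper: it only computes which markers are missing,
    it does not talk to S3.
    """
    existing = set(keys)
    missing = set()
    for key in existing:
        parts = key.rstrip("/").split("/")
        # Every proper ancestor of a key is an implied directory.
        for depth in range(1, len(parts)):
            marker = "/".join(parts[:depth]) + "/"
            if marker not in existing:
                missing.add(marker)
    # A parent is a string prefix of its children, so sorting lists parents first.
    return sorted(missing)


if __name__ == "__main__":
    bucket_keys = ["path/to/directory/foo.txt", "path/to/directory/bar.txt"]
    for marker in missing_dir_markers(bucket_keys):
        print(marker)
```

Each returned key would then need to be created as an empty object, for example with boto3's `put_object(Bucket=..., Key=marker, Body=b"")`. That last step is an untested assumption about how the markers should be written, so try it on a scratch bucket first.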

@dargueta
Author

So I get the S3 empty object thing, but I've been doing some thinking and I think, with some finagling, it may be possible to simulate nonexistent directories by using prefix-based searching instead of relying on empty objects as "sentinels", for lack of a better term. This would complicate some things like stat, since it would have to look for that empty object and, failing that, check whether any keys have a matching prefix, but I think that can be done without too much of a performance impact.
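
Roughly what I have in mind, modelled on a plain list of keys rather than real S3 calls. This is only a sketch of the idea, not S3FS code, and `is_dir` here is a hypothetical stand-in:

```python
from typing import Iterable


def is_dir(keys: Iterable[str], path: str) -> bool:
    """Treat `path` as a directory if a marker exists or any key lives under it.

    Hypothetical model: `keys` stands in for the bucket contents.
    """
    prefix = path.strip("/") + "/"
    if prefix == "/":
        return True  # the bucket root always exists
    key_set = set(keys)
    # 1. Marker check: look for the empty "sentinel" object first.
    if prefix in key_set:
        return True
    # 2. Fallback: the directory is implied if any key starts with the prefix.
    return any(key.startswith(prefix) for key in key_set)


if __name__ == "__main__":
    bucket_keys = ["path/to/directory/foo.txt", "path/to/directory/bar.txt"]
    print(is_dir(bucket_keys, "/path/to/directory"))
    print(is_dir(bucket_keys, "/path/to/dir"))
```

Against a real bucket, step 2 would presumably be a single list request limited to one result (e.g. `list_objects_v2` with `Prefix` and `MaxKeys=1`), so the extra cost is one round trip, and only when the marker is missing.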
