I'm using Spark on AWS EMR against S3 via EMRFS, and I'm trying to use snakebite to test whether a file exists on S3 through EMRFS's consistent view.
I'm not sure of all the implementation details of EMRFS, but I assume it's some sort of HDFS VFS. What I can tell at first glance is that hadoop fs recognizes s3://* paths without any additional flags, so I would expect snakebite to do the same. However, it appears that snakebite is looking for /user/hadoop/s3:/segmentio/logs/api, which is an invalid path.
Does/should snakebite support EMRFS? Do you have any ideas how I could find more details on how the paths are supposed to be mapped? I'd be happy to send a PR, but I'm not quite sure where to start yet (new to HDFS). Thanks for this library!
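My guess, purely from reading the error message and not from snakebite's source, is that any path without a leading / gets treated as relative to the user's HDFS home directory, and the normalization then collapses the double slash in the s3:// scheme. A minimal sketch of the behavior I think I'm seeing (the normalize_path helper is hypothetical, not snakebite's actual code):

```python
import posixpath

def normalize_path(path, user="hadoop"):
    # Hypothetical reconstruction of the suspected behavior: a path
    # that doesn't start with "/" is joined onto /user/<user>, and
    # normpath collapses the "s3://" double slash into "s3:/".
    if not path.startswith("/"):
        path = posixpath.join("/user", user, path)
    return posixpath.normpath(path)

print(normalize_path("s3://segmentio/logs/api"))
# /user/hadoop/s3:/segmentio/logs/api -- matches the invalid path in the traceback
```

For reference, here's a session where hadoop fs handles the s3:// path fine but the equivalent snakebite call fails: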
[hadoop@... ~]$ hadoop fs -test s3://segmentio/logs/api
-test: No test flag given
Usage: hadoop fs [generic options] -test -[defsz] <path>
[hadoop@... ~]$ hadoop fs -test -e s3://segmentio/logs/api
[hadoop@... ~]$ echo $?
0
[hadoop@... ~]$ python
Python 2.7.12 (default, Sep 1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from snakebite.client import Client
>>> c = Client("...", 8020)
>>> c.test("s3://segmentio/logs/api", exists=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 941, in test
items = list(self._find_items([path], processor, include_toplevel=True))
File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 1214, in _find_items
fileinfo = self._get_file_info(path)
File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 1342, in _get_file_info
return self.service.getFileInfo(request)
File "/usr/local/lib/python2.7/site-packages/snakebite/service.py", line 40, in <lambda>
rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
File "/usr/local/lib/python2.7/site-packages/snakebite/service.py", line 46, in call
return method(self.service, controller, request)
File "/usr/local/lib/python2.7/site-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
self._StubMethod(inst, method, rpc_controller, request, callback))
File "/usr/local/lib/python2.7/site-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
method_descriptor.output_type._concrete_class, callback)
File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 447, in CallMethod
return self.parse_response(byte_stream, response_class)
File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 418, in parse_response
self.handle_error(header)
File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 421, in handle_error
raise RequestError("\n".join([header.exceptionClassName, header.errorMsg]))
snakebite.errors.RequestError: org.apache.hadoop.fs.InvalidPathException
Invalid path name Invalid file name: /user/hadoop/s3:/segmentio/logs/api
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
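In the meantime I'm working around this by shelling out to the hadoop CLI, which resolves s3:// URIs through EMRFS. It's slow (a JVM startup on every call) but returns the right answer; the exists_on_emrfs helper below is just my own wrapper, not part of snakebite:

```python
import subprocess

def exists_on_emrfs(path):
    # hadoop fs -test -e exits 0 when the path exists (see the shell
    # session above), so the exit code maps directly to a boolean.
    return subprocess.call(["hadoop", "fs", "-test", "-e", path]) == 0

print(exists_on_emrfs("s3://segmentio/logs/api"))
```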