This repository has been archived by the owner on Apr 19, 2022. It is now read-only.

Compatibility with EMRFS (S3)? #223

Open
tejasmanohar opened this issue Jan 5, 2017 · 0 comments

I'm running Spark on AWS EMR against S3 via EMRFS, and I'm trying to use snakebite to test whether a file exists on S3 through EMRFS's consistent view.

I am not sure of all the implementation details of EMRFS, but I assume it's some sort of HDFS-like VFS. What I can tell at first glance is that `hadoop fs` recognizes `s3://*` paths without any additional flags, so I would expect snakebite to do the same. However, snakebite appears to look up `/user/hadoop/s3:/segmentio/logs/api`, which is an invalid path.

Does/should snakebite support EMRFS? Do you have any ideas how I could find more details on how the paths are supposed to be mapped? I'd be happy to send a PR but am not quite sure where to start yet (new to HDFS). Thanks for this library!

[hadoop@... ~]$ hadoop fs -test s3://segmentio/logs/api
-test: No test flag given
Usage: hadoop fs [generic options] -test -[defsz] <path>
[hadoop@...~]$ hadoop fs -test -e s3://segmentio/logs/api
[hadoop@... ~]$ echo $?
0
[hadoop@... ~]$ python
Python 2.7.12 (default, Sep  1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from snakebite.client import Client
>>> c = Client("...", 8020)
>>> c.test("s3://segmentio/logs/api", exists=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 941, in test
    items = list(self._find_items([path], processor, include_toplevel=True))
  File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 1214, in _find_items
    fileinfo = self._get_file_info(path)
  File "/usr/local/lib/python2.7/site-packages/snakebite/client.py", line 1342, in _get_file_info
    return self.service.getFileInfo(request)
  File "/usr/local/lib/python2.7/site-packages/snakebite/service.py", line 40, in <lambda>
    rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
  File "/usr/local/lib/python2.7/site-packages/snakebite/service.py", line 46, in call
    return method(self.service, controller, request)
  File "/usr/local/lib/python2.7/site-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
    self._StubMethod(inst, method, rpc_controller, request, callback))
  File "/usr/local/lib/python2.7/site-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
    method_descriptor.output_type._concrete_class, callback)
  File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 447, in CallMethod
    return self.parse_response(byte_stream, response_class)
  File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 418, in parse_response
    self.handle_error(header)
  File "/usr/local/lib/python2.7/site-packages/snakebite/channel.py", line 421, in handle_error
    raise RequestError("\n".join([header.exceptionClassName, header.errorMsg]))
snakebite.errors.RequestError: org.apache.hadoop.fs.InvalidPathException
Invalid path name Invalid file name: /user/hadoop/s3:/segmentio/logs/api
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:100)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
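For what it's worth, the invalid path in the traceback looks like plain relative-path normalization: snakebite seems to treat anything that doesn't start with `/` as relative to the user's home directory, so the `s3://` URI gets joined onto `/user/hadoop` and the double slash collapses. A minimal sketch of that hypothesis (my guess at the normalization, not snakebite's actual code):

```python
import posixpath

def normalize(path, home="/user/hadoop"):
    # Hypothetical snakebite-style normalization: a path that does not
    # start with "/" is treated as relative to the user's home directory.
    if not path.startswith("/"):
        path = posixpath.join(home, path)
    # normpath collapses the "//" in "s3://", turning it into "s3:/".
    return posixpath.normpath(path)

print(normalize("s3://segmentio/logs/api"))
# -> /user/hadoop/s3:/segmentio/logs/api
# i.e. exactly the invalid path the NameNode rejects in the traceback
```

If that's what is happening, supporting EMRFS would presumably mean recognizing URIs with a scheme before the relative-path fallback kicks in, and dispatching `s3://` paths to something other than the HDFS NameNode RPC.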