from.dfs produces "file does not exist" error #161

Open
kardes opened this issue Mar 17, 2015 · 27 comments
@kardes

kardes commented Mar 17, 2015

Hi,
I set up R and Hadoop using cloudera quick start VM CDH 5.3.

R version 3.1.2. VirtualBox Manager 4.3.20 running on MacOSX 10.7.5
I followed the blog
http://www.r-bloggers.com/integration-of-r-rstudio-and-hadoop-in-a-virtualbox-cloudera-demo-vm-on-mac-os-x/
to set up R and Hadoop, and turned off MR2/YARN; instead I am using MR1.

Everything seems to work fine but the from.dfs function.

I am using the simple example in R:
small.ints <- to.dfs(1:1000)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
df <- as.data.frame(from.dfs(out))

from.dfs produces the following error. If you could be of any help, I'd greatly appreciate it. Thank you very much. -EK

When I use it I get the error:
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128432
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/422
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/122
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

@piccolbo
Collaborator

Can you enter

out()

and paste the output back here?

@piccolbo piccolbo self-assigned this Mar 17, 2015
@kardes
Author

kardes commented Mar 17, 2015

out()
[1] "/tmp/RtmpmQu2O7/file1b584440dee3"

@piccolbo
Collaborator

Without closing the R session where you did the last step, try at the shell prompt:

hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpmQu2O7/file1b584440dee3


@kardes
Author

kardes commented Mar 17, 2015

I opened a new terminal window (without closing the current one with the R session) and entered that line:

$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpmQu2O7/file1b584440dee3
Not a valid JAR: /home/cloudera/dumptb

@piccolbo
Collaborator

Make sure HADOOP_STREAMING is set in that shell instance. It looks like it's empty.
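
For example (the jar path below is just a guess at a standard CDH MR1 layout; point it at the actual streaming jar on your system):

echo $HADOOP_STREAMING
# if the line above prints nothing, set the variable:
export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar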


@kardes
Author

kardes commented Mar 17, 2015

Could you please provide specific instructions on how to do it?
So far, after

small.ints <- to.dfs(1:1000)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))

I check out()

out()
[1] "/tmp/Rtmp5Nt5L7/file25ed2392eeba"

Then I get out of R using Ctrl-Z and entering bg (putting R into the background)
Then I enter

$ hadoop jar $HADOOP_STREAMING dumptb /tmp/Rtmp5Nt5L7/file25ed2392eeba

and get

Not a valid JAR: /home/cloudera/dumptb

thanks

@piccolbo
Collaborator

That's part of installing rmr2. No HADOOP_STREAMING, no rmr2.
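
If editing shell startup files keeps getting in the way, a sketch of a workaround is to set the variables from inside R before loading rmr2 (the paths are examples from a CDH MR1 layout; adjust to your system):

Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING =
  "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar")
library(rmr2)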


@kardes
Author

kardes commented Mar 20, 2015

Hi Antonio,
I don't understand anything from the results when I search for hadoop streaming rmr2. Could you please point me to a resource where I can learn to set this up correctly? Thanks.

@piccolbo
Collaborator

You are seriously telling me you do not understand a list of two files? Can
you ask a more specific question?

On Fri, Mar 20, 2015 at 11:09 AM, kardes [email protected] wrote:

I obtain the following when I do a find:

$ find $HADOOP_HOME -name hadoop-streaming*.jar

/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar



@kardes
Author

kardes commented Mar 20, 2015

sorry, of course not.

I still get the error on the top of this page (the original error). I tried the following:

Opened a terminal and entered:
$echo "export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar" >> ~/.bashrc

Then in R, I entered:
small.ints <- to.dfs(1:1000)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
df <- as.data.frame(from.dfs(out))

In the last line above, I get the same error. I tried the following last, but I am not sure how to proceed. Please help! Thanks.

out()
[1] "/tmp/RtmpuZJz6S/file1c1778b1f9b"
^Z
[1]+ Stopped R
[cloudera@quickstart ~]$ bg
[1]+ R &
[cloudera@quickstart ~]$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpuZJz6S/file1c1778b1f9b
Exception in thread "main" java.io.FileNotFoundException: Path is not a file: /tmp/RtmpuZJz6S/file1c1778b1f9b/_logs
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1879)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1820)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1800)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1772)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1171)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1149)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1448)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:56)
at org.apache.hadoop.streaming.DumpTypedBytes.dumpTypedBytes(DumpTypedBytes.java:102)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:83)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /tmp/RtmpuZJz6S/file1c1778b1f9b/_logs
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1879)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1820)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1800)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1772)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1169)

@piccolbo
Collaborator

The last error you got is expected, because you did a dumptb on a directory,
which is not allowed; you'd have to list the directory first and dump the
files called part-*. I would like to be sure that the R session you are
working in has the correct setting. You added an apparently correct line to
.bashrc, but that's the wrong file, because it only affects interactive
shells. You want to use .profile or .bash_profile. Then you need to reload
it with . .profile or . .bash_profile, and then you need to restart R. Then
you can do a Sys.getenv("HADOOP_STREAMING") to make sure the setting has
been picked up correctly, and then you should try the example again and see
what happens. The devil is in the details, as they say.
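
Concretely, something along these lines (using the jar path from your find output):

echo 'export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar' >> ~/.bash_profile
. ~/.bash_profile
R
# then, at the R prompt:
# Sys.getenv("HADOOP_STREAMING")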


@kardes
Author

kardes commented Mar 20, 2015

I did update the .bash_profile file upon your recommendation, but I still get the error. Could you please let me know how to proceed? Thank you very much. Here is a snapshot of my shell.

[cloudera@quickstart ~]$ echo "export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar" >> ~/.bash_profile
[cloudera@quickstart ~]$ source ~/.bash_profile
[cloudera@quickstart ~]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Sys.getenv("HADOOP_STREAMING")
[1] "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar"
library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
small.ints <- to.dfs(1:1000)
15/03/20 15:00:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/03/20 15:00:29 INFO compress.CodecPool: Got brand-new compressor [.deflate]
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
packageJobJar: [/tmp/RtmpT6ad7h/rmr-local-env14e469666bf9, /tmp/RtmpT6ad7h/rmr-global-env14e431d70a48, /tmp/RtmpT6ad7h/rmr-streaming-map14e4610d911a, /tmp/hadoop-cloudera/hadoop-unjar1796188924023766754/] [] /tmp/streamjob8004571112410021052.jar tmpDir=null
15/03/20 15:00:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/03/20 15:00:33 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/20 15:00:34 INFO streaming.StreamJob: getLocalDirs(): [/var/lib/hadoop-hdfs/cache/cloudera/mapred/local]
15/03/20 15:00:34 INFO streaming.StreamJob: Running job: job_201503201448_0001
15/03/20 15:00:34 INFO streaming.StreamJob: To kill this job, run:
15/03/20 15:00:34 INFO streaming.StreamJob: /usr/lib/hadoop-0.20-mapreduce/bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201503201448_0001
15/03/20 15:00:34 INFO streaming.StreamJob: Tracking URL: http://quickstart.cloudera:50030/jobdetails.jsp?jobid=job_201503201448_0001
15/03/20 15:00:35 INFO streaming.StreamJob: map 0% reduce 0%
15/03/20 15:00:57 INFO streaming.StreamJob: map 100% reduce 0%
15/03/20 15:01:02 INFO streaming.StreamJob: map 100% reduce 100%
15/03/20 15:01:02 INFO streaming.StreamJob: Job complete: job_201503201448_0001
15/03/20 15:01:02 INFO streaming.StreamJob: Output: /tmp/RtmpT6ad7h/file14e456031345
df <- as.data.frame(from.dfs(out))
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/0
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128464
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/30402
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/122
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

@piccolbo
Collaborator

I am at a loss. Those are not files that rmr2 manipulates, at least not
explicitly. The only thing I can think of is that there are two streaming
jars packed with CDH and you are using the wrong one. If you are using
YARN, you need to use the other one. That's done by setting
HADOOP_STREAMING to the other path returned by the find command that you
shared a few messages back. No clue how this could be related to the error,
but it's something we want to be sure is not in the way.
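
As a sketch, to see every streaming jar on the box and not only those under $HADOOP_HOME (locations vary by CDH version, so double-check on your system):

find /usr/lib -name 'hadoop-streaming*.jar' 2>/dev/null

The MR1 jar lives under hadoop-0.20-mapreduce; a YARN/MR2 streaming jar would typically live under /usr/lib/hadoop-mapreduce instead.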


@kardes
Author

kardes commented Mar 21, 2015

I am not using YARN. I am using MR1. That's why I am doing:

echo "export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar" >> ~/.bash_profile


@piccolbo
Collaborator

You understand that since I can't reproduce this, unless you can give me
access to a test system, you'll have to debug it yourself, but I will try to
help. My current thinking is that your streaming installation has a problem
which is outside rmr2's control, or anything we can fix, but at least I
would be able to tell you to go bug the cloudera guys with some good
argument to do so. For that, we need to repro the error outside R:

  1. Run your mapreduce job.
  2. Follow the console output until a line like the following:

15/03/20 15:01:02 INFO streaming.StreamJob: Output: /tmp/RtmpT6ad7h/<file>

  3. Make a note of that path (which will be different for every run).
  4. Open a shell without closing the current R session.
  5. List that directory, with something like

hdfs dfs -ls <path>

  6. It should contain several files named part-<n>. Pick one, say <file1>.
  7. Now try to dump its contents:

hadoop jar $HADOOP_STREAMING dumptb <path>/<file1>

<> brackets mean "replace with actual value".

It should fail exactly the way the from.dfs function fails. If that's the
case, you have something to report to cloudera. Otherwise, we need to debug
from.dfs more closely. Thanks
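
Put together, using the output directory from your earlier run as a concrete example (this is a sketch: the part file name must match whatever the ls actually shows):

hdfs dfs -ls /tmp/RtmpT6ad7h/file14e456031345
# expect entries like _SUCCESS, _logs and one or more part-* files
hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpT6ad7h/file14e456031345/part-00000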

On Fri, Mar 20, 2015 at 5:11 PM, kardes [email protected] wrote:

I am not using YARN. I am using MR1. that's why I am doing:

echo
"export
HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar"

~/.bash_profile

On Fri, Mar 20, 2015 at 4:54 PM, Antonio Piccolboni <
[email protected]> wrote:

I am at a loss. Those are not files that rmr2 manipulates, at least not
explicitly. The only thing I can think of is that there are two streaming
jars packed with CDH and you are using the wrong one. If you are using
YARN, you need to use the other one. That's done by setting
HADOOP_STREAMING to the other path returned by the cmd fine that you
shared
a few messages back. No clue how this could be related to the error, but
something we want to be sure it's not in the way.

On Fri, Mar 20, 2015 at 3:07 PM, kardes [email protected]
wrote:

I did update the .bash_profile file. upon your recommendation, I still
get
the error. could you please let me know how to proceed? thank you very
much. here is a snapshot of my shell.

[cloudera@quickstart ~]$ echo "export

HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar"

~/.bash_profile
[cloudera@quickstart ~]$ source ~/.bash_profile
[cloudera@quickstart ~]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Sys.getenv("HADOOP_STREAMING")
[1]

"/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar"

library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
small.ints <- to.dfs(1:1000)
15/03/20 15:00:29 INFO zlib.ZlibFactory: Successfully loaded &
initialized
native-zlib library
15/03/20 15:00:29 INFO compress.CodecPool: Got brand-new compressor
[.deflate]
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v,
v^2))
packageJobJar: [/tmp/RtmpT6ad7h/rmr-local-env14e469666bf9,
/tmp/RtmpT6ad7h/rmr-global-env14e431d70a48,
/tmp/RtmpT6ad7h/rmr-streaming-map14e4610d911a,
/tmp/hadoop-cloudera/hadoop-unjar1796188924023766754/] []
/tmp/streamjob8004571112410021052.jar tmpDir=null
15/03/20 15:00:33 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
15/03/20 15:00:33 INFO mapred.FileInputFormat: Total input paths to
process : 1
15/03/20 15:00:34 INFO streaming.StreamJob: getLocalDirs():
[/var/lib/hadoop-hdfs/cache/cloudera/mapred/local]
15/03/20 15:00:34 INFO streaming.StreamJob: Running job:
job_201503201448_0001
15/03/20 15:00:34 INFO streaming.StreamJob: To kill this job, run:
15/03/20 15:00:34 INFO streaming.StreamJob:
/usr/lib/hadoop-0.20-mapreduce/bin/hadoop job
-Dmapred.job.tracker=localhost:8021 -kill job_201503201448_0001
15/03/20 15:00:34 INFO streaming.StreamJob: Tracking URL:

http://quickstart.cloudera:50030/jobdetails.jsp?jobid=job_201503201448_0001

15/03/20 15:00:35 INFO streaming.StreamJob: map 0% reduce 0%
15/03/20 15:00:57 INFO streaming.StreamJob: map 100% reduce 0%
15/03/20 15:01:02 INFO streaming.StreamJob: map 100% reduce 100%
15/03/20 15:01:02 INFO streaming.StreamJob: Job complete:
job_201503201448_0001
15/03/20 15:01:02 INFO streaming.StreamJob: Output:
/tmp/RtmpT6ad7h/file14e456031345
df <- as.data.frame(from.dfs(out))
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/0
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128464
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/30402
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/122
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



@kardes
Author

kardes commented Mar 23, 2015

Actually, I have done this before, and what I get when I do
hadoop jar $HADOOP_STREAMING dumptb <path>/<file1>
is that some meaningless figures/shapes/rectangles appear on the screen,
and I do not get the error I get using from.dfs().

On Mon, Mar 23, 2015 at 10:10 AM, Antonio Piccolboni <
[email protected]> wrote:

You understand that since I can't reproduce this, unless you can give me
access to a test system, you'll have to debug it yourself, but I will try to
help. My current thinking is that your streaming installation has a problem
that is outside rmr2's control and not something we can fix, but at least I
would be able to tell you to go bug the cloudera guys with some good
argument to do so. To do that, we need to repro the error outside R:

  1. Run your mapreduce job.
  2. Follow the console output until a line like the following:

15/03/20 15:01:02 INFO streaming.StreamJob: Output: /tmp/RtmpT6ad7h/

  3. Make a note of that path (which will be different for every run).
  4. Open a shell without closing the current R session.
  5. List that directory, with something like

hdfs dfs -ls <path>

  6. It should contain several files named part-<n>. Pick one, say
     <file1>.
  7. Now try to dump its contents:

hadoop jar $HADOOP_STREAMING dumptb <path>/<file1>

<> brackets mean "replace with actual value"

It should fail exactly the way the from.dfs function fails. If that's the
case, you have something to report to cloudera. Otherwise, we need to debug
from.dfs more closely. Thanks
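
For reference, the same steps can be driven from inside R; here is a minimal
sketch, assuming only that out is the result of mapreduce(), so that out()
returns the job's HDFS output directory (as used later in this thread):

# out() gives the HDFS output directory of the job
path <- out()
system(paste("hdfs dfs -ls", path))                  # step 5: list the part- files
part <- file.path(path, "part-00000")                # step 6: pick one (names may differ)
system(paste("hadoop jar", Sys.getenv("HADOOP_STREAMING"),
             "dumptb", part, "> /tmp/dumptb.out"))   # step 7: dump its contents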


@kardes
Author

kardes commented Mar 23, 2015

I don't know. I am very new to all this, and maybe I made a mistake setting
things up; I am not sure. I can try to set up everything from scratch, but I
couldn't find a good blog that describes an R + Hadoop setup using recent
versions of CDH.


@piccolbo
Collaborator

Sorry, I forgot to add a redirection to that command, so you got the
contents of a binary file in the console. As meaningless as it looked, it was
probably just fine. Try this to be absolutely sure:

hadoop jar $HADOOP_STREAMING dumptb <path>/<file1> > /tmp/dumptb.out

(the last greater-than sign is to be entered as is, no substitutions)

The other thing is that if this succeeds, it's more of an rmr2 problem (my
plate). So try this:

debug(from.dfs)

from.dfs(out)

step until the dumptb function definition, and step one more

debug(dumptb)

c

You are now in dumptb, a Very Simple Function.

Please print the contents of src and also

paste(hadoop.streaming(),
"dumptb", rmr.normalize.path(x), ">>", rmr.normalize.path(dest))

and paste the results here. The idea is that the dumptb function does almost
exactly what you typed at the command line, which worked. So there must be
some difference either in the command entered or in the environment in which
it is executed. Thanks for your patience and cooperation.


@piccolbo
Collaborator

Correction, that command should read

paste(hadoop.streaming(),
"dumptb", rmr.normalize.path(src[[1]]), ">>",
rmr.normalize.path(dest))
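
For illustration, the string this builds should match the command that worked
at the shell; a sketch with paths taken from the runs above (the tmp names
vary per run, and treating hadoop.streaming() as expanding to
"hadoop jar <streaming jar>" is my assumption, not rmr2's documented behavior):

# a hypothetical reconstruction of the command string dumptb assembles
jar  <- "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar"
src1 <- "/tmp/RtmpUzjoWy/file1c7dbd916e7/part-00000"
dest <- "/tmp/dumptb.out"
paste("hadoop jar", jar, "dumptb", src1, ">>", dest)
# "hadoop jar .../hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar dumptb /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00000 >> /tmp/dumptb.out"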


@piccolbo
Collaborator

On Wed, Mar 25, 2015 at 3:29 PM, kardes [email protected] wrote:

I did the following:

[cloudera@quickstart ~]$ hadoop fs -ls /tmp/RtmpUzjoWy/file1c7dbd916e7

Found 4 items
-rw-r--r-- 1 cloudera supergroup 0 2015-03-25 15:18
/tmp/RtmpUzjoWy/file1c7dbd916e7/_SUCCESS
drwxrwxrwx - cloudera supergroup 0 2015-03-25 15:17
/tmp/RtmpUzjoWy/file1c7dbd916e7/_logs
-rw-r--r-- 1 cloudera supergroup 422 2015-03-25 15:17
/tmp/RtmpUzjoWy/file1c7dbd916e7/part-00000
-rw-r--r-- 1 cloudera supergroup 122 2015-03-25 15:17
/tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001

and then

[cloudera@quickstart ~]$ hadoop jar $HADOOP_STREAMING dumptb
/tmp/RtmpUzjoWy/file1c7dbd916e7 / part-00001 > /tmp/dumptb.out
Exception in thread "main" java.io.FileNotFoundException: Path is not a file: /tmp/RtmpUzjoWy/file1c7dbd916e7/_logs

There are two spaces too many in this path: /tmp/RtmpUzjoWy/file1c7dbd916e7 / part-00001
See them? Around the /? Please remove them and try again.
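That is, the command should name a single path with no spaces in it:

hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001 > /tmp/dumptb.out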

at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1879)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1820)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1800)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1772)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1171)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1159)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1149)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:270)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:237)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:230)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1448)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:56)
at org.apache.hadoop.streaming.DumpTypedBytes.dumpTypedBytes(DumpTypedBytes.java:102)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:83)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /tmp/RtmpUzjoWy/file1c7dbd916e7/_logs
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1879)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1820)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1800)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1772)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:246)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1169)
... 22 more

[cloudera@quickstart ~]$

so I am getting the error in this case, given that I am doing everything as
you suggested. How should I proceed in this case? Thanks for your help.



@kardes
Author

kardes commented Mar 25, 2015

after running the mapreduce job, I did the following:

[cloudera@quickstart ~]$ hadoop fs -ls /tmp/RtmpUzjoWy/file1c7dbd916e7

Found 4 items
-rw-r--r-- 1 cloudera supergroup 0 2015-03-25 15:18 /tmp/RtmpUzjoWy/file1c7dbd916e7/_SUCCESS
drwxrwxrwx - cloudera supergroup 0 2015-03-25 15:17 /tmp/RtmpUzjoWy/file1c7dbd916e7/_logs
-rw-r--r-- 1 cloudera supergroup 422 2015-03-25 15:17 /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00000
-rw-r--r-- 1 cloudera supergroup 122 2015-03-25 15:17 /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001

and then

[cloudera@quickstart ~]$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001 > /tmp/dumptb.out
[cloudera@quickstart ~]$

So I did not get an error in this case, and I continued:
Browse[2]> debug(dumptb)
Browse[2]> c
debugging in: dumptb(part.list(fname), tmp)
debug: {
lapply(src, function(x) system(paste(hadoop.streaming(),
"dumptb", x, ">>", dest)))
}
Browse[2]> src
[1] "0" "128429" "422" "122"

Browse[2]> paste(hadoop.streaming(),"dumptb",rmr.normalize.path(src[[1]]),">>",rmr.normalize.path(dest))
Error in paste(hadoop.streaming(), "dumptb", rmr.normalize.path(src[[1]]), :
could not find function "rmr.normalize.path"
Browse[2]>

please let me know how to proceed. thanks for your time.

@piccolbo
Collaborator

I have it: part.list is failing, probably a problem with hdfs.ls. Please run

rmr2:::hdfs.ls(out())

and share what that returns, and its class.


@kardes
Author

kardes commented Mar 25, 2015

Browse[2]> rmr2:::hdfs.ls(out())
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "-rw-r--r--" "1" "cloudera" "supergroup" "0" "2015-03-25" "15:18"
[2,] "drwxrwxrwx" "-" "cloudera" "supergroup" "0" "2015-03-25" "15:17"
[3,] "-rw-r--r--" "1" "cloudera" "supergroup" "422" "2015-03-25" "15:17"
[4,] "-rw-r--r--" "1" "cloudera" "supergroup" "122" "2015-03-25" "15:17"
[,8]
[1,] "/tmp/RtmpUzjoWy/file1c7dbd916e7/_SUCCESS"
[2,] "/tmp/RtmpUzjoWy/file1c7dbd916e7/_logs"
[3,] "/tmp/RtmpUzjoWy/file1c7dbd916e7/part-00000"
[4,] "/tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001"
Browse[2]> str(rmr2:::hdfs.ls(out()))
chr [1:4, 1:8] "-rw-r--r--" "drwxrwxrwx" "-rw-r--r--" "-rw-r--r--" ...
Browse[2]> class(rmr2:::hdfs.ls(out()))
[1] "matrix"
Browse[2]>
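
Reading that matrix against the earlier debug session: the listing itself parses cleanly, with the full paths in column 8 and the file sizes in column 5, and the bogus src values ("422", "122") match the size column, not the path column. A quick check, assuming the column layout shown above:

ls.out <- rmr2:::hdfs.ls(out())
ls.out[, 8]   # the part-file paths that dumptb should be handed
ls.out[, 5]   # the sizes -- "422" and "122" match the bad src values seen earlier

So hdfs.ls is returning sensible data here, which points the suspicion at how the installed rmr2 picks columns out of it.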

@piccolbo
Collaborator

This is close to impossible. Please enter

packageDescription("rmr2")


@kardes
Author

kardes commented Mar 25, 2015

packageDescription("rmr2")
Package: rmr2
Type: Package
Title: R and Hadoop Streaming Connector
Version: 2.0.2
Date: 2012-4-12
Author: Revolution Analytics
Depends: R (>= 2.6.0), Rcpp, RJSONIO (>= 0.8-2), digest, functional,
stringr, plyr
Suggests: quickcheck
Collate: basic.R keyval.R IO.R local.R streaming.R mapreduce.R extras.R
.....
Maintainer: Revolution Analytics [email protected]
Description: Supports the map reduce programming model on top of hadoop
streaming
License: Apache License (== 2.0)
Packaged: 2012-12-05 03:35:30 UTC; antonio
Built: R 3.1.2; x86_64-redhat-linux-gnu; 2015-03-12 22:30:28 UTC; unix

-- File: /usr/lib64/R/library/rmr2/Meta/package.rds

@piccolbo
Collaborator

Please upgrade to the latest version. Thanks

Antonio
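
For anyone landing here with the same symptoms: version 2.0.2 above dates from 2012 and is far behind the 3.x series that was current at the time of this thread. A hedged sketch of one upgrade path, assuming you fetch a source tarball from the RevolutionAnalytics/rmr2 releases page (the filename and version below are placeholders, not a confirmed release name):

# Install the dependencies named in the DESCRIPTION shown above first.
install.packages(c("Rcpp", "RJSONIO", "digest", "functional", "stringr", "plyr"))
# Then install the downloaded rmr2 source tarball; adjust the filename
# to whatever build you actually downloaded.
install.packages("rmr2_3.3.0.tar.gz", repos = NULL, type = "source")

After installing, restart R and re-check packageDescription("rmr2") to confirm the new version is the one being loaded.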


@kardes
Author

kardes commented Mar 31, 2015

Made it! Thanks!
