About error: pipeMapRed.waitOutputThreads and additional so file #164

Open
ghost opened this issue Apr 8, 2015 · 1 comment

ghost commented Apr 8, 2015

When I ran the mapreduce function in rmr2, I encountered the error pipeMapRed.waitOutputThreads(): subprocess failed with code 127. My environment is Linux Mint 17.1 Rebecca, Hadoop 2.6.0 in a localhost setup, R 3.1.3 that I compiled myself with Intel MKL and the Intel C/C++ compiler, and Oracle Java 1.8.0_40.
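
For context, exit code 127 usually means the streaming subprocess could not be started at all, typically because an executable or one of its shared libraries was not found on the task node. A minimal way to check that from R (a sketch only, using the paths from this report):

# Sketch: check whether R's shared-library dependencies resolve on the node.
# Any "not found" entry usually corresponds to a library that has to be shipped
# with -files or put on LD_LIBRARY_PATH.
r.binary <- file.path(R.home("bin"), "exec", "R")
system(paste("ldd", shQuote(r.binary), "| grep 'not found'"))
# The directories that should be visible for the two libraries in this issue:
#   /opt/intel/composer_xe_2013_sp1/compiler/lib/intel64   (libiomp5.so)
#   /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server        (libjvm.so)
Sys.getenv("LD_LIBRARY_PATH")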

I dug into this error and found that some shared libraries were not being loaded correctly. Running the R code through plain Hadoop streaming, with the original streaming files and the following hadoop command, worked fine:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -files mapper.R,reducer.R,/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64/libiomp5.so,/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so -mapper "mapper.R -m" -reducer "reducer.R -r" -input /user/hadoop/testData/* -output /user/hadoop/testData2-output

I tried adding backend.parameters = list(hadoop = list(files = "/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64/libiomp5.so", files = "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so")) to the mapreduce call, but that produced another error. I suspect the cause is that Hadoop streaming does not accept two or more -files options.
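
One variation that might be worth trying (untested sketch below): pass a single files entry whose value is the comma-separated list, so that only one extra -files option is generated; as the next comment explains, this may still clash with the -files flag rmr2 itself emits for the mapper and reducer scripts.

# Untested sketch: one "files" entry with a comma-separated value instead of two
# entries, so only a single extra -files option is generated. map.fun and
# reduce.fun are placeholders for the real map and reduce functions.
extra.so <- paste(
  "/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64/libiomp5.so",
  "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so",
  sep = ",")
mapreduce(
  input = "/user/hadoop/testData",
  output = "/user/hadoop/testData2-output",
  map = map.fun,
  reduce = reduce.fun,
  backend.parameters = list(hadoop = list(files = extra.so)))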

Therefore, I modified the original file, R/streaming.R, in the package before building it, changing the files parameter in final.command to:

files = paste(collapse = ",", c(image.files, map.file, reduce.file, combine.file, "/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64/libiomp5.so", "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so"))

That fixed the pipeMapRed.waitOutputThreads(): subprocess failed with code 127 error. I wonder whether it would be possible to add a new parameter to rmr2 for extending the list of shipped files, or whether there is another way to solve this problem by changing the Hadoop environment.
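
One environment-based alternative (an untested sketch, assuming the same library directories exist on every node, which holds for a localhost setup): instead of shipping the two .so files, set LD_LIBRARY_PATH for the task processes through Hadoop's mapreduce.map.env and mapreduce.reduce.env properties, passed as -D options, which is the kind of generic option backend.parameters is meant for.

# Untested sketch: point the map and reduce task processes at the directories
# that already contain libiomp5.so and libjvm.so, instead of shipping the files.
# Assumes duplicated "D" entries each become a separate -D flag; map.fun and
# reduce.fun are placeholders for the real map and reduce functions.
lib.path <- paste(
  "/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64",
  "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server",
  sep = ":")
mapreduce(
  input = "/user/hadoop/testData",
  output = "/user/hadoop/testData2-output",
  map = map.fun,
  reduce = reduce.fun,
  backend.parameters = list(hadoop = list(
    D = paste0("mapreduce.map.env=LD_LIBRARY_PATH=", lib.path),
    D = paste0("mapreduce.reduce.env=LD_LIBRARY_PATH=", lib.path))))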

piccolbo (Collaborator) commented Apr 8, 2015

There are some limitations related to the specific order of options that may be a problem here. In short, backend.parameters is safe for generic options such as -D, which is the one used most often. -files is not generic, so it needs to be in a certain order with respect to the generic ones, and there's only so much rmr2 can do to order them correctly without embedding full knowledge of which options are generic, plus a complete refactor of how the command line is put together right now (one would have to delay conversion to a string until the command line is fully specified). That's quite a bit of development and added, permanent complexity for a very specialized use case. The other thing is that -files is already used, and it does accept a list of files, which suggests that specifying it twice may not be acceptable, but I am not 100% sure. If that's the case, allowing the user to specify additional -files arguments would require an even deeper refactor.
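
To illustrate the ordering constraint described above (a sketch only, reusing the paths from the original report): Hadoop streaming requires position-sensitive options such as -files to appear before the streaming-specific ones (-mapper, -reducer, -input, -output), so anything rmr2 simply appends to the end of the command it builds cannot be a -files flag.

# Sketch of the option ordering Hadoop streaming expects; paths are from the
# report above. The -files block must precede -mapper/-reducer/-input/-output.
so.files <- paste(
  "/opt/intel/composer_xe_2013_sp1/compiler/lib/intel64/libiomp5.so",
  "/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so",
  sep = ",")
files.opt <- paste0("-files mapper.R,reducer.R,", so.files)
streaming.opts <- paste(
  "-mapper 'mapper.R -m' -reducer 'reducer.R -r'",
  "-input '/user/hadoop/testData/*' -output /user/hadoop/testData2-output")
cmd <- paste(
  "hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar",
  files.opt,        # must come before the streaming-specific options
  streaming.opts)
system(cmd)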
