Ownership error with squashed outputs #142

Open
fcomitani opened this issue Jun 15, 2018 · 5 comments

@fcomitani

Hello,

I wanted to report an issue that arises when the ownership of the pipeline output files changes.

I am running the pipeline as root on a dedicated VM within an HPC cluster, where all output is squashed to a specific user. This workaround was set up to avoid security issues with Docker on the HPC cluster.

The problem is due to tar, which has to be told explicitly not to restore the archived ownership.
I first solved the extraction side of the problem by adding --no-same-owner to every tar call in tools/aligners.py, preprocessing.py and quantifiers.py, as in the following example:

subprocess.check_call(['tar', '-xvf', os.path.join(job.tempDir, 'starIndex.tar.gz'), '-C', job.tempDir, '--no-same-owner'])
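
A minimal sketch of how the flag could be applied in one place instead of at every call site. The helper names `untar_command` and `untar` are hypothetical and do not exist in the pipeline; the tar arguments are the same ones used in the example above.

```python
import subprocess


def untar_command(tarball, dest_dir):
    """Build a tar extraction command that does not restore archived ownership.

    When run as root, GNU tar tries to restore the owner recorded in the
    archive by default; --no-same-owner makes the extracted files belong to
    the extracting user instead.
    """
    return ['tar', '-xvf', tarball, '-C', dest_dir, '--no-same-owner']


def untar(tarball, dest_dir):
    """Extract tarball into dest_dir; raises CalledProcessError on failure."""
    subprocess.check_call(untar_command(tarball, dest_dir))
```

With a helper like this, the call above would become something like untar(os.path.join(job.tempDir, 'starIndex.tar.gz'), job.tempDir).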

Creating the output tarballs needed a separate, temporary workaround: I had to hard-code the user information in utils/files.py.
Line 20, f_out.add(file_path, arcname=arcname), is now

def reset(tarinfo):
    # Overwrite the owner recorded in the archive with the squashed user.
    tarinfo.uid = tarinfo.gid = 1000
    tarinfo.uname = 'username'
    tarinfo.gname = 'usergroup'
    return tarinfo

f_out.add(file_path, arcname=arcname, filter=reset)
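
One possible way to avoid hard-coding the IDs, sketched under the assumption that the archive should carry the owner of some existing directory (for example the output directory). The names `make_owner_filter` and `owner_filter_for_dir` are hypothetical; only the `filter` argument of `TarFile.add` and the `TarInfo` attributes come from the standard library.

```python
import os
import tarfile


def make_owner_filter(uid, gid, uname='', gname=''):
    """Return a TarFile.add filter that records the given owner on each member."""
    def set_owner(tarinfo):
        tarinfo.uid = uid
        tarinfo.gid = gid
        tarinfo.uname = uname
        tarinfo.gname = gname
        return tarinfo
    return set_owner


def owner_filter_for_dir(path):
    """Build a filter from the numeric owner of an existing directory."""
    st = os.stat(path)
    return make_owner_filter(st.st_uid, st.st_gid)
```

Usage would look like f_out.add(file_path, arcname=arcname, filter=owner_filter_for_dir(output_dir)), where output_dir is an assumed variable pointing at the directory whose owner should be recorded.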

I'm sure there are more elegant solutions to this, but I wanted to let you know in case anybody else tries to run the pipeline on a VM.

Regards,
Federico

@jvivian
Collaborator

jvivian commented Jun 18, 2018

@fcomitani — Thank you for the issue submission. A few questions if you could elucidate:

  1. What sort of error does tar give when you don't pass in --no-same-owner?
  2. What sort of VM are you running the workflow in?
  3. You are root within the VM, but you need the output to be owned by a specific user on the HPC cluster — Is there a way you can add the same UID as a privileged user within the VM? Docker lets non-root users run Docker by being added to a group: sudo usermod -aG docker $USER. We use a special call when executing Docker containers that reruns the container in order to chown the output as owner of the mounted directory where the output is being stored. I wonder if this would work in your use case so you don't need to edit source code.
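
The chown-by-rerunning idea in point 3 could look roughly like the sketch below. This is not the pipeline's actual call: `build_chown_command`, the `/data` mount point, and the assumption that the image ships a `chown` binary are all illustrative. The command is only built here, not executed.

```python
import os


def build_chown_command(image, host_dir, mount_point='/data'):
    """Build a docker command that chowns a mounted directory to its host owner.

    The owner is read from the host directory itself, so files written inside
    the container as root end up owned by whoever owns the mounted directory.
    """
    st = os.stat(host_dir)
    owner = '{}:{}'.format(st.st_uid, st.st_gid)
    volume = '{}:{}'.format(os.path.abspath(host_dir), mount_point)
    return ['docker', 'run', '--rm',
            '-v', volume,
            '--entrypoint', 'chown',
            image,
            '-R', owner, mount_point]
```

The resulting list can be passed to subprocess.check_call once the tool container has finished writing its output.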

I'm glad you were able to find a workaround in the interim!

@rcurrie — Do you know if this is a common setup for the workflow for our Treehouse collaborators?

@rcurrie
Member

rcurrie commented Jun 18, 2018

The Treehouse collaborators all run the pipeline using Docker to ensure concordance, as the output is added to the public compendium.

@jvivian
Collaborator

jvivian commented Jun 18, 2018

@rcurrie, thanks for the info. To get around Docker permission issues, are any groups running Docker in a VM like Federico, or has that not come up as an issue?

@rcurrie
Member

rcurrie commented Jun 18, 2018

@jvivian Doh! My bad, didn't connect the dots here. I'm 99% sure BC and Nationwide both run Docker on the host OS rather than in a VM. I just unpacked a Nationwide tar and it seems to be fine (files appear as created/owned by me).

@fcomitani
Author

fcomitani commented Jun 18, 2018

@jvivian thanks for getting back to me.

  1. The pipeline is interrupted after a list of errors like the following is printed when trying to create the tar file containing the various outputs.

7/G/jobVh_6TN tar: starIndex/chrName.txt: Cannot change ownership to uid 1000, gid 1000: Operation not permitted

  2. The VM runs Ubuntu LTS and has 64 GB of RAM and 32 CPUs. It is built on an HPC machine with Torque, but jobs do not need to go through the queue system. It runs tar 1.28.

  3. I guess giving Docker permissions to the non-root user to which all output is squashed (with the same uid and gid) could actually work. There is an issue with the uid itself, however, since in the VM it is already taken by ubuntu. I'll see if there's a way to do it and let you know!
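
For the uid clash in the last point, a quick way to see which account currently holds a given uid inside the VM is the standard `pwd` module. The helper name `account_for_uid` is hypothetical; this is only a small check, not a fix.

```python
import pwd


def account_for_uid(uid):
    """Return the login name that owns uid, or None if the uid is unassigned."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None
```

Calling account_for_uid(1000) in the VM would show whether uid 1000 is free or already assigned before trying to create the squashed user with that id.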

Thanks again.
