Add nix package manager to base install #60

Open
espg opened this issue Jun 15, 2021 · 5 comments
Labels
🏷️ JupyterHub Something related to JupyterHub

Comments

@espg (Contributor) commented Jun 15, 2021

Not entirely comfortable with docker builds, hence the issue rather than a pull request...

The nix package manager makes a 'hard' promise of reproducibility, especially between linux systems. If I write a `default.nix` derivation on my system and run nix on that file within the jupyterhub, I will get the same working environment: the same versions of packages, with the same compile-time options and the same environment variables (similar to a `yarn.lock` file, as best I can tell). So it enables a user to either prototype a build on the jupyterhub and reproduce it on another system, or bring a working/functional environment into the jupyterhub.
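For concreteness, a minimal sketch of that workflow; the package choices here are purely illustrative, nothing about it assumes this hub's setup:

```sh
# Minimal sketch: write a default.nix and enter the environment it pins.
# Package names are illustrative; any nixpkgs attribute works the same way.
cat > default.nix <<'EOF'
with import <nixpkgs> {};
mkShell {
  buildInputs = [ python3 python3Packages.numpy fftw ];
}
EOF

# Anyone evaluating this against the same nixpkgs revision gets identical
# store paths: same compilers, same flags, same linked libraries.
nix-shell default.nix
```

(Strictly, the 'hard' promise also requires pinning the nixpkgs revision, since `<nixpkgs>` points at whatever channel the user has.)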

It would be nice to do this in a way that is persistent for users and between sessions. I.e., if someone else on the hub runs `nix-shell default.nix`, it shouldn't require a new build, because the previous build would be cached under a shared `/nix`. This is normally done with a multi-user install that runs a daemon to intercept nix calls and farm the builds out to the nix build users.
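For reference, the upstream installer supports that mode directly via its documented `--daemon` flag:

```sh
# Multi-user ("daemon") install: creates the nixbld build users and runs
# nix-daemon as root, so builds land in a /nix shared by all users.
sh <(curl -L https://nixos.org/nix/install) --daemon
```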

Why is this hard?

Nix has a single-line install, `curl -L https://nixos.org/nix/install | sh`, that does a 'single user' install.

The problems:

- Requires access to sudo

  1. If you do this in the hub, you'll have to do it every time you restart the container, because everything gets installed in a non-persistent directory under root. You'll also need to set up privileged containers.
  2. You could do this in the Dockerfile to make the binaries at least persistent and not require reinstallation after every container shutdown... but the one-liner above doesn't work as root; you need to run it as a user with sudo. You can set up a passwordless user to do the install, but then the environment variables aren't set, since nix has only been installed for the 'build' user. There's a potential workaround: copy a script that sources the path variables and have it run whenever jovyan logs in. That's kludgey, but it seems like it will work... with the same caveats as below.

- Requires persistent directory

You could do it the correct way and do a multi-user install, with nixbld users and a persistent daemon to build things... but then you need to mount a directory outside of the docker container (a sketch of such a mount follows the list below). If you don't, builds will:

  1. Not be shared across users
  2. Not persist for the same user between containers. This includes just switching to a bigger container to run the same code.
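As a rough sketch of what mounting `/nix` from outside the container could look like (the host path and image name here are assumptions, not our actual setup):

```sh
# Sketch: persist /nix by bind-mounting it from the host. On the hub, a
# k8s PersistentVolumeClaim mounted at /nix would play the same role.
docker run -it \
  -v /srv/nix-store:/nix \
  jupyter/base-notebook:latest
```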

@consideRatio leaving this here for now, will update if I find more workable solutions...

@consideRatio (Member) commented

@espg can you clarify whether you think this is worth pursuing on a "persistent on a per-user level" basis rather than a "persistent and shared between all users" basis? I ask because there can be a very significant difference in complexity between these: the former could be accomplished without much trouble, while the latter probably requires significant effort and may not turn out to work.

Currently, anything stored in /home/jovyan after the Dockerfile is built will be persistent, but anything written there during the Dockerfile build will be overwritten by the mounted persistent storage when the container starts.

The complexity I'm worried about with a shared directory for all users is that it isn't like sharing a directory between different users on one computer. If all users are on the same computer, you can have a master process doing things for all users and hence write to the shared folder via one process. But if you have multiple separate computers or containers, there is no easy way to have a shared process that makes sure write operations won't cause conflicts etc. In these situations, one typically ends up needing some central service that the others communicate with, but here that must be external to the user's computer/container, and I'd guess it will require some additional service that may not have been developed at all yet.

@espg (Contributor, Author) commented Jun 15, 2021

@consideRatio I'm less worried about the shared directory, because it's generally considered 'read only'... and by that I mean that the objects there are write-once and immutable. You can add a new file path with a new build, but once a file is created, it's never going to be 'modified'. So things like locking don't really matter.

Each build is stored like this: `/nix/store/{hash}-mylib/`, e.g., `/nix/store/000dm655691b5zis34klvhlil3hrv7j5-fftw-3.3.9.tar.gz.drv`. If you change anything at all (the version of gcc used to compile the build, the version of lapack it's linked against, a flag in the configure stage of the install), then the output of the derivation changes, and with it the hash... so maybe it's called `/nix/store/0dsz6pp7ikra6gnb3n15spwlxrhm1wdl-fftw-3.3.9.tar.gz.drv` now, which is the exact same 'version' as the other path, but a build that differs somewhere upstream, say a numpy that calls the intel mkl instead of openblas.

That's why I'm cavalier about having it shared; it's unlikely to break because it isn't changing. The only time it should get modified is if it's grown large from old versions... like you might delete it once a year to keep the file sizes down. That's also why sudo is usually needed for setup: to make sure that no user has write access to mess things up by changing the state of any directories.
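You can watch this mechanism from the command line; a hypothetical sketch, assuming a `default.nix` like the one above:

```sh
# The .drv path printed here embeds a hash of every build input.
nix-instantiate default.nix

# Each input is itself a hashed store path, so any change anywhere in the
# dependency graph produces a different output path rather than mutating
# an existing one.
nix-store --query --references $(nix-instantiate default.nix)
```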

@espg (Contributor, Author) commented Jun 15, 2021

On that point, it probably doesn't need the multi-user setup... the only thing the daemon does is spawn build processes per user, but if each user is in their own container, the daemon is pretty redundant. There's a (vanishingly small) chance of issues with incomplete builds, i.e., someone reading from a directory that only has part of its files copied over, but I think that would only be a temporary error. Even if two people start building the same requirement/library, if they're writing to the same location it means they're writing byte-for-byte identical files... so if they overwrite something, they're overwriting the same file with itself.

@consideRatio (Member) commented

@espg ah okay! Then I would say we want:

  1. A single-user installation of nix, making it available for jovyan only, who will run in multiple different containers.
    1. Note that this installation should not reside in the home folder, because that will be replaced.
  2. A location that is shared between users in a read/write manner, and that can tolerate being overwritten entirely when a container starts (after the Dockerfile has been built). If this can be the already-mounted /home/jovyan/shared-readwrite folder, we are good to go right away. If it must be some other location because we can't configure that, some k8s work is needed to create and mount such storage.

About sudo permissions

It is easy and risk-free to provide sudo permissions during the Dockerfile build, but once we leave the Dockerfile build and let users start running code, that is when I'd hope to avoid introducing the need for sudo. The question in my mind is whether we need sudo permissions after we have built the Dockerfile into an image: do we need sudo when the image is started as a container for users to install packages, just like with apt?

@espg (Contributor, Author) commented Jun 15, 2021

@consideRatio We shouldn't need sudo at all once nix is installed. I think the sudo call during installation is mostly to set the correct permissions on the /nix directory, and to create the nix build user that's invoked whenever nix-shell is called (the build user has write permission on the /nix directory).
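A sketch of roughly what those sudo steps amount to in single-user mode (the multi-user variant would additionally create the nixbld users and run nix-daemon as root):

```sh
# Roughly what the installer needs root for in single-user mode:
sudo mkdir -m 0755 /nix
sudo chown jovyan /nix   # after this, the owning user needs no sudo at all
```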

For item one, I think we can modify this, which is adapted from this code (I think):

```dockerfile
# Add the user aaronlevin for security reasons and for Nix
RUN adduser --disabled-password --gecos '' aaronlevin

# Nix requires ownership of /nix.
RUN mkdir -m 0755 /nix && chown aaronlevin /nix

# Change docker user to aaronlevin
USER aaronlevin

# Set some environment variables for Docker and Nix
ENV USER aaronlevin

# Change our working directory to $HOME
WORKDIR /home/aaronlevin

# install Nix (-L follows nixos.org's redirect)
RUN curl -L https://nixos.org/nix/install | sh

# update the nix channels
# Note: nix.sh sets some environment variables. Unfortunately in Docker
# environment variables don't persist across `RUN` commands
# without using Docker's own `ENV` command, so we need to prefix
# our nix commands with `. .nix-profile/etc/profile.d/nix.sh` to ensure
# nix manages our $PATH appropriately.
RUN . .nix-profile/etc/profile.d/nix.sh && nix-channel --update
RUN . .nix-profile/etc/profile.d/nix.sh
```

Where the aaronlevin user is just a throwaway build user, who has sudo to build things but is ignored forever after that. That last line, though (`. .nix-profile/etc/profile.d/nix.sh`), needs to be run as jovyan at some point to fix the paths correctly (I think it's actually just `source .nix-profile/etc/profile.d/nix.sh`), so the script needs to get copied somewhere accessible during the build.
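A hypothetical sketch of that kludge as extra Dockerfile steps (every path here is an assumption based on where the single-user install above puts things):

```dockerfile
# Copy the profile script somewhere persistent and source it for every
# login shell, so jovyan picks up nix's $PATH without owning the install.
USER root
RUN cp /home/aaronlevin/.nix-profile/etc/profile.d/nix.sh /usr/local/etc/nix.sh \
 && echo '. /usr/local/etc/nix.sh' >> /etc/bash.bashrc
USER jovyan
```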

For item 2, I think it is possible, but a hassle... to change the root directory of the store, you'd need to recompile from source (i.e., see here).
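For what it's worth, a sketch of what that recompile might look like; the `--with-store-dir` flag is from Nix's autoconf build, so treat the exact invocation as an assumption to verify against the linked instructions:

```sh
# Build Nix from source with a relocated store root, e.g. under the
# already-persistent shared mount.
./configure --with-store-dir=/home/jovyan/shared-readwrite/nix/store
make && make install
```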

@whyjz added the 🏷️ JupyterHub label Jul 2, 2021