[FEA] A way to include non-python dependencies in MAP packages #318

Open
laurencejackson opened this issue Aug 1, 2022 · 16 comments
Labels
enhancement New feature or request

Comments

@laurencejackson

laurencejackson commented Aug 1, 2022

Is your feature request related to a problem? Please describe.

I'd like to convert a magnetic resonance spectroscopy (MRS) application we run locally to a MAP that can run on MONAI Deploy. The application makes use of a third-party compiled Linux executable called Tarquin.

There should be some way to include non-Python dependencies such as this in MONAI Application Packages (MAPs).

Describe the solution you'd like

Since monai-deploy-app-sdk is built on Docker images, there should be a simple way to modify the Dockerfile used to build the MAP so that non-Python dependencies can be included. For example, I'd like to be able to include the following lines in my Dockerfile:

RUN apt-get update -y && apt-get install -y gnuplot ghostscript git wget imagemagick
RUN mkdir ../tarquin_folder
RUN wget -O tarquin.tar.gz https://sourceforge.net/projects/tarquin/files/TARQUIN_4.3.11/TARQUIN_Linux_4.3.11.tar.gz/download
RUN tar -xzf tarquin.tar.gz -C ../tarquin_folder
RUN rm tarquin.tar.gz
RUN mv ../tarquin_folder/TARQUIN_Linux_4.3.11_RC/tarquin /usr/local/bin/tarquin
RUN ln -s /usr/bin/gnuplot /usr/local/bin/gnuplot

A generic way to include this sort of thing would be amazing. This functionality might also tie in with #242.

Describe alternatives you've considered

I'm not aware of any way to include compiled executables in Docker runtimes other than installing the executable into the container.

Additional context

@laurencejackson laurencejackson added the enhancement New feature or request label Aug 1, 2022
@MMelQin
Collaborator

MMelQin commented Aug 1, 2022

@laurencejackson Thanks for the feature request, a good one.

For the time being, and you may have already tried this, there are two workarounds: build a new image on top of the newly built MAP image, or build a custom base image from one of the supported base images and then build the MAP on it.

@vikashg
Collaborator

vikashg commented Aug 2, 2022

So I was discussing something similar with our IT team here on Friday, and one of the things they suggested was to build pipelines. Things here run on Azure, so for a specific application we can build a pipeline. When I say pipeline, I mean a pipeline in the K8s sense. But maybe we can execute the pipeline without K8s to avoid additional complexity.

@MMelQin
Collaborator

MMelQin commented Aug 2, 2022

Yes, this is what we envisioned, and it was demonstrated in the MONAI Deploy Workflow PoC. The MAP, a Docker image itself, can be used as an "operator" in a processing pipeline in Argo on K8s or similar. The MAP itself, as we know, can also have its own pipeline within the app, running in the same process at runtime; this simplifies in-memory object sharing as well as application-specific pipeline management, with the added benefit that the pipeline in Argo can legitimately be just a single-container one.

Composing, and managing, a K8s pipeline with many containers of very granular functionality has many disadvantages. One of the design goals of the App SDK was to help address them.

@laurencejackson
Author

Thanks @vikashg, this is really useful. Am I right in understanding that when you talk about K8s "pipelines", that is MS Azure terminology, and Azure pipelines would be equivalent to something like an Argo workflow? So these non-Python dependencies can be defined at the level above the MAP?

I understand your comment above @MMelQin, building a new container with the executable on top of the built MAP seems the simplest to me. But I'm still a bit unsure about your second comment, how can I use an Argo/k8s pipeline to include a non-python dependency? Are these libraries/executable files defined and shared at a pipeline level? Is this an Azure feature only or is it available on Argo too?

@MMelQin
Collaborator

MMelQin commented Aug 3, 2022

@laurencejackson The second point is not directly related to injecting or installing dependencies in an application container image. It merely confirms the use of a MAP as an Argo operator, run as a container in the workflow/pipeline.

If the functionalities required of the additional packages can be packaged into a separate application container image, AND, used as a separate stage/operator in the workflow, then they can be composed in a pipeline e.g. in Argo. Clara Deploy uses this approach, with granular functionalities in each container image, e.g. the DICOM Parser would be in its own container and likewise the DICOM Writer. This has a number of disadvantages, which the App SDK helps to address by composing the functionalities into the same app while still supporting a MAP being run in a pipeline.

@tomaroberts

@MMelQin

If the functionalities required of the additional packages can be packaged into a separate application container image, AND, used as a separate stage/operator in the workflow, then they can be composed in a pipeline e.g. in Argo

I'm working on converting my app into a MAP and I've got a similar requirement to Laurence in that I need to run third-party code. Is your comment above still the best way to go about this? Or do you have any alternative thoughts since this issue was created?

I can effectively separate my app into 5 stages/operators:

  1. DCM > Nifti conversion (currently call dcm2niix installed in Docker container)
  2. Pytorch-based AI-driven segmentation (so inside MAP)
  3. C++ based image registration (package as separate Docker image?)
  4. Pytorch-based AI-based image reorientation (so inside MAP)
  5. Nifti > DCM conversion (currently custom code using Python, so could also put inside MAP I guess)

Based on this thread, I think I need to package up Step 3 (and possibly 1) into non-MAP images?

@MMelQin
Collaborator

MMelQin commented Oct 31, 2022

Hi @tomaroberts a couple things I'd need to clarify

  • The MAP, as it stands now, is a single container image. The App SDK operators used within an Application class are all classes/objects in the same process at runtime.
  • Some of the discussion above was to customize/add/install additional non-Python dependent packages in the container image, which can be done post building of the MAP container image before the monai-deploy package command can properly support/accept extra OS packages.

Your use case indeed presents a challenge, in that a C++ based algorithm needs to be added as an operator,

  • the best way is to wrap the algorithm in Python, so we can build an operator class for it to be used in the Application
  • launching a Docker container from within the MAP container is technically doable, though this requires using the Docker client SDK and granting the MAP container user permission to access the host socket in order to reach the host Docker daemon. This also limits the container runtime to Docker (not containerd, for example)
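
The second option could be sketched roughly as follows. This assumes the `docker` Python package (the Docker SDK for Python) is installed in the MAP image and that the host's Docker socket is mounted into the MAP container; the function names, image, command, and directories are hypothetical placeholders, not part of the App SDK.

```python
from typing import Dict, List


def build_volume_spec(host_dir: str, container_dir: str, read_only: bool = False) -> Dict[str, Dict[str, str]]:
    """Build the `volumes` mapping accepted by the Docker SDK's containers.run()."""
    return {host_dir: {"bind": container_dir, "mode": "ro" if read_only else "rw"}}


def run_sibling_container(image: str, command: List[str], host_dir: str, container_dir: str) -> str:
    """Run `command` in `image` via the host Docker daemon and return its output as text."""
    import docker  # imported lazily so the helper above works without the SDK installed

    client = docker.from_env()  # talks to the daemon behind the mounted host socket
    output = client.containers.run(
        image,
        command,
        volumes=build_volume_spec(host_dir, container_dir),
        remove=True,  # delete the container once it exits
    )
    return output.decode("utf-8", errors="replace")
```

One thing to watch with this approach: the host daemon creates the new container, so `host_dir` has to be a path on the host, not a path as seen from inside the MAP container.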

As for DCM to Nifti, specifically in file format, we try to avoid saving images to disk in the App SDK. There is a domain class, Image, which is a simple wrapper holding a data object (ndarray) and a dict of arbitrary metadata; down the line, some transformation information can be added. We had defined this before the MetaTensor in MONAI Core was introduced, although the latter supports conversion to and from torch.tensor. So, if you consider using the Image class, as well as MONAI transforms, intermediate output in Nifti file format can be avoided. In any case, an App SDK operator does support saving file(s) to a known path and setting that path in a defined output of the operator, with type Path or even Text.

The other advantage of using the Image class is that the existing App SDK DICOM Writers expect this object as input.

@tomaroberts

@MMelQin – thanks for the feedback.

I believe wrapping the C++ code in Python is infeasible. It's a vast codebase (https://github.com/SVRTK/SVRTK) built on top of MIRTK.

Some of the discussion above was to customize/add/install additional non-Python dependent packages in the container image, which can be done post building of the MAP container image before the monai-deploy package command can properly support/accept extra OS packages.

I think this is the solution I need to go for. We have the C++ SVRTK code in a container already built from a Dockerfile, so to me it would seem easiest to port that code into the MAP, then call the C++ commands from within the Python code. I have previously done this using subprocess.run within Python.

Would be good to discuss this at the App-SDK workgroup (albeit I have a meeting at that time tomorrow...). See what I can do, otherwise another time.

@MMelQin
Collaborator

MMelQin commented Nov 1, 2022

Thanks @tomaroberts Yes, initially we did discuss multi-container MAP, for this exact use case, and the PoC work to get docker in docker working. Will discuss this topic.

The alternative is to run multiple MAPs in a workflow; local testing will then have to use MONAI Deploy Express for orchestration, or your own Docker Compose.

@tomaroberts

tomaroberts commented Nov 17, 2022

To update, and for any future devs – I've got an initial solution working now. The workflow below demonstrates running dcm2niix in a MAP:

# cd ~/aide-svrtk
# ensure venv running
# ensure Docker running

# Test MAP code locally
python app -i input -o output

# Test MAP with MONAI Deploy
monai-deploy exec app -i input/ -o output/

# Initial packaging of MAP
monai-deploy package app --tag fetalsvrtk/aide:map-test -l DEBUG

# Push to DockerHub
docker push fetalsvrtk/aide:map-test

# Build 3rd-party software on top of MAP
docker build -t fetalsvrtk/aide:map-test-extra app/

# Test MAP-Extra with MONAI Deploy
monai-deploy run fetalsvrtk/aide:map-test-extra input output

Where the app/Dockerfile is:

FROM fetalsvrtk/aide:map-test AS build

WORKDIR /bin

RUN apt-get update && apt-get install --no-install-recommends --no-install-suggests -y \
	git build-essential cmake pigz

RUN git clone https://github.com/rordenlab/dcm2niix.git --branch master --single-branch \
	&& cd dcm2niix \
	&& mkdir build && cd build \
	&& cmake .. \
	&& make

ENV PATH="$PATH:/bin/dcm2niix/build/bin"

WORKDIR /var/monai

Then in the relevant Operator, I'm using subprocess.run(["dcm2niix", ...]) to call it from Python.
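
A minimal sketch of what such a call might look like, assuming dcm2niix is on PATH inside the container as arranged by the ENV PATH line in the Dockerfile. The helper names and the choice of flags are illustrative and not taken from the actual app: `-o` sets the output directory and `-z y` requests gzip-compressed output.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_dcm2niix_command(dicom_dir: Path, output_dir: Path) -> List[str]:
    """Assemble the argument list; passing a list avoids shell quoting issues."""
    return ["dcm2niix", "-z", "y", "-o", str(output_dir), str(dicom_dir)]


def convert_dicom_to_nifti(dicom_dir: Path, output_dir: Path) -> List[Path]:
    """Run dcm2niix and return the compressed NIfTI files found in output_dir."""
    if shutil.which("dcm2niix") is None:
        raise FileNotFoundError("dcm2niix not found on PATH; was it installed in the image?")
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(
        build_dcm2niix_command(dicom_dir, output_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    return sorted(output_dir.glob("*.nii.gz"))
```

Checking for the executable up front gives a clearer error when the operator is run from an image that was packaged but not yet extended with the third-party software.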


TL;DR – monai-deploy package app ... then docker build ... using FROM ... AS build to add 3rd-party software.

@laurencejackson
Author

laurencejackson commented Sep 1, 2023

@MMelQin, I was wondering whether, with the recent Holoscan update, you might have any new ideas about how we can install non-python dependencies in application packages?

I recently forked monai-deploy-app-sdk from 0.5.1 and drafted a hacky way to install additional OS-level packages (in my case I required pango, via apt install libpangocairo-1.0-0) by passing an extra command line argument to monai-deploy package with a path to a text file that was appended to the end of the Dockerfile template. With the 0.6.0 release, it looks like this is no longer compatible. Which is fine, I wasn't particularly happy with that solution anyway, but I'm wondering whether you have any idea how additional dependencies can be installed with the Holoscan packager? I've had a look through the new Jupyter notebooks and the Holoscan documentation but I can't see any examples of this yet.
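
For anyone curious, the general idea of appending a text file to a generated Dockerfile can be sketched as below. This is not the code from the fork, only an illustration; the function names and the comment marker are hypothetical.

```python
from pathlib import Path


def append_extra_instructions(dockerfile_text: str, extra_text: str) -> str:
    """Return the Dockerfile text with the extra instructions appended at the end."""
    extra = extra_text.strip()
    if not extra:
        return dockerfile_text  # nothing to add, leave the template untouched
    return dockerfile_text.rstrip("\n") + "\n\n# Extra OS-level dependencies\n" + extra + "\n"


def patch_dockerfile(dockerfile_path: Path, extra_path: Path) -> None:
    """Append the contents of extra_path to the Dockerfile at dockerfile_path, in place."""
    patched = append_extra_instructions(dockerfile_path.read_text(), extra_path.read_text())
    dockerfile_path.write_text(patched)
```

Instructions appended after a USER line in the template run as that user, so package installation may need a temporary switch to root.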

The Holoscan update is a huge update to the SDK, I'm looking forward to using it!

@MMelQin
Collaborator

MMelQin commented Sep 6, 2023

Hi @laurencejackson, the Holoscan Packager can compile and build C++ apps into packages, as well as build Python apps into packages. With the former, installing Debian packages is supported via cmake, but Python apps go down a different path. I have asked for a system package installation feature, though it is not supported in this version of Holoscan, and is unlikely to be in Holoscan 1.0 either.

We still have ways to work around this, good or bad. Tweaking the template Docker file in the local Holoscan distribution can be one, updating and committing the base image locally is another. With some scripting, these can be automated, which I had to choose to use to patch some issue while developing in parallel with Holoscan.

Will look into this further once I come back from vacation the week after.

@tomaroberts

@MMelQin

Do you have examples of this:

Tweaking the template Docker file in the local Holoscan distribution

this:

updating and committing the base image locally is another

and this:

With some scripting, these can be automated, which I had to choose to use to patch some issue while developing in parallel with Holoscan

Thanks!

@MMelQin
Collaborator

MMelQin commented Sep 6, 2023

@tomaroberts thanks for the question. I will look into it in the week after once back from vacation. @mocsharp please chime in too.

@laurencejackson
Author

Just in case anyone comes across this, here is an example of scripting a multi-stage build with additional dependencies. In this example, I need to install libpangocairo-1.0-0 using apt install.

The script can be executed in bash, e.g. bash the_script_below.sh, and could easily be extended to take the variables from environment variables or command line arguments. The general idea is that I build a MAP without the necessary dependencies, then use that image as the new FROM target in a second Dockerfile that installs the additional dependencies on top of the MAP.

#!/usr/bin/env bash
# stop at the first failing command or unset variable
set -euo pipefail

# configure build parameters
IMAGE_NAME="ghcr.io/org/image_name"
BASE_IMAGE="nvcr.io/nvidia/cuda:12.2.0-runtime-ubuntu20.04"
APP_PATH="path/to/app"
MODEL_PATH="/data/models/model-764.pt"
APP_VERSION="0.1.0"
MODEL_VERSION="0.1.0"

# build base image (the MAP without the extra OS-level dependencies)
monai-deploy package "$APP_PATH" -t "$IMAGE_NAME:base" -l DEBUG -m "$MODEL_PATH" -b "$BASE_IMAGE"

# write a second Dockerfile that installs additional dependencies on top of the MAP
printf 'FROM %s:base\nENV MODEL_VERSION=%s\nRUN apt update && apt install -y --no-install-recommends git-all libpangocairo-1.0-0 && pip cache purge\n' "$IMAGE_NAME" "$MODEL_VERSION" > Dockerfile_postbuild
docker build -t "$IMAGE_NAME:$APP_VERSION" -f Dockerfile_postbuild .

printf '***\nBuilt image %s\nTo push the image to ghcr run: docker push %s\n***\n' "$IMAGE_NAME:$APP_VERSION" "$IMAGE_NAME:$APP_VERSION"

# remove the intermediate image tag and Dockerfile
docker image rm "$IMAGE_NAME:base"
rm Dockerfile_postbuild

Note that using the cuda:12.2.0-runtime base image makes the eventual image much smaller (approx. 10 GB smaller) than using the pytorch base image. This works in my case, but for it to work you need to also install the git-all package along with libpangocairo-1.0-0.

Hope someone finds this useful!

@MMelQin
Collaborator

MMelQin commented Oct 12, 2023

@laurencejackson Thanks for this script and surely it helps!

The default base image used by the Packager, nvcr.io/nvidia/clara-holoscan/holoscan:v0.6.0-dgpu, can be found in the file artifact_sources.py in the Holoscan SDK distribution, e.g. for Python 3.8 at python3.8/site-packages/holoscan/cli/common/artifact_sources.py. This base image is not based on the pytorch base image, though it has the CUDA 11 runtime, along with other packages required by the Holoscan SDK. It is still sizable, and a smaller Holoscan runtime container, or at least the Dockerfile, will be published later.
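
To find that file in whichever environment is active, something like the following sketch may help. It only relies on the relative path quoted above (holoscan/cli/common/artifact_sources.py), which may differ in other Holoscan versions; the function name is hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_artifact_sources() -> Optional[Path]:
    """Return the path to holoscan/cli/common/artifact_sources.py, or None if not found."""
    spec = importlib.util.find_spec("holoscan")
    if spec is None or not spec.submodule_search_locations:
        return None  # Holoscan SDK is not installed in this environment
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "cli" / "common" / "artifact_sources.py"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_artifact_sources()` returns `None` when the Holoscan SDK is not installed or the file has moved, so the result should be checked before opening it.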
