Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nextflow: When using docker and enabling tracing, processes fail within the docker containers due to paths to /nix/store/bash #350183

Open
stianlagstad opened this issue Oct 21, 2024 · 9 comments
Labels
0.kind: bug Something is broken

Comments

@stianlagstad
Copy link
Contributor

Describe the bug

I'm using nextflow, as packaged in #339197, from the current unstable channel. This new packaging of nextflow has a patch to make sure the correct paths to bash is used depending on whether docker is enabled for the nextflow workflow run or not:

postPatch = ''
# Nextflow invokes the constant "/bin/bash" (not as a shebang) at
# several locations so we fix that globally. However, when running inside
# a container, we actually *want* "/bin/bash". Thus the global fix needs
# to be reverted for this specific use case.
substituteInPlace modules/nextflow/src/main/groovy/nextflow/executor/BashWrapperBuilder.groovy \
--replace-fail "['/bin/bash'," "['${bash}/bin/bash'," \
--replace-fail "if( containerBuilder ) {" "if( containerBuilder ) {
launcher = launcher.replaceFirst(\"/nix/store/.*/bin/bash\", \"/bin/bash\")"
'';
. I suspect that this patch does not do the correct thing if tracing is also enabled for the nextflow workflow run. When running with both docker and tracing enabled, I get this error:

 N E X T F L O W   ~  version 24.08.0-edge

Launching `main.nf` [awesome_cori] DSL2 - revision: 5ad587431e

executor >  local (1)
[56/9687f5] sayHello (1) [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'sayHello (1)'

Caused by:
  Process `sayHello (1)` terminated with an error exit status (127)
...

Command error:
  .command.run: line 154: /nix/store/717iy55ncqs0wmhdkwc5fg2vci5wbmq8-bash-5.2p32/bin/bash: No such file or directory

Work dir:
  /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428
...

The problem is in the .command.run file, where not all occurrences of paths to bash is handled correctly by the postPath. When I enable tracing in addition to docker, then the command in .command.run changes from this:

    docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428:/home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428 -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID quay.io/nextflow/bash /bin/bash /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.run

to this:

    docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428:/home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428 -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID quay.io/nextflow/bash /bin/bash /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.run nxf_trace

The addition of nxf_trace at the end there takes the bash script into the function nxf_trace_linux (or nxf_trace_mac is on mac), where there's a hardcoded path to the nix store-provided bash:

nxf_trace_linux() {
... 
    /nix/store/717iy55ncqs0wmhdkwc5fg2vci5wbmq8-bash-5.2p32/bin/bash -ue /home/stian/devp/hello/work/56/9687f5d5b119c156ba9a46e4694428/.command.sh &
...

This nix store-provided bash path should not be there as I enabled docker, but it remains, and causes the process to fail.

An additional occurrence of the nix store-provided bash path is present in the shebang of the .command.sh file, although it makes no difference to the result of the workflow run.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Create a shell with nextflow as packaged in nextflow: 22.10.6 -> 24.08.0-edge + remove buildFHSEnv + compile from source + add tests #339197
  2. Download the example workflow https://github.com/nextflow-io/hello, or use any other workflow and enable both docker and tracing.
  3. Run the workflow and observe that you get an error similar to the above.

Expected behavior

I expected the workflow run to succeed, and that the files .command.run and .command.sh should have no occurrences of paths to the nix store.

Screenshots

NA

Additional context

NA

Notify maintainers

@edmundmiller @Etjean @rollf

Metadata

[stian@stian-dell-nixos:~/devp/hello]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.56, NixOS, 24.05 (Uakari), 24.05.20241016.e001b23`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.8`
 - nixpkgs: `/nix/store/apchqa88gf3yp1fghqn9aqkg0c2fm962-source`
@stianlagstad stianlagstad added the 0.kind: bug Something is broken label Oct 21, 2024
@stianlagstad
Copy link
Contributor Author

Here's a change which will add two more test cases, leading to failures:

nixos/tests/nextflow.nix --- Nix
23     run-nextflow-pipeline = pkgs.writeShellApplication {                    23     run-nextflow-pipeline = pkgs.writeShellApplication {
24       name = "run-nextflow-pipeline";                                       24       name = "run-nextflow-pipeline";
25       runtimeInputs = [ pkgs.nextflow ];                                    25       runtimeInputs = [ pkgs.nextflow ];
26       text = ''                                                             26       text = ''
27         export NXF_OFFLINE=true                                             27         export NXF_OFFLINE=true
28         for b in false true; do                                             28         for b in false true; do
29           echo "docker.enabled = $b" > nextflow.config                      29           for c in false true; do
..                                                                             30             echo "docker.enabled = $b" > nextflow.config
30           cat nextflow.config                                               31             echo "trace.enabled = $c" > nextflow.config
..                                                                             32             cat nextflow.config
31           nextflow run -ansi-log false ${hello}                             33             nextflow run -ansi-log false ${hello}
..                                                                             34           done
32         done                                                                35         done
33       '';                                                                   36       '';
34     };                                                                      37     };
35   in                                                                        38   in
36   {                                                                         39   {

@rollf
Copy link
Contributor

rollf commented Oct 21, 2024

@stianlagstad Thank you for the detailed report. I vaguely remember that I had looked into the traced-scenario and that something was problematic but I must have lost attention. Sorry about that. I'll have a look into this.

@stianlagstad
Copy link
Contributor Author

stianlagstad commented Oct 22, 2024

@stianlagstad Thank you for the detailed report. I vaguely remember that I had looked into the traced-scenario and that something was problematic but I must have lost attention. Sorry about that. I'll have a look into this.

Please don't apologize! Thank you again for the good work in #339197 💯

I'm trying myself to iterate on a modified postPatch, but so far I have not been successful. I'm considering writing a PR for the nextflow repository instead, so that we can avoid the hardcoded /bin/bash strings there.

Update: I've created an issue for this on the nextflow side: nextflow-io/nextflow#5420

@rollf
Copy link
Contributor

rollf commented Oct 22, 2024

Ah, yes, me, too :) I have something working already see here but I'm not totally satisfied. I will iterate and then open a PR upstream.

@rollf
Copy link
Contributor

rollf commented Oct 22, 2024

Okay, I updated my nextflow-patch and also created a nixpkgs branch that uses this new version and doesn't need to patch anything anymore. You should be able to run it like so

❯ nix run github:rollf/nixpkgs/nextflow-trace#nextflow -- -version

      N E X T F L O W
      version 24.09.2-edge build 5927
      created 14-10-2024 21:22 UTC (23:22 CEST)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

(Mind that it's a newer version of nextflow).

Would be great if you could try this out.

I have adapted the NixOS-tests based on your suggestions (see here) and they pass.

If we're both satisfied, I'll open a PR upstream as well as for nixpkgs. Let's see whether we can get that integrated into upstream.

@stianlagstad
Copy link
Contributor Author

Thank you @rollf !

Do I understand this correctly?

If I understand that correctly then there are some objections to this (using /usr/bin/env) in the comments here:

I also pointed to those in my issue in the nextflow repo: nextflow-io/nextflow#5420.

I can confirm however that your fix works for my use case.

@rollf
Copy link
Contributor

rollf commented Oct 25, 2024

Do I understand this correctly?

Yes.

If I understand that correctly then there are some objections to this (using /usr/bin/env) in the comments here:

...

I also pointed to those in my issue in the nextflow repo: nextflow-io/nextflow#5420.

Yes, I had read those except the issue you raised, so thanks for pointing them out again. I still think what I propose is reasonable because a) it does keep /bin/bash wherever possible (and certainly within container instances) and b) the patch set is really small.

I can confirm however that your fix works for my use case.

Okay, I'll open a PR upstream. If it does not get accepted, then we'll need to keep the patch within nixpkgs 🤷 .

@rollf
Copy link
Contributor

rollf commented Oct 25, 2024

Here is the upstream PR. We'll have to wait for CI, too.

@rollf
Copy link
Contributor

rollf commented Oct 26, 2024

@stianlagstad I have spent much more time now to improve both on the nextflow as well as on the Nixpkgs side and updated both branches / the upstream PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken
Projects
None yet
Development

No branches or pull requests

2 participants