race: parallel builds: copying...committing...creating... layer not known #5674

Open
edsantiago opened this issue Aug 8, 2024 · 3 comments

@edsantiago
Member

This might be the same as containers/podman#23331. If it is, please close this issue or move it there.

Setup:

$ for i in 1 2;do printf "FROM quay.io/libpod/testimage:20240123\nRUN echo hi from $i\n" >Containerfile$i;done

In window 1:

$ while :;do buildah build -t c1 --layers=true -f Containerfile1 || break;buildah rmi c1;done

In window 2:

$ while :;do buildah build --layers=false -t c2 -f Containerfile2 || break;buildah rmi c2;done

Within 30-60s, window 1 will barf:

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
Error: checking if cached image exists from a previous build: getting top layer info: layer not known

or

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
hi from 1
COMMIT c1
Error: committing container for step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[echo hi from 1] Flags:[] Attrs:map[] Message:RUN echo hi from 1 Heredocs:[] Original:RUN echo hi from 1}: copying layers and metadata for container "a8d0253ccd5f337ca69e106657dc645e4926b20d9775621827b6ec118bcb35fa": committing the finished image: creating image "41778f8cf15b69d1fdb79d5bb744ba65eac877e27a21dd12af8700594d88585b": layer not known

The rmi seems to be important; I can't get it to fail (at least not within my patience tolerance of ~10 minutes) if I omit the rmi from either loop.
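
To be concrete, by "omit the rmi" I mean a variant like this (window 1 shown; it's just the loop above with the rmi dropped), which I could not get to fail:

$ while :;do buildah build -t c1 --layers=true -f Containerfile1 || break;done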

Testing with podman fails MUCH faster than buildah, for reasons I don't understand, and it also sometimes fails in window 2; buildah only fails in window 1.
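
For reproducing with podman, the equivalent loops are simply the same commands with podman substituted for buildah; as a sketch, window 2 becomes:

$ while :;do podman build --layers=false -t c2 -f Containerfile2 || break;podman rmi c2;done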

This is blocking parallelization of the podman 070-build test, and I suspect it's one of the uncategorized weirdnesses I saw in #5552 but didn't follow up on.

@edsantiago
Member Author

Issue persists:

<+0042s> # # podman build -t b-t156-muinxj0h /tmp/CI_dBI1/podman_bats.20lh4r/build-test
<+477ms> # STEP 1/3: FROM quay.io/libpod/testimage:20240123
         # STEP 2/3: COPY ./ /tmp/test/
         # Error: checking if cached image exists from a previous build: getting top layer info: layer not known
<+005ms> # [ rc=125 (** EXPECTED 0 **) ]

This is podman PR containers/podman#23275, with current buildah (v1.37.1-0.20240828183349-69259725a0df) vendored.

@nalind
Member

nalind commented Aug 29, 2024

This is two builds with --layers=true, which means they're reading each other's work as cache candidates; that's not something #5686 was concerned with.
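
If shared cache candidates are the trigger, one quick way to isolate the two loops while testing (a sketch only; the storage paths are arbitrary, and this sidesteps the race rather than fixing it) is to point each build loop at its own storage via buildah's global --root/--runroot options, e.g. for window 1:

$ while :;do buildah --root /tmp/b1-root --runroot /tmp/b1-run build -t c1 --layers=true -f Containerfile1 || break;buildah --root /tmp/b1-root --runroot /tmp/b1-run rmi c1;done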

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 17, 2024
Need --layers=false in podman build, otherwise a buildah race
can trigger "layer not known" failures:

   containers/buildah#5674

Signed-off-by: Ed Santiago <[email protected]>

A friendly reminder that this issue had no activity for 30 days.
