rawhide, pasta: new test issue with connections to local registry container #24804

Closed
Luap99 opened this issue Dec 9, 2024 · 4 comments
Labels
kind/bug, network, pasta, regression

Comments

@Luap99
Member

Luap99 commented Dec 9, 2024

With the latest image update in #24759, many podman login and logout tests started to fail on Rawhide.
The issue seems to come down to the local registry container running with pasta: somehow, the connections from podman login/push/pull are no longer handled correctly, resulting in different error messages such as

authenticating creds for "localhost:5435": Get "https://localhost:5435/v2/": read tcp 127.0.0.1:35068->127.0.0.1:5435: read: connection reset by peer
setting up to read manifest and configuration from "docker://localhost:5748/citest:latest": pinging container registry localhost:5748: Get "http://localhost:5748/v2/": dial tcp [::1]:5748: connect: connection refused
trying to reuse blob sha256:03901b4a2ea88eeaad62dbe59b072b28b6efa00491962b8741081c5df50c65e0 at destination: pinging container registry localhost:5650: Get "https://localhost:5650/v2/": dial tcp [::1]:5650: connect: connection refused

Even on a local, non-CI Rawhide VM the issue can be reproduced with make localintegration FOCUS="Podman login and logout". It looks like some kind of race, as the errors are always slightly different and not always the same tests fail.

The CI update brought us to:
passt 0^20241127.gc0fbc7e-1.fc42-x86_64
kernel 6.13.0-0.rc1.20241203gitcdd30ebb1b9f.16.fc42.x86_64

However, I already tried downgrading both pasta and the kernel, and neither made a difference. The same pasta version also works fine on f41/f40 in our VMs, so it does not look like a pasta issue. And since we also confirmed that it fails on older kernels, it does not seem to be triggered by the kernel either. So something else must have changed on Rawhide that triggers it.

Example CI failure log:
https://api.cirrus-ci.com/v1/artifact/task/5908289420001280/html/int-podman-rawhide-rootless-host-sqlite.log.html#t--Podman-login-and-logout-podman-login-and-logout-with-cert-dir--1

Luap99 added the kind/bug, network, pasta, and regression labels Dec 9, 2024
@Luap99
Member Author

Luap99 commented Dec 10, 2024

diff --git a/test/e2e/login_logout_test.go b/test/e2e/login_logout_test.go
index d0b5bb940e..5bdea4ca61 100644
--- a/test/e2e/login_logout_test.go
+++ b/test/e2e/login_logout_test.go
@@ -5,10 +5,13 @@ package integration
 import (
 	"encoding/json"
 	"fmt"
+	"io"
 	"os"
 	"path/filepath"
 	"strconv"
 	"strings"
+	"syscall"
+	"time"
 
 	. "github.com/containers/podman/v5/test/utils"
 	. "github.com/onsi/ginkgo/v2"
@@ -24,6 +27,9 @@ var _ = Describe("Podman login and logout", func() {
 		server                   string
 		testImg                  string
 		registriesConfWithSearch []byte
+		logFile                  string
+		strace                   *PodmanSession
+		straceLog                string
 	)
 
 	BeforeEach(func() {
@@ -59,7 +65,12 @@ var _ = Describe("Podman login and logout", func() {
 		setup := SystemExec("cp", []string{filepath.Join(certPath, "domain.crt"), filepath.Join(certDirPath, "ca.crt")})
 		setup.WaitWithDefaultTimeout()
 
+		logFile = filepath.Join(podmanTest.TempDir, "pasta.log")
+		pidfile := filepath.Join(podmanTest.TempDir, "pasta.pid")
+
 		session := podmanTest.Podman([]string{"run", "-d", "-p", strings.Join([]string{strconv.Itoa(port), strconv.Itoa(port)}, ":"),
+			"--network", "pasta:--trace,--pid," + pidfile + ",--log-file," + logFile + ",--pcap,/tmp/pasta-" +
+				CurrentSpecReport().LeafNodeText + time.Now().Format(time.RFC3339) + ".pcap",
 			"-e", strings.Join([]string{"REGISTRY_HTTP_ADDR=0.0.0.0", strconv.Itoa(port)}, ":"), "--name", "registry", "-v",
 			strings.Join([]string{authPath, "/auth:Z"}, ":"), "-e", "REGISTRY_AUTH=htpasswd", "-e",
 			"REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm", "-e", "REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd",
@@ -68,6 +79,12 @@ var _ = Describe("Podman login and logout", func() {
 		session.WaitWithDefaultTimeout()
 		Expect(session).Should(ExitCleanly())
 
+		pid, err := os.ReadFile(pidfile)
+		Expect(err).ToNot(HaveOccurred())
+
+		straceLog = filepath.Join(podmanTest.TempDir, "strace.log")
+		strace = StartSystemExec("strace", []string{"-o", straceLog, "-f", "-p", strings.TrimSpace(string(pid))})
+
 		if !WaitContainerReady(podmanTest, "registry", "listening on", 20, 1) {
 			Skip("Cannot start docker registry.")
 		}
@@ -77,6 +94,28 @@ var _ = Describe("Podman login and logout", func() {
 	})
 
 	AfterEach(func() {
+		if CurrentSpecReport().Failed() {
+			session := podmanTest.Podman([]string{"logs", "registry"})
+			session.WaitWithDefaultTimeout()
+
+			f, err := os.Open(logFile)
+			Expect(err).ToNot(HaveOccurred())
+
+			GinkgoWriter.Println("pasta trace log:")
+			_, err = io.Copy(GinkgoWriter, f)
+			f.Close()
+			Expect(err).ToNot(HaveOccurred())
+
+			strace.Signal(syscall.SIGTERM)
+
+			f, err = os.Open(straceLog)
+			Expect(err).ToNot(HaveOccurred())
+			GinkgoWriter.Println("pasta strace log:")
+			_, err = io.Copy(GinkgoWriter, f)
+			f.Close()
+			Expect(err).ToNot(HaveOccurred())
+		}
+
 		os.Unsetenv("REGISTRY_AUTH_FILE")
 		os.RemoveAll(authPath)
 		os.RemoveAll(certDirPath)

Diff to get the pasta log file, strace output, and pcap files for debugging

@Luap99
Member Author

Luap99 commented Dec 10, 2024

Just to add a short summary: it seems the glibc upgrade causes issues with the strict pasta seccomp profile.
@sbrivio-rh is working on a fix.

@sbrivio-rh
Collaborator

Fix posted at https://archives.passt.top/passt-dev/[email protected]/, now pending review. The gory details are reported in the commit message.

Long story short: the seccomp profiles for pasta(1) disallow getrandom(2) and brk(2), so if those system calls are issued, the process terminates. With glibc > 2.40, strerror(3), which is used to display descriptions of errors that sometimes occur during those tests, allocates memory in a way that needs both system calls. The fix replaces strerror(3) calls with strerrordesc_np(3) where available, and keeps calling strerror(3) where it's not (e.g. with musl).
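A minimal sketch of the strerror(3) replacement described above, assuming strerrordesc_np(3) is available starting with glibc 2.32 (declared with _GNU_SOURCE) and detecting it with a compile-time glibc version check. The helper name err_desc() and the HAVE_STRERRORDESC_NP macro are hypothetical; the actual pasta patch may detect availability differently.

```c
/* Hypothetical sketch, not the code from the actual pasta patch. */
#define _GNU_SOURCE /* strerrordesc_np() is a GNU extension */
#include <errno.h>
#include <string.h>

/* Assumption: strerrordesc_np() appeared in glibc 2.32, so checking the
 * glibc version macros (made available through <string.h>) is enough. */
#if defined(__GLIBC__) && \
	(__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 32))
#define HAVE_STRERRORDESC_NP 1
#endif

/* Return a description of errnum. With glibc, use the static,
 * untranslated string from strerrordesc_np(), which needs no dynamic
 * memory allocation; it returns NULL for unknown error numbers.
 * Elsewhere (e.g. musl), fall back to strerror(). */
static const char *err_desc(int errnum)
{
#ifdef HAVE_STRERRORDESC_NP
	const char *desc = strerrordesc_np(errnum);

	return desc ? desc : "Unknown error";
#else
	return strerror(errnum);
#endif
}
```

For example, err_desc(ECONNRESET) gives "Connection reset by peer" with both glibc and musl, but only the glibc path is guaranteed not to allocate.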

@sbrivio-rh
Collaborator

Fixed in pasta 2024_12_11.09478d5, matching Fedora Rawhide update passt-0^20241211.g09478d5-1.fc42 and Debian unstable's passt-0.0~git20241211.09478d5-1.

Luap99 added a commit to Luap99/automation_images that referenced this issue Dec 12, 2024
- remove old pasta bump and add new bump for rawhide issue
  containers/podman#24804
- bump debian tar timebomb, it still has the same broken version

Signed-off-by: Paul Holzinger <[email protected]>
hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Dec 17, 2024
…2.40

With glibc commit 25a5eb4010df ("string: strerror, strsignal cannot
use buffer after dlmopen (bug 32026)"), strerror() now needs, at least
on x86, the getrandom() and brk() system calls, in order to fill in
the locale-translated error message. But getrandom() and brk() are not
allowed by our seccomp profiles.

This became visible on Fedora Rawhide with the "podman login and
logout" Podman tests, defined at test/e2e/login_logout_test.go in the
Podman source tree, where pasta would terminate upon printing error
descriptions (at least the ones related to the SO_ERROR queue for
spliced connections).

Avoid dynamic memory allocation by calling strerrordesc_np() instead,
which is a GNU function returning a static, untranslated version of
the error description. If it's not available, keep calling strerror(),
which at that point should be simple enough as to be usable (at least,
that's currently the case for musl).

Reported-by: Paul Holzinger <[email protected]>
Link: containers/podman#24804
Analysed-by: Paul Holzinger <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Tested-by: Paul Holzinger <[email protected]>
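One way a seccomp profile can make a process terminate on a disallowed system call is a filter that returns SECCOMP_RET_KILL_PROCESS. The sketch below is a hypothetical, standalone Linux demo, not pasta's actual profile: it blocks only getrandom(2) through a deny-list, covers only x86_64 and aarch64, assumes Linux 4.14 or later, and runs the filtered call in a forked child so the parent can observe the SIGSYS termination. The function names are made up for illustration.

```c
/* Hypothetical demo, not pasta's seccomp profile. Linux only. */
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Install a filter that allows every system call except getrandom(2),
 * which kills the whole process. A deny-list is used only for brevity:
 * a strict profile would rather allow just the calls a program needs. */
static int deny_getrandom(void)
{
	struct sock_filter filter[] = {
		/* Kill the process if the system call ABI isn't the expected one */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		/* Kill on getrandom(2), allow anything else */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getrandom, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Needed to install a filter without CAP_SYS_ADMIN */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Fork a child that installs the filter, then issues getrandom(2).
 * Returns the signal that terminated the child, 0 if it exited instead,
 * or -1 if fork() or waitpid() failed. */
static int run_filtered_getrandom(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		char buf[16];

		if (deny_getrandom())
			_exit(1);
		/* Raw system call, so the demo doesn't depend on how the C
		 * library implements getrandom(): the filter kills us here. */
		syscall(__NR_getrandom, buf, sizeof(buf), 0);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_filtered_getrandom() from an unfiltered parent should return SIGSYS, while the parent itself can still call getrandom(2), since the filter is only installed in the child.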