Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing backslash causes infinite loop in quadlet generator #24810

Closed
jeremyvisser opened this issue Dec 9, 2024 · 7 comments · Fixed by #24825
Closed

Trailing backslash causes infinite loop in quadlet generator #24810

jeremyvisser opened this issue Dec 9, 2024 · 7 comments · Fixed by #24825
Labels
jira kind/bug Categorizes issue or PR as related to a bug. quadlet

Comments

@jeremyvisser
Copy link

jeremyvisser commented Dec 9, 2024

Issue Description

When an exec line ends with a trailing backslash, the quadlet generator enters an infinite loop and never returns.

Steps to reproduce the issue

I can reproduce this on podman-5.3.0-1.fc41.x86_64, or from 07dddeb (latest main at time of writing).

  1. Populate this test case in /etc/containers/systemd/breakstuff.container:
    [Container]
    Exec=true \
    
    # must have a blank line above, but this line can be anything (including another blank line)
    
  2. Run /usr/lib/systemd/system-generators/podman-system-generator (or cmd/quadlet/main.go)

Describe the results you received

The podman-system-generator (quadlet) process never terminates, and uses 100% CPU:

$ ./quadlet -dryrun -v
quadlet-generator[1284258]: Loading source unit file /etc/containers/systemd/breakstuff.container

<wait for flying cars to be invented>
^C

Describe the results you expected

I would expect a parse error to occur, or at least some kind of timeout.

By the time I noticed this issue, there were many copies of podman-system-generator running on my system, as a new copy is spawned with each systemctl daemon-reload.

While user error is what triggers this (due to bad syntax), it should fail gracefully. This class of problem has occurred elsewhere in #21109 and #22974, which makes me wonder whether this is worth additionally mitigating by introducing some kind of timeout, e.g. doing the work in a goroutine, which is raced against a timer (could be quite long, e.g. order of minutes).

podman info output

host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-3.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 90.73
    systemPercent: 2.97
    userPercent: 6.31
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: server
    version: "41"
  eventLogger: journald
  freeLocks: 1948
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.11.8-300.fc41.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 444219392
  memTotal: 8130838528
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc41.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.0-1.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0
  ociRuntime:
    name: crun
    package: crun-1.18.1-1.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.18.1
      commit: c41f034fdbb9742c395085fc98459c94ad1f9aae
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241030.gee7d0b6-1.fc41.x86_64
    version: |
      pasta 0^20241030.gee7d0b6-1.fc41.x86_64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.fc41.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 7390883840
  swapTotal: 8130654208
  uptime: 85h 2m 0.00s (Approximately 3.54 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 23
    paused: 0
    running: 23
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 107307073536
  graphRootUsed: 26410164224
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 47
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.0
  Built: 1731456000
  BuiltTime: Wed Nov 13 11:00:00 2024
  GitCommit: ""
  GoVersion: go1.23.2
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.0

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Reproduced on Fedora 41 (amd64).

Additional information

No response

@jeremyvisser jeremyvisser added the kind/bug Categorizes issue or PR as related to a bug. label Dec 9, 2024
@github-actions github-actions bot added the remote Problem is in podman-remote label Dec 9, 2024
@Luap99 Luap99 added quadlet and removed remote Problem is in podman-remote labels Dec 10, 2024
@mheon
Copy link
Member

mheon commented Dec 10, 2024

@ygalblum PTAL

@ygalblum
Copy link
Contributor

I like the idea of adding a timeout. As you wrote, we have seen (and probably see) parsing issues causing infinite loops. The change is not even that big and it does seem to be working in my branch.
But, before we make the change, I'd like to get some opinions both on if, and if yes, what should the timeout be. @Luap99 @vrothberg @rhatdan WDYT?

@Luap99
Copy link
Member

Luap99 commented Dec 11, 2024

There is never a right "timeout", I mean for quadlet we are only parsing small files so we should assume that it runs quick in all cases but is hard to say what quick is exactly. It is certainly an easy fix to guard against other bugs like this but overall I would say we should investigate why we have endless loops in a "simple" file parser and spend time rewriting the code in a way that makes infinitive loops impossible. That said I do acknowledge that might not be an easy task and a timeout is a easy thing we can do to improve things.

If we are run from systemd does it kill us after a set timeout on their end or does the quadlet process just keeps running forever? If we are not killed I think systemd generally seems to use 90s as default timeout so maybe this seems reasonable?

@ygalblum
Copy link
Contributor

Yes, I agree that catching these errors and fixing them is the right thing to do. But, we can see that there's always another use case we did not cover. Since this is not a fix for the issue but instead a workaround for infinite loops, once a timeout occurs, users will report this (we can also ask them to in the log message).

As for current timeouts, according to the issue description, Quadlet was triggered via systemctl daemon-reload. So, it does not seem like there is a timeout anywhere.

Regarding reasonable timeout, I agree, it's always a problem. So, I would choose an unreasonably long one, say 10 minutes.

giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Copy link
Member

using a timeout to guard issues in a "simple" parser, seems overkilling to me.

I ended up with a simpler implementation while taking a look at the issue, and I've opened a PR: #24825

The tests were focused on recreating exactly the same input, which I don't think has much value, so I've simplified that too.

giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
@ygalblum
Copy link
Contributor

@giuseppe thanks for looking into this. I agree that removing the loop Quadlet keeps on getting stuck on is the correct fix. I'll put aside my timeout implementation.
I've also reviewed your PR.

giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/libpod that referenced this issue Dec 12, 2024
This commit simplifies the systemd parser logic, and it solves an
infinite loop when using a continuation line.

Closes: containers#24810

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe added the jira label Dec 12, 2024
@jeremyvisser
Copy link
Author

Thanks! I’m also in favour of simplifying being an entirely valid mitigation here. Replacing for len(data) > 0 with for lineNr, line := range lines, etc. is definitely a safer pattern, and somewhat more likely to terminate. 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira kind/bug Categorizes issue or PR as related to a bug. quadlet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants