
Proxmox v7.4: Sporadic failure unable to read tail (got 0 bytes) on teardown of VMs #1352

Open
AlexFernandes-MOVAI opened this issue Jun 4, 2024 · 3 comments
Labels
🐛 bug Something isn't working

Comments

@AlexFernandes-MOVAI

This issue is mainly an open question: what can cause this error, and is it fixed in a later version of the bpg provider or of Proxmox?

Bug Description

On a production Proxmox server running version 7.4, we sporadically run into a teardown error of the form imgdel:local:ci@pam: unable to read tail (got 0 bytes).

To Reproduce
Steps to reproduce the behavior:

  1. Create multiple resources of type proxmox_virtual_environment_vm with a single terraform apply
  2. Run the VMs for some time (1–20 min)
  3. Destroy the resources with terraform destroy
  4. The teardown fails with the error mentioned above

The Terraform configuration looks like the one below, with several values supplied through variables:

resource "proxmox_virtual_environment_vm" "fleet_manager" {
  name            = var.fleet_manager_name
  description     = "Managed by Terraform"
  tags            = var.tags
  node_name       = var.proxmox_host_list[0]
  pool_id         = var.pool
  scsi_hardware   = var.scsihw
  stop_on_destroy = true
  started         = true
  on_boot         = false

  cpu {
    cores = var.fleet_manager_cores
    type  = var.vm_core_type
  }

  memory {
    dedicated = var.fleet_manager_memory
    floating  = var.fleet_manager_balloon
  }

  agent {
    enabled = true
  }

  machine = var.vm_type
  bios    = var.bios

  network_device {
    bridge = var.vm_network_bridge
  }

  disk {
    datastore_id = var.vm_storage
    file_id      = var.fleet_manager_img_id
    interface    = var.vm_disk_interface
    size         = var.fleet_manager_disk_size
    iothread     = true
  }

  serial_device {}
  vga {
    enabled = true
  }

  dynamic "hostpci" {
    for_each = var.fleet_manager_enable_hostpci ? [1] : []
    content {
      device = var.fleet_manager_enable_hostpci ? var.hostpci_device : null
      id     = var.fleet_manager_enable_hostpci ? var.hostpci_device_id : null
      pcie   = var.fleet_manager_enable_hostpci ? var.hostpci_device_pcie : null
      xvga   = var.fleet_manager_enable_hostpci ? var.hostpci_device_xvga : null
    }
  }

  operating_system {
    type = var.vm_os_type
  }

  initialization {
    datastore_id      = var.cloud_init_storage
    user_data_file_id = proxmox_virtual_environment_file.cloud_config_main.id

    ip_config {
      ipv4 {
        address = var.ip_list[0]
        gateway = var.ip_list[0] != "dhcp" ? var.static_ip_gateway : null
      }
    }
  }
  provisioner "local-exec" {
    when    = create
    command = "sleep ${var.startup_wait_for_ip}"
  }
}

Expected behavior
The terraform destroy should always complete without errors.

Logs

May 02 09:28:45 hel pvedaemon[1838702]: <ci@pam> starting task UPID:hel:001C3146:032B2106:66335CCD:qmdestroy:109:ci@pam:
May 02 09:28:45 hel pvedaemon[1847622]: destroy VM 109: UPID:hel:001C3146:032B2106:66335CCD:qmdestroy:109:ci@pam:
May 02 09:28:45 hel pvedaemon[1841203]: <ci@pam> starting task UPID:hel:001C3147:032B2106:66335CCD:qmdestroy:111:ci@pam:
May 02 09:28:45 hel pvedaemon[1847623]: destroy VM 111: UPID:hel:001C3147:032B2106:66335CCD:qmdestroy:111:ci@pam:
May 02 09:28:45 hel pvedaemon[1838829]: <ci@pam> starting task UPID:hel:001C314A:032B2107:66335CCD:qmdestroy:107:ci@pam:
May 02 09:28:45 hel pvedaemon[1847626]: destroy VM 107: UPID:hel:001C314A:032B2107:66335CCD:qmdestroy:107:ci@pam:
May 02 09:28:45 hel pvedaemon[1838702]: <ci@pam> end task UPID:hel:001C3146:032B2106:66335CCD:qmdestroy:109:ci@pam: OK
May 02 09:28:45 hel pvedaemon[1841203]: <ci@pam> end task UPID:hel:001C3147:032B2106:66335CCD:qmdestroy:111:ci@pam: OK
May 02 09:28:46 hel pvedaemon[1838829]: <ci@pam> end task UPID:hel:001C314A:032B2107:66335CCD:qmdestroy:107:ci@pam: OK
May 02 09:28:47 hel pvedaemon[1841203]: <ci@pam> starting task UPID:hel:001C3158:032B21D1:66335CCF:imgdel:local:ci@pam:
May 02 09:28:47 hel pvedaemon[1838702]: <ci@pam> starting task UPID:hel:001C3159:032B21D1:66335CCF:imgdel:local:ci@pam:
May 02 09:28:47 hel pvedaemon[1838702]: <ci@pam> end task UPID:hel:001C3159:032B21D1:66335CCF:imgdel:local:ci@pam: OK
May 02 09:28:47 hel pvedaemon[1841203]: <ci@pam> end task UPID:hel:001C3158:032B21D1:66335CCF:imgdel:local:ci@pam: OK
May 02 09:28:47 hel pvedaemon[1841203]: <ci@pam> end task UPID:hel:001C315C:032B21D2:66335CCF:imgdel:local:ci@pam: unable to read tail (got 0 bytes)

  • Single or clustered Proxmox: Single
  • Proxmox version: 7.4-17
  • Provider version: bpg/proxmox 0.52.0
  • Terraform/OpenTofu version: ">= 0.12.14"
  • OS: Ubuntu 22.04
@AlexFernandes-MOVAI added the 🐛 bug label on Jun 4, 2024
@bpg (Owner) commented Jun 5, 2024

Hey @AlexFernandes-MOVAI 👋🏼

Honestly, not many ideas about what is causing this. It looks like you're deleting at least 3 VMs simultaneously, so there could be a race condition in PVE. Or perhaps an I/O bottleneck on your storage, causing the task inside PVE to time out.

The provider does not do much with regard to resource destruction; it just submits a task and waits for its completion.

As an experiment, you could try a different parallelism value and see if reducing it helps.
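
For example, something like this (the parallelism default is 10; the right value depends on your setup):

    terraform destroy -parallelism=2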

@AlexFernandes-MOVAI (Author) commented:

Thanks for the quick feedback @bpg, and for the hard work on this repo, which has been helping us a lot for a few months now.

I don't believe the storage can be the issue, since it is an internal SSD drive. I will try playing with parallelism and post the conclusions here.

@bpg (Owner) commented Jun 7, 2024

Hm... In fairness, if the VM deletion completes without other errors, we could probably just ignore this status and assume the task has successfully finished.
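
Roughly, a hypothetical sketch of that idea (illustrative only, not the provider's actual code; the helper name and the message check are assumptions):

package proxmoxtask

import "strings"

// taskStatusIsBenign reports whether a PVE task exit status can be treated as
// success during VM teardown. Hypothetical helper for illustration only.
func taskStatusIsBenign(status string) bool {
	if status == "OK" {
		return true
	}
	// PVE sporadically reports this transient error for imgdel tasks even
	// though the disk image has already been removed.
	return strings.Contains(status, "unable to read tail")
}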
