Hi,
I have made progress in my research.
I created a minimal test case to reproduce the problem (see below).
I ran the test on 3 (physical) machines under Debian 11.4: the problem is
present on 2 machines but not on the third.
I booted a machine where the problem is present into a Debian 11.4 live
OS and ran the test: it works, no problem.
So far, all my tests lead me to the following conclusions:
- The problem is tied to the configuration of the system.
- It's not a 'file permission' problem. The directory structure of the
storage pool, the file permissions on this structure, the configuration
of libvirt and qemu, and the user under which the daemon runs are the
same on all systems.
- I have run the test with libvirt 7.0.0 and qemu 5.2, and with
libvirt 8.0.0 and qemu 7.0 (from Debian 11 backports). Both sets of
versions show the same behavior.
- AppArmor is not the culprit (no errors in the logs). I have also disabled
it and the behavior is still the same.
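
To make the permission comparison above repeatable across machines, the mode and ownership of every directory leading to an image can be dumped to a file and diffed. This is only a sketch: the function name path_perms is hypothetical, it relies on GNU stat (as shipped with Debian), and it does not capture ACLs, mount options or AppArmor state.

```shell
# Print one line per path component, from / down to the target:
#   <octal mode> <owner> <group> <path>
# Assumption: the target is an absolute path without newlines.
path_perms() {
    target=$1
    case $target in
        /*) ;;
        *) echo "path_perms: absolute path required" >&2; return 1 ;;
    esac
    # Collect the components from the leaf up, so that / ends up first.
    list=
    p=$target
    while :; do
        list="$p
$list"
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
    # GNU stat: %a = octal mode, %U = owner, %G = group, %n = name.
    printf '%s' "$list" | while IFS= read -r component; do
        stat -c '%a %U %G %n' "$component"
    done
}
```

For example, running `path_perms /var/lib/libvirt/images/debian.qcow2 > perms.txt` on a working and on a failing machine gives two files that can be compared with diff.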
I would appreciate any hint about what I should check to find the
difference between the working systems and the failing ones.
Regards,
Fred
How to run the test (as root):
1/ Install libvirt & qemu if needed
apt install libvirt-daemon-system qemu-system-x86 virtinst
2/ Start libvirt daemon if needed
systemctl start libvirtd
3/ Create the default storage pool (if it is not created automatically)
virsh pool-define-as default dir - - - - /var/lib/libvirt/images/
virsh pool-build default
virsh pool-start default
5/ Download Debian 11.4 Generic cloud image and put it in the default
storage pool
wget -O /var/lib/libvirt/images/debian.qcow2 \
https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-genericcloud-amd64.qcow2
6/ Refresh the default storage pool and check that the Debian image is visible.
virsh pool-refresh default
virsh vol-list --pool default
7/ Start the default network
virsh net-start default
8/ Create a VM based on the Debian 11.4 Generic cloud image
virt-install -n TESTBUG --disk vol=default/debian.qcow2 --memory 1024 \
--import --noreboot --graphics none
9/ Start the VM, it should start and work fine
virsh start TESTBUG
10/ Stop the VM
virsh shutdown TESTBUG
11/ Change the disk definition to switch the disk type from 'file' to
'volume' and adapt the 'source' attributes accordingly.
virsh edit --domain TESTBUG
Change this section:
<disk type="file" device="disk">
<driver name="qemu" type="qcow2"/>
<source file="/var/lib/libvirt/images/debian.qcow2"/>
<target dev="hda" bus="ide"/>
<address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
to:
<disk type="volume" device="disk">
<driver name="qemu" type="qcow2"/>
<source pool="default" volume="debian.qcow2"/>
<target dev="hda" bus="ide"/>
<address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
12/ Start the VM again. It will either succeed or fail with the
following error:
error creating libvirt domain: internal error: qemu unexpectedly closed
the monitor: 2022-08-11T16:12:22.987252Z qemu-system-x86_64: -blockdev
{"driver":"file","filename":"/var/lib/libvirt/images/debian.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}:
Could not open '/var/lib/libvirt/images/debian.qcow2': Permission denied
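
Step 11 can also be done without an interactive editor. The sketch below rewrites a domain XML read on stdin. The function name file_disk_to_volume is hypothetical, and it assumes the domain has a single file-backed disk located directly in the pool directory: every disk of type 'file' is switched, so it is only suitable for a throwaway test domain such as TESTBUG.

```shell
# Rewrite a domain XML on stdin: turn type="file" disks into
# type="volume" disks that reference a volume of the given pool.
#   $1 = pool name, $2 = directory backing the pool
# Assumption: the <disk> and <source> elements each sit on one line,
# as in the excerpts shown in step 11.
file_disk_to_volume() {
    pool_name=$1
    pool_dir=${2%/}          # drop a trailing slash, if any
    q="[\"']"                # accept single or double quoted attributes
    sed -e "s|<disk type=${q}file${q}|<disk type=\"volume\"|" \
        -e "s|<source file=${q}${pool_dir}/\([^\"'/]*\)${q}|<source pool=\"${pool_name}\" volume=\"\1\"|"
}

# Possible usage (commented out, needs a running libvirtd):
#   virsh dumpxml --inactive TESTBUG \
#     | file_disk_to_volume default /var/lib/libvirt/images > TESTBUG.xml
#   virsh define TESTBUG.xml
```

Using sed on XML is fragile, so the result should be reviewed before it is passed to virsh define.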
On 13/08/2022 at 12:39, Frédéric Lespez wrote:
> Hi,
>
> I need some help to debug a problem with libvirt and a disk device of
> type 'volume'.
>
> I have a VM failing to start with the following error:
> $ virsh -c qemu:///system start server
> error: Failed to start domain 'server'
> error: internal error: process exited while connecting to monitor:
> 2022-08-13T09:26:50.121259Z qemu-system-x86_64: -blockdev
> {"driver":"file","filename":"/mnt/images/debian-11-genericcloud-amd64.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}:
> Could not open '/mnt/images/debian-11-genericcloud-amd64.qcow2':
> Permission denied
>
> I checked the file access permissions, and they are correct. I tried
> setting everything to 777 and running qemu as root, but the problem persists.
> $ ll -d /mnt /mnt/images /mnt/images/*
> drwxr-xr-x 9 root         root         4,0K 31 déc.   2021 /mnt
> drwxr-xr-x 2 root         root         4,0K 13 août  11:31 /mnt/images
> -rw-r--r-- 1 libvirt-qemu libvirt-qemu 242M 13 août  11:31
> /mnt/images/debian-11-genericcloud-amd64.qcow2
> -rw-r--r-- 1 libvirt-qemu libvirt-qemu 366K 13 août  11:31
> /mnt/images/server_cloudinit.iso
> -rw-r--r-- 1 libvirt-qemu libvirt-qemu 593M 13 août  11:59
> /mnt/images/server_image.qcow2
>
> After a lot of searching and testing, I found out that the disk device
> definition is linked to the source of the problem.
> The disk device is defined like this:
> <disk type="volume" device="disk">
>   <driver name="qemu" type="qcow2"/>
>   <source pool="TERRAFORM" volume="server_image.qcow2"/>
>   <target dev="vda" bus="virtio"/>
>   <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
> </disk>
>
> This image 'server_image.qcow2' uses a backing file:
> $ qemu-img info /mnt/images/server_image.qcow2 --backing-chain
> image: /mnt/stockage_rapide/VMs/terraform/puppetdev_server_image.qcow2
> file format: qcow2
> virtual size: 6 GiB (6442450944 bytes)
> disk size: 475 MiB
> cluster_size: 65536
> backing file: /mnt/images/debian-11-genericcloud-amd64.qcow2
> backing file format: qcow2
> Format specific information:
>     compat: 0.10
>     compression type: zlib
>     refcount bits: 16
>
> image: /mnt/images/debian-11-genericcloud-amd64.qcow2
> file format: qcow2
> virtual size: 2 GiB (2147483648 bytes)
> disk size: 242 MiB
> cluster_size: 65536
> Format specific information:
>     compat: 1.1
>     compression type: zlib
>     lazy refcounts: false
>     refcount bits: 16
>     corrupt: false
>     extended l2: false
>
> And here is the definition of the associated storage pool:
> <pool type="dir">
>   <name>TERRAFORM</name>
>   <uuid>dae00836-db4d-49ba-9d32-1f0278055516</uuid>
>   <capacity unit="bytes">155674652672</capacity>
>   <allocation unit="bytes">74396299264</allocation>
>   <available unit="bytes">81278353408</available>
>   <source>
>   </source>
>   <target>
>     <path>/mnt/images</path>
>     <permissions>
>       <mode>0755</mode>
>       <owner>0</owner>
>       <group>0</group>
>     </permissions>
>   </target>
> </pool>
>
> If I change the disk device definition to this (and change only
> that), the domain starts and works fine (no permission problem!).
> <disk type="file" device="disk">
>   <driver name="qemu" type="qcow2"/>
>   <source file="/mnt/images/server_image.qcow2"/>
>   <target dev="vda" bus="virtio"/>
>   <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
> </disk>
>
> Could you help me find the reason why the domain doesn't work when the
> disk device is of type 'volume'?
> Thanks in advance for your help.
>
> Regards,
> Fred
>
> Additional information:
> - Running this on Debian 11 with libvirt 8.0.0 (from backports) and qemu
> 7.0 (from backports).
> - Vanilla configuration of libvirt. I have just added my regular user to
> the libvirt group.
> - Problem exists even if AppArmor is disabled.
>
> PS: I want to use a disk device of type 'volume' because this domain is
> created by Terraform using the libvirt provider, which uses this kind of
> disk since it has some advantages. See the details here:
> https://github.com/dmacvicar/terraform-provider-libvirt/issues/126#issuecomment-480597050
>
>
>
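
Since the failing open in the quoted message concerns the backing file rather than the top image, it may help to check every image of the backing chain as the qemu user. The sketch below is an assumption-laden helper, not a diagnosis: the function names chain_images and check_readable_as are hypothetical, it parses the human-readable output of qemu-img info --backing-chain, and it only covers plain file permissions.

```shell
# Read `qemu-img info --backing-chain` text output on stdin and print
# the path of every image in the chain, one per line (top image first).
chain_images() {
    sed -n 's/^image: //p'
}

# For each path read on stdin, report whether the given user can read it.
# Assumption: the caller may run `sudo -n -u USER` without a password.
check_readable_as() {
    user=$1
    while IFS= read -r img; do
        if sudo -n -u "$user" test -r "$img" </dev/null; then
            echo "OK   $img"
        else
            echo "FAIL $img"
        fi
    done
}

# Possible usage (commented out, needs qemu-img and sudo):
#   qemu-img info --backing-chain /mnt/images/server_image.qcow2 \
#     | chain_images | check_readable_as libvirt-qemu
```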