Erik Jacobson
2021-Jan-25 23:13 UTC
[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM
Hello all. Thanks again for gluster. We're having a strange problem getting virtual machines started that are hosted on a gluster volume.

One of the ways we use gluster now is to make a HA-ish cluster head node. A virtual machine runs in the shared storage and is backed up by 3 physical servers that contribute to the gluster storage share.

We're using sharding in this volume. The VM image file is around 5T and we use qemu-img with falloc to get all the blocks allocated in advance.

We are not using gfapi, largely because it would mean we have to build our own libvirt and qemu and we'd prefer not to do that. So we're using a glusterfs fuse mount to host the image. The virtual machine is using virtio disks but we had similar trouble using scsi emulation.

The issue: all seems well, the VM head node installs, boots, etc.

However, at some point, it stops being able to boot! grub2 acts like it cannot find /boot. At the grub2 prompt, it can see the partitions, but reports no filesystem found where there are indeed filesystems.

If we switch qemu to use "direct kernel load" (bypass grub2), this often works around the problem, but in one case Linux gave us a clue. Linux reported /dev/vda as only being 64 megabytes, which would explain a lot. This means the Linux in the virtual machine thought the disk supplied by the disk image was tiny! 64M instead of 5T.

We are using sles15sp2 and hit the problem more often with updates applied than without. I'm in the process of trying to isolate whether there is a sles15sp2 update causing this, or if we're within "random chance".

On one of the physical nodes, if it is in the failure mode, if I use 'kpartx' to create the partitions from the image file, then mount the giant root filesystem (i.e. mount /dev/mapper/loop0p31 /mnt) and then umount /mnt, then that physical node starts the VM fine, grub2 loads, the virtual machine is fully happy! Until I try to shut it down and start it up again, at which point it sticks at grub2 again! What about mounting the image file makes it so qemu sees the whole disk?

The problem doesn't always happen, but once it starts, the same VM image has trouble starting on any of the 3 physical nodes sharing the storage. But if I use the trick of force-mounting the root within the image with kpartx, the machine can come up. My only guess is that this changes the file just a tiny bit in the middle of the image.

Once the problem starts, it keeps happening, except that it works temporarily when I do the loop mount trick on the physical admin.

Here is some info about what I have in place:

nano-1:/adminvm/images # gluster volume info

Volume Name: adminvm
Type: Replicate
Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.23.255.151:/data/brick_adminvm
Brick2: 172.23.255.152:/data/brick_adminvm
Brick3: 172.23.255.153:/data/brick_adminvm
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
cluster.granular-entry-heal: enable
storage.owner-uid: 439
storage.owner-gid: 443

libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch

nano-1:/adminvm/images # uname -a
Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021 (4ff469b) x86_64 x86_64 x86_64 GNU/Linux
nano-1:/adminvm/images # rpm -qa | grep qemu-4
qemu-4.2.0-9.4.x86_64

Would love any advice!!!!

Erik
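For readers who want to reproduce the loop-mount workaround described in this message, the three steps could be scripted roughly as below. This is only a sketch under assumptions: the function names and the DRY_RUN switch are hypothetical helpers, not something from the post, and the image path in the usage note is made up. The partition name loop0p31 and the /mnt mount point are the ones quoted in the message.

```shell
#!/bin/sh
# Sketch of the kpartx / mount / umount sequence from the message above.
# run, loop_mount_trick and DRY_RUN are hypothetical names for illustration.

# Run a command, or only print it when DRY_RUN is set (handy for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# loop_mount_trick IMAGE PARTITION [MOUNTPOINT]
# Maps the partitions inside IMAGE, then mounts and immediately unmounts
# one of them. Needs root when not in dry-run mode.
loop_mount_trick() {
    image=$1
    part=$2
    mnt=${3:-/mnt}
    run kpartx -av "$image" || return 1
    run mount "/dev/mapper/$part" "$mnt" || return 1
    run umount "$mnt"
}
```

Calling `DRY_RUN=1 loop_mount_trick /path/to/image loop0p31` prints the commands without touching anything. The post does not say whether the partition mappings were removed afterwards; `kpartx -d` on the same image would do that, and doing so before starting the VM is an assumption on the editor's side, not part of the reported workaround.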
Mahdi Adnan
2021-Jan-26 04:26 UTC
[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM
Hello Erik,

Anything in the logs of the fuse mount? Can you stat the file from the mount? Also, the report that the image is only 64M makes me think of sharding, since the default shard size is 64M.

Do you have any clues about when this issue started to happen? Was any operation done on the Gluster cluster?

--
Respectfully
Mahdi
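The stat check suggested in this reply could be sketched as follows: compare the size the fuse mount reports for the image against one shard. This is a rough illustration, not a diagnostic tool. The function names are hypothetical, "64M" is assumed to mean 64 MiB, and `stat -c %s` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: does the image look exactly one shard long from this mount?
# SHARD_BYTES assumes the 64M default shard size means 64 MiB.
SHARD_BYTES=$((64 * 1024 * 1024))

# apparent_size FILE: print the size in bytes that stat reports for FILE.
apparent_size() {
    stat -c %s "$1"
}

# check_image_size FILE: flag an image whose apparent size equals one shard.
# Returns 0 if the size looks fine, 1 if suspect, 2 if stat failed.
check_image_size() {
    size=$(apparent_size "$1") || return 2
    if [ "$size" -eq "$SHARD_BYTES" ]; then
        echo "SUSPECT: $1 is exactly one shard ($size bytes)"
        return 1
    fi
    echo "OK: $1 is $size bytes"
}
```

Running `check_image_size` on the image file under the fuse mount on each of the three nodes, both in the failure mode and after the loop-mount trick, would show whether the reported size itself changes between the two states.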
Strahil Nikolov
2021-Jan-27 04:16 UTC
[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM
Are you sure that there are no heals pending at the time of the power up?

> nano-1:/adminvm/images # gluster volume info
>
> Volume Name: adminvm
> Type: Replicate
> Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.255.151:/data/brick_adminvm
> Brick2: 172.23.255.152:/data/brick_adminvm
> Brick3: 172.23.255.153:/data/brick_adminvm
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> cluster.granular-entry-heal: enable
> storage.owner-uid: 439
> storage.owner-gid: 443

I checked my oVirt-based gluster and the only difference is:
cluster.granular-entry-heal: enable
The options seem fine.

> libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch

This one is quite old, although it never caused any trouble with my oVirt VMs. Either try the latest v7 or even v8.3.

Best Regards,
Strahil Nikolov
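One way to answer the pending-heals question before powering up the VM is to total the "Number of entries:" lines that `gluster volume heal <volume> info` prints per brick. The sketch below does that. The function names are hypothetical, and the parsing assumes the usual per-brick "Number of entries: N" lines in the output.

```shell
#!/bin/sh
# Sketch: total the pending heal entries across all bricks.
# count_pending_heals and pending_heals are hypothetical helper names.

# count_pending_heals: read heal-info text on stdin and print the sum of
# all "Number of entries: N" values (non-numeric values count as 0).
count_pending_heals() {
    awk '/^Number of entries:/ { total += $NF } END { print total + 0 }'
}

# pending_heals VOLUME: query gluster for VOLUME and print the total.
pending_heals() {
    gluster volume heal "$1" info | count_pending_heals
}
```

For the volume in this thread the call would be `pending_heals adminvm`. A non-zero result just before starting the VM would suggest heals were still in progress at that point.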