Vincent MAILHOL
2023-Jun-16 02:17 UTC
[Libguestfs] libldm crashes in a linux-sandbox context
Hi Richard,

On Fri. 16 Jun. 2023 at 03:08, Richard W.M. Jones <rjones at redhat.com> wrote:
> On Thu, Jun 15, 2023 at 09:18:38PM +0900, Vincent Mailhol wrote:
> > Hello,
> >
> > I am using libguestfs in a Bazel's linux-sandbox environment[1].
> >
> > When executing in that sandbox environment, I got frequent crashes.
> >
> > Please find attached below the results of libguestfs-test-tool when
> > run into that linux-sandbox environment. The most relevant part seems
> > to be:
> >
> > [ 0.797233] ldmtool[164]: segfault at 0 ip 0000564a892506a5 sp 00007fff8ee5b900 error 4 in ldmtool[564a8924e000+3000]
> > [ 0.798117] Code: 18 64 48 33 1c 25 28 00 00 00 75 5e 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 00 e8 db fd ff ff <4c> 8b 20 48 89 44 24 08 4c 89 e7 e8 0b e1 ff ff 45 31 c0 4c 89 e1
> > /init: line 154: 164 Segmentation fault ldmtool create all
> >
> > So the root cause seems to be around libldm. This mailing list seems
> > to cover both libguestfs and libldm, so hopefully, I am at the right
> > place to ask :)
> >
> > Needless to say, when run outside of the sandbox environment, no crash
> > were observed.
> >
> > [1] linux-sandbox.cc
> > Link: https://github.com/bazelbuild/bazel/blob/master/src/main/tools/linux-sandbox.cc
> >
> > ---
> ...
> > supermin: picked /sys/block/sdb/dev (8:16) as root device
> > supermin: creating /dev/root as block special 8:16
> > supermin: mounting new root on /root
> > [ 0.678248] EXT4-fs (sdb): mounting ext2 file system using the ext4 subsystem
> > [ 0.679832] EXT4-fs (sdb): mounted filesystem without journal. Opts: . Quota mode: none.
> > supermin: deleting initramfs files
> > supermin: chroot
> > Starting /init script ...
> > mount: only root can use "--types" option (effective UID is 65534)
> > /init: line 38: /proc/cmdline: No such file or directory
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--options" option (effective UID is 65534)
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--options" option (effective UID is 65534)
>
> It really goes wrong from here, where apparently it's not running as
> root (instead UID 65534), even though we're supposed to be running
> inside a Linux appliance virtual machine.
>
> Any idea why that would be?
>
> I looked at the sandbox and that would run the qemu process as UID
> "nobody" (which might be 65534). However I don't understand why that
> would affect anything running on the new kernel inside the appliance.

And you were right. I got a crash inside the sandbox but not outside of
it, and I jumped to the conclusion that the root cause was linked to the
sandbox.

I continued the analysis and compared a successful libguestfs-test-tool
log with the failed one, looking at every difference. It turned out that
the sandbox was not the cause. The culprit is the first line of the log:
TMPDIR=/tmp.

If I force TMPDIR=/var/tmp, the problem disappears!

This gave me a minimal reproducer:

  TMPDIR=/tmp/ libguestfs-test-tool

That one crashed outside the sandbox. Next, my attention went to this line:

  libguestfs: checking for previously cached test results of
  /usr/bin/qemu-system-x86_64, in /tmp/.guestfs-1001

I did a:

  rm -rf /tmp/.guestfs-1001

and that solved my issue \o/

I still do not understand how I ended up running as UID 65534 instead of
root in the first place. I did other qemu experiments, so I am not sure
how, but I somehow got a corrupted environment under /tmp/.guestfs-1001.

Last thing, the segfault in ldmtool [1] still seems to be a valid issue.
Even though I now have a workaround for my problem, that segfault might
be worth a bit more investigation.

Regardless, thanks a lot for your quick answer, it helped me to continue
the troubleshooting.

[1] ldmtool line 164
Link: https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164
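
For anyone hitting the same symptom, the workaround above can be wrapped in a small shell helper. This is only a sketch: the function names are made up for illustration, and it assumes the cache directory is <tmpdir>/.guestfs-<uid> as seen in the log line above (/tmp/.guestfs-1001 with TMPDIR=/tmp). The /var/tmp fallback used when TMPDIR is unset is also an assumption.

```shell
# Print the per-user appliance cache directory for a given temp dir.
# Assumption: the cache lives in <tmpdir>/.guestfs-<uid>, as in the log.
# Assumption: /var/tmp is the fallback when neither $1 nor TMPDIR is set.
guestfs_cache_dir() {
    gcd_tmpdir="${1:-${TMPDIR:-/var/tmp}}"
    # Strip one trailing slash so "/tmp/" and "/tmp" give the same path.
    printf '%s/.guestfs-%s\n' "${gcd_tmpdir%/}" "$(id -u)"
}

# Remove the cached appliance so that the next launch rebuilds it.
reset_guestfs_cache() {
    rgc_dir="$(guestfs_cache_dir "${1:-}")"
    if [ -d "$rgc_dir" ]; then
        rm -rf -- "$rgc_dir"
    fi
    return 0
}
```

Typical use would be "reset_guestfs_cache /tmp" followed by "TMPDIR=/tmp libguestfs-test-tool" to check that the crash is gone once the appliance has been rebuilt.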
Richard W.M. Jones
2023-Jun-16 07:34 UTC
[Libguestfs] libldm crashes in a linux-sandbox context
On Fri, Jun 16, 2023 at 11:17:21AM +0900, Vincent MAILHOL wrote:
[...]
> I did a:
>
>   rm -rf /tmp/.guestfs-1001
>
> and that solved my issue \o/
>
> I still do not understand how I could get the issue of running of UID
> 65534 instead of root in the first place. I did other qemu
> experimentation, so not sure how, but I somehow got a corrupted
> environment under /tmp/.guestfs-1001.

We cache the appliance under $TMPDIR/.guestfs-$UID/ (e.g. have a look at
appliance/root in that directory). We rebuild it if the distro changes,
so most of the time we don't have to rebuild it when launching
libguestfs (although there was a long-standing bug which I fixed
recently:
https://github.com/libguestfs/supermin/commit/8c38641042e274a713a18daf7fc85584ca0fc9bb).

> Last thing, the segfault on ldmtool [1] still seems a valid issue.
> Even if I now do have a workaround for my problem, that segfault might
> be worth a bit more investigation.

Yes, that does look like a real problem. Does it crash if you just run
ldmtool as a normal command, nothing to do with libguestfs? It might be
a good idea to try to get a stack trace of the crash.

Rich.

> Regardless, thanks a lot for your quick answer, that helped me to
> continue the troubleshooting.
>
> [1] ldmtool line 164
> Link: https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
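
One possible way to collect the stack trace asked for above is to run the failing command under gdb in batch mode. The helper below is a hedged sketch, not something from this thread: the function name and the DRY_RUN switch are made up for illustration, and it assumes gdb (and ideally debug symbols for ldmtool) are installed. "ldmtool create all" is the invocation shown in the crash log. Outside the appliance it may need root and may act on real block devices, so it is probably best tried on a test machine.

```shell
# Run a command under gdb in batch mode and print a backtrace if it crashes.
# With DRY_RUN=1, only print the gdb command line that would be executed.
# Assumption: gdb is installed; the helper name and DRY_RUN are illustrative.
run_with_backtrace() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gdb -batch -ex run -ex bt --args "$@"
        return 0
    fi
    # -batch: exit once the commands finish; -ex run: start the program;
    # -ex bt: print the backtrace after the program stops (e.g. on SIGSEGV).
    gdb -batch -ex run -ex bt --args "$@"
}
```

For example, "run_with_backtrace ldmtool create all" would run the same command that segfaulted inside the appliance and print the backtrace at the point of the crash, if it reproduces.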