Martin Kletzander
2021-Jun-30 15:49 UTC
[Libguestfs] Figuring out some failing tests for libnbd
On Wed, Jun 30, 2021 at 05:11:45PM +0200, Martin Kletzander wrote:
>I am preparing more patches for CI to run check-valgrind and fix ongoing
>errors, but there are two issues where I cannot identify why they are
>failing.
>
>- On debian-10 the info/info-can.sh test started failing, and the error
>  message is just one of those I saw earlier in other places:
>
>  libnbd: debug: nbd1: nbd_opt_abort: leave: error="nbd_opt_abort:
>  invalid state: READY: the handle must be negotiating: Invalid
>  argument"
>
>- On Fedora rawhide I hit a random issue where a port in a URI was
>  translated to its name, and looking at the code I cannot find how this
>  could have happened. Until this is fixed the test suite is unreliable,
>  and notification fatigue will cause everyone to start ignoring any
>  future failures.
>
>  /builds/nertpinx/libnbd/tests/.libs/aio-connect: actual URI
>  nbd://127.0.0.1:altova-lm/ != expected URI nbd://127.0.0.1:35355/
>

The same happened on F33:

  /builds/nertpinx/libnbd/tests/.libs/lt-aio-connect: actual URI nbd://127.0.0.1:rt-helper/ != expected URI nbd://127.0.0.1:35006/

>- Both openSUSE builds are failing to run check-valgrind, and it looks
>  like it might be unrelated to libnbd, although it would be nice for
>  someone else to confirm that. For now I have disabled check-valgrind
>  on those platforms in my branch.
>
>- Similarly to openSUSE, Ubuntu 20.04 fails in valgrind tests, but
>  somewhere down the GnuTLS rabbit hole, which I presume is unrelated
>  too, so I disabled check-valgrind on that one as well.
>
>I will send the patches once they are cleaned up, but I wanted to let
>everyone know what the current status is, because eliminating all random
>issues is essential to properly consuming CI results.
>

I forgot to mention that the pipeline with all the errors (before the
check-valgrind skips) is here:

https://gitlab.com/nertpinx/libnbd/-/pipelines/329661257

And the latest one (the one I sent the patches from) is here:

https://gitlab.com/nertpinx/libnbd/-/pipelines/329755114

>Thanks,
>Martin
Richard W.M. Jones
2021-Jun-30 16:07 UTC
On Wed, Jun 30, 2021 at 05:49:41PM +0200, Martin Kletzander wrote:
> On Wed, Jun 30, 2021 at 05:11:45PM +0200, Martin Kletzander wrote:
> >- Both openSUSE builds are failing to run check-valgrind, and it looks
> >  like it might be unrelated to libnbd, although it would be nice for
> >  someone else to confirm that. For now I have disabled check-valgrind
> >  on those platforms in my branch.
> >
> >- Similarly to openSUSE, Ubuntu 20.04 fails in valgrind tests, but
> >  somewhere down the GnuTLS rabbit hole, which I presume is unrelated
> >  too, so I disabled check-valgrind on that one as well.
> >
> >I will send the patches once they are cleaned up, but I wanted to let
> >everyone know what the current status is, because eliminating all random
> >issues is essential to properly consuming CI results.
> >
>
> I forgot to mention that the pipeline with all the errors (before the
> check-valgrind skips) is here:
>
> https://gitlab.com/nertpinx/libnbd/-/pipelines/329661257

This one:

https://gitlab.com/nertpinx/libnbd/-/jobs/1389193576

FAIL: dlopen

The actual failure is this memory leak:

==17953== 4,096 bytes in 1 blocks are still reachable in loss record 4 of 4
==17953==    at 0x4C2E2DF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==17953==    by 0x52CFEFB: _IO_file_doallocate (in /lib64/libc-2.26.so)
==17953==    by 0x52DF0A8: _IO_doallocbuf (in /lib64/libc-2.26.so)
==17953==    by 0x52DDC67: _IO_file_overflow@@GLIBC_2.2.5 (in /lib64/libc-2.26.so)
==17953==    by 0x52DCCDE: _IO_file_xsputn@@GLIBC_2.2.5 (in /lib64/libc-2.26.so)
==17953==    by 0x52AFCBA: vfprintf (in /lib64/libc-2.26.so)
==17953==    by 0x52B88D5: printf (in /lib64/libc-2.26.so)
==17953==    by 0x400ACE: thread_start (dlopen.c:117)
==17953==    by 0x50474F8: start_thread (in /lib64/libpthread-2.26.so)
==17953==    by 0x5359ECE: clone (in /lib64/libc-2.26.so)

It looks benign, so the fix would be to add a suppression to
libnbd.git/valgrind/glibc.suppressions, probably something like this
(untested):

  {
     glibc_6
     Memcheck:Leak
     fun:malloc
     fun:_IO_file_doallocate
  }

---

https://gitlab.com/nertpinx/libnbd/-/jobs/1389193577

This has dozens of failures in the OCaml tests.  Most but not all of
them are like this:

==20650== 16 bytes in 1 blocks are still reachable in loss record 1 of 38
==20650==    at 0x483E7B5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==20650==    by 0x47239C: caml_stat_alloc_noexc (memory.c:818)
==20650==    by 0x47239C: caml_stat_alloc (memory.c:840)
==20650==    by 0x485BCD: caml_register_custom_operations (custom.c:121)
==20650==    by 0x485BCD: caml_init_custom_operations (custom.c:163)
==20650==    by 0x48BE46: caml_startup_common (startup_nat.c:132)
==20650==    by 0x48C05A: caml_startup_exn (startup_nat.c:163)
==20650==    by 0x48C05A: caml_startup (startup_nat.c:168)
==20650==    by 0x48C05A: caml_main (startup_nat.c:175)
==20650==    by 0x430CEB: main (main.c:41)

There's already a suppression for something similar
(ocaml_heap_leak_5), but it probably needs to be adjusted slightly for
the different OCaml compiler being used by SUSE.

This leads us to the general problem with attempting to run valgrind
tests across lots of different distros: we're going to be forever
chasing minor differences in versions of supporting software (glibc,
OCaml, GnuTLS, etc.).

---

https://gitlab.com/nertpinx/libnbd/-/jobs/1389193579

Multiple memory leaks in getaddrinfo.  I'm going to guess that Ubuntu
uses a different NSS configuration from Fedora, and that the NSS plugin
they're using is leaky.  I think a suppression covering
getaddrinfo -> ... -> malloc would be too broad, since it would also
suppress valid errors, e.g. where we didn't call freeaddrinfo on all
paths.

Fun ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
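To make the concern about an over-broad suppression concrete: a missing freeaddrinfo(3) call produces a leak whose allocation stack also runs getaddrinfo -> ... -> malloc, so a suppression written for the Ubuntu NSS leaks would hide it too. Below is a minimal sketch, not libnbd code, of the cleanup pattern such a suppression would stop valgrind from checking. open_numeric_socket is a hypothetical helper, and the numeric-only lookup flags are used only so the example needs no DNS.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve a numeric HOST and PORT and return a
 * socket for the first address that works, or -1 on error.  The point
 * is the cleanup: once getaddrinfo(3) has succeeded, every return path
 * goes through freeaddrinfo(3).  Dropping it from any path leaks the
 * result list, allocated under getaddrinfo. */
static int
open_numeric_socket (const char *host, const char *port)
{
  struct addrinfo hints, *res, *rp;
  int fd = -1, r;

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  /* Numeric-only: no DNS or services-database lookups. */
  hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

  r = getaddrinfo (host, port, &hints, &res);
  if (r != 0) {
    /* On failure getaddrinfo allocates nothing, so nothing to free. */
    fprintf (stderr, "getaddrinfo: %s: %s\n", host, gai_strerror (r));
    return -1;
  }

  for (rp = res; rp != NULL; rp = rp->ai_next) {
    fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (fd >= 0)
      break;                    /* Found one, but still fall through. */
  }

  freeaddrinfo (res);           /* Reached on success and failure alike. */
  return fd;
}
```

Removing the freeaddrinfo call from this function should make valgrind --leak-check=full report the result list as lost with getaddrinfo in the allocation stack, which is exactly the kind of report a getaddrinfo -> ... -> malloc suppression would swallow.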
Richard W.M. Jones
2021-Jul-06 08:06 UTC
On Wed, Jun 30, 2021 at 05:07:31PM +0100, Richard W.M. Jones wrote:
> https://gitlab.com/nertpinx/libnbd/-/jobs/1389193577
>
> This has dozens of failures in the OCaml tests.  Most but not
> all of them are like this:
>
> ==20650== 16 bytes in 1 blocks are still reachable in loss record 1 of 38
> ==20650==    at 0x483E7B5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==20650==    by 0x47239C: caml_stat_alloc_noexc (memory.c:818)
> ==20650==    by 0x47239C: caml_stat_alloc (memory.c:840)
> ==20650==    by 0x485BCD: caml_register_custom_operations (custom.c:121)
> ==20650==    by 0x485BCD: caml_init_custom_operations (custom.c:163)
> ==20650==    by 0x48BE46: caml_startup_common (startup_nat.c:132)
> ==20650==    by 0x48C05A: caml_startup_exn (startup_nat.c:163)
> ==20650==    by 0x48C05A: caml_startup (startup_nat.c:168)
> ==20650==    by 0x48C05A: caml_main (startup_nat.c:175)
> ==20650==    by 0x430CEB: main (main.c:41)

I took a proper look at this issue last night and it should be fixed
in these two commits:

https://gitlab.com/nbdkit/nbdkit/-/commit/99140272a0675b3d123d2c42cb0a5ab73b09fba2
https://gitlab.com/nbdkit/nbdkit/-/commit/875a5056758dca754225f49516a0f4c8e788ac94

Rich.