Daniel P. Berrangé
2022-Oct-13 09:33 UTC
[Libguestfs] libnbd | Failed pipeline for master | 018d55a8
On Thu, Oct 13, 2022 at 09:49:09AM +0100, Richard W.M. Jones wrote:
> On Wed, Oct 12, 2022 at 02:00:21PM -0500, Eric Blake wrote:
> > > Job #3163966643 ( https://gitlab.com/nbdkit/libnbd/-/jobs/3163966643/raw )
> > >
> > > Stage: builds
> > > Name: x86_64-opensuse-leap-153-prebuilt-env
> >
> > This one is still failing because of a bug in gnutls; the log is
> > reporting:
> >
> > libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.RECV_REPLY_PAYLOAD -> NEWSTYLE.OPT_STARTTLS.CHECK_REPLY
> > free(): invalid pointer
> > libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.CHECK_REPLY -> NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_READ
> > libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_READ -> DEAD
> > libnbd: debug: nbd1: nbd_connect_command: leave: error="nbd_connect_command: gnutls_handshake: Error in the pull function. (-1/1)"
> >
> > That libc message about invalid free() is scary; I'm not yet sure
> > whether it is a bug in opensuse-leap's gnutls package or something
> > we're doing wrong in libnbd.
>
> I had a look into this. Unfortunately I only have OpenSUSE Tumbleweed
> available. It doesn't fail for me in Tumbleweed. (It also doesn't
> fail in the CI pipeline for Tumbleweed.)

Everyone has access to the CI env. Line 9 of the build log
shows the container env used:

  Using docker image sha256:e4a8e52b0bbb712a544a90d21b21010daad8ab3e85a768cfea38571461ec85fc for registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153:latest with digest registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153 at sha256:11179119130366bc340f0fe6d0c940fa904c5d3760a10e979296ffd6c8b28488 ...

You just need to launch the same container, clone the git repo and
then run the build commands.

IOW, on your local machine do:

  $ podman run -it registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153:latest
  # git clone https://gitlab.com/nbdkit/libnbd
  # cd libnbd
  # autoreconf -if
  # ./configure --enable-gcc-warnings --with-gnutls --with-libxml2 --enable-fuse --enable-ocaml --enable-python --enable-golang
  # make -j 20
  # cd tests
  # ./connect-tls-psk
  requires nbdkit --tls-verify-peer -U - null --run 'exit 0'
  nbdkit: pattern: error: failed to set TLS session priority to @NBDKIT,SYSTEM:+ECDHE-PSK:+DHE-PSK:+PSK: The request is invalid.
  nbd_connect_command: gnutls_handshake: Error in the push function. (-1/1)

What's interesting here is that this shows the real error
message about the TLS session priority.

If you set MALLOC_CHECK_=1, however, then we lose the useful
error message:

  # MALLOC_CHECK_=1 MALLOC_PERTURB_=146 ./connect-tls-psk
  requires nbdkit --tls-verify-peer -U - null --run 'exit 0'
  free(): invalid pointer
  nbd_connect_command: gnutls_handshake: Error in the pull function. (-1/1)

which is unfortunate for debuggability.

I confirmed it is nbdkit that is crashing, and the crash appears to be
in gnutls code.

Looking at the image there is no /etc/crypto-policies directory,
and nor is there any 'crypto-policies' package available in the
distro.

So they have mis-built nbdkit in leap 15.3 with TLS priority
string of @NBDKIT,SYSTEM, despite not having support for that
in their distro.

> So I guess this problem is somehow specific to nbdkit or gnutls in
> OpenSUSE 15.3.

Yep: a broken nbdkit, compounded by a free() crash bug in gnutls
hiding the real error.

> We can probably ignore this failure, under the assumption it is fixed
> upstream.

In ci/manifest.yml set 'allow-failure: true' for 15.3, and re-run
lcitool manifest.

Or disable the gnutls build on 15.3 for CI purposes by passing
--without-gnutls.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Jim Fehlig
2022-Oct-13 21:02 UTC
[Libguestfs] libnbd | Failed pipeline for master | 018d55a8
Hi Daniel,

Thanks for the detailed report!

On 10/13/22 03:33, Daniel P. Berrangé wrote:
> [...]
> Looking at the image there is no /etc/crypto-policies directory,
> and nor is there any 'crypto-policies' package available in the
> distro.

Indeed. Leap 15.4 and newer include the crypto-policies package. Should the
container move to a 15.4 base?

> So they have mis-built nbdkit in leap 15.3 with TLS priority
> string of @NBDKIT,SYSTEM, despite not having support for that
> in their distro.

I'll fix this in our downstream packages. Thanks a lot for bringing it to my
attention.

Regards,
Jim
Eric Blake
2022-Oct-13 22:52 UTC
[Libguestfs] libnbd | Failed pipeline for master | 018d55a8
On Thu, Oct 13, 2022 at 03:02:51PM -0600, Jim Fehlig wrote:
> Hi Daniel,
>
> Thanks for the detailed report!
>
> [...]
> > Looking at the image there is no /etc/crypto-policies directory,
> > and nor is there any 'crypto-policies' package available in the
> > distro.
>
> Indeed. Leap 15.4 and newer include the crypto-policies package. Should the
> container move to a 15.4 base?

Looking further, in addition to the nbdkit bug of depending on a
priority string that is not possible in the base 15.3 distro, there is
a definite bug in the gnutls 3.6.7 shipped in the distro that was later
fixed by gnutls commit 90142f2d "Use inih to parse configuration file".
Look at the gnutls code base of lib/priority.c prior to that patch:

    static char *system_priority_buf = NULL;
    ...
    char *_gnutls_resolve_priorities(const char* priorities)
    ...
    #ifdef HAVE_FMEMOPEN
            /* Always try to refresh the cached data, to
             * allow it to be updated without restarting
             * all applications
             */
            _gnutls_update_system_priorities();
            fp = fmemopen(system_priority_buf, system_priority_buf_size, "r");
    #else
            fp = fopen(system_priority_file, "r");
    #endif
    ...
            fclose(fp);
            fp = NULL;

This is very much a case of older gnutls misusing fmemopen(), such
that an fclose() on a second or third attempt to use
system_priority_buf is indeed freeing an invalid pointer.

> > So they have mis-built nbdkit in leap 15.3 with TLS priority
> > string of @NBDKIT,SYSTEM, despite not having support for that
> > in their distro.
>
> I'll fix this in our downstream packages. Thanks a lot for bringing it to my
> attention.

You may also want to point the gnutls maintainers to the need to
backport that patch (or otherwise fix that nastiness).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
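Eric's point rests on a property of fmemopen(): when it is given a non-NULL buffer, the resulting FILE never owns that buffer, so fclose() releases only the stream and never the memory. Below is a minimal sketch of the safe pattern, assuming a caller-owned, read-only buffer. The helper count_config_lines and its toy input are hypothetical, and this is not the actual gnutls fix; per its title, the upstream commit named above moved configuration parsing to inih instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count non-empty, non-comment lines in an
 * in-memory config buffer.  The caller keeps ownership of `buf`:
 * the fclose() below frees only the FILE stream that fmemopen()
 * created, never the buffer that was passed in. */
static int
count_config_lines (const char *buf, size_t len)
{
  char line[256];
  int count = 0;
  FILE *fp;

  assert (buf != NULL);
  if (len == 0)          /* POSIX lets fmemopen() fail for size 0 */
    return 0;

  /* Mode "r" never writes through the stream, so casting away
   * const is safe here. */
  fp = fmemopen ((void *) buf, len, "r");
  if (fp == NULL)
    return -1;

  while (fgets (line, sizeof line, fp) != NULL)
    if (line[0] != '\n' && line[0] != '#')
      count++;

  fclose (fp);           /* releases the stream only; buf is untouched */
  return count;
}
```

Calling count_config_lines() repeatedly on the same buffer is safe because nothing in the function frees or reallocates `buf`. Per Eric's analysis, it was exactly the second or third use of system_priority_buf that triggered the invalid free() in the older gnutls.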
Daniel P. Berrangé
2022-Oct-14 07:17 UTC
[Libguestfs] libnbd | Failed pipeline for master | 018d55a8
On Thu, Oct 13, 2022 at 03:02:51PM -0600, Jim Fehlig wrote:
> Hi Daniel,
>
> Thanks for the detailed report!
>
> [...]
> > Looking at the image there is no /etc/crypto-policies directory,
> > and nor is there any 'crypto-policies' package available in the
> > distro.
>
> Indeed. Leap 15.4 and newer include the crypto-policies package. Should the
> container move to a 15.4 base?

Yes, we need to add 15.4 to the libvirt-ci facts database, given the
relative EOL dates.

With regards,
Daniel