Nir Soffer
2020-Oct-13 15:45 UTC
Re: [Libguestfs] virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
On Mon, Oct 12, 2020 at 9:05 AM Yedidyah Bar David <didi@redhat.com> wrote:> > Hi all, > > On Mon, Oct 12, 2020 at 5:17 AM <jenkins@jenkins.phx.ovirt.org> wrote: > > > > Project: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/ > > Build: https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/479/ > > Above failed with: > > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/479/testReport/junit/basic-suite-master.test-scenarios/004_basic_sanity/test_sparsify_disk1/ > > vdsm.log has: > > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/479/artifact/exported-artifacts/test_logs/basic-suite-master/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log > > 2020-10-11 22:05:14,695-0400 INFO (jsonrpc/1) [api.host] FINISH > getJobs return={'jobs': {'05eaea44-7e4c-4442-9926-2bcb696520f1': > {'id': '05eaea44-7e4c-4442-9926-2bcb696520f1', 'status': 'failed', > 'description': 'sparsify_volume', 'job_type': 'storage', 'error': > {'code': 100, 'message': 'General Exception: (\'Command > [\\\'/usr/bin/virt-sparsify\\\', \\\'--machine-readable\\\', > \\\'--in-place\\\', > \\\'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67\\\'] > failed with rc=1 out=b\\\'3/12\\\\n{ "message": "libguestfs error: > guestfs_launch failed.\\\\\\\\nThis usually means the libguestfs > appliance failed to start or crashed.\\\\\\\\nDo:\\\\\\\\n export > LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\\\\\\\\nand run the command > again. For further information, read:\\\\\\\\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\\\\\nYou > can also run \\\\\\\'libguestfs-test-tool\\\\\\\' and post the > *complete* output\\\\\\\\ninto a bug report or message to the > libguestfs mailing list.", "timestamp": > "2020-10-11T22:05:08.397538670-04:00", "type": "error" }\\\\n\\\' > err=b"virt-sparsify: error: libguestfs error: guestfs_launch > failed.\\\\nThis usually means the libguestfs appliance failed to > start or crashed.\\\\nDo:\\\\n export LIBGUESTFS_DEBUG=1 > LIBGUESTFS_TRACE=1\\\\nand run the command again. For further > information, read:\\\\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\nYou > can also run \\\'libguestfs-test-tool\\\' and post the *complete* > output\\\\ninto a bug report or message to the libguestfs mailing > list.\\\\n\\\\nIf reporting bugs, run virt-sparsify with debugging > enabled and include the \\\\ncomplete output:\\\\n\\\\n virt-sparsify > -v -x [...]\\\\n"\',)'}}}, 'status': {'code': 0, 'message': 'Done'}} > from=::ffff:192.168.201.4,43318, > flow_id=365642f4-2fe2-45df-937a-f4ca435eea38 (api:54) > 2020-10-11 22:05:14,695-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] > Return 'Host.getJobs' in bridge with > {'05eaea44-7e4c-4442-9926-2bcb696520f1': {'id': > '05eaea44-7e4c-4442-9926-2bcb696520f1', 'status': 'failed', > 'description': 'sparsify_volume', 'job_type': 'storage', 'error': > {'code': 100, 'message': 'General Exception: (\'Command > [\\\'/usr/bin/virt-sparsify\\\', \\\'--machine-readable\\\', > \\\'--in-place\\\', > \\\'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67\\\'] > failed with rc=1 out=b\\\'3/12\\\\n{ "message": "libguestfs error: > guestfs_launch failed.\\\\\\\\nThis usually means the libguestfs > appliance failed to start or crashed.\\\\\\\\nDo:\\\\\\\\n export > LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\\\\\\\\nand run the command > again. For further information, read:\\\\\\\\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\\\\\nYou > can also run \\\\\\\'libguestfs-test-tool\\\\\\\' and post the > *complete* output\\\\\\\\ninto a bug report or message to the > libguestfs mailing list.", "timestamp": > "2020-10-11T22:05:08.397538670-04:00", "type": "error" }\\\\n\\\' > err=b"virt-sparsify: error: libguestfs error: guestfs_launch > failed.\\\\nThis usually means the libguestfs appliance failed to > start or crashed.\\\\nDo:\\\\n export LIBGUESTFS_DEBUG=1 > LIBGUESTFS_TRACE=1\\\\nand run the command again. For further > information, read:\\\\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\nYou > can also run \\\'libguestfs-test-tool\\\' and post the *complete* > output\\\\ninto a bug report or message to the libguestfs mailing > list.\\\\n\\\\nIf reporting bugs, run virt-sparsify with debugging > enabled and include the \\\\ncomplete output:\\\\n\\\\n virt-sparsify > -v -x [...]\\\\n"\',)'}}} (__init__:356) > > /var/log/messages has: > > Oct 11 22:04:51 lago-basic-suite-master-host-0 kvm[80601]: 1 guest now active > Oct 11 22:05:06 lago-basic-suite-master-host-0 journal[80557]: Domain > id=1 name='guestfs-hl0ntvn92rtkk2u0' > uuid=05ea5a53-562f-49f8-a8ca-76b45c5325b4 is tainted: custom-argv > Oct 11 22:05:06 lago-basic-suite-master-host-0 journal[80557]: Domain > id=1 name='guestfs-hl0ntvn92rtkk2u0' > uuid=05ea5a53-562f-49f8-a8ca-76b45c5325b4 is tainted: host-cpu > Oct 11 22:05:06 lago-basic-suite-master-host-0 kvm[80801]: 2 guests now active > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: > internal error: End of file from qemu monitor > Oct 11 22:05:08 lago-basic-suite-master-host-0 kvm[80807]: 1 guest now active > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot > resolve symlink /tmp/libguestfseTG8xF/console.sock: No such file or > directoryKN<F3>L^? > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot > resolve symlink /tmp/libguestfseTG8xF/guestfsd.sock: No such file or > directoryKN<F3>L^? > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot > lookup default selinux label for /tmp/libguestfs1WkcF7/overlay1.qcow2 > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot > lookup default selinux label for > /var/tmp/.guestfs-36/appliance.d/kernel > Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot > lookup default selinux label for > /var/tmp/.guestfs-36/appliance.d/initrd > Oct 11 22:05:08 lago-basic-suite-master-host-0 vdsm[74096]: ERROR Job > '05eaea44-7e4c-4442-9926-2bcb696520f1' failed#012Traceback (most > recent call last):#012 File > "/usr/lib/python3.6/site-packages/vdsm/jobs.py", line 159, in run#012 > self._run()#012 File > "/usr/lib/python3.6/site-packages/vdsm/storage/sdm/api/sparsify_volume.py", > line 57, in _run#012 > virtsparsify.sparsify_inplace(self._vol_info.path)#012 File > "/usr/lib/python3.6/site-packages/vdsm/virtsparsify.py", line 40, in > sparsify_inplace#012 commands.run(cmd)#012 File > "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, > in run#012 raise cmdutils.Error(args, p.returncode, out, > err)#012vdsm.common.cmdutils.Error: Command ['/usr/bin/virt-sparsify', > '--machine-readable', '--in-place', > '/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67'] > failed with rc=1 out=b'3/12\n{ "message": "libguestfs error: > guestfs_launch failed.\\nThis usually means the libguestfs appliance > failed to start or crashed.\\nDo:\\n export LIBGUESTFS_DEBUG=1 > LIBGUESTFS_TRACE=1\\nand run the command again. For further > information, read:\\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\nYou > can also run \'libguestfs-test-tool\' and post the *complete* > output\\ninto a bug report or message to the libguestfs mailing > list.", "timestamp": "2020-10-11T22:05:08.397538670-04:00", "type": > "error" }\n' err=b"virt-sparsify: error: libguestfs error: > guestfs_launch failed.\nThis usually means the libguestfs appliance > failed to start or crashed.\nDo:\n export LIBGUESTFS_DEBUG=1 > LIBGUESTFS_TRACE=1\nand run the command again. For further > information, read:\n > http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can > also run 'libguestfs-test-tool' and post the *complete* output\ninto a > bug report or message to the libguestfs mailing list.\n\nIf reporting > bugs, run virt-sparsify with debugging enabled and include the > \ncomplete output:\n\n virt-sparsify -v -x [...]\n" > > The next run of the job (480) did finish successfully. No idea if it > was already fixed by a patch, or is simply a random/env issue.I think this is env issue, we run on overloaded vms with small amount of memory. I have seen such radnom failures before.> Is it possible to pass LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 without > patching vdsm? Not sure. In any case, if this does not cause too much > excess debug logging, perhaps better always pass it, to help > retroactively analyze such failures.Unfortunately using LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 creates huge amount of logs so it is not possible to include these log for every run. Maybe we can show the logs only if the command failed, but vdsm infrastructure make this hard to do since we always log commnad stderr, even if the command was successful. Also I'm not sure this is a good idea to always run virt-sparsify in debug level. I think I posted a patch to run libguestfs-test-tool after virt-something errors for collecting data, but I cannot find it. Maybe this can be useful, so after every virt-something error we will have enough info about the issue in vdsm log.> Or, patch > virt-sparsify/libguestfs/whatever to always log at least enough > information on failure even without passing these.I think this is the right solution - when virt-something tool fails, it should log the reason for the failure - the error that caused the tool to fail. I'm not sure this is easy to do as the failing code run inside a special VM. Maybe the code running in the VM should log the output in a machine readable way, so once an error is detected virt-something can report the error as the reason, without running in debug mode. Nir
Richard W.M. Jones
2020-Oct-13 16:15 UTC
Re: [Libguestfs] virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
On Tue, Oct 13, 2020 at 06:45:42PM +0300, Nir Soffer wrote:> I think this is the right solution - when virt-something tool fails, > it should log the reason for the failure - the error that caused the > tool to fail. I'm not sure this is easy to do as the failing code > run inside a special VM. Maybe the code running in the VM should log > the output in a machine readable way, so once an error is detected > virt-something can report the error as the reason, without running > in debug mode.All the virt-* tools that I've written have a non-zero exit code and print an error message on stderr when they fail. Errors from inside the appliance are propagated to the library and thence to the tool correctly. I think the best thing to do is: - spool up stdout + stderr from the tool - if the exit code != 0, save the spooled output for analysis - if the exit code == 0, discard it (or keep it if you like) Without LIBGUESTFS_DEBUG/LIBGUESTFS_TRACE stdout + stderr should be quite minimal for all tools. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Nir Soffer
2020-Oct-13 16:56 UTC
Re: [Libguestfs] virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
On Tue, Oct 13, 2020 at 7:15 PM Richard W.M. Jones <rjones@redhat.com> wrote:> > On Tue, Oct 13, 2020 at 06:45:42PM +0300, Nir Soffer wrote: > > I think this is the right solution - when virt-something tool fails, > > it should log the reason for the failure - the error that caused the > > tool to fail. I'm not sure this is easy to do as the failing code > > run inside a special VM. Maybe the code running in the VM should log > > the output in a machine readable way, so once an error is detected > > virt-something can report the error as the reason, without running > > in debug mode. > > All the virt-* tools that I've written have a non-zero exit code and > print an error message on stderr when they fail. Errors from inside > the appliance are propagated to the library and thence to the tool > correctly. > > I think the best thing to do is: > > - spool up stdout + stderr from the tool > > - if the exit code != 0, save the spooled output for analysis > > - if the exit code == 0, discard it (or keep it if you like)This is what we already do, and the result is not helpful. If you look at the log message in the previous message, basically the only info about the error is: libguestfs error: guestfs_launch failed I don't see what we can do with this error message. If this is an error running qemu, the error message from qemu can help to understand why qemu failed to start.> Without LIBGUESTFS_DEBUG/LIBGUESTFS_TRACE stdout + stderr should be > quite minimal for all tools. > > Rich. > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > Read my programming and virtualization blog: http://rwmj.wordpress.com > virt-df lists disk usage of guests without needing to install any > software inside the virtual machine. Supports Linux and Windows. > http://people.redhat.com/~rjones/virt-df/ >
Possibly Parallel Threads
- Re: virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
- Re: virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
- Re: virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
- Re: virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)
- Re: virt-sparsify failed (was: [oVirt Jenkins] ovirt-system-tests_basic-suite-master_nightly - Build # 479 - Failure!)