On Wed, Aug 31, 2022 at 10:13:08AM +0100, Richard W.M. Jones wrote:
> On Wed, Aug 31, 2022 at 09:45:45AM +0800, Ming Lei wrote:
> > On Tue, Aug 30, 2022 at 05:13:46PM +0100, Richard W.M. Jones wrote:
> > > On Tue, Aug 30, 2022 at 11:29:26PM +0800, Ming Lei wrote:
> > > > On Tue, Aug 30, 2022 at 03:38:50PM +0100, Richard W.M. Jones wrote:
> > > > > On Tue, Aug 30, 2022 at 03:12:23PM +0800, Ming Lei wrote:
> > > > > > The patch sent in last email may cause io hang on MQ, and follows the fixed
> > > > > > version:
> > > > >
> > > > > I split this into two commits and cleaned them up and posted them here:
> > > > >
> > > > > https://gitlab.com/rwmjones/libnbd/-/commits/nbdublk/
> > > > >
> > > > > Unfortunately this doesn't work for me. When I do various filesystem
> > > > > operations like git clone and a compile I see some subtle disk errors
> > > > > and eventually it deadlocks, so I guess there is some problem.
> > > >
> > > > OK, care to provide more details about the reproducer? Like how backend
> > > > is setup, MQ/SQ is used, disk size, ...
> > >
> > > My test script is attached. $1 == "ublk".
> > >
> > > It basically just clones a Linux repo and compiles it. It hangs
> > > either during the clone or early in the build, and there are various
> > > "scary messages" from git which might indicate disk corruption.
> > >
> > > The NBD server is:
> > >
> > >   nbdkit -f memory 24G
> > >
> > > running on the hypervisor ("nbd://pick").
> > >
> > > > I have cloned linux kernel source tree on nbdublk disk and built it with
> > > > fedora 36 config for ~20min, so far so good. In my setting, backend is
> > > > 'nbdkit file /dev/sda(virtio-scsi)', nbdublk is single queue.
> > >
> > > Can you see if you can reproduce a hang with the source from:
> > >
> > > https://gitlab.com/rwmjones/libnbd/-/commits/nbdublk/
> > >
> > > I may have made a mistake when rebasing your patch or fixing it up to
> > > remove compiler warnings.
> >
> > My test used your tree directly. And I compared it with
> > my native tree, basically same.
> >
> > Today I will setup & run the test by your approach.
>
> I tried it again now and it definitely deadlocks under load.

I can reproduce it, please try the top patch in aio branch, which fixed
hang in my reproducer with your test setting.

https://github.com/ming1/ubdsrv/commits/aio

Thanks,
Ming
Richard W.M. Jones
2022-Aug-31 09:41 UTC
[Libguestfs] [PATCH libnbd] ublk: Add new nbdublk program
On Wed, Aug 31, 2022 at 05:29:13PM +0800, Ming Lei wrote:
> I can reproduce it, please try the top patch in aio branch, which fixed
> hang in my reproducer with your test setting.
>
> https://github.com/ming1/ubdsrv/commits/aio

(https://github.com/ming1/ubdsrv/commit/0a293b6eb7149dc5ee83e5d07d242accdb840c85)

Yes, that seems to fix it. I have two loops, one git-cloning the
kernel, and another copying the source of nbdkit and recompiling it,
and they are both working without problems.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
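
For anyone wanting to repeat this kind of check: the test script attached earlier in the thread is not reproduced here, so the following is only a rough sketch of how two concurrent load loops (for example a clone loop and a rebuild loop) might be driven with a per-iteration timeout to flag a hang. The names run_loop and run_concurrent, the timeout values, and the placeholder commands are all hypothetical; substitute the real git and build commands run on the device under test.

```python
import subprocess
import sys
import threading


def run_loop(name, cmd, iterations, timeout_s, results):
    """Run cmd repeatedly; record 'ok', 'failed' or 'hung' under name."""
    for _ in range(iterations):
        try:
            proc = subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # Assumption: an iteration exceeding timeout_s is treated as a hang.
            # Caveat: a process stuck in uninterruptible I/O may not die when
            # killed, in which case subprocess.run itself can block here.
            results[name] = "hung"
            return
        if proc.returncode != 0:
            results[name] = "failed"
            return
    results[name] = "ok"


def run_concurrent(loops, iterations=3, timeout_s=3600):
    """Run every loop in its own thread and return {name: status}."""
    results = {}
    threads = [
        threading.Thread(
            target=run_loop, args=(name, cmd, iterations, timeout_s, results)
        )
        for name, cmd in loops.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder commands only; replace with the real clone and build steps.
    loops = {
        "clone": [sys.executable, "-c", "pass"],
        "build": [sys.executable, "-c", "pass"],
    }
    print(run_concurrent(loops, iterations=2, timeout_s=60))
```

A status of "hung" or "failed" from either loop would correspond to the deadlock or disk errors reported before the fix; both loops reporting "ok" matches the result described above.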