Jason Pepas
2015-Nov-07 19:22 UTC
Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?
On Sat, Nov 7, 2015 at 5:03 AM, Richard W.M. Jones <rjones@redhat.com> wrote:> How about 'strace mkfs.ext2 ..' and see if any system calls are > returning errors. That would show you whether nbd-client is throwing > errors away, or whether mkfs is getting the errors and ignoring them > (seems pretty unlikely, but you never know). > > After that, it'd be down to tracing where the errors end up in the > kernel.Thanks for the tip! The results are interesting. It looks like all of mkfs's pwrite() calls succeed, but its final fsync() calls do actually fail: root@debian:~# strace mkfs.ext2 /dev/nbd0 2>&1 | tee strace.out root@debian:~# cat strace.out | grep pwrite pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187379712) = 32768 pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187412480) = 32768 pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187445248) = 32768 ... root@debian:~# cat strace.out | grep fsync fsync(3) = -1 EIO (Input/output error) fsync(3) = -1 EIO (Input/output error) The fsync() calls happen just before mkfs exists success: root@debian:~# cat strace.out | tail pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 6576672768) = 4096 fsync(3) = -1 EIO (Input/output error) pwrite(3, "\0\0\10\0\0\0 \0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024, 1024) = 1024 fsync(3) = -1 EIO (Input/output error) close(3) = 0 write(1, "done\n\n", 6done ) = 6 exit_group(0) = ? +++ exited with 0 +++ root@debian:~# I did manage to find two calls to fsync in the e2fsprogs source which are not return-value-checked: https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/misc/filefrag.c#L303 https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/lib/ext2fs/unix_io.c#L915 I'll see about submitting a patch there. I'm not sure where to start with hunting down why mkfs's pwrite() calls aren't failing. I'd look to the kernel source for that? -jason
Richard W.M. Jones
2015-Nov-07 21:02 UTC
Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?
[Adding linux-ext4 mailing list. The original bug report is here: https://www.redhat.com/archives/libguestfs/2015-November/msg00078.html ] On Sat, Nov 07, 2015 at 01:22:45PM -0600, Jason Pepas wrote:> On Sat, Nov 7, 2015 at 5:03 AM, Richard W.M. Jones <rjones@redhat.com> wrote: > > How about 'strace mkfs.ext2 ..' and see if any system calls are > > returning errors. That would show you whether nbd-client is throwing > > errors away, or whether mkfs is getting the errors and ignoring them > > (seems pretty unlikely, but you never know). > > > > After that, it'd be down to tracing where the errors end up in the > > kernel. > > Thanks for the tip! > > The results are interesting. It looks like all of mkfs's pwrite() > calls succeed, but its final fsync() calls do actually fail: > > > root@debian:~# strace mkfs.ext2 /dev/nbd0 2>&1 | tee strace.out > > root@debian:~# cat strace.out | grep pwrite > pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 32768, 8187379712) = 32768 > pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 32768, 8187412480) = 32768 > pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 32768, 8187445248) = 32768 > ... > > root@debian:~# cat strace.out | grep fsync > fsync(3) = -1 EIO (Input/output error) > fsync(3) = -1 EIO (Input/output error) > > > The fsync() calls happen just before mkfs exists success: > > > root@debian:~# cat strace.out | tail > pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 4096, 6576672768) = 4096 > fsync(3) = -1 EIO (Input/output error) > pwrite(3, "\0\0\10\0\0\0 > \0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024, > 1024) = 1024 > fsync(3) = -1 EIO (Input/output error) > close(3) = 0 > write(1, "done\n\n", 6done > > ) = 6 > exit_group(0) = ? > +++ exited with 0 +++ > root@debian:~# > > > I did manage to find two calls to fsync in the e2fsprogs source which > are not return-value-checked: > > https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/misc/filefrag.c#L303 > https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/lib/ext2fs/unix_io.c#L915That second one looks very suspicious to me. I don't think that it's ever right for mke2fs to ignore the return value from an fsync call, so assuming mke2fs calls that function it's surely a bug.> I'll see about submitting a patch there. > > I'm not sure where to start with hunting down why mkfs's pwrite() > calls aren't failing. I'd look to the kernel source for that?It looks like it's really an e2fsprogs problem, not a kernel problem. That's pretty surprising - I wasn't expecting it. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Jason Pepas
2015-Nov-07 23:09 UTC
Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?
On Sat, Nov 7, 2015 at 3:02 PM, Richard W.M. Jones <rjones@redhat.com> wrote:>> I'm not sure where to start with hunting down why mkfs's pwrite() >> calls aren't failing. I'd look to the kernel source for that? > > It looks like it's really an e2fsprogs problem, not a kernel problem. > That's pretty surprising - I wasn't expecting it.I agree the fsync() issue is an e2fsprogs problem, but as for specifically the pwrite() calls not getting a -1 return value, that's the kernel's fault, right? I've been rolling this around in my mind and I think I can see why the kernel would correctly make fsync() fail but not pwrite() fail. Let me run this by you: When a pwrite() happens, that doesn't immediately cause nbd to send a network packet out, and doesn't wait on a network reply before returning, right? It just ends up in some dirty block device queue, I'm guessing? And then something triggers a bunch of dirty blocks to get flushed out to "disk"? If that's the case, then its impossible for the kernel to give an accurate return code to pwrite(), because it doesn't know those blocks will eventually fail to be written to "disk" (nbd). But as for fsync(), the kernel is probably waiting until every last dirty sector gets written before it decides what the return code is, which is why we see that pwrite() isn't failing, but fsync() is failing. Does that make sense? I wonder if the block device were opened with O_DIRECT by e2fsprogs if that would cause the pwrite() calls to fail correctly? -jason
Jason Pepas
2015-Nov-07 23:23 UTC
Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?
On Sat, Nov 7, 2015 at 3:02 PM, Richard W.M. Jones <rjones@redhat.com> wrote:> On Sat, Nov 07, 2015 at 01:22:45PM -0600, Jason Pepas wrote: >> I did manage to find two calls to fsync in the e2fsprogs source which >> are not return-value-checked: >> >> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/misc/filefrag.c#L303 >> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/lib/ext2fs/unix_io.c#L915 > > That second one looks very suspicious to me. I don't think that it's > ever right for mke2fs to ignore the return value from an fsync call, > so assuming mke2fs calls that function it's surely a bug.I locally patched mke2fs to check the return value of those two fsync() calls, and now it exits failure like it should: root@debian:~/mkfs/e2fsprogs-1.42.12/build/misc# strace ./mke2fs /dev/nbd0 2>&1 | tail write(1, "\10\10\10\10\1) = 5 pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 6576672768) = 4096 fsync(3) = -1 EIO (Input/output error) pwrite(3, "\0\0\10\0\0\0 \0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024, 1024) = 1024 fsync(3) = -1 EIO (Input/output error) close(3) = 0 write(2, "\nWarning, had trouble writing ou"..., 46 Warning, had trouble writing out superblocks.) = 46 exit_group(5) = ? +++ exited with 5 +++ The patches are pretty simple: root@debian:~/mkfs# diff -urN e2fsprogs-1.42.12.orig/misc/filefrag.c e2fsprogs-1.42.12/misc/filefrag.c --- e2fsprogs-1.42.12.orig/misc/filefrag.c 2014-08-13 14:57:29.000000000 -0500 +++ e2fsprogs-1.42.12/misc/filefrag.c 2015-11-07 13:44:17.089128243 -0600 @@ -292,8 +292,11 @@ fm_ext.fe_flags = FIEMAP_EXTENT_MERGED; } - if (sync_file) - fsync(fd); + if (sync_file) { + int rc = fsync(fd); + if (rc < 0) + return rc; + } for (i = 0, logical = 0, *num_extents = 0, count = last_block = 0; i < numblocks; root@debian:~/mkfs# diff -urN e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c e2fsprogs-1.42.12/lib/ext2fs/unix_io.c --- e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c 2014-08-08 15:59:39.000000000 -0500 +++ e2fsprogs-1.42.12/lib/ext2fs/unix_io.c 2015-11-07 14:31:28.209005806 -0600 @@ -887,7 +887,9 @@ #ifndef NO_IO_CACHE retval = flush_cached_blocks(channel, data, 0); #endif - fsync(data->dev); + if (fsync(data->dev) == -1) + return errno; + return retval; } (attached) -jason