Hi All,

I have what I think is a ZFS-related bug. Unfortunately my simplest test case is a bit cumbersome and I haven't definitively proven that the problem is ZFS-related. I'm hoping for some feedback on how to move forward.

Quick background: I rip my CDs using grip and produce flac files. I tag the music using MusicBrainz's Picard and transcode it to MP3s within Picard using a plugin that I wrote. Picard is a Python-based app and uses the Mutagen library to tag files.

I'm working on a MacPro with 10GB of RAM and Seagate ST31000340AS drives updated to the latest firmware (SD1A). The system is running 9-STABLE from late June. It is ZFS-only and boots from a mirrored pool that provides a bunch of ZFS filesystems, including my home directory.

I recently realized that some of the flacs were corrupt and have been chasing down the problem. I've blamed Picard, my disks (there was newer, "important" firmware, which they're now running), my RAM, etc. After blaming each of the moving parts in turn, I offer up the following experiment as evidence that I have found a ZFS problem:

- Start with a bunch of untagged flac files that pass validation with "flac -t".
- Load them into Picard, tag them, and save them (this also transcodes them to MP3s using my plugin and runs a plugin which runs "flac -t" on each tagged file).
- Run "flac -t" on all of the tagged flac files and collect the result as pre-exit-validation.
- Exit Picard "politely" (using the menu options, not killing it from the command line).
- Run "flac -t" on all of the tagged flac files and collect the result as post-exit-validation.
- Reboot the machine.
- Run "flac -t" on all of the tagged flac files and collect the result as post-reboot-validation.

On multiple runs through this routine I'll sometimes see errors in the {pre,post}-exit-validations, but they'll often all validate perfectly. On every run, though, I'll see many invalid files in the post-reboot-validation output. I've even scp'd the directories to an unrelated machine (Mac OS X 10.8) at the various points to do the "flac -t" validation, with the same results.

Looking carefully at a couple of instances shows that the good and bad copies differ in a few bytes. E.g., one file differs in a few bytes from position 139253 to 139264 (I might have an off-by-one counting issue; I'm using emacs' buffer positions here). 2^17 + 2^13 = 139264, which is an interesting coincidence. In another file I see a difference ending at 2^17 + 2^12 (again, I might be off by one or so in my counting). Patching the differing hunk from a good file into a bad file (again via emacs) results in a file that passes validation.

At one point I was blaming RAM and was pulling/swapping memory modules. Running with less memory increased the likelihood of files being invalid. I built up a similar system running 9-STABLE as of yesterday (7/16) that uses UFS and have been unable to recreate the problem.

Given that the files are valid after exiting Picard, I do not think that there is anything in my tagging pipeline that is causing the problem. The fact that the files "become" invalid after a reboot suggests something in ZFS's buffering and/or its interactions with the VM system. The observation that running with less memory causes more/earlier problems reinforces this, as does the fact that the garbage in the file appears near a power-of-two boundary.
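In case it's useful, the validation passes boil down to something like the following (just a sketch; the directory and log file names are made up):

#!/bin/sh
# Run "flac -t" over everything Picard saved and log the failures.
# MUSIC and the log locations are illustrative -- adjust to taste.
MUSIC=$HOME/music/tagged

validate() {    # $1 = label, e.g. pre-exit, post-exit, post-reboot
    for f in "$MUSIC"/*.flac; do
        flac -t "$f" >/dev/null 2>&1 || echo "FAILED: $f"
    done > "$HOME/$1-validation.txt"
}

validate pre-exit       # immediately after saving in Picard
# ... exit Picard politely, then:
validate post-exit
# ... reboot, then:
validate post-reboot

# Comparing a bad copy against a known-good copy shows where they differ:
#   cmp -l good.flac bad.flac    # prints 1-based byte offsets and octal values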
My current test case involves my local version of Picard and my plugins, plus 165 flac files (some of which Picard can identify automatically based on grip's freedb-based metadata, some of which need a helping hand). It's not particularly minimal, and I'm not sure that I can ever trim it down to something trivial that a ZFS developer could run locally.

Thanks for making it this far! How should I move forward with this?

g.
George Hartzell <hartzell at alerce.com> writes:

> Hi All,
>
> I have what I think is a ZFS-related bug. Unfortunately my simplest
> test case is a bit cumbersome and I haven't definitively proven that
> the problem is ZFS-related. I'm hoping for some feedback on how to
> move forward.
>
> Quick background: I rip my CDs using grip and produce flac files. I
> tag the music using MusicBrainz's Picard and transcode it to MP3s
> within Picard using a plugin that I wrote. Picard is a Python-based
> app and uses the Mutagen library to tag files.

[summary: Picard seems to trigger an mmap consistency bug in ZFS]

Aw, crud. I ran into what may be this same issue a few years ago, under almost identical circumstances (Picard with Ogg files instead of flac), but never got around to writing up a proper mailing-list post on it. I *thought* it had been fixed some time back, though I'm not entirely sure, because I've been running with a local patch to py-mutagen that disables its mmap() usage.

Anyway, here's what I recall from when I was trying to track this down way back when:

1) Your suspicions are right; it's definitely a ZFS/mmap interaction bug. UFS filesystems didn't show the bug.

2) It's triggered by the insert_bytes or delete_bytes function in py-mutagen.

3) As long as the in-memory version of that chunk of the file stuck around in memory, the file read OK, but the data on disk was corrupt. So if that data got evicted from cache and had to be reread, or if you forced the cached data to be disposed of by unmounting and remounting the filesystem, you would see a corrupt file. (Rebooting, of course, would also allow the corrupted file data to become visible.)

I'll attach my patch to disable py-mutagen's mmap usage in insert_bytes/delete_bytes below. Try it and see if it makes the corruption problems go away. If so, that narrows your search down to those two routines in py-mutagen and what they're doing.

I *had* what at the time I recall was a simple C++ test program that managed to trigger the bug more-or-less reliably. Unfortunately, it doesn't look like my test case still works, i.e., the test program doesn't seem to trigger the bug on either the 10-CURRENT box or the RELENG-9 VM I'm trying it on now. As I recall, the bug's presence depended on fairly picky details of exactly which offsets were used in the mmap()s and the write()s, so it may be that offsets different from the ones I tried will still show the bug.

Anyway, what I'd suggest is the following: see if my patch for py-mutagen, disabling the mmap() in those two functions, lets you run Picard reliably. If so, then the issue is triggered by one or both of those routines; hack them to print out the exact offsets used on each call and use that to try and code up a simple C++ test case.
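Regarding (3): a quick way to check whether the cached copy and the on-disk copy of a file have already diverged, without rebooting, is to checksum it before and after forcing ZFS to drop its cached data. Something like this (the dataset and file names are just examples):

# checksum as seen through the cache
md5 /tank/music/track01.flac

# unmount/remount the dataset so the cached file data is discarded
sudo zfs unmount tank/music
sudo zfs mount tank/music

# checksum again, now re-read from disk; if it changed, the on-disk copy
# never matched what the application saw in memory
md5 /tank/music/track01.flac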
Here's the py-mutagen patch:

--- mutagen/_util.py.orig	2008-06-01 01:33:00.000000000 -0500
+++ mutagen/_util.py	2009-04-11 18:16:53.363758128 -0500
@@ -213,12 +213,6 @@
     fobj.write('\x00' * size)
     fobj.flush()
     try:
-        try:
-            import mmap
-            map = mmap.mmap(fobj.fileno(), filesize + size)
-            try: map.move(offset + size, offset, movesize)
-            finally: map.close()
-        except (ValueError, EnvironmentError, ImportError):
             # handle broken mmap scenarios
             locked = lock(fobj)
             fobj.truncate(filesize)
@@ -272,17 +266,11 @@
     try:
         if movesize > 0:
             fobj.flush()
-            try:
-                import mmap
-                map = mmap.mmap(fobj.fileno(), filesize)
-                try: map.move(offset, offset + size, movesize)
-                finally: map.close()
-            except (ValueError, EnvironmentError, ImportError):
-                # handle broken mmap scenarios
-                locked = lock(fobj)
-                fobj.seek(offset + size)
-                buf = fobj.read(BUFFER_SIZE)
-                while buf:
+            # handle broken mmap scenarios
+            locked = lock(fobj)
+            fobj.seek(offset + size)
+            buf = fobj.read(BUFFER_SIZE)
+            while buf:
                 fobj.seek(offset)
                 fobj.write(buf)
                 offset += len(buf)

and here's my test case:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	gen4.cpp
#	test4.sh
#
echo x - gen4.cpp
sed 's/^X//' >gen4.cpp << 'END-of-gen4.cpp'
X/*
X** Program to create a file of data and do some mmap()ing writes to it.
X*/
X
X#include <stdio.h>
X#include <stdlib.h>
X#include <stdint.h>
X#include <unistd.h>
X#include <string.h>
X#include <fcntl.h>
X#include <sys/types.h>
X#include <sys/mman.h>
X
X/* Insert size bytes (zeros) into file at offset */
Xvoid
Xinsert_bytes(int fd, unsigned int size, unsigned int offset)
X{
X    unsigned int filesize = lseek(fd, (off_t)0, SEEK_END);
X    unsigned int movesize = filesize - offset;
X
X    char *buf = new char[size];
X    for (unsigned int i = 0 ; i < size ; ++i) {
X        buf[i] = '\0';
X    }
X    write(fd, buf, size);
X
X    char *addr = (char *) mmap((void *)NULL, filesize+size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
X    memmove(addr+size, addr, movesize);
X    if (munmap(addr, filesize+size)) {
X        perror("munmap");
X    }
X}
X
X/* Delete size bytes (zeros) from file at offset */
Xvoid
Xdelete_bytes(int fd, unsigned int size, unsigned int offset)
X{
X    unsigned int filesize = lseek(fd, (off_t)0, SEEK_END);
X    unsigned int movesize = filesize - offset - size;
X
X    char *addr = (char *) mmap((void *)NULL, filesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
X    memmove(addr, addr+size, movesize);
X    if (munmap(addr, filesize)) {
X        perror("munmap");
X    }
X    ftruncate(fd, filesize - size);
X}
X
X
X
X/* Usage: gen1 filename number-of-seconds */
Xint
Xmain(int argc, char **argv)
X{
X    char *fname = argv[1];
X    FILE *f = fopen(fname, "w");
X
X    unsigned int size = 1024;
X
X    for (unsigned int i = 0 ; i < size ; ++i) {
X        char c = i & 0xff;
X        fwrite((void *)&c, 1, 1, f);
X    }
X
X    fclose(f);
X
X    int fd = open(fname, O_RDWR);
X
X    unsigned int S1 = atoi(argv[2]);
X#define O1 0
X    insert_bytes(fd, S1, O1);
X
X    (void) lseek(fd, (off_t) O1, SEEK_SET);
X
X#define S2 1
X#define O2 (O1+S1)
X
X    delete_bytes(fd, S2, O2);
X
X    exit(0);
X}
END-of-gen4.cpp
echo x - test4.sh
sed 's/^X//' >test4.sh << 'END-of-test4.sh'
X#!/bin/sh
X# Set the following variables appropriately
X#   ZFSFS   name of your ZFS filesystem to test
X#   ZFSMP   mount point of your ZFS filesystem to test
X#   ZFSDIR  a writable dir somewhere under ZFSMP
X#   UFSDIR  a writable dir somewhere on a UFS filesystem
X# Script takes one argument, S1, a number of bytes.
X
XZFSFS=u1
XZFSMP=/u1
XZFSDIR=/u1/tmp
XUFSDIR=/var/tmp
X
XNAME=$ZFSDIR/foo1
XS1=$1
Xrm $NAME
X./gen4 $NAME $S1
Xmd5 $NAME
Xcp $NAME $UFSDIR/before
Xls -l $NAME
Xsudo umount $ZFSMP
Xsudo mount -t zfs $ZFSFS $ZFSMP
Xmd5 $NAME
Xls -l $NAME
Xcp $NAME $UFSDIR/after
Xcmp -l $UFSDIR/before $UFSDIR/after
END-of-test4.sh
exit
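If it helps, unpacking and running it goes roughly like this (the archive filename and the S1 value are just examples; edit the variables at the top of test4.sh for your own pool first):

sh testcase.shar        # unpack gen4.cpp and test4.sh from the archive above
c++ -o gen4 gen4.cpp    # any of c++/g++/clang++ should do
sh ./test4.sh 4096      # S1 = 4096 bytes; other sizes/offsets are worth trying
# the cmp -l at the end lists any bytes where the pre- and post-remount copies differ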
on 17/07/2013 23:47 George Hartzell said the following:

> How should I move forward with this?

Could you please try to reproduce this problem using a kernel built with the INVARIANTS option?

-- 
Andriy Gapon
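For reference, that amounts to roughly the following (the config name DEBUGKERN is made up; adjust the conf directory for your architecture):

# layer the debugging options on top of GENERIC
cat > /usr/src/sys/amd64/conf/DEBUGKERN <<'EOF'
include GENERIC
ident   DEBUGKERN
options INVARIANTS
options INVARIANT_SUPPORT
EOF

cd /usr/src
make buildkernel installkernel KERNCONF=DEBUGKERN
shutdown -r now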
----- Original Message -----
From: "George Hartzell" <hartzell at alerce.com>

> > > Andriy Gapon writes:
> > > > on 18/07/2013 20:44 George Hartzell said the following:
> > > > > Andriy Gapon writes:
> > > > > > on 17/07/2013 23:47 George Hartzell said the following:
> > > > > > > How should I move forward with this?
> > > > > >
> > > > > > Could you please try to reproduce this problem using a kernel built with
> > > > > > the INVARIANTS option?
> > > > >
> > > > > I added the INVARIANT_SUPPORT and INVARIANTS options to the GENERIC
> > > > > kernel, rebuilt it, installed it, and running through my "test case"
> > > > > generated a lot of invalid flac files. I'm not sure what the options
> > > > > are/were supposed to do though; it looks like they generally lead to
> > > > > KASSERTs, which lead to abort()s. Nothing in /var/log/messages or on
> > > > > the console.
> > > >
> > > > George,
> > > >
> > > > do you have anything new on this issue?
> > >
> > > Since the message that you quoted I narrowed down my "test case"
> > > somewhat, but I have not yet produced a stand-alone tool that
> > > reproduces it (you still have to go through Picard et al.).
> > >
> > > > Could you please try the following patch?
> > > > http://people.freebsd.org/~avg/zfs-putpages.diff
> > > >
> > > > I expect it to not really fix the issue, but it may help to narrow it down.
> > > > Please keep INVARIANTS.
> > >
> > > Absolutely. Probably not until the weekend, but I'll give it a go.
> > >
> > > Thanks for following up.
> >
> > Did you manage to make any progress with this?
> >
> > We're seeing a problem where rrdcached corrupts rrd files, and remembering
> > this thread and knowing it uses mmap and we're on ZFS, I was wondering
> > if this may be the cause of our issue too.
> >
> > I've just recompiled rrdtool without mmap support and am clearing down
> > all corrupted files, but it would be good to know if any progress was
> > made on this?
> >
> > Regards
> > Steve
>
> I was able to recreate the problem on a 10-BETA-something-or-other
> recently (I'd only been using 9 up until then). Andriy's patches
> didn't make a difference. I haven't heard anything since reporting
> back to him.

I've pretty much confirmed that mmap support is causing the corruption when running rrdcached, as since rebuilding with mmap disabled I've had no further corruption.

@George: when you got corruption, what did the files look like? I ask because here I see lots of zeros, as though the file size was correct but the contents were pretty much blanked.

@avg: what was your thinking behind what may be the issue here? If this is an mmap bug in ZFS, it's a pretty serious one given the amount of silent corruption you can get.

@re: although reported incidents appear to be rare, since this is silent data corruption users may be blissfully unaware it's happening. Given that, my gut feeling is that this is serious enough that we need to get something in place before 10 is released, even if that is making ZFS report ENOTSUP for mmap calls. Would you agree?

Regards
Steve
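In case it's useful to anyone else hit by this, a crude scan for blanked rrd files might look like the following (the directory and the 50% threshold are arbitrary):

#!/bin/sh
# Flag .rrd files that are more than half NUL bytes -- likely blanked.
RRDDIR=/var/db/rrd            # wherever your rrd files live

for f in "$RRDDIR"/*.rrd; do
    size=$(stat -f %z "$f")
    nuls=$(tr -cd '\0' < "$f" | wc -c | tr -d ' ')
    if [ "$nuls" -gt $((size / 2)) ]; then
        echo "mostly zeros: $f"
    fi
done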