thr3ads.net - Btrfs devel - 2 errors when scrubbing - but I don't know what they mean [Nov 2013]

Sebastian Ochmann

2013-Nov-28 20:36 UTC

2 errors when scrubbing - but I don't know what they mean

Hello everyone,

when I scrubbed one of my btrfs volumes today, the result of the scrub was:

total bytes scrubbed: 1.27TB with 2 errors
error details: super=2
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and dmesg said:

btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 2

Can someone please enlighten me what these errors mean (especially the
"super" and "gen" values)? As an additional info: The drive is sometimes
used in a machine with kernel 3.11.6 and sometimes with 3.12.0, could
this swapping explain the problem somehow?

Best regards
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Duncan

2013-Nov-29 01:10 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

Sebastian Ochmann posted on Thu, 28 Nov 2013 21:36:32 +0100 as excerpted:
> when I scrubbed one of my btrfs volumes today, the result of the scrub
> was:
> 
> total bytes scrubbed: 1.27TB with 2 errors
> error details: super=2
> corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
> 
> and dmesg said:
> 
> btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
> btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
> 
> Can someone please enlighten me what these errors mean (especially the
> "super" and "gen" values)? As an additional info: The drive is sometimes
> used in a machine with kernel 3.11.6 and sometimes with 3.12.0, could
> this swapping explain the problem somehow?

[Just an admin using/testing btrfs here; not a dev.]

Super=superblock.  I really can't say what errors registered as superblock
errors might mean, as I've never seen them here and haven't chanced across
an explanation on-list or on the wiki, but were I seeing that here, my
approach would be to try the scrub again and hope the errors were fixed
(tho I should mention that I'm on SSD with multiple independent, rather
small btrfs partitions, so scrubs take a couple minutes for my larger
partitions, not the hours you're likely to see with multi-TB spinning
rust, so rerunning a scrub is trivial, /here/!).  If that didn't catch
them, then I'd try btrfsck (without --repair) and see if it had any
further information to offer.  (Repair is a further step that I'd only
take if necessary -- making sure I had a good backup first!)  There's
also btrfs-show-super, which should be safe as it's read-only, simply
displaying a lot of information, much of which probably won't make much
sense except to a btrfs dev/expert (it's beyond me).


As for the dmesg output you quoted, if you compare your syslog times for
the same messages, I suspect you'll find they were printed at filesystem
mount time, NOT during the scrub, and are thus not directly related.

What the dmesg output IS directly related to is the output of btrfs
device stat.  The first thing to note about it is that the errors it
reports are cumulative, only being reset if its -z option is used.  Thus,
stats let you track whether the number of errors is rising, but unless you
reset stats (using btrfs dev stat -z) after your last scrub, they'll
still reflect historical errors that have already been corrected --
errors reported at mount time and by device stat reflect historical
status and do NOT necessarily reflect *CURRENT* errors.
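A minimal sketch of that check-then-reset routine, assuming the `btrfs device stats` output format of one `[device].counter_name value` line per counter (the exact layout may differ between btrfs-progs releases). The helper `sum_dev_stats` is hypothetical; since it only parses text, it can be tried on saved output without touching a filesystem.

```shell
# sum_dev_stats: hypothetical helper (not part of btrfs-progs). Reads the
# output of `btrfs device stats` on stdin and prints the sum of all error
# counters. Assumes one "[device].something_errs value" pair per line.
sum_dev_stats() {
    awk '$1 ~ /^\[.*\]\.[a-z_]+_errs$/ { total += $2 } END { print total + 0 }'
}

# Typical use on a mounted filesystem (as root):
#   btrfs device stats /mnt | sum_dev_stats   # cumulative since the last reset
#   btrfs device stats -z /mnt                # print the counters, then zero them
```

Resetting right after a clean scrub means any nonzero total seen later reflects new errors rather than history.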

As with the superblock errors, I've not actually seen generation errors
here, so I don't know whether they're the superblock errors scrub is
reporting or something different.  Similarly, I don't know what fixes them.

What I /have/ seen here are read_ and write_io_errs (as reported by stat,
simply wr/rd as reported by the kernel at mount time), due to bad
shutdowns (well, suspend-to-ram that didn't resume properly).  I know
scrub can and does recover those, provided it has a second copy to
recover from, as it does here since (with the exception of /boot) all my
btrfs filesystems are btrfs raid1 mode, both data and metadata, across
two SSDs.
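The kernel's mount-time line uses short names for the same per-device counters that `btrfs device stats` prints: wr = write_io_errs, rd = read_io_errs, flush = flush_io_errs, corrupt = corruption_errs, gen = generation_errs. A small sketch that rewrites such a dmesg line into the long names, so the two outputs can be compared side by side; the function name is hypothetical.

```shell
# errs_line_to_stats: hypothetical helper. Reads kernel lines such as
#   btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
# on stdin and prints one "<device> <counter_name> <value>" line per counter,
# using the counter names reported by `btrfs device stats`.
errs_line_to_stats() {
    awk '
    / bdev .* errs: / {
        dev = ""
        for (i = 1; i < NF; i++)
            if ($i == "bdev") dev = $(i + 1)   # device path follows "bdev"
        line = $0
        sub(/.* errs: /, "", line)             # keep only the counter list
        n = split(line, parts, ", ")
        for (j = 1; j <= n; j++) {
            split(parts[j], kv, " ")           # kv[1] = short name, kv[2] = value
            name = kv[1]
            if (name == "wr") name = "write_io_errs"
            else if (name == "rd") name = "read_io_errs"
            else if (name == "flush") name = "flush_io_errs"
            else if (name == "corrupt") name = "corruption_errs"
            else if (name == "gen") name = "generation_errs"
            print dev, name, kv[2]
        }
    }'
}
```

For example, `dmesg | errs_line_to_stats` lists every such report still in the kernel ring buffer.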

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

Wang Shilong

2013-Nov-29 05:51 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

Hi,

On 11/29/2013 04:36 AM, Sebastian Ochmann wrote:
> Hello everyone,
>
> when I scrubbed one of my btrfs volumes today, the result of the scrub
> was:
>
> total bytes scrubbed: 1.27TB with 2 errors
> error details: super=2
> corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

Here a super error means a superblock checksum mismatch; scrub just reports
superblock errors but doesn't try to fix them.

Maybe this is just a read error. In any case, the superblocks will be
rewritten after a transaction is committed.

Thanks,
Wang

Sebastian Ochmann

2013-Nov-30 11:31 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

Hello,

thank you for your input. I didn't know that btrfs keeps the error
counters over mounts/reboots, but that's nice.

I'm still trying to figure out how such a generation error may occur in
the first place. One thing I noticed looking at the btrfs code is that
the generation error counter will only get incremented in the actual
scrubbing code (either in "scrub_checksum_super" or in
"scrub_handle_errored_block", both in scrub.c - please correct me if I'm
wrong, I'm not a btrfs dev). Also, the dmesg errors I saw were not there
at boot time, but about 10 minutes after boot which was about the time
when I started the scrub so I'm pretty sure that it was the scrub that
detected the errors.

The question remains what can cause superblock/gen errors. Sure it could
be "some" read error, but I'd really like to make sure that it's not a
systematic error. I wasn't able to reproduce it yet though.

Best
Sebastian

Shilong Wang

2013-Dec-01 01:16 UTC

Fwd: 2 errors when scrubbing - but I don't know what they mean

cc: linux-btrfs

---------- Forwarded message ----------
From: Shilong Wang <wangshilong1991@gmail.com>
Date: 2013/12/1
Subject: Re: 2 errors when scrubbing - but I don't know what they mean
To: Sebastian Ochmann <ochmann@informatik.uni-bonn.de>


Hello Sebastian,

2013/11/30 Sebastian Ochmann <ochmann@informatik.uni-bonn.de>:
> Hello,
>
> thank you for your input. I didn't know that btrfs keeps the error counters
> over mounts/reboots, but that's nice.
>
> I'm still trying to figure out how such a generation error may occur in the
> first place. One thing I noticed looking at the btrfs code is that the
> generation error counter will only get incremented in the actual scrubbing
> code (either in "scrub_checksum_super" or in "scrub_handle_errored_block",
> both in scrub.c - please correct me if I'm wrong, I'm not a btrfs dev).
Right. Scrub reads the superblocks with a bio rather than through the page
cache, which means the superblocks are re-read from disk. If a checksum
mismatch happens, it can have the following causes:

1. Some read error happened while scrubbing, while the superblocks are
   actually good.
2. During the last transaction, some silent corruption happened while the
   superblocks were being written to disk.
3. Some unexpected operation wrote data to the superblocks directly, for
   example something like 'dd if=/dev/zero of=/dev/ bs=1 seek=65536 count=4k'.

Actually, at boot time the superblock should be fine, because its checksum
is checked before the superblock is used; if the checksum mismatches, we
refuse to mount. After mounting, the superblocks stay cached in memory
until you unmount the filesystem.

So ideally your disk is fine, the superblocks will be rewritten during the
next transaction, and after the next unmount you can mount the filesystem
successfully!

However, if you see such superblock checksum mismatches very often during
scrub, there may be something wrong with the disk!
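The superblock checksum Wang describes can be recomputed by hand on an unmounted device or image. A minimal sketch, assuming the btrfs on-disk layout with the default crc32c checksum: each superblock copy is 4096 bytes, and the first 4 bytes of its 32-byte csum field hold the CRC-32C, little-endian, of the remaining 4064 bytes. All function names are hypothetical; the pure-shell CRC is slow but needs nothing beyond od.

```shell
# crc32c_range FILE OFFSET LENGTH: hypothetical helper. Prints the CRC-32C
# (Castagnoli, reflected polynomial 0x82F63B78) of LENGTH bytes of FILE,
# starting at byte OFFSET, as 8 hex digits. Bit-by-bit, so it is slow.
crc32c_range() {
    crc=$((0xFFFFFFFF))
    for byte in $(od -An -v -t u1 -j "$2" -N "$3" "$1"); do
        crc=$(( crc ^ byte ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            # Shift right; XOR in the polynomial when the dropped bit was 1.
            crc=$(( (crc >> 1) ^ (0x82F63B78 & -(crc & 1)) ))
            bit=$(( bit + 1 ))
        done
    done
    printf '%08x\n' $(( crc ^ 0xFFFFFFFF ))
}

# sb_stored_csum FILE SB_OFFSET: prints the first 4 bytes of the csum field
# of the superblock copy at SB_OFFSET, read as a little-endian integer.
sb_stored_csum() {
    set -- $(od -An -v -t u1 -j "$2" -N 4 "$1")
    printf '%08x\n' $(( $1 | ($2 << 8) | ($3 << 16) | ($4 << 24) ))
}

# sb_csum_ok FILE SB_OFFSET: succeeds if the stored checksum matches the
# CRC-32C of the 4064 bytes that follow the 32-byte csum field.
sb_csum_ok() {
    [ "$(sb_stored_csum "$1" "$2")" = "$(crc32c_range "$1" $(( $2 + 32 )) 4064)" ]
}
```

For example, `sb_csum_ok /path/to/image 65536 && echo primary OK` checks the primary copy, which starts at byte 65536.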
> Also, the dmesg errors I saw were not there at boot time, but about 10
> minutes after boot which was about the time when I started the scrub so I'm
> pretty sure that it was the scrub that detected the errors.
>
> The question remains what can cause superblock/gen errors. Sure it could be
> "some" read error, but I'd really like to make sure that it's not a
> systematic error. I wasn't able to reproduce it yet though.
You can reproduce this by running
'dd if=/dev/zero of=/dev/sd* bs=1 seek=65536 count=4k' before running btrfs scrub.
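dd counts seek= and count= in output blocks of bs= bytes (512 when bs= is not given), so the bytes a command overwrites run from seek*bs up to (seek+count)*bs. A small sketch that does this arithmetic and reports which superblock copy, if any, the range overlaps, assuming the standard btrfs copy locations of 64 KiB, 64 MiB and 256 GiB, 4096 bytes each. The helper name is hypothetical, and it only computes; it writes nothing.

```shell
# Byte offsets of the btrfs superblock copies (primary, mirror 1, mirror 2).
SB_OFFSETS="65536 67108864 274877906944"
SB_SIZE=4096

# dd_hits_superblock BS SEEK COUNT: hypothetical helper. Prints the byte
# range that "dd bs=BS seek=SEEK count=COUNT" would write, then every
# superblock copy it overlaps, or "none". Plain integers only (no "4k").
dd_hits_superblock() {
    start=$(( $1 * $2 ))
    end=$(( start + $1 * $3 ))          # exclusive end of the write
    echo "bytes $start-$(( end - 1 ))"
    hit=no
    for sb in $SB_OFFSETS; do
        # Two ranges overlap if each starts before the other ends.
        if [ "$start" -lt $(( sb + SB_SIZE )) ] && [ "$end" -gt "$sb" ]; then
            echo "superblock at $sb"
            hit=yes
        fi
    done
    [ "$hit" = yes ] || echo none
}
```

For instance, `dd_hits_superblock 1 65536 4096` reports the primary copy at 65536, while `dd_hits_superblock 512 65536 4096` covers bytes 33554432-35651583 and reports none.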

Thanks,
Wang

Sebastian Ochmann

2013-Dec-01 20:45 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

Hello,

> However, if you find such superblocks checksum mismatch very often
> during scrub, it maybe
> there are something wrong with disk!

I'm sorry, but I don't think there's a problem with my disks because I
was able to trigger the errors that increment the "gen" error counter
during scrub on a completely different machine and drive today. I
basically performed some I/O operations on a drive and scrubbed at the
same time over and over again until I actually saw "super" errors during
scrub. But the error is reeally hard to trigger. It seems to me like a
race condition somewhere.

So I went a step further and tried to create a repro for this. It seems 
like I can trigger the errors now once every few minutes with the method 
described below, but sometimes it really takes a long time until the 
error pops up, so be patient when trying this...

For the repro:

I'm using a btrfs image in RAM for this, for two reasons: I can scrub
quickly over and over again, and I can rule out hard drive errors. My
machine has 32 GB of RAM, so that comes in handy here; if you try this
on a physical drive, make sure to adjust some parameters, if necessary.

Create a tmpfs and a testing image, format as btrfs:

$ mkdir btrfstest
$ cd btrfstest/
$ mkdir tmp
$ mount -t tmpfs -o size=20G none tmp
$ dd if=/dev/zero of=tmp/vol bs=1G count=19
$ mkfs.btrfs tmp/vol
$ mkdir mnt
$ mount -o commit=1 tmp/vol mnt

Note the "commit=1" mount option. It's not strictly necessary, but I
have the feeling it helps with triggering the problem...

So now we have a 19 GB btrfs filesystem in RAM, mounted in "mnt". To
generate some artificial I/O, I rm and cp a Linux source tree over and
over again. Suppose you have an unpacked Linux source tree available in
the "/somewhere/linux" directory (and you're using bash). We'll spawn
some loops that keep the filesystem busy:

$ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done &
$ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done &
$ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done &

Now that the filesystem is busy, we'll also scrub it repeatedly (without
backgrounding, -B):

$ while true; do btrfs scrub start -B mnt; sleep 0.5; done

On my machine and in RAM, each scrub takes 0-1 second and the "total 
bytes scrubbed" should fluctuate (seems to be especially true with 
commit=1, but not sure). Get a beverage of your choice and wait.

(about 10 minutes later)

When I was writing this repro it took about 10 minutes until scrub said:

   total bytes scrubbed: 1.20GB with 2 errors
   error details: super=2
   corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and in dmesg:

   [15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
   [15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2

After that, scrub is happy again and will continue normally until the 
same errors happen again after a few hundred scrubs or so.
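Instead of watching the scrub loop by hand, it can stop itself at the first run that reports superblock errors. A minimal sketch, assuming the scrub summary format shown above (an `error details:` line listing `super=N`); both function names are hypothetical.

```shell
# has_super_errors TEXT: succeeds if a scrub report in TEXT has a nonzero
# "super=" count on its "error details:" line.
has_super_errors() {
    printf '%s\n' "$1" | grep -q 'error details:.*super=[1-9]'
}

# scrub_until_super_error MOUNTPOINT: hypothetical variant of the scrub loop.
# Runs foreground scrubs until one reports superblock errors, then prints
# that report plus the most recent kernel messages and returns.
scrub_until_super_error() {
    runs=0
    while :; do
        runs=$(( runs + 1 ))
        report=$(btrfs scrub start -B "$1" 2>&1)
        if has_super_errors "$report"; then
            echo "superblock errors after $runs scrubs:"
            printf '%s\n' "$report"
            dmesg | tail -n 5
            return 0
        fi
        sleep 0.5
    done
}
```

Run as root with the I/O loops active, e.g. `scrub_until_super_error mnt`; the printed run count also gives a rough idea of how rare the event is.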

So all in all, it seems the error can be triggered using normal I/O
operations and scrubbing at the right moments, even with a btrfs image in
RAM, where no hard drive error is possible.

Hope anyone can reproduce this and maybe debug it.

Best regards
Sebastian

Wang Shilong

2013-Dec-02 01:30 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

On 12/02/2013 04:45 AM, Sebastian Ochmann wrote:
> Hope anyone can reproduce this and maybe debug it.

Let me have a look at this.

Thanks,
Wang

Wang Shilong

2013-Dec-02 01:53 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

On 12/02/2013 09:30 AM, Wang Shilong wrote:
> On 12/02/2013 04:45 AM, Sebastian Ochmann wrote:
>> Hope anyone can reproduce this and maybe debug it.

It seems this is a generation mismatch, not a checksum mismatch.

The story is that `tree log sync` currently only flushes the first
superblock; this will cause a superblock generation mismatch while we
are scrubbing the other two superblocks.

I will send a patch to fix this issue. Thanks for reporting!
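If only the first superblock is flushed, the copies temporarily carry different generation numbers. A minimal sketch for comparing them on an unmounted device or image file, assuming the btrfs on-disk layout: copies at 64 KiB, 64 MiB and 256 GiB (only those that fit inside the device exist), the magic `_BHRfS_M` at offset 0x40 of each copy and the little-endian 64-bit generation at offset 0x48. The function names are hypothetical; btrfs-show-super also displays these fields.

```shell
# Byte offsets of the three btrfs superblock copies.
SB_OFFSETS="65536 67108864 274877906944"

# read_le FILE OFFSET NBYTES: prints the little-endian unsigned integer
# stored at OFFSET (fine for generation numbers, which stay far below 2^63).
read_le() {
    value=0
    shift_bits=0
    for b in $(od -An -v -t u1 -j "$2" -N "$3" "$1"); do
        value=$(( value | (b << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    echo "$value"
}

# sb_generations FILE: for each copy that fits inside FILE and carries the
# btrfs magic, prints "<offset> <generation>".
sb_generations() {
    if [ -b "$1" ]; then
        size=$(blockdev --getsize64 "$1")
    else
        size=$(( $(wc -c < "$1") ))
    fi
    for sb in $SB_OFFSETS; do
        [ $(( sb + 4096 )) -le "$size" ] || continue
        # NUL bytes are stripped so an empty copy simply fails the check.
        magic=$(dd if="$1" bs=1 skip=$(( sb + 64 )) count=8 2>/dev/null | tr -d '\000')
        [ "$magic" = "_BHRfS_M" ] || continue
        echo "$sb $(read_le "$1" $(( sb + 72 )) 8)"
    done
}

# sb_generation_mismatch FILE: succeeds if the copies disagree.
sb_generation_mismatch() {
    sb_generations "$1" | awk '!seen[$2]++ { n++ } END { exit !(n > 1) }'
}
```

Note that a 19 GB image such as the one in the repro is smaller than 256 GiB, so only the primary copy and the 64 MiB mirror exist on it.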


Thanks,
Wang

Wang Shilong

2013-Dec-02 09:21 UTC

Re: 2 errors when scrubbing - but I don't know what they mean

Hi Sebastian,

On 12/02/2013 04:45 AM, Sebastian Ochmann wrote:
> same time over and over again until I actually saw "super" errors
> during scrub. But the error is reeally hard to trigger. It seems to me
> like a race condition somewhere.

I am sorry, I tried to reproduce the problem following the steps you
described, but it hasn't come up yet (I have run it for more than 6
hours). :-(
I took a careful look at the code.

A superblock generation mismatch can only happen in
scrub_checksum_super(). It happens when a superblock's generation !=
last_trans_committed.

The value 'last_trans_committed' is modified in only one place
(committing a transaction). However, while committing a transaction,
before changing last_trans_committed, we call btrfs_scrub_pause(), which
makes it impossible for scrubbing and writing the superblocks to happen
at the same time. Otherwise, I must be missing something important
here :-)

Would you please try btrfs-next and see whether the problem still exists
in that branch:
https://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/

Thanks,
Wang