thr3ads.net - Btrfs devel - rfc: fuzz testing by direct writes to device [Sep 2012]

Shentino

2012-Sep-01 06:44 UTC

rfc: fuzz testing by direct writes to device

How effective would it be to write directly to the underlying device
and then run tests to see if the corruption is properly detected?

I just ran a fuzz test by syncing, and then manually corrupting a file
with the help of a surgical sed (yes, the before and after patterns
had fixed equal lengths).  First I got an I/O error (expected), then I
ran scrub and got more problems (not ok), the system froze (not good),
a reboot failed to mount the system again (worse), and then the fsck
program dumped core.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shentino

2012-Sep-01 08:10 UTC

Re: rfc: fuzz testing by direct writes to device

Also, since the problem prevented me from syncing my other filesystems,
I couldn't capture the debug info.

It vanished during the cold boot while still sitting in the dirty page cache.


Michael

2012-Sep-01 15:41 UTC

Re: rfc: fuzz testing by direct writes to device

Please make sure you are running a very recent kernel. Btrfs is VERY
active and fixes for things like this are going in all the time. Any
related crash errors, kernel oopses, and exact methodology so we can
reproduce would be useful.
dmesg and uname -a would help us triage this and see what we need to fix.
Mike


Shentino

2012-Sep-01 17:23 UTC

Re: rfc: fuzz testing by direct writes to device

On Sat, Sep 1, 2012 at 8:41 AM, Michael <mike@draftx.net> wrote:
> Please make sure you are running a very recent kernel. Btrfs is VERY
> active and fixes for things like this are going in all the time. Any
> related crash errors, kernel oopses, and exact methodology so we can
> reproduce would be useful.
> dmesg and uname -a would help us triage this and see what we need to fix.
> Mike

I did save debug information like this, on a separate filesystem no
less. Unfortunately I was unable to sync, and it disappeared when I
was forced to cold boot.  I don't have a spare machine available.
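
A capture like this can be pushed to stable storage one file at a time instead of relying on a global sync. The following is only a sketch and not something from the thread: save_debug_info is a hypothetical helper, it assumes a dd that supports conv=fsync (GNU coreutils does), and it assumes that flushing a single file on a healthy filesystem still works while the btrfs mount is wedged, which may not hold.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): write kernel version and
# kernel log to DEST and flush that one file to disk before returning.
save_debug_info() {
    dest=$1
    {
        uname -a
        # dmesg can be restricted for unprivileged users; keep going anyway.
        dmesg 2>/dev/null || echo "(dmesg not readable by this user)"
    } | dd of="$dest" conv=fsync 2>/dev/null   # conv=fsync: flush data and metadata of DEST
}

# Example call (the path is an assumption; pick a filesystem other than the one under test):
# save_debug_info /var/tmp/btrfs-debug.txt
```

The point of conv=fsync here is that only the output file is flushed, so the capture does not have to wait for every other mounted filesystem.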

To reproduce:

1.  lvcreate vg --name lv --size 1G
2.  mkfs.btrfs -M /dev/vg/lv
3.  mkdir /mnt/test
4.  mount /dev/vg/lv /mnt/test
5.  dd bs=1024 count=1 < /dev/urandom > /tmp/foo
6.  sha1sum /tmp/foo > /mnt/test/scrubme
7.  sync (to get /mnt/test/scrubme written to disk)
8.  sed -e "s/the_sha1_sum/something_else_of_same_length/" < /dev/vg/lv > /dev/vg/lv
9.  cat /mnt/test/scrubme (returns an I/O error as expected, probably
    from a failed checksum)
10. btrfs scrub start /mnt/test

This is when all hell breaks loose.

Will this be enough information to at least allow it to be reproduced
or check if the bug still exists?

cwillu

2012-Sep-01 20:59 UTC

Re: rfc: fuzz testing by direct writes to device

You still haven't said which kernel you were running; the thing to do
is try the very latest rc (if not btrfs-next).

Shentino

2012-Sep-01 22:31 UTC

Re: rfc: fuzz testing by direct writes to device

On Sat, Sep 1, 2012 at 1:59 PM, cwillu <cwillu@cwillu.com> wrote:
> You still haven't said which kernel you were running; the thing to do
> is try the very latest rc (if not btrfs-next).

Sorry about that!

I thought I included it.

3.3.8

Hmm...seems it's been EOL'ed.  I need to yell at my distro.

In the meantime, will mounting a btrfs filesystem with a new kernel
render it unmountable by older kernels?

Michael

2012-Sep-01 23:49 UTC

Re: rfc: fuzz testing by direct writes to device

It should not. It is always preferred that you dd your drive onto
another disk just in case though.


Shentino

2012-Sep-02 01:03 UTC

Re: rfc: fuzz testing by direct writes to device

This whole subject was also about using sed to corrupt-o-magic a
file's data on disk.

Is this an acceptable method for testing?


David Sterba

2012-Sep-02 05:44 UTC

Re: rfc: fuzz testing by direct writes to device

On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
> This whole subject was also about using sed to corrupt-o-magic a
> file's data on disk.
>
> Is this an acceptable method for testing?

Starting with kernel 3.4, the error handling has been improved,
namely for EIO, so it shouldn't take your box down when you hit one.
Newer kernels got fixes to the 'transaction abort' cleanup, so it should
be possible to umount and mount the filesystem without problems.

The filesystem should survive shooting at blocks; the checksums catch
any change (within the limits of the checksum's strength, i.e. a
generated hash collision will lead to a crash/abort later).

Expected results for reading blocks after random writes are:
* EIO for the corrupted block (data or metadata), provided that
  there's no other copy
* transparent and automatic repair from other copies
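
A test script could tell these outcomes apart by looking at how a plain read ends. This is a minimal sketch under stated assumptions: read_outcome is a hypothetical helper, and it assumes that cat reports EIO with the C-locale text "Input/output error". A transparent repair would simply show up as a successful read, so confirming it would need the kernel log as well.

```shell
#!/bin/sh
# Hypothetical helper: read FILE once and print one of
#   ok           the read succeeded (clean data, or repaired from another copy)
#   eio          the read failed with an I/O error
#   other-error  the read failed for some other reason
read_outcome() {
    # Capture only stderr; the file contents are discarded.
    err=$(LC_ALL=C cat -- "$1" 2>&1 >/dev/null)
    status=$?
    if [ "$status" -eq 0 ]; then
        echo ok
    elif printf '%s\n' "$err" | grep -q 'Input/output error'; then
        echo eio
    else
        echo other-error
    fi
}

# Example call (path taken from the reproduction steps earlier in the thread):
# read_outcome /mnt/test/scrubme
```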

I've tested this on a 2-disk data/raid1, metadata/raid1 setup, with a dd
running continually over one of the devices while using the filesystem.
It was slower.


david

Shentino

2012-Sep-02 11:43 UTC

Re: rfc: fuzz testing by direct writes to device

On Sat, Sep 1, 2012 at 10:44 PM, David Sterba <dave@jikos.cz> wrote:
> On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
>> This whole subject was also about using sed to corrupt-o-magic a
>> file's data on disk.
>>
>> Is this an acceptable method for testing?
>
> Starting with kernels 3.4 the error handling has been improved,
> namely for the EIO, so it shouldn't take your box down when you hit one.
> Newer kernels got fixes to the 'transaction abort' cleanup, so it should
> be possible to umount and mount the filesystem without problems.
>
> The filesystem should survive shooting at blocks, the checksums catch
> any change (with respect to its strength, ie. generating a hash
> collision will lead to crash/abort later).
>
> Expected result for reading blocks after random writes is:
> * EIO for the corrupted block (both data or metadata) provided that
>   there's no other copy
> * transparent and automatic repair from other copies

I assume the same results are expected during a scrub as during a normal read?

> I've tested this on an 2 disk data/raid1, metadata/raid1 with a running
> dd over one of the devices continually and using the filesystem. It was
> slower.
>
> david

Would I be correct to assume that a core dump on fsck is automatically
a bug, or is using the old 3.3.8 kernel a taint that would invalidate a
report?  Note that this is for the btrfs-progs containing the fsck,
not the actual kernel-side code.

Goffredo Baroncelli

2012-Sep-04 18:15 UTC

Re: rfc: fuzz testing by direct writes to device

Hi,

On 09/02/2012 03:03 AM, Shentino wrote:
> This whole subject was also about using sed to corrupt-o-magic a
> file's data on disk.
>
> Is this an acceptable method for testing?

I am not sure that doing "sed </dev/sdX >/dev/sdX ..." is the right
thing to do, because it rewrites the full disk. This means that:
- it takes a lot of time
- you don't have any control over which part of the disk you change:
what happens if sed writes a block which is being updated in parallel by BTRFS?
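
One way around rewriting the whole device is to locate the pattern first and then overwrite only those bytes. The sketch below is an illustration, not a method anyone in the thread used: corrupt_at is a hypothetical helper, it relies on GNU grep's -a, -b and -o options to report a byte offset, and it uses dd with conv=notrunc so that only the matched range is written.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the first occurrence of OLD with NEW inside
# IMAGE (a disk image or device path), touching only the matched bytes.
corrupt_at() {
    image=$1 old=$2 new=$3
    # The replacement must have the same length, or later data would shift.
    [ "${#old}" -eq "${#new}" ] || { echo "patterns differ in length" >&2; return 1; }
    # -a: treat binary as text, -b: print byte offset, -o: offset of the match itself,
    # -F: fixed string, -m 1: stop after the first matching line.
    offset=$(LC_ALL=C grep -a -b -o -F -m 1 -- "$old" "$image" | head -n 1 | cut -d: -f1)
    [ -n "$offset" ] || { echo "pattern not found" >&2; return 1; }
    # conv=notrunc keeps the rest of the image intact; bs=1 makes seek a byte offset.
    printf '%s' "$new" | dd of="$image" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

# Example call (image path and patterns are placeholders):
# corrupt_at /tmp/scratch.img the_sha1_sum something_else
```

This narrows the write to a few bytes, but it does not remove the race with a mounted filesystem; it is only meant for a scratch image or device.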

Anyway, I suggest having a look at the following video [1], which
explains the automatic repair. It also shows [2] how to corrupt a block
with the "btrfs-corrupt-block" command.

Hoping that this helps you.

BR
G.Baroncelli

[1] http://www.youtube.com/watch?v=hxWuaozpe2I
[2] See minute 17:52 of the video above

Shentino

2012-Sep-05 01:59 UTC

Re: rfc: fuzz testing by direct writes to device

On Tue, Sep 4, 2012 at 11:15 AM, Goffredo Baroncelli <kreijack@libero.it> wrote:
> Hi,
>
> On 09/02/2012 03:03 AM, Shentino wrote:
>>
>> This whole subject was also about using sed to corrupt-o-magic a
>> file's data on disk.
>>
>> Is this an acceptable method for testing?
>
> I am not sure that doing "sed </dev/sdX >/dev/sdX ..." is the right thing to
> do, because it rewrites the full disk. This means that:
> - it takes a lot of time
> - you don't have any control about which part of the disk you change: what
> happens if sed write a block which is update in parallel by BTRFS ?

Which is one reason I used a sha1 hash of a random read as the search key :P

> Anyway I suggest to give a look to the following video [1], which explains
> the automatic repair. Moreover it shows [2] how corrupt a block with the
> "btrfs-corrupt-block" command.

That does sound more convenient.

> Hoping that this helps you.
>
> BR
> G.Baroncelli
>
> [1] http://www.youtube.com/watch?v=hxWuaozpe2I
> [2] See minute 17:52 of the video above

Goffredo Baroncelli

2012-Sep-05 05:46 UTC

Re: rfc: fuzz testing by direct writes to device

On 09/05/2012 03:59 AM, Shentino wrote:
>> I am not sure that doing "sed </dev/sdX >/dev/sdX ..." is the right thing to
>> do, because it rewrites the full disk. This means that:
>> - it takes a lot of time
>> - you don't have any control about which part of the disk you change: what
>> happens if sed write a block which is update in parallel by BTRFS ?
> Which is one reason I used a sha1 hash of a random read as the search key :P

This doesn't change anything. The race would be the following:

1- the kernel reads a sector from the disk
2- sed reads a sector from the disk
3- sed writes a sector to the disk (whether it is the same data or
updated data doesn't matter)
4- the kernel writes an updated sector to the disk

If 3 and 4 write different data, the results are unpredictable. Yes, it
is a very unlikely case, but it could happen.
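
The interleaving can be replayed deterministically on an ordinary file standing in for one sector. This is only a sketch: replay is a hypothetical helper, the two "actors" are plain sequential shell commands rather than a real kernel and a real sed, and the names v1 and v1+update are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical replay of the race on a temp file that stands in for one sector.
# ORDER is "sed-first" (step 3 lands before step 4) or "kernel-first" (4 before 3).
replay() {
    sector=$(mktemp)
    printf 'v1' > "$sector"
    kernel_view=$(cat "$sector")          # 1- the kernel reads the sector
    sed_view=$(cat "$sector")             # 2- sed reads the same sector
    case $1 in
        sed-first)
            printf '%s' "$sed_view" > "$sector"               # 3- sed writes it back
            printf '%s+update' "$kernel_view" > "$sector"     # 4- the kernel writes its update
            ;;
        kernel-first)
            printf '%s+update' "$kernel_view" > "$sector"     # 4- the kernel writes its update
            printf '%s' "$sed_view" > "$sector"               # 3- sed writes stale data on top
            ;;
    esac
    cat "$sector"                         # final on-disk content
    rm -f "$sector"
}

replay sed-first;    echo
replay kernel-first; echo
```

In the first ordering the kernel's update survives; in the second, sed's stale copy replaces it and the update is silently lost, which is the unpredictable result described above.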



David Sterba

2012-Sep-05 15:04 UTC

Re: rfc: fuzz testing by direct writes to device

On Sun, Sep 02, 2012 at 04:43:48AM -0700, Shentino wrote:
> I assume the same results are expected during a scrub as during a normal read?

yes

> > I've tested this on an 2 disk data/raid1, metadata/raid1 with a running
> > dd over one of the devices continually and using the filesystem. It was
> > slower.
>
> Would I be correct to assume that a core dump on fsck is an automatic
> bug or is using the old 3.3.8 kernel a taint that would invalidate a
> report?  Note that this is for the btrfs-progs containing the fsck,
> not the actual kernel side code.

You're talking about fsck and the kernel; I'm not quite sure which one
you are referring to with 'bug'.

fsck can crash when it finds unexpected data in the tree structures,
but this would mean that the data passed checksum verification
earlier. This would be a bug, and if it is reproducible on newer kernels,
a report on 3.3.x does not disqualify it right away.

The same holds for the kernel: a data structure inconsistency will most
probably lead to a BUG and a subsequent crash.

david

Shentino

2012-Sep-05 21:23 UTC

Re: rfc: fuzz testing by direct writes to device

By this I mean running fsck on a btrfs that was trashed by a 3.3.8 kernel bug.
