Hi all

Testing variable size 'disks' in a mirror, I followed Victor Latushkin's example:

bash-4.0# mkfile -n 2000000000000 d0
bash-4.0# zpool create pool $PWD/d0
bash-4.0# mkfile -n 1992869543936 d1
bash-4.0# zpool attach pool $PWD/d0 $PWD/d1

and so on - this works well. Now, to try to mess with ZFS a little (or a lot), I tried corrupting parts of both sides of the mirror to see what ZFS would do about it:

root at mime:/testpool/testdisks# dd if=/dev/urandom of=d1 bs=100k count=1 skip=30
1+0 records in
1+0 records out
102400 bytes (102 kB) copied, 0.00208301 s, 49.2 MB/s
root at mime:/testpool/testdisks# dd if=/dev/urandom of=d0 bs=100k count=1 skip=50
1+0 records in
1+0 records out
102400 bytes (102 kB) copied, 0.00205321 s, 49.9 MB/s

This resulted in a panic - see below for the info from the kernel log. I'd forgotten to turn on dumps after the last reinstall, but it should be easy to reproduce. I think I read somewhere that it is normal for zfs/zpool to panic if it loses contact with a pool - is that what happened here? If so, is it possible to change this behaviour? If osol loses contact with a pool, I'd rather try to debug it and reboot myself if I want to, rather than have the system panic automatically.

May  2 17:42:09 mime unix: [ID 836849 kern.notice]
May  2 17:42:09 mime panic[cpu1]/thread=ffffff00044d0c60:
May  2 17:42:09 mime genunix: [ID 603766 kern.notice] assertion failed: 0 == zap_increment_int(os, (-1ULL), user, delta, tx) (0x0 == 0x32), file: ../../common/fs/zfs/dmu_objset.c, line: 1086
May  2 17:42:09 mime unix: [ID 100000 kern.notice]
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d09c0 genunix:assfail3+c1 ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0a20 zfs:do_userquota_callback+11f ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0a70 zfs:dmu_objset_do_userquota_callbacks+a9 ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0ae0 zfs:dsl_pool_sync+f0 ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0ba0 zfs:spa_sync+3a9 ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0c40 zfs:txg_sync_thread+24a ()
May  2 17:42:09 mime genunix: [ID 655072 kern.notice] ffffff00044d0c50 unix:thread_start+8 ()

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
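A note on the dd invocations above: skip= discards blocks from the input (/dev/urandom), not the output, and GNU dd - which the "102400 bytes (102 kB) copied" output suggests was used - truncates the of= file before writing unless conv=notrunc is given. So each backing file ends up as a 100 KB file rather than a ~2 TB file with one corrupted region, which is closer to pulling both disks than to silent on-disk corruption. Something like the following is probably what was intended; the offsets are simply carried over from the original commands, so treat this as a sketch rather than what was actually run:

# overwrite one 100 KB region inside each backing file, in place,
# without changing the file's size
dd if=/dev/urandom of=d1 bs=100k count=1 seek=30 conv=notrunc
dd if=/dev/urandom of=d0 bs=100k count=1 seek=50 conv=notrunc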
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
> bash-4.0# mkfile -n 2000000000000 d0
> bash-4.0# zpool create pool $PWD/d0
> bash-4.0# mkfile -n 1992869543936 d1
> bash-4.0# zpool attach pool $PWD/d0 $PWD/d1

As long as you're just doing this for testing, great. I wouldn't suggest a configuration like that for any sort of permanent setup. Filesystems inside of files are, in general, not a great idea, with only rare exceptions.

Also, you know you can do it all on one line, right?
zpool create mypool mirror $PWD/d0 $PWD/d1

Do a zpool status next. I think the supposed "resilver" will be basically instantaneous, because the pool is empty, but you should check and make sure the pool is healthy and not resilvering before you start your abuse.

> root at mime:/testpool/testdisks# dd if=/dev/urandom of=d1 bs=100k count=1 skip=30
> 1+0 records in
> 1+0 records out
> 102400 bytes (102 kB) copied, 0.00208301 s, 49.2 MB/s
> root at mime:/testpool/testdisks# dd if=/dev/urandom of=d0 bs=100k count=1 skip=50
> 1+0 records in
> 1+0 records out
> 102400 bytes (102 kB) copied, 0.00205321 s, 49.9 MB/s
>
> This resulted in a panic - see below for the info from the kernel log.

That looks perfect to me, except for one thing: you should zpool export before doing those dd's. ZFS will correctly identify the faulty blocks (and, in your case, correct them, because you have a mirror), but doing that to the "device" while it's mounted is worse than simulating unknown randomness happening on disks that don't report it. I think what you're doing (writing to the file, and thereby perhaps invalidating ZFS's open file handle to the "device") is sweeping the "devices" out from under ZFS's feet. This technique is more like a simulation of unplugging disks, and less like a simulation of random undetected errors happening on disks.

Also, if you're writing the randomness to blocks that happen to be unoccupied, the system, I believe, won't even notice, because it'll never read the empty space you've intentionally corrupted.

To ensure a good result, I would recommend:

Create the filesystem as you've done.
Fill up the filesystem, so that a later "scrub" has to inspect all blocks.
zpool export.
Perform the dd's from urandom as you've done.
zpool import.
Check: zpool status (it will probably say no errors).
Then zpool scrub.
After some time, zpool status will probably show that it found and corrected the errors.
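Pulled together as a rough sketch of that procedure, assuming the same file-backed mirror as above with the pool named "pool" and mounted at the default /pool, and a throwaway fill file name of my own choosing (none of these names come from the posts themselves):

# assuming the mirrored, file-backed pool "pool" from the messages above,
# already created, healthy, and mounted at /pool

# fill the pool so a later scrub has to read every block (runs until out of space)
dd if=/dev/urandom of=/pool/fill bs=1M

# detach ZFS from the backing files before corrupting them
zpool export pool

# corrupt a region of each backing file in place (conv=notrunc keeps the file size)
dd if=/dev/urandom of=d1 bs=100k count=1 seek=30 conv=notrunc
dd if=/dev/urandom of=d0 bs=100k count=1 seek=50 conv=notrunc

# re-import; -d tells zpool which directory to search for file vdevs
zpool import -d $PWD pool

zpool status pool     # will probably still report no errors
zpool scrub pool
zpool status pool     # after the scrub: checksum errors found and repaired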