Today my production server crashed 4 times. THIS IS A NIGHTMARE! Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.

I cannot fsck it, there's no such tool.
I cannot scrub it, it crashes 30-40 minutes after the scrub starts.
I cannot use it, it crashes several times every day! And with every crash the number of checksum failures grows:

    NAME    STATE   READ WRITE CKSUM
    box5    ONLINE     0     0     0
...after a few hours...
    box5    ONLINE     0     0     4
...after a few hours...
    box5    ONLINE     0     0    62
...after another few hours...
    box5    ONLINE     0     0   120
...crash! and we start again...
    box5    ONLINE     0     0     0
...etc...

Actually 120 is the record; sometimes it crashes as soon as it boots.

And there is always a permanent error:

errors: Permanent errors have been detected in the following files:
        box5:<0x0>

and very wise self-healing advice from http://www.sun.com/msg/ZFS-8000-8A:
"Restore the file in question if possible. Otherwise restore the entire pool from backup."

Thanks, but if I restore it from backup it won't be ZFS anymore, that's for sure.

It's not an I/O problem. AFAIK, the default ZFS I/O error behavior is "wait" for repair (I have 10U4, non-configurable). Then why does it panic?

Recently there were discussions on the failure of the OpenSolaris community. It's now been more than half a month since I reported this error. Nobody even posted something like "RTFM". Come on guys, I know you are there and busy with enterprise customers... but at least give me some troubleshooting ideas. I'm totally lost.

Just to remind you, it's a heavily loaded FS with 3-4 million files and folders.

Link to original post: http://www.opensolaris.org/jive/thread.jspa?threadID=57425

This message posted from opensolaris.org
On Thu, 1 May 2008, Rustam wrote:
> Today my production server crashed 4 times. THIS IS A NIGHTMARE!
> Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.
>
> I cannot fsck it, there's no such tool.
> I cannot scrub it, it crashes 30-40 minutes after the scrub starts.
> I cannot use it, it crashes several times every day! And with every crash the number of checksum failures grows:

Is your ZFS pool configured with redundancy (e.g. mirrors, raidz) or is it non-redundant? If non-redundant, then there is not much that ZFS can really do if a device begins to fail.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Phillip Wagstrom -- Area SSE MidAmerica
2008-May-01 21:47 UTC
[zfs-discuss] ZFS still crashing after patch
Rustam wrote:
> Today my production server crashed 4 times. THIS IS A NIGHTMARE!
> Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.
>
> I cannot fsck it, there's no such tool. I cannot scrub it, it crashes
> 30-40 minutes after the scrub starts. I cannot use it, it crashes
> several times every day! And with every crash the number of checksum
> failures grows:
>
>     NAME    STATE   READ WRITE CKSUM
>     box5    ONLINE     0     0     0
> ...after a few hours...
>     box5    ONLINE     0     0     4
> ...after a few hours...
>     box5    ONLINE     0     0    62
> ...after another few hours...
>     box5    ONLINE     0     0   120
> ...crash! and we start again...
>     box5    ONLINE     0     0     0
> ...etc...
>
> Actually 120 is the record; sometimes it crashes as soon as it boots.
>
> And there is always a permanent error:
> errors: Permanent errors have been detected in the following files:
>         box5:<0x0>
>
> and very wise self-healing advice from http://www.sun.com/msg/ZFS-8000-8A:
> "Restore the file in question if possible. Otherwise restore the
> entire pool from backup."
>
> Thanks, but if I restore it from backup it won't be ZFS anymore,
> that's for sure.

That's a bit harsh. ZFS is telling you that you have corrupted data based on the checksums. Other types of filesystems would likely simply pass the corrupted data on silently.

> It's not an I/O problem. AFAIK, the default ZFS I/O error behavior is "wait"
> for repair (I have 10U4, non-configurable). Then why does it panic?

Do you have the panic messages? ZFS won't cause panics based on bad checksums. It will by default cause a panic if it can't write data out to any device, or if it completely loses access to non-redundant devices or loses both redundant devices at the same time.

> Recently there were discussions on the failure of the OpenSolaris community.
> It's now been more than half a month since I reported this error.
> Nobody even posted something like "RTFM". Come on guys, I know you
> are there and busy with enterprise customers... but at least give me
> some troubleshooting ideas. I'm totally lost.
>
> Just to remind you, it's a heavily loaded FS with 3-4 million files and
> folders.
>
> Link to original post:
> http://www.opensolaris.org/jive/thread.jspa?threadID=57425

This shows the same number of checksum errors across 2 different channels and 4 different drives. Given that, I'd assume that this is likely a dual-channel HBA of some sort. It would appear that you either have bad hardware or some sort of driver issue.

Regards,
Phil
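For anyone following along: on Solaris 10 the panic message and the panicking thread's stack can usually be recovered from a saved crash dump with mdb(1). A minimal sketch, assuming savecore has written dump pair 0 into the default /var/crash/<hostname> directory (the index 0 is hypothetical; use the most recent unix.N/vmcore.N pair):

    # cd /var/crash/`hostname`
    # mdb unix.0 vmcore.0
    > ::status
    > ::stack
    > ::msgbuf
    > $q

::status prints the OS version and panic string, ::stack the stack of the panicking thread, and ::msgbuf the kernel messages leading up to the panic. That is exactly the kind of output quoted later in this thread.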
> Is your ZFS pool configured with redundancy (e.g. mirrors, raidz) or is
> it non-redundant? If non-redundant, then there is not much that ZFS
> can really do if a device begins to fail.

It's RAID 10 (more info here: http://www.opensolaris.org/jive/thread.jspa?threadID=57425):

    NAME        STATE   READ WRITE CKSUM
    box5        ONLINE     0     0     4
      mirror    ONLINE     0     0     2
        c1d0    ONLINE     0     0     4
        c2d0    ONLINE     0     0     4
      mirror    ONLINE     0     0     2
        c2d1    ONLINE     0     0     4
        c1d1    ONLINE     0     0     4

Actually, there's no damaged data so far. I don't get any "unable to read/write" kind of errors. It's just very strange checksum errors synchronized over all disks.

> That's a bit harsh. ZFS is telling you that you have corrupted data
> based on the checksums. Other types of filesystems would likely simply
> pass the corrupted data on silently.

Checksums are good, no complaints about that.

> Do you have the panic messages? ZFS won't cause panics based on bad
> checksums. It will by default cause a panic if it can't write data out to
> any device, or if it completely loses access to non-redundant devices or
> loses both redundant devices at the same time.

A number of panic messages and crash dump stack traces are attached to the original post (http://www.opensolaris.org/jive/thread.jspa?threadID=57425). Here is a short snippet:

> ::status
debugging crash dump vmcore.5 (64-bit) from core
operating system: 5.10 Generic_127112-07 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fffffe800017f8d0 addr=238 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

> ::stack
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
spa_scrub_io_start+0xf1()
spa_scrub_cb+0x13d()
traverse_callback+0x6a()
traverse_segment+0x118()
traverse_more+0x7b()
spa_scrub_thread+0x147()
thread_start+8()

> This shows the same number of checksum errors across 2
> different channels and 4 different drives. Given that, I'd assume that
> this is likely a dual-channel HBA of some sort. It would appear that
> you either have bad hardware or some sort of driver issue.

You're right, this is the dual-channel Intel ICH6 SATA controller. 10U4 has native support/drivers for this SATA controller (AHCI drivers, AFAIK). The thing is that this hardware and ZFS have been in production for almost 2 years (OK, not the best argument). However, the problem appeared only recently (20 days ago). It's even stranger because I didn't make any OS/driver upgrade or patch during the last 2-3 months.

However, this is a good point. I've seen some new SATA/AHCI drivers available in 10U5. Maybe I should try to upgrade and see if it helps.

Thanks Phil.

--
Rustam

This message posted from opensolaris.org
On Thu, 1 May 2008, Rustam wrote:
> operating system: 5.10 Generic_127112-07 (i86pc)

Seems kind of old. I am using Generic_127112-11 here.

Probably many hundreds of nasty bugs have been eliminated since the version you are using.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> Seems kind of old. I am using Generic_127112-11 here.
>
> Probably many hundreds of nasty bugs have been
> eliminated since the version you are using.

I've updated to the latest available kernel, 127128-11 (from 28 Apr), which included a number of fixes to the AHCI SATA driver and ZFS.

Didn't help. It keeps crashing.

The worst thing is that I don't know where the problem is. Any more ideas on how to find it?

This message posted from opensolaris.org
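One low-impact way to split "bad hardware" from "driver or ZFS bug" on Solaris 10 is to see what the fault manager and the per-device error counters have already recorded. A sketch of that check (these commands exist on stock Solaris 10, though the level of detail reported varies by release and driver):

    # fmdump -eV | more
    # fmadm faulty
    # iostat -En

fmdump -eV lists the raw error reports (ereports), including ZFS checksum events; fmadm faulty lists any faults FMA has actually diagnosed; iostat -En shows the soft/hard/transport error counters the disk drivers keep per device.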
Rustam <rustam <at> code.az> writes:
> Didn't help. Keeps crashing.
> The worst thing is that I don't know where the problem is. Any more ideas on
> how to find it?

Lots of CKSUM errors like you see are often indicative of bad hardware. Run memtest for 24-48 hours.

-marc
I don't think this is a hardware issue, though I don't rule it out. I'll try to explain why.

1. I've replaced all the memory modules, which are the most likely cause of such a problem.

2. There are many different applications running on that server (Apache, PostgreSQL, etc.). However, if you look at the four different crash dump stack traces you see the same picture:

------ crash dump st1 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
spa_scrub_io_start+0xf1()
spa_scrub_cb+0x13d()

------ crash dump st2 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

------ crash dump st3 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

------ crash dump st4 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

All four crash dumps show the problem at zio_read/zio_buf_alloc. Three of them occurred during metadata prefetch (dmu_prefetch) and one during scrubbing. I don't think that's a coincidence. IMHO, the checksum errors are the result of this inconsistency. I tend to think the problem is in ZFS and that it exists even in the latest Solaris version (maybe OpenSolaris as well).

> Lots of CKSUM errors like you see are often indicative
> of bad hardware. Run
> memtest for 24-48 hours.
>
> -marc

This message posted from opensolaris.org
Hello Rustam,

Saturday, May 3, 2008, 9:16:41 AM, you wrote:

R> I don't think this is a hardware issue, though I don't rule it out. I'll try to explain why.

R> 1. I've replaced all the memory modules, which are the most likely cause of such a problem.

R> 2. There are many different applications running on that server
R> (Apache, PostgreSQL, etc.). However, if you look at the four
R> different crash dump stack traces you see the same picture:

R> ------ crash dump st1 ------
R> mutex_enter+0xb()
R> zio_buf_alloc+0x1a()
R> zio_read+0xba()
R> spa_scrub_io_start+0xf1()
R> spa_scrub_cb+0x13d()

R> ------ crash dump st2 ------
R> mutex_enter+0xb()
R> zio_buf_alloc+0x1a()
R> zio_read+0xba()
R> arc_read+0x3cc()
R> dbuf_prefetch+0x11d()
R> dmu_prefetch+0x107()
R> zfs_readdir+0x408()
R> fop_readdir+0x34()

R> ------ crash dump st3 ------
R> mutex_enter+0xb()
R> zio_buf_alloc+0x1a()
R> zio_read+0xba()
R> arc_read+0x3cc()
R> dbuf_prefetch+0x11d()
R> dmu_prefetch+0x107()
R> zfs_readdir+0x408()
R> fop_readdir+0x34()

R> ------ crash dump st4 ------
R> mutex_enter+0xb()
R> zio_buf_alloc+0x1a()
R> zio_read+0xba()
R> arc_read+0x3cc()
R> dbuf_prefetch+0x11d()
R> dmu_prefetch+0x107()
R> zfs_readdir+0x408()
R> fop_readdir+0x34()

R> All four crash dumps show the problem at zio_read/zio_buf_alloc. Three
R> of them occurred during metadata prefetch (dmu_prefetch) and one
R> during scrubbing. I don't think that's a coincidence. IMHO, the
R> checksum errors are the result of this inconsistency.

Which would happen if you have a problem with HW and you're getting wrong checksums on both sides of your mirrors. Maybe the PS (power supply)?

Try memtest anyway, or SunVTS.

--
Best regards,
Robert Milkowski                       mailto:milek at task.gda.pl
                                       http://milek.blogspot.com
Hello Robert,

> Which would happen if you have a problem with HW and you're getting
> wrong checksums on both sides of your mirrors. Maybe the PS (power supply)?
>
> Try memtest anyway, or SunVTS

Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware. And memtest requires too much downtime, which I cannot afford right now.

However, I have some interesting observations, and now I can reproduce the crash. It seems that I have bad checksum(s), and ZFS crashes each time it tries to read them. Below are two cases.

Case 1: I got a checksum error not striped over the mirrors; this time it was the checksum for a file and not <0x0>. I tried to read the file twice. The first try returned an I/O error, the second try caused a panic. Here's the log:

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     2
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# ll /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
-rw-------   1 user    group        489 Apr 20  2006 /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
cat: input error on /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file: I/O error

core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     4
            c2d1    ONLINE       0     0     8
            c1d1    ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file

core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
(Kernel Panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000)

... (after system boot up)

core# rm /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        box5:<0x0>
        box5:<0x4a049a>

core# mdb unix.17 vmcore.17
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp crypto ptm ]
> ::status
debugging crash dump vmcore.17 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000
dump content: kernel pages only
> ::stack
fletcher_2_native+0x13()
zio_checksum_verify+0x27()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_cache_read+0x14c()
vdev_disk_io_start+0x135()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
zio_vdev_io_start+0x116()
zio_next_stage+0x65()
zio_ready+0xec()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_nowait+9()
arc_read+0x414()
dbuf_read_impl+0x1a0()
dbuf_read+0x95()
dmu_buf_hold_array_by_dnode+0x217()
dmu_buf_hold_array+0x81()
dmu_read_uio+0x49()
zfs_read+0x13c()
fop_read+0x31()
read+0x188()
read32+0xe()
_sys_sysenter_post_swapgs+0x14b()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=ffffffff8b98e240:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000

cat:
#pf Page fault
Bad kernel fault at addr=0xfffffe80882b7000
pid=17993, pc=0xfffffffff1192923, sp=0xfffffe8001112588, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: fffffe80882b7000 cr3: df609000 cr8: c
rdi: fffffe80882b7000 rsi: fffffe80884ae200 rdx: fffffe80011125c0
rcx: 30fceb4cd1a3ca3f  r8: 93e4f7b6a3198113  r9: 73457f74abfdcdf5
rax:   dc3bf3d311149f rbx: ffffffff8f68a700 rbp: fffffe8001112630
r10: ffffffff8ecb5c30 r11:                0 r12: fffffe80884ae200
r13:               c0 r14:           200200 r15: fffffe80882ae000
fsb: ffffffff80000000 gsb: fffffffffbc24e40  ds:               43
 es:               43  fs:                0  gs:              1c3
trp:                e err:                0 rip: fffffffff1192923
 cs:               28 rfl:            10287 rsp: fffffe8001112588
 ss:               30

fffffe80011123a0 unix:real_mode_end+71e1 ()
fffffe8001112480 unix:trap+5e6 ()
fffffe8001112490 unix:cmntrap+140 ()
fffffe8001112630 zfs:zfsctl_ops_root+30d9707b ()
fffffe8001112650 zfs:zio_checksum_verify+27 ()
fffffe8001112660 zfs:zio_next_stage+65 ()
fffffe8001112690 zfs:zio_wait_for_children+49 ()
fffffe80011126a0 zfs:zio_wait_children_done+15 ()
fffffe80011126b0 zfs:zio_next_stage+65 ()
fffffe80011126f0 zfs:zio_vdev_io_assess+84 ()
fffffe8001112700 zfs:zio_next_stage+65 ()
fffffe80011127e0 zfs:vdev_cache_read+14c ()
fffffe8001112820 zfs:vdev_disk_io_start+135 ()
fffffe8001112830 zfs:vdev_io_start+12 ()
fffffe8001112860 zfs:zio_vdev_io_start+7b ()
fffffe8001112870 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112880 zfs:zio_nowait+9 ()
fffffe80011128e0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011128f0 zfs:vdev_io_start+12 ()
fffffe8001112920 zfs:zio_vdev_io_start+7b ()
fffffe8001112930 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112940 zfs:zio_nowait+9 ()
fffffe80011129a0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011129d0 zfs:zio_vdev_io_start+116 ()
fffffe80011129e0 zfs:zio_next_stage+65 ()
fffffe8001112a00 zfs:zio_ready+ec ()
fffffe8001112a10 zfs:zio_next_stage+65 ()
fffffe8001112a40 zfs:zio_wait_for_children+49 ()
fffffe8001112a50 zfs:zio_wait_children_ready+15 ()
fffffe8001112a60 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112a70 zfs:zio_nowait+9 ()
fffffe8001112af0 zfs:arc_read+414 ()
fffffe8001112b70 zfs:dbuf_read_impl+1a0 ()
fffffe8001112bb0 zfs:zfsctl_ops_root+30d812dd ()
fffffe8001112c20 zfs:dmu_buf_hold_array_by_dnode+217 ()
fffffe8001112c70 zfs:dmu_buf_hold_array+81 ()
fffffe8001112cd0 zfs:zfsctl_ops_root+30d84641 ()
fffffe8001112d30 zfs:zfs_read+13c ()
fffffe8001112d80 genunix:fop_read+31 ()
fffffe8001112eb0 genunix:read+188 ()
fffffe8001112ec0 genunix:read32+e ()
fffffe8001112f10 unix:brand_sys_sysenter+1f2 ()

syncing file systems... 14 10 8 ... 8 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
> ^D

-----------------------------------------

Case 2: At this point I started thinking about how to reproduce the error. Obviously, I need to read all blocks to stumble upon the bad checksum(s). So I decided to back up the whole box5 pool using zfs send/receive to another pool I had in the system, box7. I got very interesting results:

# zpool status -v
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        box5:<0x0>

  pool: box7
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        box7        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
box5           175G   282G   175G  /u02
box7          4.26G   681G  4.26G  /box7

# zfs snapshot box5@backup1
# zfs send box5@backup1 | zfs receive box7/backup1 &
[1] 11056

# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
box5           175G   282G   175G  /u02
box5@backup1  2.07M      -   175G  -
box7          4.26G   681G  4.26G  /box7
box7/backup1  1.50K   681G  1.50K  /box7/backup1

(Kernel Panic. See panic message below.)

... (after system boot up)

# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
box5           175G   282G   175G  /u02
box5@backup1  24.2M      -   175G  -
box7          6.35G   679G  6.35G  /box7

# mdb unix.20 vmcore.20
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp crypto ptm ]
> ::status
debugging crash dump vmcore.20 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE contiguous birth=10088934 fill=19 cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
dump content: kernel pages only
> ::stack
vpanic()
0xfffffffff11b1af4()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_mirror_io_done+0xc1()
zio_vdev_io_done+0x14()
taskq_thread+0xbc()
thread_start+8()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=fffffe800081dc80:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE contiguous birth=10088934 fill=19 cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50

fffffe800081dac0 zfs:zfsctl_ops_root+30db624c ()
fffffe800081dad0 zfs:zio_next_stage+65 ()
fffffe800081db00 zfs:zio_wait_for_children+49 ()
fffffe800081db10 zfs:zio_wait_children_done+15 ()
fffffe800081db20 zfs:zio_next_stage+65 ()
fffffe800081db60 zfs:zio_vdev_io_assess+84 ()
fffffe800081db70 zfs:zio_next_stage+65 ()
fffffe800081dbd0 zfs:vdev_mirror_io_done+c1 ()
fffffe800081dbe0 zfs:zio_vdev_io_done+14 ()
fffffe800081dc60 genunix:taskq_thread+bc ()
fffffe800081dc70 unix:thread_start+8 ()

syncing file systems... 5 3 1 ... 1 done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
>

You can see checksum type fletcher4 (with lzjb compression?) in the panic message. However, all of the ZFS file systems use fletcher2 without compression, so this is obviously a corrupted checksum.

I've run the backup (send/receive) test described above 3 times. Every time I got the same error, and each time the system panicked approximately 45 minutes after the start.

From all this I conclude that it happens when "zfs send" tries to read the same bad checksum. The bad checksum could be the result of a software or hardware bug. Either way, it's obvious that ZFS cannot properly handle and fix this checksum problem.

I also tried to run zdb, but it ran out of memory:

# zdb box5
    version=4
    name='box5'
    state=0
    txg=11057710
    pool_guid=989471958079851180
    vdev_tree
        type='root'
        id=0
        guid=989471958079851180
        children[0]
                type='mirror'
                id=0
                guid=9879820675701757866
                metaslab_array=16
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=6440359594760637663
                        path='/dev/dsk/c1d0s0'
                        devid='id1,cmdk@AST3250820AS=____________9QE0LYMD/a'
                        whole_disk=1
                        DTL=125
                children[1]
                        type='disk'
                        id=1
                        guid=80751879044845160
                        path='/dev/dsk/c2d0s0'
                        devid='id1,cmdk@AST3250824AS=____________5ND3M6BQ/a'
                        whole_disk=1
                        DTL=23
        children[1]
                type='mirror'
                id=1
                guid=4781215615782453677
                whole_disk=0
                metaslab_array=13
                metaslab_shift=31
                ashift=9
                asize=250045530112
                children[0]
                        type='disk'
                        id=0
                        guid=4849048357332929360
                        path='/dev/dsk/c2d1s0'
                        devid='id1,cmdk@AST3250824AS=____________5ND3LQLB/a'
                        whole_disk=1
                        DTL=21
                children[1]
                        type='disk'
                        id=1
                        guid=15140711491939156235
                        path='/dev/dsk/c1d1s0'
                        devid='id1,cmdk@AST3250820AS=____________3QE01C2H/a'
                        whole_disk=1
                        DTL=19
Uberblock
        magic = 0000000000bab10c
        version = 4
        txg = 11061627
        guid_sum = 5267891425222527909
        timestamp = 1209925302 UTC = Sun May  4 23:21:42 2008
Dataset mos [META], ID 0, cr_txg 4, 176M, 201 objects
Dataset box5@backup1 [ZPL], ID 199, cr_txg 11060758, 175G, 4997266 objects
Dataset box5 [ZPL], ID 5, cr_txg 4, 175G, 4998785 objects
Traversing all blocks to verify checksums and verify nothing leaked ...
out of memory -- generating core dump
Abort (core dumped)

So what am I supposed to do now with all these 5 million objects? I cannot even back them up.

Regards,
Rustam.

This message posted from opensolaris.org
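zdb is an unsupported debugging tool and its option set varies between releases, so treat the following only as a sketch of how one might inspect pool metadata without the full "traverse all blocks" pass that ran out of memory above; the exact flags may behave differently on this Solaris 10 build:

    # zdb -C box5
    # zdb -uuu box5
    # zdb -l /dev/dsk/c1d0s0

-C prints the cached pool configuration, -u dumps the active uberblock, and -l dumps the on-disk vdev labels from one side of a mirror, none of which require walking the whole block tree.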
Rustam wrote:
> Hello Robert,
>
>> Which would happen if you have a problem with HW and you're getting
>> wrong checksums on both sides of your mirrors. Maybe the PS (power supply)?
>>
>> Try memtest anyway, or SunVTS
>>
> Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware. And memtest requires too much downtime, which I cannot afford right now.
>

Sometimes if you read the docs, you can get confused by people who intend to confuse you. SunVTS does work on a wide variety of hardware, though it may not be "supported." To put it in perspective, SunVTS is used by Sun in the manufacturing process; it is the set of tests run on hardware before shipping to customers. It is not intended to be a generic "test whatever hardware you find lying around" product.

-- richard
Hello,

If you believe that the problem could be related to the ZIL code, you can try disabling it to debug (isolate) the problem. If it is not a file server (NFS), disabling the ZIL should not impact consistency.

Leal.

This message posted from opensolaris.org
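For anyone who wants to run this experiment: on Solaris 10 of this era the usual way to disable the ZIL was the zil_disable tunable, set either in /etc/system (applied at the next reboot) or poked into the running kernel with mdb. This is a debugging sketch only; the tunable affects synchronous write semantics rather than on-disk consistency, already-mounted filesystems may need to be remounted for it to take effect, and it was removed in later releases:

    * /etc/system, takes effect after reboot
    set zfs:zil_disable = 1

    # live change on a running kernel (use with care)
    # echo "zil_disable/W0t1" | mdb -kw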
Hello Leal,

I've already been warned (http://www.opensolaris.org/jive/message.jspa?messageID=231349) that the ZIL could be a cause, and I tested with the ZIL disabled. I ran a scrub and the system crashed after exactly the same period with the same error. The ZIL is known to cause some problems on writes, while all my problems are with zio_read and checksum_verify.

This is an NFS file server, but it crashed even with NFS unshared and nfs/server disabled. So this is not an NFS problem.

I reduced the panic frequency by setting zfs_prefetch_disable. This lets me avoid unnecessary reads and reduces the chances of hitting the bad checksums. For now I've had 24 hours without a crash, which is much better than a few times a day. However, I know the bad checksums are still there and I need to fix them somehow.

--
Rustam

This message posted from opensolaris.org
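For reference, setting zfs_prefetch_disable on Solaris 10 of this vintage was typically done in /etc/system, or applied to the live kernel with mdb; the lines below are a sketch of those two approaches, not a recommendation:

    * /etc/system, takes effect after reboot
    set zfs:zfs_prefetch_disable = 1

    # live change on a running kernel
    # echo "zfs_prefetch_disable/W0t1" | mdb -kw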
On Mon, 5 May 2008, Marcelo Leal wrote:
> Hello, If you believe that the problem could be related to the ZIL code,
> you can try disabling it to debug (isolate) the problem. If it is
> not a file server (NFS), disabling the ZIL should not impact
> consistency.

In what way is NFS special when it comes to ZFS consistency? If NFS consistency is lost by disabling the ZIL, then local consistency is also lost.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On May 5, 2008, at 1:43 PM, Bob Friesenhahn wrote:
> On Mon, 5 May 2008, Marcelo Leal wrote:
>
>> Hello, If you believe that the problem could be related to the ZIL code,
>> you can try disabling it to debug (isolate) the problem. If it is
>> not a file server (NFS), disabling the ZIL should not impact
>> consistency.
>
> In what way is NFS special when it comes to ZFS consistency? If NFS
> consistency is lost by disabling the ZIL, then local consistency is
> also lost.

That's not true:
http://blogs.sun.com/erickustarz/entry/zil_disable

Perhaps people are using "consistency" to mean different things here...

eric
On Mon, 5 May 2008, eric kustarz wrote:
>
> That's not true:
> http://blogs.sun.com/erickustarz/entry/zil_disable
>
> Perhaps people are using "consistency" to mean different things here...

Consistency means that fsync() assures that the data will be written to disk so no data is lost. It is not the same thing as "no corruption". ZFS will happily lose some data in order to avoid some corruption if the system loses power.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Mon, 5 May 2008, Marcelo Leal wrote:
> I'm calling consistency "a coherent local view"...
> I think that was one option to debug (if not an NFS server), without
> generating a corrupted filesystem.

In other words, your flight reservation will not be lost if the system crashes.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On May 5, 2008, at 4:43 PM, Bob Friesenhahn wrote:
> On Mon, 5 May 2008, eric kustarz wrote:
>>
>> That's not true:
>> http://blogs.sun.com/erickustarz/entry/zil_disable
>>
>> Perhaps people are using "consistency" to mean different things
>> here...
>
> Consistency means that fsync() assures that the data will be written
> to disk so no data is lost. It is not the same thing as "no
> corruption". ZFS will happily lose some data in order to avoid some
> corruption if the system loses power.

OK, that makes more sense. You're talking from the application perspective, whereas my blog entry is from the file system's perspective (disabling the ZIL does not compromise on-disk consistency).

eric
Hello Richard,

Monday, May 5, 2008, 4:12:23 PM, you wrote:

RE> Rustam wrote:
>> Hello Robert,
>>
>>> Which would happen if you have a problem with HW and you're getting
>>> wrong checksums on both sides of your mirrors. Maybe the PS (power supply)?
>>>
>>> Try memtest anyway, or SunVTS
>>>
>> Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware. And memtest requires too much downtime, which I cannot afford right now.
>>

RE> Sometimes if you read the docs, you can get confused by people who
RE> intend to confuse you. SunVTS does work on a wide variety of
RE> hardware, though it may not be "supported." To put it in
RE> perspective, SunVTS is used by Sun in the manufacturing process;
RE> it is the set of tests run on hardware before shipping to customers.
RE> It is not intended to be a generic "test whatever hardware you find
RE> lying around" product.

Nevertheless, you can actually "persuade" it to run on non-Sun HW; it's even in the manual page, IIRC.

--
Best regards,
Robert Milkowski                       mailto:milek at task.gda.pl
                                       http://milek.blogspot.com