Hi,

I have a problem with Solaris 10. I know that this forum is for OpenSolaris, but maybe someone will have an idea.

My box is crashing on any attempt to import a ZFS pool. The first crash happened on an export operation, and since then I cannot import the pool anymore due to kernel panics. Is there any way of getting it imported or fixed? Removal of zpool.cache did not help.

Here are the details:

SunOS omases11 5.10 Generic_137112-02 i86pc i386 i86pc

root at omases11:~[8]#zpool import
  pool: public
    id: 10521132528798740070
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        public                                    ONLINE
          c7t60060160CBA21000A5D22553CA91DC11d0   ONLINE

  pool: private
    id: 3180576189687249855
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        private                                   ONLINE
          c7t60060160CBA21000A6D22553CA91DC11d0   ONLINE

root at omases11:~[8]#zpool import private

panic[cpu3]/thread=fffffe8001223c80: ZFS: bad checksum (read on <unknown> off 0: zio ffffffffa26b7680 [L0 packed nvlist] 4000L/600P DVA[0]=<0:10c000f400:600> DVA[1]=<0:b40014e00:600> fletcher4 lzjb LE contiguous birth=3640409 fill=1 cksum=6c8098535e:6150d1eeb30a:2f1f7efda48588:105955d437bb76e5): error 50

fffffe8001223ac0 zfs:zfsctl_ops_root+2ff1624c ()
fffffe8001223ad0 zfs:zio_next_stage+65 ()
fffffe8001223b00 zfs:zio_wait_for_children+49 ()
fffffe8001223b10 zfs:zio_wait_children_done+15 ()
fffffe8001223b20 zfs:zio_next_stage+65 ()
fffffe8001223b60 zfs:zio_vdev_io_assess+84 ()
fffffe8001223b70 zfs:zio_next_stage+65 ()
fffffe8001223bd0 zfs:vdev_mirror_io_done+c1 ()
fffffe8001223be0 zfs:zio_vdev_io_done+14 ()
fffffe8001223c60 genunix:taskq_thread+bc ()
fffffe8001223c70 unix:thread_start+8 ()

syncing file systems... [2] 212 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 done (not all i/o completed)
dumping to /dev/dsk/c3t2d0s1, offset 65536, content: kernel
Borys Saulyak <borys.saulyak <at> eumetsat.int> writes:

> root <at> omases11:~[8]#zpool import
> [...]
>   pool: private
>     id: 3180576189687249855
>  state: ONLINE
> action: The pool can be imported using its name or numeric identifier.
> config:
>
>         private                                   ONLINE
>           c7t60060160CBA21000A6D22553CA91DC11d0   ONLINE

Your pools have no redundancy...

> root <at> omases11:~[8]#zpool import private
>
> panic[cpu3]/thread=fffffe8001223c80: ZFS: bad checksum

...and got corrupted, therefore there is nothing ZFS can do. This is precisely why best practices recommend pools to be configured with some level of redundancy (mirror, raidz, etc). See:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Additional_Cautions_for_Storage_Pools

Restore your data from backup.

-marc
There is a chance that a bug fix or change has been made which will help you to recover from this. I suggest getting the latest SXCE DVD, booting single user, and attempting an import.

Note: you may see a message indicating that you can upgrade the pool. Do not upgrade the pool if you intend to continue running Solaris 10 in the near future.

 -- richard

Borys Saulyak wrote:
> Hi,
>
> I have a problem with Solaris 10. I know that this forum is for OpenSolaris but maybe someone will have an idea.
> My box is crashing on any attempt to import a zfs pool. First crash happened on an export operation and since then I cannot import the pool anymore due to kernel panics. Is there any way of getting it imported or fixed? Removal of zpool.cache did not help.
> [...]
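Roughly, the attempt from the SXCE single-user shell would look like this (a sketch only; "private" is the pool name from the original report, and -f is only needed if the pool still appears active to another host):

  # zpool import              # list pools the newer ZFS code can see
  # zpool import -f private   # try the import; do NOT run 'zpool upgrade' afterwards
  # zpool status -v private   # if it imports, check for reported errors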
On 7-08-2008 at 13:20 Borys Saulyak wrote:
> Hi,
>
> I have a problem with Solaris 10. I know that this forum is for
> OpenSolaris but maybe someone will have an idea.
> My box is crashing on any attempt to import a zfs pool. First crash
> happened on an export operation and since then I cannot import the pool anymore
> due to kernel panics. Is there any way of getting it imported or fixed?
> Removal of zpool.cache did not help.
> [...]

Try to change the uberblock:

http://www.opensolaris.org/jive/thread.jspa?messageID=217097

This might help.

--
Lukas Karwacki
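The linked thread involves picking an older uberblock by hand. As a non-destructive first step, the on-disk labels (which carry the uberblock arrays) can be dumped with zdb; the slice below is an assumption about how the pool was created and may differ on your system:

  # zdb -l /dev/dsk/c7t60060160CBA21000A6D22553CA91DC11d0s0   # dump the four vdev labels, read-only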
> Your pools have no redundancy...

The box is connected to two fabric switches via different HBAs, the storage is RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

> ...and got corrupted, therefore there is nothing ZFS

This is exactly what I would like to know. HOW could this have happened? I'm just asking myself: is it really as reliable a filesystem as presented, or is it better to keep away from it in a production environment?
Borys Saulyak wrote:
>> Your pools have no redundancy...
> The box is connected to two fabric switches via different HBAs, the storage is RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

Not that ZFS can see and use; all of that is just a single disk as far as ZFS is concerned.

>> ...and got corrupted, therefore there is nothing ZFS
> This is exactly what I would like to know. HOW could this have happened?
> I'm just asking myself: is it really as reliable a filesystem as presented, or is it better to keep away from it in a production environment?

ZFS cannot repair problems if it is not in control of the redundant copies.

--
Darren J Moffat
Borys Saulyak <borys.saulyak <at> eumetsat.int> writes:
>
> > Your pools have no redundancy...
>
> The box is connected to two fabric switches via different HBAs, the storage is
> RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

As Darren said: no, there is no redundancy that ZFS can use. It is important to understand that your setup _prevents_ ZFS from self-healing itself. You need a ZFS-redundant pool (mirror, raidz or raidz2) or a filesystem with the attribute copies=2 to enable self-healing.

I would recommend you make multiple LUNs visible to ZFS, and create redundant pools out of them. Browse the past 2 years or so of the zfs-discuss@ archives to get an idea of how others with the same kind of hardware as you are doing it. For example, export each disk as a LUN and create multiple raidz vdevs, or create 2 hardware RAID5 arrays and mirror them with ZFS, etc.

> > ...and got corrupted, therefore there is nothing ZFS
> This is exactly what I would like to know. HOW could this have happened?

Ask your hardware vendor. The hardware corrupted your data, not ZFS.

> Is it really as reliable a filesystem as presented,
> or is it better to keep away from it in a production environment?

Consider yourself lucky that the corruption was reported by ZFS. Other filesystems would have returned silently corrupted data and it would maybe have taken you days/weeks to troubleshoot it. As for myself, I use ZFS in production to back up 10+ million files, have seen occurrences of hardware causing data corruption, and have seen ZFS self-heal itself. So yes, I trust it.

-marc
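As a rough sketch of the first suggestion (device and dataset names below are hypothetical placeholders; substitute the LUNs your array actually exports and adjust the counts to your hardware):

  # zpool create private raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 \
                         raidz c7t5d0 c7t6d0 c7t7d0 c7t8d0 c7t9d0
  # zfs set copies=2 private/critical   # per-filesystem ditto blocks; weaker than pool-level redundancy, but better than nothing

Two raidz vdevs of five LUNs each let ZFS reconstruct and rewrite any block that fails its checksum, which a single large LUN cannot do.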
> I would recommend you make multiple LUNs visible
> to ZFS, and create

So, you are saying that ZFS will cope better with failures than any other storage system, right? I'm just trying to imagine... I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest creating 10 LUNs instead and giving them to ZFS, where they will be part of one raidz, right?

So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot see it...
On Thu, Aug 14, 2008 at 07:42, Borys Saulyak <borys.saulyak at eumetsat.int> wrote:
> I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest creating 10 LUNs instead and giving them to ZFS, where they will be part of one raidz, right?
> So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot see it...

Suppose that ZFS detects an error in the first case. It can't tell the storage array "something's wrong, please fix it" (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user "this file is corrupt, recover it from backups".

In the second case, ZFS can use the parity or mirrored data to reconstruct plausible blocks, and then see if they match the checksum. Once it finds one that matches (which will happen as long as sufficient parity remains), it can write the corrected data back to the disk that had junk on it, and report to the user "there were problems over here, but I fixed them".

Will
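With such a redundant pool, the repair described above can be exercised and observed from the command line (pool name hypothetical):

  # zpool scrub tank       # read every block and verify its checksum against the stored one
  # zpool status -v tank   # the CKSUM column and the "errors:" summary show what was found and repaired from redundancy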
To further clarify Will's point... Your current setup provides excellent hardware protection, but absolutely no data protection. ZFS provides excellent data protection when it has multiple copies of the data blocks (>1 hardware devices). Combine the two, provide >1 hardware devices to ZFS, and you have a really nice solution.

If you can spare the space, set up your arrays to present exactly 2 identical LUNs to your ZFS box and create your zpool with those in a mirror. The best of all worlds.

On Thu, Aug 14, 2008 at 9:41 AM, Will Murnane <will.murnane at gmail.com> wrote:
> [...]

--
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
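A minimal sketch of that layout, assuming the array presents two equally sized LUNs (device names are hypothetical):

  # zpool create tank mirror c7t0d0 c7t1d0
  # zpool status tank    # both halves of the mirror should report ONLINE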
>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:

    mb> Ask your hardware vendor. The hardware corrupted your data,
    mb> not ZFS.

You absolutely do NOT have adequate basis to make this statement.

I would further argue that you are probably wrong, and that, based on what we know, the pool was probably corrupted by a bug in ZFS. Simply because ZFS is (a) able to detect problems with hardware when they exist, and (b) ringing an alarm bell of some sort, does NOT exonerate ZFS, and AIUI that is your position.

Further, ZFS's ability to use zpool-level redundancy to heal problems created by its own bugs is not a cause for celebration or an improvement over filesystems without bugs. The virtue of the self-healing is for when hardware actually does fail. If self-healing also helps with corruption created by bugs in ZFS, that does not shift blame for unhealed bug-corruption back to the hardware, nor make ZFS more robust than a different filesystem without corruption bugs.

    mb> Other filesystems would have returned silently corrupted
    mb> data and it would have maybe taken you days/weeks to
    mb> troubleshoot

Possibly. Very likely, other filesystems would have handled it fine.

Borys, have a look at the two links I posted earlier about "simon sez, import!" incantations, and required patches:

http://opensolaris.org/jive/message.jspa?messageID=192572#194209
http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1

Panic-on-import sounds a lot like your problem. Jonathan also posted

http://www.opensolaris.org/jive/thread.jspa?messageID=220125

which seems to be incomplete instructions on how to choose a different ueberblock. It helped someone else with a corrupted pool, but the OP in that thread never wrote it up in recipe form for ignorant sysadmins like me to follow, so it might not be widely useful.

In short, ZFS is unstable and prone to corruption, but may improve substantially when patched up to the latest revision. Many fixes are available now, but some that are in SXCE right now will not be available in the stable binary-only Solaris until u6, so we haven't yet gained experience with how much improvement the patches provide. And finally, there is no way to back up a ZFS filesystem with lots of clones which is similarly robust to past Unix backup systems; your best bet for space-efficient backups is to zfs send/recv data onto a separate ZFS pool.

In more detail, I think there is some experience here that when a single storage subsystem hosting both ZFS pools and vxfs filesystems goes away, ZFS pools sometimes become corrupt while vxfs rolls its log and continues. So, in stable Sol10u5, ZFS is probably more prone to metadata corruption causing whole-pool failure than other logging filesystems. Some fixes are around the corner, and others are apparently the subject of some philosophical debate.
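A minimal sketch of the send/recv backup approach mentioned above (pool and dataset names are hypothetical; it assumes a second pool named backup already exists):

  # zfs snapshot tank/data@20080814
  # zfs send tank/data@20080814 | zfs recv backup/data
  # zfs send -i tank/data@20080814 tank/data@20080815 | zfs recv backup/data   # later, send only the changes since the first snapshot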
Miles Nordin wrote:
>>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:
>
>     mb> Ask your hardware vendor. The hardware corrupted your data,
>     mb> not ZFS.
>
> You absolutely do NOT have adequate basis to make this statement.
>
> I would further argue that you are probably wrong, and that, based on what we know,
> the pool was probably corrupted by a bug in ZFS. [...]
> Further, ZFS's ability to use zpool-level redundancy to heal problems
> created by its own bugs is not a cause for celebration or an
> improvement over filesystems without bugs.

There are no filesystems without bugs.

--
Darren J Moffat
On Thu, 14 Aug 2008, Miles Nordin wrote:
>>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:
>
>     mb> Ask your hardware vendor. The hardware corrupted your data,
>     mb> not ZFS.
>
> You absolutely do NOT have adequate basis to make this statement.

Unfortunately I was unable to read your entire email since it overflowed my limited buffer. The email would have fit within my limited buffer size if it terminated with the single line above.

Replacing one conjecture with another does not seem like sound reasoning to me.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> Try to change the uberblock:
> http://www.opensolaris.org/jive/thread.jspa?messageID=217097

It looks like you are the originator of that thread. In the last message you promised to post some details on how you recovered, but that was never done. Can you please post some details? How did you figure out the offsets for vdev_uberblock_compare?

Thank you!
> Suppose that ZFS detects an error in the first case. It can't tell
> the storage array "something's wrong, please fix it" (since the
> storage array doesn't provide for this with checksums and intelligent
> recovery), so all it can do is tell the user "this file is corrupt,
> recover it from backups".

Just to remind you: the system was working fine with no sign of any failures. The data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully. I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported.

How would ZFS have done better if I had even a raid1 configuration? I assume that this mess would have been written to both disks, so how would that help me in recovering? I do understand that having more disks would be better in case of failure of one or several of them, but only if the problem is related to the disks. I'm almost sure the disks were fine during the failure. Is there anything you can improve, apart from ZFS, to cope with such issues?
> Ask your hardware vendor. The hardware corrupted your
> data, not ZFS.

Right, it's all because of these storage vendors. All problems come from them! Never from ZFS :-) I have a similar answer from them: ask Sun, ZFS is buggy, our storage is always fine. That is really ridiculous! People pay huge money for storage and its support, plus the same for hardware and OS, and in the end both parties blame each other with no intention to look deeper.
Borys Saulyak wrote:
>> Suppose that ZFS detects an error in the first case. It can't tell
>> the storage array "something's wrong, please fix it" (since the
>> storage array doesn't provide for this with checksums and intelligent
>> recovery), so all it can do is tell the user "this file is corrupt,
>> recover it from backups".
>
> Just to remind you: the system was working fine with no sign of any failures.
> The data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully.

From what I can predict, and *nobody* has provided any panic messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5 and previous updates, ZFS will panic when writes cannot be completed successfully. This will be clearly logged. For later releases, the policy set in the pool's failmode property will be followed.

Or, to say this another way, the only failmode behaviour in Solaris 10u5 or NV builds prior to build 77 (October 2007) is "panic." For later releases, the default failmode is "wait," but you can change it.

> I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported. [...]

I think that nobody will be able to pinpoint the cause until someone looks at the messages and fma logs.
 -- richard
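On releases that do have the property (NV build 77 and later, and the corresponding Solaris 10 update), it can be inspected and changed like this (pool name hypothetical):

  # zpool get failmode tank
  # zpool set failmode=continue tank   # other values are "wait" (the default on those releases) and "panic"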
> From what I can predict, and *nobody* has provided any panic
> messages to confirm, ZFS likely had difficulty
> writing. For Solaris 10u5

The panic stack looks pretty much the same as the panic on import, and cannot be correlated to a write failure:

Aug 5 12:01:27 omases11 unix: [ID 836849 kern.notice]
Aug 5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fffffe800279ac80:
Aug 5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum (read on <unknown> off 0: zio fffffe8353c23640 [L0 packed nvlist] 4000L/600P DVA[0]=<0:d00004200:600> DVA[1]=<0:9000004200:600> fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50
Aug 5 12:01:27 omases11 unix: [ID 100000 kern.notice]
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279aac0 zfs:zfsctl_ops_root+3008f24c ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279aad0 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab00 zfs:zio_wait_for_children+49 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab10 zfs:zio_wait_children_done+15 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab20 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab60 zfs:zio_vdev_io_assess+84 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab70 zfs:zio_next_stage+65 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279abd0 zfs:vdev_mirror_io_done+c1 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279abe0 zfs:zio_vdev_io_done+14 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279ac60 genunix:taskq_thread+bc ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279ac70 unix:thread_start+8 ()
Aug 5 12:01:28 omases11 unix: [ID 100000 kern.notice]
Aug 5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file systems...
Aug 5 12:01:28 omases11 genunix: [ID 733762 kern.notice] 7
This panic message seems consistent with bugid 6322646, which was fixed in NV b77 (post S10u5 freeze):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6322646

 -- richard

Borys Saulyak wrote:
>> From what I can predict, and *nobody* has provided any panic
>> messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5
>
> The panic stack looks pretty much the same as the panic on import, and cannot be correlated to a write failure:
> [...]
Borys Saulyak wrote:
> May I remind you that the issue occurred on Solaris 10, not on OpenSolaris.

I believe you. If you review the life cycle of a bug,

http://www.sun.com/bigadmin/hubs/documentation/patch/patch-docs/abugslife.pdf

then you will recall that bugs are fixed in NV and then backported to Solaris 10 as patches. We would all appreciate a more rapid patch availability process for Solaris 10, but that is a discussion more appropriate for another forum.
 -- richard
A little update on the subject. With the great help of Victor Latushkin, the content of the pools has been recovered. The cause of the problem is still under investigation, but what is clear is that both config objects were corrupted.

What has been done to recover the data: Victor has a zfs module which allows importing pools in read-only mode, bypassing the reading of the config objects. After installing it he was able to import the pools, and we managed to save almost everything apart from a couple of log files. This module seems to be the only way to read the content of pools in situations like mine, where the pool cannot be imported and therefore cannot be checked/fixed by scrubbing. I hope Victor will post some sort of instructions along with the module on how to use it.
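On much later ZFS releases a supported read-only import option exists that serves a similar purpose to the private module described above; a sketch, assuming such a release and the pool name from this thread:

  # zpool import -o readonly=on private
  # zfs list -r private    # confirm the datasets are visible, then copy the data off to a healthy pool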
Do you guys have any more information about this? I've tried the offset methods, zfs_recover, aok=1, mounting read-only, yada yada, still with zero luck. I have about 3 TB of data on my array, and I would REALLY hate to lose it.

Thanks!
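For reference, the tunables named above are normally set in /etc/system before the import attempt; they relax assertions and recovery checks, so treat them as a last resort and remove them once the data has been copied off. A sketch:

  # echo "set aok=1" >> /etc/system
  # echo "set zfs:zfs_recover=1" >> /etc/system
  # reboot    # the settings take effect at the next boot; then retry the import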