Hi,

I have a problem with Solaris 10. I know that this forum is for OpenSolaris, but maybe someone will have an idea.

My box is crashing on any attempt to import a ZFS pool. The first crash happened on an export operation, and since then I cannot import the pool anymore due to kernel panics. Is there any way of getting it imported or fixed? Removal of zpool.cache did not help.

Here are the details:

SunOS omases11 5.10 Generic_137112-02 i86pc i386 i86pc

root at omases11:~[8]#zpool import
  pool: public
    id: 10521132528798740070
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        public                                    ONLINE
          c7t60060160CBA21000A5D22553CA91DC11d0   ONLINE

  pool: private
    id: 3180576189687249855
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        private                                   ONLINE
          c7t60060160CBA21000A6D22553CA91DC11d0   ONLINE

root at omases11:~[8]#zpool import private

panic[cpu3]/thread=fffffe8001223c80: ZFS: bad checksum (read on <unknown> off 0: zio ffffffffa26b7680 [L0 packed nvlist] 4000L/600P DVA[0]=<0:10c000f400:600> DVA[1]=<0:b40014e00:600> fletcher4 lzjb LE contiguous birth=3640409 fill=1 cksum=6c8098535e:6150d1eeb30a:2f1f7efda48588:105955d437bb76e5): error 50

fffffe8001223ac0 zfs:zfsctl_ops_root+2ff1624c ()
fffffe8001223ad0 zfs:zio_next_stage+65 ()
fffffe8001223b00 zfs:zio_wait_for_children+49 ()
fffffe8001223b10 zfs:zio_wait_children_done+15 ()
fffffe8001223b20 zfs:zio_next_stage+65 ()
fffffe8001223b60 zfs:zio_vdev_io_assess+84 ()
fffffe8001223b70 zfs:zio_next_stage+65 ()
fffffe8001223bd0 zfs:vdev_mirror_io_done+c1 ()
fffffe8001223be0 zfs:zio_vdev_io_done+14 ()
fffffe8001223c60 genunix:taskq_thread+bc ()
fffffe8001223c70 unix:thread_start+8 ()

syncing file systems... [2] 212 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 [2] 210 done (not all i/o completed)
dumping to /dev/dsk/c3t2d0s1, offset 65536, content: kernel
Borys Saulyak <borys.saulyak <at> eumetsat.int> writes:

> root <at> omases11:~[8]#zpool import
> [...]
>   pool: private
>     id: 3180576189687249855
>  state: ONLINE
> action: The pool can be imported using its name or numeric identifier.
> config:
>
>         private                                   ONLINE
>           c7t60060160CBA21000A6D22553CA91DC11d0   ONLINE

Your pools have no redundancy...

> root <at> omases11:~[8]#zpool import private
>
> panic[cpu3]/thread=fffffe8001223c80: ZFS: bad checksum

...and got corrupted, therefore there is nothing ZFS can do. This is precisely why best practices recommend pools to be configured with some level of redundancy (mirror, raidz, etc). See:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Additional_Cautions_for_Storage_Pools

Restore your data from backup.

-marc
There is a chance that a bug fix or change has been made which will help you to recover from this. I suggest getting the latest SXCE DVD, booting single user, and attempting an import.

Note: you may see a message indicating that you can upgrade the pool. Do not upgrade the pool if you intend to continue running Solaris 10 in the near future.

 -- richard

Borys Saulyak wrote:
> Hi,
>
> I have a problem with Solaris 10. I know that this forum is for OpenSolaris but maybe someone will have an idea.
> My box is crashing on any attempt to import a zfs pool. First crash happened on an export operation and since then I cannot import the pool anymore due to kernel panics. Is there any way of getting it imported or fixed? Removal of zpool.cache did not help.
> [...]
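Roughly, the attempt from the SXCE single-user shell would look like this (a sketch only; "private" is the pool name from the original report, and -f is only needed if the pool still appears active to another host):

  # zpool import              # list pools the newer ZFS code can see
  # zpool import -f private   # try the import; do NOT run 'zpool upgrade' afterwards
  # zpool status -v private   # if it imports, check for reported errors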
On 7-08-2008 at 13:20 Borys Saulyak wrote:
> Hi,
>
> I have a problem with Solaris 10. I know that this forum is for
> OpenSolaris but maybe someone will have an idea.
> My box is crashing on any attempt to import a zfs pool. First crash
> happened on an export operation and since then I cannot import the pool anymore
> due to kernel panics. Is there any way of getting it imported or fixed?
> Removal of zpool.cache did not help.
> [...]

Try to change the uberblock:

http://www.opensolaris.org/jive/thread.jspa?messageID=217097

This might help.

--
Lukas Karwacki
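The linked thread involves picking an older uberblock by hand. As a non-destructive first step, the on-disk labels (which carry the uberblock arrays) can be dumped with zdb; the slice below is an assumption about how the pool was created and may differ on your system:

  # zdb -l /dev/dsk/c7t60060160CBA21000A6D22553CA91DC11d0s0   # dump the four vdev labels, read-only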
> Your pools have no redundancy...

The box is connected to two fabric switches via different HBAs, the storage is RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

> ...and got corrupted, therefore there is nothing ZFS

This is exactly what I would like to know. HOW could this have happened? I'm just asking myself: is it really as reliable a filesystem as presented, or is it better to keep away from it in a production environment?
Borys Saulyak wrote:
>> Your pools have no redundancy...
> The box is connected to two fabric switches via different HBAs, the storage is RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

Not that ZFS can see and use; all of that is just a single disk as far as ZFS is concerned.

>> ...and got corrupted, therefore there is nothing ZFS
> This is exactly what I would like to know. HOW could this have happened?
> I'm just asking myself: is it really as reliable a filesystem as presented, or is it better to keep away from it in a production environment?

ZFS cannot repair problems if it is not in control of the redundant copies.

--
Darren J Moffat
Borys Saulyak <borys.saulyak <at> eumetsat.int> writes:
>
> > Your pools have no redundancy...
>
> The box is connected to two fabric switches via different HBAs, the storage is
> RAID5, MPxIO is on, and after all that my pools have no redundancy?!?!

As Darren said: no, there is no redundancy that ZFS can use. It is important to understand that your setup _prevents_ ZFS from self-healing itself. You need a ZFS-redundant pool (mirror, raidz or raidz2) or a filesystem with the attribute copies=2 to enable self-healing.

I would recommend you make multiple LUNs visible to ZFS, and create redundant pools out of them. Browse the past 2 years or so of the zfs-discuss@ archives to get an idea of how others with the same kind of hardware as you are doing it. For example, export each disk as a LUN and create multiple raidz vdevs, or create 2 hardware RAID5 arrays and mirror them with ZFS, etc.

> > ...and got corrupted, therefore there is nothing ZFS
> This is exactly what I would like to know. HOW could this have happened?

Ask your hardware vendor. The hardware corrupted your data, not ZFS.

> Is it really as reliable a filesystem as presented,
> or is it better to keep away from it in a production environment?

Consider yourself lucky that the corruption was reported by ZFS. Other filesystems would have returned silently corrupted data and it would maybe have taken you days/weeks to troubleshoot it. As for myself, I use ZFS in production to back up 10+ million files, have seen occurrences of hardware causing data corruption, and have seen ZFS self-heal itself. So yes, I trust it.

-marc
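As a rough sketch of the first suggestion (device and dataset names below are hypothetical placeholders; substitute the LUNs your array actually exports and adjust the counts to your hardware):

  # zpool create private raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 \
                         raidz c7t5d0 c7t6d0 c7t7d0 c7t8d0 c7t9d0
  # zfs set copies=2 private/critical   # per-filesystem ditto blocks; weaker than pool-level redundancy, but better than nothing

Two raidz vdevs of five LUNs each let ZFS reconstruct and rewrite any block that fails its checksum, which a single large LUN cannot do.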
> I would recommend you make multiple LUNs visible
> to ZFS, and create

So, you are saying that ZFS will cope better with failures than any other storage system, right? I'm just trying to imagine... I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest creating 10 LUNs instead and giving them to ZFS, where they will be part of one raidz, right?

So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot see it...
On Thu, Aug 14, 2008 at 07:42, Borys Saulyak <borys.saulyak at eumetsat.int> wrote:
> I've got, let's say, 10 disks in the storage. They are currently in a RAID5 configuration and given to my box as one LUN. You suggest creating 10 LUNs instead and giving them to ZFS, where they will be part of one raidz, right?
> So what sort of protection will I gain by that? What kind of failure will be eliminated? Sorry, but I cannot see it...

Suppose that ZFS detects an error in the first case. It can't tell the storage array "something's wrong, please fix it" (since the storage array doesn't provide for this with checksums and intelligent recovery), so all it can do is tell the user "this file is corrupt, recover it from backups".

In the second case, ZFS can use the parity or mirrored data to reconstruct plausible blocks, and then see if they match the checksum. Once it finds one that matches (which will happen as long as sufficient parity remains), it can write the corrected data back to the disk that had junk on it, and report to the user "there were problems over here, but I fixed them".

Will
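With such a redundant pool, the repair described above can be exercised and observed from the command line (pool name hypothetical):

  # zpool scrub tank       # read every block and verify its checksum against the stored one
  # zpool status -v tank   # the CKSUM column and the "errors:" summary show what was found and repaired from redundancy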
To further clarify Will's point... Your current setup provides excellent hardware protection, but absolutely no data protection. ZFS provides excellent data protection when it has multiple copies of the data blocks (>1 hardware devices). Combine the two, provide >1 hardware devices to ZFS, and you have a really nice solution.

If you can spare the space, set up your arrays to present exactly 2 identical LUNs to your ZFS box and create your zpool with those in a mirror. The best of all worlds.

On Thu, Aug 14, 2008 at 9:41 AM, Will Murnane <will.murnane at gmail.com> wrote:
> [...]

--
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
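A minimal sketch of that layout, assuming the array presents two equally sized LUNs (device names are hypothetical):

  # zpool create tank mirror c7t0d0 c7t1d0
  # zpool status tank    # both halves of the mirror should report ONLINE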
>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:

    mb> Ask your hardware vendor. The hardware corrupted your data,
    mb> not ZFS.

You absolutely do NOT have adequate basis to make this statement.

I would further argue that you are probably wrong, and that, based on what we know, the pool was probably corrupted by a bug in ZFS. Simply because ZFS is (a) able to detect problems with hardware when they exist, and (b) ringing an alarm bell of some sort, does NOT exonerate ZFS, and AIUI that is your position.

Further, ZFS's ability to use zpool-level redundancy to heal problems created by its own bugs is not a cause for celebration or an improvement over filesystems without bugs. The virtue of the self-healing is for when hardware actually does fail. If self-healing also helps with corruption created by bugs in ZFS, that does not shift blame for unhealed bug-corruption back to the hardware, nor make ZFS more robust than a different filesystem without corruption bugs.

    mb> Other filesystems would have returned silently corrupted
    mb> data and it would have maybe taken you days/weeks to
    mb> troubleshoot

Possibly. Very likely, other filesystems would have handled it fine.

Borys, have a look at the two links I posted earlier about "simon sez, import!" incantations, and required patches:

http://opensolaris.org/jive/message.jspa?messageID=192572#194209
http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1

Panic-on-import sounds a lot like your problem. Jonathan also posted

http://www.opensolaris.org/jive/thread.jspa?messageID=220125

which seems to be incomplete instructions on how to choose a different ueberblock. It helped someone else with a corrupted pool, but the OP in that thread never wrote it up in recipe form for ignorant sysadmins like me to follow, so it might not be widely useful.

In short, ZFS is unstable and prone to corruption, but may improve substantially when patched up to the latest revision. Many fixes are available now, but some that are in SXCE right now will not be available in the stable binary-only Solaris until u6, so we haven't yet gained experience with how much improvement the patches provide. And finally, there is no way to back up a ZFS filesystem with lots of clones which is similarly robust to past Unix backup systems; your best bet for space-efficient backups is to zfs send/recv data onto a separate ZFS pool.

In more detail, I think there is some experience here that when a single storage subsystem hosting both ZFS pools and vxfs filesystems goes away, ZFS pools sometimes become corrupt while vxfs rolls its log and continues. So, in stable Sol10u5, ZFS is probably more prone to metadata corruption causing whole-pool failure than other logging filesystems. Some fixes are around the corner, and others are apparently the subject of some philosophical debate.
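A minimal sketch of the send/recv backup approach mentioned above (pool and dataset names are hypothetical; it assumes a second pool named backup already exists):

  # zfs snapshot tank/data@20080814
  # zfs send tank/data@20080814 | zfs recv backup/data
  # zfs send -i tank/data@20080814 tank/data@20080815 | zfs recv backup/data   # later, send only the changes since the first snapshot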
Miles Nordin wrote:
>>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:
>
>     mb> Ask your hardware vendor. The hardware corrupted your data,
>     mb> not ZFS.
>
> You absolutely do NOT have adequate basis to make this statement.
>
> I would further argue that you are probably wrong, and that, based on what we know,
> the pool was probably corrupted by a bug in ZFS. [...]
> Further, ZFS's ability to use zpool-level redundancy to heal problems
> created by its own bugs is not a cause for celebration or an
> improvement over filesystems without bugs.

There are no filesystems without bugs.

--
Darren J Moffat
On Thu, 14 Aug 2008, Miles Nordin wrote:
>>>>>> "mb" == Marc Bevand <m.bevand at gmail.com> writes:
>
>     mb> Ask your hardware vendor. The hardware corrupted your data,
>     mb> not ZFS.
>
> You absolutely do NOT have adequate basis to make this statement.

Unfortunately I was unable to read your entire email since it overflowed my limited buffer. The email would have fit within my limited buffer size if it terminated with the single line above.

Replacing one conjecture with another does not seem like sound reasoning to me.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> Try to change the uberblock:
> http://www.opensolaris.org/jive/thread.jspa?messageID=217097

It looks like you are the originator of that thread. In the last message you promised to post some details on how you recovered, but that was never done. Can you please post some details? How did you figure out the offsets for vdev_uberblock_compare?

Thank you!
> Suppose that ZFS detects an error in the first case. It can't tell
> the storage array "something's wrong, please fix it" (since the
> storage array doesn't provide for this with checksums and intelligent
> recovery), so all it can do is tell the user "this file is corrupt,
> recover it from backups".

Just to remind you: the system was working fine with no sign of any failures. The data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully. I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported.

How would ZFS have done better if I had even a raid1 configuration? I assume that this mess would have been written to both disks, so how would that help me in recovering? I do understand that having more disks would be better in case of failure of one or several of them, but only if the problem is related to the disks. I'm almost sure the disks were fine during the failure. Is there anything you can improve, apart from ZFS, to cope with such issues?
> Ask your hardware vendor. The hardware corrupted your
> data, not ZFS.

Right, it's all because of these storage vendors. All problems come from them! Never from ZFS :-) I have a similar answer from them: ask Sun, ZFS is buggy, our storage is always fine. That is really ridiculous! People pay huge money for storage and its support, plus the same for hardware and OS, and in the end both parties blame each other with no intention to look deeper.
Borys Saulyak wrote:
>> Suppose that ZFS detects an error in the first case. It can't tell
>> the storage array "something's wrong, please fix it" (since the
>> storage array doesn't provide for this with checksums and intelligent
>> recovery), so all it can do is tell the user "this file is corrupt,
>> recover it from backups".
>
> Just to remind you: the system was working fine with no sign of any failures.
> The data got corrupted during the export operation. If the storage was somehow misbehaving I would expect ZFS to complain about it on any operation which did not finish successfully.

From what I can predict, and *nobody* has provided any panic messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5 and previous updates, ZFS will panic when writes cannot be completed successfully. This will be clearly logged. For later releases, the policy set in the pool's failmode property will be followed.

Or, to say this another way, the only failmode behaviour in Solaris 10u5 or NV builds prior to build 77 (October 2007) is "panic." For later releases, the default failmode is "wait," but you can change it.

> I had NO issues on the system with quite extensive read/write activity. The system panicked on export and messed everything up such that the pools could not be imported. [...]

I think that nobody will be able to pinpoint the cause until someone looks at the messages and fma logs.
 -- richard
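On releases that do have the property (NV build 77 and later, and the corresponding Solaris 10 update), it can be inspected and changed like this (pool name hypothetical):

  # zpool get failmode tank
  # zpool set failmode=continue tank   # other values are "wait" (the default on those releases) and "panic"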
> From what I can predict, and *nobody* has provided any panic
> messages to confirm, ZFS likely had difficulty
> writing. For Solaris 10u5

The panic stack looks pretty much the same as the panic on import, and cannot be correlated to a write failure:

Aug 5 12:01:27 omases11 unix: [ID 836849 kern.notice]
Aug 5 12:01:27 omases11 ^Mpanic[cpu3]/thread=fffffe800279ac80:
Aug 5 12:01:27 omases11 genunix: [ID 809409 kern.notice] ZFS: bad checksum (read on <unknown> off 0: zio fffffe8353c23640 [L0 packed nvlist] 4000L/600P DVA[0]=<0:d00004200:600> DVA[1]=<0:9000004200:600> fletcher4 lzjb LE contiguous birth=3637241 fill=1 cksum=6a85cbad8b:60029922bbbf:2eb217a6bbefd5:1045aa85ce3521e3): error 50
Aug 5 12:01:27 omases11 unix: [ID 100000 kern.notice]
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279aac0 zfs:zfsctl_ops_root+3008f24c ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279aad0 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab00 zfs:zio_wait_for_children+49 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab10 zfs:zio_wait_children_done+15 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab20 zfs:zio_next_stage+65 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab60 zfs:zio_vdev_io_assess+84 ()
Aug 5 12:01:27 omases11 genunix: [ID 655072 kern.notice] fffffe800279ab70 zfs:zio_next_stage+65 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279abd0 zfs:vdev_mirror_io_done+c1 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279abe0 zfs:zio_vdev_io_done+14 ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279ac60 genunix:taskq_thread+bc ()
Aug 5 12:01:28 omases11 genunix: [ID 655072 kern.notice] fffffe800279ac70 unix:thread_start+8 ()
Aug 5 12:01:28 omases11 unix: [ID 100000 kern.notice]
Aug 5 12:01:28 omases11 genunix: [ID 672855 kern.notice] syncing file systems...
Aug 5 12:01:28 omases11 genunix: [ID 733762 kern.notice] 7
This panic message seems consistent with bugid 6322646, which was fixed in NV b77 (post S10u5 freeze):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6322646

 -- richard

Borys Saulyak wrote:
>> From what I can predict, and *nobody* has provided any panic
>> messages to confirm, ZFS likely had difficulty writing. For Solaris 10u5
>
> The panic stack looks pretty much the same as the panic on import, and cannot be correlated to a write failure:
> [...]
Borys Saulyak wrote:
> May I remind you that the issue occurred on Solaris 10, not on OpenSolaris.

I believe you. If you review the life cycle of a bug,

http://www.sun.com/bigadmin/hubs/documentation/patch/patch-docs/abugslife.pdf

then you will recall that bugs are fixed in NV and then backported to Solaris 10 as patches. We would all appreciate a more rapid patch availability process for Solaris 10, but that is a discussion more appropriate for another forum.
 -- richard
A little update on the subject. With the great help of Victor Latushkin, the content of the pools has been recovered. The cause of the problem is still under investigation, but what is clear is that both config objects were corrupted.

What has been done to recover the data: Victor has a zfs module which allows importing pools in read-only mode, bypassing the reading of the config objects. After installing it he was able to import the pools, and we managed to save almost everything apart from a couple of log files. This module seems to be the only way to read the content of pools in situations like mine, where the pool cannot be imported and therefore cannot be checked/fixed by scrubbing. I hope Victor will post some sort of instructions along with the module on how to use it.
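On much later ZFS releases a supported read-only import option exists that serves a similar purpose to the private module described above; a sketch, assuming such a release and the pool name from this thread:

  # zpool import -o readonly=on private
  # zfs list -r private    # confirm the datasets are visible, then copy the data off to a healthy pool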
Do you guys have any more information about this? I've tried the offset methods, zfs_recover, aok=1, mounting read-only, yada yada, still with zero luck. I have about 3 TB of data on my array, and I would REALLY hate to lose it.

Thanks!
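For reference, the tunables named above are normally set in /etc/system before the import attempt; they relax assertions and recovery checks, so treat them as a last resort and remove them once the data has been copied off. A sketch:

  # echo "set aok=1" >> /etc/system
  # echo "set zfs:zfs_recover=1" >> /etc/system
  # reboot    # the settings take effect at the next boot; then retry the import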