I have a problem on one of my systems with ZFS. I used to have a zpool created
with 3 LUNs on a SAN. I did not have to put any RAID or anything on it since it
was already using RAID on the SAN. Anyway, the server rebooted and I cannot see
my pools. When I try to import the pool, it fails. I am using an EMC CLARiiON
as the SAN, with PowerPath.

# zpool list
no pools available

# zpool import -f
  pool: mypool
    id: 4148251638983938048
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        mypool        UNAVAIL  insufficient replicas
          emcpower0a  UNAVAIL  cannot open
          emcpower2a  UNAVAIL  cannot open
          emcpower3a  ONLINE

I think I am able to see all the LUNs, and I should be able to access them on
my Sun box.

# powermt display dev=all
Pseudo name=emcpower0a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A001264FB20990FDC11 [LUN 13]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -  -- I/O Path -  -- Stats ---
###  HW Path                             I/O Paths    Interf.  Mode   State  Q-IOs Errors
==============================================================================
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016041E035A4d0s0  SP A4  active alive  0  0
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016941E035A4d0s0  SP B5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016141E035A4d0s0  SP A5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016841E035A4d0s0  SP B4  active alive  0  0

Pseudo name=emcpower1a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A004C1388343C10DC11 [LUN 14]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -  -- I/O Path -  -- Stats ---
###  HW Path                             I/O Paths    Interf.  Mode   State  Q-IOs Errors
==============================================================================
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016041E035A4d1s0  SP A4  active alive  0  0
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016941E035A4d1s0  SP B5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016141E035A4d1s0  SP A5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016841E035A4d1s0  SP B4  active alive  0  0

Pseudo name=emcpower3a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=6006016045201A00A82C68514E86DC11 [LUN 7]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -  -- I/O Path -  -- Stats ---
###  HW Path                             I/O Paths    Interf.  Mode   State  Q-IOs Errors
==============================================================================
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016041E035A4d3s0  SP A4  active alive  0  0
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016941E035A4d3s0  SP B5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016141E035A4d3s0  SP A5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016841E035A4d3s0  SP B4  active alive  0  0

Pseudo name=emcpower2a
CLARiiON ID=APM00070202835 [NRHAPP02]
Logical device ID=600601604B141B00C2F6DB2AC349DC11 [LUN 24]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -  -- I/O Path -  -- Stats ---
###  HW Path                             I/O Paths    Interf.  Mode   State  Q-IOs Errors
==============================================================================
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016041E035A4d2s0  SP A4  active alive  0  0
3074 pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0    c2t5006016941E035A4d2s0  SP B5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016141E035A4d2s0  SP A5  active alive  0  0
3072 pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0  c3t5006016841E035A4d2s0  SP B4  active alive  0  0

So format does show them as well:

bash-3.00# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@1,0
       2. c1t2d0 <SEAGATE-ST973401LSUN72G-0556-68.37GB>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@2,0
       3. c1t3d0 <SEAGATE-ST973401LSUN72G-0556-68.37GB>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@3,0
       4. c2t5006016941E035A4d0 <DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016941e035a4,0
       5. c2t5006016041E035A4d0 <DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016041e035a4,0
       6. c2t5006016941E035A4d1 <DGC-RAID5-0324 cyl 32766 alt 2 hd 64 sec 10>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016941e035a4,1
       7. c2t5006016041E035A4d1 <DGC-RAID5-0324 cyl 32766 alt 2 hd 64 sec 10>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016041e035a4,1
       8. c2t5006016041E035A4d2 <DGC-RAID5-0324 cyl 32766 alt 2 hd 256 sec 10>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016041e035a4,2
       9. c2t5006016941E035A4d2 <DGC-RAID5-0324 cyl 32766 alt 2 hd 256 sec 10>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016941e035a4,2
      10. c2t5006016041E035A4d3 <DGC-RAID5-0324 cyl 63998 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016041e035a4,3
      11. c2t5006016941E035A4d3 <DGC-RAID5-0324 cyl 63998 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0/SUNW,qlc@2/fp@0,0/ssd@w5006016941e035a4,3
      12. c3t5006016841E035A4d0 <DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016841e035a4,0
      13. c3t5006016141E035A4d0 <DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016141e035a4,0
      14. c3t5006016141E035A4d1 <DGC-RAID5-0324 cyl 32766 alt 2 hd 64 sec 10>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016141e035a4,1
      15. c3t5006016841E035A4d1 <DGC-RAID5-0324 cyl 32766 alt 2 hd 64 sec 10>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016841e035a4,1
      16. c3t5006016141E035A4d2 <DGC-RAID5-0324 cyl 32766 alt 2 hd 256 sec 10>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016141e035a4,2
      17. c3t5006016841E035A4d2 <DGC-RAID5-0324 cyl 32766 alt 2 hd 256 sec 10>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016841e035a4,2
      18. c3t5006016841E035A4d3 <DGC-RAID5-0324 cyl 63998 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016841e035a4,3
      19. c3t5006016141E035A4d3 <DGC-RAID5-0324 cyl 63998 alt 2 hd 256 sec 16>
          /pci@1f,700000/pci@0,2/SUNW,qlc@1/fp@0,0/ssd@w5006016141e035a4,3
      20. emcpower0a <DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16>
          /pseudo/emcp@0
      21. emcpower1a <DGC-RAID5-0324 cyl 32766 alt 2 hd 64 sec 10>
          /pseudo/emcp@1
      22. emcpower2a <DGC-RAID5-0324 cyl 32766 alt 2 hd 256 sec 10>
          /pseudo/emcp@2
      23. emcpower3a <DGC-RAID5-0324 cyl 63998 alt 2 hd 256 sec 16>
          /pseudo/emcp@3
Specify disk (enter its number): Specify disk (enter its number):

Now the fun part of troubleshooting this. When I run zdb on emcpower3a, which
seems to be OK from the zpool perspective, I get the following output:

bash-3.00# zdb -lv /dev/dsk/emcpower3a
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367380
    pool_guid=4148251638983938048
    top_guid=9690155374174551757
    guid=9690155374174551757
    vdev_tree
        type='disk'
        id=2
        guid=9690155374174551757
        path='/dev/dsk/emcpower3a'
        whole_disk=0
        metaslab_array=1813
        metaslab_shift=30
        ashift=9
        asize=134208815104
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367380
    pool_guid=4148251638983938048
    top_guid=9690155374174551757
    guid=9690155374174551757
    vdev_tree
        type='disk'
        id=2
        guid=9690155374174551757
        path='/dev/dsk/emcpower3a'
        whole_disk=0
        metaslab_array=1813
        metaslab_shift=30
        ashift=9
        asize=134208815104
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367380
    pool_guid=4148251638983938048
    top_guid=9690155374174551757
    guid=9690155374174551757
    vdev_tree
        type='disk'
        id=2
        guid=9690155374174551757
        path='/dev/dsk/emcpower3a'
        whole_disk=0
        metaslab_array=1813
        metaslab_shift=30
        ashift=9
        asize=134208815104
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367380
    pool_guid=4148251638983938048
    top_guid=9690155374174551757
    guid=9690155374174551757
    vdev_tree
        type='disk'
        id=2
        guid=9690155374174551757
        path='/dev/dsk/emcpower3a'
        whole_disk=0
        metaslab_array=1813
        metaslab_shift=30
        ashift=9
        asize=134208815104

But when I do zdb on emcpower0a, which seems to be not OK, I get the following
output:

bash-3.00# zdb -lv /dev/dsk/emcpower0a
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367379
    pool_guid=4148251638983938048
    top_guid=14125143252243381576
    guid=14125143252243381576
    vdev_tree
        type='disk'
        id=0
        guid=14125143252243381576
        path='/dev/dsk/emcpower0a'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=29
        ashift=9
        asize=107365269504
        DTL=727
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='mypool'
    state=0
    txg=4367379
    pool_guid=4148251638983938048
    top_guid=14125143252243381576
    guid=14125143252243381576
    vdev_tree
        type='disk'
        id=0
        guid=14125143252243381576
        path='/dev/dsk/emcpower0a'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=29
        ashift=9
        asize=107365269504
        DTL=727
--------------------------------------------
LABEL 2
--------------------------------------------
failed to read label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to read label 3

That is also the case for emcpower2a in my pool.

Is there a way to fix the failed LABELs 2 and 3? I know you need 4 of them, but
is there a way to reconstruct them somehow? Or is my pool lost completely and
do I need to recreate it? It would be odd if a reboot of a server could cause
such a disaster, but I was unable to find anywhere that people were able to
repair or recreate those LABELs. How would I recover my zpools? Any help or
suggestion is greatly appreciated.

Regards,

Chris
Looking at the txg numbers, it's clear that the labels on the two devices that
are unavailable now may be stale.

Krzys wrote:
> When I do zdb on emcpower3a which seems to be ok from zpool perspective I get
> the following output:
>
> bash-3.00# zdb -lv /dev/dsk/emcpower3a
> --------------------------------------------
> LABEL 0
> --------------------------------------------
>     version=3
>     name='mypool'
>     state=0
>     txg=4367380
>     ...
>         asize=134208815104

Here we have txg=4367380, but on the other two devices (probably; at least on
one of them) we have txg=4367379:

> But when I do zdb on emcpower0a which seems to be not that ok I get the
> following output:
>
> bash-3.00# zdb -lv /dev/dsk/emcpower0a
> --------------------------------------------
> LABEL 0
> --------------------------------------------
>     version=3
>     name='mypool'
>     state=0
>     txg=4367379
>     ...
>         asize=107365269504
>         DTL=727
>
> that also is the same for emcpower2a in my pool.

What does 'zdb -uuu mypool' say?

> Is there a way to be able to fix failed LABELs 2 and 3? I know you need 4 of
> them, but is there a way to reconstruct them in any way?

It looks like the problem is not that labels 2 and 3 are missing, but that
labels 0 and 1 are stale.

> Or is my pool lost completely and I need to recreate it?
> It would be odd that reboot of a server could cause such disaster.

There is a Dirty Time Log object allocated for the device with unreadable
labels, which means that the device in question was unavailable for some time,
so something weird might have been going on with your storage a while back
(prior to the reboot)...

> But I was unable to find anywhere where people would be able to
> repair or recreate those LABELS. How would I recover my zpools? Any
> help or suggestion is greatly appreciated.

Have you seen this thread -
http://www.opensolaris.org/jive/thread.jspa?messageID=220125 ?
I think some of that experience may be applicable to this case as well.

Btw, what kind of Solaris are you running?

wbr,
victor
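[Editor's aside: the comparison Victor is making can be pulled out of the zdb -l
output mechanically. The following is only a sketch, built from the device names
and commands already shown in this thread; adjust the paths for your own layout.]

    # Show the txg and guid values recorded in every readable label of each
    # device that belonged to the pool, so a stale or missing label stands out.
    for d in emcpower0a emcpower2a emcpower3a; do
            echo "=== $d ==="
            zdb -l /dev/dsk/$d | egrep 'LABEL|txg=|guid=|failed to read'
    done

Running 'zdb -uuu mypool', as Victor asks, additionally dumps the pool's active
uberblock, i.e. the most recently synced txg, which is the number the label
txg values need to be compared against.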
It's OK that you're missing labels 2 and 3 -- there are four copies precisely
so that you can afford to lose a few.

Labels 2 and 3 are at the end of the disk. The fact that only they are missing
makes me wonder if someone resized the LUNs. Growing them would be OK, but
shrinking them would indeed cause the pool to fail to open (since part of it
was amputated).

There ought to be more helpful diagnostics in the FMA error log. After a
failed attempt to import, type this:

# fmdump -ev

and let me know what it says.

Jeff

On Tue, Apr 29, 2008 at 03:31:53PM -0400, Krzys wrote:
> Is there a way to be able to fix failed LABELs 2 and 3? I know you need 4 of
> them, but is there a way to reconstruct them in any way? Or is my pool lost
> completely and I need to recreate it?
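[Editor's aside: Jeff's shrunken-LUN theory can be checked from data already
collected. A minimal sketch, assuming the PowerPath pseudo-device paths shown
earlier and that the pool sits on slice 0 ("a") of each device; prtvtoc may
need a different raw-device path on a given system.]

    # Compare the size ZFS recorded in the label (asize) with the current
    # size of slice 0 on each pseudo-device. A current size smaller than the
    # recorded asize means the LUN shrank, and labels 2 and 3 -- kept in the
    # last 512 KB of the vdev -- are no longer where ZFS expects them.
    for d in emcpower0a emcpower2a emcpower3a; do
            echo "=== $d ==="
            zdb -l /dev/dsk/$d | grep asize | sort -u
            # slice 0 size in bytes = sector count (5th column) * 512
            prtvtoc /dev/rdsk/$d | awk '$1 == "0" { print "current:", $5 * 512, "bytes" }'
    done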
> Looking at the txg numbers, it's clear that the labels on the two devices
> that are unavailable now may be stale.

Actually, they look OK. The txg values in the label indicate the last txg in
which the pool configuration changed for devices in that top-level vdev (e.g.
mirror or raid-z group), not the last txg synced.

Jeff
Jeff Bonwick wrote:
>> Looking at the txg numbers, it's clear that the labels on the two devices
>> that are unavailable now may be stale.
>
> Actually, they look OK. The txg values in the label indicate the
> last txg in which the pool configuration changed for devices in that
> top-level vdev (e.g. mirror or raid-z group), not the last txg synced.

Agreed -- I jumped to conclusions here.

Still, there is a difference between the two labels presented. Since this pool
had been running for a while, I suppose there have been no admin-initiated
configuration changes, so the config change may be due to the allocation of
the DTL object, correct?

It would also be interesting to know the txg of the selected uberblock, to see
how long ago that change happened.

And it would be interesting to know why the server rebooted.

Victor
Because this system was in production I had to recover fairly quickly, so I
was unable to play with it much more; we had to destroy the pool, recreate a
new one, and then restore the data from tapes.

It is a mystery why the server rebooted in the middle of the night -- we could
not figure that out, nor why the pool had this problem -- so unfortunately I
will not be able to follow what you, Victor and Jeff, were suggesting. Before
we destroyed that pool I did capture the fmdump output on that system to see
what failed. As you can see, it happened at around 3:54 am on Sunday morning,
and there was no one on the system from an admin perspective to break anything.
The only thing I can think of would be the backups running, which could
generate more traffic, but I have had that system set up this way for over a
year, and no changes were made to it from a storage perspective.

Yes, I did see this URL:
http://www.opensolaris.org/jive/thread.jspa?messageID=220125
but unfortunately I was unable to apply it to my situation, as I had no idea
what values to apply... :(

Anyway, here is fmdump:

bash-3.00# fmdump -eV
TIME                           CLASS
Apr 27 2008 03:54:05.605369200 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x18594234ea00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4814311d 0x24153370

Apr 27 2008 03:54:05.605369725 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x18594234ea00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4814311d 0x2415357d

Apr 27 2008 03:54:05.605369225 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x18594234ea00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        __ttl = 0x1
        __tod = 0x4814311d 0x24153389

Apr 27 2008 03:56:28.180698100 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x481431ac 0xac53bf4

Apr 27 2008 03:56:28.180698375 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x481431ac 0xac53d07

Apr 27 2008 03:56:28.180698500 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x398b69181e00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        __ttl = 0x1
        __tod = 0x481431ac 0xac53d84

Apr 28 2008 09:40:10.917082725 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x1fa8d23d6e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4815d3ba 0x36a99265

Apr 28 2008 09:40:10.917081800 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x1fa8d23d6e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4815d3ba 0x36a98ec8

Apr 28 2008 09:40:10.917081825 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x1fa8d23d6e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        __ttl = 0x1
        __tod = 0x4815d3ba 0x36a98ee1

Apr 28 2008 10:05:27.315099900 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x180ba4f67c600001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x4815d9a7 0x12c80afc

Apr 28 2008 10:06:26.582074525 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x18e86f20a1a00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x4815d9e2 0x22b1c09d

Apr 28 2008 10:38:24.337459525 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0xc04d9834fc00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x4815e160 0x141d3945

Apr 28 2008 13:51:24.239005950 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0xb487e94711e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x48160e9c 0xe3ef0fe

Apr 28 2008 14:00:20.762159250 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0xbc56a37469c00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x481610b4 0x2d6da092

Apr 28 2008 15:10:46.512345000 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0xf9d4f37becc00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 2
        __ttl = 0x1
        __tod = 0x48162136 0x1e89c3a8

Apr 29 2008 11:10:49.126505525 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x2232b54d9e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xc40696f31f78fd48
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xc40696f31f78fd48
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower0a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x48173a79 0x78a5235

Apr 29 2008 11:10:49.126505300 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x2232b54d9e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0xd56fa2d7686dae8c
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0xd56fa2d7686dae8c
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower2a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x48173a79 0x78a5154

Apr 29 2008 11:10:49.126505475 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x2232b54d9e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
                vdev = 0x867a58f8e0043ecd
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        vdev_guid = 0x867a58f8e0043ecd
        vdev_type = disk
        vdev_path = /dev/dsk/emcpower3a
        parent_guid = 0x39918ce32491d000
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x48173a79 0x78a5203

Apr 29 2008 11:10:49.126504575 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x2232b54d9e00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x39918ce32491d000
        (end detector)
        pool = mypool
        pool_guid = 0x39918ce32491d000
        pool_context = 1
        __ttl = 0x1
        __tod = 0x48173a79 0x78a4e7f

bash-3.00#

On Sun, 4 May 2008, Victor Latushkin wrote:
> Still it would be interesting to know txg of the selected uberblock to see
> how long ago that change happened.
>
> Also it would be interesting to know why did server reboot?
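[Editor's aside: a log like the one above is easier to scan if fmdump first
prints its one-line-per-event summary and the verbose form is then filtered
down to the device paths. A small sketch using only the commands and device
names from this thread.]

    # One line per error event (timestamp and class):
    fmdump -e | grep zfs

    # Timestamped event classes plus the vdev each one refers to:
    fmdump -eV | egrep 'ereport.fs.zfs|vdev_path ='

Filtered that way, the log shows emcpower0a and emcpower2a failing to open
repeatedly from the Apr 27 03:54 event onward, with emcpower3a also reported
on the final Apr 29 attempt.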