Hello,

I have a somewhat strange problem. I built a plain ZFS pool out of two disks (JBOD) and set copies=2 on some filesystems. Yesterday one of the disks failed and the zpool status changed to:

>   pool: tank
>  state: DEGRADED
> status: One or more devices could not be opened. Sufficient replicas exist for
>         the pool to continue functioning in a degraded state.
> action: Attach the missing device and online it using 'zpool online'.

I replaced the failed disk with a new one of the same size and type. After a system reboot, the pool "tank" was no longer active. A zpool import only told me:

# zpool import
>   pool: tank
>     id: 3778921141244237706
>  state: DEGRADED
> status: One or more devices are missing from the system.
> action: The pool can be imported despite missing or damaged devices. The
>         fault tolerance of the pool may be compromised if imported.
>    see: http://www.sun.com/msg/ZFS-8000-2Q
> config:
>
>         tank        UNAVAIL
>           c0t5d0    ONLINE

So I am not able to import the pool again, and therefore I am also not able to replace the broken disk. I am really getting nervous at the moment, because this pool is almost my last complete backup of some files. I hope somebody can give me a clue and tell me that a repair is still possible. All the data in the pool should still be there, right?
On 29 Nov 2008, at 09:35, Philipp Haußleiter wrote:

> Hello...
>
> i have somehow a strange problem.
> I build a normal zfs pool of two disks (jbod) and set some folders
> to copies=2

Mirrored disks?

> Yesterday one of the disks failed and so the zpool status changed to:
>
>> pool: tank
>> state: DEGRADED
>> status: One or more devices could not be opened. Sufficient
>> replicas exist for
>> the pool to continue functioning in a degraded state.
>> action: Attach the missing device and online it using 'zpool online'.
>
> I replaced the failed disk with a new one with the same size and the
> same time.

What zpool commands did you run at this point?

<http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Simple_or_Striped_Storage_Pool_Limitations>
suggests you should have done:

zpool replace pool olddev newdev

Cheers,

Chris
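For reference, that replace sequence would look roughly like the sketch below on an imported pool; the device names are placeholders, and it assumes the failed disk is still addressable so ZFS can rebuild onto the new one.

    # Sketch only: assumes the pool is imported, the failed disk is c0t2d0
    # and the new disk shows up as c0t6d0 (both hypothetical names).
    zpool status tank                  # identify the faulted device
    zpool replace tank c0t2d0 c0t6d0   # start copying/resilvering onto the new disk
    zpool status tank                  # monitor the resilver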
Is this a mirrored pool or a striped one?

If it was mirrored, you can just replace the faulty drive and it should rebuild. If it was just striped, I don't know where you stand. copies=2 isn't a replacement for redundancy, so I don't know whether that pool is going to be in a usable state.

We really need to know more about your pool, and the commands you've used to try to repair it.
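To illustrate the difference being asked about, here is a hedged sketch of the two layouts; the disk names are examples only.

    # Mirrored pool: either disk can fail and the pool stays importable.
    zpool create tank mirror c0t2d0 c0t5d0

    # Striped (concatenated) pool: losing either disk takes the whole pool down.
    zpool create tank c0t2d0 c0t5d0
    # copies=2 only duplicates blocks within the same pool; both copies can
    # land on the same disk, so it is no substitute for a mirror or raidz.
    zfs set copies=2 tank/important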
> On 29 Nov 2008, at 09:35, Philipp Haußleiter wrote:
>
>> Hello...
>>
>> i have somehow a strange problem.
>> I build a normal zfs pool of two disks (jbod) and set some folders
>> to copies=2
>
> Mirrored disks?

No... just a stripe of disks. But I set copies=2 on some ZFS datasets so that their data is stored twice. I thought that would be the best choice for later upgrades with new disks.

>> Yesterday one of the disks failed and so the zpool status changed to:
>>
>>> pool: tank
>>> state: DEGRADED
>>> status: One or more devices could not be opened. Sufficient
>>> replicas exist for
>>> the pool to continue functioning in a degraded state.
>>> action: Attach the missing device and online it using 'zpool online'.
>>
>> I replaced the failed disk with a new one with the same size and the
>> same time.
>
> What zpool commands did you run at this point?

zpool import -f tank
zpool attach <pool> <newdev>

But the pool is not imported... unfortunately I cannot import it from one device only :-/

> <http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Simple_or_Striped_Storage_Pool_Limitations>
> suggests you should have done:
>
> zpool replace pool olddev newdev

I did not try this; I will do it in a couple of hours... but this also needs an imported pool, right?

> Cheers,
>
> Chris
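As a side note on the two commands involved here, a rough sketch of the difference; the device names are placeholders, and both commands do require the pool to be imported first.

    # attach: adds a new device as a mirror of an existing one, turning that
    # top-level vdev into a mirror. It needs an existing device to attach to:
    zpool attach tank c0t5d0 c0t6d0

    # replace: swaps a (failed) device for a new one and resilvers onto it:
    zpool replace tank c0t2d0 c0t6d0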
I got the second disk of the pool working again... is there any chance to restore the metadata of the pool?

I tried to figure something out from the head of the device (dd'd the first 100 MB to a file), but found nothing helpful :-/

I also tried a zpool -D tank.

What information should I provide?
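If the goal is to look at the on-disk pool configuration, the interesting regions are much smaller than 100 MB: the ZFS on-disk format keeps four 256 KB labels per vdev, two at the front and two at the end. A hedged sketch, with the device name as an example only:

    # Dump the two front labels (first 512 KB) for inspection:
    dd if=/dev/rdsk/c0t5d0 of=/tmp/front-labels bs=1024 count=512
    # zdb can decode the labels directly, without going through dd:
    zdb -l /dev/rdsk/c0t5d0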
zdb on the two devices told me:

zdb -l /dev/rdsk/c0t2d0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=7409377091667366359
    guid=7409377091667366359
    vdev_tree
        type='disk'
        id=1
        guid=7409377091667366359
        path='/dev/ad6'
        devid='ad:S13UJDWQ726303'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750151532544
--------------------------------------------
LABEL 3
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=7409377091667366359
    guid=7409377091667366359
    vdev_tree
        type='disk'
        id=1
        guid=7409377091667366359
        path='/dev/ad6'
        devid='ad:S13UJDWQ726303'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750151532544

zdb -l /dev/rdsk/c0t5d0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=1935911704083137663
    guid=1935911704083137663
    vdev_tree
        type='disk'
        id=0
        guid=1935911704083137663
        path='/dev/ad4'
        devid='ad:S13UJDWQ726301'
        whole_disk=0
        metaslab_array=17
        metaslab_shift=32
        ashift=9
        asize=750151532544
--------------------------------------------
LABEL 3
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=1935911704083137663
    guid=1935911704083137663
    vdev_tree
        type='disk'
        id=0
        guid=1935911704083137663
        path='/dev/ad4'
        devid='ad:S13UJDWQ726301'
        whole_disk=0
        metaslab_array=17
        metaslab_shift=32
        ashift=9
        asize=750151532544

So really no chance to recover the data? :-o
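One thing worth checking here: the labels above may not be the only ones on the disks, since pool labels can live on a slice rather than on the whole-disk device. A sketch of checking every node for one disk (names as examples):

    # Check the whole disk, the fdisk partition, and slice 0:
    zdb -l /dev/rdsk/c0t5d0
    zdb -l /dev/rdsk/c0t5d0p0
    zdb -l /dev/rdsk/c0t5d0s0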
I found some more things to try, to get more information about the problem.

zpool import tank tells me:

cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

zpool import -f tank:

cannot import 'tank': no such pool or dataset

After using zpool import, fmdump -eV gives me:

Nov 30 2008 18:46:00.785185192 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0xac6f8268a900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xeded501105d65329
                vdev = 0x14c7450257f00831
        (end detector)

        pool = tank
        pool_guid = 0xeded501105d65329
        pool_context = 2
        vdev_guid = 0x14c7450257f00831
        vdev_type = missing
        parent_guid = 0xeded501105d65329
        parent_type = root
        zio_err = 48
        zio_offset = 0x4000
        zio_size = 0x1c000
        __ttl = 0x1
        __tod = 0x4932d158 0x2eccf9a8

Nov 30 2008 18:46:00.785185783 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0xac6f8268a900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xeded501105d65329
                vdev = 0x14c7450257f00831
        (end detector)

        pool = tank
        pool_guid = 0xeded501105d65329
        pool_context = 2
        vdev_guid = 0x14c7450257f00831
        vdev_type = missing
        parent_guid = 0xeded501105d65329
        parent_type = root
        zio_err = 48
        zio_offset = 0x44000
        zio_size = 0x1c000
        __ttl = 0x1
        __tod = 0x4932d158 0x2eccfbf7

Nov 30 2008 18:46:00.785186231 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0xac6f8268a900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xeded501105d65329
                vdev = 0x14c7450257f00831
        (end detector)

        pool = tank
        pool_guid = 0xeded501105d65329
        pool_context = 2
        vdev_guid = 0x14c7450257f00831
        vdev_type = missing
        parent_guid = 0xeded501105d65329
        parent_type = root
        zio_err = 48
        zio_offset = 0x3f84000
        zio_size = 0x1c000
        __ttl = 0x1
        __tod = 0x4932d158 0x2eccfdb7

Nov 30 2008 18:46:00.785185402 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0xac6f8268a900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xeded501105d65329
                vdev = 0x14c7450257f00831
        (end detector)

        pool = tank
        pool_guid = 0xeded501105d65329
        pool_context = 2
        vdev_guid = 0x14c7450257f00831
        vdev_type = missing
        parent_guid = 0xeded501105d65329
        parent_type = root
        zio_err = 48
        zio_offset = 0x3fc4000
        zio_size = 0x1c000
        __ttl = 0x1
        __tod = 0x4932d158 0x2eccfa7a

Nov 30 2008 18:46:00.785185312 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0xac6f8268a900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xeded501105d65329
        (end detector)

        pool = tank
        pool_guid = 0xeded501105d65329
        pool_context = 2
        __ttl = 0x1
        __tod = 0x4932d158 0x2eccfa20

So both devices still have two valid labels, but only one of them is recognized for an import.

I found this thread: http://www.opensolaris.org/jive/thread.jspa?messageID=220125 but its level is a bit too high for me; I really do not know where to find the valid uberblock. I also read about relabeling the disk with format -e and setting up a new EFI label.

Any more suggestions?
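On the question of where the uberblocks live: per the ZFS on-disk format specification, each 256 KB label is laid out as 8 KB blank space, 8 KB boot block header, 112 KB of name/value configuration data, and a 128 KB uberblock array. So for label 0 the uberblock array starts 128 KB into the vdev. A hedged sketch for eyeballing that region; the device name is an example and this only reads, never writes:

    # Dump the uberblock array of label 0 (offset 128 KB, length 128 KB):
    dd if=/dev/rdsk/c0t2d0s0 bs=1024 iseek=128 count=128 | od -x | head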
Hi,

Attach both original drives to the system; the faulty one may only have had a few checksum errors. zpool status -v should then hopefully show your data pool, provided you have not started to replace the faulty drive yet. If it doesn't see the pool, zpool export and then zpool import, and hope for the best...

If you get back to the original failed state, with your pool degraded but readable, it can be easily fixed... most of the time.

Do a zpool status -v  <----- mind the -v

What is it saying about your pool? I suspect the faulty drive has checksum errors and has been off-lined.

Power down the system and add the spare third drive so you have all three drives connected. DO NOT MOVE the original drives to different connectors in the system; that is just going to cause more trouble. While you are inside the system, check all the connections to the hard drives. Then power up the system.

Look up the ZFS commands, and read and understand what you are about to do.

You need to force the failed drive online:

#zpool online pool device

Do a zpool clear to clear the error log on the faulty pool:

#zpool clear pool

Now you have two choices here: back up your critical data to the new third drive, or replace the faulty drive:

zpool replace [-f] pool device [new_device]

ZFS is almost certainly going to complain like hell about the faulty pool during the copy/replace operation. To be blunt, your data is either readable or it is not. Run zpool clear and force the faulty drive online every time it gets put offline; this may be several times! ZFS will tell you exactly which files have been lost, if any. The process could take several hours. Do a zpool scrub once it has finished, then back up your data... Use zpool status -v to monitor progress.

If you don't get a lot of errors from the faulty drive, you could try a low-level format to fix the drive. After you have got the data off it ;)

One final word: a striped zpool with copies=2 is about as much use as a chocolate fireguard when it comes to protecting data. Use three or more drives and raidz; it is far better.

I'm no expert, I've been using ZFS for seven months. When I first started using it, ZFS found four faulty drives in my setup, and other operating systems said they were good drives!!! So I have used ZFS to its full recovery potential!!!

Brian,
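Pulled together, the sequence described above would look roughly like this sketch; pool and device names are placeholders, and every step assumes the pool can actually be imported first.

    zpool status -v tank               # see which device is faulted and which files are affected
    zpool online tank c0t2d0           # force the flaky drive back online
    zpool clear tank                   # reset the pool's error counters
    zpool replace tank c0t2d0 c0t6d0   # copy the data onto the new third drive
    zpool scrub tank                   # verify checksums once the replace completes
    zpool status -v tank               # monitor progress and list any damaged files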
Thanks for your suggestions, couper88, but this did not help :-/

I tried the latest live CD of 2008.11 and got new information. A zpool import now shows me:

jack@opensolaris:~# zpool import
  pool: tank
    id: 17144447390511944489
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        tank        UNAVAIL   insufficient replicas
          c3t5d0    ONLINE

  pool: tank
    id: 1230498626424814687
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank          FAULTED   corrupted data
          c3t5d0p0    FAULTED   corrupted data
          c3t2d0p0    ONLINE

So I think the second pool is the right one... BUT... I really do not know how to import it. I tried both:

jack@opensolaris:~# zpool import -f tank
cannot import 'tank': more than one matching pool
import by numeric ID instead

jack@opensolaris:/dev/rdsk# zpool import -f 1230498626424814687
cannot import 'tank': one or more devices is currently unavailable

Soooo, I have a little bit more hope now... but there is still the problem that I cannot import that specific pool :-/
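One way to steer zpool import when two sets of labels claim the same pool name is to restrict the directory it searches for devices. A hedged sketch; the directory, the links and the choice of device nodes below are examples, not commands from the thread:

    # Build a directory containing only the device nodes you want considered:
    mkdir /tmp/import-devs
    ln -s /dev/dsk/c3t2d0p0 /tmp/import-devs/
    ln -s /dev/dsk/c3t5d0p0 /tmp/import-devs/
    # Then import by numeric ID, searching only that directory:
    zpool import -d /tmp/import-devs -f 1230498626424814687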
It seems that my devices carry labels from more than one pool :-(

zdb -l /dev/rdsk/c0t5d0 tells me:

--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=7409377091667366359
    guid=7409377091667366359
    vdev_tree
        type='disk'
        id=1
        guid=7409377091667366359
        path='/dev/ad6'
        devid='ad:S13UJDWQ726303'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750151532544
--------------------------------------------
LABEL 3
--------------------------------------------
    version=6
    name='tank'
    state=0
    txg=4
    pool_guid=1230498626424814687
    hostid=2180312168
    hostname='sunny.local'
    top_guid=7409377091667366359
    guid=7409377091667366359
    vdev_tree
        type='disk'
        id=1
        guid=7409377091667366359
        path='/dev/ad6'
        devid='ad:S13UJDWQ726303'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750151532544

zdb -l /dev/rdsk/c0t5d0s0 tells me:

--------------------------------------------
LABEL 0
--------------------------------------------
    version=10
    name='tank'
    state=0
    txg=72220
    pool_guid=17144447390511944489
    hostname='sunny'
    top_guid=2169144823532120681
    guid=2169144823532120681
    vdev_tree
        type='disk'
        id=1
        guid=2169144823532120681
        path='/dev/dsk/c0t1d0s0'
        devid='id1,sd@SATA_____SAMSUNG_HD753LJ_______S13UJDWQ726303/a'
        phys_path='/pci@0,0/pci1002,4391@11/disk@1,0:a'
        whole_disk=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750142881792
        is_log=0
        DTL=93
--------------------------------------------
LABEL 1
--------------------------------------------
    version=10
    name='tank'
    state=0
    txg=72220
    pool_guid=17144447390511944489
    hostname='sunny'
    top_guid=2169144823532120681
    guid=2169144823532120681
    vdev_tree
        type='disk'
        id=1
        guid=2169144823532120681
        path='/dev/dsk/c0t1d0s0'
        devid='id1,sd@SATA_____SAMSUNG_HD753LJ_______S13UJDWQ726303/a'
        phys_path='/pci@0,0/pci1002,4391@11/disk@1,0:a'
        whole_disk=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750142881792
        is_log=0
        DTL=93
--------------------------------------------
LABEL 2
--------------------------------------------
    version=10
    name='tank'
    state=0
    txg=72220
    pool_guid=17144447390511944489
    hostname='sunny'
    top_guid=2169144823532120681
    guid=2169144823532120681
    vdev_tree
        type='disk'
        id=1
        guid=2169144823532120681
        path='/dev/dsk/c0t1d0s0'
        devid='id1,sd@SATA_____SAMSUNG_HD753LJ_______S13UJDWQ726303/a'
        phys_path='/pci@0,0/pci1002,4391@11/disk@1,0:a'
        whole_disk=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750142881792
        is_log=0
        DTL=93
--------------------------------------------
LABEL 3
--------------------------------------------
    version=10
    name='tank'
    state=0
    txg=72220
    pool_guid=17144447390511944489
    hostname='sunny'
    top_guid=2169144823532120681
    guid=2169144823532120681
    vdev_tree
        type='disk'
        id=1
        guid=2169144823532120681
        path='/dev/dsk/c0t1d0s0'
        devid='id1,sd@SATA_____SAMSUNG_HD753LJ_______S13UJDWQ726303/a'
        phys_path='/pci@0,0/pci1002,4391@11/disk@1,0:a'
        whole_disk=1
        metaslab_array=14
        metaslab_shift=32
        ashift=9
        asize=750142881792
        is_log=0
        DTL=93

So the right pool data for pool #17144447390511944489 is on c0t2d0s0 and c0t5d0s0.
But somehow there is a second, stale pool configuration on c0t2d0 and c0t5d0. Thanks to Richard Elling for pointing that out.

So is it possible to clear the invalid pool configuration and just use the valid labels in the s0 slices?
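Before touching anything, it may help to line up the competing label sets so it is obvious which pool GUID, txg and version each device node carries. A sketch using the device names from above:

    # Compare pool name, GUIDs, txg and version across every node of both disks:
    for dev in c0t2d0 c0t2d0s0 c0t5d0 c0t5d0s0; do
        echo "==== $dev ===="
        zdb -l /dev/rdsk/$dev | egrep 'name=|pool_guid=|top_guid=|txg=|version='
    done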
Hi,

Eeemmm, I think it is safe to say your zpool and its data are gone forever.

Use the Samsung disk checker boot CD and see if it can fix your faulty disk. Then connect all three drives to your system and use raidz. Your data will then be well protected.

Brian,
Sooooo...

After a short time of silence, I took some hours to have a look over the raw disks. So first of all:

The data seems to still be there!!! :-) ... I can identify it by some file headers.

So I am currently doing some testing within a VM setup. After that I will try to scan my two 750 GB disks and see what files I can detect there... and maybe I will be able to restore some of them.
Ahhhhhhhhhh... what a relief, knowing the baby data could be ok... best wishes!

I am going for some seafood dinner now.

cheers,
z

----- Original Message -----
From: "Philipp Haußleiter" <philipp@haussleiter.de>
To: <zfs-discuss@opensolaris.org>
Sent: Monday, January 12, 2009 5:45 PM
Subject: Re: [zfs-discuss] Problem importing degraded Pool

> Sooooo....
>
> after a short time of silence....
>
> i took some hours to have a look over the plain disks...
> So first of all:
>
> The Data seems still to be there!!! :-) .... i can identify it by some
> file headers.
> So i am currently doing some testing within a VMSetup.
> After that i will try to scan my two 750 GBs and see what files i can
> detect there.... and maybe i will be able to restore some of them.