Hi,
    I am using FreeBSD 8.2 and went to add 4 new disks today to expand my
offsite storage. All was working fine for about 20 minutes, and then the
new drive cage started to fail. Silly me for assuming new hardware would
be fine :(

The cage failure hung the server and the box rebooted. After it rebooted,
the entire pool was gone and is in the state below. I had only written a
few files to the new, larger pool and I am not concerned about restoring
that data. However, is there a way to get back the original pool data?

(Going to http://www.sun.com/msg/ZFS-8000-3C gives a 503 error on the web
page listed, BTW.)

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open
0(offsite)#
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Mike Tancsa
>
>         NAME        STATE     READ WRITE CKSUM
>         tank1       UNAVAIL      0     0     0  insufficient replicas
> [...]
>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>             ada0    UNAVAIL      0     0     0  cannot open
>             ada1    UNAVAIL      0     0     0  cannot open
>             ada2    UNAVAIL      0     0     0  cannot open
>             ada3    UNAVAIL      0     0     0  cannot open

That is a huge bummer. I don't know if there is any way to recover aside
from restoring backups. But I will say this much: that is precisely the
reason why you always want to spread your mirror/raidz devices across
multiple controllers or chassis. If you lose a controller or a whole
chassis, you lose only one device from each vdev, and you're able to
continue production in a degraded state...
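
As a rough sketch of that layout (hypothetical device names da0-da15, one
disk per vdev taken from each of four controllers):

# zpool create tank \
    raidz da0 da4 da8  da12 \
    raidz da1 da5 da9  da13 \
    raidz da2 da6 da10 da14 \
    raidz da3 da7 da11 da15

With that arrangement, losing any single controller or chassis removes
only one disk from each raidz1 vdev, and the pool keeps running degraded
rather than going UNAVAIL.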
On Jan 28, 2011, at 6:41 PM, Mike Tancsa wrote:
> Hi,
>     I am using FreeBSD 8.2 and went to add 4 new disks today to expand
> my offsite storage. All was working fine for about 20 minutes, and then
> the new drive cage started to fail.
> [...]
> Going to http://www.sun.com/msg/ZFS-8000-3C gives a 503 error on the
> web page listed, BTW.

Oracle has its fair share of idiots :-( They have been changing around
the websites and blowing away all of the links people have set up for
the past 20+ years.

> 0(offsite)# zpool status
>   pool: tank1
>  state: UNAVAIL
> [...]
>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>             ada0    UNAVAIL      0     0     0  cannot open
>             ada1    UNAVAIL      0     0     0  cannot open
>             ada2    UNAVAIL      0     0     0  cannot open
>             ada3    UNAVAIL      0     0     0  cannot open
> 0(offsite)#

This is usually easily solved without data loss by making the disks
available again. Can you read anything from the disks using any program?
 -- richard
On 1/29/2011 12:57 PM, Richard Elling wrote:
>> 0(offsite)# zpool status
>> [...]
>
> This is usually easily solved without data loss by making the
> disks available again. Can you read anything from the disks using
> any program?

That's the strange thing: the disks are readable. The drive cage just
reset a couple of times prior to the crash, but they seem OK now. Same
order as well.

# camcontrol devlist
<WDC WD\021501FASR\25500W2B0 \200 0956>         at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 3 lun 0 (pass3,ada3)

# dd if=/dev/ada2 of=/dev/null count=20 bs=1024
20+0 records in
20+0 records out
20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
0(offsite)#

        ---Mike
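
It may also be worth checking whether the drives logged anything during
the cage resets. Assuming smartmontools is installed (it's a port on
FreeBSD, not in the base system), something like:

# smartctl -a /dev/ada2 | egrep -i 'error|reallocat|pending'

A clean SMART log would further point the finger at the cage/backplane
rather than the disks themselves.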
On 1/29/2011 11:38 AM, Edward Ned Harvey wrote:
> That is precisely the reason why you always want to spread your
> mirror/raidz devices across multiple controllers or chassis. If you
> lose a controller or a whole chassis, you lose only one device from
> each vdev, and you're able to continue production in a degraded state...

Thanks. These are backups of backups, so it would be nice to restore
them, as it will take a while to sync up once again. But if I need to
start fresh, is there a resource you can point me to with the current
best practices for laying out large storage like this? It's just for
backups of backups at a DR site.

        ---Mike
On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:
> On 1/29/2011 12:57 PM, Richard Elling wrote:
>> This is usually easily solved without data loss by making the
>> disks available again. Can you read anything from the disks using
>> any program?
>
> That's the strange thing: the disks are readable. The drive cage just
> reset a couple of times prior to the crash, but they seem OK now. Same
> order as well.
>
> [...]
>
> # dd if=/dev/ada2 of=/dev/null count=20 bs=1024
> 20+0 records in
> 20+0 records out
> 20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
> 0(offsite)#

The next step is to run "zdb -l" and look for all 4 labels. Something
like:
        zdb -l /dev/ada2

If all 4 labels exist for each drive and appear intact, then look more
closely at how the OS locates the vdevs. If you can't solve the "UNAVAIL"
problem, you won't be able to import the pool.
 -- richard
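
A quick loop (/bin/sh syntax; assumes the suspect drives are still
ada0-ada3) checks all four in one pass:

# for d in ada0 ada1 ada2 ada3; do
    echo "=== $d ==="
    zdb -l /dev/$d | grep -c 'failed to unpack'
  done

Each disk should report 0; any other count means that many of its four
labels are unreadable.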
On 1/29/2011 6:18 PM, Richard Elling wrote:
> The next step is to run "zdb -l" and look for all 4 labels. Something
> like:
>         zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more
> closely at how the OS locates the vdevs. If you can't solve the
> "UNAVAIL" problem, you won't be able to import the pool.

Hmmm, doesn't look good on any of the drives. Before I give up, I will
try the drives in a different cage on Monday. Unfortunately, it's 150km
away from me at our DR site.

# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
On Jan 29, 2011, at 4:14 PM, Mike Tancsa wrote:
> On 1/29/2011 6:18 PM, Richard Elling wrote:
>> The next step is to run "zdb -l" and look for all 4 labels.
>> [...]
>
> Hmmm, doesn't look good on any of the drives. Before I give up, I will
> try the drives in a different cage on Monday. Unfortunately, it's 150km
> away from me at our DR site.
>
> # zdb -l /dev/ada0
> [...]
> failed to unpack label 3

I'm not sure of the way BSD enumerates devices. Some clever person
thought that hiding the partition or slice would be useful. I don't find
it useful. On a Solaris system, ZFS can show a disk as something like
c0t1d0, but that doesn't exist. The actual data is in slice 0, so you
need to use c0t1d0s0 as the argument to zdb.
 -- richard
On 1/30/2011 12:39 AM, Richard Elling wrote:
>> Hmmm, doesn't look good on any of the drives.
>
> I'm not sure of the way BSD enumerates devices. Some clever person
> thought that hiding the partition or slice would be useful. I don't
> find it useful. On a Solaris system, ZFS can show a disk as something
> like c0t1d0, but that doesn't exist. The actual data is in slice 0, so
> you need to use c0t1d0s0 as the argument to zdb.

I think it's the right syntax. On the older drives:

0(offsite)# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
0(offsite)# zdb -l /dev/ada4
--------------------------------------------
LABEL 0
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
--------------------------------------------
LABEL 1
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44592523
    [remaining fields identical to LABEL 0]
--------------------------------------------
LABEL 2
--------------------------------------------
    [identical to LABEL 0; txg=44593174]
--------------------------------------------
LABEL 3
--------------------------------------------
    [identical to LABEL 1; txg=44592523]

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
[... unchanged from the first post: the ad0/ad1/ad4/ad6 and ada4-ada7
raidz1 vdevs ONLINE, the ada0-ada3 raidz1 UNAVAIL, all four disks
"cannot open" ...]
0(offsite)#
On Jan 30, 2011, at 4:31 AM, Mike Tancsa wrote:
> On 1/30/2011 12:39 AM, Richard Elling wrote:
>> [...] The actual data is in slice 0, so you need to use c0t1d0s0 as
>> the argument to zdb.
>
> I think it's the right syntax. On the older drives,
> [...]

Bummer. You've got to fix this before you can import the pool. No labels,
no import.
 -- richard
On 2011-Jan-30 13:39:22 +0800, Richard Elling <richard.elling at gmail.com> wrote:
> I'm not sure of the way BSD enumerates devices. Some clever person
> thought that hiding the partition or slice would be useful.

No, there's no hiding. /dev/ada0 always refers to the entire physical
disk. If it had PC-style fdisk slices, there would be an sN suffix. If it
had GPT partitions, there would be a pN suffix. If it had BSD partitions,
there would be an alpha suffix [a-h].

> On a Solaris system, ZFS can show a disk something like c0t1d0, but
> that doesn't exist.

If we're discussing brokenness in OS device names, I've always thought
that reporting device names that don't exist, and not having any way to
access the complete physical disk, was silly on Solaris. Having a fake
's2' mean the whole disk if there's no label is a bad kludge.

Mike might like to try "gpart list", which will display FreeBSD's view of
the physical disks. It might also be worthwhile looking at a hexdump of
the first and last few MB of the "faulty" disks -- it's possible that the
controller has decided to just shift things by a few sectors, so the
labels aren't where ZFS expects to find them.

-- 
Peter Jeremy
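
A sketch of that check (get the real size from diskinfo first; the skip
arithmetic below is approximate):

# diskinfo -v ada0                                   # note "mediasize in bytes"
# dd if=/dev/ada0 bs=1m count=1 | hexdump -C | less
# dd if=/dev/ada0 bs=1m skip=$((mb - 1)) | hexdump -C | less

where mb is the mediasize divided by 1048576. ZFS keeps labels 0 and 1 in
the first 512KB of the device and labels 2 and 3 in the last 512KB, so if
label data shows up shifted by a few sectors in those dumps, the
controller is probably inserting or reserving space.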
On Jan 30, 2011, at 1:09 PM, Peter Jeremy wrote:
> On 2011-Jan-30 13:39:22 +0800, Richard Elling wrote:
>> I'm not sure of the way BSD enumerates devices. Some clever person
>> thought that hiding the partition or slice would be useful.
>
> No, there's no hiding. /dev/ada0 always refers to the entire physical
> disk.

ZFS on Solaris hides the slice when dealing with whole disks using EFI
labels.

> If we're discussing brokenness in OS device names, I've always thought
> that reporting device names that don't exist, and not having any way
> to access the complete physical disk, was silly on Solaris. Having a
> fake 's2' mean the whole disk if there's no label is a bad kludge.

The "fake" s2 goes back to BSD, where the c partition traditionally meant
the whole disk. This was just carried forward and changed to "s2" when
numbers were used instead of letters. With EFI labels on Solaris this is
no longer possible, and there is a "whole disk partition": on a default
Solaris system, s0 usually refers to the whole disk less s8.

> Mike might like to try "gpart list", which will display FreeBSD's view
> of the physical disks. It might also be worthwhile looking at a hexdump
> of the first and last few MB of the "faulty" disks -- it's possible
> that the controller has decided to just shift things by a few sectors,
> so the labels aren't where ZFS expects to find them.

Yes, sometimes controllers steal space from the disk for implementing
RAID.
 -- richard
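
One quick cross-check for that (a sketch; only meaningful if the drive
models in the two cages match): compare what FreeBSD reports for a disk
in the known-good cage against one in the suspect cage:

# diskinfo -v ada4 | grep mediasize          # good cage
# diskinfo -v ada0 | grep mediasize          # suspect cage

Identical models should report identical byte counts; a shortfall on the
suspect cage would suggest the controller is reserving sectors for its
own metadata.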
He says he's using FreeBSD. ZFS recorded names like "ada0", which always
means a whole disk. In any case, FreeBSD will search all block storage
for the ZFS components if the cached name is wrong: if the disks are
attached to the system at all, FreeBSD will find them wherever they may
be.

Try FreeBSD 8-STABLE rather than just 8.2-RELEASE, as many improvements
and fixes have been backported. Perhaps try 9-CURRENT, as I'm confident
the code there has all of the dev-search fixes.

Add the line "vfs.zfs.debug=1" to /boot/loader.conf to get detailed debug
output as FreeBSD tries to import the pool.
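
A minimal sketch of that procedure:

# echo 'vfs.zfs.debug=1' >> /boot/loader.conf
# reboot
  ...
# zpool import               # with no arguments, just lists what it finds

and watch the console/dmesg for the debug messages as ZFS probes each
block device looking for the pool's components.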
On 1/29/2011 6:18 PM, Richard Elling wrote:
> The next step is to run "zdb -l" and look for all 4 labels. Something
> like:
>         zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more
> closely at how the OS locates the vdevs. If you can't solve the
> "UNAVAIL" problem, you won't be able to import the pool.
> -- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:
> On 1/28/2011 4:46 PM, Mike Tancsa wrote:
>> I had just added another set of disks to my zfs array. It looks like
>> the drive cage with the new drives is faulty. I had added a couple of
>> files to the main pool, but not much. Is there any way to restore the
>> pool below? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and
>> perhaps one file on the new drives in the bad cage.
>
> Get another enclosure and verify it works OK. Then move the disks from
> the suspect enclosure to the tested enclosure and try to import the
> pool.
>
> The problem may be cabling or the controller instead - you didn't
> specify how the disks were attached or which version of FreeBSD you're
> using.

First off, thanks to all who responded on and off list!

Good news (for me), it seems. With a new cage, everything is recognized
correctly. The history is:

...
2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

# zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium

0(offsite)# zpool get all tank1
NAME   PROPERTY       VALUE                SOURCE
tank1  size           14.5T                -
tank1  used           7.56T                -
tank1  available      6.94T                -
tank1  capacity       52%                  -
tank1  altroot        -                    default
tank1  health         ONLINE               -
tank1  guid           7336939736750289319  default
tank1  version        15                   default
tank1  bootfs         -                    default
tank1  delegation     on                   default
tank1  autoreplace    off                  default
tank1  cachefile      -                    default
tank1  failmode       wait                 default
tank1  listsnapshots  on                   local

Do I just want to do a scrub?

(Unfortunately, http://www.sun.com/msg/ZFS-8000-8A gives a 503.)

zdb now shows:

0(offsite)# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
--------------------------------------------
LABEL 1
--------------------------------------------
    [identical to LABEL 0; txg=44592523]
--------------------------------------------
LABEL 2
--------------------------------------------
    [identical to LABEL 0; txg=44593174]
--------------------------------------------
LABEL 3
--------------------------------------------
    [identical to LABEL 0; txg=44592523]
0(offsite)#

        ---Mike
Hi Mike,

Yes, this is looking much better.

Some combination of removing the corrupted files indicated in the zpool
status -v output, running zpool scrub, and then zpool clear should
resolve the corruption, but it depends on how bad the corruption is.

First, I would try the least destructive method: try to remove the files
listed below by using the rm command.

This entry probably means that the metadata is corrupted or some other
file (like a temp file) no longer exists:

        tank1/argus-data:<0xc6>

If you are able to remove the individual files with rm, run another zpool
scrub and then a zpool clear to clear the pool errors. You might need to
repeat the zpool scrub/zpool clear combo.

If you can't remove the individual files, then you might have to destroy
the tank1/argus-data file system.

Let us know what actually works.

Thanks,

Cindy

On 01/31/11 12:20, Mike Tancsa wrote:
> First off, thanks to all who responded on and off list!
>
> Good news (for me), it seems. With a new cage, everything is recognized
> correctly.
> [...]
>
> errors: Permanent errors have been detected in the following files:
>
>         /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
>         tank1/argus-data:<0xc6>
>         /tank1/argus-data/argus-sites-radium
>
> [...]
>
> Do I just want to do a scrub?
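
In concrete terms, the sequence described above would look something like
this (file names taken from the zpool status -v output; repeat the
scrub/clear pair if errors persist):

# rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
# rm /tank1/argus-data/argus-sites-radium
# zpool scrub tank1
# zpool status -v tank1      # wait for the scrub to complete
# zpool clear tank1

The <0xc6>-style entries have no path to rm; if they survive the
scrub/clear passes, destroying the tank1/argus-data file system is the
fallback.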
On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> Some combination of removing the corrupted files indicated in the zpool
> status -v output, running zpool scrub, and then zpool clear should
> resolve the corruption, but it depends on how bad the corruption is.
>
> First, I would try the least destructive method: try to remove the
> files listed below by using the rm command.
>
> This entry probably means that the metadata is corrupted or some other
> file (like a temp file) no longer exists:
>
>         tank1/argus-data:<0xc6>

Hi Cindy,

I removed the files that were listed, and now I am left with:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

I have started a scrub:

 scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

I will report back once the scrub is done!

        ---Mike
On Jan 31, 2011, at 1:19 PM, Mike Tancsa wrote:
> I removed the files that were listed, and now I am left with:
>
> errors: Permanent errors have been detected in the following files:
>
>         tank1/argus-data:<0xc5>
>         tank1/argus-data:<0xc6>
>         tank1/argus-data:<0xc7>
>
> I have started a scrub:
>  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

The "permanent" errors report shows the current and previous results.
When you have multiple failures that are recovered, consider running
scrub twice before attempting to correct or delete files.
 -- richard
On 1/31/2011 4:19 PM, Mike Tancsa wrote:
> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> [...]
>
> I have started a scrub:
>  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

Looks like that was it! The scrub finished in the time it estimated, and
that was all I needed to do; I did not have to do zpool clear or any
other commands. Is there anything beyond scrub to check the integrity of
the pool?

0(offsite)# zpool status -v
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#

        ---Mike
Excellent.

I think you are good for now, as long as your hardware setup is stable.
You survived a severe hardware failure, so say a prayer and make sure
this doesn't happen again. Always have good backups.

Thanks,

Cindy

On 02/01/11 06:56, Mike Tancsa wrote:
> Looks like that was it! The scrub finished in the time it estimated,
> and that was all I needed to do; I did not have to do zpool clear or
> any other commands. Is there anything beyond scrub to check the
> integrity of the pool?
> [...]
> errors: No known data errors
On Feb 1, 2011, at 5:56 AM, Mike Tancsa wrote:
> Looks like that was it! The scrub finished in the time it estimated,
> and that was all I needed to do; I did not have to do zpool clear or
> any other commands. Is there anything beyond scrub to check the
> integrity of the pool?

That is exactly what scrub does: it validates all data on the disks.

> 0(offsite)# zpool status -v
>   pool: tank1
>  state: ONLINE
>  scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31
>  23:00:46 2011
> [...]
> errors: No known data errors

Congrats!
 -- richard
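
Going forward, a scheduled scrub is cheap insurance. A sketch, as a
system crontab entry (pick an interval suited to the pool size; this one
runs Sundays at 03:00):

0  3  *  *  0    root    /sbin/zpool scrub tank1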