Hans-Christian Otto
2010-Oct-07 23:41 UTC
[zfs-discuss] raidz faulted with only one unavailable disk
Hi,
I've been playing around with ZFS for a few days now and have ended up with a faulted raidz (4 disks), even though 3 of the disks are still marked as online.

Let's start with the output of zpool import:

  pool: tank-1
    id: 15108774693087697468
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        tank-1                           FAULTED  corrupted data
          raidz1-0                       ONLINE
            disk/by-id/dm-name-tank-1-1  UNAVAIL  corrupted data
            disk/by-id/dm-name-tank-1-2  ONLINE
            disk/by-id/dm-name-tank-1-3  ONLINE
            disk/by-id/dm-name-tank-1-4  ONLINE

After some Google searches and reading http://www.sun.com/msg/ZFS-8000-5E,
it seems to me as if some metadata has been lost and the pool therefore cannot be recovered.

I've tried "zpool import -F tank-1" as well as "zpool import -f tank-1", both resulting in the following message:

cannot import 'tank-1': I/O error
        Destroy and re-create the pool from
        a backup source.

What I'm wondering about right now is the following:

Is there some way to recover the data? I thought a raidz would have to lose two disks before the data is gone?
And assuming the data really is lost - why did this happen in the first place? What situation can cause a faulted raidz with only one broken drive?

Greetings,
Christian
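P.S.: For reference, these are the exact commands I ran, plus a -d variant that points zpool import at the by-id directory explicitly. Whether /dev/disk/by-id is the right directory for the dm-name-* links on this system is only a guess on my part, so treat the last line as a sketch:

# zpool import -f tank-1
# zpool import -F tank-1
# zpool import -d /dev/disk/by-id -F tank-1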
Cindy Swearingen
2010-Oct-08 15:41 UTC
[zfs-discuss] raidz faulted with only one unavailable disk
Hi Hans-Christian,

Can you provide the commands you used to create this pool?

Are the pool devices actually files? If so, I don't see how you
have a pool device that starts without a leading slash. I tried
to create one and it failed. See the example below.

By default, zpool import looks in the /dev/dsk directory, so
you would need to include the -d /dir option to look in an
alternative directory.

I'm curious how you faulted the device, because when I fault
a device in a similar raidz1 configuration, my pool is only
degraded. See below. Your pool is corrupted at a higher level.

In general, RAID-Z redundancy works like this:

raidz can withstand 1 device failure
raidz2 can withstand 2 device failures
raidz3 can withstand 3 device failures

Thanks,

Cindy

# mkdir /files
# mkfile 200m /files/file.1
# mkfile 200m /files/file.2
# mkfile 200m /files/file.3
# mkfile 200m /files/file.4
# cd /files
# zpool create tank-1 raidz1 file.1 file.2 file.3 file.4
cannot open 'file.1': no such device in /dev/dsk
must be a full path or shorthand device name
# zpool create tank-1 raidz1 /files/file.1 /files/file.2 /files/file.3 /files/file.4

Fault a disk in tank-1:

# zpool status tank-1
  pool: tank-1
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Oct  8 09:20:36 2010
config:

        NAME               STATE     READ WRITE CKSUM
        tank-1             DEGRADED      0     0     0
          raidz1-0         DEGRADED      0     0     0
            /files/file.1  UNAVAIL       0     0     0  cannot open
            /files/file.2  ONLINE        0     0     0
            /files/file.3  ONLINE        0     0     0
            /files/file.4  ONLINE        0     0     0

# zpool export tank-1
# zpool import tank-1
cannot import 'tank-1': no such pool available
# zpool import -d /files tank-1
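To illustrate the redundancy rule on the same pool: with file.1 already unavailable, ZFS refuses to take a second device out of service, and the pool itself stays DEGRADED rather than FAULTED. (This is a sketch of the expected behavior; the exact error text can vary between releases.)

# zpool offline tank-1 /files/file.2
cannot offline /files/file.2: no valid replicas

zpool status continues to report the pool as DEGRADED, not FAULTED.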
Hans-Christian Otto
2010-Oct-08 20:45 UTC
[zfs-discuss] raidz faulted with only one unavailable disk
Hi Cindy,

> Can you provide the commands you used to create this pool?

I don't have them anymore, no. But they were pretty much like what you wrote below.

> Are the pool devices actually files? If so, I don't see how you
> have a pool device that starts without a leading slash. I tried
> to create one and it failed. See the example below.
>
> By default, zpool import looks in the /dev/dsk directory so
> you would need to include the -d /dir option to look in an
> alternative directory.

The pool devices are real devices. The naming scheme might look a bit different, but don't worry about that - importing the pool did work with these names.

> I'm curious how you faulted the device because when I fault
> a device in a similar raidz1 configuration, my pool is only
> degraded. See below. Your pool is corrupted at a higher level.
>
> In general, RAID-Z redundancy works like this:
>
> raidz can withstand 1 device failure
> raidz2 can withstand 2 device failures
> raidz3 can withstand 3 device failures

That's what I understood, and that's the reason for my mail to this list.
No important data has been lost, as I was just playing around with raidz.

But I really want to know what happened.
After thinking about what I did, one thing came to mind:
might exporting a degraded pool cause this issue?

Greetings,
Christian
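P.S.: If it helps, this is roughly the sequence I would use to try to reproduce it, reusing your file-backed layout (only a sketch, I have not actually run it yet):

# mkfile 200m /files/t.1 /files/t.2 /files/t.3 /files/t.4
# zpool create tank-test raidz1 /files/t.1 /files/t.2 /files/t.3 /files/t.4

Take one device out so the pool is degraded, then export it in that state:

# zpool offline tank-test /files/t.1
# zpool export tank-test

Re-import and check whether it comes back DEGRADED or FAULTED:

# zpool import -d /files tank-test
# zpool status tank-test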
Cindy Swearingen
2010-Oct-08 21:08 UTC
[zfs-discuss] raidz faulted with only one unavailable disk
Hi Christian,

Yes, with non-standard disks you will need to provide the path to zpool import.

I don't think the force import of a degraded pool would cause the pool
to be faulted. In general, the I/O error occurs when ZFS can't access
the underlying devices. In this case, your non-standard device names
might have caused that message.

You might be able to find out what happened by reviewing the fmdump -eV
output to see what device errors occurred to cause the faulted pool.

You can review the ZFS hardware diagnostics info here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

under "Resolving Hardware Problems".

Thanks,

Cindy
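P.S.: Something along these lines should show the relevant error telemetry. The class filter is optional, and the exact class names (for example ereport.fs.zfs.io or ereport.fs.zfs.checksum) can vary by release, so treat the second command as a sketch:

# fmdump -eV
# fmdump -eV -c 'ereport.fs.zfs.*'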
Hans-Christian Otto
2010-Oct-08 21:16 UTC
[zfs-discuss] raidz faulted with only one unavailable disk
Hi Cindy,

> I don't think the force import of a degraded pool would cause the pool
> to be faulted. In general, the I/O error is caused when ZFS can't access
> the underlying devices. In this case, your non-standard device names
> might have caused that message.

As I wrote in my first mail, zpool import (without any parameters) shows the three non-corrupted disks as "ONLINE" - from my understanding, that should not happen if ZFS can't access the underlying devices?

I will walk through the rest of your mail later.

Greetings,
Christian
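P.S.: What I plan to look at next is whether the labels on the "corrupted data" disk are still readable at all, roughly like this (assuming zdb -l is the right tool for that and that the devices live under /dev/disk/by-id - I am not entirely sure about either):

# zdb -l /dev/disk/by-id/dm-name-tank-1-1

and, for comparison, one of the devices that is reported as ONLINE:

# zdb -l /dev/disk/by-id/dm-name-tank-1-2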