GM
2011-Jul-14 17:37 UTC
zfs tasting dropped a stripe out of my pool. help getting it back?
Hi,

Whilst the way zfs looks for its data everywhere can be useful when devices change, I've been rather stung by it. I have a raidz2 built from 4x 2TB drives plus two 2x 1TB stripes, making 6x 2TB in total. I currently have this:

  pool: pool2
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
  scan: resilvered 1.83M in 0h0m with 0 errors on Thu Jul 14 14:59:22 2011
config:

        NAME                      STATE     READ WRITE CKSUM
        pool2                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            gpt/2TB_drive0        ONLINE       0     0     0
            gpt/2TB_drive1        ONLINE       0     0     0
            gpt/2TB_drive2        ONLINE       0     0     0
            13298804679359865221  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive0
            12966661380732156057  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive2
            gpt/2TB_drive3        ONLINE       0     0     0
        cache
          gpt/cache0              ONLINE       0     0     0

The two UNAVAIL entries used to be stripes; the system helpfully removed them for me. These are the stripes that used to be in the pool:

# gstripe status
               Name  Status  Components
stripe/1TB_drive0+1      UP  gpt/1TB_drive1
                             gpt/1TB_drive0
stripe/1TB_drive2+3      UP  gpt/1TB_drive3
                             gpt/1TB_drive2

They still exist and have all the data on them.

It started when I booted up with the drive that holds gpt/1TB_drive1 missing, and zfs helpfully replaced the stripe/1TB_drive0+1 device with gpt/1TB_drive0, then told me it had corrupt data on it. Am I right in thinking that, because one drive was missing (which meant stripe/1TB_drive0+1 was also missing), zfs tasted around and found that gpt/1TB_drive0 had what looked like the right label on it? However, 64k in it would hit incorrect data, since the next 64k chunk of the stripe lives on the missing gpt/1TB_drive1.

I was contemplating how to get the stripe back into the pool without having to do a complete resilver on it; that seemed unnecessary when the data was all there. I thought an export and import might help it find it. However, for some reason that did the same thing to the other stripe, stripe/1TB_drive2+3, which got replaced with gpt/1TB_drive2. Now I am left without parity.

Any ideas on what commands will bring this back? I know I can do a replace on both, but if there is some undetected corruption on the other devices then I will lose some data, as any parity that could fix it is currently missing. I do scrub regularly, but I'd prefer not to take that chance, especially as I have all the data sitting there! I'm hoping someone has some magic zfs commands to make all this go away :)

What can I do to prevent this in future? I've run pools with stripes for years without this happening. It seems zfs has started to look far and wide for its devices? In the past, if the stripe was broken it would just tell me the device was missing, and when the stripe was back all was fine. This tasting everywhere makes it seem like stripes are now a no-no for zpools?

Thanks.
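
P.S. In case it helps anyone suggest the right incantation: if I try the export/import again, I'm guessing I need to stop zfs tasting the bare component partitions at all. Something along these lines is what I have in mind -- only a sketch, assuming the stripes stay assembled under /dev/stripe and their old labels are intact; I haven't dared run it yet:

# gstripe status
# zpool export pool2
# zpool import -d /dev/stripe -d /dev/gpt pool2

That is: check both stripes are assembled, export, then import while searching only the listed directories. The obvious snag is that /dev/gpt still contains the 1TB component partitions alongside the 2TB ones, so zfs could taste those again; a directory of symlinks pointing only at the devices I actually want, passed with a single -d, might be the safer variant. I'd welcome confirmation before trying either.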
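
And for completeness, this is the replace I know I *can* do but would rather avoid, since it means resilvering each stripe in full with no parity left to catch any latent errors (again only a sketch; the long numbers are the GUIDs zpool status prints for the missing vdevs, and the second replace would wait for the first resilver to finish):

# zpool replace pool2 13298804679359865221 stripe/1TB_drive0+1
# zpool replace pool2 12966661380732156057 stripe/1TB_drive2+3

I also suspect zfs would complain that the stripes still carry labels from this very pool, and clearing those labels just guarantees the full resilver I'm trying to dodge.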