I can flesh this out with more detail if needed, but a brief chain of events is:

1. RAIDZ1 zpool with drives A, B, C & D (I don't have access to see the original drive names)
2. New disk E. Replaced A with E.
3. Part way through the resilver, drive D was 'removed'.
4. 700+ persistent errors detected, and lots of checksum errors on all drives. Surprised by this - I thought the absence of one drive could be tolerated?
5. Exported, rebooted, imported. Drive D present now. Good. :-)
6. Drive D disappeared again. Bad. :-(
7. This time, only one persistent error.

Does this mean that there aren't errors in the other 700+ files that it reported the first time, or have I lost my chance to note these down, and they are indeed still corrupt?

I've re-run step 5 again, so it is now on the third attempted resilver. Hopefully drive D won't remove itself again, and I'll actually get the 30+ hours of stability the new drive needs to resilver ...

Chris
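For reference, the file list behind those errors is what 'zpool status -v' prints, so it can be captured before a clear or reboot makes it unavailable. A minimal sketch, assuming the pool is named 'zp' (the name that appears later in the thread) and that a copy in /tmp is wanted:

  # zpool status -v zp
  # zpool status -v zp > /tmp/zp-permanent-errors.txt   # keep a copy of the "Permanent errors" file list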
On Thu, September 17, 2009 04:29, Chris Murray wrote:

> 2. New disk E. Replaced A with E.
> 3. Part way through the resilver, drive D was 'removed'.
> 4. 700+ persistent errors detected, and lots of checksum errors on all
> drives. Surprised by this - I thought the absence of one drive could be
> tolerated?

On a RAIDZ, the absence of one drive can be tolerated. But note that you said "part way through the resilver". Drive E is not fully present, AND drive D was removed -- you have 1+ drives missing, in a configuration that can tolerate only 1 drive missing.

--
David Dyer-Bennet, dd-b@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Thanks David. Maybe I misunderstand how a replace works? When I added disk E and used 'zpool replace [A] [E]' (I still can't remember those drive names), I thought that disk A would still be part of the pool, and be read from in order to build the contents of disk E? Sort of like a safer way of doing the old 'swap one drive at a time' trick with RAID-5 arrays?

Chris
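For reference, the replace command takes the pool, the outgoing device, and the incoming device. A sketch with made-up device names, since the originals aren't known:

  # zpool replace zp c1t0d0 c1t4d0   # c1t0d0 = old disk A, c1t4d0 = new disk E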
On 17.09.09 13:29, Chris Murray wrote:

> I can flesh this out with more detail if needed, but a brief chain of events is:

It would be nice to know which OS version/build/patch level you are running.

> 1. RAIDZ1 zpool with drives A, B, C & D (I don't have access to see the
> original drive names)
> 2. New disk E. Replaced A with E.
> 3. Part way through the resilver, drive D was 'removed'.
> 4. 700+ persistent errors detected, and lots of checksum errors on all
> drives. Surprised by this - I thought the absence of one drive could be
> tolerated?
> 5. Exported, rebooted, imported. Drive D present now. Good. :-)
> 6. Drive D disappeared again. Bad. :-(
> 7. This time, only one persistent error.
>
> Does this mean that there aren't errors in the other 700+ files that it
> reported the first time, or have I lost my chance to note these down, and they
> are indeed still corrupt?

It depends on where that one persistent error is. If it is in some filesystem metadata, ZFS may no longer be able to reach the other error blocks as a result... So it's impossible to tell without a bit more detail.

victor

> I've re-run step 5 again, so it is now on the third attempted resilver.
> Hopefully drive D won't remove itself again, and I'll actually have 30+ hours of
> stability while the new drive resilvers ...
> Chris
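To illustrate the point about where an error sits: 'zpool status -v' prints a full path when it can still resolve one, and falls back to a dataset/object reference (or a <metadata> entry) when it cannot. A hypothetical excerpt - the names and object numbers below are invented:

  errors: Permanent errors have been detected in the following files:

          /zp/vm/some_guest/disk0.vmdk
          zp/some_dataset@some_snap:<0x1>
          <metadata>:<0x15>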
On 17.09.09 21:44, Chris Murray wrote:

> Thanks David. Maybe I misunderstand how a replace works? When I added disk
> E and used 'zpool replace [A] [E]' (I still can't remember those drive names),
> I thought that disk A would still be part of the pool, and be read from in order
> to build the contents of disk E?

Exactly. Disks A and E are arranged into a special vdev of type 'replacing' beneath the raidz vdev, which behaves like a mirror. As soon as resilvering is complete, disk A is removed from this 'replacing' mirror, leaving disk E on its own in the raidz vdev.

victor
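In 'zpool status' output, that 'replacing' vdev shows up nested under the raidz1 for the duration of the resilver, roughly like this (device names invented):

        NAME           STATE     READ WRITE CKSUM
        zp             ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            replacing  ONLINE       0     0     0
              c1t0d0   ONLINE       0     0     0    <- old disk A
              c1t4d0   ONLINE       0     0     0    <- new disk E, resilvering
            c1t1d0     ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0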
Ok, the resilver has been restarted a number of times over the past few days due to two main issues - a drive disconnecting itself, and power failure. I think my troubles are 100% down to these environmental factors, but I would like some confidence that, if the pool reports no persistent errors once the resilver has completed, there actually aren't any.

Attempt #1: the resilver started after I initiated the replace on my SXCE 105 install. All was well until the box lost power. On starting back up, it hung while booting OpenSolaris - just after the line containing the system hostname. I've had this before when a scrub is in progress. My usual tactic is to boot with the 2009.06 live CD, import the pool, stop the scrub, export, reboot into SXCE 105 again, and import. Of course, you can't stop a replace that's in progress, so the remaining attempts are in the 2009.06 live CD (build 111b, perhaps?).

Attempt #2: the resilver restarted on importing the pool in 2009.06. It was resilvering fine until one drive reported itself as offline; dmesg showed that the drive was 'gone'. I then noticed a lot of checksum errors at the pool level and the RAIDZ1 level, and a large number of 'permanent' errors. In a panic, thinking that the resilver was now doing more harm than good, I exported the pool and rebooted.

Attempt #3: I imported in 2009.06 again. This time, the drive that was disconnected last attempt was online again, and proceeded to resilver along with the original drive. There was only one permanent error - in a particular snapshot of a ZVOL I'm not too concerned about. This is the point at which I wrote the original post, wondering whether all of those 700+ errors reported the first time around were no longer a problem. I have been running zpool clear in a loop because there were checksum errors on another of the drives (neither of the two in the replacing vdev, nor the one that was removed previously). I didn't want it to be marked as faulty, so I kept the zpool clear running. Then ... power failure.

Attempt #4: I imported in 2009.06. This time, no errors detected at all. Is that a result of my zpool clear? Would that clear any 'permanent' errors? From the wording, I'd say it wouldn't, and therefore the action of starting the resilver again with all of the correct disks in place hasn't found any errors so far ... ? Then, disk removal again ... :-(

Attempt #5: I'm convinced that the drive removal is down to faulty cabling. I moved the machine, completely disconnected all drives, re-wired all connections with new cables, and started the resilver again in 2009.06. Now, there are checksum errors again, so I'm running zpool clear in order to keep drives from being marked as faulted .. but I also have this:

  errors: Permanent errors have been detected in the following files:

          zp/iscsi/meerkat_temp@20090905_1631:<0x1>

I have a few of my usual VMs powered up (ESXi connecting using NFS), and they appear to be fine. I've run a chkdsk in the Windows VMs, and no errors are reported, although I can't be 100% sure that any of those files were in the original list of 700+ errors. In the absence of iscsitgtd, I'm not powering up the ones that rely on iSCSI just yet.

My next steps will be:

1. allow the resilver to finish. Assuming I don't have yet another power cut, this will be in about 24 hours.
2. zpool export
3. reboot into SXCE
4. zpool import
5. start all my usual virtual machines on the ESXi host
6. note whether that permanent error is still there <-- this will be an interesting one for me - will the export & import clear the error? will my looped zpool clear have simply reset the checksum counters to zero, or will it have cleared this too?
7. zpool scrub to see what else turns up.

Chris
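The looped 'zpool clear' described above doesn't need to be anything fancier than a one-liner - a sketch, with an arbitrary 60-second interval:

  # while true; do zpool clear zp; sleep 60; done

Note that clear only resets error counters and fault state; it does not repair data.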
I've had an interesting time with this over the past few days ...

After the resilver completed, I had the message "no known data errors" in zpool status. I guess the title of my post should have been "how permanent are permanent errors?". Now, I don't know whether the act of completing the resilver was the thing that fixed the one remaining error (in the snapshot of the 'meerkat' zvol), or whether my looped zpool clear commands did it.

Anyhow, for space/noise reasons, I set the machine back up with the original cables (eSATA), in its original tucked-away position, installed SXCE 119 to get myself reasonably up to date, and imported the pool. So far so good. I then powered up a load of my virtual machines. None of them report errors when running chkdsk, and SQL Server 'DBCC CHECKDB' hasn't reported any problems yet. Things are looking promising on the corruption front - it feels like the errors that were reported while the resilvers were in progress have finally been fixed by the final (successful) resilver! Microsoft Exchange 2003 did complain of corruption of the mailbox stores, but I have seen this a few times as a result of unclean shutdowns, and I don't think it's related to the errors that ZFS was reporting on the pool during the resilver.

Then, 'disk is gone' again - I think I can definitely put my original troubles down to cabling, which I'll sort out for good in the next few days. Now I'm back on the same SATA cables which saw me through the resilvering operation, and one of the drives is showing read errors when I run dmesg. I'm having one problem after another with this pool!! I think the disk I/O during the resilver has tipped this disk over the edge. I'll replace it ASAP, and then test the drive in a separate rig and RMA it.

Anyhow, there is one last thing that I'm struggling with - getting the pool to expand to use the size of the new disk. Before my original replace, I had 3x1TB disks and 1x750GB disk. I replaced the 750 with another 1TB, which by my reckoning should give me around 4TB as a total size even after checksums and metadata. No:

  # zpool list
  NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
  rpool    74G   8.81G  65.2G   11%  ONLINE  -
  zp      2.73T  2.36T   379G   86%  ONLINE  -

2.73T? I'm convinced I've expanded a pool in this way before. What am I missing?

Chris
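One way to sanity-check where the space is (or isn't) appearing is the per-vdev view, which breaks capacity down by vdev rather than showing only the pool total - a sketch, using the pool name from above:

  # zpool iostat -v zp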
Try exporting and reimporting the pool. That has done the trick for me in the past.
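Roughly, with the pool name from earlier in the thread:

  # zpool export zp
  # zpool import zp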
Cheers, I did try that, but still got the same total on import - 2.73TB.

I even thought I might have just made a mistake with the numbers, so I made a sort of 'quarter scale model' in VMware and OSOL 2009.06, with 3x250G and 1x187G. That gave me a size of 744GB, which is approximately 1/4 of what I get on the physical machine, so that makes sense. I then replaced the 187 with another 250 - still 744GB total, as expected. Exported & imported - now 996GB.

So, the export and import process seems to be the thing to do, but why it's not working on my physical machine (SXCE 119) is a mystery. I even contemplated that there might still be a 750GB drive left in the setup, but they're all 1TB (well, 931.51GB). Any ideas what else it could be?

For anyone interested in the checksum/permanent error thing, I'm running a scrub now. 59% done and not one error.
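One more thing that may be worth checking, assuming SXCE 119 already has the autoexpand pool property (I believe it appeared in a build around that point): with autoexpand off, the pool will not grow into a larger replacement disk on its own, and the devices have to be expanded explicitly. A sketch with placeholder device names:

  # zpool get autoexpand zp
  # zpool set autoexpand=on zp
  # zpool online -e zp c1t0d0 c1t1d0 c1t2d0 c1t3d0   # -e asks each device to expand to its full size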