Bill Sommerfeld
2006-Jan-09 21:17 UTC
[zfs-discuss] What a successful "zpool replace" looks like..
For the first time, I had to replace a failed disk in a ZFS pool. The
process went extremely smoothly - the only thing which was even a
little bit off were the completion time estimates for the subsequent
"resilver".

Out of an abundance of caution I've been running "zpool scrub" at least
once a week. Last week, one disk had started returning hard read errors
(in groups of three) during a scrub; zpool status reported that these
errors had been repaired, but a subsequent scrub reported new errors on
new blocks. So clearly the disk is starting to go. I arranged for a
replacement, and it showed up today.

The disk in question lives in an A5200 fiberchannel JBOD so I had to do
battle with luxadm to disconnect the old disk and attach the new one.

What I ended up doing:

# zpool offline <pool> <failing disk>

# luxadm display <failing disk>
	(discover which enclosure & slot it was in)
# luxadm remove_device <enclosure>,<slot>
	(follow prompts, pull disk)
# luxadm insert_device <enclosure>,<slot>
	(follow prompts, insert disk)
# zpool replace <pool> <oldpath> <newpath>

I then proceeded to watch it count down until completion:

# while zpool status <pool> | grep in.progress
while> do
while> sleep 60
while> done
 scrub: resilver in progress, 15.30% done, 0h15m to go
 scrub: resilver in progress, 17.34% done, 0h18m to go
 scrub: resilver in progress, 19.45% done, 0h19m to go
 scrub: resilver in progress, 21.66% done, 0h20m to go
 scrub: resilver in progress, 23.92% done, 0h21m to go
 scrub: resilver in progress, 26.08% done, 0h22m to go
....
 scrub: resilver in progress, 94.67% done, 0h1m to go
 scrub: resilver in progress, 95.67% done, 0h1m to go
 scrub: resilver in progress, 97.82% done, 0h0m to go
 scrub: resilver in progress, 98.82% done, 0h0m to go
 scrub: resilver in progress, 99.70% done, 0h0m to go
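
A minimal sketch of the same countdown as a throwaway script, for anyone
who would rather not retype the loop at the prompt; the pool name "tank"
and the 60-second interval are only placeholders, not details from the
setup above:

    #!/bin/sh
    # Poll "zpool status" until the resilver (or scrub) finishes, then
    # print the final pool state.  POOL is a placeholder pool name.
    POOL=tank

    # The "in progress" line disappears once the resilver has completed.
    while zpool status "$POOL" | grep "in progress" > /dev/null
    do
        # Show the current progress line, then wait a minute.
        zpool status "$POOL" | grep "in progress"
        sleep 60
    done

    # Confirm the pool came back to a healthy state.
    zpool status -v "$POOL"

The final "zpool status -v" is just there to confirm the replacement
device finished resilvering and the pool reports no errors.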
Torrey McMahon
2006-Jan-09 21:48 UTC
[zfs-discuss] What a successful "zpool replace" looks like..
Bill Sommerfeld wrote:
> The disk in question lives in an A5200 fiberchannel JBOD so I had to do
> battle with luxadm to disconnect the old disk and attach the new one.
>
> What I ended up doing:
> # zpool offline <pool> <failing disk>
>
> # luxadm display <failing disk>
> 	(discover which enclosure & slot it was in)
> # luxadm remove_device <enclosure>,<slot>
> 	(follow prompts, pull disk)
> # luxadm insert_device <enclosure>,<slot>
> 	(follow prompts, insert disk)
> # zpool replace <pool> <oldpath> <newpath>

If that is the extent to which you've had to "battle with luxadm" then
you are quite lucky. :-)
Eric Schrock
2006-Jan-09 23:05 UTC
[zfs-discuss] What a successful "zpool replace" looks like..
On Mon, Jan 09, 2006 at 04:17:51PM -0500, Bill Sommerfeld wrote:
> For the first time, I had to replace a failed disk in a ZFS pool.
> The process went extremely smoothly - the only thing which was even a
> little bit off were the completion time estimates for the subsequent
> "resilver".

This is due to the fact that the estimate is O(metadata), not O(data).
If you have a small filesystem, or a filesystem with a few very large
files, you may find the numbers jump around a bit. But we've found that
they are reasonably correct for "normal" filesystems. Given the
difficulty in getting a more accurate measurement, this will likely
have to be good enough for the near future.

Good to hear everything went smoothly, though ;-)

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
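
A toy illustration of the difference (the file sizes below are invented
for the example, not measured from any real pool): treat each file as
one unit of metadata and compare a per-file percentage with a per-byte
percentage as a traversal walks the files in order.

    # Toy model: four small files and one large one, sizes in MB (made up).
    echo "1 1 1 1 96" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i     # total data in the "pool"
        done = 0
        for (i = 1; i <= NF; i++) {
            done += $i
            # metadata-based progress counts files; byte-based counts data copied
            printf("after file %d: %3.0f%% of files, %3.0f%% of bytes\n",
                   i, 100 * i / NF, 100 * done / total)
        }
    }'

In this made-up case the file-count figure races to 80% while only 4%
of the bytes have been copied, and the remaining "20%" then dominates
the elapsed time - roughly the stepped shape Bill describes in his
follow-up.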
Bill Sommerfeld
2006-Jan-10 01:39 UTC
[zfs-discuss] What a successful "zpool replace" looks like..
On Mon, 2006-01-09 at 18:05, Eric Schrock wrote:
> This is due to the fact that the estimate is O(metadata), not O(data).
> If you have a small filesystem, or a filesystem with a few very large
> files, you may find the numbers jump around a bit.

Yep, that's what I see. I plotted the "% remaining" vs time for both
the resilver and the scrub, and I see a series of distinct steps
(changes in slope) along the way, with roughly the same shape to it for
each traversal.

The pool contains a bunch of filesystems, which were copied in via
rsync one or two at a time (as I was migrating subtrees out of UFS).
One filesystem contains a bunch of flash archives; others contain
solaris install areas of various vintages, tools, and build workspaces.
So the average file size varies considerably from filesystem to
filesystem.

> Given the difficulty in getting a more accurate measurement, this will
> likely have to be good enough for the near future.

Indeed. Though it seems like you could collect a modest amount of data
during one scrub which would make the completion estimates for a
subsequent scrub substantially more accurate.

- Bill
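
A rough sketch of that idea done from userland rather than in ZFS itself
(everything here is hypothetical - the pool name "tank", the log path,
and the one-minute sampling interval are placeholders, and this is not
how ZFS computes its own estimate): record the percent-done once a
minute during one scrub, then, during the next scrub, look up how many
minutes into the previous run the same percentage was reached.

    # Pass 1 (during one scrub): log the percent-done once a minute, so
    # the line number in the log equals the elapsed minutes.
    POOL=tank                    # placeholder pool name
    LOG=/var/tmp/scrub-profile   # placeholder log path
    : > "$LOG"
    while zpool status "$POOL" | grep "in progress" > /dev/null
    do
        zpool status "$POOL" | \
            sed -n 's/.* \([0-9.]*\)% done.*/\1/p' >> "$LOG"
        sleep 60
    done

    # Pass 2 (during the next scrub): find how many minutes into the
    # previous run the current percentage was reached, and report the rest.
    now=`zpool status "$POOL" | sed -n 's/.* \([0-9.]*\)% done.*/\1/p'`
    awk -v now="$now" '
        $1 <= now { reached = NR }
        END       { print NR - reached, "minutes to go, going by the last scrub" }
    ' "$LOG"

Still only a heuristic - the next traversal will see a different mix of
data - but it captures the "shape" of the previous run in a way a single
jumpy percentage does not.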