Mark Sandrock
2010-Nov-01 19:33 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
Hello,

I'm working with someone who replaced a failed 1TB drive (50% utilized) on an X4540 running OS build 134, and I think something must be wrong.

Last Tuesday afternoon, zpool status reported:

  scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go

and a week being 168 hours, that put completion at sometime tomorrow night. However, he just reported that zpool status shows:

  scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go

so it's looking more like 2011 now. That can't be right.

I'm hoping for a suggestion or two on this issue. I'd search the archives, but they don't seem searchable. Or am I wrong about that?

Thanks.
Mark (subscription pending)
Ross Walker
2010-Nov-01 22:55 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On Nov 1, 2010, at 3:33 PM, Mark Sandrock <Mark.Sandrock at oracle.com> wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?

Some zpool versions have an issue where snapshot creation/deletion during a resilver causes it to start over. Try suspending all snapshot activity during the resilver.

-Ross
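A minimal sketch of how that suspension might be done, assuming the snapshots come from the standard zfs-auto-snapshot SMF service instances rather than a site-specific cron job (adjust to whatever is actually taking them):

  # list the auto-snapshot service instances currently online
  svcs -a | grep auto-snapshot

  # temporarily disable them for the duration of the resilver
  for i in frequent hourly daily weekly monthly; do
      svcadm disable -t svc:/system/filesystem/zfs/auto-snapshot:$i
  done

  # re-enable once "zpool status" shows the resilver has completed
  for i in frequent hourly daily weekly monthly; do
      svcadm enable svc:/system/filesystem/zfs/auto-snapshot:$i
  done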
Edward Ned Harvey
2010-Nov-01 23:06 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Mark Sandrock
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.

For a typical live system that has been in production for a long time, with files being created, snapshotted, partially overwritten, snapshots destroyed, and so on, the blocks on disk end up written in largely random order. And, at least for now, resilvering walks blocks in creation-time order, not disk order. So resilver time is typically limited by random-IO IOPS and by the number of records in the affected vdev.

To reduce the number of records in an affected vdev, it is effective to build the pool from mirrors instead of raidz, or to use smaller raidz1 vdevs instead of one large raidz3. Unfortunately, you're not going to be able to change that on an existing system. Roughly speaking, a 23-disk raidz3 with the capacity of 20 disks would take about 40x longer to resilver than one of the mirrors in a 40-disk stripe of mirrors with the same capacity. In rough numbers, that might be 20 days instead of 12 hours.

To reduce the cost per write: under normal circumstances you should disable the HBA WriteBack cache if a dedicated log device is present (on the X4275 that is done via the realtek HBA utility; I don't know about the X4540). But during a resilver, you might enable WriteBack for the drive being resilvered. I don't know for sure that it will help, but I think it should make some difference, because the logic that led to disabling WB does not apply to resilver writes.

To reduce the number of records to resilver:
* If possible, disable the creation of new snapshots while the resilver is running.
* If possible, delete files and destroy old snapshots that are no longer needed.
* If possible, limit new writes to the system.

By the way, I'm sorry to say: don't trust the progress indicator. You're likely to reach 100% complete and stay there for a long time, even "2T resilvered" on a 1TB disk. It looks ugly, but it's actually physically correct, because the filesystem is in use and performing new writes during the resilver.

To reduce contention for IOPS:
* If possible, limit the "live" IO to the system. Resilver runs at lower priority and therefore gets delayed a lot on busy production systems.
* Definitely DON'T scrub the pool while it's resilvering.
* Maybe you can offload some of the IO by adding cache devices, a dedicated log, or RAM? It's sound in principle, but YMMV immensely, depending on your workload.

All of the above is likely to be only modestly effective. There's not much you can do if you started with a huge raidz3, for example. The most important thing you can do to affect resilver time is to choose mirrors instead of raidz at pool-creation time (a sketch of the two layouts follows).
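For illustration only (hypothetical device names, scaled down from the 40-disk example above, not the poster's actual configuration), the two layouts differ only in how the vdevs are declared when the pool is created:

  # Stripe of mirrors: a resilver rebuilds only one small two-disk vdev.
  zpool create tank \
      mirror c1t0d0 c1t1d0 \
      mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 \
      mirror c1t6d0 c1t7d0

  # One wide raidz3 vdev from the same disks: a resilver must walk every
  # record in the vdev, reading from all surviving disks.
  zpool create tank raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                           c1t4d0 c1t5d0 c1t6d0 c1t7d0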
If you "zfs send" the pool to some other storage, and then recreate your local pool, which will be empty and therefore resilver completed, because zfs only resilvers used blocks... and then "zfs send" the data back to restore the pool... Then besides the fact that your resilver has been forcibly completed, your received data will also be ordered on disk optimally, which will greatly help in case another resilver is needed in the near future... and you create an opportunity to revisit the pool architecture, possibly in favor of mirrors instead of raidz.
Ian Collins
2010-Nov-02 05:10 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On 11/ 2/10 08:33 AM, Mark Sandrock wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?

How is the pool configured?

I look after a very busy x5400 with 500G drives configured as 8-drive raidz2, and these take about 100 hours to resilver. The workload on this box is probably worst case for resilvering: it receives a steady stream of snapshots.

-- 
Ian.
Ian Collins
2010-Nov-02 05:11 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On 11/ 2/10 11:55 AM, Ross Walker wrote:

> On Nov 1, 2010, at 3:33 PM, Mark Sandrock <Mark.Sandrock at oracle.com> wrote:
>
>> Hello,
>>
>> I'm working with someone who replaced a failed 1TB drive (50% utilized),
>> on an X4540 running OS build 134, and I think something must be wrong.
>>
>> Last Tuesday afternoon, zpool status reported:
>>
>>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>>
>> and a week being 168 hours, that put completion at sometime tomorrow night.
>>
>> However, he just reported zpool status shows:
>>
>>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>>
>> so it's looking more like 2011 now. That can't be right.
>>
>> I'm hoping for a suggestion or two on this issue.
>>
>> I'd search the archives, but they don't seem searchable. Or am I wrong about that?
>
> Some zpool versions have an issue where snapshot creation/deletion
> during a resilver causes it to start over.

That was fixed long before build 134.

> Try suspending all snapshot activity during the resilver.

This always helps!

-- 
Ian.
Mark Sandrock
2010-Nov-02 10:41 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On Nov 2, 2010, at 12:10 AM, Ian Collins wrote:

> On 11/ 2/10 08:33 AM, Mark Sandrock wrote:
>> I'm working with someone who replaced a failed 1TB drive (50% utilized),
>> on an X4540 running OS build 134, and I think something must be wrong.
>>
>> Last Tuesday afternoon, zpool status reported:
>>
>>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>>
>> and a week being 168 hours, that put completion at sometime tomorrow night.
>>
>> However, he just reported zpool status shows:
>>
>>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>>
>> so it's looking more like 2011 now. That can't be right.
>
> How is the pool configured?

Both 10- and 12-disk RAIDZ-2. That, plus too much other I/O, must be the problem.

I'm thinking 5 x (7-2) would be better, assuming he doesn't want to go RAID-10 (see the sketch below).

Thanks much for all the helpful replies.
Mark

> I look after a very busy x5400 with 500G drives configured as 8 drive raidz2
> and these take about 100 hours to resilver. The workload on this box is
> probably worst case for resilvering, it receives a steady stream of snapshots.
>
> --
> Ian.
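For illustration only (hypothetical X4540 device names; the real controller/target numbering will differ), the proposed layout would be roughly five 7-disk raidz2 vdevs, leaving some disks for spares and the root pool:

  zpool create tank \
      raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 \
      raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 \
      spare  c5t0d0 c5t1d0

Each vdev then holds roughly a fifth of the data, so a single-disk resilver reads from only six surviving disks instead of nine or eleven.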
Richard Elling
2010-Nov-16 09:02 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
Measure the I/O performance with iostat. You should see something that looks sorta like this (iostat -zxCn 10):

                       extended device statistics
     r/s    w/s     kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
  5948.9  349.3  40322.3  5238.1   0.1  16.7     0.0     2.7   0  330  c9
     3.7    0.0    230.7     0.0   0.0   0.1     0.0    13.5   0    2  c9t1d0
   845.0    0.0   5497.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t2d0
     3.8    0.0    230.7     0.0   0.0   0.0     0.0    10.6   0    1  c9t3d0
   845.2    0.0   5495.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t4d0
     3.8    0.0    237.1     0.0   0.0   0.0     0.0    10.4   0    1  c9t5d0
   841.4    0.0   5519.7     0.0   0.0   0.9     0.0     1.1   1   32  c9t6d0
     3.8    0.0    237.3     0.0   0.0   0.0     0.0     9.2   0    1  c9t7d0
   843.5    0.0   5485.2     0.0   0.0   0.9     0.0     1.1   1   31  c9t8d0
     3.7    0.0    230.8     0.0   0.0   0.1     0.0    15.2   0    2  c9t9d0
   850.2    0.0   5488.6     0.0   0.0   0.9     0.0     1.1   1   31  c9t10d0
     3.1    0.0    211.2     0.0   0.0   0.0     0.0    13.2   0    1  c9t11d0
   847.9    0.0   5523.4     0.0   0.0   0.9     0.0     1.1   1   31  c9t12d0
     3.1    0.0    204.9     0.0   0.0   0.0     0.0     9.6   0    1  c9t13d0
   847.2    0.0   5506.0     0.0   0.0   0.9     0.0     1.1   1   31  c9t14d0
     3.4    0.0    224.1     0.0   0.0   0.0     0.0    12.3   0    1  c9t15d0
     0.0  349.3      0.0  5238.1   0.0   9.9     0.0    28.4   1  100  c9t16d0

Here you can clearly see a raidz2 resilver in progress. c9t16d0 is the disk being resilvered (write workload), and half of the others are being read to generate the resilvering data. Note the relative performance and the ~30% busy on the surviving disks.

If you see iostat output that looks significantly different from this, then you might be seeing one of two common causes:

1. Your version of ZFS has the new resilver throttle *and* the pool is otherwise servicing I/O.
2. Disks are throwing errors or responding very slowly. Use fmdump -eV to observe error reports.

 -- richard

On Nov 1, 2010, at 12:33 PM, Mark Sandrock wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?
>
> Thanks.
> Mark (subscription pending)

-- 
ZFS and performance consulting
http://www.RichardElling.com
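A hedged sketch of chasing the second cause above on a Solaris/OpenSolaris host, using a hypothetical suspect disk; the exact error classes and device names in the output will vary:

  # per-device soft/hard/transport error counters since boot
  iostat -En | grep Errors

  # fault management error reports; look for repeated ereports against one disk,
  # e.g. ereport.io.scsi.cmd.disk.* or ereport.fs.zfs.io entries
  fmdump -eV | more

  # watch service times during the resilver; a disk whose asvc_t sits far above
  # its peers (hundreds of ms) is likely the one slowing everything down
  iostat -zxCn 10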