Assume we have 100 disks in one zpool, and assume it takes 5 hours to scrub one disk. If I scrub the zpool, how long will it take?

Will it scrub one disk at a time, so that it takes 500 hours, i.e. in sequence, serially? Or is it possible to run the scrub in parallel, so it takes 5 hours no matter how many disks?
On 10 June, 2012 - Kalle Anka sent me these 1,5K bytes:

> Assume we have 100 disks in one zpool. Assume it takes 5 hours to
> scrub one disk. If I scrub the zpool, how long will it take?
>
> Will it scrub one disk at a time, so it will take 500 hours, i.e. in
> sequence, just serial? Or is it possible to run the scrub in parallel,
> so it takes 5h no matter how many disks?

It walks the filesystem/pool trees, so it's not just reading each disk from track 0 to track 12345; it also validates all possible copies.

/Tomas
--
Tomas Forsman, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Kalle Anka
>
> Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub one
> disk. If I scrub the zpool, how long will it take?
>
> Will it scrub one disk at a time, so it will take 500 hours, i.e. in
> sequence, just serial? Or is it possible to run the scrub in parallel,
> so it takes 5h no matter how many disks?

It will be approximately parallel, because it's actually scrubbing only the used blocks, and the order it scrubs in will be approximately the order they were written, which was intentionally parallel.

Aside from that, your question doesn't really make sense as posed, because you don't just stick a bunch of disks in a pool. You make a pool out of vdevs, which are made of storage devices (in this case, disks). The type and size of vdev (raidz, raidzN, mirror, etc.) will greatly affect the performance, as will your data usage patterns.

Scrubbing is an approximately random-IOPS task, and mirrors parallelize random I/O much better than raidz. The amount of time it takes to scrub or resilver depends both on the amount of used data on the vdev and on the on-disk ordering.
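The mirrors-vs-raidz point above can be sketched with a toy model. All the numbers here are illustrative assumptions, not ZFS measurements: it treats scrub work as (used blocks) / (vdev random-read IOPS), assumes ~150 random IOPS per spindle, and uses the simplification that a raidz vdev delivers roughly one disk's worth of small random reads while an N-way mirror can serve reads from all sides:

```python
def scrub_hours(used_gb, avg_block_kb, vdev_iops):
    """Rough scrub time: number of used blocks over vdev random-read IOPS."""
    blocks = used_gb * 1024 * 1024 / avg_block_kb  # GB -> KB -> block count
    return blocks / vdev_iops / 3600               # seconds -> hours

DISK_IOPS = 150  # assumed random-read IOPS of one 7200 rpm disk

# 2 TB used, 64 KB average block size:
# raidz vdev: small random reads touch every disk, ~1 disk's worth of IOPS
raidz = scrub_hours(2000, 64, DISK_IOPS)
# 2-way mirror: both sides can serve reads, ~2 disks' worth of IOPS
mirror = scrub_hours(2000, 64, 2 * DISK_IOPS)
print(raidz, mirror)
```

Under these assumptions the mirror scrubs in half the time of the raidz vdev for the same used data, which is the "mirrors parallelize random IO much better" point in numbers.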
2012-06-11 5:37, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Kalle Anka
>>
>> Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub one
>> disk. If I scrub the zpool, how long will it take?
>>
>> Will it scrub one disk at a time, so it will take 500 hours, i.e. in
>> sequence, just serial? Or is it possible to run the scrub in parallel,
>> so it takes 5h no matter how many disks?
>
> It will be approximately parallel, because it's actually scrubbing only the
> used blocks, and the order it scrubs in will be approximately the order they
> were written, which was intentionally parallel.

What the other posters said, plus: 100 disks is quite a lot of contention on the bus(es), so even if it is all parallel, the bus and CPU bottlenecks would raise the scrubbing time somewhat above the single-disk scrub time.

Roughly, if all else is ideal (i.e. no/few random seeks and a fast scrub at 100 MB/s per disk), a SATA3 interface at 6 Gbit/s (on the order of ~600 MB/s) will be maxed out at about 6 disks. If your disks are colocated on one HBA receptacle (i.e. via a backplane), this may be an issue for many disks in an enclosure: a 4-lane link will sustain about 24 drives at such speed, and 100 MB/s is not the market's maximum drive speed.

Further on, the PCI buses will become a bottleneck, and the CPU processing power might become one too; for a box with 100 disks this may be noticeable, depending on the other architectural choices, components and their specs.

HTH,
//Jim
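The back-of-the-envelope arithmetic above can be written out in Python. The link speeds are the ones named in this message; the 100 MB/s per-disk scrub rate and the ~80% payload efficiency factor (roughly covering 8b/10b encoding and protocol overhead) are assumptions, not measurements:

```python
def disks_to_saturate(link_gbit_s, per_disk_mb_s=100, efficiency=0.8):
    """How many disks scrubbing at per_disk_mb_s MB/s fill one link.

    efficiency is an assumed factor for encoding/protocol overhead,
    so a 6 Gbit/s link delivers on the order of 600 MB/s of payload.
    """
    link_mb_s = link_gbit_s * 1000 / 8 * efficiency
    return link_mb_s / per_disk_mb_s

# Single SATA3 link at 6 Gbit/s: maxed out by roughly 6 disks.
print(disks_to_saturate(6))        # 6.0
# 4-lane link at 6 Gbit/s per lane: sustains about 24 such drives.
print(disks_to_saturate(4 * 6))    # 24.0
```

The point of the sketch is only that the disk count at which a shared link saturates is small compared to 100 disks.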
Scrubs are run at very low priority and yield very quickly in the presence of other work, so I really would not expect a scrub to have any impact on any other type of storage activity. Resilvering will push forward more aggressively on what it has to do, but resilvering does not need to read any of the data blocks on the non-resilvering vdevs.

-r

Le 11 juin 2012 à 09:05, Jim Klimov a écrit :

> What the other posters said, plus: 100 disks is quite a lot
> of contention on the bus(es), so even if it is all parallel,
> the bus and CPU bottlenecks would raise the scrubbing time
> somewhat above the single-disk scrub time.
>
> Roughly, if all else is ideal (i.e. no/few random seeks and
> a fast scrub at 100 MB/s per disk), a SATA3 interface at 6 Gbit/s
> (on the order of ~600 MB/s) will be maxed out at about
> 6 disks. If your disks are colocated on one HBA receptacle
> (i.e. via a backplane), this may be an issue for many disks
> in an enclosure (a 4-lane link will sustain about 24 drives
> at such speed, and that's not the market's max speed).
>
> Further on, the PCI buses will become a bottleneck and the
> CPU processing power might become one too, and for a box
> with 100 disks this may be noticeable, depending on the other
> architectural choices, components and their specs.
>
> HTH,
> //Jim

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
2012-06-12 16:20, Roch Bourbonnais wrote:
> Scrubs are run at very low priority and yield very quickly in the presence
> of other work, so I really would not expect a scrub to have any impact on
> any other type of storage activity. Resilvering will push forward more
> aggressively on what it has to do, but resilvering does not need to read
> any of the data blocks on the non-resilvering vdevs.

Thanks, I agree - and that's important to note, at least on the current versions of ZFS :)

What I meant to stress is that if a "scrub of one disk takes 5 hours" (however that measurement is made, such as by making a 1-disk pool with the same data distribution), then there are physical reasons why a 100-disk pool would probably take somewhat more than 5 hours to scrub; or at least there are bottlenecks that should be paid attention to in order to minimize such an increase in scrub time.

Also, yes, the presence of other pool activity would likely delay the scrub completion time, perhaps even more noticeably.

Thanks,
//Jim Klimov
The process should be scalable:
scrub all of the data on one disk using one disk's worth of IOPS;
scrub all of the data on N disks using N disks' worth of IOPS.

That will take ~ the same total time.

-r

Le 12 juin 2012 à 08:28, Jim Klimov a écrit :

> Thanks, I agree - and that's important to note, at least
> on the current versions of ZFS :)
>
> What I meant to stress is that if a "scrub of one disk takes
> 5 hours" (however that measurement is made, such as by
> making a 1-disk pool with the same data distribution), then
> there are physical reasons why a 100-disk pool would probably
> take somewhat more than 5 hours to scrub; or at least there
> are bottlenecks that should be paid attention to in order to
> minimize such an increase in scrub time.
>
> Also, yes, the presence of other pool activity would likely
> delay the scrub completion time, perhaps even more noticeably.
>
> Thanks,
> //Jim Klimov
2012-06-12 16:45, Roch Bourbonnais wrote:
> The process should be scalable:
> scrub all of the data on one disk using one disk's worth of IOPS;
> scrub all of the data on N disks using N disks' worth of IOPS.
>
> That will take ~ the same total time.

IF the uplink or processing power or some other bottleneck does not limit that (e.g. a single 4-lane SAS link to a daisy-chain of 100 or 200 disks would likely impose a bandwidth bottleneck).

I know that well-engineered servers spec'ed by a vendor/integrator for the customer's tasks and environment, such as those from Sun, are built to avoid such apparent bottlenecks. But people who construct their own storage should know of (and try to avoid) such possible problem-makers ;)

Thanks, Roch,
//Jim Klimov
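The two positions in this exchange fit in one line of arithmetic: a parallel scrub takes the per-disk time unless the disks' aggregate demand exceeds a shared link, in which case the link sets the pace. A hedged sketch (the 100 MB/s per-disk rate and ~2400 MB/s usable 4-lane link figure are assumptions carried over from earlier in the thread):

```python
def pool_scrub_hours(n_disks, per_disk_hours, per_disk_mb_s, shared_link_mb_s):
    """Scrub time for n disks scrubbed in parallel behind one shared link.

    If the disks together would exceed the link's throughput,
    the scrub stretches by that ratio; otherwise it stays at the
    single-disk time.
    """
    demand = n_disks * per_disk_mb_s               # aggregate disk throughput
    slowdown = max(1.0, demand / shared_link_mb_s)  # >1 only when link-bound
    return per_disk_hours * slowdown

# 6 disks at 100 MB/s behind a ~2400 MB/s link: not link-bound, still 5 h.
print(pool_scrub_hours(6, 5, 100, 2400))     # 5.0
# 100 such disks daisy-chained behind the same link: demand is 10000 MB/s,
# so the 5-hour per-disk scrub stretches to roughly 21 hours.
print(pool_scrub_hours(100, 5, 100, 2400))
```

This ignores CPU and PCI limits and any competing pool activity, so it is a lower bound on the stretch, not a prediction.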
On Jun 11, 2012, at 6:05 AM, Jim Klimov wrote:
> What the other posters said, plus: 100 disks is quite a lot
> of contention on the bus(es), so even if it is all parallel,
> the bus and CPU bottlenecks would raise the scrubbing time
> somewhat above the single-disk scrub time.

In general, this is not true for HDDs or modern CPUs. Modern systems are overprovisioned for bandwidth; in fact, bandwidth has been a poor design point for storage for a long time. Dave Patterson has some interesting observations on this, now 8 years dated:
http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf

SSDs tend to be a different story, and there is some interesting work being done in this area, both on the systems side and on the SSD side. This is where the fun work is progressing :-)
-- richard

--
ZFS and performance consulting
http://www.RichardElling.com