I just witnessed a resilver that took 4h for 27GB of data. The setup is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in size. No checksum errors.

It seems like an exorbitantly long time. The other 5 disks in the stripe with the replaced disk were at 90% busy and ~150 IO/s each during the resilver. Does this seem unusual to anyone else? Could it be due to heavy fragmentation, or do I have a disk in the stripe going bad? Post-resilver, no disk is above 30% util or noticeably higher than any other disk.

Thank you in advance. (kernel is snv123)

-J

Sent via iPhone
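P.S. In case anyone wants to watch the same thing, the per-disk busy/IOPS numbers above are the kind of thing you get from watching the pool during the resilver, along these lines ("tank" is a stand-in for the real pool name):

  # zpool status tank    # resilver progress and scan status
  # iostat -xn 5         # per-device %b (busy) and r/s + w/s (IOPS), 5s intervals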
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jason J. W. Williams
>
> I just witnessed a resilver that took 4h for 27GB of data. The setup
> is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in
> size. No checksum errors.

27G on a 6-disk raidz2 means approx 6.75G per disk. Ideally, the disk could write 7G = 56 Gbit in a couple of minutes if it were all sequential and there were no other activity in the system. So you're right to suspect something is suboptimal, but the root cause is inefficient resilvering code in ZFS, specifically for raidzN. The resilver code spends a *lot* of time seeking, because it's not optimized by disk layout. This may change some day, but not in the near future. Mirrors don't suffer the same effect; at least, if they do, it's far less dramatic.

For now, all you can do is: (a) factor this into your decision to use mirror versus raidz, (b) ensure no snapshots and minimal IO during the resilver, and (c) if you opt for raidz, keep the number of disks in a raidz to a minimum. It is preferable to use 3 vdevs each of 7-disk raidz instead of a 21-disk raidz3.

Your setup of 3x raidz2 is pretty reasonable, and a 4h resilver, although slow, is successful, which is more than you could say if you had a 21-disk raidz3.
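To make that layout advice concrete, here is a sketch of the preferred pool (device names are hypothetical):

  # preferred: three 7-disk raidz vdevs striped in one pool;
  # a resilver only has to read from the 6 surviving disks of one vdev
  zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

  # not preferred: a single 21-disk raidz3 vdev (all 21 disks after
  # "raidz3"), where every resilver must read from all 20 survivors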
----- Original Message -----

> I just witnessed a resilver that took 4h for 27GB of data. The setup
> is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in
> size. No checksum errors.
>
> It seems like an exorbitantly long time. The other 5 disks in the
> stripe with the replaced disk were at 90% busy and ~150 IO/s each
> during the resilver. Does this seem unusual to anyone else? Could it
> be due to heavy fragmentation, or do I have a disk in the stripe going
> bad? Post-resilver, no disk is above 30% util or noticeably higher
> than any other disk.
>
> Thank you in advance. (kernel is snv123)

It surely seems a long time for 27 gigs. Scrub takes its time, but on this 50TB setup with ~29TB currently used, on WD Green drives (yeah, I know they're bad, but I didn't know that when I installed the box, and they have worked flawlessly for a year or so), it's nothing comparable to what you're reporting:

  scrub: scrub completed after 47h57m with 0 errors on Fri Sep 3 16:57:26 2010

Also, snv123 is quite old; is upgrading to 134 an option?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Sun, 26 Sep 2010, Edward Ned Harvey wrote:

> 27G on a 6-disk raidz2 means approx 6.75G per disk. Ideally, the disk
> could write 7G = 56 Gbit in a couple of minutes if it were all
> sequential and there were no other activity in the system. So you're
> right to suspect something is suboptimal, but the root cause is
> inefficient resilvering code in ZFS, specifically for raidzN. The
> resilver code spends a *lot* of time seeking, because it's not
> optimized by disk layout. This may change some day, but not in the
> near future.

Part of the problem is that the ZFS designers decided that the filesystems should remain up and usable during a resilver. Without this requirement things would be a lot easier. For example, we could just run some utility and wait many hours (perhaps fewer hours than a ZFS resilver) before the filesystems are allowed to be usable. Few of us want to return to that scenario.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Upgrading is definitely an option. What is the current snv favorite for ZFS stability? I apologize; with all the Oracle/Sun changes I haven't been paying as close attention to bug reports on zfs-discuss as I used to.

-J

Sent via iPhone

On Sep 26, 2010, at 10:22, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:

> It surely seems a long time for 27 gigs. Scrub takes its time, but on
> this 50TB setup with ~29TB currently used, on WD Green drives, it's
> nothing comparable to what you're reporting:
>
>   scrub: scrub completed after 47h57m with 0 errors on Fri Sep 3 16:57:26 2010
>
> Also, snv123 is quite old; is upgrading to 134 an option?
On Sep 26, 2010, at 11:03 AM, Jason J. W. Williams wrote:

> Upgrading is definitely an option. What is the current snv favorite
> for ZFS stability? I apologize; with all the Oracle/Sun changes I
> haven't been paying as close attention to bug reports on zfs-discuss
> as I used to.

OpenIndiana b147 is the latest binary release, and it also includes the fix for CR6494473, "ZFS needs a way to slow down resilvering":

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
  http://www.openindiana.org

-- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
richard at nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
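P.S. On builds new enough to carry that fix, the throttle is exposed as kernel tunables rather than a pool property. A sketch, assuming a post-fix build (tunable name from the updated scan code; values are just examples):

  # show the current resilver inter-I/O delay, in clock ticks
  echo zfs_resilver_delay/D | mdb -k

  # set the delay to 0 so resilver I/O is not throttled
  echo zfs_resilver_delay/W0t0 | mdb -kw

  # persistent equivalent, in /etc/system:
  #   set zfs:zfs_resilver_delay = 0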
> > Upgrading is definitely an option. What is the current snv favorite
> > for ZFS stability? I apologize; with all the Oracle/Sun changes I
> > haven't been paying as close attention to bug reports on zfs-discuss
> > as I used to.
>
> OpenIndiana b147 is the latest binary release, and it also includes
> the fix for CR6494473, "ZFS needs a way to slow down resilvering":
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> http://www.openindiana.org

Are you sure upgrading to OI is safe at this point? 134 is stable unless you start fiddling with dedup, and OI is hardly tested. For a production setup, I'd recommend 134.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Sep 26, 2010, at 1:16 PM, Roy Sigurd Karlsbakk wrote:

> Are you sure upgrading to OI is safe at this point? 134 is stable
> unless you start fiddling with dedup, and OI is hardly tested. For a
> production setup, I'd recommend 134.

For a production setup? For production I'd recommend something that is supported, preferably NexentaStor 3 (which is b134 + important ZFS fixes :-)

-- richard
134 it is. This is an OpenSolaris rig that's going to be replaced within the next 60 days, so I just need to get it to something that won't throw false checksum errors like the 120-123 builds do, and has decent rebuild times.

Future boxes will be NexentaStor.

Thank you guys. :)

-J

On Sun, Sep 26, 2010 at 2:21 PM, Richard Elling <Richard at nexenta.com> wrote:

> For a production setup? For production I'd recommend something that
> is supported, preferably NexentaStor 3 (which is b134 + important ZFS
> fixes :-)
Err... I meant Nexenta Core.

-J

On Mon, Sep 27, 2010 at 12:02 PM, Jason J. W. Williams <jasonjwwilliams at gmail.com> wrote:

> Future boxes will be NexentaStor.
Dear Richard,
I am a Nexenta user, and now I am hitting the same problem of a resilver taking too long. I tried to find a solution from the link you posted, which suggests "zfs set resilver_speed=10% pool_name", but Nexenta does not have a resilver_speed property. How can I solve this on Nexenta? Please advise. Thanks!
Dear Richard,
How can I get the important ZFS fixes onto NexentaStor? My current version of NexentaStor is v3.0.4 Enterprise.
On Dec 21, 2010, at 8:18 AM, Jackson Wang wrote:

> Dear Richard,
> I am a Nexenta user, and now I am hitting the same problem of a
> resilver taking too long. How can I solve this on Nexenta?

In general, resilver will take as long as needed. If your resilver is going very, very slow, then there could be other issues causing the slowness. Has the system been logging error messages related to the I/O subsystem during the resilver?

-- richard
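P.S. For example, with stock Solaris tooling (nothing Nexenta-specific assumed here):

  # per-device soft/hard/transport error counters since boot
  iostat -En

  # FMA error telemetry (ereports) logged by the system
  fmdump -e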
Dear Richard,
Thanks for your reply.

Actually there is no other disk/controller fault in this system. A NexentaStor engineer, Andrew, added a line to /kernel/drv/sd.conf on the NexentaStor system, "allow-bus-device-reset=0", and then the resilver speed went way up. Before the parameter was added, the system had been resilvering for more than 2 days without completing. After the engineer added that line and rebooted the system, the resilver took only about 10 hours to complete. Do you know what happened there? Thanks!!

On Sun, Dec 26, 2010 at 1:24 PM, Richard Elling <richard.elling at gmail.com> wrote:

> In general, resilver will take as long as needed. If your resilver is
> going very, very slow, then there could be other issues causing the
> slowness. Has the system been logging error messages related to the
> I/O subsystem during the resilver?

--
InfoTech Technology Corp.
http://www.infowize.com.tw

Jackson Wang
M: 0916163480
T: 02-26791430 / 03-5834432 / 070-1020-9886
F: 0940-472248

Tech Supp: support at infowize.com.tw
Sales Supp: sales at infowize.com.tw
Do you have SSDs in the pool? Which ones, and any errors on those?

On 26 Dec 2010 13:35, "Jackson Wang" <jcwang at infowize.com.tw> wrote:

> Actually there is no other disk/controller fault in this system. A
> NexentaStor engineer, Andrew, added "allow-bus-device-reset=0" to
> /kernel/drv/sd.conf, and then the resilver speed went way up.
On Dec 26, 2010, at 5:33 AM, Jackson Wang wrote:

> Actually there is no other disk/controller fault in this system. A
> NexentaStor engineer, Andrew, added "allow-bus-device-reset=0" to
> /kernel/drv/sd.conf, and then the resilver speed went way up. Before
> the parameter was added, the system had been resilvering for more
> than 2 days without completing. After the engineer added that line
> and rebooted the system, the resilver took only about 10 hours to
> complete. Do you know what happened there? Thanks!!

This occurs when a device is misbehaving and not responding to commands. When a device does not respond to commands for more than 60 seconds, the sd driver will issue a bus reset, which affects other devices on the "bus." This can happen regardless of the I/O workload. The workaround disables the bus resets, as described in the sd man page.

-- richard
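P.S. For reference, the workaround in driver.conf(4) syntax: a property line appended to /kernel/drv/sd.conf (note the trailing semicolon; a reboot, as done above, makes it take effect):

  # disable sd's bus device reset on unresponsive devices
  allow-bus-device-reset=0;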