I've read that it's supposed to go at full speed, i.e. as fast as
possible. I'm doing a disk replace, and what zpool reports kind of
surprises me: the resilver goes on at 1.6MB/s. Did resilvering get
throttled at some point between the builds, or is my ATA controller
having bigger issues?

Thanks,
-mg
Hello Mario,

Wednesday, May 9, 2007, 5:56:18 PM, you wrote:

MG> I've read that it's supposed to go at full speed, i.e. as fast as
MG> possible. I'm doing a disk replace, and what zpool reports kind of
MG> surprises me: the resilver goes on at 1.6MB/s. Did resilvering get
MG> throttled at some point between the builds, or is my ATA controller
MG> having bigger issues?

Lot of small files, perhaps? What kind of protection have you used?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Robert Milkowski wrote:
> Lot of small files, perhaps? What kind of protection have you used?

Good question. Remember that resilvering is done in time order and
from the top-level metadata down, not by sequentially blasting bits.
Jeff Bonwick describes this as top-down resilvering:
http://blogs.sun.com/bonwick/entry/smokin_mirrors

From an MTTR and performance perspective, this means that ZFS recovery
time is a function of the amount of space used, where it is located
(!), and the validity of the surviving or regenerated data. The big
win is the amount of space used, since most file systems are not full.
 -- richard
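To make the top-down order concrete, here is a minimal sketch of such
a traversal, assuming a hypothetical block-pointer type and helper
functions (this is illustrative pseudocode in C, not ZFS source):

    /* Illustrative sketch only -- not actual ZFS code.  The blkptr
     * type, queue, and all helpers are invented for this example. */
    struct blkptr;
    struct queue;                      /* FIFO of block pointers */

    extern struct blkptr *root_blkptr(void);           /* uberblock */
    extern int            nchildren(struct blkptr *bp);
    extern struct blkptr *child(struct blkptr *bp, int i);
    extern void           read_and_repair(struct blkptr *bp);
    extern struct queue  *queue_create(void);
    extern void           enqueue(struct queue *q, struct blkptr *bp);
    extern struct blkptr *dequeue(struct queue *q);
    extern int            queue_empty(struct queue *q);

    /*
     * Resilver by walking the block tree from the uberblock down.
     * Every block is repaired before any block it points to, so the
     * most precious blocks (whose loss would orphan the most data)
     * are safe earliest.  Only allocated blocks are ever visited,
     * which is why resilver time scales with space used and layout,
     * not with raw disk size.
     */
    void resilver_top_down(void)
    {
            struct queue *q = queue_create();
            enqueue(q, root_blkptr());
            while (!queue_empty(q)) {
                    struct blkptr *bp = dequeue(q);
                    read_and_repair(bp);
                    for (int i = 0; i < nchildren(bp); i++)
                            enqueue(q, child(bp, i));
            }
    }

Note the trade-off this implies: the traversal follows pointers, so
pools full of small, scattered blocks turn the resilver into random
I/O rather than a streaming copy.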
Hello Richard,

Wednesday, May 9, 2007, 9:10:22 PM, you wrote:

RE> Good question. Remember that resilvering is done in time order and
RE> from the top-level metadata down, not by sequentially blasting
RE> bits. Jeff Bonwick describes this as top-down resilvering:
RE> http://blogs.sun.com/bonwick/entry/smokin_mirrors
RE>
RE> From an MTTR and performance perspective, this means that ZFS
RE> recovery time is a function of the amount of space used, where it
RE> is located (!), and the validity of the surviving or regenerated
RE> data. The big win is the amount of space used, since most file
RE> systems are not full.

Nevertheless, with a lot of small files written over many months (some
since removed), resilvering in raidz2 is SLOOOW, even when there's no
other activity in the pool (7-10 days on an x4500 with 11 disks in a
raidz2 group). Either that's inherent in such environments or
something else is wrong.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
> Good question. Remember that resilvering is done in time order and
> from the top-level metadata down, not by sequentially blasting bits.
> Jeff Bonwick describes this as top-down resilvering:
> http://blogs.sun.com/bonwick/entry/smokin_mirrors
>
> From an MTTR and performance perspective, this means that ZFS
> recovery time is a function of the amount of space used, where it is
> located (!), and the validity of the surviving or regenerated data.
> The big win is the amount of space used, since most file systems are
> not full.
> -- richard

It seems to me that once you copy the metadata, you can indeed copy
all live data sequentially. Given that the vast majority of disk
blocks in use will typically contain data, this is a winning strategy
from a performance point of view, and it still allows you to retrieve
a fair bit of data in case of a second disk failure (checksumming will
catch the case where good metadata points to an as-yet-uncopied data
block). If the amount of live data is > 50% of the disk space, you may
as well do a straight disk copy, perhaps skipping over already-copied
metadata.

Not only that, you can even start using the disk being resilvered
right away for writes. A new write will go either to a) an
already-copied block or b) an as-yet-uncopied block. In case a) there
is nothing more to do. In case b) the copied-from block will have the
new data, so in both cases the right thing happens. Any potential
window between reading a copied-from block and writing to the
copied-to block can be closed with careful coding/locking. (A rough
sketch of this write-path logic follows below.)

If a second disk fails during copying, the current strategy doesn't
buy you much in almost any case. You really don't want to go through a
zillion files looking for survivors. If you have a backup, you will
restore from that rather than look through the debris. Not to mention
that the window for a potentially catastrophic failure grows much
larger if resilvering is significantly slower.

Comments?
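Here is a minimal sketch of the write-path case analysis above. The
cursor-based scheme and every name in it are hypothetical -- this is
Bakul's thought experiment made concrete, not anything ZFS implements:

    /* Hypothetical sketch -- not ZFS code.  A single cursor tracks
     * how far the sequential copy has progressed.  Writes behind the
     * cursor (case a: already copied) must also go to the new disk;
     * writes ahead of it (case b) will be picked up by the copier
     * when it gets there. */
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    static uint64_t        resilver_cursor;  /* first uncopied LBA */
    static pthread_mutex_t cursor_lock = PTHREAD_MUTEX_INITIALIZER;

    extern void disk_write(int disk, uint64_t lba,
                           const void *buf, size_t len);
    #define OLD_DISK 0
    #define NEW_DISK 1

    /* An application write arriving mid-resilver. */
    void write_during_resilver(uint64_t lba, const void *buf,
                               size_t len)
    {
            /* Hold the lock across the check and the writes so the
             * copier cannot pass this LBA in between -- the "window"
             * closed by careful locking.  (A real implementation
             * would use finer-grained range locks.) */
            pthread_mutex_lock(&cursor_lock);
            int already_copied = (lba < resilver_cursor);
            disk_write(OLD_DISK, lba, buf, len); /* case b) covered */
            if (already_copied)
                    disk_write(NEW_DISK, lba, buf, len); /* case a) */
            pthread_mutex_unlock(&cursor_lock);
    }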
On 9-May-07, at 3:44 PM, Bakul Shah wrote:

> It seems to me that once you copy the metadata, you can indeed copy
> all live data sequentially.

I don't see this, given the top-down strategy. For instance, if I
understand the transactional update process, you can't commit the
metadata until the data is in place.

Can you explain your reasoning in more detail?

> Not only that, you can even start using the disk being resilvered
> right away for writes. A new write will go either to a) an
> already-copied block

How can that be, under a COW régime?

--Toby
> Lot of small files, perhaps? What kind of protection have you used?

No protection, and about as many small files as a full distro install
has, plus some more source code for a few libs. It's just 28GB that
needs to be resilvered, yet it takes about 6 hours at this abysmal
speed. At first I thought it was intentional, to keep the system
responsive, but then I read something about full speed.

The replacement disk is on the same IDE channel as the one being
replaced, so performance won't be great, but 1.6MB/s is just poor. I
suspect it's a general problem with my disk subsystem, because when I
migrated my data from the NTFS drives, I also got write speeds in the
1.2-1.6MB/s range. I blamed the NTFS driver at first, but that weird
writing behaviour I posted about earlier, plus this poor resilvering
speed, makes me believe something else is wrong.

I could configure ata.conf with the driveX_block_factor parameters as
described in man ata (the block factor defaults to 1), but doing so
puts me in a boot loop; there isn't even an ata driver instance, just
pci-ide, and putting those parameters in its config doesn't do
anything.

Thanks for any help.
-mg
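For concreteness, an ata.conf entry of the kind described would look
roughly like the following. This is a sketch only: the property names
follow the ata(7D) man page, but the exact path and safe block-factor
values depend on the platform and controller, which is exactly where
this can go wrong (as the boot loop above shows):

    # /platform/i86pc/kernel/drv/ata.conf (sketch, per ata(7D))
    # Block factor controls multi-sector transfers per interrupt;
    # values are illustrative, and an unsupported value can leave
    # the system unbootable.
    drive0_block_factor=0x10;
    drive1_block_factor=0x10;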
Oh god, I found it. So freakin' bizarre. I'm now pushing 27MB/s on
average instead of the meager 1.6MB/s. That's more like it.

This is what happened: back in the day when I bought my first SATA
drive, incidentally a WD Raptor, I wanted Windows to boot off it,
including bootloader placement and everything. That forced me to
disable the IDE drives in the BIOS, because otherwise the MBR,
bootloader and whatever else got placed on the first IDE drive. This
setup works just fine in Windows. In Solaris, however, it doesn't. It
appears that if the drives are not enabled in the BIOS, things like
IDE block mode and IDE prefetching aren't activated for them. Windows'
ATA driver seems to enable these on its own; Solaris' ATA driver
doesn't.

Should I file a bug about this? Not that it's common for IDE drives to
be intentionally disabled, but this issue almost made me tear my hair
out.

-mg
> > It seems to me that once you copy the metadata, you can indeed
> > copy all live data sequentially.
>
> I don't see this, given the top-down strategy. For instance, if I
> understand the transactional update process, you can't commit the
> metadata until the data is in place.
>
> Can you explain your reasoning in more detail?

First, realize that this is just a thought experiment -- I haven't
read much source code in any detail as yet, and it is entirely likely
that what I am suggesting cannot work, or that there is no need for
it! With that caveat....

http://blogs.sun.com/bonwick/entry/smokin_mirrors talks about top-down
resilvering: copy the root blocks first (for example the uberblock),
then the blocks they point to, and so on [1]. A major goal here is to
minimize data loss in case of a second failure. Completely losing a
metadata block means you can't access anything it points to, so
metadata blocks are far more precious than data blocks. This is
different from a normal update transaction, where the copy-on-write
proceeds bottom up -- which is what you are talking about. A major
goal for a normal update is to ensure that a consistent filesystem
structure is seen by all at all times.

All I was suggesting is that once all the metadata is copied (or
"resilvered"), you switch to sequential copying to maximize
performance. This does make checking the validity of a data block more
complicated. So instead of copying the data of file1 and then file2
and so on, just copy blocks in the most efficient order, save their
checksums, and periodically validate a whole bunch (sketched below).
In fact, since the metadata is read first, you can roughly figure out
which metadata blocks will be needed when, to check data block
validity (because you know where the data blocks are stored).

> > Not only that, you can even start using the disk being resilvered
> > right away for writes. A new write will go either to a) an
> > already-copied block
>
> How can that be, under a COW régime?

I was talking about resilvering, not a normal update. Copy-on-write
happens only for a normal update. I was speculating that you can do
normal updates during resilvering. Not sure if this is clear to
anyone!

-- bakul

[1] Top-down resilvering seems very much like a copying garbage
collector. That similarity makes me wonder if the physical layout
could be rearranged in some way for more efficient access to data --
the idea is to resilver and compactify at the same time on one of the
mirrors, then make it the master and resilver the other mirrors.
Nah... probably not worth the hassle. [Again, I suspect no one else
understands what I am talking about :-)]
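A minimal sketch of that second phase -- sequential copy with deferred
batch validation. Everything here is hypothetical (the helpers, the
checksum lookup from the already-resilvered metadata, the space-map
query); it is Bakul's proposal, not an actual ZFS mechanism:

    /* Hypothetical sketch of a sequential-copy resilver with
     * deferred validation -- not ZFS code.  All helpers invented. */
    #include <stdint.h>

    #define BATCH    1024
    #define BLKSIZE  512

    struct pending { uint64_t lba; uint64_t cksum; };

    extern int      read_block(int disk, uint64_t lba, void *buf);
    extern void     write_block(int disk, uint64_t lba,
                                const void *buf);
    extern uint64_t checksum(const void *buf);
    extern uint64_t expected_cksum(uint64_t lba); /* from metadata */
    extern int      block_in_use(uint64_t lba);   /* from space maps */
    extern void     repair_from_redundancy(uint64_t lba);

    void sequential_resilver(int old_disk, int new_disk,
                             uint64_t nblocks)
    {
            struct pending batch[BATCH];
            char buf[BLKSIZE];
            int n = 0;

            for (uint64_t lba = 0; lba < nblocks; lba++) {
                    if (!block_in_use(lba))
                            continue;       /* copy live blocks only */
                    read_block(old_disk, lba, buf);
                    write_block(new_disk, lba, buf);
                    batch[n].lba = lba;
                    batch[n].cksum = checksum(buf);
                    if (++n == BATCH) {
                            /* Validate a whole bunch at once so the
                             * copy itself stays purely sequential. */
                            for (int i = 0; i < n; i++)
                                    if (batch[i].cksum !=
                                        expected_cksum(batch[i].lba))
                                            repair_from_redundancy(
                                                batch[i].lba);
                            n = 0;
                    }
            }
            for (int i = 0; i < n; i++)     /* leftover partial batch */
                    if (batch[i].cksum != expected_cksum(batch[i].lba))
                            repair_from_redundancy(batch[i].lba);
    }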
On 10 May, 2007 - Bakul Shah sent me these 3,2K bytes:

> [1] Top-down resilvering seems very much like a copying garbage
> collector. That similarity makes me wonder if the physical layout
> could be rearranged in some way for more efficient access to data --
> the idea is to resilver and compactify at the same time on one of
> the mirrors, then make it the master and resilver the other mirrors.
> Nah... probably not worth the hassle. [Again, I suspect no one else
> understands what I am talking about :-)]

Doing a mirror rebuild and a "defrag" at the same time sounds
dangerous.. How about a separate "defrag" tool that you can run
whenever you feel like it instead?

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se