Jeffrey Johnson
2010-Mar-02 22:41 UTC
[zfs-discuss] Problems with raidz2 resilvering with a TON of files
Hi Folks,

We have put together a 25T ZFS raidz2 zpool (16x 2TB 5900 RPM 32MB-cache SATA 3.0Gb/s drives, with 2x LSI SAS3081E-R SAS RAID controllers presenting the drives as JBOD straight through to the backplane) with 2 hot spares on OpenSolaris snv_133. The pool contains roughly 800 million files which are all very small (~10-200k map tiles). We had a hiccup with one of the drives and the resilvering process was initiated ... the problem is that zpool status is currently estimating something like 650 hours. This estimate has varied from 400 to 1800 as it has run over the last couple of days, but it seems to have settled around 650 now. That is just WAY too long ... we fear that if the end user of this device ever has to replace a drive in the pool, it will take this long to rebuild again.

So, we are wondering if a) there is some way we can optimize or tune the pool to deal with this number of small files better and speed up the resilvering process, or b) some way we can tweak the resilvering code to handle this type of situation better.

One of our engineers is looking at setting up a VM on another machine and using dtrace to find out where the bottleneck is, but we thought we might have more luck on this list.

Thanks,
Jeff
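The numbers in the question already hint at the shape of the problem: a healing resilver walks block pointers, so with hundreds of millions of tiny files it behaves more like one random read per file than a sequential pass. A rough sanity check (all figures illustrative, derived only from the 800 million files and 650-hour estimate given above):

```python
# Rough feel for why 800 million small files hurts: at roughly one random
# read per small file, the 650-hour estimate implies this per-second rate.
# Numbers are taken from the question; nothing here is measured.
files = 800_000_000
est_hours = 650
files_per_second = files / (est_hours * 3600)
print(f"~{files_per_second:.0f} files/s implied by the 650-hour estimate")
```

A few hundred metadata-driven random reads per second is plausible for a wide vdev of slow SATA drives, which is consistent with the replies below being about seek time rather than bandwidth.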
Bob Friesenhahn
2010-Mar-02 22:54 UTC
[zfs-discuss] Problems with raidz2 resilvering with a TON of files
On Tue, 2 Mar 2010, Jeffrey Johnson wrote:

> We have put together a 25T ZFS raidz2 zpool (16x2TB 5900 RPM 32MB
> Cache SATA 3.0Gb/s drives with 2x LSI SAS3081E-R SAS RAID Controllers
> presenting the drives as JBOD straight thru to the backplane) with 2
> hot-spares on OpenSolaris snv_133. The pool contains roughly 800
> Million files which are all very small (~10-200k map tiles). We had a
> hiccup with one of the drives and the resilvering process was
> initiated ... the problem is that zpool status is estimating something
> like 650 hours currently. This estimate has varied from 400 to 1800 as

Oh, dear! 16 slow drives in one raidz2 vdev is just plain too many! It should be perhaps half that (at most) per raidz2 vdev. With the super-huge drives you will want to dial down the number of drives per vdev. The slow seek times and long rotational delay are a killer.

> it has run over the last couple of days, but it seems to have settled
> around 650 now. That is just WAY too long ... we fear that if the end
> user of this device ever has to replace a drive in the pool, it will
> take this long to rebuild again.

This fear is well founded. Regardless, it is wise to use 'iostat -x 30' to see if you have a slow drive in the mix. The drives should be pretty uniformly loaded.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
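Bob's advice to halve the vdev width can be sanity-checked with back-of-envelope math. A commonly cited rule of thumb for ZFS is that a raidz vdev delivers roughly the random-read IOPS of a single member drive, because every member must seek for each block; pool IOPS therefore scales with the number of vdevs, not the number of drives. A sketch under that assumption (the 50 IOPS figure is illustrative, not a measurement from this thread):

```python
# Back-of-envelope check on narrower vdevs, assuming a raidz vdev delivers
# roughly the random-read IOPS of one member drive. Illustrative numbers only.
DRIVE_IOPS = 50          # rough random IOPS for a slow 5900 RPM SATA drive
TOTAL_DRIVES = 16

def pool_random_iops(vdevs: int) -> int:
    """Approximate random-read IOPS for a pool of `vdevs` raidz2 vdevs."""
    return vdevs * DRIVE_IOPS

for vdevs in (1, 2, 4):
    print(f"{vdevs} vdev(s) of {TOTAL_DRIVES // vdevs} drives each: "
          f"~{pool_random_iops(vdevs)} random IOPS")
```

Under this model, splitting the same 16 drives into two 8-drive raidz2 vdevs roughly doubles the pool's random IOPS (at the cost of two more parity drives' worth of capacity), which is why the wide single vdev hurts a metadata-heavy resilver so much.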
Richard Elling
2010-Mar-02 22:58 UTC
[zfs-discuss] Problems with raidz2 resilvering with a TON of files
On Mar 2, 2010, at 2:41 PM, Jeffrey Johnson wrote:

> Hi Folks,
>
> We have put together a 25T ZFS raidz2 zpool (16x2TB 5900 RPM 32MB
> Cache SATA 3.0Gb/s drives with 2x LSI SAS3081E-R SAS RAID Controllers
> presenting the drives as JBOD straight thru to the backplane) with 2
> hot-spares on OpenSolaris snv_133. The pool contains roughly 800
> Million files which are all very small (~10-200k map tiles). We had a
> hiccup with one of the drives and the resilvering process was
> initiated ... the problem is that zpool status is estimating something
> like 650 hours currently. This estimate has varied from 400 to 1800 as
> it has run over the last couple of days, but it seems to have settled
> around 650 now. That is just WAY too long ... we fear that if the end
> user of this device ever has to replace a drive in the pool, it will
> take this long to rebuild again.
>
> So, we are wondering if a) there is some way we can optimize or tune
> the pool to deal with this number of small files better and speed up
> the resilvering process or b) some way we can tweak the resilvering
> code to handle for this type of situation better.
>
> One of our engineers is looking at setting up a VM on another machine
> and using dtrace to find out where the bottleneck is, but we thought
> we might have more luck on this list.

Those are slow drives, so it will take a while to resilver. To verify the I/O bottleneck, use iostat and observe the svc_t. If it is more than 5ms or so, then just be patient.

AFAIK, there is no current rebuild characterization effort or data. I have/had data from several years ago, but it is not useful today.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
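The svc_t check Richard describes can be automated. The sketch below parses Solaris-style `iostat -x` output (columns: device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w, %b) and flags drives whose service time exceeds his ~5ms rule of thumb; the sample output is fabricated for illustration, not taken from this thread:

```python
# Sketch of the svc_t check: parse `iostat -x` output and flag drives whose
# service time says "disk-bound". SAMPLE is fabricated, illustrative data.
SAMPLE = """\
device    r/s   w/s   kr/s  kw/s  wait  actv  svc_t  %w  %b
sd0      52.1   3.4  610.2  41.0   0.0   1.2   18.9   0  88
sd1      50.8   3.1  598.7  39.5   0.0   1.1   19.4   0  87
sd2      11.3   0.9  130.1   8.8   0.0   3.9  212.6   0  99
"""

THRESHOLD_MS = 5.0  # Richard's rule of thumb: above this, be patient

def parse_svc_t(text):
    """Return {device: svc_t in ms} from iostat -x style output."""
    rows = {}
    for line in text.splitlines()[1:]:     # skip the header row
        fields = line.split()
        rows[fields[0]] = float(fields[7])  # svc_t is the 8th column
    return rows

for dev, svc_t in parse_svc_t(SAMPLE).items():
    verdict = "disk-bound" if svc_t > THRESHOLD_MS else "ok"
    print(f"{dev}: svc_t={svc_t} ms ({verdict})")
```

In the fabricated sample, sd2's service time is an order of magnitude above its peers, which is also the "one slow drive in the mix" pattern Bob suggested watching for: the drives should be loaded roughly uniformly.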
Erik Trimble
2010-Mar-02 23:26 UTC
[zfs-discuss] Problems with raidz2 resilvering with a TON of files
Richard Elling wrote:

> On Mar 2, 2010, at 2:41 PM, Jeffrey Johnson wrote:
>
>> Hi Folks,
>>
>> We have put together a 25T ZFS raidz2 zpool (16x2TB 5900 RPM 32MB
>> Cache SATA 3.0Gb/s drives with 2x LSI SAS3081E-R SAS RAID Controllers
>> presenting the drives as JBOD straight thru to the backplane) with 2
>> hot-spares on OpenSolaris snv_133. The pool contains roughly 800
>> Million files which are all very small (~10-200k map tiles). [...]
>
> Those are slow drives, so it will take a while to resilver. To verify the
> I/O bottleneck, use iostat and observe the svc_t. If it is more than 5ms
> or so, then just be patient.
>
> AFAIK, there is no current rebuild characterization effort or data. I have/had
> data from several years ago, but it is not useful today.
> -- richard

I'm still assuming that resilvering a disk under RAIDZ[123] which contains a large number of very small files is the Worst Case scenario for resilver rates, correct? Or has something significant changed recently?

That, and the 2TB/5900RPM drives are /horribly/ slow. They max out at 50 IOPS on a good day, and can't even do sustained streaming write/read above 100MB/s. I would be surprised if they can even make 10MB/s doing typical random I/O, and it's going to be even worse doing small random I/O chunks. Which boils down to me considering you lucky if you get 1-2 MB/s performance out of them. An estimate of 650 hours to do 2TB = a bit under 1MB/s.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
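Erik's closing arithmetic checks out; a minimal verification of the 650-hour/2TB figure (using the decimal terabyte, as drive vendors count it):

```python
# Checking the closing arithmetic: 650 hours to resilver a 2 TB drive.
drive_bytes = 2 * 10**12              # 2 TB, decimal terabytes
hours = 650
rate_mb_s = drive_bytes / (hours * 3600) / 1e6
print(f"~{rate_mb_s:.2f} MB/s")      # indeed "a bit under 1MB/s"
```

That lands squarely in Erik's predicted 1-2 MB/s band for small random I/O on these drives.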