Hi All.

The ZFS documentation says ZFS schedules its I/O in such a way that it manages
to saturate a single disk's bandwidth using enough concurrent 128K I/Os. The
number of concurrent I/Os is decided by vq_max_pending. The default value for
vq_max_pending is 35.

We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
record size is set to 128k. When we read/write a 128K record, it issues a
128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

We need to saturate all three data disks' bandwidth in the raid-z group. Is it
required to set the vq_max_pending value to 35*3=105?

Thanks
Manoj Nayak
Manoj Nayak wrote:
> We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The
> ZFS record size is set to 128k. When we read/write a 128K record, it issues
> a 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

Yes, this is how it works for a read without errors. For a write, you
should see 4 writes, each 128 KBytes/3. Writes may also be coalesced,
so you may see larger physical writes.

> We need to saturate all three data disks' bandwidth in the raid-z group.
> Is it required to set the vq_max_pending value to 35*3=105?

No. vq_max_pending applies to each vdev. Use iostat to see what the
device load is. For the commonly used Hitachi 500 GByte disks in a
Thumper, the read media bandwidth is 31-64.8 MBytes/s. Writes will be
about 80% of reads, or 24.8-51.8 MBytes/s. In a Thumper, the disk
bandwidth will be the limiting factor for the hardware.
 -- richard
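As a concrete starting point (the interval below is arbitrary), the per-device
load can be watched with:

    # iostat -xnz 5

The actv column is the average number of commands outstanding at the device,
asvc_t is the average service time, and %b is percent busy. If actv is already
sitting near vq_max_pending and asvc_t keeps climbing, the queue is full and
pushing more concurrent requests will not buy additional bandwidth.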
> Manoj Nayak wrote:
>> We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The
>> ZFS record size is set to 128k. When we read/write a 128K record, it
>> issues a 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.
>
> Yes, this is how it works for a read without errors. For a write, you
> should see 4 writes, each 128 KBytes/3. Writes may also be coalesced,
> so you may see larger physical writes.
>
>> We need to saturate all three data disks' bandwidth in the raid-z group.
>> Is it required to set the vq_max_pending value to 35*3=105?
>
> No. vq_max_pending applies to each vdev.

A 4-disk raidz group issues a 128k/3 = 42.6k I/O to each individual data
disk. If 35 concurrent 128k I/Os are enough to saturate a disk (vdev), then
35*3 = 105 concurrent 42k I/Os will be required to saturate the same disk.

Thanks
Manoj Nayak

> Use iostat to see what the device load is. For the commonly used Hitachi
> 500 GByte disks in a Thumper, the read media bandwidth is 31-64.8 MBytes/s.
> Writes will be about 80% of reads, or 24.8-51.8 MBytes/s. In a Thumper,
> the disk bandwidth will be the limiting factor for the hardware.
>  -- richard
manoj nayak wrote:
> A 4-disk raidz group issues a 128k/3 = 42.6k I/O to each individual data
> disk. If 35 concurrent 128k I/Os are enough to saturate a disk (vdev),
> then 35*3 = 105 concurrent 42k I/Os will be required to saturate the
> same disk.

ZFS doesn't know anything about disk saturation. It will send up to
vq_max_pending I/O requests per vdev (usually a vdev is a disk). It
will try to keep vq_max_pending I/O requests queued to the vdev.

For writes, you should see them become coalesced, so rather than
sending 3 42.6 kByte write requests to a vdev, you might see one
128 kByte write request.

In other words, ZFS has an I/O scheduler which is responsible for
sending I/O requests to vdevs.
 -- richard
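If you do want to experiment with that limit, the per-vdev queue depth was
exposed at the time through the zfs_vdev_max_pending kernel tunable rather
than the internal vq_max_pending field itself -- treat the exact name and
availability as an assumption to verify on your build. A minimal sketch:

    * /etc/system (takes effect at next boot); the value 16 is only an example
    set zfs:zfs_vdev_max_pending = 16

    # or, on a live system, with mdb (the 0t prefix means decimal)
    echo zfs_vdev_max_pending/W0t16 | mdb -kw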
----- Original Message -----
From: "Richard Elling" <Richard.Elling at Sun.COM>
To: "manoj nayak" <Manoj.Nayak at Sun.COM>
Cc: <zfs-discuss at opensolaris.org>
Sent: Wednesday, January 23, 2008 7:20 AM
Subject: Re: [zfs-discuss] ZFS vq_max_pending value ?

> ZFS doesn't know anything about disk saturation. It will send up to
> vq_max_pending I/O requests per vdev (usually a vdev is a disk). It
> will try to keep vq_max_pending I/O requests queued to the vdev.

I can see the "avg pending I/Os" hitting my vq_max_pending limit, so
raising the limit would be a good thing. I think it is due to the many
42k read I/Os to the individual disks in the 4-disk raidz group.

Thanks
Manoj Nayak

> For writes, you should see them become coalesced, so rather than
> sending 3 42.6 kByte write requests to a vdev, you might see one
> 128 kByte write request.
>
> In other words, ZFS has an I/O scheduler which is responsible for
> sending I/O requests to vdevs.
>  -- richard
manoj nayak wrote:
> I can see the "avg pending I/Os" hitting my vq_max_pending limit, so
> raising the limit would be a good thing. I think it is due to the many
> 42k read I/Os to the individual disks in the 4-disk raidz group.

You're dealing with a queue here. iostat's average pending I/Os
represents the queue depth. Some devices can't handle a large queue.
In any case, queuing theory applies.

Note that for reads, the disk will likely have a track cache, so it is
not a good assumption that a read I/O will require a media access.
 -- richard
> You're dealing with a queue here. iostat's average pending I/Os
> represents the queue depth. Some devices can't handle a large queue.
> In any case, queuing theory applies.
>
> Note that for reads, the disk will likely have a track cache, so it is
> not a good assumption that a read I/O will require a media access.

My workload issues around 5000 MB of read I/O, and iopattern says around
55% of the I/Os are random in nature. I don't know how much prefetching
through the track cache is going to help here. Probably I can try
disabling the vdev cache through "set 'zfs_vdev_cache_max' 1".

Thanks
Manoj Nayak
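For reference, that setting is normally placed in /etc/system; a minimal
sketch, assuming the tunable keeps that name on this build:

    * /etc/system -- reads smaller than zfs_vdev_cache_max are inflated into
    * larger vdev-cache reads; setting it to 1 effectively disables that
    * read-ahead behaviour
    set zfs:zfs_vdev_cache_max = 1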
Manoj Nayak writes:
> We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The
> ZFS record size is set to 128k. When we read/write a 128K record, it
> issues a 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.
>
> We need to saturate all three data disks' bandwidth in the raid-z group.
> Is it required to set the vq_max_pending value to 35*3=105?

Nope.

Once a disk controller is working on 35 requests, we don't expect to
get any more out of it by queueing more requests, and we might even
confuse the firmware and get less.

Now for an array controller and a vdev fronting for a large number of
disks, 35 might be a low number that does not allow full throughput.
Rather than tuning 35 up, we suggest splitting devices into smaller
LUNs, since each LUN is given a 35-deep queue.

Tuning vq_max_pending down helps read and synchronous write (ZIL)
latency. Today the preferred way to help ZIL latency is to use a
Separate Intent Log.

-r
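A separate intent log is attached with zpool; a minimal sketch (the pool and
device names here are made up, and slog support needs a recent enough pool
version):

    # add a dedicated, low-latency log device to an existing pool
    zpool add tank log c4t0d0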
Roch - PAE wrote:
> Once a disk controller is working on 35 requests, we don't expect to
> get any more out of it by queueing more requests, and we might even
> confuse the firmware and get less.
>
> Now for an array controller and a vdev fronting for a large number of
> disks, 35 might be a low number that does not allow full throughput.
> Rather than tuning 35 up, we suggest splitting devices into smaller
> LUNs, since each LUN is given a 35-deep queue.

It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS as
a single device (vdev), and ZFS assigns a vq_max_pending value of 35 to
this vdev. To get higher throughput, do I need to do one of the following?

1. Reduce the number of disks in the raidz group from four to three, so
that the same pending queue of 35 is available for fewer disks.

Or

2. Create slices out of the physical disks and build the raidz group out
of four slices of a physical disk, so that the same pending queue of 35
is available to four slices of one physical disk.

Thanks
Manoj Nayak

> Tuning vq_max_pending down helps read and synchronous write (ZIL)
> latency. Today the preferred way to help ZIL latency is to use a
> Separate Intent Log.
>
> -r
On Jan 23, 2008 6:36 AM, Manoj Nayak <Manoj.Nayak at sun.com> wrote:
> It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS
> as a single device (vdev), and ZFS assigns a vq_max_pending value of 35
> to this vdev. To get higher throughput, do I need to do one of the
> following?
>
> 1. Reduce the number of disks in the raidz group from four to three, so
> that the same pending queue of 35 is available for fewer disks.
>
> Or
>
> 2. Create slices out of the physical disks and build the raidz group out
> of four slices of a physical disk, so that the same pending queue of 35
> is available to four slices of one physical disk.

Or switch to mirrors instead, if you can live with the capacity hit.
Mirrors will have much better random read performance than raidz, since
they don't need to read from every disk to make sure the checksum matches.

Will
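With the same four disks, that layout would look something like the
following (device names are invented for illustration); usable capacity
drops to two disks' worth, but a random read only needs to touch one side
of one mirror:

    # two 2-way mirrors striped together, instead of one 4-disk raidz
    zpool create tank mirror c0t0d0 c1t0d0 mirror c0t4d0 c1t4d0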
Manoj Nayak wrote:
> It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS
> as a single device (vdev), and ZFS assigns a vq_max_pending value of 35
> to this vdev. To get higher throughput, do I need to do one of the
> following?

This is not the terminology we use to describe ZFS. Quite simply, a
storage pool contains devices configured in some way, hopefully using
some form of data protection (mirror, raidz[12]) -- see zpool(1m).
Each storage pool can contain one or more file systems or volumes --
see zfs(1m). The term "export" is used to describe transition of
ownership of a storage pool between different hosts.

> 1. Reduce the number of disks in the raidz group from four to three, so
> that the same pending queue of 35 is available for fewer disks.

35 is for each physical disk.

> 2. Create slices out of the physical disks and build the raidz group out
> of four slices of a physical disk, so that the same pending queue of 35
> is available to four slices of one physical disk.

This will likely have a negative scaling effect. Some devices,
especially raw disks, have wimpy microprocessors and limited memory.
You can easily overload them and see the response time increase
dramatically, just as queuing theory will suggest. Some research has
shown that a value of 8-16 is better, at least for some storage
devices. A value of 1 is perhaps too low, at least for devices which
can handle multiple outstanding I/Os.

> My workload issues around 5000 MB of read I/O, and iopattern says around
> 55% of the I/Os are random in nature. I don't know how much prefetching
> through the track cache is going to help here. Probably I can try
> disabling the vdev cache through "set 'zfs_vdev_cache_max' 1".

We can't size something like this unless we also know the I/O size.
If you are talking small iops, say 8 kBytes, then you'll need lots of
disks. For larger iops, you may be able to get by with fewer disks.
 -- richard
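As a rough back-of-envelope illustration (the per-disk figures are generic
assumptions, not measurements of the Thumper drives): a 7200 rpm SATA disk
manages on the order of 100-150 random IOPS. At 8 kByte I/Os that is only
about 1 MByte/s of random-read bandwidth per disk, and even at the ~43 kByte
I/Os produced by the raidz split of a 128k record it is roughly 4-6 MBytes/s
per disk, far below the 31-64.8 MBytes/s streaming figure. For a workload
that is 55% random, the number of spindles, not the queue depth, is what
sets the achievable throughput.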