OK - I'm at my wit's end here as I've looked everywhere to find some means of tuning NFS performance with ESX into returning something acceptable using osol 2008.11. I've eliminated everything but the NFS portion of the equation and am looking for some pointers in the right direction.

Configuration: PE2950 bi pro Xeon, 32GB RAM with an MD1000 using a zpool of 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla install across the board, no additional software other than the Adaptec StorMan to manage the disks.

local performance via dd - 463MB/s write, 1GB/s read (8GB file)
iSCSI performance - 90MB/s write, 120MB/s read (800MB file from a VM)
NFS performance - 1.4MB/s write, 20MB/s read (800MB file from the Service Console, transfer of an 8GB file via the datastore browser)

I just found the tool latencytop, which points the finger at the ZIL (tip of the hat to Lejun Zhu). Ref: <http://www.infrageeks.com/zfs/nfsd.png> & <http://www.infrageeks.com/zfs/fsflush.png>. Log file: <http://www.infrageeks.com/zfs/latencytop.log>

Now I can understand that there is a performance hit associated with this feature of ZFS for ensuring data integrity, but this drastic a difference makes no sense whatsoever. The pool is capable of handling natively (at worst) 120*7 IOPS and I'm not even seeing enough to saturate a USB thumb drive. This still doesn't answer why the read performance is so bad either. According to latencytop, the culprit would be genunix`cv_timedwait_sig rpcmod`svc

From my searching it appears that there's no async setting for the osol nfsd, and ESX does not offer any mount controls to force an async connection. Other than putting in an SSD as a ZIL (which still strikes me as overkill for basic NFS services), I'm looking for any information that can bring me up to at least reasonable throughput.

Would a dedicated 15K SAS drive help the situation by moving the ZIL traffic off to a dedicated device? Significantly? This is the sort of thing that I don't want to do without some reasonable assurance that it will help, since you can't remove a ZIL device from a pool at the moment.

Hints and tips appreciated,

Erik
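For reference, local figures like the dd numbers above typically come from something along these lines, run directly against the pool; the pool and file names here are only placeholders, and /dev/zero streams through the ARC, so the result is an upper bound rather than a like-for-like comparison with synchronous NFS writes:

   # dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192 && sync   # ~8GB sequential write
   # dd if=/tank/ddtest of=/dev/null bs=1024k                      # sequential read back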
What is your NFS window size? 32KB * 120 * 7 should get you 25MB/s.

Have you considered getting an Intel X25-E? Going from 840 sync NFS IOPS to 3-5k+ IOPS is not overkill for an SSD slog device. In fact it is probably cheaper to have one or two fewer vdevs and a single slog device.

Nicholas

On Tue, Jul 7, 2009 at 10:14 PM, erik.ableson <eableson at mac.com> wrote:
> Configuration: PE2950 bi pro Xeon, 32GB RAM with an MD1000 using a zpool of
> 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla install across the
> board, no additional software other than the Adaptec StorMan to manage the
> disks.
...
> Now I can understand that there is a performance hit associated with this
> feature of ZFS for ensuring data integrity, but this drastic a difference
> makes no sense whatsoever. The pool is capable of handling natively (at
> worst) 120*7 IOPS and I'm not even seeing enough to saturate a USB thumb
> drive. This still doesn't answer why the read performance is so bad either.
> According to latencytop, the culprit would be genunix`cv_timedwait_sig
> rpcmod`svc
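For reference, the arithmetic behind the 25MB/s estimate above, assuming roughly 120 IOPS per mirror vdev and 32KB per synchronous NFS write:

   7 vdevs x ~120 IOPS                    = ~840 sync writes/s
   840 writes/s x 32KB/write              = ~26MB/s (about the best the spindles can do)
   ~3,000-5,000 IOPS (SSD slog) x 32KB    = ~95-155MB/s, enough to saturate 1GbE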
Interesting; but presumably the ZIL/fsflush is not the reason for the associated poor *read* performance? Where does latencytop point the finger in that case?

cheers,
calum.
erik.ableson wrote:
> OK - I'm at my wit's end here as I've looked everywhere to find some
> means of tuning NFS performance with ESX into returning something
> acceptable using osol 2008.11. I've eliminated everything but the NFS
> portion of the equation and am looking for some pointers in the right
> direction.

Any time you have NFS, ZFS as the backing store, JBOD, and a performance concern, you need to look at the sync activity on the server. This will often be visible as ZIL activity, which you can see clearly with zilstat.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

The cure is not to disable the ZIL or break NFS. The cure is lower latency I/O for the ZIL.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_NFS_Server_Performance

 -- richard
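A quick way to confirm this is to run zilstat on the server while the ESX copy is in progress. A minimal invocation might look like the following (assuming the script from the URL above has been saved locally and made executable; check its usage output, as the exact arguments may differ by version):

   # ./zilstat 1 30     # one-second samples for 30 seconds during the NFS write

If the ZIL byte and op counts track the NFS traffic, the synchronous write path is the bottleneck and a low-latency slog is the supported fix.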
Without any tuning, the default TCP window size and send buffer size for NFS connections is around 48KB, which is not very optimal for bulk transfer. However, the 1.4MB/s write seems to indicate something else is seriously wrong.

iSCSI performance was good, so the network connection seems to be OK (assuming it's 1GbE).

What do your mount options look like?

I don't know what the datastore browser does for copying files, but have you tried the vanilla 'cp' command?

You can also try NFS performance using tmpfs, instead of ZFS, to make sure the NIC, protocol stack, and NFS are not the culprit.

-Dai
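A rough sketch of that tmpfs test (the path here is only an example, and on osol 2008.11 the export can also be managed through sharemgr or the sharenfs property):

   # mkdir -p /export/nfstest
   # mount -F tmpfs swap /export/nfstest        # RAM-backed, no ZFS or ZIL in the path
   # share -F nfs -o rw,anon=0 /export/nfstest

Then mount /export/nfstest from the ESX host as a second NFS datastore and repeat the copy. If throughput is still ~1.4MB/s, the problem is in the network/NFS layer; if it jumps toward wire speed, the synchronous ZFS write path is the place to look.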
Comments in line.

On 7 juil. 09, at 19:36, Dai Ngo wrote:

> Without any tuning, the default TCP window size and send buffer size
> for NFS connections is around 48KB, which is not very optimal for bulk
> transfer. However, the 1.4MB/s write seems to indicate something else
> is seriously wrong.

My sentiment as well.

> iSCSI performance was good, so the network connection seems to be OK
> (assuming it's 1GbE).

Yup - I'm running at wire speed on the iSCSI connections.

> What do your mount options look like?

Unfortunately, ESX doesn't give any control over mount options.

> I don't know what the datastore browser does for copying files, but
> have you tried the vanilla 'cp' command?

The datastore browser copy command is just a wrapper for cp from what I can gather. All types of copy operations to the NFS volume, even from other machines, top out at this speed. The NFS/iSCSI connections are on a separate physical network, so I can't easily plug anything into it for testing other mount options from another machine or OS. I'll try from another VM to see if I can't force a mount with the async option to see if that helps any.

> You can also try NFS performance using tmpfs, instead of ZFS, to make
> sure the NIC, protocol stack, and NFS are not the culprit.

From what I can observe, it appears that the sync commands issued over the NFS stack are slowing down the process, even with a reasonable number of disks in the pool.

What I was hoping for was the same behavior (albeit slightly risky) of having writes cached to RAM and then dumped out in an optimal manner to disk, as per the local behavior where you see the flush-to-disk operations happening on a regular cycle. I think that this would be doable with an async mount, but I can't set this on the server side where it would be used by the servers automatically.

Erik
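For that test from another VM, a mount along these lines would show whether client-side options change anything (Linux client syntax; the server name and export path are placeholders):

   # mount -t nfs -o vers=3,tcp,rsize=32768,wsize=32768 nfsserver:/tank/vmstore /mnt/nfstest
   # dd if=/dev/zero of=/mnt/nfstest/testfile bs=1M count=800 conv=fsync

Bear in mind that a default Linux client buffers writes and commits them later, so good numbers here would point at ESX's synchronous write pattern rather than at the network.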
erik.ableson writes:

> From what I can observe, it appears that the sync commands issued
> over the NFS stack are slowing down the process, even with a
> reasonable number of disks in the pool.
>
> What I was hoping for was the same behavior (albeit slightly risky) of
> having writes cached to RAM and then dumped out in an optimal manner
> to disk, as per the local behavior where you see the flush-to-disk
> operations happening on a regular cycle. I think that this would be
> doable with an async mount, but I can't set this on the server side
> where it would be used by the servers automatically.

I wouldn't do this; it sounds like you want to have zil_disable.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

If you do, then be prepared to unmount or reboot all clients of the server in case of a crash, in order to clear their corrupted caches.

This is in no way a ZIL problem nor a ZFS problem.
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

And most NFS appliance providers will use a form of write-accelerating device to try to make the NFS experience closer to local filesystem behavior.

-r
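For completeness, the switch described in the Evil Tuning Guide above is the zil_disable tunable. A sketch of how it was set on builds of that era, strictly as a diagnostic and with the caveats Roch gives (it applies to every dataset on the host, and an unclean shutdown can leave NFS clients holding stale data):

   # echo "set zfs:zil_disable = 1" >> /etc/system     # takes effect at next boot
   # echo zil_disable/W0t1 | mdb -kw                   # or toggle it on the running kernel

If write throughput jumps toward wire speed with the ZIL disabled, that confirms the synchronous write path is the bottleneck, and a dedicated low-latency slog, rather than leaving this set, is the supported fix.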