Hello,

I am running the SPEC SFS benchmark [1] on a dual Xeon 2.80GHz box with 4GB of memory. More details: snv_56, zil_disable=1, zfs_arc_max = 0x80000000 (2GB).

Configurations that were tested:
160 dirs / 1 zfs / 1 zpool / 4 SAN LUNs
160 zfs'es / 1 zpool / 4 SAN LUNs
40 zfs'es / 4 zpools / 4 SAN LUNs

One zpool was created on 4 SAN LUNs. The SAN storage array used doesn't honor flush-cache commands. NFSD_SERVERS=1024; NFSv3 over UDP was used.

Max. number of SPEC NFS IOPS obtained: 5K
Max. number of SPEC NFS IOPS obtained earlier for an SVM/VxFS configuration: 24K [2]

So we have almost a five-fold difference. Can we improve this? How can we accelerate this NFS/ZFS setup?

Two serious problems were observed:
1. Degradation of benchmark results on the same setup: the same benchmark gave 4030 IOPS the first time and 2037 IOPS when run a second time.
2. When 4 zpools were used instead of 1, the result degraded about 4 times.

The benchmark report shows an abnormally high share of [b]readdirplus[/b] operations, reaching 50% of the test time; their share in the SFS mix is 9%. Does this point to some known problem? Increasing the DNLC size doesn't help in the ZFS case - I checked this.

I will appreciate your help very much. This testing is part of the preparation for a production deployment. I will provide any additional information that may be needed.

Thank you,
[i]-- leon[/i]

[1] http://www.spec.org/osg/sfs/
[2] http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html
[3] http://www.opensolaris.org/jive/thread.jspa?threadID=23263
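For reference, the "160 zfs'es / 1 zpool / 4 SAN LUNs" layout would have been built roughly as in the sketch below; the device names are placeholders, not the actual SAN LUNs used.

# sketch only - placeholder device names
zpool create tank1 c4t0d0 c4t1d0 c4t2d0 c4t3d0
i=1
while [ $i -le 160 ]; do
        zfs create tank1/$i
        i=`expr $i + 1`
done
zfs set sharenfs=on tank1       # inherited by the 160 child filesystems

(The zil_disable and zfs_arc_max values above would typically correspond to /etc/system lines such as "set zfs:zil_disable = 1" and "set zfs:zfs_arc_max = 0x80000000".)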
Leon Koll
2007-Feb-14 09:35 UTC
[zfs-discuss] Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
An update:

I am not sure whether it is related to fragmentation, but I can say that the serious performance degradation in my NFS/ZFS benchmarks is a result of the on-disk ZFS data layout. Read operations on directories (NFSv3 readdirplus) are abnormally time-consuming. That kills the server. After a cold restart of the host the performance is still on the floor.

My conclusion: it's not CPU, not memory - it's the ZFS on-disk structures.
Robert Milkowski
2007-Feb-14 09:43 UTC
[zfs-discuss] Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Hello Leon,

Wednesday, February 14, 2007, 10:35:05 AM, you wrote:

LK> An update:
LK> I am not sure whether it is related to fragmentation, but I can say
LK> that the serious performance degradation in my NFS/ZFS benchmarks is
LK> a result of the on-disk ZFS data layout.
LK> Read operations on directories (NFSv3 readdirplus) are abnormally
LK> time-consuming. That kills the server. After a cold restart of the
LK> host the performance is still on the floor.
LK> My conclusion: it's not CPU, not memory - it's the ZFS on-disk structures.

Before jumping to any conclusions - first try to eliminate NFS and do the readdirs locally; I guess that would be quite fast. Then check on a client (dtrace) the time distribution of NFS requests and send us the results.

You may also want to fiddle with async clusters on an NFS client to see if it makes any difference.

--
Best regards,
Robert                      mailto:rmilkowski at task.gda.pl
                            http://milek.blogspot.com
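As a concrete starting point for the dtrace suggestion above, something like the following could be run on the client. This is an untested sketch: it times the getdents calls the test program issues rather than individual NFS RPCs, and the execname may need adjusting (e.g. "rdir.x86" for the x86 binary).

dtrace -n '
syscall::getdents*:entry
/execname == "rdir"/
{
        self->ts = timestamp;
}
syscall::getdents*:return
/self->ts/
{
        @["getdents latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}'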
Leon Koll
2007-Feb-18 19:29 UTC
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Robert wrote:

> Before jumping to any conclusions - first try to
> eliminate NFS and do the readdirs locally; I guess that would be quite fast.
> Then check on a client (dtrace) the time distribution of NFS requests
> and send us the results.

We used a test program that does readdirs and takes one argument, the name of the directory to dig into, like in this example: [b]rdir /mnt[/b]

The program can be downloaded here:
http://tinyurl.com/ywcyyp/rdir.c (source code)
http://tinyurl.com/ywcyyp/rdir (executable for sparc)
http://tinyurl.com/ywcyyp/rdir.x86 (executable for x86)

Results:

1. Local ZFS - we have 160 zfs'es under /tank1: /tank1/1 ... /tank1/160, created during the SFS benchmark run.

# ptime /var/tmp/rdir /tank1
real    [b]1:37.824[/b]
user       1.637
sys       38.498

again:
real    1:27.001
user       1.595
sys       32.146

To avoid an influence of the local runs on the NFS runs:

# zfs unmount -a
# zfs mount -a
# zfs share -a (160 shares)

2. NFS: ssh to the NFS client, create 160 dirs under /mnt, mount /tank1/i on /mnt/i (i=1...160) from the NFS server (see the sketch after this message).

> ptime /var/tmp/rdir /mnt
real    [b]1:48.983[/b]
user       1.096
sys       17.265

again:
real    [b]3:51.001[/b]
user       1.657
sys       27.468

There is definitely a problem - the 2nd NFS run is more than 2 times longer! What is the reason?
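The client-side mount setup in step 2 was done roughly as in this sketch ("nfsserver" is a placeholder for the real server name; the vers/proto options match the NFSv3-over-UDP setup described earlier):

i=1
while [ $i -le 160 ]; do
        mkdir -p /mnt/$i
        mount -F nfs -o vers=3,proto=udp nfsserver:/tank1/$i /mnt/$i
        i=`expr $i + 1`
done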
Roch - PAE
2007-Feb-19 16:42 UTC
[zfs-discuss] Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Leon Koll writes:

> An update:
>
> I am not sure whether it is related to fragmentation, but I can say that the serious
> performance degradation in my NFS/ZFS benchmarks is a result of the on-disk ZFS data layout.
> Read operations on directories (NFSv3 readdirplus) are abnormally time-consuming.
> That kills the server. After a cold restart of the host the performance is still on the floor.
> My conclusion: it's not CPU, not memory - it's the ZFS on-disk structures.

As I understand the issue, a readdirplus is 2X slower when the data is already cached in the client than when it is not. Given that the on-disk structure does not change between the 2 runs, I can't really place the fault on it.

-r
Leon Koll
2007-Feb-20 11:21 UTC
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
> As I understand the issue, a readdirplus is
> 2X slower when the data is already cached in the client
> than when it is not.

Yes, that's the issue. It's not always 2X slower, but it is ALWAYS slower. Another two of my runs on NFS/ZFS show:

1.
real    3:14.185
user       2.249
sys       33.083

2.
real    4:47.681
user       2.578
sys       40.733

> Given that the on-disk structure does not change
> between the 2 runs, I can't really place the fault on it.

You are mixing two different tests described in this thread: the first is the spec.org SFS benchmark, which shows bad results on NFS/ZFS even after a reboot, and the second is our own "rdir" program, which was written to understand the SFS problem and exposed the weird/erroneous behaviour of the NFS/ZFS combination.

Thank you for your attention.

[i]-- leon[/i]
Leon Koll
2007-Feb-21 12:14 UTC
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
A more detailed description of the readdir test, with a conclusion at the end.

Roch asked me:

> Is this a NFS V3 or V4 test or don't care?

I am running NFSv3, but a short test with NFSv4 showed that the problem is there too.

Then Roch asked:

> I've run rdir on a few of my large directories. However my
> large directories are not much larger than ncsize, maybe
> yours are. Do I understand that you hit the issue only upon
> the first large rdir after reboot?

After a reboot of the NFS client (see below).

Then Roch added:

> If so, it might be that we get a speedup from the part of
> the run in which we are initially filling the dnlc cache.
> That could explain the increase in sys time. But the real
> time increase seems too much to be due to this.
>
> Anyway I'm interested in the directory size rdir reports and
> the ncsize/D from mdb -k. Also a third pass through might
> yield a lead.
>
> -r

ncsize has its default value. People told me "don't increase the DNLC size when running ZFS".

# echo 'ncsize/D' | mdb -k
ncsize:
ncsize:         129675

Directory size? There are 160 ZFS'es under zpool tank1; each ZFS is 202MB, 31.5GB in total, 1,224,000 files.

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank1                   382G   31.5G    351G     8%  ONLINE     -

More detailed results:

ZFS local runs - "normal" behavior:
1. 2:33.406
2. 2:25.353
3. 2:27.033

NFSv3/ZFS runs - the first is OK, then they jump up:
1. 3:14.185
2. 4:47.681
3. 4:52.213
4. 4:49.841
5. 4:53.069
6. 4:45.290

After a reboot of the NFS client:
1. 2:56.760
2. 4:43.397

After a reboot of both client and server:
1. real 3:12.841
2. real 4:50.869

After a reboot of the NFS server only:
1. 5:15.048
2. 4:54.686
3. 4:48.713

This means the problem is on the NFS client: after a reboot of the client the first run is "ok", then all the rest are "bad". When the server was rebooted, it didn't help and the results stayed "bad".

Roch replied:

> I'd hypothesize that when the client doesn't know about a file he
> just gets the data and boom. But once he's got a cached copy
> he needs more time to figure out if the data is up to date.
>
> This seems to have been a tradeoff of metadata operations in favor of
> faster data ops (!?).
>
> Note also that SFS doesn't use the client's NFS code. It
> runs its own user-space client.

Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation. And the SFS problem we observed (see the first message in this thread) has nothing in common with this one. Unfortunately, the abnormal behavior of NFS/ZFS during an SFS test didn't get much attention, so I don't have any clue. Anyway, I'll update this thread when I have more information on the problem.
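One way to check the cache-validation hypothesis without rebooting would be to compare the client-side RPC counts around a "good" run and a "bad" run, along these lines (a rough sketch; the interesting lines in the output are the getattr, access and readdirplus counts):

nfsstat -c > /var/tmp/nfs.run1
ptime /var/tmp/rdir /mnt        # first (fast) run
nfsstat -c > /var/tmp/nfs.run2
ptime /var/tmp/rdir /mnt        # second (slow) run
nfsstat -c > /var/tmp/nfs.run3
diff /var/tmp/nfs.run1 /var/tmp/nfs.run2
diff /var/tmp/nfs.run2 /var/tmp/nfs.run3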
Matthew Ahrens
2007-Feb-22 05:09 UTC
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Leon Koll wrote:

> Given that the described problem is 100% an NFS client problem, there
> is nothing to do in the ZFS code to improve the situation.

You may want to see if the folks over at nfs-discuss at opensolaris.org have any ideas on your NFS problem.

--matt