Chris Siebenmann
2008-May-09 20:19 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am seeing a weird performance degradation as the number of simultaneous sequential reads increases.

Setup:
  NFS client -> Solaris NFS server -> iSCSI target machine

There are 12 physical disks on the iSCSI target machine. Each of them is sliced up into 11 parts and the parts are exported as individual LUNs to the Solaris server. The Solaris server uses each LUN as a separate ZFS pool (giving 132 pools in total) and exports them all to the NFS client.

(The NFS client and the iSCSI target machine are both running Linux. The Solaris NFS server has 4 GB of RAM.)

When the NFS client starts a sequential read against one filesystem from each physical disk, the iSCSI target machine and the NFS client both use the full network bandwidth and each individual read gets 1/12th of it (about 9.something MBytes/sec). Starting a second set of sequential reads against each disk (to a different pool) behaves the same, as does starting a third set.

However, when I add a fourth set of reads things change; while the NFS server continues to read from the iSCSI target at full speed, the data rate to the NFS client drops significantly. By the time I hit 9 reads per physical disk, the NFS client is getting a *total* of 8 MBytes/sec. In other words, it seems that ZFS on the NFS server is somehow discarding most of what it reads from the iSCSI disks, although I can't see any sign of this in 'vmstat' output on Solaris.

Also, this may not be just an NFS issue; in limited testing with local IO on the Solaris machine it seems that I may be seeing the same effect at roughly the same magnitude.

(The testing is limited because it is harder to accurately measure what aggregate data rate I'm getting and harder to run that many simultaneous reads; if I run too many of them, the Solaris machine locks up due to overload.)

Does anyone have any ideas of what might be going on here, and how I might be able to tune things on the Solaris machine so that it performs better in this situation (ideally without harming performance under smaller loads)? Would partitioning the physical disks on Solaris instead of splitting them up on the iSCSI target make a significant difference?

Thanks in advance.

	- cks
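To make the load concrete: each "set" of reads is one sequential reader per physical disk, each against a different pool on that disk. A rough sketch of how one such test round might be driven from the Linux NFS client looks like the following; the mount points and file names here are hypothetical, not the actual ones:

  # hypothetical layout: /mnt/diskD-sliceS is the NFS mount of the pool
  # built on slice S of physical disk D; four "sets" = four readers per disk
  for disk in 1 2 3 4 5 6 7 8 9 10 11 12; do
      for slice in 1 2 3 4; do
          dd if=/mnt/disk${disk}-slice${slice}/bigfile of=/dev/null bs=1024k &
      done
  done
  wait

Twelve streams at about 9 MBytes/sec each is roughly 110 MBytes/sec aggregate, which is consistent with the "full network bandwidth" above being a saturated gigabit link.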
Robin Guo
2008-May-09 21:52 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
Hi Chris,

Good topic; I'd like to see comments from the experts as well.

First, I think part of the penalty comes from NFS: ZFS served over NFS has a known performance cost, and the L2ARC cache feature is, so far, the way to address it. (It is in OpenSolaris, but not in s10u4 yet; it is targeted for the s10u6 release.)

I have also seen a performance loss when trying iSCSI from the local machine, but I haven't gathered accurate data yet. That may be a problem worth evaluating.

I'll follow this thread to see if there is any progress. Thanks for bringing up the topic.

- Regards,

Robin Guo

Chris Siebenmann wrote:
> I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am seeing
> a weird performance degradation as the number of simultaneous sequential
> reads increases.
> [...]
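For reference, once L2ARC support is available, attaching a cache device to a pool should look roughly like the following; the pool and device names are just examples, and the exact behaviour may differ by release:

  # hypothetical pool "tank" and spare SSD/disk c2t0d0
  zpool add tank cache c2t0d0
  zpool status tank      # the device then appears under a "cache" section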
Robert Milkowski
2008-May-14 23:30 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
Hello Chris,

Friday, May 9, 2008, 9:19:53 PM, you wrote:

CS> I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am seeing
CS> a weird performance degradation as the number of simultaneous sequential
CS> reads increases.
CS> [...]
CS> However, when I add a fourth set of reads things change; while the
CS> NFS server continues to read from the iSCSI target at full speed, the
CS> data rate to the NFS client drops significantly. By the time I hit
CS> 9 reads per physical disk, the NFS client is getting a *total* of 8
CS> MBytes/sec. In other words, it seems that ZFS on the NFS server is
CS> somehow discarding most of what it reads from the iSCSI disks, although
CS> I can't see any sign of this in 'vmstat' output on Solaris.

Keep in mind that you will end up with a lot of seeks on the physical drives once you do multiple sequential reads from different disk regions. Nevertheless, I wouldn't expect much difference in throughput between the NFS client and the iSCSI server.

I'm thinking that maybe you are hitting an issue with the vdev cache, as you probably ended up with 8KB reads over NFS (RSIZE) and 64KB reads from iSCSI. You have 4GB of RAM and I'm assuming most of it is free (used by the ARC cache)... or maybe that is actually not the case, so the vdev cache reads 64KB, the NFS client reads 8KB, and by the time it asks for the next 8KB the data is already gone. Since your box "locks up" - maybe the iSCSI target or some other application has a memory leak? Is your system using the swap device just before it "locks up"?

Try mounting the filesystems on the NFS client with RSIZE=32KB, and make sure your scripts/programs are also requesting at least 32KB at a time. Check if that helps. If it doesn't, then disable the vdev cache on the Solaris box (by setting zfs_vdev_cache_max to 1) and check again.

CS> (The testing is limited because it is harder to accurately measure what
CS> aggregate data rate I'm getting and harder to run that many simultaneous
CS> reads; if I run too many of them, the Solaris machine locks up due to
CS> overload.)

That's strange - what exactly happens when it "locks up"? Does it panic?

CS> smaller loads)? Would partitioning the physical disks on Solaris instead
CS> of splitting them up on the iSCSI target make a significant difference?

Why do you want to partition them in the first place? Why not present each disk as an iSCSI LUN, then create a pool out of them and, if necessary, create multiple filesystems inside it. And what about data protection - don't you want to use any RAID?

--
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
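To put those two suggestions concretely (the server and export names here are just examples):

  # on the Linux NFS client: ask the server for 32KB reads
  mount -t nfs -o rsize=32768 solaris-server:/export/pool1 /mnt/pool1

  # on the Solaris box: effectively disable the vdev cache
  # (add to /etc/system and reboot)
  set zfs:zfs_vdev_cache_max = 1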
Chris Siebenmann
2008-May-15 04:42 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
I wrote:
| I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am
| seeing a weird performance degradation as the number of simultaneous
| sequential reads increases.

To update zfs-discuss on this: after more investigation, this seems to be due to file-level prefetching. Turning file-level prefetching off (following the directions of the ZFS Evil Tuning Guide) returns NFS server performance to full network bandwidth when there are lots of simultaneous sequential reads. Unfortunately it significantly reduces the performance of a single sequential read (when the server is otherwise idle).

The problem is definitely not an issue of having too many pools or too many LUNs; I saw the same issue with a single striped pool made from 12 whole-disk LUNs. (And the issue happens locally as well as remotely, so it's not NFS; it's just easier to measure with an NFS client, because you can clearly see the (maximum) aggregate data rate to all of the sequential reads.)

| CS> (The testing is limited because it is harder to accurately measure
| CS> what aggregate data rate I'm getting and harder to run that many
| CS> simultaneous reads; if I run too many of them, the Solaris machine
| CS> locks up due to overload.)
|
| That's strange - what exactly happens when it "locks up"? Does it
| panic?

I have to apologize; this happened during an earlier round of tests, when the Solaris machine had too little memory for the number of pools I had on it. According to my notes, the behavior in the with-prefetch state is that the machine can survive but is extremely unresponsive until the test programs finish. (I haven't retested with file prefetching turned off.)

(Here 'locks up' means it becomes basically totally unresponsive, although it seems to still be doing IO.)

I am using a test program that is basically dd with some reporting; it reads a 1 MB buffer from standard input and writes it to standard output. In these tests, each reader's stdin is a (different) 10 GB file and its stdout is /dev/null.

	- cks
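In outline, the reader is essentially the following; this is a minimal sketch in C rather than the actual program, which also does the throughput reporting:

/* Minimal sketch of the test reader: copy stdin to stdout in 1 MB chunks.
 * Run as:  ./reader < /pool/bigfile > /dev/null  */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFSIZE (1024 * 1024)   /* 1 MB per read, as described above */

int main(void)
{
    static char buf[BUFSIZE];
    ssize_t n;

    while ((n = read(STDIN_FILENO, buf, BUFSIZE)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* handle short writes */
            ssize_t w = write(STDOUT_FILENO, buf + off, n - off);
            if (w < 0) {
                perror("write");
                return 1;
            }
            off += w;
        }
    }
    if (n < 0) {
        perror("read");
        return 1;
    }
    return 0;
}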
Robert Milkowski
2008-May-16 08:09 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
Hello Chris,

Thursday, May 15, 2008, 5:42:32 AM, you wrote:

CS> I wrote:
CS> | I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am
CS> | seeing a weird performance degradation as the number of simultaneous
CS> | sequential reads increases.

CS> To update zfs-discuss on this: after more investigation, this seems
CS> to be due to file-level prefetching. Turning file-level prefetching
CS> off (following the directions of the ZFS Evil Tuning Guide) returns
CS> NFS server performance to full network bandwidth when there are lots
CS> of simultaneous sequential reads. Unfortunately it significantly
CS> reduces the performance of a single sequential read (when the server
CS> is otherwise idle).

Have you tried disabling vdev caching while leaving file-level prefetching enabled?

--
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
Chris Siebenmann
2008-May-16 14:45 UTC
[zfs-discuss] Weird performance issue with ZFS with lots of simultaneous reads
| Have you tried disabling vdev caching while leaving file-level
| prefetching enabled?

If you mean setting zfs_vdev_cache_bshift to 13 (per the ZFS Evil Tuning Guide) to turn off device-level prefetching, then yes, I have tried turning off just that; it made no difference. If there's another tunable, then I don't know about it and haven't tried it (and would be pleased to).

	- cks
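For anyone following along, the tunables discussed in this thread go into /etc/system along these lines. This is a sketch based on the ZFS Evil Tuning Guide of that era; zfs_prefetch_disable is, as far as I know, the name of the file-level prefetch switch, and the values should be double-checked against your release:

  * /etc/system sketch -- the knobs discussed in this thread
  * (lines beginning with '*' are comments in /etc/system)

  * turn off file-level prefetching entirely
  set zfs:zfs_prefetch_disable = 1

  * shrink vdev-cache (device-level) read inflation from 64KB (2^16)
  * to 8KB (2^13), i.e. effectively turn off device-level prefetching
  set zfs:zfs_vdev_cache_bshift = 13

  * or disable the vdev cache outright, as suggested earlier in the thread
  set zfs:zfs_vdev_cache_max = 1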