Stephen Stogner
2008-Jul-31 17:27 UTC
[zfs-discuss] Terrible zfs performance under NFS load
Hello,

We have an S10U5 server that shares ZFS filesystems over NFS. While using an NFS mount as the syslog destination for 20 or so busy mail servers, we have noticed that throughput degrades severely within a short time. I have tried disabling the ZIL and turning off cache flushing, and I have not seen any change in performance. The servers are only pushing about 1MB/s of constant log traffic to the server over NFS. I think this is due to the cache being flushed with every NFS commit. I was wondering if anyone had any other suggestions as to what it could be?

Thank you.

This message posted from opensolaris.org

Stephen Stogner wrote:
> Hello,
> We have a S10U5 server sharing with zfs sharing up NFS shares. While using the nfs mount for a log destination for syslog for 20 or so busy mail servers we have noticed that the throughput becomes severely degraded shortly. I have tried disabling the zil, turning off cache flushing and I have not seen any changes in performance. The servers are only pushing about 1MB/s of constant traffic to the server over nfs of log data. I think this is due to the cache being flushed with every nfs commit, I was wondering if any one had any other suggestions as to what it could be? Thank you.

Not that this deals with the nfs/zfs performance you are experiencing, but why not forward the syslog directly to the target machine and allow it to write the syslog files locally to the filesystem?

-- paul
Richard Elling
2008-Jul-31 17:41 UTC
[zfs-discuss] Terrible zfs performance under NFS load
Stephen Stogner wrote:
> Hello,
> We have a S10U5 server sharing with zfs sharing up NFS shares. While using the nfs mount for a log destination for syslog for 20 or so busy mail servers we have noticed that the throughput becomes severely degraded shortly. I have tried disabling the zil, turning off cache flushing and I have not seen any changes in performance. The servers are only pushing about 1MB/s of constant traffic to the server over nfs of log data. I think this is due to the cache being flushed with every nfs commit, I was wondering if any one had any other suggestions as to what it could be? Thank you.

Silly question: since syslog is designed to log to loghosts over the net, why don't you send the syslogs to the NFS server (loghost) directly?

In the current situation, it is a workload that would be helped by a separate ZIL log or NVRAM-fronted storage. If you do what I suggest above, there will be no need for a separate ZIL log or NVRAM-fronted storage.

-- richard
Stephen Stogner
2008-Jul-31 17:48 UTC
[zfs-discuss] Terrible zfs performance under NFS load
True, we could direct all the syslog data to the host, but the underlying performance issue remains the same. We have used NFS shares for log hosts and mail hosts, and we are looking at a ZFS-based mail store with NFS mounts from x mail servers, but if the NFS/ZFS combination takes such a performance hit I would need to investigate another solution.

Stephen Stogner wrote:
> True we could have all the syslog data be directed towards the host but the underlying issue remains the same with the performance hit. We have used nfs shares for log hosts and mail hosts and we are looking towards using a zfs based mail store with nfs mounts from x mail servers but if nfs/zfs combo take such a performance hit I would need to investigate another solution.

Syslog is funny in that it does a lot of open/write/close cycles so that rotate can work trivially. Those are meta-data updates, and on NFS each implies a COMMIT. This leads us back to the old "solaris nfs over zfs is slow" discussion, where we talk about the fact that other nfs servers do not honor the COMMIT semantics and can lose data. I for one do *not* want solaris nfs/zfs to behave in any way other than it does.

The bottom line is that for high COMMIT rate nfs workloads you need to do as Richard suggests and look into setting up a slog (separate intent log) on fast disks (or SSD) away from the rest of the storage pool.

In spite of this, I would still recommend that you forward syslog traffic, just for the sake of marshalling your resources for important work rather than wasting them.

-- paul
Nicolas Williams
2008-Jul-31 19:03 UTC
[zfs-discuss] Async open(2)/close(2) (Re: Terrible zfs performance under NFS load)
On Thu, Jul 31, 2008 at 01:07:20PM -0500, Paul Fisher wrote:
> Stephen Stogner wrote:
> > True we could have all the syslog data be directed towards the host but the underlying issue remains the same with the performance hit. We have used nfs shares for log hosts and mail hosts and we are looking towards using a zfs based mail store with nfs mounts from x mail servers but if nfs/zfs combo take such a performance hit I would need to investigate another solution.
>
> Syslog is funny in that it does a lot of open/write/close cycles so that
> rotate can work trivially. Those are meta-data updates and on NFS each
> implies a COMMIT. This leads us back to the old "solaris nfs over zfs
> is slow" discussion, where we talk about the fact that other nfs servers
> do not honor the COMMIT semantics and can lose data. I for one do
> *not* want solaris nfs/zfs to behave in any way other than it does.

One more place where async variants of open(2) and close(2) would be very useful. NFS requires OPENs and CLOSEs to be synchronous, and for good reasons too. So then syslogd could do an async open of the log file and return to its event loop, then when the open completes it could write queued up log entries, then async close (or maybe async open, async write, async close, then return to the event loop).

Such async syscalls can be emulated with threads.

The same approach could help tar/cpio/... but there you have other sources of serialization, namely: compression.

Nico
--
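
The thread emulation mentioned above might look roughly like the following. This is only a sketch under stated assumptions: `open_write_close` and `log_without_blocking` are hypothetical names, syslogd's real event loop is replaced by a plain `for` loop, and the demo writes to a local temporary file rather than an NFS mount.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def open_write_close(path, entries):
    # The blocking cycle; on an NFS mount each step may wait on the server.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, "".join(entries).encode())
    finally:
        os.close(fd)
    return len(entries)


def log_without_blocking(path, messages):
    """Queue messages while a worker thread performs the blocking I/O.

    Returns the number of messages written (hypothetical helper).
    """
    written = 0
    queued = []
    pending = None
    # A single worker keeps the appends in submission order.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for message in messages:  # stands in for the daemon's event loop
            queued.append(message + "\n")
            if pending is None or pending.done():
                if pending is not None:
                    written += pending.result()
                pending = pool.submit(open_write_close, path, queued)
                queued = []
        # Drain: wait for the in-flight cycle, then flush what is still queued.
        if pending is not None:
            written += pending.result()
        if queued:
            written += pool.submit(open_write_close, path, queued).result()
    return written


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "messages.log")
    print(log_without_blocking(demo_path, [f"entry {i}" for i in range(100)]))
```

The loop never waits on an open or close while messages are arriving; entries that show up during a slow cycle are batched into the next one, so a slow server means larger batches rather than a stalled loop.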
Richard Elling
2008-Jul-31 21:31 UTC
[zfs-discuss] Terrible zfs performance under NFS load
Stephen Stogner wrote:
> True we could have all the syslog data be directed towards the host but the underlying issue remains the same with the performance hit. We have used nfs shares for log hosts and mail hosts and we are looking towards using a zfs based mail store with nfs mounts from x mail servers but if nfs/zfs combo take such a performance hit I would need to investigate another solution.

Thinking about this more... if you write local logs to an NFS server, then it should be mounted as hard, which means that when the NFS server is down, pending writes to the logs wait patiently for the server to return. This dependency is not generally what you want for failure-logging, which is, in part, why the syslog protocols use UDP. I would strongly suggest using syslog as it was intended to be used.

-- richard
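
To illustrate the point about UDP, here is a minimal sketch of a client sending one syslog datagram. It assumes a simplified BSD-style message carrying only the priority, a tag, and the text (no timestamp or hostname); `format_syslog` and `send_syslog` are hypothetical helper names, and the demo talks to a stand-in receiver on a local ephemeral port instead of a real loghost.

```python
import socket

LOG_MAIL = 2   # syslog facility code for the mail system
LOG_INFO = 6   # syslog severity code for informational messages


def format_syslog(facility, severity, tag, text):
    # The PRI value in angle brackets is facility * 8 + severity.
    return f"<{facility * 8 + severity}>{tag}: {text}".encode()


def send_syslog(host, port, payload):
    # UDP is fire-and-forget: sendto() does not wait for the loghost to answer.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Stand-in loghost on an ephemeral local port (the real service uses 514).
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as loghost:
        loghost.bind(("127.0.0.1", 0))
        loghost.settimeout(2.0)
        port = loghost.getsockname()[1]
        payload = format_syslog(LOG_MAIL, LOG_INFO, "sendmail", "message accepted")
        send_syslog("127.0.0.1", port, payload)
        print(loghost.recvfrom(1024)[0].decode())
```

If the loghost is down the datagram is simply lost and the sender carries on, which is the opposite trade-off from a hard NFS mount, where the writer blocks until the server returns.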
Chris Siebenmann
2008-Aug-01 16:24 UTC
[zfs-discuss] Terrible zfs performance under NFS load
| Syslog is funny in that it does a lot of open/write/close cycles so
| that rotate can work trivially.

I don't know of any version of syslog that does this (certainly Solaris 10 U5 syslog does not). The traditional syslog(d) performance issue is that it fsync()'s after writing each log message, in an attempt to maximize the chances that the log message will make it to disk and survive a system crash, power outage, etc.

(Some versions of syslog let you turn this off for specific log files, which is very useful for high volume, low importance ones.)

I've heard that at one point, NFS + ZFS was known to have performance issues with fsync()-heavy workloads. I don't know if that's still true today (in either Solaris 10U5 or current OpenSolaris builds), or if all of the issues have been fixed.

- cks
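
The fsync()-per-message pattern described above can be sketched as follows. This is an illustration rather than syslogd's actual code: `write_log` and its `sync_every` parameter are hypothetical, and the function only counts how many synchronous flushes a given policy issues.

```python
import os
import tempfile


def write_log(path, messages, sync_every):
    """Append messages, calling fsync() after every `sync_every` of them.

    Returns the number of fsync() calls made.
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for count, message in enumerate(messages, start=1):
            os.write(fd, (message + "\n").encode())
            if count % sync_every == 0:
                os.fsync(fd)  # each call forces the data to stable storage
                syncs += 1
        if len(messages) % sync_every != 0:
            os.fsync(fd)  # final sync so the tail is not left unsynced
            syncs += 1
    finally:
        os.close(fd)
    return syncs


if __name__ == "__main__":
    lines = [f"message {i}" for i in range(1000)]
    log_dir = tempfile.mkdtemp()
    print(write_log(os.path.join(log_dir, "every.log"), lines, 1))
    print(write_log(os.path.join(log_dir, "batched.log"), lines, 100))
```

With `sync_every=1` every message costs one synchronous flush, which is the case that hurts on a server that honors the flush. Batching trades a window of possible message loss for far fewer flushes.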
>>>>> "cs" == Chris Siebenmann <cks at cs.toronto.edu> writes:

    cs> (Some versions of syslog let you turn this off for specific
    cs> log files, which is very useful for high volume, low
    cs> importance ones.)

    To ensure that kernel messages are written to disk promptly,
    syslogd(8) calls fsync(2) after writing messages from the kernel.
    Other messages are not synced explicitly. You may disable syncing
    of files specified to receive kernel messages by prefixing the
    pathname with a minus sign `-'.

That's from BSD, which fsyncs kernel messages only, not messages from libc. Try adding a '-' to the start of your log filename. It probably won't work with your syslog variant, though.

If your syslog is calling fsync on all messages, not just kernel messages, then moving to the syslog protocol between client and ZFS server instead of NFS might not help.

If you test more, let us know what happens.
Bob Friesenhahn
2008-Aug-03 15:56 UTC
[zfs-discuss] Terrible zfs performance under NFS load
On Thu, 31 Jul 2008, Paul Fisher wrote:
> Syslog is funny in that it does a lot of open/write/close cycles so that
> rotate can work trivially. Those are meta-data updates and on NFS each
> implies a COMMIT. This leads us back to the old "solaris nfs over zfs
> is slow" discussion, where we talk about the fact that other nfs servers
> do not honor the COMMIT semantics and can lose data. I for one do
> *not* want solaris nfs/zfs to behave in any way other than it does.

There is the additional problem that in order for the NFS client to update the log file, part of it needs to be read first. This results in a hit from needing to read a chunk of data in whatever blocksize ZFS decided to use. The entire chunk needs to be read from disk (though it could be cached in memory). Each read requires that the checksum be computed and verified. If the updates are small compared with the blocksize, then considerable resources are expended just computing the checksum. When the data is written back, the checksum needs to be computed for the new block.

If ZFS does not update (enlarge) the head (tail?) block on the file, then it would end up using tiny block sizes, leading to terrible fragmentation.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
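
A rough back-of-the-envelope version of the checksum argument above, as a sketch only. It models exactly what the message describes (one checksum verification when the block is read and one computation when it is written back) and is not a measurement of ZFS. The 200-byte log line and the two record sizes are assumptions chosen for illustration, and `checksum_amplification` is a hypothetical name.

```python
def checksum_amplification(record_size, update_size):
    """Bytes checksummed per byte of new log data under the model above:
    one checksum verification when the record is read and one checksum
    computation when the rewritten record goes back out."""
    bytes_checksummed = 2 * record_size
    return bytes_checksummed / update_size


if __name__ == "__main__":
    line_size = 200  # assumed size of one log line, in bytes
    for record_size in (8 * 1024, 128 * 1024):
        ratio = checksum_amplification(record_size, line_size)
        print(f"{record_size // 1024:>4} KiB record: {ratio:.0f}x")
```

Under this model the cost per append scales with the block size rather than with the size of the log line, which is why small updates against large blocks look expensive.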