Hi,
I have a two-server setup that acts as both an SMB and an NFS server in an
active/active configuration managed by CTDB (http://ctdb.samba.org/).
The write performance is around 100MB/s per client, but the read
performance is only 0.6MB/s (using the IOzone benchmark). I use Windows
2003 Server as the CIFS client. Sometimes the read performance is good from
only one of the CTDB-managed Samba servers, and this is not consistent
across restarts of CTDB + Samba.
Raw network bandwidth between the CIFS client and the Samba server is
greater than 200MB/s. I tried changing the SO_RCVBUF and SO_SNDBUF sizes
and the smb.conf locking options. Tracing smbd shows sendfile being used
for reads, with no errors (-EINVAL or -EIO).
Any idea what is contributing to the overhead during the read operation?
smb.conf is as follows:
 [global]
        workgroup = TESTDOMAIN2
        realm = TESTDOMAIN2.LOCAL
        netbios name = CTDB-HEAD
        server string = Clustered CIFS
        security = ADS
        auth methods = winbind, sam
        password server = 172.16.2.25
        private dir = /mnt/gpfs/CTDB_AD
        passdb backend = tdbsam
        log level = 1
        log file = /var/log/samba/log.%m
        max log size = 10000
        deadtime = 15
        socket options = IPTOS_LOWDELAY TCP_NODELAY SO_RCVBUF=262144
SO_SNDBUF=262144 SO_KEEPALIVE
        use mmap = No
        clustering = Yes
        disable spoolss = Yes
        machine password timeout = 999999999
        local master = No
        dns proxy = No
        ldap admin dn = cn=ldap,cn=Users,dc=testdomain2,dc=local
        ldap idmap suffix = dc=testdomain2,dc=local
        ldap suffix = dc=testdomain2,dc=local
        idmap backend = ad
        idmap uid = 5000-100000000
        idmap gid = 5000-100000000
        template homedir = /home/%D+%U
        template shell = /bin/bash
        winbind separator = +
        winbind enum users = Yes
        winbind enum groups = Yes
        notify:inotify = no
        idmap:cache = no
        gpfs:leases = yes
        nfs4:acedup = merge
        nfs4:chown = yes
        nfs4:mode = special
        gpfs:sharemodes = yes
        fileid:mapping = fsname
        force unknown acl user = Yes
        use sendfile = Yes
        mangled names = No
        blocking locks = No
        strict locking = No
        wide links = No
        vfs objects = syncops, gpfs, fileid
        large readwrite = yes
        oplocks = yes
        getwd cache = yes
[global-share]
        comment = GPFS File Share
        path = /mnt/gpfs/nfsexport
        read only = No
        inherit permissions = Yes
        inherit acls = Yes
        oplocks = yes
Thanks in Advance,
-Tim
On Tue, Jan 27, 2009 at 6:30 PM, tim clusters <tim.clusters@gmail.com> wrote:
> [snip original problem report]

The issue is resolved and was network related. Tcpdump revealed lots of
retransmissions from the server to the client, owing to an improper
TcpWindowSize value.

Cheers,
-Tim
On Thursday 29 January 2009 21:40:55 tim clusters wrote:
> [snip]
> The issue is resolved and was network related. Tcpdump revealed lots of
> retransmissions from the server to the client, owing to an improper
> TcpWindowSize value.

Tim,

Thanks for reporting that back to the list. This is useful information for
others. Would it be possible to perhaps provide a little more detail?

Cheers,
John T.
On Thu, Jan 29, 2009 at 10:45 PM, John H Terpstra <jht@samba.org> wrote:
> [snip]
> Thanks for reporting that back to the list. This is useful information for
> others. Would it be possible to perhaps provide a little more detail?

I apologize for being too terse. I myself still need to narrow down the
right SO_RCVBUF, SO_SNDBUF and TCP/IP settings to get maximum bandwidth.

Initially, I had set SO_RCVBUF and SO_SNDBUF to 262144 (assuming larger
buffers mean more performance):

[pid 29734] setsockopt(32, SOL_SOCKET, SO_RCVBUF, [262144], 4) = 0
[pid 29734] setsockopt(32, SOL_SOCKET, SO_SNDBUF, [262144], 4) = 0

An strace of smbd revealed the server doing sendfile in 64KB chunks from
the disk file to the socket:

[pid 29848] sendfile(32, 38, [3207168], 61440) = 61440
[pid 29848] sendfile(32, 38, [3268608], 61440) = 61440
[pid 29848] sendfile(32, 38, [3330048], 61440) = 61440

So the server was doing as expected, but performance was still poor, and a
network trace revealed lots of retransmissions, only from the server to the
client (not the other way around).
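For anyone wanting to reproduce the pattern in the strace above, here is a
minimal Python sketch of what smbd is doing (an illustration only, not
smbd's actual code; `serve_file` and the 80000-byte test payload are my own
names, and it assumes Linux, where os.sendfile is available):

```python
import os
import socket

CHUNK = 61440  # the 60 KiB transfer size seen in the smbd strace above

def serve_file(sock, path):
    """Copy a file onto a connected socket with sendfile(2), smbd-style.

    Requests a 64 KiB kernel send buffer, then hands CHUNK-sized slices of
    the file to sendfile until the whole file has been pushed out.
    Returns the number of bytes sent.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 65536)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, CHUNK)
            if sent == 0:  # end of file reached early
                break
            offset += sent
    return offset
```

Note that on Linux the kernel doubles the requested buffer size for its own
bookkeeping, so getsockopt(SO_SNDBUF) will report 131072 after requesting
65536.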
9.990078 192.168.97.5 -> 192.168.97.1 SMB [TCP Retransmission] Read AndX Response, 61440 bytes
10.322077 192.168.97.5 -> 192.168.97.1 SMB [TCP Retransmission] Read AndX Response, 61440 bytes

Then I set SO_RCVBUF and SO_SNDBUF to 65536 to align with the sendfile
size. Retransmissions were still being seen. Googling pointed the primary
suspicion at the TCP/IP stack, in particular the TCP window size:

TCP Window Size = Bandwidth * RTT

The Windows machine has a Myrinet 10GigE HCA while the Linux server has a
Chelsio 10GigE HCA. For a 64KB SMB packet size, network testing led me to
the following conclusions:

Myrinet 10GigE: TCP Window Size = 3Gbps * 300 microsec ==> 150KB
Chelsio 10GigE: TCP Window Size = 3.7Gbps * 260 microsec ==> 120KB

Myricom recommends a TCP window size of 512KB on Windows, while on Linux
the window size was set to 87.3KB (75% of 120KB, to account for small
packets?):

net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216

As a result, during a read operation the amount of unacknowledged data in
flight that the server sent did not cause the client to respond (as its
window size was 512KB), causing the server to retransmit after a timeout on
not receiving an acknowledgement. Also, TCP Window Scaling (RFC 1323) was
not enabled on the Windows client. Setting the Windows TCP window size to
87.3KB (similar to the server's) plus Tcp1323Opts resolved the issue.

Currently, a single SMB server is able to handle a sustained 300MB/s on
writes and 200MB/s on reads. Performance remains constant as you scale
clients, with no timeouts, and performance scales as you add another
server. I am still not sure whether we can extract more from smbd, as the
CPU/memory/IO subsystems are less than 30% saturated. It seems the
performance bottleneck is network-related plus the SMB packet size, as the
raw network yields 450MB/s for a 64KB packet size. I may be wrong, but this
is the closest explanation I can come up with. Please suggest if there is
room for further performance improvements.
[snip] of smb.conf:

        socket options = IPTOS_LOWDELAY TCP_NODELAY SO_RCVBUF=65536 SO_SNDBUF=65536 SO_KEEPALIVE
        use mmap = No
        use sendfile = Yes
        blocking locks = No

Regards,
-Tim