Rainer Krienke
2012-Aug-07 09:03 UTC
[Samba] Performance problem using clustered samba via ctdb
Hello,

I recently set up a Samba cluster with 4 nodes using CTDB. The systems are virtual Citrix Xen machines running SuSE SLES11 SP2 with Samba 3.6.3. The shared filesystem needed by CTDB is OCFS2, stored on an iSCSI target. The cluster is running fine, and IP takeover etc. is working fine as well.

To find out how the cluster would perform in real life with many clients accessing Samba shares, I compiled smbtorture (from Samba 4) to run the nbench benchmark using the loadfile client.txt from the dbench 4.0 distribution. What I found is really strange.

I first tried to simulate 100 clients on one of the cluster nodes:

$ bin/smbtorture //host1/smbtest1 -UUNIKO/smbtest1%password bench.nbench --loadfile=dbench-4.0/client.txt --num-progs=100 -t 30

The result is an average throughput of 50 MByte/sec. OK so far. Next I distributed the 100 clients across all four nodes by starting an smbtorture run with 25 clients on each cluster member:

$ bin/smbtorture //host[1,2,3,4]/smbtest[1,2,3,4] .... --num-progs=25 -t 30

The throughput results for the four hosts are now 4.4 MByte/sec, 4.6 MByte/sec, 5.2 MByte/sec, and 2.8 MByte/sec. If I add more clients by increasing the --num-progs parameter, the rates drop even further. On one node, probably the master, I see that all three (virtual) CPU cores are at about 60% system load (from top). The other three nodes do not show any high CPU load.

I also ran the ping_pong test (ping_pong /shared/cluster/test.dat 5) on the shared filesystem. On one node I get a value of about 36000. If I run the very same ping_pong command on all four nodes at the same time, I get a value of only about 1000 on each node.

On our old Samba servers we have a total of about 400 connections distributed across two servers. However, if I try to put a comparable load (4x100 clients) on the four new cluster nodes via smbtorture, the test won't even start. If I put all 400 clients on just one of the servers, it works fine.

Now I ask myself two questions:

1. Is the nbench benchmark a realistic kind of test?
2.
Why do throughput rates drop as much as I observed, and is this known CTDB behavior, or is my configuration bad somehow?

Any ideas?

Thanks,
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
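PS: For reference, here is a sketch of how I drive the four parallel smbtorture runs from a single machine. This just reproduces the commands shown above in a loop; the ssh invocation, hostnames, share names, and password are placeholders, so adjust as needed. With DRYRUN=1 (the default here) it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: start one smbtorture instance per cluster node, all at once.
# host1..host4, smbtest1..smbtest4, and the password are placeholders.
DRYRUN=${DRYRUN:-1}

NUM_PROGS=25    # clients per node (4 x 25 = 100 total)
RUNTIME=30      # seconds per run

for i in 1 2 3 4; do
    cmd="bin/smbtorture //host$i/smbtest$i -UUNIKO/smbtest$i%password \
bench.nbench --loadfile=dbench-4.0/client.txt \
--num-progs=$NUM_PROGS -t $RUNTIME"
    if [ "$DRYRUN" = "1" ]; then
        echo "host$i: $cmd"
    else
        # run each instance in the background so all nodes are loaded at once
        ssh "host$i" "$cmd" &
    fi
done
wait
```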