Emmanuel Florac
2017-Aug-10 17:21 UTC
[Samba] extremely low performance on Samba 4.2.14-Debian
Hi everyone,

here's my problem: I have a fast server (dual Xeon E5-2620, 64 GB RAM) with a fast RAID array (24 disks, RAID-6, more than 2 GB/s local read/write performance, XFS filesystem) and a fast network: dual 10GigE (myri10g) and 40GigE (i40e).

It's running Debian 8.11; I have tried various kernel versions (currently 4.4.x, but 4.9 isn't any better).

It's slow as dead snails in molten molasses using Samba. Everything else is fine:

* from a Windows PC with a 10GigE card, using ftp.exe and vsftpd, I transfer files at 500/600 MB/s easily.
* using Chrome and downloading files through HTTP, I get 250 MB/s.
* using Samba, it reaches 105/110 MB/s, tops. Awful.

The Windows client and the Linux server are both connected to the same 10GigE/40GigE switch. Transferring from one Windows machine to another works fine (700 MB/s and more). Therefore the Windows machines are NOT at fault.

Looking at what's happening on the server, I noticed that smbd uses gobs of CPU. It uses about 1% of CPU (from 'top') for each MB/s. So when it reaches ~100 MB/s, the CPU core it's running on is maxed out! That is definitely NOT normal; on a very similar setup (same motherboard, same CPU, same amount of RAM, same RAID controller, same OS, etc.), when an smbd process is writing at 500/600 MB/s, its CPU consumption maxes out at 47%!

I don't know what's wrong or why smbd is burning CPU cycles like this.

Here is a quick comparison I've done.
Here is the "bad" machine:

root@storiq-111:~# pidstat -p 11694 2 20
Linux 4.4.78-storiq64-opteron (storiq-111)  10/08/2017  _x86_64_  (32 CPU)

16:30:12      UID       PID    %usr %system  %guest    %CPU   CPU  Command
16:30:14        0     11694    0,00    0,00    0,00    0,00     8  smbd
16:30:16    10500     11694   48,00    8,00    0,00   56,00     8  smbd
16:30:18    10500     11694   54,00   13,00    0,00   67,00     8  smbd
16:30:20    10500     11694   54,00   12,00    0,00   66,00     8  smbd
16:30:22    10500     11694   61,50   11,50    0,00   73,00     8  smbd
16:30:24    10500     11694   61,50   10,00    0,00   71,50     8  smbd
16:30:26    10500     11694   64,00   10,00    0,00   74,00     8  smbd
16:30:28    10500     11694   63,50   10,00    0,00   73,50     8  smbd
16:30:30    10500     11694   67,50   11,50    0,00   79,00     8  smbd

root@storiq-111:~# numastat -p 11694
Per-node process memory usage (in MBs) for PID 11694 (smbd)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.28            0.28            0.56
Stack                        0.02            0.02            0.04
Private                     11.51           14.45           25.96
----------------  --------------- --------------- ---------------
Total                       11.80           14.76           26.56

Notice that it burns tons of CPU in "user".
By contrast, here is the same measurement on another (different and much slower) machine:

root@storiq-313:~# pidstat -p 19654 2 20
Linux 4.4.79-storiq64-opteron (storiq-313)  10/08/2017  _x86_64_  (16 CPU)

18:29:30      UID       PID    %usr %system  %guest    %CPU   CPU  Command
18:29:32     1000     19654    5,50   75,50    0,00   81,00     2  smbd
18:29:34     1000     19654    6,50   82,50    0,00   89,00     0  smbd
18:29:36     1000     19654    6,50   89,00    0,00   95,50     0  smbd
18:29:38     1000     19654    5,50   92,00    0,00   97,50     4  smbd
18:29:40     1000     19654    6,50   90,50    0,00   97,00    10  smbd
18:29:42     1000     19654    6,00   94,00    0,00  100,00     0  smbd
18:29:44     1000     19654    7,00   90,50    0,00   97,50     0  smbd
18:29:46     1000     19654    7,50   87,00    0,00   94,50     0  smbd
18:29:48     1000     19654    6,00   92,00    0,00   98,00     0  smbd
18:29:50     1000     19654    7,00   91,00    0,00   98,00     0  smbd
18:29:52     1000     19654    6,00   89,00    0,00   95,00     0  smbd

Per-node process memory usage (in MBs) for PID 19654 (smbd)
                           Node 0          Node 2           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.14            0.00            0.64
Stack                        0.02            0.00            0.04
Private                      4.61            0.00            8.57
----------------  --------------- --------------- ---------------
Total                        4.78            0.00            9.25

The theoretically slower machine is actually 5x faster! That's not amusing... Also, for some reason smbd uses much more memory, on both nodes, on the "bad" machine (but there are many more clients).

I tried running strace on the working smbd process, but I don't see anything remarkable in its output. There are no hardware errors in mcelog either. I'm out of ideas... What's going on?
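The contrast between the two pidstat runs above (mostly "user" time on storiq-111, mostly "system" time on storiq-313) can be summarised with a few lines of Python. This is only a sketch: the sample rows are copied from the output above (the idle first sample on storiq-111 is left out), and the helper names `parse_pidstat` and `mean_split` are made up for the illustration.

```python
# Sketch: average the %usr and %system columns of pidstat output.
# Rows are copied from the two runs quoted above; the idle 16:30:14
# sample on storiq-111 is deliberately omitted.

BAD = """\
16:30:16 10500 11694 48,00 8,00 0,00 56,00 8 smbd
16:30:18 10500 11694 54,00 13,00 0,00 67,00 8 smbd
16:30:20 10500 11694 54,00 12,00 0,00 66,00 8 smbd
16:30:22 10500 11694 61,50 11,50 0,00 73,00 8 smbd
16:30:24 10500 11694 61,50 10,00 0,00 71,50 8 smbd
16:30:26 10500 11694 64,00 10,00 0,00 74,00 8 smbd
16:30:28 10500 11694 63,50 10,00 0,00 73,50 8 smbd
16:30:30 10500 11694 67,50 11,50 0,00 79,00 8 smbd
"""

GOOD = """\
18:29:32 1000 19654 5,50 75,50 0,00 81,00 2 smbd
18:29:34 1000 19654 6,50 82,50 0,00 89,00 0 smbd
18:29:36 1000 19654 6,50 89,00 0,00 95,50 0 smbd
18:29:38 1000 19654 5,50 92,00 0,00 97,50 4 smbd
18:29:40 1000 19654 6,50 90,50 0,00 97,00 10 smbd
18:29:42 1000 19654 6,00 94,00 0,00 100,00 0 smbd
18:29:44 1000 19654 7,00 90,50 0,00 97,50 0 smbd
18:29:46 1000 19654 7,50 87,00 0,00 94,50 0 smbd
18:29:48 1000 19654 6,00 92,00 0,00 98,00 0 smbd
18:29:50 1000 19654 7,00 91,00 0,00 98,00 0 smbd
18:29:52 1000 19654 6,00 89,00 0,00 95,00 0 smbd
"""


def parse_pidstat(text):
    """Return a list of (usr, system) floats, one per sample row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: time UID PID %usr %system %guest %CPU CPU Command
        if len(fields) != 9 or fields[-1] != "smbd":
            continue
        # This output uses a comma as the decimal separator.
        usr = float(fields[3].replace(",", "."))
        system = float(fields[4].replace(",", "."))
        rows.append((usr, system))
    return rows


def mean_split(rows):
    """Return the mean %usr and mean %system over all rows."""
    n = len(rows)
    return sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n


bad_usr, bad_sys = mean_split(parse_pidstat(BAD))
good_usr, good_sys = mean_split(parse_pidstat(GOOD))
print(f"storiq-111: usr {bad_usr:.2f}%  system {bad_sys:.2f}%")
print(f"storiq-313: usr {good_usr:.2f}%  system {good_sys:.2f}%")
```

On these samples, user time dominates on storiq-111 and kernel time dominates on storiq-313, which suggests the extra work happens inside smbd itself rather than in the network or filesystem paths.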
smb.conf in case anyone spots something fishy (it's actually split in 3):

/etc/samba/smb.conf:

  netbios name = storiq-111
  server string = %h server (Samba, Debian)
  include = /etc/samba/smb-common.ad.conf
  include = /etc/samba/smb-shares.conf

/etc/samba/smb-common.ad.conf:

  security = ADS
  workgroup = TEST
  realm = AD.TEST.COM

  winbind sealed pipes = false
  require strong key = false
  winbind sealed pipes:TEST = true
  require strong key:TEST = true
  winbind refresh tickets = yes
  winbind trusted domains only = no
  winbind use default domain = yes
  winbind enum users = yes
  winbind enum groups = yes
  winbind cache time = 7200
  winbind offline logon = yes

  idmap config *:backend = tdb
  idmap config *:range = 2000-9999
  idmap config TEST:backend = rid
  idmap config TEST:range = 10000-50000000

  winbind nss info = template
  template shell = /bin/bash
  template homedir = /mnt/raid/%u

  client use spnego = yes
  client ntlmv2 auth = yes
  encrypt passwords = yes
  restrict anonymous = 2
  server signing = mandatory
  ntlm auth = yes

  log level = 0
  log file = /var/log/samba/smbd.log
  max log size = 50

  vfs objects = acl_xattr
  map acl inherit = yes
  store dos attributes = yes

/etc/samba/smb-shares.conf:

  [test_tr]
  comment = Utilisateurs de test_tr
  valid users = @prod
  force group = prod
  force create mode = 775
  read only = no
  path = /mnt/raid/test_tr
  guest ok = no
  ; vfs objects = acl_xattr streams_xattr

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique | Intellique
<eflorac at intellique.com> | +33 1 78 94 84 02
------------------------------------------------------------------------
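One setting in the configuration above that may be worth checking when smbd spends most of its time in user space is `server signing = mandatory`: with signing enforced, smbd has to compute a signature over every packet it sends and receives. Nothing in this thread establishes that signing is the cause here, so treat the following as a hypothesis only. The sketch below parses an excerpt of the posted settings and reports the signing mode; `parse_settings` is a hypothetical helper, not part of Samba.

```python
# Sketch: read "key = value" settings from smb.conf text and report
# whether SMB signing is enforced. The excerpt is copied from the
# configuration posted above; parse_settings is a made-up helper.

CONF = """\
security = ADS
client use spnego = yes
server signing = mandatory
ntlm auth = yes
log level = 0
; vfs objects = acl_xattr streams_xattr
vfs objects = acl_xattr
"""


def parse_settings(text):
    """Return a dict of parameter name -> value, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments (# or ;) and section headers like [share].
        if not line or line[0] in "#;" or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Parameter names are case-insensitive; normalise spacing too.
            settings[" ".join(key.lower().split())] = value.strip()
    return settings


settings = parse_settings(CONF)
signing = settings.get("server signing", "(unset)")
if signing.lower() == "mandatory":
    print("server signing is mandatory: smbd signs every packet")
else:
    print("server signing =", signing)
```

A simple way to test the hypothesis would be to compare throughput and the pidstat user/system split with signing enforced and not enforced on a test share, keeping everything else unchanged.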
Rowland Penny
2017-Aug-10 17:46 UTC
[Samba] extremely low performance on Samba 4.2.14-Debian
On Thu, 10 Aug 2017 19:21:53 +0200 Emmanuel Florac via samba <samba at lists.samba.org> wrote:

> * using Samba, it reaches 105/110 MB/s, tops. Awful.

4.2.x is EOL as far as Samba is concerned; there have been a lot of changes since 4.2.* came out.

Can I suggest you go here: http://apt.van-belle.nl/

You can get a much more recent version there, 4.6.7.

Rowland
Emmanuel Florac
2017-Aug-10 18:39 UTC
[Samba] extremely low performance on Samba 4.2.14-Debian
On Thu, 10 Aug 2017 18:46:12 +0100 Rowland Penny via samba <samba at lists.samba.org> wrote:

> 4.2.x is EOL as far as Samba is concerned, there have been a lot of
> changes since 4.2.* came out.
>
> Can I suggest you go here: http://apt.van-belle.nl/
>
> You can get a much more recent version there, 4.6.7

OK, I'll try it, but that doesn't really explain why it's so slow here while it's completely OK on other similar or slower machines...

Actually, this is the first time ever, with any Samba version (including 3.6.x), that I have seen such bad performance on a 10GigE-equipped system. I even used to get better performance with bonded Gb interfaces 8 or 9 years ago...
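To put rough numbers on the gap between the two similar setups described in the first message, the sketch below estimates the throughput at which a single smbd process would saturate one core. It assumes that CPU cost scales linearly with throughput, which is a simplification; the figures (about 1% CPU per MB/s on the slow server, 47% CPU at 500/600 MB/s on the similar one) come from the first message, and `ceiling_mb_s` is a hypothetical helper.

```python
# Sketch: single-core throughput ceiling from an observed CPU cost,
# assuming CPU usage grows linearly with throughput (a simplification).


def ceiling_mb_s(cpu_percent, throughput_mb_s):
    """Throughput (MB/s) at which one core would reach 100% CPU."""
    cost_per_mb = cpu_percent / throughput_mb_s  # % CPU per MB/s
    return 100.0 / cost_per_mb


# Slow server: about 1% CPU for each MB/s.
slow = ceiling_mb_s(1.0, 1.0)

# Similar, well-behaved server: 47% CPU at 500 to 600 MB/s.
fast_low = ceiling_mb_s(47.0, 500.0)
fast_high = ceiling_mb_s(47.0, 600.0)

print(f"slow server ceiling:    {slow:.0f} MB/s")
print(f"similar server ceiling: {fast_low:.0f} to {fast_high:.0f} MB/s")
print(f"per-MB/s CPU cost ratio: {fast_low / slow:.1f}x to {fast_high / slow:.1f}x")
```

Under that assumption, the slow server spends roughly ten times more CPU per MB/s than the similar one, which matches the observation that it tops out near 100 MB/s with one core fully busy.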