chris (fool) mccraw
2011-May-05 20:31 UTC
[Samba] debugging really high network usage with no apparent cause
hello, i'm not a heavy samba user in general (works without any serious tweaking on my modest home network), but one of my clients is and we are seeing some strange behavior. perhaps it makes sense to someone with more clue. if not, i could use some advice on how to figure out what is going on.
the setup: a couple dozen windows clients (xp pro 64bit & vista ultimate 64bit) taking fileservice, netbios, and WINS from an up-to-date centos 5.5 64bit x86 machine running samba 3.3.x (samba3x-common-3.3.8-0.52.el5_5.2 and friends) on a switched gigabit network. the client machines all have local authentication, but do use samba to do hostname resolution of short hostnames (e.g., goliath).
the issue: "network seems slow". quantifiably, some of the batch (rendering, in this case) jobs that are running on the windows machines are going much more slowly than expected--a factor of 2-5x slower than usual is a real killer when "usual" is 12 hours.
after checking on the machines and the server, i see nothing untoward chewing up the CPU on either side--on the server, there are as many smbd's as clients and they are splitting the CPUs evenly (server is a 16-core machine with 32G of ram and a big hardware raid hanging off 2 high-end areca cards). the server sits at a load average of about 2 when all the nodes are cranking and interactive use (on the server) is fine then, though fileservice speed is definitely maxed out at wirespeed when i run tests on a quiet network (spread between all the clients. single clients only manage around 350Mbit/sec, but i attribute that to windows, not the server, since it can talk scp at closer to 900Mbit/sec to another linux machine and i can run more than one ~300Mbit smb stream at a time to different clients).
so i checked the smbd/nmbd/winbindd logs and see nothing strange. i fire up wireshark on the clients and holy crap there is a lot of traffic when there should be none!
i filtered out broadcast, multicast, and indeed all traffic not destined for the host i was monitoring and found that 2 separate clients (win xp and vista) were each chugging along at approximately 1GByte/minute of traffic (in approx 700k packets/min) *from* the server in a single tcp conversation, when the workload should have been more like 0--these guys were crunching numbers, not reading or writing files, not even doing anything that should need nameservice (no interactive use or background programs running). i don't know the tools they are using well, but they are supposed to be reading in a small source file once at the beginning of the job and dumping out, all at once at the end of the process, a video frame in the neighborhood of 4MByte.
thinking something really untoward must be going on under the hood, i cranked up debugging on smbd, nmbd, and winbindd to 3 with smbcontrol and the things samba is logging did not change in any way (though smbcontrol did report that debugging was set to level 3 across the board for all 3 daemons).
all i understand from the wireshark dump is that all of this traffic is between client and samba server, on port 445 for vista, and on port 139 for xp. and nearly all of it is server->client. wireshark's "info" column labels the suspicious traffic as follows:
about 80% as TCP "[TCP segment of a reassembled PDU]",
~10% as TCP, "60579 > microsoft-ds [ACK] Seq=<integer> Ack=<other integer> Win=65535, Len=0"
~5% as SMB "Read andX Request, FID 0x2aee, 16384 bytes at offset <integer>"
~5% as SMB "Read AndX Response, 16384 bytes"
if you already see what is going on, great, please let me know! if not (and i'm in the "not" boat), could you recommend what i can look at to determine the cause of so much traffic?
it is the case that such traffic is not network-wide, but only on clients that are running these batch processes--idle clients have a modest and normal number of network packets (less than a hundred a second even without filtering broadcast/etc). it just seems to me like there shouldn't be that much traffic to a loaded (cpu-bound, effectively i/o-less) client that is working on a 400k source file and is hours away from writing byte 1 of an output file.
i'm happy to crank up debugging further, run tests on the clients or server, or post packet dumps if these things would help in the diagnosis. please let me know what further information would be useful!
thanks in advance for your help.
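To put the quoted figures on a per-second footing, here is a rough back-of-the-envelope sketch in Python. It assumes "1GByte" means 10^9 bytes and "400k" means about 400,000 bytes; neither is stated in the post. The function name and constants are purely illustrative.

```python
# Figures quoted in the post (all approximate).
BYTES_PER_MIN = 1_000_000_000   # "approximately 1GByte/minute" (assumed decimal GB)
PACKETS_PER_MIN = 700_000       # "approx 700k packets/min"
READ_SIZE = 16_384              # bytes per Read AndX request, from the Wireshark info column
SOURCE_FILE = 400_000           # "400k source file" (assumed to mean ~400,000 bytes)


def traffic_summary(bytes_per_min, packets_per_min, read_size, file_size):
    """Convert per-minute capture totals into per-second rates."""
    bytes_per_sec = bytes_per_min / 60
    return {
        "mbit_per_sec": bytes_per_sec * 8 / 1e6,
        "avg_packet_bytes": bytes_per_min / packets_per_min,
        "reads_per_sec": bytes_per_sec / read_size,
        # How often a file of this size would be fetched in full at this rate.
        "file_rereads_per_sec": bytes_per_sec / file_size,
    }


summary = traffic_summary(BYTES_PER_MIN, PACKETS_PER_MIN, READ_SIZE, SOURCE_FILE)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions the stream works out to roughly 133 Mbit/sec, with an average packet of about 1.4 kB and on the order of a thousand 16384-byte reads per second. If the reads were all hitting the 400k source file (which the capture summary does not establish), that would amount to fetching the whole file about 40 times a second.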
Volker Lendecke
2011-May-05 20:42 UTC
[Samba] debugging really high network usage with no apparent cause
On Thu, May 05, 2011 at 01:31:35PM -0700, chris (fool) mccraw wrote:
> about 80% as TCP "[TCP segment of a reassembled PDU]",
> ~10% as TCP, "60579 > microsoft-ds [ACK] Seq=<integer> Ack=<other integer> Win=65535, Len=0"
> ~5% as SMB "Read andX Request, FID 0x2aee, 16384 bytes at offset <integer>"
> ~5% as SMB "Read AndX Response, 16384 bytes"
Virus scanner on the client scanning network shares?
Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
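One way to test a "something is repeatedly scanning a file" theory against the capture is to tally the Read AndX requests per FID and see whether the same offsets keep coming back. The sketch below is only an illustration: it assumes the Wireshark "info" column has been exported as one text string per packet (the export step is not shown), and the sample lines are made up in the format quoted above. Mapping a FID back to a filename is not covered here.

```python
import re
from collections import Counter, defaultdict

# Matches info strings such as:
#   Read AndX Request, FID 0x2aee, 16384 bytes at offset 32768
READ_REQUEST = re.compile(
    r"Read AndX Request, FID (0x[0-9a-fA-F]+), (\d+) bytes at offset (\d+)",
    re.IGNORECASE,
)


def tally_reads(info_lines):
    """Summarise read requests per FID: count, bytes, and repeated offsets."""
    offsets = defaultdict(Counter)   # fid -> Counter of requested offsets
    sizes = Counter()                # fid -> total bytes requested
    for line in info_lines:
        match = READ_REQUEST.search(line)
        if not match:
            continue  # ACKs, responses, bare TCP segments, etc.
        fid = match.group(1).lower()
        length = int(match.group(2))
        offset = int(match.group(3))
        offsets[fid][offset] += 1
        sizes[fid] += length
    return {
        fid: {
            "requests": sum(counter.values()),
            "bytes": sizes[fid],
            "distinct_offsets": len(counter),
            "max_repeat": max(counter.values()),  # >1 means an offset was re-read
        }
        for fid, counter in offsets.items()
    }


# Made-up sample lines in the format quoted in the thread.
sample = [
    "Read AndX Request, FID 0x2aee, 16384 bytes at offset 0",
    "Read AndX Response, 16384 bytes",
    "Read andX Request, FID 0x2aee, 16384 bytes at offset 16384",
    "60579 > microsoft-ds [ACK] Seq=1 Ack=2 Win=65535, Len=0",
    "Read AndX Request, FID 0x2aee, 16384 bytes at offset 0",
]
report = tally_reads(sample)
print(report)
```

A high `max_repeat` with a small `distinct_offsets` would suggest the same data is being fetched over and over, while a steadily growing set of offsets would point to a sequential pass over a large file.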