Over the past few months, I have seen many postings here about runaway smbd processes with Samba versions 3.0.20 and above. Personally, it never happened to me until today. Also, I have stuck with Samba 3.0.13 on most of my machines because of THIS reported issue and a couple of other issues that I have experienced.

However, I have a machine running RIGHT NOW where smbd has gone out of control. This machine is running 3.0.20b. If it would help, and if somebody could tell me exactly -- and I mean exactly -- what to do on my machine to capture information that might help explain what is going on, I would be happy to collect the information.

But, it has to be in the next couple of hours. It is 8:30 am Friday in Boston, MA USA. I have to reboot the machine to use it in about 3 hours.

Note that rpc.statd also seems to be out of control. Don't know if it is related.

Andy Liebman

Here's what TOP looks like:

Tasks: 170 total, 2 running, 168 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.6% us, 34.1% sy, 0.0% ni, 20.3% id, 0.0% wa, 0.0% hi, 33.1% si
Mem: 2075844k total, 2019784k used, 56060k free, 7668k buffers
Swap: 1012052k total, 2556k used, 1009496k free, 1820308k cached

  PID USER      PR  NI   VIRT   RES   SHR S  %CPU  %MEM      TIME+ COMMAND
 4743 root      25   0   8664  2164  1416 R  99.7   0.1  621:48.83 smbd
 2569 root      15   0   1692   688   584 S  58.8   0.0  323:00.52 rpc.statd
 4928 andrew    15   0  13604   11m  1592 S   0.7   0.6    0:10.55 Xvnc
11509 andrew    16   0  27520   12m    9m S   0.3   0.6    0:01.64 konsole
    1 root      16   0   1560   536   472 S   0.0   0.0    0:00.70 init
    2 root      RT   0      0     0     0 S   0.0   0.0    0:00.00 migration/0
    3 root      34  19      0     0     0 S   0.0   0.0    0:00.00 ksoftirqd/0
    4 root      RT   0      0     0     0 S   0.0   0.0    0:00.00 migration/1
    5 root      34  19      0     0     0 S   0.0   0.0    0:00.00 ksoftirqd/1
    6 root      10  -5      0     0     0 S   0.0   0.0    0:00.09 events/0
    7 root      10  -5      0     0     0 S   0.0   0.0    0:05.15 events/1
    8 root      11  -5      0     0     0 S   0.0   0.0    0:00.01 khelper
    9 root      10  -5      0     0     0 S   0.0   0.0    0:00.00 kthread
   12 root      20  -5      0     0     0 S   0.0   0.0    0:00.00 kacpid
  124 root      10  -5      0     0     0 S   0.0   0.0    0:00.05 kblockd/0
  125 root      10  -5      0     0     0 S   0.0   0.0    0:00.05 kblockd/1
  167 root      15   0      0     0     0 S   0.0   0.0    3:42.50 pdflush

Andy,

Luckily, the contract of the client this was happening to ran out. I haven't had the problem with any other clients, but I suspect it had something to do with the kernel that box was running; that was the only thing that differed between the boxes that did and didn't work. (It was an older kernel, 2.6.5 or something.)

I never had a chance to determine the root cause, but my advice would be to attach strace to the process(es) spinning out of control and see what it's hanging on. The online book Samba3 By Example (Google that) has a chapter on stracing smbd processes.

Hope that helps.

Best,
Ryan

andy liebman wrote:
> Note that rpc.statd also seems to be out of control.
> Don't know if it is related.

Are you re-exporting NFS file systems by chance?

OK. Steps to troubleshoot:

* First, figure out what IP address is associated with that smbd process and get a network capture to find out whether the high CPU is just caused by a chatty client.
* Second, strace -p <pid>. Look for recurring patterns. If the culprit is an fcntl() call, try to associate the file descriptor with an actual file. The fd is the first parameter to fcntl(). You can look up the fd in /proc/<pid>/fd/.
* Finally, try to figure out where in the code the CPU is being eaten. Either gdb or perhaps log levels will tell you.

cheers, jerry

====================================================================
Samba    ------- http://www.samba.org
Centeris ----------- http://www.centeris.com
"What man is a man who does not make the world better?" --Balian
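
The strace step above could be scripted roughly as follows. This is only a sketch under assumptions, not something posted in the thread: it assumes strace is installed, that it runs as root on Linux, and that the trace lines have the usual "name(args) = result" layout, optionally prefixed with a PID and a timestamp. The function names (strace_command, capture_strace, summarize_strace, fd_target) are made up for illustration.

```python
import os
import re
import signal
import subprocess
from collections import Counter

# Syscall name at the start of a trace line, after an optional "[pid N]" or
# bare PID prefix (from -f) and an optional timestamp (from -t/-tt).
_SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?(?:[\d:.]+\s+)?([a-z_][a-z0-9_]*)\("
)
# First argument of fcntl()/fcntl64(), which is the file descriptor.
_FCNTL_FD_RE = re.compile(r"\bfcntl(?:64)?\((\d+)")


def strace_command(pid, out_path):
    """Build an strace argv that attaches to an already-running process."""
    return [
        "strace",
        "-f",            # also follow children the process forks
        "-tt",           # wall-clock timestamps with microseconds
        "-T",            # time spent inside each system call
        "-p", str(pid),  # attach to the running process
        "-o", out_path,  # write the trace to a file instead of stderr
    ]


def capture_strace(pid, out_path, seconds=30):
    """Trace `pid` for about `seconds`, then detach with SIGINT (like Ctrl-C)."""
    proc = subprocess.Popen(strace_command(pid, out_path))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)
        proc.wait()
    return proc.returncode


def summarize_strace(lines):
    """Count syscalls and fcntl file descriptors in an iterable of trace lines."""
    syscalls = Counter()
    fcntl_fds = Counter()
    for line in lines:
        match = _SYSCALL_RE.match(line)
        if not match:
            continue  # signal notices, "<... resumed>" lines, and the like
        name = match.group(1)
        syscalls[name] += 1
        if name in ("fcntl", "fcntl64"):
            fd = _FCNTL_FD_RE.search(line)
            if fd:
                fcntl_fds[int(fd.group(1))] += 1
    return syscalls, fcntl_fds


def fd_target(pid, fd):
    """Resolve a descriptor to a path via /proc/<pid>/fd/<fd> (Linux only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

With the PID from the top output above, one might call capture_strace(4743, "/tmp/smbd.strace"), pass the opened file to summarize_strace, look at syscalls.most_common(5) for a recurring pattern, and run fd_target(4743, fd) on the most frequent fcntl descriptor while the process is still alive. The output path is arbitrary.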