For the last year or so I have been having problems in general with Samba (various versions) on the same box: a Dell 2500, Xeon 1.8, with 2 GB of RAM, running Red Hat 8. From time to time (although it has now happened 3 times in the last 5 days, hence this email), people are slow to log in, if they can log in at all.

Several things appear to happen. The main one is that an smbd process belonging to a user who is logging in appears in top (a CPU monitoring program) using massive amounts of CPU. Although the system says it still has about 10-15% idle, this generally stops everyone from logging in.

On Red Hat, top (it doesn't look the same on BSD) has a "system" entry showing the percentage of CPU given over to the kernel. "System" basically means anything I/O- or kernel-related, and since the kernel governs resources, some system time is normal. Over a 4-hour period I monitored this figure and it never went above 10%, and even then only for a matter of seconds. When this problem occurs, it pushes system up to 50-80%!!! The disks on the server are pretty much idle, so it's not disk-related. I am at a loss to work out what the process is actually doing to cause this. However, once I kill off the process, things seem to slowly get back to normal.

I have read other people's emails and gone through the archives, and read about "failure for 4. Error = No route to host", "lib/util_sock.c:read_data(436)" and "oplocking" problems, as they all appear to be more pronounced around the time of this high-CPU rogue smbd process. However, a lot of the oplocking problems seem to be hardware-related. I use decent 3Com kit here, with a 4950 as the core and 4400s at the edge (i.e. not cheap and cheerful Netgear/D-Link/etc. stuff), so I'm wondering if anyone else has had these problems with this kit. Or, if it's not the kit, what can it actually be? I've tried turning oplocks on and off to no avail; it still has this issue.

Any ideas on the "read_data(436)" and "failure for 4. Error = No route to host" messages?
Any help offered very gratefully received.

With thanks
Ross McInnes
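The "system" percentage in top comes from the kernel's CPU counters in /proc/stat. Below is a minimal Python sketch that samples those counters and prints a timestamped line whenever the system share spikes. That way an episode is caught and can be matched against the smbd logs without anyone having to watch top. It assumes a Linux /proc/stat whose first line is `cpu user nice system idle ...`. The function names and the 30% threshold are hypothetical choices, not anything from Samba or procps.

```python
import time


def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def system_percent(before, after):
    """Share of elapsed CPU time spent in kernel ("system") mode, in %.

    Counter order: user, nice, system, idle, then (newer kernels only)
    iowait, irq, softirq, steal, guest, guest_nice. Only the first eight
    are summed, because guest time is already included in user/nice.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[2] / total


def read_cpu(path="/proc/stat"):
    """Read the current aggregate CPU counters."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(threshold=30.0, interval=5.0, samples=None):
    """Print a timestamped line whenever system CPU exceeds threshold.

    threshold=30.0 is a hypothetical cut-off, well above the ~10% peak
    seen in normal operation. Loops forever when samples is None.
    """
    prev = read_cpu()
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval)
        cur = read_cpu()
        pct = system_percent(prev, cur)
        if pct > threshold:
            print("%s system CPU %.1f%%" % (time.strftime("%H:%M:%S"), pct))
        prev = cur
        taken += 1
```

You can start `watch()` in a spare terminal, or under nohup. It prints one line per spike, and those timestamps can then be lined up with the log.smbd entries.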
Next time it happens, running an strace on the offending process "strace -p <process_id>" can provide some insight as to what it's beating around on, especially if it's system related. That might help pinpoint a spot in the code where it's having problems.

Eric

-----Original Message-----
From: samba-bounces+eric.ladner=chevrontexaco.com@lists.samba.org [mailto:samba-bounces+eric.ladner=chevrontexaco.com@lists.samba.org] On Behalf Of Ross McInnes (Systems)
Sent: Tuesday, December 09, 2003 9:34 AM
To: samba@lists.samba.org
Subject: [Samba] Tall tale of woe....
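Eric's strace suggestion is easiest to act on when the trace is saved and summarised afterwards. Below is a minimal Python sketch that counts how often each system call appears, so a process spinning in one call stands out. It assumes the trace was captured with something like `strace -p <process_id> -o smbd.trace`, where the file name is only an example. `tally_syscalls` and `top_syscalls` are hypothetical helpers, not part of strace.

```python
import re
from collections import Counter

# A syscall name at the start of a line, after any optional "[pid N]"
# prefix and any pid/timestamp columns added by -f, -t or -tt.
CALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s*)?(?:[\d:.]+\s+)*([A-Za-z_]\w*)\(")


def tally_syscalls(lines):
    """Count system calls in strace output lines.

    "<... resumed>" continuations, "--- SIG..." signal lines and
    "+++ exited" lines do not match and are skipped, so each call is
    counted once, at the line where it starts.
    """
    counts = Counter()
    for line in lines:
        match = CALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_syscalls(path, n=10):
    """Return the n most frequent calls in a saved trace file."""
    with open(path) as f:
        return tally_syscalls(f).most_common(n)
```

strace also has its own `-c` option. Running `strace -c -p <process_id>` and then pressing Ctrl-C prints a per-call count and time summary directly. The script is mainly useful when you also want to keep the raw log, so you can inspect the arguments of the dominant call.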
Ross McInnes (Systems) wrote:
| The main one is that a smbd process which belongs to a
| user logging in will appear in top (a cpu monitor program)
| using massive amounts of CPU etc. although the system says
| it still has about 10-15% idle, this generally stops
| everyone logging in.

I'm assuming that you are running version 2.2.x (included with RH8). Have you tested 3.0? (Wait until 3.0.1 if you haven't yet, since there are a lot of bug fixes in it.)

What is the smbd process doing? Try running strace or getting a backtrace in gdb to find out where it is spending its time.

| When this problem occurs it pushes system up to 50-80%!!!

Probably fcntl() calls when looking up data in a tdb. Find out which tdb (look in /proc/<pid>/fd to match the file descriptor, or use lsof). Also check the network traffic at this point.

| Now I have read other people's emails and gone through
| the archives about this and read about "failure for 4. Error
| = No route to host", "lib/util_sock.c:read_data(436)"
| and "oplocking" problems as they all appear to be more
| pronounced around the time of this high CPU/rogue smbd
| process.

Are you serving printers by chance? If so, have you set 'disable spoolss = yes'? I've seen high CPU utilization cases in relation to this parameter.

| However it would seem a lot of the oplocking problems
| seem to be hardware related. I use decent 3Com kit here
| with a 4950 as a core and 4400s at edge (i.e. not cheap
| and cheerful Netgear/D-Link/etc stuff) so I'm wondering if
| anyone else has had these problems with this kit. Or if it's
| not the kit what can it actually be?

Use mii-tool and check the duplex settings. And any hardware can have problems no matter what the price tag says :-)

| Any ideas on the "read_data(436)" and "failure for 4. Error = No route to
| host"?

Check your routers. Maybe they are getting overloaded or are dropping packets.

--
cheers, jerry
----------------------------------------------------------------------
Hewlett-Packard ------------------------- http://www.hp.com
SAMBA Team ---------------------- http://www.samba.org
GnuPG Key ---- http://www.plainjoe.org/gpg_public.asc
"If we're adding to the noise, turn off this song" --Switchfoot (2003)
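The step of matching the descriptor from an fcntl() call against /proc/<pid>/fd can be scripted. Below is a minimal Python sketch. It assumes a Linux /proc and enough privilege to inspect the smbd process, which normally means root. `open_files` and `open_tdbs` are hypothetical helper names. To use it, take a descriptor number from strace output, such as the 12 in an illustrative line like `fcntl64(12, F_SETLKW64, ...)`, and look it up in the returned dict to see which tdb is being locked.

```python
import os


def open_files(pid):
    """Map each open file descriptor number of pid to the path it points at."""
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
    return result


def open_tdbs(pid):
    """Keep only the descriptors whose target path ends in '.tdb'."""
    return {fd: path for fd, path in open_files(pid).items()
            if path.endswith(".tdb")}
```

`lsof -p <pid>` gives the same descriptor-to-file mapping without a script. The sketch is only handy when the lookup needs to be repeated or combined with a saved trace.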
On Mon, 15 Dec 2003, Gerald (Jerry) Carter wrote:
> The kernel should log the oops in /var/log/messages.

Yeah, it's not there. The log stops at 11:29:07; the next entry is at 11:47, when it's booting.

> We can't be blamed for a kernel oops. If a user space app
> can cause the kernel to die, then that's a kernel bug.
> I would start pursuing this with RedHat (if you have support),
> or logging it in bugzilla.redhat.com.

Not trying to apportion blame here. Just trying to get the good old stable server back :/ Was wondering if anyone else has had anything like this before? I will contact Red Hat and see if they can offer any suggestions.

Many thanks
Ross McInnes