thr3ads.net - samba - [Samba] Tall tale of woe.... [Dec 2003]

Ross McInnes (Systems)

2003-Dec-09 15:36 UTC

[Samba] Tall tale of woe....

For the last year or so I have been having problems in general with Samba
(various versions) on the same box: a Dell 2500, Xeon 1.8 with 2 GB of RAM,
running Red Hat 8.

What happens from time to time (although it has now happened 3 times in
the last 5 days, hence this email) is that people are slow to log in, if
they can log in at all. Several things appear to happen.

The main one is that an smbd process belonging to a user who is logging
in will appear in top (a CPU monitor program) using massive amounts of
CPU. Although the system says it still has about 10-15% idle, this
generally stops everyone from logging in.

Now, top on RH (it doesn't look the same on BSD) has a "system" entry
showing the % of CPU given over to the kernel. "System" basically means
time spent in the kernel (system calls, I/O handling and so on), and since
the kernel governs resources some of this isn't uncommon. Over a period of
4 hours I monitored this "system" figure and it never went above 10%, and
even then only for a matter of seconds. When this problem occurs it pushes
system up to 50-80%!!! I look at the server and the disks are pretty much
idle, so it's not disk related. I am at a loss to find out what it is
actually doing to cause this.
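For reference, top's "system" figure is the share of CPU time spent in kernel mode between two samples. A minimal sketch of that calculation, assuming a Linux /proc/stat whose aggregate "cpu" line starts with user, nice, system, idle (newer kernels append more columns); the function names are hypothetical and not part of top or procps:

```python
import os
import time

# Hypothetical helpers, not part of top/procps: a sketch of how the
# "system" percentage can be derived from /proc/stat on Linux.

def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' line of /proc/stat as a list of jiffy counts."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line in " + path)

def system_percent(before, after):
    """Percentage of elapsed CPU time spent in kernel mode (the third column)."""
    # Only the first 8 columns (user..steal) are summed: guest time, if present,
    # is already included in user/nice and would otherwise be double counted.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[2] / total if total else 0.0

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_times()
    time.sleep(1.0)
    second = read_cpu_times()
    print(f"system: {system_percent(first, second):.1f}%")
```

Sampling over a short interval like this shows whether the jump to 50-80% coincides with the runaway smbd.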

However, once I kill off this process, things seem to slowly get back to
normal.

Now, I have read other people's emails and gone through the archives on
this, and read about "failure for 4. Error = No route to host",
"lib/util_sock.c:read_data(436)" and "oplocking" problems, as they all
appear to be more pronounced around the time of this high-CPU/rogue smbd
process.

However, it would seem a lot of the oplocking problems are hardware
related. I use decent 3Com kit here, with a 4950 as the core and 4400s at
the edge (i.e. not cheap and cheerful Netgear/D-Link/etc. stuff), so I'm
wondering if anyone else has had these problems with this kit. Or, if it's
not the kit, what can it actually be?

I've tried turning oplocks on and off to no avail; it still has this issue.

Any ideas on the "read_data(436)" and "failure for 4. Error = No route to
host" messages?


Any help offered very gratefully received.

With thanks

Ross McInnes

Ladner, Eric (Eric.Ladner)

2003-Dec-09 15:44 UTC

Next time it happens, running strace on the offending process
("strace -p <process_id>") can provide some insight into what it's
spending its time on, especially if it's system related. That might help
pinpoint a spot in the code where it's having problems.
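To know which PID to attach strace to, something like the following can pick out the smbd with the most accumulated CPU time. It's a minimal sketch assuming the Linux /proc/<pid>/stat layout (utime and stime are fields 14 and 15, in clock ticks); the helper names are made up for illustration. The counters are cumulative, so a long-lived smbd can outrank one that has only just started spinning. A large stime relative to utime points at time spent in the kernel, which is what the high "system" figure suggests.

```python
import os

# Hypothetical helpers (not part of Samba): find the smbd with the most
# accumulated CPU time so you know which PID to hand to strace -p.

def parse_stat(text):
    """Return (comm, utime, stime) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the last ')' rather than on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 2:].split()  # rest[0] is field 3 (state)
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15, clock ticks
    return comm, utime, stime

def busiest(name="smbd", proc="/proc"):
    """Return (pid, utime, stime) for the `name` process with most CPU time, or None."""
    best = None
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, utime, stime = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unreadable
        if comm == name and (best is None or utime + stime > best[1] + best[2]):
            best = (int(entry), utime, stime)
    return best

if __name__ == "__main__" and os.path.isdir("/proc"):
    hit = busiest()
    if hit:
        pid, utime, stime = hit
        print(f"pid {pid}: user {utime} ticks, system {stime} ticks")
    else:
        print("no smbd process found")
```

Then run strace -p on the PID it reports.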

Eric


Gerald (Jerry) Carter

2003-Dec-11 17:42 UTC

Ross McInnes (Systems) wrote:

| The main one is that an smbd process which belongs to a
| user logging in will appear in top (a CPU monitor program)
| using massive amounts of CPU etc. Although the system says
| it still has about 10-15% idle, this generally stops
| everyone logging in.

I'm assuming that you are running version 2.2.x
(included with RH8). Have you tested 3.0? (Wait until
3.0.1 if you haven't yet, since there are a lot of bug
fixes in it.)

What is the smbd process doing? Try running strace
or getting a backtrace in gdb to find out where it is
spending its time.

| When this problem occurs it pushes system up to 50-80%!!!

Probably fcntl() calls when looking up data in a tdb.
Find out which tdb (either look in /proc/<pid>/fd to
match the file descriptor, or use lsof).
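A minimal sketch of the /proc/<pid>/fd approach, assuming Linux and enough privilege to read another user's fd directory (normally root). It maps descriptor numbers to open .tdb files, so the descriptor number in the fcntl() lines of an strace can be matched to a particular tdb. The function name and script name are hypothetical; lsof -p <pid> gives the same information.

```python
import os
import sys

# Hypothetical helper (not part of Samba): list which open descriptors of a
# process point at .tdb files. Reading another user's /proc/<pid>/fd needs root.

def tdb_descriptors(pid, proc="/proc"):
    """Return {fd_number: path} for descriptors of `pid` whose target ends in '.tdb'."""
    fd_dir = os.path.join(proc, str(pid), "fd")
    found = {}
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
        if target.endswith(".tdb"):
            found[int(fd)] = target
    return dict(sorted(found.items()))

if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python tdbfds.py <pid>
    for fd, path in tdb_descriptors(sys.argv[1]).items():
        print(fd, path)
```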

Also check the network traffic at this point.

| Now I have read other people's emails and gone through
| the archives about this and read about "failure for 4. Error = No route
| to host", "lib/util_sock.c:read_data(436)"
| and "oplocking" problems as they all appear to be more
| pronounced around the time of this high CPU/rogue smbd
| process.

Are you serving printers by chance? If so, have you
set 'disable spoolss = yes'? I've seen high CPU utilization
cases in relation to this parameter.

| However it would seem a lot of the oplocking problems
| are hardware related. I use decent 3Com kit here
| with a 4950 as a core and 4400's at edge (i.e. not cheap
| and cheerful netgear/dlink/etc stuff) so I'm wondering if
| anyone else has had these problems with this kit, or if it's
| not the kit, what can it actually be?

Use mii-tool and check the duplex settings. And any
hardware can have problems, no matter what the price tag
says :-)
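mii-tool is the tool for the 2.4 kernel in RH8. On later Linux kernels the negotiated speed and duplex are also exposed under /sys/class/net/<iface>/, and the sketch below assumes that layout and simply reads them (the function names are hypothetical). A NIC reporting half duplex while the switch port is forced to full, or the reverse, is the classic mismatch to look for.

```python
import os

# Hypothetical helpers: read negotiated speed/duplex from sysfs on newer
# Linux kernels (not available on RH8's 2.4 kernel, where mii-tool applies).

def link_settings(iface, sysfs="/sys/class/net"):
    """Return (speed_mbps, duplex) for iface; either may be None if not reported."""
    def read(name):
        try:
            with open(os.path.join(sysfs, iface, name)) as f:
                return f.read().strip()
        except OSError:
            return None  # link down, virtual device, or driver doesn't report it
    try:
        speed = int(read("speed"))
    except (TypeError, ValueError):
        speed = None
    if speed is not None and speed < 0:
        speed = None  # the kernel reports -1 when the speed is unknown
    return speed, read("duplex")

def all_links(sysfs="/sys/class/net"):
    """Speed/duplex for every interface under sysfs, keyed by interface name."""
    return {iface: link_settings(iface, sysfs) for iface in sorted(os.listdir(sysfs))}

if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for iface, (speed, duplex) in all_links().items():
        print(f"{iface}: speed={speed} duplex={duplex}")
```

Compare what the server reports with the port settings on the 4400 it is plugged into.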

| any ideas on the "read_data(436)" and "failure for 4. Error = No
| route to host" ?

Check your routers. Maybe they are getting overloaded or
are dropping packets.
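Counters on the routers themselves have to be read from their own management interfaces, but the server's side is easy to check too. A minimal sketch, assuming the Linux /proc/net/dev layout (8 receive columns followed by 8 transmit columns); the function name is hypothetical. Rising errs/drop counts, or collisions on what should be a full-duplex link, would support the dropped-packet theory.

```python
import os

# Hypothetical helper: pull error, drop and collision counters per interface
# out of /proc/net/dev on Linux.

def interface_errors(path="/proc/net/dev"):
    """Return per-interface error, drop and collision counters from /proc/net/dev."""
    stats = {}
    with open(path) as f:
        lines = f.readlines()[2:]  # the first two lines are column headers
    for line in lines:
        name, sep, data = line.partition(":")
        if not sep:
            continue
        v = [int(x) for x in data.split()]
        # Receive:  bytes packets errs drop fifo frame compressed multicast (0-7)
        # Transmit: bytes packets errs drop fifo colls carrier compressed   (8-15)
        stats[name.strip()] = {
            "rx_errs": v[2], "rx_drop": v[3],
            "tx_errs": v[10], "tx_drop": v[11], "tx_colls": v[13],
        }
    return stats

if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    for iface, counters in interface_errors().items():
        print(iface, counters)
```

Running it before and during one of the slow-login episodes shows whether the counters move when the problem hits.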




--
cheers, jerry
----------------------------------------------------------------------
Hewlett-Packard            ------------------------- http://www.hp.com
SAMBA Team                 ---------------------- http://www.samba.org
GnuPG Key                  ---- http://www.plainjoe.org/gpg_public.asc
"If we're adding to the noise, turn off this song" --Switchfoot (2003)

Ross McInnes (Systems)

2003-Dec-16 08:59 UTC

On Mon, 15 Dec 2003, Gerald (Jerry) Carter wrote:
> The kernel should log the oops in /var/log/messages.
Yeah, it's not there. The log stops at 11:29:07; the next entry is at
11:47, when it's booting.

> We can't be blamed for a kernel oops.  If a user space app
> can cause the kernel to die, then that's a kernel bug.
> I would start pursuing this with RedHat (if you have support),
> or logging it in bugzilla.redhat.com.
Not trying to apportion blame here. Just trying to get the good old stable
server back :/ Was wondering if anyone else has had anything like this
before?

I will contact Red Hat and see if they can offer any suggestions.

Many thanks

Ross McInnes
