thr3ads.net - samba - [Samba] Possible Issue with Samba Blocking I/O and CPU [Jun 2010]

Andy Liebman

2010-Jun-10 14:21 UTC

[Samba] Possible Issue with Samba Blocking I/O and CPU

Hi,

I am experiencing a puzzling problem that may or may not be related to 
recent versions of Samba.  I'm posting on this list, however, because it 
seems that setting "write cache = 262144" (256K) in smb.conf resolves 
the issue and so I have reason to believe the problem might somehow be 
related to Samba.

The problem I am seeing is this.  I have a Linux Samba server. When 
simultaneously reading and writing from multiple client workstations -- 
let's say I'm writing a total of 80 MB/sec from 4 Windows clients and 
reading 220 MB/sec from 12 other Windows clients -- every half hour or 
so, all I/O seems to grind to a halt for 1 or 2 seconds. You can see 
this graphically using a program such as "gkrellm".  You just see a 
dropout of all reading and writing from/to the disks. SMB traffic 
continues to come in from the network, but SMB traffic stops going out 
TO the network.  This same pattern has been observed on multiple 
servers, so the problem isn't caused by some bad RAID card or other 
piece of defective hardware.

Looking at data from /proc  (this is easy using a program like 
"collectl"), at the moment of the dropout, you can see that "idle" time 
goes to 0 on any CPU core running smbd processes, and wait goes very 
high.  On the core handling the interrupts for the RAID card driver, 
"idle" time goes to 100 percent.  This is running the 2.6.32.11 Linux 
kernel with Samba 3.5.3 (the latest as of today) or Samba 3.4.2 (the 
version that comes with my distro).
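
For readers who want to reproduce the per-core measurement without collectl, here is a minimal sketch that computes idle and I/O-wait percentages from two snapshots of /proc/stat. The function names are made up for illustration, and the sketch assumes the standard per-core line layout (user, nice, system, idle, iowait, irq, softirq, steal); it is not what collectl does internally.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, iowait_ticks, total_ticks)} for per-core lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and anything that is not a per-core line.
        if len(fields) < 6 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(value) for value in fields[1:]]
        # Sum only user..steal; later guest fields are already counted in user/nice.
        stats[fields[0]] = (ticks[3], ticks[4], sum(ticks[:8]))
    return stats


def idle_and_wait_percent(before, after):
    """Return {cpu_name: (idle_pct, iowait_pct)} over the interval between two snapshots."""
    result = {}
    for cpu, (idle1, wait1, total1) in after.items():
        idle0, wait0, total0 = before[cpu]
        elapsed = total1 - total0
        if elapsed <= 0:
            continue  # no ticks elapsed, nothing to report
        result[cpu] = (100.0 * (idle1 - idle0) / elapsed,
                       100.0 * (wait1 - wait0) / elapsed)
    return result
```

In practice one would read /proc/stat twice, a second or so apart, and pass both parsed snapshots to idle_and_wait_percent. A core that was stuck waiting on disk during the interval shows idle near 0 and wait high, which is the pattern described above.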

Curiously, the 1-to-2 second drops don't occur with the exact same 
hardware and workload when running the 2.6.20 Linux kernel with Samba  
3.0.23d.  Since 2.6.20, there have been huge changes made in the vm 
layer of the Linux kernel (specifically related to pdflush and the per 
device "bdi" mechanism).  There have also been many changes to the 
deadline i/o scheduler.  For several weeks, my colleagues and I have 
been thinking that changes to one of those code areas might account for 
the difference.  However, after much study and experimentation in 
tweaking those subsystems, we could not make this occasional "drop out" 
behavior go away.

By the way, this dropout behavior does not occur with a "pure write" or 
"pure read" workflow. Only when there is a mixture of read and write. 
Our theory is that we might occasionally be getting a bunch of 
small-sized blocks of data to write,  causing our RAID-5 configured RAID 
card to do a large number of "partial stripe writes", which would result 
in reading useless data from the drives in order to calculate parity for 
a number of stripes, and interfering with reading data that has actually 
been requested by clients.
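
To make the "partial stripe write" theory concrete, here is a toy sketch that counts how many stripes a single write covers completely and how many it only partly touches. It assumes 256K (262144 bytes) is the unit the controller can write without reading old data first; whether that figure is the full stripe or the per-drive chunk on this card is not stated here. The function name is hypothetical.

```python
STRIPE_SIZE = 256 * 1024  # 262144 bytes, the stripe size quoted in this post (assumed unit)


def classify_write(offset, length, stripe_size=STRIPE_SIZE):
    """Return (full_stripes, partial_stripes) touched by one write."""
    if length <= 0:
        return (0, 0)
    end = offset + length
    first = offset // stripe_size
    last = (end - 1) // stripe_size
    full = 0
    partial = 0
    for stripe in range(first, last + 1):
        stripe_start = stripe * stripe_size
        stripe_end = stripe_start + stripe_size
        if offset <= stripe_start and end >= stripe_end:
            full += 1      # whole stripe overwritten, parity can be computed from new data
        else:
            partial += 1   # only part of the stripe changes, old data must be read back
    return (full, partial)
```

Under this model an aligned 256K write is one full stripe, a 64K write is one partial stripe, and a 256K write that starts at offset 128K is two partial stripes, so both small and misaligned writes would trigger the extra reads.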

Two days ago -- grasping a bit for straws -- we tried messing around 
with some smb.conf settings.  It turns out that setting "write cache = 
262144" made this specific problem go away.  We have repeated our tests 
for over 8 hours since making this change and we have not seen a single 
dropout like before. Presumably, with this setting, Samba will only 
write out in 256K blocks (which happens to be the stripe size on our RAIDs).
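
The presumed effect can be illustrated with a toy buffer that collects small sequential writes and only passes them on in 256K pieces. This is a sketch of the idea, not Samba's implementation: the class and its attributes are invented for illustration, and it ignores write offsets, oplocks, and explicit flush requests.

```python
class CoalescingWriter:
    """Toy buffer that emits data only in block_size chunks, plus a final remainder."""

    def __init__(self, block_size=262144):
        self.block_size = block_size
        self.pending = bytearray()
        self.flushed_sizes = []  # sizes of the writes that would reach the disk

    def write(self, data):
        """Buffer incoming data and emit every complete block."""
        self.pending.extend(data)
        while len(self.pending) >= self.block_size:
            self.flushed_sizes.append(self.block_size)
            del self.pending[:self.block_size]

    def close(self):
        """Emit whatever is left, which may be smaller than one block."""
        if self.pending:
            self.flushed_sizes.append(len(self.pending))
            self.pending.clear()
```

Feeding this ten 64K writes produces two 256K writes and, on close, one 128K remainder, instead of ten small writes reaching the disk.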

My question is, does anyone have a clue why setting "write cache" like 
this would have such an effect?  Is setting "write cache" just covering 
up some other problem? Is there any downside to using "write cache"?  
And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

FYI, my setup is as follows:

Supermicro X7DWE motherboard
Intel Xeon 5482 3.2 GHz Quad Core CPU
4 GB ECC Buffered RAM
3ware 9650 RAID card with 9.5.3 firmware (latest, includes fixes for prior 
issue of "writes blocking reads")
16 x 7200 RPM SATA drives
Myricom 10 Gigabit NIC
HP 2910 Switch with 4 x 10 Gigabit Ports and 24 x 1 Gigabit Ports
Linux with 2.6.32.11 kernel
Samba 3.5.3

The RAID system itself can sustain writes of > 650 MB/sec and reads > 
700 MB/sec.  When accessing the storage from Windows workstations via 10 
Gigabit, there is no problem whatsoever in reading/writing > 300 MB/sec 
from any given client.

Any good insights into the cause of what I'm seeing would be much 
appreciated. Thanks in advance.

Andy Liebman

Roel van Meer

2010-Jun-10 14:39 UTC

Hi Andy,
> And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

Have you tried kernel 2.6.20 with samba 3.5.3, or kernel 2.6.32.x with samba 
3.0.23d?
What filesystem are you using, and which mount options does it have?

Regards,

roel

Jeremy Allison

2010-Jun-10 16:33 UTC

On Thu, Jun 10, 2010 at 10:21:13AM -0400, Andy Liebman wrote:
> Hi,
>
> I am experiencing a puzzling problem that may or may not be related to
> recent versions of Samba.  I'm posting on this list, however, because it
> seems that setting "write cache = 262144" (256K) in smb.conf resolves
> the issue and so I have reason to believe the problem might somehow be
> related to Samba.
>
> The problem I am seeing is this.  I have a Linux Samba server. When
> simultaneously reading and writing from multiple client workstations --
> let's say I'm writing a total of 80 MB/sec from 4 Windows clients and
> reading 220 MB/sec from 12 other Windows clients -- every half hour or
> so, all I/O seems to grind to a halt for 1 or 2 seconds. You can see
> this graphically using a program such as "gkrellm".  You just see a
> dropout of all reading and writing from/to the disks. SMB traffic
> continues to come in from the network, but SMB traffic stops going out
> TO the network.  This same pattern has been observed on multiple
> servers, so the problem isn't caused by some bad RAID card or other
> piece of defective hardware.
>
> Looking at data from /proc  (this is easy using a program like
> "collectl"), at the moment of the dropout, you can see that "idle" time
> goes to 0 on any CPU core running smbd processes, and wait goes very
> high.  On the core handling the interrupts for the RAID card driver,
> "idle" time goes to 100 percent.  This is running the 2.6.32.11 Linux
> kernel with Samba 3.5.3 (the latest as of today) or Samba 3.4.2 (the
> version that comes with my distro).
>
> Curiously, the 1-to-2 second drops don't occur with the exact same
> hardware and workload when running the 2.6.20 Linux kernel with Samba
> 3.0.23d.  Since 2.6.20, there have been huge changes made in the vm
> layer of the Linux kernel (specifically related to pdflush and the per
> device "bdi" mechanism).  There have also been many changes to the
> deadline i/o scheduler.  For several weeks, my colleagues and I have
> been thinking that changes to one of those code areas might account for
> the difference.  However, after much study and experimentation in
> tweaking those subsystems, we could not make this occasional "drop out"
> behavior go away.
>
> By the way, this dropout behavior does not occur with a "pure write" or
> "pure read" workflow. Only when there is a mixture of read and write.
> Our theory is that we might occasionally be getting a bunch of
> small-sized blocks of data to write,  causing our RAID-5 configured RAID
> card to do a large number of "partial stripe writes", which would result
> in reading useless data from the drives in order to calculate parity for
> a number of stripes, and interfering with reading data that has actually
> been requested by clients.
>
> Two days ago -- grasping a bit for straws -- we tried messing around
> with some smb.conf settings.  It turns out that setting "write cache =
> 262144" made this specific problem go away.  We have repeated our tests
> for over 8 hours since making this change and we have not seen a single
> dropout like before. Presumably, with this setting, Samba will only
> write out in 256K blocks (which happens to be the stripe size on our
> RAIDS)
>
> My question is, does anyone have a clue why setting "write cache" like
> this would have such an effect?  Is setting "write cache" just covering
> up some other problem? Is there any downside to using "write cache"?
> And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

This looks like a kernel issue to me. Setting "write cache = 262144"
will massively change the read and write patterns that smbd does to
your disks. You didn't see the problem with your earlier setup because
it's a different kernel, without the issue.

With this setting of write cache, if the apps have a good locality
of data reference you'll almost never hit the disk, as everything
will be served out of that memory cache.

Sorry I can't be too helpful here, but this really doesn't look like
a Samba problem.

Jeremy.
