Hi,

I am experiencing a puzzling problem that may or may not be related to recent versions of Samba. I'm posting on this list, however, because it seems that setting "write cache = 262144" (256K) in smb.conf resolves the issue and so I have reason to believe the problem might somehow be related to Samba.

The problem I am seeing is this. I have a Linux Samba server. When simultaneously reading and writing from multiple client workstations -- let's say I'm writing a total of 80 MB/sec from 4 Windows clients and reading 220 MB/sec from 12 other Windows clients -- every half hour or so, all I/O seems to grind to a halt for 1 or 2 seconds. You can see this graphically using a program such as "gkrellm". You just see a dropout of all reading and writing from/to the disks. SMB traffic continues to come in from the network, but SMB traffic stops going out TO the network. This same pattern has been observed on multiple servers, so the problem isn't caused by some bad RAID card or other piece of defective hardware.

Looking at data from /proc (this is easy using a program like "collectl"), at the moment of the dropout, you can see that "idle" time goes to 0 on any CPU core running smbd processes, and "wait" goes very high. On the core handling the interrupts for the RAID card driver, "idle" time goes to 100 percent. This is running the 2.6.32.11 Linux kernel with Samba 3.5.3 (the latest as of today) or Samba 3.4.2 (the version that comes with my distro).

Curiously, the 1-to-2 second drops don't occur with the exact same hardware and workload when running the 2.6.20 Linux kernel with Samba 3.0.23d. Since 2.6.20, there have been huge changes made in the vm layer of the Linux kernel (specifically related to pdflush and the per device "bdi" mechanism). There have also been many changes to the deadline i/o scheduler. For several weeks, my colleagues and I have been thinking that changes to one of those code areas might account for the difference. However, after much study and experimentation in tweaking those subsystems, we could not make this occasional "drop out" behavior go away.

By the way, this dropout behavior does not occur with a "pure write" or "pure read" workflow. It only occurs when there is a mixture of reads and writes. Our theory is that we might occasionally be getting a bunch of small-sized blocks of data to write, causing our RAID-5 configured RAID card to do a large number of "partial stripe writes", which would result in reading useless data from the drives in order to calculate parity for a number of stripes, and interfering with reading data that has actually been requested by clients.

Two days ago -- grasping at straws a bit -- we tried messing around with some smb.conf settings. It turns out that setting "write cache = 262144" made this specific problem go away. We have repeated our tests for over 8 hours since making this change and we have not seen a single dropout like before. Presumably, with this setting, Samba will only write out in 256K blocks (which happens to be the stripe size on our RAIDs).

My question is, does anyone have a clue why setting "write cache" like this would have such an effect? Is setting "write cache" just covering up some other problem? Is there any downside to using "write cache"? And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

FYI, my setup is as follows:

  Supermicro X7DWE motherboard
  Intel Xeon 5482 3.2 GHz Quad Core CPU
  4 GB ECC Buffered RAM
  3ware 9650 RAID card with 9.5.3 firmware (latest, includes fixes for prior issue of "writes blocking reads")
  16 x 7200 RPM SATA drives
  Myricom 10 Gigabit NIC
  HP 2910 Switch with 4 x 10 Gigabit Ports and 24 x 1 Gigabit Ports
  Linux with 2.6.32.11 kernel
  Samba 3.5.3

The RAID system itself can sustain writes of > 650 MB/sec and reads > 700 MB/sec. When accessing the storage from Windows workstations via 10 Gigabit, there is no problem whatsoever in reading/writing > 300 MB/sec from any given client.

Any good insights into the cause of what I'm seeing would be much appreciated. Thanks in advance.

Andy Liebman
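The partial-stripe theory in the post above comes down to simple arithmetic, which the following Python sketch tries to make concrete. It is only an illustration: it assumes the 256K figure quoted in the post is the unit a write must fully cover to avoid a read-modify-write, and the names `classify_write` and `count_partials` are made up for this example. It also ignores any merging the kernel page cache or the RAID card's own cache may do before data reaches the disks.

```python
STRIPE_BYTES = 256 * 1024  # figure from the post; treated as the full-stripe unit (assumption)


def classify_write(offset, length, stripe=STRIPE_BYTES):
    """Return (full_stripes, partial_stripes) touched by a single write."""
    if length <= 0:
        return (0, 0)
    end = offset + length
    first = offset // stripe           # index of the first stripe touched
    last = (end - 1) // stripe         # index of the last stripe touched
    touched = last - first + 1
    # A stripe counts as "full" only if the write covers it from start to end.
    full_start = -(-offset // stripe)  # ceiling division: first fully covered stripe
    full_end = end // stripe           # one past the last fully covered stripe
    full = max(0, full_end - full_start)
    return (full, touched - full)


def count_partials(writes, stripe=STRIPE_BYTES):
    """Total partial-stripe writes for an iterable of (offset, length) pairs."""
    return sum(classify_write(off, length, stripe)[1] for off, length in writes)


if __name__ == "__main__":
    # Four sequential 64K writes issued separately, versus one aligned 256K write.
    small_writes = [(i * 65536, 65536) for i in range(4)]
    print(count_partials(small_writes))    # each 64K write covers only part of the stripe
    print(count_partials([(0, 262144)]))   # one aligned write covers the stripe completely
```

Under these assumptions, the same 256K of data costs four partial-stripe updates when it arrives as separate 64K writes and none when it arrives as one aligned write, which is the effect the theory attributes to the smb.conf change.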
Roel van Meer
2010-Jun-10 14:39 UTC
[Samba] Possible Issue with Samba Blocking I/O and CPU
Hi Andy,

> And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

Have you tried kernel 2.6.20 with samba 3.5.3, or kernel 2.6.32.x with samba 3.0.23d?

What filesystem are you using, and which mount options does it have?

Regards,

roel
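For anyone wanting to answer the filesystem question quickly, the filesystem type and mount options can be read from /proc/mounts. The Python sketch below is one way to do that; the function name `mount_info`, the sample line, and the mount point /mnt/raid are hypothetical and not taken from Andy's system.

```python
def mount_info(mounts_text, mount_point):
    """Return (fstype, [options]) for mount_point, or None if it is not listed.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, filesystem type, options, dump, pass.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount on the same point hides earlier ones.
            found = (fields[2], fields[3].split(","))
    return found


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/sda1 /mnt/raid xfs rw,noatime 0 0\n"
)

if __name__ == "__main__":
    print(mount_info(SAMPLE, "/mnt/raid"))
```

On a live server the same function could be called as `mount_info(open("/proc/mounts").read(), "/mnt/raid")`, with the mount point replaced by the real share path.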
Jeremy Allison
2010-Jun-10 16:33 UTC
[Samba] Possible Issue with Samba Blocking I/O and CPU
On Thu, Jun 10, 2010 at 10:21:13AM -0400, Andy Liebman wrote:
> Hi,
>
> I am experiencing a puzzling problem that may or may not be related to
> recent versions of Samba. I'm posting on this list, however, because it
> seems that setting "write cache = 262144" (256K) in smb.conf resolves
> the issue and so I have reason to believe the problem might somehow be
> related to Samba.
>
> The problem I am seeing is this. I have a Linux Samba server. When
> simultaneously reading and writing from multiple client workstations --
> let's say I'm writing a total of 80 MB/sec from 4 Windows clients and
> reading 220 MB/sec from 12 other Windows clients -- every half hour or
> so, all I/O seems to grind to a halt for 1 or 2 seconds. You can see
> this graphically using a program such as "gkrellm". You just see a
> dropout of all reading and writing from/to the disks. SMB traffic
> continues to come in from the network, but SMB traffic stops going out
> TO the network. This same pattern has been observed on multiple
> servers, so the problem isn't caused by some bad RAID card or other
> piece of defective hardware.
>
> Looking at data from /proc (this is easy using a program like
> "collectl"), at the moment of the dropout, you can see that "idle" time
> goes to 0 on any CPU core running smbd processes, and wait goes very
> high. On the core handling the interrupts for the RAID card driver,
> "idle" time goes to 100 percent. This is running the 2.6.32.11 Linux
> kernel with Samba 3.5.3 (the latest as of today) or Samba 3.4.2 (the
> version that comes with my distro).
>
> Curiously, the 1-to-2 second drops don't occur with the exact same
> hardware and workload when running the 2.6.20 Linux kernel with Samba
> 3.0.23d. Since 2.6.20, there have been huge changes made in the vm
> layer of the Linux kernel (specifically related to pdflush and the per
> device "bdi" mechanism). There have also been many changes to the
> deadline i/o scheduler. For several weeks, my colleagues and I have
> been thinking that changes to one of those code areas might account for
> the difference. However, after much study and experimentation in
> tweaking those subsystems, we could not make this occasional "drop out"
> behavior go away.
>
> By the way, this dropout behavior does not occur with a "pure write" or
> "pure read" workflow. Only when there is a mixture of read and write.
> Our theory is that we might occasionally be getting a bunch of
> small-sized blocks of data to write, causing our RAID-5 configured RAID
> card to do a large number of "partial stripe writes", which would result
> in reading useless data from the drives in order to calculate parity for
> a number of stripes, and interfering with reading data that has actually
> been requested by clients.
>
> Two days ago -- grasping a bit for straws -- we tried messing around
> with some smb.conf settings. It turns out that setting "write cache =
> 262144" made this specific problem go away. We have repeated our tests
> for over 8 hours since making this change and we have not seen a single
> dropout like before. Presumably, with this setting, Samba will only
> write out in 256K blocks (which happens to be the stripe size on our
> RAIDS)
>
> My question is, does anyone have a clue why setting "write cache" like
> this would have such an effect? Is setting "write cache" just covering
> up some other problem? Is there any downside to using "write cache"?
> And why didn't we have this issue with the 2.6.20 kernel + Samba 3.0.23d?

This looks like a kernel issue to me. Setting "write cache = 262144" will massively change the read and write patterns that smbd does to your disks. You didn't see the problem with your earlier setup because it's a different kernel, without the issue.

With this setting of write cache, if the apps have a good locality of data reference you'll almost never hit the disk, as everything will be served out of that memory cache.

Sorry I can't be too helpful here, but this really doesn't look like a Samba problem.

Jeremy.
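To picture how a write cache can change the write pattern seen by the disks, here is a toy Python model of write coalescing. It is not smbd's implementation and makes no claim about how Samba manages its cache internally; the class name `CoalescingWriter` and its attributes are invented for this sketch, and it models only sequential writes, not the read-side caching mentioned in the reply.

```python
import io


class CoalescingWriter:
    """Toy model: buffer sequential writes and pass them on in fixed-size blocks."""

    def __init__(self, sink, block_size=262144):
        self.sink = sink              # any object with a write(bytes) method
        self.block_size = block_size  # 256K, matching the value discussed in the thread
        self.buf = bytearray()
        self.flushes = []             # size of every write handed to the sink

    def write(self, data):
        self.buf += data
        # Emit as many complete blocks as the buffer currently holds.
        while len(self.buf) >= self.block_size:
            self._emit(self.block_size)

    def _emit(self, n):
        chunk = bytes(self.buf[:n])
        del self.buf[:n]
        self.sink.write(chunk)
        self.flushes.append(len(chunk))

    def close(self):
        # Whatever remains goes out as one final, possibly short, write.
        if self.buf:
            self._emit(len(self.buf))


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = CoalescingWriter(sink)
    for _ in range(10):               # ten 64K client writes
        writer.write(b"x" * 65536)
    writer.close()
    print(writer.flushes)             # sizes of the writes that reached the sink
```

In this model, ten 64K client writes reach the sink as two 256K writes plus one 128K remainder, so the underlying storage sees far fewer and larger writes than the clients issued.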