TJ
2004-Dec-02 06:59 UTC
[Samba] Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Hi,

I'm getting horrible performance on my samba server, and I am unsure of the cause after reading, benchmarking, and tuning. The server is a K6-500 with 43MB of RAM, standard x86 hardware. The OS is Slackware 10.0 with a 2.6.7 kernel; I've had similar problems with the 2.4.26 kernel. I'm running samba 3.0.5.

I've listed my partitions below, as well as the drive models. I have a linear RAID array as a single element of a RAID 5 array, and the RAID 5 array contains the filesystem being served by samba. I'm sure having one RAID array built on another hurts my I/O performance, as does having root, swap, and a slice of that array all on one drive. However, even taking that into account, I still can't account for the machine's poor performance. All drives are on their own IDE channel, with no master/slave combinations, as suggested in the RAID HOWTO.

To tune the drives, I use:

  hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]

I have tried different values for -a; I use 128 because it corresponds closely to the 64k stripe of the RAID 5 array. I ran hdparm -Tt on each individual drive as well as on both RAID arrays and have included the numbers below. They are pretty low for modern drives, yet I still don't get read performance comparable to what hdparm -tT reports.

I also ran dbench and tbench. In both cases, I'm seeing numbers in the 3-4 MB/sec range. I only ran these benchmarks with one client, because this server only ever sees one client: it is connected via gigabit ethernet to a single other machine. For tbench, I ran the client on that separate machine to minimize load on the server. Both of these numbers seem extremely low.

Using smbclient, I transferred (a read operation) an 8,608KB file and a 700MB file from the server to the client machine. In both cases I got approximately 5000 kb/s.

In my dmesg, I'm also seeing something strange. I think this is determined by kernel internals, and it seems strange and problematic to me. I believe this number is controller dependent, so I'm wondering if I have a controller issue here:

  hda: max request size: 128KiB
  hdc: max request size: 1024KiB
  hdg: max request size: 64KiB
  hdi: max request size: 128KiB
  hdk: max request size: 1024KiB

From all of this, I believe my hard drives are somehow not tuned properly, given the low hdparm numbers, especially for hda and hdc. That would cause the RAID array to perform poorly in dbench and hdparm -tT. The fact that the two drives on the same IDE controller, hda and hdc, perform worse than the rest further indicates that there may be a controller problem; I may try eliminating that controller and re-checking the results.

Performance seems to be limited by the low tbench numbers rather than by disk performance, though. In the README of the dbench package, Andrew Tridgell noted that he saw tbench numbers that topped out at high loads, and found that this was a problem in the kernel's TCP stack. As I'm only running one client, I doubt this is my problem, but the number is suspiciously low.

I'm also crossposting the RAID and drive-tuning information to the linux software raid mailing list.

My partitions are:

  /dev/hda1 is /
  /dev/hda2 is swap
  /dev/hda3 is part of /dev/md0
  /dev/hdi is part of /dev/md0
  /dev/hdk is part of /dev/md0
  /dev/md0 is a linear array; it is part of /dev/md1
  /dev/hdg is part of /dev/md1
  /dev/hdc is part of /dev/md1
  /dev/md1 is a RAID 5 array.
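In mdadm terms, that layout corresponds roughly to the following (a sketch only, not necessarily the exact commands the arrays were originally created with):

  # linear array from the hda3 partition plus the two whole disks
  mdadm --create /dev/md0 --level=linear --raid-devices=3 /dev/hda3 /dev/hdi /dev/hdk

  # RAID 5 across the linear array and the two 200GB drives, 64k chunk
  mdadm --create /dev/md1 --level=5 --chunk=64 --raid-devices=3 /dev/md0 /dev/hdg /dev/hdc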
The drives are:

  hda: WD 400JB, 40GB
  hdc: WD 2000JB, 200GB
  hdg: WD 2000JB, 200GB
  hdi: IBM 75GXP, 120GB
  hdk: WD 1200JB, 120GB

The results of hdparm -tT for each individual drive and each RAID array are:

  /dev/hda:
   Timing buffer-cache reads:  212 MB in 2.02 seconds = 105.17 MB/sec
   Timing buffered disk reads:  42 MB in 3.07 seconds =  13.67 MB/sec
  /dev/hdc:
   Timing buffer-cache reads:  212 MB in 2.00 seconds = 105.80 MB/sec
   Timing buffered disk reads:  44 MB in 3.12 seconds =  14.10 MB/sec
  /dev/hdg:
   Timing buffer-cache reads:  212 MB in 2.02 seconds = 105.12 MB/sec
   Timing buffered disk reads:  68 MB in 3.04 seconds =  22.38 MB/sec
  /dev/hdi:
   Timing buffer-cache reads:  216 MB in 2.04 seconds = 106.05 MB/sec
   Timing buffered disk reads:  72 MB in 3.06 seconds =  23.53 MB/sec
  /dev/hdk:
   Timing buffer-cache reads:  212 MB in 2.01 seconds = 105.33 MB/sec
   Timing buffered disk reads:  66 MB in 3.05 seconds =  21.66 MB/sec
  /dev/md0:
   Timing buffer-cache reads:  212 MB in 2.01 seconds = 105.28 MB/sec
   Timing buffered disk reads:  70 MB in 3.07 seconds =  22.77 MB/sec
  /dev/md1:
   Timing buffer-cache reads:  212 MB in 2.03 seconds = 104.35 MB/sec
   Timing buffered disk reads:  50 MB in 3.03 seconds =  16.51 MB/sec

The result of dbench 1 is:

  Throughput 19.0968 MB/sec 1 procs

The result of tbench 1 is:

  Throughput 4.41996 MB/sec 1 procs

I would appreciate any thoughts, leads, ideas, anything at all to point me in a direction here.

Thanks,
TJ Harrell
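P.S. For completeness, the benchmark runs above were along these lines ("fileserver" stands in for the server's hostname; tbench_srv is the server half of the dbench package as I have it installed):

  # dbench, run locally on the server
  dbench 1

  # tbench: start the server-side listener on the samba box
  tbench_srv

  # ...and run one tbench client from the other machine over gigabit
  tbench 1 fileserver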
TJ
2004-Dec-02 07:32 UTC
[Samba] Re: Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Aha! In an attempt to decrease TCP overhead, I disabled traffic control on a separate NIC (not the gigabit NIC I ran tbench against). I also flushed all the tables in iptables. I then re-ran tbench from the other machine over the gigabit ethernet, and throughput came up to 6.74868 MB/sec.

I also tried running tbench across loopback on the server with tc off and iptables flushed: throughput came up to 16.4996 MB/sec. I then tested across loopback with tc off but with my iptables rules still in place, and throughput came back down to 6.5832 MB/sec.

It seems that my iptables rules are definitely slowing down throughput, though I don't know why. Additionally, even with them off, throughput is still low across the ethernet, so now I'm wondering whether there is an ethernet problem as well.

These are the iptables rules that are causing the problem, with process names marked out:

  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 1
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 2
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3

TJ Harrell
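P.S. By "tc off and iptables flushed" I mean roughly the following ("eth1" stands in for the actual name of the non-gigabit NIC):

  # remove the root qdisc (traffic control) from the non-gigabit NIC
  tc qdisc del dev eth1 root

  # flush the filter, mangle, and nat tables
  iptables -F
  iptables -t mangle -F
  iptables -t nat -F

  # verify that no mangle rules are left and check the packet counters
  iptables -t mangle -L -v -n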
TJ
2004-Dec-02 08:22 UTC
[Samba] Re: Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Using ttcp, I benchmarked the ethernet connection and got roughly 140 Mb/s. I'm still not sure, then, why throughput is significantly lower over ethernet than over loopback.

TJ Harrell
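P.S. The ttcp run was the usual memory-to-memory test, roughly as follows (the hostname is a stand-in, and buffer options are left at their defaults in this sketch):

  # on the receiving machine
  ttcp -r -s

  # on the transmitting machine (the samba server)
  ttcp -t -s client-hostname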