TJ
2004-Dec-02 06:59 UTC
[Samba] Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Hi,

I'm getting horrible performance on my samba server, and I am unsure of the cause after reading, benchmarking, and tuning. The server is a K6-500 with 43MB of RAM, standard x86 hardware. The OS is Slackware 10.0 with a 2.6.7 kernel; I've had similar problems with the 2.4.26 kernel. I'm running samba 3.0.5.

I've listed my partitions below, as well as the drive models. I have a linear RAID array as a single element of a RAID 5 array, and the RAID 5 array contains the filesystem being served by samba. I'm sure having one RAID array built on another hurts my I/O performance, as does having root, swap, and a slice of that array all on one drive. However, even taking that into account, I still can't account for the machine's poor performance. All drives are on their own IDE channel, with no master/slave combinations, as suggested in the RAID HOWTO.

To tune the drives, I use:

  hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]

I have tried different values for -a; I use 128 because it corresponds closely to the 64k stripe of the RAID 5 array. I ran hdparm -Tt on each individual drive as well as on both RAID arrays and have included the numbers below. They are pretty low for modern drives, yet I still don't get read performance comparable to what hdparm -tT reports.

I also ran dbench and tbench. In both cases, I'm seeing numbers in the 3-4 MB/sec range. I only ran these benchmarks with one client, because this server only ever sees one client: it is connected via gigabit ethernet to a single other machine. For tbench, I ran the client on that separate machine to minimize load on the server. Both of these numbers seem extremely low.

Using smbclient, I transferred (a read operation) an 8,608KB file and a 700MB file from the server to the client machine. In both cases I got approximately 5000 kb/s.

In my dmesg, I'm also seeing something strange. I think this is determined by kernel internals, and it seems strange and problematic to me. I believe this number is controller dependent, so I'm wondering if I have a controller issue here:

  hda: max request size: 128KiB
  hdc: max request size: 1024KiB
  hdg: max request size: 64KiB
  hdi: max request size: 128KiB
  hdk: max request size: 1024KiB

From all of this, I believe my hard drives are somehow not tuned properly, given the low hdparm numbers, especially for hda and hdc. That would cause the RAID array to perform poorly in dbench and hdparm -tT. The fact that the two drives on the same IDE controller, hda and hdc, perform worse than the rest further indicates that there may be a controller problem; I may try eliminating that controller and re-checking the results.

Performance seems to be limited by the low tbench numbers rather than by disk performance, though. In the README of the dbench package, Andrew Tridgell noted that he saw tbench numbers that topped out at high loads, and found that this was a problem in the kernel's TCP stack. As I'm only running one client, I doubt this is my problem, but the number is suspiciously low.

I'm also crossposting the RAID and drive-tuning information to the linux software raid mailing list.

My partitions are:

  /dev/hda1 is /
  /dev/hda2 is swap
  /dev/hda3 is part of /dev/md0
  /dev/hdi is part of /dev/md0
  /dev/hdk is part of /dev/md0
  /dev/md0 is a linear array; it is part of /dev/md1
  /dev/hdg is part of /dev/md1
  /dev/hdc is part of /dev/md1
  /dev/md1 is a RAID 5 array.
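In mdadm terms, that layout corresponds roughly to the following (a sketch only, not necessarily the exact commands the arrays were originally created with):

  # linear array from the hda3 partition plus the two whole disks
  mdadm --create /dev/md0 --level=linear --raid-devices=3 /dev/hda3 /dev/hdi /dev/hdk

  # RAID 5 across the linear array and the two 200GB drives, 64k chunk
  mdadm --create /dev/md1 --level=5 --chunk=64 --raid-devices=3 /dev/md0 /dev/hdg /dev/hdc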
The drives are:

  hda: WD 400JB, 40GB
  hdc: WD 2000JB, 200GB
  hdg: WD 2000JB, 200GB
  hdi: IBM 75GXP, 120GB
  hdk: WD 1200JB, 120GB

The results of hdparm -tT for each individual drive and each RAID array are:

  /dev/hda:
   Timing buffer-cache reads:  212 MB in 2.02 seconds = 105.17 MB/sec
   Timing buffered disk reads:  42 MB in 3.07 seconds =  13.67 MB/sec
  /dev/hdc:
   Timing buffer-cache reads:  212 MB in 2.00 seconds = 105.80 MB/sec
   Timing buffered disk reads:  44 MB in 3.12 seconds =  14.10 MB/sec
  /dev/hdg:
   Timing buffer-cache reads:  212 MB in 2.02 seconds = 105.12 MB/sec
   Timing buffered disk reads:  68 MB in 3.04 seconds =  22.38 MB/sec
  /dev/hdi:
   Timing buffer-cache reads:  216 MB in 2.04 seconds = 106.05 MB/sec
   Timing buffered disk reads:  72 MB in 3.06 seconds =  23.53 MB/sec
  /dev/hdk:
   Timing buffer-cache reads:  212 MB in 2.01 seconds = 105.33 MB/sec
   Timing buffered disk reads:  66 MB in 3.05 seconds =  21.66 MB/sec
  /dev/md0:
   Timing buffer-cache reads:  212 MB in 2.01 seconds = 105.28 MB/sec
   Timing buffered disk reads:  70 MB in 3.07 seconds =  22.77 MB/sec
  /dev/md1:
   Timing buffer-cache reads:  212 MB in 2.03 seconds = 104.35 MB/sec
   Timing buffered disk reads:  50 MB in 3.03 seconds =  16.51 MB/sec

The result of dbench 1 is:

  Throughput 19.0968 MB/sec 1 procs

The result of tbench 1 is:

  Throughput 4.41996 MB/sec 1 procs

I would appreciate any thoughts, leads, ideas, anything at all to point me in a direction here.

Thanks,
TJ Harrell
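P.S. For completeness, the benchmark runs above were along these lines ("fileserver" stands in for the server's hostname; tbench_srv is the server half of the dbench package as I have it installed):

  # dbench, run locally on the server
  dbench 1

  # tbench: start the server-side listener on the samba box
  tbench_srv

  # ...and run one tbench client from the other machine over gigabit
  tbench 1 fileserver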
TJ
2004-Dec-02 07:32 UTC
[Samba] Re: Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Aha! In an attempt to decrease TCP overhead, I disabled traffic control on a separate NIC (not the gigabit NIC I ran tbench against). I also flushed all the tables in iptables. I then re-ran tbench from the other machine over the gigabit ethernet, and throughput came up to 6.74868 MB/sec.

I also tried running tbench across loopback on the server with tc off and iptables flushed: throughput came up to 16.4996 MB/sec. I then tested across loopback with tc off but with my iptables rules still in place, and throughput came back down to 6.5832 MB/sec.

It seems that my iptables rules are definitely slowing down throughput, though I don't know why. Additionally, even with them off, throughput is still low across the ethernet, so now I'm wondering whether there is an ethernet problem as well.

These are the iptables rules that are causing the problem, with process names marked out:

  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 1
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 2
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3
  iptables -A OUTPUT -t mangle -p tcp -m owner --cmd-owner xxxxx -j MARK --set-mark 3

TJ Harrell
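P.S. By "tc off and iptables flushed" I mean roughly the following ("eth1" stands in for the actual name of the non-gigabit NIC):

  # remove the root qdisc (traffic control) from the non-gigabit NIC
  tc qdisc del dev eth1 root

  # flush the filter, mangle, and nat tables
  iptables -F
  iptables -t mangle -F
  iptables -t nat -F

  # verify that no mangle rules are left and check the packet counters
  iptables -t mangle -L -v -n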
TJ
2004-Dec-02 08:22 UTC
[Samba] Re: Tbench benchmark numbers seem to be limiting samba performance in the 2.4 and 2.6 kernels.
Using ttcp, I benchmarked the ethernet connection and got roughly 140 Mb/s. I'm still not sure, then, why throughput is significantly lower over ethernet than over loopback.

TJ Harrell
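P.S. The ttcp run was the usual memory-to-memory test, roughly as follows (the hostname is a stand-in, and buffer options are left at their defaults in this sketch):

  # on the receiving machine
  ttcp -r -s

  # on the transmitting machine (the samba server)
  ttcp -t -s client-hostname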