For some time I have seen very odd issues with IO performance on 8-Stable. Going back to November of last year when 8.0 was released, I see variations of up to 22% in identical operations. This is not a degradation as the performance moves up and down.

This is a very simplistic case. I have two identical disks (Fujitsu 80G) on a ThinkPad T43 with a 2 GHz CPU and 2G RAM. I run the command:

dd bs=516096 if=/dev/ad0 of=/dev/ad2

I do this in single user mode immediately after a boot with no disks mounted for write. Just a 'boot -s', <Enter> to start the shell, and the dd. I would expect very consistent performance from run to run, but I don't get it. Here are the results since 8.0 was released:

Date      Xfer rate   Kernel date
12/4/09   19,242,573  Nov. 26 kernel (8.0-stable)
12/9/09   18,304,565  Dec. 6 kernel
12/17/09  23,676,086
1/5/10    18,648,609
1/14/10   23,488,540  Jan. 6 kernel
1/21/10   19,551,680  Jan. 15 kernel
1/27/10   21,176,385  Jan. 21 kernel
2/5/10    22,387,745
2/11/10   23,387,894
2/17/10   20,412,172  Feb. 16 kernel
2/25/10   22,049,128
3/4/10    22,099,624  Mar. 3 kernel
3/17/10   20,334,896  Mar. 3 kernel
3/31/10   21,655,213  Mar. 25 kernel
4/8/10    19,673,170
4/14/10   22,235,518
4/30/10   21,262,223  Apr. 14 kernel
6/3/10    22,838,125  May 24 kernel
6/17/10   18,481,270
6/28/10   20,958,356
7/8/10    19,698,282  June 28 kernel
7/21/10   23,330,556
7/28/10   20,544,392  July 24 kernel (8.1-stable)
8/13/10   22,093,259  Aug. 9 kernel

Note the dramatic differences even on the same kernel. For the December 6 kernel, for example, I see a maximum of 23,676,086 and a minimum of just 18,304,565. ????

Can anyone explain what might be causing such a dramatic difference?

I should also note that the system was consistent back in V6 and V7 days. Consistently slow, but consistent. 17.5M was the norm in V6 and 18.0M in V7. The performance jumped to about 19M in March of 09 and jumped to its current speeds with 8.0. So performance has greatly improved to where the slowest times are better than the fastest prior to March of 09.
Just very inconsistent.

I don't know that anything is wrong, but I'd love to understand why this is happening.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net  Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
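The size of the run-to-run variation in the table above can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table, and they are assumed to be the bytes/sec values that dd prints (the message does not state the unit). The variable names are made up for illustration. It shows that the quoted "22%" matches the gap between the fastest and slowest run measured against the fastest run; measured against the slowest run, the same gap is larger.

```python
from statistics import mean, pstdev

# Transfer rates from the table above, in table order.
# Assumed to be bytes/sec as printed by dd (the unit is not stated in the message).
rates = [
    19242573, 18304565, 23676086, 18648609, 23488540, 19551680,
    21176385, 22387745, 23387894, 20412172, 22049128, 22099624,
    20334896, 21655213, 19673170, 22235518, 21262223, 22838125,
    18481270, 20958356, 19698282, 23330556, 20544392, 22093259,
]

slowest, fastest = min(rates), max(rates)

# The same gap expressed against two different baselines.
gap_vs_fastest = (fastest - slowest) / fastest * 100
gap_vs_slowest = (fastest - slowest) / slowest * 100

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = pstdev(rates) / mean(rates) * 100

print(f"runs: {len(rates)}  slowest: {slowest}  fastest: {fastest}")
print(f"gap relative to fastest run: {gap_vs_fastest:.1f}%")
print(f"gap relative to slowest run: {gap_vs_slowest:.1f}%")
print(f"coefficient of variation:    {cv:.1f}%")
```

Both extremes (18,304,565 and 23,676,086) come from the December 6 kernel, so the full spread is visible without any kernel change.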
On 13.08.2010 at 18:01, Kevin Oberman wrote:
> Note the dramatic differences even on the same kernel. For the December
> 6 kernel, for example, I see a maximum of 23,676,086 and a minimum of
> just 18,304,565. ????

Are the disks still OK? If any sectors have been remapped between runs, additional seeks would be needed. I think it's unlikely, but checking with smartmontools should only take a few minutes.

Stefan

-- 
Stefan Bethke <stb@lassitu.de>   Phone +49 151 14070811
On Fri, Aug 13, 2010 at 09:01:09AM -0700, Kevin Oberman wrote:
> For some time I have seen very odd issues with IO performance on
> 8-Stable. Going back to November of last year when 8.0 was released, I
> see variations of up to 22% in identical operations. This is not a
> degradation as the performance moves up and down.
>
> This is a very simplistic case. I have two identical disks (Fujitsu 80G)
> on a ThinkPad T43 with a 2 GHz CPU and 2G RAM. I run the command:
> dd bs=516096 if=/dev/ad0 of=/dev/ad2

Why are you using this peculiar block size?

> Note the dramatic differences even on the same kernel. For the December
> 6 kernel, for example, I see a maximum of 23,676,086 and a minimum of
> just 18,304,565. ????

Both figures seem quite low to me. I cannot exactly reproduce your test, because I don't have an empty second disk handy, but doing

dd if=/dev/zero bs=1m count=100 of=/tmp/foo

yields the following write speeds on 8.1-RELEASE amd64 with a WDC WD5001ABYS SATA hard disk at 7200 rpm:

1) 87263174 bytes/sec
2) 87878728 bytes/sec
3) 86397125 bytes/sec
4) 86550094 bytes/sec
5) 86524741 bytes/sec

The maximum variation in write speed is (87878728-86397125)/86397125*100% = 1.7%, which doesn't seem that much to me. This is in multi-user, with X11 running but on an otherwise idling machine, and with filesystem overhead to boot. Still the numbers are a lot higher than yours, which puzzles me.

Trying only reading does yield very inconsistent results because of caching, I think;

dd if=/tmp/foo bs=1m count=100 of=/dev/null

1) 1454216957 bytes/sec
2) 1003691691 bytes/sec
3) 1429956761 bytes/sec
4) 2324794646 bytes/sec
5) 1804563681 bytes/sec

This is a (2324794646-1003691691)/1003691691*100% = 132% difference. OTOH, your data set should be big enough to negate caching effects, I guess. :-) What this does show is that writing seems to be the bottleneck.
If I both read from and write to a file (on the same disk & partition);

dd if=/tmp/foo bs=1m count=100 of=/tmp/bar

gives

1) 85161534 bytes/sec
2) 84978770 bytes/sec
3) 87966613 bytes/sec
4) 83036312 bytes/sec
5) 86536879 bytes/sec

This is a (87966613-83036312)/83036312*100% = 5.9% difference between largest and smallest. The speed seems to be dictated by the writing.

> Can anyone explain what might be causing such a dramatic difference?

Maybe there is a hardware component here? Are both disks on the same controller? Or if not, are both controllers using the same interrupt line?

You should have a look at 'systat -vmstat' with dd running in the background. That might give a clue as to where the bottleneck is.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
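The percentages in Roland's message all use the same formula: the gap between the largest and smallest sample, divided by the smallest. A minimal Python sketch of that arithmetic, using the three sets of figures he posted, follows. The function name `spread_percent` is hypothetical and chosen only for this illustration.

```python
def spread_percent(samples):
    """Gap between the largest and smallest sample, as a percentage of the smallest."""
    smallest, largest = min(samples), max(samples)
    return (largest - smallest) / smallest * 100.0


# Figures in bytes/sec, copied from the message above.
write_runs = [87263174, 87878728, 86397125, 86550094, 86524741]            # /dev/zero -> file
read_runs = [1454216957, 1003691691, 1429956761, 2324794646, 1804563681]   # file -> /dev/null
copy_runs = [85161534, 84978770, 87966613, 83036312, 86536879]             # file -> file

for label, runs in (("write", write_runs), ("read", read_runs), ("copy", copy_runs)):
    print(f"{label:5s} spread: {spread_percent(runs):6.1f}%")
```

Dividing by the smallest sample gives a larger percentage than dividing by the largest, which matters when comparing these numbers with spreads computed the other way.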
On Fri, Aug 13, 2010 at 09:01:09AM -0700, Kevin Oberman wrote:
> For some time I have seen very odd issues with IO performance on
> 8-Stable. Going back to November of last year when 8.0 was released, I
> see variations of up to 22% in identical operations. This is not a
> degradation as the performance moves up and down.
>
> This is a very simplistic case. I have two identical disks (Fujitsu 80G)
> on a ThinkPad T43 with a 2 GHz CPU and 2G RAM. I run the command:
> dd bs=516096 if=/dev/ad0 of=/dev/ad2
>
> I do this in single user mode immediately after a boot with no disks
> mounted for write. Just a 'boot -s', <Enter> to start the shell, and the
> dd. I would expect very consistent performance from run to run, but I
> don't get it. Here are the results since 8.0 was released:
> Date      Xfer rate   Kernel date
> 12/4/09   19,242,573  Nov. 26 kernel (8.0-stable)
> 12/9/09   18,304,565  Dec. 6 kernel
> 12/17/09  23,676,086
> 1/5/10    18,648,609
> 1/14/10   23,488,540  Jan. 6 kernel
> 1/21/10   19,551,680  Jan. 15 kernel
> 1/27/10   21,176,385  Jan. 21 kernel
> 2/5/10    22,387,745
> 2/11/10   23,387,894
> 2/17/10   20,412,172  Feb. 16 kernel
> 2/25/10   22,049,128
> 3/4/10    22,099,624  Mar. 3 kernel
> 3/17/10   20,334,896  Mar. 3 kernel
> 3/31/10   21,655,213  Mar. 25 kernel
> 4/8/10    19,673,170
> 4/14/10   22,235,518
> 4/30/10   21,262,223  Apr. 14 kernel
> 6/3/10    22,838,125  May 24 kernel
> 6/17/10   18,481,270
> 6/28/10   20,958,356
> 7/8/10    19,698,282  June 28 kernel
> 7/21/10   23,330,556
> 7/28/10   20,544,392  July 24 kernel (8.1-stable)
> 8/13/10   22,093,259  Aug. 9 kernel
>
> Note the dramatic differences even on the same kernel. For the December
> 6 kernel, for example, I see a maximum of 23,676,086 and a minimum of
> just 18,304,565. ????
>
> Can anyone explain what might be causing such a dramatic difference?
>
> I should also note that the system was consistent back in V6 and V7
> days. Consistently slow, but consistent. 17.5M was the norm in V6 and
> 18.0M in V7.
> The performance jumped to about 19M in March of 09 and jumped
> to its current speeds with 8.0. So performance has greatly improved to
> where the slowest times are better than the fastest prior to March of
> 09. Just very inconsistent.
>
> I don't know that anything is wrong, but I'd love to understand why this
> is happening.

The system in question is a ThinkPad T43 laptop[1], which is from circa 2005 and uses an ICH6-M southbridge (note the -M). We don't know the exact model of Fujitsu hard disk used, but since it's a laptop my guess is that it's 5400rpm, and PATA.

System performance can differ drastically on laptops depending on whether they are powered from the battery or from AC. Were the tests performed consistently with the exact same setup (which: battery or AC?) every time? Given that the system role is a laptop, I imagine not.

Can you also provide output from these commands?

- atacontrol list
- atacontrol info ataX (where X is the channel number the ad0 drive is connected to)
- atacontrol cap ad0

Anyway, I would expect the system to be seeing 50-60MB/sec, but I'm pulling those numbers out of thin air. An ICH6-M may be "old", but keep reading for a comparison system that's even older...

The deviation in your disk I/O isn't a major surprise (to me anyway), given the system specs. What *does* surprise me is your abysmal I/O speeds in general. 18MB/sec min, 24MB/sec max?! ICH6-M can do a lot more than that. Something isn't right.

It sounds to me like the disk itself has some kind of internal problem (cache that's gone bad, something mechanical that isn't audible, etc.); even for a 5400rpm drive those numbers are very low. Other possibilities include a southbridge that's going bad, or some kind of power-related problem that's causing the drive to spin at a lower speed than 5400rpm (though SMART sometimes can notice this).
I'm grasping at straws with this one, but excessive dust can slow a system down (from what I'm told by EE folks; something about more electricity being required to push voltage across a trace...)

Now the comparison -- here's a system that's way older than yours.

- FreeBSD 6.4-STABLE, i386
- Supermicro SuperServer 5010E [2]
- Intel Pentium 3 (not sure of speed)
- 1GB RAM
- Intel ICH2 southbridge
- ad0: Maxtor STM3160815A disk (160GB, 8MB cache, 7200rpm, ATA133)
- ad1: Maxtor STM3160815A disk (160GB, 8MB cache, 7200rpm, ATA133)
- Disk connected via ICH2
- System in multi-user with some light load
- Command #1: dd if=/dev/ad0 of=/dev/null bs=64k count=100000
- Command #2: dd if=/dev/ad1 of=/dev/null bs=64k count=100000
- Commands run 4 times in succession

Result for command #1:

6553600000 bytes transferred in 84.110282 secs (77916752 bytes/sec)
6553600000 bytes transferred in 84.197506 secs (77836035 bytes/sec)
6553600000 bytes transferred in 84.205020 secs (77829089 bytes/sec)
6553600000 bytes transferred in 84.662426 secs (77408602 bytes/sec)

Result for command #2:

6553600000 bytes transferred in 85.052923 secs (77053201 bytes/sec)
6553600000 bytes transferred in 85.040286 secs (77064652 bytes/sec)
6553600000 bytes transferred in 85.040805 secs (77064181 bytes/sec)
6553600000 bytes transferred in 85.040560 secs (77064403 bytes/sec)

My recommendation: start looking at replacing hardware. Start by replacing the hard disk with something newer and see what becomes of that.

If you want recommendations: if power draw, noise, and thermals are a concern for you, get a 5400rpm drive. Try to find something with 16MB cache or more. The WD Scorpio Blue drives are supposedly quite nice, but I don't know if they come in PATA. I'm not particularly fond of Fujitsu drives (my day job consists of watching their SCSI disks freak out in random ways), so I'd choose to stay away from those.
Same goes for Toshiba (just recently replaced one of those on a laptop; cache went bad, disk I/O was absurd).

If you want something totally crazy to try, how about booting a FreeBSD 7.2 or 7.3 LiveFS CD and doing your dd's?

[1]: http://www.thinkwiki.org/wiki/Category:T43
[2]: http://www.supermicro.com/products/system/1U/5010/SYS-5010E.cfm

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
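The dd summary lines in Jeremy's comparison can be compared mechanically rather than by eye. The sketch below parses the four ad0 lines he posted, recomputes the rate from the byte count and elapsed time as a sanity check, and reports the spread. The regular expression only targets the summary format shown in that message, and `parse_dd_summary` is a hypothetical helper written for this example, not part of any tool.

```python
import re

# Matches the summary line format shown in the message, e.g.
# "6553600000 bytes transferred in 84.110282 secs (77916752 bytes/sec)"
DD_SUMMARY = re.compile(
    r"(\d+) bytes transferred in ([\d.]+) secs \((\d+) bytes/sec\)"
)


def parse_dd_summary(line):
    """Return (total_bytes, seconds, reported_rate) from one dd summary line."""
    match = DD_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), int(match.group(3))


# The four ad0 results copied from the message above.
ad0_output = """\
6553600000 bytes transferred in 84.110282 secs (77916752 bytes/sec)
6553600000 bytes transferred in 84.197506 secs (77836035 bytes/sec)
6553600000 bytes transferred in 84.205020 secs (77829089 bytes/sec)
6553600000 bytes transferred in 84.662426 secs (77408602 bytes/sec)
"""

runs = [parse_dd_summary(line) for line in ad0_output.splitlines()]
reported = [rate for _, _, rate in runs]

for total, secs, rate in runs:
    # Recompute the rate from bytes and seconds to confirm the line was parsed correctly.
    print(f"{rate} bytes/sec reported, {total / secs:.0f} recomputed")

spread = (max(reported) - min(reported)) / min(reported) * 100
print(f"spread across runs: {spread:.2f}%")
```

On these four runs the spread stays under 1%, which is the kind of consistency the original poster expected from his own test.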
On 13.8.2010 18:01, Kevin Oberman wrote:
> For some time I have seen very odd issues with IO performance on
> 8-Stable. Going back to November of last year when 8.0 was released, I
> see variations of up to 22% in identical operations. This is not a
> degradation as the performance moves up and down.

Between 8.0 and 8.1 there was some work on the ata driver to make it use MAXPHYS (128 KiB) transfer sizes instead of 64 KiB. Modifying this will involve changing and recompiling the kernel, but if you want to try something and the hardware is SATA, you might try the new AHCI driver ("ada").

http://ivoras.net/blog/tree/2009-11-17.trying-ahci-in-8.0.html
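To get a feel for what the 64 KiB versus 128 KiB limit means for the dd command in the original report, the sketch below counts how many device requests one 516096-byte block would need under each limit. It rests on a deliberately simple assumption that is not taken from the thread: each dd block is cut into consecutive chunks no larger than the maximum transfer size. The real kernel I/O path may behave differently, and `split_request` is a hypothetical helper.

```python
KIB = 1024


def split_request(block_size, max_transfer):
    """Cut one block into consecutive chunks of at most max_transfer bytes.

    Simplifying assumption for illustration only; it does not model the
    actual FreeBSD I/O path.
    """
    full_chunks, remainder = divmod(block_size, max_transfer)
    chunks = [max_transfer] * full_chunks
    if remainder:
        chunks.append(remainder)
    return chunks


BLOCK = 516096  # the bs= value from the dd command in the original report

for limit in (64 * KIB, 128 * KIB):
    chunks = split_request(BLOCK, limit)
    print(f"{limit // KIB:3d} KiB limit: {len(chunks)} requests, last one {chunks[-1]} bytes")
```

Under this assumption the larger limit halves the number of requests per block, from 8 to 4. Whether that affects run-to-run consistency is not something this arithmetic can show.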
> From: Ivan Voras <ivoras@freebsd.org>
> Date: Mon, 16 Aug 2010 15:03:23 +0200
> Sender: owner-freebsd-stable@freebsd.org
>
> On 13.8.2010 18:01, Kevin Oberman wrote:
> > For some time I have seen very odd issues with IO performance on
> > 8-Stable. Going back to November of last year when 8.0 was released, I
> > see variations of up to 22% in identical operations. This is not a
> > degradation as the performance moves up and down.
>
> In 8.0-8.1 span of time there was some work on the ata driver to make it
> use MAXPHYS (128 KiB) transfer sizes instead of 64 KiB. Modifying this
> will involve changing and recompiling the kernel but if you want to try
> something and the hardware is SATA you might try the new AHCI driver
> ("ada").
>
> http://ivoras.net/blog/tree/2009-11-17.trying-ahci-in-8.0.html

Thanks. I appreciate the suggestion. I am running an 8-Stable kernel from August 9, so I think I should be OK on this. Is there a requirement to set some parameter in the kernel config to take advantage of this?

While the ThinkPad has a SATA ICH6-M chipset, it does not provide any SATA connections. Both SATA ports are run to a SATA/PATA converter chip and the only 2 physical connections available are PATA. I am assuming that this is because 2.5 in. SATA drives were pretty much unavailable when this system was shipped. This was the last of the T43 series and was dropped from the product line by Lenovo about a month after I got it, to be replaced by T60 systems running Core2 chips and using SATA drives. Just lousy timing almost 4 years ago.

Thanks again!
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net  Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751