Hi!

I just want to check with the community to see if this is normal.

I have used an X4500 with 500GB disks and I'm not impressed by the copy
performance. I can run several jobs in parallel and get close to 400MB/s,
but I need better performance from a single copy. I have tried to be
"EVIL" as well, but without success.

Tests done with:
Solaris 10 U4
Solaris 10 U5 (B10)
Nevada B86

*Setup*

# zpool status
  pool: datapool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0

*Result* - Around 50-60MB/s read

parsing profile for config: copyfiles
Running /tmp/temp165-231.*.*.COM-zfs-readtest-Apr_8_2008-09h_09m_07s/copyfiles/thisrun.f
FileBench Version 1.2.2
5109: 0.005: CopyFiles Version 2.3 personality successfully loaded
5109: 0.005: Creating/pre-allocating files and filesets
5109: 0.069: Fileset destfiles: 10000 files, avg dir = 20, avg depth = 3.1, mbytes=156
5109: 3.922: Removed any existing fileset destfiles in 4 seconds
5109: 3.952: Creating fileset destfiles...
5109: 3.952: Preallocated 0 of 10000 of fileset destfiles in 1 seconds
5109: 4.039: Fileset bigfileset: 10000 files, avg dir = 20, avg depth = 3.1, mbytes=158
5109: 4.071: Removed any existing fileset bigfileset in 1 seconds
5109: 4.098: Creating fileset bigfileset...
5109: 117.245: Preallocated 10000 of 10000 of fileset bigfileset in 114 seconds
5109: 117.245: waiting for fileset pre-allocation to finish
5109: 117.245: Running '/opt/filebench/scripts/fs_flush zfs /export/transcoded'
'zpool export datapool'
'zpool import datapool'
5109: 127.338: Change dir to /tmp/temp165-231.*.*.COM-zfs-readtest-Apr_8_2008-09h_09m_07s/copyfiles
5109: 127.339: Starting 1 filereader instances
5287: 128.348: Starting 16 filereaderthread threads
5109: 131.358: Running...
5109: 134.378: Run took 3 seconds...
5109: 134.378: Per-Operation Breakdown
closefile2     3312ops/s   0.0mb/s   0.0ms/op    3us/op-cpu
closefile1     3312ops/s   0.0mb/s   0.0ms/op    4us/op-cpu
writefile2     3312ops/s  52.3mb/s   0.1ms/op   59us/op-cpu
createfile2    3312ops/s   0.0mb/s   0.2ms/op  100us/op-cpu
readfile1      3312ops/s  52.3mb/s   0.1ms/op   27us/op-cpu
openfile1      3312ops/s   0.0mb/s   0.6ms/op   53us/op-cpu

5109: 134.378: IO Summary: 60000 ops 19869.5 ops/s, (3312/3312 r/w) 104.6mb/s, 228us cpu/op, 0.4ms latency
5109: 134.378: Stats dump to file 'stats.copyfiles.out'
5109: 134.378: in statsdump stats.copyfiles.out
5109: 134.379: Shutting down processes

Cheers,
Henrik


This message posted from opensolaris.org
On my drive array (capable of 260MB/second single-process writes and
450MB/second single-process reads), 'zfs iostat' reports a read rate of
about 59MB/second and a write rate of about 59MB/second when executing
'cp -r' on a directory containing thousands of 8MB files. This seems
very similar to the performance you are seeing.

The system indicators (other than disk I/O) are almost flatlined at
zero while the copy is going on.

It seems that a multi-threaded 'cp' could be much faster.

With GNU xargs, find, and cpio, I think that it is possible to cobble
together a much faster copy, since GNU xargs supports --max-procs and
--max-args arguments to allow executing commands concurrently with
different sets of files.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn schrieb:
> On my drive array (capable of 260MB/second single-process writes and
> 450MB/second single-process reads) 'zfs iostat' reports a read rate of
> about 59MB/second and a write rate of about 59MB/second when executing
> 'cp -r' on a directory containing thousands of 8MB files. This seems
> very similar to the performance you are seeing.
>
> The system indicators (other than disk I/O) are almost flatlined at
> zero while the copy is going on.
>
> It seems that a multi-threaded 'cp' could be much faster.
>
> With GNU xargs, find, and cpio, I think that it is possible to cobble
> together a much faster copy since GNU xargs supports --max-procs and
> --max-args arguments to allow executing commands concurrently with
> different sets of files.
>
> Bob

That's the reason I wrote a binary patch (preloadable shared object)
for cp, tar, and friends. You might want to take a look at it...

Here: http://www.maier-komor.de/mtwrite.html

- Thomas
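[Editor's note: for readers unfamiliar with the mechanism, a "preloadable shared object" interposes on libc calls of unmodified binaries via LD_PRELOAD. The minimal sketch below illustrates only the interposition technique (shown with dlsym(RTLD_NEXT, ...), as on Linux/glibc); it is not mtwrite's code. mtwrite hands the buffers to background writer threads, whereas this sketch merely counts calls and forwards them.]

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

/* Minimal write(2) interposer sketch. Compiled as a shared object and
 * loaded with LD_PRELOAD, this definition of write() is found before
 * libc's. A real multi-threading patch like mtwrite would queue the
 * buffer for a background thread here instead of just counting. */

static ssize_t (*real_write)(int, const void *, size_t);
static unsigned long intercepted;   /* illustrative bookkeeping only */

ssize_t write(int fd, const void *buf, size_t len)
{
    if (real_write == NULL)         /* resolve the next (libc) definition */
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");
    intercepted++;
    return real_write(fd, buf, len);
}
```

Something like `gcc -shared -fPIC mtwrite_sketch.c -o mtwrite_sketch.so` followed by `LD_PRELOAD=./mtwrite_sketch.so cp -r src dst` would route every write() of cp through the interposer.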
No, that is definitely not expected.

One thing that can hose you is having a single disk that performs
really badly. I've seen disks as slow as 5 MB/sec due to vibration,
bad sectors, etc. To see if you have such a disk, try my diskqual.sh
script (below). On my desktop system, which has 8 drives, I get:

# ./diskqual.sh
c1t0d0 65 MB/sec
c1t1d0 63 MB/sec
c2t0d0 59 MB/sec
c2t1d0 63 MB/sec
c3t0d0 60 MB/sec
c3t1d0 57 MB/sec
c4t0d0 61 MB/sec
c4t1d0 61 MB/sec

The diskqual test is non-destructive (it only does reads), but to get
valid numbers you should run it on an otherwise idle system.

----------------------
#!/bin/ksh

disks=`format </dev/null | grep c.t.d | nawk '{print $2}'`

getspeed1()
{
	ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
	    nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
	for iter in 1 2 3
	do
		getspeed1 $1
	done | sort -n | tail -2 | head -1
}

for disk in $disks
do
	echo $disk `getspeed $disk` MB/sec
done
----------------------

Jeff

On Tue, Apr 08, 2008 at 06:44:13AM -0700, Henrik Hjort wrote:
> Hi!
>
> I just want to check with the community to see if this is normal.
>
> I have used a X4500 with 500Gb disks and I'm not impressed by the copy performance.
> I can run several jobs in parallel and get close to 400mb/s but I need better performance
> from a single copy. I have tried to be "EVIL" as well but without success.
> [... zpool status and FileBench output, quoted in full from the original message above, elided ...]
On Mon, 14 Apr 2008, Jeff Bonwick wrote:
> disks=`format </dev/null | grep c.t.d | nawk '{print $2}'`

I had to change the above line to

disks=`format </dev/null | grep ' c.t' | nawk '{print $2}'`

in order to match my multipathed devices.

./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 13422 MB/sec
c4t600A0B80003A8A0B0000096A47B4559Ed0 190 MB/sec
c4t600A0B80003A8A0B0000096E47B456DAd0 202 MB/sec
c4t600A0B80003A8A0B0000096147B451BEd0 186 MB/sec
c4t600A0B80003A8A0B0000096647B453CEd0 176 MB/sec
c4t600A0B80003A8A0B0000097347B457D4d0 189 MB/sec
c4t600A0B800039C9B500000A9C47B4522Dd0 174 MB/sec
c4t600A0B800039C9B500000AA047B4529Bd0 197 MB/sec
c4t600A0B800039C9B500000AA447B4544Fd0 223 MB/sec
c4t600A0B800039C9B500000AA847B45605d0 224 MB/sec
c4t600A0B800039C9B500000AAC47B45739d0 223 MB/sec
c4t600A0B800039C9B500000AB047B457ADd0 219 MB/sec
c4t600A0B800039C9B500000AB447B4595Fd0 223 MB/sec

My 'cp -r' performance is about the same as Henrik's. The 'cp -r'
performance is much less than disk benchmark tools would suggest.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
After some fruitful discussions with Jörg, it turned out that my
mtwrite patch prevents tar, star, gtar, and unzip from setting the
file times correctly. I've investigated this issue and updated the
patch accordingly.

Unfortunately, I encountered an issue concerning semaphores, which
seem to have a race condition. At least I couldn't get it to work
reliably with semaphores, so I switched over to condition variables,
which works now. I'll investigate the semaphore issue as soon as I
have time, but I'm pretty convinced that there is a race condition in
the semaphore implementation, as the semaphore value from time to time
grew larger than the number of elements in the work list. This was on
Solaris 10 - so I'll try to generate a test for SX.

Does anybody know of any issues related to semaphores?

The work creator did the following:
- lock the structure containing the list
- attach an element to the list
- post the semaphore
- unlock the structure

The worker thread did the following:
- wait on the semaphore
- lock the structure containing the list
- remove an element from the list
- unlock the structure
- perform the work described by the list element
- lock the structure
- update the structure to reflect the work results
- unlock the structure
- restart from the beginning

Is anything wrong with this approach? Replacing the semaphore calls
with condition calls and swapping steps 1 and 2 of the worker thread
made it reliable...

- Thomas

P.S.: I published the updated mtwrite on my website yesterday - get it
here: http://www.maier-komor.de/mtwrite.html
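[Editor's note: the condition-variable scheme Thomas ended up with might look like the sketch below. This is an illustration of the technique, not mtwrite's actual code; the names (work_t, submit, take) are made up. Note how the worker takes the mutex *before* waiting - the "swapping steps 1 and 2" in his description - and re-checks the list in a loop, which also guards against spurious wakeups.]

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work-queue sketch: condition variable in place of the
 * semaphore. The payload field stands in for the real work description. */
typedef struct work {
    struct work *next;
    int payload;
} work_t;

static pthread_mutex_t qlock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static work_t *qhead;

/* work creator: lock, attach element, signal, unlock */
void submit(work_t *w)
{
    pthread_mutex_lock(&qlock);
    w->next = qhead;
    qhead = w;
    pthread_cond_signal(&nonempty);   /* replaces sem_post() */
    pthread_mutex_unlock(&qlock);
}

/* worker: lock FIRST, then wait until the list is non-empty */
work_t *take(void)
{
    pthread_mutex_lock(&qlock);
    while (qhead == NULL)             /* loop handles spurious wakeups */
        pthread_cond_wait(&nonempty, &qlock);
    work_t *w = qhead;
    qhead = w->next;
    pthread_mutex_unlock(&qlock);
    return w;
}
```

The key difference from the semaphore version is that the "there is work" check and the removal of the element happen under the same mutex hold, so the wakeup count can never drift out of sync with the list contents.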