Hi!
I just want to check with the community to see if this is normal.
I have been using an X4500 with 500 GB disks and I'm not impressed by
the copy performance. I can run several jobs in parallel and get close
to 400 MB/s, but I need better performance from a single copy. I have
tried to be "EVIL" as well, but without success.
Tests done with:
Solaris 10 U4
Solaris 10 U5 (B10)
Nevada B86
*Setup*
# zpool status
  pool: datapool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
*Result* - Around 50-60 MB/s read
parsing profile for config: copyfiles
Running /tmp/temp165-231.*.*.COM-zfs-readtest-Apr_8_2008-09h_09m_07s/copyfiles/thisrun.f
FileBench Version 1.2.2
5109: 0.005: CopyFiles Version 2.3 personality successfully loaded
5109: 0.005: Creating/pre-allocating files and filesets
5109: 0.069: Fileset destfiles: 10000 files, avg dir = 20, avg depth = 3.1, mbytes=156
5109: 3.922: Removed any existing fileset destfiles in 4 seconds
5109: 3.952: Creating fileset destfiles...
5109: 3.952: Preallocated 0 of 10000 of fileset destfiles in 1 seconds
5109: 4.039: Fileset bigfileset: 10000 files, avg dir = 20, avg depth = 3.1, mbytes=158
5109: 4.071: Removed any existing fileset bigfileset in 1 seconds
5109: 4.098: Creating fileset bigfileset...
5109: 117.245: Preallocated 10000 of 10000 of fileset bigfileset in 114 seconds
5109: 117.245: waiting for fileset pre-allocation to finish
5109: 117.245: Running '/opt/filebench/scripts/fs_flush zfs /export/transcoded'
'zpool export datapool'
'zpool import datapool'
5109: 127.338: Change dir to /tmp/temp165-231.*.*.COM-zfs-readtest-Apr_8_2008-09h_09m_07s/copyfiles
5109: 127.339: Starting 1 filereader instances
5287: 128.348: Starting 16 filereaderthread threads
5109: 131.358: Running...
5109: 134.378: Run took 3 seconds...
5109: 134.378: Per-Operation Breakdown
closefile2 3312ops/s 0.0mb/s 0.0ms/op 3us/op-cpu
closefile1 3312ops/s 0.0mb/s 0.0ms/op 4us/op-cpu
writefile2 3312ops/s 52.3mb/s 0.1ms/op 59us/op-cpu
createfile2 3312ops/s 0.0mb/s 0.2ms/op 100us/op-cpu
readfile1 3312ops/s 52.3mb/s 0.1ms/op 27us/op-cpu
openfile1 3312ops/s 0.0mb/s 0.6ms/op 53us/op-cpu
5109: 134.378:
IO Summary: 60000 ops 19869.5 ops/s, (3312/3312 r/w) 104.6mb/s, 228us cpu/op, 0.4ms latency
5109: 134.378: Stats dump to file 'stats.copyfiles.out'
5109: 134.378: in statsdump stats.copyfiles.out
5109: 134.379: Shutting down processes
Cheers,
Henrik
On my drive array (capable of 260 MB/second single-process writes and
450 MB/second single-process reads), 'zfs iostat' reports a read rate of
about 59 MB/second and a write rate of about 59 MB/second when executing
'cp -r' on a directory containing thousands of 8 MB files. This seems
very similar to the performance you are seeing.

The system indicators (other than disk I/O) are almost flatlined at zero
while the copy is going on.

It seems that a multi-threaded 'cp' could be much faster.

With GNU xargs, find, and cpio, I think that it is possible to cobble
together a much faster copy since GNU xargs supports --max-procs and
--max-args arguments to allow executing commands concurrently with
different sets of files.

Bob

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
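As a concrete illustration of that idea, here is a minimal sketch of a
parallel copy along the lines Bob describes. It is only a sketch under
some assumptions: GNU find, xargs, and cpio must be first in PATH (on
Solaris the GNU versions are usually an extra install and may live under
g-prefixed names), /src and /dst are placeholder paths, and each batch
of names is fed to 'cpio -p' through a small sh wrapper because
'cpio -p' reads file names from standard input rather than from its
argument list.

----------------------
#!/bin/ksh
# Parallel copy sketch (assumptions: GNU find/xargs/cpio in PATH,
# /src and /dst are placeholders, file names contain no newlines).

src=/src
dst=/dst

cd "$src" || exit 1

# Create the directory tree in a single pass up front, so that the
# concurrent cpio instances below do not race creating directories.
find . -type d -print | cpio -pdum "$dst"

# Copy the files in batches of 200 names, with 8 copies in flight.
# xargs passes "$dst" as $0 and each batch of names as "$@" to the
# wrapper shell, which feeds the names to cpio on stdin.
find . -type f -print0 |
    xargs -0 -n 200 -P 8 \
        sh -c 'printf "%s\n" "$@" | cpio -pdum "$0"' "$dst"
----------------------

Whether this actually beats a single 'cp -r' depends on how well the
pool handles many concurrent streams, so it is worth watching 'zpool
iostat' while it runs.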
Bob Friesenhahn wrote:
> It seems that a multi-threaded 'cp' could be much faster.
>
> With GNU xargs, find, and cpio, I think that it is possible to cobble
> together a much faster copy since GNU xargs supports --max-procs and
> --max-args arguments to allow executing commands concurrently with
> different sets of files.

That's the reason I wrote a binary patch (preloadable shared object) for
cp, tar, and friends. You might want to take a look at it... Here:

http://www.maier-komor.de/mtwrite.html

- Thomas
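For anyone who has not used a preloadable interposer before: such a
library is normally activated through the runtime linker's LD_PRELOAD
environment variable. The library file name and install path in the
one-liner below are only assumptions for illustration; see the page
above for the actual build and usage instructions.

----------------------
# Hypothetical usage sketch - the mtwrite.so name and install path are
# assumed, not taken from the project's documentation.
LD_PRELOAD=/usr/local/lib/mtwrite.so cp -r /src/dir /dst/dir
----------------------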
No, that is definitely not expected.
One thing that can hose you is having a single disk that performs
really badly. I've seen disks as slow as 5 MB/sec due to vibration,
bad sectors, etc. To see if you have such a disk, try my diskqual.sh
script (below). On my desktop system, which has 8 drives, I get:
# ./diskqual.sh
c1t0d0 65 MB/sec
c1t1d0 63 MB/sec
c2t0d0 59 MB/sec
c2t1d0 63 MB/sec
c3t0d0 60 MB/sec
c3t1d0 57 MB/sec
c4t0d0 61 MB/sec
c4t1d0 61 MB/sec
The diskqual test is non-destructive (it only does reads), but to
get valid numbers you should run it on an otherwise idle system.
----------------------
#!/bin/ksh

# List the cNtNdN disk names that format(1M) reports.
disks=`format </dev/null | grep c.t.d | nawk '{print $2}'`

# Read 64 MB (1024 x 64 KB) sequentially from the raw device and
# convert the elapsed time into MB/sec.
getspeed1()
{
        ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
            nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

# Take three samples and report the median.
getspeed()
{
        for iter in 1 2 3
        do
                getspeed1 $1
        done | sort -n | tail -2 | head -1
}

for disk in $disks
do
        echo $disk `getspeed $disk` MB/sec
done
----------------------
Jeff
On Mon, 14 Apr 2008, Jeff Bonwick wrote:
> disks=`format </dev/null | grep c.t.d | nawk '{print $2}'`

I had to change the above line to

disks=`format </dev/null | grep ' c.t' | nawk '{print $2}'`

in order to match my multipathed devices.

./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 13422 MB/sec
c4t600A0B80003A8A0B0000096A47B4559Ed0 190 MB/sec
c4t600A0B80003A8A0B0000096E47B456DAd0 202 MB/sec
c4t600A0B80003A8A0B0000096147B451BEd0 186 MB/sec
c4t600A0B80003A8A0B0000096647B453CEd0 176 MB/sec
c4t600A0B80003A8A0B0000097347B457D4d0 189 MB/sec
c4t600A0B800039C9B500000A9C47B4522Dd0 174 MB/sec
c4t600A0B800039C9B500000AA047B4529Bd0 197 MB/sec
c4t600A0B800039C9B500000AA447B4544Fd0 223 MB/sec
c4t600A0B800039C9B500000AA847B45605d0 224 MB/sec
c4t600A0B800039C9B500000AAC47B45739d0 223 MB/sec
c4t600A0B800039C9B500000AB047B457ADd0 219 MB/sec
c4t600A0B800039C9B500000AB447B4595Fd0 223 MB/sec

My 'cp -r' performance is about the same as Henrik's. The 'cp -r'
performance is much less than disk benchmark tools would suggest.

Bob

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
After some fruitful discussions with Jörg, it turned out that my mtwrite
patch prevents tar, star, gtar, and unzip from setting the file times
correctly. I've investigated this issue and updated the patch
accordingly.

Unfortunately, I encountered an issue concerning semaphores, which seem
to have a race condition. At least I couldn't get it to work reliably
with semaphores, so I switched over to condition variables, which work
now. I'll investigate the semaphore issue as soon as I have time, but
I'm pretty convinced that there is a race condition in the semaphore
implementation, as the semaphore value from time to time grew larger
than the number of elements in the work list. This was on Solaris 10 -
so I'll try to generate a test for SX.

Does anybody know of any issues related to semaphores?

The work creator did the following:
- lock the structure containing the list
- attach an element to the list
- post the semaphore
- unlock the structure

The worker thread did the following:
- wait on the semaphore
- lock the structure containing the list
- remove an element from the list
- unlock the structure
- perform the work described by the list element
- lock the structure
- update the structure to reflect the work results
- unlock the structure
- restart from the beginning

Is anything wrong with this approach? Replacing the semaphore calls with
condition calls and swapping steps 1 and 2 of the worker thread made it
reliable...

- Thomas

P.S.: I published the updated mtwrite on my website yesterday - get it
here: http://www.maier-komor.de/mtwrite.html