My application processes thousands of files sequentially, reading
input files, and outputting new files. I am using Solaris 10U4.
While running the application in a verbose mode, I see that it runs
very fast but pauses about every 7 seconds for a second or two. This
is while reading 50MB/second and writing 73MB/second (ARC cache miss
rate of 87%). The pause does not occur if the application spends more
time doing real work. However, it would be nice if the pause went
away.

I have tried turning down the ARC size (from 14GB to 10GB) but the
behavior did not noticeably improve. The storage device is trained to
ignore cache flush requests. According to the Evil Tuning Guide, the
pause I am seeing is due to a cache flush after the uberblock updates.

It does not seem like a wise choice to disable ZFS cache flushing
entirely. Is there a better way other than adding a small delay into
my application?

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
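For reference, the ARC cap mentioned above is normally applied through
/etc/system and takes effect after a reboot; a minimal sketch, assuming the
zfs:zfs_arc_max tunable described in the Evil Tuning Guide, with an
illustrative 10GB value:

* Cap the ZFS ARC at 10GB (0x280000000 bytes); the value here is illustrative.
set zfs:zfs_arc_max = 0x280000000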
Bob Friesenhahn wrote:
> My application processes thousands of files sequentially, reading
> input files, and outputting new files. I am using Solaris 10U4.
> While running the application in a verbose mode, I see that it runs
> very fast but pauses about every 7 seconds for a second or two.

When you experience the pause at the application level, do you see an
increase in writes to disk? This might be the regular syncing of the
transaction group to disk. This is normal behavior. The "amount" of
pause is determined by how much data needs to be synced. You could of
course decrease it by reducing the time between syncs (either by
reducing the ARC and/or decreasing txg_time); however, I am not sure it
will translate to better performance for you.

hth,
-neel
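For completeness, the "time between syncs" knob referred to here is a kernel
variable that can be inspected and changed live with mdb; the exact name is
build-dependent (txg_time on older bits, zfs_txg_synctime on newer ones, as
used later in this thread), so treat this as a sketch:

# echo "zfs_txg_synctime/D" | mdb -k
# echo "zfs_txg_synctime/W 0t1" | mdb -kw

The first line prints the current value in seconds, the second sets it to
1 second (0t marks a decimal value); the change does not survive a reboot.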
The question is: does the "IO pausing" behaviour you noticed penalize
your application? What are the consequences at the application level?

For instance, we have seen applications doing some kind of data capture
from an external device (video for example) that require a constant
throughput to disk (data feed), otherwise risking loss of data. In that
case QFS might be a better option (not free, though).

If your application is not suffering, then you should be able to live
with these apparent "IO hangs".

s-

On Thu, Mar 27, 2008 at 3:35 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> My application processes thousands of files sequentially, reading
> input files, and outputting new files. [...]

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/
On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
> When you experience the pause at the application level,
> do you see an increase in writes to disk? This might be the
> regular syncing of the transaction group to disk.

If I use 'zpool iostat' with a one second interval what I see is two
or three samples with no write I/O at all followed by a huge write of
100 to 312MB/second. Writes claimed to be at a lower rate are split
across two sample intervals.

It seems that writes are being cached and then issued all at once.
This behavior assumes that the file may be written multiple times so a
delayed write is more efficient.

If I run a script like

  while true
  do
    sync
  done

then the write data rate is much more consistent (at about
66MB/second) and the program does not stall. Of course this is not
very efficient.

Are the 'zpool iostat' statistics accurate?

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
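One way to confirm that these write bursts are the transaction group syncs
is to time spa_sync() with DTrace while the application runs; a rough
sketch, assuming the fbt provider exposes the spa_sync entry/return probes
on this build:

# dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
  fbt::spa_sync:return /self->ts/ {
    @["spa_sync time (ms)"] = quantize((timestamp - self->ts) / 1000000);
    self->ts = 0; }'

If the application-level pauses line up with long spa_sync times, the stall
really is the txg sync.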
Selim Daoud wrote:
> the question is: does the "IO pausing" behaviour you noticed penalize
> your application?
> what are the consequences at the application level?
>
> for instance we have seen applications doing some kind of data capture
> from an external device (video for example) requiring a constant
> throughput to disk (data feed), risking otherwise loss of data. in
> this case qfs might be a better option (not free though)
> if your application is not suffering, then you should be able to live
> with these apparent "io hangs"

I would look at txg_time first... for lots of streaming writes on a
machine with limited memory you can smooth out the sawtooth.

QFS is open sourced.
http://blogs.sun.com/samqfs
 -- richard
Bob Friesenhahn wrote:
> On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
>> When you experience the pause at the application level,
>> do you see an increase in writes to disk? This might be the
>> regular syncing of the transaction group to disk.
>
> If I use 'zpool iostat' with a one second interval what I see is two
> or three samples with no write I/O at all followed by a huge write of
> 100 to 312MB/second. Writes claimed to be at a lower rate are split
> across two sample intervals.
>
> It seems that writes are being cached and then issued all at once.
> This behavior assumes that the file may be written multiple times so a
> delayed write is more efficient.

This does sound like the regular syncing.

> If I run a script like
>
>   while true
>   do
>     sync
>   done
>
> then the write data rate is much more consistent (at about
> 66MB/second) and the program does not stall. Of course this is not
> very efficient.

This causes the sync to happen much faster, but as you say, suboptimal.
Haven't had the time to go through the bug report, but probably
CR 6429205 "each zpool needs to monitor its throughput and throttle
heavy writers" will help.

> Are the 'zpool iostat' statistics accurate?

Yes. You could also look at regular iostat and correlate it.

-neel
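A simple way to do that correlation is to run the two views side by side
while the application is writing (the pool name below is a placeholder):

# zpool iostat -v mypool 1
# iostat -xnz 1

If the write bursts in zpool iostat coincide with spikes in the kw/s and
asvc_t columns of iostat, the pool-level numbers are consistent with what
the disks are actually doing.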
On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
>
> This causes the sync to happen much faster, but as you say, suboptimal.
> Haven't had the time to go through the bug report, but probably
> CR 6429205 each zpool needs to monitor its throughput
> and throttle heavy writers
> will help.

I hope that this feature is implemented soon, and works well. :-)

I tested with my application outputting to a UFS filesystem on a
single 15K RPM SAS disk and saw that it writes about 50MB/second and
without the bursty behavior of ZFS. When writing to a ZFS filesystem
on a RAID array, 'zpool iostat' reports an average (over 10 seconds)
write rate of 54MB/second. Given that the throughput is not much
higher on the RAID array, I assume that the bottleneck is in my
application.

>> Are the 'zpool iostat' statistics accurate?
>
> Yes. You could also look at regular iostat
> and correlate it.

Iostat shows that my RAID array disks are loafing with only
9MB/second writes to each but with 82 writes/second.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
> On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
>>
>> This causes the sync to happen much faster, but as you say,
>> suboptimal.
>> Haven't had the time to go through the bug report, but probably
>> CR 6429205 each zpool needs to monitor its throughput
>> and throttle heavy writers
>> will help.
>
> I hope that this feature is implemented soon, and works well. :-)

Actually, this has gone back into snv_87 (and no, we don't know which
s10uX it will go into yet).

eric
You may want to try disabling the disk write cache on the single disk.
Also for the RAID, disable 'host cache flush' if such an option exists.
That solved the problem for me. Let me know.

Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> I tested with my application outputting to a UFS filesystem on a
> single 15K RPM SAS disk and saw that it writes about 50MB/second and
> without the bursty behavior of ZFS. [...]
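On Solaris the per-disk write cache on a SCSI/SAS drive is usually toggled
from the expert mode of format(1M); the menu entries below are from memory
and can differ by driver and disk type, so take this as an approximate
sketch rather than a verified recipe:

# format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> disable

Since ZFS issues cache flushes itself, the drive write cache is normally
safe to leave enabled; disabling it is mostly a diagnostic step.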
Hello eric,

Thursday, March 27, 2008, 9:36:42 PM, you wrote:

ek> On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
>> On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
>>>
>>> This causes the sync to happen much faster, but as you say,
>>> suboptimal.
>>> Haven't had the time to go through the bug report, but probably
>>> CR 6429205 each zpool needs to monitor its throughput
>>> and throttle heavy writers
>>> will help.
>>
>> I hope that this feature is implemented soon, and works well. :-)

ek> Actually, this has gone back into snv_87 (and no, we don't know which
ek> s10uX it will go into yet).

Could you share more details on how it works right now after the change?

--
Best regards,
 Robert Milkowski                            mailto:milek at task.gda.pl
                                             http://milek.blogspot.com
ZFS has always done a certain amount of "write throttling". In the past
(or the present, for those of you running S10 or pre build 87 bits) this
throttling was controlled by a timer and the size of the ARC: we would
"cut" a transaction group every 5 seconds based off of our timer, and
we would also "cut" a transaction group if we had more than 1/4 of the
ARC size worth of dirty data in the transaction group. So, for example,
if you have a machine with 16GB of physical memory it wouldn't be
unusual to see an ARC size of around 12GB. This means we would allow
up to 3GB of dirty data into a single transaction group (if the writes
complete in less than 5 seconds). Now we can have up to three
transaction groups "in progress" at any time: open context, quiesce
context, and sync context. As a final wrinkle, we also don't allow more
than 1/2 the ARC to be composed of dirty write data. All taken
together, this means that there can be up to 6GB of writes "in the pipe"
(using the 12GB ARC example from above).

Problems with this design start to show up when the write-to-disk
bandwidth can't keep up with the application: if the application is
writing at a rate of, say, 1GB/sec, it will "fill the pipe" within
6 seconds. But if the IO bandwidth to disk is only 512MB/sec, it's
going to take 12sec to get this data onto the disk. This "impedance
mis-match" is going to manifest as pauses: the application fills
the pipe, then waits for the pipe to empty, then starts writing again.
Note that this won't be smooth, since we need to complete an entire
sync phase before allowing things to progress. So you can end up
with IO gaps. This is probably what the original submitter is
experiencing. Note there are a few other subtleties here that I
have glossed over, but the general picture is accurate.

The new write throttle code put back into build 87 attempts to
smooth out the process. We now measure the amount of time it takes
to sync each transaction group, and the amount of data in that group.
We dynamically resize our write throttle to try to keep the sync
time constant (at 5secs) under write load. We also introduce
"fairness" delays on writers when we near pipeline capacity: each
write is delayed 1/100sec when we are about to "fill up". This
prevents a single heavy writer from "starving out" occasional
writers. So instead of coming to an abrupt halt when the pipeline
fills, we slow down our write pace. The result should be a constant
even IO load.

There is one "down side" to this new model: if a write load is very
"bursty", e.g., a large 5GB write followed by 30secs of idle, the
new code may be less efficient than the old. In the old code, all
of this IO would be let in at memory speed and then more slowly make
its way out to disk. In the new code, the writes may be slowed down.
The data makes its way to the disk in the same amount of time, but
the application takes longer. Conceptually: we are sizing the write
buffer to the pool bandwidth, rather than to the memory size.

Robert Milkowski wrote:
> Could you share more details on how it works right now after the change?
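To plug a specific machine's numbers into the description above, the
current ARC size and cap can be read from the arcstats kstat (module zfs,
name arcstats):

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max

On the pre-build-87 bits, roughly 1/4 of that size is the dirty-data limit
for a single transaction group and 1/2 of it is the most that can be "in
the pipe" at once.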
On Tue, 15 Apr 2008, Mark Maybee wrote:
> going to take 12sec to get this data onto the disk. This "impedance
> mis-match" is going to manifest as pauses: the application fills
> the pipe, then waits for the pipe to empty, then starts writing again.
> Note that this won't be smooth, since we need to complete an entire
> sync phase before allowing things to progress. So you can end up
> with IO gaps. This is probably what the original submitter is

Yes. With an application which also needs to make best use of
available CPU, these I/O "gaps" cut into available CPU time (by
blocking the process) unless the application uses multithreading and
an intermediate write queue (more memory) to separate the CPU-centric
parts from the I/O-centric parts. While the single-threaded
application is waiting for data to be written, it is not able to read
and process more data. Since reads take time to complete, being
blocked on write stops new reads from being started so the data is
ready when it is needed.

> There is one "down side" to this new model: if a write load is very
> "bursty", e.g., a large 5GB write followed by 30secs of idle, the
> new code may be less efficient than the old. In the old code, all

This is also a common scenario. :-)

Presumably the special "slow I/O" code would not kick in unless the
burst was large enough to fill quite a bit of the ARC.

Real time throttling is quite a challenge to do in software.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Hello Mark,
Tuesday, April 15, 2008, 8:32:32 PM, you wrote:
MM> [...]
MM> There is one "down side" to this new model: if a write load is very
MM> "bursty", e.g., a large 5GB write followed by 30secs of idle, the
MM> new code may be less efficient than the old. In the old code, all
MM> of this IO would be let in at memory speed and then more slowly make
MM> its way out to disk. In the new code, the writes may be slowed down.
MM> The data makes its way to the disk in the same amount of time, but
MM> the application takes longer. Conceptually: we are sizing the write
MM> buffer to the pool bandwidth, rather than to the memory size.
First - thank you for your explanation - it is very helpful.
I'm worried about the last part - but it's hard to be optimal for all
workloads. Nevertheless, sometimes the problem is that you change the
behavior from the application's perspective. With other file systems I
guess you are able to fill most of memory and still keep disks busy
100% of the time without IO gaps.

My biggest concern is these gaps in IO, as zfs should keep disks 100%
busy if needed.
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com
Bob Friesenhahn writes:
> Presumably the special "slow I/O" code would not kick in unless the
> burst was large enough to fill quite a bit of the ARC.

Bursts of 1/8th of physical memory or 5 seconds of storage throughput,
whichever is smallest.

-r

> Real time throttling is quite a challenge to do in software.
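As a quick worked example of that rule, using the 16GB machine from Mark's
earlier note and assuming (purely for illustration) that the pool can sync
about 500MB/sec: 1/8 of physical memory is 2048MB while 5 seconds of
throughput is about 2500MB, so the smaller figure, about 2GB, is where the
throttle would kick in.

# bc
16384/8
2048
5*500
2500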
Hello Mark,
Tuesday, April 15, 2008, 8:32:32 PM, you wrote:
MM> The new write throttle code put back into build 87 attempts to
MM> smooth out the process. We now measure the amount of time it takes
MM> to sync each transaction group, and the amount of data in that group.
MM> We dynamically resize our write throttle to try to keep the sync
MM> time constant (at 5secs) under write load. We also introduce
MM> "fairness" delays on writers when we near pipeline capacity:
each
MM> write is delayed 1/100sec when we are about to "fill up". This
MM> prevents a single heavy writer from "starving out" occasional
MM> writers. So instead of coming to an abrupt halt when the pipeline
MM> fills, we slow down our write pace. The result should be a constant
MM> even IO load.
snv_91, 48x 500GB sata drives in one large stripe:
# zpool create -f test c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0
c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0
c4t6d0 c4t7d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0
c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
# zfs set atime=off test
# dd if=/dev/zero of=/test/q1 bs=1024k
^C34374+0 records in
34374+0 records out
# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
[...]
test 58.9M 21.7T 0 1.19K 0 80.8M
test 862M 21.7T 0 6.67K 0 776M
test 1.52G 21.7T 0 5.50K 0 689M
test 1.52G 21.7T 0 9.28K 0 1.16G
test 2.88G 21.7T 0 1.14K 0 135M
test 2.88G 21.7T 0 1.61K 0 206M
test 2.88G 21.7T 0 18.0K 0 2.24G
test 5.60G 21.7T 0 79 0 264K
test 5.60G 21.7T 0 0 0 0
test 5.60G 21.7T 0 10.9K 0 1.36G
test 9.59G 21.7T 0 7.09K 0 897M
test 9.59G 21.7T 0 0 0 0
test 9.59G 21.7T 0 6.33K 0 807M
test 9.59G 21.7T 0 17.9K 0 2.24G
test 13.6G 21.7T 0 1.96K 0 239M
test 13.6G 21.7T 0 0 0 0
test 13.6G 21.7T 0 11.9K 0 1.49G
test 17.6G 21.7T 0 9.91K 0 1.23G
test 17.6G 21.7T 0 0 0 0
test 17.6G 21.7T 0 5.48K 0 700M
test 17.6G 21.7T 0 20.0K 0 2.50G
test 21.6G 21.7T 0 2.03K 0 244M
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 4.03K 0 513M
test 21.6G 21.7T 0 23.7K 0 2.97G
test 25.6G 21.7T 0 1.83K 0 225M
test 25.6G 21.7T 0 0 0 0
test 25.6G 21.7T 0 13.9K 0 1.74G
test 29.6G 21.7T 1 1.40K 127K 167M
test 29.6G 21.7T 0 0 0 0
test 29.6G 21.7T 0 7.14K 0 912M
test 29.6G 21.7T 0 19.2K 0 2.40G
test 33.6G 21.7T 1 378 127K 34.8M
test 33.6G 21.7T 0 0 0 0
^C
Well, it doesn't actually look good. Checking with iostat I don't see
any problems like long service times, etc.

Reducing zfs_txg_synctime to 1 helps a little bit but still it's not an
even stream of data.
If I start 3 dd streams at the same time then it is slightly better
(zfs_txg_synctime set back to 5) but still very jumpy.
Reading with one dd produces steady throughput but I'm disappointed with
actual performance:
test 161G 21.6T 9.94K 0 1.24G 0
test 161G 21.6T 10.0K 0 1.25G 0
test 161G 21.6T 10.3K 0 1.29G 0
test 161G 21.6T 10.1K 0 1.27G 0
test 161G 21.6T 10.4K 0 1.31G 0
test 161G 21.6T 10.1K 0 1.27G 0
test 161G 21.6T 10.4K 0 1.30G 0
test 161G 21.6T 10.2K 0 1.27G 0
test 161G 21.6T 10.3K 0 1.29G 0
test 161G 21.6T 10.0K 0 1.25G 0
test 161G 21.6T 9.96K 0 1.24G 0
test 161G 21.6T 10.6K 0 1.33G 0
test 161G 21.6T 10.1K 0 1.26G 0
test 161G 21.6T 10.2K 0 1.27G 0
test 161G 21.6T 10.4K 0 1.30G 0
test 161G 21.6T 9.62K 0 1.20G 0
test 161G 21.6T 8.22K 0 1.03G 0
test 161G 21.6T 9.61K 0 1.20G 0
test 161G 21.6T 10.2K 0 1.28G 0
test 161G 21.6T 9.12K 0 1.14G 0
test 161G 21.6T 9.96K 0 1.25G 0
test 161G 21.6T 9.72K 0 1.22G 0
test 161G 21.6T 10.6K 0 1.32G 0
test 161G 21.6T 9.93K 0 1.24G 0
test 161G 21.6T 9.94K 0 1.24G 0
zpool scrub produces:
test 161G 21.6T 25 69 2.70M 392K
test 161G 21.6T 10.9K 0 1.35G 0
test 161G 21.6T 13.4K 0 1.66G 0
test 161G 21.6T 13.2K 0 1.63G 0
test 161G 21.6T 11.8K 0 1.46G 0
test 161G 21.6T 13.8K 0 1.72G 0
test 161G 21.6T 12.4K 0 1.53G 0
test 161G 21.6T 12.9K 0 1.59G 0
test 161G 21.6T 12.9K 0 1.59G 0
test 161G 21.6T 13.4K 0 1.67G 0
test 161G 21.6T 12.2K 0 1.51G 0
test 161G 21.6T 12.9K 0 1.59G 0
test 161G 21.6T 12.5K 0 1.55G 0
test 161G 21.6T 13.3K 0 1.64G 0
So sequential reading gives steady throughput but the numbers are a
little bit lower than expected.

Sequential writing is still jumpy with single or multiple dd streams
for a pool with many disk drives.

Let's destroy the pool and create a new, smaller one.
# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0
# zfs set atime=off test
# dd if=/dev/zero of=/test/q1 bs=1024k
^C15905+0 records in
15905+0 records out
# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
[...]
test 688M 2.72T 0 3.29K 0 401M
test 1.01G 2.72T 0 3.69K 0 462M
test 1.35G 2.72T 0 3.59K 0 450M
test 1.35G 2.72T 0 2.95K 0 372M
test 2.03G 2.72T 0 3.37K 0 428M
test 2.03G 2.72T 0 1.94K 0 248M
test 2.71G 2.72T 0 2.44K 0 301M
test 2.71G 2.72T 0 3.88K 0 497M
test 2.71G 2.72T 0 3.86K 0 494M
test 4.07G 2.71T 0 3.42K 0 425M
test 4.07G 2.71T 0 3.89K 0 498M
test 4.07G 2.71T 0 3.88K 0 497M
test 5.43G 2.71T 0 3.44K 0 429M
test 5.43G 2.71T 0 3.94K 0 504M
test 5.43G 2.71T 0 3.88K 0 497M
test 5.43G 2.71T 0 3.88K 0 497M
test 7.62G 2.71T 0 2.34K 0 286M
test 7.62G 2.71T 0 4.23K 0 539M
test 7.62G 2.71T 0 3.89K 0 498M
test 7.62G 2.71T 0 3.87K 0 495M
test 7.62G 2.71T 0 3.88K 0 497M
test 9.81G 2.71T 0 3.33K 0 418M
test 9.81G 2.71T 0 4.12K 0 526M
test 9.81G 2.71T 0 3.88K 0 497M
Much more steady - interesting.
Let's do it again with a yet bigger pool and let's keep distributing
disks in "rows" across controllers.
# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c1t1d0 c2t1d0
c3t1d0 c4t1d0 c5t1d0 c6t1d0
# zfs set atime=off test
test 1.35G 5.44T 0 5.42K 0 671M
test 2.03G 5.44T 0 7.01K 0 883M
test 2.71G 5.43T 0 6.22K 0 786M
test 2.71G 5.43T 0 8.09K 0 1.01G
test 4.07G 5.43T 0 7.14K 0 902M
test 5.43G 5.43T 0 4.02K 0 507M
test 5.43G 5.43T 0 5.52K 0 700M
test 5.43G 5.43T 0 8.04K 0 1.00G
test 5.43G 5.43T 0 7.70K 0 986M
test 8.15G 5.43T 0 6.13K 0 769M
test 8.15G 5.43T 0 7.77K 0 995M
test 8.15G 5.43T 0 7.67K 0 981M
test 10.9G 5.43T 0 4.15K 0 517M
test 10.9G 5.43T 0 7.74K 0 986M
test 10.9G 5.43T 0 7.76K 0 994M
test 10.9G 5.43T 0 7.75K 0 993M
test 14.9G 5.42T 0 6.79K 0 860M
test 14.9G 5.42T 0 7.50K 0 958M
test 14.9G 5.42T 0 8.25K 0 1.03G
test 14.9G 5.42T 0 7.77K 0 995M
test 18.9G 5.42T 0 4.86K 0 614M
Starting to be more jumpy, but still not as bad as in the first case.

So let's create a pool out of all disks again, but this time let's
continue to provide disks in "rows" across controllers.
# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c1t1d0 c2t1d0
c3t1d0 c4t1d0 c5t1d0 c6t1d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0 c6t2d0 c1t3d0
c2t3d0 c3t3d0 c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 c3t4d0 c4t4d0 c5t4d0 c6t4d0
c1t5d0 c2t5d0 c3t5d0 c4t5d0 c5t5d0 c6t5d0 c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0
c6t6d0 c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 c6t7d0
# zfs set atime=off test
test 862M 21.7T 0 5.81K 0 689M
test 1.52G 21.7T 0 5.50K 0 689M
test 2.88G 21.7T 0 10.9K 0 1.35G
test 2.88G 21.7T 0 0 0 0
test 2.88G 21.7T 0 9.49K 0 1.18G
test 5.60G 21.7T 0 11.1K 0 1.38G
test 5.60G 21.7T 0 0 0 0
test 5.60G 21.7T 0 0 0 0
test 5.60G 21.7T 0 15.3K 0 1.90G
test 9.59G 21.7T 0 15.4K 0 1.91G
test 9.59G 21.7T 0 0 0 0
test 9.59G 21.7T 0 0 0 0
test 9.59G 21.7T 0 16.8K 0 2.09G
test 13.6G 21.7T 0 8.60K 0 1.06G
test 13.6G 21.7T 0 0 0 0
test 13.6G 21.7T 0 4.01K 0 512M
test 13.6G 21.7T 0 20.2K 0 2.52G
test 17.6G 21.7T 0 2.86K 0 353M
test 17.6G 21.7T 0 0 0 0
test 17.6G 21.7T 0 11.6K 0 1.45G
test 21.6G 21.7T 0 14.1K 0 1.75G
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 4.74K 0 602M
test 21.6G 21.7T 0 17.6K 0 2.20G
test 25.6G 21.7T 0 8.00K 0 1008M
test 25.6G 21.7T 0 0 0 0
test 25.6G 21.7T 0 0 0 0
test 25.6G 21.7T 0 16.8K 0 2.09G
test 25.6G 21.7T 0 15.0K 0 1.86G
test 29.6G 21.7T 0 11 0 11.9K
Any idea?
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com
Le 28 juin 08 à 05:14, Robert Milkowski a écrit :

> snv_91, 48x 500GB sata drives in one large stripe:
> [...]
> Well, it doesn't actually look good. Checking with iostat I don't see
> any problems like long service times, etc.

I suspect a single dd is cpu bound.

> Reducing zfs_txg_synctime to 1 helps a little bit but still it's not an
> even stream of data.
>
> If I start 3 dd streams at the same time then it is slightly better
> (zfs_txg_synctime set back to 5) but still very jumpy.

Try zfs_txg_synctime to 10; that reduces the txg overhead.

> Reading with one dd produces steady throughput but I'm disappointed
> with actual performance:
> [...]

Again, probably cpu bound. What's "ptime dd..." saying?

> Let's destroy the pool and create a new, smaller one.
> [...]
> Much more steady - interesting.

Now it's disk bound.

> Let's do it again with a yet bigger pool and let's keep distributing
> disks in "rows" across controllers.
> [...]
>
> Any idea?
Hello Roch,
Saturday, June 28, 2008, 11:25:17 AM, you wrote:
RB> I suspect, a single dd is cpu bound.
I don't think so.

See below another run with a stripe of 48x disks. Single dd with 1024k
block size and 64GB to write.
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 333K 21.7T 1 1 147K 147K
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 0 0 0
test 333K 21.7T 0 1.60K 0 204M
test 333K 21.7T 0 20.5K 0 2.55G
test 4.00G 21.7T 0 9.19K 0 1.13G
test 4.00G 21.7T 0 0 0 0
test 4.00G 21.7T 0 1.78K 0 228M
test 4.00G 21.7T 0 12.5K 0 1.55G
test 7.99G 21.7T 0 16.2K 0 2.01G
test 7.99G 21.7T 0 0 0 0
test 7.99G 21.7T 0 13.4K 0 1.68G
test 12.0G 21.7T 0 4.31K 0 530M
test 12.0G 21.7T 0 0 0 0
test 12.0G 21.7T 0 6.91K 0 882M
test 12.0G 21.7T 0 21.8K 0 2.72G
test 16.0G 21.7T 0 839 0 88.4M
test 16.0G 21.7T 0 0 0 0
test 16.0G 21.7T 0 4.42K 0 565M
test 16.0G 21.7T 0 18.5K 0 2.31G
test 20.0G 21.7T 0 8.87K 0 1.10G
test 20.0G 21.7T 0 0 0 0
test 20.0G 21.7T 0 12.2K 0 1.52G
test 24.0G 21.7T 0 9.28K 0 1.14G
test 24.0G 21.7T 0 0 0 0
test 24.0G 21.7T 0 0 0 0
test 24.0G 21.7T 0 0 0 0
test 24.0G 21.7T 0 14.5K 0 1.81G
test 28.0G 21.7T 0 10.1K 63.6K 1.25G
test 28.0G 21.7T 0 0 0 0
test 28.0G 21.7T 0 10.7K 0 1.34G
test 32.0G 21.7T 0 13.6K 63.2K 1.69G
test 32.0G 21.7T 0 0 0 0
test 32.0G 21.7T 0 0 0 0
test 32.0G 21.7T 0 11.1K 0 1.39G
test 36.0G 21.7T 0 19.9K 0 2.48G
test 36.0G 21.7T 0 0 0 0
test 36.0G 21.7T 0 0 0 0
test 36.0G 21.7T 0 17.7K 0 2.21G
test 40.0G 21.7T 0 5.42K 63.1K 680M
test 40.0G 21.7T 0 0 0 0
test 40.0G 21.7T 0 6.62K 0 844M
test 44.0G 21.7T 1 19.8K 125K 2.46G
test 44.0G 21.7T 0 0 0 0
test 44.0G 21.7T 0 0 0 0
test 44.0G 21.7T 0 18.0K 0 2.24G
test 47.9G 21.7T 1 13.2K 127K 1.63G
test 47.9G 21.7T 0 0 0 0
test 47.9G 21.7T 0 0 0 0
test 47.9G 21.7T 0 15.6K 0 1.94G
test 47.9G 21.7T 1 16.1K 126K 1.99G
test 51.9G 21.7T 0 0 0 0
test 51.9G 21.7T 0 0 0 0
test 51.9G 21.7T 0 14.2K 0 1.77G
test 55.9G 21.7T 0 14.0K 63.2K 1.73G
test 55.9G 21.7T 0 0 0 0
test 55.9G 21.7T 0 0 0 0
test 55.9G 21.7T 0 16.3K 0 2.04G
test 59.9G 21.7T 0 14.5K 63.2K 1.80G
test 59.9G 21.7T 0 0 0 0
test 59.9G 21.7T 0 0 0 0
test 59.9G 21.7T 0 17.7K 0 2.21G
test 63.9G 21.7T 0 4.84K 62.6K 603M
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
test 63.9G 21.7T 0 0 0 0
^C
bash-3.2#
bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
65536+0 records in
65536+0 records out
real 1:06.312
user 0.074
sys 54.060
bash-3.2#
Doesn't look like it's CPU bound.

Let's try to read the file after "zpool export test; zpool import test":
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 64.0G 21.7T 15 46 1.22M 1.76M
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 6.64K 0 849M 0
test 64.0G 21.7T 10.2K 0 1.27G 0
test 64.0G 21.7T 10.7K 0 1.33G 0
test 64.0G 21.7T 9.91K 0 1.24G 0
test 64.0G 21.7T 10.1K 0 1.27G 0
test 64.0G 21.7T 10.7K 0 1.33G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.2K 0 1.27G 0
test 64.0G 21.7T 10.3K 0 1.29G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 9.16K 0 1.14G 0
test 64.0G 21.7T 1.98K 0 253M 0
test 64.0G 21.7T 2.48K 0 317M 0
test 64.0G 21.7T 1.98K 0 253M 0
test 64.0G 21.7T 1.98K 0 254M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.73K 0 221M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.49K 0 191M 0
test 64.0G 21.7T 2.47K 0 317M 0
test 64.0G 21.7T 1.46K 0 186M 0
test 64.0G 21.7T 2.01K 0 258M 0
test 64.0G 21.7T 1.98K 0 254M 0
test 64.0G 21.7T 1.97K 0 253M 0
test 64.0G 21.7T 2.23K 0 286M 0
test 64.0G 21.7T 1.98K 0 254M 0
test 64.0G 21.7T 1.73K 0 221M 0
test 64.0G 21.7T 1.98K 0 254M 0
test 64.0G 21.7T 2.42K 0 310M 0
test 64.0G 21.7T 1.78K 0 228M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.67K 0 214M 0
test 64.0G 21.7T 1.80K 0 230M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 2.47K 0 317M 0
test 64.0G 21.7T 1.73K 0 221M 0
test 64.0G 21.7T 1.99K 0 254M 0
test 64.0G 21.7T 1.24K 0 159M 0
test 64.0G 21.7T 2.47K 0 316M 0
test 64.0G 21.7T 2.47K 0 317M 0
test 64.0G 21.7T 1.99K 0 254M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.73K 0 221M 0
test 64.0G 21.7T 2.48K 0 317M 0
test 64.0G 21.7T 2.48K 0 317M 0
test 64.0G 21.7T 1.49K 0 190M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.81K 0 232M 0
test 64.0G 21.7T 1.90K 0 243M 0
test 64.0G 21.7T 2.48K 0 317M 0
test 64.0G 21.7T 1.49K 0 191M 0
test 64.0G 21.7T 2.47K 0 317M 0
test 64.0G 21.7T 1.99K 0 254M 0
test 64.0G 21.7T 1.97K 0 253M 0
test 64.0G 21.7T 1.49K 0 190M 0
test 64.0G 21.7T 2.23K 0 286M 0
test 64.0G 21.7T 1.82K 0 232M 0
test 64.0G 21.7T 2.15K 0 275M 0
test 64.0G 21.7T 2.22K 0 285M 0
test 64.0G 21.7T 1.73K 0 222M 0
test 64.0G 21.7T 2.23K 0 286M 0
test 64.0G 21.7T 1.90K 0 244M 0
test 64.0G 21.7T 1.81K 0 231M 0
test 64.0G 21.7T 2.23K 0 285M 0
test 64.0G 21.7T 1.97K 0 252M 0
test 64.0G 21.7T 2.00K 0 255M 0
test 64.0G 21.7T 8.42K 0 1.05G 0
test 64.0G 21.7T 10.3K 0 1.29G 0
test 64.0G 21.7T 10.2K 0 1.28G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.2K 0 1.27G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.6K 0 1.32G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 9.23K 0 1.15G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 10.0K 0 1.25G 0
test 64.0G 21.7T 9.55K 0 1.19G 0
test 64.0G 21.7T 10.2K 0 1.27G 0
test 64.0G 21.7T 10.0K 0 1.25G 0
test 64.0G 21.7T 9.91K 0 1.24G 0
test 64.0G 21.7T 10.6K 0 1.32G 0
test 64.0G 21.7T 9.24K 0 1.15G 0
test 64.0G 21.7T 10.1K 0 1.26G 0
test 64.0G 21.7T 10.3K 0 1.29G 0
test 64.0G 21.7T 10.3K 0 1.29G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 8.54K 0 1.07G 0
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 0 0 0 0
^C
bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
65536+0 records in
65536+0 records out
real 1:36.732
user 0.046
sys 48.069
bash-3.2#
Well, that drop for several dozen seconds was interesting...
Let's run it again without export/import:
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 64.0G 21.7T 3.00K 6 384M 271K
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 2.58K 0 330M 0
test 64.0G 21.7T 6.02K 0 771M 0
test 64.0G 21.7T 8.37K 0 1.05G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 9.64K 0 1.20G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 10.6K 0 1.32G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 9.65K 0 1.21G 0
test 64.0G 21.7T 9.84K 0 1.23G 0
test 64.0G 21.7T 9.22K 0 1.15G 0
test 64.0G 21.7T 10.9K 0 1.36G 0
test 64.0G 21.7T 10.9K 0 1.36G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.7K 0 1.34G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.9K 0 1.36G 0
test 64.0G 21.7T 10.6K 0 1.32G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.7K 0 1.34G 0
test 64.0G 21.7T 10.5K 0 1.32G 0
test 64.0G 21.7T 10.6K 0 1.32G 0
test 64.0G 21.7T 10.8K 0 1.34G 0
test 64.0G 21.7T 10.4K 0 1.29G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 9.15K 0 1.14G 0
test 64.0G 21.7T 10.8K 0 1.35G 0
test 64.0G 21.7T 9.76K 0 1.22G 0
test 64.0G 21.7T 8.67K 0 1.08G 0
test 64.0G 21.7T 10.8K 0 1.36G 0
test 64.0G 21.7T 10.9K 0 1.36G 0
test 64.0G 21.7T 10.3K 0 1.28G 0
test 64.0G 21.7T 9.76K 0 1.22G 0
test 64.0G 21.7T 10.5K 0 1.31G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 9.23K 0 1.15G 0
test 64.0G 21.7T 9.63K 0 1.20G 0
test 64.0G 21.7T 9.79K 0 1.22G 0
test 64.0G 21.7T 10.2K 0 1.28G 0
test 64.0G 21.7T 10.4K 0 1.30G 0
test 64.0G 21.7T 10.3K 0 1.29G 0
test 64.0G 21.7T 10.2K 0 1.28G 0
test 64.0G 21.7T 10.6K 0 1.33G 0
test 64.0G 21.7T 10.8K 0 1.35G 0
test 64.0G 21.7T 10.5K 0 1.32G 0
test 64.0G 21.7T 11.0K 0 1.37G 0
test 64.0G 21.7T 10.2K 0 1.27G 0
test 64.0G 21.7T 9.69K 0 1.21G 0
test 64.0G 21.7T 6.07K 0 777M 0
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 0 0 0 0
test 64.0G 21.7T 0 0 0 0
^C
bash-3.2#
bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
65536+0 records in
65536+0 records out
real 50.521
user 0.043
sys 48.971
bash-3.2#
Now it looks like reading from the pool with a single dd is actually
CPU bound.

Reading the same file again and again produces, more or less,
consistent timing. However, every time I export/import the pool there
is that drop in throughput during the first read and the total time
increases to almost 100 seconds... some meta-data? (Of course there
are no errors of any sort, etc.)
>> Reducing zfs_txg_synctime to 1 helps a little bit but still it's not
>> an even stream of data.
>>
>> If I start 3 dd streams at the same time then it is slightly better
>> (zfs_txg_synctime set back to 5) but still very jumpy.
>>
RB> Try zfs_txg_synctime to 10; that reduces the txg overhead.

Doesn't help...
[...]
test 13.6G 21.7T 0 0 0 0
test 13.6G 21.7T 0 8.46K 0 1.05G
test 17.6G 21.7T 0 19.3K 0 2.40G
test 17.6G 21.7T 0 0 0 0
test 17.6G 21.7T 0 0 0 0
test 17.6G 21.7T 0 8.04K 0 1022M
test 17.6G 21.7T 0 20.2K 0 2.51G
test 21.6G 21.7T 0 76 0 249K
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 0 0 0
test 21.6G 21.7T 0 10.1K 0 1.25G
test 25.6G 21.7T 0 18.6K 0 2.31G
test 25.6G 21.7T 0 0 0 0
test 25.6G 21.7T 0 0 0 0
test 25.6G 21.7T 0 6.34K 0 810M
test 25.6G 21.7T 0 19.9K 0 2.48G
test 29.6G 21.7T 0 88 63.2K 354K
[...]
bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
65536+0 records in
65536+0 records out
real 1:10.074
user 0.074
sys 52.250
bash-3.2#
Increasing it even further (up to 32s) doesn't help either.
However lowering it to 1s gives:
[...]
test 2.43G 21.7T 0 8.62K 0 1.07G
test 4.46G 21.7T 0 7.23K 0 912M
test 4.46G 21.7T 0 624 0 77.9M
test 6.66G 21.7T 0 10.7K 0 1.33G
test 6.66G 21.7T 0 6.66K 0 850M
test 8.86G 21.7T 0 10.6K 0 1.31G
test 8.86G 21.7T 0 1.96K 0 251M
test 11.2G 21.7T 0 16.5K 0 2.04G
test 11.2G 21.7T 0 0 0 0
test 11.2G 21.7T 0 18.6K 0 2.31G
test 13.5G 21.7T 0 11 0 11.9K
test 13.5G 21.7T 0 2.60K 0 332M
test 13.5G 21.7T 0 19.1K 0 2.37G
test 16.3G 21.7T 0 11 0 11.9K
test 16.3G 21.7T 0 9.61K 0 1.20G
test 18.4G 21.7T 0 7.41K 0 936M
test 18.4G 21.7T 0 11.6K 0 1.45G
test 20.3G 21.7T 0 3.26K 0 407M
test 20.3G 21.7T 0 7.66K 0 977M
test 22.5G 21.7T 0 7.62K 0 963M
test 22.5G 21.7T 0 6.86K 0 875M
test 24.5G 21.7T 0 8.41K 0 1.04G
test 24.5G 21.7T 0 10.4K 0 1.30G
test 26.5G 21.7T 1 2.19K 127K 270M
test 26.5G 21.7T 0 0 0 0
test 26.5G 21.7T 0 4.56K 0 584M
test 28.5G 21.7T 0 11.5K 0 1.42G
[...]
bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
65536+0 records in
65536+0 records out
real 1:09.541
user 0.072
sys 53.421
bash-3.2#
Looks slightly less jumpy but the total real time is about the same so
average throughput is actually the same (about 1GB/s).
>> Reading with one dd produces steady throughput but I'm disappointed
>> with actual performance:
>>
RB> Again, probably cpu bound. What's "ptime dd..." saying?
You were right here. Reading with a single dd seems to be cpu bound.
However, multiple streams for reading do not seem to increase
performance considerably.

Nevertheless, the main issue is jumpy writing...
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com
Hello Robert,
Tuesday, July 1, 2008, 12:01:03 AM, you wrote:
RM> Nevertheless, the main issue is jumpy writing...

I was just wondering how much throughput I can get running multiple
dd - one per disk drive - and what kind of aggregated throughput that
adds up to.

So for each of the 48 disks I did:
dd if=/dev/zero of=/dev/rdsk/c6t7d0s0 bs=128k&
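For anyone reproducing this, the 48 dd processes can be launched with a
small loop over the controller/target naming shown above (device names are
specific to this box, and writing to the raw slices destroys whatever is on
them):

bash-3.2# for c in 1 2 3 4 5 6; do for t in 0 1 2 3 4 5 6 7; do dd if=/dev/zero of=/dev/rdsk/c${c}t${t}d0s0 bs=128k & done; done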
The iostat looks like:
bash-3.2# iostat -xnzC 1|egrep " c[0-6]$|devic"
[skipped the first output]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5308.0 0.0 679418.9 0.1 7.2 0.0 1.4 0 718 c1
0.0 5264.2 0.0 673813.1 0.1 7.2 0.0 1.4 0 720 c2
0.0 4047.6 0.0 518095.1 0.1 7.3 0.0 1.8 0 725 c3
0.0 5340.1 0.0 683532.5 0.1 7.2 0.0 1.3 0 718 c4
0.0 5325.1 0.0 681608.0 0.1 7.1 0.0 1.3 0 714 c5
0.0 4089.3 0.0 523434.0 0.1 7.3 0.0 1.8 0 727 c6
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5283.1 0.0 676231.2 0.1 7.2 0.0 1.4 0 723 c1
0.0 5215.2 0.0 667549.5 0.1 7.2 0.0 1.4 0 720 c2
0.0 4009.0 0.0 513152.8 0.1 7.3 0.0 1.8 0 725 c3
0.0 5281.9 0.0 676082.5 0.1 7.2 0.0 1.4 0 722 c4
0.0 5316.6 0.0 680520.9 0.1 7.2 0.0 1.4 0 720 c5
0.0 4159.5 0.0 532420.9 0.1 7.3 0.0 1.7 0 726 c6
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5322.0 0.0 681213.6 0.1 7.2 0.0 1.4 0 720 c1
0.0 5292.9 0.0 677494.0 0.1 7.2 0.0 1.4 0 722 c2
0.0 4051.4 0.0 518573.3 0.1 7.3 0.0 1.8 0 727 c3
0.0 5315.0 0.0 680318.8 0.1 7.2 0.0 1.4 0 721 c4
0.0 5313.1 0.0 680074.3 0.1 7.2 0.0 1.4 0 723 c5
0.0 4184.8 0.0 535648.7 0.1 7.3 0.0 1.7 0 730 c6
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5296.4 0.0 677940.2 0.1 7.1 0.0 1.3 0 714 c1
0.0 5236.4 0.0 670265.3 0.1 7.2 0.0 1.4 0 720 c2
0.0 4023.5 0.0 515011.5 0.1 7.3 0.0 1.8 0 728 c3
0.0 5291.4 0.0 677300.7 0.1 7.2 0.0 1.4 0 723 c4
0.0 5297.4 0.0 678072.8 0.1 7.2 0.0 1.4 0 720 c5
0.0 4095.6 0.0 524236.0 0.1 7.3 0.0 1.8 0 726 c6
^C
one full output:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5302.0 0.0 678658.6 0.1 7.2 0.0 1.4 0 722 c1
0.0 664.0 0.0 84992.8 0.0 0.9 0.0 1.4 1 90 c1t0d0
0.0 657.0 0.0 84090.5 0.0 0.9 0.0 1.3 1 89 c1t1d0
0.0 666.0 0.0 85251.4 0.0 0.9 0.0 1.3 1 89 c1t2d0
0.0 662.0 0.0 84735.6 0.0 0.9 0.0 1.4 1 91 c1t3d0
0.0 669.1 0.0 85638.4 0.0 0.9 0.0 1.4 1 92 c1t4d0
0.0 665.0 0.0 85122.9 0.0 0.9 0.0 1.4 1 91 c1t5d0
0.0 652.9 0.0 83575.1 0.0 0.9 0.0 1.4 1 90 c1t6d0
0.0 666.0 0.0 85251.8 0.0 0.9 0.0 1.4 1 91 c1t7d0
0.0 5293.3 0.0 677537.5 0.1 7.3 0.0 1.4 0 725 c2
0.0 660.0 0.0 84481.2 0.0 0.9 0.0 1.4 1 91 c2t0d0
0.0 661.0 0.0 84610.3 0.0 0.9 0.0 1.4 1 90 c2t1d0
0.0 664.0 0.0 84997.4 0.0 0.9 0.0 1.4 1 90 c2t2d0
0.0 662.0 0.0 84739.4 0.0 0.9 0.0 1.4 1 92 c2t3d0
0.0 655.0 0.0 83836.6 0.0 0.9 0.0 1.4 1 89 c2t4d0
0.0 663.1 0.0 84871.3 0.0 0.9 0.0 1.4 1 90 c2t5d0
0.0 663.1 0.0 84871.5 0.0 0.9 0.0 1.4 1 92 c2t6d0
0.0 665.1 0.0 85129.7 0.0 0.9 0.0 1.4 1 92 c2t7d0
0.0 4072.1 0.0 521228.9 0.1 7.3 0.0 1.8 0 728 c3
0.0 506.9 0.0 64879.3 0.0 0.9 0.0 1.8 1 90 c3t0d0
0.0 513.9 0.0 65782.4 0.0 0.9 0.0 1.8 1 92 c3t1d0
0.0 511.9 0.0 65524.4 0.0 0.9 0.0 1.8 1 91 c3t2d0
0.0 505.9 0.0 64750.5 0.0 0.9 0.0 1.8 1 91 c3t3d0
0.0 502.8 0.0 64363.6 0.0 0.9 0.0 1.8 1 90 c3t4d0
0.0 506.9 0.0 64879.6 0.0 0.9 0.0 1.8 1 91 c3t5d0
0.0 513.9 0.0 65782.6 0.0 0.9 0.0 1.8 1 92 c3t6d0
0.0 509.9 0.0 65266.6 0.0 0.9 0.0 1.8 1 91 c3t7d0
0.0 5298.7 0.0 678232.6 0.1 7.3 0.0 1.4 0 725 c4
0.0 664.1 0.0 85001.4 0.0 0.9 0.0 1.4 1 92 c4t0d0
0.0 662.1 0.0 84743.4 0.0 0.9 0.0 1.4 1 90 c4t1d0
0.0 663.1 0.0 84872.4 0.0 0.9 0.0 1.4 1 92 c4t2d0
0.0 664.1 0.0 85001.4 0.0 0.9 0.0 1.3 1 88 c4t3d0
0.0 657.1 0.0 84105.4 0.0 0.9 0.0 1.4 1 91 c4t4d0
0.0 658.1 0.0 84234.5 0.0 0.9 0.0 1.4 1 91 c4t5d0
0.0 669.2 0.0 85653.4 0.0 0.9 0.0 1.3 1 90 c4t6d0
0.0 661.1 0.0 84620.5 0.0 0.9 0.0 1.4 1 91 c4t7d0
0.0 5314.1 0.0 680209.2 0.1 7.2 0.0 1.3 0 717 c5
0.0 666.1 0.0 85265.7 0.0 0.9 0.0 1.3 1 89 c5t0d0
0.0 662.1 0.0 84749.8 0.0 0.9 0.0 1.3 1 88 c5t1d0
0.0 660.1 0.0 84491.8 0.0 0.9 0.0 1.3 1 89 c5t2d0
0.0 665.2 0.0 85140.3 0.0 0.9 0.0 1.3 1 89 c5t3d0
0.0 668.2 0.0 85527.3 0.0 0.9 0.0 1.4 1 92 c5t4d0
0.0 666.2 0.0 85269.5 0.0 0.9 0.0 1.3 1 89 c5t5d0
0.0 664.2 0.0 85011.4 0.0 0.9 0.0 1.4 1 91 c5t6d0
0.0 662.1 0.0 84753.5 0.0 0.9 0.0 1.4 1 90 c5t7d0
0.0 4229.8 0.0 541418.9 0.1 7.3 0.0 1.7 0 726 c6
0.0 518.0 0.0 66306.4 0.0 0.9 0.0 1.7 1 89 c6t0d0
0.0 533.1 0.0 68241.7 0.0 0.9 0.0 1.7 1 91 c6t1d0
0.0 531.1 0.0 67983.6 0.0 0.9 0.0 1.7 1 91 c6t2d0
0.0 524.1 0.0 67080.6 0.0 0.9 0.0 1.7 1 90 c6t3d0
0.0 540.2 0.0 69144.7 0.0 0.9 0.0 1.7 1 92 c6t4d0
0.0 525.1 0.0 67209.8 0.0 0.9 0.0 1.7 1 90 c6t5d0
0.0 535.2 0.0 68500.0 0.0 0.9 0.0 1.7 1 92 c6t6d0
0.0 523.1 0.0 66952.1 0.0 0.9 0.0 1.7 1 90 c6t7d0
bash-3.2# bc
scale=4
678658.6+677537.5+521228.9+678232.6+680209.2+541418.9
3777285.7
3777285.7/(1024*1024)
3.6023
bash-3.2#
So it's about 3.6GB/s - pretty good :)
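(If you capture one of those iostat snapshots to a file, the per-controller total can also be pulled out with a one-liner along these lines; iostat.out is just a placeholder name, and kw/s is the fourth column of the controller summary lines:)

# sum kw/s (KB/s) over the c1..c6 summary lines and convert to GB/s
grep ' c[1-6]$' iostat.out | awk '{ kw += $4 } END { printf("%.4f GB/s\n", kw/1024/1024) }'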
Average throughput with one large stripe pool using zfs is less than
half of the throughput shown above... :( And yes, that is even with
multiple dd streams writing to the same pool.
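(A multi-stream write test of that kind can look roughly like this; a sketch only, file names and sizes are arbitrary, three concurrent writers timed as a group:)

# three concurrent 16GB writers against the same pool, timed together
ptime sh -c '
  dd if=/dev/zero of=/test/s1 bs=1024k count=16384 &
  dd if=/dev/zero of=/test/s2 bs=1024k count=16384 &
  dd if=/dev/zero of=/test/s3 bs=1024k count=16384 &
  wait
'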
Additionally, turning checksums off helps:
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 366G 21.4T 0 10.7K 43.2K 1.33G
test 370G 21.4T 0 14.7K 63.4K 1.82G
test 370G 21.4T 0 22.0K 0 2.69G
test 374G 21.4T 0 12.4K 0 1.54G
test 374G 21.4T 0 23.6K 0 2.91G
test 378G 21.4T 0 12.5K 0 1.53G
test 378G 21.4T 0 17.3K 0 2.13G
test 382G 21.4T 1 16.6K 126K 2.05G
test 382G 21.4T 2 17.7K 190K 2.19G
test 386G 21.4T 0 20.4K 0 2.51G
test 390G 21.4T 11 11.6K 762K 1.44G
test 390G 21.4T 0 28.9K 0 3.55G
test 394G 21.4T 2 12.5K 157K 1.51G
test 398G 21.4T 1 20.0K 127K 2.49G
test 398G 21.4T 0 16.3K 0 2.00G
test 402G 21.4T 4 15.3K 311K 1.90G
test 402G 21.4T 0 21.9K 0 2.70G
test 406G 21.4T 4 9.73K 314K 1.19G
test 406G 21.4T 0 22.7K 0 2.78G
test 410G 21.4T 2 14.4K 131K 1.78G
test 414G 21.3T 0 19.9K 61.4K 2.43G
test 414G 21.3T 0 19.1K 0 2.35G
^C
bash-3.2# zfs set checksum=on test
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 439G 21.3T 0 11.4K 50.2K 1.41G
test 439G 21.3T 0 5.52K 0 702M
test 439G 21.3T 0 24.6K 0 3.07G
test 443G 21.3T 0 13.7K 0 1.70G
test 447G 21.3T 1 13.1K 123K 1.62G
test 447G 21.3T 0 16.1K 0 2.00G
test 451G 21.3T 1 3.97K 116K 498M
test 451G 21.3T 0 17.5K 0 2.19G
test 455G 21.3T 1 12.4K 66.9K 1.54G
test 455G 21.3T 0 13.0K 0 1.60G
test 459G 21.3T 0 11 0 11.9K
test 459G 21.3T 0 16.8K 0 2.09G
test 463G 21.3T 0 9.34K 0 1.16G
test 467G 21.3T 0 15.4K 0 1.91G
test 467G 21.3T 0 16.3K 0 2.03G
test 471G 21.3T 0 9.67K 0 1.20G
test 475G 21.3T 0 17.3K 0 2.13G
test 475G 21.3T 0 3.71K 0 472M
test 475G 21.3T 0 21.9K 0 2.73G
test 479G 21.3T 0 17.4K 0 2.16G
test 483G 21.3T 0 848 0 96.4M
test 483G 21.3T 0 17.4K 0 2.17G
^C
bash-3.2#
bash-3.2# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 582G 21.2T 0 11.8K 44.4K 1.46G
test 590G 21.2T 1 13.8K 76.5K 1.72G
test 598G 21.2T 1 12.4K 102K 1.54G
test 610G 21.2T 1 14.0K 76.7K 1.73G
test 618G 21.1T 0 12.9K 25.5K 1.59G
test 626G 21.1T 0 14.8K 11.1K 1.83G
test 634G 21.1T 0 14.2K 11.9K 1.76G
test 642G 21.1T 0 12.8K 12.8K 1.59G
test 650G 21.1T 0 12.9K 12.8K 1.60G
^C
bash-3.2# zfs set checksum=off test
bash-3.2# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 669G 21.1T 0 12.0K 43.5K 1.48G
test 681G 21.1T 0 17.7K 25.2K 2.18G
test 693G 21.1T 0 16.0K 12.7K 1.97G
test 701G 21.1T 0 19.4K 25.5K 2.38G
test 713G 21.1T 0 16.6K 12.8K 2.03G
test 725G 21.0T 0 17.8K 24.9K 2.18G
test 737G 21.0T 0 17.2K 12.7K 2.11G
test 745G 21.0T 0 19.0K 38.3K 2.34G
test 757G 21.0T 0 16.9K 12.8K 2.08G
test 769G 21.0T 0 17.6K 50.7K 2.16G
^C
bash-3.2#
So without checksums it is much better, but the write stream is still
jumpy rather than steady/constant, especially at 1s iostat resolution.
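(For reference, the checksum on/off comparison above boils down to a sequence roughly like the following; the file names and the 16GB size are arbitrary, and it helps to keep "zpool iostat test 1" running in another terminal to watch how steady the stream is:)

zfs set checksum=off test
ptime dd if=/dev/zero of=/test/nock bs=1024k count=16384    # 16GB write with checksums off
zfs set checksum=on test
ptime dd if=/dev/zero of=/test/cksum bs=1024k count=16384   # same write with the default checksum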
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com
Robert Milkowski writes:
> Hello Roch,
>
> Saturday, June 28, 2008, 11:25:17 AM, you wrote:
>
> RB> I suspect, a single dd is cpu bound.
>
> I don't think so.

We're nearly so, as you show. More below.

> See below one with a stripe of 48x disks again. Single dd with 1024k
> block size and 64GB to write.
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 333K 21.7T 1 1 147K 147K
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 1.60K 0 204M
> test 333K 21.7T 0 20.5K 0 2.55G
> test 4.00G 21.7T 0 9.19K 0 1.13G
> test 4.00G 21.7T 0 0 0 0
> test 4.00G 21.7T 0 1.78K 0 228M
> test 4.00G 21.7T 0 12.5K 0 1.55G
> test 7.99G 21.7T 0 16.2K 0 2.01G
> test 7.99G 21.7T 0 0 0 0
> test 7.99G 21.7T 0 13.4K 0 1.68G
> test 12.0G 21.7T 0 4.31K 0 530M
> test 12.0G 21.7T 0 0 0 0
> test 12.0G 21.7T 0 6.91K 0 882M
> test 12.0G 21.7T 0 21.8K 0 2.72G
> test 16.0G 21.7T 0 839 0 88.4M
> test 16.0G 21.7T 0 0 0 0
> test 16.0G 21.7T 0 4.42K 0 565M
> test 16.0G 21.7T 0 18.5K 0 2.31G
> test 20.0G 21.7T 0 8.87K 0 1.10G
> test 20.0G 21.7T 0 0 0 0
> test 20.0G 21.7T 0 12.2K 0 1.52G
> test 24.0G 21.7T 0 9.28K 0 1.14G
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 14.5K 0 1.81G
> test 28.0G 21.7T 0 10.1K 63.6K 1.25G
> test 28.0G 21.7T 0 0 0 0
> test 28.0G 21.7T 0 10.7K 0 1.34G
> test 32.0G 21.7T 0 13.6K 63.2K 1.69G
> test 32.0G 21.7T 0 0 0 0
> test 32.0G 21.7T 0 0 0 0
> test 32.0G 21.7T 0 11.1K 0 1.39G
> test 36.0G 21.7T 0 19.9K 0 2.48G
> test 36.0G 21.7T 0 0 0 0
> test 36.0G 21.7T 0 0 0 0
> test 36.0G 21.7T 0 17.7K 0 2.21G
> test 40.0G 21.7T 0 5.42K 63.1K 680M
> test 40.0G 21.7T 0 0 0 0
> test 40.0G 21.7T 0 6.62K 0 844M
> test 44.0G 21.7T 1 19.8K 125K 2.46G
> test 44.0G 21.7T 0 0 0 0
> test 44.0G 21.7T 0 0 0 0
> test 44.0G 21.7T 0 18.0K 0 2.24G
> test 47.9G 21.7T 1 13.2K 127K 1.63G
> test 47.9G 21.7T 0 0 0 0
> test 47.9G 21.7T 0 0 0 0
> test 47.9G 21.7T 0 15.6K 0 1.94G
> test 47.9G 21.7T 1 16.1K 126K 1.99G
> test 51.9G 21.7T 0 0 0 0
> test 51.9G 21.7T 0 0 0 0
> test 51.9G 21.7T 0 14.2K 0 1.77G
> test 55.9G 21.7T 0 14.0K 63.2K 1.73G
> test 55.9G 21.7T 0 0 0 0
> test 55.9G 21.7T 0 0 0 0
> test 55.9G 21.7T 0 16.3K 0 2.04G
> test 59.9G 21.7T 0 14.5K 63.2K 1.80G
> test 59.9G 21.7T 0 0 0 0
> test 59.9G 21.7T 0 0 0 0
> test 59.9G 21.7T 0 17.7K 0 2.21G
> test 63.9G 21.7T 0 4.84K 62.6K 603M
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> ^C
> bash-3.2#
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:06.312
> user 0.074
> sys 54.060
> bash-3.2#
>
> Doesn't look like it's CPU bound.

So going by sys time we're at 81% of CPU saturation. If you make this 100%
you will still have zeros in the zpool iostat. We might be waiting on memory
pages and a few other locks.
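(A rough way to see where that time goes while the test runs is per-LWP microstate accounting; this is only a sketch, not the exact commands used in this test:)

# start the write test in the background...
dd if=/dev/zero of=/test/q1 bs=1024k count=65536 &
# ...and watch the dd process: USR+SYS near 100% means the single dd really
# is CPU bound, a high LAT column means it is mostly waiting for a CPU
prstat -mL -p $! 1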
>
> Let's try to read the file after zpool export test; zpool import test
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 64.0G 21.7T 15 46 1.22M 1.76M
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 6.64K 0 849M 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.7K 0 1.33G 0
> test 64.0G 21.7T 9.91K 0 1.24G 0
> test 64.0G 21.7T 10.1K 0 1.27G 0
> test 64.0G 21.7T 10.7K 0 1.33G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 9.16K 0 1.14G 0
> test 64.0G 21.7T 1.98K 0 253M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.98K 0 253M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.49K 0 191M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.46K 0 186M 0
> test 64.0G 21.7T 2.01K 0 258M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 1.97K 0 253M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 2.42K 0 310M 0
> test 64.0G 21.7T 1.78K 0 228M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.67K 0 214M 0
> test 64.0G 21.7T 1.80K 0 230M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 1.24K 0 159M 0
> test 64.0G 21.7T 2.47K 0 316M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.49K 0 190M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.81K 0 232M 0
> test 64.0G 21.7T 1.90K 0 243M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.49K 0 191M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 1.97K 0 253M 0
> test 64.0G 21.7T 1.49K 0 190M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.82K 0 232M 0
> test 64.0G 21.7T 2.15K 0 275M 0
> test 64.0G 21.7T 2.22K 0 285M 0
> test 64.0G 21.7T 1.73K 0 222M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.90K 0 244M 0
> test 64.0G 21.7T 1.81K 0 231M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.97K 0 252M 0
> test 64.0G 21.7T 2.00K 0 255M 0
> test 64.0G 21.7T 8.42K 0 1.05G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.23K 0 1.15G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.0K 0 1.25G 0
> test 64.0G 21.7T 9.55K 0 1.19G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.0K 0 1.25G 0
> test 64.0G 21.7T 9.91K 0 1.24G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 9.24K 0 1.15G 0
> test 64.0G 21.7T 10.1K 0 1.26G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 8.54K 0 1.07G 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> ^C
>
> bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
> 65536+0 records in
> 65536+0 records out
>
> real 1:36.732
> user 0.046
> sys 48.069
> bash-3.2#
>
> Well, that drop for several dozen seconds was interesting...
> Let's run it again without export/import:
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 64.0G 21.7T 3.00K 6 384M 271K
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 2.58K 0 330M 0
> test 64.0G 21.7T 6.02K 0 771M 0
> test 64.0G 21.7T 8.37K 0 1.05G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.64K 0 1.20G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.65K 0 1.21G 0
> test 64.0G 21.7T 9.84K 0 1.23G 0
> test 64.0G 21.7T 9.22K 0 1.15G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.7K 0 1.34G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.7K 0 1.34G 0
> test 64.0G 21.7T 10.5K 0 1.32G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.8K 0 1.34G 0
> test 64.0G 21.7T 10.4K 0 1.29G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 9.15K 0 1.14G 0
> test 64.0G 21.7T 10.8K 0 1.35G 0
> test 64.0G 21.7T 9.76K 0 1.22G 0
> test 64.0G 21.7T 8.67K 0 1.08G 0
> test 64.0G 21.7T 10.8K 0 1.36G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.3K 0 1.28G 0
> test 64.0G 21.7T 9.76K 0 1.22G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 9.23K 0 1.15G 0
> test 64.0G 21.7T 9.63K 0 1.20G 0
> test 64.0G 21.7T 9.79K 0 1.22G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.8K 0 1.35G 0
> test 64.0G 21.7T 10.5K 0 1.32G 0
> test 64.0G 21.7T 11.0K 0 1.37G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 9.69K 0 1.21G 0
> test 64.0G 21.7T 6.07K 0 777M 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> ^C
> bash-3.2#
>
> bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
> 65536+0 records in
> 65536+0 records out
>
> real 50.521
> user 0.043
> sys 48.971
> bash-3.2#
>
> Now looks like reading from the pool using a single dd is actually CPU
> bound.
>
> Reading the same file again and again does produce, more or less,
> consistent timing. However, every time I export/import the pool there is
> that drop in throughput during the first read, and the total time
> increases to almost 100 seconds... some meta-data? (of course there are
> no errors of any sort, etc.)

That might fall in either of these buckets:

6412053 zfetch needs some love
6579975 dnode_new_blkid should check before it locks

>
> >> Reducing zfs_txg_synctime to 1 helps a little bit but still it's not
> >> an even stream of data.
> >>
> >> If I start 3 dd streams at the same time then it is slightly better
> >> (zfs_txg_synctime set back to 5) but still very jumpy.
> >>
>
> RB> Try zfs_txg_synctime to 10; that reduces the txg overhead.
You need multiple dd; we're basically CPU bound here. With multiple dd and
zfs_txg_synctime set to 10 you will have more write throughput. The drops you
will have then will correspond to the metadata update phase of the transaction
group. But the drops below correspond to the txg going to disk faster (memory
to disk) than dd is able to fill memory (some of that speed governed by
suboptimal locking).

-r

> Doesn't help...
>
> [...]
> test 13.6G 21.7T 0 0 0 0
> test 13.6G 21.7T 0 8.46K 0 1.05G
> test 17.6G 21.7T 0 19.3K 0 2.40G
> test 17.6G 21.7T 0 0 0 0
> test 17.6G 21.7T 0 0 0 0
> test 17.6G 21.7T 0 8.04K 0 1022M
> test 17.6G 21.7T 0 20.2K 0 2.51G
> test 21.6G 21.7T 0 76 0 249K
> test 21.6G 21.7T 0 0 0 0
> test 21.6G 21.7T 0 0 0 0
> test 21.6G 21.7T 0 10.1K 0 1.25G
> test 25.6G 21.7T 0 18.6K 0 2.31G
> test 25.6G 21.7T 0 0 0 0
> test 25.6G 21.7T 0 0 0 0
> test 25.6G 21.7T 0 6.34K 0 810M
> test 25.6G 21.7T 0 19.9K 0 2.48G
> test 29.6G 21.7T 0 88 63.2K 354K
> [...]
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:10.074
> user 0.074
> sys 52.250
> bash-3.2#
>
> Increasing it even further (up to 32s) doesn't help either.
>
> However, lowering it to 1s gives:
>
> [...]
> test 2.43G 21.7T 0 8.62K 0 1.07G
> test 4.46G 21.7T 0 7.23K 0 912M
> test 4.46G 21.7T 0 624 0 77.9M
> test 6.66G 21.7T 0 10.7K 0 1.33G
> test 6.66G 21.7T 0 6.66K 0 850M
> test 8.86G 21.7T 0 10.6K 0 1.31G
> test 8.86G 21.7T 0 1.96K 0 251M
> test 11.2G 21.7T 0 16.5K 0 2.04G
> test 11.2G 21.7T 0 0 0 0
> test 11.2G 21.7T 0 18.6K 0 2.31G
> test 13.5G 21.7T 0 11 0 11.9K
> test 13.5G 21.7T 0 2.60K 0 332M
> test 13.5G 21.7T 0 19.1K 0 2.37G
> test 16.3G 21.7T 0 11 0 11.9K
> test 16.3G 21.7T 0 9.61K 0 1.20G
> test 18.4G 21.7T 0 7.41K 0 936M
> test 18.4G 21.7T 0 11.6K 0 1.45G
> test 20.3G 21.7T 0 3.26K 0 407M
> test 20.3G 21.7T 0 7.66K 0 977M
> test 22.5G 21.7T 0 7.62K 0 963M
> test 22.5G 21.7T 0 6.86K 0 875M
> test 24.5G 21.7T 0 8.41K 0 1.04G
> test 24.5G 21.7T 0 10.4K 0 1.30G
> test 26.5G 21.7T 1 2.19K 127K 270M
> test 26.5G 21.7T 0 0 0 0
> test 26.5G 21.7T 0 4.56K 0 584M
> test 28.5G 21.7T 0 11.5K 0 1.42G
> [...]
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:09.541
> user 0.072
> sys 53.421
> bash-3.2#
>
> Looks slightly less jumpy, but the total real time is about the same, so
> average throughput is actually the same (about 1GB/s).
>
> >> Reading with one dd produces steady throughput but I'm disappointed with
> >> actual performance:
> >>
>
> RB> Again, probably cpu bound. What's "ptime dd..." saying ?
>
> You were right here. Reading with a single dd seems to be cpu bound.
> However, multiple streams for reading do not seem to increase
> performance considerably.
>
> Nevertheless, the main issue is jumpy writing...
>
> --
> Best regards,
> Robert Milkowski mailto:milek at task.gda.pl
> http://milek.blogspot.com
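(For anyone trying Roch's suggestion: on the builds discussed here zfs_txg_synctime can usually be changed on the fly with mdb -kw, or persistently via /etc/system; the exact tunable name and its units vary between releases, so treat the following as an untested sketch rather than a recommendation. The three-way split of the 64GB write is arbitrary.)

# runtime change on a live system (not persistent across reboot)
echo 'zfs_txg_synctime/W0t10' | mdb -kw
# persistent alternative: add the line below to /etc/system and reboot
#   set zfs:zfs_txg_synctime = 10

# then repeat the write test with several concurrent streams
ptime sh -c '
  dd if=/dev/zero of=/test/q1 bs=1024k count=21845 &
  dd if=/dev/zero of=/test/q2 bs=1024k count=21845 &
  dd if=/dev/zero of=/test/q3 bs=1024k count=21845 &
  wait
'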