My application processes thousands of files sequentially, reading input files, and outputting new files. I am using Solaris 10U4. While running the application in a verbose mode, I see that it runs very fast but pauses about every 7 seconds for a second or two. This is while reading 50MB/second and writing 73MB/second (ARC cache miss rate of 87%). The pause does not occur if the application spends more time doing real work. However, it would be nice if the pause went away.

I have tried turning down the ARC size (from 14GB to 10GB) but the behavior did not noticeably improve. The storage device is trained to ignore cache flush requests. According to the Evil Tuning Guide, the pause I am seeing is due to a cache flush after the uberblock updates.

It does not seem like a wise choice to disable ZFS cache flushing entirely. Is there a better way other than adding a small delay into my application?

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> My application processes thousands of files sequentially, reading
> input files, and outputting new files. I am using Solaris 10U4.
> While running the application in a verbose mode, I see that it runs
> very fast but pauses about every 7 seconds for a second or two.

When you experience the pause at the application level, do you see an increase in writes to disk? This might be the regular syncing of the transaction group to disk. This is normal behavior. The "amount" of pause is determined by how much data needs to be synced. You could of course decrease it by reducing the time between syncs (either by reducing the ARC and/or decreasing txg_time); however, I am not sure it will translate to better performance for you.

hth,
-neel
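For readers who want to poke at the two knobs Neel mentions, here is a rough sketch of how to inspect them on bits of this vintage. The tunable name varies across builds (txg_time on older bits, zfs_txg_synctime on newer ones), so treat the names below as assumptions to verify against your own kernel:

  # current ARC size and target, from the arcstats kstat
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c

  # current transaction group interval, in seconds
  echo txg_time/D | mdb -k

  # cap the ARC across reboots via /etc/system (example value: 10GB)
  #   set zfs:zfs_arc_max=0x280000000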
The question is: does the "IO pausing" behaviour you noticed penalize your application? What are the consequences at the application level?

For instance, we have seen applications doing some kind of data capture from an external device (video, for example) requiring a constant throughput to disk (data feed), risking otherwise loss of data. In this case QFS might be a better option (not free, though).

If your application is not suffering, then you should be able to live with these apparent "IO hangs".

s-

On Thu, Mar 27, 2008 at 3:35 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> My application processes thousands of files sequentially, reading
> input files, and outputting new files. [...]

--
------------------------------------------------------
Blog: http://fakoli.blogspot.com/
On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
> When you experience the pause at the application level,
> do you see an increase in writes to disk? This might be the
> regular syncing of the transaction group to disk.

If I use 'zpool iostat' with a one second interval, what I see is two or three samples with no write I/O at all followed by a huge write of 100 to 312MB/second. Writes claimed to be at a lower rate are split across two sample intervals. It seems that writes are being cached and then issued all at once. This behavior assumes that the file may be written multiple times, so a delayed write is more efficient.

If I run a script like

  while true
  do
    sync
  done

then the write data rate is much more consistent (at about 66MB/second) and the program does not stall. Of course this is not very efficient.

Are the 'zpool iostat' statistics accurate?

Bob
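One way to check whether these write bursts line up with the transaction group sync is to time spa_sync() directly. A rough DTrace sketch, assuming the fbt provider and the spa_sync symbol are available on this kernel:

  # dtrace -n '
    fbt::spa_sync:entry  { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
            @["spa_sync duration (ms)"] = quantize((timestamp - self->ts) / 1000000);
            self->ts = 0;
    }
    tick-10sec { printa(@); trunc(@); }'

If the application stalls coincide with spa_sync firing every few seconds, that supports the regular-syncing explanation.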
Selim Daoud wrote:
> the question is: does the "IO pausing" behaviour you noticed penalize
> your application?
> what are the consequences at the application level?
> [...]

I would look at txg_time first... for lots of streaming writes on a machine with limited memory, you can smooth out the sawtooth.

QFS is open sourced.
http://blogs.sun.com/samqfs
-- richard
Bob Friesenhahn wrote:
> On Wed, 26 Mar 2008, Neelakanth Nadgir wrote:
>> When you experience the pause at the application level,
>> do you see an increase in writes to disk? This might be the
>> regular syncing of the transaction group to disk.
>
> If I use 'zpool iostat' with a one second interval, what I see is two
> or three samples with no write I/O at all followed by a huge write of
> 100 to 312MB/second. Writes claimed to be at a lower rate are split
> across two sample intervals.
>
> It seems that writes are being cached and then issued all at once.
> This behavior assumes that the file may be written multiple times, so a
> delayed write is more efficient.

This does sound like the regular syncing.

> If I run a script like
>
>   while true
>   do
>     sync
>   done
>
> then the write data rate is much more consistent (at about
> 66MB/second) and the program does not stall. Of course this is not
> very efficient.

This causes the sync to happen much faster, but as you say, suboptimal. Haven't had the time to go through the bug report, but probably CR 6429205 ("each zpool needs to monitor its throughput and throttle heavy writers") will help.

> Are the 'zpool iostat' statistics accurate?

Yes. You could also look at regular iostat and correlate it.

-neel
On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
>
> This causes the sync to happen much faster, but as you say, suboptimal.
> Haven't had the time to go through the bug report, but probably
> CR 6429205 ("each zpool needs to monitor its throughput
> and throttle heavy writers") will help.

I hope that this feature is implemented soon, and works well. :-)

I tested with my application outputting to a UFS filesystem on a single 15K RPM SAS disk and saw that it writes about 50MB/second, without the bursty behavior of ZFS. When writing to a ZFS filesystem on a RAID array, 'zpool iostat' reports an average (over 10 seconds) write rate of 54MB/second. Given that the throughput is not much higher on the RAID array, I assume that the bottleneck is in my application.

>> Are the 'zpool iostat' statistics accurate?
>
> Yes. You could also look at regular iostat
> and correlate it.

iostat shows that my RAID array disks are loafing, with only 9MB/second written to each but with 82 writes/second.

Bob
On Mar 27, 2008, at 9:24 AM, Bob Friesenhahn wrote:
> On Thu, 27 Mar 2008, Neelakanth Nadgir wrote:
>>
>> This causes the sync to happen much faster, but as you say,
>> suboptimal.
>> Haven't had the time to go through the bug report, but probably
>> CR 6429205 ("each zpool needs to monitor its throughput
>> and throttle heavy writers") will help.
>
> I hope that this feature is implemented soon, and works well. :-)

Actually, this has gone back into snv_87 (and no, we don't know which s10uX it will go into yet).

eric
You may want to try disabling the disk write cache on the single disk. Also, for the RAID, disable 'host cache flush' if such an option exists. That solved the problem for me. Let me know.

Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> I tested with my application outputting to a UFS filesystem on a single
> 15K RPM SAS disk and saw that it writes about 50MB/second, without
> the bursty behavior of ZFS. [...]
Hello eric,

Thursday, March 27, 2008, 9:36:42 PM, you wrote:

ek> Actually, this has gone back into snv_87 (and no, we don't know which
ek> s10uX it will go into yet).

Could you share more details on how it works now, after the change?

--
Best regards,
Robert Milkowski                      mailto:milek at task.gda.pl
                                      http://milek.blogspot.com
ZFS has always done a certain amount of "write throttling". In the past (or the present, for those of you running S10 or pre-build-87 bits) this throttling was controlled by a timer and the size of the ARC: we would "cut" a transaction group every 5 seconds based off of our timer, and we would also "cut" a transaction group if we had more than 1/4 of the ARC size worth of dirty data in the transaction group. So, for example, if you have a machine with 16GB of physical memory it wouldn't be unusual to see an ARC size of around 12GB. This means we would allow up to 3GB of dirty data into a single transaction group (if the writes complete in less than 5 seconds). Now we can have up to three transaction groups "in progress" at any time: open context, quiesce context, and sync context. As a final wrinkle, we also don't allow more than 1/2 the ARC to be composed of dirty write data. All taken together, this means that there can be up to 6GB of writes "in the pipe" (using the 12GB ARC example from above).

Problems with this design start to show up when the write-to-disk bandwidth can't keep up with the application: if the application is writing at a rate of, say, 1GB/sec, it will "fill the pipe" within 6 seconds. But if the IO bandwidth to disk is only 512MB/sec, it's going to take 12sec to get this data onto the disk. This "impedance mis-match" is going to manifest as pauses: the application fills the pipe, then waits for the pipe to empty, then starts writing again. Note that this won't be smooth, since we need to complete an entire sync phase before allowing things to progress. So you can end up with IO gaps. This is probably what the original submitter is experiencing. Note there are a few other subtleties here that I have glossed over, but the general picture is accurate.

The new write throttle code put back into build 87 attempts to smooth out the process. We now measure the amount of time it takes to sync each transaction group, and the amount of data in that group. We dynamically resize our write throttle to try to keep the sync time constant (at 5secs) under write load. We also introduce "fairness" delays on writers when we near pipeline capacity: each write is delayed 1/100sec when we are about to "fill up". This prevents a single heavy writer from "starving out" occasional writers. So instead of coming to an abrupt halt when the pipeline fills, we slow down our write pace. The result should be a constant, even IO load.

There is one "down side" to this new model: if a write load is very "bursty", e.g., a large 5GB write followed by 30secs of idle, the new code may be less efficient than the old. In the old code, all of this IO would be let in at memory speed and then more slowly make its way out to disk. In the new code, the writes may be slowed down. The data makes its way to the disk in the same amount of time, but the application takes longer. Conceptually: we are sizing the write buffer to the pool bandwidth, rather than to the memory size.

Robert Milkowski wrote:
> Could you share more details on how it works now, after the change?
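To put Mark's 16GB example in one place, here is the arithmetic worked through with bc (these are just the figures from his explanation above, not new measurements):

  $ bc
  scale=1
  12 / 4     /* old throttle: up to 1/4 of a ~12GB ARC dirty per txg */
  3.0
  12 / 2     /* no more than 1/2 of the ARC dirty overall: the "pipe" */
  6.0
  6 / 1      /* at 1GB/sec the application fills that pipe in ~6 sec */
  6.0
  6 / .5     /* at 512MB/sec to disk it takes ~12 sec to drain */
  12.0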
On Tue, 15 Apr 2008, Mark Maybee wrote:
> going to take 12sec to get this data onto the disk. This "impedance
> mis-match" is going to manifest as pauses: the application fills
> the pipe, then waits for the pipe to empty, then starts writing again.
> Note that this won't be smooth, since we need to complete an entire
> sync phase before allowing things to progress. So you can end up
> with IO gaps. This is probably what the original submitter is

Yes. With an application which also needs to make best use of available CPU, these I/O "gaps" cut into available CPU time (by blocking the process) unless the application uses multithreading and an intermediate write queue (more memory) to separate the CPU-centric parts from the I/O-centric parts. While the single-threaded application is waiting for data to be written, it is not able to read and process more data. Since reads take time to complete, being blocked on write stops new reads from being started so the data is ready when it is needed.

> There is one "down side" to this new model: if a write load is very
> "bursty", e.g., a large 5GB write followed by 30secs of idle, the
> new code may be less efficient than the old. In the old code, all

This is also a common scenario. :-)

Presumably the special "slow I/O" code would not kick in unless the burst was large enough to fill quite a bit of the ARC. Real time throttling is quite a challenge to do in software.

Bob
Hello Mark,

Tuesday, April 15, 2008, 8:32:32 PM, you wrote:

MM> There is one "down side" to this new model: if a write load is very
MM> "bursty", e.g., a large 5GB write followed by 30secs of idle, the
MM> new code may be less efficient than the old. [...]
MM> Conceptually: we are sizing the write buffer to the pool bandwidth,
MM> rather than to the memory size.

First - thank you for your explanation - it is very helpful.

I'm worried about the last part - but it's hard to be optimal for all workloads. Nevertheless, sometimes the problem is that you change the behavior from the application's perspective. With other file systems I guess you are able to fill most of memory and still keep the disks busy 100% of the time without IO gaps. My biggest concern is these gaps in IO, as ZFS should keep the disks 100% busy if needed.

--
Best regards,
Robert Milkowski
Bob Friesenhahn writes:
 > Presumably the special "slow I/O" code would not kick in unless the
 > burst was large enough to fill quite a bit of the ARC.

Bursts of 1/8th of physical memory or 5 seconds of storage throughput, whichever is smallest.

-r

 > Real time throttling is quite a challenge to do in software.
 >
 > Bob
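For the same 16GB machine and 512MB/sec pool used in Mark's example, that limit works out to roughly (simple arithmetic on figures already in the thread, nothing more):

  $ bc
  scale=1
  16 / 8     /* 1/8th of 16GB of physical memory */
  2.0
  5 * .5     /* 5 seconds at ~512MB/sec of pool throughput */
  2.5

i.e. about a 2GB burst before the throttle starts delaying writers.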
Hello Mark, Tuesday, April 15, 2008, 8:32:32 PM, you wrote: MM> The new write throttle code put back into build 87 attempts to MM> smooth out the process. We now measure the amount of time it takes MM> to sync each transaction group, and the amount of data in that group. MM> We dynamically resize our write throttle to try to keep the sync MM> time constant (at 5secs) under write load. We also introduce MM> "fairness" delays on writers when we near pipeline capacity: each MM> write is delayed 1/100sec when we are about to "fill up". This MM> prevents a single heavy writer from "starving out" occasional MM> writers. So instead of coming to an abrupt halt when the pipeline MM> fills, we slow down our write pace. The result should be a constant MM> even IO load. snv_91, 48x 500GB sata drives in one large stripe: # zpool create -f test c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 # zfs set atime=off test # dd if=/dev/zero of=/test/q1 bs=1024k ^C34374+0 records in 34374+0 records out # zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- [...] test 58.9M 21.7T 0 1.19K 0 80.8M test 862M 21.7T 0 6.67K 0 776M test 1.52G 21.7T 0 5.50K 0 689M test 1.52G 21.7T 0 9.28K 0 1.16G test 2.88G 21.7T 0 1.14K 0 135M test 2.88G 21.7T 0 1.61K 0 206M test 2.88G 21.7T 0 18.0K 0 2.24G test 5.60G 21.7T 0 79 0 264K test 5.60G 21.7T 0 0 0 0 test 5.60G 21.7T 0 10.9K 0 1.36G test 9.59G 21.7T 0 7.09K 0 897M test 9.59G 21.7T 0 0 0 0 test 9.59G 21.7T 0 6.33K 0 807M test 9.59G 21.7T 0 17.9K 0 2.24G test 13.6G 21.7T 0 1.96K 0 239M test 13.6G 21.7T 0 0 0 0 test 13.6G 21.7T 0 11.9K 0 1.49G test 17.6G 21.7T 0 9.91K 0 1.23G test 17.6G 21.7T 0 0 0 0 test 17.6G 21.7T 0 5.48K 0 700M test 17.6G 21.7T 0 20.0K 0 2.50G test 21.6G 21.7T 0 2.03K 0 244M test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 4.03K 0 513M test 21.6G 21.7T 0 23.7K 0 2.97G test 25.6G 21.7T 0 1.83K 0 225M test 25.6G 21.7T 0 0 0 0 test 25.6G 21.7T 0 13.9K 0 1.74G test 29.6G 21.7T 1 1.40K 127K 167M test 29.6G 21.7T 0 0 0 0 test 29.6G 21.7T 0 7.14K 0 912M test 29.6G 21.7T 0 19.2K 0 2.40G test 33.6G 21.7T 1 378 127K 34.8M test 33.6G 21.7T 0 0 0 0 ^C Well, doesn''t actually look good. Checking with iostat I don''t see any problems like long service times, etc. Reducing zfs_txg_synctime to 1 helps a little bit but still it''s not even stream of data. If I start 3 dd streams at the same time then it is slightly better (zfs_txg_synctime set back to 5) but still very jumpy. 
Reading with one dd produces steady throghput but I''m disapointed with actual performance: test 161G 21.6T 9.94K 0 1.24G 0 test 161G 21.6T 10.0K 0 1.25G 0 test 161G 21.6T 10.3K 0 1.29G 0 test 161G 21.6T 10.1K 0 1.27G 0 test 161G 21.6T 10.4K 0 1.31G 0 test 161G 21.6T 10.1K 0 1.27G 0 test 161G 21.6T 10.4K 0 1.30G 0 test 161G 21.6T 10.2K 0 1.27G 0 test 161G 21.6T 10.3K 0 1.29G 0 test 161G 21.6T 10.0K 0 1.25G 0 test 161G 21.6T 9.96K 0 1.24G 0 test 161G 21.6T 10.6K 0 1.33G 0 test 161G 21.6T 10.1K 0 1.26G 0 test 161G 21.6T 10.2K 0 1.27G 0 test 161G 21.6T 10.4K 0 1.30G 0 test 161G 21.6T 9.62K 0 1.20G 0 test 161G 21.6T 8.22K 0 1.03G 0 test 161G 21.6T 9.61K 0 1.20G 0 test 161G 21.6T 10.2K 0 1.28G 0 test 161G 21.6T 9.12K 0 1.14G 0 test 161G 21.6T 9.96K 0 1.25G 0 test 161G 21.6T 9.72K 0 1.22G 0 test 161G 21.6T 10.6K 0 1.32G 0 test 161G 21.6T 9.93K 0 1.24G 0 test 161G 21.6T 9.94K 0 1.24G 0 zpool scrub produces: test 161G 21.6T 25 69 2.70M 392K test 161G 21.6T 10.9K 0 1.35G 0 test 161G 21.6T 13.4K 0 1.66G 0 test 161G 21.6T 13.2K 0 1.63G 0 test 161G 21.6T 11.8K 0 1.46G 0 test 161G 21.6T 13.8K 0 1.72G 0 test 161G 21.6T 12.4K 0 1.53G 0 test 161G 21.6T 12.9K 0 1.59G 0 test 161G 21.6T 12.9K 0 1.59G 0 test 161G 21.6T 13.4K 0 1.67G 0 test 161G 21.6T 12.2K 0 1.51G 0 test 161G 21.6T 12.9K 0 1.59G 0 test 161G 21.6T 12.5K 0 1.55G 0 test 161G 21.6T 13.3K 0 1.64G 0 So sequential reading gives steady thruput but numbers are a little bit lower than expected. Sequential writing is still jumpy with single or multiple dd streams for pool with many disk drives. Lets destroy the pool and create a new one, smaller one. # zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 # zfs set atime=off test # dd if=/dev/zero of=/test/q1 bs=1024k ^C15905+0 records in 15905+0 records out # zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- [...] test 688M 2.72T 0 3.29K 0 401M test 1.01G 2.72T 0 3.69K 0 462M test 1.35G 2.72T 0 3.59K 0 450M test 1.35G 2.72T 0 2.95K 0 372M test 2.03G 2.72T 0 3.37K 0 428M test 2.03G 2.72T 0 1.94K 0 248M test 2.71G 2.72T 0 2.44K 0 301M test 2.71G 2.72T 0 3.88K 0 497M test 2.71G 2.72T 0 3.86K 0 494M test 4.07G 2.71T 0 3.42K 0 425M test 4.07G 2.71T 0 3.89K 0 498M test 4.07G 2.71T 0 3.88K 0 497M test 5.43G 2.71T 0 3.44K 0 429M test 5.43G 2.71T 0 3.94K 0 504M test 5.43G 2.71T 0 3.88K 0 497M test 5.43G 2.71T 0 3.88K 0 497M test 7.62G 2.71T 0 2.34K 0 286M test 7.62G 2.71T 0 4.23K 0 539M test 7.62G 2.71T 0 3.89K 0 498M test 7.62G 2.71T 0 3.87K 0 495M test 7.62G 2.71T 0 3.88K 0 497M test 9.81G 2.71T 0 3.33K 0 418M test 9.81G 2.71T 0 4.12K 0 526M test 9.81G 2.71T 0 3.88K 0 497M Much more steady - interesting. Let''s do it again with yet bigger pool and lets keep distributing disks in "rows" across controllers. 
# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 # zfs set atime=off test test 1.35G 5.44T 0 5.42K 0 671M test 2.03G 5.44T 0 7.01K 0 883M test 2.71G 5.43T 0 6.22K 0 786M test 2.71G 5.43T 0 8.09K 0 1.01G test 4.07G 5.43T 0 7.14K 0 902M test 5.43G 5.43T 0 4.02K 0 507M test 5.43G 5.43T 0 5.52K 0 700M test 5.43G 5.43T 0 8.04K 0 1.00G test 5.43G 5.43T 0 7.70K 0 986M test 8.15G 5.43T 0 6.13K 0 769M test 8.15G 5.43T 0 7.77K 0 995M test 8.15G 5.43T 0 7.67K 0 981M test 10.9G 5.43T 0 4.15K 0 517M test 10.9G 5.43T 0 7.74K 0 986M test 10.9G 5.43T 0 7.76K 0 994M test 10.9G 5.43T 0 7.75K 0 993M test 14.9G 5.42T 0 6.79K 0 860M test 14.9G 5.42T 0 7.50K 0 958M test 14.9G 5.42T 0 8.25K 0 1.03G test 14.9G 5.42T 0 7.77K 0 995M test 18.9G 5.42T 0 4.86K 0 614M starting to be more jumpy, but still not as bad as in first case. So lets create a pool out of all disks again but this time lets continue to provide disks in "rows" across controllers. # zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0 c6t2d0 c1t3d0 c2t3d0 c3t3d0 c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 c3t4d0 c4t4d0 c5t4d0 c6t4d0 c1t5d0 c2t5d0 c3t5d0 c4t5d0 c5t5d0 c6t5d0 c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0 c6t6d0 c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 c6t7d0 # zfs set atime=off test test 862M 21.7T 0 5.81K 0 689M test 1.52G 21.7T 0 5.50K 0 689M test 2.88G 21.7T 0 10.9K 0 1.35G test 2.88G 21.7T 0 0 0 0 test 2.88G 21.7T 0 9.49K 0 1.18G test 5.60G 21.7T 0 11.1K 0 1.38G test 5.60G 21.7T 0 0 0 0 test 5.60G 21.7T 0 0 0 0 test 5.60G 21.7T 0 15.3K 0 1.90G test 9.59G 21.7T 0 15.4K 0 1.91G test 9.59G 21.7T 0 0 0 0 test 9.59G 21.7T 0 0 0 0 test 9.59G 21.7T 0 16.8K 0 2.09G test 13.6G 21.7T 0 8.60K 0 1.06G test 13.6G 21.7T 0 0 0 0 test 13.6G 21.7T 0 4.01K 0 512M test 13.6G 21.7T 0 20.2K 0 2.52G test 17.6G 21.7T 0 2.86K 0 353M test 17.6G 21.7T 0 0 0 0 test 17.6G 21.7T 0 11.6K 0 1.45G test 21.6G 21.7T 0 14.1K 0 1.75G test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 4.74K 0 602M test 21.6G 21.7T 0 17.6K 0 2.20G test 25.6G 21.7T 0 8.00K 0 1008M test 25.6G 21.7T 0 0 0 0 test 25.6G 21.7T 0 0 0 0 test 25.6G 21.7T 0 16.8K 0 2.09G test 25.6G 21.7T 0 15.0K 0 1.86G test 29.6G 21.7T 0 11 0 11.9K Any idea? -- Best regards, Robert Milkowski mailto:milek at task.gda.pl http://milek.blogspot.com
On 28 Jun 08, at 05:14, Robert Milkowski wrote:

> snv_91, 48x 500GB sata drives in one large stripe:
>
> # dd if=/dev/zero of=/test/q1 bs=1024k
> [...]
> Well, doesn't actually look good. Checking with iostat I don't see any
> problems like long service times, etc.

I suspect, a single dd is cpu bound.

> Reducing zfs_txg_synctime to 1 helps a little bit but still it's not
> an even stream of data.
>
> If I start 3 dd streams at the same time then it is slightly better
> (zfs_txg_synctime set back to 5) but still very jumpy.

Try zfs_txg_synctime to 10; that reduces the txg overhead.

> Reading with one dd produces steady throughput but I'm disappointed
> with actual performance:

Again, probably cpu bound. What's "ptime dd..." saying?

> # zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0
> [...]
> Much more steady - interesting.

Now it's disk bound.
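For anyone repeating the experiment, the tunable can be read and changed on a live system with mdb. A sketch, using the variable name from this thread (verify it exists on your build before writing to it):

  # read the current value, in seconds
  echo zfs_txg_synctime/D | mdb -k

  # set it to 10 on the running kernel (0t marks decimal input)
  echo zfs_txg_synctime/W0t10 | mdb -kw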
Hello Roch, Saturday, June 28, 2008, 11:25:17 AM, you wrote: RB> I suspect, a single dd is cpu bound. I don''t think so. Se below one with a stripe of 48x disks again. Single dd with 1024k block size and 64GB to write. bash-3.2# zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- test 333K 21.7T 1 1 147K 147K test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 0 0 0 test 333K 21.7T 0 1.60K 0 204M test 333K 21.7T 0 20.5K 0 2.55G test 4.00G 21.7T 0 9.19K 0 1.13G test 4.00G 21.7T 0 0 0 0 test 4.00G 21.7T 0 1.78K 0 228M test 4.00G 21.7T 0 12.5K 0 1.55G test 7.99G 21.7T 0 16.2K 0 2.01G test 7.99G 21.7T 0 0 0 0 test 7.99G 21.7T 0 13.4K 0 1.68G test 12.0G 21.7T 0 4.31K 0 530M test 12.0G 21.7T 0 0 0 0 test 12.0G 21.7T 0 6.91K 0 882M test 12.0G 21.7T 0 21.8K 0 2.72G test 16.0G 21.7T 0 839 0 88.4M test 16.0G 21.7T 0 0 0 0 test 16.0G 21.7T 0 4.42K 0 565M test 16.0G 21.7T 0 18.5K 0 2.31G test 20.0G 21.7T 0 8.87K 0 1.10G test 20.0G 21.7T 0 0 0 0 test 20.0G 21.7T 0 12.2K 0 1.52G test 24.0G 21.7T 0 9.28K 0 1.14G test 24.0G 21.7T 0 0 0 0 test 24.0G 21.7T 0 0 0 0 test 24.0G 21.7T 0 0 0 0 test 24.0G 21.7T 0 14.5K 0 1.81G test 28.0G 21.7T 0 10.1K 63.6K 1.25G test 28.0G 21.7T 0 0 0 0 test 28.0G 21.7T 0 10.7K 0 1.34G test 32.0G 21.7T 0 13.6K 63.2K 1.69G test 32.0G 21.7T 0 0 0 0 test 32.0G 21.7T 0 0 0 0 test 32.0G 21.7T 0 11.1K 0 1.39G test 36.0G 21.7T 0 19.9K 0 2.48G test 36.0G 21.7T 0 0 0 0 test 36.0G 21.7T 0 0 0 0 test 36.0G 21.7T 0 17.7K 0 2.21G test 40.0G 21.7T 0 5.42K 63.1K 680M test 40.0G 21.7T 0 0 0 0 test 40.0G 21.7T 0 6.62K 0 844M test 44.0G 21.7T 1 19.8K 125K 2.46G test 44.0G 21.7T 0 0 0 0 test 44.0G 21.7T 0 0 0 0 test 44.0G 21.7T 0 18.0K 0 2.24G test 47.9G 21.7T 1 13.2K 127K 1.63G test 47.9G 21.7T 0 0 0 0 test 47.9G 21.7T 0 0 0 0 test 47.9G 21.7T 0 15.6K 0 1.94G test 47.9G 21.7T 1 16.1K 126K 1.99G test 51.9G 21.7T 0 0 0 0 test 51.9G 21.7T 0 0 0 0 test 51.9G 21.7T 0 14.2K 0 1.77G test 55.9G 21.7T 0 14.0K 63.2K 1.73G test 55.9G 21.7T 0 0 0 0 test 55.9G 21.7T 0 0 0 0 test 55.9G 21.7T 0 16.3K 0 2.04G test 59.9G 21.7T 0 14.5K 63.2K 1.80G test 59.9G 21.7T 0 0 0 0 test 59.9G 21.7T 0 0 0 0 test 59.9G 21.7T 0 17.7K 0 2.21G test 63.9G 21.7T 0 4.84K 62.6K 603M test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 test 63.9G 21.7T 0 0 0 0 ^C bash-3.2# bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536 65536+0 records in 65536+0 records out real 1:06.312 user 0.074 sys 54.060 bash-3.2# Doesn''t look like it''s CPU bound. 
Let''s try to read the file after zpool export test; zpool import test bash-3.2# zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- test 64.0G 21.7T 15 46 1.22M 1.76M test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 6.64K 0 849M 0 test 64.0G 21.7T 10.2K 0 1.27G 0 test 64.0G 21.7T 10.7K 0 1.33G 0 test 64.0G 21.7T 9.91K 0 1.24G 0 test 64.0G 21.7T 10.1K 0 1.27G 0 test 64.0G 21.7T 10.7K 0 1.33G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.2K 0 1.27G 0 test 64.0G 21.7T 10.3K 0 1.29G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 9.16K 0 1.14G 0 test 64.0G 21.7T 1.98K 0 253M 0 test 64.0G 21.7T 2.48K 0 317M 0 test 64.0G 21.7T 1.98K 0 253M 0 test 64.0G 21.7T 1.98K 0 254M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.73K 0 221M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.49K 0 191M 0 test 64.0G 21.7T 2.47K 0 317M 0 test 64.0G 21.7T 1.46K 0 186M 0 test 64.0G 21.7T 2.01K 0 258M 0 test 64.0G 21.7T 1.98K 0 254M 0 test 64.0G 21.7T 1.97K 0 253M 0 test 64.0G 21.7T 2.23K 0 286M 0 test 64.0G 21.7T 1.98K 0 254M 0 test 64.0G 21.7T 1.73K 0 221M 0 test 64.0G 21.7T 1.98K 0 254M 0 test 64.0G 21.7T 2.42K 0 310M 0 test 64.0G 21.7T 1.78K 0 228M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.67K 0 214M 0 test 64.0G 21.7T 1.80K 0 230M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 2.47K 0 317M 0 test 64.0G 21.7T 1.73K 0 221M 0 test 64.0G 21.7T 1.99K 0 254M 0 test 64.0G 21.7T 1.24K 0 159M 0 test 64.0G 21.7T 2.47K 0 316M 0 test 64.0G 21.7T 2.47K 0 317M 0 test 64.0G 21.7T 1.99K 0 254M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.73K 0 221M 0 test 64.0G 21.7T 2.48K 0 317M 0 test 64.0G 21.7T 2.48K 0 317M 0 test 64.0G 21.7T 1.49K 0 190M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.81K 0 232M 0 test 64.0G 21.7T 1.90K 0 243M 0 test 64.0G 21.7T 2.48K 0 317M 0 test 64.0G 21.7T 1.49K 0 191M 0 test 64.0G 21.7T 2.47K 0 317M 0 test 64.0G 21.7T 1.99K 0 254M 0 test 64.0G 21.7T 1.97K 0 253M 0 test 64.0G 21.7T 1.49K 0 190M 0 test 64.0G 21.7T 2.23K 0 286M 0 test 64.0G 21.7T 1.82K 0 232M 0 test 64.0G 21.7T 2.15K 0 275M 0 test 64.0G 21.7T 2.22K 0 285M 0 test 64.0G 21.7T 1.73K 0 222M 0 test 64.0G 21.7T 2.23K 0 286M 0 test 64.0G 21.7T 1.90K 0 244M 0 test 64.0G 21.7T 1.81K 0 231M 0 test 64.0G 21.7T 2.23K 0 285M 0 test 64.0G 21.7T 1.97K 0 252M 0 test 64.0G 21.7T 2.00K 0 255M 0 test 64.0G 21.7T 8.42K 0 1.05G 0 test 64.0G 21.7T 10.3K 0 1.29G 0 test 64.0G 21.7T 10.2K 0 1.28G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.2K 0 1.27G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.6K 0 1.32G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 9.23K 0 1.15G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 10.0K 0 1.25G 0 test 64.0G 21.7T 9.55K 0 1.19G 0 test 64.0G 21.7T 10.2K 0 1.27G 0 test 64.0G 21.7T 10.0K 0 1.25G 0 test 64.0G 21.7T 9.91K 0 1.24G 0 test 64.0G 21.7T 10.6K 0 1.32G 0 test 64.0G 21.7T 9.24K 0 1.15G 0 test 64.0G 21.7T 10.1K 0 1.26G 0 test 64.0G 21.7T 10.3K 0 1.29G 0 test 64.0G 21.7T 10.3K 0 1.29G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 8.54K 0 1.07G 0 test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 0 0 0 0 ^C bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k 65536+0 records 
in 65536+0 records out real 1:36.732 user 0.046 sys 48.069 bash-3.2# Well, that drop for several dozen seconds was interesting... Lets run it again without export/import: bash-3.2# zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- test 64.0G 21.7T 3.00K 6 384M 271K test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 2.58K 0 330M 0 test 64.0G 21.7T 6.02K 0 771M 0 test 64.0G 21.7T 8.37K 0 1.05G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 9.64K 0 1.20G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 10.6K 0 1.32G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 9.65K 0 1.21G 0 test 64.0G 21.7T 9.84K 0 1.23G 0 test 64.0G 21.7T 9.22K 0 1.15G 0 test 64.0G 21.7T 10.9K 0 1.36G 0 test 64.0G 21.7T 10.9K 0 1.36G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.7K 0 1.34G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.9K 0 1.36G 0 test 64.0G 21.7T 10.6K 0 1.32G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.7K 0 1.34G 0 test 64.0G 21.7T 10.5K 0 1.32G 0 test 64.0G 21.7T 10.6K 0 1.32G 0 test 64.0G 21.7T 10.8K 0 1.34G 0 test 64.0G 21.7T 10.4K 0 1.29G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 9.15K 0 1.14G 0 test 64.0G 21.7T 10.8K 0 1.35G 0 test 64.0G 21.7T 9.76K 0 1.22G 0 test 64.0G 21.7T 8.67K 0 1.08G 0 test 64.0G 21.7T 10.8K 0 1.36G 0 test 64.0G 21.7T 10.9K 0 1.36G 0 test 64.0G 21.7T 10.3K 0 1.28G 0 test 64.0G 21.7T 9.76K 0 1.22G 0 test 64.0G 21.7T 10.5K 0 1.31G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 9.23K 0 1.15G 0 test 64.0G 21.7T 9.63K 0 1.20G 0 test 64.0G 21.7T 9.79K 0 1.22G 0 test 64.0G 21.7T 10.2K 0 1.28G 0 test 64.0G 21.7T 10.4K 0 1.30G 0 test 64.0G 21.7T 10.3K 0 1.29G 0 test 64.0G 21.7T 10.2K 0 1.28G 0 test 64.0G 21.7T 10.6K 0 1.33G 0 test 64.0G 21.7T 10.8K 0 1.35G 0 test 64.0G 21.7T 10.5K 0 1.32G 0 test 64.0G 21.7T 11.0K 0 1.37G 0 test 64.0G 21.7T 10.2K 0 1.27G 0 test 64.0G 21.7T 9.69K 0 1.21G 0 test 64.0G 21.7T 6.07K 0 777M 0 test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 0 0 0 0 test 64.0G 21.7T 0 0 0 0 ^C bash-3.2# bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k 65536+0 records in 65536+0 records out real 50.521 user 0.043 sys 48.971 bash-3.2# Now looks like reading from the pool using single dd is actually CPU bound. Reading the same file again and again does produce, more or less, consistent timing. However every time I export/import the pool during the first read there is that drop in throughput during first read and total time increases to almost 100 seconds.... some meta-data? (of course there are no errors oof any sort, etc.)>> Reducing zfs_txg_synctime to 1 helps a little bit but still it''s not >> even stream of data. >> >> If I start 3 dd streams at the same time then it is slightly better >> (zfs_txg_synctime set back to 5) but still very jumpy. >>RB> Try zfs_txg_synctime to 10; that reduces the txg overhead. Doesn''t help... [...] test 13.6G 21.7T 0 0 0 0 test 13.6G 21.7T 0 8.46K 0 1.05G test 17.6G 21.7T 0 19.3K 0 2.40G test 17.6G 21.7T 0 0 0 0 test 17.6G 21.7T 0 0 0 0 test 17.6G 21.7T 0 8.04K 0 1022M test 17.6G 21.7T 0 20.2K 0 2.51G test 21.6G 21.7T 0 76 0 249K test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 0 0 0 test 21.6G 21.7T 0 10.1K 0 1.25G test 25.6G 21.7T 0 18.6K 0 2.31G test 25.6G 21.7T 0 0 0 0 test 25.6G 21.7T 0 0 0 0 test 25.6G 21.7T 0 6.34K 0 810M test 25.6G 21.7T 0 19.9K 0 2.48G test 29.6G 21.7T 0 88 63.2K 354K [...] 
bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536 65536+0 records in 65536+0 records out real 1:10.074 user 0.074 sys 52.250 bash-3.2# Increasing it even further (up-to 32s) doesn''t help either. However lowering it to 1s gives: [...] test 2.43G 21.7T 0 8.62K 0 1.07G test 4.46G 21.7T 0 7.23K 0 912M test 4.46G 21.7T 0 624 0 77.9M test 6.66G 21.7T 0 10.7K 0 1.33G test 6.66G 21.7T 0 6.66K 0 850M test 8.86G 21.7T 0 10.6K 0 1.31G test 8.86G 21.7T 0 1.96K 0 251M test 11.2G 21.7T 0 16.5K 0 2.04G test 11.2G 21.7T 0 0 0 0 test 11.2G 21.7T 0 18.6K 0 2.31G test 13.5G 21.7T 0 11 0 11.9K test 13.5G 21.7T 0 2.60K 0 332M test 13.5G 21.7T 0 19.1K 0 2.37G test 16.3G 21.7T 0 11 0 11.9K test 16.3G 21.7T 0 9.61K 0 1.20G test 18.4G 21.7T 0 7.41K 0 936M test 18.4G 21.7T 0 11.6K 0 1.45G test 20.3G 21.7T 0 3.26K 0 407M test 20.3G 21.7T 0 7.66K 0 977M test 22.5G 21.7T 0 7.62K 0 963M test 22.5G 21.7T 0 6.86K 0 875M test 24.5G 21.7T 0 8.41K 0 1.04G test 24.5G 21.7T 0 10.4K 0 1.30G test 26.5G 21.7T 1 2.19K 127K 270M test 26.5G 21.7T 0 0 0 0 test 26.5G 21.7T 0 4.56K 0 584M test 28.5G 21.7T 0 11.5K 0 1.42G [...] bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536 65536+0 records in 65536+0 records out real 1:09.541 user 0.072 sys 53.421 bash-3.2# Looks slightly less jumpy but the total real time is about the same so average throughput is actually the same (about 1GB/s).>> Reading with one dd produces steady throghput but I''m disapointed with >> actual performance: >>RB> Again, probably cpu bound. What''s "ptime dd..." saying ? You were right here. Reading with single dd seems to be cpu bound. However multiple streams for reading do not seem to increase performance considerably. Nevertheless the main issu is jumpy writing... -- Best regards, Robert Milkowski mailto:milek at task.gda.pl http://milek.blogspot.com
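A quick way to settle the "is a single dd cpu bound?" question is per-thread microstate accounting. A sketch, assuming the dd of interest is the only one running:

  # USR+SYS near 100% for the dd thread means it is CPU bound;
  # significant time under LCK/TFL/DFL points somewhere else
  prstat -mL -p `pgrep -x dd` 1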
Hello Robert, Tuesday, July 1, 2008, 12:01:03 AM, you wrote: RM> Nevertheless the main issu is jumpy writing... I was just wondering how much thruoughput I can get running multiple dd - one per disk drive and what kind of aggregated throughput I would get. So for each out of 48 disks I did: dd if=/dev/zero of=/dev/rdsk/c6t7d0s0 bs=128k& The iostat looks like: bash-3.2# iostat -xnzC 1|egrep " c[0-6]$|devic" [skipped the first output] extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 5308.0 0.0 679418.9 0.1 7.2 0.0 1.4 0 718 c1 0.0 5264.2 0.0 673813.1 0.1 7.2 0.0 1.4 0 720 c2 0.0 4047.6 0.0 518095.1 0.1 7.3 0.0 1.8 0 725 c3 0.0 5340.1 0.0 683532.5 0.1 7.2 0.0 1.3 0 718 c4 0.0 5325.1 0.0 681608.0 0.1 7.1 0.0 1.3 0 714 c5 0.0 4089.3 0.0 523434.0 0.1 7.3 0.0 1.8 0 727 c6 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 5283.1 0.0 676231.2 0.1 7.2 0.0 1.4 0 723 c1 0.0 5215.2 0.0 667549.5 0.1 7.2 0.0 1.4 0 720 c2 0.0 4009.0 0.0 513152.8 0.1 7.3 0.0 1.8 0 725 c3 0.0 5281.9 0.0 676082.5 0.1 7.2 0.0 1.4 0 722 c4 0.0 5316.6 0.0 680520.9 0.1 7.2 0.0 1.4 0 720 c5 0.0 4159.5 0.0 532420.9 0.1 7.3 0.0 1.7 0 726 c6 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 5322.0 0.0 681213.6 0.1 7.2 0.0 1.4 0 720 c1 0.0 5292.9 0.0 677494.0 0.1 7.2 0.0 1.4 0 722 c2 0.0 4051.4 0.0 518573.3 0.1 7.3 0.0 1.8 0 727 c3 0.0 5315.0 0.0 680318.8 0.1 7.2 0.0 1.4 0 721 c4 0.0 5313.1 0.0 680074.3 0.1 7.2 0.0 1.4 0 723 c5 0.0 4184.8 0.0 535648.7 0.1 7.3 0.0 1.7 0 730 c6 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 5296.4 0.0 677940.2 0.1 7.1 0.0 1.3 0 714 c1 0.0 5236.4 0.0 670265.3 0.1 7.2 0.0 1.4 0 720 c2 0.0 4023.5 0.0 515011.5 0.1 7.3 0.0 1.8 0 728 c3 0.0 5291.4 0.0 677300.7 0.1 7.2 0.0 1.4 0 723 c4 0.0 5297.4 0.0 678072.8 0.1 7.2 0.0 1.4 0 720 c5 0.0 4095.6 0.0 524236.0 0.1 7.3 0.0 1.8 0 726 c6 ^C one full output: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 5302.0 0.0 678658.6 0.1 7.2 0.0 1.4 0 722 c1 0.0 664.0 0.0 84992.8 0.0 0.9 0.0 1.4 1 90 c1t0d0 0.0 657.0 0.0 84090.5 0.0 0.9 0.0 1.3 1 89 c1t1d0 0.0 666.0 0.0 85251.4 0.0 0.9 0.0 1.3 1 89 c1t2d0 0.0 662.0 0.0 84735.6 0.0 0.9 0.0 1.4 1 91 c1t3d0 0.0 669.1 0.0 85638.4 0.0 0.9 0.0 1.4 1 92 c1t4d0 0.0 665.0 0.0 85122.9 0.0 0.9 0.0 1.4 1 91 c1t5d0 0.0 652.9 0.0 83575.1 0.0 0.9 0.0 1.4 1 90 c1t6d0 0.0 666.0 0.0 85251.8 0.0 0.9 0.0 1.4 1 91 c1t7d0 0.0 5293.3 0.0 677537.5 0.1 7.3 0.0 1.4 0 725 c2 0.0 660.0 0.0 84481.2 0.0 0.9 0.0 1.4 1 91 c2t0d0 0.0 661.0 0.0 84610.3 0.0 0.9 0.0 1.4 1 90 c2t1d0 0.0 664.0 0.0 84997.4 0.0 0.9 0.0 1.4 1 90 c2t2d0 0.0 662.0 0.0 84739.4 0.0 0.9 0.0 1.4 1 92 c2t3d0 0.0 655.0 0.0 83836.6 0.0 0.9 0.0 1.4 1 89 c2t4d0 0.0 663.1 0.0 84871.3 0.0 0.9 0.0 1.4 1 90 c2t5d0 0.0 663.1 0.0 84871.5 0.0 0.9 0.0 1.4 1 92 c2t6d0 0.0 665.1 0.0 85129.7 0.0 0.9 0.0 1.4 1 92 c2t7d0 0.0 4072.1 0.0 521228.9 0.1 7.3 0.0 1.8 0 728 c3 0.0 506.9 0.0 64879.3 0.0 0.9 0.0 1.8 1 90 c3t0d0 0.0 513.9 0.0 65782.4 0.0 0.9 0.0 1.8 1 92 c3t1d0 0.0 511.9 0.0 65524.4 0.0 0.9 0.0 1.8 1 91 c3t2d0 0.0 505.9 0.0 64750.5 0.0 0.9 0.0 1.8 1 91 c3t3d0 0.0 502.8 0.0 64363.6 0.0 0.9 0.0 1.8 1 90 c3t4d0 0.0 506.9 0.0 64879.6 0.0 0.9 0.0 1.8 1 91 c3t5d0 0.0 513.9 0.0 65782.6 0.0 0.9 0.0 1.8 1 92 c3t6d0 0.0 509.9 0.0 65266.6 0.0 0.9 0.0 1.8 1 91 c3t7d0 0.0 5298.7 0.0 678232.6 0.1 7.3 0.0 1.4 0 725 c4 0.0 664.1 0.0 85001.4 0.0 0.9 0.0 1.4 1 92 c4t0d0 0.0 662.1 0.0 84743.4 0.0 0.9 0.0 1.4 1 90 c4t1d0 0.0 663.1 0.0 
84872.4 0.0 0.9 0.0 1.4 1 92 c4t2d0
0.0 664.1 0.0 85001.4 0.0 0.9 0.0 1.3 1 88 c4t3d0
0.0 657.1 0.0 84105.4 0.0 0.9 0.0 1.4 1 91 c4t4d0
0.0 658.1 0.0 84234.5 0.0 0.9 0.0 1.4 1 91 c4t5d0
0.0 669.2 0.0 85653.4 0.0 0.9 0.0 1.3 1 90 c4t6d0
0.0 661.1 0.0 84620.5 0.0 0.9 0.0 1.4 1 91 c4t7d0
0.0 5314.1 0.0 680209.2 0.1 7.2 0.0 1.3 0 717 c5
0.0 666.1 0.0 85265.7 0.0 0.9 0.0 1.3 1 89 c5t0d0
0.0 662.1 0.0 84749.8 0.0 0.9 0.0 1.3 1 88 c5t1d0
0.0 660.1 0.0 84491.8 0.0 0.9 0.0 1.3 1 89 c5t2d0
0.0 665.2 0.0 85140.3 0.0 0.9 0.0 1.3 1 89 c5t3d0
0.0 668.2 0.0 85527.3 0.0 0.9 0.0 1.4 1 92 c5t4d0
0.0 666.2 0.0 85269.5 0.0 0.9 0.0 1.3 1 89 c5t5d0
0.0 664.2 0.0 85011.4 0.0 0.9 0.0 1.4 1 91 c5t6d0
0.0 662.1 0.0 84753.5 0.0 0.9 0.0 1.4 1 90 c5t7d0
0.0 4229.8 0.0 541418.9 0.1 7.3 0.0 1.7 0 726 c6
0.0 518.0 0.0 66306.4 0.0 0.9 0.0 1.7 1 89 c6t0d0
0.0 533.1 0.0 68241.7 0.0 0.9 0.0 1.7 1 91 c6t1d0
0.0 531.1 0.0 67983.6 0.0 0.9 0.0 1.7 1 91 c6t2d0
0.0 524.1 0.0 67080.6 0.0 0.9 0.0 1.7 1 90 c6t3d0
0.0 540.2 0.0 69144.7 0.0 0.9 0.0 1.7 1 92 c6t4d0
0.0 525.1 0.0 67209.8 0.0 0.9 0.0 1.7 1 90 c6t5d0
0.0 535.2 0.0 68500.0 0.0 0.9 0.0 1.7 1 92 c6t6d0
0.0 523.1 0.0 66952.1 0.0 0.9 0.0 1.7 1 90 c6t7d0

bash-3.2# bc
scale=4
678658.6+677537.5+521228.9+678232.6+680209.2+541418.9
3777285.7
3777285.7/(1024*1024)
3.6023
bash-3.2#

So it's about 3.6 GB/s of aggregate write throughput at the device level - pretty good :)
Average throughput with one large striped pool using zfs is less than
half of the above performance... :(
And yes, even with multiple dd streams to the same pool.

Additionally, turning checksums off helps:

bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 366G 21.4T 0 10.7K 43.2K 1.33G
test 370G 21.4T 0 14.7K 63.4K 1.82G
test 370G 21.4T 0 22.0K 0 2.69G
test 374G 21.4T 0 12.4K 0 1.54G
test 374G 21.4T 0 23.6K 0 2.91G
test 378G 21.4T 0 12.5K 0 1.53G
test 378G 21.4T 0 17.3K 0 2.13G
test 382G 21.4T 1 16.6K 126K 2.05G
test 382G 21.4T 2 17.7K 190K 2.19G
test 386G 21.4T 0 20.4K 0 2.51G
test 390G 21.4T 11 11.6K 762K 1.44G
test 390G 21.4T 0 28.9K 0 3.55G
test 394G 21.4T 2 12.5K 157K 1.51G
test 398G 21.4T 1 20.0K 127K 2.49G
test 398G 21.4T 0 16.3K 0 2.00G
test 402G 21.4T 4 15.3K 311K 1.90G
test 402G 21.4T 0 21.9K 0 2.70G
test 406G 21.4T 4 9.73K 314K 1.19G
test 406G 21.4T 0 22.7K 0 2.78G
test 410G 21.4T 2 14.4K 131K 1.78G
test 414G 21.3T 0 19.9K 61.4K 2.43G
test 414G 21.3T 0 19.1K 0 2.35G
^C
bash-3.2# zfs set checksum=on test
bash-3.2# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 439G 21.3T 0 11.4K 50.2K 1.41G
test 439G 21.3T 0 5.52K 0 702M
test 439G 21.3T 0 24.6K 0 3.07G
test 443G 21.3T 0 13.7K 0 1.70G
test 447G 21.3T 1 13.1K 123K 1.62G
test 447G 21.3T 0 16.1K 0 2.00G
test 451G 21.3T 1 3.97K 116K 498M
test 451G 21.3T 0 17.5K 0 2.19G
test 455G 21.3T 1 12.4K 66.9K 1.54G
test 455G 21.3T 0 13.0K 0 1.60G
test 459G 21.3T 0 11 0 11.9K
test 459G 21.3T 0 16.8K 0 2.09G
test 463G 21.3T 0 9.34K 0 1.16G
test 467G 21.3T 0 15.4K 0 1.91G
test 467G 21.3T 0 16.3K 0 2.03G
test 471G 21.3T 0 9.67K 0 1.20G
test 475G 21.3T 0 17.3K 0 2.13G
test 475G 21.3T 0 3.71K 0 472M
test 475G 21.3T 0 21.9K 0 2.73G
test 479G 21.3T 0 17.4K 0 2.16G
test 483G 21.3T 0 848 0 96.4M
test 483G 21.3T 0 17.4K 0 2.17G
^C
bash-3.2#
bash-3.2# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 582G 21.2T 0 11.8K 44.4K 1.46G
test 590G 21.2T 1 13.8K 76.5K 1.72G
test 598G 21.2T 1 12.4K 102K 1.54G
test 610G 21.2T 1 14.0K 76.7K 1.73G
test 618G 21.1T 0 12.9K 25.5K 1.59G
test 626G 21.1T 0 14.8K 11.1K 1.83G
test 634G 21.1T 0 14.2K 11.9K 1.76G
test 642G 21.1T 0 12.8K 12.8K 1.59G
test 650G 21.1T 0 12.9K 12.8K 1.60G
^C
bash-3.2# zfs set checksum=off test
bash-3.2# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 669G 21.1T 0 12.0K 43.5K 1.48G
test 681G 21.1T 0 17.7K 25.2K 2.18G
test 693G 21.1T 0 16.0K 12.7K 1.97G
test 701G 21.1T 0 19.4K 25.5K 2.38G
test 713G 21.1T 0 16.6K 12.8K 2.03G
test 725G 21.0T 0 17.8K 24.9K 2.18G
test 737G 21.0T 0 17.2K 12.7K 2.11G
test 745G 21.0T 0 19.0K 38.3K 2.34G
test 757G 21.0T 0 16.9K 12.8K 2.08G
test 769G 21.0T 0 17.6K 50.7K 2.16G
^C
bash-3.2#

So without checksums it is much better, but the write rate is still
jumpy rather than a steady, constant stream - especially at a 1-second
iostat resolution.

--
Best regards,
Robert Milkowski  mailto:milek at task.gda.pl
http://milek.blogspot.com
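A rough way to put a number on the "jumpy" behaviour described above is to post-process the zpool iostat output itself. The sketch below is illustrative only: it assumes a pool named test, the seven-column single-pool layout shown in the listings above (write bandwidth in the seventh field), only K/M/G suffixes in that field, and nawk being available; the first sample is skipped because it reports the since-boot average.

zpool iostat test 1 61 | nawk '
    NR > 4 {                              # skip 3 header lines + the first (cumulative) sample
        v = $7 + 0                        # numeric part, e.g. 1.33 from "1.33G"
        if      ($7 ~ /G$/) v *= 1024     # normalise everything to MB/s
        else if ($7 ~ /K$/) v /= 1024
        else if ($7 !~ /M$/) v /= 1048576 # plain byte counts
        sum += v; sumsq += v * v; n++
    }
    END {
        if (n == 0) exit
        mean = sum / n
        var  = sumsq / n - mean * mean
        if (var < 0) var = 0
        printf("%d samples: mean %.0f MB/s, stddev %.0f MB/s\n", n, mean, sqrt(var))
    }'

The same data sampled at a 5-second interval naturally looks smoother, which is essentially what the zpool iostat 5 runs above show.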
Robert Milkowski writes:
> Hello Roch,
>
> Saturday, June 28, 2008, 11:25:17 AM, you wrote:
>
>
> RB> I suspect, a single dd is cpu bound.
>
> I don't think so.
>

We're nearly so, as you show. More below.

> See below, one with a stripe of 48x disks again. Single dd with 1024k
> block size and 64GB to write.
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 333K 21.7T 1 1 147K 147K
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 0 0 0
> test 333K 21.7T 0 1.60K 0 204M
> test 333K 21.7T 0 20.5K 0 2.55G
> test 4.00G 21.7T 0 9.19K 0 1.13G
> test 4.00G 21.7T 0 0 0 0
> test 4.00G 21.7T 0 1.78K 0 228M
> test 4.00G 21.7T 0 12.5K 0 1.55G
> test 7.99G 21.7T 0 16.2K 0 2.01G
> test 7.99G 21.7T 0 0 0 0
> test 7.99G 21.7T 0 13.4K 0 1.68G
> test 12.0G 21.7T 0 4.31K 0 530M
> test 12.0G 21.7T 0 0 0 0
> test 12.0G 21.7T 0 6.91K 0 882M
> test 12.0G 21.7T 0 21.8K 0 2.72G
> test 16.0G 21.7T 0 839 0 88.4M
> test 16.0G 21.7T 0 0 0 0
> test 16.0G 21.7T 0 4.42K 0 565M
> test 16.0G 21.7T 0 18.5K 0 2.31G
> test 20.0G 21.7T 0 8.87K 0 1.10G
> test 20.0G 21.7T 0 0 0 0
> test 20.0G 21.7T 0 12.2K 0 1.52G
> test 24.0G 21.7T 0 9.28K 0 1.14G
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 0 0 0
> test 24.0G 21.7T 0 14.5K 0 1.81G
> test 28.0G 21.7T 0 10.1K 63.6K 1.25G
> test 28.0G 21.7T 0 0 0 0
> test 28.0G 21.7T 0 10.7K 0 1.34G
> test 32.0G 21.7T 0 13.6K 63.2K 1.69G
> test 32.0G 21.7T 0 0 0 0
> test 32.0G 21.7T 0 0 0 0
> test 32.0G 21.7T 0 11.1K 0 1.39G
> test 36.0G 21.7T 0 19.9K 0 2.48G
> test 36.0G 21.7T 0 0 0 0
> test 36.0G 21.7T 0 0 0 0
> test 36.0G 21.7T 0 17.7K 0 2.21G
> test 40.0G 21.7T 0 5.42K 63.1K 680M
> test 40.0G 21.7T 0 0 0 0
> test 40.0G 21.7T 0 6.62K 0 844M
> test 44.0G 21.7T 1 19.8K 125K 2.46G
> test 44.0G 21.7T 0 0 0 0
> test 44.0G 21.7T 0 0 0 0
> test 44.0G 21.7T 0 18.0K 0 2.24G
> test 47.9G 21.7T 1 13.2K 127K 1.63G
> test 47.9G 21.7T 0 0 0 0
> test 47.9G 21.7T 0 0 0 0
> test 47.9G 21.7T 0 15.6K 0 1.94G
> test 47.9G 21.7T 1 16.1K 126K 1.99G
> test 51.9G 21.7T 0 0 0 0
> test 51.9G 21.7T 0 0 0 0
> test 51.9G 21.7T 0 14.2K 0 1.77G
> test 55.9G 21.7T 0 14.0K 63.2K 1.73G
> test 55.9G 21.7T 0 0 0 0
> test 55.9G 21.7T 0 0 0 0
> test 55.9G 21.7T 0 16.3K 0 2.04G
> test 59.9G 21.7T 0 14.5K 63.2K 1.80G
> test 59.9G 21.7T 0 0 0 0
> test 59.9G 21.7T 0 0 0 0
> test 59.9G 21.7T 0 17.7K 0 2.21G
> test 63.9G 21.7T 0 4.84K 62.6K 603M
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> test 63.9G 21.7T 0 0 0 0
> ^C
> bash-3.2#
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:06.312
> user 0.074
> sys 54.060
> bash-3.2#
>
> Doesn't look like it's CPU bound.
>

So, going by the sys time, we're at 81% of CPU saturation. If you make
this 100% you will still have zeros in the zpool iostat. We might be
waiting on memory pages and a few other locks.
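As a quick cross-check of the "81% of CPU saturation" figure: a single dd is one thread, so it can accumulate at most one CPU-second of sys time per second of wall-clock time. Using the ptime numbers quoted just above (user time, 0.074s, is small enough to ignore) and bc, as elsewhere in this thread:

bash-3.2# echo "scale=4; 54.060 / 66.312" | bc
.8152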
>
>
> Let's try to read the file after zpool export test; zpool import test
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 64.0G 21.7T 15 46 1.22M 1.76M
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 6.64K 0 849M 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.7K 0 1.33G 0
> test 64.0G 21.7T 9.91K 0 1.24G 0
> test 64.0G 21.7T 10.1K 0 1.27G 0
> test 64.0G 21.7T 10.7K 0 1.33G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 9.16K 0 1.14G 0
> test 64.0G 21.7T 1.98K 0 253M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.98K 0 253M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.49K 0 191M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.46K 0 186M 0
> test 64.0G 21.7T 2.01K 0 258M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 1.97K 0 253M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 1.98K 0 254M 0
> test 64.0G 21.7T 2.42K 0 310M 0
> test 64.0G 21.7T 1.78K 0 228M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.67K 0 214M 0
> test 64.0G 21.7T 1.80K 0 230M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 1.24K 0 159M 0
> test 64.0G 21.7T 2.47K 0 316M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.73K 0 221M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.49K 0 190M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.81K 0 232M 0
> test 64.0G 21.7T 1.90K 0 243M 0
> test 64.0G 21.7T 2.48K 0 317M 0
> test 64.0G 21.7T 1.49K 0 191M 0
> test 64.0G 21.7T 2.47K 0 317M 0
> test 64.0G 21.7T 1.99K 0 254M 0
> test 64.0G 21.7T 1.97K 0 253M 0
> test 64.0G 21.7T 1.49K 0 190M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.82K 0 232M 0
> test 64.0G 21.7T 2.15K 0 275M 0
> test 64.0G 21.7T 2.22K 0 285M 0
> test 64.0G 21.7T 1.73K 0 222M 0
> test 64.0G 21.7T 2.23K 0 286M 0
> test 64.0G 21.7T 1.90K 0 244M 0
> test 64.0G 21.7T 1.81K 0 231M 0
> test 64.0G 21.7T 2.23K 0 285M 0
> test 64.0G 21.7T 1.97K 0 252M 0
> test 64.0G 21.7T 2.00K 0 255M 0
> test 64.0G 21.7T 8.42K 0 1.05G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.23K 0 1.15G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.0K 0 1.25G 0
> test 64.0G 21.7T 9.55K 0 1.19G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 10.0K 0 1.25G 0
> test 64.0G 21.7T 9.91K 0 1.24G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 9.24K 0 1.15G 0
> test 64.0G 21.7T 10.1K 0 1.26G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 8.54K 0 1.07G 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> ^C
>
> bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
> 65536+0 records in
> 65536+0 records out
>
> real 1:36.732
> user 0.046
> sys 48.069
> bash-3.2#
>
>
> Well, that drop for several dozen seconds was interesting...
> Let's run it again without export/import:
>
> bash-3.2# zpool iostat 1
> capacity operations bandwidth
> pool used avail read write read write
> ---------- ----- ----- ----- ----- ----- -----
> test 64.0G 21.7T 3.00K 6 384M 271K
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 2.58K 0 330M 0
> test 64.0G 21.7T 6.02K 0 771M 0
> test 64.0G 21.7T 8.37K 0 1.05G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.64K 0 1.20G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 9.65K 0 1.21G 0
> test 64.0G 21.7T 9.84K 0 1.23G 0
> test 64.0G 21.7T 9.22K 0 1.15G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.7K 0 1.34G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.7K 0 1.34G 0
> test 64.0G 21.7T 10.5K 0 1.32G 0
> test 64.0G 21.7T 10.6K 0 1.32G 0
> test 64.0G 21.7T 10.8K 0 1.34G 0
> test 64.0G 21.7T 10.4K 0 1.29G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 9.15K 0 1.14G 0
> test 64.0G 21.7T 10.8K 0 1.35G 0
> test 64.0G 21.7T 9.76K 0 1.22G 0
> test 64.0G 21.7T 8.67K 0 1.08G 0
> test 64.0G 21.7T 10.8K 0 1.36G 0
> test 64.0G 21.7T 10.9K 0 1.36G 0
> test 64.0G 21.7T 10.3K 0 1.28G 0
> test 64.0G 21.7T 9.76K 0 1.22G 0
> test 64.0G 21.7T 10.5K 0 1.31G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 9.23K 0 1.15G 0
> test 64.0G 21.7T 9.63K 0 1.20G 0
> test 64.0G 21.7T 9.79K 0 1.22G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.4K 0 1.30G 0
> test 64.0G 21.7T 10.3K 0 1.29G 0
> test 64.0G 21.7T 10.2K 0 1.28G 0
> test 64.0G 21.7T 10.6K 0 1.33G 0
> test 64.0G 21.7T 10.8K 0 1.35G 0
> test 64.0G 21.7T 10.5K 0 1.32G 0
> test 64.0G 21.7T 11.0K 0 1.37G 0
> test 64.0G 21.7T 10.2K 0 1.27G 0
> test 64.0G 21.7T 9.69K 0 1.21G 0
> test 64.0G 21.7T 6.07K 0 777M 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> test 64.0G 21.7T 0 0 0 0
> ^C
> bash-3.2#
>
> bash-3.2# ptime dd if=/test/q1 of=/dev/null bs=1024k
> 65536+0 records in
> 65536+0 records out
>
> real 50.521
> user 0.043
> sys 48.971
> bash-3.2#
>
> Now it looks like reading from the pool using a single dd is actually
> CPU bound.
>
> Reading the same file again and again does produce, more or less,
> consistent timing. However, every time I export/import the pool there
> is that drop in throughput during the first read and the total time
> increases to almost 100 seconds... some meta-data? (of course there
> are no errors of any sort, etc.)
>

That might fall in either of these buckets:

        6412053 zfetch needs some love
        6579975 dnode_new_blkid should check before it locks

>
>
>
>
> >> Reducing zfs_txg_synctime to 1 helps a little bit but still it's not
> >> an even stream of data.
> >>
> >> If I start 3 dd streams at the same time then it is slightly better
> >> (zfs_txg_synctime set back to 5) but still very jumpy.
> >>
>
> RB> Try zfs_txg_synctime to 10; that reduces the txg overhead.
>
>
You need multiple dd; we're basically CPU bound here. With multiple dd
and zfs_txg_synctime at 10 you will get more write throughput. The
drops you will see then will correspond to the metadata-update phase of
the transaction group. But the drops below correspond to the txg being
written out (memory to disk) faster than dd is able to fill memory
(some of that speed governed by suboptimal locking).

-r

> Doesn't help...
>
> [...]
> test 13.6G 21.7T 0 0 0 0
> test 13.6G 21.7T 0 8.46K 0 1.05G
> test 17.6G 21.7T 0 19.3K 0 2.40G
> test 17.6G 21.7T 0 0 0 0
> test 17.6G 21.7T 0 0 0 0
> test 17.6G 21.7T 0 8.04K 0 1022M
> test 17.6G 21.7T 0 20.2K 0 2.51G
> test 21.6G 21.7T 0 76 0 249K
> test 21.6G 21.7T 0 0 0 0
> test 21.6G 21.7T 0 0 0 0
> test 21.6G 21.7T 0 10.1K 0 1.25G
> test 25.6G 21.7T 0 18.6K 0 2.31G
> test 25.6G 21.7T 0 0 0 0
> test 25.6G 21.7T 0 0 0 0
> test 25.6G 21.7T 0 6.34K 0 810M
> test 25.6G 21.7T 0 19.9K 0 2.48G
> test 29.6G 21.7T 0 88 63.2K 354K
> [...]
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:10.074
> user 0.074
> sys 52.250
> bash-3.2#
>
>
> Increasing it even further (up to 32s) doesn't help either.
>
> However lowering it to 1s gives:
>
> [...]
> test 2.43G 21.7T 0 8.62K 0 1.07G
> test 4.46G 21.7T 0 7.23K 0 912M
> test 4.46G 21.7T 0 624 0 77.9M
> test 6.66G 21.7T 0 10.7K 0 1.33G
> test 6.66G 21.7T 0 6.66K 0 850M
> test 8.86G 21.7T 0 10.6K 0 1.31G
> test 8.86G 21.7T 0 1.96K 0 251M
> test 11.2G 21.7T 0 16.5K 0 2.04G
> test 11.2G 21.7T 0 0 0 0
> test 11.2G 21.7T 0 18.6K 0 2.31G
> test 13.5G 21.7T 0 11 0 11.9K
> test 13.5G 21.7T 0 2.60K 0 332M
> test 13.5G 21.7T 0 19.1K 0 2.37G
> test 16.3G 21.7T 0 11 0 11.9K
> test 16.3G 21.7T 0 9.61K 0 1.20G
> test 18.4G 21.7T 0 7.41K 0 936M
> test 18.4G 21.7T 0 11.6K 0 1.45G
> test 20.3G 21.7T 0 3.26K 0 407M
> test 20.3G 21.7T 0 7.66K 0 977M
> test 22.5G 21.7T 0 7.62K 0 963M
> test 22.5G 21.7T 0 6.86K 0 875M
> test 24.5G 21.7T 0 8.41K 0 1.04G
> test 24.5G 21.7T 0 10.4K 0 1.30G
> test 26.5G 21.7T 1 2.19K 127K 270M
> test 26.5G 21.7T 0 0 0 0
> test 26.5G 21.7T 0 4.56K 0 584M
> test 28.5G 21.7T 0 11.5K 0 1.42G
> [...]
>
> bash-3.2# ptime dd if=/dev/zero of=/test/q1 bs=1024k count=65536
> 65536+0 records in
> 65536+0 records out
>
> real 1:09.541
> user 0.072
> sys 53.421
> bash-3.2#
>
>
>
> It looks slightly less jumpy, but the total real time is about the
> same, so average throughput is actually the same (about 1GB/s).
>
>
>
>
> >> Reading with one dd produces steady throughput but I'm disappointed
> >> with actual performance:
> >>
>
> RB> Again, probably cpu bound. What's "ptime dd..." saying ?
>
> You were right here. Reading with a single dd seems to be CPU bound.
> However, multiple streams for reading do not seem to increase
> performance considerably.
>
> Nevertheless, the main issue is jumpy writing...
>
>
>
> --
> Best regards,
> Robert Milkowski  mailto:milek at task.gda.pl
> http://milek.blogspot.com
>
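For anyone who wants to reproduce the "multiple dd" experiment Roch recommends, here is a minimal sketch. The file names, the stream count (4) and the per-stream size (16384 x 1 MB, i.e. 64 GB in total, matching the single-stream runs above) are made up for illustration; the number to compare against the single-stream result is the real time reported by ptime.

bash-3.2# ptime bash -c 'for i in 1 2 3 4; do dd if=/dev/zero of=/test/q$i bs=1024k count=16384 & done; wait'

As for the zfs_txg_synctime experiments discussed above: on OpenSolaris builds of that era the value (in seconds) could, if memory serves, be changed on a live system with mdb, roughly as below. Treat this as an assumption and verify that the symbol exists on your release before poking it.

bash-3.2# echo zfs_txg_synctime/W0t10 | mdb -kw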