Hi Thomas,

LEIBOVICI Thomas wrote:
> You will find in attachment to this mail some benchmarks we made on 
> Linux with pios over ZFS-DMU.
> There are some interesting things about ZFS tuning and some ideas for 
> breaking a bottleneck we identified in DMU.
Those are some very, very interesting benchmarks.
Regarding the "ZFS striping performance", you noticed that increasing 
the number of threads beyond a certain point didn't improve 
performance; in fact, performance actually decreased.
I think that was to be expected: the DMU already has a streamlined I/O 
pipeline that parallelizes I/O on its own, so increasing the number of 
threads far beyond the number of CPUs mostly adds contention, which in 
turn reduces I/O throughput.
However, it is unfortunate that more LUNs didn't improve 
performance. I wonder if you were hitting a CPU wall?
In a previous benchmark I ran, I noticed that PIOS was not getting 
improved throughput with more disks, even though there was still a 
significant percentage of available CPU time.
So I guess we still have opportunities to do good optimizations.
The number of "ZIO threads" is also something that I strongly suspected 
had an impact on throughput, which is why I increased it from 8 to 24 
when I benchmarked the DMU on the Thumper. It is good to have hard data 
that confirms this.
As for the section about parallelizing checksums, this is something 
that the ZFS team appears to have solved already.
You can see it in this code section: 
http://www.wizy.org/mercurial/zfs-lustre/file/49c2aaa6a859/src/lib/libzfscommon/include/sys/zio_impl.h#101
If you take a look at "ZIO_WRITE_COMMON_STAGES", you will notice 
that just before the "checksum generate" stage there is an "issue 
async" stage. This "issue async" stage basically consists of 
dispatching the I/O (ZIO) to the ZIO thread pool, which effectively 
causes the checksums to be computed in parallel. The I/O dependencies 
are automatically tracked by the ZIO pipeline.
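
To illustrate the idea, here is a rough sketch in Python. It is not the 
actual ZFS code: the stage functions, the ISSUE_ASYNC marker, 
submit_write() and write_all() are all hypothetical names, and CRC32 
merely stands in for the real checksum. It only shows how a stage placed 
before "checksum generate" can hand the rest of the pipeline to a thread 
pool; it does not model the dependency tracking.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stages, loosely modelled on the write stages discussed above.
def stage_open(io):
    io["trace"].append("open")

def stage_checksum_generate(io):
    # Runs on whichever thread is executing the pipeline at this point.
    io["checksum"] = zlib.crc32(io["data"])
    io["checksum_thread"] = threading.current_thread().name
    io["trace"].append("checksum_generate")

def stage_done(io):
    io["trace"].append("done")

# Marker stage: everything after it runs on a pool thread.
ISSUE_ASYNC = object()

WRITE_STAGES = [stage_open, ISSUE_ASYNC, stage_checksum_generate, stage_done]

def run_stages(io, stages):
    for stage in stages:
        stage(io)
    return io

def submit_write(pool, data):
    """Run the stages before ISSUE_ASYNC inline, then dispatch the rest."""
    io = {"data": data, "trace": []}
    split = WRITE_STAGES.index(ISSUE_ASYNC)
    run_stages(io, WRITE_STAGES[:split])      # caller's thread
    io["trace"].append("issue_async")
    return pool.submit(run_stages, io, WRITE_STAGES[split + 1:])

def write_all(blocks, workers=4):
    """Push every block through the pipeline; return results in input order."""
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="zio") as pool:
        futures = [submit_write(pool, block) for block in blocks]
        return [future.result() for future in futures]

if __name__ == "__main__":
    for result in write_all([bytes([i]) * 4096 for i in range(8)]):
        print(result["checksum_thread"], hex(result["checksum"]))
```

The caller never computes a checksum itself; it only queues the I/O, so 
a single submitting thread can keep several pool threads busy.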
All in all, this was a very good report.
So far we have only done very limited benchmarking and optimization, but 
we are already starting to work on performance improvements. One of the 
tasks of our next development cycle will be doing this kind of analysis 
but, of course, the sooner we see this, the better :)
Great work and thanks for sharing this with us!
Best regards,
Ricardo
-- 
<http://www.sun.com> 	
*Ricardo Manuel Correia*
Lustre Engineering
*Sun Microsystems, Inc.*
Portugal