Hi,

While testing the Lustre DMU-OSS with a write-intensive workload, I have seen a performance issue where IOs were being sent to disk in 512-byte sizes, even though we are currently doing 4 KB writes per transaction.

I have noticed that vdev_queue.c is not able to aggregate IOs, perhaps because vdev_file_io_start() is not doing asynchronous I/O. To try to fix this, I added ZIO_STAGE_VDEV_IO_START to the list of async I/O stages, which somewhat improved the number of IO aggregations, but not nearly enough. For some reason, the number of nodes in vq_pending_tree and vq_deadline_tree doesn't go much above 1, even though the disk is always busy.

I have also noticed that the 1 GB file produced by this benchmark had more than 2 million blocks, with an average block size (as reported by zdb -bbc) of roughly 524 bytes, instead of the 128 KB block size I expected. Even manually setting the "recordsize" property to 128 KB (which was already the default) had no effect.

After changing the Lustre DMU code to call dmu_object_alloc() with a blocksize of 128 KB, throughput improved *a lot*.

Strangely (to me, at least), it seems that in ZFS all regular files are created with 512-byte data block sizes, and that the "recordsize" property only affects the maximum write size per transaction in zfs_write(). Is this correct?

Comments and suggestions are welcome :)

Regards,
Ricardo
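
P.S. For reference, the Lustre DMU-side change boils down to passing an explicit block size to dmu_object_alloc(). The sketch below is only illustrative: the function name osd_create_data_object, the object type, the bonus arguments and the tx handling are placeholders rather than the actual OSD code, and I'm assuming we were previously passing a blocksize of 0, which lets the DMU default to 512-byte data blocks.

    #include <sys/dmu.h>

    /*
     * Illustrative sketch only: allocate a data object with an explicit
     * 128 KB data block size instead of letting the DMU pick the default.
     * Object type, bonus type/length and tx handling are placeholders.
     */
    static uint64_t
    osd_create_data_object(objset_t *os, dmu_tx_t *tx)
    {
            /* Previously (assumed): blocksize 0 => 512-byte data blocks. */
            /* return (dmu_object_alloc(os, DMU_OT_PLAIN_FILE_CONTENTS, 0,
                DMU_OT_NONE, 0, tx)); */

            /* Now: request 128 KB data blocks explicitly. */
            return (dmu_object_alloc(os, DMU_OT_PLAIN_FILE_CONTENTS,
                128 << 10, DMU_OT_NONE, 0, tx));
    }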