Jody McIntyre
2008-Apr-26 00:18 UTC
[Linux_hpc_swstack] Fwd: sata_mv performance; impact of NCQ
I'm forwarding this here - I just posted it to linux-ide. I can't CC messages to both lists unless this list allows non-subscriber posting. I'll investigate that on Monday, but I wanted to get the message out before the weekend. I've also appended a rough sketch of the patch idea mentioned below.

Cheers,
Jody

----- Forwarded message from Jody McIntyre <scjody at sun.com> -----

From: Jody McIntyre <scjody at sun.com>
To: linux-ide at vger.kernel.org
Subject: sata_mv performance; impact of NCQ

I have completed several performance tests of the Marvell SATA controllers (6x MV88SX6081) in a Sun x4500 (Thumper) system. The complete results are available from:
http://downloads.lustre.org/people/scjody/thumper-kernel-comparison-01.ods

My ultimate goal is to replace mv_sata (the out-of-tree vendor driver) on RHEL 4 with sata_mv on a modern kernel, and to do this I need equivalent or better performance, especially on large (1MB) IOs.

I note that the recent changes to enable NCQ result in a net performance gain for 64K and 128K IOs, but due to a chipset limitation in the 6xxx series (according to commit f273827e2aadcf2f74a7bdc9ad715a1b20ea7dda), max_sectors is now limited, which means we can no longer perform IOs greater than 128K (ENOMEM is returned from an sg write). Large IO performance therefore suffers - for example, 2.6.25 with NCQ support removed performs better on 1MB IOs than anything possible with stock 2.6.25 for many workloads.

Would it be worth re-enabling large IOs on this hardware when NCQ is disabled (using the queue_depth /proc variable)? If so, I'll come up with a patch.

Does anyone know what mv_sata does about NCQ? I see references to NCQ throughout the code, but I don't yet understand it well enough to determine what's going on. mv_sata _does_ support IOs greater than 128K, which suggests that it does not use NCQ, on this hardware at least.

Any advice on areas to explore to improve sata_mv's performance? I imagine I need to understand what mv_sata does differently, and I plan on spending some time reading that code, but I'd appreciate more specific ideas if anyone has them.

Details on the tests I performed:

I used the sgpdd-survey tool, a low-level performance test from the Lustre iokit. It can be downloaded from:
http://downloads.lustre.org/public/tools/lustre-iokit/

The tool performs timed sgp_dd commands using various IO sizes, region counts, and thread counts, and reports aggregate bandwidth. The results were then graphed using a spreadsheet.

Note that the largest thread count measured is not the same on all graphs. For some reason, write() to the sg device returns ENOMEM to some sgp_dd threads when large numbers of threads are run on recent kernels. The problem does not exist with the RHEL 4 kernel. I have not yet investigated why this happens.

Cheers,
Jody

--
Jody McIntyre - Linux Kernel Engineer, Sun HPC

----- End forwarded message -----

--
Jody McIntyre - Linux Kernel Engineer, Sun HPC
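
P.S. For concreteness, here is the rough sketch of the patch idea mentioned above. It is untested, and I have not verified whether ATA_DFLAG_NCQ is actually cleared when NCQ is turned off via queue_depth (or whether the dev_config hook is re-run at that point), so treat the condition as a placeholder for "NCQ actually in use". The idea, against sata_mv's Gen-II dev_config hook in drivers/ata/sata_mv.c:

    static void mv6_dev_config(struct ata_device *adev)
    {
            /*
             * Sketch only: keep the cap required by the Gen-II NCQ
             * limitation, but leave max_sectors alone when NCQ is not
             * in use, so large (>128K) IOs still work with NCQ disabled.
             */
            if ((adev->flags & ATA_DFLAG_NCQ) &&
                adev->max_sectors > ATA_MAX_SECTORS)
                    adev->max_sectors = ATA_MAX_SECTORS;
    }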