Weikuan Yu
2008-Feb-29 13:26 UTC
[Lustre-discuss] [Fwd: [ofa-general] Announcing the release of MVAPICH 1.0]
Per the announcement from the MVAPICH team, I am pleased to let you know
that the MPI-IO support for Lustre has been integrated into the new
release of MVAPICH, version 1.0.

> - Optimized and high-performance ADIO driver for Lustre
>     - This MPI-IO support is a contribution from Future Technologies
>       Group, Oak Ridge National Laboratory.
>       (http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
>     - Performance graph at:
>       http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
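For those who have not used MPI-IO before, here is a minimal sketch of
how this support can be exercised through the standard MPI-IO interface.
It is an illustration only, not code from the release: the file path and
the striping hint values are placeholders, and the hints are optional.

    /* Collective MPI-IO write to a file on Lustre.
     * The path and hint values below are illustrative only. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        MPI_File fh;
        MPI_Info info;
        MPI_Offset offset;
        int rank, i;
        const int count = 1024 * 1024;     /* ints written per process */
        int *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        buf = (int *) malloc(count * sizeof(int));
        for (i = 0; i < count; i++)
            buf[i] = rank;

        /* Optional striping hints passed down to the ADIO driver */
        MPI_Info_create(&info);
        MPI_Info_set(&info, "striping_factor", "4");
        MPI_Info_set(&info, "striping_unit", "1048576");

        /* Each rank writes one contiguous block at its own offset */
        MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        offset = (MPI_Offset) rank * count * sizeof(int);
        MPI_File_write_at_all(fh, offset, buf, count, MPI_INT,
                              MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Info_free(&info);
        free(buf);
        MPI_Finalize();
        return 0;
    }

The program can be compiled with the mpicc wrapper shipped with MVAPICH
and launched with mpirun_rsh in the usual way.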
Please feel free to try it out and send your comments/questions to this
lustre-discuss list or mvapich-discuss at cse.ohio-state.edu.

Thanks,
--Weikuan

-------- Original Message --------
Subject: [ofa-general] Announcing the release of MVAPICH 1.0
Date: Fri, 29 Feb 2008 00:17:48 -0500 (EST)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: ewg at lists.openfabrics.org, <general at lists.openfabrics.org>

The MVAPICH team is pleased to announce the availability of MVAPICH 1.0
with the following NEW features:

- New Scalable and robust job startup
    - Enhanced and robust mpirun_rsh framework to provide scalable
      launching on multi-thousand core clusters
    - Running time of 'MPI Hello World' program on 1K cores is around
      4 sec and on 32K cores is around 80 sec
    - Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and QLogic
      InfiniPath devices
    - Performance graph at:
      http://mvapich.cse.ohio-state.edu/performance/startup.shtml
    - Enhanced support for SLURM
    - Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and QLogic
      InfiniPath devices

- New OpenFabrics Gen2 Unreliable-Datagram (UD)-based design for
  large-scale InfiniBand clusters (multi-thousand cores)
    - delivers performance and scalability with constant memory
      footprint for communication contexts
    - Only 40MB per process even with 16K processes connected to
      each other
    - Performance graph at:
      http://mvapich.cse.ohio-state.edu/performance/mvapich/ud_memory.shtml
    - zero-copy protocol for large data transfer
    - shared memory communication between cores within a node
    - multi-core optimized collectives (MPI_Bcast, MPI_Barrier,
      MPI_Reduce and MPI_Allreduce)
    - enhanced MPI_Allgather collective

- New features for OpenFabrics Gen2-IB interface
    - enhanced coalescing support with varying degree of coalescing
    - support for ConnectX adapter
    - support for asynchronous progress at both sender and receiver
      to overlap computation and communication
    - multi-core optimized collectives (MPI_Bcast)
    - tuned collectives (MPI_Allgather, MPI_Bcast) based on network
      adapter characteristics
    - Performance graph at:
      http://mvapich.cse.ohio-state.edu/performance/collective.shtml
    - network-level fault tolerance with Automatic Path Migration (APM)
      for tolerating intermittent network failures over InfiniBand

- New Support for QLogic InfiniPath adapters
    - high-performance point-to-point communication
    - optimized collectives (MPI_Bcast and MPI_Barrier) with k-nomial
      algorithms while exploiting multi-core architecture

- Optimized and high-performance ADIO driver for Lustre
    - This MPI-IO support is a contribution from Future Technologies
      Group, Oak Ridge National Laboratory.
      (http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
    - Performance graph at:
      http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml

- Flexible user defined processor affinity for better resource
  utilization on multi-core systems
    - flexible process bindings to cores
    - allows memory-intensive applications to run with a subset of
      cores on each chip for better performance

More details on all features and supported platforms can be obtained by
visiting the following URL:

http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml

MVAPICH 1.0 continues to deliver excellent performance. Sample
performance numbers include:

- with OpenFabrics/Gen2 on EM64T quad-core with PCIe and ConnectX-DDR:
    - 1.51 microsec one-way latency (4 bytes)
    - 1404 MB/sec unidirectional bandwidth
    - 2713 MB/sec bidirectional bandwidth

- with PSM on Opteron with Hypertransport and QLogic-SDR:
    - 1.25 microsec one-way latency (4 bytes)
    - 953 MB/sec unidirectional bandwidth
    - 1891 MB/sec bidirectional bandwidth

Performance numbers for all other platforms, system configurations and
operations can be viewed by visiting the 'Performance' section of the
project's web page.

For downloading MVAPICH 1.0, the associated user guide, and access to
the anonymous SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports and hints for performance tuning, is
welcome. Please post it to the mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

=====================================================================
The MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science, Mellanox,
Intel, Cisco Systems, QLogic, Sun Microsystems and Linux Networx; and
with equipment support from Advanced Clustering, AMD, Appro, Chelsio,
Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox, Microway, NetEffect,
QLogic and Sun Microsystems. Other technology partners include Etnus.
=====================================================================

--
Weikuan Yu <+> 1-865-574-7990
http://ft.ornl.gov/~wyu/