sss@cray.com
2007-Jan-08 18:18 UTC
[Lustre-devel] [Bug 10710] Poor shared file I/O performance on large file system
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10710

I find the previous comment confusing, as Red Storm only has DDN S2A8500 RAIDs. Below is the current I/O configuration, which I had sent earlier by email. If there is a misconfiguration, please identify it. Also note that the data reported in comments #0 and #6 (back in July!) was taken when Red Storm was running XT 1.3.* software, which was based on Lustre 1.4.5 and Linux 2.4.

----

Red Storm has two separate I/O sections, one at each end of the machine. The information below is for each I/O section.

Hardware:
  Red Storm (Cray XT3), 2.4 GHz Opteron dual-core
  Compute clients: 6,720 (small), 19,200 (large), 25,920 (jumbo)
  FC2 HBA (QLogic), either two single-port or one dual-port per OSS
  DDN S2A8500 RAID, FC disks, 8 tiers, ~500 GB/tier

Software:
  Cray XT 1.4.45
  Compute node OS: Catamount, with liblustre client
  SIO node OS: Linux 2.6 (SLES 9), uniprocessor kernel
  Lustre 1.4.6 + patches, Portals LND

Lustre configuration:
  One large file system using 160 OSS nodes (320 OSTs, ~100 TB)
  Two smaller file systems, each using half (80) of the same OSS nodes (160 OSTs each, ~15 TB each)
  3 MDS nodes, one for each file system
  40 DDN S2A8500 couplets (320 ports)
  Each DDN tier partitioned into two LUNs, one data and one journal; each LUN partitioned for the two file systems attached: grande data (320 GB) and scratch[1,2] data (180 GB), grande journal (400 MB) and scratch[1,2] journal (400 MB)
  RAIDs are connected and zoned for failover, but failover is not in the current Lustre configuration
  MDS DDNs configured the same (not optimized for MD operations)

DDN configuration:
  Firmware Version = 5.24
  Block Size = 4096
  Fast AV = OFF
  Cache Coherency = ON
  Cache prefetch = 1
  Cache prefetch ceiling = 65536
  Write Cache = ON
  Cache Size = 1024 ("1024 segments of 2048 Kbytes")
  Cache writeback limit = 75%
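As a cross-check on the capacity figures above, the grande size follows directly from the per-tier partitioning. A minimal sketch in Python (the one-data-LUN-per-tier reading of 40 couplets x 8 tiers = 320 OSTs is my interpretation of the list above):

    # Cross-check of the grande capacity from the per-tier partitioning.
    GRANDE_DATA_GB_PER_LUN = 320   # grande data partition on each tier's data LUN
    OST_COUNT = 320                # 40 couplets x 8 tiers, one data LUN per tier

    total_gb = GRANDE_DATA_GB_PER_LUN * OST_COUNT
    print(f"grande: {OST_COUNT} OSTs x {GRANDE_DATA_GB_PER_LUN} GB "
          f"= {total_gb / 1024:.1f} TB")
    # -> grande: 320 OSTs x 320 GB = 100.0 TB, consistent with "~100 TB" above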
scjody@clusterfs.com
2007-Jan-17 16:00 UTC
[Lustre-devel] [Bug 10710] Poor shared file I/O performance on large file system
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10710

I have collected all the suggestions I could find from lustre-discuss and lustre-devel since the beginning of November on the Lustre wiki. This information will eventually be added to the manual. Please extend it with anything you have learned: https://mail.clusterfs.com/wikis/lustre/StripingGuidelines
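One recurring suggestion for the shared-file case is to stripe the file across all OSTs with a stripe size matched to the 1 MB client RPC. A minimal sketch in Python driving the lfs utility (the positional setstripe form is the Lustre 1.4-era syntax; the path and sizes are illustrative assumptions, not values from Red Storm):

    # Pre-create a shared output file striped over every OST.
    # Lustre 1.4-era positional syntax (assumed here):
    #   lfs setstripe <file> <stripe_size_bytes> <stripe_offset> <stripe_count>
    import subprocess

    SHARED_FILE = "/mnt/grande/run42/checkpoint.dat"  # hypothetical path
    STRIPE_SIZE = 1024 * 1024  # 1 MB, matching the client RPC size
    STRIPE_OFFSET = -1         # let the MDS choose the starting OST
    STRIPE_COUNT = -1          # stripe across all available OSTs

    subprocess.run(
        ["lfs", "setstripe", SHARED_FILE,
         str(STRIPE_SIZE), str(STRIPE_OFFSET), str(STRIPE_COUNT)],
        check=True,
    )

    # Confirm the layout that was actually applied.
    subprocess.run(["lfs", "getstripe", SHARED_FILE], check=True)

Whether a full-width stripe is actually the right choice for a given job is exactly the kind of guidance being collected on the wiki page above.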