sss@cray.com
2007-Jan-08 18:18 UTC
[Lustre-devel] [Bug 10710] Poor shared file I/O performance on large file system
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=10710
I find the previous comment confusing, as Red Storm only has DDN S2A8500
RAIDs. Below is the current I/O configuration, which I had sent earlier
by email. If there is a misconfiguration, please identify it.
Also note that the data reported in comments #0 and #6 (back in July!)
was taken when Red Storm was running XT 1.3.* software, which was based
on Lustre 1.4.5 and Linux 2.4.
----
Red Storm has two separate I/O sections, one at each end of the machine.
The information below is for each I/O section.
Hardware:
Red Storm (Cray XT3), 2.4 GHz Opteron dual-core
Compute clients: 6,720 (small), 19,200 (large), 25,920 (jumbo)
FC2 HBA (QLogic), either two single-port or one dual-port per OSS
DDN S2A8500 RAID, FC disks, 8 tiers, ~500 GB/tier
Software:
Cray XT 1.4.45
Compute node OS: Catamount, with liblustre client
SIO node OS: Linux 2.6 (SLES 9), uniprocessor kernel
Lustre 1.4.6 + patches, Portals LND
Lustre configuration:
One large file system using 160 OSS nodes (320 OSTs, ~100 TB)
Two smaller file systems, each using half (80) of the same OSS nodes
(160 OSTs each, ~15 TB each)
3 MDS nodes, one for each file system
40 DDN S2A8500 couplets (320 ports)
Each DDN tier partitioned into two LUNs, one data and one journal (sketched below);
each LUN partitioned for the two file systems attached:
grande data (320 GB) and scratch[1,2] data (180 GB),
grande journal (400 MB) and scratch[1,2] journal (400 MB)
RAIDs are connected and zoned for failover, but failover is not
enabled in the current Lustre configuration
MDS DDNs configured the same (not optimized for MD operations)
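For reference, the data/journal split above is the standard ldiskfs
external-journal layout. A minimal sketch of how one such data/journal
LUN pair could be formatted (device names are hypothetical; the actual
LUNs here were of course set up through the normal Lustre 1.4
configuration tools):

    # Hypothetical devices: /dev/sdb1 = 400 MB journal LUN,
    # /dev/sda1 = data LUN. Block size matches the DDN (4096).
    mke2fs -O journal_dev -b 4096 /dev/sdb1
    mke2fs -j -J device=/dev/sdb1 -b 4096 /dev/sda1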
DDN configuration:
Firmware Version = 5.24
Block Size = 4096
Fast AV = OFF
Cache Coherency = ON
Cache prefetch = 1
Cache prefetch ceiling = 65536
Write Cache = ON
Cache Size = 1024 ("1024 segments of 2048 Kbytes")
Cache writeback limit = 75%
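As a cross-check against the cache geometry above ("1024 segments of
2048 Kbytes"), the client-side RPC and dirty-cache tunables can be
read out of /proc. A sketch, assuming the Lustre 1.4-era /proc layout
(the OSC directory names vary per file system); note that a 1 MB bulk
RPC is half of one 2048 KB DDN cache segment:

    # Sketch only; exact /proc paths differ across Lustre versions.
    for f in /proc/fs/lustre/osc/*/max_rpcs_in_flight \
             /proc/fs/lustre/osc/*/max_dirty_mb; do
        echo "$f = $(cat $f)"
    done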
scjody@clusterfs.com
2007-Jan-17 16:00 UTC
[Lustre-devel] [Bug 10710] Poor shared file I/O performance on large file system
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=10710

I have collected all the suggestions I could find from lustre-discuss
and lustre-devel since the beginning of November on the Lustre wiki.
This information will eventually be added to the manual. Please extend
it with anything you have learned:
https://mail.clusterfs.com/wikis/lustre/StripingGuidelines
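For the shared-file case this bug is about, the usual first experiment
is to widen the striping and enlarge the stripe size. A sketch using
the Lustre 1.4 positional lfs syntax; the mount point and values are
illustrative, not a recommendation taken from the wiki page:

    # 4 MB stripe size, default starting OST (-1), stripe across
    # all OSTs (-1). Path is hypothetical.
    lfs setstripe /mnt/grande/shared_file 4194304 -1 -1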