We have a Lustre 1.6.x filesystem: 4 OSSes (3 x4500 and 1 DDN S2A6620). Each OSS has either 4 1GigE interfaces bonded or 1 10GigE interface.

I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and we are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).

This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).

* Is there a recommendation of a better way to do these sorts of jobs? The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.

* Is there a better way to tune? What should I be looking for to tune?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
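[For reference, the striping described above can be set with lfs. A minimal sketch, assuming the option syntax available on 1.6.x; the path is illustrative:]

  # Stripe a new file across all available OSTs (-c -1 means "use every OST");
  # /lustre/scratch/dataset.bin is a made-up example path.
  lfs setstripe -c -1 /lustre/scratch/dataset.bin

  # Verify the resulting layout.
  lfs getstripe /lustre/scratch/dataset.bin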
On 12/10/2010 11:42 AM, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,

1.6 has been dead for well over a year. End of life.

> 4 OSS, 3 x4500 and 1 DDN S2A6620
>
> Each OSS has 4 1GigE interfaces bonded, or 1 10GigE interface.
>
> I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).
>
> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).
>
> * Is there a recommendation of a better way to do these sorts of jobs?

Upgrade to the latest release of Lustre.

> The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.
>
> * Is there a better way to tune?

Yes: upgrade to the code that has all the tuning fixes/enhancements - Lustre 1.8.

> What should I be looking for to tune?

You are wasting your time tuning here. 1.8 supports many things, including cache on the OSTs, which would likely help bunches in your case.

cliffw
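[The 1.8 OSS read cache mentioned above is controlled through lctl on the OSS nodes. A sketch, assuming the 1.8 obdfilter parameter names; the 32G value is an example sized so the whole 16GB file fits in cache:]

  # Read cache is on by default in 1.8; confirm it per OST:
  lctl get_param obdfilter.*.read_cache_enable

  # Raise the largest file size the read cache will hold
  # (the default caps which files are kept in cache):
  lctl set_param obdfilter.*.readcache_max_filesize=32G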
On 2010-12-10, at 12:42, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,
>
> 4 OSS, 3 x4500 and 1 DDN S2A6620
>
> Each OSS has 4 1GigE interfaces bonded, or 1 10GigE interface.
>
> I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file. We striped the file over all the OSTs, and are capped at 500-600MB/s no matter the number of hosts running. IO per OST is around 15-20MB/s (31 OSTs total).

How big is the IO size? Are all the clients both reading and writing this same file? Presumably you see better performance when so many jobs are not running against the filesystem?

> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group is about 900 jobs total).
>
> * Is there a recommendation of a better way to do these sorts of jobs? The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the performance in this case. OSS read cache does not need a client-side upgrade to work, though of course I'd suggest upgrading the clients anyway. 1.8.5 was just released this week...

> * Is there a better way to tune? What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs. It should be reset before the job (echo 0 to each file) so you get stats relevant to that job only.

You can also check iostat on the OSS nodes to see how busy the disks are. They may be imbalanced due to being different hardware, and will only go as fast as the slowest OSTs.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
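[Spelled out, the procedure above looks roughly like this, run as root on each OSS; a sketch:]

  # Zero the per-OST IO histograms before the job starts
  # ("echo 0 to each file"):
  for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
      echo 0 > "$f"
  done

  # ...run the job, then look at the IO size and seek distributions:
  cat /proc/fs/lustre/obdfilter/*/brw_stats

  # Watch per-disk utilization; consistently busier disks behind one
  # OSS would point at the hardware imbalance described above:
  iostat -x 5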