Jordan Mendler
2009-Apr-02 22:17 UTC
[Lustre-discuss] OSS Cache Size for read optimization
Hi all,

I deployed Lustre on some legacy hardware, and as a result my (4) OSSes each have 32GB of RAM. Our workflow is such that we are frequently rereading the same 15GB indexes over and over again from Lustre (they are striped across all OSSes) by all nodes on our cluster. As such, is there any way to increase the amount of memory that either Lustre or the Linux kernel uses to cache files read from disk by the OSSes? This would allow much of the indexes to be served from memory on the OSSes rather than from disk.

I see a *lustre.memused_max = 48140176* parameter, but am not sure what it does. If it matters, my setup is such that each of the 4 OSSes serves 1 OST that consists of a software RAID10 across 4 SATA disks internal to that OSS.

Any other suggestions for tuning for fast reads of large files would also be greatly appreciated.

Thanks so much,
Jordan
Jordan Mendler wrote:
> Hi all,
>
> I deployed Lustre on some legacy hardware, and as a result my (4) OSSes
> each have 32GB of RAM. Our workflow is such that we are frequently
> rereading the same 15GB indexes over and over again from Lustre (they
> are striped across all OSSes) by all nodes on our cluster. As such, is
> there any way to increase the amount of memory that either Lustre or the
> Linux kernel uses to cache files read from disk by the OSSes? This would
> allow much of the indexes to be served from memory on the OSSes rather
> than from disk.
>
> I see a *lustre.memused_max = 48140176* parameter, but am not sure what
> it does. If it matters, my setup is such that each of the 4 OSSes
> serves 1 OST that consists of a software RAID10 across 4 SATA disks
> internal to that OSS.
>
> Any other suggestions for tuning for fast reads of large files would
> also be greatly appreciated.

Current Lustre does not cache on OSTs at all; all IO is direct. Future Lustre releases will provide an OST cache.

For now, you can increase the amount of data cached on clients, which might help a little. Client caching is set with /proc/fs/lustre/osc/*/max_dirty_mb.

cliffw
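For illustration, a minimal sketch of inspecting and raising that client-side tunable (the proc path is the one named above; the 256 MB value is only an example, not a recommendation, and must be run on each Lustre client):

    # Show the current per-OSC dirty cache limit, in MB
    cat /proc/fs/lustre/osc/*/max_dirty_mb

    # Raise the limit to 256 MB for every OSC on this client
    for f in /proc/fs/lustre/osc/*/max_dirty_mb; do
        echo 256 > "$f"
    done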
Lundgren, Andrew
2009-Apr-03 18:52 UTC
[Lustre-discuss] OSS Cache Size for read optimization
The parameter is called "dirty". Is that write cache, or is it read-write?

> Current Lustre does not cache on OSTs at all; all IO is direct.
> Future Lustre releases will provide an OST cache.
>
> For now, you can increase the amount of data cached on clients, which
> might help a little. Client caching is set with
> /proc/fs/lustre/osc/*/max_dirty_mb.
Yes, it is for dirty cache limiting on a per-OSC basis. There is also /proc/fs/lustre/llite/*/max_cached_mb, which regulates how much cached data a client can hold (the default is 3/4 of RAM).

On Apr 3, 2009, at 2:52 PM, Lundgren, Andrew wrote:
> The parameter is called "dirty". Is that write cache, or is it read-write?
>
>> Current Lustre does not cache on OSTs at all; all IO is direct.
>> Future Lustre releases will provide an OST cache.
>>
>> For now, you can increase the amount of data cached on clients, which
>> might help a little. Client caching is set with
>> /proc/fs/lustre/osc/*/max_dirty_mb.
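A quick sketch of checking both client-side knobs together (paths as named above; the 16384 MB figure is only an illustrative cap for a client with plenty of RAM, not a tested setting):

    # Read cache ceiling (llite) and per-OSC dirty limit on this client
    cat /proc/fs/lustre/llite/*/max_cached_mb
    cat /proc/fs/lustre/osc/*/max_dirty_mb

    # Example: cap the client read cache at 16 GB (value is in MB)
    for f in /proc/fs/lustre/llite/*/max_cached_mb; do
        echo 16384 > "$f"
    done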
Andreas Dilger
2009-Apr-06 06:09 UTC
[Lustre-discuss] OSS Cache Size for read optimization
On Apr 02, 2009 15:17 -0700, Jordan Mendler wrote:
> I deployed Lustre on some legacy hardware, and as a result my (4) OSSes each
> have 32GB of RAM. Our workflow is such that we are frequently rereading the
> same 15GB indexes over and over again from Lustre (they are striped across
> all OSSes) by all nodes on our cluster. As such, is there any way to
> increase the amount of memory that either Lustre or the Linux kernel uses to
> cache files read from disk by the OSSes? This would allow much of the
> indexes to be served from memory on the OSSes rather than from disk.

With Lustre 1.8.0 (in late release testing; you could grab v1_8_0_RC5 from CVS for testing[*]) there is OSS server-side caching of read and just-written data. There is a tunable, /proc/fs/lustre/obdfilter/*/readcache_max_filesize, that limits the maximum size of file cached on the OSS, so that small files can be cached while large files do not wipe out the read cache.

Set readcache_max_filesize just large enough to hold your index files (which are hopefully not too large individually) to maximize your cache retention. While the cache eviction is LRU, at high IO rates your working set could still be evicted from RAM if too many other files fall within the cache file size limit.

[*] Note that v1_8_0_RC5 is missing the fix for bug 18659, so it is not at all safe to use on the MDS; v1_8_0_RC6 will have that fix, as does b1_8.

> I see a *lustre.memused_max = 48140176* parameter, but am not sure what it
> does. If it matters, my setup is such that each of the 4 OSSes serves 1 OST
> that consists of a software RAID10 across 4 SATA disks internal to that OSS.

That is just reporting the total amount of RAM ever used by the Lustre code itself (48MB in this case), and has nothing to do with the cached data.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
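As an illustration of the OSS-side tunable described above (a sketch only, assuming the 1.8.x obdfilter proc layout; the 20 GB threshold is just an example chosen to keep 15GB index files eligible for the cache):

    # On each OSS running Lustre 1.8.x: show the current read-cache file size limit
    cat /proc/fs/lustre/obdfilter/*/readcache_max_filesize

    # Example: only cache files up to 20 GB (value written in bytes)
    for f in /proc/fs/lustre/obdfilter/*/readcache_max_filesize; do
        echo $((20 * 1024 * 1024 * 1024)) > "$f"
    done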