Our temporary filesystem was promoted by events into semi-production, as frequently happens, and is being overworked. We have too few servers, too many targets and too many jobs. Typically, the clients think the filesystem is unmounted. The servers log messages about many clients being evicted due to lock blocking callback and lock glimpse callback timeouts. Bulk PUT timeouts occur less frequently. The servers generally have less than 100MB free. New hardware that will support the workload is on the way, but are there some changes I can make now to 1.6.6 that would increase reliability, even at the expense of performance?

-Don
On Tue, 2009-09-01 at 11:34 -0700, Don Thorp wrote:
> New hardware that will support the workload is on the way, but are
> there some changes I can make now to 1.6.6 that would increase
> reliability, even at the expense of performance?

With what you have given us to work with, my first suggestion would be to increase your obd_timeout. You should not need to go higher than about 300 seconds, but should try to choose a value only high enough to stop the callback timeouts. Higher obd_timeout values mean longer recoveries.

Additionally, you might look into tuning the number of OST threads on your OSSes if you are driving your disks too hard. OST thread count, like obd_timeout, should be just high enough, but not more, to reach maximum throughput. If you have not baselined your hardware with the iokit, you can simply start dropping the OST thread counts until you find that you are impacting throughput. It's a bit more trial and error than using the iokit, but if you are in production already, it's probably the best you can do.

b.
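The trial-and-error approach described above (drop the OST thread count until throughput suffers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `measure_throughput` is a hypothetical callback standing in for whatever benchmark run you trust at a given thread count, and the step size, the 5% tolerance and the sample numbers are all made up rather than taken from Lustre or the iokit.

```python
from typing import Callable


def lowest_safe_thread_count(
    start: int,
    measure_throughput: Callable[[int], float],
    tolerance: float = 0.05,
    step: int = 8,
    floor: int = 1,
) -> int:
    """Step the thread count down from `start` and return the lowest
    value whose throughput stays within `tolerance` of the baseline.

    `measure_throughput` is a hypothetical stand-in for a real benchmark
    run at the given thread count; it is not a Lustre API.
    """
    baseline = measure_throughput(start)
    best = start
    candidate = start - step
    while candidate >= floor:
        # Stop as soon as throughput drops noticeably below the baseline.
        if measure_throughput(candidate) < baseline * (1.0 - tolerance):
            break
        best = candidate
        candidate -= step
    return best


# Made-up measurements (MB/s per thread count), for illustration only.
fake_results = {64: 410.0, 56: 412.0, 48: 409.0, 40: 405.0, 32: 350.0, 24: 280.0}
chosen = lowest_safe_thread_count(64, lambda n: fake_results[n])
print(chosen)
```

With these invented numbers the search settles on 40 threads, the last setting before throughput falls by more than 5%. In practice each measurement would be a separate benchmark run, so a coarse step keeps the number of runs small.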
On Sep 01, 2009 15:13 -0400, Brian J. Murrell wrote:
> On Tue, 2009-09-01 at 11:34 -0700, Don Thorp wrote:
> > New hardware that will support the workload is on the way, but are
> > there some changes I can make now to 1.6.6 that would increase
> > reliability, even at the expense of performance?
>
> With what you have given us to work with, my first suggestion would be
> to increase your obd_timeout. You should not need to go higher than
> about 300 seconds, but should try to choose a value only high enough to
> stop the callback timeouts. Higher obd_timeout values mean longer
> recoveries.
>
> Additionally, you might look into tuning the number of OST threads on
> your OSSes if you are driving your disks too hard. OST thread count,
> like obd_timeout, should be just high enough, but not more, to reach
> maximum throughput. If you have not baselined your hardware with the
> iokit, you can simply start dropping the OST thread counts until you
> find that you are impacting throughput. It's a bit more trial and error
> than using the iokit, but if you are in production already, it's
> probably the best you can do.

Note that in 1.6 changing the OSS thread count is not dynamic; it needs a server restart. In 1.8.1 (IIRC) it is possible to increase the thread count at runtime, though it can't yet be reduced.

As a completely rough estimate, if you have 4 OSS threads per spindle, that wouldn't be a terrible first approximation.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
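The rule of thumb above is simple arithmetic, shown here as a tiny Python helper. The function name and the example spindle count are hypothetical; only the figure of 4 threads per spindle comes from the message, and it is explicitly a rough starting point rather than a tuned value.

```python
def estimate_oss_threads(spindles: int, threads_per_spindle: int = 4) -> int:
    """Rough first guess at an OSS thread count.

    The default of 4 threads per spindle is the rule of thumb quoted in
    the thread; treat the result as a starting point for tuning only.
    """
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return spindles * threads_per_spindle


# Hypothetical OSS with 24 data spindles behind it.
first_guess = estimate_oss_threads(24)
print(first_guess)
```

For the hypothetical 24-spindle server this gives 96 threads as the value to start measuring from.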
On Tue, 2009-09-01 at 17:46 -0600, Andreas Dilger wrote:
> Note that in 1.6 changing the oss thread count is not dynamic, it
> needs a server restart.

Yeah. :-(

> In 1.8.1 (IIRC) it is possible to increase
> the thread count at runtime, though it can't yet be reduced.

Oh, not so bad as 1.6 then.

For me though, the ideal is reached when the tuning is done dynamically. I would imagine an implementation not unlike what TCP's slow start does for the congestion window, i.e. start with a small number of OST threads (or perhaps cache a value from the last run) and add more until performance either drops, or at least does not increase with the addition of threads.

The devil is in the details of course. I'm not quite imagining how "performance" is measured or tracked though.

I think I opened a bug on this a while ago.

b.
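The slow-start idea above could look roughly like the following Python sketch. It is not how Lustre does or did anything: `measure` is a hypothetical callback that sidesteps the open question of how "performance" is tracked, doubling the thread count each round is an assumption borrowed from TCP slow start, and the toy throughput model is invented purely so the example runs.

```python
from typing import Callable


def slow_start_threads(
    measure: Callable[[int], float],
    start: int = 4,
    max_threads: int = 512,
    min_gain: float = 0.02,
) -> int:
    """Grow the thread count until performance stops improving.

    `measure` is a hypothetical stand-in for some performance metric at
    a given thread count. Growth is by doubling (an assumption), and it
    stops when the gain over the previous best is below `min_gain`.
    """
    threads = start
    best_perf = measure(threads)
    while threads * 2 <= max_threads:
        candidate = threads * 2
        perf = measure(candidate)
        # Performance dropped, or did not improve enough: keep the old value.
        if perf <= best_perf * (1.0 + min_gain):
            break
        threads, best_perf = candidate, perf
    return threads


def toy_throughput(threads: int) -> float:
    """Invented model: scales linearly to 32 threads, then degrades."""
    return min(threads, 32) * 10.0 - max(0, threads - 32) * 0.5


settled = slow_start_threads(toy_throughput)
print(settled)
```

With the toy model the loop settles on 32 threads: the step to 64 makes throughput slightly worse, so the previous value is kept. Starting from a cached value from the last run, as suggested above, would just mean passing it as `start`.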