Christoph Biardzki wrote:
> a couple of weeks ago I reported low metadata performance on my
> two-node setup: on the client it was very low and on the server (with
> lustre locally mounted) quite good.
>
> I localized the problem today: it only appears with zero-config
>
> Is there something I'm possibly missing or could this be a bug?

This is definitely a bug. Although we have seen on-and-off the low metadata performance that you reported earlier, this is new and interesting information, thank you. Someone here will gather this all into a bug report for tracking and resolving.

Thanks--
-Phil
Hello,

a couple of weeks ago I reported low metadata performance on my two-node setup: on the client it was very low and on the server (with lustre locally mounted) quite good.

I localized the problem today: it only appears with zero-config. When I use an /etc/fstab entry like

  nfs8:/mds1/nfs7 /mnt/lustre lustre nettype=tcp,noauto 0 0

and issue a "mount /mnt/lustre" command, the file system mounts correctly but metadata performance is very low. When I use lconf directly on the corresponding xml config file, metadata speed is very good. This behaviour can be reproduced.

Is there something I'm possibly missing or could this be a bug?

Greetings from Munich,
Christoph
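For concreteness, the two ways of bringing up the client that I am comparing look roughly like this (a sketch only; the XML file name "config.xml" and the node name "client" are placeholders for the real config on this cluster, and the exact lconf invocation may differ by Lustre version):

  # zero-config mount, driven by the /etc/fstab entry shown above
  mount /mnt/lustre

  # explicit client setup via lconf and the XML config
  # ("config.xml" and "client" are placeholders, not the actual names)
  lconf --node client config.xml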
Brent M. Clements
2006-May-19 07:36 UTC
[Lustre-discuss] equation for # clients versus # ost's.
Hi, we were wondering: how many clients can a single OST support if all clients were performing I/O at the same time?

Thanks,
Brent Clements
Linux Technology Specialist
Information Technology
Rice University
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] equation for # clients versus # ost's.
On Wed, 2004-05-26 at 12:56, Brent M. Clements wrote:
> Hi we were wondering, how many clients can a single ost support if all
> clients were performing i/o at the same time?

We have run tests in which some 1200 nodes (the maximum hardware available to us today) are all writing to the same OST. It's not very fast -- each node gets only 1/1200th of the OSS's bandwidth -- but it does work. We make sure that things like the incoming request buffers on the server are large enough to handle this kind of load.

Today's 'timeout' detection does not take into account this extremely heavy load, so it's important to limit the amount of dirty cached data on each client (the default 4MB is usually fine on a fast interconnect). A client must write out any dirty data covered by a lock before it can give the lock up, and if it can't do that in about a minute, it runs the risk of being evicted by the server for being unresponsive.

At some point we will adapt these timeout mechanisms to be more flexible (a design has been written). It does not yet seem to be a showstopper, but it might well be when we have 10,000 nodes to play with.

Thanks--
-Phil
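If you want to check or lower that per-client dirty limit, it is exposed as a max_dirty_mb tunable for each OSC under /proc on the client (a sketch only; the exact /proc layout varies between Lustre versions, so check what your tree actually provides):

  # show the current dirty-data limit (in MB) for each OSC on a client;
  # the path below is typical but may differ in your Lustre version
  cat /proc/fs/lustre/osc/*/max_dirty_mb

  # reduce it to 4MB per OSC if clients risk eviction under very heavy load
  for f in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 4 > $f; done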