Andreas Dilger
2006-Nov-17 16:46 UTC
[Lustre-devel] Re: [Arch] scalability study: single client _CONNECTS_ to a very large number of OSS servers
On Nov 16, 2006 11:29 +0800, Niu YaWei wrote:
> 1. Use case identifier: single client _CONNECTS_ to a very large number
>    of OSS servers

Just to clarify, will there be a separate use case for many clients
connecting to a single OSS?

> e. lists, arrays, queues @runtime:
>    - obd array obd_devs, the maximum device count is 8k, so
>      the osc count must be less than 8k, I think it's enough.

There has already been a real test with almost 4000 OSTs, though nothing
in production with more than ~500 OSTs AFAIK. I would still consider this
a scalability issue, and a bug should be filed and block the
scalability-tracking bug. Yu did a good job making the obd_devs[]
_elements_ dynamically allocated; the work to make the array itself grow
should be relatively small. Don't do this with ever-increasing kmallocs,
as that will eventually fail too (likely because of memory fragmentation
before the 128kB kmalloc limit, which is only 16k 8-byte pointers).

>    - The time complexity of qos_add_tgt() is O(N), and it should
>      only happen when MDS connect OSS, so no need to improve it.

Hmm, later you (correctly) say that qos lists are also set up on
clients... The good news is that the existing "UUID HASH" work being done
can help avoid much of the list walking here (at least the "find OSS"
part) by adding the OSS UUID to the hash for fast lookup, though not the
"insert OSS into sorted-by-num-OSTs list" part. The latter can be fixed
by changing the algorithm to avoid searching:

- for each #OST count N {1..MAX_OSTS_PER_OSS}, keep a pointer to one OSS
  with N OSTs on it, and insert AFTER that OSS when adding/moving an OSS.
- when adding a single-OST OSS, insert it at the end of the list; it may
  be set as the reference for N=1.
- if the "reference" OSS for N is incremented, check .next for an OSS
  with the same N and use that as the new "reference" for N, or a NULL
  pointer if none is found. No list move is needed since the OSS is
  already at the boundary, though it may become the new reference OSS
  for N+1.
- a MAX_OSTS_PER_OSS of 16 is safe (generally only 2 or 4 OSTs/OSS, and
  it is only a pointer so array can be reallocated if this is exceeded)

This will keep the list sorting O(1), and benefits from the fact that we
know there will be a very small range of N (essentially implementing this
as a bunch of buckets instead of a list). It will need some special
handling to keep the references in sync when OSTs are removed from the
OSS.

Can you please file a separate bug on this, also block
scalability-tracking.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
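
For illustration only, the reference-pointer scheme described in the mail
above could be sketched roughly as follows. This is a user-space Python
model, not Lustre code: the names Oss, QosList, add_oss and add_ost are
hypothetical, and the list is assumed to be doubly linked and ordered with
the largest OST count first (so single-OST OSSes sit at the tail). The mail
does not say where to put an OSS that moves to a count with no reference
yet; the sketch assumes it goes directly in front of the reference for its
old count. Handling of OST removal, which the mail notes needs special
care, is left out.

```python
MAX_OSTS_PER_OSS = 16  # value suggested in the mail; the table grows if exceeded


class Oss:
    """One OSS entry in the list (hypothetical stand-in for the kernel struct)."""

    def __init__(self, name):
        self.name = name
        self.n_osts = 0
        self.prev = self.next = None


class QosList:
    """OSS list kept sorted by OST count, largest first, without searching."""

    def __init__(self):
        self.head = Oss("<head>")  # sentinel of a circular doubly linked list
        self.head.prev = self.head.next = self.head
        # ref[n] is the first OSS in the run with n OSTs, or None if none exists
        self.ref = [None] * (MAX_OSTS_PER_OSS + 1)

    @staticmethod
    def _insert_after(pos, oss):
        oss.prev, oss.next = pos, pos.next
        pos.next.prev = oss
        pos.next = oss

    @staticmethod
    def _unlink(oss):
        oss.prev.next = oss.next
        oss.next.prev = oss.prev

    def add_oss(self, oss):
        """Add a new OSS holding its first OST: append at the end of the list."""
        oss.n_osts = 1
        self._insert_after(self.head.prev, oss)
        if self.ref[1] is None:
            self.ref[1] = oss

    def add_ost(self, oss):
        """Account for one more OST on an OSS that is already in the list."""
        old, new = oss.n_osts, oss.n_osts + 1
        if new >= len(self.ref):
            self.ref.append(None)  # only pointers, so growing the table is cheap
        if self.ref[old] is oss:
            # The reference sits at the boundary with the next-higher run, so
            # no list move is needed; hand the reference to .next if it matches.
            nxt = oss.next
            same = nxt is not self.head and nxt.n_osts == old
            self.ref[old] = nxt if same else None
            oss.n_osts = new
            if self.ref[new] is None:
                self.ref[new] = oss
            return
        self._unlink(oss)
        oss.n_osts = new
        if self.ref[new] is not None:
            self._insert_after(self.ref[new], oss)
        else:
            # Assumption (not in the mail): with no reference for the new
            # count, place the OSS directly in front of the old count's run.
            self._insert_after(self.ref[old].prev, oss)
            self.ref[new] = oss

    def counts(self):
        """Return (name, n_osts) pairs in list order, for inspection."""
        out, cur = [], self.head.next
        while cur is not self.head:
            out.append((cur.name, cur.n_osts))
            cur = cur.next
        return out
```

Every step above touches a fixed number of pointers, so adding an OSS or
an OST never walks the list. In a kernel implementation the same shape
would presumably map onto an existing doubly linked list plus a small
array of reference pointers.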