Peter J. Braam
2006-Nov-17 10:55 UTC
[Lustre-devel] RE: [Arch] scalability study: single client _CONNECTS_ to a very large number of OSS servers
Hi Niu,

This is a good review of the scalability of connections. But there are some
questions. I have now cc'd lustre-devel to get the discussion in the open.

> -----Original Message-----
> From: arch-bounces@clusterfs.com
> [mailto:arch-bounces@clusterfs.com] On Behalf Of Niu YaWei
> Sent: Wednesday, November 15, 2006 8:29 PM
> To: arch@clusterfs.com
> Cc: beaver@clusterfs.com
> Subject: [Arch] scalability study: single client _CONNECTS_ to a very
> large number of OSS servers
>
> Review form:
>
> 1. Use case identifier: single client _CONNECTS_ to a very large number
>    of OSS servers
>
> 2. Link to architectural information: None
>
> 3. HLD available: YES
>
> 4. Patterns of basic operations:
>    a. RPCs:
>       - One OST_CONNECT RPC for each OST.
>    b. fs/obd other methods:
>       - obd_connect.
>    c. cache: None.
>    d. Lustre & Linux locks: No suspect locks.
>    e. lists, arrays, queues @runtime:
>       - obd array obd_devs: the maximum device count is 8k, so the OSC
>         count must be less than 8k; I think it's enough.

Nope - we want this to be far more scalable than 8K OSCs. Last week we heard
that Evan Felix ran with 4000 OSCs (getting a whopping 130GB/sec read from
Lustre!).

The array needs to go away. I think Nathan is already working on this for
the load simulator, btw.

Hmm, I don't see a server-side consideration of this problem. Am I missing
something?

>       - qos_add_tgt() will search and maintain the lq_oss_list; this
>         list grows as the OSS count grows.
>       - Need to search for the connection in the imp_conn_list, but this
>         list is quite small and will never grow.
>    f. startup data: None.
>
> 5. Scalable use pattern of basic operations:
>    - One client performs mount.
>    - MDS setup.

Is connect also used against the management server?

> 6. Scalability measures:
>    - The number of OST_CONNECT RPCs is N (OST count); since the RPCs are
>      sent asynchronously, it runs in O(1) time.
>    - Unless we are going to build a cluster with more than 8k OSTs, we
>      can't run out of obd_devs.
>    - The time complexity of qos_add_tgt() is O(N), and it should only
>      happen when the MDS connects to an OSS, so no need to improve it.

On the server side, isn't there scanning of the list of existing connections
to see if a connection's UUID is already in the list? Isn't that list O(N)
long? If so, is the scan O(N^2)?

Eeb - can you confirm one more time that connection setup, which is likely
to happen at this point in LNET, has no linear scans?

> 7. Experiment description and findings:
>    - No test for it.

Nathan - will the load simulator do this? I think it could even be used over
the net?

> 8. Recommendations for improvements:
>    - No recommendation on implementation improvements.

A. Kill the array (P2)
B. Fix the searching on the server (P1)

> 9. Non scalable issues encountered, not identified by this process:
>    - The qos lists are useless for the client's lov, but we have to set
>      them up since the MDS and the client use the same lov driver. This
>      needless list maintenance work will burden each client mount; we
>      should avoid it.

Hmm. But this will change in the future when the client has a full WB cache,
so let's leave them in. Does the setup scale well?

- Peter -

> _______________________________________________
> Arch mailing list
> Arch@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/arch
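To make the complexity concern concrete, here is a minimal sketch of the
pattern under discussion: a fixed-size device table with linear lookups.
The names (dev_table, dev_find, MAX_DEVS) are invented for illustration and
this is not the Lustre obd_devs code; it only shows why setting up N devices
against such a table costs O(N^2) comparisons in total, and why nothing past
the cap can be registered at all.

/*
 * Sketch only -- hypothetical names, not Lustre source.  A fixed array
 * whose lookups are linear scans: every setup that checks for an existing
 * entry walks the whole table, so N setups do O(N^2) work.
 */
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 8192                   /* analogous to the 8k obd_devs cap */

struct dev_entry {
	int  in_use;
	char name[64];
};

static struct dev_entry dev_table[MAX_DEVS];
static long scanned;                    /* slots visited, for the demo */

/* O(MAX_DEVS) linear scan, done at least once per device setup */
static int dev_find(const char *name)
{
	int i;

	for (i = 0; i < MAX_DEVS; i++) {
		scanned++;
		if (dev_table[i].in_use &&
		    strcmp(dev_table[i].name, name) == 0)
			return i;
	}
	return -1;
}

int main(void)
{
	char name[64];
	int n;

	/* register 4000 "OSCs"; each registration checks for a duplicate
	 * with a full scan before taking a free slot */
	for (n = 0; n < 4000; n++) {
		snprintf(name, sizeof(name), "osc-%d", n);
		if (dev_find(name) < 0) {
			dev_table[n].in_use = 1;
			snprintf(dev_table[n].name, sizeof(dev_table[n].name),
				 "%s", name);
		}
	}
	printf("%ld slots scanned to set up 4000 devices\n", scanned);
	return 0;
}

A dynamic structure keyed by name or UUID (a hash table, for instance) would
remove both the hard cap and the quadratic scan cost.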
Niu YaWei
2006-Nov-19 20:04 UTC
[Lustre-devel] Re: [Arch] scalability study: single client _CONNECTS_ to a very large number of OSS servers
Peter J. Braam wrote:
> Hi Niu,
>
> This is a good review of the scalability of connections. But there are
> some questions. I have now cc'd lustre-devel to get the discussion in
> the open.
>
>> 4. Patterns of basic operations:
>>    e. lists, arrays, queues @runtime:
>>       - obd array obd_devs: the maximum device count is 8k, so the OSC
>>         count must be less than 8k; I think it's enough.
>
> Nope - we want this to be far more scalable than 8K OSCs. Last week we
> heard that Evan Felix ran with 4000 OSCs (getting a whopping 130GB/sec
> read from Lustre!).
>
> The array needs to go away. I think Nathan is already working on this
> for the load simulator, btw.
>
> Hmm, I don't see a server-side consideration of this problem. Am I
> missing something?

There are OSCs in the obd_devs of the MDS, so it also has this problem, but
an OSS does not have that many obd devices for now.

>> 5. Scalable use pattern of basic operations:
>>    - One client performs mount.
>>    - MDS setup.
>
> Is connect also used against the management server?

No, I think it isn't. Nathan, could you confirm it?

>> 6. Scalability measures:
>>    - The number of OST_CONNECT RPCs is N (OST count); since the RPCs
>>      are sent asynchronously, it runs in O(1) time.
>>    - Unless we are going to build a cluster with more than 8k OSTs, we
>>      can't run out of obd_devs.
>>    - The time complexity of qos_add_tgt() is O(N), and it should only
>>      happen when the MDS connects to an OSS, so no need to improve it.
>
> On the server side, isn't there scanning of the list of existing
> connections to see if a connection's UUID is already in the list? Isn't
> that list O(N) long? If so, is the scan O(N^2)?

There'll be another use case analysis for "a large number of Linux clients
_MOUNTS_ Lustre"; I think the server-side analysis is better placed there.

> Eeb - can you confirm one more time that connection setup, which is
> likely to happen at this point in LNET, has no linear scans?
>
>> 7. Experiment description and findings:
>>    - No test for it.
>
> Nathan - will the load simulator do this? I think it could even be used
> over the net?
>
>> 8. Recommendations for improvements:
>>    - No recommendation on implementation improvements.
>
> A. Kill the array (P2)
> B. Fix the searching on the server (P1)
>
>> 9. Non scalable issues encountered, not identified by this process:
>>    - The qos lists are useless for the client's lov, but we have to set
>>      them up since the MDS and the client use the same lov driver. This
>>      needless list maintenance work will burden each client mount; we
>>      should avoid it.
>
> Hmm. But this will change in the future when the client has a full WB
> cache, so let's leave them in. Does the setup scale well?

Andreas has made a proposal to optimize qos list maintenance; we'll file a
bug to fix it.

Thanks
- Niu
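For reference, here is a minimal sketch of the kind of qos list maintenance
being discussed. The names (oss_info, oss_list, qos_add_tgt_sketch) are made
up and this is not the actual Lustre qos code; it only shows the shape of the
cost: each target added at setup scans the OSS list for an existing entry
before appending, which is work a pure client pays at every mount without
ever using the resulting list for allocation.

/*
 * Hypothetical sketch: per-target list maintenance with a linear
 * duplicate scan, so adding N targets scans a list that grows with the
 * number of OSSes.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct oss_info {
	char             oss_uuid[40];
	int              tgt_count;
	struct oss_info *next;
};

static struct oss_info *oss_list;

/* called once per target during lov setup */
static void qos_add_tgt_sketch(const char *oss_uuid)
{
	struct oss_info *oss;

	/* linear scan: does this OSS already have an entry? */
	for (oss = oss_list; oss != NULL; oss = oss->next) {
		if (strcmp(oss->oss_uuid, oss_uuid) == 0) {
			oss->tgt_count++;
			return;
		}
	}

	/* not found: prepend a new entry */
	oss = calloc(1, sizeof(*oss));
	snprintf(oss->oss_uuid, sizeof(oss->oss_uuid), "%s", oss_uuid);
	oss->tgt_count = 1;
	oss->next = oss_list;
	oss_list = oss;
}

int main(void)
{
	struct oss_info *oss;
	char uuid[40];
	int i, count = 0;

	/* 4000 OSTs spread across 500 OSSes */
	for (i = 0; i < 4000; i++) {
		snprintf(uuid, sizeof(uuid), "oss-%d_UUID", i % 500);
		qos_add_tgt_sketch(uuid);
	}
	for (oss = oss_list; oss != NULL; oss = oss->next)
		count++;
	printf("%d OSS entries maintained at mount time\n", count);
	return 0;
}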
Nathaniel Rutman
2006-Nov-20 10:21 UTC
[Lustre-devel] Re: [Arch] scalability study: single client _CONNECTS_ to a very large number of OSS servers
Niu YaWei wrote:
>>> 5. Scalable use pattern of basic operations:
>>>    - One client performs mount.
>>>    - MDS setup.
>>
>> Is connect also used against the management server?
>
> No, I think it isn't. Nathan, could you confirm it?

Yes, the MGCs are full clients of the MGS. They connect and can get evicted
if not heard from. But there is only a single MGC per node in the usual
cases -- i.e. multiple clients or OSTs or whatever all share the same MGC.

>>> 6. Scalability measures:
>>>    - The number of OST_CONNECT RPCs is N (OST count); since the RPCs
>>>      are sent asynchronously, it runs in O(1) time.
>>>    - Unless we are going to build a cluster with more than 8k OSTs, we
>>>      can't run out of obd_devs.

That's not true. If you mount two clients on the same node, you need twice
the obds.

>>>    - The time complexity of qos_add_tgt() is O(N), and it should only
>>>      happen when the MDS connects to an OSS, so no need to improve it.
>>
>> On the server side, isn't there scanning of the list of existing
>> connections to see if a connection's UUID is already in the list? Isn't
>> that list O(N) long? If so, is the scan O(N^2)?
>
> There'll be another use case analysis for "a large number of Linux
> clients _MOUNTS_ Lustre"; I think the server-side analysis is better
> placed there.
>
>> Eeb - can you confirm one more time that connection setup, which is
>> likely to happen at this point in LNET, has no linear scans?

class_add_uuid() is O(N^2) also - see bugzilla 10345.

>>> 7. Experiment description and findings:
>>>    - No test for it.
>>
>> Nathan - will the load simulator do this? I think it could even be used
>> over the net?

I don't see any reason why not, except that it assumes the local NID for the
OST. I could just add that as a parameter.
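For illustration, one direction for "fix the searching on the server" is to
replace a linear UUID list with a hash, so the duplicate check is roughly
O(1) and N registrations cost O(N) overall instead of O(N^2). This is only a
sketch with invented names (uuid_table, uuid_add, UUID_HASH_BUCKETS), not the
actual class_add_uuid() code or the bug 10345 fix.

/*
 * Sketch only: hash-bucketed UUID registration with an O(1) amortized
 * duplicate check, in contrast to scanning one long linear list.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define UUID_HASH_BUCKETS 1024

struct uuid_entry {
	char               uuid[40];
	struct uuid_entry *next;
};

static struct uuid_entry *uuid_table[UUID_HASH_BUCKETS];

static unsigned int uuid_bucket(const char *uuid)
{
	unsigned int h = 5381;

	while (*uuid)
		h = h * 33 + (unsigned char)*uuid++;
	return h % UUID_HASH_BUCKETS;
}

/* returns 1 if the UUID was already registered, 0 if it was added */
static int uuid_add(const char *uuid)
{
	unsigned int b = uuid_bucket(uuid);
	struct uuid_entry *e;

	for (e = uuid_table[b]; e != NULL; e = e->next)	/* short chain */
		if (strcmp(e->uuid, uuid) == 0)
			return 1;

	e = calloc(1, sizeof(*e));
	snprintf(e->uuid, sizeof(e->uuid), "%s", uuid);
	e->next = uuid_table[b];
	uuid_table[b] = e;
	return 0;
}

int main(void)
{
	char uuid[40];
	int i, dups = 0;

	for (i = 0; i < 4000; i++) {
		snprintf(uuid, sizeof(uuid), "OST%04x_UUID", i);
		dups += uuid_add(uuid);
	}
	printf("4000 UUIDs registered, %d duplicates found\n", dups);
	return 0;
}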