thr3ads.net - Lustre discuss - [Lustre-discuss] Problems with multiple lustre filesystems [Sep 2011]

J Alejandro Medina

2011-Sep-12 01:09 UTC

[Lustre-discuss] Problems with multiple lustre filesystems

Hi to all,

Our organization has recently configured two Lustre filesystems on a Linux
cluster. Both filesystems are connected to the same 10GbE VLAN. We have
tested both filesystems with IOzone and other benchmarking software without
errors.

When copying data from one filesystem to the other, we see excessive
broadcast traffic. The network slows to a crawl until both filesystems
stop responding.

If we test both filesystems separately we do not see this behavior.

Any ideas?
-- 
J. Alejandro Medina

THIELL Stephane

2011-Sep-13 16:20 UTC

J Alejandro Medina wrote:
> When copying data from one filesystem to the other we experience
> excessive broadcast messages. The network crawls down to its knees
> until both filesystems stop responding.
>
> If we test both filesystems separately we do not see this behavior.

An idea could be to reduce your Lustre client max_cached_mb value
(a per-filesystem value). By default, it is set to 2/3 of available system
memory, so it's not optimal when mounting multiple Lustre filesystems on
the same node, especially when copying data from one to the other.

see /proc/fs/lustre/llite/*/max_cached_mb
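
As a rough illustration of the suggestion above, here is a minimal Python
sketch that reads the current limit of each mounted client instance and
caps each one at an even share of a total budget. It assumes each
max_cached_mb file holds a single integer (in MB) and that writing a new
integer to the file changes the limit; the function names, the even-split
policy and the 4096 MB budget are hypothetical choices for the example, not
anything Lustre prescribes. Whether the tunable is honored at all may depend
on the Lustre version.

```python
import glob
import os

# Default location given in the post; one sub-directory per mounted filesystem.
LLITE_BASE = "/proc/fs/lustre/llite"


def read_cache_limits(base=LLITE_BASE):
    """Return {instance_name: max_cached_mb} for every instance under base.

    Assumption: each max_cached_mb file contains a single integer (MB).
    """
    limits = {}
    for path in sorted(glob.glob(os.path.join(base, "*", "max_cached_mb"))):
        instance = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            limits[instance] = int(f.read().split()[0])
    return limits


def suggest_limits(limits, total_budget_mb):
    """Cap each instance at an even share of total_budget_mb.

    Hypothetical policy: never raise a limit, only lower it.
    """
    if not limits:
        return {}
    share = total_budget_mb // len(limits)
    return {name: min(current, share) for name, current in limits.items()}


def apply_limits(suggested, base=LLITE_BASE):
    """Write the suggested values back (needs root on a real client)."""
    for name, mb in suggested.items():
        with open(os.path.join(base, name, "max_cached_mb"), "w") as f:
            f.write(str(mb))


if __name__ == "__main__":
    current = read_cache_limits()
    # 4096 MB is an arbitrary example budget, not a recommendation.
    for name, mb in suggest_limits(current, 4096).items():
        print(f"{name}: {current[name]} MB -> {mb} MB")
```

Run as-is, the script only prints what it would change; calling
apply_limits() on the result is what would actually write the new values.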

HTH,
Stephane Thiell
CEA

gregoire.pichon at bull.net

2011-Sep-14 07:10 UTC

> From: THIELL Stephane <stephane.thiell at cea.fr>
>
> J Alejandro Medina wrote:
> > When copying data from one filesystem to the other we experience
> > excessive broadcast messages. The network crawls down to its knees
> > until both filesystems stop responding.
> >
> > If we test both filesystems separately we do not see this behavior.
> An idea could be to reduce your lustre client max_cached_mb value
> (per-filesystem value). By default, it is set to 2/3 of available system
> memory so it's not optimal when mounting multiple lustre filesystems on
> the same node, especially when copying data from one to the other.
>
> see /proc/fs/lustre/llite/*/max_cached_mb
>
> 
Looking at the code (Lustre 2.0), it appears the max_cached_mb tunable has
no effect. I have found LU-141, "port lustre client page cache shrinker back
to clio", which tracks the problem.

--
Grégoire PICHON
Bull
