thr3ads.net - Lustre discuss - [Lustre-discuss] mmap with Lustre 1.6beta5 [Dec 2006]


Martin Pokorny

2006-Dec-08 12:00 UTC

[Lustre-discuss] mmap with Lustre 1.6beta5

Hi,

I am in the process of evaluating lustre, and I have a small problem I
am hoping that someone could shed some light on.

What I'm running:
Lustre v1.6beta5
linux 2.6.12.6 (smp)
2 OSS
1 MGS/MDT
4 clients
Ethernet network

I'm trying to determine whether multiple processes on multiple nodes can
simultaneously mmap a common file on a lustre file system, write to it,
and produce a coherent result (I'm using OpenMPI to spawn the processes
and provide synchronization barriers). In my tests, each process is
writing a 10,000 byte segment of the file, but is memory mapping the
whole file. What I'm seeing is that if I use 40 or fewer processes, the
file is (usually) produced correctly. However, when I try my test with
50 or 100 processes, I rarely get a good result; in fact, the tests seem
to hang. What I've found is that, when the test fails, there are
processes remaining on the lustre client nodes that are using up all the
CPU, but never seem to finish. I have no trouble interrupting the
running processes in this case.

While I'm not entirely sure of the result I should expect in these
tests, I certainly would expect the test to finish. Does anyone have any
comments or ideas?

-- 
Martin
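For readers who want to try the access pattern Martin describes, here is a minimal C sketch. It is not his actual test program. It assumes the file is created and sized to `nprocs * seg_len` bytes up front; `rank` stands in for the MPI rank, and the OpenMPI spawning and barriers are left out. Each call maps the whole file, fills only its own segment with a rank-specific byte, and flushes with `msync`. The function names are hypothetical, and nothing in it is Lustre-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) the shared file and size it
 * so every rank owns a segment of seg_len bytes. Returns 0 on success. */
int create_shared_file(const char *path, size_t nprocs, size_t seg_len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, (off_t)(nprocs * seg_len));
    close(fd);
    return rc;
}

/* Hypothetical per-rank step: map the WHOLE file, but write only the
 * segment [rank * seg_len, (rank + 1) * seg_len). The fill byte encodes
 * the rank ('A' for rank 0, 'B' for rank 1, ...) so the result can be checked. */
int write_segment_mmap(const char *path, size_t rank, size_t nprocs,
                       size_t seg_len)
{
    size_t file_len = nprocs * seg_len;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, file_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    memset(map + rank * seg_len, 'A' + (int)(rank % 26), seg_len);

    /* Flush the dirty pages back to the file before unmapping. */
    int rc = msync(map, file_len, MS_SYNC);
    munmap(map, file_len);
    return rc;
}
```

In the real test, each MPI process would call `write_segment_mmap` once with its own rank between barriers; here the ranks are simply run one after another on a single node.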

Jean-Marc Saffroy

2006-Dec-08 12:26 UTC


[Lustre-discuss] mmap with Lustre 1.6beta5

On Fri, 8 Dec 2006, Martin Pokorny wrote:
> I'm trying to determine whether multiple processes on multiple nodes can
> simultaneously mmap a common file on a lustre file system, write to it,
> and produce a coherent result (I'm using OpenMPI to spawn the processes
> and provide synchronization barriers). In my tests, each process is
> writing a 10,000 byte segment of the file, but is memory mapping the
> whole file.
With my very limited understanding of its internals, I would expect Lustre
to provide its best results with stripe-aligned, or at least page-aligned,
write areas. With your test, Lustre's internal locking may well be under
high stress.
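To make the alignment point concrete: 10,000 is not a multiple of a typical page size, so most segment boundaries fall in the middle of a page, and two neighbouring processes then dirty the same page. The C sketch below counts those shared pages and rounds a segment length up to a page multiple. The helper names are hypothetical, and the page size is passed in explicitly (4096 bytes in the usage below is an assumption about the client architecture, not something stated in the thread). The same arithmetic works for stripe alignment if the stripe size is passed as `page`.

```c
#include <stddef.h>

/* Round len up to the next multiple of page (page must be > 0). */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/* Count pages written by more than one of nprocs contiguous segments of
 * seg_len bytes each. A page is shared exactly when an internal segment
 * boundary falls strictly inside it (not on its first byte). */
size_t count_shared_pages(size_t nprocs, size_t seg_len, size_t page)
{
    size_t shared = 0;
    size_t last_counted = (size_t)-1;  /* don't count one page twice */
    for (size_t r = 1; r < nprocs; r++) {
        size_t boundary = r * seg_len; /* first byte of segment r */
        if (boundary % page != 0) {    /* boundary lies mid-page */
            size_t pg = boundary / page;
            if (pg != last_counted) {
                shared++;
                last_counted = pg;
            }
        }
    }
    return shared;
}
```

With 4096-byte pages, 10,000-byte segments give 49 shared pages for 50 processes and 99 for 100, while padding each segment to `round_up_to_page(10000, 4096)` = 12288 bytes gives none, at the cost of gaps in the file. The sketch models only page granularity, not Lustre's own lock ranges.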
> What I'm seeing is that if I use 40 processes or less, the file is
> (usually) produced correctly. However, when I try my test with 50 or 100
> processes, I rarely get a good result; in fact, the tests seem to hang.
> What I've found is that, when the test fails, there are processes
> remaining on the lustre client nodes that are using up all the CPU, but
> never seem to finish. I have no trouble interrupting the running
> processes in this case.
>
> While I'm not entirely sure of the result I should expect in these
> tests, I certainly would expect the test to finish. Does anyone have any
> comments or ideas?
Maybe some locks are ping-ponging between clients? Or it could be a real 
deadlock too.

CFS engineers will probably suggest that you turn on certain debugging flags
and post the resulting logs, which only they can analyze. ;-)


Cheers,

-- 
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net

Oleg Drokin

2006-Dec-23 08:33 UTC


[Lustre-discuss] mmap with Lustre 1.6beta5

Hello!

On Fri, Dec 08, 2006 at 11:59:41AM -0700, Martin Pokorny wrote:
> I'm trying to determine whether multiple processes on multiple nodes can
> simultaneously mmap a common file on a lustre file system, write to it,
> and produce a coherent result (I'm using OpenMPI to spawn the processes
> and provide synchronization barriers). In my tests, each process is
> writing a 10,000 byte segment of the file, but is memory mapping the
> whole file.
Do you write with the write(2) system call, or write into the mapping?
What is the striping pattern?
> What I'm seeing is that if I use 40 processes or less, the
> file is (usually) produced correctly. However, when I try my test with
> 50 or 100 processes, I rarely get a good result; in fact, the tests seem
> to hang. What I've found is that, when the test fails, there are
> processes remaining on the lustre client nodes that are using up all the
> CPU, but never seem to finish. I have no trouble interrupting the
> running processes in this case.
Can you obtain traces for these processes? (use sysrq-t and sysrq-p).

Also, I wonder if you could retry your tests with Lustre 1.4.8 and see
whether it behaves any differently.
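Oleg's first question separates two ways of filling a segment. For comparison with writing into a mapping, this is a minimal C sketch of the write(2)-style variant. The function name is hypothetical; it uses `pwrite`, the positioned form of write(2), so each process writes at its own offset without sharing a file position, and it loops because a write may complete only part of the request.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical write(2)-style counterpart: write seg_len bytes of a
 * rank-specific fill byte at offset rank * seg_len, with no mmap at all. */
int write_segment_pwrite(const char *path, size_t rank, size_t seg_len)
{
    char buf[4096];
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(buf, 'A' + (int)(rank % 26), sizeof buf);
    off_t off = (off_t)(rank * seg_len);
    size_t left = seg_len;
    while (left > 0) {
        size_t chunk = left < sizeof buf ? left : sizeof buf;
        ssize_t n = pwrite(fd, buf, chunk, off);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before writing: retry */
        if (n <= 0) {           /* real error (or no progress): give up */
            close(fd);
            return -1;
        }
        off += n;               /* short writes are legal: advance by n */
        left -= (size_t)n;
    }
    return close(fd);
}
```

Running the same test once with this function and once with a mapping-based write is one way to answer Oleg's question about which path is involved in the hang.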

Thanks.

Bye,
    Oleg
