We're running Lustre 1.6.3 and Linux 2.6.18 on our 972-node
(5832-processor) machines, and we're seeing some interesting problems
when we run executables from a Lustre filesystem. When we run
5000-processor jobs, we often see some - maybe only a few, maybe a
couple of dozen - fail with illegal-instruction and other traps, where
examining the core file shows that the instructions in question are just
fine (and the same as on jobs that succeeded). Has anybody else seen
similar problems running executables from a Lustre filesystem?
The setup in our lab only has MGS+MDT and one OST on one node, and two
OSTs on another, exported to the rest via socklnd over our Ethernet
emulation. This originally showed up in some Fortran code, but we have
also been able to reproduce it with a generated C program that contains
nothing but 50,000 "x = x + 1" lines. On the theory that this has
something to do with I/O being completed prematurely - i.e. while
buffers are in fact still being filled - we produced a variant of the
program that walks through the entire program text to make sure the
pages all get loaded well before they're accessed, and the failures do
not occur in this mode. Stranger still, after a few runs (more than
one) with the page-scanning turned on, runs without the page-scanning
also start to succeed. Copy the executable to a new location, though,
and the failures start all over again. This seems to support the theory
that there's a race in the I/O completion code, but it doesn't tell us
much more than that.
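
For what it's worth, the page-scanning variant boils down to something
like the sketch below. This is not our exact code, just the shape of it,
and it assumes a GNU toolchain where the linker provides
__executable_start and etext to bracket the text segment:

    #include <stdio.h>
    #include <unistd.h>

    extern char __executable_start;   /* start of text segment (GNU ld) */
    extern char etext;                /* first address past the text segment */

    static void pretouch_text(void)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        volatile const char *p;
        unsigned long sum = 0;

        /* Read one byte from every page of our own text so each page is
         * demand-paged in (from Lustre) before the real code path runs. */
        for (p = &__executable_start; p < &etext; p += pagesize)
            sum += *p;

        /* Print the checksum so the compiler can't optimize the loop away. */
        fprintf(stderr, "pretouch: text checksum %lu\n", sum);
    }

    int main(void)
    {
        pretouch_text();
        /* ... the generated "x = x + 1" workload would go here ... */
        return 0;
    }

The volatile read per page is only there to keep the compiler from
throwing the loop away; the point is simply to fault every text page in
before the generated code is reached.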
There's a significant chance that the problem is architecture-specific
(our CPU architecture is MIPS with weak memory ordering) and/or in Linux
rather than Lustre, but the same test ran fine under Lustre 1.6beta on
Linux 2.6.15, and it also runs fine with our current versions when the
executable lives on other filesystems (e.g. NFS or ext3 over NBD). If
anybody has any suggestions about places to look, parameters to tweak
for the sake of experimentation, etc., it would
be most appreciated.