Hi Paul--

There have been so many dozens of bugs fixed since v1.0.4 that I really don't want to even speculate. It would not be useful for either of us to try to debug 1.0.4.

What if we give you access to 1.2.x, and you give that a go instead?

-Phil

On 10/4/2004 20:06, paul@pabryan.mine.nu wrote:
> Hi again.
>
> I haven't heard anything about my filesystem corruption problem, but I
> was wondering if someone can answer me this:
>
> Can I get access to the patched version of e2fsprogs without a support
> contract?
>
> I've been trialling Lustre in the hope of convincing management to roll
> out a larger-scale cluster, which would then involve a support contract
> from ClusterFS. Unfortunately, it's going to be really difficult to sell
> the idea when I have a corrupt filesystem that I can't recover :(
>
> So, can anyone answer my question?
>
> Cheers,
> Paul.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.clusterfs.com
> https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
Hi again.

I haven't heard anything about my filesystem corruption problem, but I was wondering if someone can answer me this:

Can I get access to the patched version of e2fsprogs without a support contract?

I've been trialling Lustre in the hope of convincing management to roll out a larger-scale cluster, which would then involve a support contract from ClusterFS. Unfortunately, it's going to be really difficult to sell the idea when I have a corrupt filesystem that I can't recover :(

So, can anyone answer my question?

Cheers,
Paul.
On Tue, Oct 05, 2004 at 01:06:10AM -0400, Phil Schwan wrote:
> Hi Paul--
>
> There have been so many dozens of bugs fixed since v1.0.4 that I really
> don't want to even speculate. It would not be useful for either of us to
> try to debug 1.0.4.

Seems fair.

> What if we give you access to 1.2.x, and you give that a go instead?

That would be great. Is the patched e2fsprogs distributed with 1.2.x? If not, is it possible to get access to it? I'd still like to run it over the filesystem to see what it turns up.

Just so you know, though, what I'm doing at the moment is a small-scale rollout of a cluster to store backups on. It's a semi-production proof of concept. I'm anticipating around 5 nodes or so. The point is to convince management that Lustre would be a good option to roll out for the entire organization (distributed across 5 different sites - yes, lots of questions to be answered there!). So, at this point, the full project may not get off the ground. They may choose a different system. If you're fine with giving me access in this situation, then no problem. I just wanted to make sure you knew this beforehand.

On the plus side for you guys, I'm looking at some interesting scenarios. In particular, getting a Novell system (running on the Linux kernel) to talk to Lustre. I'm not sure what they do with the kernel these days, but you used to be able to run eDirectory on a box with standard Red Hat kernels. So, I may have some interesting results!

Cheers,
Paul.

> -Phil
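[Editor's note: the kind of check Paul is describing, running e2fsprogs over an OST's backing filesystem, might look roughly like the sketch below. This is only an illustration: it uses stock e2fsck options rather than anything specific to the CFS-patched build, and the device path and the precaution of stopping the OST first are assumptions, not details from this thread.]

    # Rough sketch, not from the thread: a read-only e2fsck pass over an OST's
    # ext3 backing device, driven from Python. /dev/sdb1 is a placeholder;
    # substitute the real backing device, and make sure the OST service is
    # stopped and the device is unmounted before running any fsck against it.
    import subprocess

    DEVICE = "/dev/sdb1"  # assumption: the OST's backing block device

    proc = subprocess.run(
        ["e2fsck", "-f", "-n", DEVICE],  # -f: force a full check, -n: answer "no" to all fixes (read-only)
        capture_output=True,
        text=True,
    )
    print(proc.stdout)
    print(proc.stderr)

    # e2fsck exit status: 0 = clean, 1 = errors corrected, 2 = corrected but a
    # reboot is needed, 4 = errors left uncorrected, 8 = operational error.
    print("e2fsck exit status:", proc.returncode)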
Hi there.

I've set up a new Lustre 1.0.4 cluster on Debian hosts that's been running for around a week now. I only have one OST on there at the moment.

Yesterday I started getting write errors. Trying to create a new file or directory results in an "Input/output error" message. I can still read from the cluster and write to existing files, though.

The errors started immediately after deleting some files with rm. The command completed successfully, or at least appeared to; the files don't show up with ls in any event. As far as I can tell, I didn't lose network comms, and definitely didn't lose power at all.

I tried to download the patched e2fsprogs, but I don't have permission. Is this only available to clients with support contracts?

Any help would be appreciated. Output from dmesg is (I haven't got debugging turned on):

OST:

LustreError: 1359:(filter.c:1633:filter_precreate()) Serious error: objid 710806 already exists; is this filesystem corrupt?
Lustre: 1192:(socknal_cb.c:1544:ksocknal_process_receive()) [f0974000] EOF from 0xc0a80cc8 ip 192.168.12.200:33982
LustreError: 1355:(../ldlm/ldlm_lib.c:354:target_handle_reconnect()) b1a192d7-b9d3-412b-b42e-73f21becf023 reconnecting
Lustre: 1341:(filter.c:1543:filter_destroy_precreated()) deleting orphan objects from 709219 to 710805
Lustre: 1192:(socknal_cb.c:1544:ksocknal_process_receive()) [eff38800] EOF from 0xc0a80ccb ip 192.168.12.203:34281

MDS:

LustreError: 28197:(osc_create.c:82:osc_interpret_create()) @@@ unknown rc -17 from async create: failing oscc req@f62f3400 x6/t0 o5->ost1_UUID@NID_192.168.12.202_UUID:6 lens 240/240 ref 1 fl Interpret:R/0/0 rc -17/-17
LustreError: 28197:(recover.c:103:ptlrpc_run_failed_import_upcall()) Invoked upcall /usr/lib/lustre/lustre_upcall FAILED_IMPORT ost1_UUID OSC_pan-mds1_ost1_mds1 NID_192.168.12.202_UUID
LustreError: 28083:(mds_open.c:389:mds_create_objects()) error creating objects for inode 40699: rc = -5
LustreError: 28083:(mds_open.c:610:mds_finish_open()) mds_create_objects: rc = -5
LustreError: 28216:(import.c:126:ptlrpc_connect_import()) reconnected to ost1_UUID@NID_192.168.12.202_UUID after partition
Lustre: 28216:(mds_lov.c:522:mds_notify()) MDS mds1: ost1_UUID now active, resetting orphans

Client:

LustreError: 5131:(mdc_locks.c:306:mdc_enqueue()) ldlm_cli_enqueue: -5
LustreError: 5131:(../ldlm/ldlm_request.c:549:ldlm_cli_cancel()) Got rc -5 from cancel RPC: canceling anyway
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 5131:(../ldlm/ldlm_request.c:549:ldlm_cli_cancel()) Got rc -5 from cancel RPC: canceling anyway
LustreError: 5131:(mdc_locks.c:306:mdc_enqueue()) ldlm_cli_enqueue: -5
LustreError: 30649:(connection.c:164:ptlrpc_cleanup_connection()) Connection d884ef60/NID_192.168.12.200_UUID has refcount 1817 (nid=0xc0a80cc8 on socknal)
LustreError: 30651:(class_obd.c:676:cleanup_obdclass()) obd mem max: 48250311 leaked: 2051980

Cheers,
Paul.
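[Editor's note: as a rough way of characterizing the failure mode Paul reports (new creates fail with EIO while existing files remain readable), a small client-side probe along the lines below could be used. The mount point and file names are assumptions made up for the example; this is a sketch, not anything from the thread.]

    # Illustrative probe, not from the thread: check whether new file creation
    # fails with EIO while reads of existing files still work, which is the
    # behaviour Paul reports. The paths below are assumptions; adjust them to
    # a real Lustre client mount and an existing file on it.
    import errno
    import os

    MOUNT = "/mnt/lustre"                          # assumed client mount point
    EXISTING = os.path.join(MOUNT, "known-file")   # assumed pre-existing file
    NEW = os.path.join(MOUNT, "create-probe")

    try:
        fd = os.open(NEW, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        print("create succeeded")
    except OSError as e:
        if e.errno == errno.EIO:
            print("create failed with EIO (matches the reported symptom)")
        else:
            raise

    # In the state described, reads of existing files should still succeed.
    with open(EXISTING, "rb") as f:
        f.read(4096)
    print("read of existing file succeeded")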