Hi Paul--

There have been so many dozens of bugs fixed since v1.0.4 that I really don't want to even speculate. It would not be useful for either of us to try to debug 1.0.4.

What if we give you access to 1.2.x, and you give that a go instead?

-Phil

On 10/4/2004 20:06, paul@pabryan.mine.nu wrote:
> Hi again.
>
> I haven't heard anything about my filesystem corruption problem, but I
> was wondering if someone can answer me this:
>
> Can I get access to the patched version of e2fsprogs without a support
> contract?
>
> I've been trialling Lustre in the hope of convincing management to roll
> out a larger-scale cluster, which would then involve a support contract
> from ClusterFS. Unfortunately, it's going to be really difficult to sell
> the idea when I have a corrupt filesystem that I can't recover :(
>
> So, can anyone answer my question?
>
> Cheers,
> Paul.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.clusterfs.com
> https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
Hi again.

I haven't heard anything about my filesystem corruption problem, but I was wondering if someone can answer me this:

Can I get access to the patched version of e2fsprogs without a support contract?

I've been trialling Lustre in the hope of convincing management to roll out a larger-scale cluster, which would then involve a support contract from ClusterFS. Unfortunately, it's going to be really difficult to sell the idea when I have a corrupt filesystem that I can't recover :(

So, can anyone answer my question?

Cheers,
Paul.
On Tue, Oct 05, 2004 at 01:06:10AM -0400, Phil Schwan wrote:
> Hi Paul--
>
> There have been so many dozens of bugs fixed since v1.0.4 that I really
> don't want to even speculate. It would not be useful for either of us to
> try to debug 1.0.4.

Seems fair.

> What if we give you access to 1.2.x, and you give that a go instead?

That would be great. Is the patched e2fsprogs distributed with 1.2.x? If not, is it possible to get access to it? I'd still like to run it over the filesystem to see what it turns up.

Just so you know, though, what I'm doing at the moment is a small-scale rollout of a cluster to store backups on. It's a semi-production proof of concept. I'm anticipating around 5 nodes or so. The point is to convince management that Lustre would be a good option to roll out for the entire organization (distributed across 5 different sites - yes, lots of questions to be answered there!). So, at this point, the full project may not get off the ground. They may choose a different system. If you're fine with giving me access in this situation, then no problem. I just wanted to make sure you knew this beforehand.

On the plus side for you guys, I'm looking at some interesting scenarios. In particular, getting a Novell system (running on the Linux kernel) to talk to Lustre. I'm not sure what they do with the kernel these days, but you used to be able to run eDirectory on a box with standard Red Hat kernels. So, I may have some interesting results!

Cheers,
Paul.

> -Phil
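[Editor's note: the kind of check Paul is describing, running e2fsprogs over an OST's backing filesystem, might look roughly like the sketch below. This is only an illustration: it uses stock e2fsck options rather than anything specific to the CFS-patched build, and the device path and the precaution of stopping the OST first are assumptions, not details from this thread.]

    # Rough sketch, not from the thread: a read-only e2fsck pass over an OST's
    # ext3 backing device, driven from Python. /dev/sdb1 is a placeholder;
    # substitute the real backing device, and make sure the OST service is
    # stopped and the device is unmounted before running any fsck against it.
    import subprocess

    DEVICE = "/dev/sdb1"  # assumption: the OST's backing block device

    proc = subprocess.run(
        ["e2fsck", "-f", "-n", DEVICE],  # -f: force a full check, -n: answer "no" to all fixes (read-only)
        capture_output=True,
        text=True,
    )
    print(proc.stdout)
    print(proc.stderr)

    # e2fsck exit status: 0 = clean, 1 = errors corrected, 2 = corrected but a
    # reboot is needed, 4 = errors left uncorrected, 8 = operational error.
    print("e2fsck exit status:", proc.returncode)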
Hi there.

I've set up a new Lustre 1.0.4 cluster on Debian hosts that's been running for around a week now. I only have one OST on there at the moment.

Yesterday I started getting write errors. Trying to create a new file or directory results in an "Input/output error" message. I can still read from the cluster and write to existing files, though.

The errors started immediately after deleting some files with rm. The command completed successfully, or at least appeared to; the files don't show up with ls in any event. As far as I can tell, I didn't lose network comms, and definitely didn't lose power at all.

I tried to download the patched e2fsprogs, but I don't have permission. Is this only available to clients with support contracts?

Any help would be appreciated. Output from dmesg is (I haven't got debugging turned on):

OST:

LustreError: 1359:(filter.c:1633:filter_precreate()) Serious error: objid 710806 already exists; is this filesystem corrupt?
Lustre: 1192:(socknal_cb.c:1544:ksocknal_process_receive()) [f0974000] EOF from 0xc0a80cc8 ip 192.168.12.200:33982
LustreError: 1355:(../ldlm/ldlm_lib.c:354:target_handle_reconnect()) b1a192d7-b9d3-412b-b42e-73f21becf023 reconnecting
Lustre: 1341:(filter.c:1543:filter_destroy_precreated()) deleting orphan objects from 709219 to 710805
Lustre: 1192:(socknal_cb.c:1544:ksocknal_process_receive()) [eff38800] EOF from 0xc0a80ccb ip 192.168.12.203:34281

MDS:

LustreError: 28197:(osc_create.c:82:osc_interpret_create()) @@@ unknown rc -17 from async create: failing oscc req@f62f3400 x6/t0 o5->ost1_UUID@NID_192.168.12.202_UUID:6 lens 240/240 ref 1 fl Interpret:R/0/0 rc -17/-17
LustreError: 28197:(recover.c:103:ptlrpc_run_failed_import_upcall()) Invoked upcall /usr/lib/lustre/lustre_upcall FAILED_IMPORT ost1_UUID OSC_pan-mds1_ost1_mds1 NID_192.168.12.202_UUID
LustreError: 28083:(mds_open.c:389:mds_create_objects()) error creating objects for inode 40699: rc = -5
LustreError: 28083:(mds_open.c:610:mds_finish_open()) mds_create_objects: rc = -5
LustreError: 28216:(import.c:126:ptlrpc_connect_import()) reconnected to ost1_UUID@NID_192.168.12.202_UUID after partition
Lustre: 28216:(mds_lov.c:522:mds_notify()) MDS mds1: ost1_UUID now active, resetting orphans

Client:

LustreError: 5131:(mdc_locks.c:306:mdc_enqueue()) ldlm_cli_enqueue: -5
LustreError: 5131:(../ldlm/ldlm_request.c:549:ldlm_cli_cancel()) Got rc -5 from cancel RPC: canceling anyway
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 1716:(mdc_request.c:487:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 5131:(../ldlm/ldlm_request.c:549:ldlm_cli_cancel()) Got rc -5 from cancel RPC: canceling anyway
LustreError: 5131:(mdc_locks.c:306:mdc_enqueue()) ldlm_cli_enqueue: -5
LustreError: 30649:(connection.c:164:ptlrpc_cleanup_connection()) Connection d884ef60/NID_192.168.12.200_UUID has refcount 1817 (nid=0xc0a80cc8 on socknal)
LustreError: 30651:(class_obd.c:676:cleanup_obdclass()) obd mem max: 48250311 leaked: 2051980

Cheers,
Paul.
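[Editor's note: as a rough way of characterizing the failure mode Paul reports (new creates fail with EIO while existing files remain readable), a small client-side probe along the lines below could be used. The mount point and file names are assumptions made up for the example; this is a sketch, not anything from the thread.]

    # Illustrative probe, not from the thread: check whether new file creation
    # fails with EIO while reads of existing files still work, which is the
    # behaviour Paul reports. The paths below are assumptions; adjust them to
    # a real Lustre client mount and an existing file on it.
    import errno
    import os

    MOUNT = "/mnt/lustre"                          # assumed client mount point
    EXISTING = os.path.join(MOUNT, "known-file")   # assumed pre-existing file
    NEW = os.path.join(MOUNT, "create-probe")

    try:
        fd = os.open(NEW, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        print("create succeeded")
    except OSError as e:
        if e.errno == errno.EIO:
            print("create failed with EIO (matches the reported symptom)")
        else:
            raise

    # In the state described, reads of existing files should still succeed.
    with open(EXISTING, "rb") as f:
        f.read(4096)
    print("read of existing file succeeded")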