Colin Leavett-Brown
2012-Feb-06 19:47 UTC
[Lustre-discuss] Migrate lustre 1.8 to 2.1 recommended procedure.
What is the recommended procedure to migrate a Lustre 1.8.3 filesystem to Lustre 2.1? Can I install a new Linux kernel and the Lustre 2.1 RPMs, then attach the OSTs? Or do I need to create a new filesystem and copy the data across?
Our filesystem suffered a major MDS problem. We now have some very bad inconsistencies between the MDS and OST's and know that the solution will either be formatting the filesystem or an extended outage for an fsck. In the mean time, I'd like to ask if it's possible to erase some broken files manually. Here's an example of a broken file from the client perspective:

sh-3.2# ls -l 100MB.bin
ls: 100MB.bin: Invalid argument
sh-3.2# ls -l
total 16
?--------- ? ? ? ? ? 100MB.bin
drwxr-xr-x 14 root root 4096 Feb  6 15:51 deprecated

sh-3.2# mv 100MB.bin deprecated/
mv: cannot stat `100MB.bin': Invalid argument
sh-3.2# lfs getstripe 100MB.bin
llapi_semantic_traverse: Failed to open '100MB.bin': Invalid argument (22)
error: getstripe failed for 100MB.bin.
sh-3.2# rm -f 100MB.bin
rm: cannot remove `100MB.bin': Invalid argument
sh-3.2#

When I try to manipulate the file, these are the messages in /var/log/messages:

kernel: Lustre: 20538:0:(lov_pack.c:64:lov_dump_lmm_common()) objid 0xd1a97a8, magic 0x0bd10bd0, pattern 0x1
kernel: Lustre: 20538:0:(lov_pack.c:67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
kernel: Lustre: 20538:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 8 subobj 0x0/0xf4374

So the question is: Is there any way to delete or recreate this file?

Thanks for any help!
The "unlink" command will remove a file reference from the file system.

Malcolm.

On 07/02/2012 08:19, My Lustre wrote:
> So the question is: Is there any way to delete or recreate this file?
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Mon, Feb 06, 2012 at 01:19:13PM -0800, My Lustre wrote:
> So the question is: Is there any way to delete or recreate this file?

You can try "unlink 100MB.bin". Unlike rm, the unlink command won't stat the file to check whether it is a directory.

Cheers,
Johann
--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
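A minimal sketch of the suggestion above, for trying it on several broken entries at once. The function name `try_unlink` is hypothetical and not part of Lustre or coreutils; it only wraps the standard `unlink` command, which takes exactly one operand, and reports which entries could and could not be removed.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): attempt to remove each named
# entry with unlink(1), which accepts exactly one operand per invocation.
try_unlink() {
    rc=0
    for f in "$@"; do
        if unlink -- "$f" 2>/dev/null; then
            echo "removed: $f"
        else
            # Leave the entry alone and remember that something failed.
            echo "unlink failed: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Example: try_unlink 100MB.bin
```

A non-zero return means at least one entry is still present and needs to be handled another way.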
Thanks for the quick replies, but this also fails:

sh-3.2# unlink 100MB.bin
unlink: cannot unlink `100MB.bin': Invalid argument

kernel: Lustre: 21134:0:(lov_pack.c:64:lov_dump_lmm_common()) objid 0xd1a97a8, magic 0x0bd10bd0, pattern 0x1
kernel: Lustre: 21134:0:(lov_pack.c:67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
kernel: Lustre: 21134:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 8 subobj 0x0/0xf4374

----- Original Message -----
From: Malcolm Cowe <malcolm.cowe at oracle.com>
To: My Lustre <lustrefs at yahoo.com>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Sent: Monday, February 6, 2012 4:26 PM
Subject: Re: [Lustre-discuss] Bad files on 1.8.5

The "unlink" command will remove a file reference from the file system.

Malcolm.
Carlos Thomaz
2012-Feb-08 04:32 UTC
[Lustre-discuss] Migrate lustre 1.8 to 2.1 recommended procedure.
Hi.

There's a summary that briefly describes the upgrade procedure from 1.8 to 2.0 (so it is possible to upgrade from 1.8 to 2.0 without formatting the underlying filesystem). This procedure is even referenced from the Lustre 2.1 manual, so I guess it is OK when upgrading to 2.1. Anyway, assuming you are currently running on RHEL5, RHEL6 or SLES11, you should be able to do it.

Take a look at:
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#dbdoclet.50438205_51369

And also review this:
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#installinglustre.tab.req

However, this is a major upgrade. If you are able to run a Lustre 2.1 filesystem in parallel and copy the data across, I would go for that. Assuming you have HA pairs for both MDS, it would even be possible to avoid major downtime, but it's quite complicated and you will need to plan it very carefully, since it will require a set of failover and failback events, etc.

Regards,
Carlos.

--
Carlos Thomaz | HPC Systems Architect
Mobile: +1 (303) 519-0578
cthomaz at ddn.com | Skype ID: carlosthomaz
DataDirect Networks, Inc.
9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless <http://twitter.com/ddn_limitless> | 1.800.TERABYTE

On 2/6/12 12:47 PM, "Colin Leavett-Brown" <crlb at uvic.ca> wrote:
>What is the recommended procedure to migrate a Lustre 1.8.3 filesystem
>to Lustre 2.1? Can I install a new Linux kernel, Lustre 2.1 rpms and
>attach the OSTs? Or do I need to create a new filesystems and copy the
>data across?
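For the "run a new filesystem in parallel and copy the data across" option, a rough sketch of a copy-then-verify step might look like the following. It is only an illustration under stated assumptions: `copy_and_verify` is a hypothetical helper, the mount points in the example are made up, and it uses plain `cp -a` with POSIX `cksum`. A filesystem of real size would more likely be copied in parallel, one directory tree at a time.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory tree, then confirm that every
# regular file has the same cksum value and size on both sides.
copy_and_verify() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 2
    # "src/." copies the contents of src (including dotfiles) into dst.
    cp -a "$src"/. "$dst"/ || return 2
    # cksum prints "checksum size name"; sort by name for a stable order.
    a=$(cd "$src" && find . -type f -exec cksum {} + | sort -k3) || return 2
    b=$(cd "$dst" && find . -type f -exec cksum {} + | sort -k3) || return 2
    [ "$a" = "$b" ]
}

# Example with made-up mount points for the old and new filesystems:
# copy_and_verify /mnt/lustre18/projects /mnt/lustre21/projects
```

The function returns 0 only when both file lists match, so files already present in the destination but absent from the source also count as a mismatch.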
On Monday, February 06, 2012 10:19:13 PM My Lustre wrote:
> Our filesystem suffered a major MDS problem. We now have some very bad
> inconsistencies between the MDS and OST's and know that the solution will
> either be formatting the filesystem or an extended outage for an fsck. In
> the mean time, I'd like to ask if it's possible to erase some broken files
> manually. Here's an example of a broken file from the client perspective:
>
> sh-3.2# ls -l 100MB.bin
> ls: 100MB.bin: Invalid argument
> sh-3.2# ls -l
> total 16
> ?--------- ? ? ? ? ? 100MB.bin
> drwxr-xr-x 14 root root 4096 Feb 6 15:51 deprecated

Just to eliminate the obvious, make sure you have your user info (/etc/passwd and /etc/group) synced on all nodes. A mismatch gives similar symptoms.

/Peter
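One way to run the check suggested above is to fetch a copy of /etc/passwd and /etc/group from each node and compare the name-to-ID mappings. The sketch below assumes those copies have already been collected into local files; `compare_ids` and the file names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare name:ID pairs of two passwd- or group-format
# files. Field 1 is the name and field 3 the numeric UID/GID in both formats.
# Prints the differing entries; returns 0 when the mappings agree.
compare_ids() {
    a=$(mktemp) && b=$(mktemp) || return 2
    cut -d: -f1,3 "$1" | sort > "$a"
    cut -d: -f1,3 "$2" | sort > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example with made-up file names, one copy per node:
# compare_ids mds.passwd client01.passwd
# compare_ids mds.group  client01.group
```

Only the name and numeric ID are compared, so differences in home directory or login shell between nodes are ignored.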