Stu Midgley
2012-Mar-18 05:20 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
ok, from what I can tell, the root of the problem is

[root at mds001 CONFIGS]# hexdump -C p1-MDT0000 | grep -C 2 mds
00002450  0b 00 00 00 04 00 00 00  12 00 00 00 00 00 00 00  |................|
00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
00002470  6d 64 73 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |mds.....prod_mds|
00002480  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
00002490  78 00 00 00 07 00 00 00  88 00 00 00 08 00 00 00  |x...............|
--
000024c0  00 00 00 00 04 00 00 00  0b 00 00 00 12 00 00 00  |................|
000024d0  02 00 00 00 0b 00 00 00  70 31 2d 4d 44 54 30 30  |........p1-MDT00|
000024e0  30 30 00 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |00......prod_mds|
000024f0  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
00002500  30 00 00 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |0.......p1-MDT00|

[root at mds001 CONFIGS]#
[root at mds001 CONFIGS]# hexdump -C /mnt/md2/CONFIGS/p1-MDT0000 | grep -C 2 mds
00002450  0b 00 00 00 04 00 00 00  10 00 00 00 00 00 00 00  |................|
00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
00002470  6d 64 73 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |mds.....p1-MDT00|
00002480  30 30 5f 55 55 49 44 00  70 00 00 00 07 00 00 00  |00_UUID.p.......|
00002490  80 00 00 00 08 00 00 00  00 00 62 10 ff ff ff ff  |..........b.....|

now if only I can get the UUID to be removed or reset...

On Sun, Mar 18, 2012 at 1:05 PM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:
> hmmm... that didn't work
>
> # tunefs.lustre --force --fsname=p1 /dev/md2
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>    Read previous values:
> Target:     p1-MDT0000
> Index:      0
> UUID:       prod_mds_001_UUID
> Lustre FS:  p1
> Mount type: ldiskfs
> Flags:      0x405
>             (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters:
>
> tunefs.lustre: unrecognized option `--force'
> tunefs.lustre: exiting with 22 (Invalid argument)
>
> On 18/03/2012, at 12:17 AM, Nathan Rutman wrote:
>
>> Take them all down again, use tunefs.lustre --force --fsname.
>>
>> On Mar 17, 2012, at 2:10 AM, "Stu Midgley" <sdm900 at gmail.com> wrote:
>>
>>> Afternoon
>>>
>>> We have a rather severe problem with our lustre file system. We had a
>>> full config log and the advice was to rewrite it with a new one. So,
>>> we unmounted our lustre file system from all clients, unmounted all the
>>> ost's and then unmounted the mds. I then did
>>>
>>> mds:
>>>   tunefs.lustre --writeconf --erase-params /dev/md2
>>>
>>> oss:
>>>   tunefs.lustre --writeconf --erase-params --mgsnode=mds001 /dev/md2
>>>
>>> After the tunefs.lustre on the mds I saw
>>>
>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS MGS started
>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGC172.16.0.251@tcp: Reactivating import
>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS: Logs for fs p1 were
>>> removed by user request. All servers must be restarted in order to
>>> regenerate the logs.
>>> Mar 17 14:33:02 mds001 kernel: Lustre: Enabling user_xattr
>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: new disk, initializing
>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: Now serving
>>> p1-MDT0000 on /dev/md2 with recovery enabled
>>>
>>> which scared me a little...
>>>
>>> the mds and the oss's mount happily BUT I can't mount the file system
>>> on my clients... on the mds I see
>>>
>>> Mar 17 16:42:11 mds001 kernel: LustreError: 137-5: UUID
>>> 'prod_mds_001_UUID' is not available for connect (no target)
>>>
>>> On the client I see
>>>
>>> Mar 17 16:00:06 host kernel: LustreError: 11-0: an error occurred
>>> while communicating with 172.16.0.251@tcp. The mds_connect operation
>>> failed with -19
>>>
>>> now, it appears the writeconf renamed the UUID of the mds from
>>> prod_mds_001_UUID to p1-MDT0000_UUID but I can't work out how to get
>>> it back...
>>>
>>> for example I tried
>>>
>>> # tunefs.lustre --mgs --mdt --fsname=p1 /dev/md2
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>>
>>>    Read previous values:
>>> Target:     p1-MDT0000
>>> Index:      0
>>> UUID:       prod_mds_001_UUID
>>> Lustre FS:  p1
>>> Mount type: ldiskfs
>>> Flags:      0x405
>>>             (MDT MGS )
>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>> Parameters:
>>>
>>> tunefs.lustre: cannot change the name of a registered target
>>> tunefs.lustre: exiting with 1 (Operation not permitted)
>>>
>>> I'm now stuck not being able to mount a 1PB file system... which isn't good :(

-- 
Dr Stuart Midgley
sdm900 at gmail.com
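[A more readable way to see which target UUID the config logs actually reference, rather than grepping raw hexdump output, is to dump the llogs with debugfs and decode them with llog_reader. This is only a sketch, assuming the MDT (/dev/md2) is not mounted as Lustre and that llog_reader from the Lustre userspace tools is installed; file names follow the thread.]

  # dump the client and MDT config logs straight off the ldiskfs device
  debugfs -c -R 'dump CONFIGS/p1-client /tmp/p1-client' /dev/md2
  debugfs -c -R 'dump CONFIGS/p1-MDT0000 /tmp/p1-MDT0000' /dev/md2

  # decode the records; each setup/add record names the target UUID it
  # refers to (prod_mds_001_UUID vs p1-MDT0000_UUID)
  llog_reader /tmp/p1-client
  llog_reader /tmp/p1-MDT0000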
Kit Westneat
2012-Mar-18 06:24 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
You should be able to reset the UUID by doing another writeconf with the --fsname flag. After the writeconf, you'll have to writeconf all the OSTs too.

It worked on my very simple test at least:

[root at mds1 tmp]# tunefs.lustre --writeconf --fsname=test1 /dev/loop0
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     t1-MDT0000
Index:      0
Lustre FS:  t1
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mdt.group_upcall=/usr/sbin/l_getgroups

   Permanent disk data:
Target:     test1-MDT0000
Index:      0
Lustre FS:  test1
Mount type: ldiskfs
Flags:      0x105
              (MDT MGS writeconf )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mdt.group_upcall=/usr/sbin/l_getgroups

Writing CONFIGS/mountdata

HTH,
Kit

-- 
Kit Westneat
System Administrator, eSys
kit.westneat at nyu.edu
212-992-7647
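[Kit's transcript above was produced against a throwaway loopback target; the exact setup isn't shown, but a scratch MGS/MDT for rehearsing tunefs.lustre operations before touching the production device can be put together along these lines. Sketch only; the file name, size, loop device, and fsname are illustrative.]

  dd if=/dev/zero of=/tmp/scratch-mdt.img bs=1M count=512
  losetup /dev/loop0 /tmp/scratch-mdt.img

  # format a combined MGS/MDT for a disposable filesystem "t1"
  mkfs.lustre --mgs --mdt --fsname=t1 /dev/loop0

  # rehearse the rename shown above, then inspect the result
  tunefs.lustre --writeconf --fsname=test1 /dev/loop0
  tunefs.lustre --print /dev/loop0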
iamatt
2012-Mar-18 06:28 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
sorry for your situation... the only word of encouragement I can offer at this time is http://www.youtube.com/watch?v=hNf5s_vra18
Dr Stuart Midgley
2012-Mar-18 06:40 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
No, we have tried that.

This file system started life about 6 years ago as lustre 1.4 and has continually been upgraded... hence the whacky UUID. Trying to rename the FS doesn't work. It doesn't change the UUID that the mgs tells clients to mount.

-- 
Dr Stuart Midgley
sdm900 at gmail.com
Kit Westneat
2012-Mar-18 07:04 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
Oh right, that makes sense. I guess if I were you I would try one of two things. First, back up the MDT, and then try:

1) format a small loopback device with the parameters you want the MDT to have, then replace the CONFIGS directory on your MDT with the CONFIGS directory on the loopback device (sketched below)
- OR -
2) use a hex editor to modify the UUID

Then use tunefs.lustre --print to make sure it all looks good before mounting it.

Though one thing I wonder about is: are the OSTs on the same page with the fsname? Like, are they expecting to be part of the p1 filesystem?

HTH,
Kit

-- 
Kit Westneat
System Administrator, eSys
kit.westneat at nyu.edu
212-992-7647
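[One possible shape of option 1 above, as a rough sketch only: the device paths, mount points, and image size are assumptions, the MDT must not be mounted as Lustre while doing this, and the existing CONFIGS should be backed up first.]

  # 0) back up the MDT's CONFIGS before touching anything
  mkdir -p /mnt/md2 /mnt/loop
  mount -t ldiskfs /dev/md2 /mnt/md2
  tar czf /root/mdt-CONFIGS-backup.tgz -C /mnt/md2 CONFIGS

  # 1) build a small scratch target carrying the parameters you want
  dd if=/dev/zero of=/tmp/newconf.img bs=1M count=512
  losetup /dev/loop0 /tmp/newconf.img
  mkfs.lustre --mgs --mdt --fsname=p1 /dev/loop0

  # 2) transplant its freshly generated CONFIGS onto the real MDT
  mount -t ldiskfs /dev/loop0 /mnt/loop
  cp /mnt/loop/CONFIGS/* /mnt/md2/CONFIGS/

  # 3) unmount and sanity-check before mounting as Lustre again
  umount /mnt/loop
  umount /mnt/md2
  tunefs.lustre --print /dev/md2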
Stu Midgley
2012-Mar-18 07:36 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
I'm well down this path... I replaced the mountdata with that from my small temporary mdt (same name) and that didn't help.

Now, I will do a few tests on the p1-client. Perhaps after a writeconf it is basically clean... and I can replace it... but currently it contains lots of info about each of the OST's.

All the OST's are happy mounting to the mdt and all think that they are part of our p1 file system.

Thanks.

-- 
Dr Stuart Midgley
sdm900 at gmail.com
Dr Stuart Midgley
2012-Mar-18 08:58 UTC
[Lustre-discuss] [wc-discuss] can't mount our lustre filesystem after tunefs.lustre --writeconf
Well, our filesystem is back.

I hexedit'ed the CONFIGS/p1-client and replaced prod_mds_001_UUID with p1-MDT0000_UUID and now our file system mounts. Ran a heap of checks and it all looks good.

Thanks everyone for your help.

-- 
Dr Stuart Midgley
sdm900 at gmail.com
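[For the record, what makes this hexedit safe is that the replacement happens in place inside NUL-padded llog buffers: p1-MDT0000_UUID is shorter than prod_mds_001_UUID, so it fits in the existing field provided the leftover bytes stay NUL and no bytes are inserted or removed. A scripted, length-preserving equivalent might look like the sketch below. Assumptions: the MDT is unmounted from Lustre, xxd is available, and CONFIGS has been backed up; the hex strings are simply the ASCII bytes of 'prod_mds_001_UUID' plus one NUL and 'p1-MDT0000_UUID' plus three NULs, 18 bytes each.]

  # work on a copy of the client config log, mounted as plain ldiskfs
  mount -t ldiskfs /dev/md2 /mnt/md2
  cp /mnt/md2/CONFIGS/p1-client /root/p1-client.orig

  # 'prod_mds_001_UUID' + 1 NUL (18 bytes)  ->  'p1-MDT0000_UUID' + 3 NULs (18 bytes)
  OLD=70726f645f6d64735f3030315f5555494400
  NEW=70312d4d4454303030305f55554944000000

  xxd -p /root/p1-client.orig | tr -d '\n' \
    | sed "s/${OLD}/${NEW}/g" \
    | xxd -r -p > /root/p1-client.new

  # verify: same size, and only the UUID strings changed
  ls -l /root/p1-client.orig /root/p1-client.new
  hexdump -C /root/p1-client.new | grep -C 2 UUID

  # put it back and remount
  cp /root/p1-client.new /mnt/md2/CONFIGS/p1-client
  umount /mnt/md2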