Hi,

I would like to be able to change a file system name. Towards that end, I have run the following commands as an experiment:

mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0
dmesg -c
mount -t lustre /dev/mapper/map0 /mnt/mdt
dmesg -c
umount /mnt/mdt
dmesg -c
tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0
dmesg -c
mount -t lustre /dev/mapper/map0 /mnt/mdt
dmesg -c

Unfortunately, this does not work. Can someone please explain the correct sequence of commands to use? The output of each command is as follows.

Thanks.

[root@ts-hss2-01 ~]# mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0

Permanent disk data:
Target: BEFORE-MDTffff
Index: unassigned
Lustre FS: BEFORE
Mount type: ldiskfs
Flags: 0x75
(MDT MGS needs_index first_time update )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

device size = 1632256MB
2 6 18
formatting backing filesystem ldiskfs on /dev/mapper/map0
target name BEFORE-MDTffff
4k blocks 2500
options -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L BEFORE-MDTffff -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F /dev/mapper/map0 2500
Writing CONFIGS/mountdata
[root@ts-hss2-01 ~]# dmesg -c
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1388, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 1 generated and it took 2142
LDISKFS-fs: mballoc: 512 preallocated, 0 discarded

[root@ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt
[root@ts-hss2-01 ~]# dmesg -c
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1406, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1410, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
Lustre: MGS MGS started
Lustre: MGC10.2.9.1@o2ib: Reactivating import
Lustre: Setting parameter BEFORE-MDT0000.mdt.group_upcall in log BEFORE-MDT0000
Lustre: Enabling user_xattr
Lustre: BEFORE-MDT0000: new disk, initializing
Lustre: BEFORE-MDT0000: Now serving BEFORE-MDT0000 on /dev/mapper/map0 with recovery enabled
Lustre: 1503:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) BEFORE-MDT0000: group upcall set to /usr/sbin/l_getgroups
Lustre: BEFORE-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups

[root@ts-hss2-01 ~]# umount /mnt/mdt
[root@ts-hss2-01 ~]# dmesg -c
Lustre: Failing over BEFORE-MDT0000
Lustre: Skipped 1 previous similar message
Lustre: *** setting obd BEFORE-MDT0000 device 'dm-4' read-only ***
Turning device dm-4 (0xfd00004) read-only
Lustre: BEFORE-MDT0000: shutting down for failover; client state will be preserved.
Lustre: MDT BEFORE-MDT0000 has stopped.
LustreError: 1517:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 1517:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: MGS has stopped.
LDISKFS-fs: mballoc: 3 blocks 3 reqs (0 success)
LDISKFS-fs: mballoc: 8 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 1 generated and it took 2598
LDISKFS-fs: mballoc: 1145 preallocated, 0 discarded
Removing read-only on unknown block (0xfd00004)
Lustre: server umount BEFORE-MDT0000 complete

[root@ts-hss2-01 ~]# tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: BEFORE-MDT0000
Index: 0
Lustre FS: BEFORE
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

Permanent disk data:
Target: AFTER-MDT0000
Index: 0
Lustre FS: AFTER
Mount type: ldiskfs
Flags: 0x105
(MDT MGS writeconf )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

Writing CONFIGS/mountdata
[root@ts-hss2-01 ~]# dmesg -c
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1539, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 1 generated and it took 2905
LDISKFS-fs: mballoc: 506 preallocated, 0 discarded

[root@ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt
mount.lustre: mount /dev/mapper/map0 at /mnt/mdt failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
[root@ts-hss2-01 ~]# dmesg -c
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1567, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
LDISKFS-fs: barriers enabled
kjournald2 starting: pid 1575, dev dm-4:8, commit interval 5 seconds
LDISKFS FS on dm-4, internal journal on dm-4:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
Lustre: MGS MGS started
Lustre: MGC10.2.9.1@o2ib: Reactivating import
Lustre: MGS: Logs for fs AFTER were removed by user request. All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter AFTER-MDT0000.mdt.group_upcall in log AFTER-MDT0000
Lustre: Enabling user_xattr
LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged?
LustreError: 1665:0:(mds_fs.c:828:mds_fs_setup()) cannot read last_rcvd: rc = -22
LustreError: 1665:0:(handler.c:2007:mds_setup()) AFTER-MDT0000: MDS filesystem method init failed: rc = -22
LustreError: 1665:0:(obd_config.c:372:class_setup()) setup AFTER-MDT0000 failed (-22)
LustreError: 1665:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command:
Lustre: cmd=cf003 0:AFTER-MDT0000 1:AFTER-MDT0000_UUID 2:0 3:AFTER-MDT0000
LustreError: 15b-f: MGC10.2.9.1@o2ib: The configuration from log 'AFTER-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.2.9.1@o2ib: The configuration from log 'AFTER-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 1566:0:(obd_mount.c:1124:server_start_targets()) failed to start server AFTER-MDT0000: -22
LustreError: 1566:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -22
LustreError: 1566:0:(obd_config.c:443:class_cleanup()) Device 4 not setup
LustreError: 1566:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 1566:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: MGS has stopped.
LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 1 generated and it took 2883
LDISKFS-fs: mballoc: 503 preallocated, 0 discarded
Lustre: 1566:0:(obd_mount.c:1473:server_put_super()) Cleaning orphaned obd AFTER-mdtlov
Lustre: server umount AFTER-MDT0000 complete
LustreError: 1566:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-22)

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com <http://www.terascala.com/>
There's a 'failsafe' feature that prevents filesystem name changes:

> LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged?

You'll have to delete the last_rcvd file from the disk on every server in the filesystem, and also run tunefs.lustre --writeconf on each of them with the new name (AFTER).

On Aug 2, 2010, at 6:08 PM, Roger Spellman wrote:
> Hi,
> I would like to be able to change a file system name. Towards that end, I have run the following commands as an experiment:
>
> mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0
> dmesg -c
> mount -t lustre /dev/mapper/map0 /mnt/mdt
> dmesg -c
> umount /mnt/mdt
> dmesg -c
> tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0
> dmesg -c
> mount -t lustre /dev/mapper/map0 /mnt/mdt
> dmesg -c
>
> Unfortunately, this does not work. Can someone please explain the correct sequence of commands to use? The output of each command is as follows.
>
> Thanks.
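The steps Nathan describes (delete last_rcvd on every server target, then run tunefs.lustre --writeconf with the new name) can be sketched as a shell script. This is a dry run that only prints the commands for one target. It assumes the target is already unmounted, that its backing filesystem can be mounted as ldiskfs (the mount type shown in the mkfs.lustre output), and that last_rcvd sits at the top level of that filesystem. The function name plan_rename_target and the scratch mount point /mnt/ldiskfs_tmp are made up for illustration and do not come from the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for renaming ONE target, runs nothing.
# plan_rename_target and the mount point argument are hypothetical helpers,
# not part of Lustre. Review the printed commands before running them by hand.
plan_rename_target() {
    dev=$1          # block device of the (already unmounted) target
    newfs=$2        # new filesystem name
    mnt=$3          # scratch mount point for the backing ldiskfs filesystem
    roles=${4:-}    # optional role flags, e.g. "--mgs --mdt" as in the thread

    # Assumption: last_rcvd lives at the top level of the backing filesystem.
    echo "mount -t ldiskfs $dev $mnt"
    echo "rm $mnt/last_rcvd"
    echo "umount $mnt"
    # Same tunefs.lustre invocation as in the original post.
    echo "tunefs.lustre --writeconf --fsname=$newfs${roles:+ $roles} $dev"
}

# The combined MGS/MDT device from the thread:
plan_rename_target /dev/mapper/map0 AFTER /mnt/ldiskfs_tmp "--mgs --mdt"
```

The same four steps would be repeated for each OST device (without the --mgs --mdt flags) before any target is remounted, since the reply says all servers in the filesystem need the change.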
Nathan,

Thank you. That works!

I found that if I change the IP address, I also need to remove the /mnt/mdt/CONFIGS/*-client files. The reason is that the OST mounts failed: the OST was still looking for the old IP address. I grepped for files containing the old IP address and found those client files.

Is that a safe thing to do? Please note that my MDT and MGS are on the same LUN.

Thanks.

-Roger

________________________________
From: Nathan Rutman [mailto:nathan.rutman at oracle.com]
Sent: Tuesday, August 03, 2010 2:03 PM
To: Roger Spellman
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Problem with write_conf

There's a 'failsafe' feature that prevents filesystem name changes:

LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged?

You'll have to delete the last_rcvd file from the disk on every server in the filesystem, and also run tunefs.lustre --writeconf on each of them with the new name (AFTER).
On Aug 3, 2010, at 11:25 AM, Roger Spellman wrote:
> Nathan,
>
> Thank you. That works!
>
> I found that if I change IP address, I also need to remove the file /mnt/mdt/CONFIGS/*-client.

This is what tunefs.lustre --writeconf on the MDT does when you first mount it after the writeconf. Running --writeconf on the MDT and all OSTs is the preferred way of changing a server NID.

> The reason is that the OST mounts failed - the OST was still looking for the old IP Address. I grepped for files with the old IP Address, and I found those client files.
>
> Is that a safe thing to do? Please note that my mdt and mgs are on the same LUN.
>
> Thanks.
>
> -Roger
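Nathan's preferred route for a NID change, --writeconf on the MDT and on all OSTs, might look like the sketch below. It is a dry run that only prints the tunefs.lustre commands. The helper name plan_nid_writeconf and the OST device names are hypothetical, and the sketch assumes every client and target has been unmounted first. Updating any mgsnode parameter stored on the OSTs is not covered in the thread and is left out here.

```shell
#!/bin/sh
# Dry-run sketch: prints one tunefs.lustre --writeconf command per target.
# plan_nid_writeconf is a hypothetical helper; nothing is executed.
plan_nid_writeconf() {
    mdt_dev=$1   # combined MGS/MDT device
    shift        # remaining arguments are OST devices
    echo "tunefs.lustre --writeconf $mdt_dev"
    for ost_dev in "$@"; do
        echo "tunefs.lustre --writeconf $ost_dev"
    done
}

# /dev/mapper/map0 is the MDT from the thread; the OST names are made up.
plan_nid_writeconf /dev/mapper/map0 /dev/mapper/ost0 /dev/mapper/ost1
```

After the writeconf, the MGS log in the original post notes that "All servers must be restarted in order to regenerate the logs", so the MDT (which also hosts the MGS here) would be mounted first and the OSTs afterwards.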
> > [root at ts-hss2-01 ~]# mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1 at o2ib0 /dev/mapper/map0 > > Permanent disk data: > Target: BEFORE-MDTffff > Index: unassigned > Lustre FS: BEFORE > Mount type: ldiskfs > Flags: 0x75 > (MDT MGS needs_index first_time update ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > device size = 1632256MB > 2 6 18 > formatting backing filesystem ldiskfs on /dev/mapper/map0 > target name BEFORE-MDTffff > 4k blocks 2500 > options -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F > mkfs_cmd = mke2fs -j -b 4096 -L BEFORE-MDTffff -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F /dev/mapper/map0 2500 > Writing CONFIGS/mountdata > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1388, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success) > LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2142 > LDISKFS-fs: mballoc: 512 preallocated, 0 discarded > > > [root at ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1406, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 0 
generated and it took 0 > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1410, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > Lustre: MGS MGS started > Lustre: MGC10.2.9.1 at o2ib: Reactivating import > Lustre: Setting parameter BEFORE-MDT0000.mdt.group_upcall in log BEFORE-MDT0000 > Lustre: Enabling user_xattr > Lustre: BEFORE-MDT0000: new disk, initializing > Lustre: BEFORE-MDT0000: Now serving BEFORE-MDT0000 on /dev/mapper/map0 with recovery enabled > Lustre: 1503:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) BEFORE-MDT0000: group upcall set to /usr/sbin/l_getgroups > Lustre: BEFORE-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups > > > [root at ts-hss2-01 ~]# umount /mnt/mdt > [root at ts-hss2-01 ~]# dmesg -c > Lustre: Failing over BEFORE-MDT0000 > Lustre: Skipped 1 previous similar message > Lustre: *** setting obd BEFORE-MDT0000 device ''dm-4'' read-only *** > Turning device dm-4 (0xfd00004) read-only > Lustre: BEFORE-MDT0000: shutting down for failover; client state will be preserved. > Lustre: MDT BEFORE-MDT0000 has stopped. > LustreError: 1517:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway > LustreError: 1517:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108 > Lustre: MGS has stopped. 
> LDISKFS-fs: mballoc: 3 blocks 3 reqs (0 success) > LDISKFS-fs: mballoc: 8 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2598 > LDISKFS-fs: mballoc: 1145 preallocated, 0 discarded > Removing read-only on unknown block (0xfd00004) > Lustre: server umount BEFORE-MDT0000 complete > > > [root at ts-hss2-01 ~]# tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0 > checking for existing Lustre data: found CONFIGS/mountdata > Reading CONFIGS/mountdata > > Read previous values: > Target: BEFORE-MDT0000 > Index: 0 > Lustre FS: BEFORE > Mount type: ldiskfs > Flags: 0x5 > (MDT MGS ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > > Permanent disk data: > Target: AFTER-MDT0000 > Index: 0 > Lustre FS: AFTER > Mount type: ldiskfs > Flags: 0x105 > (MDT MGS writeconf ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > Writing CONFIGS/mountdata > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1539, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: recovery complete. > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success) > LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2905 > LDISKFS-fs: mballoc: 506 preallocated, 0 discarded > > > [root at ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt > mount.lustre: mount /dev/mapper/map0 at /mnt/mdt failed: Invalid argument > This may have multiple causes. > Are the mount options correct? > Check the syslog for more info. 
> [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1567, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 0 generated and it took 0 > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1575, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > Lustre: MGS MGS started > Lustre: MGC10.2.9.1 at o2ib: Reactivating import > Lustre: MGS: Logs for fs AFTER were removed by user request. All servers must be restarted in order to regenerate the logs. > Lustre: Setting parameter AFTER-MDT0000.mdt.group_upcall in log AFTER-MDT0000 > Lustre: Enabling user_xattr > LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged? > LustreError: 1665:0:(mds_fs.c:828:mds_fs_setup()) cannot read last_rcvd: rc = -22 > LustreError: 1665:0:(handler.c:2007:mds_setup()) AFTER-MDT0000: MDS filesystem method init failed: rc = -22 > LustreError: 1665:0:(obd_config.c:372:class_setup()) setup AFTER-MDT0000 failed (-22) > LustreError: 1665:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command: > Lustre: cmd=cf003 0:AFTER-MDT0000 1:AFTER-MDT0000_UUID 2:0 3:AFTER-MDT0000 > LustreError: 15b-f: MGC10.2.9.1 at o2ib: The configuration from log ''AFTER-MDT0000'' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre. 
> LustreError: 15c-8: MGC10.2.9.1 at o2ib: The configuration from log ''AFTER-MDT0000'' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. > LustreError: 1566:0:(obd_mount.c:1124:server_start_targets()) failed to start server AFTER-MDT0000: -22 > LustreError: 1566:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -22 > LustreError: 1566:0:(obd_config.c:443:class_cleanup()) Device 4 not setup > LustreError: 1566:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway > LustreError: 1566:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108 > Lustre: MGS has stopped. > LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success) > LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2883 > LDISKFS-fs: mballoc: 503 preallocated, 0 discarded > Lustre: 1566:0:(obd_mount.c:1473:server_put_super()) Cleaning orphaned obd AFTER-mdtlov > Lustre: server umount AFTER-MDT0000 complete > LustreError: 1566:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-22) > > Roger Spellman > Staff Engineer > Terascala, Inc. > 508-588-1501 > www.terascala.com <http://www.terascala.com/> > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100803/7725f4f1/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1931 bytes Desc: not available Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20100803/7725f4f1/attachment-0001.bin
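The rename advice above (delete last_rcvd on every server target, then tunefs --writeconf each one to the new name) can be sketched as a dry run. This is only an illustration under stated assumptions, not a verified procedure: it assumes every target is unmounted, that the backing filesystem is ldiskfs, and that last_rcvd sits at the top level of that filesystem. The function name print_rename_plan and the scratch mount point /mnt/tmp are hypothetical. The script only prints commands; it executes nothing.

```shell
#!/bin/sh
# Dry run: print (do not execute) the rename steps for one server target.
# Assumptions (not confirmed by the thread): the target is unmounted, its
# backing filesystem is ldiskfs, and last_rcvd is at the top level of it.
# /mnt/tmp is a hypothetical scratch mount point.
print_rename_plan() {
    new_fsname=$1
    device=$2
    scratch=${3:-/mnt/tmp}

    # Mount the backing filesystem directly so last_rcvd is reachable.
    echo "mount -t ldiskfs $device $scratch"
    # Remove the file that records the old target name.
    echo "rm $scratch/last_rcvd"
    echo "umount $scratch"
    # Regenerate the configuration logs under the new filesystem name.
    echo "tunefs.lustre --writeconf --fsname=$new_fsname $device"
}

# Example: the combined MGS/MDT device from the original post.
print_rename_plan AFTER /dev/mapper/map0
```

The same plan would need to be printed for each OST device as well, since the advice covers all the servers in the filesystem. Reviewing the printed commands before running any of them by hand seems prudent, because removing last_rcvd is not reversible.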
If I change the NIDs, and if I don't remove /mnt/mdt/CONFIGS/*-client, then I get the following when I try mounting a client (note that 10.2.9.1 is the OLD address):

mount.lustre: mount 10.2.9.1@o2ib:/hss2 at /mnt/lustre-hss2 failed: Cannot send after transport endpoint shutdown

dmesg shows:

Lustre: Request x1 sent from MGC10.2.9.1@o2ib to NID 10.2.9.1@o2ib 5s ago has timed out (limit 5s).
LustreError: 15c-8: MGC10.2.9.1@o2ib: The configuration from log 'hss2-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 6285:0:(llite_lib.c:1065:ll_fill_super()) Unable to process log: -108
Lustre: client ffff81007e98e800 umount complete
LustreError: 6285:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-108)

Am I missing a step?

-Roger
On Aug 3, 2010, at 12:49 PM, Roger Spellman wrote:> If I change the NIDs, and if I don?t remove /mnt/mdt/CONFIGS/*-client, then I get the following when I try mounting a client (note that 10.2.9.1 is the OLD address): > > mount.lustre: mount 10.2.9.1 at o2ib:/hss2 at /mnt/lustre-hss2 failed: Cannot send after transport endpoint shutdownDon''t mount with the old address :) This is not contained in the config log; this is the MGS address the client needs to talk to to GET the config log. It needs to point to the current IP of the MGS. Maybe you''ve stuck this in /etc/fstab or perhaps your DNS name resolution of the MGS''s common name hasn''t been updated.> > dmesg shows: > > Lustre: Request x1 sent from MGC10.2.9.1 at o2ib to NID 10.2.9.1 at o2ib 5s ago has timed out (limit 5s). > LustreError: 15c-8: MGC10.2.9.1 at o2ib: The configuration from log ''hss2-client'' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. > LustreError: 6285:0:(llite_lib.c:1065:ll_fill_super()) Unable to process log: -108 > Lustre: client ffff81007e98e800 umount complete > LustreError: 6285:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-108) > > Am I missing a step? > > -Roger > > From: Nathan Rutman [mailto:nathan.rutman at oracle.com] > Sent: Tuesday, August 03, 2010 2:34 PM > To: Roger Spellman > Cc: lustre-discuss at lists.lustre.org > Subject: Re: [Lustre-discuss] Problem with write_conf > > > On Aug 3, 2010, at 11:25 AM, Roger Spellman wrote: > > > Nathan, > > Thank you. That works! > > I found that if I change IP address, I also need to remove the file /mnt/mdt/CONFIGS/*-client. > > This is what tunefs.lustre --writeconf on the MDT does, when you first mount it after the writeconf. > --writeconf on the MDT and all OSTs is the preferred way of changing a server nid. > > > > The reason is that the OST mounts failed ? 
the OST was still looking for the old IP Address. I grepped for files with the old IP Address, and I found those client files. > > Is that a safe thing to do? Please note that my mdt and mgs are on the same LUN. > > Thanks. > > -Roger > > > From: Nathan Rutman [mailto:nathan.rutman at oracle.com] > Sent: Tuesday, August 03, 2010 2:03 PM > To: Roger Spellman > Cc: lustre-discuss at lists.lustre.org > Subject: Re: [Lustre-discuss] Problem with write_conf > > There''s a ''failsafe'' feature that prevents filesystem name changes: >> LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged? >> > You''ll have to go and delete the last_rcvd file off the disk for all the servers in the filesystem as well as tunefs --writeconf them all to the name AFTER name. > > On Aug 2, 2010, at 6:08 PM, Roger Spellman wrote: > > > > > Hi, > I would like to be able to change a file system name. Towards that end, I have run the following commands as an experiment: > > mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1 at o2ib0 /dev/mapper/map0 > dmesg -c > mount -t lustre /dev/mapper/map0 /mnt/mdt > dmesg -c > umount /mnt/mdt > dmesg -c > tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0 > dmesg -c > mount -t lustre /dev/mapper/map0 /mnt/mdt > dmesg -c > > Unfortunately, this does not work. Can someone please explain the correct sequence of commands to ues? The output of each command is as follows. > > Thanks. 
> > [root at ts-hss2-01 ~]# mkfs.lustre --reformat --fsname BEFORE --device-size=10000 --mgs --mdt --mgsnode=10.2.9.1 at o2ib0 /dev/mapper/map0 > > Permanent disk data: > Target: BEFORE-MDTffff > Index: unassigned > Lustre FS: BEFORE > Mount type: ldiskfs > Flags: 0x75 > (MDT MGS needs_index first_time update ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > device size = 1632256MB > 2 6 18 > formatting backing filesystem ldiskfs on /dev/mapper/map0 > target name BEFORE-MDTffff > 4k blocks 2500 > options -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F > mkfs_cmd = mke2fs -j -b 4096 -L BEFORE-MDTffff -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F /dev/mapper/map0 2500 > Writing CONFIGS/mountdata > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1388, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success) > LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2142 > LDISKFS-fs: mballoc: 512 preallocated, 0 discarded > > > [root at ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1406, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 0 
generated and it took 0 > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1410, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > Lustre: MGS MGS started > Lustre: MGC10.2.9.1 at o2ib: Reactivating import > Lustre: Setting parameter BEFORE-MDT0000.mdt.group_upcall in log BEFORE-MDT0000 > Lustre: Enabling user_xattr > Lustre: BEFORE-MDT0000: new disk, initializing > Lustre: BEFORE-MDT0000: Now serving BEFORE-MDT0000 on /dev/mapper/map0 with recovery enabled > Lustre: 1503:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) BEFORE-MDT0000: group upcall set to /usr/sbin/l_getgroups > Lustre: BEFORE-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups > > > [root at ts-hss2-01 ~]# umount /mnt/mdt > [root at ts-hss2-01 ~]# dmesg -c > Lustre: Failing over BEFORE-MDT0000 > Lustre: Skipped 1 previous similar message > Lustre: *** setting obd BEFORE-MDT0000 device ''dm-4'' read-only *** > Turning device dm-4 (0xfd00004) read-only > Lustre: BEFORE-MDT0000: shutting down for failover; client state will be preserved. > Lustre: MDT BEFORE-MDT0000 has stopped. > LustreError: 1517:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway > LustreError: 1517:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108 > Lustre: MGS has stopped. 
> LDISKFS-fs: mballoc: 3 blocks 3 reqs (0 success) > LDISKFS-fs: mballoc: 8 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2598 > LDISKFS-fs: mballoc: 1145 preallocated, 0 discarded > Removing read-only on unknown block (0xfd00004) > Lustre: server umount BEFORE-MDT0000 complete > > > [root at ts-hss2-01 ~]# tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0 > checking for existing Lustre data: found CONFIGS/mountdata > Reading CONFIGS/mountdata > > Read previous values: > Target: BEFORE-MDT0000 > Index: 0 > Lustre FS: BEFORE > Mount type: ldiskfs > Flags: 0x5 > (MDT MGS ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > > Permanent disk data: > Target: AFTER-MDT0000 > Index: 0 > Lustre FS: AFTER > Mount type: ldiskfs > Flags: 0x105 > (MDT MGS writeconf ) > Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro > Parameters: mgsnode=10.2.9.1 at o2ib mdt.group_upcall=/usr/sbin/l_getgroups > > Writing CONFIGS/mountdata > [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1539, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: recovery complete. > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success) > LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 1 generated and it took 2905 > LDISKFS-fs: mballoc: 506 preallocated, 0 discarded > > > [root at ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt > mount.lustre: mount /dev/mapper/map0 at /mnt/mdt failed: Invalid argument > This may have multiple causes. > Are the mount options correct? > Check the syslog for more info. 
> [root at ts-hss2-01 ~]# dmesg -c > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1567, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success) > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost > LDISKFS-fs: mballoc: 0 generated and it took 0 > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded > LDISKFS-fs: barriers enabled > kjournald2 starting: pid 1575, dev dm-4:8, commit interval 5 seconds > LDISKFS FS on dm-4, internal journal on dm-4:8 > LDISKFS-fs: delayed allocation enabled > LDISKFS-fs: file extents enabled > LDISKFS-fs: mballoc enabled > LDISKFS-fs: mounted filesystem dm-4 with ordered data mode > Lustre: MGS MGS started > Lustre: MGC10.2.9.1 at o2ib: Reactivating import > Lustre: MGS: Logs for fs AFTER were removed by user request. All servers must be restarted in order to regenerate the logs. > Lustre: Setting parameter AFTER-MDT0000.mdt.group_upcall in log AFTER-MDT0000 > Lustre: Enabling user_xattr > LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged? > LustreError: 1665:0:(mds_fs.c:828:mds_fs_setup()) cannot read last_rcvd: rc = -22 > LustreError: 1665:0:(handler.c:2007:mds_setup()) AFTER-MDT0000: MDS filesystem method init failed: rc = -22 > LustreError: 1665:0:(obd_config.c:372:class_setup()) setup AFTER-MDT0000 failed (-22) > LustreError: 1665:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command: > Lustre: cmd=cf003 0:AFTER-MDT0000 1:AFTER-MDT0000_UUID 2:0 3:AFTER-MDT0000 > LustreError: 15b-f: MGC10.2.9.1 at o2ib: The configuration from log ''AFTER-MDT0000'' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre. 
> LustreError: 15c-8: MGC10.2.9.1@o2ib: The configuration from log 'AFTER-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 1566:0:(obd_mount.c:1124:server_start_targets()) failed to start server AFTER-MDT0000: -22
> LustreError: 1566:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -22
> LustreError: 1566:0:(obd_config.c:443:class_cleanup()) Device 4 not setup
> LustreError: 1566:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
> LustreError: 1566:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
> Lustre: MGS has stopped.
> LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
> LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 1 generated and it took 2883
> LDISKFS-fs: mballoc: 503 preallocated, 0 discarded
> Lustre: 1566:0:(obd_mount.c:1473:server_put_super()) Cleaning orphaned obd AFTER-mdtlov
> Lustre: server umount AFTER-MDT0000 complete
> LustreError: 1566:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-22)
>
> Roger Spellman
> Staff Engineer
> Terascala, Inc.
> 508-588-1501
> www.terascala.com <http://www.terascala.com/>
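The line that matters most in the quoted dmesg output is the 157-3 error: the target is being started as AFTER-MDT0000, but something on disk still names BEFORE-MDT0000, and the next line reports a failure reading last_rcvd. A minimal sketch for pulling that mismatch out of a saved log follows. The function name `wrong_disk_uuids` is hypothetical, and the pattern assumes the message wording shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read log text on standard input and, for each
# "wrong disk" LustreError, print "<UUID being started> <UUID on disk>".
# The pattern assumes the exact wording seen in this thread's dmesg output.
wrong_disk_uuids() {
    sed -n 's/.*Trying to start OBD \([^ ]*\) using the wrong disk \([^ .]*\)\..*/\1 \2/p'
}
```

Typical use would be `dmesg | wrong_disk_uuids` on the server right after the failed mount.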
Nathan,

Thanks. That works great.

Are there any tricks involved in also making a non-redundant system redundant at the same time? E.g., can I just do:

MDS# tunefs.lustre --erase-param --mgsnode=10.2.9.201@o2ib0 --failnode=10.2.9.202@o2ib0 /dev/mapper/map0
OSS# tunefs.lustre --erase-param --failnode=10.2.9.204@o2ib0 --mgsnode=10.2.9.201@o2ib0 --mgsnode=10.2.9.202@o2ib0 /dev/mapper/map0

Is the OSS's NID stored anywhere on the OST?

-Roger

________________________________
From: Nathan Rutman [mailto:nathan.rutman at oracle.com]
Sent: Tuesday, August 03, 2010 4:05 PM
To: Roger Spellman
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Problem with write_conf

On Aug 3, 2010, at 12:49 PM, Roger Spellman wrote:

> If I change the NIDs, and if I don't remove /mnt/mdt/CONFIGS/*-client, then I get the following when I try mounting a client (note that 10.2.9.1 is the OLD address):
>
> mount.lustre: mount 10.2.9.1@o2ib:/hss2 at /mnt/lustre-hss2 failed: Cannot send after transport endpoint shutdown

Don't mount with the old address :)
This is not contained in the config log; this is the MGS address the client needs to talk to to GET the config log. It needs to point to the current IP of the MGS. Maybe you've stuck this in /etc/fstab or perhaps your DNS name resolution of the MGS's common name hasn't been updated.
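Nathan's point about /etc/fstab can be checked mechanically. The sketch below is only an illustration: the function name `stale_lustre_mounts` is hypothetical, it assumes the usual client fstab layout (`mgsnid:/fsname mountpoint lustre options ...`), and it does a plain substring match on the device field.

```shell
#!/bin/sh
# Hypothetical helper: print "device mountpoint" for every lustre-type fstab
# entry whose device field contains the given (old) MGS NID.
#   $1 = NID to look for, e.g. 10.2.9.1@o2ib
#   $2 = fstab file to scan (defaults to /etc/fstab)
# Comment lines are skipped; the NID is matched as a simple substring.
stale_lustre_mounts() {
    awk -v nid="$1" \
        '$1 !~ /^#/ && $3 == "lustre" && index($1, nid) > 0 { print $1, $2 }' \
        "${2:-/etc/fstab}"
}
```

For example, `stale_lustre_mounts 10.2.9.1@o2ib` on a client prints nothing if no fstab entry still points at the old MGS address.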
Nathan,

I started out with IP addresses of 10.2.9.1 (MDS), 10.2.9.2 (standby MDS), 10.2.9.3 (OSS), and 10.2.9.4 (peer OSS).

I created a single MDT and a single OST, using the following commands:

MDS# mkfs.lustre --reformat --fsname hss2 --device-size=10000 --mgs --mdt --mkfsoptions=' -O extents,dir_index,uninit_groups' --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0
OSS# mkfs.lustre --reformat --ost --index=0 --mkfsoptions=' -O extents,dir_index,uninit_groups ' --fsname hss2 --device-size=100000 --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0

I mounted the servers, mounted a client, created a few files, then unmounted the client, unmounted the servers, and rebooted the clients and servers.

Once the servers were back up, I ran the following on the MDS and OSS, respectively:

MDS# tunefs.lustre --erase-param --mgsnode=10.2.9.201@o2ib0 --failnode=10.2.9.202@o2ib0 /dev/mapper/map0
OSS# tunefs.lustre --erase-param --failnode=10.2.9.204@o2ib0 --mgsnode=10.2.9.201@o2ib0 --mgsnode=10.2.9.202@o2ib0 /dev/mapper/map0

Then, I removed last_rcvd from the MDT and OST.

Then, I changed the IP addresses to 10.2.9.201 (MDS), 10.2.9.202 (standby MDS), 10.2.9.203 (OSS), and 10.2.9.204 (peer OSS).

I mounted the MDT and OST. After a short while, I got the following errors on the MDS:

Lustre: 4567:0:(client.c:1464:ptlrpc_expire_one_request()) @@@ Request x1343087831941136 sent from hss2-OST0000-osc to NID 10.2.9.204@o2ib 0s ago has failed due to network error (5s prior to deadline).
  req@ffff810213b5e400 x1343087831941136/t0 o8->hss2-OST0000_UUID@10.2.9.204@o2ib:28/4 lens 368/584 e 0 to 1 dl 1280868405 ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: 4568:0:(import.c:517:import_select_connection()) hss2-OST0000-osc: tried all connections, increasing latency to 1s
Lustre: 4567:0:(client.c:1464:ptlrpc_expire_one_request()) @@@ Request x1343087831941137 sent from hss2-OST0000-osc to NID 10.2.9.3@o2ib 6s ago has timed out (6s prior to deadline).
  req@ffff810213b5e400 x1343087831941137/t0 o8->hss2-OST0000_UUID@10.2.9.3@o2ib:28/4 lens 368/584 e 0 to 1 dl 1280868412 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 4567:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-10.2.9.204@o2ib: -113

Note that the old IP address of the OSS (10.2.9.3) is still listed. How can I change that?

The client is also seeing old IP addresses, this time the MDS's 10.2.9.1:

Lustre: Request x55 sent from hss2-MDT0000-mdc-ffff81007981d800 to NID 10.2.9.1@o2ib 5s ago has timed out (limit 5s).
Lustre: Skipped 9 previous similar messages
Lustre: 6433:0:(import.c:507:import_select_connection()) hss2-MDT0000-mdc-ffff81007981d800: tried all connections, increasing latency to 50s
Lustre: 6433:0:(import.c:507:import_select_connection()) Skipped 4 previous similar messages

Any help is appreciated. Thanks.

-Roger
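Spotting a stale address such as 10.2.9.3 or 10.2.9.1 by eye gets tedious in longer logs. As a rough sketch, the distinct NIDs that appear in a log can be listed and compared against the addresses the servers are supposed to have. The function name `list_nids` is hypothetical, the pattern only covers IPv4 NIDs of the form address@network, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# Hypothetical helper: read log text on standard input and print each
# distinct IPv4 NID (address@network), one per line, in C-locale sort order.
list_nids() {
    grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}@[a-z0-9]+' | LC_ALL=C sort -u
}
```

Running `dmesg | list_nids` on the MDS after the change would, for the errors quoted in this message, show both 10.2.9.204@o2ib (new) and 10.2.9.3@o2ib (old).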