I am in the process of upgrading one of my filesystems. I have a few older OSSs that I am swapping out for new disks.

What I have done so far:

1. Moved most of the data off of the older OSTs.
2. Backed up the remaining files and configs on the old OSTs, using the modified tar binary.
3. Swapped the disks, created the new RAID, and formatted it.
4. Restored the backup to the new disks.

However, now I cannot mount the filesystem. I get the following message on the MDS:

LustreError: 13b-9: lustre-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.

On the OSS I see the following messages:

kernel: LustreError: 11-0: an error occurred while communicating with 192.168.136.10 at tcp. The mgs_target_reg operation failed with -2
kernel: LustreError: 27428:0:(obd_mount.c:1139:server_start_targets()) no server named lustre-OST0000 was started
kernel: LustreError: 27428:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -6
kernel: LustreError: 27428:0:(obd_mount.c:1436:server_put_super()) no obd lustre-OST0000
kernel: LustreError: 27428:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
kernel: LustreError: 27428:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
kernel: Lustre: server umount lustre-OST0000 complete
kernel: LustreError: 27428:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-6)

I have attempted a tunefs.lustre --writeconf and a move of the CATALOG file to try to get the system mounted again, but this did not work. I did notice, before the upgrade, that when I performed an lctl dl the names of the older OSTs were different from the newly created OSTs.
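For what it's worth, an mgs_target_reg failure with -2 (ENOENT) right after a restore often means the target's identity did not survive the backup, i.e. the Lustre metadata under the OST root (CONFIGS/mountdata, last_rcvd) or the trusted.* extended attributes were not carried over. The Lustre manual's file-level OST backup procedure saves the EAs explicitly; a sketch of it is below, with example device names and mount points, in case the modified tar did not capture everything:

```sh
# File-level OST backup/restore, roughly following the Lustre manual.
# /dev/old_ost_dev, /dev/new_ost_dev, and /mnt/ost are placeholders.

# --- Backup (old OST, mounted as plain ldiskfs) ---
mount -t ldiskfs /dev/old_ost_dev /mnt/ost
cd /mnt/ost
tar czf /backup/ost0000.tgz --sparse .                 # file data
getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak    # Lustre EAs
cd /; umount /mnt/ost

# --- Restore (newly formatted OST, mounted as ldiskfs) ---
mount -t ldiskfs /dev/new_ost_dev /mnt/ost
cd /mnt/ost
tar xzpf /backup/ost0000.tgz --sparse
setfattr --restore=/backup/ea.bak                      # put the EAs back
cd /; umount /mnt/ost
```

If the restored filesystem is missing CONFIGS/mountdata or the EAs, that would match both the MGS not recognizing the target and the differing names you saw in lctl dl.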
I was thinking I needed to do something like the following: perform a writeconf on all devices, then remount the filesystem. But I did want to verify that this is headed in the right direction.

Sebastian
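Assuming the plan above is the right one, the usual writeconf sequence from the Lustre manual looks roughly like the following sketch (device paths are examples; each tunefs.lustre runs on the node hosting that target, with the whole filesystem stopped first):

```sh
# 1. Unmount everything: all clients, then all OSTs, then the MDT/MGS.

# 2. Regenerate the configuration logs on every target:
tunefs.lustre --writeconf /dev/mdt_dev    # on the MDS (combined MGS/MDT)
tunefs.lustre --writeconf /dev/ost_dev    # on each OSS, once per OST

# 3. Remount in order: MGS/MDT first, then each OST, then the clients.
mount -t lustre /dev/mdt_dev /mnt/mdt
mount -t lustre /dev/ost_dev /mnt/ost0
```

The ordering matters: the MGS has to be up and rewriting its logs before the OSTs try to re-register.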
On Thursday, August 26, 2010, Sebastian Gutierrez wrote:
> I am in the process of upgrading one of my filesystems. I had a few older
> OSSs that I am in the process of swapping out for new disks.
>
> What I have done so far.
>
> 1. moved most of the data off of the older OSTs.
> 2. backed up the remaining files and configs on the old OSTs. Using the
> modified tar binary.
> 3. swapped disks created the new raid and formatted them.
> 4. Restored the backup to the new disks.
>
> However now I cannot mount the filesystem. I get the following message on
> the MDS.
>
> LustreError: 13b-9: lustre-OST0000 claims to have registered, but this MGS
> does not know about it, preventing registration.
>
> On the OSS I see the following messages.
>
> kernel: LustreError: 11-0: an error occurred while communicating with
> 192.168.136.10 at tcp. The mgs_target_reg operation failed with -2
> kernel: LustreError: 27428:0:(obd_mount.c:1139:server_start_targets()) no server
> named lustre-OST0000 was started
> kernel: LustreError:

What does the MGS say?

> I was thinking I needed to do something like the following.
>
> perform a writeconf on all devices
> then remount the filesystem but did want to verify that this was headed in
> the right direction.

Yes, likely to be correct, but please check first what the MGS complains about.

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks