Michal Bialoskorski
2009-Jun-26 09:45 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Hello All!

I'm upgrading Lustre from version 1.6.5 to 1.6.7. I've followed all the steps from the manual, and when I try to mount an OST I get this error on the MDS:

home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration

I tried the "if an error occurs" step:

ossh2:# tunefs.lustre --ost --fsname=home --mgssnode=192.168.27.252@o2ib,192.168.27.253@o2ib1 --mgsnode=192.168.27.250@o2ib,192.168.27.251@o2ib1 /dev/mapper/home.ost.01
[...]
tunefs.lustre: cannot change the name of a registered target
tunefs.lustre: exiting with 1 (Operation not permitted)

Can anyone help me? I'm afraid of losing all the users' data.

Regards,
Michal.

The details are:

****************
First I mount MDT:
****************

Jun 26 11:19:03 mdsh kernel: Lustre: MGS MGS started
Jun 26 11:19:03 mdsh kernel: Lustre: MGC192.168.27.250@o2ib: Reactivating import
Jun 26 11:19:03 mdsh kernel: Lustre: Enabling user_xattr
Jun 26 11:19:03 mdsh kernel: Lustre: 23387:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service home-MDT0000, 981 recoverable clients, last_transno 2716170795
Jun 26 11:19:03 mdsh kernel: Lustre: MDT home-MDT0000 now serving home-MDT0000_UUID (home-MDT0000/08f9d071-1eb1-a32e-4c3f-2beb50305606), but will be in recovery for at least 5:00, or until 981 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/home-MDT0000/recovery_status.
Jun 26 11:19:03 mdsh kernel: Lustre: 23387:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) home-MDT0000: group upcall set to /usr/sbin/l_getgroups
Jun 26 11:19:03 mdsh kernel: Lustre: home-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Jun 26 11:19:03 mdsh kernel: Lustre: Server home-MDT0000 on device /dev/mapper/home.mdt has started

********************
Second I try to mount OST:
********************

ossh2:~# mount -t lustre /dev/mapper/home.ost.01 /lustre/home/ost.01
mount.lustre: mount /dev/mapper/home.ost.01 at /lustre/home/ost.01 failed: No such device or address
The target service failed to start (bad config log?) (/dev/mapper/home.ost.01). See /var/log/messages.

syslog on OSS said:

Jun 26 11:23:58 ossh2 kernel: Lustre: MGC192.168.27.252@o2ib: Reactivating import
Jun 26 11:23:58 ossh2 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.27.252@o2ib. The mgs_target_reg operation failed with -2
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1129:server_start_targets()) no server named home-OST0000 was started
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1628:server_fill_super()) Unable to start targets: -6
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1411:server_put_super()) no obd home-OST0000
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 11:23:59 ossh2 kernel: Lustre: server umount home-OST0000 complete
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-6)

syslog on MDS said:

Jun 26 11:23:59 mdsh kernel: LustreError: 13b-9: home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
Jun 26 11:23:59 mdsh kernel: LustreError: 23383:0:(mgs_handler.c:654:mgs_handle()) MGS handle cmd=253 rc=-2
Jun 26 11:23:59 mdsh kernel: LustreError: 23383:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req@ffff8107f06ea450 x34/t0 o253->123978a8-a36c-6f4c-9da9-521de5ecebd0@NET_0x50000c0a81bf6_UUID:0/0 lens 4672/4672 e 0 to 0 dl 1246008339 ref 1 fl Interpret:/0/0 rc 0/0
Nirmal Seenu
2009-Jun-26 15:27 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
The manual has the incorrect command. Just remove the "--fsname" option and everything should work fine.

Nirmal
Michal Bialoskorski
2009-Jun-26 16:21 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Thanks Nirmal,

tunefs now works. I ran this command:

/opt/lustre/sbin/tunefs.lustre --ost --erase-param --mgsnode=192.168.27.252@o2ib /dev/mapper/home.ost.01

but I still cannot mount the OST. After trying to mount the OST I got:

Jun 26 17:49:12 ossh2 kernel: Lustre: MGC192.168.27.252@o2ib: Reactivating import
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1129:server_start_targets()) no server named home-OST0000 was started
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1628:server_fill_super()) Unable to start targets: -6
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1411:server_put_super()) no obd home-OST0000
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 17:49:12 ossh2 kernel: Lustre: server umount home-OST0000 complete
Jun 26 17:49:12 ossh2 kernel: Lustre: Skipped 1 previous similar message
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-6)
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1991:lustre_fill_super()) Skipped 1 previous similar message

And on MDS:

Jun 26 18:16:41 mdsh kernel: LustreError: 13b-9: home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
Jun 26 18:16:41 mdsh kernel: LustreError: 5251:0:(mgs_handler.c:654:mgs_handle()) MGS handle cmd=253 rc=-2
Jun 26 18:16:41 mdsh kernel: LustreError: 5251:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req@ffff810805c00050 x16/t0 o253->6a7fc3be-6a98-7219-6302-a17def55c327@NET_0x50000c0a81bf6_UUID:0/0 lens 4672/4672 e 0 to 0 dl 1246033101 ref 1 fl Interpret:/0/0 rc 0/0

Do you have any idea what is wrong? How can I clean the MGS?

Michal.
Michal Bialoskorski
2009-Jun-26 21:27 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Thank you very much, Nirmal. "home" is working now. Next I will
upgrade/recover the second fs.
To sum up, here is what I did:
on MDS:
1) /opt/lustre/sbin/tunefs.lustre --mdt --writeconf --erase-param \
--param="mdt.group_upcall=/usr/sbin/l_getgroups" /dev/mapper/home.mdt
2) mount -t lustre -o abort_recov /dev/mapper/home.mdt /lustre/home.mdt
3) umount /lustre/home.mdt
4) mount -t lustre /dev/mapper/home.mdt /lustre/home.mdt
on OSSs for all OSTs:
1) tunefs.lustre --ost --writeconf --erase-param \
--mgsnode=192.168.27.252@o2ib /dev/mapper/home.ost.XX
2) mount -t lustre -o abort_recov /dev/mapper/home.ost.XX \
/lustre/home/ost.XX
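
The OST steps above can be sketched as a small dry-run shell script. This is only a sketch: it prints the commands instead of running them, the OST suffixes 01 and 02 are hypothetical placeholders, and the MGS NID and device paths are the ones quoted in this thread, with the NID written in the usual address@network form.

```shell
#!/bin/sh
# Dry-run sketch: prints the OST commands from the summary, runs nothing.
# MGSNODE and the /dev/mapper and /lustre paths are taken from this thread;
# they are assumptions for any other site.
MGSNODE="192.168.27.252@o2ib"

# Print the tunefs.lustre command that rewrites the config for one OST.
ost_writeconf_cmd() {
    printf 'tunefs.lustre --ost --writeconf --erase-param --mgsnode=%s /dev/mapper/home.ost.%s\n' "$MGSNODE" "$1"
}

# Print the first mount after the writeconf, with recovery aborted.
ost_mount_cmd() {
    printf 'mount -t lustre -o abort_recov /dev/mapper/home.ost.%s /lustre/home/ost.%s\n' "$1" "$1"
}

# Hypothetical OST suffixes; list the ones that exist on this OSS.
for ost in 01 02; do
    ost_writeconf_cmd "$ost"
    ost_mount_cmd "$ost"
done
```

Printing first makes it easy to review every device path before anything touches a target; the printed lines can then be run by hand on each OSS.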
m.
Nirmal Seenu wrote:
> Hi Michal,
>
> You will have to include the --writeconf option for those tunefs
> modifications to actually be written to the OST.
>
> Nirmal