Michal Bialoskorski
2009-Jun-26 09:45 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Hello all!

I'm upgrading Lustre from version 1.6.5 to 1.6.7. I've done all the steps from the manual, and when I tried to mount the OST I got an error on the MDS:

home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration

I've tried the "if an error occurs" step:

ossh2:# tunefs.lustre --ost --fsname=home --mgssnode=192.168.27.252@o2ib,192.168.27.253@o2ib1 --mgsnode=192.168.27.250@o2ib,192.168.27.251@o2ib1 /dev/mapper/home.ost.01
[...]
tunefs.lustre: cannot change the name of a registered target
tunefs.lustre: exiting with 1 (Operation not permitted)

Can anyone help me? I'm afraid of losing all the users' data.

Regards,
Michal.

The details are:

****************
First, I mount the MDT:
****************
Jun 26 11:19:03 mdsh kernel: Lustre: MGS MGS started
Jun 26 11:19:03 mdsh kernel: Lustre: MGC192.168.27.250@o2ib: Reactivating import
Jun 26 11:19:03 mdsh kernel: Lustre: Enabling user_xattr
Jun 26 11:19:03 mdsh kernel: Lustre: 23387:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service home-MDT0000, 981 recoverable clients, last_transno 2716170795
Jun 26 11:19:03 mdsh kernel: Lustre: MDT home-MDT0000 now serving home-MDT0000_UUID (home-MDT0000/08f9d071-1eb1-a32e-4c3f-2beb50305606), but will be in recovery for at least 5:00, or until 981 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/home-MDT0000/recovery_status.
Jun 26 11:19:03 mdsh kernel: Lustre: 23387:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) home-MDT0000: group upcall set to /usr/sbin/l_getgroups
Jun 26 11:19:03 mdsh kernel: Lustre: home-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Jun 26 11:19:03 mdsh kernel: Lustre: Server home-MDT0000 on device /dev/mapper/home.mdt has started

********************
Second, I try to mount the OST:
********************
ossh2:~# mount -t lustre /dev/mapper/home.ost.01 /lustre/home/ost.01
mount.lustre: mount /dev/mapper/home.ost.01 at /lustre/home/ost.01 failed: No such device or address
The target service failed to start (bad config log?) (/dev/mapper/home.ost.01). See /var/log/messages.

syslog on the OSS said:
Jun 26 11:23:58 ossh2 kernel: Lustre: MGC192.168.27.252@o2ib: Reactivating import
Jun 26 11:23:58 ossh2 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.27.252@o2ib. The mgs_target_reg operation failed with -2
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1129:server_start_targets()) no server named home-OST0000 was started
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1628:server_fill_super()) Unable to start targets: -6
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1411:server_put_super()) no obd home-OST0000
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 11:23:59 ossh2 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 11:23:59 ossh2 kernel: Lustre: server umount home-OST0000 complete
Jun 26 11:23:59 ossh2 kernel: LustreError: 10467:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-6)

syslog on the MDS said:
Jun 26 11:23:59 mdsh kernel: LustreError: 13b-9: home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
Jun 26 11:23:59 mdsh kernel: LustreError: 23383:0:(mgs_handler.c:654:mgs_handle()) MGS handle cmd=253 rc=-2
Jun 26 11:23:59 mdsh kernel: LustreError: 23383:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req@ffff8107f06ea450 x34/t0 o253->123978a8-a36c-6f4c-9da9-521de5ecebd0@NET_0x50000c0a81bf6_UUID:0/0 lens 4672/4672 e 0 to 0 dl 1246008339 ref 1 fl Interpret:/0/0 rc 0/0
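The negative numbers in these logs are negated Linux errno values. Read that way, the sequence is consistent: the MGS rejects the OST's registration request (mgs_target_reg) with -2 (ENOENT) because it has no record of home-OST0000, the OSS then aborts the target start with -6 (ENXIO), which is the "No such device or address" that mount.lustre prints, and the -108 (ESHUTDOWN) lines appear to be lock cancels failing during the teardown. A minimal sketch of a decoder, assuming Linux errno numbering; `decode_rc` is a hypothetical helper that only knows the three codes seen in this thread:

```shell
#!/bin/sh
# Hypothetical helper: map a Lustre return code (e.g. "-2") to its Linux errno name.
# Only the codes that appear in this thread are listed; anything else is "unknown".
decode_rc() {
    case "${1#-}" in   # strip a leading minus sign
        2)   echo "ENOENT (No such file or directory)" ;;
        6)   echo "ENXIO (No such device or address)" ;;
        108) echo "ESHUTDOWN (Cannot send after transport endpoint shutdown)" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# Decode the codes from the OSS and MDS syslog excerpts above.
for rc in -2 -6 -108; do
    echo "$rc: $(decode_rc "$rc")"
done
```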
Nirmal Seenu
2009-Jun-26 15:27 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
The manual has the wrong command: just remove the "--fsname" option and everything should work fine.

Nirmal
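For illustration, a sketch of the corrected step: the same tunefs.lustre call with --fsname removed, since the OST is already registered as home-OST0000 and tunefs.lustre refuses to rename a registered target. The MGS NID is the one Michal uses in his follow-up; the `MODE` switch and the `DEV`/`MGS_NID` variables are assumptions of this sketch, and `--dryrun` is used so tunefs.lustre reports the change before anything is written:

```shell
#!/bin/sh
# Sketch of the "if an error occurs" tunefs.lustre step with --fsname removed.
# DEV and MGS_NID are taken from this thread; MODE is a hypothetical switch:
#   MODE=echo    (default) only print the command line
#   MODE=dryrun  let tunefs.lustre report the changes without writing them
#   MODE=apply   actually modify the target
DEV=${DEV:-/dev/mapper/home.ost.01}
MGS_NID=${MGS_NID:-192.168.27.252@o2ib}
MODE=${MODE:-echo}

# No --fsname here: the target already carries the name "home".
set -- --ost --mgsnode="$MGS_NID" "$DEV"

case "$MODE" in
    echo)   echo "tunefs.lustre $*" ;;
    dryrun) tunefs.lustre --dryrun "$@" ;;
    apply)  tunefs.lustre "$@" ;;
    *)      echo "unknown MODE: $MODE" >&2; exit 2 ;;
esac
```

Running it first with the default MODE and then with MODE=dryrun keeps the real write as a separate, deliberate step.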
Michal Bialoskorski
2009-Jun-26 16:21 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Thanks, Nirmal, tunefs now works. I ran this command:

/opt/lustre/sbin/tunefs.lustre --ost --erase-param --mgsnode=192.168.27.252@o2ib /dev/mapper/home.ost.01

but I still cannot mount the OST. After trying to mount the OST I got:

Jun 26 17:49:12 ossh2 kernel: Lustre: MGC192.168.27.252@o2ib: Reactivating import
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1129:server_start_targets()) no server named home-OST0000 was started
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1628:server_fill_super()) Unable to start targets: -6
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1411:server_put_super()) no obd home-OST0000
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 17:49:12 ossh2 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 17:49:12 ossh2 kernel: Lustre: server umount home-OST0000 complete
Jun 26 17:49:12 ossh2 kernel: Lustre: Skipped 1 previous similar message
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount (-6)
Jun 26 17:49:12 ossh2 kernel: LustreError: 5848:0:(obd_mount.c:1991:lustre_fill_super()) Skipped 1 previous similar message

And on the MDS:

Jun 26 18:16:41 mdsh kernel: LustreError: 13b-9: home-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
Jun 26 18:16:41 mdsh kernel: LustreError: 5251:0:(mgs_handler.c:654:mgs_handle()) MGS handle cmd=253 rc=-2
Jun 26 18:16:41 mdsh kernel: LustreError: 5251:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req@ffff810805c00050 x16/t0 o253->6a7fc3be-6a98-7219-6302-a17def55c327@NET_0x50000c0a81bf6_UUID:0/0 lens 4672/4672 e 0 to 0 dl 1246033101 ref 1 fl Interpret:/0/0 rc 0/0

Do you have any idea what is wrong? How can I clean the MGS?

Michal.
Michal Bialoskorski
2009-Jun-26 21:27 UTC
[Lustre-discuss] error while upgrading 1.6.5 to 1.6.7.2
Thank you very, very much, Nirmal. "home" is working now. Next I will upgrade/recover the second filesystem.

To sum up, here is what I did:

on the MDS:
1) /opt/lustre/sbin/tunefs.lustre --mdt --writeconf --erase-param \
   --param="mdt.group_upcall=/usr/sbin/l_getgroups" /dev/mapper/home.mdt
2) mount -t lustre -o abort_recov /dev/mapper/home.mdt /lustre/home.mdt
3) umount /lustre/home.mdt
4) mount -t lustre /dev/mapper/home.mdt /lustre/home.mdt

on the OSSs, for all OSTs:
1) tunefs.lustre --ost --writeconf --erase-param --mgsnode=192.168.27.252@o2ib /dev/mapper/home.ost.XX
2) mount -t lustre -o abort_recov /dev/mapper/home.ost.XX /lustre/home/ost.XX

m.

Nirmal Seenu wrote:
> Hi Michal,
>
> You will have to include the option --writeconf to actually get those
> tunefs modifications written to the OST.
>
> Nirmal
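The procedure summarized above can be collected into a small script. This is a sketch under stated assumptions, not a general recommendation: it reproduces Michal's order of steps; the role argument (`mds`, or `oss` followed by the OST suffixes) is a hypothetical convention; device names, mount points and the MGS NID are copied from the summary; and by default (DRY_RUN=1) it only prints the commands so they can be reviewed before anything touches the disks.

```shell
#!/bin/sh
# Sketch of the writeconf recovery steps for the "home" filesystem.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}
TUNEFS=${TUNEFS:-/opt/lustre/sbin/tunefs.lustre}
MGS_NID=192.168.27.252@o2ib
MDT_DEV=/dev/mapper/home.mdt
MDT_MNT=/lustre/home.mdt

# Print or execute one command; stop at the first failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# On the MDS: regenerate the config logs, mount once with abort_recov,
# unmount, then mount again normally.
mds_steps() {
    run "$TUNEFS" --mdt --writeconf --erase-param \
        --param="mdt.group_upcall=/usr/sbin/l_getgroups" "$MDT_DEV"
    run mount -t lustre -o abort_recov "$MDT_DEV" "$MDT_MNT"
    run umount "$MDT_MNT"
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
}

# On an OSS: for each OST suffix (e.g. 01 02), rewrite the config and mount it.
oss_steps() {
    for xx in "$@"; do
        run "$TUNEFS" --ost --writeconf --erase-param --mgsnode="$MGS_NID" "/dev/mapper/home.ost.$xx"
        run mount -t lustre -o abort_recov "/dev/mapper/home.ost.$xx" "/lustre/home/ost.$xx"
    done
}

case "${1:-}" in
    mds) mds_steps ;;
    oss) shift; oss_steps "$@" ;;
esac
```

For example, with the script saved under a hypothetical name such as home-writeconf.sh, run `DRY_RUN=0 sh home-writeconf.sh mds` on the MDS first, then `DRY_RUN=0 sh home-writeconf.sh oss 01 02` on an OSS serving home.ost.01 and home.ost.02. Keeping the print-only mode as the default makes it harder to rewrite the configuration of the wrong device by accident.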