thr3ads.net - Lustre discuss - [Lustre-discuss] failover of OSTs: llogs for setup [Jan 2009]

Erich Focht

2009-Jan-29 18:46 UTC

[Lustre-discuss] failover of OSTs: llogs for setup

Hello,

We have a problem in a test setup where clients don't recover after a
failover of the OSS. Looking at the llog entries on the MGS, I see:

#25 (224)marker  10 (flags=0x01, v1.6.5.1) lustre-OST0001  'add osc' Thu Nov  6 17:56:23 2008-
#26 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#27 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#28 (128)attach    0:lustre-OST0001-osc  1:osc  2:lustre-clilov_UUID
#29 (136)setup     0:lustre-OST0001-osc  1:lustre-OST0001_UUID  2:10.3.0.229@o2ib
#30 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#31 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#32 (104)add_conn  0:lustre-OST0001-osc  1:10.3.0.229@o2ib
#33 (128)lov_modify_tgts add 0:lustre-clilov  1:lustre-OST0001_UUID  2:1  3:1
#34 (224)marker  10 (flags=0x02, v1.6.5.1) lustre-OST0001  'add osc' Thu Nov  6 17:56:23 2008-


If I understand this correctly, the client "knows" where to connect to
access an OST from these entries, and they list only one of the two
OSSes (10.3.0.229@o2ib, 192.168.50.129@tcp). It is possible that the
OST was mistakenly mounted on the wrong OSS (the failover node) when it
was mounted for the first time. Would this lead to such an issue?
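
To make the check above repeatable, here is a minimal Python sketch that
pulls the connection targets out of such a dump. It assumes, as I do
above, that the "setup" and "add_conn" records are what tell a client
where to connect. The sample text holds three records from the dump,
with wrapped lines rejoined and the archive's " at " written as "@"; the
names LLOG, RECORD and connection_targets are made up for this example
and are not part of any Lustre tool.

```python
import re

# Three records from the dump in this post (wrapped lines rejoined).
LLOG = """\
#28 (128)attach    0:lustre-OST0001-osc  1:osc  2:lustre-clilov_UUID
#29 (136)setup     0:lustre-OST0001-osc  1:lustre-OST0001_UUID  2:10.3.0.229@o2ib
#32 (104)add_conn  0:lustre-OST0001-osc  1:10.3.0.229@o2ib
"""

# "#<index> (<size>)<operation>  <arguments>"
RECORD = re.compile(r"^#\d+\s+\(\d+\)(\w+)\s+(.*)$")


def connection_targets(llog_text):
    """Map each OSC device to the distinct targets named by its
    setup and add_conn records, in order of first appearance."""
    targets = {}
    for line in llog_text.splitlines():
        match = RECORD.match(line.strip())
        if match is None:
            continue
        op, rest = match.groups()
        # Arguments look like "0:device 1:value 2:value".
        fields = dict(f.split(":", 1) for f in rest.split() if ":" in f)
        if op == "setup":
            target = fields.get("2")
        elif op == "add_conn":
            target = fields.get("1")
        else:
            continue
        device = fields.get("0")
        if device and target:
            seen = targets.setdefault(device, [])
            if target not in seen:
                seen.append(target)
    return targets


if __name__ == "__main__":
    for device, nids in connection_targets(LLOG).items():
        print(device, "->", ", ".join(nids))
```

For the records above, this reports a single target for
lustre-OST0001-osc, which matches what I read from the dump by hand.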

Is this correctable by re-registering the OST to the MDS (doing the 
"first mount" again)? What do I need to do on the MGS and OST for this
(tunefs...?)?

Thanks & best regards,
Erich
