Mark Field
2012-Nov-22 17:25 UTC
[Lustre-discuss] Odd broken behaviour on one lustre client
Hi,

I am currently using Lustre 1.8. After an OST failure, I deactivated the
OST on the MDS and made the change permanent. If I now run lctl dl on the
client nodes, all of them except one show the OST as inactive (device 7 in
the output below):

  0 UP mgc MGC10.214.4.201@o2ib 78b8432f-6331-cae7-8d75-dbaba9708056 5
  1 UP lov optstr01-clilov-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 4
  2 UP mdc optstr01-MDT0000-mdc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  3 UP osc optstr01-OST0000-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  4 UP osc optstr01-OST0001-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  5 UP osc optstr01-OST0002-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  6 UP osc optstr01-OST0003-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  7 IN osc optstr01-OST0004-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  8 UP osc optstr01-OST0008-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
  9 UP osc optstr01-OST0005-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5

The other client is not working correctly; lctl dl there looks like this:

  0 UP mgc MGC10.214.4.201@o2ib 94226c2b-6914-6a92-5c6b-2a27ebff676e 5
  1 UP lov optstr01-clilov-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 4
  2 UP mdc optstr01-MDT0000-mdc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  3 UP osc optstr01-OST0000-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  4 UP osc optstr01-OST0001-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  5 UP osc optstr01-OST0002-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  6 UP osc optstr01-OST0003-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  7 UP osc optstr01-OST0004-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 4
  8 UP osc optstr01-OST0008-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
  9 UP osc optstr01-OST0005-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5

Notice device 7 is 'UP' rather than 'IN', and also the last number on that
line is 4, not 5. I tried unmounting and remounting the client, and
rebooting, but it always comes back the same. Is there persistent data
somewhere on the client that is corrupt in some way and needs to be
deleted?

Thanks
Mark
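For reference, the "deactivate and make it permanent" step described above
is normally driven from the MDS/MGS in Lustre 1.8. A minimal sketch of that
sequence, using the filesystem name optstr01 from the output above (the
device number placeholder is hypothetical and must be read from lctl dl on
the MDS, not from this thread):

  # On the MDS: find the OSC device number for the failed OST
  lctl dl
  # Temporarily deactivate it (stops new object allocations on that OST)
  lctl --device <devno> deactivate
  # On the MGS: record the deactivation permanently in the config logs,
  # so the MDS and all clients pick it up on every subsequent mount
  lctl conf_param optstr01-OST0004.osc.active=0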
Dilger, Andreas
2012-Nov-22 19:34 UTC
[Lustre-discuss] Odd broken behaviour on one lustre client
On 11/22/12 10:25 AM, "Mark Field" <mnfield at gmail.com> wrote:
>Hi,
>
>I am currently using Lustre 1.8. After an OST failure, I deactivated the
>OST on the MDS and made the change permanent. If I now run lctl dl on
>the client nodes, all of them except one show the OST as inactive
>(device 7 in the output below):
>
>[...]
>
>The other client is not working correctly; lctl dl there looks like this:
>
>[...]
>
>Notice device 7 is 'UP' rather than 'IN', and also the last number on
>that line is 4, not 5. I tried unmounting and remounting the client, and
>rebooting, but it always comes back the same. Is there persistent data
>somewhere on the client that is corrupt in some way and needs to be
>deleted?

No, there is no persistent data on the clients at all. They get a new
UUID each time they mount, so the servers can't even tell it is the same
node from one mount to the next.

Presumably this is causing a visible problem, or you wouldn't have
mentioned it?

Cheers, Andreas
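Since clients keep no state across mounts, the divergence must live in the
client's running configuration, so comparing the runtime OSC state on a
healthy client against the misbehaving one can confirm where they differ.
A minimal sketch, assuming the Lustre 1.8 parameter naming (the device
number 7 comes from the lctl dl output above):

  # On each client: prints 1 if the OSC is active, 0 if deactivated
  lctl get_param osc.optstr01-OST0004-osc-*.active
  # Possible workaround on the bad client: deactivate the OSC by hand;
  # note this is not persistent and will be lost on the next remount
  lctl --device 7 deactivate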
Mark Field
2012-Nov-22 23:16 UTC
[Lustre-discuss] Odd broken behaviour on one lustre client
Yes, this machine can't access the mounted file system and caused a kernel
panic when we tried to access some files. It also seems to give different
and incorrect values when du or df is run on it.

On 22 November 2012 19:34, Dilger, Andreas <andreas.dilger at intel.com> wrote:
> On 11/22/12 10:25 AM, "Mark Field" <mnfield at gmail.com> wrote:
> [...]
>
> No, there is no persistent data on the clients at all. They get a new
> UUID each time they mount, so the servers can't even tell it is the same
> node from one mount to the next.
>
> Presumably this is causing a visible problem, or you wouldn't have
> mentioned it?
>
> Cheers, Andreas
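One quick way to make the df discrepancy described above visible is per-OST
space accounting on each client; a minimal sketch (the mount point is a
placeholder):

  # Run on a healthy client and on the bad one, then compare line by line:
  # a deactivated OST should be marked inactive or excluded from the totals
  lfs df -h /mnt/lustre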