Hi,

A while ago, we experienced multiple disk failures on a RAID6 OST. We
managed to migrate some of the data off the OST (lfs_migrate), but the
process was long (the software RAID kept failing). We then rebuilt the
target from scratch, which introduced a new OST.

Following the Lustre documentation on "Removing an OST from the File
System", we used the following procedure to permanently remove the old
OST:

on MGS:
lctl conf_param osc.lustre1-OST002f-osc.active=0

Many days later, and even after a complete reboot of the servers and
clients, we still see this target active on the clients:

on MDT:
[root at mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
osc.lustre1-OST002f-osc.active=0
[root at mds2 ~]# lctl dl | grep 002f
51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5

on client:
# ssh r101-n33 lctl dl | grep 002f
50 UP osc lustre1-OST002f-osc-ffff810377354000 ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4

What are we missing from the procedure here? I'm really looking to
*permanently* disable this OST in Lustre.

Thanks for any pointers.
Florent
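For reference, the manual's "Removing an OST from the File System"
procedure boils down to roughly the following sketch. The target name
comes from this thread; the /mnt/lustre mount point is a placeholder,
not from the thread:

  # 1. On a client: find files with objects on the doomed OST and
  #    migrate them off (lfs_migrate reads file names from stdin):
  lfs find --obd lustre1-OST002f_UUID /mnt/lustre | lfs_migrate -y

  # 2. On the MGS: permanently deactivate the OST so no new objects
  #    are allocated on it (syntax as documented in the manual):
  lctl conf_param lustre1-OST002f.osc.active=0

  # 3. On the MDS and on a client: verify the OSC reports inactive:
  lctl get_param osc.lustre1-OST002f-osc*.active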
On 2010-05-26, at 16:49, Florent Parent wrote:
> A while ago, we experienced multiple disk failures on a RAID6 OST. We
> managed to migrate some of the data off the OST (lfs_migrate), but the
> process was long (the software RAID kept failing).
>
> We then rebuilt the target from scratch, which introduced a new OST.
> Following the Lustre documentation on "Removing an OST from the File
> System", we used the following procedure to permanently remove the old
> OST:
>
> on MGS:
> lctl conf_param osc.lustre1-OST002f-osc.active=0

If you specified the above, it is possible that you only deactivated it
on the MDS, not on the clients as well.

> Many days later, and even after a complete reboot of the servers and
> clients, we still see this target active on the clients:
>
> on MDT:
> [root at mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
> osc.lustre1-OST002f-osc.active=0
> [root at mds2 ~]# lctl dl | grep 002f
> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5

The device is configured, but if it is not active it will not be used
for anything.

> on client:
> # ssh r101-n33 lctl dl | grep 002f
> 50 UP osc lustre1-OST002f-osc-ffff810377354000
> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4

What does "active" report for this OSC on a client?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
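The distinction Andreas is drawing can be sketched as two commands, both
documented in the manual (device number 51 is taken from the "lctl dl"
output above):

  # temporary: deactivates the OSC only on the node where it is run,
  # and does not survive a remount:
  lctl --device 51 deactivate

  # permanent: recorded in the configuration log on the MGS, so it is
  # applied on the MDS and on every client, across remounts:
  lctl conf_param lustre1-OST002f.osc.active=0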
On Wed, May 26, 2010 at 19:08, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> On 2010-05-26, at 16:49, Florent Parent wrote:
>>
>> on MGS:
>> lctl conf_param osc.lustre1-OST002f-osc.active=0
>
> If you specified the above, it is possible that you only deactivated
> it on the MDS, not on the clients as well.

Right. It was executed on all clients as well.

>> Many days later, and even after a complete reboot of the servers and
>> clients, we still see this target active on the clients:
>>
>> on MDT:
>> [root at mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
>> osc.lustre1-OST002f-osc.active=0
>> [root at mds2 ~]# lctl dl | grep 002f
>> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5
>
> The device is configured, but if it is not active it will not be used
> for anything.
>
>> on client:
>> # ssh r101-n33 lctl dl | grep 002f
>> 50 UP osc lustre1-OST002f-osc-ffff810377354000
>> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4
>
> What does "active" report for this OSC on a client?

It shows 0 (I don't know why we are seeing a double entry here), so I
guess it's inactive. I was under the impression that all references to
the OST would go away. It's also confusing to have the OST show as UP
in "lctl dl".

# ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active
osc.lustre1-OST002f-osc-ffff810371ac3c00.active=0
osc.lustre1-OST002f-osc-ffff810377354000.active=0

Thanks
Florent
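To confirm the state cluster-wide rather than one node at a time, a
sweep along these lines would work (a sketch; pdsh and the client list
r101-n[01-33] are assumptions, not from this thread):

  # pdsh prefixes each output line with "hostname: ", so field 2 is
  # the param=value pair; every line should count under "active=0"
  # if the deactivation reached all clients:
  pdsh -w r101-n[01-33] "lctl get_param osc.lustre1-OST002f-osc-*.active" \
      | awk '{print $2}' | sort | uniq -c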
Andreas,

This isn't the same as the similar thread, where an OST is being
replaced (keeping the same number). Doesn't he just have to re-do the
writeconf, to delete the references to the OST in the MGS, as in Bug
22283?

There will remain a gap in the OST numbers, but that should be okay if
there are no objects, right?

Kevin

On May 26, 2010, at 6:22 PM, Florent Parent <florent.parent at clumeq.ca> wrote:
> On Wed, May 26, 2010 at 19:08, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>> On 2010-05-26, at 16:49, Florent Parent wrote:
>>>
>>> on MGS:
>>> lctl conf_param osc.lustre1-OST002f-osc.active=0
>>
>> If you specified the above, it is possible that you only deactivated
>> it on the MDS, not on the clients as well.
>
> Right. It was executed on all clients as well.
>
>>> Many days later, and even after a complete reboot of the servers
>>> and clients, we still see this target active on the clients:
>>>
>>> on MDT:
>>> [root at mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
>>> osc.lustre1-OST002f-osc.active=0
>>> [root at mds2 ~]# lctl dl | grep 002f
>>> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5
>>
>> The device is configured, but if it is not active it will not be
>> used for anything.
>>
>>> on client:
>>> # ssh r101-n33 lctl dl | grep 002f
>>> 50 UP osc lustre1-OST002f-osc-ffff810377354000
>>> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4
>>
>> What does "active" report for this OSC on a client?
>
> It shows 0 (I don't know why we are seeing a double entry here), so I
> guess it's inactive. I was under the impression that all references
> to the OST would go away. It's also confusing to have the OST show as
> UP in "lctl dl".
>
> # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active
> osc.lustre1-OST002f-osc-ffff810371ac3c00.active=0
> osc.lustre1-OST002f-osc-ffff810377354000.active=0
>
> Thanks
> Florent
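A sketch of the writeconf pass Kevin is referring to, with placeholder
device paths (the actual MDT/OST block devices are site-specific):

  # with the filesystem unmounted everywhere (clients first, then the
  # OSTs, then the MDT), mark the configuration logs for regeneration:
  tunefs.lustre --writeconf /dev/mdtdev    # on the MDS
  tunefs.lustre --writeconf /dev/ostdev    # on each OSS, per OST

  # then remount in order: MGS/MDT first, OSTs next, clients last.
  # The logs are rebuilt from the targets that re-register, so the
  # removed OST should no longer appear in them.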
To clarify: there would still be some vestiges of the old OST, and the
other procedure would have to be followed if the OST index is ever
reused, but the writeconf should remove all mention of the OST from
"lctl dl", right?

Kevin Van Maren wrote:
> Andreas,
>
> This isn't the same as the similar thread, where an OST is being
> replaced (keeping the same number). Doesn't he just have to re-do the
> writeconf, to delete the references to the OST in the MGS, as in Bug
> 22283?
>
> There will remain a gap in the OST numbers, but that should be okay
> if there are no objects, right?
>
> Kevin
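Assuming the writeconf route is taken, verifying the result would look
something like this sketch (/mnt/lustre is again a placeholder):

  # after remounting, on the MDS and on a client: the old OSC device
  # should be gone entirely, not merely inactive:
  lctl dl | grep 002f      # expect no output

  # the OST list seen by clients should simply skip the removed index:
  lfs df /mnt/lustre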