Hi all. I believe this has been discussed before, but 15 minutes of googling and searching my mail archives didn't reveal the answer; no doubt when somebody reminds me what it is, I'll get to say D'oh!

I've got a test installation running 1.6b5, and it looks like one of the drives (containing an OST) is on its way out. I've migrated all the data off it (by deactivating it on the MDS and using lfs find to identify all the files that needed to be copied), and now I'm trying to cleanly shut down that OST and make the rest of the system forget about it, at least for a while.

I tried deactivating the device on the OSS, using lctl --device N deactivate, but that gripes "invalid argument". If I just dismount it, the MDS/MGS sit around griping that they're trying to recover it. I could have sworn there was a way to get the system to no longer think that OST is a part of it, but I can't seem to find it now. Anybody got hints?

Thanks in advance...
Deactivate the device on the MDT side for a currently-running server, e.g.:

  13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  lctl --device 13 deactivate

To start a client or MDT with a known-down OST:

  mount -t lustre -o exclude=lustre-OST0001 ...
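Roughly, the whole sequence looks like this; the device number 13 and the client mount source are only examples, so check lctl dl on your own MDS and substitute your real MGS NID, filesystem name, and mount point:

  # On the MDS: find the OSC device for the failing OST, then deactivate it
  lctl dl | grep lustre-OST0001-osc
  #   13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  lctl --device 13 deactivate

  # On a client (or the MDT): mount while excluding the known-down OST
  mount -t lustre -o exclude=lustre-OST0001 mgsnode@tcp0:/lustre /mnt/lustre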
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Fri, 17 Nov 2006 11:39:59 -0800

    Deactivate the device on the MDT side for a currently-running server, e.g.:
      13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
      lctl --device 13 deactivate

Ok, did that. It still shows as UP when I lctl dl, though.

    To start a client or MDT with a known down OST:
      mount -t lustre -o exclude=lustre-OST0001 ...

Ah, ok. So there isn't any way to say "Remove all traces of this OST from the system so that nobody knows it was ever there"?
John R. Dunning wrote:
> From: Nathaniel Rutman <nathan@clusterfs.com>
> Date: Fri, 17 Nov 2006 11:39:59 -0800
>
>     Deactivate the device on the MDT side for a currently-running server, e.g.:
>       13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>       lctl --device 13 deactivate
>
> Ok, did that. It still shows as UP when I lctl dl, though.
>
Yes, it does. Your question prompted me to take a look at changing that...

For now, you can get to it here:

  cfs21:~# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
  0: lustre-OST0000_UUID ACTIVE
  1: lustre-OST0001_UUID INACTIVE

>     To start a client or MDT with a known down OST:
>       mount -t lustre -o exclude=lustre-OST0001 ...
>
> Ah, ok. So there isn't any way to say "Remove all traces of this OST from the
> system so that nobody knows it was ever there"?
>
That is an eventual planned feature, but isn't implemented yet. You could --writeconf the MDT to nuke the config logs, then restart the servers, and that will truly erase all traces of OSTs that don't restart. Beware, any file that has stripes on such an erased OST will be very confusing to Lustre... Beware #2: I don't claim to have tried this myself.
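As an aside, since you only want it gone "at least for a while": the deactivation is reversible. A sketch, assuming the same device number 13 as above (verify with lctl dl first):

  # On the MDS: re-enable the OSC once the OST is healthy again
  lctl --device 13 activate

  # The state should flip back in target_obd
  cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
  #   1: lustre-OST0001_UUID ACTIVE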
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Fri, 17 Nov 2006 12:11:25 -0800

    Yes, it does. Your question prompted me to take a look at changing that...

    For now, you can get to it here:
      cfs21:~# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
      0: lustre-OST0000_UUID ACTIVE
      1: lustre-OST0001_UUID INACTIVE

Ok.

    That is an eventual planned feature, but isn't implemented yet.

Ok.

    You could --writeconf the MDT to nuke the config logs, then restart the
    servers,

Example?

    and that will truly erase all traces of OSTs that don't restart. Beware,
    any file that has stripes on such an erased OST will be very confusing to
    Lustre...

Sure, of course. I suppose to do it really right, you'd want some kind of tool that could examine the MD and gripe about anything that had stripes on the OST in question. But that would be pretty slow.

    Beware #2: I don't claim to have tried this myself.

Understood. Perhaps I'll try this next week, or perhaps I'll just blow it away and rebuild it without the offending unit.

Thanks...
John R. Dunning wrote:
>     You could --writeconf the MDT to nuke the config logs, then restart the
>     servers,
>
> Example?
>
See the wiki:
https://mail.clusterfs.com/wikis/lustre/MountConf#head-18c689130e5184035dcec1e6e2b49597afdab189

I just noticed a regression in my current code (and updated the wiki) - you'll have to tunefs.lustre --writeconf every server disk, not only the MDT, to regen the logs. I have now fixed that so you only need to --writeconf the MDT, but it is always safe to do them all. (Not sure when that regressed.)

> Sure, of course. I suppose to do it really right, you'd want some kind of
> tool that could examine the MD and gripe about anything that had stripes on
> the OST in question. But that would be pretty slow.
>
> Understood. Perhaps I'll try this next week, or perhaps I'll just blow it
> away and rebuild it without the offending unit.
>
I just tried it myself, and it works like a charm. Files on lost OSTs don't actually seem to confuse Lustre at all, they just act corrupted:

  cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
  total 4
  ?--------- ? ?    ?       ?            ? p2
  -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd

Adding a new OST that reuses the old index results in a valid but truncated file:

  cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
  total 4
  -rw-r--r-- 1 root root    0 Nov 17 13:31 p2
  -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd
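In outline, the --writeconf route is roughly the sketch below; the wiki page above is the reference, the device paths and mount points here are placeholders, and everything must be unmounted before you start:

  # With all clients and servers stopped, regenerate the config logs
  # (per the note above, do every server disk to be safe)
  tunefs.lustre --writeconf /dev/sdXX     # placeholder MDT or OST device

  # Restart the MDT first, then the surviving OSTs; the dead OST is
  # simply never remounted, so it drops out of the regenerated config
  mount -t lustre /dev/sdXX /mnt/mdt      # on the MDS
  mount -t lustre /dev/sdYY /mnt/ost      # on each surviving OSS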
On Nov 17, 2006 15:26 -0500, John R. Dunning wrote:
> Sure, of course. I suppose to do it really right, you'd want some kind of
> tool that could examine the MD and gripe about anything that had stripes on
> the OST in question. But that would be pretty slow.

That is what "lfs find -obd ..." does, and you've already done that. As long as the OST is deactivated on the MDS no objects will be created there, but I'd consider doing one last pass before removing it completely (in case it was active while the fs was in use; I don't know how tightly this system is controlled by you).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
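Concretely, that last pass might look like the following, using the OST UUID from earlier in the thread (the file path is just a placeholder):

  # List any files that still have stripes on the deactivated OST
  lfs find --obd lustre-OST0001_UUID /mnt/lustre

  # Spot-check the striping of anything the scan turns up
  lfs getstripe /mnt/lustre/path/to/file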
On Nov 17, 2006 13:37 -0800, Nathaniel Rutman wrote:
> Adding a new OST that reuses the old index results in a valid but
> truncated file:
>   cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
>   total 4
>   -rw-r--r-- 1 root root    0 Nov 17 13:31 p2
>   -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd

Hmm, that shouldn't be possible. What should instead happen is that either this OST index is marked "do not use" or the "ost_gen" field in the lov_tgts/lov_ost_data_v1 is incremented to indicate that while the index is the same this is in fact a different OST (that avoids the need to have potentially thousands of empty slots in lov_tgts).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
From: Andreas Dilger <adilger@clusterfs.com>
Date: Fri, 17 Nov 2006 14:43:59 -0700

    That is what "lfs find -obd ..." does, and you've already done that. As
    long as the OST is deactivated on the MDS no objects will be created there,
    but I'd consider doing one last pass before removing it completely (in case
    it was active while the fs was in use, I don't know how tightly this system
    is controlled by you).

In this (test) case, the answer is "totally". In the likely scenarios that I can see for deployment, the answer is also likely to be "totally". In the case where it's an external fs that I'm interfacing to along with other clients, I hope and expect that I can push the problem off onto whoever's managing it.

What got me started on that line of thought was what happens when you have a Lustre fs that lives for a long time. Growing it is easy enough, but what if, for whatever reason, you want to shrink it while leaving it operational? In that case, you might well want to reduce the number of OSTs, so a procedure which allows one to reliably get rid of an OST and tell the system not to expect it to come back seems like a Good Thing (tm).

Maybe I shouldn't worry about it; storage is cheap enough and getting cheaper, so maybe I should only expect things to grow :-}