Brian J. Murrell wrote:
> On Fri, 2007-11-23 at 12:01 -0600, D. Dante Lorenso wrote:
>> I've added a new 2.2 TB OST to my cluster easily enough, but this new
>> disk array is meant to replace several smaller OSTs that I used to have,
>> which were only 120 GB, 500 GB, and 700 GB.
>>
>> Adding an OST is easy, but how do I REMOVE the small OSTs that I no
>> longer want to be part of my cluster? Is there a command to tell Lustre
>> to move all the file stripes off one of the nodes?
>
> I answered a very similar question not that long ago (i.e. last couple
> of weeks) on this list. Check the archives linked off the main list
> page listed in the footer of this message.
Sorry about that. I just joined the list and didn't see your other
message here:
https://mail.clusterfs.com/pipermail/lustre-discuss/2007-November/004332.html
You recommend putting the OST in read-only mode, then hunting down all the
files on the OST and copying/moving them to other OSTs. Here is my attempt
to do just that:
----------
I set up a small lab test with a couple of 20 GB OSTs and created some
large files on them using:
[lab01]/dante> dd if=/dev/zero of=/dante/BIG_001.out bs=50MB count=1
[lab01]/dante> dd if=/dev/zero of=/dante/BIG_002.out bs=50MB count=1
...
[lab01]/dante> dd if=/dev/zero of=/dante/BIG_020.out bs=50MB count=1
The 'lfs find' command needs to know the UUID of the OST which you are
looking up. I found that ID by using this command:
[lab01]/dante> lfs df -h
UUID                 bytes   Used  Available  Use%  Mounted on
dante-MDT0000_UUID   16.3G   1.4G      14.9G    8%  /dante[MDT:0]
dante-OST0000_UUID   19.7G   1.5G      18.1G    7%  /dante[OST:0]
dante-OST0001_UUID   19.7G   1.5G      18.2G    7%  /dante[OST:1]
filesystem summary:  39.4G   3.1G      36.3G    7%  /dante
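For scripting, the OST UUIDs could be pulled out of that listing instead of being read by eye. This is only a minimal sketch: it assumes the 'lfs df' layout shown above (UUID in the first column, "[OST:n]" at the end of the last column), and ost_uuids is a hypothetical helper name, not a Lustre command.

```shell
#!/bin/sh
# ost_uuids: hypothetical helper, not part of Lustre.
# Reads `lfs df` output on stdin and prints the UUID (first column) of
# every row whose last column ends in "[OST:<n>]". The MDT row and the
# "filesystem summary:" row do not match and are skipped.
ost_uuids() {
    awk '$NF ~ /\[OST:[0-9]+\]$/ { print $1 }'
}

# Possible usage on a client (assumes the /dante mount point from above):
#   lfs df /dante | ost_uuids
```

Each UUID printed this way could then be passed to 'lfs find --obd'.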
Once I know the UUID, I can use the lfs find command to find all the
files which have objects stored on a specific OST:
[lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
/dante/BIG_002.out
/dante/BIG_004.out
/dante/BIG_006.out
/dante/BIG_008.out
/dante/BIG_010.out
/dante/BIG_012.out
/dante/BIG_014.out
/dante/BIG_015.out
/dante/BIG_017.out
So, I think these files need to be copied and moved after making the OST
read-only. I can't figure out how to use the 'readonly' command as
listed in the lctl help output:
[lab01]/dante> lctl help | less
Available commands are:
...
=== testing (DANGEROUS) = ...
readonly
...
For more help type: help command-name
Apparently, Brian was having trouble figuring it out also:
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-October/002246.html
So, to make an OST read-only I need to deactivate it? I need to know the
device number to do that. Here's how I find the device number from the
MDS:
[lab01]/dante> lctl device_list
 0 UP mgs MGS                        MGS                                  9
 1 UP mgc MGC192.168.200.51@tcp      690382da-9a5b-c548-a042-29b3c805494d 5
 2 UP mdt MDS                        MDS_uuid                             3
 3 UP lov dante-mdtlov               dante-mdtlov_UUID                    4
 4 UP mds dante-MDT0000              dante-MDT0000_UUID                   9
 5 UP osc dante-OST0000-osc          dante-mdtlov_UUID                    5
 6 UP osc dante-OST0001-osc          dante-mdtlov_UUID                    5
 7 UP lov dante-clilov-d87aba00      ...202d-b193-5bdc-077b7760b396       4
 8 UP mdc dante-MDT0000-mdc-d87aba00 .....2d-b193-5bdc-077b7760b396       5
 9 UP osc dante-OST0000-osc-d87aba00 ...202d-b193-5bdc-077b7760b396       5
10 UP osc dante-OST0001-osc-d87aba00 ...202d-b193-5bdc-077b7760b396       5
If I'm trying to remove OST0001, then do I need to deactivate device #6,
or is it #10? I'll try #6 and see what happens:
[lab01]/dante> lctl --device 6 deactivate
Ok, did that work? I didn't get any output:
[lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
0: dante-OST0000_UUID ACTIVE
1: dante-OST0001_UUID INACTIVE
Well, it's now listed as INACTIVE, so I guess it must have worked? Can
I activate it again?
[lab01]/dante> lctl --device 6 activate
[lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
0: dante-OST0000_UUID ACTIVE
1: dante-OST0001_UUID ACTIVE
Ok, I guess that's how you do it, so here I go deactivating it
again:
[lab01]/dante> lctl --device 6 deactivate
[lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
0: dante-OST0000_UUID ACTIVE
1: dante-OST0001_UUID INACTIVE
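The device lookup and the state check above could be scripted along these lines. It is a sketch under assumptions, not a confirmed procedure: it assumes the device to deactivate is the osc entry named "<OST>-osc" whose UUID column is the MDS LOV (dante-mdtlov_UUID), which is device #6 in the listing above and is the one that flipped target_obd to INACTIVE. The function names mds_osc_devno and ost_state are hypothetical.

```shell
#!/bin/sh
# Both helpers are hypothetical and only parse text; they never call lctl.

# mds_osc_devno OST_NAME LOV_NAME
# Reads `lctl device_list` output on stdin and prints the device number
# of the osc entry named "<OST_NAME>-osc" attached to "<LOV_NAME>_UUID".
# The client-side entry (e.g. dante-OST0001-osc-d87aba00) has a different
# name and UUID, so it is not matched.
mds_osc_devno() {
    awk -v name="${1}-osc" -v lov="${2}_UUID" \
        '$3 == "osc" && $4 == name && $5 == lov { print $1 }'
}

# ost_state OST_UUID
# Reads the contents of .../lov/<LOV_NAME>/target_obd on stdin and prints
# the state word (ACTIVE or INACTIVE) recorded for OST_UUID.
ost_state() {
    awk -v uuid="$1" '$2 == uuid { print $3 }'
}

# Possible usage on the MDS, with the names from this test setup:
#   dev=$(lctl device_list | mds_osc_devno dante-OST0001 dante-mdtlov)
#   lctl --device "$dev" deactivate
#   ost_state dante-OST0001_UUID < /proc/fs/lustre/lov/dante-mdtlov/target_obd
```

Matching on the exact name and LOV UUID keeps the script from picking the client-side osc (#10 here) by accident.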
Now, back to that list of files that I need to move:
[lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
/dante/BIG_002.out
/dante/BIG_004.out
/dante/BIG_006.out
/dante/BIG_008.out
/dante/BIG_010.out
/dante/BIG_012.out
/dante/BIG_014.out
/dante/BIG_015.out
/dante/BIG_017.out
Let me try moving one of the files OFF this device:
[lab01]/dante> cp BIG_002.out BIG_002.out.tmp
[lab01]/dante> mv BIG_002.out.tmp BIG_002.out
[lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
/dante/BIG_004.out
/dante/BIG_006.out
/dante/BIG_008.out
/dante/BIG_010.out
/dante/BIG_012.out
/dante/BIG_014.out
/dante/BIG_015.out
/dante/BIG_017.out
I guess that's working. Let me keep going to see if I can get them all
moved over:
[lab01]/dante> cp ... mv ... cp ... mv ... etc etc
[lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
* nothing listed *
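The repeated copy-then-rename steps could be wrapped in a small loop. This is a minimal sketch of the same manual procedure, not a Lustre migration tool: rewrite_files is a hypothetical name, and it assumes the OST is already deactivated (so new copies are allocated elsewhere), that nothing else is writing to the files while they are copied, and that file names contain no newlines.

```shell
#!/bin/sh
# rewrite_files: hypothetical helper.
# Reads one path per line on stdin. For each regular file it makes a copy
# named "<path>.tmp" (preserving mode and timestamps with -p), then
# renames the copy over the original. Stops with status 1 on the first
# cp or mv failure; a failed cp may leave a partial "<path>.tmp" behind.
rewrite_files() {
    while IFS= read -r f; do
        [ -f "$f" ] || continue          # skip anything that is not a regular file
        cp -p -- "$f" "$f.tmp" && mv -- "$f.tmp" "$f" || return 1
    done
}

# Possible usage, with the UUID and mount point from this test setup:
#   lfs find -r --obd dante-OST0001_UUID /dante | rewrite_files
```

Running the same 'lfs find' again afterwards, as above, shows whether any files still have objects on the deactivated OST.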
Well, looks good so far. How about that df command?:
[lab01]/dante> lfs df -h /dante
UUID                 bytes   Used  Available  Use%  Mounted on
dante-MDT0000_UUID   16.3G   1.4G      14.9G    8%  /dante[MDT:0]
dante-OST0000_UUID   19.7G   1.6G      18.1G    8%  /dante[OST:0]
dante-OST0001_UUID   19.7G   1.4G      18.3G    7%  /dante[OST:1]
filesystem summary:  39.4G   3.1G      36.3G    7%  /dante
What? If there is nothing on dante-OST0001_UUID, then why does it still
show 1.4G used? Let's confirm that the files are all on OST0000 and not
OST0001:
[lab01]/dante> lfs find -r --obd dante-OST0000_UUID /dante
/dante/BIG_001.out
/dante/BIG_016.out
/dante/BIG_002.out
/dante/BIG_003.out
/dante/BIG_004.out
/dante/BIG_005.out
/dante/BIG_006.out
/dante/BIG_007.out
/dante/BIG_008.out
/dante/BIG_009.out
/dante/BIG_010.out
/dante/BIG_011.out
/dante/BIG_012.out
/dante/BIG_013.out
/dante/BIG_014.out
/dante/BIG_015.out
/dante/BIG_017.out
/dante/BIG_018.out
/dante/BIG_019.out
/dante/BIG_020.out
[lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
[lab01]/dante> lfs df -h /dante
UUID                 bytes   Used  Available  Use%  Mounted on
dante-MDT0000_UUID   16.3G   1.4G      14.9G    8%  /dante[MDT:0]
dante-OST0000_UUID   19.7G   1.6G      18.1G    8%  /dante[OST:0]
dante-OST0001_UUID   19.7G   1.4G      18.3G    7%  /dante[OST:1]
filesystem summary:  39.4G   3.1G      36.3G    7%  /dante
Well, sure enough, it looks like all 20 files are on OST0000 and no files
are on OST0001, yet the df output doesn't reflect that. Maybe the df
output isn't correct unless I activate the OST again? Let's try that:
[lab01]/dante> lctl --device 6 activate
[lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
0: dante-OST0000_UUID ACTIVE
1: dante-OST0001_UUID ACTIVE
[lab01]/dante> lfs df -h /dante
UUID                 bytes   Used  Available  Use%  Mounted on
dante-MDT0000_UUID   16.3G   1.4G      14.9G    8%  /dante[MDT:0]
dante-OST0000_UUID   19.7G   1.6G      18.1G    8%  /dante[OST:0]
dante-OST0001_UUID   19.7G   1.4G      18.3G    7%  /dante[OST:1]
filesystem summary:  39.4G   3.1G      36.3G    7%  /dante
Nope, that doesn't work either. What's wrong with this picture?
-- Dante