thr3ads.net - Lustre discuss - [Lustre-discuss] Moving files off an OST [Apr 2010]


Scott

2010-Apr-23 17:32 UTC

[Lustre-discuss] Moving files off an OST

Apologies if I'm missing something obvious here.

My OSTs are set up in RAID 5 and one of the arrays has a bad stripe, so I 
need to rebuild it.  In preparation for this I want to move all the data 
off of this OST, so I deactivated the OST on the MDS and ran:


lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | while 
read F; do cp $F $F.tmp && mv $F.tmp $F; done

This ran for quite a while, and after it finished I ran the find command 
again to confirm there were no more files on the OST.

However, if I look at the OSS, it still shows 3.4 TB of used space 
on that OST (ost2):

# df
Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/sdd             5765425880 5223515676 249043440  96% /mnt/ost4
/dev/sdc             4804519904 3479755816 1080708536  77% /mnt/ost2

Does this make any sense at all, or am I missing something obvious here? 
I was expecting (hoping) to see the used space back to almost zero, so 
does this mean I have quite a bit of lost data?

Any help?

Regards

Andreas Dilger

2010-Apr-24 09:10 UTC

On 2010-04-23, at 11:32, Scott wrote:
> Apologies if I'm missing something obvious here.
> 
> My OSTs are set up in RAID 5 and one of the arrays has a bad stripe, so I 
> need to rebuild it.  In preparation for this I want to move all the data 
> off of this OST, so I deactivated the OST on the MDS and ran:
> 
> 
> lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | while 
> read F; do cp $F $F.tmp && mv $F.tmp $F; done
> 
> This ran for quite a while, and after it finished I ran the find command 
> again to confirm there were no more files on the OST.
I did nearly this same thing just yesterday, though I quoted "$F" so
that it would handle filenames with spaces, etc. in them.  Note that doing this
is only safe for files that are not currently in use by clients/applications.
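
For reference, a quoted version of that loop might look like the sketch below. The function name rewrite_in_place is made up for illustration, and the -p flag (keep mode, ownership and timestamps) and the "--" guards are additions that were not in the original command. It still assumes one filename per line, so names containing newlines are not handled, and it is still only safe for files that nothing has open.

```shell
#!/bin/sh
# Hypothetical helper: copy each file named on stdin to a temporary name
# and rename it back, so the data is rewritten into newly allocated objects.
rewrite_in_place() {
    # IFS= and -r keep leading/trailing spaces and backslashes intact.
    while IFS= read -r F; do
        # Quoting "$F" keeps names with spaces in one piece;
        # "--" stops a name starting with "-" being read as an option.
        cp -p -- "$F" "$F.tmp" && mv -- "$F.tmp" "$F"
    done
}

# Usage, assuming the OST has already been deactivated on the MDS:
# lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | rewrite_in_place
```

Because mv only runs when cp succeeds, a failed copy leaves the original file in place, possibly with a stray .tmp file next to it.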
> However, if I look at the OSS, it still shows 3.4 TB of used space 
> on that OST (ost2):
> 
> # df
> Filesystem           1K-blocks      Used Available Use% Mounted on
> 
> /dev/sdd             5765425880 5223515676 249043440  96% /mnt/ost4
> /dev/sdc             4804519904 3479755816 1080708536  77% /mnt/ost2
> 
> Does this make any sense at all, or am I missing something obvious here? 
> I was expecting (hoping) to see the used space back to almost zero, so 
> does this mean I have quite a bit of lost data?
I got down to virtually no space used, on my 4-year-old OST that I'm
moving to a larger disk.

One possibility is that you have open files that are holding this space in use. 
If you unmount the MDT (use "umount -f", which will evict all of the
clients, though this will cause applications to see IO errors, if that is
acceptable) and mount it again, does the space usage go away?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

Scott

2010-Apr-24 13:34 UTC

Andreas,

Thank you for the response. I should have mentioned that I only have 1 
client currently active; I have removed the others in preparation for 
this.  So having a lot of open files would not be expected.

I did what you suggested, umount -f on the MDS and then mounted the MDT 
again. Good news: it freed up another 1.5 TB of space.

The OST now looks like:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdc             4804519904 1947035768 2613428584  43% /mnt/ost2
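
As a rough check of that figure, the two df lines for /mnt/ost2 quoted in this thread can be subtracted directly. This is only a sketch: the helper name used_kb is made up, and it assumes each filesystem is on a single line of output (as with df -P), with the Used column third and the mount point last.

```shell
#!/bin/sh
# Hypothetical helper: print the "Used" column (in 1K-blocks) for one
# mount point, reading df-style text on stdin.
used_kb() {
    awk -v m="$1" '$NF == m { print $3 }'
}

# The two /mnt/ost2 lines from this thread, before and after remounting the MDT.
before=$(printf '%s\n' '/dev/sdc 4804519904 3479755816 1080708536 77% /mnt/ost2' | used_kb /mnt/ost2)
after=$(printf '%s\n' '/dev/sdc 4804519904 1947035768 2613428584 43% /mnt/ost2' | used_kb /mnt/ost2)

freed=$((before - after))
echo "freed: $freed KB"     # prints "freed: 1532720048 KB"
# Convert 1K-blocks to decimal terabytes (10^12 bytes).
awk -v kb="$freed" 'BEGIN { printf "about %.2f TB\n", kb * 1024 / 1e12 }'
```

On a live system the same helper could be fed with df -P run on the OSS before and after each remount.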


What is troubling to me is that I did this procedure a few more times just to 
watch the log files, and every time it seems to free up more space!  I'm 
down to 1.5 TB used now.  I'm seeing errors like these in the messages.log 
file on the OSS:

Apr 24 09:32:08 nasoss2 kernel: LustreError: 17601:0:(ldlm_resource.c:851:ldlm_resource_add()) lvbo_init failed for resource 2107572: rc -2
Apr 24 09:32:08 nasoss2 kernel: LustreError: 17601:0:(ldlm_resource.c:851:ldlm_resource_add()) Skipped 17 previous similar messages
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1396:filter_destroy_internal()) destroying objid 2331968 ino 144073365 nlink 0 count 1
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1396:filter_destroy_internal()) Skipped 3978 previous similar messages
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1402:filter_destroy_internal()) error unlinking objid 2331968: rc -2
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1402:filter_destroy_internal()) Skipped 3978 previous similar messages
Apr 24 09:32:56 nasoss2 kernel: LustreError: 17643:0:(recov_thread.c:453:log_commit_thread()) commit ffff8101accd1000:ffff8101d9bf01c0 drop 124 cookies: rc -22
Apr 24 09:32:56 nasoss2 kernel: LustreError: 17643:0:(recov_thread.c:453:log_commit_thread()) Skipped 321 previous similar messages
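
To get an overview of the return codes in a log like this, something along the lines of the sketch below could tally them. It assumes the rc values are the usual negative Linux errno numbers, so -2 reads as ENOENT and -22 as EINVAL. The function name rc_summary and its two-entry lookup table are made up for illustration, and it relies on grep -o, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: count each "rc -N" code in log text on stdin and
# label the two codes that appear in this thread.
rc_summary() {
    grep -o 'rc -[0-9][0-9]*' | sort | uniq -c | while read -r count word code; do
        # uniq -c emits "<count> rc <code>"; $word holds the literal "rc".
        case "$code" in
            -2)  name="ENOENT (no such file or directory)" ;;
            -22) name="EINVAL (invalid argument)" ;;
            *)   name="(not in this small table)" ;;
        esac
        echo "$count x rc $code $name"
    done
}

# Usage (the log path is an assumption; adjust to the local syslog file):
# rc_summary < /var/log/messages
```

Note that lines reading "Skipped N previous similar messages" carry no rc value, so the counts only cover the messages that were actually printed.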

Any idea what is going on? If I do this a few more times I'll be almost 
down to no used space, hehe.

Thanks

Scott



