Andreas,
Thank you for the response. I should have mentioned that I only have 1
client currently active; I removed the others in preparation for
this. So having a lot of open files would not be expected.
I did what you suggested: umount -f on the MDS, then mounted the MDT
again. Good news: it freed up another 1.5 TB of space.
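For reference, the remount step could be scripted roughly as follows. This is a minimal sketch, not a tested procedure. MDT_DEV and MDT_MNT are hypothetical placeholders for the real MDT device and mount point. The run helper only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholders: substitute the real MDT device and mount point.
MDT_DEV=${MDT_DEV:-/dev/mdt_placeholder}
MDT_MNT=${MDT_MNT:-/mnt/mdt}

# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remount_mdt() {
    # Force-unmount the MDT. This evicts every client, so applications
    # with files open on the filesystem will see I/O errors.
    run umount -f "$MDT_MNT" || return 1
    # Mount the MDT again as a Lustre target.
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
}
```

Running `DRY_RUN=1 remount_mdt` shows the two commands without touching anything.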
The OST now looks like:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdc 4804519904 1947035768 2613428584 43% /mnt/ost2
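The freed space can be checked from the two df snapshots of /mnt/ost2. The "Used" column is in 1K blocks. The "before" value is taken from the earlier df in the quoted message. This is a small sketch, and kib_to_gib is a hypothetical helper name.

```shell
# Hypothetical helper: convert a df "Used" value (1K blocks, i.e. KiB) to GiB.
kib_to_gib() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k / (1024 * 1024) }'
}

before=3479755816  # Used on /mnt/ost2 in the earlier df (quoted message)
after=1947035768   # Used on /mnt/ost2 after the MDT remount
freed=$((before - after))
echo "freed: ${freed} KiB = $(kib_to_gib "$freed") GiB"
```

That is about 1.43 TiB, or roughly 1.57 TB in decimal units. This is consistent with the "another 1.5 TB" figure.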
What is troubling to me is that I did this procedure a few more times just to
watch the log files, and every time it seems to free up more space! I'm
down to 1.5 TB used now. I'm seeing errors like these in the messages.log
file on the OSS:
Apr 24 09:32:08 nasoss2 kernel: LustreError: 17601:0:(ldlm_resource.c:851:ldlm_resource_add()) lvbo_init failed for resource 2107572: rc -2
Apr 24 09:32:08 nasoss2 kernel: LustreError: 17601:0:(ldlm_resource.c:851:ldlm_resource_add()) Skipped 17 previous similar messages
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1396:filter_destroy_internal()) destroying objid 2331968 ino 144073365 nlink 0 count 1
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1396:filter_destroy_internal()) Skipped 3978 previous similar messages
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1402:filter_destroy_internal()) error unlinking objid 2331968: rc -2
Apr 24 09:32:28 nasoss2 kernel: LustreError: 17625:0:(filter.c:1402:filter_destroy_internal()) Skipped 3978 previous similar messages
Apr 24 09:32:56 nasoss2 kernel: LustreError: 17643:0:(recov_thread.c:453:log_commit_thread()) commit ffff8101accd1000:ffff8101d9bf01c0 drop 124 cookies: rc -22
Apr 24 09:32:56 nasoss2 kernel: LustreError: 17643:0:(recov_thread.c:453:log_commit_thread()) Skipped 321 previous similar messages
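To see which of these messages dominate, the log can be tallied by source location and return code. This is a sketch that assumes each message sits on a single line, as syslog writes it. summarize_lustre_errors is a hypothetical helper name. The errno names are the standard Linux values (2 is ENOENT, 22 is EINVAL). The snippet does not say what those codes mean for these particular Lustre code paths.

```shell
# Hypothetical helper: tally LustreError messages that carry "rc -N" by
# source file, function and return code, most frequent first.
# "Skipped N previous similar messages" lines carry no rc and are ignored.
summarize_lustre_errors() {
    grep 'LustreError:' |
        sed -n 's/.*(\([A-Za-z0-9_]*\.c\):[0-9]*:\([A-Za-z0-9_]*\)()).* rc \(-[0-9][0-9]*\).*/\1:\2 rc=\3/p' |
        sort | uniq -c | sort -rn |
        awk 'BEGIN { name["-2"] = "ENOENT"; name["-22"] = "EINVAL" }
             {
                 rc = substr($3, 4)            # "rc=-2" -> "-2"
                 line = $1 " " $2 " " $3
                 if (rc in name) line = line " (" name[rc] ")"
                 print line
             }'
}
```

Usage might look like `summarize_lustre_errors < /var/log/messages`. Adjust the path to wherever the OSS syslog actually lives.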
Any idea what is going on? If I do this a few more times I'll be down to
almost no used space, hehe.
Thanks
Scott
Andreas Dilger wrote:
> On 2010-04-23, at 11:32, Scott wrote:
>> Apologies if I'm missing something obvious here.
>>
>> My OSTs are set up in RAID 5 and one of the arrays has a bad stripe, so I
>> need to rebuild it. In preparation for this I want to move all the data
>> off of this OST, so I deactivated the OST on the MDS and ran:
>>
>> lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | while
>> read F; do cp $F $F.tmp && mv $F.tmp $F; done
>>
>> This ran for quite a while, and after it finished I ran the find command
>> again to confirm there were no more files on the OST.
>
> I did nearly this same thing just yesterday, though I quoted "$F" so that
> it would handle filenames with spaces, etc. in them. Note that doing this
> is only safe for files that are not currently in use by
> clients/applications.
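A version of the migration loop with the quoting described above might look like the following. It is a sketch, and migrate_listed_files is a hypothetical name. Using `cp -p` (preserving mode, timestamps, and ownership where permitted) is an addition to the original `cp`. The loop relies on the same assumption as the original: with the OST deactivated on the MDS, the new copy is allocated on the remaining OSTs. `IFS=` and `read -r` keep leading blanks and backslashes intact. Filenames containing newlines are still not handled.

```shell
# Hypothetical helper: read one path per line on stdin, copy each file,
# then rename the copy over the original. Only safe for files that no
# client or application has open.
migrate_listed_files() {
    while IFS= read -r f; do
        cp -p -- "$f" "$f.tmp" && mv -- "$f.tmp" "$f" ||
            echo "migration failed: $f" >&2
    done
}
```

Usage would be along the lines of `lfs find --recursive --obd nasone-OST0002_UUID --quiet /lustre | migrate_listed_files`.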
>
>> However, if I look at the OSS, it still shows 3.4 TB of used space
>> on OST2:
>>
>> # df
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/sdd 5765425880 5223515676 249043440 96% /mnt/ost4
>> /dev/sdc 4804519904 3479755816 1080708536 77% /mnt/ost2
>>
>> Does this make any sense at all, or am I missing something obvious here?
>> I was expecting (hoping) to see the used space back to almost zero, so
>> does this mean I have lost quite a bit of data?
>
> I got down to virtually no space used on my 4-year-old OST that I'm
> moving to a larger disk.
>
> One possibility is that you have open files that are holding this space in
> use. If you unmount the MDT (use "umount -f", which will evict all of the
> clients, though this will cause applications to see IO errors, if that is
> acceptable) and mount it again, does the space usage go away?
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>