On Nov 8, 2010, at 12:18 AM, Andreas Dilger wrote:
> On 2010-11-05, at 07:30, Michael Barnes wrote:
>> I'm guessing that the MDS simply doesn't know about the dangling
>> object on the OST. On one OST that I've examined I noticed that most
>> of the orphaned objects are from Oct 27, and somewhere between Oct
>> 26th and 27th the metadata server had a software crash (1.8.1.1,
>> since upgraded to 1.8.4). I'm guessing that the clients pushed data
>> directly to the OST and could not update the MDT, thus leaving the
>> stray files on the OST.
>>
>> Is there any way to get more information about the files or possibly
>> clean these files while Lustre is active? All I have are the object
>> IDs and the regular UNIX metadata stored on the object files. When
>> these errors occurred, did the writes fail on the client side, and
>> are the users not expecting the data to be there?
>
> You can find out which MDS inode these objects belong(ed) to with the
> ll_decode_filter_fid tool, included in newer Lustre releases. It will
> print out the MDS inode and generation numbers that are saved on the
> OST objects when they are first accessed. On the MDS you can use
> debugfs with the "stat <inode_nr>" command to determine whether the
> inode is still in use (the generation number would match the one on
> the OST; otherwise it is just a re-used inode).
Thanks for the tip. BTW, what are the arguments to ll_decode_filter_fid?
I've given it an objid and a filename, and neither seems to do anything.
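In case it is useful context, my current guess at the two steps, which
I have not confirmed: ll_decode_filter_fid appears to want the path of
an object file on an ldiskfs mount of the OST, and the debugfs check on
the MDS would then look something like this (device paths, mount point,
object ID, and inode number are all placeholders):

    # on the OSS, with the OST mounted read-only as ldiskfs
    mount -t ldiskfs -o ro /dev/ost_dev /mnt/ost
    # objects are hashed into O/0/d0..d31 by objid % 32
    ll_decode_filter_fid /mnt/ost/O/0/d21/207061
    # on the MDS: is the reported inode still in use, and does its
    # generation match the one stored on the OST object?
    debugfs -R 'stat <184713223>' /dev/mdt_dev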
My suspicions were confirmed by a user yesterday. These orphaned objects
are not isolated to MDS/OSS failures; they are reproducible with a
user's data summary script.
These users are dealing with a moderate amount of data in terms of size,
but they have many, many small files. I've seen hundreds of thousands of
these files go missing. The user gave me the arguments to his script and
told me I could run it, and the script definitely generates orphaned
files every time it's run.
########################################################################################
Some basic info:
clients are at 1.8.1.1 and 1.8.2 (with a hang patch)
MDS/OSS are at 1.8.4 (originally 1.8.1.1)
~1000 clients
~200 TB filesystem
OSSes are on DDR infiniband
MDS is on QDR infiniband
clients are on anywhere from SDR to QDR infiniband; no TCP clients at
this time, AFAIK
Some OSTs are turned off/deactivated (conf_param version).
Some OSTs are turned off/deactivated (set_param version).
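For clarity, the two deactivation forms I mean are along these lines,
if I have the syntax right (the OST name is just an example):

    # conf_param version: permanent, recorded on the MGS
    lctl conf_param lustre-OST0001.osc.active=0
    # set_param version: temporary, lost at the next restart
    lctl set_param osc.lustre-OST0001-osc*.active=0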
########################################################################################
A junk file I just created looks like this in lfs getstripe output on
the client:
...
46: lustre-OST002e_UUID INACTIVE
...
tmp.px0py0pz0_phi_jr_Nsrc3_Ncfg50_20x64_m020m050_P.dat
        obdidx     objid      objid     group
            46    207061    0x328d5         0
*** It does not make sense to me that the client sees this OST as
inactive (obdidx 46 is lustre-OST002e) yet is still allocating objects
to it. This OST should be active. ***
On the MDS, lctl dl shows:
...
51 UP osc lustre-OST002e-osc lustre-mdtlov_UUID 5
...
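The client's view of the OSC should also be queryable directly;
something like this, if I have the parameter name right (the instance
suffix varies per client mount):

    # on a client: 1 = active, 0 = deactivated
    lctl get_param osc.lustre-OST002e-osc-*.active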
The client's syslog has many messages similar to:
Nov 9 10:24:44 qcd10i2.jlab.org kernel: LustreError: 30129:0:(file.c:1001:ll_glimpse_size()) Skipped 3716 previous similar messages
Nov 9 10:33:01 qcd10i2.jlab.org kernel: LustreError: 6460:0:(namei.c:1160:ll_objects_destroy()) obd destroy objid 0xb02800a at 0x0 error -5
Nov 9 10:33:01 qcd10i2.jlab.org kernel: LustreError: 6460:0:(namei.c:1160:ll_objects_destroy()) Skipped 3203 previous similar messages
Nov 9 10:33:27 qcd10i2.jlab.org kernel: LustreError: 6410:0:(file.c:125:ll_close_inode_openhandle()) inode 184713223 ll_objects destroy: rc = -5
Nov 9 10:41:36 qcd10i2.jlab.org kernel: LustreError: 6626:0:(file.c:1001:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Nov 9 10:41:36 qcd10i2.jlab.org kernel: LustreError: 6626:0:(file.c:1001:ll_glimpse_size()) Skipped 21 previous similar messages
Nov 9 10:43:15 qcd10i2.jlab.org kernel: LustreError: 6897:0:(namei.c:1160:ll_objects_destroy()) obd destroy objid 0x19188a1e at 0x0 error -5
Nov 9 10:43:15 qcd10i2.jlab.org kernel: LustreError: 6897:0:(namei.c:1160:ll_objects_destroy()) Skipped 3 previous similar messages
########################################################################################
Some more background info.
We have had many OST and a few MDS failures, mostly hardware, which in
turn have tickled some software bugs; these seem to have lessened since
upgrading the servers to 1.8.4.
We are changing our RAID configurations and migrating data off OSTs,
which is why we have OSTs offline/deactivated.
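The migration itself is the usual copy-and-swap, roughly like this per
OST (the mount point and OST name are examples; I believe newer
releases also ship an lfs_migrate script that automates the copy and
rename):

    # find files with objects on the OST being drained
    lfs find --obd lustre-OST0001_UUID /lustre
    # for each such file: copy (re-striping onto active OSTs), then
    # swap the copy into place
    cp -a file file.migrate.tmp && mv file.migrate.tmp file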
Any help would be appreciated.
TIA,
-mb
--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------