thr3ads.net - Lustre discuss - [Lustre-discuss] Adjusting stripe for 50,000+ files? [Feb 2010]

Jeremy Mann

2010-Feb-25 17:04 UTC

[Lustre-discuss] Adjusting stripe for 50,000+ files?

We are running Lustre x86_64 2.6.22.14 version 1.6.7 with 1 MGS/MDT and 14
OSTs. The past few days I've been fighting a problem with one user who is
storing roughly 50,000 >1k files in many subdirectories in our Lustre
filesystem. Saturday I had to repair the Meta server with e2fsck and now
this morning, everything was fine until he started another batch of jobs
submitted to our PBS queue.

Lustre: laredofs-MDT0000: sending delayed replies to recovered clients
Lustre: laredofs-MDT0000: recovery complete: rc 0
Lustre: MDS laredofs-MDT0000: laredofs-OST0000_UUID now active, resetting
orphans
Lustre: MDS laredofs-MDT0000: laredofs-OST0001_UUID now active, resetting
orphans
Lustre: Client laredofs-client has started
nph-mascot.exe[3816]: segfault at 0000000000000018 rip 000000000041bb45
rsp 00007fffa412e890 error 6

The above messages lasted all week with no errors. This morning I see this:

Lustre: 22640:0:(lustre_fsfilt.h:330:fsfilt_setattr()) laredofs-MDT0000:
slow setattr 48s
Lustre: 22644:0:(lustre_fsfilt.h:229:fsfilt_start_log()) laredofs-MDT0000:
slow journal start 32s
LDISKFS-fs error (device sdb2): ldiskfs_add_entry: bad entry in directory
#12731224: inode out of bounds - offset=1900, inode=1953587812,
rec_len=204, name_len=194
Aborting journal on device sdb2.
Remounting filesystem read-only
LustreError: 22627:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error
starting handle for op 8 (33 credits): rc -30
LustreError: 22627:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
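
A note for readers decoding the log above: the trailing "rc -30" values appear to be negated Linux errno codes, and errno 30 is EROFS (read-only file system), which is consistent with the "Remounting filesystem read-only" message just before them. The following is a minimal sketch of that lookup, assuming Linux errno numbering; the helper name decode_rc is hypothetical and is not part of any Lustre tool.

```python
import errno
import os
import re

# Sample line copied from the log above, re-joined onto a single line.
LOG_LINE = ("LustreError: 22627:0:(mds_reint.c:154:mds_finish_transno()) "
            "fsfilt_start: -30")


def decode_rc(line):
    """Return (rc, errno name, description) for a trailing negative rc.

    Hypothetical helper: it assumes the last token on the line is a
    negated errno value, as in the log lines quoted in this thread.
    Returns None when the line does not end in a negative integer.
    """
    match = re.search(r"(-\d+)\s*$", line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = -rc
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    return rc, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_rc(LOG_LINE))
```

On a Linux host this reports EROFS for the sample line, i.e. the MDS operations failed because ldiskfs had already remounted the device read-only after aborting the journal.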

I managed to unmount the OSTs and the MDT but there was a kernel panic
that prevented me from running e2fsck on the Meta server so I simply
rebooted it. Then I ran e2fsck and it found inode problems, all associated
with his directories. Luckily, e2fsck fixed them.

Now, everything is back to normal and his jobs are processing.

Currently, the stripe set on his directory is 128k (this is our default
stripe). I'm curious if I need to set a smaller stripe on his directories
with those 50,000+ files.

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672

Johann Lombardi

2010-Feb-25 17:20 UTC

On Thu, Feb 25, 2010 at 11:04:25AM -0600, Jeremy Mann wrote:
> We are running Lustre x86_64 2.6.22.14 version 1.6.7 with 1 MGS/MDT and 14
                                                 ^^^^^
Please note that we released 1.6.7.1 shortly after 1.6.7 in order to address
a MDS corruption which was bug 18695.
> LDISKFS-fs error (device sdb2): ldiskfs_add_entry: bad entry in directory
> #12731224: inode out of bounds - offset=1900, inode=1953587812,
> rec_len=204, name_len=194
This really looks like you are hitting bug 18695. I would really recommend
upgrading to 1.6.7.2 or 1.8.2. Please note that your MDS can be severely
damaged by bug 18695, so you should run e2fsck against the MDT device before
upgrading.

Johann

Jeremy Mann

2010-Feb-25 17:25 UTC

Johann Lombardi wrote:
> On Thu, Feb 25, 2010 at 11:04:25AM -0600, Jeremy Mann wrote:
>> We are running Lustre x86_64 2.6.22.14 version 1.6.7 with 1 MGS/MDT and 14
>                                                  ^^^^^
> Please note that we released 1.6.7.1 shortly after 1.6.7 in order to address
> a MDS corruption which was bug 18695.
>
>> LDISKFS-fs error (device sdb2): ldiskfs_add_entry: bad entry in directory
>> #12731224: inode out of bounds - offset=1900, inode=1953587812,
>> rec_len=204, name_len=194
>
> This really looks like you are hitting bug 18695. I would really recommend
> upgrading to 1.6.7.2 or 1.8.2. Please note that your MDS can be severely
> damaged by bug 18695, so you should run e2fsck against the MDT device before
> upgrading.

Thank you Johann, I will get right on this.


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
