Hello Folks,
We've got a Perl script that performs our purges based on the atime returned
by a stat() call. Over the weekend, it appears, the script got back millions of
corrupted or misreported atimes and, lucky for us, unlinked a whole bunch of
files (on the order of 45TB). The only sign that anything had gone wrong was
the following, sitting in the dmesg of the Lustre client that hosts the purge
script:
*snip*
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) failure -2 inode 181017071
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) Skipped 19 previous similar messages
LustreError: 13381:0:(file.c:3312:ll_inode_revalidate_fini()) failure -2 inode 161973341
LustreError: 13381:0:(file.c:3312:ll_inode_revalidate_fini()) Skipped 25 previous similar messages
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) failure -2 inode 162433196
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) Skipped 32 previous similar messages
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) failure -2 inode 174530765
LustreError: 27061:0:(file.c:3312:ll_inode_revalidate_fini()) Skipped 33 previous similar messages
*snip*
Does anyone have ideas about, or experience with, misreported atimes under Lustre?
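For discussion, here is a minimal sketch of an atime-based purge check that
refuses to delete on a failed or implausible stat() result. It is written in
Python rather than Perl and is NOT our actual purge script; the 90-day window,
the function name purge_candidate, and the sanity bounds (atime at or before
the epoch, or in the future) are hypothetical choices for illustration. Note
that "failure -2" in the log above is -ENOENT (errno 2 on Linux).

```python
import os
import stat
import time

# Hypothetical retention window; not taken from our real purge policy.
PURGE_AGE_SECONDS = 90 * 24 * 3600


def purge_candidate(path, now=None, max_age=PURGE_AGE_SECONDS):
    """Return True only if path is a regular file with a plausibly old atime.

    Sketch only: the sanity bounds below are assumptions, not Lustre guarantees.
    """
    now = time.time() if now is None else now
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # ENOENT (errno 2), the same code as "failure -2" in the client log.
        return False
    except OSError:
        # Any other stat failure: keep the file rather than guess.
        return False
    if not stat.S_ISREG(st.st_mode):
        return False
    atime = st.st_atime
    # Reject values that cannot be a real last access time.
    if atime <= 0 or atime > now:
        return False
    return now - atime > max_age
```

The design choice is to fail closed: any error or suspicious timestamp means
"keep the file", so a burst of bad atimes skips files instead of unlinking them.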
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720