So we have a new file system set up: beefy OSTs, but certainly undersized metadata (we're still figuring out what we'll use in the end). We've just started to do friendly-user testing and ran into some odd file entries. In particular:

-rw-r--r-- 1 ybao esd 22317940 Jul 23 10:49 met_em.d01.2060-12-30_12:00:00.nc
?--------- ? ? ? ? ? met_em.d01.2060-F9?q
-rw-r--r-- 1 ybao esd 22317940 Jul 23 14:30 met_em.d01.2061-01-01_18:00:00.nc

This seems to not only be an odd file entry, but might be messing with proper wildcard listings. For instance:

[root@n0000 GFDLMET1]# ls met_em.d01.2060-01-01*
ls: met_em.d01.2060-01-01*: No such file or directory
[root@n0000 GFDLMET1]# ls met_em.d01.2060-01-01_00:00:00.nc
met_em.d01.2060-01-01_00:00:00.nc
[root@n0000 GFDLMET1]#

Does anyone know a possible cause for this? I'm not convinced this wasn't just a poorly executed transfer on the user's part, but I would expect a file entry to still contain proper owner metadata, etc.

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
On Jul 24, 2009 15:29 -0700, John White wrote:
> So we have a new file system set up: beefy OSTs, but certainly
> undersized metadata (we're still figuring out what we'll use in the end).
> We've just started to do friendly-user testing and ran into some odd
> file entries. In particular:
> -rw-r--r-- 1 ybao esd 22317940 Jul 23 10:49 met_em.d01.2060-12-30_12:00:00.nc
> ?--------- ? ? ? ? ? met_em.d01.2060-F9?q
> -rw-r--r-- 1 ybao esd 22317940 Jul 23 14:30 met_em.d01.2061-01-01_18:00:00.nc
>
> This seems to not only be an odd file entry, but might be messing with
> proper wildcard listings. For instance:
> [root@n0000 GFDLMET1]# ls met_em.d01.2060-01-01*
> ls: met_em.d01.2060-01-01*: No such file or directory

This generally means that the MDS has the directory entry for a file, but the object is missing from the OST. You will likely also have "-2" (-ENOENT) errors on one of your OSTs.

In some cases, if a file is being deleted under load and the MDS crashes, the OST objects will still be deleted. It isn't really "data loss" because the file was being deleted, but it can be confusing to users.

The simple solution is to use the "unlink" command to delete the file (GNU "rm" will try to stat it first, and fail).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
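A minimal sketch of the "unlink" approach described above, assuming GNU coreutils unlink(1) is installed on the client. The helper name unlink_entries is hypothetical and is not part of Lustre or coreutils; it only loops over the given paths and reports which removals succeeded.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre or coreutils tool): remove directory
# entries with unlink(1) rather than rm, since the reply above notes that
# GNU rm stats the entry first and fails when that stat fails.
# Usage: unlink_entries PATH...
unlink_entries() {
    status=0
    for path in "$@"; do
        if unlink "$path"; then
            echo "unlinked: $path"
        else
            # unlink(1) prints its own error message; record the failure.
            echo "failed: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function returns non-zero if any entry could not be removed, so it can be used in a script that should stop on the first directory that still has problem entries.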
Hello!

On Jul 24, 2009, at 7:04 PM, Andreas Dilger wrote:
> On Jul 24, 2009 15:29 -0700, John White wrote:
>> So we have a new file system set up: beefy OSTs, but certainly
>> undersized metadata (we're still figuring out what we'll use in the end).
>> We've just started to do friendly-user testing and ran into some odd
>> file entries. In particular:
>> -rw-r--r-- 1 ybao esd 22317940 Jul 23 10:49 met_em.d01.2060-12-30_12:00:00.nc
>> ?--------- ? ? ? ? ? met_em.d01.2060-F9?q
>> -rw-r--r-- 1 ybao esd 22317940 Jul 23 14:30 met_em.d01.2061-01-01_18:00:00.nc
>>
>> This seems to not only be an odd file entry, but might be messing
>> with proper wildcard listings. For instance:
>> [root@n0000 GFDLMET1]# ls met_em.d01.2060-01-01*
>> ls: met_em.d01.2060-01-01*: No such file or directory
> This generally means that the MDS has the directory entry for a file,
> but the object is missing from the OST. You will likely also have
> "-2" (-ENOENT) errors on one of your OSTs.

Another possibility is a stale client cache (which should not happen at all, of course). If you cannot see this entry on other clients and it disappears after you flush the local metadata cache on the affected client, then this is what happened. To flush the cache, run on the affected client:

echo clear >/proc/fs/lustre/ldlm/namespaces/lustre-MDT0000-mdc-*/lru_size

Bye, Oleg
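The cache flush above can be sketched as a small function, which may help if the glob matches more than one namespace directory (a single redirect to a multi-match glob fails in bash as an ambiguous redirect). The function name flush_mdc_locks and its optional base-directory argument are assumptions made for illustration; the /proc path, the "lustre" file system name and the MDT0000 index are taken from the command in the message and may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: write "clear" to every matching lru_size file,
# one file at a time. The default path comes from the command quoted in
# the thread; the optional first argument (an assumption, added so the
# function can be tried against a scratch directory) overrides it.
flush_mdc_locks() {
    base=${1:-/proc/fs/lustre/ldlm/namespaces}
    cleared=0
    for f in "$base"/lustre-MDT0000-mdc-*/lru_size; do
        # If the glob matched nothing, the literal pattern is left in $f.
        [ -e "$f" ] || continue
        if echo clear > "$f"; then
            cleared=$((cleared + 1))
        fi
    done
    echo "cleared $cleared namespace(s)"
}
```

Run as root on the affected client with no arguments, this writes to the same files as the one-line command; a count of 0 suggests the namespace name on that client does not match the pattern.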