I posted this question on 15/8/03 - almost a month ago. Since I've had no
response, I assume that very few people have seen the problem. So I'll
describe what we have discovered in the meantime.
Here's the original post:
-------------------------------------------------------------------------------------------
From: Glen Davison (glen@maths.unsw.edu.au)
Subject: [Samba] Netware CIFS nlm - linux samba
Date: 2003-08-15 01:40:06 PST
Dear Gurus,
We're having bizarre problems/behaviour. Admittedly we have an unusual
set-up:
- users on linux desktops (RedHat/KDE) mounting files over SMB using
samba-2.2.5-10 -client and -common rpms.
- files are on a SAN, clustered behind 2 netware servers (6.5), which run the
cifs.nlm (netware guy has gone home - can't tell you the version just now)
Files are spontaneously changing modification timestamps - anywhere between
about 1930 and 2040 (AD, not hours). Or more likely *presenting* those
timestamps most of the time, though now and then the correct timestamp will
swim into view briefly.
example:
[glen@haiku glen]$ ls -l home/test333*
-rwxr-xr-x 1 glen users 0 Aug 12 16:44 home/test3332
-rwxr-xr-x 1 glen users 11 Aug 12 16:43 home/test3333
[glen@haiku glen]$ ls -l home/test3333
-rwxr-xr-x 1 glen users 11 Sep 2 1992 home/test3333
[glen@haiku glen]$ ls -l home/test3332
-rwxr-xr-x 1 glen users 0 Oct 26 1992 home/test3332
[home is the mount-point, or rather a symlink down thru the mount a little
way]
Most newly created files seem to have the problem straight away. (But the
bulk of the files were rsync'd across from a Tru64 filesystem a month ago)
We have tried versions 2.2.8a and 3.0.0beta of smbclient / smbmount. 2.2.8a
was the same. 3.0.0 started with promising results, but it eventually did the
same timestamp trick (maybe less frequently??), and it also dies somehow after
about 30 mins and has to be remounted.
We believe we have narrowed this behaviour down to only linux samba clients
talking to the netware cifs nlm.
To add to the pot: we have also had a handful of files apparently change
filename spontaneously, so that they start with '..'. In most cases, they
started as .xyz and became ..xyz. The only processes which touched those
files should have been reads - no writes. This may be a red herring - may
not be samba-related.
The timestamp issue is wide-spread, the filename problem is rare.
So, has anyone seen anything like this? Can you explain what causes it?
And is there a solution?
TIA
Glen
-------------------------------------------------------------------------------
What we have been able to discover since then, by experiment, research and
guesswork, follows. Note: a lot of this was worked out by a colleague, and
about half of this email is ripped off from what that colleague wrote.
It seems that netware NSS stores modification (& other?) time-stamps in the
directory-file (the file which *is* the containing directory), like windows
does, whereas unix filesystems store this info in the file's inode. When the
CIFS nlm on netware receives a file query, it looks at the file itself, which
doesn't have the timestamp, and hence it returns some sort of
bogus/semi-random/null result, and we see the stupid timestamp.
But if samba (or cifs.nlm) gets a directory query, then a file query, within a
time-window smaller than the time that it caches results for (1 second by
default I think) then the file query gets the correct time, remembered from
the directory query.
This explains the behaviour seen above - `ls -l x*` does a directory query for
the glob expansion, then a file query on each resulting file, and hence the
correct time!
More examples of successes and failures:
  ls -l file*                            -- correct timestamp
  ls -l file1                            -- wrong timestamp
  ls -l $(echo file*)                    -- correct timestamp
  ls > /dev/null; ls -l file1            -- correct timestamp
  ls > /dev/null; sleep 2; ls -l file1   -- random timestamp
Extending smbmount's ttl (length of cache) option gives obviously poor
results:
-bash-2.05b# ls -l ?
-rwx------ 1 root root 6 Aug 20 16:10 x
-bash-2.05b# ls -l x
-rwx------ 1 root root 6 Aug 20 16:10 x
-bash-2.05b# touch x
-bash-2.05b# ls -l x
-rwx------ 1 root root 6 Nov 24 1922 x
So here a file was modified, the kernel smb cache forgot the timestamp
(due to the modification), and the next query was assigned another random
timestamp. This is not good at all.
Further, when first you mount a file system and try to access a file,
you may well get the message that the file does not exist!
In C, unix uses the 'stat()' system call for the file query. To get correct
timestamps, do an opendir() and readdir() beforehand. Attached is C source to
see the difference. The opendir code is currently commented out, so it will
give bad timestamps as is.
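
For readers without the attachment, here is a minimal sketch of the same
idea. It is not the attached program. The function names prime_dir_cache()
and primed_mtime() are made up for illustration, and the assumption (taken
from the behaviour described above) is that scanning the directory and then
calling stat() within the cache ttl yields the timestamp from the directory
query.

```c
#include <dirent.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: read every entry of dir so that a directory query
 * is issued. Returns 0 on success, -1 if the directory cannot be opened. */
int prime_dir_cache(const char *dir)
{
    DIR *dh = opendir(dir);
    if (dh == NULL)
        return -1;
    while (readdir(dh) != NULL)
        ;   /* entries are discarded; only the directory scan matters */
    closedir(dh);
    return 0;
}

/* Hypothetical helper: scan dir, then stat() path straight away.
 * On success stores the modification time in *mtime and returns 0;
 * returns -1 if either the scan or the stat() fails. */
int primed_mtime(const char *dir, const char *path, time_t *mtime)
{
    struct stat st;

    if (prime_dir_cache(dir) != 0)
        return -1;
    if (stat(path, &st) != 0)   /* must run within the ttl of the scan */
        return -1;
    *mtime = st.st_mtime;
    return 0;
}
```

Calling primed_mtime("home", "home/test3333", &t) and comparing t with the
result of a plain stat() made a few seconds later should show the
difference, if the explanation above is right.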
To get correct NSS timestamps in perl:
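local *DH;
opendir(DH, $dir) or die "opendir $dir: $!";
$ff = readdir(DH);    # first entry returned; may be "." or ".."
# make sure this happens within ttl (1 sec) of opendir:
$mtime = (lstat("$dir/$ff"))[9];    # the list slice needs the parentheses
closedir(DH);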
Unfortunately, once we worked out how to see the true mod. timestamps, we
discovered that there is a SECOND timestamp issue: when a file on NSS is
touched in some way from linux thru samba & CIFS, the 'true' (NSS) timestamp
can sometimes change in a random way! [This may even happen when the file
has not been touched!! Circumstantial evidence points to this happening to
some files which haven't been written to at all!]
I'm guessing this is also related to the directory file vs inode
incompatibility, but can't provide much detail. I do know that doing a touch
on the same file any number of times within about 1 hour keeps giving the
same NSS timestamp. Whether that's because it keeps mangling the timestamp
in the same way, or because it loses interest in changing the dir-file
(another caching issue perhaps?) I don't know. It seems to happen to about
1 file in 20. Later on, the same files will touch correctly, and a different
1 in 20 files will misbehave.
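
One way to look for this second problem is to set a known modification time
and read it back after a directory scan. The following is only a sketch
under that assumption, not something we ran as written. The function name
mtime_round_trips() and the tolerance parameter are made up for
illustration; the tolerance allows for coarse timestamp granularity on the
server.

```c
#include <dirent.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical check: set path's mtime to `want`, scan its directory,
 * then stat() it again.
 * Returns 1 if the mtime read back is within `tolerance` seconds of
 * `want`, 0 if it is not (the timestamp was mangled), -1 on any error. */
int mtime_round_trips(const char *dir, const char *path,
                      time_t want, long tolerance)
{
    struct utimbuf ut;
    struct stat st;
    DIR *dh;
    long diff;

    ut.actime = want;
    ut.modtime = want;
    if (utime(path, &ut) != 0)      /* same effect as touch with a fixed date */
        return -1;

    dh = opendir(dir);              /* directory scan before the file query */
    if (dh == NULL)
        return -1;
    while (readdir(dh) != NULL)
        ;
    closedir(dh);

    if (stat(path, &st) != 0)
        return -1;

    diff = (long)(st.st_mtime - want);
    if (diff < 0)
        diff = -diff;
    return diff <= tolerance ? 1 : 0;
}
```

Run over every file in a directory, a check like this should report 0 for
roughly the 1 in 20 files described above, if they misbehave at that moment.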
In the scenario where smbd is file-serving from linux, and linux clients are
smbmounting it, this problem doesn't seem to occur.
<wild speculation mode> Our best guess is that samba finds some way to squeeze
the proper inode info, like timestamps, thru the CIFS protocol. Perhaps only
if it knows that the client is linux/samba too. </spec...>
I hope this might be useful to others at some point. We are almost certainly
going to go a completely different way to solve our problems, anyway.
Glen
:)
--
Glen Davison glen@maths.unsw.edu.au
Computer System Administrator phone: +61 2 9385 7018
Maths, UNSW fax: +61 2 9385 7192