Jim@Morris.net wrote:

> I have finally tracked down the reason I am experiencing problems with
> DBase files on Samba file servers - initially Redhat 6, and now on
> Caldera OpenLinux 2.2 as well.

Great debugging work !

> The DOS client code uses a DBase library called CodeBase to access the
> DBase files. When a record is appended to a file, the underlying C
> code does the following at the beginning of the append operation:
>
>     rc = lock( file->hand, pos_start, num_bytes ) ;
>
> At the time of the call:
>
>     file->hand = A valid file handle
>     pos_start = 0xEFFFFFFF

The pos_start is probably the problem. On Win32 this is an unsigned
32 bit number. POSIX lock calls (which Samba uses) treat it as a signed
32 bit number. Samba on a 32 bit system such as Linux does some
lock mangling to move this into the POSIX range. It should work.

> The same test against the Samba 2.0.5a server(s) shows that the lock()
> call ALWAYS returns a 0 (zero) status - even if a different PC already
> has a lock!

Hmmm. But it obviously doesn't :-(.

> The Win32 library code for this same operation uses a call to the
> Win32 library function LockFile(), like this:
>
>     if ( LockFile( (HANDLE)file->hand, pos_start, 0, num_bytes, 0 ) )
>         rc = NO_ERROR ;
>     else
>         rc = GetLastError() ;
>
> This code under Win32 appears to always obtain the requested lock as
> well, regardless of whether another PC currently has the lock.
>
> The DOS test program for this has been run from systems using MS-DOS
> 6.22, PC-DOS 6.x, Win98 DOS Box, and WinNT 4.0sp/sp5. The Win32
> code has been tested on Win98 and WinNT 4.0sp3/sp5.

Thanks for this analysis. I will test this with a Win32 program
to check the Samba locking with this lock range on both 64 bit
(IRIX) and 32 bit (Linux) systems.

> I suspect that the problem may be that Samba thinks the lock is bogus,
> since it is past the end of the actual file, and does not bother to keep
> track of the lock from two different clients. However, if I run two
> copies of the test program on the SAME PC, then the lock works. So it
> does appear that the smbd process for the specific client machine *IS*
> tracking the lock - but it is not bothering to communicate the
> existence of the lock to other running smbd processes.

Nah. Actually the Win32 client redirector is doing the lock conflict
detection locally in this case. That's why it needs to be done from
two different machines.

I'll code this up (Win32) and test it against the current CVS stable
tree, and get back to you on this.

Thanks for all the analysis,

Cheers,

	Jeremy Allison,
	Samba Team.

--
--------------------------------------------------------
Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
--------------------------------------------------------
Hello Jeremy,

Thursday, September 16, 1999, 1:26:30 PM, you wrote:

Allison> Great debugging work !

Thanks!

Allison> The pos_start is probably the problem. On Win32 this is an unsigned
Allison> 32 bit number. On POSIX (Samba) lock calls treat this as a signed
Allison> 32 bit number. Samba on a 32 bit system such as Linux does some
Allison> lock mangling to move this into the POSIX range. It should work.

Actually, even according to the Borland C++ 3.1 manual (old!), the
offset is a long (signed 32 bit). The PC test program I have now shows
the offset being locked as -268435458.

I now have the following debug output from Samba, in log level 10,
that shows the problem. Appears the fcntl call to lock the file region
is failing in either the C library or Linux kernel.... Also note that
Samba at least is treating the lock offset as an unsigned long, rather
than a signed long.

[1999/09/16 14:03:25, 10] locking/locking.c:remove_share_oplock_fn(246)
  remove_share_oplock_fn: removing oplock info for entry dev=801 ino=239624
[1999/09/16 14:03:25, 10] smbd/reply.c:reply_lockingX(4252)
  reply_lockingX: lock start=4026531839, len=1 for file cbtest/cbtest.DBF
[1999/09/16 14:03:25, 10] locking/locking.c:do_lock(113)
  do_lock: lock type 1 start=4026531839 len=1 requested for file cbtest/cbtest.DBF
[1999/09/16 14:03:25, 8] lib/util.c:fcntl_lock(2676)
  fcntl_lock 8 6 4026531839 1 1
[1999/09/16 14:03:25, 3] lib/util.c:fcntl_lock(2702)
  fcntl lock gave errno 22 (Invalid argument)
[1999/09/16 14:03:25, 3] lib/util.c:fcntl_lock(2724)
  lock failed at offset 4026531839 count 1 op 6 type 1 (Invalid argument)
[1999/09/16 14:03:25, 3] lib/util.c:fcntl_lock(2729)
  locking not supported? returning True
[1999/09/16 14:03:25, 3] smbd/reply.c:reply_lockingX(4290)
  lockingX fnum=4880 type=2 num_locks=1 num_ulocks=0
[1999/09/16 14:03:25, 5] lib/util.c:show_msg(459)

Note that after the lock failure, Samba decided to return a TRUE
result! Ouch.... That explains why both clients think they got the
lock, when the lock actually failed!

I'm looking at the kernel source file linux/fs/locks.c right now,
trying to determine what the "invalid argument" was. I've not quite
figured it out at this point.

Allison> Thanks for all the analysis,

Sure! If you want, I can quickly email you a test DOS executable, and
some small DBase files to test against.

Best regards,
 Jim

--
/------------------------------------------------\
| Jim Morris | Business: jmorris@rtc-group.com   |
|            | Personal: Jim@Morris.net          |
|------------------------------------------------|
| World Wide Web: http://Jim.Morris.net          |
| AOL Instant Messenger: JFM2001                 |
\------------------------------------------------/
Jim@Morris.net wrote:

> The same test against the Samba 2.0.5a server(s) shows that the lock()
> call ALWAYS returns a 0 (zero) status - even if a different PC already
> has a lock!

Ok, I now understand the problem exactly. It *IS* the 64-bit
glibc2.1 bug with Linux on x86 biting us again. Let me explain :

On a full 64 bit system (IRIX), the value 0xEFFFFFFF is a
valid POSIX lock range (as it is a positive value less than
2^63 - 1), so locking a byte at this range succeeds, and
a second client's attempt to lock this byte will fail (ie. the
locks conflict). Thus on a 64 bit system the clients
conflict correctly. I have demonstrated this with my
test Win32 client code working perfectly against a Samba
server on an IRIX 6.5.x box.

Unfortunately, on x86 Linux using glibc2.1, the libc claims
to have 64 bit support, but really doesn't. Thus, the
datatype off_t is sized as 64 bits, even though the
underlying filesystem and locking mechanisms don't
support 64 bit ranges.

So what happens is that Samba thinks the underlying
system is 64 bit clean, and the locking workaround is
not activated, as 0xEFFFFFFF fits in 32 bits (this
locking workaround is only activated on a system with
non-working 64 bit locks when the client, usually NT,
sends a 64 bit lock range - this is a *client* bug...).
With me so far ?

In addition to the 64 bit locking workaround added in
2.0.5, Samba also has unsigned -> signed lock mangling code to
deal with the fact that Windows lock ranges are unsigned
and POSIX lock ranges are signed (ie. 0xEFFFFFFF is a
valid Windows lock range on a 32 bit system, but not
on a 32 bit POSIX system as it would be negative). But
because Samba on x86 Linux with glibc2.1 is being told off_t
is 64 bits this unsigned -> signed lock mangling code is never
activated (as 0x00000000EFFFFFFF is a positive 64 bit value), so the
64 bit value 0x00000000EFFFFFFF is being passed down to glibc
fcntl locking as a valid positive lock range. The
internals of glibc throw away the top 32 bits of this lock range
(which are zeros) and are left with a 32 bit value of
0xEFFFFFFF - which is a *negative* 32 bit value and
hence invalid as a POSIX lock range. So glibc 2.1 returns
an EINVAL error and that legacy code in Samba is then returning
True (meaning ok - lock granted).

I will fix Samba for 2.0.6 so that 64 bit code is *never*
activated on a Linux box and will have to leave it that
way until Linux truly is 64 bit. I have also removed the
legacy code in Samba that interprets EINVAL as a valid lock.
Give me a day or so and I'll have an RPM you can use to test
your code with.

Regards,

	Jeremy Allison,
	Samba Team.
Robert Franklin <r.c.franklin@reading.ac.uk> said:

> Hi,
>
> I'm just wondering how other people cycle and archive their Samba
> logfiles. We're running 2.0.5a which I understand has a bug which
> causes the logfiles not to be closed/reopened (after cycling the
> names to .old).

What we do is use syslog for recording Samba activity long term. The
Samba logs are really there for debugging and, as you have observed,
do not work well for time-based logging.

--
-----------------------------------------------------------------------------
| Peter Polkinghorne, Computer Centre, Brunel University, Uxbridge, UB8 3PH,|
| Peter.Polkinghorne@brunel.ac.uk      +44 1895 274000 x2561      UK        |
-----------------------------------------------------------------------------
At 04:08 17.09.99 , Jeremy Allison wrote:

>Ok, I now understand the problem exactly. It *IS* the 64-bit
>glibc2.1 bug with Linux on x86 biting us again.
>
>[...]
>
>I will fix Samba for 2.0.6 so that 64 bit code is *never*
>activated on a Linux box and will have to leave it that
>way until Linux truly is 64 bit. I have also removed the
>legacy code in Samba that interprets EINVAL as a valid lock.
>Give me a day or so and I'll have an RPM you can use to test
>your code with.

Which glibc version did you use for testing? 2.1.2? I know some (if
not all) 64-bit issues have been resolved in 2.1.2 (2.1 and 2.1.1 are
obsolete).

Franz.
I support a small net with a Linux Samba server and about 5 W95
systems. This isn't connected to the outside world and the users can
be trusted, so I avoided all the encrypted password hassles by making
the shares public. This worked fine until I added the first W98
system...

The networking is fine on the W98 box - TCP/IP works and all shares
can be browsed. However, if a user tries to access a share, s/he gets
a permission denied error when using the W98 box, and has no problems
from the W95 boxes. This is completely repeatable.

A typical share is something like

[tmp]
   comment = Temporary file space
   path = tmp
   read only = no
   public = yes

I thought I knew my way around Samba fairly well, but this has me
stumped. It _must_ be a common problem??! So what's changed in W98?

Any suggestions?

Thanks and best regards,

Paul

Paul Sherwin Consulting     22 Monmouth Road, Oxford OX1 4TD, UK
Phone  +44 (0)1865 721438   http://www.telinco.co.uk/psherwin/index.htm
Mobile +44 (0)7931 578334   mailto:psherwin@telinco.co.uk
Pager  +44 (0)7666 797228