On Thu, 2024-05-23 at 19:20 +0200, Andrea Venturoli via samba wrote:
> Hello.
>
> I know my description of the problem will be vague... I'm not asking
> for specific help, rather for some directions on where to look in
> order to understand it.
>
> I've got several setups which are all alike:
> _ FreeBSD (currently 13.2, 13.3 or, rarely, 14.0);
> _ ZFS;
> _ two jails: one for an AD DC and one for a member fileserver;
> _ Samba <4.17 (in the past), 4.17 (still installed in some places) or
>   4.19 (upgrading all instances is underway);
> _ no use of ACLs.
>
> This setup usually works fine.
> However, from time to time, the fileserver starts acting strangely;
> most of the time this happened after either the DC or the FS was
> upgraded (e.g. from 4.17 to 4.19), but I also saw this all of a
> sudden without any apparent external reason.

Can you post your smb.conf?

Have you used any of our fallback VFS modules instead of xattrs?

(This would seem not to be the case if you have DOSATTR xattrs, but I
still mention this if only for the benefit of future readers, as those
are dev/inode based and reuse caused nightmares in our testsuites).

> Symptoms include one or more of the following:
> _ intermittent "file not found" problem, when the file is there (e.g.
>   run a program from a network share and it will complain some DLLs are
>   missing; run it again and it might work);
> _ directory listings missing some subdirectories or files, but moving
>   to another directory and coming back might show everything;
> _ inability to open any document (e.g. Word), but ability to copy the
>   file to a local folder (and then use it properly);
> _ Adobe Acrobat Reader hanging when opening a PDF file from a share;
> _ ability to enter any root folder of a share, but no second level
>   folders;
> _ inability to list shares, i.e. opening "\\fileserver", running "NET
>   VIEW \\fileserver" or "smbclient -L //fileserver" hanging.

Can you get a gdb backtrace of the hanging smbd? That would show what
lock it was waiting on.

You might do some investigation of the POSIX locking, to work out who
is blocking that DB lock, and what that process is doing.

> I was always able to solve (although not necessarily at first try) by
> doing one or both of:
> _ removing the DOSATTRIB extended attribute from all files/directories;
> _ wiping Samba's databases, starting from scratch and rejoining the
>   domain.
>
> Unfortunately I wasn't able to save any useful logs yet, but I think
> I saw two strange things:
> _ when a client tried accessing some file and failed, the logs were
>   populated with entries about other files (possibly in other shares)
>   which no one was attempting to access at that time;
> _ I'm not sure about this, but I think sometimes Samba mistook files
>   for folders or vice versa.

Very strange.

> I'm very ignorant about Samba's internals, but one possible
> explanation I came up with is that Samba has some sort of database
> about files/folders and for some reason it started misapplying it
> (i.e. applying the wrong record of the database to a file/folder).
> Does this make any sense? Does this sort of database/cache/whatever
> exist?
>
> Although everything is working now on all the systems I manage, it
> already happened at least five times (on at least three different
> networks), so I'd like to be ready in case it happens again.

I would install gdb and the debug symbol package for your Samba on the
servers, and practice getting a gdb backtrace before the next failure.

Andrew Bartlett

-- 
Andrew Bartlett (he/him)       https://samba.org/~abartlet/
Samba Team Member (since 2001) https://samba.org
Samba Team Lead                https://catalyst.net.nz/services/samba
Catalyst.Net Ltd

Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group
company

Samba Development and Support: https://catalyst.net.nz/services/samba

Catalyst IT - Expert Open Source Solutions

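For practicing the gdb backtrace suggested above, here is a minimal sketch in Python. It is not from the thread: the helper names (smbd_pids, gdb_backtrace_command, backtrace) are hypothetical, and it assumes pgrep and gdb are installed and that the script runs as root inside the fileserver jail. Since each client connection is normally served by its own smbd process, smbstatus can help match a hanging client to a PID; the sketch simply lists every smbd.

```python
import subprocess


def smbd_pids():
    """Return the PIDs of all running smbd processes, using pgrep -x (exact name match)."""
    result = subprocess.run(
        ["pgrep", "-x", "smbd"], capture_output=True, text=True, check=False
    )
    # pgrep prints one PID per line; no output means no smbd is running.
    return [int(token) for token in result.stdout.split()]


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb command line that dumps all thread backtraces."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def backtrace(pid, timeout=60):
    """Attach gdb to one process, collect the backtrace text, then detach."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    # gdb writes warnings (e.g. missing debug symbols) to stderr; keep both.
    return result.stdout + result.stderr
```

Calling backtrace(pid) for each PID returned by smbd_pids() and saving the text to a file gives something that can be posted to the list. Without the debug symbols the frames will mostly be unnamed, which is why installing them beforehand matters.
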
On 5/23/24 22:58, Andrew Bartlett wrote:

> Can you post your smb.conf.

Actually no, as I don't have *one* smb.conf.
I'll add three of them (the last three where I had this problem) at the end of this email.
As you can see they are not exactly identical.

> Have you used any of our fallback VFS modules instead of xattrs?

I'd say not, but see the smb.confs.

> (This would seem not to be the case if you have DOSATTR xattrs,

One thing makes me wonder: at least in some setups, I have "store dos attributes=no"; still I get them.

> but I still mention this if only for the benefit of future readers, as those are dev/inode based and reuse caused nightmares in our testsuites).

Not following you here... too much ignorance on my part :(

> Can you get a gdb backtrace of the hanging smbd?

I'll try next time it happens (if it happens, as it did only once).
How do I know which process I should attach to, however?

> I would install gdb and the debug symbol package for your Samba in the servers, and practice getting a gdb backtrace before the next failure.

On my way to rebuilding Samba with debug info.
Thanks for the suggestion.

 bye & Thanks
	av.

---------------------------------------------------------------------------
[global]
max log size=0
vfs objects=full_audit shadow_copy2
audit:facility=LOCAL7
audit:priority=INFO
full_audit:success=all !closedir !connectpath !fdopendir !openat !readdir !realpath !stat
full_audit:failure=all !get_real_filename !stat !translate_name
full_audit:facility=LOCAL7
full_audit:priority=INFO
full_audit: prefix = IP=%I | USER=%u | MACHINE=%m | VOLUME=%S
shadow:sort = desc
shadow:format=-%Y%m%d%H%M%S
shadow:delimiter=-
shadow:snapprefix=auto_zroot
shadow:localtime=yes
shadow:snapdir=.zfs/snapshot
netbios name=FS1
security=ADS
workgroup=XXXXX
realm=xxxxx.xxxxxxxx.xx
winbind refresh tickets = yes
winbind use default domain = yes
idmap config *:backend = tdb
idmap config *:range = 100000-999999
idmap config XXXXX:backend = ad
idmap config XXXXX:range = 500-99999
idmap config XXXXX:schema_mode = rfc2307
idmap config XXXXX:unix_nss_info = yes
interfaces=vlan1 10.x.x.48/24, vlan2 192.168.xxx.48/24
hosts allow = 127. 10.x.x. 192.168.xxx.
hide dot files=no
admin users=xxxxxxxxxxx
unix extensions=no
mangled names=no
bind interfaces only=yes

[homes]
acl allow execute always=yes
store dos attributes=no
read only = No
force create mode=0400
create mask=0600
force directory mode=0700
directory mask=0700
csc policy=disable
delete readonly=yes
---------------------------------------------------------------------------
[global]
max log size=0
vfs objects=full_audit shadow_copy2
audit:facility=LOCAL7
audit:priority=INFO
full_audit:success=all !closedir !connectpath !fdopendir !openat !readdir !realpath !stat
full_audit:failure=all !get_real_filename !stat !translate_name
full_audit:facility=LOCAL7
full_audit:priority=INFO
full_audit: prefix = IP=%I | USER=%u | MACHINE=%m | VOLUME=%S
shadow:sort = desc
shadow:format=-%Y%m%d%H%M%S
shadow:delimiter=-
shadow:snapprefix=auto_zroot
shadow:localtime=yes
shadow:snapdir=.zfs/snapshot
interfaces=vlan1 192.168.yyy.4
bind interfaces only=yes
security=ADS
workgroup=YY
realm=YY.YYYYYYYYYYYYY.YY
idmap config *:backend=tdb
idmap config *:range=3000-7999
idmap config YY:backend=ad
idmap config YY:range=10000-999999
idmap config YY:schema_mode=rfc2307
idmap config YY:unix_nss_info=no
template shell=/sbin/nologin
template homedir=/usr/home/%U
winbind refresh tickets = yes
winbind use default domain = yes

[homes]
read only = No
force create mode = 0600
force directory mode = 0700
path=/usr/home/%S
force create mode=600
force directory mode=700
create mask=600
directory mask=600

[yyyyyyyyy]
path=/shares/yyyyyyyyy
public=no
browseable=yes
writeable=yes
follow symlinks=yes
force create mode=660
force directory mode=770
create mask=660
directory mask=770
---------------------------------------------------------------------------
[global]
bind interfaces only = Yes
interfaces = vlan1 192.168.zzz.0/24
log level = 1
netbios name = FS
realm = ZZZZZ.ZZZZZZZZZZZZZZZZZZZZZ.ZZ
security = ADS
server role = member server
winbind expand groups = 1
winbind nss info = rfc2307
winbind refresh tickets = Yes
winbind use default domain = Yes
workgroup = ZZZZZZ
audit:facility = LOCAL7
audit:priority = INFO
full_audit:success = all !closedir !connectpath !fdopendir !openat !readdir !realpath !stat
full_audit:failure = all !get_real_filename !stat !translate_name
full_audit:facility = LOCAL7
full_audit:priority = INFO
full_audit: prefix = IP=%I | USER=%u | MACHINE=%m | VOLUME=%S
shadow:sort = desc
shadow:format = -%Y%m%d%H%M%S
shadow:delimiter = -
shadow:snapprefix = auto_zroot
shadow:localtime = yes
shadow:snapdir = .zfs/snapshot
idmap config *:backend = tdb
idmap config *:range = 100000-999999
idmap config zzzzzz:backend = ad
idmap config zzzzzz:range = 500-99999
idmap config zzzzzz:schema_mode = rfc2307
idmap config zzzzzz:unix_nss_info = yes
hosts allow = 192.168.zzz. 10.z.z.40 10.z.zz. 10.z.z.
netbios name=FS
log level=1
map archive=no
store dos attributes=no
vfs objects = full_audit shadow_copy2

[homes]
read only=No
force create mode=0600
create mask=0600
force directory mode=0700
directory mask=0700
csc policy=disable

[zzzzzzzz]
path=/shares/zzzzzzzz
force create mode=0660
create mask=0660
force directory mode=0770
directory mask=0770
public=no
writeable=yes
force group=+utenti
valid users=@utenti
browseable=yes
follow symlinks=no
---------------------------------------------------------------------------
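
Since the three files above are not identical, it may help to check which sections set a given parameter, for example "store dos attributes". The following is only a rough sketch using Python's configparser, which is not Samba's own parser. The function names and the SAMPLE text are hypothetical, and it assumes one parameter per line with consistent indentation. Samba's testparm -s remains the authoritative way to dump the configuration as Samba itself reads it.

```python
import configparser


def normalise(name):
    """smb.conf parameter names ignore case and whitespace; compare them that way."""
    return "".join(name.lower().split())


def sections_setting(conf_text, parameter):
    """Return {section: value} for every section that sets the given parameter."""
    parser = configparser.ConfigParser(
        delimiters=("=",),            # ':' is part of names like full_audit:prefix
        comment_prefixes=("#", ";"),
        interpolation=None,           # values contain literal % (e.g. %I, %S)
        strict=False,                 # tolerate repeated parameters; the last one wins
    )
    parser.read_string(conf_text)
    wanted = normalise(parameter)
    found = {}
    for section in parser.sections():
        for key, value in parser[section].items():
            if normalise(key) == wanted:
                found[section] = value
    return found


# Hypothetical excerpt, shaped like the files above.
SAMPLE = """\
[global]
security=ADS
full_audit: prefix = IP=%I | USER=%u
[homes]
store dos attributes=no
force create mode = 0600
force create mode=600
"""
```

With this sample, sections_setting(SAMPLE, "store dos attributes") reports the parameter only under homes, which mirrors the first file above, where it is set in [homes] but not in [global].
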