Hello.

I know my description of the problem will be vague... I'm not asking for specific help, but rather for some directions on where to look in order to understand it.

I've got several setups which are all alike:
_ FreeBSD (currently 13.2, 13.3 or, rarely, 14.0);
_ ZFS;
_ two jails: one for an AD DC and one for a member fileserver;
_ Samba <4.17 (in the past), 4.17 (still installed in some places) or 4.19 (upgrading all instances is underway);
_ no use of ACLs.

This setup usually works fine.

However, from time to time, the fileserver starts acting strangely; most of the time this happened after either the DC or the FS was upgraded (e.g. from 4.17 to 4.19), but I also saw it happen all of a sudden without any apparent external reason.

Symptoms include one or more of the following:
_ intermittent "file not found" problems when the file is there (e.g. run a program from a network share and it will complain some DLLs are missing; run it again and it might work);
_ directory listings missing some subdirectories or files, but moving to another directory and coming back might show everything;
_ inability to open any document (e.g. Word), but ability to copy the file to a local folder (and then use it properly);
_ Adobe Acrobat Reader hanging when opening a PDF file from a share;
_ ability to enter any root folder of a share, but no second level folders;
_ inability to list shares, i.e. opening "\\fileserver", running "NET VIEW \\fileserver" or "smbclient -L //fileserver" hangs.

I was always able to solve this (although not necessarily at the first try) by doing one or both of:
_ removing the DOSATTRIB extended attribute from all files/directories;
_ wiping Samba's databases, starting from scratch and rejoining the domain.

Unfortunately I haven't been able to save any useful logs yet, but I think I saw two strange things:
_ when a client tried to access a file and failed, the logs were populated with entries about other files (possibly in other shares) which no one was attempting to access at that time;
_ I'm not sure about this, but I think Samba sometimes mistook files for folders or vice versa.

I'm very ignorant about Samba's internals, but one possible explanation I came up with is that Samba has some sort of database about files/folders and for some reason it started misapplying it (i.e. applying the wrong database record to a file/folder).
Does this make any sense? Does this sort of database/cache/whatever exist?

Although everything is working now on all the systems I manage, this has already happened at least five times (on at least three different networks), so I'd like to be ready in case it happens again.

Any hint appreciated.

bye & Thanks
av.

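The first workaround mentioned above (removing the DOSATTRIB extended attribute from every file and directory) could be prepared as a dry run before touching anything. The following is only a sketch under assumptions: it assumes the FreeBSD `getextattr`/`rmextattr` utilities and that Samba stores the attribute as `DOSATTRIB` in the `user` namespace on these systems. The helper names (`iter_paths`, `inspect_command`, `remove_command`, `plan`) are hypothetical and only build command lines; nothing is executed.

```python
import os
from typing import Iterator, List


def iter_paths(root: str) -> Iterator[str]:
    """Yield root itself, then every directory and file below it (symlinks skipped)."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                yield path


def inspect_command(path: str) -> List[str]:
    # Assumption: FreeBSD getextattr, with -x to print the value in hex.
    return ["getextattr", "-x", "user", "DOSATTRIB", path]


def remove_command(path: str) -> List[str]:
    # Assumption: FreeBSD rmextattr with the same namespace and attribute name.
    return ["rmextattr", "user", "DOSATTRIB", path]


def plan(root: str, remove: bool = False) -> List[List[str]]:
    """Build (but do not run) one command per path under root."""
    build = remove_command if remove else inspect_command
    return [build(p) for p in iter_paths(root)]
```

Printing `plan("/path/to/share")` first and saving the hex values from the inspect commands would leave a record of what the attributes looked like before removal, which might help when comparing a broken server with a working one.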
On Thu, 23 May 2024 19:20:04 +0200
Andrea Venturoli via samba <samba at lists.samba.org> wrote:

> [...]
> However, from time to time, the fileserver starts acting strangely;
> most of the time this happened after either the DC or the FS was
> upgraded (e.g. from 4.17 to 4.19), but I also saw it happen all of a
> sudden without any apparent external reason.
> [...]
> Although everything is working now on all the systems I manage, this
> has already happened at least five times (on at least three different
> networks), so I'd like to be ready in case it happens again.
>
> Any hint appreciated.

I wonder if this (fixed) bug has anything to do with your problem?

https://bugzilla.samba.org/show_bug.cgi?id=15093

If it does, then upgrading to 4.19.x should fix it.

Rowland

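Since several installations are still on 4.17 while the upgrade is underway, it may help to check which servers have already reached 4.19. A minimal sketch, assuming the text comes from `smbd --version` (which prints a line such as "Version 4.19.5"); the function names and the (4, 19, 0) threshold are illustrative, taken only from the "4.19.x" mentioned above, not from the bug report itself.

```python
import re
from typing import Optional, Tuple


def parse_samba_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a version string, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch level (e.g. "4.19") is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_at_least(text: str, minimum: Tuple[int, int, int] = (4, 19, 0)) -> bool:
    """True if the version found in text is >= minimum (tuple comparison)."""
    version = parse_samba_version(text)
    return version is not None and version >= minimum
```

For example, `is_at_least("Version 4.17.12")` is False and `is_at_least("Version 4.19.5")` is True, so the output collected from each jail can be fed through it to list the servers still waiting for the upgrade.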
On Thu, 2024-05-23 at 19:20 +0200, Andrea Venturoli via samba wrote:
> Hello.
>
> I know my description of the problem will be vague... I'm not asking
> for specific help, but rather for some directions on where to look in
> order to understand it.
>
> I've got several setups which are all alike:
> _ FreeBSD (currently 13.2, 13.3 or, rarely, 14.0);
> _ ZFS;
> _ two jails: one for an AD DC and one for a member fileserver;
> _ Samba <4.17 (in the past), 4.17 (still installed in some places) or
> 4.19 (upgrading all instances is underway);
> _ no use of ACLs.
>
> This setup usually works fine.
>
> However, from time to time, the fileserver starts acting strangely;
> most of the time this happened after either the DC or the FS was
> upgraded (e.g. from 4.17 to 4.19), but I also saw it happen all of a
> sudden without any apparent external reason.

Can you post your smb.conf?

Have you used any of our fallback VFS modules instead of xattrs? (This would seem not to be the case if you have DOSATTRIB xattrs, but I still mention it, if only for the benefit of future readers, as those modules are dev/inode based and dev/inode reuse caused nightmares in our test suites.)

> Symptoms include one or more of the following:
> _ intermittent "file not found" problems when the file is there (e.g.
> run a program from a network share and it will complain some DLLs are
> missing; run it again and it might work);
> _ directory listings missing some subdirectories or files, but moving
> to another directory and coming back might show everything;
> _ inability to open any document (e.g. Word), but ability to copy the
> file to a local folder (and then use it properly);
> _ Adobe Acrobat Reader hanging when opening a PDF file from a share;
> _ ability to enter any root folder of a share, but no second level
> folders;
> _ inability to list shares, i.e. opening "\\fileserver", running "NET
> VIEW \\fileserver" or "smbclient -L //fileserver" hangs.

Can you get a gdb backtrace of the hanging smbd? That would show what lock it was waiting on.
You might do some investigation of the POSIX locking, to work out who is holding that DB lock and what that process is doing.

> I was always able to solve this (although not necessarily at the first
> try) by doing one or both of:
> _ removing the DOSATTRIB extended attribute from all
> files/directories;
> _ wiping Samba's databases, starting from scratch and rejoining the
> domain.
>
> Unfortunately I haven't been able to save any useful logs yet, but I
> think I saw two strange things:
> _ when a client tried to access a file and failed, the logs were
> populated with entries about other files (possibly in other shares)
> which no one was attempting to access at that time;
> _ I'm not sure about this, but I think Samba sometimes mistook files
> for folders or vice versa.

Very strange.

> I'm very ignorant about Samba's internals, but one possible
> explanation I came up with is that Samba has some sort of database
> about files/folders and for some reason it started misapplying it
> (i.e. applying the wrong database record to a file/folder).
> Does this make any sense? Does this sort of database/cache/whatever
> exist?
>
> Although everything is working now on all the systems I manage, this
> has already happened at least five times (on at least three different
> networks), so I'd like to be ready in case it happens again.

I would install gdb and the debug symbol package for your Samba on the servers, and practice getting a gdb backtrace before the next failure.

Andrew Bartlett

-- 
Andrew Bartlett (he/him)        https://samba.org/~abartlet/
Samba Team Member (since 2001)  https://samba.org
Samba Team Lead                 https://catalyst.net.nz/services/samba
Catalyst.Net Ltd

Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group company

Samba Development and Support: https://catalyst.net.nz/services/samba

Catalyst IT - Expert Open Source Solutions
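To practice the gdb backtrace suggested above before the next failure, something along these lines might be kept ready on each server. It is a sketch rather than an official Samba procedure: it assumes gdb and the debug symbols are installed, that it runs as root, and that the PID of the hanging smbd has already been found (e.g. with ps). The helper names and the /var/tmp output location are hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import List


def build_gdb_command(pid: int) -> List[str]:
    """gdb invocation that attaches to pid, prints all thread backtraces, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def capture_backtrace(pid: int, out_dir: str = "/var/tmp") -> Path:
    """Run gdb against pid and save its output to a timestamped file.

    Assumption: out_dir exists and is writable; /var/tmp is only a placeholder.
    """
    out_file = Path(out_dir) / f"smbd-bt-{pid}-{int(time.time())}.txt"
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        timeout=120,   # do not let the capture itself hang forever
        check=False,   # keep whatever gdb printed even on a non-zero exit
    )
    out_file.write_text(result.stdout + result.stderr)
    return out_file
```

Calling `capture_backtrace(pid)` on a healthy smbd first is a cheap way to confirm that symbols resolve and the file is written, so that the same call can be made without delay when a client hangs.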