Aaron C. de Bruyn
2016-Aug-24 19:12 UTC
[Samba] Winbind occasionally forgets some users (failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND)
I've been googling, but can't seem to find an answer to this one.

We have a Windows network with ~25 sites. Each site has a local Windows DC. Each site has a Debian 8.5 box running Samba 4.2.10-Debian.

We decided to test moving shares from the Windows Server to the local Debian machine. (zfs snapshot is *really* handy when someone decides to open cryptolocker).

File sharing has been working perfectly for about 6 months with one exception.

Occasionally (for no reason I can find), winbind 'forgets' a handful of users.

If I run 'wbinfo -i <a working user>' I get their name, home directory, shell, etc...

If I run it against a non-working-user, I get:

failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
Could not get info for user <some user>

These users can no longer connect to shares. They don't show up in the 'net ads user' list, etc...

I don't think it's a disconnect with AD because newly created users will sometimes show up after waiting ~15 minutes for the replication delay. Sometimes they won't.

A 'net cache flush' doesn't fix the issue.

I have to stop winbind and samba (causing problems for all users), and basically rm -rf everything under /var/lib/samba/, then run 'net ads join' and start the services back up.

All users show up at that point.

It's difficult to test because the issue appears to happen randomly between a few days and a few weeks.

Logs don't reveal anything.

Any pointers on what I can test or where I should focus the next time this happens?

Thanks,

-A
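Since the failure shows up at random, anywhere from days to weeks apart, one way to narrow down when a user is "forgotten" is to probe a fixed list of accounts with 'wbinfo -i' on a schedule and record every failure with a timestamp. The following is only a rough sketch of that idea, not something from the thread: the function names are made up for illustration, and it assumes wbinfo exits non-zero and prints its error text when a lookup fails.

```python
import subprocess
from datetime import datetime, timezone


def probe_user(user, run=subprocess.run):
    """Run `wbinfo -i <user>` and return (ok, detail).

    `run` is injectable so the function can be exercised without winbind.
    Assumption: a failed lookup gives a non-zero exit status.
    """
    result = run(["wbinfo", "-i", user], capture_output=True, text=True)
    if result.returncode == 0:
        return True, result.stdout.strip()
    # Take the error text from stderr, falling back to stdout if it is empty.
    return False, (result.stderr.strip() or result.stdout.strip())


def probe_all(users, run=subprocess.run):
    """Return one timestamped line for each user that failed to resolve."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    failures = []
    for user in users:
        ok, detail = probe_user(user, run)
        if not ok:
            # Collapse multi-line error output onto a single log line.
            failures.append(f"{stamp} {user}: {' / '.join(detail.splitlines())}")
    return failures
```

Calling probe_all() every few minutes from cron with a list of accounts known to exist in AD, and appending the returned lines to a file, would give a first-failure time that can be matched against the winbind logs and against events on the DC.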
Jeremy Allison
2016-Aug-25 00:22 UTC
[Samba] Winbind occasionally forgets some users (failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND)
On Wed, Aug 24, 2016 at 12:12:29PM -0700, Aaron C. de Bruyn via samba wrote:
> I've been googling, but can't seem to find an answer to this one.
>
> We have a Windows network with ~25 sites. Each site has a local Windows
> DC. Each site has a Debian 8.5 box running Samba 4.2.10-Debian.
>
> We decided to test moving shares from the Windows Server to the local
> Debian machine. (zfs snapshot is *really* handy when someone decides to
> open cryptolocker).
>
> File sharing has been working perfectly for about 6 months with one
> exception.
>
> Occasionally (for no reason I can find), winbind 'forgets' a handful of
> users.
>
> If I run 'wbinfo -i <a working user>' I get their name, home directory,
> shell, etc...
>
> If I run it against a non-working-user, I get:
>
> failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
> Could not get info for user <some user>
>
> These users can no longer connect to shares. They don't show up in the
> 'net ads user' list, etc...
>
> I don't think it's a disconnect with AD because newly created users will
> sometimes show up after waiting ~15 minutes for the replication delay.
> Sometimes they won't.
>
> A 'net cache flush' doesn't fix the issue.
>
> I have to stop winbind and samba (causing problems for all users), and
> basically rm -rf everything under /var/lib/samba/, then run 'net ads join'
> and start the services back up.
>
> All users show up at that point.
>
> It's difficult to test because the issue appears to happen randomly between
> a few days and a few weeks.
>
> Logs don't reveal anything.

Logs will be key here. In conjunction with the source code they should be able to tell you the difference between a successful and failed lookup, and allow you to look into the different code paths.

Can't help much more without them.
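One way to compare a successful lookup with a failed one is to capture the winbind log output around a 'wbinfo -i' call for a working user and for a broken user, strip the parts that always differ, and diff what is left. The sketch below is only an illustration with made-up helper names. It assumes the logs were captured at a debug level high enough to show the lookup (for example via 'log level' in smb.conf) and that each debug header starts with a bracketed "[YYYY/MM/DD HH:MM:SS.usec, level]" prefix; adjust the pattern if your logs look different.

```python
import difflib
import re

# Assumed shape of a Samba debug header prefix: "[date time, level ...]".
STAMP = re.compile(
    r"^\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?,\s*\d+[^\]]*\]\s*"
)


def normalise(lines, users=()):
    """Drop timestamp prefixes and mask user names so they do not show as diffs."""
    cleaned = []
    for line in lines:
        line = STAMP.sub("", line.rstrip("\n"))
        for user in users:
            # Plain substring replacement; pick names that are unlikely to
            # appear inside unrelated words.
            line = line.replace(user, "<USER>")
        cleaned.append(line)
    return cleaned


def diff_lookups(ok_lines, bad_lines, users=()):
    """Unified diff between a working lookup's log and a failing lookup's log."""
    return list(
        difflib.unified_diff(
            normalise(ok_lines, users),
            normalise(bad_lines, users),
            "working",
            "failing",
            lineterm="",
        )
    )
```

The first line that differs between the two excerpts is usually the best place to start reading the source, since the header of each debug message names the file and function that emitted it.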