Pekka L.J. Jalkanen
2013-May-10 18:14 UTC
[Samba] Samba 3 member, winbind caching and DC availability
Hello all,

I've a box running Samba 3.5.6 (Debian Squeeze) that retrieves its user accounts from AD, using Winbind. The box is receiving incoming mail. Idmap backend is AD, with rfc2307 schema mode.

Currently it's only accessing one AD DC, and the MTA on the Samba box is stopped whenever the DC is temporarily offline, to prevent rejection of any incoming mail with "user unknown" status.

However, I'd like to add another DC to the mix, but I'm concerned that mail could get rejected if the active DC suddenly goes offline and Winbind doesn't switch to another DC promptly enough.

Consider the following scenario:

1. There is an AD account foo. The account hasn't been used for some time, and it's thus not in Winbind's cache. It's possibly not even in Winbind's idmap cache.
2. There are two AD DCs, A and B.
3. Samba member server C runs Winbind and is currently using the DC A.
4. Hardware fails and the DC A suddenly drops offline.
5. Just a few seconds later an e-mail arrives for foo. The MTA tries to check for the user.
6. As Winbind is not yet aware of the unavailability of the DC A, it tries to contact it.

A. Now, in the ideal world this would continue as follows:

7. Winbind can't contact the DC A anymore, so it promptly contacts the DC B.
8. The DC B confirms the existence of foo.
9. The MTA delivers mail for foo.

B. However, I'm afraid that in the real world, the following could result:

7. Winbind frantically tries to contact the DC A, but times out and can't confirm the existence of foo. It tells the MTA that there's no account.
8. The MTA replies to the sender with a "550 5.1.1 <foo@my.site>... User unknown" error.
9. After the timeout Winbind finally manages to switch to the DC B, but the sender has already got the delivery failure message and now thinks that the address foo@my.site is no longer valid.

I tried to look at the documentation, but didn't find any recommendations regarding Winbind cache settings in situations where availability is critical. Is it recommended to just disable all Winbind caching entirely? Or to do just the opposite and try to cache as much as possible? What are the practical effects of the "winbind cache time" and "idmap cache time" smb.conf options in this situation? Also, are the caches for all accounts "replenished" every time the cache of any account expires, or on a per-account basis?

And do the idmap cache times even work in a predictable way with this old Samba, where bug 8658 is still unfixed? Or should I just try to upgrade as soon as possible?

I built a test box similar to the actual box receiving mail (Winbind cache time was the default, 300 seconds, and idmap cache time was set to 86,400 seconds, i.e. one day) and flooded it with messages while at the same time switching connections to the DCs back and forth. And sure enough, I did get some delivery errors due to Winbind unavailability whenever the account receiving the mail hadn't been queried after the last Winbind restart and before the DC went offline. So the likelihood of scenario B feels all too great.

Any recommendations for avoiding it?

Pekka L.J. Jalkanen
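The test described above (repeated account lookups while the DC connection is switched back and forth) could be approximated without sending real mail. The following is only a minimal sketch, assuming the MTA resolves recipients through NSS with winbind listed in nsswitch.conf; the names probe_account and probe_many are hypothetical helpers, not part of Samba or any MTA.

```python
import pwd
import time
from typing import Callable, Iterable, List, NamedTuple


class ProbeResult(NamedTuple):
    account: str
    found: bool
    seconds: float


def probe_account(account: str,
                  lookup: Callable[[str], object] = pwd.getpwnam,
                  clock: Callable[[], float] = time.monotonic) -> ProbeResult:
    """Look up one account and record whether it resolved and how long it took.

    Assumption: `lookup` behaves like pwd.getpwnam, i.e. it raises KeyError
    when no entry is returned for the name.
    """
    start = clock()
    try:
        lookup(account)
        found = True
    except KeyError:
        # A "not found" here may mean the account is missing OR that the
        # directory could not be reached; the caller cannot tell which.
        found = False
    return ProbeResult(account, found, clock() - start)


def probe_many(accounts: Iterable[str],
               rounds: int,
               pause: float,
               lookup: Callable[[str], object] = pwd.getpwnam,
               sleep: Callable[[float], None] = time.sleep) -> List[ProbeResult]:
    """Probe every account once per round, pausing between rounds."""
    accounts = list(accounts)
    results = []
    for _ in range(rounds):
        for account in accounts:
            results.append(probe_account(account, lookup))
        sleep(pause)
    return results
```

Running something like probe_many(["foo"], rounds=10, pause=1.0) while one DC is taken offline would show, per lookup, whether the account resolved and how long the call blocked. A local name service cache (nscd, for example) may mask the effect, so it is probably best disabled for such a test.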
I've got no answers, but I realised that I had picked a rather poor title, so here's a better one, combined with a more concise summary of my earlier babbling...

Are there any smb.conf settings that control (Samba 3) Winbind's DC failover timeout when security = ADS?

I do realise that there is a setting called "ldap connection timeout", but I assume it is only relevant when domain logons have been turned on and ldapsam is being used as the password backend. Is this correct?

If no such settings exist, can anyone please explain to me how Winbind actually handles these failover situations internally? How transparent should the failover process be in practice? Any experiences?

Thanks,

Pekka L.J. Jalkanen
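To make the two outcomes from the scenario (A: prompt failover, B: rejection during the timeout) easier to reason about, here is a toy model of a lookup that walks a list of DCs. It is a sketch under stated assumptions and does not describe how Winbind actually behaves internally: the function resolve_with_failover, the fixed per-DC connect timeout and the caller's deadline are all hypothetical, and the timings in the usage note are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def resolve_with_failover(account: str,
                          dcs: List[str],
                          online: Dict[str, bool],
                          directory: Set[str],
                          connect_timeout: float,
                          deadline: float) -> Tuple[str, float]:
    """Return (outcome, elapsed_seconds) for one lookup in the toy model.

    outcome is one of:
      "found"        an online DC confirmed the account
      "unknown"      an online DC answered and the account does not exist
      "unavailable"  no DC answered before the caller's deadline

    Assumption (hypothetical): each unreachable DC costs exactly
    `connect_timeout` seconds before the next DC is tried.
    """
    elapsed = 0.0
    for dc in dcs:
        if not online.get(dc, False):
            elapsed += connect_timeout
            if elapsed >= deadline:
                # The caller gave up while we were still waiting on a dead DC.
                return ("unavailable", deadline)
            continue
        # First reachable DC gives the authoritative answer.
        return ("found" if account in directory else "unknown", elapsed)
    # Every DC in the list was unreachable.
    return ("unavailable", min(elapsed, deadline))
```

With DC A offline, DC B online, a 10-second connect timeout and a 30-second deadline, the model yields ("found", 10.0), which corresponds to outcome A. With a 5-second deadline it yields ("unavailable", 5.0). Outcome B is what happens when the layer above collapses "unavailable" into "unknown" and the MTA answers with a permanent 550 error.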