Hello Samba Team!

First of all, it seems that my problem is NOT (or not only) a Samba winbind problem. But I need to understand what is happening so that I can send good reports to the right maintainers.

What I'm trying to achieve:
---------------------------
GNOME gdm introduced a great feature that suspends the system on logout. This helps a lot in reducing the electricity consumption on my school networks, where the machines are often used sporadically for just one hour.

So I have been working hard for two months trying to make winbind work with suspend. Sadly without success...

What the problem is:
--------------------
At random, winbind loses the connection to my DC. "wbinfo -p" works, but "wbinfo -i username" says that the user is unknown, for every user. pam_winbind stops working because the users cannot be identified.

Sometimes winbind recovers after 3 to 5 minutes. Sometimes it never recovers and needs to be restarted.

Strangely, a "wbinfo -g" sometimes makes winbind work again...

What causes the problem:
------------------------
I don't really know. The problem seems to appear in two situations.

1) When the system resumes from suspend.

Even with a higher debug log level I don't see anything strange in the logs. As winbind uses many time-related services, maybe some tickets expire during suspend and this situation is not handled in the winbind code.

I don't know if winbind "officially" supports suspending. Currently I have written a systemd hook that kills winbind before suspend and restarts it afterwards.

2) The problem also appears on "DHCPDISCOVER". I don't know why, but DHCPDISCOVER makes winbind react. Immediately afterwards winbind tries to update the "DC" list.

I don't understand how winbind knows what dhclient is doing. Maybe a bug in dhclient that closes a socket opened by winbind?

But just after DHCPDISCOVER, winbind loses its network connection (strangely, I'm connected to the host over ssh, so the host does not lose all network connectivity), and DNS resolution fails.

Here is what I see in the logs, just after DHCPDISCOVER:

-> 12:29:27 is the time of the suspend the day before (my hook kills winbind)
-> 07:44:43 is the time of the wake-up: see how everything seems fine
-> 07:46:40 is when DHCPDISCOVER is sent and when winbind loses connectivity

12:29:27 Got sig[15] terminate (is_parent=0)
07:44:43 connection_ok: Connection to fichdc01.samdom.com for domain SAMDOM is not connected
07:44:43 Successfully contacted LDAP server 172.16.0.30
07:44:43 get_dc_list: preferred server list: "fichdc01.samdom.com, *"
07:44:43 Connecting to 172.16.0.30 at port 445
07:44:43 ldb_wrap open of secrets.ldb
07:44:43 Connecting to 172.16.0.30 at port 135
07:44:43 Connecting to 172.16.0.30 at port 49153
07:44:43 Connecting to 172.16.0.30 at port 135
07:44:43 Connecting to 172.16.0.30 at port 49153
07:44:43 Connecting to 172.16.0.30 at port 135
07:44:43 Connecting to 172.16.0.30 at port 49152
07:44:45 ads: fetch sequence_number for SAMDOM
07:44:45 get_dc_list: preferred server list: "fichdc01.samdom.com, *"
07:44:45 Successfully contacted LDAP server 172.16.0.30
07:44:45 Connected to LDAP server fichdc01.samdom.com
07:46:40 connection_ok: Connection to fichdc01.samdom.com for domain SAMDOM is not connected
07:46:40 cldap_multi_netlogon_send: cldap_socket_init failed for ipv4:172.16.0.30:389 error NT_STATUS_NETWORK_UNREACHABLE
07:46:40 ads_cldap_netlogon: did not get a reply
07:46:40 ads_try_connect: CLDAP request 172.16.0.30 failed.
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 ads_find_dc: failed to find a valid DC on our site (Default-First-Site-Name), Trying to find another DC for realm 'samdom.com' (domain '')
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 dns_send_req: Failed to resolve _ldap._tcp.dc._msdcs.samdom.com (Connection refused)
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40 ads_find_dc: name resolution for realm 'samdom.com' (domain '') failed: NT_STATUS_NO_LOGON_SERVERS
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
07:46:40 resolve_wins: WINS server resolution selected and no WINS servers listed.
07:46:40 Could not look up dc's for domain SAMDOM
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40 get_sorted_dc_list: no server for name samdom.com available in site Default-First-Site-Name, fallback to all servers
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
07:46:40 resolve_wins: WINS server resolution selected and no WINS servers listed.
07:46:40 get_dc_list: preferred server list: ", *"
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)

So my questions:
----------------
1) Does winbind officially support suspending? Or do I need to keep my systemd hook to stop winbind on suspend?

2) Does someone understand what happens when winbind loses connectivity? Why don't I see anything in the logs when resuming from suspend? Why does "wbinfo -g" sometimes make winbind work again?

3) Does someone recognize some aspects of my DHCPDISCOVER problem? Any idea that helps me file a bug with the right people is welcome.

Thank you very much!

Baptiste.
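To make a log excerpt like the one above easier to attach to a bug report, the NT_STATUS codes can be pulled out together with their timestamps. The following is only a rough sketch: it assumes the condensed "HH:MM:SS message" layout used in this post (a raw winbind log is laid out differently), and the function name `status_errors` is made up for the illustration.

```python
import re

# Assumed layout: "HH:MM:SS message", as in the excerpt quoted in this post.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")
# NT status codes as they appear in the messages, e.g. NT_STATUS_NO_LOGON_SERVERS.
STATUS_RE = re.compile(r"NT_STATUS_[A-Z_]+")


def status_errors(log_text):
    """Return (timestamp, status) pairs for every NT_STATUS_* code found."""
    found = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped line in the assumed layout
        timestamp, message = match.groups()
        for status in STATUS_RE.findall(message):
            found.append((timestamp, status))
    return found


sample = """\
07:44:45 Connected to LDAP server fichdc01.samdom.com
07:46:40 cldap_multi_netlogon_send: cldap_socket_init failed for ipv4:172.16.0.30:389 error NT_STATUS_NETWORK_UNREACHABLE
07:46:40 ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
"""

if __name__ == "__main__":
    for timestamp, status in status_errors(sample):
        print(timestamp, status)
```

The first pair in the result marks the moment the failure starts, which is the timestamp to compare with the dhclient and kernel logs.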
Hello! Prunk Dump via samba wrote:

> First of all, it seems that my problem is NOT (or not only) a Samba
> winbind problem. But I need to understand what is happening so that I
> can send good reports to the right maintainers.

I've hit similar trouble; see my posts:

 https://lists.samba.org/archive/samba/2019-February/221044.html
 https://lists.samba.org/archive/samba/2019-March/222057.html

For the record, I recently upgraded to Samba 4.8 and this behaviour persists. It seems very similar to yours: if winbind 'loses' its connection to the DC, it takes 'some time' to reconnect; in the meantime, winbind-based user providers (wbinfo, NSS, ...) return 'user unknown'.

-- 
dott. Marco Gaiarin                    GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia''    http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
Hai,

Hmm, well, I never use suspend, simply because it often causes lots of unwanted problems, as you noticed. Mostly the network driver has crashed or isn't loaded correctly, wifi adapters are offline, NIC autosense problems, things like that.

    # Fix Internet after suspend
    alias fix-internet="sudo modprobe -r r8169 && sleep 10 && sudo modprobe r8169"

Depending on what is used, you have more options. For example, in NetworkManager.conf, try "carrier-wait-timeout" and "ignore-carrier".

Another thing you might encounter is that the network device name changes after suspend. Then you also need to use /etc/udev/rules.d/10-network.rules:

    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="eth1"
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="ff:ee:dd:cc:bb:aa", NAME="eth0"

But my advice: schedule a shutdown and startup when needed. It saves you lots of problems. I have disabled all the suspend options I could find.

Greetz,

Louis

> -----Original message-----
> From: samba [mailto:samba-bounces at lists.samba.org] On behalf of
> Prunk Dump via samba
> Sent: Monday, 21 October 2019 10:07
> To: samba at lists.samba.org
> Subject: [Samba] winbind : suspend nightmare
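Whether the device name really changes across a suspend can be checked before writing any udev rule. Below is a minimal sketch, assuming a Linux host where sysfs exposes each interface's MAC in /sys/class/net/<name>/address; the helper names `snapshot_interfaces` and `renamed` are invented for this example, and it assumes MAC addresses are unique per interface.

```python
import os


def snapshot_interfaces(sys_net="/sys/class/net"):
    """Map interface name -> MAC address as exposed by Linux sysfs."""
    result = {}
    for name in sorted(os.listdir(sys_net)):
        addr_file = os.path.join(sys_net, name, "address")
        try:
            with open(addr_file) as handle:
                result[name] = handle.read().strip()
        except OSError:
            # Entry without a readable address file; skip it.
            continue
    return result


def renamed(before, after):
    """Return {mac: (old_name, new_name)} for MACs whose name changed.

    Assumes every MAC appears once per snapshot; interfaces that share
    a MAC (bridges, bonds) would need extra handling.
    """
    old = {mac: name for name, mac in before.items()}
    new = {mac: name for name, mac in after.items()}
    return {
        mac: (old[mac], new[mac])
        for mac in old
        if mac in new and old[mac] != new[mac]
    }


if __name__ == "__main__":
    print(snapshot_interfaces())
```

Taking one snapshot before the suspend and one after the resume, then passing both to `renamed`, shows whether a rule like the ones above is needed at all.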
On Mon, Oct 21, 2019 at 10:07:20AM +0200, Prunk Dump via samba wrote:
>
> I don't know if winbind "officially" supports suspending. Currently I
> have written a systemd hook that kills winbind before suspend and
> restarts it afterwards.

It hasn't been tested in that mode as far as I know.

Congratulations, you're the first! :-)

> 07:44:43 connection_ok: Connection to fichdc01.samdom.com for domain
> SAMDOM is not connected
> 07:44:43 Successfully contacted LDAP server 172.16.0.30
> 07:44:43 get_dc_list: preferred server list: "fichdc01.samdom.com, *"
> 07:44:43 Connecting to 172.16.0.30 at port 445
> 07:44:43 ldb_wrap open of secrets.ldb
> 07:44:43 Connecting to 172.16.0.30 at port 135
> 07:44:43 Connecting to 172.16.0.30 at port 49153
> 07:44:43 Connecting to 172.16.0.30 at port 135
> 07:44:43 Connecting to 172.16.0.30 at port 49153
> 07:44:43 Connecting to 172.16.0.30 at port 135
> 07:44:43 Connecting to 172.16.0.30 at port 49152
> 07:44:45 ads: fetch sequence_number for SAMDOM
> 07:44:45 get_dc_list: preferred server list: "fichdc01.samdom.com, *"
> 07:44:45 Successfully contacted LDAP server 172.16.0.30
> 07:44:45 Connected to LDAP server fichdc01.samdom.com
> 07:46:40 connection_ok: Connection to fichdc01.samdom.com for domain
> SAMDOM is not connected
> 07:46:40 cldap_multi_netlogon_send: cldap_socket_init failed for
> ipv4:172.16.0.30:389 error NT_STATUS_NETWORK_UNREACHABLE

OK, the above line is the problem. Why does that happen if above we have:

07:44:45 Successfully contacted LDAP server 172.16.0.30

cldap_multi_netlogon_send() does a UDP cldap ping to the server (172.16.0.30). Getting NT_STATUS_NETWORK_UNREACHABLE looks like the network interface isn't up yet.

Can you start winbind under strace in this case so we can see what syscalls are being done and exactly how they're failing?

Thanks,

Jeremy.
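While waiting for an strace, the "interface isn't up yet" hypothesis can be tested from outside winbind with a small probe. This is a hedged sketch, not how Samba implements its CLDAP ping: it only asks the kernel whether it has a route for UDP traffic to the DC, on the assumption that the NT status above corresponds to an ENETUNREACH-style failure from the socket layer. The function name `udp_route_check` is hypothetical; the address and port 389 are taken from the log.

```python
import errno
import socket


def udp_route_check(host, port=389):
    """Return None if the kernel can route UDP to host, else the errno name.

    Pass an IPv4 address (for example the DC address from the log) so
    that no DNS lookup is involved.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only asks the kernel
        # to pick a route and a source address for the destination.
        sock.connect((host, port))
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(udp_route_check("172.16.0.30"))
```

Running it in a loop around the time of the DHCPDISCOVER and logging any non-None result (such as ENETUNREACH) would show whether the host briefly has no usable route, independently of winbind.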
On Mon, 21 Oct 2019 at 11:21, L.P.H. van Belle via samba <samba at lists.samba.org> wrote:
>
> Hai,
>
> # Fix Internet after suspend
> alias fix-internet="sudo modprobe -r r8169 && sleep 10 && sudo modprobe r8169"
>
> Depending on what is used, you have more options.
> For example, in NetworkManager.conf, try "carrier-wait-timeout" and "ignore-carrier"
>
> Another thing you might encounter is that the network device name changes after suspend.
> Then you also need to use /etc/udev/rules.d/10-network.rules:
> SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="eth1"
> SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="ff:ee:dd:cc:bb:aa", NAME="eth0"
>
> Greetz,
>
> Louis

How do you know all of these things, Louis? .... Impressive ....

Following your advice, I have written some "monitor" service scripts that:

-> check and record the kernel logs about the NIC module on wake from suspend or on DHCP discover
-> record the dhclient/network-manager logs
-> record the interface names

But as my problem appears only once a day (the suspend must be long enough, at least 5 hours), I can't give the results now. I will keep you informed.

On Wed, 23 Oct 2019 at 07:26, Jeremy Allison <jra at samba.org> wrote:
>
> On Mon, Oct 21, 2019 at 10:07:20AM +0200, Prunk Dump via samba wrote:
> >
> > I don't know if winbind "officially" supports suspending. Currently I
> > have written a systemd hook that kills winbind before suspend and
> > restarts it afterwards.
>
> It hasn't been tested in that mode as far as I know.
>
> Congratulations, you're the first! :-)

Thank you very much, Jeremy!

Here is the systemd hook I use. It solves the issue when resuming from suspend, but it doesn't solve the DHCPDISCOVER problem.

Maybe you are also interested in an strace of the "wake" problem. But I think it would be too difficult for me to work on two problems at the same time. And as I have a workaround for the first one, I prefer to work on the DHCPDISCOVER problem first.
~# cat /lib/systemd/system/sleep-winbind.service
[Unit]
Description=winbind sleep hook
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Type=oneshot
RemainAfterExit=yes
ExecStart=-systemctl stop winbind
ExecStop=-systemctl start winbind

[Install]
WantedBy=sleep.target

> > [...]
> > 07:44:45 Successfully contacted LDAP server 172.16.0.30
> > 07:44:45 Connected to LDAP server fichdc01.samdom.com
> > 07:46:40 connection_ok: Connection to fichdc01.samdom.com for domain
> > SAMDOM is not connected
> > 07:46:40 cldap_multi_netlogon_send: cldap_socket_init failed for
> > ipv4:172.16.0.30:389 error NT_STATUS_NETWORK_UNREACHABLE
>
> OK, the above line is the problem. Why does that
> happen if above we have:
>
> 07:44:45 Successfully contacted LDAP server 172.16.0.30
>
> cldap_multi_netlogon_send() does a UDP cldap ping
> to the server (172.16.0.30). Getting NT_STATUS_NETWORK_UNREACHABLE
> looks like the network interface isn't up yet.
>
> Can you start winbind under strace in this case so we
> can see what syscalls are being done and exactly how they're
> failing?
>
> Thanks,
>
> Jeremy.

Yes, I will do that and I will send the result.
But the first thing I don't understand is why winbind starts reacting exactly when DHCPDISCOVER is launched. The first log lines:

07:44:43 connection_ok: Connection to fichdc01.samdom.com for domain SAMDOM is not connected
07:44:43 Successfully contacted LDAP server 172.16.0.30

happen exactly when the DHCPDISCOVER response is received. Strange, no? How does winbind know that?

There is also a hook script for ntp in Debian:

/etc/dhcp/dhclient-exit-hooks.d/ntp

I'm investigating whether this problem can be caused by ntp.

There is also a Samba hook:

/etc/dhcp/dhclient-enter-hooks.d/samba

But it seems to be related only to smbd, not winbind. Can smbd send signals to winbind?

Thank you very much, Samba Team!!

Baptiste.
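The claim that winbind reacts "exactly when" the DHCP exchange happens can be checked mechanically once both logs are recorded. The sketch below is illustrative only: it assumes both logs have been reduced to "HH:MM:SS" timestamps from the same day (it does not handle a wrap past midnight), the function name `near_events` is invented, and the DHCP timestamp in the usage example is hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse an "HH:MM:SS" timestamp (same-day logs are assumed)."""
    return datetime.strptime(text, "%H:%M:%S")


def near_events(dhcp_times, winbind_times, window_s=2):
    """Pair each DHCP timestamp with winbind timestamps within window_s seconds."""
    window = timedelta(seconds=window_s)
    pairs = []
    for dhcp in dhcp_times:
        for winbind in winbind_times:
            if abs(parse_ts(winbind) - parse_ts(dhcp)) <= window:
                pairs.append((dhcp, winbind))
    return pairs


if __name__ == "__main__":
    # "07:46:39" is a made-up dhclient timestamp, used only for illustration.
    print(near_events(["07:46:39"], ["07:44:43", "07:46:40"]))
```

If every winbind reconnect attempt pairs with a dhclient event inside a small window over several days, that is good evidence for the bug report that the two are linked rather than coincidental.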
On Tue, 2019-10-22 at 22:26 -0700, Jeremy Allison via samba wrote:
> On Mon, Oct 21, 2019 at 10:07:20AM +0200, Prunk Dump via samba wrote:
> > I don't know if winbind "officially" supports suspending. Currently
> > I have written a systemd hook that kills winbind before suspend and
> > restarts it afterwards.
>
> It hasn't been tested in that mode as far as I know.
>
> Congratulations, you're the first! :-)

(Sorry for the wall of words)

Not exactly the first. I have been using winbind for several years now to integrate my workstations and laptops into a Windows world. My goal is to be able to hand a Linux laptop to an end user and off they trot with everything in place and properly usable.

I'm rather close to my goal. Evolution for Exchange, LibreOffice for, errrr, office, Kerberos all over the shop for as much as possible (Evo EWS can do Kerb), autofs with mount.cifs and Kerb for "drive mappings". CUPS can take Kerb auth and supports everything that prints (ta Apple). You can import your AD CA cert into the OpenSSL trust store so LDAPS works properly, and your browsers can be persuaded to trust it as well. If you enable NDES on your AD CA then you can grab SSL certs for your Linux boxes with Certmonger, and then you can do Wifi 802.1X, trusted web servers etc.

The last major hurdle is the laptop experience, i.e. suspend/resume. To be honest it isn't too bad and not too far from using Windows, but Windows will always allow you to log in with cached creds, whereas a winbind-based box will give you a fairly random result.

I use nss_winbind and the rid idmap backend to get the same user on each device. It really does work very nicely for Ethernet-wired workstations: by the time everything has woken up, which takes a short time, the user is available for auth via winbind. On a laptop with, say, VPNs over wifi to wake up, you have to wait a while, otherwise your userid will come up as unknown, and it looks like there is some sort of caching (I've binned nscd) for quite a while. If you restart winbind then the userid becomes available much quicker, so that systemd hook sounds like a great idea that I will try out soon.

winbind has a concept of offline and online, but I don't know what that is, nor how nss works with it. I've tried using smbcontrol to tell winbind it is offline or online, but that does not seem to work. Restarting winbind normally gets my account working again. If I had to guess, then offline and online mean "network available" (layer 2/3) and not "AD available" (layer 3/4).

Cheers
Jon