Hi,

Last night I attempted to upgrade from Samba 4.8.5 to 4.9.0, with disastrous results. Upon starting Samba 4.9.0 my entire network came to a screaming halt a few seconds later, and upon shutting Samba down it came back to life again.

Just to be sure this wasn't a coincidence, I then started Samba again. Once again all connectivity stopped, but came back as soon as I was able to shut down Samba.

Network switches were all logging that they were shutting down physical ports due to excessive numbers of broadcast packets being seen, and a Wireshark capture from my PC verified that there really was a broadcast storm happening that was triggering this.

The capture showed that upon startup Samba 4.9.0 was sending thousands and thousands of broadcast packets onto the wire in very quick succession. Wireshark counted around 6500 broadcasts in about 300ms. The packets are all Host Announcement packets sent from the IPv4 address of the host to the broadcast address of the subnet the Samba server is on.

Upon reverting back to 4.8.5 with no other config changes, everything is back to normal again.

The config is very basic:

thunderstorm ~ # testparm
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section "[homes]"
Processing section "[root]"
Processing section "[photos]"
Processing section "[store]"
Loaded services file OK.
Server role: ROLE_STANDALONE

Press enter to see a dump of your service definitions

# Global parameters
[global]
	dns proxy = No
	domain master = Yes
	load printers = No
	log file = /var/log/samba/log.%m
	map to guest = Bad User
	max log size = 200
	pam password change = Yes
	preferred master = Yes
	printcap name = /dev/null
	security = USER
	server role = standalone server
	server string = Samba Server %v
	unix extensions = No
	unix password sync = Yes
	username map = /etc/samba/smbusers
	workgroup = REUB
	idmap config * : backend = tdb

There are four very basic shares specified after this.

There is a Win2k16 server on the network but it is not currently providing any services and is not configured to support domain logins (workgroup only).

I have uploaded the pcap file and the daemon logs to my web server:

https://www.reub.net/files/samba/Samba-Syslog.log
https://www.reub.net/files/samba/Samba-4.9.0-NetworkMeltdown.pcap

The system is a Gentoo Linux x86_64 kept very up to date. The server is a VM which has one interface that has 4 IPv4 and IPv6 addresses on it, as well as a second vNIC (currently used for backups only with no hosts on it right now).

Can anyone please assist in getting to the bottom of what appears to be a nasty bug? I'm keen to work on getting to the root cause of this.

Thanks,
Reuben
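For anyone who wants to check a capture like this without opening Wireshark, here is a rough Python sketch that counts Ethernet broadcast frames in a capture and works out the rate. By the figures reported above, 6500 broadcasts in about 300 ms is roughly 21,700 per second. The sketch assumes a classic libpcap file (not pcapng) with microsecond timestamps and an Ethernet link type; the function name `broadcast_rate` is made up for illustration, and it counts every frame sent to ff:ff:ff:ff:ff:ff (ARP included), not only Host Announcements.

```python
import struct

BROADCAST_MAC = b"\xff" * 6


def broadcast_rate(data: bytes):
    """Return (broadcast_frames, seconds_spanned, frames_per_second).

    Assumption: `data` is the full content of a classic pcap file with
    microsecond timestamps and Ethernet (linktype 1) frames.
    """
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")

    # The link type is the last field of the 24-byte global header.
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    offset = 24
    count = 0
    first_us = last_us = None
    while offset + 16 <= len(data):
        # Per-packet header: seconds, microseconds, captured length, wire length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len

        # The destination MAC is the first six bytes of an Ethernet frame.
        if frame[:6] == BROADCAST_MAC:
            stamp_us = ts_sec * 1_000_000 + ts_usec
            if first_us is None:
                first_us = stamp_us
            last_us = stamp_us
            count += 1

    span = (last_us - first_us) / 1_000_000 if count > 1 else 0.0
    rate = count / span if span > 0 else 0.0
    return count, span, rate
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them in. Keeping the timestamps as integer microseconds until the final division avoids floating-point drift over long captures.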
On Sat, 15 Sep 2018 12:52:52 +1000 Reuben Farrelly via samba <samba at lists.samba.org> wrote:

> Hi,
>
> Last night I attempted to upgrade from Samba 4.8.5 to 4.9.0, with
> disastrous results. Upon starting Samba 4.9.0 my entire network came
> to a screaming halt a few seconds later, and upon shutting Samba down
> it came back to life again.
>
> The config is very basic:

and wrong

> thunderstorm ~ # testparm
> Load smb config files from /etc/samba/smb.conf
> rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
> Processing section "[homes]"
> Processing section "[root]"
> Processing section "[photos]"
> Processing section "[store]"
> Loaded services file OK.
> Server role: ROLE_STANDALONE
>
> Press enter to see a dump of your service definitions
>
> # Global parameters
> [global]
> 	domain master = Yes
> 	security = USER
> 	server role = standalone server

NOTE: I have shrunk your smb.conf for clarity.

It is undoubtedly for a 'standalone server', so why does it also have the line 'domain master = Yes' ??
It cannot be both, I would suggest removing this line.

Rowland
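As a quick way to spot the combination Rowland is pointing at, a small Python sketch can scan a saved testparm dump for it. This only illustrates the check being discussed: the function names are hypothetical, it looks only at the `[global]` section, and it matches only the `Yes` spelling that appears in the dump above. It makes no claim about whether Samba itself treats the combination as invalid.

```python
def parse_global(dump: str) -> dict:
    """Collect 'key = value' pairs from the [global] section of a testparm dump."""
    params = {}
    section = None
    for raw in dump.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            key, value = line.split("=", 1)
            # Normalise internal whitespace and case so lookups are predictable.
            params[" ".join(key.split()).lower()] = value.strip()
    return params


def standalone_with_domain_master(params: dict) -> bool:
    """True when 'server role = standalone server' sits next to 'domain master = Yes'."""
    return (
        params.get("server role", "").lower() == "standalone server"
        and params.get("domain master", "").lower() == "yes"
    )


# Shrunk example taken from the dump quoted in this thread.
EXAMPLE = """
# Global parameters
[global]
    domain master = Yes
    security = USER
    server role = standalone server
"""

if standalone_with_domain_master(parse_global(EXAMPLE)):
    print("standalone server with 'domain master = Yes' set")
```

Splitting on the first `=` by hand rather than using `configparser` is deliberate, since a line such as `idmap config * : backend = tdb` contains a colon that `configparser` would treat as a delimiter.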
On 09/15/2018 03:40 AM, Rowland Penny via samba wrote:

> It is undoubtedly for a 'standalone server', so why does it also have
> the line 'domain master = Yes' ??
> It cannot be both, I would suggest removing this line.
>
> Rowland

Rowland,

domain master = yes used to be standard for stand-alone, to cause nmbd to claim a special domain-specific NetBIOS name as a domain master browser (based on the os level/preferred master election rules).

man smb.conf does not mention any discontinuation for use in stand-alone mode. Should it not be used any longer in that role, or is it a matter of network scale?

--
David C. Rankin, J.D.,P.E.
On 15/09/2018 6:40 pm, Rowland Penny via samba wrote:

> On Sat, 15 Sep 2018 12:52:52 +1000
> Reuben Farrelly via samba <samba at lists.samba.org> wrote:
>> thunderstorm ~ # testparm
>> Load smb config files from /etc/samba/smb.conf
>> rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
>> Processing section "[homes]"
>> Processing section "[root]"
>> Processing section "[photos]"
>> Processing section "[store]"
>> Loaded services file OK.
>> Server role: ROLE_STANDALONE
>>
>> Press enter to see a dump of your service definitions
>>
>> # Global parameters
>> [global]
>> 	domain master = Yes
>> 	security = USER
>> 	server role = standalone server
>
> NOTE: I have shrunk your smb.conf for clarity.
>
> It is undoubtedly for a 'standalone server', so why does it also have
> the line 'domain master = Yes' ??
> It cannot be both, I would suggest removing this line.

Sure - valid point. I've removed that statement now as you're right, it's not needed, and things are much better. Fingers crossed!

What I observed this time was:

- Upon startup of Samba 4.9.0 I again saw a repeated burst of broadcast packets
- Switches once again went into storm-control mode and shut ports down
- The environment recovered, but this time things stabilised and have been OK for the last hour since. Things seem to be working fine now.

Regardless of whether the config was right or not (I agree that the setting in my case was wrong and unnecessary), this is a regression, because it causes an unexpected and undocumented change in behaviour compared to previous versions of the code.

I also wonder why network broadcasts don't seem to be rate limited by Samba. I can't imagine any valid use case where any application would blast thousands of broadcasts per second out onto the wire, regardless of the configuration or misconfiguration of the application.

At the very least this needs a mention in the release notes, especially given the potential this has to cause an outage.

Things may have changed (and change is usually good), but the least that can be done is that people are given a one-line heads up.

Thanks,
Reuben
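On the rate-limiting point, here is a sketch of what a generic token-bucket limiter in front of a broadcast sender could look like. To be clear, this is not Samba code and says nothing about how nmbd actually schedules its announcements; the class name, the parameters, and the injectable clock are all illustrative assumptions.

```python
import time


class TokenBucket:
    """Allow at most `burst` sends at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(burst)
        self.stamp = clock()

    def allow(self):
        """Return True if one packet may be sent now, consuming a token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A sender would call `allow()` before each broadcast and drop or defer the packet when it returns False. With something like `TokenBucket(rate=10, burst=5)`, a burst of thousands of send attempts within a fraction of a second would let through only the first five.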
On Sat, 2018-09-15 at 12:52 +1000, Reuben Farrelly via samba wrote:

> Hi,
>
> Last night I attempted to upgrade from Samba 4.8.5 to 4.9.0, with
> disastrous results. Upon starting Samba 4.9.0 my entire network came
> to a screaming halt a few seconds later, and upon shutting Samba down
> it came back to life again.
>
> Just to be sure this wasn't a coincidence, I then started Samba
> again. Once again all connectivity stopped, but came back as soon as
> I was able to shut down Samba.
>
> Network switches were all logging that they were shutting down
> physical ports due to excessive numbers of broadcast packets being
> seen, and a Wireshark capture from my PC verified that there really
> was a broadcast storm happening that was triggering this.
>
> The capture showed that upon startup Samba 4.9.0 was sending
> thousands and thousands of broadcast packets onto the wire in very
> quick succession. Wireshark counted around 6500 broadcasts in about
> 300ms. The packets are all Host Announcement packets sent from the
> IPv4 address of the host to the broadcast address of the subnet the
> Samba server is on.
>
> Upon reverting back to 4.8.5 with no other config changes, everything
> is back to normal again.
>
> The config is very basic:
>
> thunderstorm ~ # testparm
> Load smb config files from /etc/samba/smb.conf
> rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
> Processing section "[homes]"
> Processing section "[root]"
> Processing section "[photos]"
> Processing section "[store]"
> Loaded services file OK.
> Server role: ROLE_STANDALONE
>
> Press enter to see a dump of your service definitions
>
> # Global parameters
> [global]
> 	dns proxy = No
> 	domain master = Yes
> 	load printers = No
> 	log file = /var/log/samba/log.%m
> 	map to guest = Bad User
> 	max log size = 200
> 	pam password change = Yes
> 	preferred master = Yes
> 	printcap name = /dev/null
> 	security = USER
> 	server role = standalone server
> 	server string = Samba Server %v
> 	unix extensions = No
> 	unix password sync = Yes
> 	username map = /etc/samba/smbusers
> 	workgroup = REUB
> 	idmap config * : backend = tdb
>
> There are four very basic shares specified after this.
>
> There is a Win2k16 server on the network but it is not currently
> providing any services and is not configured to support domain
> logins (workgroup only).
>
> I have uploaded the pcap file and the daemon logs to my web server:
>
> https://www.reub.net/files/samba/Samba-Syslog.log
> https://www.reub.net/files/samba/Samba-4.9.0-NetworkMeltdown.pcap

It certainly is defending its name very aggressively! Ouch!

> The system is a Gentoo Linux x86_64 kept very up to date. The server
> is a VM which has one interface that has 4 IPv4 and IPv6 addresses on
> it, as well as a second vNIC (currently used for backups only with no
> hosts on it right now).
>
> Can anyone please assist in getting to the bottom of what appears to
> be a nasty bug? I'm keen to work on getting to the root cause of this.

Can you try reverting 3a383038ee7f74e5a9d2326a761b27950a14eb83?

nmbd does not change much, and this is one of the few changes between 4.8 and 4.9.

I've attached such a revert (it probably won't go to list recipients) for your testing.

Thanks,

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Revert-s3-nmbd-Fix-possible-integer-overflow.patch
Type: text/x-patch
Size: 1015 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba/attachments/20180915/fc74b73c/0001-Revert-s3-nmbd-Fix-possible-integer-overflow.bin>
On 15/09/2018 10:44 pm, Andrew Bartlett wrote:

> On Sat, 2018-09-15 at 12:52 +1000, Reuben Farrelly via samba wrote:
>> The system is a Gentoo Linux x86_64 kept very up to date. The server
>> is a VM which has one interface that has 4 IPv4 and IPv6 addresses on
>> it, as well as a second vNIC (currently used for backups only with no
>> hosts on it right now).
>>
>> Can anyone please assist in getting to the bottom of what appears to
>> be a nasty bug? I'm keen to work on getting to the root cause of this.
>
> Can you try reverting 3a383038ee7f74e5a9d2326a761b27950a14eb83?
>
> nmbd does not change much, and this is one of the few changes between
> 4.8 and 4.9.
>
> I've attached such a revert (it probably won't go to list recipients)
> for your testing.
>
> Thanks,
>
> Andrew Bartlett

Cool. That seems to have fixed things, both with "domain master = yes" set (my initial config) and with that parameter no longer explicitly defined. In both cases Samba now starts up normally and uneventfully.

Thanks,
Reuben
On Sat, Sep 15, 2018 at 12:52:52PM +1000, Reuben Farrelly via samba wrote:

> Hi,
>
> Last night I attempted to upgrade from Samba 4.8.5 to 4.9.0, with disastrous
> results. Upon starting Samba 4.9.0 my entire network came to a screaming
> halt a few seconds later, and upon shutting Samba down it came back to life
> again.
>
> Just to be sure this wasn't a coincidence, I then started Samba again. Once
> again all connectivity stopped, but came back as soon as I was able to shut
> down Samba.
>
> Network switches were all logging that they were shutting down physical
> ports due to excessive numbers of broadcast packets being seen, and a Wireshark
> capture from my PC verified that there really was a broadcast storm
> happening that was triggering this.
>
> The capture showed that upon startup Samba 4.9.0 was sending thousands and
> thousands of broadcast packets onto the wire in very quick succession.
> Wireshark counted around 6500 broadcasts in about 300ms. The packets are all
> Host Announcement packets sent from the IPv4 address of the host to the
> broadcast address of the subnet the Samba server is on.
>
> Upon reverting back to 4.8.5 with no other config changes, everything is
> back to normal again.

Reuben,

Andrew and I think that the following is the correct patch rather than the simple revert (it keeps the compiler overflow fix in place).

Can you test and confirm?

Thanks,

Jeremy.