Aaron Power
2004-Jan-30 15:20 UTC
[Samba] A possible error in the way nmbd fails to respond to certain request packets
I have just spent several hours trying to nut out an error with a new installation of SAMBA 3, and the results may indicate that something has changed in the way the name server (nmbd) responds to certain requests. It could also be related to the network stack, but I am about at the limit of my skills to diagnose it any further.

I have just installed Fedora Core 1 on a test server in order to play around with SAMBA 3. The exact version is SAMBA 3.0.0-15. My network consists of three machines, all in a workgroup called DRWHO:

  192.168.1.50  : TESTSERVER - the new Fedora machine
  192.168.1.100 : K9 - a Windows 98 client
  192.168.1.1   : TARDIS - an older SAMBA server (Linux kernel 2.4.9 / SAMBA 2.0.10)

I used the SWAT wizard to configure the new server and set up a single share point called "pub", which just allowed guest access to /tmp. I then thought I would check out the new "nmblookup -S <name>" tool to see what it would say about the different machines on the network. The results were as expected, so I won't repeat them here.

I then went to the Win98 machine to compare these results with the output of "nbtstat -a <name>", and this is where the fun started. The Win98 machine kept reporting that TESTSERVER was an unknown host, although it correctly reported the results for itself and for the older SAMBA server, TARDIS.

My first thought was that Fedora had set up some firewall rules, even though I had turned the firewall off during configuration. But "iptables -L" showed that everything was set to ACCEPT.

So I fired up Ethereal on the new Fedora box and watched what went across the network when I queried TESTSERVER, first from its local console and then from the Win98 box. From the local console the results were:

  Src=192.168.1.50 Dst=192.168.1.255 (broadcast) Name Query NB TESTSERVER<00>
    followed by a response, then
  Src=192.168.1.50 Dst=192.168.1.50 Name Query NBSTAT TESTSERVER<00>
    followed by a response.

Everything was as I expected to see it.
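For anyone unfamiliar with what "Name Query NB TESTSERVER<00>" looks like on the wire, the following is a minimal sketch based on the NetBIOS-over-UDP rules in RFC 1001/1002, not on the Samba source. It pads the name to 15 characters, appends the <00> suffix byte, applies the "first-level" nibble-to-letter encoding, and wraps the result in a broadcast name query header. The function names and the transaction ID are hypothetical, only the NB query is built (node status "NBSTAT" requests use different flags), and nothing is actually sent to port 137.

```python
import struct

NB_TYPE = 0x0020   # QUESTION_TYPE "NB" (name query), per RFC 1002
IN_CLASS = 0x0001  # QUESTION_CLASS "IN"


def encode_netbios_name(name: str, suffix: int) -> bytes:
    """First-level encoding: pad to 15 chars, append the suffix byte,
    then map each 4-bit nibble to a letter 'A'..'P'."""
    raw = name.upper().ljust(15)[:15].encode("ascii") + bytes([suffix])
    out = bytearray()
    for byte in raw:
        out.append(ord("A") + (byte >> 4))    # high nibble
        out.append(ord("A") + (byte & 0x0F))  # low nibble
    return bytes(out)


def build_name_query(name: str, suffix: int = 0x00,
                     broadcast: bool = True, trn_id: int = 0x1234) -> bytes:
    """Build a NetBIOS Name Query (NB) request payload for UDP port 137.
    trn_id is an arbitrary, hypothetical transaction ID."""
    # Flags: opcode 0 (query) with RD (0x0100); B (0x0010) marks a broadcast.
    flags = 0x0100 | (0x0010 if broadcast else 0x0000)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack(">HHHHHH", trn_id, flags, 1, 0, 0, 0)
    encoded = encode_netbios_name(name, suffix)
    # Length-prefixed 32-byte label, then a zero byte (empty scope ID).
    question = (bytes([len(encoded)]) + encoded + b"\x00"
                + struct.pack(">HH", NB_TYPE, IN_CLASS))
    return header + question


pkt = build_name_query("TESTSERVER")
print(len(pkt))              # 50
print(pkt[13:45].decode())   # FEEFFDFEFDEFFCFGEFFCCACACACACAAA
```

The encoded label is what Ethereal decodes back into "TESTSERVER<00>"; the 50-byte payload is the same whether the query comes from the local console or from the Win98 box, so the difference between the two cases has to lie in the IP addressing rather than in the query itself.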
However, from the Win98 box I just got three request packets, spaced about a second apart, and no response. A quick check from the console of TESTSERVER showed that nmbd had bound itself to the network correctly:

  [root@TestServer root]# netstat -na
  udp        0      0 192.168.1.50:137        0.0.0.0:*
  udp        0      0 0.0.0.0:137             0.0.0.0:*
  udp        0      0 192.168.1.50:138        0.0.0.0:*
  udp        0      0 0.0.0.0:138             0.0.0.0:*

So, if Ethereal was showing the query packets arriving at the network interface, and nmbd was listening on that interface, why wasn't it responding?

After much more fruitless time running the nmbd daemon in interactive mode with various levels of debug, and searching Google and other online resources, I eventually (finally!) noticed in the Ethereal dumps that the exact request line from the Win98 box was:

  Src=192.168.1.100 Dst=192.255.255.255 (broadcast) Name Query NB TESTSERVER<00>

It seems that there was (and always has been) an error in my DHCP configuration file, which meant that the Win98 machine was getting a 255.0.0.0 netmask, even though the rest of the network was working off a 255.255.255.0 netmask.

While this was an error, I can't see why it shouldn't have worked. The destination address (IP=192.255.255.255 / Ethernet=ff:ff:ff:ff:ff:ff) should still be acceptable to a service listening on 0.0.0.0, and there was certainly enough routing information to return a response to the Win98 box at 192.168.1.100. Also, the Win98 box has been working successfully with the older SAMBA server for several years, so it seems that *something* has changed. It could be that the new Fedora IP stack was dropping the request before it got to SAMBA, or it might be that SAMBA was ignoring the request; I couldn't tell which. There may also be a very good reason why such changes were made, in which case this message can just serve as a warning to others.

Thanks very much for your time, and thanks for the great software.

Aaron Power.
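The odd destination address follows directly from the DHCP mistake: a host computes its subnet broadcast by setting all host bits of its own address, so 192.168.1.100 with a 255.0.0.0 mask yields 192.255.255.255, while TESTSERVER's own subnet broadcast is 192.168.1.255. Because the Ethernet destination is ff:ff:ff:ff:ff:ff, the frame still reaches every machine on the segment (which is why Ethereal saw it), but at the IP layer the address is neither the limited broadcast 255.255.255.255 nor TESTSERVER's subnet broadcast. The sketch below only reproduces that arithmetic with Python's standard ipaddress module; the helper names are hypothetical, and it does not establish whether the Fedora stack or nmbd is the one discarding the packet.

```python
import ipaddress


def directed_broadcast(host: str, netmask: str) -> str:
    """Subnet broadcast address a host derives from its IP and netmask."""
    iface = ipaddress.IPv4Interface(f"{host}/{netmask}")
    return str(iface.network.broadcast_address)


def is_local_broadcast(dst: str, host: str, netmask: str) -> bool:
    """True if dst is the limited broadcast or the broadcast of host's subnet."""
    return dst == "255.255.255.255" or dst == directed_broadcast(host, netmask)


# K9 with the mistaken 255.0.0.0 mask handed out by DHCP
k9_bcast = directed_broadcast("192.168.1.100", "255.0.0.0")
# TESTSERVER with the intended 255.255.255.0 mask
server_bcast = directed_broadcast("192.168.1.50", "255.255.255.0")

print(k9_bcast)      # 192.255.255.255
print(server_bcast)  # 192.168.1.255
# Does TESTSERVER see K9's query address as a broadcast on its own subnet?
print(is_local_broadcast(k9_bcast, "192.168.1.50", "255.255.255.0"))  # False
```

Giving K9 the same 255.255.255.0 mask as the rest of the network makes both calls return 192.168.1.255, which matches the destination seen in the working query from TESTSERVER's local console.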