Morning All,

I've been having a heap of trouble with the primary network interface on a box that was running 5.4 and recently upgraded to 6.0-Beta5 where the interface would just go dead. Nothing in ifconfig or syslog or dmesg would indicate a problem, but nothing would go in or out. The only way to fix it was reboot.

A week ago, after searching the mailing lists I realised it might be the fact that I was using Netatalk and that might not be MP safe so I set debug.mpsafenet="0" in /boot/loader.conf and the box has been stable ever since.

Is anyone looking at the kernel Netatalk code? Is this likely to be the real reason for the problem?

Thanks,
Carl.
On Fri, Oct 28, 2005 at 09:55:21AM +1000, Carl Makin wrote:
> Morning All,
>
> I've been having a heap of trouble with the primary network interface on
> a box that was running 5.4 and recently upgraded to 6.0-Beta5 where the
> interface would just go dead. Nothing in ifconfig or syslog or dmesg
> would indicate a problem, but nothing would go in or out. The only way
> to fix it was reboot.
>
> A week ago, after searching the mailing lists I realised it might be the
> fact that I was using Netatalk and that might not be MP safe so I set
> debug.mpsafenet="0" in /boot/loader.conf and the box has been stable
> ever since.
>
> Is anyone looking at the kernel Netatalk code? Is this likely to be the
> real reason for the problem?

The netatalk code gets looked at now and then, but very few people stress test it. As more of the kernel gets properly locked down, the odds of other parts breaking increase because of the added concurrency, so I wouldn't be at all surprised if there's a netatalk issue.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
Carl Makin wrote:
> Morning All,
>
> I've been having a heap of trouble with the primary network interface
> on a box that was running 5.4 and recently upgraded to 6.0-Beta5 where
> the interface would just go dead. Nothing in ifconfig or syslog or
> dmesg would indicate a problem, but nothing would go in or out. The
> only way to fix it was reboot.
>

What sort of network card? I've been having the same symptoms with a gigabit card that uses the sk driver.
On Fri, 28 Oct 2005, Carl Makin wrote:
> I've been having a heap of trouble with the primary network interface on
> a box that was running 5.4 and recently upgraded to 6.0-Beta5 where the
> interface would just go dead. Nothing in ifconfig or syslog or dmesg
> would indicate a problem, but nothing would go in or out. The only way
> to fix it was reboot.
>
> A week ago, after searching the mailing lists I realised it might be the
> fact that I was using Netatalk and that might not be MP safe so I set
> debug.mpsafenet="0" in /boot/loader.conf and the box has been stable
> ever since.
>
> Is anyone looking at the kernel Netatalk code? Is this likely to be the
> real reason for the problem?

I've not seen any reports of problems, but have had my hands in it recently. I'm happy to help try and debug the issues, but my preference (if possible) would be to do this on 6.x and then backport fixes to 5.x. While the netatalk code does see testing, it's not all that widely used, and so it's possible there are lurking issues.

netatalk is, in theory, MPSAFE, but there could be lasting race conditions. debug.mpsafenet puts Giant back over the stack, but it also substantially changes the timing, so the fact that the setting helps could equally point to a race condition in a device driver or the socket code.

Could you:

- Submit a PR describing the details.

- Include output from dmesg, ifconfig, and any other information you think might be useful. Indicate which interface is the one that is hanging.

- Compile the kernel with INVARIANTS, INVARIANT_SUPPORT, WITNESS, DDB, and BREAK_TO_DEBUGGER. See if you get any debugging warnings around when the hang occurs. Note: these options have a large performance impact.

- Once the interface is dead, can you use it for IP traffic?

- Once the interface is dead, if you run tcpdump on it, do you see traffic?

- Once the interface is dead, if you generate traffic, do other hosts see it?

- If you generate traffic, does tcpdump see your own traffic?

Thanks,

Robert N M Watson
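
[Editorial sketch, not part of Robert's message.] The debugging options requested above would normally go into a copy of the machine's existing kernel configuration file before rebuilding. The fragment below is only a minimal sketch of those lines. It assumes the usual `options NAME` syntax of FreeBSD kernel config files, and the commented-out KDB line is an assumption about 6.x (where DDB is usually paired with KDB) rather than something the thread states.

```
# Debugging options from the list above; expect a large performance hit.
options 	INVARIANTS		# enable run-time consistency checks
options 	INVARIANT_SUPPORT	# support code required by INVARIANTS
options 	WITNESS			# lock-order verification
options 	DDB			# in-kernel debugger
options 	BREAK_TO_DEBUGGER	# a console break drops into the debugger

# Assumption, not from the thread: on 6.x DDB is normally used with KDB.
#options 	KDB
```

With a kernel built this way, WITNESS warnings or assertion failures printed around the time the interface hangs are the output worth attaching to the PR.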