OK, booting *too* quickly is a somewhat unusual problem.....

I have FreeBSD 6.1-RELEASE-p3 running on a Dell PowerEdge 850. For some
reason, in the PowerEdge 850 Dell chose to replace the perfectly adequate
em(4) adapters found on the PE750 with bge(4) hardware. FreeBSD identifies
these adapters as BCM5750A1, but Dell says they're actually Broadcom 5721J
adapters instead. See

    http://www.dell.com/downloads/global/products/pedge/en/850_specs.pdf

for details. The switch to which the host is connected is a Cisco Catalyst
3750.

How this relates to FreeBSD, however.....

During the boot process and before /etc/rc.d/netif runs, the networking
hardware is *cold*, i.e., no link lights or anything. When FreeBSD brings
the interface up during boot, there is a period where the interface appears
to autonegotiate with the switch it's connected to, regardless of whether
the 'ifconfig_bge0=...' line in rc.conf includes "media" and "mediaopt"
options. The console also displays various

    bge0: link state changed to DOWN
    bge0: link state changed to UP

messages, while the link lights flash on and off in various patterns.
Eventually the link stabilizes... but by this point FreeBSD has finished
booting and is in multiuser mode. The result is that any services that rely
on the network being present during boot (NTP, for example, as well as
numerous things installed from ports) fail in various ways.

As hinted at above, locking the NIC and the associated switch ports to a
fixed speed and duplex (thus avoiding the whole autonegotiation mess) does
NOT help; FreeBSD still notes link state changes as described above and
things break in unpleasant ways.

My fix for this has been to apply this patch to /etc/rc.d/netif (also
attached in pristine form):

---------- Patch for netif ----------
--- netif.orig	Thu Jun 29 17:21:10 2006
+++ netif	Thu Aug 17 20:30:10 2006
@@ -71,6 +71,12 @@
 		# Resync ipfilter
 		/etc/rc.d/ipfilter resync
 	fi
+
+	if [ ! -z "$sleep_postnetif" ]; then
+		echo -n "Sleeping for $sleep_postnetif seconds . . . "
+		sleep $sleep_postnetif
+		echo "Done."
+	fi
 }
 
 network_stop()
-------- End patch for netif --------

Setting $sleep_postnetif to a value of about 7 then delays the boot process
long enough for the network connection to stabilize and become usable. I
chose that RC variable name because I suspected it would have a low chance
of colliding with anything in rc.conf in the future, i.e., I'm hoping this
patch is safe to include in -STABLE in the event this sort of problem is
widespread (and nobody comes up with a more elegant fix).

Anyway, since I suspect that I might not be the only one running FreeBSD
with Dell and/or Broadcom hardware, I figured it might be worth mentioning
this and providing what has been (for me, anyway) a workable patch thus
far. I hesitated to open a bug report on this because--well, it doesn't
seem like the OS is really at fault here. :-\

Recommendations for improvement are welcome, as well as any suggestions for
a less kludgy fix. I *really* dislike the idea of slowing down the boot
process. :-(

-- 
Alan Amesbury
University of Minnesota
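A fixed delay (e.g. sleep_postnetif="7" in rc.conf) is either wasted time or, on a slower day, not enough. A minimal sketch of an alternative, assuming some nearby host (such as the default gateway) reliably answers ping: poll until it replies, giving up after a bounded number of attempts. The names wait_until and postnetif_ping_host are hypothetical, not part of stock rc.d, and this has not been tested inside /etc/rc.d/netif itself.

```shell
#!/bin/sh
# Sketch only: wait_until and postnetif_ping_host are hypothetical names,
# not part of the stock rc.d scripts.

# wait_until TRIES COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping 1 second after each failure.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
	_tries=$1
	shift
	_n=0
	while [ "$_n" -lt "$_tries" ]; do
		if "$@" >/dev/null 2>&1; then
			return 0
		fi
		sleep 1
		_n=$((_n + 1))
	done
	return 1
}

# Possible use at the end of network_start(), assuming rc.conf sets
# something like postnetif_ping_host="192.0.2.1" (your gateway).
# FreeBSD ping(8): -c 1 sends one packet, -t 1 gives up after 1 second.
if [ -n "$postnetif_ping_host" ]; then
	printf 'Waiting for %s to answer . . . ' "$postnetif_ping_host"
	if wait_until 30 ping -c 1 -t 1 "$postnetif_ping_host"; then
		echo "Done."
	else
		echo "timed out."
	fi
fi
```

Unlike a fixed sleep, this returns as soon as traffic actually flows, and the attempt limit keeps a dead gateway from hanging the boot indefinitely. Note that on Linux, ping's -t means TTL, so the flags above are FreeBSD-specific.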
In the last episode (Aug 17), Alan Amesbury said:
> OK, booting *too* quickly is a somewhat unusual problem..... I have
> FreeBSD 6.1-RELEASE-p3 running on a Dell PowerEdge 850. For some
> reason, in the PowerEdge 850 Dell chose to replace the perfectly
> adequate em(4) adapters found on the PE750 with bge(4) hardware.
> FreeBSD identifies these adapters as BCM5750A1, but Dell says they're
> actually Broadcom 5721J adapters instead. See
>
> http://www.dell.com/downloads/global/products/pedge/en/850_specs.pdf
>
> for details. The switch to which the host is connected is a Cisco
> Catalyst 3750. How this relates to FreeBSD, however.....

Have you enabled portfast on the Cisco?

http://www.cisco.com/warp/public/473/12.html#c2k

Another thing to check is whether you have alias IPs. I believe the bge
driver has to reset the card every time you add or remove an IP. I know
the ti driver (whose chipset the Broadcom chips are based on) had that
problem.

-- 
Dan Nelson
dnelson@allantgroup.com
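One quick way to do the alias-IP check is to count the IPv4 "inet" lines in ifconfig(8) output; more than one means aliases are configured. A minimal sketch, assuming FreeBSD-style ifconfig output (one indented "inet" line per address). count_inet_addrs is a hypothetical helper name, and whether aliases actually trigger a bge reset is the unconfirmed belief stated above.

```shell
#!/bin/sh
# Hypothetical helper: count IPv4 addresses in ifconfig(8) output read on
# stdin. "inet6" lines are not counted because the pattern requires a
# space right after "inet".
count_inet_addrs() {
	grep -c '^[[:space:]]*inet '
}

# Example use on a live system:
#   ifconfig bge0 | count_inet_addrs
```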
> 2006/8/18, Patrick M. Hausen <hausen@punkt.de>:
> >
> > On Fri, Aug 18, 2006 at 01:23:15PM +0200, Martin Horcicka wrote:
> >
> > > Unfortunately, I don't know how it works exactly. In our case when the
> > > autodetection is disabled and there is e.g. 100/full configured
> > > manually on both, switch and the FreeBSD box, ifconfig shows the
> > > interface status very early as "active". I suspect the switch (Cisco)
> > > to activate the port (from the point of view of the FreeBSD box) but
> > > not to forward any "normal" frames until the Spanning Tree Protocol
> > > procedure is finished for that port. But it's just a guess. I don't
> > > know the negotiation protocol in Ethernet at all and I would really
> > > welcome a commentary from someone who does.
> >
> > This is indeed the case.
> >
> > The switch port goes up. Then the port goes into either the forwarding
> > or the blocking state. The transition period usually takes between 30
> > and 50 seconds, which may be too long for some devices.
> >
> > spanning-tree portfast puts the port into the forwarding state
> > immediately but still participates in STP, so eventually a loop
> > will be detected and the port put back into blocking state again.
>
> This is a little off-topic (and I'm no Cisco specialist) but I'm
> afraid that the loop detection won't happen with portfast. Cisco.com
> says (the first page that Google gave me):
>
> ---
> Understanding How PortFast Works
>
> Spanning-tree PortFast causes a port to enter the spanning-tree
> forwarding state immediately, bypassing the listening and learning
> states. You can use PortFast on switch ports connected to a single
> workstation or server to allow those devices to connect to the network
> immediately, rather than waiting for the port to transition from the
> listening and learning states to the forwarding state.
>
> Caution: PortFast should be used only when connecting a single end
> station to a switch port. If you enable PortFast on a port connected
> to another networking device, such as a switch, you can create network
> loops.
>
> When the switch powers up, or when a device is connected to a port,
> the port normally enters the spanning-tree listening state. When the
> forward delay timer expires, the port enters the learning state. When
> the forward delay timer expires a second time, the port is
> transitioned to the forwarding or blocking state.
>
> When you enable PortFast on a port, the port is immediately and
> permanently transitioned to the spanning-tree forwarding state.
> ---
>
> But then I don't see any difference between using portfast and
> disabling Spanning Tree Protocol frames for that port at all. :-/
>

Because there isn't? If you are connecting a host to a switch, you can
safely drop spanning tree. From experience, even with STP enabled, the
loop is detected, but the correct port is not always the one disabled. :-(

danny

> Martin
>
> >
> > The layer 2 interface is, of course, "up" during all this
> > mumble - otherwise the switch could not send & receive STP frames.
> > This is what confuses hosts waiting for DHCP or similar.
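The "30 to 50 seconds" figure follows from the classic IEEE 802.1D timer defaults (forward delay 15 s, max age 20 s). A rough sketch of the arithmetic, assuming those defaults and no timer tuning on the switch: a freshly connected port spends one forward delay listening and one learning; a port that first has to wait out stale BPDU information adds max age on top.

```shell
#!/bin/sh
# Classic 802.1D timer defaults, in seconds (assumed untuned switch).
stp_forward_delay=15
stp_max_age=20

# New link: listening (1 x forward delay) + learning (1 x forward delay).
stp_min_delay=$((2 * stp_forward_delay))

# Port that must first let old BPDU info age out: max age, then
# listening + learning as above.
stp_max_delay=$((stp_max_age + 2 * stp_forward_delay))

echo "STP delay before forwarding: ${stp_min_delay}-${stp_max_delay} s"
```

PortFast, as described in the Cisco text above, skips the listening and learning states, which removes the 2 x forward delay term entirely.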
On Aug 17, 2006, at 9:49 PM, Alan Amesbury wrote:

> adequate em(4) adapters found on the PE750 with bge(4) hardware.
> FreeBSD identifies these adapters as BCM5750A1, but Dell says they're
> actually Broadcom 5721J adapters instead. See
>
> http://www.dell.com/downloads/global/products/pedge/en/850_specs.pdf
>

I'm not sure how much to believe the Dell docs... On a PE800, they claim
the system has a BCM5721 chip, which is how it was coded into the bge
driver when I first got this machine and helped get patches built for it.
However, the pciconf database claims it is a "BCM5750A1". Which one is
correct? I suspect the latter. I have PRs open on resolving this
inconsistency, but they are obviously low priority.

I have no problems with the delay in the 'active' status, but I hard-code
IP configuration since it is a server.
Thanks for the feedback and discussion! Alas, in terms of network
configuration, I'm just a tenant; I have no direct control over the
networking gear, nor direct visibility into how the switch is configured.

A couple of people wrote to me directly and suggested I 'send-pr' this, so
I'll do so (hopefully later today). Thanks again!

-- 
Alan Amesbury
University of Minnesota