All,

I've been experiencing an intermittent loss of network connectivity on a couple of boxes.

At random times I lose connectivity between the servers and their gateway. I am able to log in via the secondary interface and run "/etc/netstart", after which everything behaves normally. My switch shows the link is up, ifconfig shows the link is up, but I am unable to ping my gateway until running "/etc/netstart". Some days it'll happen a few times an hour, some days once every 8-10 hours. It really is intermittent, and it is driving me crazy trying to track down the issue. I've tried different cables, switches, gateways, IPs, and locations. Memtest for 5 days showed no errors. However, the same problem exists on two separate installs at different times. I am able to connect to the one server from the second via their secondary interfaces, so the problem isn't related to both network interfaces.

Both servers have the Supermicro X7SLM-L motherboard, same CPU, RAM and disks. Using the Realtek network driver (re). pciconf shows:

    vendor   = 'Realtek Semiconductor'
    device   = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111)'
    class    = network
    subclass = ethernet

I've experienced the problem for some time now on both 7.2-RELEASE and 7.2-STABLE (09/20/09) using amd64.

Any help or suggestions would be useful in getting to the bottom of this.

Thanks,

-c
On 22 Sep 2009, at 06:25, Cassidy Larson <alandaluz@gmail.com> wrote:
> I've been experiencing an intermittent issue with a drop in networking
> connectivity on a couple of boxes.
>
> At random times I drop connectivity between the servers and their
> gateway. I am able to login via the secondary interface and
> "/etc/netstart" and everything starts behaving as normal. My switch
> shows the link is up, ifconfig shows the link is up, but I am unable
> to ping my gateway until running "/etc/netstart".

Can you see if "arp -da" is sufficient to get the server online?

Thanks,
Gavin
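One way to run the test Gavin suggests is sketched below. It is only an illustration, not something posted in the thread: the function name try_arp_flush is hypothetical, and the gateway address is supplied by the caller. It pings the gateway, flushes the ARP cache with "arp -d -a" if the ping fails, and pings again to see whether the flush alone was enough.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): checks whether flushing the
# ARP cache alone restores connectivity to the gateway given as $1.
try_arp_flush() {
    gw="$1"

    # If the gateway already answers, there is nothing to test.
    if ping -c 3 "$gw" >/dev/null 2>&1; then
        echo "gateway reachable, nothing to do"
        return 0
    fi

    # Flush every ARP entry (needs root), then try again.
    arp -d -a >/dev/null 2>&1
    if ping -c 3 "$gw" >/dev/null 2>&1; then
        echo "arp flush restored connectivity"
        return 0
    fi

    echo "arp flush not sufficient"
    return 1
}
```

It would be called from the secondary interface's login session during an outage, for example as "try_arp_flush 192.0.2.1", where 192.0.2.1 is a placeholder for the real gateway address.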
Alexandre "Sunny" Kovalenko
2009-Sep-22 12:55 UTC
Random Network Drops on Realtek Interfaces (re)
On Mon, 2009-09-21 at 23:25 -0600, Cassidy Larson wrote:
> All,
>
> I've been experiencing an intermittent issue with a drop in networking
> connectivity on a couple of boxes.
>
> At random times I drop connectivity between the servers and their
> gateway. I am able to login via the secondary interface and
> "/etc/netstart" and everything starts behaving as normal. My switch
> shows the link is up, ifconfig shows the link is up, but I am unable
> to ping my gateway until running "/etc/netstart". Somedays it'll
> happen a few times an hour, some days once every 8-10 hours. It really
> is intermittent, and driving me crazy trying to track down the issue.
> I've tried different cables, switches, gateways, IPs, and locations.
> Memtest for 5 days showed no errors. However, the same problem exists
> on two separate installs at different times. I am able to connect to
> the one server from the second via their secondary interfaces, so the
> problem isn't related to both network interfaces.
>
> Both servers have the Supermicro X7SLM-L motherboard, same CPU, RAM
> and disks. Using the Realtek network driver (re). pciconf shows:
> vendor = 'Realtek Semiconductor'
> device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111)'
> class = network
> subclass = ethernet
>
> I've experienced the problem for some time now on both 7.2-RELEASE and
> 7.2-STABLE (09/20/09) using amd64.
>
> Any help or suggestions would be useful in getting to the bottom of this.

I do not know how applicable this is in your case, but I have seen such behavior when speed auto-negotiation was enabled on a box connected to a Cisco switch. The condition was usually triggered by a certain volume of traffic (e.g. the system could be fine for weeks with SSH/telnet/X11 and then lose the interface when someone sent a large file over FTP or SCP). Restarting the interface usually fixed it for a while. In my case it was platform-agnostic, which led me to keep a cheat-sheet on how to disable auto-negotiation on AIX/Solaris/Linux/etc.
-- Alexandre Kovalenko
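For reference, a rough sketch of what pinning the media type looks like on FreeBSD with ifconfig's media and mediaopt keywords. The function name force_100_full is hypothetical, and the choice of 100baseTX full-duplex is an assumption made for illustration. The switch port would have to be set to the same speed and duplex, otherwise a duplex mismatch is likely.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): pins the interface given as
# $1 to 100baseTX full-duplex instead of autoselect.
# Assumption: the switch port is configured to match.
force_100_full() {
    ifc="$1"

    # "ifconfig -m" lists the media types the driver supports.
    if ! ifconfig -m "$ifc" | grep -q '100baseTX'; then
        echo "100baseTX not listed for $ifc"
        return 1
    fi

    ifconfig "$ifc" media 100baseTX mediaopt full-duplex
}
```

Running "ifconfig re0 media autoselect" returns the interface to auto-negotiation; re0 here stands for whichever re interface is affected.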
On Mon, Sep 21, 2009 at 11:25:17PM -0600, Cassidy Larson wrote:
> All,
>
> I've been experiencing an intermittent issue with a drop in networking
> connectivity on a couple of boxes.
>
> At random times I drop connectivity between the servers and their
> gateway. I am able to login via the secondary interface and
> "/etc/netstart" and everything starts behaving as normal. My switch
> shows the link is up, ifconfig shows the link is up, but I am unable
> to ping my gateway until running "/etc/netstart". Somedays it'll
> happen a few times an hour, some days once every 8-10 hours. It really
> is intermittent, and driving me crazy trying to track down the issue.
> I've tried different cables, switches, gateways, IPs, and locations.
> Memtest for 5 days showed no errors. However, the same problem exists
> on two separate installs at different times. I am able to connect to
> the one server from the second via their secondary interfaces, so the
> problem isn't related to both network interfaces.
>
> Both servers have the Supermicro X7SLM-L motherboard, same CPU, RAM
> and disks. Using the Realtek network driver (re). pciconf shows:
> vendor = 'Realtek Semiconductor'
> device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111)'
> class = network
> subclass = ethernet
>
> I've experienced the problem for some time now on both 7.2-RELEASE and
> 7.2-STABLE (09/20/09) using amd64.
>
> Any help or suggestions would be useful in getting to the bottom of this.
>
> Thanks,
>

By chance, can you find any messages in dmesg reported by re(4)? The dmesg output related to re(4) would be more helpful than pciconf, since different RealTek controllers often report the same device IDs.
On Wed, Sep 23, 2009 at 09:50:45AM -0600, Cassidy Larson wrote:
> > It looks plain RTL8168C PCIe controller. Is there any odd messages
> > reported by re(4) such as watchdog timeouts?
> > If you disable MSI feature does it make any difference?(Add
> > hw.re.msi_disable="1" to /boot/loader.conf to disable MSI.)
>
> Disabling MSI didnt solve the problem.
>
> Any other suggestions?
>

Hmm, not yet. When you lose the network connection on re, can you still see incoming traffic from other hosts with tcpdump? Also, would you check the available mbufs with "netstat -m" when you think re is not responding to any requests?

> Thanks,
>
> -c
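Since the drops are intermittent, it may help to capture the requested information in one step while the interface is stuck. The following is only a sketch: the function name snapshot_if and the log path are hypothetical. It appends the ifconfig state, the "netstat -m" counters, and any dmesg lines from the driver to a log file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): appends a snapshot of the
# interface state to a log file. $1 = interface, $2 = log file.
snapshot_if() {
    ifc="$1"
    log="$2"
    {
        echo "=== $(date) ==="
        ifconfig "$ifc"
        netstat -m
        # Driver messages such as watchdog timeouts start with the
        # interface name followed by a colon.
        dmesg | grep "^${ifc}:" || echo "no dmesg lines for ${ifc}"
    } >> "$log" 2>&1
}
```

For example, "snapshot_if re0 /var/tmp/re0-drop.log" could be run from the secondary interface's session, with "tcpdump -n -i re0" running in another terminal to watch for incoming traffic.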
> Hmm, not yet. When you lost network connection on re can you still
> see incoming traffics from other hosts with tcpdump? Also would you
> check available mbuf with "netstat -m" when you think re is not
> respond to any request?

No incoming traffic found with tcpdump, just outgoing arp requests from the local machine.

netstat -m output:

    258/777/1035 mbufs in use (current/cache/total)
    256/396/652/25600 mbuf clusters in use (current/cache/total/max)
    256/384 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
    576K/1126K/1702K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/0/0 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    0 calls to protocol drain routines

-c
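The lines of interest in that output are the "denied" counters, which are all zero here and so would argue against mbuf exhaustion. A small filter along the following lines could check them automatically on each drop. It is a sketch only, and the function name mbuf_denied is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads "netstat -m" output on
# stdin, prints every "denied" line with a non-zero counter, and returns
# 1 if any such line was found.
mbuf_denied() {
    awk '/denied/ {
        # The first field holds one or more counters separated by "/".
        n = split($1, c, "/")
        for (i = 1; i <= n; i++) {
            if (c[i] + 0 > 0) { print; bad = 1; break }
        }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Usage would be "netstat -m | mbuf_denied"; no output and a zero exit status mean no requests were denied.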