Götz Reinicke - IT Koordinator
2016-Mar-29 06:46 UTC
[CentOS] Network bond - one port goes down from time to time
On 28.03.16 at 16:23, Marcelo Ricardo Leitner wrote:
> On 28-03-2016 06:27, Götz Reinicke wrote:
>> Hi,
>>
>> maybe someone has an idea:
>>
>> We have three Supermicro servers with two 10Gb ports each, connected
>> to 1Gb ports on a Cisco switch stack. All are set to auto speed.
>>
>> I configured an LACP bond on both sides on all servers, first with
>> Citrix XenServer.
>>
>> On one server eth0 goes down from time to time... sometimes within
>> minutes, on other days it stays up for some hours.
>>
>> Two servers are fine; their bonds have been up for 24 days(!) now
>> without any problem.
>>
>> Recently I installed CentOS 7.2 on the server in question and - bam -
>> eth0 is going down from time to time...
>>
>> I checked patch cables, tried another switch port channel,
>> reconfigured the ports, reinstalled the OS. Same behavior.
>>
>> And: we got a replacement server. Same behavior... :)
>>
>> Currently the Cisco tech guys don't see a problem on the switch (which
>> has been up for 3 years now with 10+ servers connected, no problem so
>> far), and from the Citrix side I don't get many more hints.
>>
>> In the logs I just have "NIC Link is Down" ... "NIC Link is Up". It is
>> always eth0.
>>
>> Question:
>>
>> Any ideas? One suggestion was to disable all power saving features in
>> the server BIOS. I have not done that yet.
>>
>> Is there any way to set a higher debug level for that
>> NIC/kernel/whatever to get some feedback from the server OS side on why
>> the port goes down?
>>
>> Regards and thanks for any hint!
>> Götz
>
> If you are seeing "NIC Link is Down" as in:
> [710442.668059] e1000e: enp0s25 NIC Link is Down
> then the NIC lost its link, and the bond is just protecting you, as you
> probably didn't have any downtime due to that. IOW, bonding is not the
> issue.
>
> Which NIC do you have on those servers?

The mainboard is a Supermicro X10DRI-T with an Intel X540 dual-port 10GBase-T controller.

Regards,
Götz
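Since the only evidence so far is the kernel's "NIC Link is Down" / "NIC Link is Up" messages, it may help to measure how long each drop lasts and how often it recurs. Below is a minimal Python 3 sketch (CentOS 7 ships Python 2.7 as `python`, so run it with a Python 3 interpreter) that pairs Down/Up events per interface from saved `dmesg` output. It assumes the default `[seconds.microseconds]` timestamp prefix (not `dmesg -T`) and that the interface name directly precedes "NIC Link is", as in the e1000e line quoted above. Only that first e1000e line comes from the thread; the other lines in `SAMPLE` are hypothetical.

```python
import re
import sys

# Kernel timestamp prefix from plain `dmesg`, e.g. "[710442.668059]" or "[ 1234.500000]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
# Interface name right before "NIC Link is Up/Down", with or without a trailing colon.
LINK_RE = re.compile(r"(\S+?):? NIC Link is (Up|Down)")

# Only the first line is quoted from the thread; the rest are hypothetical examples.
SAMPLE = [
    "[710442.668059] e1000e: enp0s25 NIC Link is Down",
    "[710447.668059] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
    "[710500.500000] ixgbe 0000:03:00.0 eth0: NIC Link is Down",
    "[710566.250000] ixgbe 0000:03:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX",
]


def link_outages(lines):
    """Pair each 'Link is Down' with the next 'Link is Up' on the same interface.

    Returns a list of (iface, down_ts, up_ts, seconds_without_link) tuples,
    in the order the links came back up.
    """
    down_since = {}
    outages = []
    for line in lines:
        ts_m = TS_RE.match(line)
        link_m = LINK_RE.search(line)
        if not (ts_m and link_m):
            continue  # not a link-state message
        ts = float(ts_m.group(1))
        iface, state = link_m.group(1), link_m.group(2)
        if state == "Down":
            down_since.setdefault(iface, ts)  # keep the first Down if repeated
        elif iface in down_since:
            start = down_since.pop(iface)
            outages.append((iface, start, ts, round(ts - start, 6)))
    return outages


if __name__ == "__main__":
    # Usage: dmesg > dmesg.txt && python3 link_outages.py dmesg.txt
    # Without an argument, the hypothetical SAMPLE lines are used.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            lines = fh.readlines()
    else:
        lines = SAMPLE
    for iface, down, up, secs in link_outages(lines):
        print(f"{iface}: down at {down:.6f}s, up at {up:.6f}s ({secs:.3f}s without link)")
```

A regular interval between drops would point more towards something periodic (power management, switch-side timers) than a random physical fault, but that is only a heuristic.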
Marcelo Ricardo Leitner
2016-Mar-29 11:57 UTC
On 29-03-2016 03:46, Götz Reinicke - IT Koordinator wrote:
> The mainboard is a Supermicro X10DRI-T with Intel X540 dual-port 10GBase-T.

Okay, it's probably using the ixgbe driver then.
You may want to test a newer kernel and see how that goes
before doing too much debugging.
You can install v4.5 using one of ELRepo's kernels:
http://elrepo.org/linux/kernel/el7/x86_64/RPMS/
http://elrepo.org/tiki/tiki-index.php
There are some changes between the 7.2 kernel and that one that are
worth testing.

Or... enable ixgbe debugging with the module parameter debug=16 and send
the dmesg log, especially the lines around the event.
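To see what `debug=16` actually selects: assuming ixgbe derives its message mask the way the upstream driver does, by passing the `debug` module parameter through the kernel's `netif_msg_init()` helper, a value N simply switches on the lowest N message-class bits. The Python sketch below mirrors that helper's logic; the `0x7` default in the example is illustrative, not taken from this thread. For a persistent setting, a line such as `options ixgbe debug=16` in a file under `/etc/modprobe.d/` takes effect the next time the module is loaded; keep in mind that reloading ixgbe takes down both X540 ports at once, i.e. both members of the bond.

```python
def netif_msg_init(debug_value: int, default_msg_enable: int) -> int:
    """Mirror of the kernel's netif_msg_init(): map a `debug=N` module
    parameter to a netif message-enable bitmask (u32)."""
    # Negative or too-large values (drivers commonly default to -1)
    # fall back to the driver's default mask.
    if debug_value < 0 or debug_value >= 32:
        return default_msg_enable
    # debug=0 turns all netif messages off.
    if debug_value == 0:
        return 0
    # Otherwise enable the lowest `debug_value` bits.
    return (1 << debug_value) - 1


if __name__ == "__main__":
    ILLUSTRATIVE_DEFAULT = 0x7  # hypothetical driver default, for illustration only
    for n in (-1, 0, 1, 8, 16):
        print(f"debug={n:>2} -> msglvl 0x{netif_msg_init(n, ILLUSTRATIVE_DEFAULT):04x}")
```

So `debug=16` corresponds to the mask `0xffff`, which covers all of the standard `NETIF_MSG_*` classes.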
Götz Reinicke - IT Koordinator
2016-Mar-30 09:46 UTC
On 29.03.16 at 13:57, Marcelo Ricardo Leitner wrote:
> Or... enable ixgbe debugging with the module parameter debug=16 and send
> the dmesg log, especially the lines around the event.

Hm, could you give me a hint on how to enable that (at runtime) on CentOS 7.2?
I can't figure it out. That would be nice.

Cheers,
Götz
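On the runtime question: as far as I know, the upstream ixgbe driver does not expose `debug` as a writable file under `/sys/module/ixgbe/parameters/`, so the module parameter itself only applies at load time. The message mask it feeds can, however, be changed on a running interface with `ethtool -s <iface> msglvl <mask>` and read back from the "Current message level" line of `ethtool <iface>`, assuming the driver implements ethtool's message-level hooks (upstream ixgbe does). The Python 3 sketch below only builds that command and runs it when `dry_run=False`; `eth0`, the `--apply` flag, and the `0xffff` mask (the value `debug=16` maps to) are assumptions to adapt.

```python
import subprocess
import sys
from typing import List


def msglvl_cmd(iface: str, level: int) -> List[str]:
    """Build the ethtool command that sets the netif message level of `iface`."""
    return ["ethtool", "-s", iface, "msglvl", hex(level)]


def set_msglvl(iface: str, level: int, dry_run: bool = True) -> List[str]:
    """Return the command; execute it only when dry_run is False (needs root)."""
    cmd = msglvl_cmd(iface, level)
    if not dry_run:
        # check=True raises CalledProcessError if ethtool or the driver rejects it.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical default interface; pass the real bond slave name as argv[1]
    # and add --apply to actually run ethtool instead of just printing it.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    apply = "--apply" in sys.argv[2:]
    print(" ".join(set_msglvl(iface, 0xFFFF, dry_run=not apply)))
```

After raising the level, wait for the next link drop and collect `dmesg` around it; setting the mask back to the value `ethtool eth0` reported beforehand undoes the change.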