On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
>
>> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
>> > I am trying 11.0 on a dual E5-2620 (no X2APIC).
>> > Under high network load, and possibly some additional condition, the
>> > system goes into an unresponsive state -- no reaction to network or
>> > console (USB IPMI emulation). INVARIANTS adds too much overhead. Is
>> > there some way to debug this?
>>
>> Can you panic it from console to get to db> to get backtrace and other
>> info when it goes unresponsive?
>
> no
> no reaction

So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
get you to the debugger?

I've seen this at Netflix on one variant of our flash offload box with
an Intel E5-2697v2 running with the Chelsio driver. We're working
around it by having fewer receive threads than CPUs in the system. The
only way the boxes would come back was with watchdog. The load was
streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
unresponsive as well. This is on our FreeBSD-10 stable based fork.
From my debugging, we go from totally fine as far as I can tell from
ps, etc., in the moments leading to the hang to being totally wedged. It
seems a very sudden-onset condition. Sound at all familiar?

Warner
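For reference, the NMI route mentioned above might be sketched as two small shell helpers. This is a hedged sketch, not a tested recipe: the helper names and the BMC host name, user and password are hypothetical placeholders, the BMC is assumed to accept IPMI over LAN ('-I lanplus'), and the machdep.kdb_on_nmi / machdep.panic_on_nmi knobs are assumed to be present on an amd64 kernel built with KDB. Whether the NMI actually reaches the debugger on a wedged box is exactly the open question in this thread.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are placeholders, not from the thread.

# Run on the FreeBSD host *before* the hang to see how an NMI will be handled.
# Assumption: amd64 kernel built with KDB, where these sysctls exist.
show_nmi_knobs() {
    sysctl machdep.kdb_on_nmi machdep.panic_on_nmi
}

# Run from a second machine that can reach the wedged host's BMC over the
# network. 'chassis power diag' asks the BMC to pulse a diagnostic
# interrupt (NMI) to the host processors.
send_nmi() {
    bmc_host=$1 bmc_user=$2 bmc_pass=$3
    ipmitool -I lanplus -H "$bmc_host" -U "$bmc_user" -P "$bmc_pass" \
        chassis power diag
}
```

For example, 'send_nmi bmc.example.net admin secret' from another machine; all three arguments are made-up values.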
On Mon, Sep 05, 2016 at 10:14:59AM -0600, Warner Losh wrote:
> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> >
> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> >> > I am trying 11.0 on a dual E5-2620 (no X2APIC).
> >> > Under high network load, and possibly some additional condition, the
> >> > system goes into an unresponsive state -- no reaction to network or
> >> > console (USB IPMI emulation). INVARIANTS adds too much overhead. Is
> >> > there some way to debug this?
> >>
> >> Can you panic it from console to get to db> to get backtrace and other
> >> info when it goes unresponsive?
> >
> > no
> > no reaction
>
> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
> get you to the debugger?

I haven't tried it (and didn't know about it). Can you explain a bit more?
Does FreeBSD catch an NMI and enter the debugger by default? How does that
interact with the USB stack (I am worried the USB keyboard may be locked
up)?

> I've seen this at Netflix on one variant of our flash offload box with
> an Intel E5-2697v2 running with the Chelsio driver. We're working
> around it by having fewer receive threads than CPUs in the system. The
> only way the boxes would come back was with watchdog. The load was
> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> unresponsive as well. This is on our FreeBSD-10 stable based fork.
> From my debugging, we go from totally fine as far as I can tell from
> ps, etc., in the moments leading to the hang to being totally wedged. It
> seems a very sudden-onset condition. Sound at all familiar?
>
> Warner

Not sure. This is a less powerful box and can serve only 20Gbit, using an
Intel card (lagg 2x10G). A day ago I was still running 10-STABLE on this
box without such issues.
(I don't remember clearly; maybe some months ago this box crashed with this
issue, but at that time I had no idea what caused the crash.)

Maybe the hang is caused by some bad (too large) memory request from nginx
(an attempt to parse some malformed files), or by frequent nginx core dumps
(from these malformed files). 11.0 on two other, more powerful boxes served
40 to 55Gbit without hangs, but without the malformed files (i.e., without
bogus memory requests and without nginx crashes). Not sure about the
correlation.
On Mon, Sep 05, 2016 at 10:14:59AM -0600, Warner Losh wrote:
> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> >
> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> >> > I am trying 11.0 on a dual E5-2620 (no X2APIC).
> >> > Under high network load, and possibly some additional condition, the
> >> > system goes into an unresponsive state -- no reaction to network or
> >> > console (USB IPMI emulation). INVARIANTS adds too much overhead. Is
> >> > there some way to debug this?
> >>
> >> Can you panic it from console to get to db> to get backtrace and other
> >> info when it goes unresponsive?
> >
> > no
> > no reaction
>
> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
> get you to the debugger?

This Supermicro motherboard doesn't respond to ipmitool over LAN :(
Neither 'chassis power diag' nor SOL works.

> I've seen this at Netflix on one variant of our flash offload box with
> an Intel E5-2697v2 running with the Chelsio driver. We're working
> around it by having fewer receive threads than CPUs in the system. The
> only way the boxes would come back was with watchdog. The load was
> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> unresponsive as well. This is on our FreeBSD-10 stable based fork.
> From my debugging, we go from totally fine as far as I can tell from
> ps, etc., in the moments leading to the hang to being totally wedged. It
> seems a very sudden-onset condition. Sound at all familiar?
>
> Warner
On Monday, September 5, 2016, Warner Losh <imp at bsdimp.com> wrote:

> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> >
> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> >> > I am trying 11.0 on a dual E5-2620 (no X2APIC).
> >> > Under high network load, and possibly some additional condition, the
> >> > system goes into an unresponsive state -- no reaction to network or
> >> > console (USB IPMI emulation). INVARIANTS adds too much overhead. Is
> >> > there some way to debug this?
> >>
> >> Can you panic it from console to get to db> to get backtrace and other
> >> info when it goes unresponsive?
> >
> > no
> > no reaction
>
> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
> get you to the debugger?
>
> I've seen this at Netflix on one variant of our flash offload box with
> an Intel E5-2697v2 running with the Chelsio driver. We're working
> around it by having fewer receive threads than CPUs in the system. The
> only way the boxes would come back was with watchdog. The load was
> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> unresponsive as well.

Try setting the kern.sched.preempt_thresh sysctl to 224 to get your
console back.

> This is on our FreeBSD-10 stable based fork.
> From my debugging, we go from totally fine as far as I can tell from
> ps, etc., in the moments leading to the hang to being totally wedged. It
> seems a very sudden-onset condition. Sound at all familiar?
>
> Warner
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
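The preempt_thresh suggestion above could be applied roughly as follows. This is a sketch assuming the ULE scheduler (the default in these releases). As I understand ULE, a thread that becomes runnable may preempt the running one only if its priority value is at or below kern.sched.preempt_thresh (and better than the current thread's), so 224 would let ordinary time-sharing threads preempt as well, not just kernel ones. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply the kern.sched.preempt_thresh value suggested in
# this thread (ULE scheduler only; changing it requires root).
apply_preempt_thresh() {
    thresh=${1:-224}                             # default to the value from the thread
    sysctl kern.sched.preempt_thresh             # print the current value
    sysctl kern.sched.preempt_thresh="$thresh"   # set the new value
}

# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   kern.sched.preempt_thresh=224
```

Run 'apply_preempt_thresh' as root on the affected host, ideally before the load that triggers the hang, since a wedged console cannot accept the command.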