The tuneable interrupt rate code is not mine, and looking at it I'm not
entirely sure it works. Why are you focused on the interrupt rate anyway, do
you have some reason to tie it to the watchdog?

You could turn AIM off (enable_aim) and see if that changed anything?

It seems most of the time problems show up they involve the use of lagg; if
you take it out of the mix, does the problem go away?

Jack

On Fri, Jan 9, 2015 at 2:03 AM, Harald Schmalzbauer <h.schmalzbauer at omnilan.de> wrote:

> Regarding Harald Schmalzbauer's message of 08.01.2015 11:22 (localtime):
> > ...
> > While systat tells:
> >     3 igb1:que 0
> >  1619 igb1:que 1
> >     3 igb1:que 2
> >     1 igb1:que 3
> >
> > sysctl dev.igb tells:
> > dev.igb.1.queue0.interrupt_rate: 43478
> > dev.igb.1.queue1.interrupt_rate: 76923
> > dev.igb.1.queue2.interrupt_rate: 111111
> > dev.igb.1.queue3.interrupt_rate: 90909
> >
> > How do I have to understand sysctl's interrupt_rate value?
>
> Even more interesting, is it reasonable to get constantly visually
> strange results from igb0's interrupt_rate?
> 'sysctl dev.igb | grep rate'
> dev.igb.0.queue0.interrupt_rate: 111111
> dev.igb.0.queue1.interrupt_rate: 111111
> dev.igb.0.queue2.interrupt_rate: 111111
> dev.igb.0.queue3.interrupt_rate: 41666
> dev.igb.1.queue0.interrupt_rate: 100000
> dev.igb.1.queue1.interrupt_rate: 76923
> dev.igb.1.queue2.interrupt_rate: 37037
> dev.igb.1.queue3.interrupt_rate: 52631
> ...
> dev.igb.0.queue0.interrupt_rate: 125000
> dev.igb.0.queue1.interrupt_rate: 111111
> dev.igb.0.queue2.interrupt_rate: 111111
> dev.igb.0.queue3.interrupt_rate: 66666
> dev.igb.1.queue0.interrupt_rate: 40000
> dev.igb.1.queue1.interrupt_rate: 43478
> dev.igb.1.queue2.interrupt_rate: 37037
> dev.igb.1.queue3.interrupt_rate: 52631
> ...
> dev.igb.0.queue0.interrupt_rate: 100000
> dev.igb.0.queue1.interrupt_rate: 111111
> dev.igb.0.queue2.interrupt_rate: 111111
> dev.igb.0.queue3.interrupt_rate: 100000
> dev.igb.1.queue0.interrupt_rate: 34482
> dev.igb.1.queue1.interrupt_rate: 6097
> dev.igb.1.queue2.interrupt_rate: 83333
> dev.igb.1.queue3.interrupt_rate: 76923
>
> igb0 doesn't look random enough to me ;-)
>
> Any help highly appreciated!
>
> Thanks,
>
> -Harry
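
A note on the numbers quoted above: every interrupt_rate value in the sysctl output happens to be an integer quotient of the form 1000000 / n for a small whole number n (111111 = 1000000 / 9, 76923 = 1000000 / 13, and so on), whereas the systat count 1619 is not. That would be consistent with the sysctl reporting something derived from a configured interval in microseconds rather than a measured interrupt count, but that reading is an assumption and is not confirmed anywhere in this thread. The Python sketch below only checks the arithmetic; the function names and the scale of 1000000 are illustrative choices, and the sample values are copied from the first igb0/igb1 listing.

```python
import re

# Sample values copied from the first 'sysctl dev.igb | grep rate' listing.
SYSCTL_OUTPUT = """\
dev.igb.0.queue0.interrupt_rate: 111111
dev.igb.0.queue1.interrupt_rate: 111111
dev.igb.0.queue2.interrupt_rate: 111111
dev.igb.0.queue3.interrupt_rate: 41666
dev.igb.1.queue0.interrupt_rate: 100000
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue2.interrupt_rate: 37037
dev.igb.1.queue3.interrupt_rate: 52631
"""

LINE_RE = re.compile(r"^dev\.igb\.(\d+)\.queue(\d+)\.interrupt_rate:\s*(\d+)$")


def parse_rates(text):
    """Return a list of (unit, queue, rate) tuples from sysctl-style lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append(tuple(int(group) for group in match.groups()))
    return rows


def implied_interval(rate, scale=1_000_000):
    """Return the smallest n with scale // n == rate, or None if no such n.

    The scale of 1000000 is an assumption made for this illustration.
    """
    if rate <= 0:
        return None
    lowest = scale // (rate + 1) + 1   # smallest n with scale // n <= rate
    highest = scale // rate            # largest n with scale // n >= rate
    return lowest if lowest <= highest else None


if __name__ == "__main__":
    for unit, queue, rate in parse_rates(SYSCTL_OUTPUT):
        n = implied_interval(rate)
        print(f"igb{unit} queue{queue}: rate={rate} implied n={n}")
```

If the interpretation holds, a value such as 111111 would say more about the configured interval (n = 9) than about how many interrupts the queue actually took, which is what systat counts.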

Regarding Jack Vogel's message of 09.01.2015 18:46 (localtime):
> The tuneable interrupt rate code is not mine, and looking at it I'm not
> entirely sure it works. Why are you focused on the interrupt rate anyway,
> do you have some reason to tie it to the watchdog?
>
> You could turn AIM off (enable_aim) and see if that changed anything?
>
> It seems most of the time problems show up they involve the use of lagg;
> if you take it out of the mix, does the problem go away?

Thanks for your attention!

Unfortunately I can't test anything without lagg(4); this machine is in
production (with lagg(4) being the parent of lots of vlan interfaces).

I guess the watchdog timeout is more often reported by people with lagg(4)
in use because that's where igb(4) really gets some (peak) load ;-)

Seriously, I can't see how lagg(4) should be the culprit for watchdog
timeouts, but stuck interrupts were my first guess. Especially since I'm
doing the kld-reload trick to get MSI-X working inside ESXi. (I reported
2 years ago that booting FreeBSD initializes the passthrough device with
some kind of wrong device-type identifier; warm-booting the guest or simply
kld-reloading solves this problem, and the hypervisor then gets the correct
device-type identifier for using MSI-X.)
As mentioned, this has been working without any issue for more than one
year with FreeBSD 9.1.

I have another machine with kawela cards and a similar setup, but without
any load at all. I'll see if I can reproduce the problem there and narrow
it down by removing lagg(4).

Is there a way to reset the interface without rebooting the machine? The
watchdog doesn't really reset the device; it's in a non-operating state
afterwards. I need to 'ifconfig down' it to bring lagg(4) back into an
operational state.
Some kind of D3D0-state switch for a single address? kldunloading would
destroy the remaining interface too?

Thanks,

-Harry