thr3ads.net - freebsd stable - coredump when loading cxgb after boot with routing daemon already running (RELENG11) [Jan 2017]


Mike Tancsa

2017-Jan-04 19:00 UTC


I ran into a strange problem when manually loading a network driver
after a RELENG_11 box has booted with a routing daemon already running.

If I have zebra running (just a few static routes) and then do a
kldload if_cxgb, the box panics.  If I boot the box, load the NIC's
driver and then start zebra, all is fine.

At first I thought it might be a firmware issue, but I updated the
NIC's firmware and saw the same behaviour.  I am not sure whether this
is specific to the Chelsio or whether any kldload of a NIC driver will
trigger it.



cxgbc0: <Chelsio T310, 1 port> mem
0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16
at device 0.0 on pci5
cxgbc0: PCIe x4 Link, expect reduced performance
cxgbc0: using MSI-X interrupts (5 vectors)
cxgbc0: firmware needs to be updated to version 7.11.0
Jan  4 13:03:02 cxgbc0: Firmware Version 5.0.0
cxgb0: <Port 0 10GBASE-SR> on cxgbc0
cxgb0: Using defaults for TSO: 65518/35/2048
cxgb0: Ethernet address: 00:07:43:07:9e:14

offsite2 kernel: Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
found old FW minor version(5.0), driver compiled for version 7.11
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe085d2df728
frame pointer           = 0x28:0xfffffe085d2df750
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 420 (zebra)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff806fe447 at kdb_backtrace+0x67
#1 0xffffffff806b4966 at vpanic+0x186
#2 0xffffffff806b47d3 at panic+0x43
#3 0xffffffff80997f82 at trap_fatal+0x322
#4 0xffffffff8099814c at trap_pfault+0x1bc
#5 0xffffffff80997800 at trap+0x280
#6 0xffffffff8097c411 at calltrap+0x8
#7 0xffffffff807b90fd at ifioctl+0x6dd
#8 0xffffffff8071c1d6 at kern_ioctl+0x346
#9 0xffffffff8071bddf at sys_ioctl+0x13f
#10 0xffffffff8099890e at amd64_syscall+0x50e
#11 0xffffffff8097c6fb at Xfast_syscall+0xfb
Uptime: 3m9s
Dumping 1635 out of 32675 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Navdeep Parhar

2017-Jan-04 19:07 UTC


What source line in releng-11 does ifioctl+0x6dd correspond to?

(kgdb) l *(ifioctl+0x6dd)

This might be a race where the ifnet is being created or coming up and
zebra pokes it in some way before it's fully ready.  If that's the
case it could affect any ifnet.

Regards,
Navdeep
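
A note for readers working through the backtrace: the address printed in
frame #7 is a return address, so ifioctl+0x6dd is the instruction just
after the call that faulted. With the instruction pointer at 0x0, one
plausible reading is a call through a NULL function pointer, but nothing
in this thread confirms that. Below is a minimal Python sketch that
recovers each function's start address from the backtrace lines, which
can help when cross-checking against kgdb output. The helper name
parse_frames and the regular expression are hypothetical and not part of
any FreeBSD tool.

```python
import re

# Matches frame lines as they appear in the panic output above, e.g.
#   "#7 0xffffffff807b90fd at ifioctl+0x6dd"
FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)"
)


def parse_frames(text):
    """Return (index, symbol, offset, symbol_start) for each frame found.

    symbol_start is the frame address minus the offset, i.e. the address
    at which the named function begins.
    """
    frames = []
    for m in FRAME_RE.finditer(text):
        index = int(m.group(1))
        address = int(m.group(2), 16)
        symbol = m.group(3)
        offset = int(m.group(4), 16)
        frames.append((index, symbol, offset, address - offset))
    return frames


# Three frames copied from the backtrace in the original report.
BACKTRACE = """\
#6 0xffffffff8097c411 at calltrap+0x8
#7 0xffffffff807b90fd at ifioctl+0x6dd
#8 0xffffffff8071c1d6 at kern_ioctl+0x346
"""

for index, symbol, offset, start in parse_frames(BACKTRACE):
    print(f"#{index} {symbol}: starts at {start:#x}, offset {offset:#x}")
```

The source line itself still has to come from the kgdb command quoted
above, run against the kernel that produced the dump.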



