Mike Tancsa
2017-Jan-04 19:00 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
I ran into a strange problem when manually loading a network driver after a RELENG_11 box starts up with a routing daemon already running.

If I have zebra running (just a few static routes) and then try and do a kldload if_cxgb, the box panics. If I boot the box, load the NIC's driver and then start zebra, all is fine.

At first, I thought it might be a firmware issue, but I updated the NIC's firmware and saw the same behaviour. Not sure if this is specific to the Chelsio or if any kldload of a NIC driver will do.

cxgbc0: <Chelsio T310, 1 port> mem 0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16 at device 0.0 on pci5
cxgbc0: PCIe x4 Link, expect reduced performance
cxgbc0: using MSI-X interrupts (5 vectors)
cxgbc0: firmware needs to be updated to version 7.11.0
cJan 4 13:03:02 xgbc0: Firmware Version 5.0.0
cxgb0: <Port 0 10GBASE-SR> on cxgbc0
cxgb0: Using defaults for TSO: 65518/35/2048
cxgb0: Ethernet address: 00:07:43:07:9e:14

offsite2 kernel:Fatal trap 12: page fault while in kernel mode
c found old FW mipuinor version(5.0)d =, driver compile 2; d for version 7.apic11 id = 04
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe085d2df728
frame pointer = 0x28:0xfffffe085d2df750
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 420 (zebra)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff806fe447 at kdb_backtrace+0x67
#1 0xffffffff806b4966 at vpanic+0x186
#2 0xffffffff806b47d3 at panic+0x43
#3 0xffffffff80997f82 at trap_fatal+0x322
#4 0xffffffff8099814c at trap_pfault+0x1bc
#5 0xffffffff80997800 at trap+0x280
#6 0xffffffff8097c411 at calltrap+0x8
#7 0xffffffff807b90fd at ifioctl+0x6dd
#8 0xffffffff8071c1d6 at kern_ioctl+0x346
#9 0xffffffff8071bddf at sys_ioctl+0x13f
#10 0xffffffff8099890e at amd64_syscall+0x50e
#11 0xffffffff8097c6fb at Xfast_syscall+0xfb
Uptime: 3m9s
Dumping 1635 out of 32675 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
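For anyone reading the trap output above: a fault on an instruction fetch with the instruction pointer at 0x0 usually means the kernel jumped through a NULL function pointer. The following is only a rough sketch of how those fields could be pulled out of a saved console log and checked for that pattern. The helper names (parse_trap_fields, looks_like_null_call) are made up for illustration and are not part of any FreeBSD tool; the sample text is copied from the panic above.

```python
# Sketch only: classify a FreeBSD "Fatal trap 12" dump from its printed fields.
# The function names here are hypothetical helpers, not an existing tool.

TRAP_TEXT = """\
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe085d2df728
frame pointer = 0x28:0xfffffe085d2df750
current process = 420 (zebra)
"""


def parse_trap_fields(text):
    """Split 'name = value' lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def looks_like_null_call(fields):
    """True when the fault is an instruction fetch at offset 0 (selector:offset)."""
    _selector, sep, offset = fields.get("instruction pointer", "").partition(":")
    if not sep:
        return False
    try:
        offset_value = int(offset, 16)
    except ValueError:
        return False
    return offset_value == 0 and "instruction" in fields.get("fault code", "")


fields = parse_trap_fields(TRAP_TEXT)
print(fields["current process"], looks_like_null_call(fields))
```

This only flags the pattern; which pointer was NULL still has to come from the source line behind ifioctl+0x6dd.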
Navdeep Parhar
2017-Jan-04 19:07 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
What source line in releng-11 does ifioctl+0x6dd correspond to?

(kgdb) l *(ifioctl+0x6dd)

This might be a race where the ifnet is being created or coming up and zebra pokes it in some way before it's fully ready. If that's the case it could affect any ifnet.

Regards,
Navdeep

On Wed, Jan 4, 2017 at 11:00 AM, Mike Tancsa <mike at sentex.net> wrote:
> I ran into a strange problem when manually loading a network driver
> after RELENG_11 box starts up with a routing daemon already running.
>
> If I have zebra running (just a few static routes) and then try and do a
> kldload if_cxgb, the box panics. If I boot the box, load the nic's
> driver and then start zebra, all is fine.
>
> At first, I thought it might be a firmware issue, but I updated the
> NIC's firmware and the same behaviour. Not sure if this is specific to
> the chelsio or if any kldload of a NIC driver will do.
>
> cxgbc0: <Chelsio T310, 1 port> mem
> 0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16
> at device 0.0 on pci5
> cxgbc0: PCIe x4 Link, expect reduced performance
> cxgbc0: using MSI-X interrupts (5 vectors)
> cxgbc0: firmware needs to be updated to version 7.11.0
> cJan 4 13:03:02 xgbc0: Firmware Version 5.0.0
> cxgb0: <Port 0 10GBASE-SR> on cxgbc0
> cxgb0: Using defaults for TSO: 65518/35/2048
> cxgb0:
> Ethernet address: 00:07:43:07:9e:14
>
> offsite2 kernel:Fatal trap 12: page fault while in kernel mode
> c found old FW mipuinor version(5.0)d =, driver compile 2; d for version
> 7.apic11
> id = 04
> fault virtual address = 0x0
> fault code = supervisor read instruction, page not present
> instruction pointer = 0x20:0x0
> stack pointer = 0x28:0xfffffe085d2df728
> frame pointer = 0x28:0xfffffe085d2df750
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 420 (zebra)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff806fe447 at kdb_backtrace+0x67
> #1 0xffffffff806b4966 at vpanic+0x186
> #2 0xffffffff806b47d3 at panic+0x43
> #3 0xffffffff80997f82 at trap_fatal+0x322
> #4 0xffffffff8099814c at trap_pfault+0x1bc
> #5 0xffffffff80997800 at trap+0x280
> #6 0xffffffff8097c411 at calltrap+0x8
> #7 0xffffffff807b90fd at ifioctl+0x6dd
> #8 0xffffffff8071c1d6 at kern_ioctl+0x346
> #9 0xffffffff8071bddf at sys_ioctl+0x13f
> #10 0xffffffff8099890e at amd64_syscall+0x50e
> #11 0xffffffff8097c6fb at Xfast_syscall+0xfb
> Uptime: 3m9s
> Dumping 1635 out of 32675
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike at sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
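The kgdb lookup asked for above can be generated for every frame in the backtrace, not just ifioctl. The following is a minimal sketch, assuming the "#N 0xADDR at symbol+0xOFF" layout shown in the panic; parse_frames, kgdb_list_command and symbol_start are hypothetical helper names, and the three sample frames are copied from the trace. Subtracting the offset from the frame address gives the start address the symbol resolved to in that kernel.

```python
# Sketch only: turn KDB backtrace frames into kgdb "l *(sym+off)" commands.
# Helper names are hypothetical; the frame layout is taken from the panic text.
import re

BACKTRACE = """\
#6 0xffffffff8097c411 at calltrap+0x8
#7 0xffffffff807b90fd at ifioctl+0x6dd
#8 0xffffffff8071c1d6 at kern_ioctl+0x346
"""

FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)"
)


def parse_frames(text):
    """Return one dict per frame found anywhere in the text."""
    frames = []
    for match in FRAME_RE.finditer(text):
        index, addr, symbol, offset = match.groups()
        frames.append({
            "index": int(index),
            "addr": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16),
        })
    return frames


def kgdb_list_command(frame):
    """Build the kgdb command that lists source at symbol+offset."""
    return "l *({}+{:#x})".format(frame["symbol"], frame["offset"])


def symbol_start(frame):
    """Start address of the symbol: frame address minus the offset."""
    return frame["addr"] - frame["offset"]


frames = parse_frames(BACKTRACE)
for frame in frames:
    print(kgdb_list_command(frame), hex(symbol_start(frame)))
```

The addresses in a stack backtrace are normally return addresses, so the line kgdb lists is likely to be at or just after the call that faulted.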