Mike Tancsa
2017-Jan-04 19:15 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
On 1/4/2017 2:07 PM, Navdeep Parhar wrote:
> What source line in releng-11 does ifioctl+0x6dd correspond to?
>
> (kgdb) l *(ifioctl+0x6dd)
>
> This might be a race where the ifnet is being created or coming up and
> zebra pokes it in some way before it's fully ready. If that's the
> case it could affect any ifnet.

Hi Navdeep,
Thanks for looking. Yes, I just tried it with igb and got a similar panic.

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port
0xc000-0xc01f mem 0xf7200000-0xf727ffff,0xf7280000-0xf7283fff irq 17 at
device 0.0 on pci4
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: 00:25:90:47:b5:d8

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe085d4d1728
frame pointer = 0x28:0xfffffe085d4d1750
igb0: code segment = base 0x0, limit 0xfffff, type 0x1b
Bound queue 0 to cpu 0
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 846 (zebra)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff806efae7 at kdb_backtrace+0x67
#1 0xffffffff806a6006 at vpanic+0x186
#2 0xffffffff806a5e73 at panic+0x43
#3 0xffffffff80989622 at trap_fatal+0x322
#4 0xffffffff809897ec at trap_pfault+0x1bc
#5 0xffffffff80988ea0 at trap+0x280
#6 0xffffffff8096dab1 at calltrap+0x8
#7 0xffffffff807aa79d at ifioctl+0x6dd
#8 0xffffffff8070d876 at kern_ioctl+0x346
#9 0xffffffff8070d47f at sys_ioctl+0x13f
#10 0xffffffff80989fae at amd64_syscall+0x50e
#11 0xffffffff8096dd9b at Xfast_syscall+0xfb
Uptime: 1m9s
Dumping 1267 out of 32675
MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete

(kgdb) l *(ifioctl+0x6dd)
0xffffffff807b90fd is in ifioctl (/usr/src/sys/net/if.c:2655).
2650            case SIOCGIFMEDIA:
2651            case SIOCGIFXMEDIA:
2652            case SIOCGIFGENERIC:
2653                    if (ifp->if_ioctl == NULL)
2654                            return (EOPNOTSUPP);
2655                    error = (*ifp->if_ioctl)(ifp, cmd, data);
2656                    break;
2657
2658            case SIOCSIFLLADDR:
2659                    error = priv_check(td, PRIV_NET_SETLLADDR);
Current language: auto; currently minimal
(kgdb)

> Regards,
> Navdeep
>
> On Wed, Jan 4, 2017 at 11:00 AM, Mike Tancsa <mike at sentex.net> wrote:
>> I ran into a strange problem when manually loading a network driver
>> after a RELENG_11 box starts up with a routing daemon already running.
>>
>> If I have zebra running (just a few static routes) and then try and do a
>> kldload if_cxgb, the box panics. If I boot the box, load the nic's
>> driver and then start zebra, all is fine.
>>
>> At first, I thought it might be a firmware issue, but I updated the
>> NIC's firmware and saw the same behaviour. Not sure if this is specific to
>> the chelsio or if any kldload of a NIC driver will do.
>>
>> cxgbc0: <Chelsio T310, 1 port> mem
>> 0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16
>> at device 0.0 on pci5
>> cxgbc0: PCIe x4 Link, expect reduced performance
>> cxgbc0: using MSI-X interrupts (5 vectors)
>> cxgbc0: firmware needs to be updated to version 7.11.0
>> cJan 4 13:03:02 xgbc0: Firmware Version 5.0.0
>> cxgb0: <Port 0 10GBASE-SR> on cxgbc0
>> cxgb0: Using defaults for TSO: 65518/35/2048
>> cxgb0:
>> Ethernet address: 00:07:43:07:9e:14
>>
>> offsite2 kernel:Fatal trap 12: page fault while in kernel mode
>> c found old FW mipuinor version(5.0)d =, driver compile 2; d for version
>> 7.apic11
>> id = 04
>> fault virtual address = 0x0
>> fault code = supervisor read instruction, page not present
>> instruction pointer = 0x20:0x0
>> stack pointer = 0x28:0xfffffe085d2df728
>> frame pointer = 0x28:0xfffffe085d2df750
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 420 (zebra)
>> trap number = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff806fe447 at kdb_backtrace+0x67
>> #1 0xffffffff806b4966 at vpanic+0x186
>> #2 0xffffffff806b47d3 at panic+0x43
>> #3 0xffffffff80997f82 at trap_fatal+0x322
>> #4 0xffffffff8099814c at trap_pfault+0x1bc
>> #5 0xffffffff80997800 at trap+0x280
>> #6 0xffffffff8097c411 at calltrap+0x8
>> #7 0xffffffff807b90fd at ifioctl+0x6dd
>> #8 0xffffffff8071c1d6 at kern_ioctl+0x346
>> #9 0xffffffff8071bddf at sys_ioctl+0x13f
>> #10 0xffffffff8099890e at amd64_syscall+0x50e
>> #11 0xffffffff8097c6fb at Xfast_syscall+0xfb
>> Uptime: 3m9s
>> Dumping 1635 out of 32675
>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>> --
>> -------------------
>> Mike Tancsa, tel +1 519 651 3400
>> Sentex Communications, mike at sentex.net
>> Providing Internet services since 1994 www.sentex.net
>> Cambridge, Ontario Canada http://www.tancsa.com/

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Navdeep Parhar
2017-Jan-04 19:26 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
Please file a bug against the network stack.

Is zebra easy to install/configure? Send me details of your
configuration offline and I can try it on head if it's something
straightforward.

Regards,
Navdeep