Navdeep Parhar
2017-Jan-04 19:07 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
What source line in releng-11 does ifioctl+0x6dd correspond to?

(kgdb) l *(ifioctl+0x6dd)

This might be a race where the ifnet is being created or coming up and
zebra pokes it in some way before it's fully ready. If that's the
case it could affect any ifnet.

Regards,
Navdeep

On Wed, Jan 4, 2017 at 11:00 AM, Mike Tancsa <mike at sentex.net> wrote:
> I ran into a strange problem when manually loading a network driver
> after a RELENG_11 box starts up with a routing daemon already running.
>
> If I have zebra running (just a few static routes) and then try and do a
> kldload if_cxgb, the box panics. If I boot the box, load the NIC's
> driver and then start zebra, all is fine.
>
> At first, I thought it might be a firmware issue, but I updated the
> NIC's firmware and saw the same behaviour. Not sure if this is specific to
> the chelsio or if any kldload of a NIC driver will do.
>
> cxgbc0: <Chelsio T310, 1 port> mem
> 0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16
> at device 0.0 on pci5
> cxgbc0: PCIe x4 Link, expect reduced performance
> cxgbc0: using MSI-X interrupts (5 vectors)
> cxgbc0: firmware needs to be updated to version 7.11.0
> cJan 4 13:03:02 xgbc0: Firmware Version 5.0.0
> cxgb0: <Port 0 10GBASE-SR> on cxgbc0
> cxgb0: Using defaults for TSO: 65518/35/2048
> cxgb0:
> Ethernet address: 00:07:43:07:9e:14
>
> offsite2 kernel:Fatal trap 12: page fault while in kernel mode
> c found old FW mipuinor version(5.0)d =, driver compile 2; d for version
> 7.apic11
> id = 04
> fault virtual address = 0x0
> fault code = supervisor read instruction, page not present
> instruction pointer = 0x20:0x0
> stack pointer = 0x28:0xfffffe085d2df728
> frame pointer = 0x28:0xfffffe085d2df750
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 420 (zebra)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff806fe447 at kdb_backtrace+0x67
> #1 0xffffffff806b4966 at vpanic+0x186
> #2 0xffffffff806b47d3 at panic+0x43
> #3 0xffffffff80997f82 at trap_fatal+0x322
> #4 0xffffffff8099814c at trap_pfault+0x1bc
> #5 0xffffffff80997800 at trap+0x280
> #6 0xffffffff8097c411 at calltrap+0x8
> #7 0xffffffff807b90fd at ifioctl+0x6dd
> #8 0xffffffff8071c1d6 at kern_ioctl+0x346
> #9 0xffffffff8071bddf at sys_ioctl+0x13f
> #10 0xffffffff8099890e at amd64_syscall+0x50e
> #11 0xffffffff8097c6fb at Xfast_syscall+0xfb
> Uptime: 3m9s
> Dumping 1635 out of 32675
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike at sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
Mike Tancsa
2017-Jan-04 19:15 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
On 1/4/2017 2:07 PM, Navdeep Parhar wrote:
> What source line in releng-11 does ifioctl+0x6dd correspond to?
>
> (kgdb) l *(ifioctl+0x6dd)
>
> This might be a race where the ifnet is being created or coming up and
> zebra pokes it in some way before it's fully ready. If that's the
> case it could affect any ifnet.

Hi Navdeep,

Thanks for looking. Yes, I just tried it with igb and got a similar panic.

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xc000-0xc01f mem 0xf7200000-0xf727ffff,0xf7280000-0xf7283fff irq 17 at device 0.0 on pci4
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: 00:25:90:47:b5:d8

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe085d4d1728
frame pointer = 0x28:0xfffffe085d4d1750
igb0: code segment = base 0x0, limit 0xfffff, type 0x1b
Bound queue 0 to cpu 0
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 846 (zebra)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff806efae7 at kdb_backtrace+0x67
#1 0xffffffff806a6006 at vpanic+0x186
#2 0xffffffff806a5e73 at panic+0x43
#3 0xffffffff80989622 at trap_fatal+0x322
#4 0xffffffff809897ec at trap_pfault+0x1bc
#5 0xffffffff80988ea0 at trap+0x280
#6 0xffffffff8096dab1 at calltrap+0x8
#7 0xffffffff807aa79d at ifioctl+0x6dd
#8 0xffffffff8070d876 at kern_ioctl+0x346
#9 0xffffffff8070d47f at sys_ioctl+0x13f
#10 0xffffffff80989fae at amd64_syscall+0x50e
#11 0xffffffff8096dd9b at Xfast_syscall+0xfb
Uptime: 1m9s
Dumping 1267 out of 32675 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete

(kgdb) l *(ifioctl+0x6dd)
0xffffffff807b90fd is in ifioctl (/usr/src/sys/net/if.c:2655).
2650            case SIOCGIFMEDIA:
2651            case SIOCGIFXMEDIA:
2652            case SIOCGIFGENERIC:
2653                    if (ifp->if_ioctl == NULL)
2654                            return (EOPNOTSUPP);
2655                    error = (*ifp->if_ioctl)(ifp, cmd, data);
2656                    break;
2657
2658            case SIOCSIFLLADDR:
2659                    error = priv_check(td, PRIV_NET_SETLLADDR);
Current language: auto; currently minimal
(kgdb)

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
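For readers following the trace: a fault virtual address of 0x0 with fault code "supervisor read instruction" and instruction pointer 0x20:0x0 is what a call through a NULL function pointer looks like, and the kgdb listing shows ifioctl() guarding the if_ioctl call at if.c:2653-2655 with a NULL check. The userland sketch below only illustrates that guard pattern. struct toy_ifnet, toy_driver_ioctl() and toy_ifioctl() are hypothetical names, not kernel code, and the sketch makes no claim about which pointer was actually NULL in this panic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet; only the field under discussion. */
struct toy_ifnet {
	int (*if_ioctl)(struct toy_ifnet *, unsigned long, void *);
};

/* Hypothetical driver handler: accepts every request. */
int
toy_driver_ioctl(struct toy_ifnet *ifp, unsigned long cmd, void *data)
{
	(void)ifp;
	(void)cmd;
	(void)data;
	return (0);
}

/*
 * Guarded dispatch in the style of if.c:2653-2655: refuse the request
 * when no handler is installed instead of calling through NULL.
 */
int
toy_ifioctl(struct toy_ifnet *ifp, unsigned long cmd, void *data)
{
	int (*handler)(struct toy_ifnet *, unsigned long, void *);

	assert(ifp != NULL);
	handler = ifp->if_ioctl;	/* read the pointer once */
	if (handler == NULL)
		return (EOPNOTSUPP);
	return (handler(ifp, cmd, data));
}
```

With no handler installed, toy_ifioctl() returns EOPNOTSUPP; once toy_driver_ioctl is assigned, the request is forwarded. The sketch is single-threaded, so it says nothing about what happens when another thread is still setting up the structure, which is the scenario raised in this thread.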
Andrey V. Elsukov
2017-Jan-08 13:22 UTC
coredump when loading cxgb after boot with routing daemon already running (RELENG11)
On 04.01.2017 22:07, Navdeep Parhar wrote:
> What source line in releng-11 does ifioctl+0x6dd correspond to?
>
> (kgdb) l *(ifioctl+0x6dd)
>
> This might be a race where the ifnet is being created or coming up and
> zebra pokes it in some way before it's fully ready. If that's the
> case it could affect any ifnet.

Hi,

from a quick look, it seems that an ifnet becomes available for any
action right after if_alloc(), so strange things can happen in the
window after if_alloc() and before if_attach(). Am I right?

--
WBR, Andrey V. Elsukov
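The window Andrey asks about can be pictured with a small single-threaded userland sketch, assuming a design in which a record only becomes visible to lookups once it has been attached. Every name here (struct demo_if, demo_noop_ioctl(), demo_if_alloc(), demo_if_attach(), demo_if_lookup()) is hypothetical. This is not how FreeBSD's if_alloc() and if_attach() are implemented, and a real kernel would also need locking, which the sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXIF 4

/* Hypothetical interface record; not the kernel's struct ifnet. */
struct demo_if {
	char	name[16];
	int	(*ioctl)(struct demo_if *, unsigned long);
	int	in_use;		/* slot reserved by demo_if_alloc() */
	int	attached;	/* set last, by demo_if_attach() */
};

static struct demo_if demo_table[DEMO_MAXIF];

/* Hypothetical driver method that accepts every request. */
int
demo_noop_ioctl(struct demo_if *ifp, unsigned long cmd)
{
	(void)ifp;
	(void)cmd;
	return (0);
}

/* Reserve a slot; the record exists but is not yet visible to lookups. */
struct demo_if *
demo_if_alloc(const char *name)
{
	for (size_t i = 0; i < DEMO_MAXIF; i++) {
		struct demo_if *ifp = &demo_table[i];

		if (!ifp->in_use) {
			memset(ifp, 0, sizeof(*ifp));
			strncpy(ifp->name, name, sizeof(ifp->name) - 1);
			ifp->in_use = 1;
			return (ifp);
		}
	}
	return (NULL);
}

/* Publish the record only after the driver has filled in its methods. */
void
demo_if_attach(struct demo_if *ifp)
{
	assert(ifp->ioctl != NULL);
	ifp->attached = 1;
}

/* Lookup used by the request path: half-initialized records stay hidden. */
struct demo_if *
demo_if_lookup(const char *name)
{
	for (size_t i = 0; i < DEMO_MAXIF; i++) {
		struct demo_if *ifp = &demo_table[i];

		if (ifp->in_use && ifp->attached &&
		    strcmp(ifp->name, name) == 0)
			return (ifp);
	}
	return (NULL);
}
```

Between demo_if_alloc() and demo_if_attach() a lookup by name returns NULL, so a caller such as a routing daemon cannot reach the record while its method pointers are still unset. Whether the kernel's lookup paths behave this way in releng-11 is exactly the open question in this thread.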