Alexander Motin
2008-Jun-04 22:25 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Hi.

After a recent upgrade from 6.3-RC1/mpd-5.0rc1 to 6.3-STABLE/mpd-5.1, some of my PPPoE servers started to crash about once a week. Usually they just hang, without rebooting or dumping core. Consoles are inaccessible. All I have got from them was:

kernel: Fatal trap 12: page fau
kernel: lt while in k
kernel: ernel
kernel: mode
kernel:
kernel: cpuid = 1; apic id = 01
kernel: faut virtual address = 0x58
kernel:
kernel: fault code = supervisor read, page not present
kernel:
kernel: instruction pointer = 0x20:0xc04800be
kernel:
kernel: stack pointer = 0x28:0xd690883c
kernel: frame pointer = 0x28:0
kernel: xd6908854
kernel: code segment
kernel: base 0x0, limit 0xfffff, type 0x1b
kernel:
kernel: = DPL 0, pres 1, def32 1, gra
kernel: n 1
kernel: processor eflags = interrupt
kernel: enab
kernel: led, r
kernel: esume
kernel: , IOPL
kernel: = 0
kernel:
kernel: current process = 1835 (mpd5)
kernel:
kernel: trap number = 12

"fault virtual address" and "instruction pointer" are always the same.

Address 0xc04800be looks like part of the devfs code:

> addr2line -f -e kernel.debug 0xc04800be
devfs_populate_loop
/usr/src/sys/fs/devfs/devfs_devs.c:443

devfs_devs.c:
        de = devfs_newdirent(s, q - s);
        if (cdp->cdp_c.si_flags & SI_ALIAS) {
                de->de_uid = 0;
                de->de_gid = 0;
                de->de_mode = 0755;
                de->de_dirent->d_type = DT_LNK;
                pdev = cdp->cdp_c.si_parent;
->> line 443 ->>    j = strlen(pdev->si_name) + 1;
                de->de_symlink = malloc(j, M_DEVFS, M_WAITOK);
                bcopy(pdev->si_name, de->de_symlink, j);

0x58 is precisely the offset of the si_name field inside struct cdev, so it looks like pdev = cdp->cdp_c.si_parent is NULL here for some reason.

Since network interfaces have corresponding devfs entries, and given the higher interface creation/destruction rate that the newest mpd5.1 can reach thanks to its optimizations, I think it may be some kind of race somewhere in interface creation.

Can somebody give me any hint where to look?

--
Alexander Motin
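The reasoning above (a fault address of 0x58 pointing to a NULL si_parent) can be illustrated with a small userland sketch. The struct mock_cdev below is hypothetical: its padding is chosen only so that si_name sits at offset 0x58, as reported in the message, and it is not the real struct cdev layout. The helper names are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding stands in for whatever fields
 * precede si_name in the real struct cdev on the reporter's system. */
struct mock_cdev {
    unsigned char earlier_fields[0x58];
    char si_name[16];
};

/* Address that reading si_name touches, given the struct's base address. */
uintptr_t si_name_address(uintptr_t base)
{
    return base + offsetof(struct mock_cdev, si_name);
}

/* Address that would fault if the base pointer were NULL. */
uintptr_t null_deref_fault_address(void)
{
    /* Sanity check on the mock layout assumed above. */
    assert(offsetof(struct mock_cdev, si_name) == 0x58);
    return si_name_address(0);
}
```

With a NULL base the computed address equals the member offset, which is why a small fault address such as 0x58 usually suggests a NULL structure pointer rather than a random wild pointer.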
Oleksandr Tymoshenko
2008-Jun-05 14:44 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Alexander Motin wrote:
> Hi.
>
> After a recent upgrade from 6.3-RC1/mpd-5.0rc1 to 6.3-STABLE/mpd-5.1,
> some of my PPPoE servers started to crash about once a week.
> Usually they just hang, without rebooting or dumping core. Consoles
> are inaccessible. All I have got from them was:
>
> kernel: Fatal trap 12: page fau
> kernel: lt while in k
> kernel: ernel
> kernel: mode
> kernel:
> kernel: cpuid = 1; apic id = 01
> kernel: faut virtual address = 0x58
> kernel:
> kernel: fault code = supervisor read, page not present
> kernel:
> kernel: instruction pointer = 0x20:0xc04800be
> kernel:
> kernel: stack pointer = 0x28:0xd690883c
> kernel: frame pointer = 0x28:0
> kernel: xd6908854
> kernel: code segment
> kernel: base 0x0, limit 0xfffff, type 0x1b
> kernel:
> kernel: = DPL 0, pres 1, def32 1, gra
> kernel: n 1
> kernel: processor eflags = interrupt
> kernel: enab
> kernel: led, r
> kernel: esume
> kernel: , IOPL
> kernel: = 0
> kernel:
> kernel: current process = 1835 (mpd5)
> kernel:
> kernel: trap number = 12
>
> "fault virtual address" and "instruction pointer" are always the same.
>
> Address 0xc04800be looks like part of the devfs code:
> > addr2line -f -e kernel.debug 0xc04800be
> devfs_populate_loop
> /usr/src/sys/fs/devfs/devfs_devs.c:443
>
> devfs_devs.c:
>         de = devfs_newdirent(s, q - s);
>         if (cdp->cdp_c.si_flags & SI_ALIAS) {
>                 de->de_uid = 0;
>                 de->de_gid = 0;
>                 de->de_mode = 0755;
>                 de->de_dirent->d_type = DT_LNK;
>                 pdev = cdp->cdp_c.si_parent;
> ->> line 443 ->>    j = strlen(pdev->si_name) + 1;
>                 de->de_symlink = malloc(j, M_DEVFS, M_WAITOK);
>                 bcopy(pdev->si_name, de->de_symlink, j);
>
> 0x58 is precisely the offset of the si_name field inside struct cdev,
> so it looks like pdev = cdp->cdp_c.si_parent is NULL here for some reason.
>
> Since network interfaces have corresponding devfs entries, and given the
> higher interface creation/destruction rate that the newest mpd5.1 can
> reach thanks to its optimizations, I think it may be some kind of race
> somewhere in interface creation.
>
> Can somebody give me any hint where to look?

At a quick glance, the most likely place is the make_dev_alias call in net/if.c, line 457, and the most likely suspect is a race on the if_index variable. There are even a couple of "XXX: should be locked" notes there :)

--
gonzo
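The kind of race suspected here, two creators picking the same index because the search and the claim are not done as one locked step, can be sketched in a deterministic, single-threaded form. Everything below is hypothetical and for illustration only: the slot table, find_free_slot, and claim_slot are made-up names, not the if_index code in net/if.c, and the "interleaving" is simulated by call order rather than by real threads.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical index table standing in for a shared index space. */
#define MOCK_SLOTS 8
static int slot_used[MOCK_SLOTS];

/* Return the lowest unused slot, or -1 if the table is full. */
int find_free_slot(void)
{
    for (int i = 0; i < MOCK_SLOTS; i++)
        if (!slot_used[i])
            return i;
    return -1;
}

/* Mark a slot as taken. */
void claim_slot(int i)
{
    assert(i >= 0 && i < MOCK_SLOTS);
    slot_used[i] = 1;
}

/* Unlocked case: both creators search before either one claims.
 * Returns 1 if they ended up with the same slot. */
int unlocked_interleaving_collides(void)
{
    memset(slot_used, 0, sizeof(slot_used));
    int a = find_free_slot();   /* creator A searches */
    int b = find_free_slot();   /* creator B searches before A claims */
    claim_slot(a);
    claim_slot(b);
    return a == b;
}

/* Locked case: search and claim stay together as one step per creator.
 * Returns 1 if they ended up with the same slot. */
int locked_allocation_collides(void)
{
    memset(slot_used, 0, sizeof(slot_used));
    int a = find_free_slot();
    claim_slot(a);              /* A finishes before B starts */
    int b = find_free_slot();
    claim_slot(b);
    return a == b;
}
```

Whether this particular pattern is what actually triggers the crash is not established in the thread; the sketch only shows why an unlocked search-then-claim is a plausible suspect.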
Kostik Belousov
2008-Jun-05 15:02 UTC
Crashes in devfs. Possibly on interface creation/destruction.
On Thu, Jun 05, 2008 at 12:25:39AM +0300, Alexander Motin wrote:
> Hi.
>
> After a recent upgrade from 6.3-RC1/mpd-5.0rc1 to 6.3-STABLE/mpd-5.1,
> some of my PPPoE servers started to crash about once a week.
> Usually they just hang, without rebooting or dumping core. Consoles
> are inaccessible. All I have got from them was:
>
> kernel: Fatal trap 12: page fau
> kernel: lt while in k
> kernel: ernel
> kernel: mode
> kernel:
> kernel: cpuid = 1; apic id = 01
> kernel: faut virtual address = 0x58
> kernel:
> kernel: fault code = supervisor read, page not present
> kernel:
> kernel: instruction pointer = 0x20:0xc04800be
> kernel:
> kernel: stack pointer = 0x28:0xd690883c
> kernel: frame pointer = 0x28:0
> kernel: xd6908854
> kernel: code segment
> kernel: base 0x0, limit 0xfffff, type 0x1b
> kernel:
> kernel: = DPL 0, pres 1, def32 1, gra
> kernel: n 1
> kernel: processor eflags = interrupt
> kernel: enab
> kernel: led, r
> kernel: esume
> kernel: , IOPL
> kernel: = 0
> kernel:
> kernel: current process = 1835 (mpd5)
> kernel:
> kernel: trap number = 12
>
> "fault virtual address" and "instruction pointer" are always the same.
>
> Address 0xc04800be looks like part of the devfs code:
> > addr2line -f -e kernel.debug 0xc04800be
> devfs_populate_loop
> /usr/src/sys/fs/devfs/devfs_devs.c:443
>
> devfs_devs.c:
>         de = devfs_newdirent(s, q - s);
>         if (cdp->cdp_c.si_flags & SI_ALIAS) {
>                 de->de_uid = 0;
>                 de->de_gid = 0;
>                 de->de_mode = 0755;
>                 de->de_dirent->d_type = DT_LNK;
>                 pdev = cdp->cdp_c.si_parent;
> ->> line 443 ->>    j = strlen(pdev->si_name) + 1;
>                 de->de_symlink = malloc(j, M_DEVFS, M_WAITOK);
>                 bcopy(pdev->si_name, de->de_symlink, j);
>
> 0x58 is precisely the offset of the si_name field inside struct cdev,
> so it looks like pdev = cdp->cdp_c.si_parent is NULL here for some reason.
>
> Since network interfaces have corresponding devfs entries, and given the
> higher interface creation/destruction rate that the newest mpd5.1 can
> reach thanks to its optimizations, I think it may be some kind of race
> somewhere in interface creation.
>
> Can somebody give me any hint where to look?

Try the following patch. It is against CURRENT; there might be further races at device destruction, but maybe not. Also, please note that devfs in RELENG_6 and in RELENG_7/CURRENT have diverged enough to make an MFC of most bugfixes to RELENG_6 nearly impossible.

diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index e9d0f7b..af9a47d 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -825,9 +825,9 @@ make_dev_alias(struct cdev *pdev, const char *fmt, ...)
 	va_end(ap);
 
 	devfs_create(dev);
+	dev_dependsl(pdev, dev);
 	clean_unrhdrl(devfs_inos);
 	dev_unlock();
-	dev_depends(pdev, dev);
 
 	notify_create(dev);
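A plausible reading of the patch is that it closes a window in which the new alias is already visible to devfs but its parent link has not been recorded yet. The sketch below models that ordering in userland, single-threaded, under stated assumptions: struct mock_dev, mock_populate, mock_unlock, and the two alias_*_order functions are hypothetical stand-ins, not the kernel's struct cdev or devfs functions, and a concurrent populate pass is simulated by running it at the moment the lock is released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node; not the real struct cdev. */
struct mock_dev {
    struct mock_dev *parent;   /* plays the role of si_parent */
    int is_alias;              /* plays the role of SI_ALIAS */
    int published;             /* visible to the populate pass */
};

/* Set when the populate pass finds a published alias without a parent. */
static int null_parent_seen;

/* What a concurrent populate pass would observe for this node. */
void mock_populate(const struct mock_dev *dev)
{
    if (dev->published && dev->is_alias && dev->parent == NULL)
        null_parent_seen = 1;
}

/* Dropping the lock is the point where another thread may run populate. */
void mock_unlock(const struct mock_dev *dev)
{
    mock_populate(dev);
}

/* Ordering before the patch: parent recorded after the lock is dropped. */
void alias_old_order(struct mock_dev *parent, struct mock_dev *alias)
{
    alias->is_alias = 1;
    alias->published = 1;      /* corresponds to devfs_create() */
    mock_unlock(alias);        /* corresponds to dev_unlock() */
    alias->parent = parent;    /* corresponds to dev_depends() */
}

/* Ordering after the patch: parent recorded while the lock is still held. */
void alias_new_order(struct mock_dev *parent, struct mock_dev *alias)
{
    alias->is_alias = 1;
    alias->published = 1;      /* corresponds to devfs_create() */
    alias->parent = parent;    /* corresponds to dev_dependsl() */
    mock_unlock(alias);        /* corresponds to dev_unlock() */
}

/* Run one ordering; returns 1 if populate saw an alias with a NULL parent. */
int run_case(void (*make_alias)(struct mock_dev *, struct mock_dev *))
{
    struct mock_dev parent = { NULL, 0, 1 };
    struct mock_dev alias = { NULL, 0, 0 };

    assert(make_alias != NULL);
    null_parent_seen = 0;
    make_alias(&parent, &alias);
    return null_parent_seen;
}
```

Calling run_case(alias_old_order) reports the NULL-parent observation, while run_case(alias_new_order) does not, which matches the crash at line 443 where pdev->si_name is read through a NULL pdev.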
Alexander Motin
2008-Jun-11 09:36 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Kostik Belousov wrote:
> Try the following patch. It is against CURRENT; there might be further
> races at device destruction, but maybe not. Also, please note that
> devfs in RELENG_6 and in RELENG_7/CURRENT have diverged enough to make
> an MFC of most bugfixes to RELENG_6 nearly impossible.
>
> diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
> index e9d0f7b..af9a47d 100644
> --- a/sys/kern/kern_conf.c
> +++ b/sys/kern/kern_conf.c
> @@ -825,9 +825,9 @@ make_dev_alias(struct cdev *pdev, const char *fmt, ...)
>  	va_end(ap);
>
>  	devfs_create(dev);
> +	dev_dependsl(pdev, dev);
>  	clean_unrhdrl(devfs_inos);
>  	dev_unlock();
> -	dev_depends(pdev, dev);
>
>  	notify_create(dev);

Looks reasonable. It also applies to RELENG_6 with minor differences. I have put it into production. Since the problem shows up only rarely, a positive result will probably be visible only after several weeks of successful operation.

Thank you.

--
Alexander Motin