Alexander Motin
2008-Jun-04 22:25 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Hi.
After a recent upgrade from 6.3-RC1/mpd-5.0rc1 to 6.3-STABLE/mpd-5.1,
some of my PPPoE servers started to crash roughly once a week.
Usually they just hang without rebooting or dumping core. Consoles
are inaccessible. All I have got from them was:
kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 1; apic id = 01
kernel: fault virtual address   = 0x58
kernel: fault code              = supervisor read, page not present
kernel: instruction pointer     = 0x20:0xc04800be
kernel: stack pointer           = 0x28:0xd690883c
kernel: frame pointer           = 0x28:0xd6908854
kernel: code segment            = base 0x0, limit 0xfffff, type 0x1b
kernel:                         = DPL 0, pres 1, def32 1, gran 1
kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
kernel: current process         = 1835 (mpd5)
kernel: trap number             = 12
The "fault virtual address" and "instruction pointer" values are the same
in every crash.
Address 0xc04800be looks like part of devfs code:
> addr2line -f -e kernel.debug 0xc04800be
devfs_populate_loop
/usr/src/sys/fs/devfs/devfs_devs.c:443
devfs_devs.c:
        de = devfs_newdirent(s, q - s);
        if (cdp->cdp_c.si_flags & SI_ALIAS) {
            de->de_uid = 0;
            de->de_gid = 0;
            de->de_mode = 0755;
            de->de_dirent->d_type = DT_LNK;
            pdev = cdp->cdp_c.si_parent;
            j = strlen(pdev->si_name) + 1;    /* <- line 443 */
            de->de_symlink = malloc(j, M_DEVFS, M_WAITOK);
            bcopy(pdev->si_name, de->de_symlink, j);
0x58 is precisely the offset of the si_name field inside struct cdev,
so it looks like pdev = cdp->cdp_c.si_parent is NULL here for some reason.
Since network interfaces have corresponding devfs entries, and given the
higher interface creation/destruction rate that the newest mpd 5.1 is able
to reach thanks to its optimizations, I think it may be some kind of race
somewhere in interface creation.
Can somebody give me a hint where to look?
--
Alexander Motin
Oleksandr Tymoshenko
2008-Jun-05 14:44 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Alexander Motin wrote:
> Can somebody give me any hint where to look to?

At a quick glance, the most likely place is the make_dev_alias() call in
net/if.c, line 457, and the most likely suspect is a race on the if_index
variable. There are even a couple of "XXX: should be locked" notes there :)

--
gonzo
Kostik Belousov
2008-Jun-05 15:02 UTC
Crashes in devfs. Possibly on interface creation/destruction.
On Thu, Jun 05, 2008 at 12:25:39AM +0300, Alexander Motin wrote:
> Can somebody give me any hint where to look to?

Try the following patch. It is against CURRENT; there might be further
races at device destruction, but maybe not. Also, please note that devfs
in RELENG_6 and RELENG_7/CURRENT has diverged enough to make MFC of most
bugfixes to RELENG_6 nearly impossible.

diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index e9d0f7b..af9a47d 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -825,9 +825,9 @@ make_dev_alias(struct cdev *pdev, const char *fmt, ...)
 	va_end(ap);
 
 	devfs_create(dev);
+	dev_dependsl(pdev, dev);
 	clean_unrhdrl(devfs_inos);
 	dev_unlock();
-	dev_depends(pdev, dev);
 
 	notify_create(dev);
Alexander Motin
2008-Jun-11 09:36 UTC
Crashes in devfs. Possibly on interface creation/destruction.
Kostik Belousov wrote:
> Try the following patch.

Looks reasonable. For RELENG_6 it also applies with minor differences.
I have put it into production. Since the problem does not show itself very
often, a positive result will probably be seen only after several weeks of
continuous operation.

Thank you.

--
Alexander Motin