thr3ads.net - freebsd stable - cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.) [Aug 2008]

Ulrich Spoerlein

2008-Aug-06 20:12 UTC

cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)

On Mon, 04.08.2008 at 16:07:55 -0400, John Baldwin wrote:
> On Monday 04 August 2008 02:29:19 pm Ulrich Spoerlein wrote:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x38
> > fault code              = supervisor read, page not present
> > instruction pointer     = 0x20:0xc058ec16
> > stack pointer           = 0x28:0xfb8b8ac8
> > frame pointer           = 0x28:0xfb8b8ac8
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 1176 (powerd)
> > db:0:kdb.enter.default>  show pcpu
> > cpuid        = 0
> > curthread    = 0xc4ec0aa0: pid 1176 "powerd"
> > curpcb       = 0xfb8b8d90
> > fpcurthread  = none
> > idlethread   = 0xc3f80cc0: pid 10 "idle: cpu0"
> > APIC ID      = 0
> > currentldt   = 0x50
> > db:0:kdb.enter.default>  bt
> > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
> > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
> > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
> > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
> > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
> > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
> > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
> > syscall(fb8b8d38) at syscall+0x345
> > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
> > db:0:kdb.enter.default>  capture off
> > 
> > Seems like I caught RELENG_7 during a bad time. Will update again.
> 
> What cpufreq drivers do you have loaded and attached?  This patch might work
> around the issue, but I suspect there is a bug in one of the cpufreq drivers.

Hi John,

sorry for the slow update, please bear with me.

This is on a first generation Pentium-M (Banias core) with EST (and also
p4tcc attached, as I just discovered):

CPU: Intel(R) Pentium(R) M processor 1.50GHz (1495.15-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6d6  Stepping = 6
  Features=0xafe9f9bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE>
  Features2=0x180<EST,TM2>
..
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0


dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 300
dev.cpu.0.freq_levels: 1500/-1 1312/-1 1200/-1 1050/-1 1000/-1 875/-1 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% 0.00%
dev.acpi_perf.0.%parent: cpu0
dev.est.0.%desc: Enhanced SpeedStep Frequency Control
dev.est.0.%driver: est
dev.est.0.%parent: cpu0
dev.est.0.freq_settings: 1500/-1 1200/-1 1000/-1 800/-1 600/-1
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.p4tcc.0.%desc: CPU Frequency Thermal Control
dev.p4tcc.0.%driver: p4tcc
dev.p4tcc.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1

A kernel from 2008-06-13 is the last known working one. I just had the
same crash with a kernel built from sources as of 2008-07-01 and am now
recompiling for 2008-06-24.

Your MFC of est.c rev 180044 might be the problem; I'll try a backout
once I have confirmed that the 2008-06-24 kernel is running stable.

> Index: kern_cpu.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
> retrieving revision 1.27.2.2
> diff -u -r1.27.2.2 kern_cpu.c
> --- kern_cpu.c  9 May 2008 19:02:10 -0000       1.27.2.2
> +++ kern_cpu.c  4 Aug 2008 20:07:41 -0000
> @@ -329,6 +329,8 @@
>         /* Next, set any/all relative frequencies via their drivers. */
>         for (i = 0; i < level->rel_count; i++) {
>                 set = &level->rel_set[i];
> +               if (set->dev == NULL)
> +                       continue;
>                 if (!device_is_attached(set->dev)) {
>                         error = ENXIO;
>                         goto out;
> 
Will try that one too, hopefully tomorrow.

Will also disable p4tcc. It did not attach back in the RELENG_6 days,
and now that it does, it leads to ridiculously low levels such as 75 MHz.
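
A note on where the 75 MHz level seems to come from: the extra entries in dev.cpu.0.freq_levels above look like the est frequencies scaled by the p4tcc settings, assuming the p4tcc values are hundredths of a percent (10000 = 100%). The sketch below only reproduces that arithmetic. `derived_mhz` is a made-up helper name, not a function from kern_cpu.c, and it does not model how the kernel decides which combinations to expose.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from kern_cpu.c): scale an absolute frequency
 * in MHz by a relative setting given in hundredths of a percent.
 * Integer division truncates, so 1312.5 becomes 1312.
 */
static int
derived_mhz(int abs_mhz, int rel_hundredths_pct)
{
	assert(abs_mhz >= 0 && rel_hundredths_pct >= 0);
	return (abs_mhz * rel_hundredths_pct / 10000);
}
```

Under that assumption, derived_mhz(600, 1250) gives 75 and derived_mhz(1500, 8750) gives 1312, both of which appear in the freq_levels list, while neither is among the est settings.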

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

John Baldwin

2008-Aug-06 22:12 UTC

On Wednesday 06 August 2008 04:06:43 pm Ulrich Spoerlein wrote:
> On Mon, 04.08.2008 at 16:07:55 -0400, John Baldwin wrote:
> > On Monday 04 August 2008 02:29:19 pm Ulrich Spoerlein wrote:
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address   = 0x38
> > > fault code              = supervisor read, page not present
> > > instruction pointer     = 0x20:0xc058ec16
> > > stack pointer           = 0x28:0xfb8b8ac8
> > > frame pointer           = 0x28:0xfb8b8ac8
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 1176 (powerd)
> > > db:0:kdb.enter.default>  show pcpu
> > > cpuid        = 0
> > > curthread    = 0xc4ec0aa0: pid 1176 "powerd"
> > > curpcb       = 0xfb8b8d90
> > > fpcurthread  = none
> > > idlethread   = 0xc3f80cc0: pid 10 "idle: cpu0"
> > > APIC ID      = 0
> > > currentldt   = 0x50
> > > db:0:kdb.enter.default>  bt
> > > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
> > > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
> > > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
> > > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
> > > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
> > > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
> > > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
> > > syscall(fb8b8d38) at syscall+0x345
> > > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
> > > db:0:kdb.enter.default>  capture off
> > >
> > > Seems like I caught RELENG_7 during a bad time. Will update again.
> >
> > What cpufreq drivers do you have loaded and attached?  This patch might
> > work around the issue, but I suspect there is a bug in one of the cpufreq
> > drivers.
>
> Hi John,
>
> sorry for the slow update, please bear with me.
>
> This is on a first generation Pentium-M (Banias core) with EST (and also
> p4tcc attached, as I just discovered):
>
> CPU: Intel(R) Pentium(R) M processor 1.50GHz (1495.15-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x6d6  Stepping = 6
>   Features=0xafe9f9bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE>
>   Features2=0x180<EST,TM2>
> ..
> cpu0: <ACPI CPU> on acpi0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> p4tcc0: <CPU Frequency Thermal Control> on cpu0
>
>
> dev.cpu.0.%desc: ACPI CPU
> dev.cpu.0.%driver: cpu
> dev.cpu.0.%location: handle=\_PR_.CPU0
> dev.cpu.0.%pnpinfo: _HID=none _UID=0
> dev.cpu.0.%parent: acpi0
> dev.cpu.0.freq: 300
> dev.cpu.0.freq_levels: 1500/-1 1312/-1 1200/-1 1050/-1 1000/-1 875/-1
> 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
> dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185
> dev.cpu.0.cx_lowest: C1
> dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% 0.00%
> dev.acpi_perf.0.%parent: cpu0
> dev.est.0.%desc: Enhanced SpeedStep Frequency Control
> dev.est.0.%driver: est
> dev.est.0.%parent: cpu0
> dev.est.0.freq_settings: 1500/-1 1200/-1 1000/-1 800/-1 600/-1
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%parent: cpu0
> dev.p4tcc.0.%desc: CPU Frequency Thermal Control
> dev.p4tcc.0.%driver: p4tcc
> dev.p4tcc.0.%parent: cpu0
> dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1
> 2500/-1 1250/-1
>
> A kernel from 2008-06-13 is the last known working one. I just had the
> same crash with a kernel built from sources as of 2008-07-01 and am now
> recompiling for 2008-06-24.
>
> Your MFC of est.c rev 180044 might be the problem; I'll try a backout
> once I have confirmed that the 2008-06-24 kernel is running stable.

I doubt this will cause it.

> > Index: kern_cpu.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
> > retrieving revision 1.27.2.2
> > diff -u -r1.27.2.2 kern_cpu.c
> > --- kern_cpu.c  9 May 2008 19:02:10 -0000       1.27.2.2
> > +++ kern_cpu.c  4 Aug 2008 20:07:41 -0000
> > @@ -329,6 +329,8 @@
> >         /* Next, set any/all relative frequencies via their drivers. */
> >         for (i = 0; i < level->rel_count; i++) {
> >                 set = &level->rel_set[i];
> > +               if (set->dev == NULL)
> > +                       continue;
> >                 if (!device_is_attached(set->dev)) {
> >                         error = ENXIO;
> >                         goto out;
>
> Will try that one too, hopefully tomorrow.
>
> Will also disable p4tcc. This was not attaching during the RELENG_6
> times but leads to ridiculous rates of 75 MHz.

If p4tcc attaching is new, that might point to the culprit.  A good quick test
would be to disable individual cpufreq drivers to find out which one causes
the panic.
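
For readers following the backtrace: the first argument to device_is_attached() is 0, and the fault address 0x38 is consistent with reading a field at a small offset from a NULL device pointer. The guard in the quoted patch avoids that call for empty slots. Below is a minimal user-space sketch of the same pattern. All `mock_*` names and `apply_relative` are hypothetical stand-ins invented for illustration, not the kernel's real structures or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct mock_device {
	int attached;			/* nonzero once the driver has attached */
};

struct mock_setting {
	struct mock_device *dev;	/* may be NULL for an unused slot */
	int freq;
};

#define MOCK_MAX_SETTINGS 4		/* arbitrary size for this sketch */

struct mock_level {
	struct mock_setting rel_set[MOCK_MAX_SETTINGS];
	int rel_count;
};

/* Dereferences dev, so it must never be called with NULL. */
static int
mock_is_attached(const struct mock_device *dev)
{
	return (dev->attached != 0);
}

/*
 * Walk the relative settings the way the patched loop does: skip slots
 * with no device, fail with ENXIO if a device is present but detached.
 * On success *applied holds the number of settings that would be set.
 */
static int
apply_relative(const struct mock_level *level, int *applied)
{
	assert(level->rel_count >= 0 && level->rel_count <= MOCK_MAX_SETTINGS);
	*applied = 0;
	for (int i = 0; i < level->rel_count; i++) {
		const struct mock_setting *set = &level->rel_set[i];

		if (set->dev == NULL)
			continue;	/* the guard added by the patch */
		if (!mock_is_attached(set->dev))
			return (ENXIO);
		(*applied)++;
	}
	return (0);
}
```

As the patch description says, this only works around the symptom. It does not explain why a slot within rel_count has no device in the first place.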

-- 
John Baldwin

pluknet

2008-Aug-07 21:26 UTC

Hi, John.
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > fault virtual address   = 0x38
>> > fault code              = supervisor read, page not present
>> > instruction pointer     = 0x20:0xc058ec16
>> > stack pointer           = 0x28:0xfb8b8ac8
>> > frame pointer           = 0x28:0xfb8b8ac8
>> > code segment            = base 0x0, limit 0xfffff, type 0x1b
>> >                         = DPL 0, pres 1, def32 1, gran 1
>> > processor eflags        = interrupt enabled, resume, IOPL = 0
>> > current process         = 1176 (powerd)
>> > db:0:kdb.enter.default>  show pcpu
>> > cpuid        = 0
>> > curthread    = 0xc4ec0aa0: pid 1176 "powerd"
>> > curpcb       = 0xfb8b8d90
>> > fpcurthread  = none
>> > idlethread   = 0xc3f80cc0: pid 10 "idle: cpu0"
>> > APIC ID      = 0
>> > currentldt   = 0x50
>> > db:0:kdb.enter.default>  bt
>> > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
>> > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
>> > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
>> > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
>> > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
>> > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
>> > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
>> > syscall(fb8b8d38) at syscall+0x345
>> > Xint0x80_syscall() at Xint0x80_syscall+0x20
>> > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
>> > db:0:kdb.enter.default>  capture off
>> >
Is it intentional?

(kgdb) p level.rel_count
$44 = 1986356271

First level.rel_set+5 are NULL in my case.
(kgdb) p i
$46 = 0
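
An observation about the number itself, which is arithmetic and not something stated in the thread: 1986356271 is 0x7665642f, and on a little-endian machine those four bytes read as the ASCII text "/dev". That suggests the value is not a real count. Whether the structure was overwritten or kgdb was showing a stale or optimized-out variable is not established here. The sketch below just performs the byte decoding. `le32_to_text` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy the four bytes of v into out, least
 * significant byte first (little-endian order), and NUL-terminate.
 */
static void
le32_to_text(uint32_t v, char out[5])
{
	assert(out != NULL);
	for (int i = 0; i < 4; i++)
		out[i] = (char)((v >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```

Calling le32_to_text(1986356271u, buf) leaves the bytes '/', 'd', 'e', 'v' in buf.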


P.S. Same problem, hardware, backtrace, sysctl output and dmesg on a
Banias 1.6GHz; it worked OK on the previous stable/7 from Jun 16.

hth,
pluknet
