Ulrich Spoerlein
2008-Aug-06 20:12 UTC
cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)
On Mon, 04.08.2008 at 16:07:55 -0400, John Baldwin wrote:
> On Monday 04 August 2008 02:29:19 pm Ulrich Spoerlein wrote:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x38
> > fault code = supervisor read, page not present
> > instruction pointer = 0x20:0xc058ec16
> > stack pointer = 0x28:0xfb8b8ac8
> > frame pointer = 0x28:0xfb8b8ac8
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, def32 1, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 1176 (powerd)
> > db:0:kdb.enter.default> show pcpu
> > cpuid = 0
> > curthread = 0xc4ec0aa0: pid 1176 "powerd"
> > curpcb = 0xfb8b8d90
> > fpcurthread = none
> > idlethread = 0xc3f80cc0: pid 10 "idle: cpu0"
> > APIC ID = 0
> > currentldt = 0x50
> > db:0:kdb.enter.default> bt
> > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
> > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
> > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
> > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
> > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
> > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
> > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
> > syscall(fb8b8d38) at syscall+0x345
> > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
> > db:0:kdb.enter.default> capture off
> >
> > Seems like I caught RELENG_7 during a bad time. Will update again.
>
> What cpufreq drivers do you have loaded and attached? This patch might work
> around the issue, but I suspect there is a bug in one of the cpufreq drivers.

Hi John,

sorry for the slow update, please bear with me.
This is on a first generation Pentium-M (Banias core) with EST (and also
p4tcc attached, as I just discovered):

CPU: Intel(R) Pentium(R) M processor 1.50GHz (1495.15-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6d6  Stepping = 6
  Features=0xafe9f9bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE>
  Features2=0x180<EST,TM2>
..
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0

dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 300
dev.cpu.0.freq_levels: 1500/-1 1312/-1 1200/-1 1050/-1 1000/-1 875/-1 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% 0.00%
dev.acpi_perf.0.%parent: cpu0
dev.est.0.%desc: Enhanced SpeedStep Frequency Control
dev.est.0.%driver: est
dev.est.0.%parent: cpu0
dev.est.0.freq_settings: 1500/-1 1200/-1 1000/-1 800/-1 600/-1
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.p4tcc.0.%desc: CPU Frequency Thermal Control
dev.p4tcc.0.%driver: p4tcc
dev.p4tcc.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1

A kernel from 2008-06-13 is the last known working one. I just had the
same crash with a kernel built from sources as of 2008-07-01 and am now
recompiling for 2008-06-24.
Your MFC of est.c rev 180044 might be the problem; I'll try a backout
once I have confirmed that the 2008-06-24 kernel is running stable.

> Index: kern_cpu.c
> ==================================================================
> RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
> retrieving revision 1.27.2.2
> diff -u -r1.27.2.2 kern_cpu.c
> --- kern_cpu.c 9 May 2008 19:02:10 -0000 1.27.2.2
> +++ kern_cpu.c 4 Aug 2008 20:07:41 -0000
> @@ -329,6 +329,8 @@
>      /* Next, set any/all relative frequencies via their drivers. */
>      for (i = 0; i < level->rel_count; i++) {
>          set = &level->rel_set[i];
> +        if (set->dev == NULL)
> +            continue;
>          if (!device_is_attached(set->dev)) {
>              error = ENXIO;
>              goto out;

Will try that one too, hopefully tomorrow.

Will also disable p4tcc. It did not attach in the RELENG_6 days, and now
that it does, it leads to ridiculous rates of 75 MHz.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool, than to speak, and
remove all doubt.
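The 75 MHz figure can be reproduced from the sysctl output above: the entries in dev.cpu.0.freq_levels are consistent with est's absolute settings being scaled by p4tcc's relative settings, which are listed in hundredths of a percent. The following is only a sketch of that arithmetic, assuming plain integer scaling; scale_freq and print_lowest_est_levels are hypothetical helpers written for illustration, not functions from kern_cpu.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): scale an absolute
 * frequency in MHz by a relative setting given in hundredths of a
 * percent, so 10000 means 100% and 1250 means 12.5%.
 * Integer division truncates, e.g. 1312.5 becomes 1312.
 */
int
scale_freq(int abs_mhz, int rel_setting)
{
	assert(abs_mhz >= 0);
	assert(rel_setting >= 0 && rel_setting <= 10000);
	return (abs_mhz * rel_setting / 10000);
}

/* Print the lowest est setting (600 MHz) combined with each p4tcc setting. */
void
print_lowest_est_levels(void)
{
	static const int p4tcc[] = {
		10000, 8750, 7500, 6250, 5000, 3750, 2500, 1250
	};
	size_t i;

	for (i = 0; i < sizeof(p4tcc) / sizeof(p4tcc[0]); i++)
		printf("600 MHz at %d/10000 -> %d MHz\n",
		    p4tcc[i], scale_freq(600, p4tcc[i]));
}
```

With the values quoted above, scale_freq(600, 1250) gives 75 and scale_freq(1500, 8750) gives 1312, matching the lowest and second-highest entries of dev.cpu.0.freq_levels.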
John Baldwin
2008-Aug-06 22:12 UTC
cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)
On Wednesday 06 August 2008 04:06:43 pm Ulrich Spoerlein wrote:
> Your MFC of est.c rev 180044 might be the problem, I'll try a backout
> once I confirmed that the 2008-06-24 kernel is running stable.

I doubt this will cause it.

> > Index: kern_cpu.c
> > ==================================================================
> > RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
> > retrieving revision 1.27.2.2
> > diff -u -r1.27.2.2 kern_cpu.c
> > --- kern_cpu.c 9 May 2008 19:02:10 -0000 1.27.2.2
> > +++ kern_cpu.c 4 Aug 2008 20:07:41 -0000
> > @@ -329,6 +329,8 @@
> >      /* Next, set any/all relative frequencies via their drivers. */
> >      for (i = 0; i < level->rel_count; i++) {
> >          set = &level->rel_set[i];
> > +        if (set->dev == NULL)
> > +            continue;
> >          if (!device_is_attached(set->dev)) {
> >              error = ENXIO;
> >              goto out;
>
> Will try that one too, hopefully tomorrow.
>
> Will also disable p4tcc. This was not attaching during the RELENG_6
> times but leads to ridiculous rates of 75 MHz.

If p4tcc attaching is new, that might point to the culprit. A good quick test
would be to disable individual cpufreq drivers to find out which one causes
the panic.

-- 
John Baldwin
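The backtrace shows device_is_attached() called with a first argument of 0, and the fault address is 0x38, which is what one would expect when a field at offset 0x38 is read through a NULL pointer. The userland sketch below illustrates that relationship and the effect of the NULL check in the patch. Everything in it is a mock: struct mock_device, MOCK_ATTACHED, flags_read_address and apply_settings are hypothetical names, and the padding is chosen only so that the field lands at 0x38; the real struct device layout is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_ATTACHED 0x1

/* Mock device: pad[] exists only to place 'flags' at offset 0x38. */
struct mock_device {
	char pad[0x38];
	int flags;
};

/* Mock of one relative setting: just the device pointer. */
struct mock_setting {
	struct mock_device *dev;
};

/* Address that a read of dev->flags would touch; 0x38 when dev is NULL. */
uintptr_t
flags_read_address(const struct mock_device *dev)
{
	return ((uintptr_t)dev + offsetof(struct mock_device, flags));
}

/*
 * Walk 'count' settings the way the patched loop does: skip entries
 * with no device, fail on a device that is present but not attached.
 * Returns 0 on success, -1 where the kernel code would set ENXIO.
 */
int
apply_settings(const struct mock_setting *set, int count, int *applied)
{
	int i;

	assert(set != NULL && applied != NULL);
	*applied = 0;
	for (i = 0; i < count; i++) {
		if (set[i].dev == NULL)
			continue;	/* the check added by the patch */
		if ((set[i].dev->flags & MOCK_ATTACHED) == 0)
			return (-1);
		(*applied)++;
	}
	return (0);
}
```

Note that the check only skips NULL entries. It does not validate the loop bound, so it works around the symptom rather than explaining why an entry is NULL in the first place.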
pluknet
2008-Aug-07 21:26 UTC
cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)
Hi, John.

>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > fault virtual address = 0x38
>> > fault code = supervisor read, page not present
>> > instruction pointer = 0x20:0xc058ec16
>> > stack pointer = 0x28:0xfb8b8ac8
>> > frame pointer = 0x28:0xfb8b8ac8
>> > code segment = base 0x0, limit 0xfffff, type 0x1b
>> > = DPL 0, pres 1, def32 1, gran 1
>> > processor eflags = interrupt enabled, resume, IOPL = 0
>> > current process = 1176 (powerd)
>> > db:0:kdb.enter.default> show pcpu
>> > cpuid = 0
>> > curthread = 0xc4ec0aa0: pid 1176 "powerd"
>> > curpcb = 0xfb8b8d90
>> > fpcurthread = none
>> > idlethread = 0xc3f80cc0: pid 10 "idle: cpu0"
>> > APIC ID = 0
>> > currentldt = 0x50
>> > db:0:kdb.enter.default> bt
>> > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
>> > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
>> > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
>> > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
>> > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
>> > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
>> > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
>> > syscall(fb8b8d38) at syscall+0x345
>> > Xint0x80_syscall() at Xint0x80_syscall+0x20
>> > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
>> > db:0:kdb.enter.default> capture off
>> >

Is it intentional?

(kgdb) p level.rel_count
$44 = 1986356271

First level.rel_set+5 are NULL in my case.

(kgdb) p i
$46 = 0

P.S. Same problem/hardware/bt/sysctl/dmesg on Banias 1.6GHz, worked ok on
previous stable7 from Jun 16.

hth,
pluknet
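A rel_count of 1986356271 is far too large to be a real count of relative settings, so it is worth looking at the raw bytes of the value. The sketch below does that for a 32-bit value stored in little-endian order, as on i386; le32_as_text is a hypothetical helper written for this note, not part of kgdb or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the four bytes of 'value' into 'out' in
 * little-endian order (lowest byte first, the in-memory order on i386)
 * and NUL-terminate the result so it can be inspected as text.
 */
void
le32_as_text(uint32_t value, char out[5])
{
	int i;

	assert(out != NULL);
	for (i = 0; i < 4; i++)
		out[i] = (char)((value >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```

For 1986356271, which is 0x7665642f, the bytes in memory are 2f 64 65 76, the ASCII string "/dev". That may mean the level structure holds stale or overwritten data rather than a count, but this reading is a guess and was not confirmed in the thread. Either way, with a bound that large the loop in the patch would keep running past the NULL entries it skips.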