FreeBSD 8.1-STABLE sometime after 10/28/2010 has caused a fatal boot error on my Toshiba U205, 1.8 GHz Core Duo laptop.

Many times every week I sync with STABLE and build everything. I have been doing this for years.

I sync'd (via csup) and built on 10/28/2010 and everything was fine.

Then I sync'd yesterday 11/1/2010 and it crashes on boot. The diagnostics print out the following:

---

Fatal trap 18: integer divide fault while in kernel mode

kdb_backtrace
panic
trap_fatal
trap
calltrap
topo_probe
cpu_topo
smp_topo
sched_setup
mi_startup

---

I reverted at the loader via boot /boot/kernel.old, resync'd today, rebuilt, and things are still broken.

Any ideas?

Dan Allen

On Tue, Nov 02, 2010 at 11:30:28PM -0600, Dan Allen wrote:
> FreeBSD 8.1-STABLE sometime after 10/28/2010 has caused a fatal boot error on my Toshiba U205, 1.8 GHz Core Duo laptop.
>
> Many times every week I sync with STABLE and build everything. I have been doing this for years.
>
> I sync'd (via csup) and built on 10/28/2010 and everything was fine.
>
> Then I sync'd yesterday 11/1/2010 and it crashes on boot. The diagnostics print out the following:
>
> ---
>
> Fatal trap 18: integer divide fault while in kernel mode
>
> kdb_backtrace
> panic
> trap_fatal
> trap
> calltrap
> topo_probe
> cpu_topo
> smp_topo
> sched_setup
> mi_startup
>
> ---
>
> I reverted at the loader via boot /boot/kernel.old, resync'd today, rebuilt, and things are still broken.
>
> Any ideas?

This looks like it could be related to the Intel CPU topology change that was recently MFC'd:

http://freshbsd.org/2010/11/01/08/20/14

Can you please roll your source code back to a date prior to the above commit, rebuild, and re-try? You can accomplish this using the "date" option in your cvsup/csup file. See csup(1) for details. I would recommend also chopping off an additional hour "just in case".

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

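For illustration, a supfile pinned to a point before that commit might look roughly like the fragment below. This is only a sketch: the host, base, and prefix values are placeholders rather than anything taken from this thread, RELENG_8 is assumed to be the tag being tracked for 8-STABLE, and the date assumes the timestamp in the URL above (2010/11/01 08:20:14) is in GMT, with one hour subtracted as suggested. csup(1) documents the date format as [cc]yy.mm.dd.hh.mm.ss.

```
# Hypothetical supfile; host/base/prefix are placeholders.
*default host=cvsup.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_8
# One hour before the suspected commit (assumes the URL timestamp is GMT).
*default date=2010.11.01.07.20.14
*default delete use-rel-suffix compress
src-all
```

Once the test is done, the date line would need to be removed (or set back to "date=.") to resume tracking the tip of the branch.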
on 03/11/2010 10:18 Sergey Kandaurov said the following:
> On 3 November 2010 08:30, Dan Allen <danallen46@airwired.net> wrote:
>> FreeBSD 8.1-STABLE sometime after 10/28/2010 has caused a fatal boot error on my Toshiba U205, 1.8 GHz Core Duo laptop.
>>
>> Many times every week I sync with STABLE and build everything. I have been doing this for years.
>>
>> I sync'd (via csup) and built on 10/28/2010 and everything was fine.
>>
>> Then I sync'd yesterday 11/1/2010 and it crashes on boot. The diagnostics print out the following:
>>
>> ---
>>
>> Fatal trap 18: integer divide fault while in kernel mode
>>
>> kdb_backtrace
>> panic
>> trap_fatal
>> trap
>> calltrap
>> topo_probe
>> cpu_topo
>> smp_topo
>> sched_setup
>> mi_startup
>>
>> ---
>>
>> I reverted at the loader via boot /boot/kernel.old, resync'd today, rebuilt, and things are still broken.

Right, if you never report the problems you get, the chance that they are fixed is very slim.

> It's possible in theory to leave cpu_logical as zero in topo_probe_0x4().
> So, it makes sense to add some sort of the check.

Actually - not exactly.
This problem seems to happen only on SMP systems that for some reason run as UP. E.g. because ACPI and/or APIC are disabled. Or some other BIOS configuration. But I am not sure what exactly is the case here.
Verbose dmesg from a working kernel would be helpful.

P.S.
Following the relevant mailing lists sometimes helps to prepare for the future:
http://thread.gmane.org/gmane.os.freebsd.current/128603/focus=128717
http://article.gmane.org/gmane.os.freebsd.stable/72727

-- 
Andriy Gapon

On 3 November 2010 08:30, Dan Allen <danallen46@airwired.net> wrote:
> FreeBSD 8.1-STABLE sometime after 10/28/2010 has caused a fatal boot error on my Toshiba U205, 1.8 GHz Core Duo laptop.
>
> Many times every week I sync with STABLE and build everything. I have been doing this for years.
>
> I sync'd (via csup) and built on 10/28/2010 and everything was fine.
>
> Then I sync'd yesterday 11/1/2010 and it crashes on boot. The diagnostics print out the following:
>
> ---
>
> Fatal trap 18: integer divide fault while in kernel mode
>
> kdb_backtrace
> panic
> trap_fatal
> trap
> calltrap
> topo_probe
> cpu_topo
> smp_topo
> sched_setup
> mi_startup
>
> ---
>
> I reverted at the loader via boot /boot/kernel.old, resync'd today, rebuilt, and things are still broken.
>

It's possible in theory to leave cpu_logical as zero in topo_probe_0x4().
So, it makes sense to add some sort of the check.

Index: svn/freebsd/head/sys/amd64/amd64/mp_machdep.c
===================================================================
--- svn/freebsd/head/sys/amd64/amd64/mp_machdep.c	(revision 214725)
+++ svn/freebsd/head/sys/amd64/amd64/mp_machdep.c	(working copy)
@@ -239,6 +239,8 @@
 		cpu_logical++;
 	}
+	if (cpu_logical == 0)
+		cpu_logical = 1; /* XXX max_logical? */
 	cpu_cores /= cpu_logical;
 	hyperthreading_cpus = cpu_logical;
 }

-- 
wbr,
pluknet

On 2 Nov 2010, at 11:57 PM, Jeremy Chadwick wrote:
> Can you please roll your source code back to a date prior to the above
> commit, rebuild, and re-try? You can accomplish this using the "date"
> option in your cvsup/csup file. See csup(1) for details. I would
> recommend also chopping off an additional hour "just in case".

I did this and the results are as expected, i.e., everything worked just fine.

So we are zeroing in on this...

Dan

on 09/11/2010 15:55 Dan Allen said the following:
>
> On 8 Nov 2010, at 10:30 PM, Andriy Gapon wrote:
>
>> Can you please also provide the following output:
>> $ kenv | fgrep hint.acpi
>
> hint.acpi.0.oem="TOSHIB"
> hint.acpi.0.revision="1"
> hint.acpi.0.rsdp="0xf01e0"
> hint.acpi.0.rsdt="0x3f7a0000"

Two more things:
- sysctl machdep.acpi_root
- /boot/loader.conf contents

-- 
Andriy Gapon

On 9 Nov 2010, at 8:11 AM, Andriy Gapon wrote:
> - sysctl machdep.acpi_root

machdep.acpi_root: 983520

On 9 Nov 2010, at 8:11 AM, Andriy Gapon wrote:
> /boot/loader.conf contents

This might be the smoking gun!

cat loader.conf:

hint.apic.0.disabled="1"

Dan

on 09/11/2010 17:22 Dan Allen said the following:
>
> On 9 Nov 2010, at 8:11 AM, Andriy Gapon wrote:
>
>> /boot/loader.conf contents
>
> This might be the smoking gun!
>
> cat loader.conf:
>
> hint.apic.0.disabled="1"

Yes, it is.
So why do you have it and what happens if you remove it?

-- 
Andriy Gapon

On 9 Nov 2010, at 8:24 AM, Andriy Gapon wrote:
> on 09/11/2010 17:22 Dan Allen said the following:
>>
>> On 9 Nov 2010, at 8:11 AM, Andriy Gapon wrote:
>>
>>> /boot/loader.conf contents
>>
>> This might be the smoking gun!
>>
>> cat loader.conf:
>>
>> hint.apic.0.disabled="1"
>
> Yes, it is.
> So why do you have it and what happens if you remove it?

Well, there is good news and bad news.

The good news is that if I remove this hint the machine boots with 2 CPUs.

The bad news is that I get lots of:

CPU0: local APIC error 0x40
CPU1: local APIC error 0x40

messages and the machine is very unresponsive. Every keystroke has a second or two of delay. It really is unusable.

If memory serves I had to turn off APIC in order to see both CPUs at some time in the past. However, at some time in the past I had both CPUs and did not have the severe unresponsiveness that I get without this hint.

So with APIC I get both CPUs but an unusable config. Without APIC I have one CPU but things are lively.

What next?

Dan

On Tuesday, November 09, 2010 10:24:59 am Andriy Gapon wrote:
> on 09/11/2010 17:22 Dan Allen said the following:
> >
> > On 9 Nov 2010, at 8:11 AM, Andriy Gapon wrote:
> >
> >> /boot/loader.conf contents
> >
> > This might be the smoking gun!
> >
> > cat loader.conf:
> >
> > hint.apic.0.disabled="1"
>
> Yes, it is.
> So why do you have it and what happens if you remove it?

The kernel should still not panic if someone disables APIC.

-- 
John Baldwin