Mark Johnston
2017-Jul-17 23:24 UTC
The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:
> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update upgrade
> method I ended up with a system which gets stuck while trying to attach
> the second set of disks. This happened already after the first phase of
> the upgrade procedure (installing and re-booting with a new kernel).
>
> The first set of disks (ada0 .. ada2) are attached successfully, also a
> cd0, but then when the first of the set of four (a regular spinning disk)
> on an LSI controller is to be attached, the boot procedure just gets
> stuck there:
>
> kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
> kernel: ada1: Command Queueing enabled
> kernel: ada1: 305245MB (625142448 512 byte sectors)
> kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
> kernel: ada2: <OCZ-VERTEX3 2.25> ATA8-ACS SATA 3.x device
> kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
> kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
> kernel: ada2: Command Queueing enabled
> kernel: ada2: 114473MB (234441648 512 byte sectors)
> kernel: ada2: quirks=0x1<4K>
> kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
>
> (stuck here, keyboard not responding, fans rising their pitch,
> presumably CPU is spinning)

Are you able to break into the debugger at this point? Try setting
debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
the loader prompt, and hit the break key, or the key sequence
<CR> ~ ctrl-b once the hang occurs. At the debugger prompt, try
"bt" and "show allpcpu" to start.
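For reference, the two tunables suggested above could also be made persistent instead of being typed at the loader prompt on every boot. The fragment below is a minimal sketch, assuming the settings are placed in /boot/loader.conf; at the interactive loader prompt the equivalent would be `set name=value` for each line, followed by `boot`.

```
# /boot/loader.conf (sketch; the file location is an assumption, the
# thread only mentions setting these at the loader prompt)

# Enter the kernel debugger when a break is received on the console.
debug.kdb.break_to_debugger="1"

# Enter the kernel debugger on the alternate sequence <CR> ~ ctrl-b.
debug.kdb.alt_break_to_debugger="1"
```

These settings only have an effect if the running kernel includes a debugger backend such as DDB.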
Mark Martinec
2017-Jul-18 23:18 UTC
The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
2017-07-18 01:24, Mark Johnston wrote:
> Are you able to break into the debugger at this point? Try setting
> debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
> the loader prompt, and hit the break key, or the key sequence
> <CR> ~ ctrl-b once the hang occurs. At the debugger prompt, try
> "bt" and "show allpcpu" to start.

Thank you for a prompt and good suggestion!

I spent an afternoon fiddling with the machine, with mixed results.

Your suggestion to break into the debugger did not work: there was no
reaction to <break> or to <CR> ~ ctrl-b. So I embarked on rebuilding
the RC3 kernel with

  options KDB
  options DDB
  options BREAK_TO_DEBUGGER
  options ALT_BREAK_TO_DEBUGGER
  options INVARIANTS
  options INVARIANT_SUPPORT
  options WITNESS
  options WITNESS_SKIPSPIN

but then I realized the <debug> key is mapped to alt ctrl <esc>,
which now does break into the debugger - but not as early as where
the holdup occurs.

WITNESS produced some LOR warnings, but that is probably ok.
I came across a trace just before the problem area, but it flows by
so fast on a vt console that only the last 40 or so lines remain on the
screen (I have a photo), and these do not seem to reveal much.
Unfortunately this machine does not have a serial interface.

So in my last attempt I rebuilt a kernel with INVARIANTS but
without WITNESS - and now I cannot reproduce the problem,
with or without "safe mode".

What is interesting here is that now the da0..da3 disks are attached
first, and only then the ada disks - and even within the group of
disks on the same controller their order has been shuffled - no idea
what could have caused it - and it may have avoided the problem
by doing so.

Will play some more with this tomorrow...

  Mark

> On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:
>> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update upgrade
>> method I ended up with a system which gets stuck while trying to attach
>> the second set of disks. This happened already after the first phase of
>> the upgrade procedure (installing and re-booting with a new kernel).
>>
>> The first set of disks (ada0 .. ada2) are attached successfully, also a
>> cd0, but then when the first of the set of four (a regular spinning disk)
>> on an LSI controller is to be attached, the boot procedure just gets
>> stuck there:
>>
>> kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>> kernel: ada1: Command Queueing enabled
>> kernel: ada1: 305245MB (625142448 512 byte sectors)
>> kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
>> kernel: ada2: <OCZ-VERTEX3 2.25> ATA8-ACS SATA 3.x device
>> kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
>> kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>> kernel: ada2: Command Queueing enabled
>> kernel: ada2: 114473MB (234441648 512 byte sectors)
>> kernel: ada2: quirks=0x1<4K>
>> kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
>>
>> (stuck here, keyboard not responding, fans rising their pitch,
>> presumably CPU is spinning)
[...]
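The debugging options listed in the message above could be collected in a custom kernel configuration file along the following lines. This is only a sketch: the file name and `ident` value DEBUGRC3 are hypothetical, and building on top of GENERIC with `include` is an assumption, since the thread does not say how the kernel configuration was actually set up.

```
# Hypothetical kernel config, e.g. sys/<arch>/conf/DEBUGRC3
include GENERIC
ident   DEBUGRC3                        # hypothetical name

options KDB                     # kernel debugger framework
options DDB                     # interactive in-kernel debugger backend
options BREAK_TO_DEBUGGER       # enter the debugger on a console break
options ALT_BREAK_TO_DEBUGGER   # enter the debugger on <CR> ~ ctrl-b
options INVARIANTS              # extra run-time consistency checks
options INVARIANT_SUPPORT       # support code required by INVARIANTS
options WITNESS                 # lock order verification (reports LORs)
options WITNESS_SKIPSPIN        # do not run WITNESS checks on spin mutexes
```

Dropping the two WITNESS lines gives the INVARIANTS-only variant described in the last attempt. Since these options change kernel timing, a hang that depends on device probe ordering may stop reproducing, as observed in the message.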