Marc Branchaud
2019-Dec-10 15:35 UTC
Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
On 2019-12-10 9:18 a.m., Mark Martinec wrote:
> Commenting on a thread from 2018-12 and from 2019-09-20, with my solution
> to the boot problem at the end, in case anyone is still interested.

Thank you very much for this. A couple of questions:

(1) Why do you say "raw devices for historical reasons"? Glancing
through the zpool man page and the Handbook, I see nothing
recommending or requiring GPT partitions.

(2) Just to be 100% clear, my 11.3 non-root zpool looks like this:

    NAME          STATE     READ WRITE CKSUM
    storage       ONLINE       0     0     0
      raidz2-0    ONLINE       0     0     0
        ada2      ONLINE       0     0     0
        ada3      ONLINE       0     0     0
        ada4      ONLINE       0     0     0
        ada5      ONLINE       0     0     0
        ada6      ONLINE       0     0     0
        ada7      ONLINE       0     0     0

So this is using raw devices. Are you saying that if I upgrade this
machine to 12 that it won't be able to boot?

Thanks again!

M.

> ======
>
> On 2018-11-29 myself wrote:
> (after upgrading from 11.2 to 12.0):
>> While booting, the 'BTX loader' comes up, lists the BIOS drives,
>> then the spinner below the list comes up and begins turning,
>> stuttering, and after a couple of seconds it grinds to a standstill
>> and nothing happens afterwards.
>> At this point the ZFS and the bootstrap loader is supposed to
>> come up, but it doesn't.
> [...] (on 2018-12-04):
>> The situation has not changed: the BTX loader lists all BIOS drives
>> C..J (disk0..disk7), then a spinner starts and gets stuck forever.
>> It never reaches the 'BIOS 635kB/3537856kB available memory' line.
>>
>> While trying to restore the old /boot from 11.2, I tried booting
>> a live image from a 12.0-RC3 memory stick - and the loader got
>> stuck again, same as when booting from a disk.
>> So I had to boot from an 11.2 memstick to be able to regain control.
>
> ======
>
> 2018-12-04, Ian Lepore writes:
>>   Toomas Soome wrote:
>> |    ok, if you could perform 2 tests:
>> |    1. from loader prompt enter 0x413 0xa000 - @w . cr
>> |    2. on first spinner, press space and type on boot: prompt:
>> |    /boot/loader_4th and see if that will do better
>> |
>> |    thanks, toomas
>>
>> I don't think that will be an option.  If it hasn't gotten to the point
>> of saying how much BIOS available memory there is, it's only halfway
>> through loader main() and has hung before getting to interact().
>>
>> In fact, if that line hasn't printed, but some disk drives have been
>> listed, it pretty much has to be hung in the "March through the device
>> switch probing for things" loop. If all the disks are listed, then it
>> got through that entry in the devsw, and is likely hanging in the
>> dv_init calls for either the pxedisk or zfsdev devices.
>
> ======
>
> 2018-12-07 19:08, Willem Jan Withagen wrote:
>> Ended up more or less in the same situation this afternoon with
>> freebsd-upgrade to [12.0]-RC3
>> Boot stops after listing all DOS disks, in a spinner.
>> So that is no fix.
>>
>> I booted from USB 11.2 and replaced the /boot/zfs{boot,loader} by the
>> 11.2 ones.
>> That makes my server again happy.
>
> ======
>
> 2019-09-19 16:02, Kurt Jaeger wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>> |  Kurt Jaeger writes:
>> |    The problem is that if all 10 disks are connected, the system
>> |    looses track from where it should boot and fails to boot (serial boot log):
>> |
>> |    Consoles: internal video/keyboard  serial port
>> |    BTX loader 1.00  BTX version is 1.02
>> |    Consoles: internal video/keyboard  serial port
>> |    BIOS drive C: is disk0
>> |    BIOS drive D: is disk1
>> |    BIOS drive E: is disk2
>> |    BIOS drive F: is disk3
>> |    BIOS drive G: is disk4
>> |    BIOS drive H: is disk5
>> |    BIOS drive I: is disk6
>> |    BIOS drive J: is disk7
>> |    BIOS drive K: is disk8
>> |    BIOS drive L: is disk9
>> |    //
>> |    [...]
>> |    The solution right now is this to unplug all disks of the 'bck' pool,
>> |    reboot, and re-insert the data disks after the boot is finished.
>> |    [...]
>> |    No gpart on the bck pool, raw drives.
>
> 2019-09-20 17:27, Mark Martinec wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>>
>> This sounds very much like my experience:
>>
>>   2018-11-29, Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-November/090129.html
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090159.html
>>
>> I now have three SuperMicro machines which are unable to boot after
>> upgrading 11.2 to 12.0. After unsuccessfully fiddling with boot loaders,
>> I have reverted two back to 11.2 (which boots and works fine again),
>> and the third one is now at 12.0 but needs the boot hack as described
>> by Kurt, i.e. pull out half the disks (of the 'data' pool), boot the
>> system, plug the disks back in and zfs mount the remaining pool.
>>
>> Considering that the 11.2 boots and works fine on these machines,
>> I consider it a btx loader failure and not a BIOS issue.
>>
>> What is common with these three machines is that they have one pool
>> on raw devices for historical reasons (not on gpt partitions).
>> My guess is that the new loader gets confused by these raw disks.
>
> ======
>
> Ok, now to my current situation and solution/workaround.
>
> What was common with these hosts (and similar) is that a machine
> has more than a couple of disks, with a zfs pool (non-root) on
> raw devices (for historical reasons), not on gpt partitions.
>
> Three workarounds seem possible:
>
> - replace a boot loader with the one from 11.2, or
>
> - using a default loader from 12, disconnect a sufficient number
>   of data disks, boot, then reconnect disks and zfs attach the pool,
>
> - or my current solution: zfs offline one disk at a time from
>   a data pool, wipe it, set up a gpt partition on it and
>   put it back to the pool by 'zfs replace', letting it resilver.
>   It was a painful and slightly risky procedure (9 hours of
>   resilvering each of the seven disks), but this procedure
>   has now salvaged our remaining hosts which could not be
>   upgraded from 11.2 to 12.
>
> Mark
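
For readers who want to see the third workaround spelled out, here is a minimal dry-run sketch of the per-disk steps. It only prints commands and executes nothing. The message writes 'zfs offline' and 'zfs replace'; the offline and replace subcommands belong to zpool(8), so the sketch uses those. The pool name, disk name, and the GPT label scheme `data-<disk>` are assumptions for illustration, and using `zpool labelclear` for the "wipe it" step is one possible reading, not necessarily what was done on the hosts in this thread.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the steps for moving ONE raw-disk
# pool member onto a GPT partition. Review every line before running
# anything by hand, and wait for the resilver to finish before moving
# on to the next disk.
plan_gpt_migration() {
    pool=$1                 # e.g. "storage" (assumed name)
    disk=$2                 # e.g. "ada2" (assumed name)
    label="data-$disk"      # hypothetical GPT label, not from the thread

    # 1. Take the raw disk out of service; the pool runs degraded.
    echo "zpool offline $pool $disk"
    # 2. Clear the old ZFS label from the raw device (assumed wipe method).
    echo "zpool labelclear -f /dev/$disk"
    # 3. Create a GPT scheme and a single freebsd-zfs partition.
    echo "gpart create -s gpt $disk"
    echo "gpart add -t freebsd-zfs -a 1m -l $label $disk"
    # 4. Put the partition back in place of the raw disk. The partition
    #    is slightly smaller than the whole disk, so check that the
    #    pool accepts it.
    echo "zpool replace $pool $disk gpt/$label"
    # 5. Watch the resilver.
    echo "zpool status $pool"
}

plan_gpt_migration storage ada2
```

Printing the plan first keeps the sketch harmless on any machine; with a raidz2 pool such as the one quoted earlier, one offlined disk at a time still leaves redundancy while each resilver runs.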
Mark Martinec
2019-Dec-10 16:08 UTC
Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
2019-12-10 16:35, Marc Branchaud wrote:
> On 2019-12-10 9:18 a.m., Mark Martinec wrote:
>> Commenting on a thread from 2018-12 and from 2019-09-20, with my
>> solution to the boot problem at the end, in case anyone is still
>> interested.
>
> Thank you very much for this. A couple of questions:
>
> (1) Why do you say "raw devices for historical reasons"? Glancing
> through the zpool man page and the Handbook, I see nothing
> recommending or requiring GPT partitions.

Apparently using raw devices for zpool is now discouraged,
although I don't think it has ever become officially unsupported.

> (2) Just to be 100% clear, my 11.3 non-root zpool looks like this:
>
>     NAME          STATE     READ WRITE CKSUM
>     storage       ONLINE       0     0     0
>       raidz2-0    ONLINE       0     0     0
>         ada2      ONLINE       0     0     0
>         ada3      ONLINE       0     0     0
>         ada4      ONLINE       0     0     0
>         ada5      ONLINE       0     0     0
>         ada6      ONLINE       0     0     0
>         ada7      ONLINE       0     0     0
>
> So this is using raw devices. Are you saying that if I upgrade this
> machine to 12 that it won't be able to boot?

It is possible that it won't boot under 12, although not necessarily.
Try booting from a 12.0 memory stick - if that boots, it is
probably a safe bet that it will survive an upgrade.

Of the bunch of machines that I have upgraded from 11.2 to 12,
only three failed to boot under the 12.0 loader. There were a couple
of others which upgraded and booted fine even though they had
a zfs pool on raw devices. I never had a problem booting
hosts that had a zfs pool on a gpt partition.

So it's a lottery: a few raw devices in a zpool seem to do fine,
while many raw devices in a zpool is asking for trouble under
12.0 and later.

Mark
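
Since the risk described here seems to grow with the number of raw-device pool members, a quick count before upgrading may be useful. The sketch below is a rough filter over `zpool status` text, not an official check: the function name `list_raw_members` is made up for illustration, and it assumes disks are named `adaN` or `daN`, so other device names would need the pattern extended.

```shell
#!/bin/sh
# Sketch: read `zpool status` text on stdin and print members that look
# like whole disks, i.e. adaN or daN with no partition suffix (ada2p1)
# and no label path (gpt/...). The name pattern is an assumption.
list_raw_members() {
    awk 'NF >= 5 && $1 ~ /^(ada|da)[0-9]+$/ { print $1 }'
}

# Example input modelled on the pool quoted in this thread.
sample='NAME          STATE     READ WRITE CKSUM
storage       ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    ada2      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
    ada5      ONLINE       0     0     0
    ada6      ONLINE       0     0     0
    ada7      ONLINE       0     0     0'

printf '%s\n' "$sample" | list_raw_members
```

On a live system the same filter could be fed with `zpool status storage | list_raw_members | wc -l`. A non-zero count only says the pool sits on raw devices; per the reports in this thread it does not by itself predict whether the 12.0 loader will hang, so the memory-stick boot test remains the more direct check.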