Warner Losh
2018-Oct-22 03:30 UTC
head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On Sun, Oct 21, 2018 at 9:28 PM Warner Losh <imp at bsdimp.com> wrote:

> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable
> <freebsd-stable at freebsd.org> wrote:
>
>> [I built based on WITHOUT_ZFS= for other reasons. But,
>> after installing the build, Hyper-V based boots are
>> working.]
>>
>> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>
>> > On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>> >
>> >> I attempted to jump from head -r334014 to -r339076
>> >> on a threadripper 1950X board and the boot fails.
>> >> This is both native booting and under Hyper-V,
>> >> same machine and root file system in both cases.
>> >
>> > I did my investigation under Hyper-V after seeing
>> > a boot failure native.
>> >
>> > Looks like the native failure is even earlier,
>> > before db> is even possible, possibly during
>> > early loader activity.
>> >
>> > So this report is really for running under
>> > Hyper-V: -r338804 boots and -r338810 does
>> > not. By contrast -r334804 does not boot native.
>> > (But I've little information for that context.)
>> >
>> > Sorry for the confusion. I rushed the report
>> > in hopes of getting to sleep. It was not to be.
>> >
>> >> It fails just after the FreeBSD/SMP lines,
>> >> reporting "kernel trap 9 with interrupts disabled".
>> >>
>> >> It fails in pmap_force_invalidate_cache_range at
>> >> a clflush (%rax) instruction that produces a
>> >> "Fatal trap 9: general protection fault while
>> >> in kernel mode". cpuid=0 apic id= 00
>> >>
>> >> I used kernel.txz files from:
>> >>
>> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>> >>
>> >> to narrow the range of kernel builds for working -> failing
>> >> and got:
>> >>
>> >> -r338804 boots fine
>> >> (no amd64 kernel builds between to try)
>> >> -r338810+ fails (any that I tried, anyway)
>> >>
>> >> In that range is -r338807 :
>> >>
>> >> QUOTE
>> >> Author: kib
>> >> Date: Wed Sep 19 19:35:02 2018
>> >> New Revision: 338807
>> >> URL:
>> >> https://svnweb.freebsd.org/changeset/base/338807
>> >>
>> >>
>> >> Log:
>> >> Convert x86 cache invalidation functions to ifuncs.
>> >>
>> >> This simplifies the runtime logic and reduces the number of
>> >> runtime-constant branches.
>> >>
>> >> Reviewed by: alc, markj
>> >> Sponsored by: The FreeBSD Foundation
>> >> Approved by: re (gjb)
>> >> Differential revision:
>> >> https://reviews.freebsd.org/D16736
>> >>
>> >> Modified:
>> >> head/sys/amd64/amd64/pmap.c
>> >> head/sys/amd64/include/pmap.h
>> >> head/sys/dev/drm2/drm_os_freebsd.c
>> >> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> >> head/sys/i386/i386/pmap.c
>> >> head/sys/i386/i386/vm_machdep.c
>> >> head/sys/i386/include/pmap.h
>> >> head/sys/x86/iommu/intel_utils.c
>> >> END QUOTE
>> >>
>> >> There do seem to be changes associated with
>> >> clflush(...) use. Looking at:
>> >>
>> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>> >>
>> >> it appears that pmap_force_invalidate_cache_range has not
>> >> changed since -r338807.
>> >>
>> >> It seems that -r338806 and -r338810 would be unlikely
>> >> contributors.
>> >
>>
>> I went after my native-boot loader problem first because I
>> could switch kernels via the loader for booting FreeBSD under
>> Hyper-V. Switching loaders is more of a problem.
>>
>> In order to avoid the loader-time crash I switched to building
>> and installing based on WITHOUT_ZFS= . I've had no active use of
>> ZFS in years. (The old official-build loaders that worked were
>> non-ZFS ones.)
>>
>> This took care of the native-boot loader-crash --and, to my
>> surprise, also the Hyper-V-boot kernel-time crash.
>>
>> My private builds now boot the 1950X in both contexts just
>> fine.
>>
>> During my early investigation I did pick up specific changes
>> from after -r339076 that seemed to be tied to Ryzen and such.
>> (They made no difference to the boot problems at the time
>> but I saw no reason to remove them.)
>>
>> # uname -apKU
>> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun
>> Oct 21 16:44:25 PDT 2018 markmi at FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG
>> amd64 amd64 1200084 1200084
>

(stupid gmail)

The phrase "no active use" bothers me. What does that mean? Are there any
ZFS pools, or any disks with any whiff of a ZFS-ish thing on them at all?
Clearly, something in the ZFS boot loader is freaking out over something on
your system, but absent that information I can't help you.

Warner
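The narrowing step quoted above (trying snapshot kernels until two adjacent available builds straddle the failure) can be sketched as a bisection. This is only an illustrative sketch: the function name first_failing, the boots callback, and the intermediate revision numbers are hypothetical; only -r334014, -r338804, -r338810 and -r339076 come from the report. It also assumes that every build at or after the first failing one keeps failing.

```python
def first_failing(revisions, boots):
    """Bisect a sorted list of available snapshot revisions.

    Assumes revisions[0] boots, revisions[-1] fails, and that once a
    revision fails all later ones fail too. Returns the adjacent pair
    (last_good, first_bad).
    """
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(revisions[mid]):
            lo = mid   # failure was introduced after this build
        else:
            hi = mid   # failure was introduced at or before this build
    return revisions[lo], revisions[hi]


# Hypothetical list of snapshot builds that happen to be available.
# Only 334014, 338804, 338810 and 339076 appear in the report.
available = [334014, 336500, 338000, 338804, 338810, 338900, 339076]

# Stand-in for "install this kernel.txz and try to boot it"; it simply
# encodes the reported outcome instead of booting anything.
def boots(rev):
    return rev <= 338804

result = first_failing(available, boots)
print(result)
```

The returned pair only brackets the change: with no amd64 snapshot between the two builds, every commit in that range (here including -r338807) remains a candidate until it is built and tested individually.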
Mark Millard
2018-Oct-22 06:24 UTC
head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On 2018-Oct-21, at 8:30 PM, Warner Losh <imp at bsdimp.com> wrote:

> The phrase "no active use" bothers me. What does that mean? Are there any
> ZFS pools, or any disks with any whiff of a ZFS-ish thing on them at all?
> Clearly, something in the ZFS boot loader is freaking out over something on
> your system, but absent that information I can't help you.

No ZFS pools: strictly UFS for FreeBSD file systems for the
last few years, and UFS before I had access to the 1950X system.

I've never before bothered to use WITHOUT_ZFS= in my builds.
So the system has had the ZFS support, such as kernel modules,
for all the time that this system has been in use. Prior to the
recent versions I saw no such problems. But the default loader
was not ZFS capable.

As seen in the under-Hyper-V use-context:

# gpart show -p
=>       40  937703008    da0  GPT  (447G)
         40       1024  da0p1  freebsd-boot  (512K)
       1064  746586112  da0p2  freebsd-ufs  (356G)
  746587176   31457280  da0p3  freebsd-swap  (15G)
  778044456  159383552  da0p4  freebsd-swap  (76G)
  937428008     275040         - free -  (134M)

=>       40  937703008    da1  GPT  (447G)
         40       1024  da1p1  freebsd-boot  (512K)
       1064  369098752  da1p2  freebsd-ufs  (176G)
  369099816  406846424  da1p3  freebsd-swap  (194G)
  775946240  130024488         - free -  (62G)
  905970728   31457280  da1p4  freebsd-swap  (15G)
  937428008     275040         - free -  (134M)

=>       40  419430320    da2  GPT  (200G)
         40       4056         - free -  (2.0M)
       4096  419426263  da2p1  freebsd-ufs  (200G)
  419430359          1         - free -  (512B)

=>        40  2000409184    da3  GPT  (954G)
          40        1024  da3p1  freebsd-boot  (512K)
        1064  2000408159  da3p2  freebsd-ufs  (954G)
  2000409223           1         - free -  (512B)

So no ZFS pools. The above context never had the ZFS-capable
loader problem but did have the kernel problem. I was booting
the 356G freebsd-ufs partition: the only one on which I have
updated the FreeBSD version so far.

With FreeBSD booted natively, more drives are seen in gpart show,
some not from/for FreeBSD. But the above drives are present
and I was booting from the same partition of the same drive:
the 356G freebsd-ufs partition.

Still no ZFS pools anywhere:

# gpart show -p
=>        34  4000797293    nvd0  GPT  (1.9T)
          34      262144  nvd0p1  ms-reserved  (128M)
      262178        2014          - free -  (1.0M)
      264192  3600451584  nvd0p2  ms-basic-data  (1.7T)
  3600715776   400081551          - free -  (191G)

=>       40  937703008    nvd1  GPT  (447G)
         40       1024  nvd1p1  freebsd-boot  (512K)
       1064  746586112  nvd1p2  freebsd-ufs  (356G)
  746587176   31457280  nvd1p3  freebsd-swap  (15G)
  778044456  159383552  nvd1p4  freebsd-swap  (76G)
  937428008     275040          - free -  (134M)

=>       40  937703008    nvd2  GPT  (447G)
         40       1024  nvd2p1  freebsd-boot  (512K)
       1064  369098752  nvd2p2  freebsd-ufs  (176G)
  369099816  406846424  nvd2p3  freebsd-swap  (194G)
  775946240  130024488          - free -  (62G)
  905970728   31457280  nvd2p4  freebsd-swap  (15G)
  937428008     275040          - free -  (134M)

=>        34  2000409197    nvd3  GPT  (954G)
          34        2014          - free -  (1.0M)
        2048     1021952  nvd3p1  ms-recovery  (499M)
     1024000      202752  nvd3p2  efi  (99M)
     1226752       32768  nvd3p3  ms-reserved  (16M)
     1259520  1859119104  nvd3p4  ms-basic-data  (886G)
  1860378624   140030607          - free -  (67G)

=>        40  2000409184    nvd4  GPT  (954G)
          40        1024  nvd4p1  freebsd-boot  (512K)
        1064  2000408159  nvd4p2  freebsd-ufs  (954G)
  2000409223           1          - free -  (512B)

=>        63  2000409201    ada0  MBR  (954G)
          63        1985          - free -  (993K)
        2048        4096  ada0s1  linux-data  (2.0M)
        6144     2093056          - free -  (1.0G)
     2099200  1998309376  ada0s2  linux-lvm  (953G)
  2000408576         688          - free -  (344K)

=>        34  2000409197    ada1  GPT  (954G)
          34      262144  ada1p1  ms-reserved  (128M)
      262178  2000147053          - free -  (954G)

=>        34  2000409197    ada2  GPT  (954G)
          34      262144  ada2p1  ms-reserved  (128M)
      262178  2000147053          - free -  (954G)

=>        34  1953497022    da0  GPT  (932G)
          34      262144  da0p1  ms-reserved  (128M)
      262178        2014         - free -  (1.0M)
      264192  1953230848  da0p2  ms-basic-data  (931G)
  1953495040        2016         - free -  (1.0M)

=>       1  60062499    da1  MBR  (29G)
         1        31         - free -  (16K)
        32  60062468  da1s1  fat32lba  (29G)

The 356G freebsd-ufs partition is the only one of the
freebsd-ufs partitions updated so far.

This is the context that had the problem with the ZFS-capable
loaders --but no later kernel problem when a not-ZFS-capable
loader was used (via copying over an older one --until I did
the WITHOUT_ZFS= build/install).

As for the ZFS-capable loader: maybe it has problems when it
sees one or more of:

ms-reserved (on GPT)
ms-basic-data (on GPT) (NTFS file system)
ms-recovery (on GPT)
efi (on GPT)
linux-data (on MBR)
linux-lvm (on MBR)
fat32lba (on MBR)

(given that none of these is available in the Hyper-V context
as the virtual machine has been configured).

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
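
The comparison made by eye in the message above (which partition types the native boot sees that the Hyper-V guest does not, and whether any freebsd-zfs partition exists) can be sketched as follows. This is a minimal sketch, not a tool from the thread: the helper name partition_types is hypothetical, and the two text blocks are abbreviated excerpts of the gpart show -p listings above, with one line per partition type rather than the full output. A missing freebsd-zfs partition type only shows that nothing is labelled as ZFS in the partition tables; it does not rule out stale ZFS metadata inside some other partition.

```python
def partition_types(gpart_output):
    """Collect the partition types named in `gpart show -p` text."""
    types = set()
    for line in gpart_output.splitlines():
        fields = line.split()
        # Skip blank lines, "=>" disk header lines, and "- free -" gaps.
        if len(fields) < 4 or fields[0] == "=>" or fields[2] == "-":
            continue
        types.add(fields[3])  # columns: start, size, provider, type, (size)
    return types


# Abbreviated excerpt of the Hyper-V listing.
HYPERV = """
=>       40  937703008    da0  GPT  (447G)
         40       1024  da0p1  freebsd-boot  (512K)
       1064  746586112  da0p2  freebsd-ufs  (356G)
  746587176   31457280  da0p3  freebsd-swap  (15G)
  937428008     275040         - free -  (134M)
"""

# Abbreviated excerpt of the native listing.
NATIVE = """
=>        34  4000797293    nvd0  GPT  (1.9T)
          34      262144  nvd0p1  ms-reserved  (128M)
      264192  3600451584  nvd0p2  ms-basic-data  (1.7T)
=>       40  937703008    nvd1  GPT  (447G)
         40       1024  nvd1p1  freebsd-boot  (512K)
       1064  746586112  nvd1p2  freebsd-ufs  (356G)
  746587176   31457280  nvd1p3  freebsd-swap  (15G)
=>        34  2000409197    nvd3  GPT  (954G)
        2048     1021952  nvd3p1  ms-recovery  (499M)
     1024000      202752  nvd3p2  efi  (99M)
=>        63  2000409201    ada0  MBR  (954G)
        2048        4096  ada0s1  linux-data  (2.0M)
     2099200  1998309376  ada0s2  linux-lvm  (953G)
=>       1  60062499    da1  MBR  (29G)
        32  60062468  da1s1  fat32lba  (29G)
"""

hyperv_types = partition_types(HYPERV)
native_types = partition_types(NATIVE)

native_only = sorted(native_types - hyperv_types)
has_zfs = "freebsd-zfs" in (native_types | hyperv_types)

print(native_only)
print(has_zfs)
```

On these excerpts the native-only set is the same seven types listed in the message, and no freebsd-zfs partition is reported in either context.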
Toomas Soome
2018-Oct-22 09:27 UTC
head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
> On 22 Oct 2018, at 06:30, Warner Losh <imp at bsdimp.com> wrote:
>
> The phrase "no active use" bothers me. What does that mean? Are there any
> ZFS pools, or any disks with any whiff of a ZFS-ish thing on them at all?
> Clearly, something in the ZFS boot loader is freaking out over something on
> your system, but absent that information I can't help you.
>

It would help to get the output of the loader's lsdev -v command.

Also, if you could test the boot loader with UEFI - for example, get to the
loader prompt via a USB/CD boot - please capture the same lsdev -v output
there. I would be interested to see the sector size information and whether
the UEFI loader also has issues. If it does, I'd like to see the output of
these commands:

zpool status
zpool import

thanks,
toomas