Good day, Geoff.
On 2024-12-02 03:22, Geoff Winkless wrote:
> [...]
>
> I tried (a truncated version of) your instructions with my kernel and
> it boots fine under qemu. Sadly that same kernel will not boot on my
> real hardware (an ASRock N100-based board).
>
Just to confirm, are both of these points true?
1. On the "ASRock N100-based board," is your /EFI/BOOT/BOOTX64.EFI the
/usr/lib/SYSLINUX.EFI/efi64/syslinux.efi file, having MD5sum
388f8a1b8b2286de59952da3d25c0dae, from the
https://packages.debian.org/bullseye/syslinux-efi package?
2. On the "ASRock N100-based board," is your /EFI/BOOT/ldlinux.e64 the
/usr/lib/syslinux/modules/efi64/ldlinux.e64 file, having MD5sum
e6fbf775695dd802a48820d467affd68, from the
https://packages.debian.org/bullseye/syslinux-common package?
>
> The same kernel, with the only config change set to compile as x64,
> boots on the same hardware fine.
>
> To double-check it's not something I've done I grabbed the vmlinuz and
> initrd from https://cdimage.debian.org/debian-cd/current/i386/iso-cd/
> and that also fails on this hardware.
>
> [...]
>
> and then freezes.
>
> Sprinkling some dprintfs in efi/main.c suggests it's failing at
> exit_boot: the odd thing is that the dprintf I added _inside_
> exit_boot() itself doesn't get written, which suggests the failure is
> in the attempt to call exit_boot itself (either that or dprintf
> doesn't work within exit_boot?).
>
Yes, I'd imagine that dprintf() relies on the (U)EFI boot services, and
those have just been exited by the time your non-working dprintf() call
runs.
On 2024-12-02 08:23, Geoff Winkless wrote:
> On Mon, 2 Dec 2024 at 11:22, I wrote:
>> Sprinkling some dprintfs in efi/main.c suggests it's failing at
>> exit_boot: the odd thing is that the dprintf I added _inside_
>> exit_boot() itself doesn't get written, which suggests the failure is
>> in the attempt to call exit_boot itself (either that or dprintf
>> doesn't work within exit_boot?).
>
> I inlined exit_boot in case something odd was going on with calling
> it, and I got more failure information.
>
> This call:
>
> status = uefi_call_wrapper(BS->ExitBootServices, 2, image_handle,
> key);
>
> is returning EFI_INVALID_PARAMETER, which (according to
> https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html#id41 )
> means "key" is incorrect.
>
> However I'm not sure if that was because I'd interspersed dprintf()
> calls between the call to get_memory_map and the call to
> ExitBootServices: the web page suggests that you should do the first
> just before the second so as not to invalidate the map. Certainly if I
> take out those dprintf() calls I no longer get the message telling me
> that ExitBootServices failed; unfortunately it still doesn't boot.
>
> [...]
>
Agreed: your dprintf() calls could be changing the memory map, which
invalidates the key and causes the EFI_INVALID_PARAMETER result.
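To make the map-key point concrete, here's a small hosted-C model. It is
not gnu-efi code: every type and function in it (model_firmware,
model_get_memory_map() and so on) is hypothetical, and it only assumes
what the spec says, namely that ExitBootServices() rejects a MapKey that
no longer matches the current memory map. It shows a print between the
two calls producing EFI_INVALID_PARAMETER, and the usual remedy of
re-fetching the map and retrying with nothing else in between. One
consequence worth noting: per the spec, a failed ExitBootServices() may
already have partially shut down boot services, so a dprintf() after that
failure is itself suspect. A real loader also needs a map buffer with some
slack, since the map can grow between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: these are NOT the gnu-efi types or functions. */
typedef enum { MODEL_SUCCESS, MODEL_INVALID_PARAMETER } model_status;

struct model_firmware {
    unsigned long map_key;     /* changes whenever the memory map changes */
    int boot_services_exited;  /* set once "ExitBootServices" succeeds */
    int async_changes_pending; /* simulated timer events that allocate */
};

/* Model of GetMemoryMap(): returns the key of the map it captured. A
 * pending asynchronous event may change the map right afterwards, which
 * makes the returned key stale. */
static unsigned long model_get_memory_map(struct model_firmware *fw)
{
    unsigned long key = fw->map_key;

    if (fw->async_changes_pending > 0) {
        fw->async_changes_pending--;
        fw->map_key++;
    }
    return key;
}

/* Model of a debug print that allocates through boot services and so
 * changes the memory map (and therefore its key). */
static void model_debug_print(struct model_firmware *fw)
{
    assert(!fw->boot_services_exited); /* no boot services after exit */
    fw->map_key++;
}

/* Model of ExitBootServices(): rejects a stale map key. */
static model_status model_exit_boot_services(struct model_firmware *fw,
                                             unsigned long key)
{
    if (key != fw->map_key)
        return MODEL_INVALID_PARAMETER;
    fw->boot_services_exited = 1;
    return MODEL_SUCCESS;
}

/* The pattern from the report: a print between the two calls. */
static model_status exit_boot_with_print_between(struct model_firmware *fw)
{
    unsigned long key = model_get_memory_map(fw);

    model_debug_print(fw); /* the map changes here, so key is now stale */
    return model_exit_boot_services(fw, key);
}

/* Re-fetch the map and retry on a stale key. A failed ExitBootServices()
 * may leave boot services partially shut down, so nothing but the map
 * re-fetch happens before each retry. */
static model_status exit_boot_with_retry(struct model_firmware *fw,
                                         int max_attempts, int *attempts_used)
{
    model_status status = MODEL_INVALID_PARAMETER;
    int attempt = 0;

    while (attempt < max_attempts) {
        unsigned long key = model_get_memory_map(fw);

        attempt++;
        status = model_exit_boot_services(fw, key);
        if (status == MODEL_SUCCESS)
            break;
    }
    if (attempts_used != NULL)
        *attempts_used = attempt;
    return status;
}
```

In the model, exit_boot_with_print_between() fails every time, while
exit_boot_with_retry() succeeds as soon as the simulated asynchronous
changes stop.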
Some ideas...:
One "I'm blind, use brute force" technique would be to use "hang versus
reboot" as an indicator of where you are in the code. After proving
that you can issue this somewhere:
uefi_call_wrapper(RT->ResetSystem, 4, EfiResetCold, EFI_SUCCESS,
sizeof L"Bye", L"Bye");
and cause a cold reboot, you can [probably] use that to reboot, and use:
while (1) { static volatile int i; i = 42; ++i; }
to hang. (ResetSystem() is a runtime service, so it should remain
callable after ExitBootServices().) Instead of dprintf() and its potential
alteration of "the latest memory map," these could be used for insight.
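Here's one way the reboot probe could be used systematically, sketched
as a hosted-C model. The names are hypothetical, and in reality one "test
boot" means rebuilding with the probe at checkpoint k and watching the
board. A reboot means the probe was reached; a freeze means it wasn't,
and that looks the same as the original symptom. Assuming execution
order is fixed, each boot yields one bit, so bisecting over the
checkpoints finds the last one reached in about log2(n) boots: 4 boots
for 16 checkpoints.

```c
#include <assert.h>
#include <stddef.h>

/* What the observer sees after one test boot. */
typedef enum { OUTCOME_HANG, OUTCOME_REBOOT } boot_outcome;

/* Hypothetical stand-in for "rebuild with the reboot probe placed at
 * checkpoint k, boot the board, and watch": REBOOT means reached. */
typedef boot_outcome (*test_boot_fn)(int checkpoint, void *ctx);

/* Checkpoints 0..n-1 are in execution order; checkpoint 0 is known to
 * be reached, and once execution dies no later checkpoint is reached.
 * Bisect for the last reached checkpoint; *boots gets the boot count. */
static int last_reached_checkpoint(int n, test_boot_fn test_boot,
                                   void *ctx, int *boots)
{
    int lo = 0; /* invariant: checkpoint lo is reached */
    int hi = n; /* invariant: hi == n, or checkpoint hi is not reached */
    int count = 0;

    assert(n >= 1);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;

        count++;
        if (test_boot(mid, ctx) == OUTCOME_REBOOT)
            lo = mid;
        else
            hi = mid;
    }
    if (boots != NULL)
        *boots = count;
    return lo;
}

/* Simulated board for trying the procedure: everything after
 * last_good freezes, like the original symptom. */
struct fault_sim {
    int last_good;
};

static boot_outcome simulated_boot(int checkpoint, void *ctx)
{
    const struct fault_sim *sim = ctx;

    return checkpoint <= sim->last_good ? OUTCOME_REBOOT : OUTCOME_HANG;
}
```

The deliberate hang is useful for the complementary question, e.g.
rebooting when a status is EFI_SUCCESS and hanging otherwise; this sketch
only models the search for a location.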
Another way to achieve your goal of using the known-good 32-bit kernel
across your "fleet" is to boot a 64-bit kernel and initramfs whose sole
purpose is to fetch the desired 32-bit kernel (and its initramfs, if
you're using one) and then boot it with kexec. This approach is
cumbersome, but it lets you keep using (U)EFI Syslinux and still get your
desired kernel booted on the hardware.
Given these 2 observations:
3. 64-bit kernels boot from the 64-bit (U)EFI Syslinux on the
"ASRock N100-based board."
4. The 32-bit kernel boots from the 64-bit (U)EFI Syslinux in QEMU.
I suspect one of the following:
5. Some part of the hand-off to the 32-bit kernel isn't quite right,
possibly related to "long mode."
6. Low-level code in the 32-bit kernel might be using BIOS interrupts.
These would exist in QEMU (I imagine that TianoCore builds upon them,
although I could be mistaken) but might not exist, or might not behave
as expected, on the "ASRock N100-based board"; and that low-level code
might be absent from the 64-bit counterpart kernel with the
almost-the-same configuration. (One possible "early use" of BIOS
interrupts would be to print very early messages from the kernel.)
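For hypothesis 5, one cheap check is to compare what each kernel
advertises in its x86 boot-protocol setup header. The offsets and the
XLF_* flag names below come from the Linux x86 boot protocol document
(boot.rst); the helper names (parse_boot_header() and friends) are
hypothetical. Whether Syslinux's efi/main.c consults these fields when it
picks its hand-off path is something I haven't checked, so treat a
difference as a lead rather than an answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Setup-header offsets from the Linux x86 boot protocol. */
#define HDR_MAGIC_OFF      0x202 /* "HdrS" */
#define HDR_VERSION_OFF    0x206 /* u16 boot protocol version */
#define HDR_XLOADFLAGS_OFF 0x236 /* u16, protocol 2.12+ */
#define HDR_HANDOVER_OFF   0x264 /* u32, protocol 2.11+ */

#define XLF_KERNEL_64       (1u << 0)
#define XLF_EFI_HANDOVER_32 (1u << 2)
#define XLF_EFI_HANDOVER_64 (1u << 3)

struct boot_hdr_info {
    unsigned version;         /* e.g. 0x020f for 2.15 */
    unsigned xloadflags;      /* 0 if protocol < 2.12 */
    uint32_t handover_offset; /* 0 if protocol < 2.11 */
};

/* The header fields are little-endian. */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the header from the start of a bzImage. Returns 0 on success,
 * -1 if the buffer is too short or lacks the "HdrS" signature. */
static int parse_boot_header(const unsigned char *buf, size_t len,
                             struct boot_hdr_info *out)
{
    if (len < (size_t)HDR_HANDOVER_OFF + 4 ||
        memcmp(buf + HDR_MAGIC_OFF, "HdrS", 4) != 0)
        return -1;
    out->version = rd16(buf + HDR_VERSION_OFF);
    out->xloadflags =
        out->version >= 0x020c ? rd16(buf + HDR_XLOADFLAGS_OFF) : 0;
    out->handover_offset =
        out->version >= 0x020b ? rd32(buf + HDR_HANDOVER_OFF) : 0;
    return 0;
}

/* Read just enough of a vmlinuz file to parse its header. */
static int parse_boot_header_file(const char *path, struct boot_hdr_info *out)
{
    unsigned char buf[0x270];
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    got = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return parse_boot_header(buf, got, out);
}

static void print_boot_header(const struct boot_hdr_info *h)
{
    printf("boot protocol %u.%02u\n", h->version >> 8, h->version & 0xffu);
    printf("XLF_KERNEL_64=%d XLF_EFI_HANDOVER_32=%d XLF_EFI_HANDOVER_64=%d\n",
           (h->xloadflags & XLF_KERNEL_64) != 0,
           (h->xloadflags & XLF_EFI_HANDOVER_32) != 0,
           (h->xloadflags & XLF_EFI_HANDOVER_64) != 0);
    printf("handover_offset=0x%lx\n", (unsigned long)h->handover_offset);
}
```

Calling parse_boot_header_file() on the 32-bit and the 64-bit vmlinuz
and comparing the print_boot_header() output would show whether the two
kernels advertise different entry points.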
Another idea is to try booting with other 64-bit (U)EFI Linux loaders,
such as (U)EFI iPXE, and see how they fare. (For the record, iPXE can
have a kernel embedded in it, although that makes for a large [U]EFI
binary.)
What an interesting challenge you've encountered, Geoff. Thank you for
sharing it.
- Shao Miller