On 04-08-15 17:38, Oscar Roozen wrote:
> Okay, the code in efi/ uses Print() from gnu_efi, but generic code from
> core/ like core/elflink/load_env32.c prints their messages and debugging
> stuff using printf(). These messages end up nowhere. This may explain
> why I never saw anything beyond a certain point, even with debugging
> turned on.

I was busy adding some code to dprintf.h to add a wrapper that converts
the dprintf calls to APrint("%a", buf) calls when I got an idea...

The ILO4 environment provides a virtual COM2 port. I was reluctant to
hook something to COM1 as I have to order some hardware for that, but this
also works very well. A pity I didn't think of using it earlier.

Still miss the printf output, though, so error messages from the shared
code don't reach the user.

I recompiled with this in mk/devel.mk:

  GCCWARN += -DDEBUG_PORT=0x2f8 -DCORE_DEBUG=1

And used ipmi_console to capture the output. Another way is to use the
vsp command in the ILO4 shell. For future reference.

> Any suggestions while I keep on debugging?

Got some result!

I was using 'vesamenu.c32' to display a menu. Changing this to
'menu.c32' resulted in a booting system! Okay, the menu is not that
beautiful, but the system boots very well right now.

Now... why is vesamenu.c32 crashing like it does now? Why is the version
I tried without Gene's latest patches crashing before even beginning to
load the first stage: ldlinux.e64?

I'll investigate a bit further tomorrow.
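For reference, the wrapper idea mentioned above (format into a buffer first, then hand the finished string to whatever output routine actually reaches the user) can be sketched roughly as follows. This is only a hosted-C illustration, not the code from dprintf.h: the names debug_printf, set_debug_sink, stderr_sink, capture_sink and last_message are all made up for the example, and the idea that an EFI build would plug in a sink calling APrint("%a", msg) is an assumption that is not compiled here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sink type: receives one fully formatted ASCII string. */
typedef void (*debug_sink_fn)(const char *msg);

/* Holds the most recent message seen by capture_sink(). */
static char last_message[256];

/* Default sink for a hosted build: write the message to stderr.
 * Assumption: an EFI build could use a sink whose body is
 * APrint((CHAR8 *)"%a", msg); that variant is not shown or built here. */
static void stderr_sink(const char *msg)
{
    fputs(msg, stderr);
}

/* Alternative sink that just remembers the last message, which makes it
 * easy to check the wrapper without any console at all. */
static void capture_sink(const char *msg)
{
    snprintf(last_message, sizeof last_message, "%s", msg);
}

static debug_sink_fn debug_sink = stderr_sink;

/* Select the output routine; NULL falls back to stderr_sink. */
static void set_debug_sink(debug_sink_fn fn)
{
    debug_sink = fn ? fn : stderr_sink;
}

/* printf()-style front end: format into a fixed buffer, then pass the
 * result to the current sink. Messages longer than the buffer are
 * truncated. Returns what vsnprintf() returned. */
static int debug_printf(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0)
        return n;   /* formatting failed, nothing to print */

    debug_sink(buf);
    return n;
}
```

Because the formatting is done by vsnprintf(), the existing format strings with %s would not need to be rewritten; only the final hand-off to the sink differs per platform.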
On Tue, Aug 4, 2015 at 2:42 PM, Oscar Roozen via Syslinux <syslinux at zytor.com> wrote:
> On 04-08-15 17:38, Oscar Roozen wrote:
>> Any suggestions while I keep on debugging?
>
> Got some result!
>
> I was using 'vesamenu.c32' to display a menu. Changing this to 'menu.c32'
> resulted in a booting system! Okay, the menu is not that beautiful, but the
> system boots very well right now.

Excellent! I do seem to recall someone saying there's a
quirk/bug/interaction in vesamenu.c32 on EFI but not finding it.

> Now... why is vesamenu.c32 crashing like it does now? Why is the version I

Without opening up the VESA-only code, I'd suspect some BIOS-isms.
Remember that vesamenu.c32 and menu.c32 only differ by the inclusion
of VESA code and the default VESA mode.

> tried without Gene's latest patches crashing before even beginning to load
> the first stage: ldlinux.e64?

It _probably_ had issues finding the NIC properly.

--
-Gene
> I was using 'vesamenu.c32' to display a menu. Changing this to
> 'menu.c32' resulted in a booting system! Okay, the menu is not that
> beautiful, but the system boots very well right now.
>
> Now... why is vesamenu.c32 crashing like it does now? Why is the version
> I tried without Gene's latest patches crashing before even beginning to
> load the first stage: ldlinux.e64?
>
> I'll investigate a bit further tomorrow.

I think you were "hinted" about this before.

Unfortunately, different people define "boot" in different ways.

The good news is that the recent patches for the multi-nic branch seem
to be doing their work for your hardware / firmware.

The not-so-good "news" is that, considering that vesamenu.c32 has
several problems (under UEFI), the reasons for your latest crashes
would need more specific (detailed) investigations and reports (as
opposed to "failed to boot").

Things that come to mind: the space-like character issues (whether
SYSAPPEND is being used or not), additional building interactions (gcc,
gnu-efi...), the output console (length of the command, keyboard
issues...), screen resolution supported by your / the UEFI firmware,
and more.

Regards,
Ady.

> _______________________________________________
> Syslinux mailing list
> Submissions to Syslinux at zytor.com
> Unsubscribe or set options at:
> http://www.zytor.com/mailman/listinfo/syslinux
>
>>>
Okay, the code in efi/ uses Print() from gnu_efi, but generic code from
core/ like core/elflink/load_env32.c prints their messages and debugging
stuff using printf(). These messages end up nowhere. This may explain
why I never saw anything beyond a certain point, even with debugging
turned on.
<<<

isn't it redirected at all??

>>>
I was busy adding some code to dprintf.h to add a wrapper that converts
the dprintf calls to APrint("%a", buf) calls when I got an idea... The
ILO4 environment provides a virtual COM2 port. I was reluctant to
hook something to COM1 as I have to order some hardware for that, but this
also works very well. A pity I didn't think of using it earlier.
<<<

You can set the com address for debugging (com1/com2/whatever) if you like.

>>>
Still miss the printf output, though, so error messages from the shared
code don't reach the user.
<<<

wow...

>>>
Got some result! I was using 'vesamenu.c32' to display a menu. Changing
this to 'menu.c32' resulted in a booting system! Okay, the menu is not
that beautiful, but the system boots very well right now.
Now... why is vesamenu.c32 crashing like it does now?
<<<

Are you still loading PNGs?
Have you tried vesamenu w/o a background image?

>>>
Why is the version I tried without Gene's latest patches crashing before
even beginning to load the first stage: ldlinux.e64?
<<<

When you do not see the transfer of ldlinux.e64, the chances are you are
suffering from the multi-nic bug.
If you see "anything" after syslinux.efi transferred correctly, that tells
us the multi-nic bug is gone and there might be something else going on now.

Best,
Patrick
On Aug 4, 2015 2:45 PM, "Oscar Roozen via Syslinux" <syslinux at zytor.com> wrote:
> Still miss the printf output, though, so error messages from the shared
> code don't reach the user.
>
> I recompiled with this in mk/devel.mk:
> GCCWARN += -DDEBUG_PORT=0x2f8 -DCORE_DEBUG=1

0x2f8 is a BIOSism.

--Gene
On 05-08-15 02:27, Gene Cumm wrote:
>> tried without Gene's latest patches crashing before even beginning to load
>> the first stage: ldlinux.e64?
>
> It _probably_ had issues finding the NIC properly.

Of course, and your latest patches solved this problem. But I should
have been more clear.

What happened was:

1 - HP ROM FW: Download syslinux.efi
2 - HP ROM FW: Run syslinux.efi
3 - Syslinux: Try to download ldlinux.e64
4 - *CRASH*

Now it is:

1 - HP ROM FW: Download syslinux.efi
2 - HP ROM FW: Run syslinux.efi
3 - Syslinux: Try to download ldlinux.e64
4 - Syslinux: Invoke ldlinux.e64
[...]
5 - Syslinux: Try to run vesamenu.c32
6 - *CRASH*

In both cases *CRASH* can either be a spontaneous reboot or one of a
selection of Exceptions reported in red on black. Due to the randomness
of the crashes my bet would be on an overwritten return address or
longjmp address somewhere.

In both cases there should be some error message or an exit back to the
firmware bootloader indicating a failure to start up.

So, while the multinic issue is solved (thanks!!!), there still is
something wrong. And it could be the same issue in both cases.
On 05-08-15 10:01, Patrick Masotta wrote:
>>>>
> Okay, the code in efi/ uses Print() from gnu_efi, but generic code from
> core/ like core/elflink/load_env32.c prints their messages and debugging
> stuff using printf(). These messages end up nowhere. This may explain
> why I never saw anything beyond a certain point, even with debugging
> turned on.
> <<<
>
> isn't it redirected at all??

No. If, for example, I change:

  dprintf("Starting %s elf module subsystem...\n", ELF_MOD_SYS);

into regular printf(), I don't see the result in my ipmi-logs nor on my
screen. If I change it to APrint() (and %s into %a) then I do see the
message. To make sure it isn't overwritten before I see it, I put a
while(1) in front, so it will end in an endless loop printing this message.

Maybe we need to explicitly redirect stdout and stderr to the console
under EFI on some firmware?

>>>>
> I was busy adding some code to dprintf.h to add a wrapper that converts
> the dprintf calls to APrint("%a", buf) calls when I got an idea... The
> ILO4 environment provides a virtual COM2 port. I was reluctant to
> hook something to COM1 as I have to order some hardware for that, but this
> also works very well. A pity I didn't think of using it earlier.
> <<<
>
> You can set the com address for debugging (com1/com2/ whatever) if you like

I was originally looking for a way to debug to the console instead.
Defining DEBUG_STDIO did not help. Now I know why.

> Are you still loading PNGs?
> Have you tried vesamenu w/o a background image?

No and yes. I removed all fancy stuff, leaving just:

  LABEL linux
    KERNEL path/to/vmlinuz
    IPAPPEND 3
    APPEND initrd=path/to/initrd rdblacklist=nouveau
    MENU LABEL ^AUTO - Normal node boot
    MENU DEFAULT

  [.. some more entries ..]

  DEFAULT vesamenu.c32
  PROMPT 0
  TIMEOUT 50
  MENU TITLE Cluster Manager PXE Environment (EFI64)

With the default being either vesamenu.c32 or menu.c32.
Using menu.c32 I can successfully boot this kernel into a running system.

> when you do not see the transfer of ldlinux.e64 the chances are you are suffering the multi-nic bug.
> If you see "anything" after syslinux.efi transferred correctly that tells us the multi-nic bug is gone and
> there might be something else going on now.

See my message to Gene a few minutes ago. My guess is that the problems
themselves are not the reason for the crash, but that the exit code or
error-reporting code is what is doing the crashing.
On 05-08-15 12:05, Gene Cumm wrote:
> > I recompiled with this in mk/devel.mk:
> > GCCWARN += -DDEBUG_PORT=0x2f8 -DCORE_DEBUG=1
>
> 0x2f8 is a BIOSism.

Is this a problem? The example in the comments said 0x3f8 which is COM1.
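For readers following the port numbers: 0x3f8 and 0x2f8 are the conventional PC I/O base addresses for COM1 and COM2. A small lookup like the one below makes the mapping explicit. The names com_port, legacy_com_ports and legacy_com_base are made up for this illustration and do not exist in the Syslinux source; whether a particular UEFI firmware actually exposes a UART at these legacy addresses is a separate question that this table cannot answer.

```c
#include <stddef.h>
#include <string.h>

/* One conventional PC serial port: its name and I/O base address. */
struct com_port {
    const char *name;
    unsigned short io_base;
};

/* Conventional legacy PC assignments. These are the traditional
 * values only, not something probed from the running firmware. */
static const struct com_port legacy_com_ports[] = {
    { "COM1", 0x3f8 },
    { "COM2", 0x2f8 },
    { "COM3", 0x3e8 },
    { "COM4", 0x2e8 },
};

/* Return the conventional I/O base for a port name such as "COM2",
 * or 0 if the name is not in the table. */
static unsigned short legacy_com_base(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof legacy_com_ports / sizeof legacy_com_ports[0]; i++) {
        if (strcmp(legacy_com_ports[i].name, name) == 0)
            return legacy_com_ports[i].io_base;
    }
    return 0;
}
```

With this table, legacy_com_base("COM2") gives the 0x2f8 used in the -DDEBUG_PORT setting quoted above, and legacy_com_base("COM1") gives the 0x3f8 from the example in the comments.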
On 05-08-15 04:17, Ady via Syslinux wrote:
>> Now... why is vesamenu.c32 crashing like it does now? Why is the version
>> I tried without Gene's latest patches crashing before even beginning to
>> load the first stage: ldlinux.e64?
> I think you were "hinted" about this before.

About what? Did I miss something obvious?

I now know why it did not load ldlinux.e64, but I still don't know why
it crashes. The code seems well written with error checking everywhere,
so I would expect an error message, or even a silent fail causing the
boot process to try the DVD player next, or whatever. Not a crash.

> Unfortunately, different
> people define "boot" in different ways.

In this case I define 'boot' as 'reach a stage where syslinux
successfully loads and runs a linux-kernel'. Up until now I think I
described reasonably well how far it got into this process.

> The not-so-good "news" is that, considering that vesamenu.c32 has
> several problems (under UEFI), the reasons for your latest crashes
> would need more specific (detailed) investigations and reports (as
> opposed to "failed to boot").

What more can I do? I dived deep into the C code of a product that I
don't have intimate knowledge of and spent more time on Google than a
teenager on Facebook, but this is all I can provide. Please help me out
here, as time is running out. I don't own the hardware and will probably
lose access to it soon.

Unfortunately the debugging process is barely documented at all. The
wiki page doesn't really help in this case and I had to read the source
to find out how to enable more verbose output. And then there is this
problem of stdout not being visible on this EFI system, so I first had
to debug the debugging system... ;(

Just describing my experience here.
I'm new to the syslinux source and am having a hard time trying to
understand how everything fits together.

> Things that come to mind: the space-like character issues (whether
> SYSAPPEND is being used or not),

Nope.

> additional building interactions (gcc,
> gnu-efi...)

I build using 'make spotless; make efi64', initially without any changes
to the code or makefiles. The build system is CentOS 7.1.1503 at the
moment, with gcc version 4.8.3 20140911 (Red Hat 4.8.3-9). However,
binaries built by Gene crashed in the same way.

> , the output console (length of the command, keyboard
> issues...), screen resolution supported by your / the UEFI firmware,
> and more.

This all only became relevant yesterday, when I found out that replacing
vesamenu.c32 with menu.c32 solved the crashes in this case.

Are there any specific outputs you want to see? Just telling me you need
more doesn't really help... I don't want to spam the list with
everything and more I can capture, but if I leave out too much, by all
means, please ask for it.