hpa at zytor.com
2019-Jun-19 20:05 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
On June 19, 2019 12:21:05 PM PDT, Mathieu Chouquet-Stringer <m+syslinux at thi.eu.com> wrote:
>Hello,
>
>On Tue, Jun 18, 2019 at 05:31:17PM -0700, hpa at zytor.com wrote:
>> Which servers, what threshold, what clients, what about pxelinux.0?
>
>All affected servers so far are Cisco UCS B200 M3 blade servers.
>
>The threshold seems to be around 50MB. I haven't tested precisely, but
>54525200 bytes is enough to trigger the reboot while 49812292 isn't
>(what I did was to recompress the initrd with a higher compression
>setting).
>
>Today I tried pxelinux.0 instead of lpxelinux.0 and it behaves the same
>way: if the initrd is "too big", the server reboots.
>
>What do you mean by "clients"?
>
>Please find below what I initially wrote in my email; let me know if or
>how I can help debug this.
>
>Cheers,
>
>> Hello,
>>
>> I am using lpxelinux.0 (latest stable version 6.03, using the official
>> binaries from kernel.org) to kickstart servers, as http transfers really
>> help over links with poor latency... These servers are being booted in
>> legacy mode, not in UEFI.
>>
>> This has worked very well until recently. Starting with RHEL 7.5, on
>> some servers, we would see the machine rebooting while pxelinux is in
>> the middle of downloading the initrd.img file.
>>
>> A quick workaround was to tell people to boot/kickstart using a previous
>> minor release (the last working one was 7.4), with the install process
>> taking care of updating to the latest version.
>>
>> I experienced the same issue again yesterday and had time to think about
>> it again.
>>
>> On RHEL 7.4, the last working version in my case, the file is this big:
>> -rw-r--r-- 1 root root 49763300 Dec 1 2017 initrd.img
>>
>> I saw that on 7.5 and 7.6 they're slightly larger, respectively:
>> -r--r--r-- 1 root root 54525200 Mar 22 2018 initrd.img
>> and
>> -r--r--r-- 1 root root 54799220 Oct 10 2018 initrd.img
>>
>> Because I have no trace on the screen to explain the reboot (same thing
>> with a recorded session over a serial console), I was like: what if the
>> size is a factor? Given I had nothing else to try, I looked at the files
>> and saw they were compressed with xz.
>>
>> So I wondered, what if I compressed them more (I guessed they were
>> compressed with the default compression preset)? After decompressing and
>> recompressing with xz -9 -C crc32, here's what I get for 7.5 and 7.6
>> respectively:
>> -r--r--r-- 1 root root 48823576 May 21 11:45 initrd.img
>> and
>> -r--r--r-- 1 root root 49812292 May 21 11:27 initrd.img
>>
>> Pretty close to what I had in 7.4. And to my surprise, with these smaller
>> files, lpxelinux doesn't reboot while downloading the file over http. The
>> kernel boots up and the OS is installed properly...
>>
>> I haven't had the time to reproduce this issue over regular
>> pxelinux/tftp, so I don't know if it's just tied to lpxelinux/http or
>> not.
>>
>> Also, so far this bug only seems to be triggered on some Cisco UCS
>> servers, such as the UCSB-B200-M3 described below. So maybe it could be
>> related to the BIOS or memory maps, I am not sure.
>>
>> I'd hate to go back to tftp because the switch to http was such a huge
>> step forward.
>>
>> Is there something I could do or provide to help debug this issue? I
>> read the "Hardware Compatibility" and "Common Problems" pages on the
>> wiki and found nothing close to what I'm seeing. I started reading the
>> "Development/Debugging" page, and while I could use "COM32 debug.c32" to
>> get more details, I don't know which functions I should be tracing.
>>
>> Please let me know.
>>
>> Cheers,
>> Mathieu
>>
>> BIOS Information
>>     Vendor: Cisco Systems, Inc.
>>     Version: B200M3.2.2.6f.0.052120182033
>>     Release Date: 05/21/2018
>>     Address: 0xF0000
>>     Runtime Size: 64 kB
>>     ROM Size: 4096 kB
>>     Characteristics:
>>         PCI is supported
>>         BIOS is upgradeable
>>         BIOS shadowing is allowed
>>         Boot from CD is supported
>>         Selectable boot is supported
>>         BIOS ROM is socketed
>>         EDD is supported
>>         5.25"/1.2 MB floppy services are supported (int 13h)
>>         3.5"/720 kB floppy services are supported (int 13h)
>>         3.5"/2.88 MB floppy services are supported (int 13h)
>>         Print screen service is supported (int 5h)
>>         8042 keyboard services are supported (int 9h)
>>         Serial services are supported (int 14h)
>>         Printer services are supported (int 17h)
>>         ACPI is supported
>>         USB legacy is supported
>>         BIOS boot specification is supported
>>         Targeted content distribution is supported
>>         UEFI is supported
>>     BIOS Revision: 4.6
>>
>> System Information
>>     Manufacturer: Cisco Systems Inc
>>     Product Name: UCSB-B200-M3
>>     Version: 1
>>     Serial Number: MYSERIALNUMBER
>>     UUID: SOMEUUID
>>     Wake-up Type: Other
>>     SKU Number:
>>     Family:

Sounds like you may want to contact Cisco...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
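The recompression workaround from the quoted report (decompress, then recompress with `xz -9 -C crc32`) can be sketched as a small shell function. This is a sketch, not the reporter's actual commands: the function name `recompress_initrd` and the example paths are hypothetical, and it assumes the image is a single xz-compressed stream, as the report describes. Keeping `-C crc32` (the short form of `--check=crc32`) matters because the Linux kernel's built-in xz decoder supports only CRC32 or no integrity check, while `xz` defaults to CRC64.

```shell
#!/bin/sh
# Sketch of the "recompress the initrd harder" workaround from the report.
# Assumptions: the input is a single xz stream; recompress_initrd and the
# example paths are hypothetical names used only for illustration.

# recompress_initrd SRC DST
#   Decompress SRC and write DST recompressed with the highest preset (-9)
#   and a CRC32 integrity check (-C crc32); the kernel's xz decoder only
#   accepts CRC32 or no check, whereas xz defaults to CRC64.
recompress_initrd() {
    src=$1
    dst=$2

    # Verify SRC first: in a plain POSIX pipeline only the last command's
    # exit status is seen, so a decompression error would go unnoticed.
    xz -t -- "$src" || return 1

    # Stream through a pipe so no uncompressed copy lands on disk.
    xz -dc -- "$src" | xz -9 -C crc32 -c > "$dst" || return 1

    # Print both sizes in bytes (tr strips the padding some wc versions add).
    printf '%s: %s bytes\n' "$src" "$(wc -c < "$src" | tr -d ' ')"
    printf '%s: %s bytes\n' "$dst" "$(wc -c < "$dst" | tr -d ' ')"
}

# Example (hypothetical paths):
#   recompress_initrd rhel-7.6/initrd.img initrd-7.6-xz9.img
```

Note that `-9` is slow and, per the xz manual, needs roughly 674 MiB of memory to compress; that is only a concern on the build host, not on the PXE client.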
Mathieu Chouquet-Stringer
2019-Jun-19 20:18 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
On Wed, Jun 19, 2019 at 01:05:50PM -0700, hpa at zytor.com wrote:
> Sounds like you may want to contact Cisco...

And tell them what? There's a bug in their PXE/BIOS stack somewhere?

--
Mathieu Chouquet-Stringer                     m+syslonux at thi.eu.com
            The sun itself sees not till heaven clears.
                     -- William Shakespeare --
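To give a vendor (or the list) a sharper number than "around 50MB", the threshold reported earlier in the thread (49812292 bytes downloads fine, 54525200 bytes triggers the reboot) could be narrowed by bisection. The sketch below assumes GNU coreutils `truncate` is available and uses two hypothetical helpers: `make_padded_initrd` pads a known-good image with zero bytes to an exact size, and `midpoint` picks the next size to try. Each probe still has to be PXE-booted by hand. Zero padding changes only the size on the wire, which is the variable under test; whether the kernel then accepts the trailing zeros is a separate assumption, since the reboot was reported during the download itself.

```shell
#!/bin/sh
# Sketch: probe the initrd size threshold by bisection.
# Assumptions: GNU coreutils truncate is available; helper names and
# file names are hypothetical.

# make_padded_initrd SRC SIZE DST
#   Copy SRC to DST and extend DST with zero bytes to exactly SIZE bytes.
make_padded_initrd() {
    src=$1
    size=$2
    dst=$3
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    cur=$(wc -c < "$src" | tr -d ' ')
    if [ "$size" -lt "$cur" ]; then
        echo "size $size is smaller than $src ($cur bytes)" >&2
        return 1
    fi
    cp -- "$src" "$dst" || return 1
    # Extending a file with truncate -s fills the new bytes with zeros.
    truncate -s "$size" -- "$dst"
}

# midpoint LO HI
#   LO = largest size known to download fine, HI = smallest known to reboot.
midpoint() {
    echo $(( ($1 + $2) / 2 ))
}

# One bisection step with the sizes from the thread (hypothetical paths):
#   size=$(midpoint 49812292 54525200)    # 52168746
#   make_padded_initrd initrd-xz9.img "$size" /srv/http/probe.img
# Boot the blade with probe.img; if it reboots, use $size as the new HI,
# otherwise as the new LO, and repeat until HI - LO is small enough.
```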
Ady Ady
2019-Jun-19 23:15 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
> > Sounds like you may want to contact Cisco...
>
> And tell them what? There's a bug in their PXE/BIOS stack somewhere?

Just some random and humble thoughts...

Perhaps it would be worth running some additional tests?

Maybe a test with pxelinux.0 version 4.07? And using "LINUX", not "KERNEL":

###
DEFAULT biginitrd
PROMPT 0

LABEL biginitrd
LINUX mykernel
INITRD mybiginitrd
APPEND myoptions
###

Maybe a packet capture might reveal something about the network setup?

Maybe it is not the size of the initrd file but rather a time limitation (which gets triggered by a big-enough file)?

What about trying with 6.04-pre1, and/or (even better) with Debian Sid's packages, and using debug.c32 (from the corresponding package/version/build):

###
DEFAULT dbg
PROMPT 1
SAY First press [Enter] for debug.c32, then [1][Enter] for OS.
# Once in the boot prompt, press "Enter".

LABEL dbg
COM32 debug.c32
APPEND -e pxe_call,malloc
# Additional functions for debug might be of interest.
# After debug.c32 returns to the boot prompt,
# press "1" and "Enter".

LABEL 1
LINUX mykernel
INITRD mybiginitrd
APPEND myoptions
###

Once the label named "1" is launched, is there any info (that could help)?

Please avoid using boot menus for these tests; use the boot prompt.

Sometimes a "cold" boot behaves differently than a "warm" boot. This might be particularly relevant when performing multiple cycles of (network) boot tests.

Another possible test could be to use UEFI mode instead of CSM; in that case the bootloader would be (efi64's) syslinux.efi + ldlinux.e64 (+ efi64's debug.c32). As usual, all files from the same set of package/version/build.

Considering the lack of specific, knowledgeable replies about this topic on this Syslinux mailing list so far, maybe a test with a different bootloader (at least for comparing the resulting behavior) would be worthwhile?
Recurrent failures in all tests would probably indicate one of:
_ a bug in the firmware; or,
_ some problem with the specific kernel+initrd combo (on this hardware); or,
_ some problem with the network setup.

OTOH, if a successful boot can be achieved in at least one test, then this hardware/firmware might be exposing either a bug or a limitation in (l)pxelinux.0.

As I mentioned, these are some random and humble thoughts; they might not be worth much (or even nothing at all).

Regards,
Ady.