hpa at zytor.com
2019-Jun-19 00:31 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
On June 18, 2019 12:34:35 PM PDT, Mathieu Chouquet-Stringer <m+syslinux at thi.eu.com> wrote:
> Hello Hans Peter,
>
> Any idea on how I could debug this problem? Basically lpxelinux 6.03
> reboots while loading the initrd if its size is above a certain
> threshold. It only happens on certain servers and there's no output
> when it happens. I can trigger it reliably on affected hardware.
>
> On Wed, May 22, 2019 at 05:45:20PM +0200, Mathieu Chouquet-Stringer via
> Syslinux wrote:
>> > I don't know how much relevant debug info you could actually get when
>> > using the "-dhcpinfo" option of linux.c32 (see the wiki for details).
>>
>> I'll give it a shot nonetheless...
>
> Reading the doc, it's kinda useless to me because I don't even boot the
> loaded kernel...

Which servers, what threshold, what clients, what about pxelinux.0?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Mathieu Chouquet-Stringer
2019-Jun-19 19:21 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
Hello,

On Tue, Jun 18, 2019 at 05:31:17PM -0700, hpa at zytor.com wrote:
> Which servers, what threshold, what clients, what about pxelinux.0?

All affected servers so far are Cisco UCS B200 M3 blade servers.

The threshold seems to be around 50 MB. I haven't tested precisely, but
54525200 bytes is enough to trigger the reboot while 49812292 isn't
(what I did was to recompress the initrd with a higher compression
setting).

I tried pxelinux.0 today instead of lpxelinux.0 and it behaves the same
way: if the initrd is "too big", the server reboots.

What do you mean by "clients"?

Please find below what I initially wrote in my email; let me know if or
how I can help debug that.

Cheers,

> Hello,
>
> I am using lpxelinux.0 (latest stable version 6.03, using the official
> binaries from kernel.org) to kickstart servers, as http transfers really
> help over links with poor latencies... These servers are being booted
> in legacy mode, not in UEFI.
>
> This has worked very well until recently. Starting with RHEL 7.5, on
> some servers, we would see the machine rebooting while pxelinux is in
> the middle of downloading the initrd.img file.
>
> A quick workaround was to tell people to boot/kickstart using a previous
> minor release (the last working one was 7.4), with the install process
> taking care of updating to the latest version.
>
> I experienced the same issue again yesterday and had time to think about
> it again.
>
> On RHEL 7.4, the last working version in my case, the file is this big:
> -rw-r--r-- 1 root root 49763300 Dec  1  2017 initrd.img
>
> I saw that on 7.5 and 7.6 they're slightly larger, respectively:
> -r--r--r-- 1 root root 54525200 Mar 22  2018 initrd.img
> and
> -r--r--r-- 1 root root 54799220 Oct 10  2018 initrd.img
>
> Because I have no trace on the screen to explain the reboot (same thing
> with a recorded session over a serial console), I was like: what if the
> size is a factor? Given I had nothing else to try, I looked at the files
> and saw they were compressed with xz.
>
> So I wondered, what if I compressed them more (I guessed they were
> compressed with the default compression preset)? After uncompressing and
> compressing with xz -9 -C crc32, here's what I get for 7.5 and 7.6
> respectively:
> -r--r--r-- 1 root root 48823576 May 21 11:45 initrd.img
> and
> -r--r--r-- 1 root root 49812292 May 21 11:27 initrd.img
>
> Pretty close to what I had in 7.4. And to my surprise, with these smaller
> files, lpxelinux doesn't reboot while downloading the file over http. The
> kernel boots up and the OS is installed properly...
>
> I haven't had the time to reproduce this issue over regular
> pxelinux/tftp so I don't know if it's just tied to lpxelinux/http or
> not.
>
> Also, so far this bug only seems to be triggered on some Cisco UCS
> servers such as the UCSB-B200-M3 described below. So maybe it could be
> related to BIOS or memory maps, I am not sure!?
>
> I'd hate to go back to tftp because the switch to http was such a huge
> step forward.
>
> Is there something I could do or provide to help debug this issue? I
> read the "Hardware Compatibility" and "Common Problems" pages on the
> wiki and found nothing close to what I'm seeing. I started reading the
> "Development/Debugging" page, but while I could use "COM32 debug.c32" to
> get more details, I don't know which functions I should be tracing.
>
> Please let me know.
>
> Cheers,
> Mathieu
>
> BIOS Information
>         Vendor: Cisco Systems, Inc.
>         Version: B200M3.2.2.6f.0.052120182033
>         Release Date: 05/21/2018
>         Address: 0xF0000
>         Runtime Size: 64 kB
>         ROM Size: 4096 kB
>         Characteristics:
>                 PCI is supported
>                 BIOS is upgradeable
>                 BIOS shadowing is allowed
>                 Boot from CD is supported
>                 Selectable boot is supported
>                 BIOS ROM is socketed
>                 EDD is supported
>                 5.25"/1.2 MB floppy services are supported (int 13h)
>                 3.5"/720 kB floppy services are supported (int 13h)
>                 3.5"/2.88 MB floppy services are supported (int 13h)
>                 Print screen service is supported (int 5h)
>                 8042 keyboard services are supported (int 9h)
>                 Serial services are supported (int 14h)
>                 Printer services are supported (int 17h)
>                 ACPI is supported
>                 USB legacy is supported
>                 BIOS boot specification is supported
>                 Targeted content distribution is supported
>                 UEFI is supported
>         BIOS Revision: 4.6
>
> System Information
>         Manufacturer: Cisco Systems Inc
>         Product Name: UCSB-B200-M3
>         Version: 1
>         Serial Number: MYSERIALNUMBER
>         UUID: SOMEUUID
>         Wake-up Type: Other
>         SKU Number:
>         Family:

-- 
Mathieu Chouquet-Stringer                    m+syslinux at thi.eu.com
The sun itself sees not till heaven clears.
  -- William Shakespeare --
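
For readers who want to reproduce the recompression workaround described in the message above (xz -9 -C crc32), here is a minimal sketch using Python's standard lzma module. It is not the poster's actual procedure, which used the xz command line; the function name recompress_xz and the file paths are hypothetical, and it assumes the initrd is a plain xz stream that fits in memory once decompressed.

```python
import lzma
from pathlib import Path
from typing import Tuple


def recompress_xz(src: Path, dst: Path) -> Tuple[int, int]:
    """Recompress an xz file with preset 9 and a CRC32 integrity check.

    Roughly equivalent to `xz -d` followed by `xz -9 -C crc32`.
    Returns (original size, new size) in bytes.
    """
    # FORMAT_AUTO (the default) detects the .xz container on input.
    raw = lzma.decompress(src.read_bytes())

    # Preset 9 is the highest standard preset; CHECK_CRC32 mirrors "-C crc32".
    packed = lzma.compress(
        raw,
        format=lzma.FORMAT_XZ,
        check=lzma.CHECK_CRC32,
        preset=9,
    )
    dst.write_bytes(packed)
    return src.stat().st_size, len(packed)
```

Calling recompress_xz(Path("initrd.img"), Path("initrd-small.img")) and comparing the two returned sizes shows whether the result drops below the roughly 50 MB range reported in this thread. How much it shrinks depends on the image contents.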
hpa at zytor.com
2019-Jun-19 20:05 UTC
[syslinux] lpxelinux.0 issues with larger initrd.img files from RHEL >= 7.5 on UCS servers?
On June 19, 2019 12:21:05 PM PDT, Mathieu Chouquet-Stringer <m+syslinux at thi.eu.com> wrote:
> On Tue, Jun 18, 2019 at 05:31:17PM -0700, hpa at zytor.com wrote:
>> Which servers, what threshold, what clients, what about pxelinux.0?
>
> All affected servers so far are Cisco UCS B200 M3 blade servers.
>
> The threshold seems to be around 50MB, I haven't tested precisely but
> 54525200 bytes is enough to trigger the reboot while 49812292 isn't
> (what I did was to recompress the initrd with a higher compression
> setting).
>
> I tried today pxelinux.0 instead of lpxelinux.0 and it behaves the same
> way: if the initrd is "too big", the server reboots.
>
> What do you mean by "clients"?

Sounds like you may want to contact Cisco...

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.