On Oct 11, 2015 1:26 AM, "Michael Glasgow" <glasgow at beer.net> wrote:
>
> Gene Cumm wrote:
> > >> My test x86-64 binaries:
> > >>
> > >> https://sites.google.com/site/genecsyslinux/sl604p0g17-x64.tgz?attredirects=0&d=1
> > >
> > > On Fri, Oct 2, 2015 at 4:46 PM, Derrick M <derrick.martinez at gmail.com> wrote:
> > > This works! Fixes my issue I have been having with the DL160s
> >
> > Further testing, preferably of the above binaries, on machines that
> > previously had issues loading ldlinux.e64/ldlinux.e32 would be greatly
> > appreciated as I know you've observed this issue and this seems like
> > we might have a final resolution.
>
> I got some time to look at this today. Definitely better, but I
> think it's still broken for me on an Oracle X5-2 with latest bios
> and ilom firmware. I loaded official binaries for this test and
> replaced the two files with your patched versions.

Excellent. We're at the next phase.

> Here's the config file:
>
> DEFAULT type_INSTALL_to_begin

DEFAULT install_ovm341
SAY type install_ovm341 to begin
PROMPT 1
TIMEOUT 3000

> LABEL INSTALL_ovm341

This should be treated case insensitively and tab completion should
show it as typed.

> KERNEL mboot.c32
> APPEND media/ovm34_beta/images/pxeboot/xen.gz dom0_mem=max:128G
> dom0_max_vcpus=20 com1=57600,8n1 console=com1 ---
> media/ovm34_beta/images/pxeboot/vmlinuz console=ttyS0,57600n8 ks
> http://10.196.129.1/ks/ovm341_unmanaged.ks ---
> media/ovm34_beta/images/pxeboot/initrd.img

I'm honestly unsure if mboot.c32 works on EFI. Did you try a plain
Linux kernel yet?

> Console output:
>
> >>Checking Media Presence......
> >>Media Present......
> Downloading NBP file...
>
> Succeed to download NBP file.
> Getting cached packet
> My IP is 10.196.129.123
> Loading type_INSTALL_to_begin... failed: No such file or directory
> boot: INSTALL_ovm341
>
> [hangs while loading the xen kernel]

Thanks for the output.

> In syslog you can see it request the xen kernel, then nothing further:
>
> Oct 11 06:08:49 oosinf01 in.tftpd[72726]: RRQ from 10.196.129.123 filename efi64/mboot.c32
> Oct 11 06:08:49 oosinf01 in.tftpd[72727]: RRQ from 10.196.129.123 filename efi64/libcom32.c32
> Oct 11 06:08:49 oosinf01 in.tftpd[72728]: RRQ from 10.196.129.123 filename efi64/media/ovm34_beta/images/pxeboot/xen.gz
>
> With tcpdump you can see the pxe client suddenly stops acknowledging
> tftp packets, apparently before the server is done sending the kernel:
>
> 06:08:49.645053 IP (tos 0x0, ttl 64, id 37770, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0xc464!] UDP, length 1412
> 06:08:49.645129 IP (tos 0x0, ttl 64, id 59240, offset 0, flags [none], proto UDP (17), length 32)
>     10.196.129.123.1722 > 10.196.129.1.43197: [udp sum ok] UDP, length 4
> 06:08:49.645143 IP (tos 0x0, ttl 64, id 37771, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:08:50.646315 IP (tos 0x0, ttl 64, id 37772, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:08:52.648615 IP (tos 0x0, ttl 64, id 37773, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:08:56.652794 IP (tos 0x0, ttl 64, id 37774, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:09:04.660903 IP (tos 0x0, ttl 64, id 37775, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:09:20.677014 IP (tos 0x0, ttl 64, id 37776, offset 0, flags [none], proto UDP (17), length 1440)
>     10.196.129.1.43197 > 10.196.129.123.1722: [bad udp cksum 0x1da2 -> 0x6e43!] UDP, length 1412
> 06:09:25.689215 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.196.129.123 tell 10.196.129.1, length 28
> 06:09:25.707342 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.196.129.123 is-at 00:10:e0:71:eb:f4, length 46

Feels like a stall in mboot.c32. I'd typically consider a hang when
Ctrl-Alt-Del and ARP don't respond. I'd guess that the core filled a
buffer but mboot.c32 isn't emptying. How much of the kernel loaded?

Please try a plain Linux kernel to see if the core is flowing nicely
and that mboot.c32 is the issue. If you try to load a file over 15MB
via TFTP, please do a capture to a file. I'd like to know if your
system also exhibits the decaying IO rate.

- Was this with binaries from sl604p0g17 or sl604p0g18?
- Could you try the other also?

If you have difficulty loading a plain Linux kernel with both, please
report the following:

- Make/model of system
- UEFI firmware revision
- What NIC type and port number?
- UEFI extension agents (struggling to recall the proper term;
  comparable to a BIOS PXE OROM for add-in cards)

- Looks like you copied the console output well enough.
- I see you did a packet capture that seems valid.

--Gene
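The timing of the unacknowledged packets in the capture above can be checked with a little arithmetic. The sketch below (Python, not part of the original thread) takes the timestamps of the server's data packets with IP ids 37771 to 37776, which all show the same expected checksum 0x6e43 and so look like one block being resent, and prints the gap between consecutive sends. Treating them as retransmissions of a single block is an interpretation of the capture, not something stated in the thread.

```python
from datetime import datetime

# Timestamps of the server's data packets with IP ids 37771..37776,
# copied from the tcpdump output quoted above.
RETRANSMIT_TIMES = [
    "06:08:49.645143",
    "06:08:50.646315",
    "06:08:52.648615",
    "06:08:56.652794",
    "06:09:04.660903",
    "06:09:20.677014",
]


def gaps_in_seconds(stamps):
    """Return the delay between each consecutive pair of timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(RETRANSMIT_TIMES)
for gap in gaps:
    print(f"{gap:.3f} s")
```

The gaps come out at roughly 1, 2, 4, 8 and 16 seconds, which is the pattern one would expect from a TFTP server doubling its retransmit timeout while the client stays silent. The client still answers ARP at 06:09:25, which fits Gene's remark that this looks more like a stall than a full hang.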
Geert Stappers
2015-Oct-11 17:36 UTC
[syslinux] UEFI: Failed to load ldlinux.e64/ldlinux.e32
On Sun, Oct 11, 2015 at 10:25:32AM -0400, Gene Cumm via Syslinux wrote:
> On Oct 11, 2015 1:26 AM, "Michael Glasgow" <glasgow at beer.net> wrote:
> >
> > I got some time to look at this today. Definitely better, but I
> > think it's still broken for me on an Oracle X5-2 with latest bios
> > and ilom firmware. I loaded official binaries for this test and
> > replaced the two files with your patched versions.
>
> Excellent. We're at the next phase.
>
> > Here's the config file:
> >
> > DEFAULT type_INSTALL_to_begin
>
> DEFAULT install_ovm341
> SAY type install_ovm341 to begin
> PROMPT 1
> TIMEOUT 3000
>
> > LABEL INSTALL_ovm341
>
> This should be treated case insensitively and tab completion should show it
> as typed.
>
> > KERNEL mboot.c32
> > APPEND media/ovm34_beta/images/pxeboot/xen.gz dom0_mem=max:128G
> > dom0_max_vcpus=20 com1=57600,8n1 console=com1 ---
> > media/ovm34_beta/images/pxeboot/vmlinuz console=ttyS0,57600n8 ks
> > http://10.196.129.1/ks/ovm341_unmanaged.ks ---
> > media/ovm34_beta/images/pxeboot/initrd.img
>
> I'm honestly unsure if mboot.c32 works on EFI. Did you try a plain Linux
> kernel yet?
>
> > Console output:
> >
> > >>Checking Media Presence......
> > >>Media Present......
> > Downloading NBP file...
> >
> > Succeed to download NBP file.
> > Getting cached packet
> > My IP is 10.196.129.123

See below

> > Loading type_INSTALL_to_begin... failed: No such file or directory
> > boot: INSTALL_ovm341
> >
> > [hangs while loading the xen kernel]
>
> Thanks for the output.
>
> > In syslog you can see it request the xen kernel, then nothing further:
> >
> > Oct 11 06:08:49 oosinf01 in.tftpd[72726]: RRQ from 10.196.129.123 filename efi64/mboot.c32
> > Oct 11 06:08:49 oosinf01 in.tftpd[72727]: RRQ from 10.196.129.123 filename efi64/libcom32.c32
> > Oct 11 06:08:49 oosinf01 in.tftpd[72728]: RRQ from 10.196.129.123 filename efi64/media/ovm34_beta/images/pxeboot/xen.gz
> >
> > With tcpdump you can see the pxe client suddenly stops acknowledging
> > tftp packets, apparently before the server is done sending the kernel:
> >
>
> Feels like a stall in mboot.c32. I'd typically consider a hang when
> Ctrl-Alt-Del and ARP don't respond. I'd guess that the core filled a
> buffer but mboot.c32 isn't emptying. How much of the kernel loaded?
>
> Please try a plain Linux kernel to see if the core is flowing nicely and
> that mboot.c32 is the issue. If you try to load a file over 15MB via TFTP,
> please do a capture to a file. I'd like to know if your system also
> exhibits the decaying IO rate.
>
> - Was this with binaries from sl604p0g17 or sl604p0g18?
> - Could you try the other also?
>
> If you have difficulty loading a plain Linux kernel with both, please
> report the following:
>
> - Make/model of system
> - UEFI firmware revision
> - What NIC type and port number?
> - UEFI extension agents (struggling to recall the proper term;
> comparable to a BIOS PXE OROM for add-in cards)
>
> - Looks like you copied the console output well enough.

FWIW, I miss "disable useDefaultAddress"

Regards
Geert Stappers
--
Live and let live
Michael Glasgow
2015-Oct-11 19:15 UTC
[syslinux] UEFI: Failed to load ldlinux.e64/ldlinux.e32
Gene Cumm wrote:
> On Oct 11, 2015 1:26 AM, "Michael Glasgow" <glasgow at beer.net> wrote:
> > I got some time to look at this today. Definitely better, but I
> > think it's still broken for me on an Oracle X5-2 with latest bios
> > and ilom firmware. I loaded official binaries for this test and
> > replaced the two files with your patched versions.
>
> Excellent. We're at the next phase.

Yep, sure looks that way. Thanks for figuring this out!

> > Here's the config file:
> >
> > DEFAULT type_INSTALL_to_begin
>
> DEFAULT install_ovm341
> SAY type install_ovm341 to begin
> PROMPT 1
> TIMEOUT 3000
>
> This should be treated case insensitively and tab completion should show it
> as typed.

As an aside, I intentionally break the default in that way because
this is on a shared network. Sometimes the bios is set to boot
from pxe, and the admin doesn't realize it. So when the system
hasn't booted, they connect to the serial console to see what's
going on and typically press enter. Bear in mind, with a serial
console, they don't necessarily see what was printed before they
connected.

With the default pointing to a kickstart entry, you can end up with
a junior admin accidentally loading up a net install which wipes all
the disks. So I tend to either make the default be "boot from first
hard disk", or just break the default so they have to enter a valid
label.

One nice thing about setting the default this way is that the message
gets repeated every time you press enter as part of the "not found"
error. This makes it a bit more obvious what state the machine is
in when you've freshly connected to the serial console. Crude, but
effective.

> I'm honestly unsure if mboot.c32 works on EFI. Did you try a plain Linux
> kernel yet?

I had not, purely because when I sat down to try this I really
needed a Xen host.

> Please try a plain Linux kernel to see if the core is flowing nicely and
> that mboot.c32 is the issue. If you try to load a file over 15MB via TFTP,
> please do a capture to a file. I'd like to know if your system also
> exhibits the decaying IO rate.

I just tried installing Oracle Linux 7.1, and it works!!!

I'm not sure what the decaying i/o issue looks like. It's a bit
slow loading the initrd, but I think the efi drivers are just slow
in general. Just in case, I went ahead and did a capture on the
g18 patch loading OL 7.1, which you can grab from here:

http://www.beer.net/m/etc/sl604p0g18.pcap.gz

[back to the xen attempt]

> Feels like a stall in mboot.c32. I'd typically consider a hang when
> Ctrl-Alt-Del and ARP don't respond. I'd guess that the core filled a
> buffer but mboot.c32 isn't emptying. How much of the kernel loaded?

Seems like most of it loaded, but I didn't count. Looks like you're
probably right about where the issue is, though, since plain linux
works.

> - Was this with binaries from sl604p0g17 or sl604p0g18?
> - Could you try the other also?

Same failure mode for mboot / xen on both patches.
Success for plain linux on both patches.

Let me know what kind of info would be useful for debugging the
mboot issue, and I will collect it all. Also, EFI is still somewhat
new to me, so I don't know how to find the UEFI firmware revision
or UEFI extension agents. I'll see if it's in the server manual,
but if you have an idea how one typically finds those, I'd appreciate
any pointers.

Thanks again!

--
Michael Glasgow <glasgow at beer.net>
> Let me know what kind of info would be useful for debugging the
> mboot issue, and I will collect it all. Also, EFI is still somewhat
> new to me, so I don't know how to find the UEFI firmware revision
> or UEFI extension agents. I'll see if it's in the server manual,
> but if you have an idea how one typically finds those, I'd appreciate
> any pointers.
>
> Thanks again!
>
> --
> Michael Glasgow <glasgow at beer.net>

Any method useful for a "legacy" firmware (aka "BIOS"), should be
available for UEFI. Some possible methods (please read all of them):

_ Shown in a "normal" POST screen during boot, and usually shown also
  in the firmware setup screens. KISS :).

_ Using some tool that retrieves DMI data; although such tool should
  rather have its database updated for new-ish hardware.

_ hdt.c32. Erwan Velu has updated the hdt code recently, some of it
  for DMI. For these updates to be effective, it needs to be re-built
  from current git HEAD together with the core module, the library
  modules and the bootloader. Whether it would then work correctly
  (under UEFI), I do not know.

_ Using the SYSAPPEND directive in the Syslinux configuration file
  (hint: then check the resulting generated Syslinux cookie for the
  relevant client system, or run 'cat /proc/cmdline' in the booted
  system). If the server boots with Syslinux, this method would apply
  to it too.

A possible SYSAPPEND statement could be:

 SYSAPPEND 0x18000

or its decimal alternative (in case the hex value fails for some
reason):

 SYSAPPEND 98304

should provide:

 BIOSVENDOR=   BIOS vendor name
 BIOSVERSION=  BIOS version

Regards,
Ady.

PS: Could the issue with booting Xen be not related to mboot.c32, but
rather to something else (for instance, to something like the current
"keeppxe" failing)?

> _______________________________________________
> Syslinux mailing list
> Submissions to Syslinux at zytor.com
> Unsubscribe or set options at:
> http://www.zytor.com/mailman/listinfo/syslinux
>
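As a quick check of the SYSAPPEND suggestion above, the following Python sketch (not part of the original thread) confirms that the hex and decimal spellings are the same number and shows one way to pull the two firmware fields out of a kernel command line. The helper name `firmware_fields` and the sample command line are hypothetical, and the parser assumes the values contain no whitespace.

```python
# The two SYSAPPEND spellings from the message above.
SYSAPPEND_HEX = 0x18000
SYSAPPEND_DEC = 98304


def firmware_fields(cmdline):
    """Extract BIOSVENDOR= and BIOSVERSION= entries from a command line.

    Hypothetical helper: splits on whitespace, so it assumes each
    value is a single token.
    """
    wanted = ("BIOSVENDOR", "BIOSVERSION")
    fields = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            fields[key] = value
    return fields


# Hypothetical command line; real values depend on the client firmware.
sample = ("initrd=initrd.img BIOSVENDOR=ExampleVendor "
          "BIOSVERSION=1.2.3 console=ttyS0,57600n8")

print(SYSAPPEND_HEX == SYSAPPEND_DEC)
print(firmware_fields(sample))
```

On a system booted with that SYSAPPEND setting, the same helper could be fed the contents of /proc/cmdline instead of the sample string.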
On Sun, Oct 11, 2015 at 3:15 PM, Michael Glasgow <glasgow at beer.net> wrote:
> I'm not sure what the decaying i/o issue looks like. It's a bit
> slow loading the initrd, but I think the efi drivers are just slow
> in general. Just in case, I went ahead and did a capture on the
> g18 patch loading OL 7.1, which you can grab from here:
>
> http://www.beer.net/m/etc/sl604p0g18.pcap.gz

Just like that although this one maintains a reasonable IO rate.

Wireshark, Statistics, IO Graph, "udp.dstport == 1719", bits/tick.
Look at both 1 second per tick and 5 pixels per tick and then 0.1
seconds per tick and 1 pixel per tick.

--
-Gene
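For readers without Wireshark at hand, the bits/tick view described above amounts to summing packet sizes into fixed-width time buckets. The Python sketch below (not part of the original thread) illustrates the idea. The function name `bits_per_tick` and the sample packets are hypothetical, and reading the actual pcap file is left out; the input is assumed to be (timestamp, length) pairs already filtered to the transfer of interest.

```python
from collections import Counter


def bits_per_tick(packets, tick_seconds):
    """Sum packet sizes, in bits, into fixed-width time buckets.

    packets: iterable of (timestamp_seconds, length_bytes) tuples.
    Returns a dict mapping tick index to the bits seen in that tick.
    """
    if tick_seconds <= 0:
        raise ValueError("tick_seconds must be positive")
    totals = Counter()
    for timestamp, length in packets:
        totals[int(timestamp // tick_seconds)] += length * 8
    return dict(totals)


# Hypothetical packets: three 1412-byte blocks in the first second,
# then one in the next. A decaying rate would show up as shrinking
# totals in later ticks.
sample = [(0.00, 1412), (0.05, 1412), (0.95, 1412), (1.25, 1412)]

print(bits_per_tick(sample, 1.0))
```

Running it at both a coarse tick (1 second) and a fine one (0.1 seconds) mirrors the two views suggested above: the coarse view shows the overall trend, the fine view shows stalls between bursts.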