Alexander Perlis
2014-Nov-15 20:22 UTC
[syslinux] iPXE chain to lpxelinux.0 6.03 inconsistencies and failures
On 15 Nov 2014 05:06:52 +0200, Ady wrote:
>
> I would start by updating the BIOS.

Prudent advice. As it turns out, I'm already at the latest version.

On 15 Nov 2014 07:31:27 +0100, Geert Stappers wrote:
>
> And would reduce 'iPXE => pxe.0 => lpxelinux.0 => "vmlinux"'
> into 'iPXE => "vmlinux"'

That makes sense generally, but at the moment doesn't make sense for my particular circumstance.

I should clarify: I do not seek a workaround that eliminates iPXE or eliminates lpxelinux.0; instead, since I have a test combination that exposes a bug somewhere in iPXE or lpxelinux.0 (or both), I'd like to use this opportunity to assist the developers in getting that fixed.

Any iPXE or lpxelinux.0 developers who want to make the code more robust? What can I do to isolate the bug?

Thanks all,
Alex
Geert Stappers
2014-Nov-15 21:43 UTC
[syslinux] iPXE chain to lpxelinux.0 6.03 inconsistencies and failures
On Sat, Nov 15, 2014 at 02:22:03PM -0600, Alexander Perlis wrote:
> On 15 Nov 2014 05:06:52 +0200, Ady wrote:
> >
> >I would start by updating the BIOS.
>
> Prudent advice. As it turns out, I'm already at the latest version.
>
>
> On 15 Nov 2014 07:31:27 +0100, Geert Stappers wrote:
> >
> >And would reduce 'iPXE => pxe.0 => lpxelinux.0 => "vmlinux"'
> >into 'iPXE => "vmlinux"'
>
> That makes sense generally, but at the moment doesn't make sense for
> my particular circumstance.
>
> I should clarify: I do not seek a workaround that eliminates iPXE or
> eliminates lpxelinux.0; instead, since I have a test combination
> that exposes a bug somewhere in iPXE or lpxelinux.0 (or both), I'd
> like to use this opportunity to assist the developers in getting
> that fixed.
>
> Any iPXE or lpxelinux.0 developers who want to make the code more
> robust? What can I do to isolate the bug?

Quoting the original posting:

| I boot to a USB stick with iPXE, which then is told to "dhcp" and then
| "chain http://xxx.xxx.xxx.xxx/pxe.0". That loads a version of
| lpxelinux.0 6.03 that is configured (via pxelinux-options) with an
| appropriate next-server, path-prefix, and config-file.
|
| This all works great on a lot of different machines.
|
| But specifically on the Dell Optiplex GX620 and Optiplex 645, which have
| built-in Broadcom ethernet (the GX620 has 14e4:1677 [1028:01ad], and the
| 645 has 14e4:167a [1028:01da]), there's a problem: first lpxelinux.0 is
| correctly transferred, then control is indeed handed to lpxelinux.0
| because the "PXELINUX 6.03 lwIP 2014-10-06" banner indeed appears, but
| then the computer appears to be frozen, although it eventually says
| "Failed to load ldlinux.c32". (At the server end there were no requests
| to transfer anything.)

And what is visible with a network sniffer (tcpdump, tshark, wireshark) at the client?

| This can be further isolated to the built-in Broadcom ethernet (as
| opposed to something else on the GX620 or 645) as follows: if on that
| same hardware I insert a Linksys PCI card, and move the network cable to
| that iPXE will DHCP & chain via that card, then there is no problem and
| I end up at the graphical vesamenu.
|
|
| Now my question: where more specifically is the bug? What can I do to
| help a developer isolate this?
|
| For example, there could be a bug in the iPXE driver for the Broadcom
| ethernet, a bug that doesn't affect iPXE's ability to load lpxelinux.0,
| but then *does* affect lpxelinux.0's ability to ask iPXE to load the
| next component. Or there could be a bug in lpxelinux.0, such as memory
| management or stack management, which is simply being triggered by say
| iPXE's Broadcom driver being say a different size than perhaps that of
| most other drivers. Or who knows. (In case it helps: Back in July, Gene
| posted that the problem may be related to commit 0c1dff8d.)
|
| I'm happy to do testing, run a custom debug build and report output, or
| whatever might help. Just need some pointers as to what to do. Any iPXE
| or lpxelinux.0 developer is welcome to contact me.

For a "pointer":
http://www.syslinux.org/wiki/index.php/Development/Debugging#Syslinux_Dynamic_Debugger

Regards
Geert Stappers
--
Live and let live
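
As a rough illustration of the sniffer suggestion, here is a minimal Python sketch that assembles a tcpdump command line for recording the failing client's traffic. The MAC address, interface name, and output filename are placeholders, not values from this thread. Since the client is sitting in a pre-boot environment, the capture would presumably have to run on a machine that can see its frames (the boot server, or a host on a mirrored switch port). The filter also keeps DHCP traffic on UDP ports 67 and 68 so that broadcast replies from the server are not missed.

```python
import shlex
import string


def normalize_mac(mac):
    """Validate a colon-separated MAC address and return it in lower case."""
    parts = mac.strip().lower().split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated octets: %r" % mac)
    for part in parts:
        if len(part) != 2 or any(c not in string.hexdigits for c in part):
            raise ValueError("bad octet %r in MAC %r" % (part, mac))
    return ":".join(parts)


def build_capture_command(client_mac, interface="eth0", outfile="pxe-client.pcap"):
    """Return a tcpdump argv that records the client's frames plus DHCP traffic.

    interface and outfile are hypothetical defaults; adjust to the capture host.
    """
    mac = normalize_mac(client_mac)
    capture_filter = "ether host %s or udp port 67 or udp port 68" % mac
    return [
        "tcpdump",
        "-i", interface,   # interface to listen on
        "-n",              # do not resolve addresses to names
        "-s", "0",         # capture whole packets, not just headers
        "-w", outfile,     # write raw packets for later viewing in wireshark
        capture_filter,
    ]


if __name__ == "__main__":
    # Placeholder MAC address; substitute the Broadcom NIC's real address.
    print(shlex.join(build_capture_command("00:11:22:AA:BB:CC")))
```

Comparing a capture taken with the built-in Broadcom NIC against one taken with the Linksys card should show whether any request for ldlinux.c32 ever leaves the client.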
Ady
2014-Nov-16 01:35 UTC
[syslinux] iPXE chain to lpxelinux.0 6.03 inconsistencies and failures
> On 15 Nov 2014 05:06:52 +0200, Ady wrote:
> >
> > I would start by updating the BIOS.
>
> Prudent advice. As it turns out, I'm already at the latest version.
>
>
> On 15 Nov 2014 07:31:27 +0100, Geert Stappers wrote:
> >
> > And would reduce 'iPXE => pxe.0 => lpxelinux.0 => "vmlinux"'
> > into 'iPXE => "vmlinux"'
>
> That makes sense generally, but at the moment doesn't make sense for my
> particular circumstance.
>
> I should clarify: I do not seek a workaround that eliminates iPXE or
> eliminates lpxelinux.0; instead, since I have a test combination that
> exposes a bug somewhere in iPXE or lpxelinux.0 (or both), I'd like to
> use this opportunity to assist the developers in getting that fixed.
>
> Any iPXE or lpxelinux.0 developers who want to make the code more
> robust? What can I do to isolate the bug?
>
> Thanks all,
> Alex

Last July you seemed to be successful with a similar Optiplex GX620 (see [1]).

The typical questions (after the one about the BIOS):

_ What about trying a cold boot?
_ Are you using official prebuilt binaries downloaded from kernel.org?
_ Have you updated *all* Syslinux-related files (including ldlinux.c32)?
_ Have you tried resetting the BIOS to default settings, then rebooting and reconfiguring the BIOS again? And the NIC's ROM settings?

Let's see some of the components of your current setup:

_ ipxe
_ chainloading
_ lpxelinux
_ pxelinux-options
_ the specific built-in Broadcom (this seems to be already an identified issue)

Isolating each part could narrow down the problem. So, for example:

_ Is there some update available for the Broadcom that is not yet part of the main BIOS version?
_ Have you tested other settings, like intentionally using a different (fixed) Ethernet speed (10/100/1000)?
_ Do you happen to have some other system (even another Optiplex GX620) with the same (or as similar as possible) Broadcom? Is the same behavior seen in those similar systems?
_ Have you contacted Dell about this behavior in this particular system?
_ Is there some newer / testing version of ipxe / drivers with which to test the whole process again?
_ Can you boot the same kernel from ipxe directly (without chainloading to lpxelinux.0)?
_ What happens if you change lpxelinux.0 and use pxelinux.0 instead?
_ Do you have some way to test the whole chain without using the 'pxelinux-options' tool?
_ Do you have some way to test (l)pxelinux.0 on this system without ipxe?

I should quote one question that was just asked by Geert:

_ And what is visible with a network sniffer (tcpdump, tshark, wireshark) at the client?

Some of these questions are somewhat rhetorical, meant as a trigger for you to isolate the problem.

For the cases with a potentially slower transfer (e.g. pxelinux.0), you don't actually need to load the kernel; all you need is for ldlinux.c32 to load successfully, followed by a simple configuration file. Even before loading a kernel, if you get to the configuration file (e.g. it shows a Syslinux "boot:" prompt) then that's already a success in the current context.

Since the issue is seen on specific hardware, the only way to narrow down the source of the problem is by isolating each of the aforementioned parts of your process (and replicating the behavior). Although apparently the behavior is seen between lpxelinux.0 and ldlinux.c32, the culprit could be somewhere else too (e.g. NIC's ROM, some timer, some BIOS setting, some "energy-saving" thingy, faulty hardware...).

Please let us know of any updates.

Thank you,
Ady.

[1] http://www.syslinux.org/archives/2014-July/022487.html
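
To make the "isolate each part" idea concrete, here is a small Python sketch that enumerates the combinations one might test and, once each run has been marked pass or fail by hand, lists the component choices that appear only in failing runs. The factor names and labels (for example "no-ipxe") are made up for illustration, loosely following the questions above; nothing here talks to iPXE or Syslinux.

```python
from itertools import product

# Hypothetical labels for the parts of the boot chain discussed above.
FACTORS = {
    "nic": ["broadcom-onboard", "linksys-pci"],
    "entry": ["ipxe-chain", "no-ipxe"],
    "loader": ["lpxelinux.0", "pxelinux.0"],
    "options": ["pxelinux-options", "no-pxelinux-options"],
}


def build_matrix(factors=FACTORS):
    """Return one dict per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def values_only_in_failures(results):
    """Given (config, passed) pairs, return the (factor, value) pairs that
    occur in at least one failing run and in no passing run."""
    seen_pass, seen_fail = set(), set()
    for config, passed in results:
        (seen_pass if passed else seen_fail).update(config.items())
    return sorted(seen_fail - seen_pass)


if __name__ == "__main__":
    # Print a checklist of runs to perform; success means reaching "boot:".
    for number, config in enumerate(build_matrix(), start=1):
        print(number, config)
```

A run counts as passed if it reaches the Syslinux "boot:" prompt, as noted above. This simple check only flags a single component that fails in every run it takes part in; a problem that needs two components together (say, the Broadcom NIC and iPXE chainloading) would show up as an empty list and has to be read off the table of results directly.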