Gene Cumm
2014-Jul-12 19:24 UTC
[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16
On Sat, Jul 12, 2014 at 3:15 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
> On 07/11/2014 09:39 PM, Gene Cumm wrote:
>>
>> With everything else from 6.03-pre18, try this binary (xzip-compressed):
>> http://www.zytor.com/~genec/lpxelinux-6.03p18g3.tgz
>
> It works! Thanks!
>
> Anything else I should do/report on this hardware before I upgrade the BIOS
> and possibly eliminate the broken environment that exhibited the problem?
>
> Alex

Oh fun. That's the exact same workaround just now for different hardware. I can't think of anything but not sure if there's anything that someone else might think of.

--
-Gene
Alexander Perlis
2014-Jul-12 19:38 UTC
[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16
On 07/12/2014 02:24 PM, Gene Cumm wrote:
> On Sat, Jul 12, 2014 at 3:15 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
>> On 07/11/2014 09:39 PM, Gene Cumm wrote:
>>>
>>> With everything else from 6.03-pre18, try this binary (xzip-compressed):
>>> http://www.zytor.com/~genec/lpxelinux-6.03p18g3.tgz
>>
>> It works! Thanks!
>
> Oh fun. That's the exact same workaround just now for different
> hardware.

I'm curious about the nature of the workaround. I'm guessing a bug in how the Dell NIC firmware handles ARP packets, and somehow you work around that?

I did look at some packet traces with 6.03p18g3, and noticed some more unexpected ARP behavior (see below), which may indicate more things to be worked around?

This is with "stock" 6.03p18g3 (no pxelinux-options changes): After all the TFTP transfers are complete (at 1.3 seconds into the conversation), there is mostly silence, but at 5.3s, 6.3s, and 7.3s the PXE server makes an ARP request to the Optiplex990 client asking for the MAC (even though it already knows it since the ARP request isn't a broadcast but targeted to the client's MAC). These three requests are seemingly not answered by the client, and there is seemingly no further communication (I waited a few minutes).

If I instead use pxelinux-options to set an HTTP prefix in 6.03p18g3, then the initial TFTP transfer followed by all the HTTP data transfers are complete after 1.1 seconds, then I see some FIN/ACK closing of one HTTP connection at 3.5 seconds, another FIN/ACK HTTP closing at 9 seconds, another at 20 seconds, and one more at 42 seconds, then at 47,48,49 seconds I again see the PXE server making those three ARP requests targeted to the Optiplex990 client asking it for its MAC, no response by the client, silence for a while, then at 85 seconds the client sends another FIN/ACK to the server on port 80, and now at 85,86,87 seconds the server makes a *broadcast* ARP request searching for the MAC of the client, and no one answers this and there follows only silence.

I'm guessing the earlier targeted ARP requests were to update a stale but not yet expired ARP entry in the server, in preparation for the server to say _something_ to the client, and the latter broadcast ARP requests are because the ARP entry is gone but the server still has something it wishes to say to the client.

I report this just in case you see something that concerns you and you wish to make more changes to your workaround. Certainly if I don't look at packet traces, and just wait for my vesamenu to come up, it does indeed come up, and I'm happy. :)

Alex
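
The distinction drawn above between a targeted (unicast) ARP request and a broadcast one can be checked mechanically in a capture. What follows is only a rough sketch, assuming raw Ethernet II frames carrying IPv4-over-Ethernet ARP; the function names, MAC addresses, and IP addresses are hypothetical and do not come from the traces discussed in this thread.

```python
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def build_arp_request(dst_mac, src_mac, src_ip, target_ip):
    """Build a raw Ethernet frame holding an ARP request (illustrative helper)."""
    eth = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, ARP_REQUEST,
                      src_mac, src_ip, b"\x00" * 6, target_ip)
    return eth + arp


def classify_arp_request(frame):
    """Return 'broadcast', 'unicast', or 'not-arp-request' for a raw frame."""
    if len(frame) < 14 + 28:  # Ethernet header + ARP payload
        return "not-arp-request"
    dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return "not-arp-request"
    _htype, _ptype, _hlen, _plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if oper != ARP_REQUEST:
        return "not-arp-request"
    # The Ethernet destination decides whether the request was targeted.
    return "broadcast" if dst_mac == BROADCAST_MAC else "unicast"


# Hypothetical addresses standing in for the PXE server and the client.
server_mac = bytes.fromhex("020000000001")
client_mac = bytes.fromhex("020000000002")
server_ip = bytes([192, 0, 2, 1])
client_ip = bytes([192, 0, 2, 50])

targeted = build_arp_request(client_mac, server_mac, server_ip, client_ip)
broadcast = build_arp_request(BROADCAST_MAC, server_mac, server_ip, client_ip)
print(classify_arp_request(targeted), classify_arp_request(broadcast))
```

A targeted request is what a host typically sends to revalidate an ARP entry it still holds, while a broadcast is what it falls back to once the entry is gone, which matches the guess made in the message above.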
Gene Cumm
2014-Jul-12 20:26 UTC
[syslinux] lpxelinux hangs under Intel Boot Agent 1.3.81 (2.1 build 089) on Dell Optiplex 990 BIOS A16
On Sat, Jul 12, 2014 at 3:38 PM, Alexander Perlis <aperlis at math.lsu.edu> wrote:
> On 07/12/2014 02:24 PM, Gene Cumm wrote:
>>
>> On Sat, Jul 12, 2014 at 3:15 PM, Alexander Perlis <aperlis at math.lsu.edu>
>> wrote:
>>>
>>> On 07/11/2014 09:39 PM, Gene Cumm wrote:
>>>>
>>>>
>>>> With everything else from 6.03-pre18, try this binary (xzip-compressed):
>>>> http://www.zytor.com/~genec/lpxelinux-6.03p18g3.tgz
>>>
>>>
>>> It works! Thanks!
>>
>>
>> Oh fun. That's the exact same workaround just now for different
>> hardware.
>
>
> I'm curious about the nature of the workaround. I'm guessing a bug in how
> the Dell NIC firmware handles ARP packets, and somehow you work around that?

Far simpler: their UNDI/PXE reports that interrupts should work, but they never trigger, so we need to force polling.

> I did look at some packet traces with 6.03p18g3, and noticed some more
> unexpected ARP behavior (see below), which may indicate more things to be
> worked around?
>
> This is with "stock" 6.03p18g3 (no pxelinux-options changes): After all the
> TFTP transfers are complete (at 1.3 seconds into the conversation), there is
> mostly silence, but at 5.3s, 6.3s, and 7.3s the PXE server makes an ARP
> request to the Optiplex990 client asking for the MAC (even though it already
> knows it since the ARP request isn't a broadcast but targeted to the
> client's MAC). These three requests are seemingly not answered by the
> client, and there is seemingly no further communication (I waited a few
> minutes).
>
> If I instead use pxelinux-options to set an HTTP prefix in 6.03p18g3, then
> the initial TFTP transfer followed by all the HTTP data transfers are
> complete after 1.1 seconds, then I see some FIN/ACK closing of one HTTP
> connection at 3.5 seconds, another FIN/ACK HTTP closing at 9 seconds,
> another at 20 seconds, and one more at 42 seconds, then at 47,48,49 seconds
> I again see the PXE server making those three ARP requests targeted to the
> Optiplex990 client asking it for its MAC, no response by the client, silence
> for a while, then at 85 seconds the client sends another FIN/ACK to the
> server on port 80, and now at 85,86,87 seconds the server makes a
> *broadcast* ARP request searching for the MAC of the client, and no one
> answers this and there follows only silence.

Unnecessary repeat traffic and ignoring ARP. Interesting.

> I'm guessing the earlier targeted ARP requests were to update a stale but
> not yet expired ARP entry in the server, in preparation for the server to
> say _something_ to the client, and the latter broadcast ARP requests are
> because the ARP entry is gone but the server still has something it wishes
> to say to the client.
>
> I report this just in case you see something that concerns you and you wish
> to make more changes to your workaround. Certainly if I don't look at packet
> traces, and just wait for my vesamenu to come up, it does indeed come up,
> and I'm happy. :)

Probably no changes to the workaround, but I'll try to take a look to see if I see similar behaviors on other machines that don't need this workaround.

--
-Gene
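
The failure mode described above (firmware advertises working interrupts that never fire, so the receive path must be forced to poll) can be modelled in a few lines. This is a toy simulation only, not Syslinux's actual code or the real UNDI API; `FakeNic`, `receive`, and the `force_polling` flag are hypothetical names invented for illustration.

```python
from collections import deque


class FakeNic:
    """Hypothetical NIC model: may claim IRQ support that never triggers."""

    def __init__(self, claims_irq, irq_works):
        self.claims_irq = claims_irq
        self.irq_works = irq_works
        self.rx_queue = deque()
        self.irq_pending = False

    def deliver(self, packet):
        """Simulate a packet arriving from the wire."""
        self.rx_queue.append(packet)
        if self.irq_works:
            self.irq_pending = True  # a healthy NIC raises an interrupt

    def poll(self):
        """Check the receive queue directly, without relying on an interrupt."""
        return self.rx_queue.popleft() if self.rx_queue else None


def receive(nic, force_polling=False, max_ticks=10):
    """Return one packet, or None if nothing is seen within max_ticks."""
    use_irq = nic.claims_irq and not force_polling
    for _ in range(max_ticks):
        if use_irq:
            if not nic.irq_pending:
                continue  # keep waiting for an interrupt that may never come
            nic.irq_pending = False
        packet = nic.poll()
        if packet is not None:
            return packet
    return None  # models the hang: data is queued but never noticed


# A NIC that reports interrupts as usable although they never fire.
broken = FakeNic(claims_irq=True, irq_works=False)
broken.deliver(b"tftp-data")
stalled = receive(broken)                      # trusts the IRQ claim
recovered = receive(broken, force_polling=True)
print(stalled, recovered)
```

In this model the packet sits in the queue the whole time; only the decision to trust the advertised interrupt capability keeps the receiver from seeing it, which is why forcing polling is enough to unblock it.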