John Kennedy
2015-Apr-13 19:38 UTC
[syslinux] syslinux.efi problem TFTPing ldlinux.e64 on hyper-v gen2 netboot
I've found a problem that I think is in syslinux, specifically when it tries
to load ldlinux.e64 under Hyper-V.
I have a physical computer with a Gigabyte GA-P85-D3 motherboard that
successfully pxeboots using the EFI images, and I hold that up as exhibit 1 that
my deployment environment works. It also implies that the syslinux.efi image
works in some cases, but not in all.
My Hyper-V (manager) version identifies itself as 6.3.9600.16384 running on
Windows 8.1.
The VM is "Version 5.0, Generation 2" (via VM summary), using the only
available network adapter.
Secure Boot is disabled (although if it were enabled, it probably would have
burned me already when loading the initial syslinux.efi).
The syslinux.efi and ldlinux.e64 are coming from syslinux-6.03.tar.gz (from
www.kernel.org).
Here is the boot sequence for the physical system, which works:
Apr 8 06:44:01 localhost dhcpd: DHCPDISCOVER from 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:01 localhost dhcpd: DHCPOFFER on 10.0.1.70 to 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost dhcpd: DHCPREQUEST for 10.0.1.70 (10.0.1.241) from 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost dhcpd: DHCPACK on 10.0.1.70 to 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost in.tftpd[4696]: RRQ from 10.0.1.70 filename syslinux.efi
Apr 8 06:44:05 localhost in.tftpd[4696]: tftp: client does not accept options
Apr 8 06:44:05 localhost in.tftpd[4697]: RRQ from 10.0.1.70 filename syslinux.efi
Apr 8 06:44:06 localhost in.tftpd[4698]: RRQ from 10.0.1.70 filename ldlinux.e64
Apr 8 06:44:06 localhost in.tftpd[4699]: RRQ from 10.0.1.70 filename pxelinux.cfg/7402d403-3504-b305-4506-f10700080009
Apr 8 06:44:06 localhost in.tftpd[4700]: RRQ from 10.0.1.70 filename pxelinux.cfg/01-74-d4-35-b3-45-f1
Apr 8 06:44:07 localhost in.tftpd[4701]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A000146
Apr 8 06:44:07 localhost in.tftpd[4702]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A00014
Apr 8 06:44:07 localhost in.tftpd[4703]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A0001
Apr 8 06:44:07 localhost in.tftpd[4704]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A000
Apr 8 06:44:07 localhost in.tftpd[4705]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A00
Apr 8 06:44:07 localhost in.tftpd[4706]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A0
Apr 8 06:44:07 localhost in.tftpd[4707]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A
Apr 8 06:44:07 localhost in.tftpd[4708]: RRQ from 10.0.1.70 filename pxelinux.cfg/0
Apr 8 06:44:07 localhost in.tftpd[4709]: RRQ from 10.0.1.70 filename pxelinux.cfg/default
Apr 8 06:44:08 localhost in.tftpd[4710]: RRQ from 10.0.1.70 filename vmlinuz
Apr 8 06:44:21 localhost in.tftpd[4711]: RRQ from 10.0.1.70 filename initrd.img
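For anyone reading along, the pxelinux.cfg/ requests in the working log follow
a fixed pattern: client UUID, then "01-" plus the MAC, then the client IP in
uppercase hex with one digit dropped per attempt, then "default". Here is a
minimal Python sketch that reproduces that list for a given client. The
function name and arguments are mine (hypothetical, not anything from the
syslinux sources), and it only covers IPv4:

```python
import ipaddress


def pxelinux_config_candidates(ip, mac, uuid=None):
    """Return config file names in the order seen in the TFTP log above.

    Hypothetical helper for illustration; it is not part of syslinux.
    """
    names = []
    if uuid:
        # The client UUID is tried first, if the firmware provides one.
        names.append(uuid.lower())
    # "01" is the ARP hardware type for Ethernet, followed by the MAC
    # in lowercase with dashes instead of colons.
    names.append("01-" + mac.lower().replace(":", "-"))
    # The IPv4 address as 8 uppercase hex digits, then progressively
    # shorter prefixes of it.
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    for length in range(8, 0, -1):
        names.append(hex_ip[:length])
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


if __name__ == "__main__":
    for path in pxelinux_config_candidates(
        "10.0.1.70",
        "74:d4:35:b3:45:f1",
        uuid="7402d403-3504-b305-4506-f10700080009",
    ):
        print(path)
```

The Hyper-V VM never reaches this stage, so none of these names show up in its
log.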
The failing system (the Hyper-V VM) only gets as far as *starting* to download
ldlinux.e64:
Apr 8 08:18:26 localhost dhcpd: DHCPDISCOVER from 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:26 localhost dhcpd: DHCPOFFER on 10.10.0.71 to 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost dhcpd: DHCPREQUEST for 10.10.0.71 (10.10.0.1) from 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost dhcpd: DHCPACK on 10.10.0.71 to 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost in.tftpd[3662]: RRQ from 10.10.0.71 filename syslinux.efi
Apr 8 08:18:30 localhost in.tftpd[3662]: tftp: client does not accept options
Apr 8 08:18:30 localhost in.tftpd[3663]: RRQ from 10.10.0.71 filename syslinux.efi
Apr 8 08:18:30 localhost in.tftpd[3664]: RRQ from 10.10.0.71 filename ldlinux.e64
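To compare the two runs side by side, a rough Python sketch like the following
can pull the requested filenames out of the in.tftpd entries, grouped by
client IP. It assumes each syslog entry is on a single line, as in the
server's own log file, and that the entries look like the ones quoted here;
the function name is hypothetical:

```python
import re
from collections import defaultdict

# Matches entries such as:
#   ... in.tftpd[3664]: RRQ from 10.10.0.71 filename ldlinux.e64
RRQ_RE = re.compile(r"in\.tftpd\[\d+\]: RRQ from (\S+) filename (\S+)")


def requests_by_client(lines):
    """Map each client IP to the list of filenames it requested, in order."""
    seen = defaultdict(list)
    for line in lines:
        match = RRQ_RE.search(line)
        if match:
            client, filename = match.groups()
            seen[client].append(filename)
    return dict(seen)


if __name__ == "__main__":
    import sys

    # Usage (log path is whatever your syslog writes to):
    #   python rrq_summary.py < logfile
    for client, files in requests_by_client(sys.stdin).items():
        print(client, "->", ", ".join(files))
```

The last filename listed for a client is the last file it asked for, which for
the VM is ldlinux.e64.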
I say *starting* because the client only sends the initial TFTP packet and then
times out (which is enough to create the log entry). I've captured packets from
the TFTP server's perspective, but haven't been able to packet-sniff on the VM
side (no port-mirroring ability that I've found). I also haven't figured out
how to rebuild syslinux from the Git sources in a way that shows what is going
on from syslinux's perspective, or at least cranks up the verbosity.
The bad ldlinux.e64 TFTP attempt sends the first packet (the TFTP read request)
to the TFTP server, and the TFTP server replies with the option acknowledgement
(OACK), which it retransmits periodically. The TFTP server never gets the
follow-up request for blocks (after an OACK, that would be the client's ACK for
block 0). Since the code is the same as on the physical system and the basic
information provided to it is correct, I can only assume that something in the
Hyper-V environment is causing syslinux a problem.
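To make the expected exchange concrete, here is a minimal Python sketch of the
three packets involved, following the TFTP option extension (RFC 2347): the
read request with options, the server's OACK, and the ACK for block 0 that the
client is supposed to send next. The helper names are hypothetical, and the
option names and values in the usage example are only illustrative; I have not
confirmed which options syslinux.efi actually requests:

```python
import struct

# TFTP opcodes (RFC 1350, plus OACK from RFC 2347).
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR, OP_OACK = 1, 3, 4, 5, 6


def build_rrq(filename, options=None, mode="octet"):
    """Build a read request, optionally carrying name/value options."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\0" + str(value).encode("ascii") + b"\0"
    return packet


def parse_oack(packet):
    """Return the options the server acknowledged, as a dict of strings."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet (opcode %d)" % opcode)
    fields = packet[2:].split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL terminator
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields) - 1, 2)
    }


def build_ack(block):
    """Build an ACK; for a read request, ACK 0 is the reply to an OACK."""
    return struct.pack("!HH", OP_ACK, block)


if __name__ == "__main__":
    print(build_rrq("ldlinux.e64", {"tsize": 0}))
    print(parse_oack(b"\x00\x06tsize\x001234\x00blksize\x001408\x00"))
    print(build_ack(0))
```

In my server-side capture, the first two packets are present and the third one
never arrives.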
I tried some printf-quality hacking on the Git sources. syslinux.efi seems to
have all the right information (TFTP server, IP, subnet mask), which is
confirmed by the first packet making it to the server. To me, it looked like it
was hanging in the fread() while loading the module, but I don't trust my
build (the hacked-up version wouldn't boot even on the physical system).
Any suggestions on how to crack this problem open some more?
