John Kennedy
2015-Apr-13 19:38 UTC
[syslinux] syslinux.efi problem TFTPing ldlinux.e64 on hyper-v gen2 netboot
I've found a problem, and I think it is in syslinux (specifically when it tries to load ldlinux.e64 under Hyper-V).

I have a physical computer with a Gigabyte GA-P85-D3 motherboard that successfully PXE-boots using the EFI images, and I hold that up as exhibit 1 that my deployment environment works. It also implies that the syslinux.efi image works in some cases, but not in all.

My Hyper-V (manager) version identifies itself as 6.3.9600.16384 running on Windows 8.1. The VM is "Version 5.0, Generation 2" (via VM summary), using the only available network adapter. Secure boot is disabled (although that probably would have burned me loading the initial syslinux.efi).

The syslinux.efi and ldlinux.e64 are coming from syslinux-6.03.tar.gz (from www.kernel.org).

Here is a boot sequence for the physical system that works:

Apr 8 06:44:01 localhost dhcpd: DHCPDISCOVER from 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:01 localhost dhcpd: DHCPOFFER on 10.0.1.70 to 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost dhcpd: DHCPREQUEST for 10.0.1.70 (10.0.1.241) from 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost dhcpd: DHCPACK on 10.0.1.70 to 74:d4:35:b3:45:f1 via eth0
Apr 8 06:44:05 localhost in.tftpd[4696]: RRQ from 10.0.1.70 filename syslinux.efi
Apr 8 06:44:05 localhost in.tftpd[4696]: tftp: client does not accept options
Apr 8 06:44:05 localhost in.tftpd[4697]: RRQ from 10.0.1.70 filename syslinux.efi
Apr 8 06:44:06 localhost in.tftpd[4698]: RRQ from 10.0.1.70 filename ldlinux.e64
Apr 8 06:44:06 localhost in.tftpd[4699]: RRQ from 10.0.1.70 filename pxelinux.cfg/7402d403-3504-b305-4506-f10700080009
Apr 8 06:44:06 localhost in.tftpd[4700]: RRQ from 10.0.1.70 filename pxelinux.cfg/01-74-d4-35-b3-45-f1
Apr 8 06:44:07 localhost in.tftpd[4701]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A000146
Apr 8 06:44:07 localhost in.tftpd[4702]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A00014
Apr 8 06:44:07 localhost in.tftpd[4703]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A0001
Apr 8 06:44:07 localhost in.tftpd[4704]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A000
Apr 8 06:44:07 localhost in.tftpd[4705]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A00
Apr 8 06:44:07 localhost in.tftpd[4706]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A0
Apr 8 06:44:07 localhost in.tftpd[4707]: RRQ from 10.0.1.70 filename pxelinux.cfg/0A
Apr 8 06:44:07 localhost in.tftpd[4708]: RRQ from 10.0.1.70 filename pxelinux.cfg/0
Apr 8 06:44:07 localhost in.tftpd[4709]: RRQ from 10.0.1.70 filename pxelinux.cfg/default
Apr 8 06:44:08 localhost in.tftpd[4710]: RRQ from 10.0.1.70 filename vmlinuz
Apr 8 06:44:21 localhost in.tftpd[4711]: RRQ from 10.0.1.70 filename initrd.img

The failed system only makes it to *starting* to download ldlinux.e64:

Apr 8 08:18:26 localhost dhcpd: DHCPDISCOVER from 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:26 localhost dhcpd: DHCPOFFER on 10.10.0.71 to 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost dhcpd: DHCPREQUEST for 10.10.0.71 (10.10.0.1) from 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost dhcpd: DHCPACK on 10.10.0.71 to 00:15:5d:01:d5:05 via eth1
Apr 8 08:18:30 localhost in.tftpd[3662]: RRQ from 10.10.0.71 filename syslinux.efi
Apr 8 08:18:30 localhost in.tftpd[3662]: tftp: client does not accept options
Apr 8 08:18:30 localhost in.tftpd[3663]: RRQ from 10.10.0.71 filename syslinux.efi
Apr 8 08:18:30 localhost in.tftpd[3664]: RRQ from 10.10.0.71 filename ldlinux.e64

I say *starting* because it only sends the initial TFTP packet and then times out (which is enough to create the log entry).

I've captured packets from the TFTP server's perspective, but haven't been able to packet-sniff the VM (no port-mirroring ability that I've found). I also haven't figured out how to recompile syslinux (from Git) to figure out what is going on from syslinux's perspective, or at least crank up the verbosity.

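For anyone who wants to classify the packets in a server-side capture like the one mentioned above, here is a minimal sketch of a TFTP payload decoder. The opcode numbers come from RFC 1350 and RFC 2347; the function name describe_tftp and the sample bytes in the usage note (including the blksize value) are made up for illustration and are not taken from my capture.

```python
import struct

# Opcodes 1-5 are defined in RFC 1350, opcode 6 (OACK) in RFC 2347.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def _strings(body: bytes) -> list:
    """Split a run of NUL-terminated ASCII strings into a list."""
    if not body:
        return []
    return [s.decode("ascii", "replace") for s in body.rstrip(b"\0").split(b"\0")]


def describe_tftp(payload: bytes) -> str:
    """Return a one-line summary of a TFTP packet (UDP payload only).

    describe_tftp is a hypothetical helper name, not part of any library.
    """
    if len(payload) < 2:
        return "short packet"
    (opcode,) = struct.unpack("!H", payload[:2])
    name = OPCODES.get(opcode, "unknown(%d)" % opcode)
    body = payload[2:]

    if name in ("RRQ", "WRQ"):
        # Layout: filename, mode, then optional option/value pairs.
        fields = _strings(body)
        filename = fields[0] if len(fields) > 0 else ""
        mode = fields[1] if len(fields) > 1 else ""
        options = dict(zip(fields[2::2], fields[3::2]))
        return "%s file=%s mode=%s options=%s" % (name, filename, mode, options)

    if name == "OACK":
        # Layout: option/value pairs only.
        fields = _strings(body)
        return "OACK options=%s" % dict(zip(fields[0::2], fields[1::2]))

    if name in ("DATA", "ACK"):
        if len(body) < 2:
            return "%s truncated" % name
        (block,) = struct.unpack("!H", body[:2])
        if name == "DATA":
            return "DATA block=%d len=%d" % (block, len(body) - 2)
        return "ACK block=%d" % block

    if name == "ERROR":
        if len(body) < 2:
            return "ERROR truncated"
        (code,) = struct.unpack("!H", body[:2])
        message = "".join(_strings(body[2:])[:1])
        return "ERROR code=%d msg=%s" % (code, message)

    return name
```

With made-up sample bytes, describe_tftp(b"\x00\x06blksize\x001408\x00") reports an OACK carrying a blksize option, and describe_tftp(b"\x00\x04\x00\x00") reports an ACK for block 0.
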
The bad ldlinux.e64 TFTP attempt sends the first packet (the TFTP read request) to the TFTP server, and the TFTP server replies with the option acknowledgement, which it then repeats periodically. The TFTP server never gets a request for blocks.

Since the code is the same and the basic information provided to it is correct, I can only assume that something in the Hyper-V environment is causing syslinux a problem.

I tried to do some printf-quality hacking via the Git sources. syslinux.efi seems to have all the right information (TFTP server, IP, subnet mask), which is confirmed by the first packet making it to the server. To me, it looked like it was hanging in the fread() while loading the module, but I don't trust my compilation (the physical system wouldn't boot using the hacked-up version, at least).

Suggestions on how to crack this problem open some more?
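
To make the missing step concrete: in a read transfer with options, the client is expected to answer the server's option acknowledgement with an ACK for block 0 (RFC 2347), and only then does the server start sending data. The sketch below builds the reply a well-behaved client would send for a given server packet. It is an illustration of the protocol only, not syslinux's code; expected_client_reply is a hypothetical name.

```python
import struct
from typing import Optional

OP_DATA, OP_ACK, OP_OACK = 3, 4, 6  # RFC 1350 / RFC 2347 opcodes


def expected_client_reply(server_packet: bytes) -> Optional[bytes]:
    """Return the ACK a reading TFTP client would normally send next.

    Returns None when no ACK is expected (e.g. the server sent ERROR).
    A client may instead reject an OACK with an ERROR packet (code 8);
    that case is not modelled here.
    """
    if len(server_packet) < 2:
        return None
    (opcode,) = struct.unpack("!H", server_packet[:2])
    if opcode == OP_OACK:
        # An OACK answering a read request is acknowledged with ACK block 0.
        return struct.pack("!HH", OP_ACK, 0)
    if opcode == OP_DATA and len(server_packet) >= 4:
        # Each DATA block is acknowledged with its own block number.
        (block,) = struct.unpack("!H", server_packet[2:4])
        return struct.pack("!HH", OP_ACK, block)
    return None
```

In the failing case the server keeps retransmitting the OACK, which is consistent with this ACK for block 0 never arriving, though the capture alone cannot say whether the VM never sent it or never received the OACK.
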