I'm replacing the master node in a cluster and having issues PXE booting the diskless nodes. The current master works fine, but the new node that replaces it, which has the same setup, is having the issues.

The hardware is:

Dell r920 (new master node, UEFI)
Dell r620's (diskless, BIOS)
Dell 1950's (diskless, BIOS)

When trying to boot, the 1950 node displays the following (both diskless types have the same issue):
---
Broadcom UNDI PXE-2.1 v4.4.4
Copyright (C) 2000-2008 Broadcom Corporation
Copyright (C) 1997-2008 Intel Corporation
All rights reserved.

Broadcom Base Code PXE-2.1 v1.1.0
Copyright (C) 2000-2008 Broadcom Corporation
Copyright (C) 1997-2008 Intel Corporation

CLIENT MAC ADDR: 00 13 72 F9 54 41 GUID: 44454C4C 4A00 104D 8058 B4C04F534231
CLIENT IP: 10.0.1.1 MASK: 255.255.0.0 DHCP IP: 10.0.0.11
GATEWAY IP: 10.0.0.1
PXE-E32: TFTP open timeout
PXE-E32: TFTP open timeout
PXE-E32: TFTP open timeout
PXE-M0F: Exiting Broadcom PXE ROM.

strike F1 to retry boot, F2 for setup utility
---

/diskless/pxelinux.cfg/default
---
DEFAULT /gentoo-x86_64/boot/kernel-3.14.14-gentoo
APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs nfsroot=10.0.0.11:/diskless/gentoo-x86_64 init=/linuxrc

/etc/conf.d/in.tftpd (using tftp-hpa)
---
INTFTPD_PATH="/diskless"
INTFTPD_OPTS="-R 4096:32767 -v -s ${INTFTPD_PATH}"

/etc/dhcp/dhcpd.conf
---
ddns-update-style none;
authoritative;
log-facility local7;
allow bootp;

subnet 10.0.0.0 netmask 255.255.0.0 {
    default-lease-time 86400;
    max-lease-time 86400;
    option routers 10.0.0.1;
    option broadcast-address 10.0.255.255;
    option subnet-mask 255.255.0.0;
    option domain-name-servers xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx;
    option domain-name "mydomain.com";
}

group {
    filename "pxelinux.0";
    next-server 10.0.0.11;

    host node-1 {
        hardware ethernet 00:13:72:F9:54:41;
        fixed-address 10.0.1.1;
    }

    # snipped other nodes ...
}

I also have a pcap log for tftp if interested.
I had no issues getting the current setup working with PXE, but with this new master node I haven't been able to see why it's not working.

Any help much appreciated.
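
One way to narrow down a "TFTP open timeout" independently of the PXE ROM is to send a TFTP read request by hand and see whether any reply comes back. The sketch below is only an illustration and is not something used in this thread: the helper names build_rrq and probe_tftp are made up here, the packet layout follows RFC 1350, and the server address and file name in the usage comment are the ones from the dhcpd.conf posted above.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request (RFC 1350): opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def probe_tftp(server, filename, timeout=5.0):
    """Send one RRQ to UDP port 69; return the reply opcode, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            # The reply arrives from a new server-side port, so accept any sender.
            data, _addr = sock.recvfrom(1024)
        except socket.timeout:
            return None
    # Opcode 3 is DATA (transfer started), opcode 5 is ERROR.
    return struct.unpack("!H", data[:2])[0]


# Hypothetical usage, from a host addressed like one of the diskless nodes:
#   probe_tftp("10.0.0.11", "pxelinux.0")
```

A return value of None from a host in the nodes' address range, while the server's own capture shows the request arriving, would suggest that the reply is being lost or misrouted on the way back rather than that the daemon is refusing the file.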
On Thu, Feb 05, 2015 at 02:27:57PM -0600, Lane via Syslinux wrote:
> When trying to boot, the 1950 node displays the following (both diskless
> types have the same issue):
> ---
>
> CLIENT MAC ADDR: 00 13 72 F9 54 41 GUID: 44454C4C 4A00 104D 8058
> B4C04F534231
> CLIENT IP: 10.0.1.1 MASK: 255.255.0.0 DHCP IP: 10.0.0.11
> GATEWAY IP: 10.0.0.1
> PXE-E32: TFTP open timeout
> PXE-E32: TFTP open timeout
> PXE-E32: TFTP open timeout
> PXE-M0F: Exiting Broadcom PXE ROM.
>
> strike F1 to retry boot, F2 for setup utility
> ---
>
<snip/>
>
> I also have a pcap log for tftp if interested.

I am :-)
Put the libpcap file somewhere online and post the URL
(either to the mailing list, which is preferred, or to me privately).

> I had no issues getting the current setup working with PXE, but with
> this new master node I haven't been able to see why it's not working.
>
> Any help much appreciated.

What is in the log of the TFTP server?

Regards,
Geert Stappers
--
Live and let live
> /diskless/pxelinux.cfg/default
> ---
> DEFAULT /gentoo-x86_64/boot/kernel-3.14.14-gentoo
> APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs
> nfsroot=10.0.0.11:/diskless/gentoo-x86_64
> init=/linuxrc

That pxelinux.cfg/default seems "tolerable" for older versions, but not for Syslinux 5+.

Is this an inaccurate copy of the actual content? Or is it really a faithful copy of your actual default config?

If it is the former, please post the correct content of the file.

If it is the latter, I would suggest something _similar_ to the following pxelinux.cfg/default:

DEFAULT gentoo_3_14_14
LABEL gentoo_3_14_14
LINUX /gentoo-x86_64/boot/kernel-3.14.14-gentoo
APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs nfsroot=10.0.0.11:/diskless/gentoo-x86_64 init=/linuxrc

Note 1: The APPEND line should be on a single line (i.e. not wrapped, although it might be displayed wrapped in this email).

Note 2: The (absolute) path to the kernel might also be causing some problem.

Note 3: There might be other problems, independent of this config.

Regards,
Ady.
On Thu, Feb 05, 2015 at 11:06:36PM +0100, Geert Stappers via Syslinux wrote:
> On Thu, Feb 05, 2015 at 02:27:57PM -0600, Lane via Syslinux wrote:
> > When trying to boot, the 1950 node displays the following (both diskless
> > types have the same issue):
> > ---
> >
> > CLIENT MAC ADDR: 00 13 72 F9 54 41 GUID: 44454C4C 4A00 104D 8058
> > B4C04F534231
> > CLIENT IP: 10.0.1.1 MASK: 255.255.0.0 DHCP IP: 10.0.0.11
> > GATEWAY IP: 10.0.0.1
> > PXE-E32: TFTP open timeout
> > PXE-E32: TFTP open timeout
> > PXE-E32: TFTP open timeout
> > PXE-M0F: Exiting Broadcom PXE ROM.
> >
> > strike F1 to retry boot, F2 for setup utility
> > ---
> >
> <snip/>
> >
> > I also have a pcap log for tftp if interested.
>
> I am :-)
>

(stappers got the libpcap file off-list)

The symptoms are those of a deaf client.
We, the mailing list (archive), have seen that before. (Also on Dell.)

If I recall correctly, what is needed now is the PCI ID of the NIC,
plus an additional exception (for polling or interrupt),
as Gene has done before.

Regards,
Geert Stappers
--
Live and let live
Yes, that default file is the actual content, and it works on my much older master node, which is using Syslinux 3.86. On the new system, with an updated Gentoo install, it's not working.

I have tried 3.86, 4.07, 5.10, and 6.03. All gave the same results with the posted pxelinux.cfg/default file. I'm currently back at 5.10 on the new master.

Tomorrow when I get back in I will change that file to what you recommend to see if that helps.

On Thu, Feb 5, 2015 at 4:38 PM, Ady via Syslinux <syslinux at zytor.com> wrote:
> > /diskless/pxelinux.cfg/default
> > ---
> > DEFAULT /gentoo-x86_64/boot/kernel-3.14.14-gentoo
> > APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs
> > nfsroot=10.0.0.11:/diskless/gentoo-x86_64
> > init=/linuxrc
>
> That pxelinux.cfg/default seems "tolerable" for older versions, but not
> for Syslinux 5+.
>
> Is this an inaccurate copy of the actual content? Or is it really a
> faithful copy of your actual default config?
>
> If it is the former, please post the correct content of the file.
>
> If it is the latter, I would suggest something _similar_ to the
> following pxelinux.cfg/default:
>
> DEFAULT gentoo_3_14_14
> LABEL gentoo_3_14_14
> LINUX /gentoo-x86_64/boot/kernel-3.14.14-gentoo
> APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs
> nfsroot=10.0.0.11:/diskless/gentoo-x86_64 init=/linuxrc
>
>
> Note 1: The APPEND line should be on a single line (i.e. not wrapped,
> although it might be displayed wrapped in this email).
>
> Note 2: The (absolute) path to the kernel might also be causing some
> problem.
>
> Note 3: There might be other problems, independent of this config.
>
> Regards,
> Ady.
On Thu, Feb 5, 2015 at 3:27 PM, Lane via Syslinux <syslinux at zytor.com> wrote:
> I'm replacing the master node in a cluster and having issues PXE booting
> the diskless nodes. The current master works fine, but the new node that
> replaces it, which has the same setup, is having the issues.
>
> The hardware is:
>
> Dell r920 (new master node, UEFI)
> Dell r620's (diskless, BIOS)
> Dell 1950's (diskless, BIOS)
>
> When trying to boot, the 1950 node displays the following (both diskless
> types have the same issue):
> ---
> Broadcom UNDI PXE-2.1 v4.4.4
> Copyright (C) 2000-2008 Broadcom Corporation
> Copyright (C) 1997-2008 Intel Corporation
> All rights reserved.
>
> Broadcom Base Code PXE-2.1 v1.1.0
> Copyright (C) 2000-2008 Broadcom Corporation
> Copyright (C) 1997-2008 Intel Corporation
>
> CLIENT MAC ADDR: 00 13 72 F9 54 41 GUID: 44454C4C 4A00 104D 8058
> B4C04F534231
> CLIENT IP: 10.0.1.1 MASK: 255.255.0.0 DHCP IP: 10.0.0.11
> GATEWAY IP: 10.0.0.1
> PXE-E32: TFTP open timeout
> PXE-E32: TFTP open timeout
> PXE-E32: TFTP open timeout
> PXE-M0F: Exiting Broadcom PXE ROM.
>
> strike F1 to retry boot, F2 for setup utility
> ---
>
> I also have a pcap log for tftp if interested. I had no issues getting
> the current setup working with PXE, but with this new master node I
> haven't been able to see why it's not working.
>
> Any help much appreciated.

Thanks for emailing me the PCAPs. Looking at the capture with just node-1, I saw that the TFTP Read Request (packet 3) was sent directly from node-1 to the master. The response (packet 4), however, was sent from the master to the gateway device (a Cisco), which indicated a mismatched subnet mask on the master. When you adjusted the subnet mask, your issue disappeared.

A packet capture of the client's port would have been ideal, but in this scenario there was enough information in the TFTP daemon host's capture to show a probable issue that did in fact lead to a resolution.

--
-Gene
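
For readers who find this thread later, the effect Gene describes can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the thread does not say which mask the master was actually using, so 255.255.255.0 below is purely hypothetical, and the helper name reply_next_hop is made up for illustration. It is also a simplification, since a real host consults its whole routing table rather than a single on-link test.

```python
import ipaddress


def reply_next_hop(server_ip, server_mask, client_ip, gateway_ip):
    """Return the address the server hands its reply to (simplified on-link test)."""
    # The server derives its local network from its own address and mask.
    local_net = ipaddress.ip_network(f"{server_ip}/{server_mask}", strict=False)
    if ipaddress.ip_address(client_ip) in local_net:
        return client_ip   # client is on-link: reply goes straight to it
    return gateway_ip      # client looks remote: reply goes to the default gateway


# Addresses from the PXE output and dhcpd.conf posted in this thread.
MASTER, CLIENT, GATEWAY = "10.0.0.11", "10.0.1.1", "10.0.0.1"

# Mask that the DHCP server hands to the clients.
print(reply_next_hop(MASTER, "255.255.0.0", CLIENT, GATEWAY))    # prints 10.0.1.1

# Hypothetical narrower mask on the master's own interface.
print(reply_next_hop(MASTER, "255.255.255.0", CLIENT, GATEWAY))  # prints 10.0.0.1
```

With the narrower mask the master treats node-1 as off-link and sends the TFTP reply to the gateway, which matches what the capture showed: the request arrives directly from the client, but the response leaves toward the Cisco.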