Hi,

I have an intermittent pxelinux boot problem. It happens rarely; for example, it happened one day and then not again until six days later. However, when it does happen it is rather serious, because it affects all clients on the network.

Here is some basic info:
- IBM NetVista PCs, built-in PXE; the banner says "PXE 2.x".
- happens rarely, but when it does it affects all clients. May last for an hour or so when it happens.
- DHCP: Microsoft.
- TFTP #1: tftp-hpa, started with -s /tftpboot -B 1468 -r blksize -v -v -v -v
- TFTP #2: WinAgents TFTP server (tsize on, blksize negotiation off).
- Network: WAN, but fiber-based and as fast as a LAN.
- Retrieving vmlinuz *always* works (0 failures).
- Retrieving initrd *almost always* works... except when this problem comes up. Then retrieving initrd *always* fails until some time passes and the problem goes away. The error message is:

    Loading vmlinuz.................
    Could not find ramdisk image: initrd
    boot:

We tried switching between TFTP #1 and TFTP #2, but that did not help. In the TFTP log, we see vmlinuz being transferred completely and normally, and then no further requests coming in.

I am using Thinstation 2.2. I have also asked on that list but have not found a solution. That thread can be found here:
http://www.nabble.com/intermittent-pxe-failure-t3655353.html

Question: what code is responsible for downloading vmlinuz, and which code is responsible for downloading initrd? Is it the PXE firmware, or pxelinux itself? What happens between the vmlinuz download and the initrd download? Is there any network activity that could potentially lead to a failure? Is this activity logged anywhere?

Any suggestions welcome!

Larry
On 05-05-2007 at 00:49, Larry Howe wrote:
<snip/>
> - happens rarely, but when it does it affects all clients. May last for an
> hour or so when it happens.

Gut feeling:
a TFTP daemon that is started by inetd and stays alive for that hour
(why the TFTPd handles vmlinuz but not the initrd is indeed strange).

> Question: what code is responsible for downloading vmlinuz, and which code is
> responsible for downloading initrd? Is it the pxe firmware, or pxelinux
> itself?

Briefly:
  the boot ROM downloads and starts 'pxelinux.0'
  pxelinux.0 downloads and parses "pxelinux.cfg/default"
  pxelinux.0 downloads _both_ vmlinuz and initrd
  all three use the same 'get_a_network_packet' software routine in the PXE ROM
  pxelinux.0 starts vmlinuz
  vmlinuz searches for the initrd (in download memory) and reads from it.

> What happens between the vmlinuz download and the initrd download?

Sorry, I don't know (for sure).

> Any network activity that could potentially lead to a failure?

Only malicious network activity (which is poorly documented ;-)

> Is this activity logged anywhere?

IIRC you get the TFTP requests in the syslog; you might need the -v -v -v parameters.

> Any suggestions welcome!

tcpdump the TFTP server on the TFTP port.
Watching only port 69 will get you only the TFTP requests.
That has two advantages:
 * low disk usage, which makes monitoring for weeks possible
 * you should see if the client really requests the initrd.

> Larry

Cheers
Geert Stappers

P.S. From http://www.nabble.com/intermittent-pxe-failure-t3655353.html
| Am I right in assuming that PXE loads vmlinuz, but then vmlinuz loads initrd?
No. pxelinux loads both.
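To make the port-69 advice a bit more concrete: every TFTP transfer starts with a read request (RRQ) sent to port 69, and pxelinux sends one such request per file (pxelinux.cfg/default, vmlinuz, initrd), while the data itself flows on other ports. The sketch below decodes one RRQ payload using the RFC 1350 layout (opcode 1, filename, NUL, mode, NUL) plus the RFC 2347 name/value option pairs such as tsize and blksize. It assumes the UDP payload has already been extracted from the capture; the function name parse_rrq is hypothetical and not part of tcpdump, tftp-hpa or pxelinux.

```python
import struct


def parse_rrq(payload: bytes):
    """Decode a TFTP read request (RRQ) UDP payload (hypothetical helper).

    Layout per RFC 1350, with RFC 2347 options appended:
      2-byte opcode (1 = RRQ) | filename NUL | mode NUL | [option NUL value NUL]...
    Returns (filename, mode, options) or None if the payload is not an RRQ.
    """
    if len(payload) < 2:
        return None
    (opcode,) = struct.unpack("!H", payload[:2])
    if opcode != 1:  # only RRQ; WRQ, DATA, ACK, ERROR and OACK are ignored
        return None
    # Every field is NUL-terminated, so split() leaves one empty trailing item.
    fields = payload[2:].split(b"\0")
    if fields and fields[-1] == b"":
        fields = fields[:-1]
    if len(fields) < 2:
        return None
    filename = fields[0].decode("ascii", "replace")
    mode = fields[1].decode("ascii", "replace").lower()
    rest = fields[2:]
    # Options come in name/value pairs; option names are case-insensitive.
    options = {
        rest[i].decode("ascii", "replace").lower(): rest[i + 1].decode("ascii", "replace")
        for i in range(0, len(rest) - 1, 2)
    }
    return filename, mode, options


if __name__ == "__main__":
    # Example request; the option values are illustrative, not taken from a real client.
    req = b"\x00\x01" + b"\0".join([b"initrd", b"octet", b"tsize", b"0", b"blksize", b"1408"]) + b"\0"
    print(parse_rrq(req))  # ('initrd', 'octet', {'tsize': '0', 'blksize': '1408'})
```

If a client's RRQ for initrd never shows up on port 69 during an outage, the problem is on the client side (pxelinux/PXE ROM) rather than in the TFTP server's transfer.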
Geert Stappers wrote:
> On 05-05-2007 at 00:49, Larry Howe wrote:
> <snip/>
>> - happens rarely, but when it does it affects all clients. May last for an
>> hour or so when it happens.
>
> Gut feeling:
> a TFTP daemon that is started by inetd and stays alive for that hour
> (why the TFTPd handles vmlinuz but not the initrd is indeed strange)

tftp-hpa sticks around for 15 minutes after last use, by default.

	-hpa
> tcpdump the TFTP server on the TFTP port.
> Watching only on port 69 will get you only the TFTP Requests.
> That has two advantages:
>  * low disk usage, which makes monitoring for weeks possible
>  * you should see if the client really requests the initrd.
>
> Cheers
> Geert Stappers

Thanks Geert and Peter for the detailed answers. At least I know where to start looking. I will post back if I find anything. For now, we are booting from CD (ISOLINUX), which will work fine until we get this worked out.

Larry
On 07-05-2007 at 23:58, Larry Howe wrote:
> > tcpdump the TFTP server on the TFTP port.
> > Watching only on port 69 will get you only the TFTP Requests.
> > That has two advantages:
> >  * low disk usage, which makes monitoring for weeks possible
> >  * you should see if the client really requests the initrd.
> >
> > Cheers
> > Geert Stappers
>
> Thanks Geert and Peter for the detailed answers. At least I know where to
> start looking. I will post back if I find anything. For now, we are booting
> with CD (ISOLINUX) which will work fine until we get this worked out.

What do you want to get worked out?
On Tuesday 08 May 2007 16:15, Geert Stappers wrote:
> On 07-05-2007 at 23:58, Larry Howe wrote:
> > > tcpdump the TFTP server on the TFTP port.
> > > Watching only on port 69 will get you only the TFTP Requests.
> > > That has two advantages:
> > >  * low disk usage, which makes monitoring for weeks possible
> > >  * you should see if the client really requests the initrd.
> > >
> > > Cheers
> > > Geert Stappers
> >
> > Thanks Geert and Peter for the detailed answers. At least I know where to
> > start looking. I will post back if I find anything. For now, we are
> > booting with CD (ISOLINUX) which will work fine until we get this worked
> > out.
>
> What do you want to get worked out?

For now, we will just boot from CD. That will give us time to look more closely at the PXE/TFTP problem.

Larry
On 08-05-2007 at 22:46, Larry Howe wrote:
> On Tuesday 08 May 2007 16:15, Geert Stappers wrote:
> >
> > What do you want to get worked out?
>
> For now, we will just boot from CD. That will give us time to look more
> closely at the PXE / TFTP problem.

Each CD boot is a missed chance to reproduce the _intermittent_ PXE failure.

My advice: activate various loggers/monitors/watchers/data capturers and keep PXE booting.

Geert Stappers
mostly in an attempt to prevent a self-reply from Larry Howe
--
There is nothing wrong with self-replies that have added value
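Once the loggers are running, the question to answer during the next outage is whether clients that fetched vmlinuz ever asked for initrd afterwards. A minimal sketch, assuming the syslog or port-69 capture has already been reduced to (client IP, requested filename) pairs in time order; the function name missing_initrd and that input format are hypothetical, and the default filenames vmlinuz and initrd are the ones from this thread.

```python
from collections import Counter


def missing_initrd(requests, kernel="vmlinuz", ramdisk="initrd"):
    """Count, per client, kernel requests never followed by a ramdisk request.

    `requests` is an iterable of (client_ip, filename) pairs in time order,
    e.g. extracted from tftpd log lines or a port-69 capture (format assumed).
    """
    pending = set()      # clients that fetched the kernel but not yet the ramdisk
    misses = Counter()
    for client, filename in requests:
        name = filename.rsplit("/", 1)[-1]  # ignore any leading directory
        if name == kernel:
            if client in pending:
                misses[client] += 1  # previous kernel fetch never led to an initrd request
            pending.add(client)
        elif name == ramdisk:
            pending.discard(client)
    for client in pending:
        misses[client] += 1  # log ended while still waiting for the initrd request
    return dict(misses)


if __name__ == "__main__":
    # Made-up example: .5 boots fine, .6 fails once then succeeds, .7 never asks for initrd.
    log = [
        ("10.0.0.5", "pxelinux.0"), ("10.0.0.5", "vmlinuz"), ("10.0.0.5", "initrd"),
        ("10.0.0.6", "vmlinuz"), ("10.0.0.6", "vmlinuz"), ("10.0.0.6", "initrd"),
        ("10.0.0.7", "/vmlinuz"),
    ]
    print(missing_initrd(log))
```

Tracking the sequence per client, rather than just the set of files each client ever requested, means a client that failed once and booted fine on a retry is still counted.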