In article <47572D0D.4050107 at udfc.com>, Alan Bunch <Alan.Bunch at
udfc.com> wrote:
> Please bear with me as I know I have included a lot of detail.
>
> Description
> Redhat AS 3
> Kernel 2.4.21-47.0.1.ELsmp
> eth0 HWaddr 00:07:E9:11:30:76
> inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
>
> eth0:1 HWaddr 00:07:E9:11:30:76
> inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0
>
> This interface is up but with no IP address
> eth1 HWaddr 00:07:E9:11:30:77
>
> eth1.101 10.10.1.1 Bcast:10.10.1.255 Mask:255.255.255.0 vlan 101
> eth1.102 10.10.2.1 Bcast:10.10.2.255 Mask:255.255.255.0 vlan 102
> eth1.103 10.10.3.1 Bcast:10.10.3.255 Mask:255.255.255.0 vlan 103
> eth1.104 10.10.4.1 Bcast:10.10.4.255 Mask:255.255.255.0 vlan 104
> eth1.105 10.10.5.1 Bcast:10.10.5.255 Mask:255.255.255.0 vlan 105
> eth1.106 10.10.6.1 Bcast:10.10.6.255 Mask:255.255.255.0 vlan 106
> eth1.107 10.10.7.1 Bcast:10.10.7.255 Mask:255.255.255.0 vlan 107
> eth1.108 10.10.8.1 Bcast:10.10.8.255 Mask:255.255.255.0 vlan 108
> eth1.109 10.10.9.1 Bcast:10.10.9.255 Mask:255.255.255.0 vlan 109
> eth2 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C2
> BROADCAST MULTICAST MTU:1500 Metric:1
> eth3 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C3
> BROADCAST MULTICAST MTU:1500 Metric:1
I assume the above machine is "the router". I don't know whether leaving
eth1 without an IP address could be the source of any problems...
> This machine is routing between the vlans. When I ping 10.10.5.105 I
> get Host Unreachable.
Presumably, 10.10.5.105 is what you refer to below as "the device".
> Here is tcpdump from the ping on the router.
>
> tcpdump -i eth1.105
> 14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1
> 14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1
> 14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1
This suggests either:
a) the arp request is not being heard/understood by the device, or
b) the arp reply is not being heard by the router, or even perhaps
c) the lack of -n is causing tcpdump not to display some packets
while it is trying to resolve addresses to hostnames.
Try again using this: "tcpdump -n -e -i any" - this will include the
ethernet addresses and will monitor all interfaces instead of just
eth1.105 (in case a packet is going to the wrong interface).
Also, what is the routing table shown by "netstat -rn"?
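If the captures get long, a small script can list which ARP requests never got a reply. This is only a rough sketch of my own: it assumes the line format shown in your tcpdump output above ("arp who-has X tell Y" and "arp reply X is-at MAC"), and the function name unanswered_arp is made up for illustration. Other tcpdump versions print ARP differently, so the patterns may need adjusting.

```python
import re

# Patterns match the tcpdump line format quoted in this thread (an assumption;
# other tcpdump versions word ARP lines differently).
WHO_HAS = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_arp(lines):
    """Return {target_ip: request_count} for who-has requests with no reply seen."""
    requests = {}
    answered = set()
    for line in lines:
        m = WHO_HAS.search(line)
        if m:
            target = m.group(1)
            requests[target] = requests.get(target, 0) + 1
            continue
        m = REPLY.search(line)
        if m:
            answered.add(m.group(1))
    return {ip: n for ip, n in requests.items() if ip not in answered}


# Lines copied from the two captures in the original post.
router_ping = [
    "14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1",
    "14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1",
    "14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1",
]
device_ping = [
    "14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105",
    "14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77",
]

print(unanswered_arp(router_ping))  # {'10.10.5.105': 3}
print(unanswered_arp(device_ping))  # {}
```

It only tells you that no reply was captured on that interface; it cannot distinguish case (a) from case (b) above.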
> Ok now I go to the device via a serial port and ping back to 10.10.5.1 (
> the router ) and here is the tcpdump output
>
> tcpdump -i eth1.105 -n
> tcpdump: listening on eth1.105
> 14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105
> 14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77
> 14:12:06.707715 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
> 14:12:06.707762 10.10.5.1 > 10.10.5.105: icmp: echo reply
> 14:12:07.723100 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
> 14:12:07.723136 10.10.5.1 > 10.10.5.105: icmp: echo reply
OK...
> Now of course I can ping 10.10.5.10 (the suspect device) from 10.10.5.1
> ( the router )
Is 10.10.5.10 a typo for 10.10.5.105?
What kind of unit is the "suspect device"? Can you display "ifconfig -a"
and "netstat -rn" or the equivalent on it?
> ping 10.10.5.1
> PING 10.10.5.1 (10.10.5.1) 56(84) bytes of data.
> 64 bytes from 10.10.5.1: icmp_seq=0 ttl=64 time=0.068 ms
> 64 bytes from 10.10.5.1: icmp_seq=1 ttl=64 time=0.039 ms
But this appears to be pinging TO the router, not pinging FROM the router!
> This is fine untill the arp entry ages out. Then I back to not being
> able to ping the device.
>
> If I manually insert an arp table entry all is well. No filtering in
> the switches. Switches are SMC 6826 for the 10/100 and 8724 for the
> core and gig e.
>
> I have several similar symptoms like this in various places. I have
> some devices on vlans that I see the dhcp discover messages and I see
> the dhcp offer then the device sends another dhcp discover. This goes
> on for a few times and the device just waits and starts the process over.
>
> I feel that the problem lies in the handling of arp requests but I guess
> I just don't know enough about how linux handles them or how to control
> them to find a solution.
>
> Any ideas ?
Sounds more like a general broadcast issue to me, since DHCP discovers
and offers are sent as broadcasts and therefore don't need ARP first.
If you use -e in tcpdump you will see whether the broadcast ethernet
address is being used (ff:ff:ff:ff:ff:ff) or not. If ARP and DHCP packets
are not using the broadcast ethernet address, then something is not right
with the netmask or the broadcast address.
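To sanity-check the netmask and broadcast address on each end, you can compute what they ought to be. This is a minimal sketch using Python 3's standard ipaddress module, run on any machine that has it; the helper name check_subnet is my own, and the values are the ones you listed for eth1.105 and the device.

```python
import ipaddress


def check_subnet(local_ip, netmask, configured_bcast, peer_ip):
    """Compare a configured broadcast address and a peer IP against the subnet
    implied by local_ip/netmask (hypothetical helper, for illustration only)."""
    net = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return {
        "expected_bcast": str(net.broadcast_address),
        "bcast_ok": ipaddress.ip_address(configured_bcast) == net.broadcast_address,
        "peer_on_link": ipaddress.ip_address(peer_ip) in net,
    }


# eth1.105 as shown in the ifconfig listing, with the device at 10.10.5.105
result = check_subnet("10.10.5.1", "255.255.255.0", "10.10.5.255", "10.10.5.105")
print(result)
```

The router side checks out with these figures, so the same comparison is worth doing with whatever address, mask and broadcast the device itself reports.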
Interesting problem - let us know how you get on.
Cheers
Tony
--
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org