I have a router with 3 network interfaces, as in the ASCII diagram below.
All interfaces are 100mbit. There is TCP traffic being sent from net1 to
net3 and from net2 to net3, and the TCP connections consume as much
bandwidth as possible. There is a pfifo queue on the egress interface eth0
of the core router with a limit of 10 packets.

    net1 --> (eth1) router (eth0) -> net3
                    (eth2)
                      ^
                      |
                    net2

I police traffic on the edge of net1 to 48.4375 Mbit and shape the traffic
on exit of net2 to 48.4375 Mbit. There are no packets in the queue of the
egress interface eth0 of the router at any stage (every packet is enqueued
by pfifo_enqueue() to an empty queue - I have confirmed this by adding a
counter in sch_fifo.c that is incremented every time there is a packet in
the queue when a new packet is enqueued). The delay is at a maximum of 2ms.

When I increase the policing and shaping rates to 48.4687 Mbit - an
increase of only 31.2 kbit on each rate, which is very small - some packets
are queued for a short period and some are dropped, which clears the queue.
The maximum number of packets dropped was 20 per second. But the delay goes
up to 30ms.

Check out the graphs at
http://frink.nuigalway.ie/~jlynch/queue/

I can't seem to explain this. Even if the queue were full all the time and
each packet were of maximum size, the delay imposed by queueing should be
at most 10 * 1500 * 8 / 100,000,000 seconds, which equals 1.2ms.

How can so much delay be added by such a small increase in the throughput
coming from net1 and net2?

I would appreciate it if someone could explain it to me.

By the way, I'm using a stratum 1 NTP server on the same LAN to ensure
measurement accuracy.

Jonathan
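P.S. For anyone who wants to reproduce this, the setup is roughly along the
lines of the commands below. This is only a sketch - the interface names on
the edge boxes and the burst/latency values are placeholders rather than my
exact scripts.

    # on the core router: pfifo with a 10 packet limit on the egress interface
    tc qdisc add dev eth0 root pfifo limit 10

    # on the net1 edge box: ingress policing to 48.4375 Mbit
    # (interface name assumed)
    tc qdisc add dev eth1 handle ffff: ingress
    tc filter add dev eth1 parent ffff: protocol ip prio 1 u32 \
        match ip src 0.0.0.0/0 \
        police rate 48.4375mbit burst 20k drop flowid :1

    # on the net2 edge box: egress shaping to 48.4375 Mbit, tbf used as an
    # example (interface name assumed)
    tc qdisc add dev eth0 root tbf rate 48.4375mbit burst 20k latency 10ms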
Jonathan Lynch
2005-Oct-13 16:06 UTC
Re: The effects of queueing on delay...(TX Ring Buffer the problem)
This was down to the tx buffer size on the network card I was using. It was
an Intel 82547EI gigabit card using the e1000 driver and operating at
100mbit. The tx buffer was set to 256, which caused this huge delay. The
minimum the driver lets me reduce the tx buffer size to using ethtool is
80. By reducing the tx ring buffer to 80, the delay under full link
utilisation with a maximum queue of 10 packets was reduced from 30ms to
10ms.

The 3com 3c59x vortex driver uses a tx buffer of 16. I reduced the tx ring
to 16 on the e1000 driver, but the maximum throughput I could achieve on
the interface went down.

Has anyone experimented with reducing the size of the tx buffer on this
card to get a good balance between delay and throughput?

Jonathan
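P.S. The ring size was changed with ethtool, i.e. something like the
following (the interface name is assumed to be eth0 here):

    # show the current RX/TX ring sizes reported by the driver
    ethtool -g eth0

    # shrink the TX ring to 80 descriptors
    ethtool -G eth0 tx 80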
Andy Furniss
2005-Oct-17 19:20 UTC
Re: The effects of queueing on delay...(TX Ring Buffer the problem)
Jonathan Lynch wrote:
> This was down to the tx buffer size on the network card I was using. It
> was an Intel 82547EI gigabit card using the e1000 driver and operating
> at 100mbit. The tx buffer was set to 256, which caused this huge delay.
> [snip]
> Has anyone experimented with reducing the size of the tx buffer on this
> card to get a good balance between delay and throughput?

Strange - I thought that as long as you are under rate for the link, the
most htb should burst per tick is the specified burst size.

That assumes one bulk class - more will make it worse.

Andy.
Jonathan Lynch
2005-Nov-19 23:53 UTC
Re: The effects of queueing on delay...(TX Ring Buffer the problem)
Quoting Andy Furniss <andy.furniss@dsl.pipex.com>:
> [snip]
> Strange - I thought that as long as you are under rate for the link, the
> most htb should burst per tick is the specified burst size.
>
> That assumes one bulk class - more will make it worse.

Just noticed your reply there - I have been very busy lately and haven't
checked LARTC in a while.

Say for example with a htb qdisc configured with a ceil of 100 Mbit
(overhead 24 mpu 84 mtu 1600 burst 12k cburst 12k quantum 1500), or with a
queue discipline that doesn't rate limit such as prio or red: there was a
delay of 30 ms imposed when the outgoing interface was saturated and the
tx ring size was 256. When the tx ring size was reduced to 80, the delay
was around 9ms.

The tx ring is a fifo structure. The NIC driver uses DMA to transmit
packets from the tx ring. These are worst case delays, when the tx ring is
full of maximum size FTP packets with the VoIP packet at the end. The VoIP
packet has to wait for all the FTP packets to be transmitted.

When the rate was reduced to 99Mbit, the maximum delay imposed is about
2ms. It seems that with the reduced rate there is time to clear more
packets from the TX ring... there are fewer packets in the ring, resulting
in a lower delay. But the delay increases linearly.

Also a question: when defining the following parameters (overhead 24 mpu
84 mtu 1600 burst 12k cburst 12k quantum 1500), I have them defined on all
classes and on the htb qdisc itself. Is there a minimum place where they
can be specified... ie just on the htb qdisc itself, or do they have to be
specified on all classes?

Jonathan
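P.S. At the moment the parameters are repeated on every class, roughly
like this (a sketch with an assumed device name and a simplified class
layout, not my exact script):

    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit ceil 100mbit \
        overhead 24 mpu 84 mtu 1600 burst 12k cburst 12k quantum 1500
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 100mbit ceil 100mbit \
        overhead 24 mpu 84 mtu 1600 burst 12k cburst 12k quantum 1500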
Andy Furniss
2005-Dec-06 01:55 UTC
Re: The effects of queueing on delay...(TX Ring Buffer the problem)
Jonathan Lynch wrote:
> [snip]
> Say for example with a htb qdisc configured with a ceil of 100 Mbit
> (overhead 24 mpu 84 mtu 1600 burst 12k cburst 12k quantum 1500), or with
> a queue discipline that doesn't rate limit such as prio or red: there
> was a delay of 30 ms imposed when the outgoing interface was saturated
> and the tx ring size was 256. When the tx ring size was reduced to 80,
> the delay was around 9ms.

Ahh I see what you mean - reducing the buffer beyond htb. But I don't
really see why you need to, rather than reducing the htb rate so you only
have one htb burst of packets in it at a time (that assumes you only have
two classes like in your other tests - more bulk classes would be worse).

> The tx ring is a fifo structure. The NIC driver uses DMA to transmit
> packets from the tx ring. These are worst case delays, when the tx ring
> is full of maximum size FTP packets with the VoIP packet at the end.
> The VoIP packet has to wait for all the FTP packets to be transmitted.
>
> When the rate was reduced to 99Mbit, the maximum delay imposed is about
> 2ms. It seems that with the reduced rate there is time to clear more
> packets from the TX ring... there are fewer packets in the ring,
> resulting in a lower delay. But the delay increases linearly.

I agree from our previous discussion and tests that even with overheads
added you need to back off a bit more than expected, but I assumed this
was to do with either:

The 8/16 byte granularity of the lookup tables.
The fact that overhead was not designed into htb, but added later.
Timers maybe being a bit out.
Me not knowing the overhead of ethernet properly.

Making the tx buffer smaller is just a workaround for htb being over rate
for whatever reason.

> Also a question: when defining the following parameters (overhead 24
> mpu 84 mtu 1600 burst 12k cburst 12k quantum 1500)

I suppose quantum should be 1514 - as you pointed out to me previously,
since it's eth - maybe more if you are still testing on vlans. I don't
think it will make any difference in this test - I see the overhead allows
for it already (I am just stating it for the list, as the list was down
during our previous discussions about overheads).

mtu 1600 - it's a bit late to check now, but there is a size that makes
the table go from 8 to 16, and 1600 seems familiar - but 2048 comes to
mind as well :-) I would check with tc -s -d class ls and reduce it a bit
if you see /16 after burst.

> I have them defined on all classes and on the htb qdisc itself. Is
> there a minimum place where they can be specified... ie just on the htb
> qdisc itself, or do they have to be specified on all classes?

I would think so, htb has a lookup table for each rate and ceil.

Andy.
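P.S. To check, something like this (device name assumed):

    tc -s -d class show dev eth0

Look at the burst value in the output, e.g. something like
"burst 12Kb/8 mpu 84b overhead 24b" - the /8 or /16 printed after burst is
the rate table cell size, and if it shows /16 I would try a smaller mtu.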