Hello,

it seems that filtering on nexthdr (TCP/UDP) content, especially src or dst port, is not working. The following has no effect on 2.4.16 or older (even 2.2) kernels:

    # tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match tcp dst 3128 0xffff police rate 40kbit burst 10k drop flowid :1

Even if

    # tc filter ls dev eth0 parent ffff:
    filter protocol ip pref 50 u32
    filter protocol ip pref 50 u32 fh 800: ht divisor 1
    filter protocol ip pref 50 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid :1 police 4 action drop rate 40Kbit burst 10Kb mtu 2Kb
      match 00000c38/0000ffff at nexthdr+0

looks reasonable, TCP connections to port 3128 are not policed.

If I use "match ip dst <ip-address>" instead, the policing works. Port-based matching isn't working for outgoing shapers either, as can be seen with the statistics functions.

Any idea? Anybody with port-based (etc.) filtering actually working?

Regards,
  Lutz

--
Lutz Pressler, Service Network GmbH
Bahnhofsallee 1b, D-37081 Goettingen
Tel: ++49-551-3700002, FAX: ++49-551-3700009
mailto:lp@SerNet.DE, http://www.SerNet.DE/
On Thu, Dec 13, 2001 at 08:46:57PM +0100, Lutz Pressler wrote:

> The following has no effect on 2.4.16 or older (even 2.2) kernels:
>
> # tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match tcp
> dst 3128 0xffff police rate 40kbit burst 10k drop flowid :1

Double check what this means! This limits the speed of data *coming in to* your proxy from a client (browser). That is not a lot - most data will flow the other way, and will indeed not be matched.

Data being received BY your proxy from the internet is not matched by this proxy.

> Even if
> # tc filter ls dev eth0 parent ffff:
> filter protocol ip pref 50 u32
> filter protocol ip pref 50 u32 fh 800: ht divisor 1
> filter protocol ip pref 50 u32 fh 800::800 order 2048 key ht 800 bkt 0
> flowid :1 police 4 action drop rate 40Kbit burst 10Kb mtu 2Kb
> match 00000c38/0000ffff at nexthdr+0

You supply a lot of redundant information. I'm not sure what the '4' means in this rule.

> looks reasonable, TCP connections to port 3128 are not policed.
>
> If I use "match ip dst <ip-address>" instead, the policing works.

Your proxy does not necessarily download FROM port 3128!

Regards,

bert

--
http://www.PowerDNS.com          Versatile DNS Software & Services
Trilab                           The Technology People
Netherlabs BV / Rent-a-Nerd.nl   - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
On Fri, 14 Dec 2001, bert hubert wrote:

> On Thu, Dec 13, 2001 at 08:46:57PM +0100, Lutz Pressler wrote:
>
> > The following has no effect on 2.4.16 or older (even 2.2) kernels:
> >
> > # tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match tcp
> > dst 3128 0xffff police rate 40kbit burst 10k drop flowid :1
>
> Double check what this means! This limits speed of data *coming in to* your
> proxy from a client (browser). That is not a lot - most data will flow the
> other way, and will indeed not be matched.

Sorry, that was a typo (I forgot that I had tried the other way too, to be complete, before doing the cut&paste). Of course "src 3128"!

> Data being received BY your proxy from the internet is not matched by this
> proxy.
>
> > Even if
> > # tc filter ls dev eth0 parent ffff:
> > filter protocol ip pref 50 u32
> > filter protocol ip pref 50 u32 fh 800: ht divisor 1
> > filter protocol ip pref 50 u32 fh 800::800 order 2048 key ht 800 bkt 0
> > flowid :1 police 4 action drop rate 40Kbit burst 10Kb mtu 2Kb
> > match 00000c38/0000ffff at nexthdr+0

and "match 0c380000/ffff0000" here.

> You supply a lot of redundant information. I'm not sure what the '4' means
> in this rule.

Neither do I, I haven't set it explicitly. It seems to increase with every change in policing rules.

> > looks reasonable, TCP connections to port 3128 are not policed.
> >
> > If I use "match ip dst <ip-address>" instead, the policing works.
>
> Your proxy does not necessarily download FROM port 3128!

I did that - as a test, the real situation is not about 3128 - on the client, not the proxy.

  Lutz
Hi again,

ok, did some tests:

match ip sport 3128 does work (as does the more correct match ip sport 3128 0xffff match ip protocol 0xff to only consider TCP) - match tcp src 3128 does not.

The difference as shown by tc filter show dev eth0 parent ffff: is that

    ip sport -> "match 0c380000/ffff0000 at 20"
    tcp src  -> "match 0c380000/ffff0000 at nexthdr+0".

This confirms my assumption that nexthdr is broken.

at nexthdr+0 _should_ work with IP options present, "at 20" not, correct?

  Lutz
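[Editorial note] The key/mask strings quoted in this thread can be reproduced by hand. The sketch below is only an illustration in Python; the function name u32_port_match is made up for this note and is not part of tc. It assumes nothing beyond the standard TCP/UDP header layout, where the first 32-bit word holds the source port in the upper 16 bits and the destination port in the lower 16 bits.

```python
def u32_port_match(port: int, field: str) -> str:
    """Return the 'key/mask' pair for a 16-bit port match on the first
    32-bit word of a TCP or UDP header (hypothetical helper, not tc code)."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    if field == "src":
        # Source port occupies the upper half of the word.
        key, mask = port << 16, 0xFFFF0000
    elif field == "dst":
        # Destination port occupies the lower half of the word.
        key, mask = port, 0x0000FFFF
    else:
        raise ValueError("field must be 'src' or 'dst'")
    return f"{key:08x}/{mask:08x}"


print(u32_port_match(3128, "src"))  # 0c380000/ffff0000
print(u32_port_match(3128, "dst"))  # 00000c38/0000ffff
```

Both outputs agree with the strings shown by tc filter show earlier in the thread, so the key and mask are the same in the working and non-working rules; only the base offset ("at 20" versus "at nexthdr+0") differs.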
Hello,

On Fri, 14 Dec 2001, Lutz Pressler wrote:

> Hi again,
>
> ok, did some tests:
>
> match ip sport 3128 does work (as does the more correct
> match ip sport 3128 0xffff match ip protocol 0xff to only consider
> TCP) - match tcp src 3128 does not.
>
> The difference as shown by tc filter show dev eth0 parent ffff:
> is that ip sport -> "match 0c380000/ffff0000 at 20"
> tcp src -> "match 0c380000/ffff0000 at nexthdr+0".
>
> This confirms my assumption that nexthdr is broken.

It confirms only that nexthdr does not work with your settings. Nothing more. Read iproute2/README.iproute2+tc carefully, particularly the last filter in that file. I agree, it is not documented very well. To use nexthdr you must use "offset" with a hash table. U32 is universal (read line #2 in cls_u32.c); it does not know that you are using IPv4, so the value 20 cannot be guessed. For this, "offset" is used to extract the iphdr->ihl value and to use it as a base for all nexthdr+ relative offsets.

> at nexthdr+0 _should_ work with IP options present, "at 20" not,
> correct?
>
> Lutz

Regards

--
Julian Anastasov <ja@ssi.bg>
On Fri, Dec 14, 2001 at 02:56:57PM +0200, Julian Anastasov wrote:

> > The difference as shown by tc filter show dev eth0 parent ffff:
> > is that ip sport -> "match 0c380000/ffff0000 at 20"
> > tcp src -> "match 0c380000/ffff0000 at nexthdr+0".
>
> not know that you are using IPv4, so the value 20 can not be
> guessed. For this, "offset" is used to extract the iphdr->ihl
> value and to use it as a base for all nexthdr+ relative offsets.

Damn, that's broken. Or at least, extremely non-obvious and hard to get working. Overly universal comes to mind. So 'ip sport' would stop matching packets with ip options?

Thanks for enlightening us - will update the HOWTO to this effect.

Regards,

bert
Hello,

On Fri, 14 Dec 2001, bert hubert wrote:

> > not know that you are using IPv4, so the value 20 can not be
> > guessed. For this, "offset" is used to extract the iphdr->ihl
> > value and to use it as a base for all nexthdr+ relative offsets.
>
> Damn, that's broken. Or at least, extremely non-obvious and hard to get
> working. Overly universal comes to mind. So 'ip sport' would stop matching
> packets with ip options?

No, ihl includes the options. Everything works perfectly. It is a bug to use sport and dport if ip options are present. There are tcp dst and tcp src for example. Same for udp. For icmp there are icmp type and icmp code. They all use the same base pointer.

> Regards,
>
> bert

Regards

--
Julian Anastasov <ja@ssi.bg>
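[Editorial note] The effect of IP options on a fixed "at 20" match can be shown with a toy packet. The following Python sketch is not how the kernel classifier is written; the helper names (ipv4_header, tcp_header, sport_at_fixed_20, sport_at_nexthdr) are invented for this note, and the addresses are arbitrary. It only assumes the standard IPv4 layout, in which the low nibble of the first byte is the header length in 32-bit words.

```python
import struct


def ipv4_header(ihl_words: int) -> bytes:
    """Build a bare IPv4 header of ihl_words * 4 bytes; options are zero-filled."""
    first_byte = (4 << 4) | ihl_words  # version 4, header length in 32-bit words
    fixed = struct.pack(
        "!BBHHHBBH4s4s",
        first_byte, 0, 0, 0, 0,   # ver/ihl, tos, total length, id, frag
        64, 6, 0,                 # ttl, protocol 6 (TCP), checksum left at 0
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    return fixed + bytes((ihl_words - 5) * 4)


def tcp_header(src_port: int, dst_port: int) -> bytes:
    """Only the two port fields matter here; the rest is zero padding."""
    return struct.pack("!HH", src_port, dst_port) + bytes(16)


def sport_at_fixed_20(packet: bytes) -> int:
    """What a match 'at 20' looks at: always byte offset 20."""
    return struct.unpack_from("!H", packet, 20)[0]


def sport_at_nexthdr(packet: bytes) -> int:
    """What a match 'at nexthdr+0' is meant to look at: offset ihl * 4."""
    ihl = packet[0] & 0x0F
    return struct.unpack_from("!H", packet, ihl * 4)[0]


plain = ipv4_header(5) + tcp_header(3128, 40000)
with_options = ipv4_header(6) + tcp_header(3128, 40000)

print(sport_at_fixed_20(plain))         # 3128
print(sport_at_fixed_20(with_options))  # 0 (reads option bytes, not the port)
print(sport_at_nexthdr(with_options))   # 3128
```

With a 24-byte header the fixed offset lands inside the options, which is why 'ip sport' and 'ip dport' are unreliable as soon as options may be present.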
On Fri, Dec 14, 2001 at 03:15:43PM +0200, Julian Anastasov wrote:

> > Damn, that's broken. Or at least, extremely non-obvious and hard to get
> > working. Overly universal comes to mind. So 'ip sport' would stop matching
> > packets with ip options?
>
> No, ihl includes the options. Everything works perfectly.
> It is bug to use sport and dport if ip options are present. There

Geh. Or an 'undocumented feature'. Because you don't know what kind of packets you will send or forward, using 'ip sport' is always a bug.

> are tcp dst and tcp src for example. Same for udp. For icmp there
> are icmp type and icmp code. All they use the same base pointer.

But tcp src only works when operating in a hashed filter? Which is not often the case.

I tried this:

    tc filter add dev eth0 parent 1:0 prio 5 u32 \
        match ip nofrag \
        offset mask 0x0F00 shift 6 \
        match tcp src 22 0xffff classid 1:2

But it doesn't work, gives:

    RTNETLINK answers: Invalid argument

Regards,

bert
Hello,

On Fri, 14 Dec 2001, bert hubert wrote:

> > No, ihl includes the options. Everything works perfectly.
> > It is bug to use sport and dport if ip options are present. There
>
> Geh. Or an 'undocumented feature'. Because you don't know what kind of
> packets you will send or forward, using 'ip sport' is always a bug.

Yes

> > are tcp dst and tcp src for example. Same for udp. For icmp there
> > are icmp type and icmp code. All they use the same base pointer.
>
> But tcp src only works when operating in a hashed filter? Which is
> not often the case.

Right. But only then can we match packets with options.

> I tried this:
> tc filter add dev eth0 parent 1:0 prio 5 u32 \
> match ip nofrag \
> offset mask 0x0F00 shift 6 \
> match tcp src 22 0xffff classid 1:2
>
> But it doesn't work, gives:

Of course

> RTNETLINK answers: Invalid argument

Didn't try it, but something like this:

    F="tc filter add dev eth0 parent 1:0 protocol ip prio 5"
    $F handle 1: u32 divisor 1
    $F u32 ht 1: match tcp src 22 0xFFFF match ip protocol 6 0xFF match ip firstfrag flowid 1:2
    $F u32 ht 800:: match u8 0 0 offset at 0 mask 0x0f00 shift 6 link 1:

Using ip nofrag is another bug :) Small? You miss traffic.

> Regards,
>
> bert

Regards

--
Julian Anastasov <ja@ssi.bg>
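[Editorial note] The numbers in "offset at 0 mask 0x0f00 shift 6" can be checked with a little arithmetic. The Python sketch below is an interpretation of those three parameters, not code taken from cls_u32.c: it assumes the 16-bit value at offset 0 of the IPv4 header (version, ihl, tos) is read in network byte order, masked, and shifted right. The function names are hypothetical.

```python
def first16(ihl_words: int, tos: int = 0) -> int:
    """First 16 bits of an IPv4 header: version (4 bits), ihl (4 bits), tos (8 bits)."""
    return (4 << 12) | (ihl_words << 8) | tos


def nexthdr_base(word: int, mask: int = 0x0F00, shift: int = 6) -> int:
    """Apply 'mask' and 'shift' to the 16-bit word read 'at 0'."""
    # The mask isolates ihl in bits 8..11. Shifting right by 8 would give ihl
    # in words; shifting by only 6 leaves it multiplied by 4, i.e. in bytes.
    return (word & mask) >> shift


for ihl in (5, 6, 15):
    print(ihl, nexthdr_base(first16(ihl)))
# 5 20
# 6 24
# 15 60
```

So a header without options gives the familiar base of 20 bytes, and every extra 32-bit word of options moves the base for nexthdr+ offsets by 4 bytes. The tos bits fall outside the mask and do not affect the result.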
On Friday 14 December 2001 14.15, Julian Anastasov wrote:

> No, ihl includes the options. Everything works perfectly.
> It is bug to use sport and dport if ip options are present. There
> are tcp dst and tcp src for example. Same for udp. For icmp there
> are icmp type and icmp code. All they use the same base pointer.

Which only works if you have chained the filter rules using a hash table, where the hash table has an IP offset rule.

Regards
Henrik
On Fri, Dec 14, 2001 at 03:54:43PM +0200, Julian Anastasov wrote:

> Didn't tried it but something like this:
>
> F="tc filter add dev eth0 parent 1:0 protocol ip prio 5"
> $F handle 1: u32 divisor 1
> $F u32 ht 1: match tcp src 22 0xFFFF match ip protocol 6 0xFF match ip firstfrag flowid 1:2
> $F u32 ht 800:: match u8 0 0 offset at 0 mask 0x0f00 shift 6 link 1:

Thanks for that example; a few more U32 filter examples in the HOWTO would be welcome I'm sure ... ;-~

--
Michael T. Babcock
CTO, FibreSpeed Ltd. (Hosting, Security, Consultation, Database, etc)
http://www.fibrespeed.net/~mbabcock/
On Fri, Dec 14, 2001 at 02:59:15PM -0500, Michael T. Babcock wrote:

> On Fri, Dec 14, 2001 at 03:54:43PM +0200, Julian Anastasov wrote:
> > Didn't tried it but something like this:
> >
> > F="tc filter add dev eth0 parent 1:0 protocol ip prio 5"
> > $F handle 1: u32 divisor 1
> > $F u32 ht 1: match tcp src 22 0xFFFF match ip protocol 6 0xFF match ip firstfrag flowid 1:2
> > $F u32 ht 800:: match u8 0 0 offset at 0 mask 0x0f00 shift 6 link 1:
>
> Thanks for that example; a few more U32 filter examples in the HOWTO
> would be welcome I'm sure ... ;-~

I'm always happy to receive tested examples. That is what takes the most time - I actually try to test everything these days or I need to be *sure* that somebody tested it. In the past a lot of crap was merged which later turned out not to work :-(

Oh, I've been exploring how the 'virtual clock' works in the Linux CBQ implementation. It turns out that you can misconfigure it quite badly and still get *statistically* accurate shaping. I'm still figuring out the effects at short timescales of misconfiguring bandwidth.

Regards,

bert hubert
On Sat, Dec 15, 2001 at 12:00:18AM +0100, bert hubert wrote:

> Oh, I've been exploring how the 'virtual clock' works in the Linux CBQ
> implementation, it turns out that you can misconfigure it quite badly and
> still get *statistically* accurate shaping. I'm still figuring out the
> effects at short timescales of misconfiguring bandwidth.

Please post your observations as you come across them so we can also test them and see what's going on faster together.

--
Michael T. Babcock
On Fri, Dec 14, 2001 at 06:07:28PM -0500, Michael T. Babcock wrote:

> On Sat, Dec 15, 2001 at 12:00:18AM +0100, bert hubert wrote:
> > Oh, I've been exploring how the 'virtual clock' works in the Linux CBQ
> > implementation, it turns out that you can misconfigure it quite badly and
> > still get *statistically* accurate shaping. I'm still figuring out the
> > effects at short timescales of misconfiguring bandwidth.
>
> Please post your observations as you come across them so we can also test
> them and see what's going on faster together.

The theory is like this. CBQ wants to know the idle time of the interface, which would work like this.

    Enqueue a packet
    Enqueue a packet
    Enqueue a packet
    Packet is dequeued to the interface
        interface busy sending packet
        interface notifies us that the packet was sent
        CBQ notes how much time has passed, and uses this for avgidle calculations
    Packet is dequeued to the interface
        process repeats.

Ok - now, this is not how it works in Unix, or at least, in Linux. In fact it goes like this:

    Enqueue a packet to the queue
    Dequeue all packets from the queue, give to the network interface
    Enqueue a packet to the queue
    Dequeue all packets from the queue, give to the network interface
    Enqueue a packet to the queue
    Dequeue all packets from the queue, give to the network interface
    ...
    Enqueue a packet to the queue
    Network interface is fed up, will notify us when there is room again
    ......
    Enqueue a packet to the queue
    Enqueue a packet to the queue
    Enqueue a packet to the queue
    Enqueue a packet to the queue
    Network interface tells us that there is room
    dequeue
    dequeue
    dequeue
    dequeue

Now - we don't really *know* when the network interface was done sending. So, what 'Alexey CBQ' does is to guesstimate how long the interface would be busy, and move the clock ahead to the point where the interface would be idle after sending a packet.
So it works like this:

    Enqueue a packet
    Dequeue a packet, store how big it was
    Enqueue a packet
        Calculate how long the previous packet will take to transmit.
        Calculate how much time has actually passed
        Move 'now' ahead by the maximum of the above two.

Ok, what does this mean. Sending packets would look like this:

    [ASCII timing diagram lost in archiving: packet transmissions drawn along a time axis, with moments labelled 1 2 3 4 5]

At moment 1, a packet starts to be sent. At point 2, the packet is done sending. CBQ knows that the packet was 10000 bits long. Say we want to shape to 100kbit/s, then the time that should elapse between point 3, where we start sending the next packet, and point 1, where we started sending the first one, is 0.1 second.

In an ideal world, the next dequeue request comes directly when the network device is done sending, at moment 2. But it doesn't, it comes immediately. However, the kernel can *calculate* when 2 should occur by dividing the size of the packet by the actual bandwidth of the device.

CBQ then shifts the virtual time to point 2, and bases all calculations on that. This is why CBQ needs to know the bandwidth of your link.

Now, if you set the bandwidth of your link higher than it is in reality, CBQ will mess up its avgidle calculations. It appears that 'overlimit' is then still shaped at the proper rate, but link sharing may be done wrong. This is what I'm investigating now.

The above may not make much sense, but perhaps you can make something of it :-)

Regards,

bert
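[Editorial note] The clock rule described in this message ("move 'now' ahead by the maximum of the estimated transmit time and the time that actually passed") can be tried out in a toy model. The Python sketch below is only a reading of the prose above, not the kernel's CBQ code; the class name VirtualClock, its attributes, and the example link speeds are all assumptions made for this note.

```python
class VirtualClock:
    """Toy model of the 'now' bookkeeping described above (hypothetical)."""

    def __init__(self, link_bps: float):
        self.link_bps = link_bps   # configured link bandwidth in bit/s
        self.now = 0.0             # virtual time in seconds
        self.last_real = 0.0       # wall-clock time of the previous dequeue
        self.last_bits = 0         # size of the previously dequeued packet

    def dequeue(self, real_time: float, packet_bits: int) -> float:
        # How long the previous packet should have kept the link busy.
        tx_time = self.last_bits / self.link_bps
        # How much wall-clock time really passed since the previous dequeue.
        elapsed = real_time - self.last_real
        # Advance the virtual clock by whichever is larger.
        self.now += max(tx_time, elapsed)
        self.last_real = real_time
        self.last_bits = packet_bits
        return self.now


# Three 10000-bit packets dequeued back to back (0.1 ms apart in real time).
correct = VirtualClock(link_bps=10_000_000)      # link really is 10 Mbit/s
overstated = VirtualClock(link_bps=100_000_000)  # configured 10x too high
for t in (0.0, 0.0001, 0.0002):
    correct.dequeue(t, 10_000)
    overstated.dequeue(t, 10_000)

print(correct.now, overstated.now)
```

In this model the correctly configured clock advances by the 1 ms transmit time per packet, while the overstated one falls back to the 0.1 ms of real time between dequeues, so its idea of how busy the link has been is too small. That is one way to picture why a wrong bandwidth setting would disturb the avgidle calculation.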
On Sat, Dec 15, 2001 at 12:49:42AM +0100, bert hubert wrote:

> Enqueue a packet to the queue
> Enqueue a packet to the queue
> Enqueue a packet to the queue
> Network interface tells us that there is room
> dequeue
> dequeue
> [...]
> The above may not make much sense, but perhaps you can make something of it

What's interesting is that on a couple of occasions I have seen a situation where a "ping -n {someone over eth1}", where eth1 is a CBQ'd interface, will cause something like:

    64 bytes from 216.168.105.33: icmp_seq=0 ttl=255 time=3.0 s
    64 bytes from 216.168.105.33: icmp_seq=1 ttl=255 time=2.0 s
    64 bytes from 216.168.105.33: icmp_seq=2 ttl=255 time=1.0 s
    64 bytes from 216.168.105.33: icmp_seq=3 ttl=255 time=0.2 ms

... after a 3 second delay. I wonder if I could reproduce this and see if it's related to some setting in CBQ.

--
Michael T. Babcock