from http://luxik.cdi.cz/~patrick/imq/index.html
  By default you have two imq devices (imq0 and imq1).

That means if you don't specify numdevs below the default is 2, right?

from http://luxik.cdi.cz/~patrick/imq/index.html
  modprobe imq numdevs=1
  tc qdisc add dev imq0 handle 1: root htb default 1
  tc class add dev imq0 parent 1: classid 1:1 htb rate 1mbit
  tc qdisc add dev imq0 parent 1:1 handle 10: htb default 5

We're adding an htb as the qdisc for a child class of htb?  Why?
Isn't that just wasting time?  Can't all the 10: stuff be done with 1:
instead?

So above, the total rate for all imq traffic is limited to 1Mbit.  As
far as I see, it all goes out 1:1, and below we shape that.

  tc class add dev imq0 parent 10: handle 10:1 htb rate 256kbit burst 30k prio 1
  tc class add dev imq0 parent 10: handle 10:2 htb rate 256kbit burst 30k prio 2
  tc class add dev imq0 parent 10: handle 10:5 htb rate 1mbit prio 3

  tc qdisc add dev imq0 parent 10:1 handle 21:0 pfifo
  tc qdisc add dev imq0 parent 10:2 handle 22:0 sfq
  tc qdisc add dev imq0 parent 10:3 handle 23:0 sfq

Should this 10:3 be 10:5?

  tc filter add dev imq0 protocol ip pref 1 parent 10: handle 1 fw classid 10:1
  tc filter add dev imq0 protocol ip pref 2 parent 10: handle 2 fw classid 10:2

  iptables -t mangle -A PREROUTING -i ppp0 -j IMQ

This is a little confusing.  I gather -j IMQ means to mark the packet
so that AFTER the mangle table is done, the IMQ device will steal the
packet.  Is that right?  If there were more than one imq device, how
would we know which one gets the packet?

  iptables -t mangle -A PREROUTING -i ppp0 -p tcp -m tos --tos minimize-delay -m state --state ESTABLISHED -j MARK --set-mark 1
  iptables -t mangle -A PREROUTING -i ppp0 -p tcp --sport 80 --dport 1024: -m state --state ESTABLISHED -j MARK --set-mark 2

  ip link set imq0 up

So here's what I imagine happens.  Please confirm or correct.
The packet goes through the mangle table, maybe gets marked for later
classification, and maybe gets marked for imq.  Then the imq hook
steals away those that were marked for imq.  It enqueues them and
dequeues them according to its classes and the marks.  At this
point is the skb dev imq0, or still ppp0?

When the packet is eventually dequeued (if not dropped), where does it
go?  I'm hoping it goes to the beginning of pre-routing so we can
apply conntrack/nat/mangle rules to it with -i imq0.

I suspect this is not the case, since I see in the patch code

  nf_reinject(skb, info, NF_ACCEPT)

I'm not even sure netfilter supports what I want.  I see in
http://netfilter.samba.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html

  5. NF_REPEAT: call this hook again.

but what's "this hook"?  Is it the imq hook or pre_routing?

My goal here is to protect conntrack from attack by rate limiting the
packets not from known connections.  To do that I need to send them to
imq before conntrack sees them.  Unfortunately, conntrack does all the
work that I want to avoid for those packets in prerouting, and
conntrack sees them before mangle.  But maybe I could restrict all
operations that involve conntrack to packets with dev imq (or the imq
mark), which I hope would result in conntrack NOT seeing the packets
with other devices.  I could instead send all those packets from other
devices (or without the mark) to imq, where I would look them up in
the conntrack table (but not add them!) to see whether they belong in
the rate limited class or not.  Then when (if) they're released they
should go through conntrack, nat, mangle, etc.

Perhaps even better than changing the dev to imq0 would be a way for
netfilter rules to match on the imq mark.  Then I wouldn't have to
worry about whether rp_filter would still work.

from http://luxik.cdi.cz/~patrick/imq/faq.html

  4. When do packets reach the device (qdisc)?

  The imq device registers NF_IP_PRE_ROUTING (for ingress) and
  NF_IP_POST_ROUTING (egress) netfilter hooks.  These hooks are also
  registered by iptables.  Hooks can be registered with different
  priorities which determine the order in which the registered
  functions will be called.  Packet delivery to the imq device in
  NF_IP_PRE_ROUTING happens directly after the mangle table has been
  passed (not in the table itself!).  In NF_IP_POST_ROUTING packets
  reach the device after ALL tables have been passed.  This means you
  will be able to use netfilter marks for classifying incoming and
  outgoing packets.  Packets seen in NF_IP_PRE_ROUTING include the
  ones that will be dropped by packet filtering later (since they
  already occupied bandwidth); in NF_IP_POST_ROUTING only packets
  which already passed packet filtering are seen.

from include/linux/netfilter_ipv4.h

  enum nf_ip_hook_priorities {
          NF_IP_PRI_FIRST = INT_MIN,
          NF_IP_PRI_CONNTRACK = -200,
          NF_IP_PRI_MANGLE = -150,
          NF_IP_PRI_NAT_DST = -100,
          NF_IP_PRI_FILTER = 0,
          NF_IP_PRI_NAT_SRC = 100,
          NF_IP_PRI_LAST = INT_MAX,
  };

So "after mangle" means first conntrack, then mangle, then IMQ, then ...
It might be worth mentioning this somewhere in the doc.

One other thing I worry about.  net/ipv4/netfilter/ip_queue.c contains:

   * Packets arrive here from netfilter for queuing to userspace.
   * All of them must be fed back via nf_reinject() or Alexey will kill Rusty.
   */
  static int netfilter_receive(struct sk_buff *skb,

I notice the patch returning NF_QUEUE.
Will Rusty survive if IMQ ends up dropping packets?
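[Editor's note: assuming the 10:3 in the example above is indeed a typo
for 10:5 (so that the default class actually gets a leaf qdisc), the
leaf-qdisc lines would read as below.  This is a sketch of the suspected
intent, not taken verbatim from the IMQ page.]

```shell
# Leaf qdiscs with the suspected 10:3 -> 10:5 typo fixed, so that the
# default class 10:5 gets its sfq instead of the nonexistent 10:3.
tc qdisc add dev imq0 parent 10:1 handle 21:0 pfifo
tc qdisc add dev imq0 parent 10:2 handle 22:0 sfq
tc qdisc add dev imq0 parent 10:5 handle 23:0 sfq
```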
Hi.  A lot of questions; I hope I can answer them to your satisfaction.

Don Cohen wrote:

>  from http://luxik.cdi.cz/~patrick/imq/index.html
>  By default you have two imq devices (imq0 and imq1)
>
>That means if you don't specify numdevs below the default is 2, right?

right.

>  from http://luxik.cdi.cz/~patrick/imq/index.html
>  modprobe imq numdevs=1
>  tc qdisc add dev imq0 handle 1: root htb default 1
>  tc class add dev imq0 parent 1: classid 1:1 htb rate 1mbit
>  tc qdisc add dev imq0 parent 1:1 handle 10: htb default 5
>
>We're adding an htb as the qdisc for a child class of htb?  Why?
>Isn't that just wasting time?  Can't all the 10: stuff be done with 1:
>instead?

The root qdisc is used for delay simulation, 10:0 is the "real" qdisc
( http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm#prio )

>  tc qdisc add dev imq0 parent 10:1 handle 21:0 pfifo
>  tc qdisc add dev imq0 parent 10:2 handle 22:0 sfq
>  tc qdisc add dev imq0 parent 10:3 handle 23:0 sfq
>
>Should this 10:3 be 10:5?

Yes, you're right.  Someone else already told me but I forgot to
correct it.

>  iptables -t mangle -A PREROUTING -i ppp0 -j IMQ
>
>This is a little confusing.  I gather -j IMQ means to mark the packet
>so that AFTER the mangle table is done, the IMQ device will steal the
>packet.  Is that right?  If there were more than one imq device, how
>would we know which one gets the packet?

I put it in this order on purpose because I thought it would show
people the imq device does not get to see the packet during mangle
table traversal but afterwards.  It probably IS confusing, so I'm
going to change it in the next days.  If more than one imq device is
used, you specify the one which should get the packet with the
--todev argument to the IMQ target.

>So here's what I imagine happens.  Please confirm or correct.
>[...]  At this point is the skb dev imq0, or still ppp0?

skb->dev doesn't get changed, if that's what you mean ..

>When the packet is eventually dequeued (if not dropped), where does
>it go?  I'm hoping it goes to the beginning of pre-routing so we can
>apply conntrack/nat/mangle rules to it with -i imq0.

No it doesn't.  I think it doesn't make any sense to use any kind of
iptables rules on packets passing imq, because all of them come
from/go to real devices which you can use in your rules.

>I see in
>  http://netfilter.samba.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html
>  5. NF_REPEAT: call this hook again.
>but what's "this hook"?  Is it the imq hook or pre_routing?

It's the imq hook.  From net/core/netfilter.c:

  nf_reinject(...)
  {
          ...
          if (verdict == NF_REPEAT) {
                  elem = elem->prev;
                  verdict = NF_ACCEPT;
          }
          ...
  }

>My goal here is to protect conntrack from attack by rate limiting the
>packets not from known connections.  To do that I need to send them to
>imq before conntrack sees them.  Unfortunately, conntrack does all the
>work that I want to avoid for those packets in prerouting, and
>conntrack sees them before mangle.

You can easily change this order; I guess you already noticed that if
you looked at the imq source.  But are you sure this is necessary?  I
guess your connection must be extremely fast if someone wants to DoS
you through a connection-tracking table fillup attack ...

>Perhaps even better than changing the dev to imq0 would be a way for
>netfilter rules to match on the imq mark.  Then I wouldn't have to
>worry about whether rp_filter would still work.

Changing skb->dev to imq0 would result in something like this:

  ... -> NF_HOOK(..) -> imq -> qdisc -> reinject -> continue NF_HOOK ->
  ... -> dev_queue_xmit -> qdisc -> imq -> reinject (CRASH!)

>  from http://luxik.cdi.cz/~patrick/imq/faq.html
>  4. When do packets reach the device (qdisc)?
>  [FAQ text and nf_ip_hook_priorities enum quoted above]
>
>So "after mangle" means first conntrack, then mangle, then IMQ, then ...
>It might be worth mentioning this somewhere in the doc.

Hmm, I guess my explanation is the worst way to describe this simple
fact :)  (It's supposed to mean the same thing.)

>One other thing I worry about.
>net/ipv4/netfilter/ip_queue.c contains:
>  * Packets arrive here from netfilter for queuing to userspace.
>  * All of them must be fed back via nf_reinject() or Alexey will kill Rusty.
>
>I notice the patch returning NF_QUEUE.
>Will Rusty survive if IMQ ends up dropping packets?

If you look at the imq source you find an imq_skb_destructor; I
thought about adding a comment that it's meant to save Rusty's life.
If skbs are freed inside qdiscs, kfree_skb will call the destructor,
which will do the necessary things to protect Rusty :)

Bye,
Patrick
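[Editor's note: Patrick's --todev answer, as a concrete sketch.  The
interface names and the 0/1 device numbering here are illustrative
assumptions, not taken from the IMQ page; the IMQ iptables target
patch must be installed for -j IMQ to exist.]

```shell
# With two imq devices loaded, --todev selects which device steals
# the packet once the mangle table has been traversed.
modprobe imq numdevs=2
iptables -t mangle -A PREROUTING  -i ppp0 -j IMQ --todev 0   # ingress -> imq0
iptables -t mangle -A POSTROUTING -o eth0 -j IMQ --todev 1   # egress  -> imq1
ip link set imq0 up
ip link set imq1 up
```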
Patrick McHardy writes:

> >We're adding an htb as the qdisc for a child class of htb?  Why?
> >Isn't that just wasting time?  Can't all the 10: stuff be done with
> >1: instead?
>
> The root qdisc is used for delay simulation, 10:0 is the "real" qdisc
> ( http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm#prio )

So I was right, just to waste time.  That was not part of the spec, as
I recall.  So I suggest it be removed from the example.

> If more than one imq device is used you specify the one which should
> get the packet with --todev argument to IMQ target.

Be sure to put that in the doc.  I didn't see it there.
I suppose the default is imq0?

> skb->dev doesn't get changed if that's what you mean ..

Ok, important to know that.  I gather there is currently no way to
read the imq mark from netfilter.

> >When the packet is eventually dequeued (if not dropped), where does
> >it go?  I'm hoping it goes to the beginning of pre-routing so we
> >can apply conntrack/nat/mangle rules to it with -i imq0.
>
> No it doesn't.  I think it doesn't make any sense to use any kind of
> iptables rules on packets passing imq because all of them come
> from/go to real devices which you can use in your rules.

But if you could read the imq mark then it would make a lot of sense.
These two things in combination would allow me to do what I want
without changing the code.  As it is, it looks like I need a local
variant of IMQ that runs before conntrack.  (On the other hand, this
is probably the more efficient solution anyhow.)

> >I suspect this is not the case, since I see in the patch code
> >  nf_reinject(skb, info, NF_ACCEPT)
> >[...]
> >but what's "this hook"?  Is it the imq hook or pre_routing?
>
> It's the imq hook.  From net/core/netfilter.c:
>   nf_reinject(...)

As I thought, there's no convenient way for you to do what I want.

> You can easily change this order; I guess you already noticed that
> if you looked at the imq source.

Right.  But this is not a change that everyone would want.

> But are you sure this is necessary?  I guess your connection must be
> extremely fast if someone wants to DoS you through a
> connection-tracking table fillup attack ...

My idea of extremely fast has changed recently.  Maybe it's a bit
ahead of yours.  First, I'm interested in protecting against attacks
from inside the firewall, and these are typically connected at
100Mbit.  Is that fast enough?  Next, I've been playing with gigabit
cards.  Finally, I visited Sprint a few weeks ago and they're not
interested in anything as slow as one gigabit.  Although, for a
firewall, I admit that seems fast enough for the time being.

> Changing skb->dev to imq0 would result in something like this:
> ... -> NF_HOOK(..) -> imq -> qdisc -> reinject -> continue NF_HOOK ->
> ... -> dev_queue_xmit -> qdisc -> imq -> reinject (CRASH!)

If you mean it could result in infinite loops, yes, but this is not
the first invention of infinite loops.  If your rules do the right
things then the loops can also be avoided.  Besides, that requires my
other request, that the reinject go back to the beginning of the
prerouting hook.  Without that, it was completely plausible that the
skb dev could have been changed.  But I'm not complaining.  I just
wanted to know.

> If you look at the imq source you find an imq_skb_destructor; I
> thought about adding a comment that it's meant to save Rusty's life.
> If skbs are freed inside qdiscs, kfree_skb will call the destructor,
> which will do the necessary things to protect Rusty :)

Ok, I wouldn't want to contribute to his early demise.  This tends to
confirm my first guess, which was that the important thing here was to
free skbs when they are no longer in use.  I guess user mode can't
free them, but perhaps the better solution would have been to free
them before a copy is sent to user space and then recreate them if the
copy ever came back.  But I digress...

Thanks for all the answers.
Don Cohen wrote:

>Patrick McHardy writes:
> > The root qdisc is used for delay simulation, 10:0 is the "real" qdisc
> > ( http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm#prio )
>
>So I was right, just to waste time.  That was not part of the spec, as
>I recall.  So I suggest it be removed from the example.

I don't know if it's really necessary, but as imq is a software device
it probably is .. maybe devik can answer this ...

> > If more than one imq device is used you specify the one which should
> > get the packet with --todev argument to IMQ target.
>
>Be sure to put that in the doc.  I didn't see it there.
>I suppose the default is imq0?

yes ..

> > skb->dev doesn't get changed if that's what you mean ..
>
>Ok, important to know that.  I gather there is currently no way to
>read the imq mark from netfilter.

Currently not, but if you need something like this, just change the
mark match to match skb->imq_flags instead of skb->nfmark ... you
should then look at include/linux/imq.h to see the meaning of the
different bits.

> > No it doesn't.  I think it doesn't make any sense to use any kind of
> > iptables rules on packets passing imq because all of them come
> > from/go to real devices which you can use in your rules.
>
>But if you could read the imq mark then it would make a lot of sense.
>These two things in combination would allow me to do what I want
>without changing the code.  As it is, it looks like I need a local
>variant of IMQ that runs before conntrack.  (On the other hand, this
>is probably the more efficient solution anyhow.)

More efficient maybe, but you will lose the possibility to only donate
bandwidth to established and assured connections by using the state
match ..

>[...]
>
>Ok, I wouldn't want to contribute to his early demise.  This tends to
>confirm my first guess, which was that the important thing here was to
>free skbs when they are no longer in use.  I guess user mode can't
>free them, but perhaps the better solution would have been to free
>them before a copy is sent to user space and then recreate them if
>the copy ever came back.  But I digress...

Userspace??  imq never sends anything to userspace, but if it really
did then you're right, userspace can't free skbs.  Also, the
destructor doesn't free them; it releases references held on the real
devices, which are taken beforehand by netfilter so that the real
device doesn't vanish while the packet is out of netfilter's control ..

>Thanks for all the answers.

You're welcome,
Patrick