William Dauchy
2013-Jul-09 12:24 UTC
[PATCH v3 0/3][xen-netback][toolstack] add a pseudo pps limit to netback
VM traffic is already limited by a throughput limit, but there is no control over the maximum packet per second (PPS). In DDOS attack the major issue is rather PPS than throughput. With provider offering more bandwidth to VMs, it becames easy to coordinate a massive attack using VMs. Example: 100Mbits ~ 200kpps using 64B packets. This patch provides a new option to limit VMs maximum packets per second emission rate. It follows the same credits logic used for throughput shaping. For the moment we have considered each "txreq" as a packet. PPS limits is passed to VIF at connection time via xenstore. PPS credit uses the same usecond period used by rate shaping check. known limitations: - by using the same usecond period, PPS shaping depends on throughput shaping. - it is not always true that a "txreq" correspond to a paquet (fragmentation cases) but as this shaping is meant to avoid DDOS (small paquets) such an pproximation should not impact the results. - Some help on burst handling will be appreciated. v2: - fix some typo v3: - fix some typo - add toolstack patch Ahmed Amamou (1): xen-netback: add a pseudo pps rate limit drivers/net/xen-netback/common.h | 2 ++ drivers/net/xen-netback/interface.c | 1 + drivers/net/xen-netback/netback.c | 46 +++++++++++++++++++++++++++++++++++ drivers/net/xen-netback/xenbus.c | 25 ++++++++++++++++--- 4 files changed, 70 insertions(+), 4 deletions(-) [toolstack] This patch will update the libxl in order to provide the new pps limit new pps limit can be defined as follow YYMb/s&XXKpps@ZZms or YYMb/s@ZZms&XXKpps or YYMb/s&XXKpps in such case default 50ms interval will be used Ahmed Amamou (2): add a pseudo pps rate limit netif documentation tools/libxl/libxl.c | 3 ++ tools/libxl/libxl_types.idl | 1 + tools/libxl/libxlu_vif.c | 69 +++++++++++++++++++++++++++++++++++++++-- xen/include/public/io/netif.h | 27 ++++++++++++++++ 4 files changed, 97 insertions(+), 3 deletions(-) -- 1.7.9.5
William Dauchy
2013-Jul-09 12:24 UTC
[PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
This patch provides a new option to limit VMs maximum packets per second emission rate. It follows the same credits logic used for throughput shaping. For the moment we have considered each "txreq" as a packet. PPS limits is passed to VIF at connection time via xenstore. PPS credit uses the same usecond period used by rate shaping check. known limitations: - by using the same usecond period, PPS shaping depends on throughput shaping. - it is not always true that a "txreq" correspond to a packet (fragmentation cases) but as this shaping is meant to avoid DDOS (small paquets) such an approximation should not impact the results. - Some help on burst handling will be appreciated. Signed-off-by: Ahmed Amamou <ahmed@gandi.net> Signed-off-by: William Dauchy <william@gandi.net> Signed-off-by: Kamel Haddadou <kamel@gandi.net> --- drivers/net/xen-netback/common.h | 2 ++ drivers/net/xen-netback/interface.c | 1 + drivers/net/xen-netback/netback.c | 46 +++++++++++++++++++++++++++++++++++ drivers/net/xen-netback/xenbus.c | 25 ++++++++++++++++--- 4 files changed, 70 insertions(+), 4 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 9d7f172..fefa79a 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -85,8 +85,10 @@ struct xenvif { /* Transmit shaping: allow ''credit_bytes'' every ''credit_usec''. */ unsigned long credit_bytes; + unsigned long credit_packets; unsigned long credit_usec; unsigned long remaining_credit; + unsigned long remaining_packets; struct timer_list credit_timeout; /* Statistics */ diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index d984141..06257dd 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -273,6 +273,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, INIT_LIST_HEAD(&vif->notify_list); vif->credit_bytes = vif->remaining_credit = ~0UL; + vif->credit_packets = vif->remaining_packets = ~0UL; vif->credit_usec = 0UL; init_timer(&vif->credit_timeout); /* Initialize ''expires'' now: it''s used to track the credit window. */ diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 8c20935..172a6de 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -905,10 +905,16 @@ static void tx_add_credit(struct xenvif *vif) vif->remaining_credit = min(max_credit, max_burst); } +static void tx_add_packets(struct xenvif *vif) +{ + vif->remaining_packets = vif->credit_packets; +} + static void tx_credit_callback(unsigned long data) { struct xenvif *vif = (struct xenvif *)data; tx_add_credit(vif); + tx_add_packets(vif); xen_netbk_check_rx_xenvif(vif); } @@ -1419,6 +1425,38 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) return false; } +static bool tx_packets_exceeded(struct xenvif *vif) +{ + unsigned long now = jiffies; + unsigned long next_credit + vif->credit_timeout.expires + + msecs_to_jiffies(vif->credit_usec / 1000); + + /* Timer could already be pending in rare cases. */ + if (timer_pending(&vif->credit_timeout)) + return true; + + /* Passed the point where we can replenish credit? */ + if (time_after_eq(now, next_credit)) { + vif->credit_timeout.expires = now; + tx_add_packets(vif); + } + + /* Not enough slot to send right now? Set a callback. */ + if (vif->remaining_packets < 1) { + vif->credit_timeout.data + (unsigned long)vif; + vif->credit_timeout.function + tx_credit_callback; + mod_timer(&vif->credit_timeout, + next_credit); + + return true; + } + + return false; +} + static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) { struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop; @@ -1470,6 +1508,13 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) rmb(); /* Ensure that we see the request before we copy it. */ memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq)); + /* pps-based scheduling. */ + if(vif->remaining_packets < 1 && + tx_packets_exceeded(vif)) { + xenvif_put(vif); + continue; + } + /* Credit-based scheduling. */ if (txreq.size > vif->remaining_credit && tx_credit_exceeded(vif, txreq.size)) { @@ -1478,6 +1523,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) } vif->remaining_credit -= txreq.size; + vif->remaining_packets--; work_to_do--; vif->tx.req_cons = ++idx; diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index 410018c..7c55bed 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -267,15 +267,18 @@ static void frontend_changed(struct xenbus_device *dev, static void xen_net_read_rate(struct xenbus_device *dev, - unsigned long *bytes, unsigned long *usec) + unsigned long *bytes, + unsigned long *packet, + unsigned long *usec) { char *s, *e; - unsigned long b, u; - char *ratestr; + unsigned long b, u, pps; + char *ratestr, *ppsstr; /* Default to unlimited bandwidth. */ *bytes = ~0UL; *usec = 0; + *packet = ~0UL; ratestr = xenbus_read(XBT_NIL, dev->nodename, "rate", NULL); if (IS_ERR(ratestr)) @@ -295,11 +298,24 @@ static void xen_net_read_rate(struct xenbus_device *dev, *usec = u; kfree(ratestr); + ppsstr = xenbus_read(XBT_NIL, dev->nodename, "pps", NULL); + if (IS_ERR(ppsstr)) + return; + s = ppsstr; + pps = simple_strtoul(s, &e, 10); + if ((s == e) || (*e != ''\0'')) + goto fail2; + *packet = pps; + kfree(ppsstr); return; fail: pr_warn("Failed to parse network rate limit. Traffic unlimited.\n"); kfree(ratestr); + return; +fail2: + pr_warn("Failed to parse network PPS limit. PPS unlimited.\n"); + kfree(ppsstr); } static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[]) @@ -370,8 +386,9 @@ static void connect(struct backend_info *be) } xen_net_read_rate(dev, &be->vif->credit_bytes, - &be->vif->credit_usec); + &be->vif->credit_packets, &be->vif->credit_usec); be->vif->remaining_credit = be->vif->credit_bytes; + be->vif->remaining_packets = be->vif->credit_packets; unregister_hotplug_status_watch(be); err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, -- 1.7.9.5
adapt libxl to handle pps limit parameter the new pps limit can be defined using a ''&'' symbol after the rate limit for example: YYMb/s&XXKpps@ZZms or YYMb/s@ZZms&XXKpps or YYMb/s&XXKpps in such case default 50ms interval will be used Signed-off-by: Ahmed Amamou <ahmed@gandi.net> Signed-off-by: William Dauchy <william@gandi.net> Signed-off-by: Kamel Haddadou <kamel@gandi.net> --- tools/libxl/libxl.c | 3 ++ tools/libxl/libxl_types.idl | 1 + tools/libxl/libxlu_vif.c | 68 +++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 69 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 3236aa9..7cbbd5b 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2920,6 +2920,9 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, libxl__sprintf(gc, "%"PRIu64",%"PRIu32"", nic->rate_bytes_per_interval, nic->rate_interval_usecs)); + flexarray_append(back, "pps"); + flexarray_append(back, libxl__sprintf(gc, "%"PRIu64"", + nic->rate_packet_per_interval)); } flexarray_append(back, "bridge"); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index d218a2d..a397a91 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -390,6 +390,7 @@ libxl_device_nic = Struct("device_nic", [ ("script", string), ("nictype", libxl_nic_type), ("rate_bytes_per_interval", uint64), + ("rate_packet_per_interval", uint64), ("rate_interval_usecs", uint32), ("gatewaydev", string), ]) diff --git a/tools/libxl/libxlu_vif.c b/tools/libxl/libxlu_vif.c index 3b3de0f..4ff1725 100644 --- a/tools/libxl/libxlu_vif.c +++ b/tools/libxl/libxlu_vif.c @@ -3,6 +3,7 @@ static const char *vif_bytes_per_sec_re = "^[0-9]+[GMK]?[Bb]/s$"; static const char *vif_internal_usec_re = "^[0-9]+[mu]?s?$"; +static const char *vif_packet_per_sec_re = "^[0-9]+[GMK]?pps$"; static void xlu__vif_err(XLU_Config *cfg, const char *msg, const char *rate) { fprintf(cfg->report, @@ -49,6 +50,43 @@ out: return rc; } +static int vif_parse_rate_packet_per_sec(XLU_Config *cfg, const char *packet, + uint64_t *packet_per_sec) +{ + regex_t rec; + uint64_t tmp = 0; + const char *p; + int rc = 0; + + regcomp(&rec, vif_packet_per_sec_re, REG_EXTENDED|REG_NOSUB); + if (regexec(&rec, packet, 0, NULL, 0)) { + xlu__vif_err(cfg, "invalid pps", packet); + rc = EINVAL; + goto out; + } + + p = packet; + tmp = strtoull(p, (char**)&p, 0); + if (tmp == 0 || tmp > UINT32_MAX || errno == ERANGE) { + xlu__vif_err(cfg, "pps overflow", packet); + rc = EOVERFLOW; + goto out; + } + + if (*p == ''G'') + tmp *= 1000 * 1000 * 1000; + else if (*p == ''M'') + tmp *= 1000 * 1000; + else if (*p == ''K'') + tmp *= 1000; + + *packet_per_sec = tmp; + +out: + regfree(&rec); + return rc; +} + static int vif_parse_rate_interval_usecs(XLU_Config *cfg, const char *interval, uint32_t *interval_usecs) { @@ -94,22 +132,35 @@ int xlu_vif_parse_rate(XLU_Config *cfg, const char *rate, libxl_device_nic *nic) { uint64_t bytes_per_sec = 0; uint64_t bytes_per_interval = 0; + uint64_t packet_per_sec = 0; + uint64_t packet_per_interval = 0; uint32_t interval_usecs = 50000UL; /* Default to 50ms */ - char *ratetok, *tmprate; + char *ratetok, *tmprate, *tmp_pps, *tmpint; int rc = 0; + /* rate string need to be duplicated because strtok may change it */ tmprate = strdup(rate); + tmp_pps = strdup(rate); + tmpint = strdup(rate); + if (!strcmp(tmprate,"")) { xlu__vif_err(cfg, "no rate specified", rate); rc = EINVAL; goto out; } - ratetok = strtok(tmprate, "@"); + /* accepted rate string are as follow: + * rate&pps@interval or rate@interval&pps + */ + + /* ratetok contains the first token */ + ratetok = strtok(tmprate, "@&"); rc = vif_parse_rate_bytes_per_sec(cfg, ratetok, &bytes_per_sec); if (rc) goto out; - ratetok = strtok(NULL, "@"); + ratetok = strtok(tmpint, "@"); + ratetok = strtok(NULL, "@&"); + /* ratetok contains the first token following the ''@'' */ if (ratetok != NULL) { rc = vif_parse_rate_interval_usecs(cfg, ratetok, &interval_usecs); if (rc) goto out; @@ -121,11 +172,22 @@ int xlu_vif_parse_rate(XLU_Config *cfg, const char *rate, libxl_device_nic *nic) goto out; } + ratetok = strtok(tmp_pps, "&"); + ratetok = strtok(NULL, "&@"); + /* ratetok contains the first token following the ''&'' */ + if (ratetok != NULL) { + rc = vif_parse_rate_packet_per_sec(cfg, ratetok, &packet_per_sec); + if (rc) goto out; + } + bytes_per_interval (((uint64_t) bytes_per_sec * (uint64_t) interval_usecs) / 1000000UL); + packet_per_interval + (((uint64_t) packet_per_sec * (uint64_t) interval_usecs) / 1000000UL); nic->rate_interval_usecs = interval_usecs; nic->rate_bytes_per_interval = bytes_per_interval; + nic->rate_packet_per_interval = packet_per_interval; out: free(tmprate); -- 1.7.9.5
add netif old rate limit documentation add new pps limit documentation Signed-off-by: Ahmed Amamou <ahmed@gandi.net> Signed-off-by: William Dauchy <william@gandi.net> Signed-off-by: Kamel Haddadou <kamel@gandi.net> --- docs/misc/xl-network-configuration.markdown | 18 +++++++++++++----- xen/include/public/io/netif.h | 27 +++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/docs/misc/xl-network-configuration.markdown b/docs/misc/xl-network-configuration.markdown index e0d3d2a..203a002 100644 --- a/docs/misc/xl-network-configuration.markdown +++ b/docs/misc/xl-network-configuration.markdown @@ -141,25 +141,33 @@ domain which is outside the scope of this document. Specifies the rate at which the outgoing traffic will be limited to. The default if this keyword is not specified is unlimited. -The rate may be specified as "<RATE>/s" or optionally "<RATE>/s@<INTERVAL>". +The rate may be specified as "<RATE>/s" or optionally +"<RATE>/s@<INTERVAL>" or "<RATE>/s&<PPS>pps" or "<RATE>/s&<PPS>pps@<INTERVAL>". * `RATE` is in bytes and can accept suffixes: * GB, MB, KB, B for bytes. * Gb, Mb, Kb, b for bits. + + * `PPS` is in packet and can accept suffixes: + * Gpps, Mpps, Kpps, pps. + It determines the packets per second that outgoing traffic will be + limited to. + * `INTERVAL` is in microseconds and can accept suffixes: ms, us, s. It determines the frequency at which the vif transmission credit is replenished. The default is 50ms. -Vif rate limiting is credit-based. It means that for "1MB/s@20ms", the +Vif rate limiting is credit-based. It means that for "1MB/s&50Kpps@20ms", the available credit will be equivalent of the traffic you would have done -at "1MB/s" during 20ms. This will results in a credit of 20,000 bytes -replenished every 20,000 us. +at "1MB/s" or 50,000 pps during 20ms. This will results in a credit of 20,000 +bytes or 1000 packet replenished every 20,000 us. For example: ''rate=10Mb/s'' -- meaning up to 10 megabits every second ''rate=250KB/s'' -- meaning up to 250 kilobytes every second - ''rate=1MB/s@20ms'' -- meaning 20,000 bytes in every 20 millisecond period + ''rate=1MB/s&10Kpps@20ms'' -- meaning 20,000 bytes or 200 packets in +every 20 millisecond period NOTE: The actual underlying limits of rate limiting are dependent on the underlying netback implementation. diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h index d477751..8bd112f 100644 --- a/xen/include/public/io/netif.h +++ b/xen/include/public/io/netif.h @@ -79,6 +79,33 @@ * Request N: netif_tx_request -- 0 */ +/* + * ------------------------- Backend Device Properties ------------------------- + * + * interval + * Values: <uint64_t> + * Default Value: 0 + * + * Used time interval in usecond for throughput and pps limits. The + * same interval is used for both of them in order to simplify the + * implementation. Using the same check interval also avoid possible bugs + * that invole bypassing these limits + * + * rate + * Values: <uint64_t> + * Default Value: ~0 + * + * The netback reading rate in bytes from the shared ring. This rate is + * represented in bytes per interval. + * + * pps + * Values: <uint64_t> + * Default Value: ~0 + * + * The netback reading rate in tx slots from the shared ring.This rate is + * represented in request per interval. + */ + /* Protocol checksum field is blank in the packet (hardware offload)? */ #define _NETTXF_csum_blank (0) #define NETTXF_csum_blank (1U<<_NETTXF_csum_blank) -- 1.7.9.5
Sander Eikelenboom
2013-Jul-09 13:48 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
Tuesday, July 9, 2013, 2:24:07 PM, you wrote:> This patch provides a new option to limit VMs maximum packets per second > emission rate. > It follows the same credits logic used for throughput shaping. For the > moment we have considered each "txreq" as a packet. > PPS limits is passed to VIF at connection time via xenstore. > PPS credit uses the same usecond period used by rate shaping check.> known limitations: > - by using the same usecond period, PPS shaping depends on throughput > shaping. > - it is not always true that a "txreq" correspond to a packet > (fragmentation cases) but as this shaping is meant to avoid DDOS > (small paquets) such an approximation should not impact the results. > - Some help on burst handling will be appreciated.Just wondering, why should this be done in the drivers ? Couldn''t this also be achieved with netfilter and the recent/limit modules ? The limit module can already handle bursts. -- Sander> Signed-off-by: Ahmed Amamou <ahmed@gandi.net> > Signed-off-by: William Dauchy <william@gandi.net> > Signed-off-by: Kamel Haddadou <kamel@gandi.net> > ---
William Dauchy
2013-Jul-09 14:01 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
On Jul09 15:48, Sander Eikelenboom wrote:> Just wondering, why should this be done in the drivers ? > Couldn''t this also be achieved with netfilter and the recent/limit modules ? > The limit module can already handle bursts.We indeed forgot to talk about it since we already got the question from Wei. The first thing is that your comment is also true for bandwidth which is already present. Moreover PPS is linked to bandwidth. By using netfilter, PPS shaping is done on backend level, once packet has left the VM; which means after using an additional memory transaction to copy packet from frontend. IMHO, at scale, shaping in this way should save some memory transactions comparing to netfilter. -- William _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Sander Eikelenboom
2013-Jul-09 14:42 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
Tuesday, July 9, 2013, 4:01:17 PM, you wrote:> On Jul09 15:48, Sander Eikelenboom wrote: >> Just wondering, why should this be done in the drivers ? >> Couldn''t this also be achieved with netfilter and the recent/limit modules ? >> The limit module can already handle bursts.> We indeed forgot to talk about it since we already got the question from > Wei. > The first thing is that your comment is also true for bandwidth which is > already present. Moreover PPS is linked to bandwidth. > By using netfilter, PPS shaping is done on backend level, once packet > has left the VM; which means after using an additional memory transaction > to copy packet from frontend. IMHO, at scale, shaping in this way should > save some memory transactions comparing to netfilter.Ok so the main usage scenario is not inbound traffic from the outside world that issues a (D)DOS, but rather a (malicious) guest that could issue a DOS on the host system by draining the resources of the netback driver by sending many packets per second. And that this scenario can''t be circumvented with netfilter because it doesn''t come into play yet (on the host). -- Sander
William Dauchy
2013-Jul-09 15:19 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
On Jul09 16:42, Sander Eikelenboom wrote:> Ok so the main usage scenario is not inbound traffic from the outside world that issues a (D)DOS, > but rather a (malicious) guest that could issue a DOS on the host system by > draining the resources of the netback driver by sending many packets per second. > And that this scenario can''t be circumvented with netfilter because it doesn''t come into play yet (on the host).yes Sander your example perfectly illustrates the worst case. IMHO it makes sense to filter traffic as soon as possible. Using netfilter for inbound traffic could make sense but outbound filtering in netfront would be the best choice; this solution sounds too risky. For outbound traffic even if the host is not the target of the DDOS attack, netfilter will consume way more resources in order to stop the attack. -- William _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Tue, Jul 09, 2013 at 02:24:08PM +0200, William Dauchy wrote:> adapt libxl to handle pps limit parameter > the new pps limit can be defined using a ''&'' symbol after > the rate limit for example: > YYMb/s&XXKpps@ZZms > or > YYMb/s@ZZms&XXKpps > or > YYMb/s&XXKpps in such case default 50ms interval will be used > > Signed-off-by: Ahmed Amamou <ahmed@gandi.net> > Signed-off-by: William Dauchy <william@gandi.net> > Signed-off-by: Kamel Haddadou <kamel@gandi.net> > --- > tools/libxl/libxl.c | 3 ++ > tools/libxl/libxl_types.idl | 1 + > tools/libxl/libxlu_vif.c | 68 +++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 69 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 3236aa9..7cbbd5b 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -2920,6 +2920,9 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid, > flexarray_append(back, libxl__sprintf(gc, "%"PRIu64",%"PRIu32"", > nic->rate_bytes_per_interval, > nic->rate_interval_usecs)); > + flexarray_append(back, "pps"); > + flexarray_append(back, libxl__sprintf(gc, "%"PRIu64"", > + nic->rate_packet_per_interval)); > } > > flexarray_append(back, "bridge"); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index d218a2d..a397a91 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -390,6 +390,7 @@ libxl_device_nic = Struct("device_nic", [ > ("script", string), > ("nictype", libxl_nic_type), > ("rate_bytes_per_interval", uint64), > + ("rate_packet_per_interval", uint64),Use rate_packets_per_interval. Plural form. :-)> ("rate_interval_usecs", uint32), > ("gatewaydev", string), > ]) > diff --git a/tools/libxl/libxlu_vif.c b/tools/libxl/libxlu_vif.c > index 3b3de0f..4ff1725 100644 > --- a/tools/libxl/libxlu_vif.c > +++ b/tools/libxl/libxlu_vif.c > @@ -3,6 +3,7 @@ > > static const char *vif_bytes_per_sec_re = "^[0-9]+[GMK]?[Bb]/s$"; > static const char *vif_internal_usec_re = "^[0-9]+[mu]?s?$"; > +static const char *vif_packet_per_sec_re = "^[0-9]+[GMK]?pps$";Plural form.> > static void xlu__vif_err(XLU_Config *cfg, const char *msg, const char *rate) { > fprintf(cfg->report, > @@ -49,6 +50,43 @@ out: > return rc; > } > > +static int vif_parse_rate_packet_per_sec(XLU_Config *cfg, const char *packet, > + uint64_t *packet_per_sec) > +{ > + regex_t rec; > + uint64_t tmp = 0; > + const char *p; > + int rc = 0; > + > + regcomp(&rec, vif_packet_per_sec_re, REG_EXTENDED|REG_NOSUB); > + if (regexec(&rec, packet, 0, NULL, 0)) { > + xlu__vif_err(cfg, "invalid pps", packet); > + rc = EINVAL; > + goto out; > + } > + > + p = packet; > + tmp = strtoull(p, (char**)&p, 0); > + if (tmp == 0 || tmp > UINT32_MAX || errno == ERANGE) { > + xlu__vif_err(cfg, "pps overflow", packet); > + rc = EOVERFLOW; > + goto out; > + } > + > + if (*p == ''G'') > + tmp *= 1000 * 1000 * 1000; > + else if (*p == ''M'') > + tmp *= 1000 * 1000; > + else if (*p == ''K'') > + tmp *= 1000; > + > + *packet_per_sec = tmp; > + > +out: > + regfree(&rec); > + return rc; > +} > + > static int vif_parse_rate_interval_usecs(XLU_Config *cfg, const char *interval, > uint32_t *interval_usecs) > { > @@ -94,22 +132,35 @@ int xlu_vif_parse_rate(XLU_Config *cfg, const char *rate, libxl_device_nic *nic) > { > uint64_t bytes_per_sec = 0; > uint64_t bytes_per_interval = 0; > + uint64_t packet_per_sec = 0; > + uint64_t packet_per_interval = 0; > uint32_t interval_usecs = 50000UL; /* Default to 50ms */ > - char *ratetok, *tmprate; > + char *ratetok, *tmprate, *tmp_pps, *tmpint; > int rc = 0; > > + /* rate string need to be duplicated because strtok may change it */ > tmprate = strdup(rate); > + tmp_pps = strdup(rate); > + tmpint = strdup(rate); > + > if (!strcmp(tmprate,"")) { > xlu__vif_err(cfg, "no rate specified", rate); > rc = EINVAL; > goto out; > } > > - ratetok = strtok(tmprate, "@"); > + /* accepted rate string are as follow: > + * rate&pps@interval or rate@interval&pps > + */ > + > + /* ratetok contains the first token */ > + ratetok = strtok(tmprate, "@&"); > rc = vif_parse_rate_bytes_per_sec(cfg, ratetok, &bytes_per_sec); > if (rc) goto out; > > - ratetok = strtok(NULL, "@"); > + ratetok = strtok(tmpint, "@"); > + ratetok = strtok(NULL, "@&"); > + /* ratetok contains the first token following the ''@'' */ > if (ratetok != NULL) { > rc = vif_parse_rate_interval_usecs(cfg, ratetok, &interval_usecs); > if (rc) goto out; > @@ -121,11 +172,22 @@ int xlu_vif_parse_rate(XLU_Config *cfg, const char *rate, libxl_device_nic *nic) > goto out; > } > > + ratetok = strtok(tmp_pps, "&"); > + ratetok = strtok(NULL, "&@"); > + /* ratetok contains the first token following the ''&'' */ > + if (ratetok != NULL) { > + rc = vif_parse_rate_packet_per_sec(cfg, ratetok, &packet_per_sec); > + if (rc) goto out; > + } > + > bytes_per_interval > (((uint64_t) bytes_per_sec * (uint64_t) interval_usecs) / 1000000UL); > + packet_per_interval > + (((uint64_t) packet_per_sec * (uint64_t) interval_usecs) / 1000000UL); > > nic->rate_interval_usecs = interval_usecs; > nic->rate_bytes_per_interval = bytes_per_interval; > + nic->rate_packet_per_interval = packet_per_interval; > > out: > free(tmprate);Leaking tmp_pps and tmpint.> -- > 1.7.9.5
On Tue, Jul 09, 2013 at 02:24:07PM +0200, William Dauchy wrote:> This patch provides a new option to limit VMs maximum packets per second > emission rate. > It follows the same credits logic used for throughput shaping. For the > moment we have considered each "txreq" as a packet. > PPS limits is passed to VIF at connection time via xenstore. > PPS credit uses the same usecond period used by rate shaping check. > > known limitations: > - by using the same usecond period, PPS shaping depends on throughput > shaping. > - it is not always true that a "txreq" correspond to a packet > (fragmentation cases) but as this shaping is meant to avoid DDOS > (small paquets) such an approximation should not impact the results.Typo, "paquets".> - Some help on burst handling will be appreciated. > > Signed-off-by: Ahmed Amamou <ahmed@gandi.net> > Signed-off-by: William Dauchy <william@gandi.net> > Signed-off-by: Kamel Haddadou <kamel@gandi.net> > --- > drivers/net/xen-netback/common.h | 2 ++ > drivers/net/xen-netback/interface.c | 1 + > drivers/net/xen-netback/netback.c | 46 +++++++++++++++++++++++++++++++++++ > drivers/net/xen-netback/xenbus.c | 25 ++++++++++++++++--- > 4 files changed, 70 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > index 9d7f172..fefa79a 100644 > --- a/drivers/net/xen-netback/common.h > +++ b/drivers/net/xen-netback/common.h > @@ -85,8 +85,10 @@ struct xenvif { > > /* Transmit shaping: allow ''credit_bytes'' every ''credit_usec''. */ > unsigned long credit_bytes; > + unsigned long credit_packets; > unsigned long credit_usec; > unsigned long remaining_credit; > + unsigned long remaining_packets; > struct timer_list credit_timeout; > > /* Statistics */ > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c > index d984141..06257dd 100644 > --- a/drivers/net/xen-netback/interface.c > +++ b/drivers/net/xen-netback/interface.c > @@ -273,6 +273,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, > INIT_LIST_HEAD(&vif->notify_list); > > vif->credit_bytes = vif->remaining_credit = ~0UL; > + vif->credit_packets = vif->remaining_packets = ~0UL; > vif->credit_usec = 0UL; > init_timer(&vif->credit_timeout); > /* Initialize ''expires'' now: it''s used to track the credit window. */ > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > index 8c20935..172a6de 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -905,10 +905,16 @@ static void tx_add_credit(struct xenvif *vif) > vif->remaining_credit = min(max_credit, max_burst); > } > > +static void tx_add_packets(struct xenvif *vif) > +{ > + vif->remaining_packets = vif->credit_packets; > +} > + > static void tx_credit_callback(unsigned long data) > { > struct xenvif *vif = (struct xenvif *)data; > tx_add_credit(vif); > + tx_add_packets(vif); > xen_netbk_check_rx_xenvif(vif); > } > > @@ -1419,6 +1425,38 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) > return false; > } > > +static bool tx_packets_exceeded(struct xenvif *vif) > +{ > + unsigned long now = jiffies; > + unsigned long next_credit > + vif->credit_timeout.expires + > + msecs_to_jiffies(vif->credit_usec / 1000); > + > + /* Timer could already be pending in rare cases. */ > + if (timer_pending(&vif->credit_timeout)) > + return true; > + > + /* Passed the point where we can replenish credit? */ > + if (time_after_eq(now, next_credit)) { > + vif->credit_timeout.expires = now; > + tx_add_packets(vif); > + } > + > + /* Not enough slot to send right now? Set a callback. */ > + if (vif->remaining_packets < 1) { > + vif->credit_timeout.data > + (unsigned long)vif; > + vif->credit_timeout.function > + tx_credit_callback; > + mod_timer(&vif->credit_timeout, > + next_credit); > +No need to wrap lines like this, they''re certainly shorter than 80 characters.> + return true; > + } > + > + return false; > +} > + > static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) > { > struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop; > @@ -1470,6 +1508,13 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) > rmb(); /* Ensure that we see the request before we copy it. */ > memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq)); > > + /* pps-based scheduling. */ > + if(vif->remaining_packets < 1 && > + tx_packets_exceeded(vif)) {Don''t wrap the above line.> + xenvif_put(vif); > + continue; > + } > + > /* Credit-based scheduling. */ > if (txreq.size > vif->remaining_credit && > tx_credit_exceeded(vif, txreq.size)) { > @@ -1478,6 +1523,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) > } > > vif->remaining_credit -= txreq.size; > + vif->remaining_packets--; > > work_to_do--; > vif->tx.req_cons = ++idx; > diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c > index 410018c..7c55bed 100644 > --- a/drivers/net/xen-netback/xenbus.c > +++ b/drivers/net/xen-netback/xenbus.c > @@ -267,15 +267,18 @@ static void frontend_changed(struct xenbus_device *dev, > > > static void xen_net_read_rate(struct xenbus_device *dev, > - unsigned long *bytes, unsigned long *usec) > + unsigned long *bytes, > + unsigned long *packet,Please use plural form "packets".> + unsigned long *usec) > { > char *s, *e; > - unsigned long b, u; > - char *ratestr; > + unsigned long b, u, pps; > + char *ratestr, *ppsstr; > > /* Default to unlimited bandwidth. */ > *bytes = ~0UL; > *usec = 0; > + *packet = ~0UL; > > ratestr = xenbus_read(XBT_NIL, dev->nodename, "rate", NULL); > if (IS_ERR(ratestr)) > @@ -295,11 +298,24 @@ static void xen_net_read_rate(struct xenbus_device *dev, > *usec = u; > > kfree(ratestr); > + ppsstr = xenbus_read(XBT_NIL, dev->nodename, "pps", NULL); > + if (IS_ERR(ppsstr)) > + return;Indentation.> + s = ppsstr; > + pps = simple_strtoul(s, &e, 10); > + if ((s == e) || (*e != ''\0'')) > + goto fail2;Indentation.> + *packet = pps; > + kfree(ppsstr); > return; > > fail: > pr_warn("Failed to parse network rate limit. Traffic unlimited.\n"); > kfree(ratestr); > + return; > +fail2: > + pr_warn("Failed to parse network PPS limit. PPS unlimited.\n"); > + kfree(ppsstr);I know the logic of this function is correct, but the way it is structured is not very intuitive. I would do parse_rate; if (fail_to_parse_ratestr) goto fail_ratestr; parse_pps; if (fail_to_parse_ppsstr) goto fail_ppsstr; free(ratestr) free(ppsstr) return; fail_ppsstr: free(ppsstr); fail_ratestr: free(ratestr); Wei.> } > > static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[]) > @@ -370,8 +386,9 @@ static void connect(struct backend_info *be) > } > > xen_net_read_rate(dev, &be->vif->credit_bytes, > - &be->vif->credit_usec); > + &be->vif->credit_packets, &be->vif->credit_usec); > be->vif->remaining_credit = be->vif->credit_bytes; > + be->vif->remaining_packets = be->vif->credit_packets; > > unregister_hotplug_status_watch(be); > err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, > -- > 1.7.9.5
Ian Campbell
2013-Jul-10 12:50 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
On Tue, 2013-07-09 at 16:01 +0200, William Dauchy wrote:> On Jul09 15:48, Sander Eikelenboom wrote: > > Just wondering, why should this be done in the drivers ? > > Couldn''t this also be achieved with netfilter and the recent/limit modules ? > > The limit module can already handle bursts. > > We indeed forgot to talk about it since we already got the question from > Wei. > The first thing is that your comment is also true for bandwidth which is > already present. Moreover PPS is linked to bandwidth. > By using netfilter, PPS shaping is done on backend level, once packet > has left the VM; which means after using an additional memory transaction > to copy packet from frontend. IMHO, at scale, shaping in this way should > save some memory transactions comparing to netfilter.Have you tried the netfilter approach and found it to be insufficient in practice? I''m not sure how netfilter recent/limit is implemented but if it queues rather than drops you would naturally find that you end up with back pressure onto the netback device where the ring would fill with in-progress requests and therefore netback would have to stop processing more packets. Ian.
Sander Eikelenboom
2013-Jul-10 13:59 UTC
Re: [PATCH v3 1/3][xen-netback] add a pseudo pps rate limit
Wednesday, July 10, 2013, 2:50:59 PM, you wrote:> On Tue, 2013-07-09 at 16:01 +0200, William Dauchy wrote: >> On Jul09 15:48, Sander Eikelenboom wrote: >> > Just wondering, why should this be done in the drivers ? >> > Couldn''t this also be achieved with netfilter and the recent/limit modules ? >> > The limit module can already handle bursts. >> >> We indeed forgot to talk about it since we already got the question from >> Wei. >> The first thing is that your comment is also true for bandwidth which is >> already present. Moreover PPS is linked to bandwidth. >> By using netfilter, PPS shaping is done on backend level, once packet >> has left the VM; which means after using an additional memory transaction >> to copy packet from frontend. IMHO, at scale, shaping in this way should >> save some memory transactions comparing to netfilter.> Have you tried the netfilter approach and found it to be insufficient in > practice?> I''m not sure how netfilter recent/limit is implemented but if it queues > rather than drops you would naturally find that you end up with back > pressure onto the netback device where the ring would fill with > in-progress requests and therefore netback would have to stop processing > more packets.recent/limit don''t queue it self, it''s just for classifying, you need to specify a target which implements what to do with the packet when it hits the limit or if it is recent enough. When i read the manpage from iptables, the most likely target candidates (drop, reject, queue (to userspace queue), tarpit), all seem to consume the packet, so no back pressure will be built in ring. Also when the packet is not consumed, the recent and limit will probably keep getting hit by the same packet over and over again. So it''s probably not possible at the moment, but informing on the netfilter-list is worth a shot i guess. -- Sander> Ian.