Wei Liu
2013-Aug-06 09:06 UTC
[PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
This series implements the NAPI + kthread 1:1 model for Xen netback. This model:

 - provides better scheduling fairness among vifs
 - is a prerequisite for implementing multiqueue for the Xen network driver

The first two patches are ground work for the third patch. The first one
simplifies code in netback, the second one reduces the memory footprint once
we switch to the 1:1 model. The third patch has the real meat:

 - make use of NAPI to mitigate interrupts
 - kthreads are no longer bound to CPUs, so that we can take advantage of the
   backend scheduler and trust it to do the right thing

Benchmark done on a Dell T3400 workstation with 4 cores, running 4 DomUs.
Netserver runs in Dom0. DomUs run netperf against Dom0 with the following
command:

  /root/netperf -H Dom0 -fm -l120

IRQs are distributed to the 4 cores by hand in the new model, while in the old
model vifs are automatically distributed to 4 kthreads.

* New model
%Cpu0 : 0.5 us, 20.3 sy, 0.0 ni, 28.9 id, 0.0 wa, 0.0 hi, 24.4 si, 25.9 st
%Cpu1 : 0.5 us, 17.8 sy, 0.0 ni, 28.8 id, 0.0 wa, 0.0 hi, 27.7 si, 25.1 st
%Cpu2 : 0.5 us, 18.8 sy, 0.0 ni, 30.7 id, 0.0 wa, 0.0 hi, 22.9 si, 27.1 st
%Cpu3 : 0.0 us, 20.1 sy, 0.0 ni, 30.4 id, 0.0 wa, 0.0 hi, 22.7 si, 26.8 st
Throughputs: 2027.89 2025.95 2018.57 2016.23 aggregated: 8088.64

* Old model
%Cpu0 : 0.5 us, 68.8 sy, 0.0 ni, 16.1 id, 0.5 wa, 0.0 hi, 2.8 si, 11.5 st
%Cpu1 : 0.4 us, 45.1 sy, 0.0 ni, 31.1 id, 0.4 wa, 0.0 hi, 2.1 si, 20.9 st
%Cpu2 : 0.9 us, 44.8 sy, 0.0 ni, 30.9 id, 0.0 wa, 0.0 hi, 1.3 si, 22.2 st
%Cpu3 : 0.8 us, 46.4 sy, 0.0 ni, 28.3 id, 1.3 wa, 0.0 hi, 2.1 si, 21.1 st
Throughputs: 1899.14 2280.43 1963.33 1893.47 aggregated: 8036.37

We can see that the impact is mainly on CPU usage. The new model moves
processing from the kthread to NAPI (software interrupt).

Changes since V3:
* rebase on top of net-next
* group TX and RX fields in different regions of the structure
* minor bugfix: use GFP_ATOMIC to avoid sleeping in softirq (SI) context
* split functional changes and function renames into two patches
* minor coding style fixes

Wei Liu (3):
  xen-netback: remove page tracking facility
  xen-netback: switch to NAPI + kthread 1:1 model
  xen-netback: rename functions

 drivers/net/xen-netback/common.h    | 127 ++++--
 drivers/net/xen-netback/interface.c | 119 +++--
 drivers/net/xen-netback/netback.c   | 833 +++++++++++------------------
 3 files changed, 406 insertions(+), 673 deletions(-)

-- 
1.7.10.4
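For orientation, the core of the 1:1 model boils down to a per-vif NAPI poll
handler for guest TX and a per-vif kernel thread for guest RX. The following
is condensed from patch 2 in this series (surrounding context and error
handling omitted):

static int xenvif_poll(struct napi_struct *napi, int budget)
{
        struct xenvif *vif = container_of(napi, struct xenvif, napi);
        int work_done;

        /* Guest-side TX (host-side RX) is processed in softirq context. */
        work_done = xen_netbk_tx_action(vif, budget);

        if (work_done < budget) {
                int more_to_do = 0;
                unsigned long flags;

                local_irq_save(flags);
                RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
                if (!more_to_do)
                        __napi_complete(napi);
                local_irq_restore(flags);
        }

        return work_done;
}

int xen_netbk_kthread(void *data)
{
        struct xenvif *vif = data;

        /* Guest-side RX (host-side TX) is handled by a dedicated,
         * unbound kthread per vif. */
        while (!kthread_should_stop()) {
                wait_event_interruptible(vif->wq,
                                         rx_work_todo(vif) ||
                                         kthread_should_stop());
                if (kthread_should_stop())
                        break;

                if (rx_work_todo(vif))
                        xen_netbk_rx_action(vif);

                cond_resched();
        }

        return 0;
}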
Wei Liu
2013-Aug-06 09:06 UTC
[PATCH V4 1/3] xen-netback: remove page tracking facility
The data flow from DomU to DomU on the same host in the current copying
scheme with the tracking facility:

       copy
DomU --------> Dom0          DomU
 |                            ^
 |____________________________|
             copy

The page in Dom0 is a page with a valid MFN, so we can always copy from the
Dom0 page, thus removing the need for a tracking facility.

       copy           copy
DomU --------> Dom0 -------> DomU

A simple iperf test shows no performance regression (obviously we copy twice
either way):

  W/ tracking:  ~5.3Gb/s
  W/o tracking: ~5.4Gb/s

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |   77 +------------------------------------
 1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 64828de..91f163d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -95,21 +95,6 @@ struct netbk_rx_meta {
 
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
-/* extra field used in struct page */
-union page_ext {
-        struct {
-#if BITS_PER_LONG < 64
-#define IDX_WIDTH 8
-#define GROUP_WIDTH (BITS_PER_LONG - IDX_WIDTH)
-                unsigned int group:GROUP_WIDTH;
-                unsigned int idx:IDX_WIDTH;
-#else
-                unsigned int group, idx;
-#endif
-        } e;
-        void *mapping;
-};
-
 struct xen_netbk {
         wait_queue_head_t wq;
         struct task_struct *task;
@@ -214,45 +199,6 @@ static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk,
         return (unsigned long)pfn_to_kaddr(idx_to_pfn(netbk, idx));
 }
 
-/* extra field used in struct page */
-static inline void set_page_ext(struct page *pg, struct xen_netbk *netbk,
-                                unsigned int idx)
-{
-        unsigned int group = netbk - xen_netbk;
-        union page_ext ext = { .e = { .group = group + 1, .idx = idx } };
-
-        BUILD_BUG_ON(sizeof(ext) > sizeof(ext.mapping));
-        pg->mapping = ext.mapping;
-}
-
-static int get_page_ext(struct page *pg,
-                        unsigned int *pgroup, unsigned int *pidx)
-{
-        union page_ext ext = { .mapping = pg->mapping };
-        struct xen_netbk *netbk;
-        unsigned int group, idx;
-
-        group = ext.e.group - 1;
-
-        if (group < 0 || group >= xen_netbk_group_nr)
-                return 0;
-
-        netbk = &xen_netbk[group];
-
-        idx = ext.e.idx;
-
-        if ((idx < 0) || (idx >= MAX_PENDING_REQS))
-                return 0;
-
-        if (netbk->mmap_pages[idx] != pg)
-                return 0;
-
-        *pgroup = group;
-        *pidx = idx;
-
-        return 1;
-}
-
 /*
  * This is the amount of packet we copy rather than map, so that the
  * guest can't fiddle with the contents of the headers while we do
@@ -453,12 +399,6 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 {
         struct gnttab_copy *copy_gop;
         struct netbk_rx_meta *meta;
-        /*
-         * These variables are used iff get_page_ext returns true,
-         * in which case they are guaranteed to be initialized.
-         */
-        unsigned int uninitialized_var(group), uninitialized_var(idx);
-        int foreign = get_page_ext(page, &group, &idx);
         unsigned long bytes;
 
         /* Data must not cross a page boundary. */
@@ -494,20 +434,9 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
                 copy_gop = npo->copy + npo->copy_prod++;
                 copy_gop->flags = GNTCOPY_dest_gref;
-                if (foreign) {
-                        struct xen_netbk *netbk = &xen_netbk[group];
-                        struct pending_tx_info *src_pend;
+                copy_gop->source.domid = DOMID_SELF;
+                copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
 
-                        src_pend = &netbk->pending_tx_info[idx];
-
-                        copy_gop->source.domid = src_pend->vif->domid;
-                        copy_gop->source.u.ref = src_pend->req.gref;
-                        copy_gop->flags |= GNTCOPY_source_gref;
-                } else {
-                        void *vaddr = page_address(page);
-                        copy_gop->source.domid = DOMID_SELF;
-                        copy_gop->source.u.gmfn = virt_to_mfn(vaddr);
-                }
                 copy_gop->source.offset = offset;
                 copy_gop->dest.domid = vif->domid;
@@ -1047,7 +976,6 @@ static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
         page = alloc_page(GFP_KERNEL|__GFP_COLD);
         if (!page)
                 return NULL;
-        set_page_ext(page, netbk, pending_idx);
         netbk->mmap_pages[pending_idx] = page;
         return page;
 }
@@ -1155,7 +1083,6 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
                 first->req.offset = 0;
                 first->req.size = dst_offset;
                 first->head = start_idx;
-                set_page_ext(page, netbk, head_idx);
                 netbk->mmap_pages[head_idx] = page;
                 frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
         }
-- 
1.7.10.4
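With the tracking union gone, the guest-RX grant copy always sources from the
local Dom0 page. The resulting (post-patch) setup in netbk_gop_frag_copy(),
condensed from the hunk above:

                copy_gop = npo->copy + npo->copy_prod++;
                copy_gop->flags = GNTCOPY_dest_gref;

                /* Always copy from a local Dom0 page with a valid MFN; no
                 * GNTCOPY_source_gref lookup via page tracking is needed. */
                copy_gop->source.domid = DOMID_SELF;
                copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
                copy_gop->source.offset = offset;

                copy_gop->dest.domid = vif->domid;
                copy_gop->dest.offset = npo->copy_off;
                copy_gop->dest.u.ref = npo->copy_gref;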
Wei Liu
2013-Aug-06 09:06 UTC
[PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
This patch implements 1:1 model netback. NAPI and kthread are utilized to do the weight-lifting job: - NAPI is used for guest side TX (host side RX) - kthread is used for guest side RX (host side TX) Xenvif and xen_netbk are made into one structure to reduce code size. This model provides better scheduling fairness among vifs. It is also prerequisite for implementing multiqueue for Xen netback. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- drivers/net/xen-netback/common.h | 107 +++--- drivers/net/xen-netback/interface.c | 103 +++--- drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- 3 files changed, 310 insertions(+), 507 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 8a4d77e..046cca9 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -45,31 +45,84 @@ #include <xen/grant_table.h> #include <xen/xenbus.h> -struct xen_netbk; +typedef unsigned int pending_ring_idx_t; +#define INVALID_PENDING_RING_IDX (~0U) + +struct pending_tx_info { + struct xen_netif_tx_request req; /* coalesced tx request */ + pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX + * if it is head of one or more tx + * reqs + */ +}; + +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) + +struct xenvif_rx_meta { + int id; + int size; + int gso_size; +}; + +/* Discriminate from any valid pending_idx value. */ +#define INVALID_PENDING_IDX 0xFFFF + +#define MAX_BUFFER_OFFSET PAGE_SIZE + +#define MAX_PENDING_REQS 256 struct xenvif { /* Unique identifier for this interface. */ domid_t domid; unsigned int handle; - /* Reference to netback processing backend. */ - struct xen_netbk *netbk; + /* Use NAPI for guest TX */ + struct napi_struct napi; + /* When feature-split-event-channels = 0, tx_irq = rx_irq. */ + unsigned int tx_irq; + /* Only used when feature-split-event-channels = 1 */ + char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */ + struct xen_netif_tx_back_ring tx; + struct sk_buff_head tx_queue; + struct page *mmap_pages[MAX_PENDING_REQS]; + pending_ring_idx_t pending_prod; + pending_ring_idx_t pending_cons; + u16 pending_ring[MAX_PENDING_REQS]; + struct pending_tx_info pending_tx_info[MAX_PENDING_REQS]; + + /* Coalescing tx requests before copying makes number of grant + * copy ops greater or equal to number of slots required. In + * worst case a tx request consumes 2 gnttab_copy. + */ + struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS]; - u8 fe_dev_addr[6]; + /* Use kthread for guest RX */ + struct task_struct *task; + wait_queue_head_t wq; /* When feature-split-event-channels = 0, tx_irq = rx_irq. */ - unsigned int tx_irq; unsigned int rx_irq; /* Only used when feature-split-event-channels = 1 */ - char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */ char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */ + struct xen_netif_rx_back_ring rx; + struct sk_buff_head rx_queue; - /* List of frontends to notify after a batch of frames sent. */ - struct list_head notify_list; + /* Allow xenvif_start_xmit() to peek ahead in the rx request + * ring. This is a prediction of what rx_req_cons will be + * once all queued skbs are put on the ring. + */ + RING_IDX rx_req_cons_peek; + + /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each + * head/fragment page uses 2 copy operations because it + * straddles two buffers in the frontend. 
+ */ + struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE]; + struct xenvif_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE]; - /* The shared rings and indexes. */ - struct xen_netif_tx_back_ring tx; - struct xen_netif_rx_back_ring rx; + + u8 fe_dev_addr[6]; /* Frontend feature information. */ u8 can_sg:1; @@ -80,13 +133,6 @@ struct xenvif { /* Internal feature information. */ u8 can_queue:1; /* can queue packets for receiver? */ - /* - * Allow xenvif_start_xmit() to peek ahead in the rx request - * ring. This is a prediction of what rx_req_cons will be - * once all queued skbs are put on the ring. - */ - RING_IDX rx_req_cons_peek; - /* Transmit shaping: allow ''credit_bytes'' every ''credit_usec''. */ unsigned long credit_bytes; unsigned long credit_usec; @@ -97,11 +143,7 @@ struct xenvif { unsigned long rx_gso_checksum_fixup; /* Miscellaneous private stuff. */ - struct list_head schedule_list; - atomic_t refcnt; struct net_device *dev; - - wait_queue_head_t waiting_to_free; }; static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif) @@ -109,9 +151,6 @@ static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif) return to_xenbus_device(vif->dev->dev.parent); } -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) - struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, unsigned int handle); @@ -121,9 +160,6 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, unsigned int rx_evtchn); void xenvif_disconnect(struct xenvif *vif); -void xenvif_get(struct xenvif *vif); -void xenvif_put(struct xenvif *vif); - int xenvif_xenbus_init(void); void xenvif_xenbus_fini(void); @@ -139,18 +175,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, grant_ref_t tx_ring_ref, grant_ref_t rx_ring_ref); -/* (De)Register a xenvif with the netback backend. 
*/ -void xen_netbk_add_xenvif(struct xenvif *vif); -void xen_netbk_remove_xenvif(struct xenvif *vif); - -/* (De)Schedule backend processing for a xenvif */ -void xen_netbk_schedule_xenvif(struct xenvif *vif); -void xen_netbk_deschedule_xenvif(struct xenvif *vif); - /* Check for SKBs from frontend and schedule backend processing */ void xen_netbk_check_rx_xenvif(struct xenvif *vif); -/* Receive an SKB from the frontend */ -void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb); /* Queue an SKB for transmission to the frontend */ void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb); @@ -163,6 +189,11 @@ void xenvif_carrier_off(struct xenvif *vif); /* Returns number of ring slots required to send an skb to the frontend */ unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb); +int xen_netbk_tx_action(struct xenvif *vif, int budget); +void xen_netbk_rx_action(struct xenvif *vif); + +int xen_netbk_kthread(void *data); + extern bool separate_tx_rx_irq; #endif /* __XEN_NETBACK__COMMON_H__ */ diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 087d2db..5338fab 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -30,6 +30,7 @@ #include "common.h" +#include <linux/kthread.h> #include <linux/ethtool.h> #include <linux/rtnetlink.h> #include <linux/if_vlan.h> @@ -38,17 +39,7 @@ #include <asm/xen/hypercall.h> #define XENVIF_QUEUE_LENGTH 32 - -void xenvif_get(struct xenvif *vif) -{ - atomic_inc(&vif->refcnt); -} - -void xenvif_put(struct xenvif *vif) -{ - if (atomic_dec_and_test(&vif->refcnt)) - wake_up(&vif->waiting_to_free); -} +#define XENVIF_NAPI_WEIGHT 64 int xenvif_schedulable(struct xenvif *vif) { @@ -64,21 +55,39 @@ static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id) { struct xenvif *vif = dev_id; - if (vif->netbk == NULL) - return IRQ_HANDLED; - - xen_netbk_schedule_xenvif(vif); + if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) + napi_schedule(&vif->napi); return IRQ_HANDLED; } +static int xenvif_poll(struct napi_struct *napi, int budget) +{ + struct xenvif *vif = container_of(napi, struct xenvif, napi); + int work_done; + + work_done = xen_netbk_tx_action(vif, budget); + + if (work_done < budget) { + int more_to_do = 0; + unsigned long flags; + + local_irq_save(flags); + + RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do); + if (!more_to_do) + __napi_complete(napi); + + local_irq_restore(flags); + } + + return work_done; +} + static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id) { struct xenvif *vif = dev_id; - if (vif->netbk == NULL) - return IRQ_HANDLED; - if (xenvif_rx_schedulable(vif)) netif_wake_queue(vif->dev); @@ -99,7 +108,8 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) BUG_ON(skb->dev != dev); - if (vif->netbk == NULL) + /* Drop the packet if vif is not ready */ + if (vif->task == NULL) goto drop; /* Drop the packet if the target domain has no receive buffers. */ @@ -108,7 +118,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) /* Reserve ring slots for the worst-case number of fragments. 
*/ vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb); - xenvif_get(vif); if (vif->can_queue && xen_netbk_must_stop_queue(vif)) netif_stop_queue(dev); @@ -123,11 +132,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } -void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb) -{ - netif_rx_ni(skb); -} - void xenvif_notify_tx_completion(struct xenvif *vif) { if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif)) @@ -142,7 +146,7 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev) static void xenvif_up(struct xenvif *vif) { - xen_netbk_add_xenvif(vif); + napi_enable(&vif->napi); enable_irq(vif->tx_irq); if (vif->tx_irq != vif->rx_irq) enable_irq(vif->rx_irq); @@ -151,12 +155,11 @@ static void xenvif_up(struct xenvif *vif) static void xenvif_down(struct xenvif *vif) { + napi_disable(&vif->napi); disable_irq(vif->tx_irq); if (vif->tx_irq != vif->rx_irq) disable_irq(vif->rx_irq); del_timer_sync(&vif->credit_timeout); - xen_netbk_deschedule_xenvif(vif); - xen_netbk_remove_xenvif(vif); } static int xenvif_open(struct net_device *dev) @@ -272,11 +275,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, struct net_device *dev; struct xenvif *vif; char name[IFNAMSIZ] = {}; + int i; snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle); dev = alloc_netdev(sizeof(struct xenvif), name, ether_setup); if (dev == NULL) { - pr_warn("Could not allocate netdev\n"); + pr_warn("Could not allocate netdev for %s\n", name); return ERR_PTR(-ENOMEM); } @@ -285,14 +289,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, vif = netdev_priv(dev); vif->domid = domid; vif->handle = handle; - vif->netbk = NULL; vif->can_sg = 1; vif->csum = 1; - atomic_set(&vif->refcnt, 1); - init_waitqueue_head(&vif->waiting_to_free); vif->dev = dev; - INIT_LIST_HEAD(&vif->schedule_list); - INIT_LIST_HEAD(&vif->notify_list); vif->credit_bytes = vif->remaining_credit = ~0UL; vif->credit_usec = 0UL; @@ -307,6 +306,16 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, dev->tx_queue_len = XENVIF_QUEUE_LENGTH; + skb_queue_head_init(&vif->rx_queue); + skb_queue_head_init(&vif->tx_queue); + + vif->pending_cons = 0; + vif->pending_prod = MAX_PENDING_REQS; + for (i = 0; i < MAX_PENDING_REQS; i++) + vif->pending_ring[i] = i; + for (i = 0; i < MAX_PENDING_REQS; i++) + vif->mmap_pages[i] = NULL; + /* * Initialise a dummy MAC address. 
We choose the numerically * largest non-broadcast address to prevent the address getting @@ -316,6 +325,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, memset(dev->dev_addr, 0xFF, ETH_ALEN); dev->dev_addr[0] &= ~0x01; + netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT); + netif_carrier_off(dev); err = register_netdev(dev); @@ -377,7 +388,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, disable_irq(vif->rx_irq); } - xenvif_get(vif); + init_waitqueue_head(&vif->wq); + vif->task = kthread_create(xen_netbk_kthread, + (void *)vif, vif->dev->name); + if (IS_ERR(vif->task)) { + pr_warn("Could not allocate kthread for %s\n", vif->dev->name); + err = PTR_ERR(vif->task); + goto err_rx_unbind; + } rtnl_lock(); if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN) @@ -388,7 +406,13 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, xenvif_up(vif); rtnl_unlock(); + wake_up_process(vif->task); + return 0; + +err_rx_unbind: + unbind_from_irqhandler(vif->rx_irq, vif); + vif->rx_irq = 0; err_tx_unbind: unbind_from_irqhandler(vif->tx_irq, vif); vif->tx_irq = 0; @@ -408,7 +432,6 @@ void xenvif_carrier_off(struct xenvif *vif) if (netif_running(dev)) xenvif_down(vif); rtnl_unlock(); - xenvif_put(vif); } void xenvif_disconnect(struct xenvif *vif) @@ -422,9 +445,6 @@ void xenvif_disconnect(struct xenvif *vif) if (netif_carrier_ok(vif->dev)) xenvif_carrier_off(vif); - atomic_dec(&vif->refcnt); - wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0); - if (vif->tx_irq) { if (vif->tx_irq == vif->rx_irq) unbind_from_irqhandler(vif->tx_irq, vif); @@ -438,6 +458,11 @@ void xenvif_disconnect(struct xenvif *vif) need_module_put = 1; } + if (vif->task) + kthread_stop(vif->task); + + netif_napi_del(&vif->napi); + unregister_netdev(vif->dev); xen_netbk_unmap_frontend_rings(vif); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 91f163d..44ccc67 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -70,116 +70,25 @@ module_param(fatal_skb_slots, uint, 0444); */ #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN -typedef unsigned int pending_ring_idx_t; -#define INVALID_PENDING_RING_IDX (~0U) - -struct pending_tx_info { - struct xen_netif_tx_request req; /* coalesced tx request */ - struct xenvif *vif; - pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX - * if it is head of one or more tx - * reqs - */ -}; - -struct netbk_rx_meta { - int id; - int size; - int gso_size; -}; - -#define MAX_PENDING_REQS 256 - -/* Discriminate from any valid pending_idx value. */ -#define INVALID_PENDING_IDX 0xFFFF - -#define MAX_BUFFER_OFFSET PAGE_SIZE - -struct xen_netbk { - wait_queue_head_t wq; - struct task_struct *task; - - struct sk_buff_head rx_queue; - struct sk_buff_head tx_queue; - - struct timer_list net_timer; - - struct page *mmap_pages[MAX_PENDING_REQS]; - - pending_ring_idx_t pending_prod; - pending_ring_idx_t pending_cons; - struct list_head net_schedule_list; - - /* Protect the net_schedule_list in netif. */ - spinlock_t net_schedule_list_lock; - - atomic_t netfront_count; - - struct pending_tx_info pending_tx_info[MAX_PENDING_REQS]; - /* Coalescing tx requests before copying makes number of grant - * copy ops greater or equal to number of slots required. In - * worst case a tx request consumes 2 gnttab_copy. 
- */ - struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS]; - - u16 pending_ring[MAX_PENDING_REQS]; - - /* - * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each - * head/fragment page uses 2 copy operations because it - * straddles two buffers in the frontend. - */ - struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE]; - struct netbk_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE]; -}; - -static struct xen_netbk *xen_netbk; -static int xen_netbk_group_nr; - /* * If head != INVALID_PENDING_RING_IDX, it means this tx request is head of * one or more merged tx requests, otherwise it is the continuation of * previous tx request. */ -static inline int pending_tx_is_head(struct xen_netbk *netbk, RING_IDX idx) -{ - return netbk->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX; -} - -void xen_netbk_add_xenvif(struct xenvif *vif) -{ - int i; - int min_netfront_count; - int min_group = 0; - struct xen_netbk *netbk; - - min_netfront_count = atomic_read(&xen_netbk[0].netfront_count); - for (i = 0; i < xen_netbk_group_nr; i++) { - int netfront_count = atomic_read(&xen_netbk[i].netfront_count); - if (netfront_count < min_netfront_count) { - min_group = i; - min_netfront_count = netfront_count; - } - } - - netbk = &xen_netbk[min_group]; - - vif->netbk = netbk; - atomic_inc(&netbk->netfront_count); -} - -void xen_netbk_remove_xenvif(struct xenvif *vif) +static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx) { - struct xen_netbk *netbk = vif->netbk; - vif->netbk = NULL; - atomic_dec(&netbk->netfront_count); + return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX; } -static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, +static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, u8 status); static void make_tx_response(struct xenvif *vif, struct xen_netif_tx_request *txp, s8 st); + +static inline int tx_work_todo(struct xenvif *vif); +static inline int rx_work_todo(struct xenvif *vif); + static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, u16 id, s8 st, @@ -187,16 +96,16 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, u16 size, u16 flags); -static inline unsigned long idx_to_pfn(struct xen_netbk *netbk, +static inline unsigned long idx_to_pfn(struct xenvif *vif, u16 idx) { - return page_to_pfn(netbk->mmap_pages[idx]); + return page_to_pfn(vif->mmap_pages[idx]); } -static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk, +static inline unsigned long idx_to_kaddr(struct xenvif *vif, u16 idx) { - return (unsigned long)pfn_to_kaddr(idx_to_pfn(netbk, idx)); + return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx)); } /* @@ -224,15 +133,10 @@ static inline pending_ring_idx_t pending_index(unsigned i) return i & (MAX_PENDING_REQS-1); } -static inline pending_ring_idx_t nr_pending_reqs(struct xen_netbk *netbk) +static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif) { return MAX_PENDING_REQS - - netbk->pending_prod + netbk->pending_cons; -} - -static void xen_netbk_kick_thread(struct xen_netbk *netbk) -{ - wake_up(&netbk->wq); + vif->pending_prod + vif->pending_cons; } static int max_required_rx_slots(struct xenvif *vif) @@ -364,15 +268,15 @@ struct netrx_pending_operations { unsigned copy_prod, copy_cons; unsigned meta_prod, meta_cons; struct gnttab_copy *copy; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; int copy_off; grant_ref_t copy_gref; }; -static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif, - struct netrx_pending_operations *npo) 
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif, + struct netrx_pending_operations *npo) { - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; struct xen_netif_rx_request *req; req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); @@ -398,7 +302,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, unsigned long offset, int *head) { struct gnttab_copy *copy_gop; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; unsigned long bytes; /* Data must not cross a page boundary. */ @@ -434,15 +338,15 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, copy_gop = npo->copy + npo->copy_prod++; copy_gop->flags = GNTCOPY_dest_gref; + copy_gop->len = bytes; + copy_gop->source.domid = DOMID_SELF; copy_gop->source.u.gmfn = virt_to_mfn(page_address(page)); - copy_gop->source.offset = offset; - copy_gop->dest.domid = vif->domid; + copy_gop->dest.domid = vif->domid; copy_gop->dest.offset = npo->copy_off; copy_gop->dest.u.ref = npo->copy_gref; - copy_gop->len = bytes; npo->copy_off += bytes; meta->size += bytes; @@ -485,7 +389,7 @@ static int netbk_gop_skb(struct sk_buff *skb, int nr_frags = skb_shinfo(skb)->nr_frags; int i; struct xen_netif_rx_request *req; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; unsigned char *data; int head = 1; int old_meta_prod; @@ -565,7 +469,7 @@ static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots, } static void netbk_add_frag_responses(struct xenvif *vif, int status, - struct netbk_rx_meta *meta, + struct xenvif_rx_meta *meta, int nr_meta_slots) { int i; @@ -594,9 +498,13 @@ struct skb_cb_overlay { int meta_slots_used; }; -static void xen_netbk_rx_action(struct xen_netbk *netbk) +static void xen_netbk_kick_thread(struct xenvif *vif) +{ + wake_up(&vif->wq); +} + +void xen_netbk_rx_action(struct xenvif *vif) { - struct xenvif *vif = NULL, *tmp; s8 status; u16 flags; struct xen_netif_rx_response *resp; @@ -608,17 +516,18 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) int count; unsigned long offset; struct skb_cb_overlay *sco; + int need_to_notify = 0; struct netrx_pending_operations npo = { - .copy = netbk->grant_copy_op, - .meta = netbk->meta, + .copy = vif->grant_copy_op, + .meta = vif->meta, }; skb_queue_head_init(&rxq); count = 0; - while ((skb = skb_dequeue(&netbk->rx_queue)) != NULL) { + while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) { vif = netdev_priv(skb->dev); nr_frags = skb_shinfo(skb)->nr_frags; @@ -635,27 +544,27 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) break; } - BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)); + BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta)); if (!npo.copy_prod) return; - BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op)); - gnttab_batch_copy(netbk->grant_copy_op, npo.copy_prod); + BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op)); + gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod); while ((skb = __skb_dequeue(&rxq)) != NULL) { sco = (struct skb_cb_overlay *)skb->cb; vif = netdev_priv(skb->dev); - if (netbk->meta[npo.meta_cons].gso_size && vif->gso_prefix) { + if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) { resp = RING_GET_RESPONSE(&vif->rx, - vif->rx.rsp_prod_pvt++); + vif->rx.rsp_prod_pvt++); resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data; - resp->offset = netbk->meta[npo.meta_cons].gso_size; - resp->id = netbk->meta[npo.meta_cons].id; + resp->offset = vif->meta[npo.meta_cons].gso_size; + resp->id = vif->meta[npo.meta_cons].id; resp->status = 
sco->meta_slots_used; npo.meta_cons++; @@ -680,12 +589,12 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) flags |= XEN_NETRXF_data_validated; offset = 0; - resp = make_rx_response(vif, netbk->meta[npo.meta_cons].id, + resp = make_rx_response(vif, vif->meta[npo.meta_cons].id, status, offset, - netbk->meta[npo.meta_cons].size, + vif->meta[npo.meta_cons].size, flags); - if (netbk->meta[npo.meta_cons].gso_size && !vif->gso_prefix) { + if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) { struct xen_netif_extra_info *gso (struct xen_netif_extra_info *) RING_GET_RESPONSE(&vif->rx, @@ -693,7 +602,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) resp->flags |= XEN_NETRXF_extra_info; - gso->u.gso.size = netbk->meta[npo.meta_cons].gso_size; + gso->u.gso.size = vif->meta[npo.meta_cons].gso_size; gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; gso->u.gso.pad = 0; gso->u.gso.features = 0; @@ -703,112 +612,33 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) } netbk_add_frag_responses(vif, status, - netbk->meta + npo.meta_cons + 1, + vif->meta + npo.meta_cons + 1, sco->meta_slots_used); RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret); + if (ret) + need_to_notify = 1; + xenvif_notify_tx_completion(vif); - if (ret && list_empty(&vif->notify_list)) - list_add_tail(&vif->notify_list, ¬ify); - else - xenvif_put(vif); npo.meta_cons += sco->meta_slots_used; dev_kfree_skb(skb); } - list_for_each_entry_safe(vif, tmp, ¬ify, notify_list) { + if (need_to_notify) notify_remote_via_irq(vif->rx_irq); - list_del_init(&vif->notify_list); - xenvif_put(vif); - } /* More work to do? */ - if (!skb_queue_empty(&netbk->rx_queue) && - !timer_pending(&netbk->net_timer)) - xen_netbk_kick_thread(netbk); + if (!skb_queue_empty(&vif->rx_queue)) + xen_netbk_kick_thread(vif); } void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb) { - struct xen_netbk *netbk = vif->netbk; + skb_queue_tail(&vif->rx_queue, skb); - skb_queue_tail(&netbk->rx_queue, skb); - - xen_netbk_kick_thread(netbk); -} - -static void xen_netbk_alarm(unsigned long data) -{ - struct xen_netbk *netbk = (struct xen_netbk *)data; - xen_netbk_kick_thread(netbk); -} - -static int __on_net_schedule_list(struct xenvif *vif) -{ - return !list_empty(&vif->schedule_list); -} - -/* Must be called with net_schedule_list_lock held */ -static void remove_from_net_schedule_list(struct xenvif *vif) -{ - if (likely(__on_net_schedule_list(vif))) { - list_del_init(&vif->schedule_list); - xenvif_put(vif); - } -} - -static struct xenvif *poll_net_schedule_list(struct xen_netbk *netbk) -{ - struct xenvif *vif = NULL; - - spin_lock_irq(&netbk->net_schedule_list_lock); - if (list_empty(&netbk->net_schedule_list)) - goto out; - - vif = list_first_entry(&netbk->net_schedule_list, - struct xenvif, schedule_list); - if (!vif) - goto out; - - xenvif_get(vif); - - remove_from_net_schedule_list(vif); -out: - spin_unlock_irq(&netbk->net_schedule_list_lock); - return vif; -} - -void xen_netbk_schedule_xenvif(struct xenvif *vif) -{ - unsigned long flags; - struct xen_netbk *netbk = vif->netbk; - - if (__on_net_schedule_list(vif)) - goto kick; - - spin_lock_irqsave(&netbk->net_schedule_list_lock, flags); - if (!__on_net_schedule_list(vif) && - likely(xenvif_schedulable(vif))) { - list_add_tail(&vif->schedule_list, &netbk->net_schedule_list); - xenvif_get(vif); - } - spin_unlock_irqrestore(&netbk->net_schedule_list_lock, flags); - -kick: - smp_mb(); - if ((nr_pending_reqs(netbk) < (MAX_PENDING_REQS/2)) && - !list_empty(&netbk->net_schedule_list)) 
- xen_netbk_kick_thread(netbk); -} - -void xen_netbk_deschedule_xenvif(struct xenvif *vif) -{ - struct xen_netbk *netbk = vif->netbk; - spin_lock_irq(&netbk->net_schedule_list_lock); - remove_from_net_schedule_list(vif); - spin_unlock_irq(&netbk->net_schedule_list_lock); + xen_netbk_kick_thread(vif); } void xen_netbk_check_rx_xenvif(struct xenvif *vif) @@ -818,7 +648,7 @@ void xen_netbk_check_rx_xenvif(struct xenvif *vif) RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do); if (more_to_do) - xen_netbk_schedule_xenvif(vif); + napi_schedule(&vif->napi); } static void tx_add_credit(struct xenvif *vif) @@ -860,15 +690,12 @@ static void netbk_tx_err(struct xenvif *vif, txp = RING_GET_REQUEST(&vif->tx, cons++); } while (1); vif->tx.req_cons = cons; - xen_netbk_check_rx_xenvif(vif); - xenvif_put(vif); } static void netbk_fatal_tx_err(struct xenvif *vif) { netdev_err(vif->dev, "fatal error; disabling device\n"); xenvif_carrier_off(vif); - xenvif_put(vif); } static int netbk_count_requests(struct xenvif *vif, @@ -969,19 +796,20 @@ static int netbk_count_requests(struct xenvif *vif, return slots; } -static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk, +static struct page *xen_netbk_alloc_page(struct xenvif *vif, u16 pending_idx) { struct page *page; - page = alloc_page(GFP_KERNEL|__GFP_COLD); + + page = alloc_page(GFP_ATOMIC|__GFP_COLD); if (!page) return NULL; - netbk->mmap_pages[pending_idx] = page; + vif->mmap_pages[pending_idx] = page; + return page; } -static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, - struct xenvif *vif, +static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif, struct sk_buff *skb, struct xen_netif_tx_request *txp, struct gnttab_copy *gop) @@ -1012,9 +840,9 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, for (shinfo->nr_frags = slot = start; slot < nr_slots; shinfo->nr_frags++) { struct pending_tx_info *pending_tx_info - netbk->pending_tx_info; + vif->pending_tx_info; - page = alloc_page(GFP_KERNEL|__GFP_COLD); + page = alloc_page(GFP_ATOMIC|__GFP_COLD); if (!page) goto err; @@ -1049,21 +877,18 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, gop->len = txp->size; dst_offset += gop->len; - index = pending_index(netbk->pending_cons++); + index = pending_index(vif->pending_cons++); - pending_idx = netbk->pending_ring[index]; + pending_idx = vif->pending_ring[index]; memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp)); - xenvif_get(vif); - - pending_tx_info[pending_idx].vif = vif; /* Poison these fields, corresponding * fields for head tx req will be set * to correct values after the loop. */ - netbk->mmap_pages[pending_idx] = (void *)(~0UL); + vif->mmap_pages[pending_idx] = (void *)(~0UL); pending_tx_info[pending_idx].head INVALID_PENDING_RING_IDX; @@ -1083,7 +908,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, first->req.offset = 0; first->req.size = dst_offset; first->head = start_idx; - netbk->mmap_pages[head_idx] = page; + vif->mmap_pages[head_idx] = page; frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx); } @@ -1093,18 +918,18 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, err: /* Unwind, freeing all pages and sending error responses. */ while (shinfo->nr_frags-- > start) { - xen_netbk_idx_release(netbk, + xen_netbk_idx_release(vif, frag_get_pending_idx(&frags[shinfo->nr_frags]), XEN_NETIF_RSP_ERROR); } /* The head too, if necessary. 
*/ if (start) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); return NULL; } -static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, +static int xen_netbk_tx_check_gop(struct xenvif *vif, struct sk_buff *skb, struct gnttab_copy **gopp) { @@ -1119,7 +944,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, /* Check status of header. */ err = gop->status; if (unlikely(err)) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Skip first skb fragment if it is on same page as header fragment. */ start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); @@ -1129,7 +954,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, pending_ring_idx_t head; pending_idx = frag_get_pending_idx(&shinfo->frags[i]); - tx_info = &netbk->pending_tx_info[pending_idx]; + tx_info = &vif->pending_tx_info[pending_idx]; head = tx_info->head; /* Check error status: if okay then remember grant handle. */ @@ -1137,18 +962,19 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, newerr = (++gop)->status; if (newerr) break; - peek = netbk->pending_ring[pending_index(++head)]; - } while (!pending_tx_is_head(netbk, peek)); + peek = vif->pending_ring[pending_index(++head)]; + } while (!pending_tx_is_head(vif, peek)); if (likely(!newerr)) { /* Had a previous error? Invalidate this fragment. */ if (unlikely(err)) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); continue; } /* Error on this fragment: respond to client with an error. */ - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Not the first error? Preceding frags already invalidated. */ if (err) @@ -1156,10 +982,11 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, /* First error: invalidate header and preceding fragments. */ pending_idx = *((u16 *)skb->data); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); for (j = start; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } /* Remember the error: invalidate all subsequent fragments. 
*/ @@ -1170,7 +997,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, return err; } -static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb) +static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb) { struct skb_shared_info *shinfo = skb_shinfo(skb); int nr_frags = shinfo->nr_frags; @@ -1184,16 +1011,16 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb) pending_idx = frag_get_pending_idx(frag); - txp = &netbk->pending_tx_info[pending_idx].req; - page = virt_to_page(idx_to_kaddr(netbk, pending_idx)); + txp = &vif->pending_tx_info[pending_idx].req; + page = virt_to_page(idx_to_kaddr(vif, pending_idx)); __skb_fill_page_desc(skb, i, page, txp->offset, txp->size); skb->len += txp->size; skb->data_len += txp->size; skb->truesize += txp->size; /* Take an extra reference to offset xen_netbk_idx_release */ - get_page(netbk->mmap_pages[pending_idx]); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + get_page(vif->mmap_pages[pending_idx]); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); } } @@ -1353,16 +1180,14 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) return false; } -static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) +static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) { - struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop; + struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop; struct sk_buff *skb; int ret; - while ((nr_pending_reqs(netbk) + XEN_NETBK_LEGACY_SLOTS_MAX - < MAX_PENDING_REQS) && - !list_empty(&netbk->net_schedule_list)) { - struct xenvif *vif; + while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX + < MAX_PENDING_REQS)) { struct xen_netif_tx_request txreq; struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX]; struct page *page; @@ -1373,16 +1198,6 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) unsigned int data_len; pending_ring_idx_t index; - /* Get a netif from the list with work to do. */ - vif = poll_net_schedule_list(netbk); - /* This can sometimes happen because the test of - * list_empty(net_schedule_list) at the top of the - * loop is unlocked. Just go back and have another - * look. - */ - if (!vif) - continue; - if (vif->tx.sring->req_prod - vif->tx.req_cons > XEN_NETIF_TX_RING_SIZE) { netdev_err(vif->dev, @@ -1395,10 +1210,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) } RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, work_to_do); - if (!work_to_do) { - xenvif_put(vif); - continue; - } + if (!work_to_do) + break; idx = vif->tx.req_cons; rmb(); /* Ensure that we see the request before we copy it. */ @@ -1406,10 +1219,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) /* Credit-based scheduling. 
*/ if (txreq.size > vif->remaining_credit && - tx_credit_exceeded(vif, txreq.size)) { - xenvif_put(vif); - continue; - } + tx_credit_exceeded(vif, txreq.size)) + break; vif->remaining_credit -= txreq.size; @@ -1422,12 +1233,12 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) work_to_do); idx = vif->tx.req_cons; if (unlikely(work_to_do < 0)) - continue; + break; } ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do); if (unlikely(ret < 0)) - continue; + break; idx += ret; @@ -1435,7 +1246,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) netdev_dbg(vif->dev, "Bad packet size: %d\n", txreq.size); netbk_tx_err(vif, &txreq, idx); - continue; + break; } /* No crossing a page as the payload mustn''t fragment. */ @@ -1445,11 +1256,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) txreq.offset, txreq.size, (txreq.offset&~PAGE_MASK) + txreq.size); netbk_fatal_tx_err(vif); - continue; + break; } - index = pending_index(netbk->pending_cons); - pending_idx = netbk->pending_ring[index]; + index = pending_index(vif->pending_cons); + pending_idx = vif->pending_ring[index]; data_len = (txreq.size > PKT_PROT_LEN && ret < XEN_NETBK_LEGACY_SLOTS_MAX) ? @@ -1474,16 +1285,16 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) if (netbk_set_skb_gso(vif, skb, gso)) { /* Failure in netbk_set_skb_gso is fatal. */ kfree_skb(skb); - continue; + break; } } /* XXX could copy straight to head */ - page = xen_netbk_alloc_page(netbk, pending_idx); + page = xen_netbk_alloc_page(vif, pending_idx); if (!page) { kfree_skb(skb); netbk_tx_err(vif, &txreq, idx); - continue; + break; } gop->source.u.ref = txreq.gref; @@ -1499,10 +1310,9 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) gop++; - memcpy(&netbk->pending_tx_info[pending_idx].req, + memcpy(&vif->pending_tx_info[pending_idx].req, &txreq, sizeof(txreq)); - netbk->pending_tx_info[pending_idx].vif = vif; - netbk->pending_tx_info[pending_idx].head = index; + vif->pending_tx_info[pending_idx].head = index; *((u16 *)skb->data) = pending_idx; __skb_put(skb, data_len); @@ -1517,46 +1327,45 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) INVALID_PENDING_IDX); } - netbk->pending_cons++; + vif->pending_cons++; - request_gop = xen_netbk_get_requests(netbk, vif, - skb, txfrags, gop); + request_gop = xen_netbk_get_requests(vif, skb, txfrags, gop); if (request_gop == NULL) { kfree_skb(skb); netbk_tx_err(vif, &txreq, idx); - continue; + break; } gop = request_gop; - __skb_queue_tail(&netbk->tx_queue, skb); + __skb_queue_tail(&vif->tx_queue, skb); vif->tx.req_cons = idx; - xen_netbk_check_rx_xenvif(vif); - if ((gop-netbk->tx_copy_ops) >= ARRAY_SIZE(netbk->tx_copy_ops)) + if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops)) break; } - return gop - netbk->tx_copy_ops; + return gop - vif->tx_copy_ops; } -static void xen_netbk_tx_submit(struct xen_netbk *netbk) + +static int xen_netbk_tx_submit(struct xenvif *vif, int budget) { - struct gnttab_copy *gop = netbk->tx_copy_ops; + struct gnttab_copy *gop = vif->tx_copy_ops; struct sk_buff *skb; + int work_done = 0; - while ((skb = __skb_dequeue(&netbk->tx_queue)) != NULL) { + while (work_done < budget && + (skb = __skb_dequeue(&vif->tx_queue)) != NULL) { struct xen_netif_tx_request *txp; - struct xenvif *vif; u16 pending_idx; unsigned data_len; pending_idx = *((u16 *)skb->data); - vif = netbk->pending_tx_info[pending_idx].vif; - txp = &netbk->pending_tx_info[pending_idx].req; + txp = 
&vif->pending_tx_info[pending_idx].req; /* Check the remap error code. */ - if (unlikely(xen_netbk_tx_check_gop(netbk, skb, &gop))) { + if (unlikely(xen_netbk_tx_check_gop(vif, skb, &gop))) { netdev_dbg(vif->dev, "netback grant failed.\n"); skb_shinfo(skb)->nr_frags = 0; kfree_skb(skb); @@ -1565,7 +1374,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) data_len = skb->len; memcpy(skb->data, - (void *)(idx_to_kaddr(netbk, pending_idx)|txp->offset), + (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset), data_len); if (data_len < txp->size) { /* Append the packet payload as a fragment. */ @@ -1573,7 +1382,8 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) txp->size -= data_len; } else { /* Schedule a response immediately. */ - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } if (txp->flags & XEN_NETTXF_csum_blank) @@ -1581,7 +1391,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) else if (txp->flags & XEN_NETTXF_data_validated) skb->ip_summed = CHECKSUM_UNNECESSARY; - xen_netbk_fill_frags(netbk, skb); + xen_netbk_fill_frags(vif, skb); /* * If the initial fragment was < PKT_PROT_LEN then @@ -1609,53 +1419,61 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) vif->dev->stats.rx_bytes += skb->len; vif->dev->stats.rx_packets++; - xenvif_receive_skb(vif, skb); + work_done++; + + netif_receive_skb(skb); } + + return work_done; } /* Called after netfront has transmitted */ -static void xen_netbk_tx_action(struct xen_netbk *netbk) +int xen_netbk_tx_action(struct xenvif *vif, int budget) { unsigned nr_gops; + int work_done; - nr_gops = xen_netbk_tx_build_gops(netbk); + if (unlikely(!tx_work_todo(vif))) + return 0; + + nr_gops = xen_netbk_tx_build_gops(vif); if (nr_gops == 0) - return; + return 0; + + gnttab_batch_copy(vif->tx_copy_ops, nr_gops); - gnttab_batch_copy(netbk->tx_copy_ops, nr_gops); + work_done = xen_netbk_tx_submit(vif, nr_gops); - xen_netbk_tx_submit(netbk); + return work_done; } -static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, +static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, u8 status) { - struct xenvif *vif; struct pending_tx_info *pending_tx_info; pending_ring_idx_t head; u16 peek; /* peek into next tx request */ - BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL)); + BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL)); /* Already complete? 
*/ - if (netbk->mmap_pages[pending_idx] == NULL) + if (vif->mmap_pages[pending_idx] == NULL) return; - pending_tx_info = &netbk->pending_tx_info[pending_idx]; + pending_tx_info = &vif->pending_tx_info[pending_idx]; - vif = pending_tx_info->vif; head = pending_tx_info->head; - BUG_ON(!pending_tx_is_head(netbk, head)); - BUG_ON(netbk->pending_ring[pending_index(head)] != pending_idx); + BUG_ON(!pending_tx_is_head(vif, head)); + BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx); do { pending_ring_idx_t index; pending_ring_idx_t idx = pending_index(head); - u16 info_idx = netbk->pending_ring[idx]; + u16 info_idx = vif->pending_ring[idx]; - pending_tx_info = &netbk->pending_tx_info[info_idx]; + pending_tx_info = &vif->pending_tx_info[info_idx]; make_tx_response(vif, &pending_tx_info->req, status); /* Setting any number other than @@ -1664,18 +1482,15 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, */ pending_tx_info->head = 0; - index = pending_index(netbk->pending_prod++); - netbk->pending_ring[index] = netbk->pending_ring[info_idx]; + index = pending_index(vif->pending_prod++); + vif->pending_ring[index] = vif->pending_ring[info_idx]; - xenvif_put(vif); + peek = vif->pending_ring[pending_index(++head)]; - peek = netbk->pending_ring[pending_index(++head)]; + } while (!pending_tx_is_head(vif, peek)); - } while (!pending_tx_is_head(netbk, peek)); - - netbk->mmap_pages[pending_idx]->mapping = 0; - put_page(netbk->mmap_pages[pending_idx]); - netbk->mmap_pages[pending_idx] = NULL; + put_page(vif->mmap_pages[pending_idx]); + vif->mmap_pages[pending_idx] = NULL; } @@ -1723,45 +1538,22 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, return resp; } -static inline int rx_work_todo(struct xen_netbk *netbk) +static inline int rx_work_todo(struct xenvif *vif) { - return !skb_queue_empty(&netbk->rx_queue); + return !skb_queue_empty(&vif->rx_queue); } -static inline int tx_work_todo(struct xen_netbk *netbk) +static inline int tx_work_todo(struct xenvif *vif) { - if ((nr_pending_reqs(netbk) + XEN_NETBK_LEGACY_SLOTS_MAX - < MAX_PENDING_REQS) && - !list_empty(&netbk->net_schedule_list)) + if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) && + (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX + < MAX_PENDING_REQS)) return 1; return 0; } -static int xen_netbk_kthread(void *data) -{ - struct xen_netbk *netbk = data; - while (!kthread_should_stop()) { - wait_event_interruptible(netbk->wq, - rx_work_todo(netbk) || - tx_work_todo(netbk) || - kthread_should_stop()); - cond_resched(); - - if (kthread_should_stop()) - break; - - if (rx_work_todo(netbk)) - xen_netbk_rx_action(netbk); - - if (tx_work_todo(netbk)) - xen_netbk_tx_action(netbk); - } - - return 0; -} - void xen_netbk_unmap_frontend_rings(struct xenvif *vif) { if (vif->tx.sring) @@ -1807,11 +1599,29 @@ err: return err; } +int xen_netbk_kthread(void *data) +{ + struct xenvif *vif = data; + + while (!kthread_should_stop()) { + wait_event_interruptible(vif->wq, + rx_work_todo(vif) || + kthread_should_stop()); + if (kthread_should_stop()) + break; + + if (rx_work_todo(vif)) + xen_netbk_rx_action(vif); + + cond_resched(); + } + + return 0; +} + static int __init netback_init(void) { - int i; int rc = 0; - int group; if (!xen_domain()) return -ENODEV; @@ -1822,48 +1632,6 @@ static int __init netback_init(void) fatal_skb_slots = XEN_NETBK_LEGACY_SLOTS_MAX; } - xen_netbk_group_nr = num_online_cpus(); - xen_netbk = vzalloc(sizeof(struct xen_netbk) * xen_netbk_group_nr); - if (!xen_netbk) - return 
-ENOMEM; - - for (group = 0; group < xen_netbk_group_nr; group++) { - struct xen_netbk *netbk = &xen_netbk[group]; - skb_queue_head_init(&netbk->rx_queue); - skb_queue_head_init(&netbk->tx_queue); - - init_timer(&netbk->net_timer); - netbk->net_timer.data = (unsigned long)netbk; - netbk->net_timer.function = xen_netbk_alarm; - - netbk->pending_cons = 0; - netbk->pending_prod = MAX_PENDING_REQS; - for (i = 0; i < MAX_PENDING_REQS; i++) - netbk->pending_ring[i] = i; - - init_waitqueue_head(&netbk->wq); - netbk->task = kthread_create(xen_netbk_kthread, - (void *)netbk, - "netback/%u", group); - - if (IS_ERR(netbk->task)) { - pr_alert("kthread_create() fails at netback\n"); - del_timer(&netbk->net_timer); - rc = PTR_ERR(netbk->task); - goto failed_init; - } - - kthread_bind(netbk->task, group); - - INIT_LIST_HEAD(&netbk->net_schedule_list); - - spin_lock_init(&netbk->net_schedule_list_lock); - - atomic_set(&netbk->netfront_count, 0); - - wake_up_process(netbk->task); - } - rc = xenvif_xenbus_init(); if (rc) goto failed_init; @@ -1871,35 +1639,14 @@ static int __init netback_init(void) return 0; failed_init: - while (--group >= 0) { - struct xen_netbk *netbk = &xen_netbk[group]; - del_timer(&netbk->net_timer); - kthread_stop(netbk->task); - } - vfree(xen_netbk); return rc; - } module_init(netback_init); static void __exit netback_fini(void) { - int i, j; - xenvif_xenbus_fini(); - - for (i = 0; i < xen_netbk_group_nr; i++) { - struct xen_netbk *netbk = &xen_netbk[i]; - del_timer_sync(&netbk->net_timer); - kthread_stop(netbk->task); - for (j = 0; j < MAX_PENDING_REQS; j++) { - if (netbk->mmap_pages[j]) - __free_page(netbk->mmap_pages[j]); - } - } - - vfree(xen_netbk); } module_exit(netback_fini); -- 1.7.10.4
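Condensed from the diff above, the two entry points of the per-vif model: the
TX event-channel interrupt only schedules NAPI, and skbs bound for the guest
are queued on the vif's rx_queue and the dedicated kthread is woken.

static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
{
        struct xenvif *vif = dev_id;

        /* Defer guest TX processing to NAPI context. */
        if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
                napi_schedule(&vif->napi);

        return IRQ_HANDLED;
}

void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
{
        /* Hand guest RX work to the per-vif kthread. */
        skb_queue_tail(&vif->rx_queue, skb);

        xen_netbk_kick_thread(vif);
}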
As we move to 1:1 model and melt xen_netbk and xenvif together, it would be better to use single prefix for all functions in xen-netback. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- drivers/net/xen-netback/common.h | 24 ++-- drivers/net/xen-netback/interface.c | 20 ++-- drivers/net/xen-netback/netback.c | 223 ++++++++++++++++++----------------- 3 files changed, 134 insertions(+), 133 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 046cca9..83b3e47 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -165,21 +165,21 @@ void xenvif_xenbus_fini(void); int xenvif_schedulable(struct xenvif *vif); -int xen_netbk_rx_ring_full(struct xenvif *vif); +int xenvif_rx_ring_full(struct xenvif *vif); -int xen_netbk_must_stop_queue(struct xenvif *vif); +int xenvif_must_stop_queue(struct xenvif *vif); /* (Un)Map communication rings. */ -void xen_netbk_unmap_frontend_rings(struct xenvif *vif); -int xen_netbk_map_frontend_rings(struct xenvif *vif, - grant_ref_t tx_ring_ref, - grant_ref_t rx_ring_ref); +void xenvif_unmap_frontend_rings(struct xenvif *vif); +int xenvif_map_frontend_rings(struct xenvif *vif, + grant_ref_t tx_ring_ref, + grant_ref_t rx_ring_ref); /* Check for SKBs from frontend and schedule backend processing */ -void xen_netbk_check_rx_xenvif(struct xenvif *vif); +void xenvif_check_rx_xenvif(struct xenvif *vif); /* Queue an SKB for transmission to the frontend */ -void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb); +void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb); /* Notify xenvif that ring now has space to send an skb to the frontend */ void xenvif_notify_tx_completion(struct xenvif *vif); @@ -187,12 +187,12 @@ void xenvif_notify_tx_completion(struct xenvif *vif); void xenvif_carrier_off(struct xenvif *vif); /* Returns number of ring slots required to send an skb to the frontend */ -unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb); +unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb); -int xen_netbk_tx_action(struct xenvif *vif, int budget); -void xen_netbk_rx_action(struct xenvif *vif); +int xenvif_tx_action(struct xenvif *vif, int budget); +void xenvif_rx_action(struct xenvif *vif); -int xen_netbk_kthread(void *data); +int xenvif_kthread(void *data); extern bool separate_tx_rx_irq; diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 5338fab..5689b2f 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -48,7 +48,7 @@ int xenvif_schedulable(struct xenvif *vif) static int xenvif_rx_schedulable(struct xenvif *vif) { - return xenvif_schedulable(vif) && !xen_netbk_rx_ring_full(vif); + return xenvif_schedulable(vif) && !xenvif_rx_ring_full(vif); } static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id) @@ -66,7 +66,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget) struct xenvif *vif = container_of(napi, struct xenvif, napi); int work_done; - work_done = xen_netbk_tx_action(vif, budget); + work_done = xenvif_tx_action(vif, budget); if (work_done < budget) { int more_to_do = 0; @@ -117,12 +117,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) goto drop; /* Reserve ring slots for the worst-case number of fragments. 
*/ - vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb); + vif->rx_req_cons_peek += xenvif_count_skb_slots(vif, skb); - if (vif->can_queue && xen_netbk_must_stop_queue(vif)) + if (vif->can_queue && xenvif_must_stop_queue(vif)) netif_stop_queue(dev); - xen_netbk_queue_tx_skb(vif, skb); + xenvif_queue_tx_skb(vif, skb); return NETDEV_TX_OK; @@ -150,7 +150,7 @@ static void xenvif_up(struct xenvif *vif) enable_irq(vif->tx_irq); if (vif->tx_irq != vif->rx_irq) enable_irq(vif->rx_irq); - xen_netbk_check_rx_xenvif(vif); + xenvif_check_rx_xenvif(vif); } static void xenvif_down(struct xenvif *vif) @@ -352,7 +352,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, __module_get(THIS_MODULE); - err = xen_netbk_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref); + err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref); if (err < 0) goto err; @@ -389,7 +389,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, } init_waitqueue_head(&vif->wq); - vif->task = kthread_create(xen_netbk_kthread, + vif->task = kthread_create(xenvif_kthread, (void *)vif, vif->dev->name); if (IS_ERR(vif->task)) { pr_warn("Could not allocate kthread for %s\n", vif->dev->name); @@ -417,7 +417,7 @@ err_tx_unbind: unbind_from_irqhandler(vif->tx_irq, vif); vif->tx_irq = 0; err_unmap: - xen_netbk_unmap_frontend_rings(vif); + xenvif_unmap_frontend_rings(vif); err: module_put(THIS_MODULE); return err; @@ -465,7 +465,7 @@ void xenvif_disconnect(struct xenvif *vif) unregister_netdev(vif->dev); - xen_netbk_unmap_frontend_rings(vif); + xenvif_unmap_frontend_rings(vif); free_netdev(vif->dev); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 44ccc67..956130c 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -80,8 +80,9 @@ static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx) return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX; } -static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, - u8 status); +static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx, + u8 status); + static void make_tx_response(struct xenvif *vif, struct xen_netif_tx_request *txp, s8 st); @@ -150,7 +151,7 @@ static int max_required_rx_slots(struct xenvif *vif) return max; } -int xen_netbk_rx_ring_full(struct xenvif *vif) +int xenvif_rx_ring_full(struct xenvif *vif) { RING_IDX peek = vif->rx_req_cons_peek; RING_IDX needed = max_required_rx_slots(vif); @@ -159,16 +160,16 @@ int xen_netbk_rx_ring_full(struct xenvif *vif) ((vif->rx.rsp_prod_pvt + XEN_NETIF_RX_RING_SIZE - peek) < needed); } -int xen_netbk_must_stop_queue(struct xenvif *vif) +int xenvif_must_stop_queue(struct xenvif *vif) { - if (!xen_netbk_rx_ring_full(vif)) + if (!xenvif_rx_ring_full(vif)) return 0; vif->rx.sring->req_event = vif->rx_req_cons_peek + max_required_rx_slots(vif); mb(); /* request notification /then/ check the queue */ - return xen_netbk_rx_ring_full(vif); + return xenvif_rx_ring_full(vif); } /* @@ -214,9 +215,9 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head) /* * Figure out how many ring slots we''re going to need to send @skb to * the guest. This function is essentially a dry run of - * netbk_gop_frag_copy. + * xenvif_gop_frag_copy. 
*/ -unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb) +unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb) { unsigned int count; int i, copy_off; @@ -296,10 +297,10 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif, * Set up the grant operations for this fragment. If it''s a flipping * interface, we also set up the unmap request from here. */ -static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, - struct netrx_pending_operations *npo, - struct page *page, unsigned long size, - unsigned long offset, int *head) +static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, + struct netrx_pending_operations *npo, + struct page *page, unsigned long size, + unsigned long offset, int *head) { struct gnttab_copy *copy_gop; struct xenvif_rx_meta *meta; @@ -382,8 +383,8 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, * zero GSO descriptors (for non-GSO packets) or one descriptor (for * frontend-side LRO). */ -static int netbk_gop_skb(struct sk_buff *skb, - struct netrx_pending_operations *npo) +static int xenvif_gop_skb(struct sk_buff *skb, + struct netrx_pending_operations *npo) { struct xenvif *vif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; @@ -426,30 +427,30 @@ static int netbk_gop_skb(struct sk_buff *skb, if (data + len > skb_tail_pointer(skb)) len = skb_tail_pointer(skb) - data; - netbk_gop_frag_copy(vif, skb, npo, - virt_to_page(data), len, offset, &head); + xenvif_gop_frag_copy(vif, skb, npo, + virt_to_page(data), len, offset, &head); data += len; } for (i = 0; i < nr_frags; i++) { - netbk_gop_frag_copy(vif, skb, npo, - skb_frag_page(&skb_shinfo(skb)->frags[i]), - skb_frag_size(&skb_shinfo(skb)->frags[i]), - skb_shinfo(skb)->frags[i].page_offset, - &head); + xenvif_gop_frag_copy(vif, skb, npo, + skb_frag_page(&skb_shinfo(skb)->frags[i]), + skb_frag_size(&skb_shinfo(skb)->frags[i]), + skb_shinfo(skb)->frags[i].page_offset, + &head); } return npo->meta_prod - old_meta_prod; } /* - * This is a twin to netbk_gop_skb. Assume that netbk_gop_skb was + * This is a twin to xenvif_gop_skb. Assume that xenvif_gop_skb was * used to set up the operations on the top of * netrx_pending_operations, which have since been done. Check that * they didn''t give any errors and advance over them. 
*/ -static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots, - struct netrx_pending_operations *npo) +static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots, + struct netrx_pending_operations *npo) { struct gnttab_copy *copy_op; int status = XEN_NETIF_RSP_OKAY; @@ -468,9 +469,9 @@ static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots, return status; } -static void netbk_add_frag_responses(struct xenvif *vif, int status, - struct xenvif_rx_meta *meta, - int nr_meta_slots) +static void xenvif_add_frag_responses(struct xenvif *vif, int status, + struct xenvif_rx_meta *meta, + int nr_meta_slots) { int i; unsigned long offset; @@ -498,12 +499,12 @@ struct skb_cb_overlay { int meta_slots_used; }; -static void xen_netbk_kick_thread(struct xenvif *vif) +static void xenvif_kick_thread(struct xenvif *vif) { wake_up(&vif->wq); } -void xen_netbk_rx_action(struct xenvif *vif) +void xenvif_rx_action(struct xenvif *vif) { s8 status; u16 flags; @@ -532,7 +533,7 @@ void xen_netbk_rx_action(struct xenvif *vif) nr_frags = skb_shinfo(skb)->nr_frags; sco = (struct skb_cb_overlay *)skb->cb; - sco->meta_slots_used = netbk_gop_skb(skb, &npo); + sco->meta_slots_used = xenvif_gop_skb(skb, &npo); count += nr_frags + 1; @@ -575,7 +576,7 @@ void xen_netbk_rx_action(struct xenvif *vif) vif->dev->stats.tx_bytes += skb->len; vif->dev->stats.tx_packets++; - status = netbk_check_gop(vif, sco->meta_slots_used, &npo); + status = xenvif_check_gop(vif, sco->meta_slots_used, &npo); if (sco->meta_slots_used == 1) flags = 0; @@ -611,9 +612,9 @@ void xen_netbk_rx_action(struct xenvif *vif) gso->flags = 0; } - netbk_add_frag_responses(vif, status, - vif->meta + npo.meta_cons + 1, - sco->meta_slots_used); + xenvif_add_frag_responses(vif, status, + vif->meta + npo.meta_cons + 1, + sco->meta_slots_used); RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret); @@ -631,17 +632,17 @@ void xen_netbk_rx_action(struct xenvif *vif) /* More work to do? 
*/ if (!skb_queue_empty(&vif->rx_queue)) - xen_netbk_kick_thread(vif); + xenvif_kick_thread(vif); } -void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb) +void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb) { skb_queue_tail(&vif->rx_queue, skb); - xen_netbk_kick_thread(vif); + xenvif_kick_thread(vif); } -void xen_netbk_check_rx_xenvif(struct xenvif *vif) +void xenvif_check_rx_xenvif(struct xenvif *vif) { int more_to_do; @@ -675,11 +676,11 @@ static void tx_credit_callback(unsigned long data) { struct xenvif *vif = (struct xenvif *)data; tx_add_credit(vif); - xen_netbk_check_rx_xenvif(vif); + xenvif_check_rx_xenvif(vif); } -static void netbk_tx_err(struct xenvif *vif, - struct xen_netif_tx_request *txp, RING_IDX end) +static void xenvif_tx_err(struct xenvif *vif, + struct xen_netif_tx_request *txp, RING_IDX end) { RING_IDX cons = vif->tx.req_cons; @@ -692,16 +693,16 @@ static void netbk_tx_err(struct xenvif *vif, vif->tx.req_cons = cons; } -static void netbk_fatal_tx_err(struct xenvif *vif) +static void xenvif_fatal_tx_err(struct xenvif *vif) { netdev_err(vif->dev, "fatal error; disabling device\n"); xenvif_carrier_off(vif); } -static int netbk_count_requests(struct xenvif *vif, - struct xen_netif_tx_request *first, - struct xen_netif_tx_request *txp, - int work_to_do) +static int xenvif_count_requests(struct xenvif *vif, + struct xen_netif_tx_request *first, + struct xen_netif_tx_request *txp, + int work_to_do) { RING_IDX cons = vif->tx.req_cons; int slots = 0; @@ -718,7 +719,7 @@ static int netbk_count_requests(struct xenvif *vif, netdev_err(vif->dev, "Asked for %d slots but exceeds this limit\n", work_to_do); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -ENODATA; } @@ -729,7 +730,7 @@ static int netbk_count_requests(struct xenvif *vif, netdev_err(vif->dev, "Malicious frontend using %d slots, threshold %u\n", slots, fatal_skb_slots); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -E2BIG; } @@ -777,7 +778,7 @@ static int netbk_count_requests(struct xenvif *vif, if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) { netdev_err(vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n", txp->offset, txp->size); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -EINVAL; } @@ -789,15 +790,15 @@ static int netbk_count_requests(struct xenvif *vif, } while (more_data); if (drop_err) { - netbk_tx_err(vif, first, cons + slots); + xenvif_tx_err(vif, first, cons + slots); return drop_err; } return slots; } -static struct page *xen_netbk_alloc_page(struct xenvif *vif, - u16 pending_idx) +static struct page *xenvif_alloc_page(struct xenvif *vif, + u16 pending_idx) { struct page *page; @@ -809,10 +810,10 @@ static struct page *xen_netbk_alloc_page(struct xenvif *vif, return page; } -static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif, - struct sk_buff *skb, - struct xen_netif_tx_request *txp, - struct gnttab_copy *gop) +static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif, + struct sk_buff *skb, + struct xen_netif_tx_request *txp, + struct gnttab_copy *gop) { struct skb_shared_info *shinfo = skb_shinfo(skb); skb_frag_t *frags = shinfo->frags; @@ -835,7 +836,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif, /* Coalesce tx requests, at this point the packet passed in * should be <= 64K. Any packets larger than 64K have been - * handled in netbk_count_requests(). + * handled in xenvif_count_requests(). 
*/ for (shinfo->nr_frags = slot = start; slot < nr_slots; shinfo->nr_frags++) { @@ -918,20 +919,20 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif, err: /* Unwind, freeing all pages and sending error responses. */ while (shinfo->nr_frags-- > start) { - xen_netbk_idx_release(vif, + xenvif_idx_release(vif, frag_get_pending_idx(&frags[shinfo->nr_frags]), XEN_NETIF_RSP_ERROR); } /* The head too, if necessary. */ if (start) - xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); return NULL; } -static int xen_netbk_tx_check_gop(struct xenvif *vif, - struct sk_buff *skb, - struct gnttab_copy **gopp) +static int xenvif_tx_check_gop(struct xenvif *vif, + struct sk_buff *skb, + struct gnttab_copy **gopp) { struct gnttab_copy *gop = *gopp; u16 pending_idx = *((u16 *)skb->data); @@ -944,7 +945,7 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif, /* Check status of header. */ err = gop->status; if (unlikely(err)) - xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Skip first skb fragment if it is on same page as header fragment. */ start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); @@ -968,13 +969,13 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif, if (likely(!newerr)) { /* Had a previous error? Invalidate this fragment. */ if (unlikely(err)) - xen_netbk_idx_release(vif, pending_idx, - XEN_NETIF_RSP_OKAY); + xenvif_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); continue; } /* Error on this fragment: respond to client with an error. */ - xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Not the first error? Preceding frags already invalidated. */ if (err) @@ -982,11 +983,11 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif, /* First error: invalidate header and preceding fragments. */ pending_idx = *((u16 *)skb->data); - xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); for (j = start; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); - xen_netbk_idx_release(vif, pending_idx, - XEN_NETIF_RSP_OKAY); + xenvif_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } /* Remember the error: invalidate all subsequent fragments. 
*/ @@ -997,7 +998,7 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif, return err; } -static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb) +static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb) { struct skb_shared_info *shinfo = skb_shinfo(skb); int nr_frags = shinfo->nr_frags; @@ -1018,13 +1019,13 @@ static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb) skb->data_len += txp->size; skb->truesize += txp->size; - /* Take an extra reference to offset xen_netbk_idx_release */ + /* Take an extra reference to offset xenvif_idx_release */ get_page(vif->mmap_pages[pending_idx]); - xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); } } -static int xen_netbk_get_extras(struct xenvif *vif, +static int xenvif_get_extras(struct xenvif *vif, struct xen_netif_extra_info *extras, int work_to_do) { @@ -1034,7 +1035,7 @@ static int xen_netbk_get_extras(struct xenvif *vif, do { if (unlikely(work_to_do-- <= 0)) { netdev_err(vif->dev, "Missing extra info\n"); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -EBADR; } @@ -1045,7 +1046,7 @@ static int xen_netbk_get_extras(struct xenvif *vif, vif->tx.req_cons = ++cons; netdev_err(vif->dev, "Invalid extra type: %d\n", extra.type); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -EINVAL; } @@ -1056,20 +1057,20 @@ static int xen_netbk_get_extras(struct xenvif *vif, return work_to_do; } -static int netbk_set_skb_gso(struct xenvif *vif, - struct sk_buff *skb, - struct xen_netif_extra_info *gso) +static int xenvif_set_skb_gso(struct xenvif *vif, + struct sk_buff *skb, + struct xen_netif_extra_info *gso) { if (!gso->u.gso.size) { netdev_err(vif->dev, "GSO size must not be zero.\n"); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -EINVAL; } /* Currently only TCPv4 S.O. is supported. 
*/ if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { netdev_err(vif->dev, "Bad GSO type %d.\n", gso->u.gso.type); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); return -EINVAL; } @@ -1180,7 +1181,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) return false; } -static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) +static unsigned xenvif_tx_build_gops(struct xenvif *vif) { struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop; struct sk_buff *skb; @@ -1205,7 +1206,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) "req_prod %d, req_cons %d, size %ld\n", vif->tx.sring->req_prod, vif->tx.req_cons, XEN_NETIF_TX_RING_SIZE); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); continue; } @@ -1229,14 +1230,14 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) memset(extras, 0, sizeof(extras)); if (txreq.flags & XEN_NETTXF_extra_info) { - work_to_do = xen_netbk_get_extras(vif, extras, - work_to_do); + work_to_do = xenvif_get_extras(vif, extras, + work_to_do); idx = vif->tx.req_cons; if (unlikely(work_to_do < 0)) break; } - ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do); + ret = xenvif_count_requests(vif, &txreq, txfrags, work_to_do); if (unlikely(ret < 0)) break; @@ -1245,7 +1246,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) if (unlikely(txreq.size < ETH_HLEN)) { netdev_dbg(vif->dev, "Bad packet size: %d\n", txreq.size); - netbk_tx_err(vif, &txreq, idx); + xenvif_tx_err(vif, &txreq, idx); break; } @@ -1255,7 +1256,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) "txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset&~PAGE_MASK) + txreq.size); - netbk_fatal_tx_err(vif); + xenvif_fatal_tx_err(vif); break; } @@ -1271,7 +1272,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) if (unlikely(skb == NULL)) { netdev_dbg(vif->dev, "Can''t allocate a skb in start_xmit.\n"); - netbk_tx_err(vif, &txreq, idx); + xenvif_tx_err(vif, &txreq, idx); break; } @@ -1282,18 +1283,18 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) struct xen_netif_extra_info *gso; gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; - if (netbk_set_skb_gso(vif, skb, gso)) { - /* Failure in netbk_set_skb_gso is fatal. */ + if (xenvif_set_skb_gso(vif, skb, gso)) { + /* Failure in xenvif_set_skb_gso is fatal. */ kfree_skb(skb); break; } } /* XXX could copy straight to head */ - page = xen_netbk_alloc_page(vif, pending_idx); + page = xenvif_alloc_page(vif, pending_idx); if (!page) { kfree_skb(skb); - netbk_tx_err(vif, &txreq, idx); + xenvif_tx_err(vif, &txreq, idx); break; } @@ -1329,10 +1330,10 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) vif->pending_cons++; - request_gop = xen_netbk_get_requests(vif, skb, txfrags, gop); + request_gop = xenvif_get_requests(vif, skb, txfrags, gop); if (request_gop == NULL) { kfree_skb(skb); - netbk_tx_err(vif, &txreq, idx); + xenvif_tx_err(vif, &txreq, idx); break; } gop = request_gop; @@ -1349,7 +1350,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) } -static int xen_netbk_tx_submit(struct xenvif *vif, int budget) +static int xenvif_tx_submit(struct xenvif *vif, int budget) { struct gnttab_copy *gop = vif->tx_copy_ops; struct sk_buff *skb; @@ -1365,7 +1366,7 @@ static int xen_netbk_tx_submit(struct xenvif *vif, int budget) txp = &vif->pending_tx_info[pending_idx].req; /* Check the remap error code. 
*/ - if (unlikely(xen_netbk_tx_check_gop(vif, skb, &gop))) { + if (unlikely(xenvif_tx_check_gop(vif, skb, &gop))) { netdev_dbg(vif->dev, "netback grant failed.\n"); skb_shinfo(skb)->nr_frags = 0; kfree_skb(skb); @@ -1382,8 +1383,8 @@ static int xen_netbk_tx_submit(struct xenvif *vif, int budget) txp->size -= data_len; } else { /* Schedule a response immediately. */ - xen_netbk_idx_release(vif, pending_idx, - XEN_NETIF_RSP_OKAY); + xenvif_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } if (txp->flags & XEN_NETTXF_csum_blank) @@ -1391,7 +1392,7 @@ static int xen_netbk_tx_submit(struct xenvif *vif, int budget) else if (txp->flags & XEN_NETTXF_data_validated) skb->ip_summed = CHECKSUM_UNNECESSARY; - xen_netbk_fill_frags(vif, skb); + xenvif_fill_frags(vif, skb); /* * If the initial fragment was < PKT_PROT_LEN then @@ -1428,7 +1429,7 @@ static int xen_netbk_tx_submit(struct xenvif *vif, int budget) } /* Called after netfront has transmitted */ -int xen_netbk_tx_action(struct xenvif *vif, int budget) +int xenvif_tx_action(struct xenvif *vif, int budget) { unsigned nr_gops; int work_done; @@ -1436,20 +1437,20 @@ int xen_netbk_tx_action(struct xenvif *vif, int budget) if (unlikely(!tx_work_todo(vif))) return 0; - nr_gops = xen_netbk_tx_build_gops(vif); + nr_gops = xenvif_tx_build_gops(vif); if (nr_gops == 0) return 0; gnttab_batch_copy(vif->tx_copy_ops, nr_gops); - work_done = xen_netbk_tx_submit(vif, nr_gops); + work_done = xenvif_tx_submit(vif, nr_gops); return work_done; } -static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, - u8 status) +static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx, + u8 status) { struct pending_tx_info *pending_tx_info; pending_ring_idx_t head; @@ -1554,7 +1555,7 @@ static inline int tx_work_todo(struct xenvif *vif) return 0; } -void xen_netbk_unmap_frontend_rings(struct xenvif *vif) +void xenvif_unmap_frontend_rings(struct xenvif *vif) { if (vif->tx.sring) xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif), @@ -1564,9 +1565,9 @@ void xen_netbk_unmap_frontend_rings(struct xenvif *vif) vif->rx.sring); } -int xen_netbk_map_frontend_rings(struct xenvif *vif, - grant_ref_t tx_ring_ref, - grant_ref_t rx_ring_ref) +int xenvif_map_frontend_rings(struct xenvif *vif, + grant_ref_t tx_ring_ref, + grant_ref_t rx_ring_ref) { void *addr; struct xen_netif_tx_sring *txs; @@ -1595,11 +1596,11 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, return 0; err: - xen_netbk_unmap_frontend_rings(vif); + xenvif_unmap_frontend_rings(vif); return err; } -int xen_netbk_kthread(void *data) +int xenvif_kthread(void *data) { struct xenvif *vif = data; @@ -1611,7 +1612,7 @@ int xen_netbk_kthread(void *data) break; if (rx_work_todo(vif)) - xen_netbk_rx_action(vif); + xenvif_rx_action(vif); cond_resched(); } -- 1.7.10.4
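For quick reference, the rename in this patch is purely mechanical: with xen_netbk folded into xenvif by the previous patch, every remaining xen_netbk_*/netbk_* routine simply gains the single xenvif_ prefix. A few representative prototypes, taken verbatim from the hunks above:

	/* old names (mixed prefixes) */
	int  xen_netbk_rx_ring_full(struct xenvif *vif);
	int  xen_netbk_tx_action(struct xenvif *vif, int budget);
	void xen_netbk_rx_action(struct xenvif *vif);
	int  xen_netbk_kthread(void *data);

	/* new names (single xenvif_ prefix) */
	int  xenvif_rx_ring_full(struct xenvif *vif);
	int  xenvif_tx_action(struct xenvif *vif, int budget);
	void xenvif_rx_action(struct xenvif *vif);
	int  xenvif_kthread(void *data);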
Pasi Kärkkäinen
2013-Aug-06 13:16 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
> 
> IRQs are distributed to 4 cores by hand in the new model, while in the
> old model vifs are automatically distributed to 4 kthreads.
> 

Hmm.. so with these patches applied is it *required* to do manual
configuration in dom0 to get good performance?

Thanks, 

-- Pasi
Wei Liu
2013-Aug-06 13:28 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 04:16:37PM +0300, Pasi Kärkkäinen wrote:
> On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
> > 
> > IRQs are distributed to 4 cores by hand in the new model, while in the
> > old model vifs are automatically distributed to 4 kthreads.
> > 
> 
> Hmm.. so with these patches applied is it *required* to do manual
> configuration in dom0 to get good performance?
> 

Sort of. But you can use irqbalance to do the job for you.

Wei.

> Thanks, 
> 
> -- Pasi
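For illustration, distributing the vif IRQs "by hand" just means writing a cpumask to /proc/irq/<irq>/smp_affinity for each vif's tx/rx interrupt; the IRQ numbers can be read from /proc/interrupts once the event channels are bound. A minimal userspace sketch, with made-up IRQ and CPU numbers, might look like the following (irqbalance performs the same write automatically):

	#include <stdio.h>
	#include <stdlib.h>

	/* Pin one IRQ to one CPU by writing a hex cpumask. The IRQ number
	 * and CPU below are hypothetical; look them up in /proc/interrupts
	 * (e.g. the "vif1.0-tx" / "vif1.0-rx" lines).
	 */
	static int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
	{
		char path[64];
		FILE *f;

		snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
		f = fopen(path, "w");
		if (!f)
			return -1;
		fprintf(f, "%x\n", 1u << cpu);
		return fclose(f);
	}

	int main(void)
	{
		/* e.g. vif1.0-tx on IRQ 123 -> CPU 2 (hypothetical numbers) */
		return pin_irq_to_cpu(123, 2) ? EXIT_FAILURE : EXIT_SUCCESS;
	}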
David Vrabel
2013-Aug-06 13:29 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On 06/08/13 14:16, Pasi Kärkkäinen wrote:
> On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
>>
>> IRQs are distributed to 4 cores by hand in the new model, while in the
>> old model vifs are automatically distributed to 4 kthreads.
>>
> 
> Hmm.. so with these patches applied is it *required* to do manual
> configuration in dom0 to get good performance?

This should be irqbalanced's job. The existing version doesn't do a
good enough job yet though. Andrew Bennieston may have more details.

David
Wei Liu
2013-Aug-06 13:33 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
I sent this patch right before Ian requested more comments in code. Now I''ve updated this one. Not going to resend the whole series as this is only changes in comments. Sorry for my bad English, I don''t know whether I''ve made them clear enough. Suggestions are welcomed. Wei. ---8<--- From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 From: Wei Liu <wei.liu2@citrix.com> Date: Tue, 6 Aug 2013 06:59:15 +0100 Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model This patch implements 1:1 model netback. NAPI and kthread are utilized to do the weight-lifting job: - NAPI is used for guest side TX (host side RX) - kthread is used for guest side RX (host side TX) Xenvif and xen_netbk are made into one structure to reduce code size. This model provides better scheduling fairness among vifs. It is also prerequisite for implementing multiqueue for Xen netback. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- drivers/net/xen-netback/common.h | 128 +++++--- drivers/net/xen-netback/interface.c | 119 ++++--- drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- 3 files changed, 347 insertions(+), 507 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 8a4d77e..184ae0a 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -45,31 +45,105 @@ #include <xen/grant_table.h> #include <xen/xenbus.h> -struct xen_netbk; +typedef unsigned int pending_ring_idx_t; +#define INVALID_PENDING_RING_IDX (~0U) + +/* For the head field in pending_tx_info: It is used to indicate + * whether this tx info is the head of one or more coalesced requests. + * + * When head != INVALID_PENDING_RING_IDX, it means the start of a new + * tx requests queue and the end of previous queue. + * + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): + * + * ... 0 I I I 5 I 9 I I I ... + * + * After consming the first packet we have: + * + * ... V V V V 5 I 9 I I I ... + * + * where V stands for "valid pending ring index", any number other + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. + * + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above + * example) number is the index into pending_tx_info and mmap_pages + * arrays. + */ +struct pending_tx_info { + struct xen_netif_tx_request req; /* coalesced tx request */ + pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX + * if it is head of one or more tx + * reqs + */ +}; + +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) + +struct xenvif_rx_meta { + int id; + int size; + int gso_size; +}; + +/* Discriminate from any valid pending_idx value. */ +#define INVALID_PENDING_IDX 0xFFFF + +#define MAX_BUFFER_OFFSET PAGE_SIZE + +#define MAX_PENDING_REQS 256 struct xenvif { /* Unique identifier for this interface. */ domid_t domid; unsigned int handle; - /* Reference to netback processing backend. */ - struct xen_netbk *netbk; + /* Use NAPI for guest TX */ + struct napi_struct napi; + /* When feature-split-event-channels = 0, tx_irq = rx_irq. 
*/ + unsigned int tx_irq; + /* Only used when feature-split-event-channels = 1 */ + char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */ + struct xen_netif_tx_back_ring tx; + struct sk_buff_head tx_queue; + struct page *mmap_pages[MAX_PENDING_REQS]; + pending_ring_idx_t pending_prod; + pending_ring_idx_t pending_cons; + u16 pending_ring[MAX_PENDING_REQS]; + struct pending_tx_info pending_tx_info[MAX_PENDING_REQS]; + + /* Coalescing tx requests before copying makes number of grant + * copy ops greater or equal to number of slots required. In + * worst case a tx request consumes 2 gnttab_copy. + */ + struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS]; - u8 fe_dev_addr[6]; + /* Use kthread for guest RX */ + struct task_struct *task; + wait_queue_head_t wq; /* When feature-split-event-channels = 0, tx_irq = rx_irq. */ - unsigned int tx_irq; unsigned int rx_irq; /* Only used when feature-split-event-channels = 1 */ - char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */ char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */ + struct xen_netif_rx_back_ring rx; + struct sk_buff_head rx_queue; - /* List of frontends to notify after a batch of frames sent. */ - struct list_head notify_list; + /* Allow xenvif_start_xmit() to peek ahead in the rx request + * ring. This is a prediction of what rx_req_cons will be + * once all queued skbs are put on the ring. + */ + RING_IDX rx_req_cons_peek; + + /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each + * head/fragment page uses 2 copy operations because it + * straddles two buffers in the frontend. + */ + struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE]; + struct xenvif_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE]; - /* The shared rings and indexes. */ - struct xen_netif_tx_back_ring tx; - struct xen_netif_rx_back_ring rx; + + u8 fe_dev_addr[6]; /* Frontend feature information. */ u8 can_sg:1; @@ -80,13 +154,6 @@ struct xenvif { /* Internal feature information. */ u8 can_queue:1; /* can queue packets for receiver? */ - /* - * Allow xenvif_start_xmit() to peek ahead in the rx request - * ring. This is a prediction of what rx_req_cons will be - * once all queued skbs are put on the ring. - */ - RING_IDX rx_req_cons_peek; - /* Transmit shaping: allow ''credit_bytes'' every ''credit_usec''. */ unsigned long credit_bytes; unsigned long credit_usec; @@ -97,11 +164,7 @@ struct xenvif { unsigned long rx_gso_checksum_fixup; /* Miscellaneous private stuff. */ - struct list_head schedule_list; - atomic_t refcnt; struct net_device *dev; - - wait_queue_head_t waiting_to_free; }; static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif) @@ -109,9 +172,6 @@ static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif) return to_xenbus_device(vif->dev->dev.parent); } -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) - struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, unsigned int handle); @@ -121,9 +181,6 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, unsigned int rx_evtchn); void xenvif_disconnect(struct xenvif *vif); -void xenvif_get(struct xenvif *vif); -void xenvif_put(struct xenvif *vif); - int xenvif_xenbus_init(void); void xenvif_xenbus_fini(void); @@ -139,18 +196,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif, grant_ref_t tx_ring_ref, grant_ref_t rx_ring_ref); -/* (De)Register a xenvif with the netback backend. 
*/ -void xen_netbk_add_xenvif(struct xenvif *vif); -void xen_netbk_remove_xenvif(struct xenvif *vif); - -/* (De)Schedule backend processing for a xenvif */ -void xen_netbk_schedule_xenvif(struct xenvif *vif); -void xen_netbk_deschedule_xenvif(struct xenvif *vif); - /* Check for SKBs from frontend and schedule backend processing */ void xen_netbk_check_rx_xenvif(struct xenvif *vif); -/* Receive an SKB from the frontend */ -void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb); /* Queue an SKB for transmission to the frontend */ void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb); @@ -163,6 +210,11 @@ void xenvif_carrier_off(struct xenvif *vif); /* Returns number of ring slots required to send an skb to the frontend */ unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb); +int xen_netbk_tx_action(struct xenvif *vif, int budget); +void xen_netbk_rx_action(struct xenvif *vif); + +int xen_netbk_kthread(void *data); + extern bool separate_tx_rx_irq; #endif /* __XEN_NETBACK__COMMON_H__ */ diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 087d2db..44d6b70 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -30,6 +30,7 @@ #include "common.h" +#include <linux/kthread.h> #include <linux/ethtool.h> #include <linux/rtnetlink.h> #include <linux/if_vlan.h> @@ -38,17 +39,7 @@ #include <asm/xen/hypercall.h> #define XENVIF_QUEUE_LENGTH 32 - -void xenvif_get(struct xenvif *vif) -{ - atomic_inc(&vif->refcnt); -} - -void xenvif_put(struct xenvif *vif) -{ - if (atomic_dec_and_test(&vif->refcnt)) - wake_up(&vif->waiting_to_free); -} +#define XENVIF_NAPI_WEIGHT 64 int xenvif_schedulable(struct xenvif *vif) { @@ -64,21 +55,55 @@ static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id) { struct xenvif *vif = dev_id; - if (vif->netbk == NULL) - return IRQ_HANDLED; - - xen_netbk_schedule_xenvif(vif); + if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) + napi_schedule(&vif->napi); return IRQ_HANDLED; } +static int xenvif_poll(struct napi_struct *napi, int budget) +{ + struct xenvif *vif = container_of(napi, struct xenvif, napi); + int work_done; + + work_done = xen_netbk_tx_action(vif, budget); + + if (work_done < budget) { + int more_to_do = 0; + unsigned long flags; + + /* It is necessary to disable IRQ before calling + * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might + * lose event from the frontend. + * + * Consider: + * RING_HAS_UNCONSUMED_REQUESTS + * <frontend generates event to trigger napi_schedule> + * __napi_complete + * + * This handler is still in scheduled state so the + * event has no effect at all. After __napi_complete + * this handler is descheduled and cannot get + * scheduled again. We lose event in this case and the ring + * will be completely stalled. + */ + + local_irq_save(flags); + + RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do); + if (!more_to_do) + __napi_complete(napi); + + local_irq_restore(flags); + } + + return work_done; +} + static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id) { struct xenvif *vif = dev_id; - if (vif->netbk == NULL) - return IRQ_HANDLED; - if (xenvif_rx_schedulable(vif)) netif_wake_queue(vif->dev); @@ -99,7 +124,8 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) BUG_ON(skb->dev != dev); - if (vif->netbk == NULL) + /* Drop the packet if vif is not ready */ + if (vif->task == NULL) goto drop; /* Drop the packet if the target domain has no receive buffers. 
*/ @@ -108,7 +134,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) /* Reserve ring slots for the worst-case number of fragments. */ vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb); - xenvif_get(vif); if (vif->can_queue && xen_netbk_must_stop_queue(vif)) netif_stop_queue(dev); @@ -123,11 +148,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } -void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb) -{ - netif_rx_ni(skb); -} - void xenvif_notify_tx_completion(struct xenvif *vif) { if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif)) @@ -142,7 +162,7 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev) static void xenvif_up(struct xenvif *vif) { - xen_netbk_add_xenvif(vif); + napi_enable(&vif->napi); enable_irq(vif->tx_irq); if (vif->tx_irq != vif->rx_irq) enable_irq(vif->rx_irq); @@ -151,12 +171,11 @@ static void xenvif_up(struct xenvif *vif) static void xenvif_down(struct xenvif *vif) { + napi_disable(&vif->napi); disable_irq(vif->tx_irq); if (vif->tx_irq != vif->rx_irq) disable_irq(vif->rx_irq); del_timer_sync(&vif->credit_timeout); - xen_netbk_deschedule_xenvif(vif); - xen_netbk_remove_xenvif(vif); } static int xenvif_open(struct net_device *dev) @@ -272,11 +291,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, struct net_device *dev; struct xenvif *vif; char name[IFNAMSIZ] = {}; + int i; snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle); dev = alloc_netdev(sizeof(struct xenvif), name, ether_setup); if (dev == NULL) { - pr_warn("Could not allocate netdev\n"); + pr_warn("Could not allocate netdev for %s\n", name); return ERR_PTR(-ENOMEM); } @@ -285,14 +305,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, vif = netdev_priv(dev); vif->domid = domid; vif->handle = handle; - vif->netbk = NULL; vif->can_sg = 1; vif->csum = 1; - atomic_set(&vif->refcnt, 1); - init_waitqueue_head(&vif->waiting_to_free); vif->dev = dev; - INIT_LIST_HEAD(&vif->schedule_list); - INIT_LIST_HEAD(&vif->notify_list); vif->credit_bytes = vif->remaining_credit = ~0UL; vif->credit_usec = 0UL; @@ -307,6 +322,16 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, dev->tx_queue_len = XENVIF_QUEUE_LENGTH; + skb_queue_head_init(&vif->rx_queue); + skb_queue_head_init(&vif->tx_queue); + + vif->pending_cons = 0; + vif->pending_prod = MAX_PENDING_REQS; + for (i = 0; i < MAX_PENDING_REQS; i++) + vif->pending_ring[i] = i; + for (i = 0; i < MAX_PENDING_REQS; i++) + vif->mmap_pages[i] = NULL; + /* * Initialise a dummy MAC address. 
We choose the numerically * largest non-broadcast address to prevent the address getting @@ -316,6 +341,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, memset(dev->dev_addr, 0xFF, ETH_ALEN); dev->dev_addr[0] &= ~0x01; + netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT); + netif_carrier_off(dev); err = register_netdev(dev); @@ -377,7 +404,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, disable_irq(vif->rx_irq); } - xenvif_get(vif); + init_waitqueue_head(&vif->wq); + vif->task = kthread_create(xen_netbk_kthread, + (void *)vif, vif->dev->name); + if (IS_ERR(vif->task)) { + pr_warn("Could not allocate kthread for %s\n", vif->dev->name); + err = PTR_ERR(vif->task); + goto err_rx_unbind; + } rtnl_lock(); if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN) @@ -388,7 +422,13 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, xenvif_up(vif); rtnl_unlock(); + wake_up_process(vif->task); + return 0; + +err_rx_unbind: + unbind_from_irqhandler(vif->rx_irq, vif); + vif->rx_irq = 0; err_tx_unbind: unbind_from_irqhandler(vif->tx_irq, vif); vif->tx_irq = 0; @@ -408,7 +448,6 @@ void xenvif_carrier_off(struct xenvif *vif) if (netif_running(dev)) xenvif_down(vif); rtnl_unlock(); - xenvif_put(vif); } void xenvif_disconnect(struct xenvif *vif) @@ -422,9 +461,6 @@ void xenvif_disconnect(struct xenvif *vif) if (netif_carrier_ok(vif->dev)) xenvif_carrier_off(vif); - atomic_dec(&vif->refcnt); - wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0); - if (vif->tx_irq) { if (vif->tx_irq == vif->rx_irq) unbind_from_irqhandler(vif->tx_irq, vif); @@ -438,6 +474,11 @@ void xenvif_disconnect(struct xenvif *vif) need_module_put = 1; } + if (vif->task) + kthread_stop(vif->task); + + netif_napi_del(&vif->napi); + unregister_netdev(vif->dev); xen_netbk_unmap_frontend_rings(vif); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 91f163d..44ccc67 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -70,116 +70,25 @@ module_param(fatal_skb_slots, uint, 0444); */ #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN -typedef unsigned int pending_ring_idx_t; -#define INVALID_PENDING_RING_IDX (~0U) - -struct pending_tx_info { - struct xen_netif_tx_request req; /* coalesced tx request */ - struct xenvif *vif; - pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX - * if it is head of one or more tx - * reqs - */ -}; - -struct netbk_rx_meta { - int id; - int size; - int gso_size; -}; - -#define MAX_PENDING_REQS 256 - -/* Discriminate from any valid pending_idx value. */ -#define INVALID_PENDING_IDX 0xFFFF - -#define MAX_BUFFER_OFFSET PAGE_SIZE - -struct xen_netbk { - wait_queue_head_t wq; - struct task_struct *task; - - struct sk_buff_head rx_queue; - struct sk_buff_head tx_queue; - - struct timer_list net_timer; - - struct page *mmap_pages[MAX_PENDING_REQS]; - - pending_ring_idx_t pending_prod; - pending_ring_idx_t pending_cons; - struct list_head net_schedule_list; - - /* Protect the net_schedule_list in netif. */ - spinlock_t net_schedule_list_lock; - - atomic_t netfront_count; - - struct pending_tx_info pending_tx_info[MAX_PENDING_REQS]; - /* Coalescing tx requests before copying makes number of grant - * copy ops greater or equal to number of slots required. In - * worst case a tx request consumes 2 gnttab_copy. 
- */ - struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS]; - - u16 pending_ring[MAX_PENDING_REQS]; - - /* - * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each - * head/fragment page uses 2 copy operations because it - * straddles two buffers in the frontend. - */ - struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE]; - struct netbk_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE]; -}; - -static struct xen_netbk *xen_netbk; -static int xen_netbk_group_nr; - /* * If head != INVALID_PENDING_RING_IDX, it means this tx request is head of * one or more merged tx requests, otherwise it is the continuation of * previous tx request. */ -static inline int pending_tx_is_head(struct xen_netbk *netbk, RING_IDX idx) -{ - return netbk->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX; -} - -void xen_netbk_add_xenvif(struct xenvif *vif) -{ - int i; - int min_netfront_count; - int min_group = 0; - struct xen_netbk *netbk; - - min_netfront_count = atomic_read(&xen_netbk[0].netfront_count); - for (i = 0; i < xen_netbk_group_nr; i++) { - int netfront_count = atomic_read(&xen_netbk[i].netfront_count); - if (netfront_count < min_netfront_count) { - min_group = i; - min_netfront_count = netfront_count; - } - } - - netbk = &xen_netbk[min_group]; - - vif->netbk = netbk; - atomic_inc(&netbk->netfront_count); -} - -void xen_netbk_remove_xenvif(struct xenvif *vif) +static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx) { - struct xen_netbk *netbk = vif->netbk; - vif->netbk = NULL; - atomic_dec(&netbk->netfront_count); + return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX; } -static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, +static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, u8 status); static void make_tx_response(struct xenvif *vif, struct xen_netif_tx_request *txp, s8 st); + +static inline int tx_work_todo(struct xenvif *vif); +static inline int rx_work_todo(struct xenvif *vif); + static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, u16 id, s8 st, @@ -187,16 +96,16 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, u16 size, u16 flags); -static inline unsigned long idx_to_pfn(struct xen_netbk *netbk, +static inline unsigned long idx_to_pfn(struct xenvif *vif, u16 idx) { - return page_to_pfn(netbk->mmap_pages[idx]); + return page_to_pfn(vif->mmap_pages[idx]); } -static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk, +static inline unsigned long idx_to_kaddr(struct xenvif *vif, u16 idx) { - return (unsigned long)pfn_to_kaddr(idx_to_pfn(netbk, idx)); + return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx)); } /* @@ -224,15 +133,10 @@ static inline pending_ring_idx_t pending_index(unsigned i) return i & (MAX_PENDING_REQS-1); } -static inline pending_ring_idx_t nr_pending_reqs(struct xen_netbk *netbk) +static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif) { return MAX_PENDING_REQS - - netbk->pending_prod + netbk->pending_cons; -} - -static void xen_netbk_kick_thread(struct xen_netbk *netbk) -{ - wake_up(&netbk->wq); + vif->pending_prod + vif->pending_cons; } static int max_required_rx_slots(struct xenvif *vif) @@ -364,15 +268,15 @@ struct netrx_pending_operations { unsigned copy_prod, copy_cons; unsigned meta_prod, meta_cons; struct gnttab_copy *copy; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; int copy_off; grant_ref_t copy_gref; }; -static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif, - struct netrx_pending_operations *npo) 
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif, + struct netrx_pending_operations *npo) { - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; struct xen_netif_rx_request *req; req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++); @@ -398,7 +302,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, unsigned long offset, int *head) { struct gnttab_copy *copy_gop; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; unsigned long bytes; /* Data must not cross a page boundary. */ @@ -434,15 +338,15 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb, copy_gop = npo->copy + npo->copy_prod++; copy_gop->flags = GNTCOPY_dest_gref; + copy_gop->len = bytes; + copy_gop->source.domid = DOMID_SELF; copy_gop->source.u.gmfn = virt_to_mfn(page_address(page)); - copy_gop->source.offset = offset; - copy_gop->dest.domid = vif->domid; + copy_gop->dest.domid = vif->domid; copy_gop->dest.offset = npo->copy_off; copy_gop->dest.u.ref = npo->copy_gref; - copy_gop->len = bytes; npo->copy_off += bytes; meta->size += bytes; @@ -485,7 +389,7 @@ static int netbk_gop_skb(struct sk_buff *skb, int nr_frags = skb_shinfo(skb)->nr_frags; int i; struct xen_netif_rx_request *req; - struct netbk_rx_meta *meta; + struct xenvif_rx_meta *meta; unsigned char *data; int head = 1; int old_meta_prod; @@ -565,7 +469,7 @@ static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots, } static void netbk_add_frag_responses(struct xenvif *vif, int status, - struct netbk_rx_meta *meta, + struct xenvif_rx_meta *meta, int nr_meta_slots) { int i; @@ -594,9 +498,13 @@ struct skb_cb_overlay { int meta_slots_used; }; -static void xen_netbk_rx_action(struct xen_netbk *netbk) +static void xen_netbk_kick_thread(struct xenvif *vif) +{ + wake_up(&vif->wq); +} + +void xen_netbk_rx_action(struct xenvif *vif) { - struct xenvif *vif = NULL, *tmp; s8 status; u16 flags; struct xen_netif_rx_response *resp; @@ -608,17 +516,18 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) int count; unsigned long offset; struct skb_cb_overlay *sco; + int need_to_notify = 0; struct netrx_pending_operations npo = { - .copy = netbk->grant_copy_op, - .meta = netbk->meta, + .copy = vif->grant_copy_op, + .meta = vif->meta, }; skb_queue_head_init(&rxq); count = 0; - while ((skb = skb_dequeue(&netbk->rx_queue)) != NULL) { + while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) { vif = netdev_priv(skb->dev); nr_frags = skb_shinfo(skb)->nr_frags; @@ -635,27 +544,27 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) break; } - BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)); + BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta)); if (!npo.copy_prod) return; - BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op)); - gnttab_batch_copy(netbk->grant_copy_op, npo.copy_prod); + BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op)); + gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod); while ((skb = __skb_dequeue(&rxq)) != NULL) { sco = (struct skb_cb_overlay *)skb->cb; vif = netdev_priv(skb->dev); - if (netbk->meta[npo.meta_cons].gso_size && vif->gso_prefix) { + if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) { resp = RING_GET_RESPONSE(&vif->rx, - vif->rx.rsp_prod_pvt++); + vif->rx.rsp_prod_pvt++); resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data; - resp->offset = netbk->meta[npo.meta_cons].gso_size; - resp->id = netbk->meta[npo.meta_cons].id; + resp->offset = vif->meta[npo.meta_cons].gso_size; + resp->id = vif->meta[npo.meta_cons].id; resp->status = 
sco->meta_slots_used; npo.meta_cons++; @@ -680,12 +589,12 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) flags |= XEN_NETRXF_data_validated; offset = 0; - resp = make_rx_response(vif, netbk->meta[npo.meta_cons].id, + resp = make_rx_response(vif, vif->meta[npo.meta_cons].id, status, offset, - netbk->meta[npo.meta_cons].size, + vif->meta[npo.meta_cons].size, flags); - if (netbk->meta[npo.meta_cons].gso_size && !vif->gso_prefix) { + if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) { struct xen_netif_extra_info *gso (struct xen_netif_extra_info *) RING_GET_RESPONSE(&vif->rx, @@ -693,7 +602,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) resp->flags |= XEN_NETRXF_extra_info; - gso->u.gso.size = netbk->meta[npo.meta_cons].gso_size; + gso->u.gso.size = vif->meta[npo.meta_cons].gso_size; gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; gso->u.gso.pad = 0; gso->u.gso.features = 0; @@ -703,112 +612,33 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk) } netbk_add_frag_responses(vif, status, - netbk->meta + npo.meta_cons + 1, + vif->meta + npo.meta_cons + 1, sco->meta_slots_used); RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret); + if (ret) + need_to_notify = 1; + xenvif_notify_tx_completion(vif); - if (ret && list_empty(&vif->notify_list)) - list_add_tail(&vif->notify_list, ¬ify); - else - xenvif_put(vif); npo.meta_cons += sco->meta_slots_used; dev_kfree_skb(skb); } - list_for_each_entry_safe(vif, tmp, ¬ify, notify_list) { + if (need_to_notify) notify_remote_via_irq(vif->rx_irq); - list_del_init(&vif->notify_list); - xenvif_put(vif); - } /* More work to do? */ - if (!skb_queue_empty(&netbk->rx_queue) && - !timer_pending(&netbk->net_timer)) - xen_netbk_kick_thread(netbk); + if (!skb_queue_empty(&vif->rx_queue)) + xen_netbk_kick_thread(vif); } void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb) { - struct xen_netbk *netbk = vif->netbk; + skb_queue_tail(&vif->rx_queue, skb); - skb_queue_tail(&netbk->rx_queue, skb); - - xen_netbk_kick_thread(netbk); -} - -static void xen_netbk_alarm(unsigned long data) -{ - struct xen_netbk *netbk = (struct xen_netbk *)data; - xen_netbk_kick_thread(netbk); -} - -static int __on_net_schedule_list(struct xenvif *vif) -{ - return !list_empty(&vif->schedule_list); -} - -/* Must be called with net_schedule_list_lock held */ -static void remove_from_net_schedule_list(struct xenvif *vif) -{ - if (likely(__on_net_schedule_list(vif))) { - list_del_init(&vif->schedule_list); - xenvif_put(vif); - } -} - -static struct xenvif *poll_net_schedule_list(struct xen_netbk *netbk) -{ - struct xenvif *vif = NULL; - - spin_lock_irq(&netbk->net_schedule_list_lock); - if (list_empty(&netbk->net_schedule_list)) - goto out; - - vif = list_first_entry(&netbk->net_schedule_list, - struct xenvif, schedule_list); - if (!vif) - goto out; - - xenvif_get(vif); - - remove_from_net_schedule_list(vif); -out: - spin_unlock_irq(&netbk->net_schedule_list_lock); - return vif; -} - -void xen_netbk_schedule_xenvif(struct xenvif *vif) -{ - unsigned long flags; - struct xen_netbk *netbk = vif->netbk; - - if (__on_net_schedule_list(vif)) - goto kick; - - spin_lock_irqsave(&netbk->net_schedule_list_lock, flags); - if (!__on_net_schedule_list(vif) && - likely(xenvif_schedulable(vif))) { - list_add_tail(&vif->schedule_list, &netbk->net_schedule_list); - xenvif_get(vif); - } - spin_unlock_irqrestore(&netbk->net_schedule_list_lock, flags); - -kick: - smp_mb(); - if ((nr_pending_reqs(netbk) < (MAX_PENDING_REQS/2)) && - !list_empty(&netbk->net_schedule_list)) 
- xen_netbk_kick_thread(netbk); -} - -void xen_netbk_deschedule_xenvif(struct xenvif *vif) -{ - struct xen_netbk *netbk = vif->netbk; - spin_lock_irq(&netbk->net_schedule_list_lock); - remove_from_net_schedule_list(vif); - spin_unlock_irq(&netbk->net_schedule_list_lock); + xen_netbk_kick_thread(vif); } void xen_netbk_check_rx_xenvif(struct xenvif *vif) @@ -818,7 +648,7 @@ void xen_netbk_check_rx_xenvif(struct xenvif *vif) RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do); if (more_to_do) - xen_netbk_schedule_xenvif(vif); + napi_schedule(&vif->napi); } static void tx_add_credit(struct xenvif *vif) @@ -860,15 +690,12 @@ static void netbk_tx_err(struct xenvif *vif, txp = RING_GET_REQUEST(&vif->tx, cons++); } while (1); vif->tx.req_cons = cons; - xen_netbk_check_rx_xenvif(vif); - xenvif_put(vif); } static void netbk_fatal_tx_err(struct xenvif *vif) { netdev_err(vif->dev, "fatal error; disabling device\n"); xenvif_carrier_off(vif); - xenvif_put(vif); } static int netbk_count_requests(struct xenvif *vif, @@ -969,19 +796,20 @@ static int netbk_count_requests(struct xenvif *vif, return slots; } -static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk, +static struct page *xen_netbk_alloc_page(struct xenvif *vif, u16 pending_idx) { struct page *page; - page = alloc_page(GFP_KERNEL|__GFP_COLD); + + page = alloc_page(GFP_ATOMIC|__GFP_COLD); if (!page) return NULL; - netbk->mmap_pages[pending_idx] = page; + vif->mmap_pages[pending_idx] = page; + return page; } -static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, - struct xenvif *vif, +static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif, struct sk_buff *skb, struct xen_netif_tx_request *txp, struct gnttab_copy *gop) @@ -1012,9 +840,9 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, for (shinfo->nr_frags = slot = start; slot < nr_slots; shinfo->nr_frags++) { struct pending_tx_info *pending_tx_info - netbk->pending_tx_info; + vif->pending_tx_info; - page = alloc_page(GFP_KERNEL|__GFP_COLD); + page = alloc_page(GFP_ATOMIC|__GFP_COLD); if (!page) goto err; @@ -1049,21 +877,18 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, gop->len = txp->size; dst_offset += gop->len; - index = pending_index(netbk->pending_cons++); + index = pending_index(vif->pending_cons++); - pending_idx = netbk->pending_ring[index]; + pending_idx = vif->pending_ring[index]; memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp)); - xenvif_get(vif); - - pending_tx_info[pending_idx].vif = vif; /* Poison these fields, corresponding * fields for head tx req will be set * to correct values after the loop. */ - netbk->mmap_pages[pending_idx] = (void *)(~0UL); + vif->mmap_pages[pending_idx] = (void *)(~0UL); pending_tx_info[pending_idx].head INVALID_PENDING_RING_IDX; @@ -1083,7 +908,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, first->req.offset = 0; first->req.size = dst_offset; first->head = start_idx; - netbk->mmap_pages[head_idx] = page; + vif->mmap_pages[head_idx] = page; frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx); } @@ -1093,18 +918,18 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk, err: /* Unwind, freeing all pages and sending error responses. */ while (shinfo->nr_frags-- > start) { - xen_netbk_idx_release(netbk, + xen_netbk_idx_release(vif, frag_get_pending_idx(&frags[shinfo->nr_frags]), XEN_NETIF_RSP_ERROR); } /* The head too, if necessary. 
*/ if (start) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); return NULL; } -static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, +static int xen_netbk_tx_check_gop(struct xenvif *vif, struct sk_buff *skb, struct gnttab_copy **gopp) { @@ -1119,7 +944,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, /* Check status of header. */ err = gop->status; if (unlikely(err)) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Skip first skb fragment if it is on same page as header fragment. */ start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); @@ -1129,7 +954,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, pending_ring_idx_t head; pending_idx = frag_get_pending_idx(&shinfo->frags[i]); - tx_info = &netbk->pending_tx_info[pending_idx]; + tx_info = &vif->pending_tx_info[pending_idx]; head = tx_info->head; /* Check error status: if okay then remember grant handle. */ @@ -1137,18 +962,19 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, newerr = (++gop)->status; if (newerr) break; - peek = netbk->pending_ring[pending_index(++head)]; - } while (!pending_tx_is_head(netbk, peek)); + peek = vif->pending_ring[pending_index(++head)]; + } while (!pending_tx_is_head(vif, peek)); if (likely(!newerr)) { /* Had a previous error? Invalidate this fragment. */ if (unlikely(err)) - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); continue; } /* Error on this fragment: respond to client with an error. */ - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_ERROR); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); /* Not the first error? Preceding frags already invalidated. */ if (err) @@ -1156,10 +982,11 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, /* First error: invalidate header and preceding fragments. */ pending_idx = *((u16 *)skb->data); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); for (j = start; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } /* Remember the error: invalidate all subsequent fragments. 
*/ @@ -1170,7 +997,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk, return err; } -static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb) +static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb) { struct skb_shared_info *shinfo = skb_shinfo(skb); int nr_frags = shinfo->nr_frags; @@ -1184,16 +1011,16 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb) pending_idx = frag_get_pending_idx(frag); - txp = &netbk->pending_tx_info[pending_idx].req; - page = virt_to_page(idx_to_kaddr(netbk, pending_idx)); + txp = &vif->pending_tx_info[pending_idx].req; + page = virt_to_page(idx_to_kaddr(vif, pending_idx)); __skb_fill_page_desc(skb, i, page, txp->offset, txp->size); skb->len += txp->size; skb->data_len += txp->size; skb->truesize += txp->size; /* Take an extra reference to offset xen_netbk_idx_release */ - get_page(netbk->mmap_pages[pending_idx]); - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + get_page(vif->mmap_pages[pending_idx]); + xen_netbk_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); } } @@ -1353,16 +1180,14 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) return false; } -static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) +static unsigned xen_netbk_tx_build_gops(struct xenvif *vif) { - struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop; + struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop; struct sk_buff *skb; int ret; - while ((nr_pending_reqs(netbk) + XEN_NETBK_LEGACY_SLOTS_MAX - < MAX_PENDING_REQS) && - !list_empty(&netbk->net_schedule_list)) { - struct xenvif *vif; + while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX + < MAX_PENDING_REQS)) { struct xen_netif_tx_request txreq; struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX]; struct page *page; @@ -1373,16 +1198,6 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) unsigned int data_len; pending_ring_idx_t index; - /* Get a netif from the list with work to do. */ - vif = poll_net_schedule_list(netbk); - /* This can sometimes happen because the test of - * list_empty(net_schedule_list) at the top of the - * loop is unlocked. Just go back and have another - * look. - */ - if (!vif) - continue; - if (vif->tx.sring->req_prod - vif->tx.req_cons > XEN_NETIF_TX_RING_SIZE) { netdev_err(vif->dev, @@ -1395,10 +1210,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) } RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, work_to_do); - if (!work_to_do) { - xenvif_put(vif); - continue; - } + if (!work_to_do) + break; idx = vif->tx.req_cons; rmb(); /* Ensure that we see the request before we copy it. */ @@ -1406,10 +1219,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) /* Credit-based scheduling. 
*/ if (txreq.size > vif->remaining_credit && - tx_credit_exceeded(vif, txreq.size)) { - xenvif_put(vif); - continue; - } + tx_credit_exceeded(vif, txreq.size)) + break; vif->remaining_credit -= txreq.size; @@ -1422,12 +1233,12 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) work_to_do); idx = vif->tx.req_cons; if (unlikely(work_to_do < 0)) - continue; + break; } ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do); if (unlikely(ret < 0)) - continue; + break; idx += ret; @@ -1435,7 +1246,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) netdev_dbg(vif->dev, "Bad packet size: %d\n", txreq.size); netbk_tx_err(vif, &txreq, idx); - continue; + break; } /* No crossing a page as the payload mustn''t fragment. */ @@ -1445,11 +1256,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) txreq.offset, txreq.size, (txreq.offset&~PAGE_MASK) + txreq.size); netbk_fatal_tx_err(vif); - continue; + break; } - index = pending_index(netbk->pending_cons); - pending_idx = netbk->pending_ring[index]; + index = pending_index(vif->pending_cons); + pending_idx = vif->pending_ring[index]; data_len = (txreq.size > PKT_PROT_LEN && ret < XEN_NETBK_LEGACY_SLOTS_MAX) ? @@ -1474,16 +1285,16 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) if (netbk_set_skb_gso(vif, skb, gso)) { /* Failure in netbk_set_skb_gso is fatal. */ kfree_skb(skb); - continue; + break; } } /* XXX could copy straight to head */ - page = xen_netbk_alloc_page(netbk, pending_idx); + page = xen_netbk_alloc_page(vif, pending_idx); if (!page) { kfree_skb(skb); netbk_tx_err(vif, &txreq, idx); - continue; + break; } gop->source.u.ref = txreq.gref; @@ -1499,10 +1310,9 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) gop++; - memcpy(&netbk->pending_tx_info[pending_idx].req, + memcpy(&vif->pending_tx_info[pending_idx].req, &txreq, sizeof(txreq)); - netbk->pending_tx_info[pending_idx].vif = vif; - netbk->pending_tx_info[pending_idx].head = index; + vif->pending_tx_info[pending_idx].head = index; *((u16 *)skb->data) = pending_idx; __skb_put(skb, data_len); @@ -1517,46 +1327,45 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk) INVALID_PENDING_IDX); } - netbk->pending_cons++; + vif->pending_cons++; - request_gop = xen_netbk_get_requests(netbk, vif, - skb, txfrags, gop); + request_gop = xen_netbk_get_requests(vif, skb, txfrags, gop); if (request_gop == NULL) { kfree_skb(skb); netbk_tx_err(vif, &txreq, idx); - continue; + break; } gop = request_gop; - __skb_queue_tail(&netbk->tx_queue, skb); + __skb_queue_tail(&vif->tx_queue, skb); vif->tx.req_cons = idx; - xen_netbk_check_rx_xenvif(vif); - if ((gop-netbk->tx_copy_ops) >= ARRAY_SIZE(netbk->tx_copy_ops)) + if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops)) break; } - return gop - netbk->tx_copy_ops; + return gop - vif->tx_copy_ops; } -static void xen_netbk_tx_submit(struct xen_netbk *netbk) + +static int xen_netbk_tx_submit(struct xenvif *vif, int budget) { - struct gnttab_copy *gop = netbk->tx_copy_ops; + struct gnttab_copy *gop = vif->tx_copy_ops; struct sk_buff *skb; + int work_done = 0; - while ((skb = __skb_dequeue(&netbk->tx_queue)) != NULL) { + while (work_done < budget && + (skb = __skb_dequeue(&vif->tx_queue)) != NULL) { struct xen_netif_tx_request *txp; - struct xenvif *vif; u16 pending_idx; unsigned data_len; pending_idx = *((u16 *)skb->data); - vif = netbk->pending_tx_info[pending_idx].vif; - txp = &netbk->pending_tx_info[pending_idx].req; + txp = 
&vif->pending_tx_info[pending_idx].req; /* Check the remap error code. */ - if (unlikely(xen_netbk_tx_check_gop(netbk, skb, &gop))) { + if (unlikely(xen_netbk_tx_check_gop(vif, skb, &gop))) { netdev_dbg(vif->dev, "netback grant failed.\n"); skb_shinfo(skb)->nr_frags = 0; kfree_skb(skb); @@ -1565,7 +1374,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) data_len = skb->len; memcpy(skb->data, - (void *)(idx_to_kaddr(netbk, pending_idx)|txp->offset), + (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset), data_len); if (data_len < txp->size) { /* Append the packet payload as a fragment. */ @@ -1573,7 +1382,8 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) txp->size -= data_len; } else { /* Schedule a response immediately. */ - xen_netbk_idx_release(netbk, pending_idx, XEN_NETIF_RSP_OKAY); + xen_netbk_idx_release(vif, pending_idx, + XEN_NETIF_RSP_OKAY); } if (txp->flags & XEN_NETTXF_csum_blank) @@ -1581,7 +1391,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) else if (txp->flags & XEN_NETTXF_data_validated) skb->ip_summed = CHECKSUM_UNNECESSARY; - xen_netbk_fill_frags(netbk, skb); + xen_netbk_fill_frags(vif, skb); /* * If the initial fragment was < PKT_PROT_LEN then @@ -1609,53 +1419,61 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk) vif->dev->stats.rx_bytes += skb->len; vif->dev->stats.rx_packets++; - xenvif_receive_skb(vif, skb); + work_done++; + + netif_receive_skb(skb); } + + return work_done; } /* Called after netfront has transmitted */ -static void xen_netbk_tx_action(struct xen_netbk *netbk) +int xen_netbk_tx_action(struct xenvif *vif, int budget) { unsigned nr_gops; + int work_done; - nr_gops = xen_netbk_tx_build_gops(netbk); + if (unlikely(!tx_work_todo(vif))) + return 0; + + nr_gops = xen_netbk_tx_build_gops(vif); if (nr_gops == 0) - return; + return 0; + + gnttab_batch_copy(vif->tx_copy_ops, nr_gops); - gnttab_batch_copy(netbk->tx_copy_ops, nr_gops); + work_done = xen_netbk_tx_submit(vif, nr_gops); - xen_netbk_tx_submit(netbk); + return work_done; } -static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, +static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx, u8 status) { - struct xenvif *vif; struct pending_tx_info *pending_tx_info; pending_ring_idx_t head; u16 peek; /* peek into next tx request */ - BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL)); + BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL)); /* Already complete? 
*/ - if (netbk->mmap_pages[pending_idx] == NULL) + if (vif->mmap_pages[pending_idx] == NULL) return; - pending_tx_info = &netbk->pending_tx_info[pending_idx]; + pending_tx_info = &vif->pending_tx_info[pending_idx]; - vif = pending_tx_info->vif; head = pending_tx_info->head; - BUG_ON(!pending_tx_is_head(netbk, head)); - BUG_ON(netbk->pending_ring[pending_index(head)] != pending_idx); + BUG_ON(!pending_tx_is_head(vif, head)); + BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx); do { pending_ring_idx_t index; pending_ring_idx_t idx = pending_index(head); - u16 info_idx = netbk->pending_ring[idx]; + u16 info_idx = vif->pending_ring[idx]; - pending_tx_info = &netbk->pending_tx_info[info_idx]; + pending_tx_info = &vif->pending_tx_info[info_idx]; make_tx_response(vif, &pending_tx_info->req, status); /* Setting any number other than @@ -1664,18 +1482,15 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx, */ pending_tx_info->head = 0; - index = pending_index(netbk->pending_prod++); - netbk->pending_ring[index] = netbk->pending_ring[info_idx]; + index = pending_index(vif->pending_prod++); + vif->pending_ring[index] = vif->pending_ring[info_idx]; - xenvif_put(vif); + peek = vif->pending_ring[pending_index(++head)]; - peek = netbk->pending_ring[pending_index(++head)]; + } while (!pending_tx_is_head(vif, peek)); - } while (!pending_tx_is_head(netbk, peek)); - - netbk->mmap_pages[pending_idx]->mapping = 0; - put_page(netbk->mmap_pages[pending_idx]); - netbk->mmap_pages[pending_idx] = NULL; + put_page(vif->mmap_pages[pending_idx]); + vif->mmap_pages[pending_idx] = NULL; } @@ -1723,45 +1538,22 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif, return resp; } -static inline int rx_work_todo(struct xen_netbk *netbk) +static inline int rx_work_todo(struct xenvif *vif) { - return !skb_queue_empty(&netbk->rx_queue); + return !skb_queue_empty(&vif->rx_queue); } -static inline int tx_work_todo(struct xen_netbk *netbk) +static inline int tx_work_todo(struct xenvif *vif) { - if ((nr_pending_reqs(netbk) + XEN_NETBK_LEGACY_SLOTS_MAX - < MAX_PENDING_REQS) && - !list_empty(&netbk->net_schedule_list)) + if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) && + (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX + < MAX_PENDING_REQS)) return 1; return 0; } -static int xen_netbk_kthread(void *data) -{ - struct xen_netbk *netbk = data; - while (!kthread_should_stop()) { - wait_event_interruptible(netbk->wq, - rx_work_todo(netbk) || - tx_work_todo(netbk) || - kthread_should_stop()); - cond_resched(); - - if (kthread_should_stop()) - break; - - if (rx_work_todo(netbk)) - xen_netbk_rx_action(netbk); - - if (tx_work_todo(netbk)) - xen_netbk_tx_action(netbk); - } - - return 0; -} - void xen_netbk_unmap_frontend_rings(struct xenvif *vif) { if (vif->tx.sring) @@ -1807,11 +1599,29 @@ err: return err; } +int xen_netbk_kthread(void *data) +{ + struct xenvif *vif = data; + + while (!kthread_should_stop()) { + wait_event_interruptible(vif->wq, + rx_work_todo(vif) || + kthread_should_stop()); + if (kthread_should_stop()) + break; + + if (rx_work_todo(vif)) + xen_netbk_rx_action(vif); + + cond_resched(); + } + + return 0; +} + static int __init netback_init(void) { - int i; int rc = 0; - int group; if (!xen_domain()) return -ENODEV; @@ -1822,48 +1632,6 @@ static int __init netback_init(void) fatal_skb_slots = XEN_NETBK_LEGACY_SLOTS_MAX; } - xen_netbk_group_nr = num_online_cpus(); - xen_netbk = vzalloc(sizeof(struct xen_netbk) * xen_netbk_group_nr); - if (!xen_netbk) - return 
-ENOMEM; - - for (group = 0; group < xen_netbk_group_nr; group++) { - struct xen_netbk *netbk = &xen_netbk[group]; - skb_queue_head_init(&netbk->rx_queue); - skb_queue_head_init(&netbk->tx_queue); - - init_timer(&netbk->net_timer); - netbk->net_timer.data = (unsigned long)netbk; - netbk->net_timer.function = xen_netbk_alarm; - - netbk->pending_cons = 0; - netbk->pending_prod = MAX_PENDING_REQS; - for (i = 0; i < MAX_PENDING_REQS; i++) - netbk->pending_ring[i] = i; - - init_waitqueue_head(&netbk->wq); - netbk->task = kthread_create(xen_netbk_kthread, - (void *)netbk, - "netback/%u", group); - - if (IS_ERR(netbk->task)) { - pr_alert("kthread_create() fails at netback\n"); - del_timer(&netbk->net_timer); - rc = PTR_ERR(netbk->task); - goto failed_init; - } - - kthread_bind(netbk->task, group); - - INIT_LIST_HEAD(&netbk->net_schedule_list); - - spin_lock_init(&netbk->net_schedule_list_lock); - - atomic_set(&netbk->netfront_count, 0); - - wake_up_process(netbk->task); - } - rc = xenvif_xenbus_init(); if (rc) goto failed_init; @@ -1871,35 +1639,14 @@ static int __init netback_init(void) return 0; failed_init: - while (--group >= 0) { - struct xen_netbk *netbk = &xen_netbk[group]; - del_timer(&netbk->net_timer); - kthread_stop(netbk->task); - } - vfree(xen_netbk); return rc; - } module_init(netback_init); static void __exit netback_fini(void) { - int i, j; - xenvif_xenbus_fini(); - - for (i = 0; i < xen_netbk_group_nr; i++) { - struct xen_netbk *netbk = &xen_netbk[i]; - del_timer_sync(&netbk->net_timer); - kthread_stop(netbk->task); - for (j = 0; j < MAX_PENDING_REQS; j++) { - if (netbk->mmap_pages[j]) - __free_page(netbk->mmap_pages[j]); - } - } - - vfree(xen_netbk); } module_exit(netback_fini); -- 1.7.10.4
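The hunks above turn guest-TX processing into a budget-driven routine: xen_netbk_tx_action() now returns the amount of work done so it can be called from a NAPI poll handler instead of the per-group kthread. The poll handler itself is added in the interface.c part of the patch, which is not quoted in this excerpt; the fragment below is only a rough sketch of the NAPI contract that routine is written against. The function name, the vif->napi and vif->irq fields and the re-arm call are placeholders assumed from the rest of the series, not code taken from this mail.

/* Sketch only: the real poll handler lives in the interface.c hunk that
 * is not quoted here; field names are assumptions, not this patch's code.
 */
static int xenvif_poll(struct napi_struct *napi, int budget)
{
	struct xenvif *vif = container_of(napi, struct xenvif, napi);
	int work_done;

	/* Guest TX (host RX) work, bounded by the NAPI budget. */
	work_done = xen_netbk_tx_action(vif, budget);

	if (work_done < budget) {
		/* Less than a full budget consumed: leave polling mode
		 * and re-enable event delivery from the frontend.
		 */
		napi_complete(napi);
		enable_irq(vif->irq);	/* placeholder for the real re-arm */
	}

	return work_done;
}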
Ian Campbell
2013-Aug-06 13:50 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote:
> I sent this patch right before Ian requested more comments in code. Now
> I've updated this one. Not going to resend the whole series as this is
> only changes in comments.
>
> Sorry for my bad English, I don't know whether I've made them clear
> enough. Suggestions are welcomed.
>
>
> Wei.
>
> ---8<---
> From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Tue, 6 Aug 2013 06:59:15 +0100
> Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model
>
> This patch implements 1:1 model netback. NAPI and kthread are utilized
> to do the weight-lifting job:
>
> - NAPI is used for guest side TX (host side RX)
> - kthread is used for guest side RX (host side TX)
>
> Xenvif and xen_netbk are made into one structure to reduce code size.
>
> This model provides better scheduling fairness among vifs. It is also
> prerequisite for implementing multiqueue for Xen netback.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
> drivers/net/xen-netback/common.h | 128 +++++---
> drivers/net/xen-netback/interface.c | 119 ++++---
> drivers/net/xen-netback/netback.c | 607 ++++++++++-------------------------
> 3 files changed, 347 insertions(+), 507 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 8a4d77e..184ae0a 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -45,31 +45,105 @@
> #include <xen/grant_table.h>
> #include <xen/xenbus.h>
>
> -struct xen_netbk;
> +typedef unsigned int pending_ring_idx_t;
> +#define INVALID_PENDING_RING_IDX (~0U)
> +
> +/* For the head field in pending_tx_info: It is used to indicate
> + * whether this tx info is the head of one or more coalesced requests.
> + *
> + * When head != INVALID_PENDING_RING_IDX, it means the start of a new
> + * tx requests queue and the end of previous queue.
> + *
> + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX):
> + *
> + * ... 0 I I I 5 I 9 I I I ...
> + *
> + * After consming the first packet we have:

"consuming"

The first packet here is "0 I I I"? Consisting of the head (0) and 3
extra data slots (the 3xI)?

> + *
> + * ... V V V V 5 I 9 I I I ...
> + *
> + * where V stands for "valid pending ring index", any number other
> + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0.

OK, this is where I get confused, because 0 is also valid in a different
state, this one:

> + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above
> + * example) number is the index into pending_tx_info and mmap_pages
> + * arrays.

So what does V mean and how do you distinguish this from a
non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you
use in practice for V?

I'm also not sure how 0 is considered a "valid pending ring index". I
suppose it is not INVALID_PENDING_RING_IDX but it appears that it
doesn't have any actual meaning? So it's just an arbitrary valid but
meaningless number perhaps?

Ian.
Andrew Bennieston
2013-Aug-06 14:17 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On 06/08/13 14:29, David Vrabel wrote:> On 06/08/13 14:16, Pasi Kärkkäinen wrote: >> On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote: >>> >>> IRQs are distributed to 4 cores by hand in the new model, while in the >>> old model vifs are automatically distributed to 4 kthreads. >>> >> >> Hmm.. so with these patches applied is is *required* to do manual configuration in dom0 to get good performance? > > This should be irqbalanced's job. The existing version doesn't do a > good enough job yet though. Andrew Bennieston may have more details. > > David >irqbalance 1.0.6 [1] includes a patch [2] from Wei Liu [3] that adds support for balancing `xen-dyn-event' interrupts. When I have compiled this version and run it under Xen(Server) I noticed that the interrupts are indeed moving between cores, but not necessarily in what I would call an obvious or optimal way (e.g. several VIF interrupts are being grouped onto a single dom0 VCPU at times). I plan on investigating this further when time permits. I also noticed that, from time to time, the irqbalance process disappears. I tracked this down to a segfault that occurs when a VM shuts down and an IRQ disappears during one of irqbalance's periodic rescans. I'm hoping to be able to narrow this down sufficiently to identify the cause and ideally fix it, but I don't have a lot of time to work on this at the moment. As for the impact on Wei's patches, without irqbalance it would be trivial to automatically assign (via a script, on VM start) the interrupts for a particular VIF to a particular dom0 vCPU in a round-robin fashion, just as VIFs were previously assigned to netback kthreads. This would result in broadly the same performance as before, while an improved irqbalanced should give better performance and fairness when two different VIFs would otherwise be competing for the same resources. Andrew. [1] https://code.google.com/p/irqbalance/downloads [2] https://code.google.com/p/irqbalance/source/detail?r=736a91b739018530c44e59d9b17252f88227fe51 [3] https://code.google.com/p/irqbalance/issues/detail?id=46 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
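As a concrete illustration of the round-robin assignment described above, a trivial helper along the following lines could spread a hand-picked list of VIF interrupt numbers across the first few dom0 VCPUs by writing hexadecimal CPU masks to /proc/irq/<n>/smp_affinity. This is a hypothetical sketch, not part of the series: the caller would have to collect the IRQ numbers itself (e.g. from the vifX.Y lines in /proc/interrupts), and the mask handling only covers the first 32 CPUs.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	int ncpus, i;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <ncpus> <irq> [<irq> ...]\n", argv[0]);
		return 1;
	}

	ncpus = atoi(argv[1]);
	if (ncpus < 1 || ncpus > 32) {
		fprintf(stderr, "ncpus must be between 1 and 32\n");
		return 1;
	}

	for (i = 2; i < argc; i++) {
		int irq = atoi(argv[i]);
		int cpu = (i - 2) % ncpus;	/* round-robin over dom0 VCPUs */
		char path[64];
		FILE *f;

		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			continue;
		}
		/* smp_affinity takes a hex CPU mask, one bit per CPU. */
		fprintf(f, "%x\n", 1u << cpu);
		fclose(f);
	}

	return 0;
}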
Wei Liu
2013-Aug-06 14:50 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote:> On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > I sent this patch right before Ian requested more comments in code. Now > > I''ve updated this one. Not going to resend the whole series as this is > > only changes in comments. > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > enough. Suggestions are welcomed. > > > > > > Wei. > > > > ---8<--- > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > From: Wei Liu <wei.liu2@citrix.com> > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > to do the weight-lifting job: > > > > - NAPI is used for guest side TX (host side RX) > > - kthread is used for guest side RX (host side TX) > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > This model provides better scheduling fairness among vifs. It is also > > prerequisite for implementing multiqueue for Xen netback. > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > --- > > drivers/net/xen-netback/common.h | 128 +++++--- > > drivers/net/xen-netback/interface.c | 119 ++++--- > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > index 8a4d77e..184ae0a 100644 > > --- a/drivers/net/xen-netback/common.h > > +++ b/drivers/net/xen-netback/common.h > > @@ -45,31 +45,105 @@ > > #include <xen/grant_table.h> > > #include <xen/xenbus.h> > > > > -struct xen_netbk; > > +typedef unsigned int pending_ring_idx_t; > > +#define INVALID_PENDING_RING_IDX (~0U) > > + > > +/* For the head field in pending_tx_info: It is used to indicate > > + * whether this tx info is the head of one or more coalesced requests. > > + * > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > + * tx requests queue and the end of previous queue. > > + * > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > + * > > + * ... 0 I I I 5 I 9 I I I ... > > + * > > + * After consming the first packet we have: > > "consuming" >Oops...> The first packet here is "0 I I I"? Consisting of the head (0) and 3 > extra data slots (the 3xI)? >Yes. But the term "extra data slot" is not acurate. "0 I I I" means we''ve coalesced 3 more slots into the first slot. And the index of first slot into various arrays is 0.> > + * > > + * ... V V V V 5 I 9 I I I ... > > + * > > + * where V stands for "valid pending ring index", any number other > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > OK, this is where I get confused, because 0 is also valid in a different > state, this one: > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > + * example) number is the index into pending_tx_info and mmap_pages > > + * arrays. > > So what does V mean and how to you distinguish this from a > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > use in practice for V? >Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ The head field is always manipulated when coalescing slots and releasing slots. So if the routine sees 0 when processing a slot is actually means index 0.> I''m also not sure how 0 is considered a "valid pending ring index". 
I
> suppose it is not INVALID_PENDING_RING_IDX but it appears that it
> doesn't have any actual meaning? So it's just an arbitrary valid but
> meaningless number perhaps?
>

Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX.

Wei.

> Ian.
Ian Campbell
2013-Aug-06 14:56 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > I sent this patch right before Ian requested more comments in code. Now > > > I''ve updated this one. Not going to resend the whole series as this is > > > only changes in comments. > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > enough. Suggestions are welcomed. > > > > > > > > > Wei. > > > > > > ---8<--- > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > From: Wei Liu <wei.liu2@citrix.com> > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > to do the weight-lifting job: > > > > > > - NAPI is used for guest side TX (host side RX) > > > - kthread is used for guest side RX (host side TX) > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > This model provides better scheduling fairness among vifs. It is also > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > --- > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > index 8a4d77e..184ae0a 100644 > > > --- a/drivers/net/xen-netback/common.h > > > +++ b/drivers/net/xen-netback/common.h > > > @@ -45,31 +45,105 @@ > > > #include <xen/grant_table.h> > > > #include <xen/xenbus.h> > > > > > > -struct xen_netbk; > > > +typedef unsigned int pending_ring_idx_t; > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > + > > > +/* For the head field in pending_tx_info: It is used to indicate > > > + * whether this tx info is the head of one or more coalesced requests. > > > + * > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > + * tx requests queue and the end of previous queue. > > > + * > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > + * > > > + * ... 0 I I I 5 I 9 I I I ... > > > + * > > > + * After consming the first packet we have: > > > > "consuming" > > > > Oops... > > > The first packet here is "0 I I I"? Consisting of the head (0) and 3 > > extra data slots (the 3xI)? > > > > Yes. But the term "extra data slot" is not acurate. "0 I I I" means > we''ve coalesced 3 more slots into the first slot. And the index of first > slot into various arrays is 0.Got it, I think.> > > > + * > > > + * ... V V V V 5 I 9 I I I ... > > > + * > > > + * where V stands for "valid pending ring index", any number other > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > OK, this is where I get confused, because 0 is also valid in a different > > state, this one: > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > + * example) number is the index into pending_tx_info and mmap_pages > > > + * arrays. > > > > So what does V mean and how to you distinguish this from a > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > use in practice for V? > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. 
:-/ > > The head field is always manipulated when coalescing slots and releasing > slots. So if the routine sees 0 when processing a slot is actually means > index 0.But you say "In practice we use 0", but you could use any arbitrary integer which is != INVALID_PENDING_RING_IDX?> > I''m also not sure how 0 is considered a "valid pending ring index". I > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > meaningless number perhaps? > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX.So I still have this confusion between a ''V'' entry and an entry which contains an actual index into various arrays. How are they different? AIUI each entry can be one of three types: a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) b) Contains an index into various arrays (an integer) c) Contains ''V'', a "valid pending ring index", usually 0. So how are b) and c) distinguished? I think b) is the head, so what is c)?
Wei Liu
2013-Aug-06 15:05 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 03:56:46PM +0100, Ian Campbell wrote:> On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote: > > On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > > I sent this patch right before Ian requested more comments in code. Now > > > > I''ve updated this one. Not going to resend the whole series as this is > > > > only changes in comments. > > > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > > enough. Suggestions are welcomed. > > > > > > > > > > > > Wei. > > > > > > > > ---8<--- > > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > > From: Wei Liu <wei.liu2@citrix.com> > > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > > to do the weight-lifting job: > > > > > > > > - NAPI is used for guest side TX (host side RX) > > > > - kthread is used for guest side RX (host side TX) > > > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > > > This model provides better scheduling fairness among vifs. It is also > > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > > --- > > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > > index 8a4d77e..184ae0a 100644 > > > > --- a/drivers/net/xen-netback/common.h > > > > +++ b/drivers/net/xen-netback/common.h > > > > @@ -45,31 +45,105 @@ > > > > #include <xen/grant_table.h> > > > > #include <xen/xenbus.h> > > > > > > > > -struct xen_netbk; > > > > +typedef unsigned int pending_ring_idx_t; > > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > > + > > > > +/* For the head field in pending_tx_info: It is used to indicate > > > > + * whether this tx info is the head of one or more coalesced requests. > > > > + * > > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > > + * tx requests queue and the end of previous queue. > > > > + * > > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > + * > > > > + * ... 0 I I I 5 I 9 I I I ... > > > > + * > > > > + * After consming the first packet we have: > > > > > > "consuming" > > > > > > > Oops... > > > > > The first packet here is "0 I I I"? Consisting of the head (0) and 3 > > > extra data slots (the 3xI)? > > > > > > > Yes. But the term "extra data slot" is not acurate. "0 I I I" means > > we''ve coalesced 3 more slots into the first slot. And the index of first > > slot into various arrays is 0. > > Got it, I think. > > > > > > > + * > > > > + * ... V V V V 5 I 9 I I I ... > > > > + * > > > > + * where V stands for "valid pending ring index", any number other > > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > OK, this is where I get confused, because 0 is also valid in a different > > > state, this one: > > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > > + * example) number is the index into pending_tx_info and mmap_pages > > > > + * arrays. 
> > > > > > So what does V mean and how to you distinguish this from a > > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > > use in practice for V? > > > > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ > > > > The head field is always manipulated when coalescing slots and releasing > > slots. So if the routine sees 0 when processing a slot is actually means > > index 0. > > But you say "In practice we use 0", but you could use any arbitrary > integer which is != INVALID_PENDING_RING_IDX? >Yes. This is only useful when releasing slots. When the routine sees a number not equal to INVALID_PENDING_RING_IDX it knows the previous sequence has ended. So setting head to 0 is as good as any.> > > I''m also not sure how 0 is considered a "valid pending ring index". I > > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > > meaningless number perhaps? > > > > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX. > > So I still have this confusion between a ''V'' entry and an entry which > contains an actual index into various arrays. How are they different? > > AIUI each entry can be one of three types: > > a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) > b) Contains an index into various arrays (an integer) > c) Contains ''V'', a "valid pending ring index", usually 0. > > So how are b) and c) distinguished? I think b) is the head, so what is > c)?c) state can only be reached *after* releasing a slot. When the slot is reused into b) state (after coalescing), the head field is then set to the actual number of the index. b) and c) are mutual exclusive. Wei.
Ian Campbell
2013-Aug-06 15:21 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 16:05 +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 03:56:46PM +0100, Ian Campbell wrote: > > On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote: > > > On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > > > I sent this patch right before Ian requested more comments in code. Now > > > > > I''ve updated this one. Not going to resend the whole series as this is > > > > > only changes in comments. > > > > > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > > > enough. Suggestions are welcomed. > > > > > > > > > > > > > > > Wei. > > > > > > > > > > ---8<--- > > > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > > > From: Wei Liu <wei.liu2@citrix.com> > > > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > > > to do the weight-lifting job: > > > > > > > > > > - NAPI is used for guest side TX (host side RX) > > > > > - kthread is used for guest side RX (host side TX) > > > > > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > > > > > This model provides better scheduling fairness among vifs. It is also > > > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > > > --- > > > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > > > index 8a4d77e..184ae0a 100644 > > > > > --- a/drivers/net/xen-netback/common.h > > > > > +++ b/drivers/net/xen-netback/common.h > > > > > @@ -45,31 +45,105 @@ > > > > > #include <xen/grant_table.h> > > > > > #include <xen/xenbus.h> > > > > > > > > > > -struct xen_netbk; > > > > > +typedef unsigned int pending_ring_idx_t; > > > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > > > + > > > > > +/* For the head field in pending_tx_info: It is used to indicate > > > > > + * whether this tx info is the head of one or more coalesced requests. > > > > > + * > > > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > > > + * tx requests queue and the end of previous queue. > > > > > + * > > > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > > + * > > > > > + * ... 0 I I I 5 I 9 I I I ... > > > > > + * > > > > > + * After consming the first packet we have: > > > > > > > > "consuming" > > > > > > > > > > Oops... > > > > > > > The first packet here is "0 I I I"? Consisting of the head (0) and 3 > > > > extra data slots (the 3xI)? > > > > > > > > > > Yes. But the term "extra data slot" is not acurate. "0 I I I" means > > > we''ve coalesced 3 more slots into the first slot. And the index of first > > > slot into various arrays is 0. > > > > Got it, I think. > > > > > > > > > > + * > > > > > + * ... V V V V 5 I 9 I I I ... > > > > > + * > > > > > + * where V stands for "valid pending ring index", any number other > > > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. 
> > > > > > > > OK, this is where I get confused, because 0 is also valid in a different > > > > state, this one: > > > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > > > + * example) number is the index into pending_tx_info and mmap_pages > > > > > + * arrays. > > > > > > > > So what does V mean and how to you distinguish this from a > > > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > > > use in practice for V? > > > > > > > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ > > > > > > The head field is always manipulated when coalescing slots and releasing > > > slots. So if the routine sees 0 when processing a slot is actually means > > > index 0. > > > > But you say "In practice we use 0", but you could use any arbitrary > > integer which is != INVALID_PENDING_RING_IDX? > > > > Yes. This is only useful when releasing slots. When the routine sees a > number not equal to INVALID_PENDING_RING_IDX it knows the previous > sequence has ended. So setting head to 0 is as good as any. > > > > > I''m also not sure how 0 is considered a "valid pending ring index". I > > > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > > > meaningless number perhaps? > > > > > > > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX. > > > > So I still have this confusion between a ''V'' entry and an entry which > > contains an actual index into various arrays. How are they different? > > > > AIUI each entry can be one of three types: > > > > a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) > > b) Contains an index into various arrays (an integer) > > c) Contains ''V'', a "valid pending ring index", usually 0. > > > > So how are b) and c) distinguished? I think b) is the head, so what is > > c)? > > c) state can only be reached *after* releasing a slot. When the slot is > reused into b) state (after coalescing), the head field is then set to > the actual number of the index. b) and c) are mutual exclusive.Ah, so these ''V'' entries are free entries, I think I''ve got it now. So: ...|0 I I I|5 I|9 I I I|... -->|<-INUSE---------------- Then you consume "0 I I I" and get: ...|V V V V|5 I|9 I I I|... -----FREE->|<-INUSE-------- Ian.
Wei Liu
2013-Aug-06 15:24 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 04:21:38PM +0100, Ian Campbell wrote:> On Tue, 2013-08-06 at 16:05 +0100, Wei Liu wrote: > > On Tue, Aug 06, 2013 at 03:56:46PM +0100, Ian Campbell wrote: > > > On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote: > > > > On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > > > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > > > > I sent this patch right before Ian requested more comments in code. Now > > > > > > I''ve updated this one. Not going to resend the whole series as this is > > > > > > only changes in comments. > > > > > > > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > > > > enough. Suggestions are welcomed. > > > > > > > > > > > > > > > > > > Wei. > > > > > > > > > > > > ---8<--- > > > > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > > > > From: Wei Liu <wei.liu2@citrix.com> > > > > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > > > > to do the weight-lifting job: > > > > > > > > > > > > - NAPI is used for guest side TX (host side RX) > > > > > > - kthread is used for guest side RX (host side TX) > > > > > > > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > > > > > > > This model provides better scheduling fairness among vifs. It is also > > > > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > > > > --- > > > > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > > > > index 8a4d77e..184ae0a 100644 > > > > > > --- a/drivers/net/xen-netback/common.h > > > > > > +++ b/drivers/net/xen-netback/common.h > > > > > > @@ -45,31 +45,105 @@ > > > > > > #include <xen/grant_table.h> > > > > > > #include <xen/xenbus.h> > > > > > > > > > > > > -struct xen_netbk; > > > > > > +typedef unsigned int pending_ring_idx_t; > > > > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > > > > + > > > > > > +/* For the head field in pending_tx_info: It is used to indicate > > > > > > + * whether this tx info is the head of one or more coalesced requests. > > > > > > + * > > > > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > > > > + * tx requests queue and the end of previous queue. > > > > > > + * > > > > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > > > + * > > > > > > + * ... 0 I I I 5 I 9 I I I ... > > > > > > + * > > > > > > + * After consming the first packet we have: > > > > > > > > > > "consuming" > > > > > > > > > > > > > Oops... > > > > > > > > > The first packet here is "0 I I I"? Consisting of the head (0) and 3 > > > > > extra data slots (the 3xI)? > > > > > > > > > > > > > Yes. But the term "extra data slot" is not acurate. "0 I I I" means > > > > we''ve coalesced 3 more slots into the first slot. And the index of first > > > > slot into various arrays is 0. > > > > > > Got it, I think. > > > > > > > > > > > > > + * > > > > > > + * ... V V V V 5 I 9 I I I ... 
> > > > > > + * > > > > > > + * where V stands for "valid pending ring index", any number other > > > > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > > > > > OK, this is where I get confused, because 0 is also valid in a different > > > > > state, this one: > > > > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > > > > + * example) number is the index into pending_tx_info and mmap_pages > > > > > > + * arrays. > > > > > > > > > > So what does V mean and how to you distinguish this from a > > > > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > > > > use in practice for V? > > > > > > > > > > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ > > > > > > > > The head field is always manipulated when coalescing slots and releasing > > > > slots. So if the routine sees 0 when processing a slot is actually means > > > > index 0. > > > > > > But you say "In practice we use 0", but you could use any arbitrary > > > integer which is != INVALID_PENDING_RING_IDX? > > > > > > > Yes. This is only useful when releasing slots. When the routine sees a > > number not equal to INVALID_PENDING_RING_IDX it knows the previous > > sequence has ended. So setting head to 0 is as good as any. > > > > > > > I''m also not sure how 0 is considered a "valid pending ring index". I > > > > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > > > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > > > > meaningless number perhaps? > > > > > > > > > > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX. > > > > > > So I still have this confusion between a ''V'' entry and an entry which > > > contains an actual index into various arrays. How are they different? > > > > > > AIUI each entry can be one of three types: > > > > > > a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) > > > b) Contains an index into various arrays (an integer) > > > c) Contains ''V'', a "valid pending ring index", usually 0. > > > > > > So how are b) and c) distinguished? I think b) is the head, so what is > > > c)? > > > > c) state can only be reached *after* releasing a slot. When the slot is > > reused into b) state (after coalescing), the head field is then set to > > the actual number of the index. b) and c) are mutual exclusive. > > Ah, so these ''V'' entries are free entries, I think I''ve got it now. > > So: > ...|0 I I I|5 I|9 I I I|... > -->|<-INUSE---------------- > > Then you consume "0 I I I" and get: > > ...|V V V V|5 I|9 I I I|... > -----FREE->|<-INUSE-------- >Right, this is it. :-) I will use the above graph in the comment. Wei.> Ian.
Wei Liu
2013-Aug-06 15:33 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 04:24:10PM +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 04:21:38PM +0100, Ian Campbell wrote: > > On Tue, 2013-08-06 at 16:05 +0100, Wei Liu wrote: > > > On Tue, Aug 06, 2013 at 03:56:46PM +0100, Ian Campbell wrote: > > > > On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote: > > > > > On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > > > > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > > > > > I sent this patch right before Ian requested more comments in code. Now > > > > > > > I''ve updated this one. Not going to resend the whole series as this is > > > > > > > only changes in comments. > > > > > > > > > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > > > > > enough. Suggestions are welcomed. > > > > > > > > > > > > > > > > > > > > > Wei. > > > > > > > > > > > > > > ---8<--- > > > > > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > > > > > From: Wei Liu <wei.liu2@citrix.com> > > > > > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > > > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > > > > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > > > > > to do the weight-lifting job: > > > > > > > > > > > > > > - NAPI is used for guest side TX (host side RX) > > > > > > > - kthread is used for guest side RX (host side TX) > > > > > > > > > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > > > > > > > > > This model provides better scheduling fairness among vifs. It is also > > > > > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > > > > > --- > > > > > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > > > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > > > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > > > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > > > > > index 8a4d77e..184ae0a 100644 > > > > > > > --- a/drivers/net/xen-netback/common.h > > > > > > > +++ b/drivers/net/xen-netback/common.h > > > > > > > @@ -45,31 +45,105 @@ > > > > > > > #include <xen/grant_table.h> > > > > > > > #include <xen/xenbus.h> > > > > > > > > > > > > > > -struct xen_netbk; > > > > > > > +typedef unsigned int pending_ring_idx_t; > > > > > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > > > > > + > > > > > > > +/* For the head field in pending_tx_info: It is used to indicate > > > > > > > + * whether this tx info is the head of one or more coalesced requests. > > > > > > > + * > > > > > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > > > > > + * tx requests queue and the end of previous queue. > > > > > > > + * > > > > > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > > > > + * > > > > > > > + * ... 0 I I I 5 I 9 I I I ... > > > > > > > + * > > > > > > > + * After consming the first packet we have: > > > > > > > > > > > > "consuming" > > > > > > > > > > > > > > > > Oops... > > > > > > > > > > > The first packet here is "0 I I I"? Consisting of the head (0) and 3 > > > > > > extra data slots (the 3xI)? > > > > > > > > > > > > > > > > Yes. But the term "extra data slot" is not acurate. 
"0 I I I" means > > > > > we''ve coalesced 3 more slots into the first slot. And the index of first > > > > > slot into various arrays is 0. > > > > > > > > Got it, I think. > > > > > > > > > > > > > > > > + * > > > > > > > + * ... V V V V 5 I 9 I I I ... > > > > > > > + * > > > > > > > + * where V stands for "valid pending ring index", any number other > > > > > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > > > > > > > OK, this is where I get confused, because 0 is also valid in a different > > > > > > state, this one: > > > > > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > > > > > + * example) number is the index into pending_tx_info and mmap_pages > > > > > > > + * arrays. > > > > > > > > > > > > So what does V mean and how to you distinguish this from a > > > > > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > > > > > use in practice for V? > > > > > > > > > > > > > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ > > > > > > > > > > The head field is always manipulated when coalescing slots and releasing > > > > > slots. So if the routine sees 0 when processing a slot is actually means > > > > > index 0. > > > > > > > > But you say "In practice we use 0", but you could use any arbitrary > > > > integer which is != INVALID_PENDING_RING_IDX? > > > > > > > > > > Yes. This is only useful when releasing slots. When the routine sees a > > > number not equal to INVALID_PENDING_RING_IDX it knows the previous > > > sequence has ended. So setting head to 0 is as good as any. > > > > > > > > > I''m also not sure how 0 is considered a "valid pending ring index". I > > > > > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > > > > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > > > > > meaningless number perhaps? > > > > > > > > > > > > > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX. > > > > > > > > So I still have this confusion between a ''V'' entry and an entry which > > > > contains an actual index into various arrays. How are they different? > > > > > > > > AIUI each entry can be one of three types: > > > > > > > > a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) > > > > b) Contains an index into various arrays (an integer) > > > > c) Contains ''V'', a "valid pending ring index", usually 0. > > > > > > > > So how are b) and c) distinguished? I think b) is the head, so what is > > > > c)? > > > > > > c) state can only be reached *after* releasing a slot. When the slot is > > > reused into b) state (after coalescing), the head field is then set to > > > the actual number of the index. b) and c) are mutual exclusive. > > > > Ah, so these ''V'' entries are free entries, I think I''ve got it now. > > > > So: > > ...|0 I I I|5 I|9 I I I|... > > -->|<-INUSE---------------- > > > > Then you consume "0 I I I" and get: > > > > ...|V V V V|5 I|9 I I I|... > > -----FREE->|<-INUSE-------- > > > > Right, this is it. :-) > > I will use the above graph in the comment. >/* For the head field in pending_tx_info: it is used to indicate * whether this tx info is the head of one or more coalesced requests. * * When head != INVALID_PENDING_RING_IDX, it means the start of a new * tx requests queue and the end of previous queue. * * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): * * ...|0 I I I|5 I|9 I I I|... 
* -->|<-INUSE---------------- * * After consuming the first slot(s) we have: * * ...|V V V V|5 I|9 I I I|... * -----FREE->|<-INUSE-------- * * where V stands for "valid pending ring index". Any number other * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. * * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above * example) number is the index into pending_tx_info and mmap_pages * arrays. */> > Wei. > > > Ian.
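To make the convention described in this comment concrete, here is a small standalone illustration (not the netback code itself) that walks an array of head fields laid out like the example above and reports each coalesced run. A run starts at an entry holding a real array index and is extended by the INVALID_PENDING_RING_IDX entries that follow it; the next non-INVALID head terminates it.

/* Standalone illustration of the head-field convention discussed above,
 * not the driver's release path.
 */
#include <stdio.h>

#define INVALID_PENDING_RING_IDX (~0U)

int main(void)
{
	/* Mirrors the "... 0 I I I 5 I 9 I I I ..." example. */
	unsigned int head[] = {
		0, INVALID_PENDING_RING_IDX, INVALID_PENDING_RING_IDX,
		INVALID_PENDING_RING_IDX,
		5, INVALID_PENDING_RING_IDX,
		9, INVALID_PENDING_RING_IDX, INVALID_PENDING_RING_IDX,
		INVALID_PENDING_RING_IDX,
	};
	unsigned int n = sizeof(head) / sizeof(head[0]);
	unsigned int i = 0;

	while (i < n) {
		unsigned int start = i;	/* position of the head slot */
		unsigned int slots = 1;	/* the head slot itself */

		/* Count the slots coalesced into this head. */
		while (i + slots < n &&
		       head[i + slots] == INVALID_PENDING_RING_IDX)
			slots++;

		printf("ring position %u, array index %u: %u slot(s) coalesced\n",
		       start, head[start], slots);
		i += slots;
	}

	return 0;
}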
Ian Campbell
2013-Aug-06 15:38 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 16:33 +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 04:24:10PM +0100, Wei Liu wrote: > > On Tue, Aug 06, 2013 at 04:21:38PM +0100, Ian Campbell wrote: > > > On Tue, 2013-08-06 at 16:05 +0100, Wei Liu wrote: > > > > On Tue, Aug 06, 2013 at 03:56:46PM +0100, Ian Campbell wrote: > > > > > On Tue, 2013-08-06 at 15:50 +0100, Wei Liu wrote: > > > > > > On Tue, Aug 06, 2013 at 02:50:04PM +0100, Ian Campbell wrote: > > > > > > > On Tue, 2013-08-06 at 14:33 +0100, Wei Liu wrote: > > > > > > > > I sent this patch right before Ian requested more comments in code. Now > > > > > > > > I''ve updated this one. Not going to resend the whole series as this is > > > > > > > > only changes in comments. > > > > > > > > > > > > > > > > Sorry for my bad English, I don''t know whether I''ve made them clear > > > > > > > > enough. Suggestions are welcomed. > > > > > > > > > > > > > > > > > > > > > > > > Wei. > > > > > > > > > > > > > > > > ---8<--- > > > > > > > > From bb33027fe4bafeea546352cd3e409466f8bd7aa4 Mon Sep 17 00:00:00 2001 > > > > > > > > From: Wei Liu <wei.liu2@citrix.com> > > > > > > > > Date: Tue, 6 Aug 2013 06:59:15 +0100 > > > > > > > > Subject: [PATCH] xen-netback: switch to NAPI + kthread 1:1 model > > > > > > > > > > > > > > > > This patch implements 1:1 model netback. NAPI and kthread are utilized > > > > > > > > to do the weight-lifting job: > > > > > > > > > > > > > > > > - NAPI is used for guest side TX (host side RX) > > > > > > > > - kthread is used for guest side RX (host side TX) > > > > > > > > > > > > > > > > Xenvif and xen_netbk are made into one structure to reduce code size. > > > > > > > > > > > > > > > > This model provides better scheduling fairness among vifs. It is also > > > > > > > > prerequisite for implementing multiqueue for Xen netback. > > > > > > > > > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > > > > > > > > --- > > > > > > > > drivers/net/xen-netback/common.h | 128 +++++--- > > > > > > > > drivers/net/xen-netback/interface.c | 119 ++++--- > > > > > > > > drivers/net/xen-netback/netback.c | 607 ++++++++++------------------------- > > > > > > > > 3 files changed, 347 insertions(+), 507 deletions(-) > > > > > > > > > > > > > > > > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h > > > > > > > > index 8a4d77e..184ae0a 100644 > > > > > > > > --- a/drivers/net/xen-netback/common.h > > > > > > > > +++ b/drivers/net/xen-netback/common.h > > > > > > > > @@ -45,31 +45,105 @@ > > > > > > > > #include <xen/grant_table.h> > > > > > > > > #include <xen/xenbus.h> > > > > > > > > > > > > > > > > -struct xen_netbk; > > > > > > > > +typedef unsigned int pending_ring_idx_t; > > > > > > > > +#define INVALID_PENDING_RING_IDX (~0U) > > > > > > > > + > > > > > > > > +/* For the head field in pending_tx_info: It is used to indicate > > > > > > > > + * whether this tx info is the head of one or more coalesced requests. > > > > > > > > + * > > > > > > > > + * When head != INVALID_PENDING_RING_IDX, it means the start of a new > > > > > > > > + * tx requests queue and the end of previous queue. > > > > > > > > + * > > > > > > > > + * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > > > > > + * > > > > > > > > + * ... 0 I I I 5 I 9 I I I ... > > > > > > > > + * > > > > > > > > + * After consming the first packet we have: > > > > > > > > > > > > > > "consuming" > > > > > > > > > > > > > > > > > > > Oops... > > > > > > > > > > > > > The first packet here is "0 I I I"? 
Consisting of the head (0) and 3 > > > > > > > extra data slots (the 3xI)? > > > > > > > > > > > > > > > > > > > Yes. But the term "extra data slot" is not acurate. "0 I I I" means > > > > > > we''ve coalesced 3 more slots into the first slot. And the index of first > > > > > > slot into various arrays is 0. > > > > > > > > > > Got it, I think. > > > > > > > > > > > > > > > > > > > + * > > > > > > > > + * ... V V V V 5 I 9 I I I ... > > > > > > > > + * > > > > > > > > + * where V stands for "valid pending ring index", any number other > > > > > > > > + * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > > > > > > > > > OK, this is where I get confused, because 0 is also valid in a different > > > > > > > state, this one: > > > > > > > > + * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > > > > > > + * example) number is the index into pending_tx_info and mmap_pages > > > > > > > > + * arrays. > > > > > > > > > > > > > > So what does V mean and how to you distinguish this from a > > > > > > > non-INVALID_PENDING_RING_IDX which happens to correspond to the 0 you > > > > > > > use in practice for V? > > > > > > > > > > > > > > > > > > > Oh, V is just another way of saying non-INVALID_PENDING_RING_IDX. :-/ > > > > > > > > > > > > The head field is always manipulated when coalescing slots and releasing > > > > > > slots. So if the routine sees 0 when processing a slot is actually means > > > > > > index 0. > > > > > > > > > > But you say "In practice we use 0", but you could use any arbitrary > > > > > integer which is != INVALID_PENDING_RING_IDX? > > > > > > > > > > > > > Yes. This is only useful when releasing slots. When the routine sees a > > > > number not equal to INVALID_PENDING_RING_IDX it knows the previous > > > > sequence has ended. So setting head to 0 is as good as any. > > > > > > > > > > > I''m also not sure how 0 is considered a "valid pending ring index". I > > > > > > > suppose it is not INVALID_PENDING_RING_IDX but it appears that it > > > > > > > doesn''t have any actual meaning? So it''s just an arbitrary valid but > > > > > > > meaningless number perhaps? > > > > > > > > > > > > > > > > > > > Yes. 0 is as good as any number other than INVALID_PENDING_RING_IDX. > > > > > > > > > > So I still have this confusion between a ''V'' entry and an entry which > > > > > contains an actual index into various arrays. How are they different? > > > > > > > > > > AIUI each entry can be one of three types: > > > > > > > > > > a) Contains INVALID_PENDING_RING_IDX (is a coalesced into a previous head) > > > > > b) Contains an index into various arrays (an integer) > > > > > c) Contains ''V'', a "valid pending ring index", usually 0. > > > > > > > > > > So how are b) and c) distinguished? I think b) is the head, so what is > > > > > c)? > > > > > > > > c) state can only be reached *after* releasing a slot. When the slot is > > > > reused into b) state (after coalescing), the head field is then set to > > > > the actual number of the index. b) and c) are mutual exclusive. > > > > > > Ah, so these ''V'' entries are free entries, I think I''ve got it now. > > > > > > So: > > > ...|0 I I I|5 I|9 I I I|... > > > -->|<-INUSE---------------- > > > > > > Then you consume "0 I I I" and get: > > > > > > ...|V V V V|5 I|9 I I I|... > > > -----FREE->|<-INUSE-------- > > > > > > > Right, this is it. :-) > > > > I will use the above graph in the comment. 
> > > > /* For the head field in pending_tx_info: it is used to indicate > * whether this tx info is the head of one or more coalesced requests. > * > * When head != INVALID_PENDING_RING_IDX, it means the start of a new > * tx requests queue and the end of previous queue. > * > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > * > * ...|0 I I I|5 I|9 I I I|... > * -->|<-INUSE---------------- > * > * After consuming the first slot(s) we have: > * > * ...|V V V V|5 I|9 I I I|... > * -----FREE->|<-INUSE-------- > * > * where V stands for "valid pending ring index". Any number other > * than INVALID_PENDING_RING_IDX is OK. In practice we use 0.Perhaps add "These entries are considered free and can contain any number other than INVALID_PENDING_RING_IDX" ? Do you have a #define, like FREE_PENDING_RING_IDX for this value? Should mention it if so I think.> * > * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above^ add "in use"?> * example) number is the index into pending_tx_info and mmap_pages > * arrays. > */ > > > > > Wei. > > > > > Ian.
Wei Liu
2013-Aug-06 15:47 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 04:38:56PM +0100, Ian Campbell wrote: [...]> > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > * > > * ...|0 I I I|5 I|9 I I I|... > > * -->|<-INUSE---------------- > > * > > * After consuming the first slot(s) we have: > > * > > * ...|V V V V|5 I|9 I I I|... > > * -----FREE->|<-INUSE-------- > > * > > * where V stands for "valid pending ring index". Any number other > > * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > Perhaps add "These entries are considered free and can contain any > number other than INVALID_PENDING_RING_IDX" ? > > Do you have a #define, like FREE_PENDING_RING_IDX for this value? Should > mention it if so I think. >Unfortunately no. This can be fixed later with separate patch? Or just have that change in this patch? Wei.> > * > > * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > ^ add "in use"? > > > * example) number is the index into pending_tx_info and mmap_pages > > * arrays. > > */ > > > > > > > > Wei. > > > > > > > Ian. >
Ian Campbell
2013-Aug-06 15:50 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 16:47 +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 04:38:56PM +0100, Ian Campbell wrote: > [...] > > > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > * > > > * ...|0 I I I|5 I|9 I I I|... > > > * -->|<-INUSE---------------- > > > * > > > * After consuming the first slot(s) we have: > > > * > > > * ...|V V V V|5 I|9 I I I|... > > > * -----FREE->|<-INUSE-------- > > > * > > > * where V stands for "valid pending ring index". Any number other > > > * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > Perhaps add "These entries are considered free and can contain any > > number other than INVALID_PENDING_RING_IDX" ? > > > > Do you have a #define, like FREE_PENDING_RING_IDX for this value? Should > > mention it if so I think. > > > > Unfortunately no. This can be fixed later with separate patch? Or just > have that change in this patch?Separate I think, this patch is already pretty unwieldy.> > > Wei. > > > > * > > > * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > ^ add "in use"? > > > > > * example) number is the index into pending_tx_info and mmap_pages > > > * arrays. > > > */ > > > > > > > > > > > Wei. > > > > > > > > > Ian. > >
Wei Liu
2013-Aug-06 15:54 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 04:50:02PM +0100, Ian Campbell wrote:> On Tue, 2013-08-06 at 16:47 +0100, Wei Liu wrote: > > On Tue, Aug 06, 2013 at 04:38:56PM +0100, Ian Campbell wrote: > > [...] > > > > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > * > > > > * ...|0 I I I|5 I|9 I I I|... > > > > * -->|<-INUSE---------------- > > > > * > > > > * After consuming the first slot(s) we have: > > > > * > > > > * ...|V V V V|5 I|9 I I I|... > > > > * -----FREE->|<-INUSE-------- > > > > * > > > > * where V stands for "valid pending ring index". Any number other > > > > * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > Perhaps add "These entries are considered free and can contain any > > > number other than INVALID_PENDING_RING_IDX" ? > > > > > > Do you have a #define, like FREE_PENDING_RING_IDX for this value? Should > > > mention it if so I think. > > > > > > > Unfortunately no. This can be fixed later with separate patch? Or just > > have that change in this patch? > > Separate I think, this patch is already pretty unwieldy. >So the comment for now: /* For the head field in pending_tx_info: it is used to indicate * whether this tx info is the head of one or more coalesced requests. * * When head != INVALID_PENDING_RING_IDX, it means the start of a new * tx requests queue and the end of previous queue. * * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): * * ...|0 I I I|5 I|9 I I I|... * -->|<-INUSE---------------- * * After consuming the first slot(s) we have: * * ...|V V V V|5 I|9 I I I|... * -----FREE->|<-INUSE-------- * * where V stands for "valid pending ring index". Any number other * than INVALID_PENDING_RING_IDX is OK. These entries are considered * free and can contain any number other than * INVALID_PENDING_RING_IDX. In practice we use 0. * * The in use non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the * above example) number is the index into pending_tx_info and * mmap_pages arrays. */ The FREE_PENDING_RING_IDX will be added with later patch. Wei.> > > > > > Wei. > > > > > > * > > > > * The non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the above > > > ^ add "in use"? > > > > > > > * example) number is the index into pending_tx_info and mmap_pages > > > > * arrays. > > > > */ > > > > > > > > > > > > > > Wei. > > > > > > > > > > > Ian. > > > >
Ian Campbell
2013-Aug-06 15:58 UTC
Re: [PATCH V4 2/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, 2013-08-06 at 16:54 +0100, Wei Liu wrote:> On Tue, Aug 06, 2013 at 04:50:02PM +0100, Ian Campbell wrote: > > On Tue, 2013-08-06 at 16:47 +0100, Wei Liu wrote: > > > On Tue, Aug 06, 2013 at 04:38:56PM +0100, Ian Campbell wrote: > > > [...] > > > > > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > > > > > * > > > > > * ...|0 I I I|5 I|9 I I I|... > > > > > * -->|<-INUSE---------------- > > > > > * > > > > > * After consuming the first slot(s) we have: > > > > > * > > > > > * ...|V V V V|5 I|9 I I I|... > > > > > * -----FREE->|<-INUSE-------- > > > > > * > > > > > * where V stands for "valid pending ring index". Any number other > > > > > * than INVALID_PENDING_RING_IDX is OK. In practice we use 0. > > > > > > > > Perhaps add "These entries are considered free and can contain any > > > > number other than INVALID_PENDING_RING_IDX" ? > > > > > > > > Do you have a #define, like FREE_PENDING_RING_IDX for this value? Should > > > > mention it if so I think. > > > > > > > > > > Unfortunately no. This can be fixed later with separate patch? Or just > > > have that change in this patch? > > > > Separate I think, this patch is already pretty unwieldy. > > > > So the comment for now: > > /* For the head field in pending_tx_info: it is used to indicate > * whether this tx info is the head of one or more coalesced requests. > * > * When head != INVALID_PENDING_RING_IDX, it means the start of a new > * tx requests queue and the end of previous queue. > * > * An example sequence of head fields (I = INVALID_PENDING_RING_IDX): > * > * ...|0 I I I|5 I|9 I I I|... > * -->|<-INUSE---------------- > * > * After consuming the first slot(s) we have: > * > * ...|V V V V|5 I|9 I I I|... > * -----FREE->|<-INUSE-------- > * > * where V stands for "valid pending ring index". Any number other > * than INVALID_PENDING_RING_IDX is OK. These entries are considered > * free and can contain any number other than > * INVALID_PENDING_RING_IDX. In practice we use 0. > * > * The in use non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the > * above example) number is the index into pending_tx_info and > * mmap_pages arrays. > */Looks good.> The FREE_PENDING_RING_IDX will be added with later patch.thanks Ian.
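For readers skimming the thread, the head-field convention agreed above can be restated as a short C sketch. It is purely illustrative: the typedef and the all-ones sentinel are assumptions about the driver rather than quotes from it, and FREE_PENDING_RING_IDX is only the name suggested in review for a later patch, not something in this series.

/* Illustrative sketch only -- not taken from the patch series. */
typedef unsigned int pending_ring_idx_t;             /* assumed type      */

#define INVALID_PENDING_RING_IDX ((pending_ring_idx_t)~0U) /* assumed sentinel */
#define FREE_PENDING_RING_IDX    0   /* name suggested in review; any value
                                      * other than INVALID_... would do   */

struct pending_tx_info {
	/* ... tx request data elided ... */
	pending_ring_idx_t head; /* index into pending_tx_info/mmap_pages for
				  * the head of a coalesced group, or
				  * INVALID_PENDING_RING_IDX for the other
				  * slots of that group */
};

/* A slot starts a new group of coalesced tx requests iff its head is valid. */
static inline int is_coalesced_group_head(const struct pending_tx_info *info)
{
	return info->head != INVALID_PENDING_RING_IDX;
}

/* Once all slots of a group are consumed, the head slot becomes free; per
 * the comment above, any value other than INVALID_PENDING_RING_IDX marks
 * it free, and 0 is what is used in practice. */
static inline void mark_slot_free(struct pending_tx_info *info)
{
	info->head = FREE_PENDING_RING_IDX;
}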
Matt Wilson
2013-Aug-07 02:52 UTC
Re: [PATCH V4 1/3] xen-netback: remove page tracking facility
On Tue, Aug 06, 2013 at 10:06:01AM +0100, Wei Liu wrote:> The data flow from DomU to DomU on the same host in current copying > scheme with tracking facility: > > copy > DomU --------> Dom0 DomU > | ^ > |____________________________| > copy > > The page in Dom0 is a page with valid MFN. So we can always copy from > page Dom0, thus removing the need for a tracking facility. > > copy copy > DomU --------> Dom0 -------> DomU > > Simple iperf test shows no performance regression (obviously we copy > twice either way): > > W/ tracking: ~5.3Gb/s > W/o tracking: ~5.4Gb/s > > Signed-off-by: Wei Liu <wei.liu2@citrix.com> > Acked-by: Ian Campbell <ian.campbell@citrix.com>Acked-by: Matt Wilson <msw@amazon.com>> --- > drivers/net/xen-netback/netback.c | 77 +------------------------------------ > 1 file changed, 2 insertions(+), 75 deletions(-)[...] --msw
Wei Liu
2013-Aug-15 13:20 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
Any more comments? Wei. On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:> This series implements NAPI + kthread 1:1 model for Xen netback. > > This model > - provides better scheduling fairness among vifs > - is prerequisite for implementing multiqueue for Xen network driver > > The first two patches are ground work for the third patch. First one > simplifies code in netback, second one can reduce memory footprint if we > switch to 1:1 model. > > The third patch has the real meat: > - make use of NAPI to mitigate interrupt > - kthreads are not bound to CPUs any more, so that we can take > advantage of backend scheduler and trust it to do the right thing > > Benchmark done on a Dell T3400 workstation with 4 cores, running 4 > DomUs. Netserver running in Dom0. DomUs do netperf to Dom0 with > following command: /root/netperf -H Dom0 -fm -l120 > > IRQs are distributed to 4 cores by hand in the new model, while in the > old model vifs are automatically distributed to 4 kthreads. > > * New model > %Cpu0 : 0.5 us, 20.3 sy, 0.0 ni, 28.9 id, 0.0 wa, 0.0 hi, 24.4 si, 25.9 st > %Cpu1 : 0.5 us, 17.8 sy, 0.0 ni, 28.8 id, 0.0 wa, 0.0 hi, 27.7 si, 25.1 st > %Cpu2 : 0.5 us, 18.8 sy, 0.0 ni, 30.7 id, 0.0 wa, 0.0 hi, 22.9 si, 27.1 st > %Cpu3 : 0.0 us, 20.1 sy, 0.0 ni, 30.4 id, 0.0 wa, 0.0 hi, 22.7 si, 26.8 st > Throughputs: 2027.89 2025.95 2018.57 2016.23 aggregated: 8088.64 > > * Old model > %Cpu0 : 0.5 us, 68.8 sy, 0.0 ni, 16.1 id, 0.5 wa, 0.0 hi, 2.8 si, 11.5 st > %Cpu1 : 0.4 us, 45.1 sy, 0.0 ni, 31.1 id, 0.4 wa, 0.0 hi, 2.1 si, 20.9 st > %Cpu2 : 0.9 us, 44.8 sy, 0.0 ni, 30.9 id, 0.0 wa, 0.0 hi, 1.3 si, 22.2 st > %Cpu3 : 0.8 us, 46.4 sy, 0.0 ni, 28.3 id, 1.3 wa, 0.0 hi, 2.1 si, 21.1 st > Throughputs: 1899.14 2280.43 1963.33 1893.47 aggregated: 8036.37 > > We can see that the impact is mainly on CPU usage. The new model moves > processing from kthread to NAPI (software interrupt). > > Changes since V3: > * rebase on top of net-next > * group TX and RX fields in different regions of the structure > * minor bugfix : use GFP_ATOMIC to avoid sleeping in SI context > * split functional changes and funcation renames into two patches > * minor coding style fixes > > Wei Liu (3): > xen-netback: remove page tracking facility > xen-netback: switch to NAPI + kthread 1:1 model > xen-netback: rename functions > > drivers/net/xen-netback/common.h | 127 ++++-- > drivers/net/xen-netback/interface.c | 119 +++-- > drivers/net/xen-netback/netback.c | 833 +++++++++++------------------------ > 3 files changed, 406 insertions(+), 673 deletions(-) > > -- > 1.7.10.4
Wei Liu
2013-Aug-22 15:12 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On Tue, Aug 06, 2013 at 03:17:46PM +0100, Andrew Bennieston wrote:
> On 06/08/13 14:29, David Vrabel wrote:
> >On 06/08/13 14:16, Pasi Kärkkäinen wrote:
> >>On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
> >>>
> >>>IRQs are distributed to 4 cores by hand in the new model, while in the
> >>>old model vifs are automatically distributed to 4 kthreads.
> >>>
> >>
> >>Hmm.. so with these patches applied is it *required* to do manual configuration in dom0 to get good performance?
> >
> >This should be irqbalanced's job. The existing version doesn't do a
> >good enough job yet though. Andrew Bennieston may have more details.
> >
> >David
> >
>
> irqbalance 1.0.6 [1] includes a patch [2] from Wei Liu [3] that adds
> support for balancing `xen-dyn-event' interrupts. When I have compiled
> this version and run it under Xen(Server) I noticed that the
> interrupts are indeed moving between cores, but not necessarily in
> what I would call an obvious or optimal way (e.g. several VIF
> interrupts are being grouped onto a single dom0 VCPU at times). I plan
> on investigating this further when time permits.
>
> I also noticed that, from time to time, the irqbalance process
> disappears. I tracked this down to a segfault that occurs when a VM
> shuts down and an IRQ disappears during one of irqbalance's periodic
> rescans. I'm hoping to be able to narrow this down sufficiently to
> identify the cause and ideally fix it, but I don't have a lot of time
> to work on this at the moment.
>
> As for the impact on Wei's patches, without irqbalance it would be
> trivial to automatically assign (via a script, on VM start) the
> interrupts for a particular VIF to a particular dom0 vCPU in a
> round-robin fashion, just as VIFs were previously assigned to netback
> kthreads. This would result in broadly the same performance as before,
> while an improved irqbalanced should give better performance and
> fairness when two different VIFs would otherwise be competing for the
> same resources.
>

So can I conclude that this model doesn't incur severe performance
regression, on the other hand it has its advantage on fairness so it's
worth upstreaming?

If so I will post another series shortly with all comments addressed.

Wei.

> Andrew.
>
> [1] https://code.google.com/p/irqbalance/downloads
> [2] https://code.google.com/p/irqbalance/source/detail?r=736a91b739018530c44e59d9b17252f88227fe51
> [3] https://code.google.com/p/irqbalance/issues/detail?id=46
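Andrew's point about a trivial start-up script amounts to writing a per-VIF CPU mask into /proc/irq/<irq>/smp_affinity. The fragment below is a minimal sketch of that mechanism only; looking up the VIF's IRQ number (e.g. from /proc/interrupts) is not shown, and nothing like this ships with the series.

/* Sketch only: pin one IRQ to one CPU via procfs, the mechanism a
 * per-VIF round-robin start-up script would rely on. */
#include <stdio.h>

static int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	/* smp_affinity takes a hex CPU mask; this writes a single-CPU
	 * mask and assumes cpu < 32 for simplicity. */
	fprintf(f, "%x\n", 1u << cpu);
	fclose(f);
	return 0;
}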
Wei Liu
2013-Aug-22 15:19 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
Matt, Konrad, Annie, David, thoughts?

Wei.

On Thu, Aug 22, 2013 at 04:12:38PM +0100, Wei Liu wrote:
> On Tue, Aug 06, 2013 at 03:17:46PM +0100, Andrew Bennieston wrote:
> > On 06/08/13 14:29, David Vrabel wrote:
> > >On 06/08/13 14:16, Pasi Kärkkäinen wrote:
> > >>On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
> > >>>
> > >>>IRQs are distributed to 4 cores by hand in the new model, while in the
> > >>>old model vifs are automatically distributed to 4 kthreads.
> > >>>
> > >>
> > >>Hmm.. so with these patches applied is it *required* to do manual configuration in dom0 to get good performance?
> > >
> > >This should be irqbalanced's job. The existing version doesn't do a
> > >good enough job yet though. Andrew Bennieston may have more details.
> > >
> > >David
> > >
> >
> > irqbalance 1.0.6 [1] includes a patch [2] from Wei Liu [3] that adds
> > support for balancing `xen-dyn-event' interrupts. When I have compiled
> > this version and run it under Xen(Server) I noticed that the
> > interrupts are indeed moving between cores, but not necessarily in
> > what I would call an obvious or optimal way (e.g. several VIF
> > interrupts are being grouped onto a single dom0 VCPU at times). I plan
> > on investigating this further when time permits.
> >
> > I also noticed that, from time to time, the irqbalance process
> > disappears. I tracked this down to a segfault that occurs when a VM
> > shuts down and an IRQ disappears during one of irqbalance's periodic
> > rescans. I'm hoping to be able to narrow this down sufficiently to
> > identify the cause and ideally fix it, but I don't have a lot of time
> > to work on this at the moment.
> >
> > As for the impact on Wei's patches, without irqbalance it would be
> > trivial to automatically assign (via a script, on VM start) the
> > interrupts for a particular VIF to a particular dom0 vCPU in a
> > round-robin fashion, just as VIFs were previously assigned to netback
> > kthreads. This would result in broadly the same performance as before,
> > while an improved irqbalanced should give better performance and
> > fairness when two different VIFs would otherwise be competing for the
> > same resources.
> >
>
> So can I conclude that this model doesn't incur severe performance
> regression, on the other hand it has its advantage on fairness so it's
> worth upstreaming?
>
> If so I will post another series shortly with all comments addressed.
>
> Wei.
>
> > Andrew.
> >
> > [1] https://code.google.com/p/irqbalance/downloads
> > [2] https://code.google.com/p/irqbalance/source/detail?r=736a91b739018530c44e59d9b17252f88227fe51
> > [3] https://code.google.com/p/irqbalance/issues/detail?id=46
Matt Wilson
2013-Aug-22 17:14 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On Thu, Aug 22, 2013 at 04:12:38PM +0100, Wei Liu wrote:
> On Tue, Aug 06, 2013 at 03:17:46PM +0100, Andrew Bennieston wrote:
> > On 06/08/13 14:29, David Vrabel wrote:
> > >On 06/08/13 14:16, Pasi Kärkkäinen wrote:
> > >>On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
> > >>>
> > >>>IRQs are distributed to 4 cores by hand in the new model, while in the
> > >>>old model vifs are automatically distributed to 4 kthreads.
> > >>>
> > >>
> > >>Hmm.. so with these patches applied is it *required* to do manual configuration in dom0 to get good performance?
> > >
> > >This should be irqbalanced's job. The existing version doesn't do a
> > >good enough job yet though. Andrew Bennieston may have more details.
> > >
> > >David
> > >
> >
> > irqbalance 1.0.6 [1] includes a patch [2] from Wei Liu [3] that adds
> > support for balancing `xen-dyn-event' interrupts. When I have compiled
> > this version and run it under Xen(Server) I noticed that the
> > interrupts are indeed moving between cores, but not necessarily in
> > what I would call an obvious or optimal way (e.g. several VIF
> > interrupts are being grouped onto a single dom0 VCPU at times). I plan
> > on investigating this further when time permits.
> >
> > I also noticed that, from time to time, the irqbalance process
> > disappears. I tracked this down to a segfault that occurs when a VM
> > shuts down and an IRQ disappears during one of irqbalance's periodic
> > rescans. I'm hoping to be able to narrow this down sufficiently to
> > identify the cause and ideally fix it, but I don't have a lot of time
> > to work on this at the moment.
> >
> > As for the impact on Wei's patches, without irqbalance it would be
> > trivial to automatically assign (via a script, on VM start) the
> > interrupts for a particular VIF to a particular dom0 vCPU in a
> > round-robin fashion, just as VIFs were previously assigned to netback
> > kthreads. This would result in broadly the same performance as before,
> > while an improved irqbalanced should give better performance and
> > fairness when two different VIFs would otherwise be competing for the
> > same resources.
> >
>
> So can I conclude that this model doesn't incur severe performance
> regression, on the other hand it has its advantage on fairness so it's
> worth upstreaming?

Yes, I think so.

As far as initial interrupt affinity settings go, I think it'd be great
if the default value could be set in the kernel. I'd rather not
require toolchain work or external scripts to get reasonable
spread. That said, I can understand why we might hesitate to do this
in the kernel, and there's a fair precedent set through scripts like
set_irq_affinity.sh [1]. On the other hand, the Hyper-V support for
Linux guests assigns affinity on a round-robin basis [2].

> If so I will post another series shortly with all comments addressed.

Please do.

> Wei.

[1] http://www.intel.com/content/dam/doc/application-note/82575-82576-82598-82599-ethernet-controllers-interrupts-appl-note.pdf
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a11984
--msw
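Matt's in-kernel default could plausibly take the form of an affinity hint applied when a vif's interrupt is set up, spread round-robin over the online dom0 CPUs, much like the Hyper-V example he cites. The sketch below only shows the shape of such a change; the function name and call site are hypothetical and it is not part of the posted series.

/* Hypothetical sketch, not part of this series: give each newly bound
 * vif IRQ a round-robin default affinity hint that irqbalance or the
 * admin can still override. */
#include <linux/cpumask.h>
#include <linux/interrupt.h>

static void xenvif_default_irq_affinity(unsigned int irq)
{
	static int last_cpu = -1;	/* no locking; assumes vifs are set
					 * up one at a time (illustration) */
	int cpu;

	cpu = cpumask_next(last_cpu, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(cpu_online_mask);
	last_cpu = cpu;

	/* A hint, not a hard binding: userspace can still rebalance. */
	irq_set_affinity_hint(irq, cpumask_of(cpu));
}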
annie li
2013-Aug-23 02:24 UTC
Re: [PATCH V4 0/3] xen-netback: switch to NAPI + kthread 1:1 model
On 2013-8-22 23:12, Wei Liu wrote:
> On Tue, Aug 06, 2013 at 03:17:46PM +0100, Andrew Bennieston wrote:
>> On 06/08/13 14:29, David Vrabel wrote:
>>> On 06/08/13 14:16, Pasi Kärkkäinen wrote:
>>>> On Tue, Aug 06, 2013 at 10:06:00AM +0100, Wei Liu wrote:
>>>>> IRQs are distributed to 4 cores by hand in the new model, while in the
>>>>> old model vifs are automatically distributed to 4 kthreads.
>>>>>
>>>> Hmm.. so with these patches applied is it *required* to do manual configuration in dom0 to get good performance?
>>> This should be irqbalanced's job. The existing version doesn't do a
>>> good enough job yet though. Andrew Bennieston may have more details.
>>>
>>> David
>>>
>> irqbalance 1.0.6 [1] includes a patch [2] from Wei Liu [3] that adds
>> support for balancing `xen-dyn-event' interrupts. When I have compiled
>> this version and run it under Xen(Server) I noticed that the
>> interrupts are indeed moving between cores, but not necessarily in
>> what I would call an obvious or optimal way (e.g. several VIF
>> interrupts are being grouped onto a single dom0 VCPU at times). I plan
>> on investigating this further when time permits.
>>
>> I also noticed that, from time to time, the irqbalance process
>> disappears. I tracked this down to a segfault that occurs when a VM
>> shuts down and an IRQ disappears during one of irqbalance's periodic
>> rescans. I'm hoping to be able to narrow this down sufficiently to
>> identify the cause and ideally fix it, but I don't have a lot of time
>> to work on this at the moment.
>>
>> As for the impact on Wei's patches, without irqbalance it would be
>> trivial to automatically assign (via a script, on VM start) the
>> interrupts for a particular VIF to a particular dom0 vCPU in a
>> round-robin fashion, just as VIFs were previously assigned to netback
>> kthreads. This would result in broadly the same performance as before,
>> while an improved irqbalanced should give better performance and
>> fairness when two different VIFs would otherwise be competing for the
>> same resources.
>>
> So can I conclude that this model doesn't incur severe performance
> regression, on the other hand it has its advantage on fairness so it's
> worth upstreaming?

I agree. From Andrew's results, I do not see any bad effect from this
model, and a well-functioning irqbalance may give us much better
performance too. And this can also simplify the persistent map patch
(work on it has kind of stalled, but my recent tests show some good
results).
>
> If so I will post another series shortly with all comments addressed.

Please go ahead.

Thanks
Annie