<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 00/22] Netchannel2 patches for XCI 2.6.27
This patch series contains all of the Linux-side changes from the netchannel2 tree, ported to 2.6.27 plus the XCI patchqueue and tidied up into a bunch of reasonably self-contained patches. Headline features:

-- A new grant tables implementation. This is both faster and more flexible, adding support for sub-page and transitive grants.

-- Receiver-side copying of packets, which gives better performance and better CPU accounting.

-- Domain 0 bypass support. Domains which are located on the same physical host can set up a direct communication channel between themselves, improving performance and dramatically reducing the load on domain 0. Bypasses are created and destroyed automatically in response to observed traffic patterns, and do not require manual configuration.

-- VMQ support, allowing guests to take advantage of virtual-machine acceleration support present in some modern NICs (but note the caveats below).

Performance is generally decent, in some cases achieving more than twice the throughput of netchannel1, but this is rather workload-dependent. It shouldn't be significantly slower in any realistic test.

The patch series contains four main chunks:

-- First, a mechanism for netback to register its mappings of foreign pages in a way which will be accessible to netback2. This is necessary when forwarding packets between the two backends. (patches 1 and 2 in the series)

-- Second, V2 grant tables. (patches 3 through 8)

-- Third, netchannel2 itself. (patches 9 through 18)

-- Finally, VMQ support for netchannel2. The netchannel2 parts of this are in reasonable shape, but in order to make use of them you need a VMQ-capable NIC with a VMQ-capable driver. The final patch adds this capability to the ixgbe driver, but against a slightly out-of-date version of it. Unfortunately, I don't currently have access to any VMQ-capable hardware, and the patch is too big to forward-port blind, so there's not a great deal I can do about this. (patches 19 through 22)

The series should apply cleanly to commit 59270a8c95901eaa9e3f06d575d4168c320ae057 of the XenClient linux-2.6.27 tree plus 925d65d3d407a620dffb86fb43bb84d9eec3cb07 of their patchqueue.

As far as I'm concerned, the first three chunks (everything up to patch 18, ``Add support for automatically creating and destroying bypass rings...'') are ready to go into the XCI patchqueue.

Steven.
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 01/22] Introduce a live_maps facility for tracking which domain foreign pages were mapped from in a reasonably uniform way.
This isn't terribly useful at present, but will make it much easier to forward mapped packets between domains when there are multiple drivers loaded which can produce such packets (e.g. netback1 and netback2).

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/Makefile    |    2 +-
 drivers/xen/core/gnttab.c    |    1 +
 drivers/xen/core/live_maps.c |   61 ++++++++++++++++
 include/xen/live_maps.h      |  165 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 228 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/core/live_maps.c
 create mode 100644 include/xen/live_maps.h

diff --git a/drivers/xen/core/Makefile b/drivers/xen/core/Makefile
index 546f0b2..967981a 100644
--- a/drivers/xen/core/Makefile
+++ b/drivers/xen/core/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the linux kernel.
 #
-obj-y := evtchn.o gnttab.o features.o reboot.o machine_reboot.o firmware.o
+obj-y := evtchn.o gnttab.o features.o reboot.o machine_reboot.o firmware.o live_maps.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_PROC_FS) += xen_proc.o
diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index cea08c0..7605c45 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -589,6 +589,7 @@ int gnttab_copy_grant_page(grant_ref_t ref, struct page **pagep)
 
        new_page->mapping = page->mapping;
        new_page->index = page->index;
+       new_page->private = page->private;
        set_bit(PG_foreign, &new_page->flags);
        *pagep = new_page;
 
diff --git a/drivers/xen/core/live_maps.c b/drivers/xen/core/live_maps.c
new file mode 100644
index 0000000..69d41f4
--- /dev/null
+++ b/drivers/xen/core/live_maps.c
@@ -0,0 +1,61 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <xen/gnttab.h>
+#include <xen/live_maps.h>
+
+/* This lock protects allocation and release of trackers, but is not
+   held when we're actually looking stuff up.  The caller is
+   responsible for making sure that suitable locks are held around
+   data path operations. */
+static DEFINE_SPINLOCK(tracker_lock);
+
+struct page_foreign_tracker *foreign_trackers[LIVE_MAP_NR_TRACKERS];
+EXPORT_SYMBOL(foreign_trackers);
+
+/* Allocate a foreign page tracker.  @size is the maximum index in the
+   tracker.  Returns NULL on error. */
+struct page_foreign_tracker *alloc_page_foreign_tracker(unsigned size)
+{
+       struct page_foreign_tracker *work;
+       unsigned x;
+
+       BUG_ON(size & ~LIVE_MAP_TRACKER_IDX_MASK);
+
+       work = kzalloc(sizeof(*work) +
+                      size * sizeof(struct page_foreign_tracked),
+                      GFP_KERNEL);
+       if (!work)
+               return work;
+       work->size = size;
+
+       spin_lock(&tracker_lock);
+       for (x = 0; x < LIVE_MAP_NR_TRACKERS; x++) {
+               if (foreign_trackers[x] == NULL) {
+                       work->id = x;
+                       foreign_trackers[x] = work;
+                       break;
+               }
+       }
+       spin_unlock(&tracker_lock);
+       if (x == LIVE_MAP_NR_TRACKERS) {
+               printk(KERN_WARNING "Out of foreign page trackers!\n");
+               kfree(work);
+               return NULL;
+       }
+       return work;
+}
+
+/* Release a tracker allocated with alloc_page_foreign_tracker.  There
+   should be no tracked pages when this is called. */
+void free_page_foreign_tracker(struct page_foreign_tracker *pft)
+{
+       spin_lock(&tracker_lock);
+       BUG_ON(foreign_trackers[pft->id] != pft);
+       foreign_trackers[pft->id] = NULL;
+       spin_unlock(&tracker_lock);
+       kfree(pft);
+}
+
+EXPORT_SYMBOL(alloc_page_foreign_tracker);
+EXPORT_SYMBOL(free_page_foreign_tracker);
diff --git a/include/xen/live_maps.h b/include/xen/live_maps.h
new file mode 100644
index 0000000..96e080d
--- /dev/null
+++ b/include/xen/live_maps.h
@@ -0,0 +1,165 @@
+#ifndef XEN_LIVE_MAPS_H__
+#define XEN_LIVE_MAPS_H__
+
+/* A mechanism for tracking where pages have been grant mapped from.
+   Anything which can map pages through a grant reference is supposed
+   to allocate a page_tracker and then, whenever they map a grant:
+
+   a) Flag the page as foreign with SetPageForeign(), and
+   b) Register the struct page with a tracker through start_tracking_page().
+
+   If you later need to grant access to the page (either with a normal
+   grant or implicitly in a copy grant operation), you should use
+   lookup_tracker_page() to find out what domain and grant reference
+   it was mapped from.
+
+   Obviously, if a backend knows that the page will never need to be
+   re-granted once it's been mapped, it can avoid doing all this
+   stuff.
+
+   The number of trackers is quite limited, so they shouldn't be
+   allocated unnecessarily.  One per backend class is reasonable
+   (i.e. netback, blkback, etc.), but one per backend device probably
+   isn't.
+*/
+
+#include <linux/mm.h>
+#include <xen/gnttab.h>
+
+#ifdef CONFIG_XEN
+
+/* We use page->private to store some index information so that we can
+   find the tracking information later.  The top few bits are used to
+   identify the tracker, and the rest are used as an index into that
+   tracker. */
+
+/* How many bits to use for tracker IDs. */
+#define LIVE_MAP_TRACKER_BITS 2
+
+/* How many bits to use for tracker indexes. */
+#define LIVE_MAP_TRACKER_IDX_BITS (32 - LIVE_MAP_TRACKER_BITS)
+
+/* Maximum number of trackers */
+#define LIVE_MAP_NR_TRACKERS (1 << LIVE_MAP_TRACKER_BITS)
+
+/* Bitmask of index inside tracker */
+#define LIVE_MAP_TRACKER_IDX_MASK (~0u >> LIVE_MAP_TRACKER_BITS)
+
+/* Turn off some moderately expensive debug checks. */
+#undef LIVE_MAPS_DEBUG
+
+struct page_foreign_tracked {
+       domid_t dom;
+       grant_ref_t gref;
+       void *ctxt;
+#ifdef LIVE_MAPS_DEBUG
+       unsigned in_use;
+#endif
+};
+
+struct page_foreign_tracker {
+       unsigned size;
+       unsigned id;
+       struct page_foreign_tracked contents[];
+};
+
+extern struct page_foreign_tracker *foreign_trackers[LIVE_MAP_NR_TRACKERS];
+
+/* Allocate a foreign page tracker.  @size is the maximum index in the
+   tracker.  Returns NULL on error. */
+struct page_foreign_tracker *alloc_page_foreign_tracker(unsigned size);
+
+/* Release a tracker allocated with alloc_page_foreign_tracker.  There
+   should be no tracked pages when this is called. */
+void free_page_foreign_tracker(struct page_foreign_tracker *pft);
+
+static inline struct page_foreign_tracker *tracker_for_page(struct page *p)
+{
+       unsigned idx = page_private(p);
+       return foreign_trackers[idx >> LIVE_MAP_TRACKER_IDX_BITS];
+}
+
+static inline void *get_page_tracker_ctxt(struct page *p)
+{
+       struct page_foreign_tracker *pft = tracker_for_page(p);
+       unsigned idx = page_private(p);
+       return pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].ctxt;
+}
+
+/* Start tracking a page.  @idx is an index in the tracker which is
+   not currently in use, and must be less than the size of the
+   tracker.  The page must be marked as foreign before this is called.
+   The caller is expected to make sure that the page is not a
+   simultaneous target of lookup_tracker_page().  The page should be
+   passed to stop_tracking_page() when the grant is unmapped. */
+static inline void start_tracking_page(struct page_foreign_tracker *pft,
+                                      struct page *p,
+                                      domid_t dom,
+                                      grant_ref_t gref,
+                                      unsigned idx,
+                                      void *ctxt)
+{
+       BUG_ON(!PageForeign(p));
+#ifdef LIVE_MAPS_DEBUG
+       BUG_ON(idx > pft->size);
+       BUG_ON(pft->contents[idx].in_use);
+       pft->contents[idx].in_use = 1;
+#endif
+       pft->contents[idx].dom = dom;
+       pft->contents[idx].gref = gref;
+       pft->contents[idx].ctxt = ctxt;
+       set_page_private(p, idx | (pft->id << LIVE_MAP_TRACKER_IDX_BITS));
+}
+
+static inline void stop_tracking_page(struct page *p)
+{
+#ifdef LIVE_MAPS_DEBUG
+       struct page_foreign_tracker *pft;
+       unsigned idx = page_private(p);
+       BUG_ON(!PageForeign(p));
+       pft = tracker_for_page(p);
+       BUG_ON((idx & LIVE_MAP_TRACKER_IDX_MASK) >= pft->size);
+       BUG_ON(!pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].in_use);
+       pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].in_use = 0;
+       set_page_private(p, 0);
+#endif
+}
+
+/* Lookup a page which is tracked in some tracker.
+   start_tracking_page() must have been called previously.  *@dom and
+   *@gref will be set to the values which were specified when
+   start_tracking_page() was called. */
+static inline void lookup_tracker_page(struct page *p, domid_t *dom,
+                                      grant_ref_t *gref)
+{
+       struct page_foreign_tracker *pft;
+       unsigned idx = page_private(p);
+       BUG_ON(!PageForeign(p));
+       pft = tracker_for_page(p);
+#ifdef LIVE_MAPS_DEBUG
+       BUG_ON(!pft);
+       BUG_ON((idx & LIVE_MAP_TRACKER_IDX_MASK) >= pft->size);
+       BUG_ON(!pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].in_use);
+#endif
+       *dom = pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].dom;
+       *gref = pft->contents[idx & LIVE_MAP_TRACKER_IDX_MASK].gref;
+}
+
+static inline int page_is_tracked(struct page *p)
+{
+       return PageForeign(p) && p->mapping;
+}
+
+#else /* !CONFIG_XEN */
+static inline int page_is_tracked(struct page *p)
+{
+       return 0;
+}
+static void lookup_tracker_page(struct page *p, domid_t *domid,
+                               grant_ref_t *gref)
+{
+       BUG();
+}
+#endif
+
+#endif /* !XEN_LIVE_MAPS_H__ */
--
1.6.3.1
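For readers skimming the series, here is a minimal usage sketch of the tracker API above. It is illustrative only, not part of the patch: MY_MAX_PAGES, my_tracker and the my_* functions are hypothetical names; the real consumer is netback, in the next patch.

#include <linux/errno.h>
#include <linux/mm.h>
#include <xen/gnttab.h>
#include <xen/live_maps.h>

#define MY_MAX_PAGES 256       /* hypothetical backend mapping-table size */

static struct page_foreign_tracker *my_tracker;

static int my_backend_init(void)
{
       /* One tracker per backend class, as the header recommends. */
       my_tracker = alloc_page_foreign_tracker(MY_MAX_PAGES);
       return my_tracker ? 0 : -ENOMEM;
}

/* Called after a grant from @domid/@gref has been mapped into @page at
   slot @idx.  The page must already be flagged foreign (via
   SetPageForeign(); the exact marking step depends on the tree). */
static void my_grant_mapped(struct page *page, domid_t domid,
                           grant_ref_t gref, unsigned idx)
{
       start_tracking_page(my_tracker, page, domid, gref, idx, NULL);
}

/* When re-granting the page, e.g. as the source of a grant copy,
   recover where it was originally mapped from. */
static void my_fill_copy_source(struct page *page, gnttab_copy_t *op)
{
       if (page_is_tracked(page)) {
               lookup_tracker_page(page, &op->source.domid,
                                   &op->source.u.ref);
               op->flags |= GNTCOPY_source_gref;
       }
}

The point of routing everything through page->private is that a driver which merely holds a foreign page (say, netback2 forwarding a packet mapped by netback1) can find the originating domain and gref without knowing which driver mapped it.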
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 02/22] Use the foreign page tracking logic in netback.c. This isn't terribly useful, but will be necessary if anything else ever introduces mappings of foreign pages into the network stack.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/netback/netback.c |   31 ++++++++++++++++++++++++++-----
 1 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
index fe6eff7..76641b1 100644
--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ -37,6 +37,7 @@
 #include "common.h"
 #include <xen/balloon.h>
 #include <xen/interface/memory.h>
+#include <xen/live_maps.h>
 #include <linux/kthread.h>
 
 /*define NETBE_DEBUG_INTERRUPT*/
@@ -133,6 +134,8 @@ typedef unsigned int PEND_RING_IDX;
 static PEND_RING_IDX pending_prod, pending_cons;
 #define NR_PENDING_REQS (MAX_PENDING_REQS - pending_prod + pending_cons)
 
+static struct page_foreign_tracker *foreign_page_tracker;
+
 /* Freed TX SKBs get batched on this ring before return to pending_ring. */
 static u16 dealloc_ring[MAX_PENDING_REQS];
 static PEND_RING_IDX dealloc_prod, dealloc_cons;
@@ -438,16 +441,14 @@ static void netbk_gop_frag_copy(netif_t *netif,
 {
        gnttab_copy_t *copy_gop;
        struct netbk_rx_meta *meta;
-       int idx = netif_page_index(page);
 
        meta = npo->meta + npo->meta_prod - 1;
        copy_gop = npo->copy + npo->copy_prod++;
        copy_gop->flags = GNTCOPY_dest_gref;
-       if (idx > -1) {
-               struct pending_tx_info *src_pend = &pending_tx_info[idx];
-               copy_gop->source.domid = src_pend->netif->domid;
-               copy_gop->source.u.ref = src_pend->req.gref;
+       if (page_is_tracked(page)) {
+               lookup_tracker_page(page, &copy_gop->source.domid,
+                                   &copy_gop->source.u.ref);
                copy_gop->flags |= GNTCOPY_source_gref;
        } else {
                copy_gop->source.domid = DOMID_SELF;
@@ -1081,6 +1082,8 @@ inline static void net_tx_action_dealloc(void)
                        if (!phys_to_machine_mapping_valid(pfn))
                                continue;
 
+                       stop_tracking_page(mmap_pages[pending_idx]);
+
                        gnttab_set_unmap_op(gop, idx_to_kaddr(pending_idx),
                                            GNTMAP_host_map,
                                            grant_tx_handle[pending_idx]);
@@ -1219,6 +1222,13 @@ static gnttab_map_grant_ref_t *netbk_get_requests(netif_t *netif,
                netif_get(netif);
                pending_tx_info[pending_idx].netif = netif;
                frags[i].page = (void *)pending_idx;
+
+               start_tracking_page(foreign_page_tracker,
+                                   mmap_pages[pending_idx],
+                                   netif->domid,
+                                   pending_tx_info[pending_idx].req.gref,
+                                   pending_idx,
+                                   NULL);
        }
 
        return mop;
@@ -1526,6 +1536,13 @@ static void net_tx_action(void)
                                  txreq.gref, netif->domid);
                mop++;
 
+               start_tracking_page(foreign_page_tracker,
+                                   mmap_pages[pending_idx],
+                                   netif->domid,
+                                   txreq.gref,
+                                   pending_idx,
+                                   NULL);
+
                memcpy(&pending_tx_info[pending_idx].req,
                       &txreq, sizeof(txreq));
                pending_tx_info[pending_idx].netif = netif;
@@ -1804,9 +1821,13 @@ static int __init netback_init(void)
        netbk_tx_pending_timer.data = 0;
        netbk_tx_pending_timer.function = netbk_tx_pending_timeout;
 
+       foreign_page_tracker = alloc_page_foreign_tracker(MAX_PENDING_REQS);
+       if (!foreign_page_tracker)
+               return -ENOMEM;
        mmap_pages = alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
        if (mmap_pages == NULL) {
                printk("%s: out of memory\n", __FUNCTION__);
+               free_page_foreign_tracker(foreign_page_tracker);
                return -ENOMEM;
        }
--
1.6.3.1
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 03/22] Remove some trivial code duplication in gnttab.c.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index 7605c45..9a8fc89 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -152,11 +152,7 @@ int gnttab_grant_foreign_access(domid_t domid, unsigned long frame,
        if (unlikely((ref = get_free_entry()) < 0))
                return -ENOSPC;
 
-       shared[ref].frame = frame;
-       shared[ref].domid = domid;
-       wmb();
-       BUG_ON(flags & (GTF_accept_transfer | GTF_reading | GTF_writing));
-       shared[ref].flags = GTF_permit_access | flags;
+       gnttab_grant_foreign_access_ref(ref, domid, frame, flags);
 
        return ref;
 }
--
1.6.3.1
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 04/22] Fix a long-standing memory leak in the grant tables implementation. According to the interface comments, gnttab_end_foreign_access() is supposed to free the page from a polling timer once the grant is no longer in use, but that was never implemented. Implement it.
This shouldn't make any real difference, because the existing drivers all arrange that, with well-behaved backends, references are never ended while they're still in use, but it tidies things up a bit.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c |  100 ++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index 9a8fc89..dd93f43 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -55,11 +55,17 @@
 #define GNTTAB_LIST_END 0xffffffff
 #define ENTRIES_PER_GRANT_FRAME (PAGE_SIZE / sizeof(grant_entry_t))
 
+static void pending_free_timer(unsigned long ignore);
+
 static grant_ref_t **gnttab_list;
 static unsigned int nr_grant_frames;
 static unsigned int boot_max_nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
+static grant_ref_t gnttab_pending_free_gref_head = GNTTAB_LIST_END;
+static LIST_HEAD(gnttab_pending_free_pages);
+static DEFINE_TIMER(gnttab_delayed_free_timer, pending_free_timer, 0, 0);
+static DEFINE_SPINLOCK(gnttab_pending_free_lock);
 static DEFINE_SPINLOCK(gnttab_list_lock);
 
 static struct grant_entry *shared;
@@ -180,7 +186,7 @@ int gnttab_query_foreign_access(grant_ref_t ref)
 }
 EXPORT_SYMBOL_GPL(gnttab_query_foreign_access);
 
-int gnttab_end_foreign_access_ref(grant_ref_t ref)
+static int _gnttab_end_foreign_access_ref(grant_ref_t ref)
 {
        u16 flags, nflags;
 
@@ -195,19 +201,101 @@ int gnttab_end_foreign_access_ref(grant_ref_t ref)
        return 1;
 }
 
+
+int gnttab_end_foreign_access_ref(grant_ref_t ref)
+{
+       int r;
+
+       r = _gnttab_end_foreign_access_ref(ref);
+       if (!r)
+               printk(KERN_DEBUG "WARNING: g.e. still in use!\n");
+       return r;
+}
+
 EXPORT_SYMBOL_GPL(gnttab_end_foreign_access_ref);
 
+static void pending_free_timer(unsigned long ignore)
+{
+       grant_ref_t gref, next_gref;
+       grant_ref_t prev; /* The last gref which we failed to release,
+                            or GNTTAB_LIST_END if there is no such
+                            gref. */
+       int need_mod_timer;
+       struct page *page, *next_page;
+
+       spin_lock(&gnttab_pending_free_lock);
+       prev = GNTTAB_LIST_END;
+       for (gref = gnttab_pending_free_gref_head;
+            gref != GNTTAB_LIST_END;
+            gref = next_gref) {
+               next_gref = gnttab_entry(gref);
+               if (_gnttab_end_foreign_access_ref(gref)) {
+                       put_free_entry(gref);
+                       if (prev != GNTTAB_LIST_END)
+                               gnttab_entry(prev) = next_gref;
+                       else
+                               gnttab_pending_free_gref_head = next_gref;
+               } else {
+                       prev = gref;
+               }
+       }
+       list_for_each_entry_safe(page, next_page,
+                                &gnttab_pending_free_pages, lru) {
+               gref = page->index;
+               if (_gnttab_end_foreign_access_ref(gref)) {
+                       list_del(&page->lru);
+                       put_free_entry(gref);
+                       /* The page hasn't been used in this domain
+                          for more than a second, so it's probably
+                          cold. */
+                       if (put_page_testzero(page)) {
+#ifdef MODULE
+                               __free_page(page);
+#else
+                               free_cold_page(page);
+#endif
+                       }
+               }
+       }
+
+       need_mod_timer =
+               (gnttab_pending_free_gref_head != GNTTAB_LIST_END) ||
+               !list_empty(&gnttab_pending_free_pages);
+       spin_unlock(&gnttab_pending_free_lock);
+
+       if (need_mod_timer)
+               mod_timer(&gnttab_delayed_free_timer, jiffies + HZ);
+}
+
 void gnttab_end_foreign_access(grant_ref_t ref, unsigned long page)
 {
-       if (gnttab_end_foreign_access_ref(ref)) {
+       int need_mod_timer;
+       struct page *page_struct;
+
+       if (_gnttab_end_foreign_access_ref(ref)) {
                put_free_entry(ref);
                if (page != 0)
                        free_page(page);
        } else {
-               /* XXX This needs to be fixed so that the ref and page are
-                  placed on a list to be freed up later. */
-               printk(KERN_DEBUG
-                      "WARNING: leaking g.e. and page still in use!\n");
+               spin_lock_bh(&gnttab_pending_free_lock);
+               if (page == 0) {
+                       if (gnttab_pending_free_gref_head == GNTTAB_LIST_END)
+                               need_mod_timer = 1;
+                       else
+                               need_mod_timer = 0;
+                       gnttab_entry(ref) = gnttab_pending_free_gref_head;
+                       gnttab_pending_free_gref_head = ref;
+               } else {
+                       need_mod_timer =
+                               list_empty(&gnttab_pending_free_pages);
+                       page_struct = virt_to_page((void *)page);
+                       page_struct->index = ref;
+                       list_add_tail(&page_struct->lru,
+                                     &gnttab_pending_free_pages);
+               }
+               spin_unlock_bh(&gnttab_pending_free_lock);
+               if (need_mod_timer)
+                       mod_timer(&gnttab_delayed_free_timer, jiffies + HZ);
        }
 }
 EXPORT_SYMBOL_GPL(gnttab_end_foreign_access);
--
1.6.3.1
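As a usage sketch (illustrative, not from the patch): with the deferred-free path in place, a frontend can revoke a grant unconditionally and let the timer reclaim it. "otherend_id" is a hypothetical remote domain id, and virt_to_mfn() is the usual Xen-tree helper.

#include <linux/gfp.h>
#include <xen/gnttab.h>

static void example_grant_then_revoke(domid_t otherend_id)
{
       unsigned long page = __get_free_page(GFP_KERNEL);
       int ref;

       if (!page)
               return;
       ref = gnttab_grant_foreign_access(otherend_id,
                                         virt_to_mfn(page), 0);
       if (ref < 0) {
               free_page(page);
               return;
       }

       /* ... hand ref to the remote domain, do I/O ... */

       /* If the remote end still has the grant mapped, this now
          queues (ref, page) on the pending-free list and a timer
          retries once a second, instead of leaking both. */
       gnttab_end_foreign_access(ref, page);
}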
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 05/22] Introduce support for version 2 grant tables. Use them by default when available.
This doesn't include any of the new features, like copy grants or transitive grants, but it does include most of the V2 infrastructure.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c           |  296 ++++++++++++++++++++++++++------
 include/xen/interface/grant_table.h |  131 +++++++++++++-
 2 files changed, 368 insertions(+), 59 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index dd93f43..5484e7e 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -53,7 +53,9 @@
 /* External tools reserve first few grant table entries. */
 #define NR_RESERVED_ENTRIES 8
 #define GNTTAB_LIST_END 0xffffffff
-#define ENTRIES_PER_GRANT_FRAME (PAGE_SIZE / sizeof(grant_entry_t))
+#define ENTRIES_PER_GRANT_FRAME (grant_table_version == 1 ? \
+       (PAGE_SIZE / sizeof(grant_entry_v1_t)) : \
+       (PAGE_SIZE / sizeof(grant_entry_v2_t)))
 
 static void pending_free_timer(unsigned long ignore);
 
@@ -68,10 +70,18 @@ static DEFINE_TIMER(gnttab_delayed_free_timer, pending_free_timer, 0, 0);
 static DEFINE_SPINLOCK(gnttab_pending_free_lock);
 static DEFINE_SPINLOCK(gnttab_list_lock);
 
-static struct grant_entry *shared;
+static union {
+       grant_entry_v1_t *v1;
+       grant_entry_v2_t *v2;
+       void *raw;
+} shared;
+
+static grant_status_t *grstatus;
 
 static struct gnttab_free_callback *gnttab_free_callback_list;
 
+static int grant_table_version;
+
 static int gnttab_expand(unsigned int req_entries);
 
 #define RPP (PAGE_SIZE / sizeof(grant_ref_t))
@@ -80,6 +90,11 @@ static int gnttab_expand(unsigned int req_entries);
 #define nr_freelist_frames(grant_frames) \
        (((grant_frames) * ENTRIES_PER_GRANT_FRAME + RPP - 1) / RPP)
 
+#define SPP (PAGE_SIZE / sizeof(grant_status_t))
+#define nr_status_frames(grant_frames) \
+       (((grant_frames) * ENTRIES_PER_GRANT_FRAME + SPP - 1) / SPP)
+
+
 static int get_free_entries(int count)
 {
        unsigned long flags;
@@ -167,11 +182,19 @@ EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
                                     unsigned long frame, int flags)
 {
-       shared[ref].frame = frame;
-       shared[ref].domid = domid;
-       wmb();
-       BUG_ON(flags & (GTF_accept_transfer | GTF_reading | GTF_writing));
-       shared[ref].flags = GTF_permit_access | flags;
+       BUG_ON(flags & (GTF_accept_transfer | GTF_reading |
+                       GTF_writing | GTF_sub_page));
+       if (grant_table_version == 1) {
+               shared.v1[ref].frame = frame;
+               shared.v1[ref].domid = domid;
+               wmb();
+               shared.v1[ref].flags = GTF_permit_access | flags;
+       } else {
+               shared.v2[ref].frame = frame;
+               shared.v2[ref].hdr.domid = domid;
+               wmb();
+               shared.v2[ref].hdr.flags = GTF_permit_access | flags;
+       }
 }
 EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_ref);
 
@@ -180,7 +203,10 @@ int gnttab_query_foreign_access(grant_ref_t ref)
 {
        u16 nflags;
 
-       nflags = shared[ref].flags;
+       if (grant_table_version == 1)
+               nflags = shared.v1[ref].flags;
+       else
+               nflags = grstatus[ref];
 
        return (nflags & (GTF_reading|GTF_writing));
 }
@@ -189,29 +215,48 @@ EXPORT_SYMBOL_GPL(gnttab_query_foreign_access);
 static int _gnttab_end_foreign_access_ref(grant_ref_t ref)
 {
        u16 flags, nflags;
-
-       nflags = shared[ref].flags;
-       do {
-               if ((flags = nflags) & (GTF_reading|GTF_writing)) {
-                       printk(KERN_DEBUG "WARNING: g.e. still in use!\n");
+       u16 *pflags;
+
+       if (grant_table_version == 1) {
+               pflags = &shared.v1[ref].flags;
+               nflags = *pflags;
+               do {
+                       if ((flags = nflags) & (GTF_reading|GTF_writing)) {
+                               return 0;
+                       }
+               } while ((nflags = synch_cmpxchg_subword(pflags, flags, 0)) !=
+                        flags);
+               return 1;
+       } else {
+               shared.v2[ref].hdr.flags = 0;
+               mb();
+               if (grstatus[ref] & (GTF_reading|GTF_writing)) {
                        return 0;
+               } else {
+                       /* The read of grstatus needs to have acquire
+                          semantics.  On x86, reads already have
+                          that, and we just need to protect against
+                          compiler reorderings.  On other
+                          architectures we may need a full
+                          barrier. */
+#ifdef CONFIG_X86
+                       barrier();
+#else
+                       mb();
+#endif
+                       return 1;
                }
-       } while ((nflags = synch_cmpxchg_subword(&shared[ref].flags, flags, 0)) !=
-                flags);
-
-       return 1;
+       }
 }
 
-int gnttab_end_foreign_access_ref(grant_ref_t ref)
+int gnttab_end_foreign_access_ref(grant_ref_t gref)
 {
-       int r;
-
-       r = _gnttab_end_foreign_access_ref(ref);
-       if (!r)
+       int res;
+       res = _gnttab_end_foreign_access_ref(gref);
+       if (res == 0)
                printk(KERN_DEBUG "WARNING: g.e. still in use!\n");
-       return r;
+       return res;
 }
-
 EXPORT_SYMBOL_GPL(gnttab_end_foreign_access_ref);
@@ -315,37 +360,53 @@ EXPORT_SYMBOL_GPL(gnttab_grant_foreign_transfer);
 void gnttab_grant_foreign_transfer_ref(grant_ref_t ref, domid_t domid,
                                       unsigned long pfn)
 {
-       shared[ref].frame = pfn;
-       shared[ref].domid = domid;
-       wmb();
-       shared[ref].flags = GTF_accept_transfer;
+       if (grant_table_version == 1) {
+               shared.v1[ref].frame = pfn;
+               shared.v1[ref].domid = domid;
+               wmb();
+               shared.v1[ref].flags = GTF_accept_transfer;
+       } else {
+               shared.v2[ref].frame = pfn;
+               shared.v2[ref].hdr.domid = domid;
+               wmb();
+               shared.v2[ref].hdr.flags = GTF_accept_transfer;
+       }
 }
 EXPORT_SYMBOL_GPL(gnttab_grant_foreign_transfer_ref);
 
 unsigned long gnttab_end_foreign_transfer_ref(grant_ref_t ref)
 {
        unsigned long frame;
-       u16 flags;
+       u16 flags;
+       u16 *pflags;
+
+       if (grant_table_version == 1)
+               pflags = &shared.v1[ref].flags;
+       else
+               pflags = &shared.v2[ref].hdr.flags;
 
        /*
         * If a transfer is not even yet started, try to reclaim the grant
         * reference and return failure (== 0).
         */
-       while (!((flags = shared[ref].flags) & GTF_transfer_committed)) {
-               if (synch_cmpxchg_subword(&shared[ref].flags, flags, 0) == flags)
+       while (!((flags = *pflags) & GTF_transfer_committed)) {
+               if (synch_cmpxchg_subword(pflags, flags, 0) == flags)
                        return 0;
                cpu_relax();
        }
 
        /* If a transfer is in progress then wait until it is completed. */
        while (!(flags & GTF_transfer_completed)) {
-               flags = shared[ref].flags;
+               flags = *pflags;
                cpu_relax();
        }
 
        /* Read the frame number /after/ reading completion status. */
        rmb();
-       frame = shared[ref].frame;
+       if (grant_table_version == 1)
+               frame = shared.v1[ref].frame;
+       else
+               frame = shared.v2[ref].frame;
        BUG_ON(frame == 0);
 
        return frame;
@@ -519,6 +580,30 @@ static inline unsigned int max_nr_grant_frames(void)
        return xen_max;
 }
 
+static void gnttab_request_version(void)
+{
+       int rc;
+       struct gnttab_set_version gsv;
+
+       gsv.version = 2;
+       rc = HYPERVISOR_grant_table_op(GNTTABOP_set_version, &gsv, 1);
+       if (rc == 0) {
+               grant_table_version = 2;
+               printk(KERN_NOTICE "Using V2 grant tables.\n");
+       } else {
+               if (grant_table_version == 2) {
+                       /* If we've already used version 2 features,
+                          but then suddenly discover that they're not
+                          available (e.g. migrating to an older
+                          version of Xen), almost unbounded badness
+                          can happen. */
+                       panic("we need grant tables version 2, but only version 1 is available");
+               }
+               grant_table_version = 1;
+               printk(KERN_WARNING "Using legacy V1 grant tables; upgrade to a newer version of Xen.\n");
+       }
+}
+
 #ifdef CONFIG_XEN
 
 static DEFINE_SEQLOCK(gnttab_dma_lock);
@@ -534,6 +619,16 @@ static int map_pte_fn(pte_t *pte, struct page *pmd_page,
        return 0;
 }
 
+static int map_pte_fn_status(pte_t *pte, struct page *pmd_page,
+                            unsigned long addr, void *data)
+{
+       uint64_t **frames = (uint64_t **)data;
+
+       set_pte_at(&init_mm, addr, pte, pfn_pte_ma((*frames)[0], PAGE_KERNEL));
+       (*frames)++;
+       return 0;
+}
+
 #ifdef CONFIG_PM_SLEEP
 static int unmap_pte_fn(pte_t *pte, struct page *pmd_page,
                        unsigned long addr, void *data)
@@ -551,43 +646,95 @@ void *arch_gnttab_alloc_shared(unsigned long *frames)
        BUG_ON(area == NULL);
        return area->addr;
 }
+
+void *arch_gnttab_alloc_status(uint64_t *frames)
+{
+       struct vm_struct *area;
+       area = alloc_vm_area(PAGE_SIZE *
+                            nr_status_frames(boot_max_nr_grant_frames));
+       BUG_ON(area == NULL);
+       return area->addr;
+}
 #endif /* CONFIG_X86 */
 
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
        struct gnttab_setup_table setup;
-       unsigned long *frames;
+       unsigned long *gframes;
+       uint64_t *sframes;
        unsigned int nr_gframes = end_idx + 1;
+       unsigned int nr_sframes;
        int rc;
 
-       frames = kmalloc(nr_gframes * sizeof(unsigned long), GFP_ATOMIC);
-       if (!frames)
+       BUG_ON(grant_table_version == 0);
+
+       gframes = kmalloc(nr_gframes * sizeof(unsigned long), GFP_ATOMIC);
+       if (!gframes)
                return -ENOMEM;
 
-       setup.dom = DOMID_SELF;
-       setup.nr_frames = nr_gframes;
-       set_xen_guest_handle(setup.frame_list, frames);
+       setup.dom = DOMID_SELF;
+       setup.nr_frames = nr_gframes;
+       set_xen_guest_handle(setup.frame_list, gframes);
 
        rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &setup, 1);
        if (rc == -ENOSYS) {
-               kfree(frames);
+               kfree(gframes);
                return -ENOSYS;
        }
 
        BUG_ON(rc || setup.status);
 
-       if (shared == NULL)
-               shared = arch_gnttab_alloc_shared(frames);
+       if (shared.raw == NULL)
+               shared.raw = arch_gnttab_alloc_shared(gframes);
+
+       if (grant_table_version > 1) {
+               struct gnttab_get_status_frames getframes;
+
+               nr_sframes = nr_status_frames(nr_gframes);
+
+               sframes = kmalloc(nr_sframes * sizeof(uint64_t),
+                                 GFP_ATOMIC);
+               if (!sframes) {
+                       kfree(gframes);
+                       return -ENOMEM;
+               }
+               getframes.dom = DOMID_SELF;
+               getframes.nr_frames = nr_sframes;
+               getframes.frame_list = (unsigned long)sframes;
+
+               rc = HYPERVISOR_grant_table_op(GNTTABOP_get_status_frames,
+                                              &getframes, 1);
+               if (rc == -ENOSYS) {
+                       kfree(gframes);
+                       kfree(sframes);
+                       return -ENOSYS;
+               }
+
+               BUG_ON(rc || getframes.status);
+
+               if (grstatus == NULL)
+                       grstatus = arch_gnttab_alloc_status(sframes);
+       }
 
 #ifdef CONFIG_X86
-       rc = apply_to_page_range(&init_mm, (unsigned long)shared,
+       rc = apply_to_page_range(&init_mm, (unsigned long)shared.raw,
                                 PAGE_SIZE * nr_gframes,
-                                map_pte_fn, &frames);
+                                map_pte_fn, &gframes);
        BUG_ON(rc);
-       frames -= nr_gframes; /* adjust after map_pte_fn() */
+       gframes -= nr_gframes; /* adjust after map_pte_fn() */
+
+       if (grant_table_version > 1) {
+               rc = apply_to_page_range(&init_mm, (unsigned long)grstatus,
+                                        PAGE_SIZE * nr_sframes,
+                                        map_pte_fn_status, &sframes);
+               BUG_ON(rc);
+               sframes -= nr_sframes; /* adjust after map_pte_fn() */
+       }
 #endif /* CONFIG_X86 */
 
-       kfree(frames);
+       kfree(gframes);
+       if (grant_table_version > 1)
+               kfree(sframes);
 
        return 0;
 }
@@ -794,6 +941,7 @@ EXPORT_SYMBOL(gnttab_post_map_adjust);
 
 static int gnttab_resume(struct sys_device *dev)
 {
+       gnttab_request_version();
        if (max_nr_grant_frames() < nr_grant_frames)
                return -ENOSYS;
        return gnttab_map(0, nr_grant_frames - 1);
@@ -804,9 +952,12 @@ static int gnttab_resume(struct sys_device *dev)
 #ifdef CONFIG_X86
 static int gnttab_suspend(struct sys_device *dev, pm_message_t state)
 {
-       apply_to_page_range(&init_mm, (unsigned long)shared,
+       apply_to_page_range(&init_mm, (unsigned long)shared.raw,
                            PAGE_SIZE * nr_grant_frames,
                            unmap_pte_fn, NULL);
+       apply_to_page_range(&init_mm, (unsigned long)grstatus,
+                           PAGE_SIZE * nr_status_frames(nr_grant_frames),
+                           unmap_pte_fn, NULL);
        return 0;
 }
 #else
@@ -829,7 +980,8 @@ static struct sys_device device_gnttab = {
 
 #include <platform-pci.h>
 
-static unsigned long resume_frames;
+static unsigned long resume_frames_gnttab;
+static unsigned long resume_frames_status;
 
 static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 {
@@ -843,7 +995,24 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
                xatp.domid = DOMID_SELF;
                xatp.idx = i;
                xatp.space = XENMAPSPACE_grant_table;
-               xatp.gpfn = (resume_frames >> PAGE_SHIFT) + i;
+               xatp.gpfn = (resume_frames_gnttab >> PAGE_SHIFT) + i;
+               if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
+                       BUG();
+       } while (i-- > start_idx);
+
+       return 0;
+}
+
+static int gnttab_map_status(unsigned int start_idx, unsigned int end_idx)
+{
+       struct xen_add_to_physmap xatp;
+       unsigned int i = end_idx;
+
+       do {
+               xatp.domid = DOMID_SELF;
+               xatp.idx = i | XENMAPIDX_grant_table_status;
+               xatp.space = XENMAPSPACE_grant_table;
+               xatp.gpfn = (resume_frames_status >> PAGE_SHIFT) + i;
                if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
                        BUG();
        } while (i-- > start_idx);
@@ -854,16 +1023,21 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 int gnttab_resume(void)
 {
        unsigned int max_nr_gframes, nr_gframes;
+       unsigned int max_nr_sframes, nr_sframes;
+
+       gnttab_request_version();
 
        nr_gframes = nr_grant_frames;
        max_nr_gframes = max_nr_grant_frames();
        if (max_nr_gframes < nr_gframes)
                return -ENOSYS;
 
-       if (!resume_frames) {
-               resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
-               shared = ioremap(resume_frames, PAGE_SIZE * max_nr_gframes);
-               if (shared == NULL) {
+       if (!resume_frames_gnttab) {
+               resume_frames_gnttab =
+                       alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+               shared.raw = ioremap(resume_frames_gnttab,
+                                    PAGE_SIZE * max_nr_gframes);
+               if (shared.raw == NULL) {
                        printk("error to ioremap gnttab share frames\n");
                        return -1;
                }
@@ -871,6 +1045,22 @@ int gnttab_resume(void)
 
        gnttab_map(0, nr_gframes - 1);
 
+       if (grant_table_version > 1) {
+               nr_sframes = nr_status_frames(nr_gframes);
+               max_nr_sframes = nr_status_frames(max_nr_gframes);
+               if (!resume_frames_status) {
+                       resume_frames_status =
+                               alloc_xen_mmio(PAGE_SIZE * max_nr_sframes);
+                       grstatus = ioremap(resume_frames_status,
+                                          PAGE_SIZE * max_nr_sframes);
+                       if (grstatus == NULL) {
+                               printk("error ioremap()ing gnttab status frames\n");
+                               return -1;
+                       }
+               }
+
+               gnttab_map_status(0, nr_sframes - 1);
+       }
       return 0;
 }
diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index c5c2044..1c3a7c7 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -84,12 +84,22 @@
  */
 
 /*
+ * Reference to a grant entry in a specified domain's grant table.
+ */
+typedef uint32_t grant_ref_t;
+
+/*
  * A grant table comprises a packed array of grant entries in one or more
  * page frames shared between Xen and a guest.
  * [XEN]: This field is written by Xen and read by the sharing guest.
  * [GST]: This field is written by the guest and read by Xen.
  */
-struct grant_entry {
+
+/*
+ * Version 1 of the grant table entry structure is maintained purely
+ * for backwards compatibility.  New guests should use version 2.
+ */
+struct grant_entry_v1 {
    /* GTF_xxx: various type and flag information. [XEN,GST] */
    uint16_t flags;
    /* The domain being granted foreign privileges. [GST] */
@@ -100,7 +110,7 @@ struct grant_entry {
     */
    uint32_t frame;
 };
-typedef struct grant_entry grant_entry_t;
+typedef struct grant_entry_v1 grant_entry_v1_t;
 
 /*
  * Type of grant entry.
@@ -108,10 +118,13 @@ typedef struct grant_entry grant_entry_t;
  *  GTF_permit_access: Allow @domid to map/access @frame.
  *  GTF_accept_transfer: Allow @domid to transfer ownership of one page frame
  *                       to this guest. Xen writes the page number to @frame.
+ *  GTF_transitive: Allow @domid to transitively access a subrange of
+ *                  @trans_grant in @trans_domid.  No mappings are allowed.
  */
 #define GTF_invalid         (0U<<0)
 #define GTF_permit_access   (1U<<0)
 #define GTF_accept_transfer (2U<<0)
+#define GTF_transitive      (3U<<0)
 #define GTF_type_mask       (3U<<0)
 
 /*
@@ -120,6 +133,9 @@ typedef struct grant_entry grant_entry_t;
  *  GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
  *  GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
  *  GTF_PAT, GTF_PWT, GTF_PCD: (x86) cache attribute flags for the grant [GST]
+ *  GTF_sub_page: Grant access to only a subrange of the page.  @domid
+ *                will only be allowed to copy from the grant, and not
+ *                map it. [GST]
  */
 #define _GTF_readonly       (2)
 #define GTF_readonly        (1U<<_GTF_readonly)
@@ -133,6 +149,8 @@ typedef struct grant_entry grant_entry_t;
 #define GTF_PCD             (1U<<_GTF_PCD)
 #define _GTF_PAT            (7)
 #define GTF_PAT             (1U<<_GTF_PAT)
+#define _GTF_sub_page       (8)
+#define GTF_sub_page        (1U<<_GTF_sub_page)
 
 /*
  * Subflags for GTF_accept_transfer:
@@ -149,15 +167,76 @@ typedef struct grant_entry grant_entry_t;
 #define _GTF_transfer_completed (3)
 #define GTF_transfer_completed  (1U<<_GTF_transfer_completed)
 
+/*
+ * Version 2 grant table entries.  These fulfil the same role as
+ * version 1 entries, but can represent more complicated operations.
+ * Any given domain will have either a version 1 or a version 2 table,
+ * and every entry in the table will be the same version.
+ *
+ * The interface by which domains use grant references does not depend
+ * on the grant table version in use by the other domain.
+ */
 
-/***********************************
- * GRANT TABLE QUERIES AND USES
+/*
+ * Version 1 and version 2 grant entries share a common prefix.  The
+ * fields of the prefix are documented as part of struct
+ * grant_entry_v1.
  */
+struct grant_entry_header {
+    uint16_t flags;
+    domid_t  domid;
+};
+typedef struct grant_entry_header grant_entry_header_t;
 
 /*
- * Reference to a grant entry in a specified domain's grant table.
+ * Version 2 of the grant entry structure.
+ */
+struct grant_entry_v2 {
+    grant_entry_header_t hdr;
+    union {
+        /*
+         * The frame to which we are granting access.  This field has
+         * the same meaning as the grant_entry_v1 field of the same
+         * name.
+         */
+        uint32_t frame;
+
+        /*
+         * If the grant type is GTF_grant_access and GTF_sub_page is
+         * set, @domid is allowed to access bytes
+         * [@page_off,@page_off+@length) in frame @frame.
+         */
+        struct {
+            uint32_t frame;
+            uint16_t page_off;
+            uint16_t length;
+        } sub_page;
+
+        /*
+         * If the grant is GTF_transitive, @domid is allowed to use
+         * the grant @gref in domain @trans_domid, as if it was the
+         * local domain.  Obviously, the transitive access must be
+         * compatible with the original grant.
+         *
+         * The current version of Xen does not allow transitive grants
+         * to be mapped.
+         */
+        struct {
+            domid_t trans_domid;
+            uint16_t pad0;
+            grant_ref_t gref;
+        } transitive;
+
+        uint32_t __spacer[3]; /* Pad to a power of two */
+    };
+};
+typedef struct grant_entry_v2 grant_entry_v2_t;
+
+typedef uint16_t grant_status_t;
+
+/***********************************
+ * GRANT TABLE QUERIES AND USES
  */
-typedef uint32_t grant_ref_t;
 
 /*
  * Handle to track a mapping created via a grant reference.
@@ -365,6 +444,46 @@ struct gnttab_unmap_and_replace {
 typedef struct gnttab_unmap_and_replace gnttab_unmap_and_replace_t;
 DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t);
 
+/*
+ * GNTTABOP_set_version: Request a particular version of the grant
+ * table shared table structure.  This operation can only be performed
+ * once in any given domain.  It must be performed before any grants
+ * are activated; otherwise, the domain will be stuck with version 1.
+ * The only defined versions are 1 and 2.
+ */
+#define GNTTABOP_set_version 8
+struct gnttab_set_version {
+    /* IN parameters */
+    uint32_t version;
+};
+DEFINE_XEN_GUEST_HANDLE_STRUCT(gnttab_set_version);
+typedef struct gnttab_set_version gnttab_set_version_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_set_version_t);
+
+/*
+ * GNTTABOP_get_status_frames: Get the list of frames used to store grant
+ * status for <dom>. In grant format version 2, the status is separated
+ * from the other shared grant fields to allow more efficient synchronization
+ * using barriers instead of atomic cmpexch operations.
+ * <nr_frames> specify the size of vector <frame_list>.
+ * The frame addresses are returned in the <frame_list>.
+ * Only <nr_frames> addresses are returned, even if the table is larger.
+ * NOTES:
+ *  1. <dom> may be specified as DOMID_SELF.
+ *  2. Only a sufficiently-privileged domain may specify <dom> != DOMID_SELF.
+ */
+#define GNTTABOP_get_status_frames 9
+struct gnttab_get_status_frames {
+    /* IN parameters. */
+    uint32_t nr_frames;
+    domid_t  dom;
+    /* OUT parameters. */
+    int16_t  status;            /* GNTST_* */
+    uint64_t frame_list;
+};
+DEFINE_XEN_GUEST_HANDLE_STRUCT(gnttab_get_status_frames);
+typedef struct gnttab_get_status_frames gnttab_get_status_frames_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_get_status_frames_t);
 
 /*
  * Bitfield values for gnttab_map_grant_ref.flags.
--
1.6.3.1
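To make the negotiation concrete, this is the shape of the version handshake that gnttab_request_version() performs at boot and resume, shown standalone for clarity (illustrative; the driver above does this for you):

#include <xen/interface/grant_table.h>

static int example_negotiate_v2(void)
{
       struct gnttab_set_version gsv = { .version = 2 };

       if (HYPERVISOR_grant_table_op(GNTTABOP_set_version, &gsv, 1) == 0)
               return 2;       /* grant_entry_v2_t array, plus separate
                                  grant_status_t frames via
                                  GNTTABOP_get_status_frames */
       return 1;               /* old hypervisor: grant_entry_v1_t only */
}

The design point behind the split status array: under V1, ending a grant needs a 16-bit cmpxchg loop because Xen updates the GTF_reading/GTF_writing bits in the same flags word the guest writes. Under V2 those bits live in grstatus[], so the guest can simply clear the flags, issue a full barrier, and read the status word with ordinary loads.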
<steven.smith@citrix.com>
2009-Oct-04 15:03 UTC
[Xen-devel] [PATCH 06/22] Add support for copy-only (sub-page) grants. These are like normal access grants, except:
-- They can't be used to map the page (so can only be used in a GNTTABOP_copy hypercall).

-- It's possible to grant access with a finer granularity than whole pages.

-- Xen guarantees that they can be revoked quickly (a normal map grant can only be revoked with the cooperation of the domain which has been granted access).

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c |   38 ++++++++++++++++++++++++++++++++++++++
 include/xen/gnttab.h      |   16 ++++++++++++++++
 2 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index 5484e7e..b32dd5d 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -198,6 +198,44 @@ void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 }
 EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_ref);
 
+int gnttab_grant_foreign_access_subpage(domid_t domid, unsigned long frame,
+                                       int flags, unsigned page_off,
+                                       unsigned length)
+{
+       int ref;
+
+       if (unlikely((ref = get_free_entry()) < 0))
+               return -ENOSPC;
+
+       gnttab_grant_foreign_access_ref_subpage(ref, domid, frame, flags,
+                                               page_off, length);
+
+       return ref;
+}
+EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_subpage);
+
+void gnttab_grant_foreign_access_ref_subpage(grant_ref_t ref, domid_t domid,
+                                            unsigned long frame, int flags,
+                                            unsigned page_off,
+                                            unsigned length)
+{
+       BUG_ON(flags & (GTF_accept_transfer | GTF_reading |
+                       GTF_writing | GTF_sub_page | GTF_permit_access));
+       BUG_ON(grant_table_version == 1);
+       shared.v2[ref].sub_page.frame = frame;
+       shared.v2[ref].sub_page.page_off = page_off;
+       shared.v2[ref].sub_page.length = length;
+       shared.v2[ref].hdr.domid = domid;
+       wmb();
+       shared.v2[ref].hdr.flags = GTF_permit_access | GTF_sub_page | flags;
+}
+EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_ref_subpage);
+
+int gnttab_subpage_grants_available(void)
+{
+       return grant_table_version == 2;
+}
+EXPORT_SYMBOL_GPL(gnttab_subpage_grants_available);
 
 int gnttab_query_foreign_access(grant_ref_t ref)
 {
diff --git a/include/xen/gnttab.h b/include/xen/gnttab.h
index bde65fd..90201da 100644
--- a/include/xen/gnttab.h
+++ b/include/xen/gnttab.h
@@ -54,6 +54,18 @@ struct gnttab_free_callback {
 
 int gnttab_grant_foreign_access(domid_t domid, unsigned long frame,
                                int flags);
+int gnttab_grant_foreign_access_subpage(domid_t domid, unsigned long frame,
+                                       int flags, unsigned page_off,
+                                       unsigned length);
+
+
+/*
+ * Are sub-page grants available on this version of Xen?  Returns 1 if
+ * they are, and 0 if they're not.
+ */
+int gnttab_subpage_grants_available(void);
+
+
 /*
  * End access through the given grant reference, iff the grant entry is no
  * longer in use.  Return 1 if the grant entry was freed, 0 if it is still in
@@ -98,6 +110,10 @@ void gnttab_cancel_free_callback(struct gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
                                     unsigned long frame, int flags);
 
+void gnttab_grant_foreign_access_ref_subpage(grant_ref_t ref, domid_t domid,
+                                            unsigned long frame, int flags,
+                                            unsigned page_off,
+                                            unsigned length);
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
                                       unsigned long pfn);
--
1.6.3.1
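A short usage sketch (illustrative, not from the patch): grant a peer read-only copy access to a 512-byte region of one page. "peer" and "buf" are hypothetical, and virt_to_mfn() is the usual Xen-tree helper.

#include <linux/errno.h>
#include <xen/gnttab.h>

static int example_subpage_grant(domid_t peer, void *buf)
{
       int ref;

       if (!gnttab_subpage_grants_available())
               return -ENOSYS;         /* V1 tables: no sub-page grants */

       /* Bytes 64..575 of the page only. */
       ref = gnttab_grant_foreign_access_subpage(peer, virt_to_mfn(buf),
                                                 GTF_readonly, 64, 512);

       /* The peer may now use this ref as source.u.ref in a
          GNTTABOP_copy with GNTCOPY_source_gref; any attempt to map
          the grant will fail. */
       return ref;
}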
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 07/22] Add support for transitive grants.
These allow a domain A which has been granted access to a page of domain B's memory to issue domain C with a copy-grant on the same page. This is useful e.g. for forwarding packets between domains.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c |   32 ++++++++++++++++++++++++++++++++
 include/xen/gnttab.h      |    8 ++++++++
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index b32dd5d..02790fe 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -237,6 +237,38 @@ int gnttab_subpage_grants_available(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_subpage_grants_available);
 
+int gnttab_grant_foreign_access_trans(domid_t domid, int flags,
+                                     domid_t trans_domid,
+                                     grant_ref_t trans_gref)
+{
+       int ref;
+
+       if (unlikely((ref = get_free_entry()) < 0))
+               return -ENOSPC;
+
+       gnttab_grant_foreign_access_ref_trans(ref, domid, flags,
+                                             trans_domid, trans_gref);
+
+       return ref;
+}
+EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_trans);
+
+void gnttab_grant_foreign_access_ref_trans(grant_ref_t ref, domid_t domid,
+                                          int flags,
+                                          domid_t trans_domid,
+                                          grant_ref_t trans_gref)
+{
+       BUG_ON(flags & (GTF_accept_transfer | GTF_reading |
+                       GTF_writing | GTF_sub_page | GTF_permit_access));
+       BUG_ON(grant_table_version == 1);
+       shared.v2[ref].transitive.trans_domid = trans_domid;
+       shared.v2[ref].transitive.gref = trans_gref;
+       shared.v2[ref].hdr.domid = domid;
+       wmb();
+       shared.v2[ref].hdr.flags = GTF_permit_access | GTF_transitive | flags;
+}
+EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_ref_trans);
+
 int gnttab_query_foreign_access(grant_ref_t ref)
 {
        u16 nflags;
diff --git a/include/xen/gnttab.h b/include/xen/gnttab.h
index 90201da..30be437 100644
--- a/include/xen/gnttab.h
+++ b/include/xen/gnttab.h
@@ -58,6 +58,10 @@ int gnttab_grant_foreign_access_subpage(domid_t domid, unsigned long frame,
                                        int flags, unsigned page_off,
                                        unsigned length);
+void gnttab_grant_foreign_access_ref_trans(grant_ref_t ref, domid_t domid,
+                                          int flags,
+                                          domid_t trans_domid,
+                                          grant_ref_t trans_gref);
 
 /*
  * Are sub-page grants available on this version of Xen?  Returns 1 if
@@ -114,6 +118,10 @@ void gnttab_grant_foreign_access_ref_subpage(grant_ref_t ref, domid_t domid,
                                             unsigned long frame, int flags,
                                             unsigned page_off,
                                             unsigned length);
+void gnttab_grant_foreign_access_ref_trans(grant_ref_t ref, domid_t domid,
+                                          int flags,
+                                          domid_t trans_domid,
+                                          grant_ref_t trans_gref);
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
                                       unsigned long pfn);
--
1.6.3.1
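A sketch of the forwarding case the commit message describes (illustrative; "domB", "domC" and "trans_gref" are hypothetical): we are domain A, holding trans_gref, a grant that domain B issued to us, and we give domain C copy-only access to the same page without staging the data through our own memory.

#include <xen/gnttab.h>

static int example_forward_grant(domid_t domC, domid_t domB,
                                grant_ref_t trans_gref)
{
       /* Returns a new gref for domC, or -ENOSPC if the table is
          full.  domC can use it in GNTTABOP_copy; Xen resolves the
          chain through domB's table.  Mapping it is not allowed. */
       return gnttab_grant_foreign_access_trans(domC, GTF_readonly,
                                                domB, trans_gref);
}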
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 08/22] Extend the grant tables implementation with an improved allocation batching mechanism.
The current batched allocation mechanism only allows grefs to be withdrawn from the pre-allocated pool one at a time; the new scheme allows them to be withdrawn in groups. There aren't currently any users of this facility, but it will simplify some of the NC2 logic (coming up shortly).

Signed-off-by: Steven Smith <steven.smith@citrix.com>
---
 drivers/xen/core/gnttab.c |   35 +++++++++++++++++++++++++++++++++++
 include/xen/gnttab.h      |    5 +++++
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/core/gnttab.c b/drivers/xen/core/gnttab.c
index 02790fe..d5caf21 100644
--- a/drivers/xen/core/gnttab.c
+++ b/drivers/xen/core/gnttab.c
@@ -518,6 +518,41 @@ void gnttab_free_grant_references(grant_ref_t head)
 }
 EXPORT_SYMBOL_GPL(gnttab_free_grant_references);
 
+int gnttab_suballoc_grant_references(u16 count, grant_ref_t *old_head,
+                                    grant_ref_t *new_head)
+{
+       grant_ref_t cursor;
+       unsigned nr_allocated;
+
+       *new_head = cursor = *old_head;
+       if (cursor == GNTTAB_LIST_END)
+               return -ENOSPC;
+       nr_allocated = 1;
+       while (nr_allocated < count) {
+               cursor = gnttab_entry(cursor);
+               if (cursor == GNTTAB_LIST_END)
+                       return -ENOSPC;
+               nr_allocated++;
+       }
+       *old_head = gnttab_entry(cursor);
+       gnttab_entry(cursor) = GNTTAB_LIST_END;
+       return 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_suballoc_grant_references);
+
+void gnttab_subfree_grant_references(grant_ref_t head, grant_ref_t *pool)
+{
+       grant_ref_t cursor;
+
+       for (cursor = head;
+            gnttab_entry(cursor) != GNTTAB_LIST_END;
+            cursor = gnttab_entry(cursor))
+               ;
+       gnttab_entry(cursor) = *pool;
+       *pool = head;
+}
+EXPORT_SYMBOL_GPL(gnttab_subfree_grant_references);
+
 int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
 {
        int h = get_free_entries(count);
diff --git a/include/xen/gnttab.h b/include/xen/gnttab.h
index 30be437..b4610d9 100644
--- a/include/xen/gnttab.h
+++ b/include/xen/gnttab.h
@@ -97,10 +97,15 @@ int gnttab_query_foreign_access(grant_ref_t ref);
  */
 int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
 
+int gnttab_suballoc_grant_references(u16 count, grant_ref_t *old_head,
+                                    grant_ref_t *new_head);
+
 void gnttab_free_grant_reference(grant_ref_t ref);
 
 void gnttab_free_grant_references(grant_ref_t head);
 
+void gnttab_subfree_grant_references(grant_ref_t head, grant_ref_t *pool);
+
 int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
 
 int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);
--
1.6.3.1
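A minimal sketch of how the batched withdrawal might be used (illustrative only; since the commit message notes there are no in-tree users yet, the claim-from-batch step is my reading of the freelist encoding, not something the patch does):

#include <linux/errno.h>
#include <xen/gnttab.h>

static int example_batched_grefs(void)
{
       grant_ref_t pool, batch;

       if (gnttab_alloc_grant_references(256, &pool) < 0)
               return -ENOSPC;

       /* Split a 16-entry sub-list off the front of the pool in one
          operation, rather than claiming 16 refs individually. */
       if (gnttab_suballoc_grant_references(16, &pool, &batch) == 0) {
               /* ... claim refs from "batch" with
                  gnttab_claim_grant_reference(&batch) as usual ... */

               /* Merge whatever is left back into the pool. */
               gnttab_subfree_grant_references(batch, &pool);
       }
       gnttab_free_grant_references(pool);
       return 0;
}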
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 09/22] Add a very basic netchannel2 implementation.
This is functional, in the sense that packets can be sent and received, but lacks any advanced features. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/Kconfig | 24 + drivers/xen/Makefile | 1 + drivers/xen/netchannel2/Makefile | 12 + drivers/xen/netchannel2/chan.c | 659 ++++++++++++++++++++++++ drivers/xen/netchannel2/netback2.c | 354 +++++++++++++ drivers/xen/netchannel2/netchan2.c | 32 ++ drivers/xen/netchannel2/netchannel2_core.h | 351 +++++++++++++ drivers/xen/netchannel2/netchannel2_endpoint.h | 63 +++ drivers/xen/netchannel2/netfront2.c | 488 ++++++++++++++++++ drivers/xen/netchannel2/recv_packet.c | 216 ++++++++ drivers/xen/netchannel2/rscb.c | 385 ++++++++++++++ drivers/xen/netchannel2/util.c | 230 +++++++++ drivers/xen/netchannel2/xmit_packet.c | 318 ++++++++++++ include/xen/interface/io/netchannel2.h | 106 ++++ include/xen/interface/io/uring.h | 426 +++++++++++++++ 15 files changed, 3665 insertions(+), 0 deletions(-) create mode 100644 drivers/xen/netchannel2/Makefile create mode 100644 drivers/xen/netchannel2/chan.c create mode 100644 drivers/xen/netchannel2/netback2.c create mode 100644 drivers/xen/netchannel2/netchan2.c create mode 100644 drivers/xen/netchannel2/netchannel2_core.h create mode 100644 drivers/xen/netchannel2/netchannel2_endpoint.h create mode 100644 drivers/xen/netchannel2/netfront2.c create mode 100644 drivers/xen/netchannel2/recv_packet.c create mode 100644 drivers/xen/netchannel2/rscb.c create mode 100644 drivers/xen/netchannel2/util.c create mode 100644 drivers/xen/netchannel2/xmit_packet.c create mode 100644 include/xen/interface/io/netchannel2.h create mode 100644 include/xen/interface/io/uring.h diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index ed4b89b..a081b73 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -210,6 +210,30 @@ config XEN_SCSI_FRONTEND The SCSI frontend driver allows the kernel to access SCSI Devices within another guest OS. +config XEN_NETCHANNEL2 + tristate "Net channel 2 support" + depends on XEN && NET + default y + help + Xen netchannel2 driver support. This allows a domain to act as + either the backend or frontend part of a netchannel2 connection. + Unless you are building a dedicated device-driver domain, you + almost certainly want to say Y here. + + If you say Y or M here, you should also say Y to one or both of + ``Net channel2 backend support'''' and ``Net channel2 frontend + support'''', below. 
+ +config XEN_NETDEV2_BACKEND + bool "Net channel 2 backend support" + depends on XEN_BACKEND && XEN_NETCHANNEL2 + default XEN_BACKEND + +config XEN_NETDEV2_FRONTEND + bool "Net channel 2 frontend support" + depends on XEN_NETCHANNEL2 + default y + config XEN_GRANT_DEV tristate "User-space granted page access driver" default XEN_PRIVILEGED_GUEST diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 873e5a3..68eb231 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -30,4 +30,5 @@ obj-$(CONFIG_XEN_GRANT_DEV) += gntdev/ obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_UTIL) += sfc_netutil/ obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_FRONTEND) += sfc_netfront/ obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_BACKEND) += sfc_netback/ +obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2/ obj-$(CONFIG_XEN_ACPI_WMI_WRAPPER) += acpi-wmi/ diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile new file mode 100644 index 0000000..bdad6da --- /dev/null +++ b/drivers/xen/netchannel2/Makefile @@ -0,0 +1,12 @@ +obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2.o + +netchannel2-objs := chan.o netchan2.o rscb.o util.o \ + xmit_packet.o recv_packet.o + +ifeq ($(CONFIG_XEN_NETDEV2_BACKEND),y) +netchannel2-objs += netback2.o +endif + +ifeq ($(CONFIG_XEN_NETDEV2_FRONTEND),y) +netchannel2-objs += netfront2.o +endif diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c new file mode 100644 index 0000000..e3ad981 --- /dev/null +++ b/drivers/xen/netchannel2/chan.c @@ -0,0 +1,659 @@ +#include <linux/kernel.h> +#include <linux/kthread.h> +#include <linux/gfp.h> +#include <linux/etherdevice.h> +#include <linux/interrupt.h> +#include <linux/netdevice.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/delay.h> +#include <linux/version.h> +#include <xen/evtchn.h> +#include <xen/xenbus.h> + +#include "netchannel2_endpoint.h" +#include "netchannel2_core.h" + +static int process_ring(struct napi_struct *napi, + int work_avail); + +static irqreturn_t nc2_int(int irq, void *dev_id) +{ + struct netchannel2_ring_pair *ncr = dev_id; + + if (ncr->irq == -1) + return IRQ_HANDLED; + if (ncr->cons_ring.sring->prod != ncr->cons_ring.cons_pvt || + ncr->interface->is_stopped) + nc2_kick(ncr); + return IRQ_HANDLED; +} + +/* Process all incoming messages. The function is given an + IRQ-disabled reference for the interface, and must dispose of it + (either by enabling the IRQ or re-introducing it to the pending + list). Alternatively, the function can stop the ring being + processed again by leaking the reference (e.g. when the remote + endpoint is misbehaving). */ +/* Returns -1 if we used all the available work without finishing, or + the amount of work used otherwise. */ +static int process_messages(struct netchannel2_ring_pair *ncrp, + int work_avail, + struct sk_buff_head *pending_rx_queue) +{ + struct netchannel2_msg_hdr hdr; + RING_IDX prod; + struct netchannel2 *nc = ncrp->interface; + int work_done; + + work_done = 1; + +retry: + prod = ncrp->cons_ring.sring->prod; + rmb(); + while (work_done < work_avail && + prod != ncrp->cons_ring.cons_pvt) { + nc2_copy_from_ring(&ncrp->cons_ring, &hdr, sizeof(hdr)); + if (hdr.size < sizeof(hdr)) { + printk(KERN_WARNING "Other end sent too-small message (%d)\n", + hdr.size); + goto done; + } + if (hdr.size > ncrp->cons_ring.payload_bytes) { + /* This one message is bigger than the whole + ring -> other end is clearly misbehaving. + We won''t take any more messages from this + ring. 
*/ + printk(KERN_WARNING "Other end sent enormous message (%d > %zd)\n", + hdr.size, + ncrp->cons_ring.payload_bytes); + goto done; + } + + switch (hdr.type) { + case NETCHANNEL2_MSG_SET_MAX_PACKETS: + nc2_handle_set_max_packets_msg(ncrp, &hdr); + break; + case NETCHANNEL2_MSG_PACKET: + nc2_handle_packet_msg(nc, ncrp, &hdr, + pending_rx_queue); + break; + case NETCHANNEL2_MSG_FINISH_PACKET: + nc2_handle_finish_packet_msg(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_PAD: + break; + default: + /* Drop bad messages. We should arguably stop + processing the ring at this point, because + the ring is probably corrupt. However, if + it is corrupt then one of the other checks + will hit soon enough, and doing it this way + should make it a bit easier to add new + message types in future. */ + pr_debug("Bad message type %d from peer!\n", + hdr.type); + break; + } + hdr.size = (hdr.size + 7) & ~7; + ncrp->cons_ring.cons_pvt += hdr.size; + + work_done++; + if (work_done == work_avail) + return -1; + } + + if (unlikely(prod != ncrp->cons_ring.sring->prod)) + goto retry; + + /* Dispose of our IRQ-disable reference. */ +done: + napi_complete(&ncrp->napi); + enable_irq(ncrp->irq); + + if (nc2_final_check_for_messages(&ncrp->cons_ring, + prod)) { + /* More work to do still. */ + nc2_kick(ncrp); + } + + return work_done; +} + +/* Flush out all pending metadata messages on ring @ncrp, and then + update the ring pointers to indicate that we''ve done so. Fire the + event channel if necessary. */ +static void flush_rings(struct netchannel2_ring_pair *ncrp) +{ + int need_kick; + + flush_hypercall_batcher(&ncrp->pending_rx_hypercalls, + nc2_rscb_on_gntcopy_fail); + send_finish_packet_messages(ncrp); + if (ncrp->need_advertise_max_packets) + advertise_max_packets(ncrp); + + need_kick = 0; + if (nc2_finish_messages(&ncrp->cons_ring)) { + need_kick = 1; + /* If we need an event on the consumer ring, we always + need to notify the other end, even if we don''t have + any messages which would normally be considered + urgent. */ + ncrp->pending_time_sensitive_messages = 1; + } + if (nc2_flush_ring(&ncrp->prod_ring)) + need_kick = 1; + if (need_kick || + (ncrp->delayed_kick && ncrp->pending_time_sensitive_messages)) { + if (ncrp->pending_time_sensitive_messages) { + notify_remote_via_irq(ncrp->irq); + ncrp->delayed_kick = 0; + } else { + ncrp->delayed_kick = 1; + } + ncrp->pending_time_sensitive_messages = 0; + } +} + +/* Process incoming messages, and then flush outgoing metadata + * messages. We also try to unjam the xmit queue if any of the + * incoming messages would give us permission to send more stuff. */ +/* This is given an IRQ-disable reference, and must dispose of it. */ +static int nc2_poll(struct netchannel2_ring_pair *ncrp, int work_avail, + struct sk_buff_head *rx_queue) +{ + int work_done; + + if (!ncrp->is_attached) { + napi_complete(&ncrp->napi); + enable_irq(ncrp->irq); + return 0; + } + + work_done = process_messages(ncrp, work_avail, rx_queue); + + flush_rings(ncrp); + + if (work_done < 0) + return work_avail; + else + return work_done; +} + +/* Like skb_queue_purge(), but use release_tx_packet() rather than + kfree_skb() */ +void nc2_queue_purge(struct netchannel2_ring_pair *ncrp, + struct sk_buff_head *queue) +{ + struct sk_buff *skb; + + while (!skb_queue_empty(queue)) { + skb = skb_dequeue(queue); + release_tx_packet(ncrp, skb); + } +} + +/* struct net_device stop() method. 
 */
+static int nc2_stop(struct net_device *nd)
+{
+ struct netchannel2 *nc = netdev_priv(nd);
+
+ spin_lock_bh(&nc->rings.lock);
+ nc->stats.tx_dropped += skb_queue_len(&nc->pending_skbs);
+ nc2_queue_purge(&nc->rings, &nc->pending_skbs);
+ spin_unlock_bh(&nc->rings.lock);
+
+ return 0;
+}
+
+/* Kick a netchannel2 interface so that the poll() method runs
+ * soon. */
+/* This has semi release-like semantics, so you can set flags
+ lock-free and be guaranteed that the poll() method will eventually
+ run and see the flag set, without doing any explicit locking. */
+void nc2_kick(struct netchannel2_ring_pair *ncrp)
+{
+ if (napi_schedule_prep(&ncrp->napi)) {
+ disable_irq_nosync(ncrp->irq);
+ __napi_schedule(&ncrp->napi);
+ }
+}
+
+static int nc2_open(struct net_device *nd)
+{
+ struct netchannel2 *nc = netdev_priv(nd);
+
+ nc2_kick(&nc->rings);
+ return 0;
+}
+
+/* Read a MAC address from xenstore at @prefix/@node.
+ * Call without holding locks. Returns 0 on success or <0 on error. */
+static int read_mac_address(const char *prefix, const char *node,
+ unsigned char *addr)
+{
+ int err;
+ unsigned mac[6];
+ int i;
+
+ err = xenbus_scanf(XBT_NIL, prefix, node,
+ "%x:%x:%x:%x:%x:%x",
+ &mac[0],
+ &mac[1],
+ &mac[2],
+ &mac[3],
+ &mac[4],
+ &mac[5]);
+ if (err < 0)
+ return err;
+ if (err != 6)
+ return -EINVAL;
+ for (i = 0; i < 6; i++) {
+ if (mac[i] >= 0x100)
+ return -EINVAL;
+ addr[i] = mac[i];
+ }
+ return 0;
+}
+
+/* Release resources associated with a ring pair. It is assumed that
+ the ring pair has already been detached (which stops the IRQ and
+ un-pends the ring). */
+void cleanup_ring_pair(struct netchannel2_ring_pair *ncrp)
+{
+ BUG_ON(ncrp->prod_ring.sring);
+ BUG_ON(ncrp->cons_ring.sring);
+
+ drop_pending_tx_packets(ncrp);
+ nc2_queue_purge(ncrp, &ncrp->release_on_flush_batcher);
+ if (ncrp->gref_pool != 0)
+ gnttab_free_grant_references(ncrp->gref_pool);
+ netif_napi_del(&ncrp->napi);
+}
+
+int init_ring_pair(struct netchannel2_ring_pair *ncrp,
+ struct netchannel2 *nc)
+{
+ unsigned x;
+
+ ncrp->interface = nc;
+ spin_lock_init(&ncrp->lock);
+ ncrp->irq = -1;
+
+ for (x = 0; x < NR_TX_PACKETS - 1; x++)
+ txp_set_next_free(ncrp->tx_packets + x, x + 1);
+ txp_set_next_free(ncrp->tx_packets + x, INVALID_TXP_INDEX);
+ ncrp->head_free_tx_packet = 0;
+
+ skb_queue_head_init(&ncrp->pending_tx_queue);
+ skb_queue_head_init(&ncrp->release_on_flush_batcher);
+
+ if (gnttab_alloc_grant_references(NR_TX_PACKETS,
+ &ncrp->gref_pool) < 0)
+ return -1;
+
+ netif_napi_add(ncrp->interface->net_device, &ncrp->napi,
+ process_ring, 64);
+ napi_enable(&ncrp->napi);
+
+ return 0;
+}
+
+static struct net_device_stats *nc2_get_stats(struct net_device *nd)
+{
+ struct netchannel2 *nc = netdev_priv(nd);
+
+ return &nc->stats;
+}
+
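init_ring_pair() above threads every tx_packets slot onto a free list
through the slot contents themselves. As a minimal illustration of how
a slot is then popped off that list (this helper is not part of the
patch; it assumes only the txp_* accessors and fields declared in
netchannel2_core.h later in this series):

	static struct txp_slot *pop_free_txp_slot(struct netchannel2_ring_pair *ncrp)
	{
		nc2_txp_index_t idx = ncrp->head_free_tx_packet;
		struct txp_slot *slot;

		if (idx == INVALID_TXP_INDEX)
			return NULL; /* all NR_TX_PACKETS slots are in flight */
		slot = &ncrp->tx_packets[idx];
		/* A free slot stores the index of the next free slot. */
		ncrp->head_free_tx_packet = txp_get_next_free(slot);
		return slot;
	}

The allocator proper is allocate_txp_slot() in util.c, below.
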
+/* Create a new netchannel2 structure. Call with no locks held.
+ Returns NULL on error. The xenbus device must remain valid for as
+ long as the netchannel2 structure does. The core does not take out
+ any kind of reference count on it, but will refer to it throughout
+ the returned netchannel2's life. */
+struct netchannel2 *nc2_new(struct xenbus_device *xd)
+{
+ struct net_device *netdev;
+ struct netchannel2 *nc;
+ int err;
+ int local_trusted;
+ int remote_trusted;
+ int filter_mac;
+
+ if (!gnttab_subpage_grants_available()) {
+ printk(KERN_ERR "netchannel2 needs version 2 grant tables\n");
+ return NULL;
+ }
+
+ if (xenbus_scanf(XBT_NIL, xd->nodename, "local-trusted",
+ "%d", &local_trusted) != 1) {
+ printk(KERN_WARNING "Can't tell whether local endpoint is trusted; assuming it is.\n");
+ local_trusted = 1;
+ }
+
+ if (xenbus_scanf(XBT_NIL, xd->nodename, "remote-trusted",
+ "%d", &remote_trusted) != 1) {
+ printk(KERN_WARNING "Can't tell whether remote endpoint is trusted; assuming it isn't.\n");
+ remote_trusted = 0;
+ }
+
+ if (xenbus_scanf(XBT_NIL, xd->nodename, "filter-mac",
+ "%d", &filter_mac) != 1) {
+ if (remote_trusted) {
+ printk(KERN_WARNING "Can't tell whether to filter MAC addresses from remote domain; filtering off.\n");
+ filter_mac = 0;
+ } else {
+ printk(KERN_WARNING "Can't tell whether to filter MAC addresses from remote domain; filtering on.\n");
+ filter_mac = 1;
+ }
+ }
+
+ netdev = alloc_etherdev(sizeof(*nc));
+ if (netdev == NULL)
+ return NULL;
+
+ nc = netdev_priv(netdev);
+ memset(nc, 0, sizeof(*nc));
+ nc->magic = NETCHANNEL2_MAGIC;
+ nc->net_device = netdev;
+ nc->xenbus_device = xd;
+
+ nc->remote_trusted = remote_trusted;
+ nc->local_trusted = local_trusted;
+ nc->rings.filter_mac = filter_mac;
+
+ skb_queue_head_init(&nc->pending_skbs);
+ if (init_ring_pair(&nc->rings, nc) < 0) {
+ nc2_release(nc);
+ return NULL;
+ }
+
+ netdev->open = nc2_open;
+ netdev->stop = nc2_stop;
+ netdev->hard_start_xmit = nc2_start_xmit;
+ netdev->get_stats = nc2_get_stats;
+
+ /* We need to hold the ring lock in order to send messages
+ anyway, so there's no point in Linux doing additional
+ synchronisation. */
+ netdev->features = NETIF_F_LLTX;
+
+ SET_NETDEV_DEV(netdev, &xd->dev);
+
+ err = read_mac_address(xd->nodename, "remote-mac",
+ nc->rings.remote_mac);
+ if (err == 0)
+ err = read_mac_address(xd->nodename, "mac", netdev->dev_addr);
+ if (err == 0)
+ err = register_netdev(netdev);
+
+ if (err != 0) {
+ nc2_release(nc);
+ return NULL;
+ }
+
+ return nc;
+}
+
+/* Release a netchannel2 structure previously allocated with
+ * nc2_new(). Call with no locks held. The rings will be
+ * detached automatically if necessary. */
+void nc2_release(struct netchannel2 *nc)
+{
+ netif_carrier_off(nc->net_device);
+
+ unregister_netdev(nc->net_device);
+
+ nc2_detach_rings(nc);
+
+ /* Unregistering the net device stops any netdev methods from
+ running, and detaching the rings stops the napi methods, so
+ we're now the only thing accessing this netchannel2
+ structure and we can tear it down with impunity. 
*/ + + cleanup_ring_pair(&nc->rings); + + nc2_queue_purge(&nc->rings, &nc->pending_skbs); + + free_netdev(nc->net_device); +} + +static void _nc2_attach_rings(struct netchannel2_ring_pair *ncrp, + struct netchannel2_sring_cons *cons_sring, + const volatile void *cons_payload, + size_t cons_size, + struct netchannel2_sring_prod *prod_sring, + void *prod_payload, + size_t prod_size, + domid_t otherend_id) +{ + BUG_ON(prod_sring == NULL); + BUG_ON(cons_sring == NULL); + + ncrp->prod_ring.sring = prod_sring; + ncrp->prod_ring.payload_bytes = prod_size; + ncrp->prod_ring.prod_pvt = 0; + ncrp->prod_ring.payload = prod_payload; + + ncrp->cons_ring.sring = cons_sring; + ncrp->cons_ring.payload_bytes = cons_size; + ncrp->cons_ring.sring->prod_event = ncrp->cons_ring.sring->prod + 1; + ncrp->cons_ring.cons_pvt = 0; + ncrp->cons_ring.payload = cons_payload; + + ncrp->otherend_id = otherend_id; + + ncrp->is_attached = 1; + + ncrp->need_advertise_max_packets = 1; +} + +/* Attach a netchannel2 structure to a ring pair. The endpoint is + also expected to set up an event channel after calling this before + using the interface. Returns 0 on success or <0 on error. */ +int nc2_attach_rings(struct netchannel2 *nc, + struct netchannel2_sring_cons *cons_sring, + const volatile void *cons_payload, + size_t cons_size, + struct netchannel2_sring_prod *prod_sring, + void *prod_payload, + size_t prod_size, + domid_t otherend_id) +{ + spin_lock_bh(&nc->rings.lock); + _nc2_attach_rings(&nc->rings, cons_sring, cons_payload, cons_size, + prod_sring, prod_payload, prod_size, otherend_id); + + spin_unlock_bh(&nc->rings.lock); + + netif_carrier_on(nc->net_device); + + /* Kick it to get it going. */ + nc2_kick(&nc->rings); + + return 0; +} + +static void _detach_rings(struct netchannel2_ring_pair *ncrp) +{ + spin_lock_bh(&ncrp->lock); + /* We need to release all of the pending transmission packets, + because they''re never going to complete now that we''ve lost + the ring. */ + drop_pending_tx_packets(ncrp); + + disable_irq(ncrp->irq); + + BUG_ON(ncrp->nr_tx_packets_outstanding); + ncrp->max_tx_packets_outstanding = 0; + + /* No way of sending pending finish messages now; drop + * them. */ + ncrp->pending_finish.prod = 0; + ncrp->pending_finish.cons = 0; + + ncrp->cons_ring.sring = NULL; + ncrp->prod_ring.sring = NULL; + ncrp->is_attached = 0; + + spin_unlock_bh(&ncrp->lock); +} + +/* Detach from the rings. This includes unmapping them and stopping + the interrupt. */ +/* Careful: the netdev methods may still be running at this point. */ +/* This is not allowed to wait for the other end, because it might + have gone away (e.g. over suspend/resume). */ +static void nc2_detach_ring(struct netchannel2_ring_pair *ncrp) +{ + if (!ncrp->is_attached) + return; + + napi_disable(&ncrp->napi); + _detach_rings(ncrp); +} + +/* Trivial wrapper around nc2_detach_ring(). Make the ring no longer + used. */ +void nc2_detach_rings(struct netchannel2 *nc) +{ + nc2_detach_ring(&nc->rings); + + /* Okay, all async access to the ring is stopped. Kill the + irqhandlers. (It might be better to do this from the + _detach_ring() functions, but you''re not allowed to + free_irq() from interrupt context, and tasklets are close + enough to cause problems). */ + + if (nc->rings.irq >= 0) + unbind_from_irqhandler(nc->rings.irq, &nc->rings); + nc->rings.irq = -1; +} + +#if defined(CONFIG_XEN_NETDEV2_BACKEND) +/* Connect to an event channel port in a remote domain. Returns 0 on + success or <0 on error. 
The port is automatically disconnected
+ when the channel is released or if the rings are detached. This
+ should not be called if the port is already open. */
+int nc2_connect_evtchn(struct netchannel2 *nc, domid_t domid,
+ int evtchn)
+{
+ int err;
+
+ BUG_ON(nc->rings.irq >= 0);
+
+ err = bind_interdomain_evtchn_to_irqhandler(domid,
+ evtchn,
+ nc2_int,
+ IRQF_SAMPLE_RANDOM,
+ "netchannel2",
+ &nc->rings);
+ if (err >= 0) {
+ nc->rings.irq = err;
+ nc->rings.evtchn = irq_to_evtchn_port(err);
+ return 0;
+ } else {
+ return err;
+ }
+}
+#endif
+
+#if defined(CONFIG_XEN_NETDEV2_FRONTEND)
+/* Listen for incoming event channel connections from domain domid.
+ Similar semantics to nc2_connect_evtchn(). */
+int nc2_listen_evtchn(struct netchannel2 *nc, domid_t domid)
+{
+ int err;
+
+ BUG_ON(nc->rings.irq >= 0);
+
+ err = bind_listening_port_to_irqhandler(domid,
+ nc2_int,
+ IRQF_SAMPLE_RANDOM,
+ "netchannel2",
+ &nc->rings);
+ if (err >= 0) {
+ nc->rings.irq = err;
+ nc->rings.evtchn = irq_to_evtchn_port(err);
+ return 0;
+ } else {
+ return err;
+ }
+}
+#endif
+
+/* Find the local event channel port which was allocated by
+ * nc2_listen_evtchn() or nc2_connect_evtchn(). It is an error to
+ * call this when there is no event channel connected. */
+int nc2_get_evtchn_port(struct netchannel2 *nc)
+{
+ BUG_ON(nc->rings.irq < 0);
+ return nc->rings.evtchn;
+}
+
+/* @ncrp has been recently nc2_kick()ed. Do all of the necessary
+ stuff. */
+static int process_ring(struct napi_struct *napi,
+ int work_avail)
+{
+ struct netchannel2_ring_pair *ncrp =
+ container_of(napi, struct netchannel2_ring_pair, napi);
+ struct netchannel2 *nc = ncrp->interface;
+ struct sk_buff *skb;
+ int work_done;
+ struct sk_buff_head rx_queue;
+
+ skb_queue_head_init(&rx_queue);
+
+ spin_lock(&ncrp->lock);
+
+ /* Pick up incoming messages. */
+ work_done = nc2_poll(ncrp, work_avail, &rx_queue);
+
+ /* Transmit pending packets. */
+ if (!skb_queue_empty(&ncrp->pending_tx_queue)) {
+ skb = __skb_dequeue(&ncrp->pending_tx_queue);
+ do {
+ if (!nc2_really_start_xmit(ncrp, skb)) {
+ /* Requeue the packet so that we will try
+ again when the ring is less busy */
+ __skb_queue_head(&ncrp->pending_tx_queue, skb);
+ break;
+ }
+ skb = __skb_dequeue(&ncrp->pending_tx_queue);
+ } while (skb != NULL);
+
+ flush_rings(ncrp);
+
+ while ((skb = __skb_dequeue(&ncrp->release_on_flush_batcher)))
+ release_tx_packet(ncrp, skb);
+ }
+
+ if (nc->is_stopped) {
+ /* If the other end has processed some messages, there
+ may be space on the ring for a delayed send from
+ earlier. Process it now. */
+ while (1) {
+ skb = skb_peek_tail(&nc->pending_skbs);
+ if (!skb)
+ break;
+ if (prepare_xmit_allocate_resources(nc, skb) < 0) {
+ /* Still stuck */
+ break;
+ }
+ __skb_unlink(skb, &nc->pending_skbs);
+ queue_packet_to_interface(skb, ncrp);
+ }
+ if (skb_queue_empty(&nc->pending_skbs)) {
+ nc->is_stopped = 0;
+ netif_wake_queue(nc->net_device);
+ }
+ }
+
+ spin_unlock(&ncrp->lock);
+
+ receive_pending_skbs(&rx_queue);
+
+ return work_done;
+}
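Taken together, nc2_kick(), process_messages() and nc2_poll() implement
the standard NAPI discipline: the event-channel IRQ stays masked for as
long as a poll is scheduled, and is only re-enabled once the ring has
been drained. Condensed into one place (purely illustrative; both
halves appear verbatim in the functions above):

	/* Scheduling side (nc2_kick): claim the NAPI context, then take
	   the IRQ-disable reference which the poll method must release. */
	if (napi_schedule_prep(&ncrp->napi)) {
		disable_irq_nosync(ncrp->irq);
		__napi_schedule(&ncrp->napi);
	}

	/* Poll side (process_messages), once the ring looks empty:
	   release the reference, then re-check for a racing producer. */
	napi_complete(&ncrp->napi);
	enable_irq(ncrp->irq);
	if (nc2_final_check_for_messages(&ncrp->cons_ring, prod))
		nc2_kick(ncrp);
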
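For concreteness, the xenstore layout that map_grants() above expects is
the one written by publish_ring() in netfront2.c, later in this patch.
With two-page rings the frontend's directory ends up containing entries
like the following (the grant reference and port values here are
illustrative):

	f2b-ring-nr-pages = "2"
	f2b-ring-ref-0 = "8"
	f2b-ring-ref-1 = "9"
	b2f-ring-nr-pages = "2"
	b2f-ring-ref-0 = "10"
	b2f-ring-ref-1 = "11"
	control-nr-pages = "1"
	control-ref-0 = "12"
	event-channel = "5"

If a -nr-pages node is absent, map_grants() assumes a single page.
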
+static int attach_to_frontend(struct netback2 *nd)
+{
+ int err;
+ int evtchn;
+ struct xenbus_device *xd = nd->xenbus_device;
+ struct netchannel2 *nc = nd->chan;
+ struct netchannel2_backend_shared *nbs;
+
+ if (nd->attached)
+ return 0;
+
+ /* Attach the shared memory bits */
+ err = map_grants(nd, "b2f-ring", &nd->b2f_mapping);
+ if (err)
+ return err;
+ err = map_grants(nd, "f2b-ring", &nd->f2b_mapping);
+ if (err)
+ return err;
+ err = map_grants(nd, "control", &nd->control_mapping);
+ if (err)
+ return err;
+ nbs = nd->control_mapping.mapping->addr;
+ err = nc2_attach_rings(nc,
+ &nbs->cons,
+ nd->f2b_mapping.mapping->addr,
+ nd->f2b_mapping.nr_pages * PAGE_SIZE,
+ &nbs->prod,
+ nd->b2f_mapping.mapping->addr,
+ nd->b2f_mapping.nr_pages * PAGE_SIZE,
+ xd->otherend_id);
+ if (err < 0) {
+ xenbus_dev_fatal(xd, err, "attaching to rings");
+ return err;
+ }
+
+ /* Connect the event channel. */
+ err = xenbus_scanf(XBT_NIL, xd->otherend, "event-channel", "%u",
+ &evtchn);
+ if (err < 0) {
+ xenbus_dev_fatal(xd, err,
+ "reading %s/event-channel",
+ xd->otherend);
+ return err;
+ }
+ err = nc2_connect_evtchn(nd->chan, xd->otherend_id, evtchn);
+ if (err < 0) {
+ xenbus_dev_fatal(xd, err, "binding to event channel");
+ return err;
+ }
+
+ /* All done */
+ nd->attached = 1;
+
+ return 0;
+}
+
+static void frontend_changed(struct xenbus_device *xd,
+ enum xenbus_state frontend_state)
+{
+ struct netback2 *nb = xenbus_device_to_nb2(xd);
+ int err;
+
+ switch (frontend_state) {
+ case XenbusStateInitialising:
+ /* If the frontend does a kexec following a crash, we
+ can end up bounced back here even though we're
+ attached. Try to recover by detaching from the old
+ rings. */
+ /* (A normal shutdown, and even a normal kexec, would
+ * have gone through Closed first, so we'll already be
+ * detached, and this is pointless but harmless.) */
+ detach_from_frontend(nb);
+
+ /* Tell the frontend what sort of rings we're willing
+ to accept. */
+ xenbus_printf(XBT_NIL, nb->xenbus_device->nodename,
+ "max-sring-pages", "%d", MAX_GRANT_MAP_PAGES);
+
+ /* Start the device bring-up bit of the state
+ * machine. */
+ xenbus_switch_state(nb->xenbus_device, XenbusStateInitWait);
+ break;
+
+ case XenbusStateInitWait:
+ /* Frontend doesn't use this state */
+ xenbus_dev_fatal(xd, EINVAL,
+ "unexpected frontend state InitWait");
+ break;
+
+ case XenbusStateInitialised:
+ case XenbusStateConnected:
+ /* Frontend has advertised its rings to us */
+ err = attach_to_frontend(nb);
+ if (err >= 0)
+ xenbus_switch_state(xd, XenbusStateConnected);
+ break;
+
+ case XenbusStateClosing:
+ detach_from_frontend(nb);
+ xenbus_switch_state(xd, XenbusStateClosed);
+ break;
+
+ case XenbusStateClosed:
+ detach_from_frontend(nb);
+ xenbus_switch_state(xd, XenbusStateClosed);
+ if (!xenbus_dev_is_online(xd))
+ device_unregister(&xd->dev);
+ break;
+
+ case XenbusStateUnknown:
+ detach_from_frontend(nb);
+ xenbus_switch_state(xd, XenbusStateClosed);
+ device_unregister(&xd->dev);
+ break;
+
+ default:
+ /* Ignore transitions to unknown states */
+ break;
+ }
+}
+
+static int netback2_uevent(struct xenbus_device *xd,
+ struct kobj_uevent_env *env)
+{
+ struct netback2 *nb = xenbus_device_to_nb2(xd);
+
+ add_uevent_var(env, "vif=%s", nb->chan->net_device->name);
+
+ return 0;
+}
+
+static void netback2_shutdown(struct xenbus_device *xd)
+{
+ xenbus_switch_state(xd, XenbusStateClosing);
+}
+
+static void shutdown_watch_callback(struct xenbus_watch *watch,
+ const char **vec,
+ unsigned int len)
+{
+ struct netback2 *nb =
+ container_of(watch, struct netback2, shutdown_watch);
+ char *type;
+
+ type = xenbus_read(XBT_NIL, nb->xenbus_device->nodename,
+ "shutdown-request", NULL);
+ if (IS_ERR(type)) {
+ if (PTR_ERR(type) != -ENOENT)
+ printk(KERN_WARNING "Cannot read %s/%s: %ld\n",
+ nb->xenbus_device->nodename, "shutdown-request",
+ PTR_ERR(type));
+ return;
+ }
+ if (strcmp(type, "force") == 0) {
+ detach_from_frontend(nb); 
+ xenbus_switch_state(nb->xenbus_device, XenbusStateClosed); + } else if (strcmp(type, "normal") == 0) { + netback2_shutdown(nb->xenbus_device); + } else { + printk(KERN_WARNING "Unrecognised shutdown request %s from tools\n", + type); + } + xenbus_rm(XBT_NIL, nb->xenbus_device->nodename, "shutdown-request"); + kfree(type); +} + +static int netback2_probe(struct xenbus_device *xd, + const struct xenbus_device_id *id) +{ + struct netback2 *nb; + + nb = kzalloc(sizeof(*nb), GFP_KERNEL); + if (nb == NULL) + goto err; + nb->magic = NETBACK2_MAGIC; + nb->xenbus_device = xd; + + nb->shutdown_watch.node = kasprintf(GFP_KERNEL, "%s/shutdown-request", + xd->nodename); + if (nb->shutdown_watch.node == NULL) + goto err; + nb->shutdown_watch.callback = shutdown_watch_callback; + if (register_xenbus_watch(&nb->shutdown_watch)) + goto err; + nb->have_shutdown_watch = 1; + + nb->chan = nc2_new(xd); + if (!nb->chan) + goto err; + + xd->dev.driver_data = nb; + + kobject_uevent(&xd->dev.kobj, KOBJ_ONLINE); + + return 0; + +err: + if (nb != NULL) { + if (nb->have_shutdown_watch) + unregister_xenbus_watch(&nb->shutdown_watch); + kfree(nb->shutdown_watch.node); + kfree(nb); + } + xenbus_dev_fatal(xd, ENOMEM, "probing netdev"); + return -ENOMEM; +} + +static int netback2_remove(struct xenbus_device *xd) +{ + struct netback2 *nb = xenbus_device_to_nb2(xd); + kobject_uevent(&xd->dev.kobj, KOBJ_OFFLINE); + if (nb->chan != NULL) + nc2_release(nb->chan); + if (nb->have_shutdown_watch) + unregister_xenbus_watch(&nb->shutdown_watch); + kfree(nb->shutdown_watch.node); + nc2_unmap_grants(&nb->b2f_mapping); + nc2_unmap_grants(&nb->f2b_mapping); + nc2_unmap_grants(&nb->control_mapping); + kfree(nb); + return 0; +} + +static const struct xenbus_device_id netback2_ids[] = { + { "vif2" }, + { "" } +}; + +static struct xenbus_driver netback2 = { + .name = "vif2", + .ids = netback2_ids, + .probe = netback2_probe, + .remove = netback2_remove, + .otherend_changed = frontend_changed, + .uevent = netback2_uevent, +}; + +int __init netback2_init(void) +{ + int r; + + r = xenbus_register_backend(&netback2); + if (r < 0) { + printk(KERN_ERR "error %d registering backend driver.\n", + r); + } + return r; +} diff --git a/drivers/xen/netchannel2/netchan2.c b/drivers/xen/netchannel2/netchan2.c new file mode 100644 index 0000000..b23b7e4 --- /dev/null +++ b/drivers/xen/netchannel2/netchan2.c @@ -0,0 +1,32 @@ +#include <linux/kernel.h> +#include <linux/module.h> +#include "netchannel2_endpoint.h" + +static int __init netchan2_init(void) +{ + int r; + + r = nc2_init(); + if (r < 0) + return r; + r = netfront2_init(); + if (r < 0) + return r; + r = netback2_init(); + if (r < 0) + netfront2_exit(); + return r; +} +module_init(netchan2_init); + +/* We can''t unload if we''re acting as a backend. 
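+ (Concretely, when CONFIG_XEN_NETDEV2_BACKEND is set the module_exit()
+ hook below is compiled out, so the module can never be removed;
+ presumably this is because a backend may still have frontend pages
+ mapped, and there is no safe way to revoke those mappings on unload.)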
 */
+#ifndef CONFIG_XEN_NETDEV2_BACKEND
+static void __exit netchan2_exit(void)
+{
+ netfront2_exit();
+ nc2_exit();
+}
+module_exit(netchan2_exit);
+#endif
+
+MODULE_LICENSE("GPL");
diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h
new file mode 100644
index 0000000..6ae273d
--- /dev/null
+++ b/drivers/xen/netchannel2/netchannel2_core.h
@@ -0,0 +1,351 @@
+#ifndef NETCHANNEL2_CORE_H__
+#define NETCHANNEL2_CORE_H__
+
+#include <xen/interface/xen.h>
+#include <xen/gnttab.h>
+#include <xen/interface/io/netchannel2.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+
+/* After we send this number of frags, we request the other end to
+ * notify us when sending the corresponding finish packet message */
+#define MAX_MAX_COUNT_FRAGS_NO_EVENT 192
+
+/* Very small packets (e.g. TCP pure acks) are sent inline in the
+ * ring, to avoid the hypercall overhead. This is the largest packet
+ * which will be sent small, in bytes. It should be big enough to
+ * cover the normal headers (i.e. ethernet + IP + TCP = 66 bytes) plus
+ * a little bit of slop for options etc. */
+#define PACKET_PREFIX_SIZE 96
+
+/* How many packets can we have outstanding at any one time? This
+ * must be small enough that it won't be confused with an sk_buff
+ * pointer; see the txp_slot stuff later. */
+#define NR_TX_PACKETS 256
+
+/* A way of keeping track of a mapping of a bunch of grant references
+ into a contiguous chunk of virtual address space. This is used for
+ things like multi-page rings. */
+#define MAX_GRANT_MAP_PAGES 4
+struct grant_mapping {
+ unsigned nr_pages;
+ grant_handle_t handles[MAX_GRANT_MAP_PAGES];
+ struct vm_struct *mapping;
+};
+
+enum transmit_policy {
+ transmit_policy_unknown = 0,
+ transmit_policy_first = 0xf001,
+ transmit_policy_grant = transmit_policy_first,
+ transmit_policy_small,
+ transmit_policy_last = transmit_policy_small
+};
+
+/* When we send a packet message, we need to tag it with an ID. That
+ ID is an index into the TXP slot array. Each slot contains either
+ a pointer to an sk_buff (if it's in use), or the index of the next
+ free slot (if it isn't). A slot is in use if its contents are >
+ NR_TX_PACKETS, and free otherwise. 
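+
+ For example (illustrative): with NR_TX_PACKETS == 256, a free slot
+ holds a small integer in 0..256 (the index of the next free slot, or
+ INVALID_TXP_INDEX == 256 to terminate the list), while an in-use slot
+ holds an sk_buff pointer, and any kernel pointer is numerically far
+ larger than 256.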
*/ +struct txp_slot { + unsigned long __contents; +}; + +typedef uint32_t nc2_txp_index_t; + +#define INVALID_TXP_INDEX ((nc2_txp_index_t)NR_TX_PACKETS) + +static inline int txp_slot_in_use(struct txp_slot *slot) +{ + if (slot->__contents <= NR_TX_PACKETS) + return 0; + else + return 1; +} + +static inline void txp_set_skb(struct txp_slot *slot, struct sk_buff *skb) +{ + slot->__contents = (unsigned long)skb; +} + +static inline struct sk_buff *txp_get_skb(struct txp_slot *slot) +{ + if (txp_slot_in_use(slot)) + return (struct sk_buff *)slot->__contents; + else + return NULL; +} + +static inline void txp_set_next_free(struct txp_slot *slot, + nc2_txp_index_t idx) +{ + slot->__contents = idx; +} + +static inline nc2_txp_index_t txp_get_next_free(struct txp_slot *slot) +{ + return (nc2_txp_index_t)slot->__contents; +} + +/* This goes in struct sk_buff::cb */ +struct skb_cb_overlay { + struct txp_slot *tp; + unsigned nr_fragments; + grant_ref_t gref_pool; + enum transmit_policy policy; + uint8_t failed; + uint8_t expecting_finish; + uint8_t type; + uint16_t inline_prefix_size; +}; + +#define CASSERT(x) typedef unsigned __cassert_ ## __LINE__ [(x)-1] +CASSERT(sizeof(struct skb_cb_overlay) <= sizeof(((struct sk_buff *)0)->cb)); + +static inline struct skb_cb_overlay *get_skb_overlay(struct sk_buff *skb) +{ + return (struct skb_cb_overlay *)skb->cb; +} + + +/* Packets for which we need to send FINISH_PACKET messages for as + soon as possible. */ +struct pending_finish_packets { +#define MAX_PENDING_FINISH_PACKETS 256 + uint32_t ids[MAX_PENDING_FINISH_PACKETS]; + RING_IDX prod; + RING_IDX cons; +}; + +#define RX_GRANT_COPY_BATCH 32 +struct hypercall_batcher { + unsigned nr_pending_gops; + gnttab_copy_t gops[RX_GRANT_COPY_BATCH]; + void *ctxt[RX_GRANT_COPY_BATCH]; +}; + +struct netchannel2_ring_pair { + struct netchannel2 *interface; + /* Main ring lock. Acquired from bottom halves. */ + spinlock_t lock; + + struct napi_struct napi; + + /* Protected by the lock. Initialised at attach_ring() time + and de-initialised at detach_ring() time. */ + struct netchannel2_prod_ring prod_ring; + struct netchannel2_cons_ring cons_ring; + uint8_t is_attached; /* True if the rings are currently safe to + access. */ + + unsigned max_count_frags_no_event; + unsigned expected_finish_messages; + + domid_t otherend_id; + + grant_ref_t gref_pool; + + /* The IRQ corresponding to the event channel which is + connected to the other end. This only changes from the + xenbus state change handler. It is notified from lots of + other places. Fortunately, it''s safe to notify on an irq + after it''s been released, so the lack of synchronisation + doesn''t matter. */ + int irq; + int evtchn; + + /* The MAC address of our peer. */ + unsigned char remote_mac[ETH_ALEN]; + + /* Set if we need to check the source MAC address on incoming + packets. */ + int filter_mac; + + /* A pool of free transmitted_packet structures, threaded on + the list member. Protected by the lock. */ + nc2_txp_index_t head_free_tx_packet; + + /* Total number of packets on the allocated list. Protected + by the lock. */ + unsigned nr_tx_packets_outstanding; + /* Maximum number of packets which the other end will allow us + to keep outstanding at one time. Valid whenever + is_attached is set. 
*/ + unsigned max_tx_packets_outstanding; + + /* Count number of frags that we have sent to the other side + When we reach a max value we request that the other end + send an event when sending the corresponding finish message */ + unsigned count_frags_no_event; + + /* Set if we need to send a SET_MAX_PACKETS message. + Protected by the lock. */ + uint8_t need_advertise_max_packets; + + /* Set if there are messages on the ring which are considered + time-sensitive, so that it''s necessary to notify the remote + endpoint as soon as possible. */ + uint8_t pending_time_sensitive_messages; + + /* Set if we''ve previously suppressed a remote notification + because none of the messages pending at the time of the + flush were time-sensitive. The remote should be notified + as soon as the ring is flushed, even if the normal + filtering rules would suppress the event. */ + uint8_t delayed_kick; + + /* A list of packet IDs which we need to return to the other + end as soon as there is space on the ring. Protected by + the lock. */ + struct pending_finish_packets pending_finish; + + /* transmitted_packet structures which are to be transmitted + next time the TX tasklet looks at this interface. + Protected by the lock. */ + struct sk_buff_head pending_tx_queue; + + /* Packets which we''ll have finished transmitting as soon as + we flush the hypercall batcher. Protected by the lock. */ + struct sk_buff_head release_on_flush_batcher; + + struct hypercall_batcher pending_rx_hypercalls; + + /* A pre-allocated pool of TX packets. The + allocated_tx_packets and free_tx_packets linked lists + contain elements of this array, and it can also be directly + indexed by packet ID. Protected by the lock. */ + struct txp_slot tx_packets[NR_TX_PACKETS]; +}; + +struct netchannel2 { +#define NETCHANNEL2_MAGIC 0x57c68c1d + unsigned magic; + + /* Set when the structure is created and never changed */ + struct net_device *net_device; + struct xenbus_device *xenbus_device; + + /* Set if we trust the remote endpoint. */ + int remote_trusted; + /* Set if the remote endpoint is expected to trust us. + There''s no guarantee that this is actually correct, but + it''s useful for optimisation. */ + int local_trusted; + + struct netchannel2_ring_pair rings; + + /* Packets which we need to transmit soon */ + struct sk_buff_head pending_skbs; + + /* Flag to indicate that the interface is stopped + When the interface is stopped we need to run the tasklet + after we receive an interrupt so that we can wake it up */ + uint8_t is_stopped; + + /* Updates are protected by the lock. This can be read at any + * time without holding any locks, and the rest of Linux is + * expected to cope. 
*/ + struct net_device_stats stats; +}; + +static inline void flush_prepared_grant_copies(struct hypercall_batcher *hb, + void (*on_fail)(void *ctxt, + gnttab_copy_t *gop)) +{ + unsigned x; + + if (hb->nr_pending_gops == 0) + return; + if (HYPERVISOR_grant_table_op(GNTTABOP_copy, hb->gops, + hb->nr_pending_gops)) + BUG(); + for (x = 0; x < hb->nr_pending_gops; x++) + if (hb->gops[x].status != GNTST_okay) + on_fail(hb->ctxt[x], &hb->gops[x]); + hb->nr_pending_gops = 0; +} + +static inline gnttab_copy_t *hypercall_batcher_grant_copy(struct hypercall_batcher *hb, + void *ctxt, + void (*on_fail)(void *, + gnttab_copy_t *gop)) +{ + if (hb->nr_pending_gops == ARRAY_SIZE(hb->gops)) + flush_prepared_grant_copies(hb, on_fail); + hb->ctxt[hb->nr_pending_gops] = ctxt; + return &hb->gops[hb->nr_pending_gops++]; +} + +static inline void flush_hypercall_batcher(struct hypercall_batcher *hb, + void (*on_fail)(void *, + gnttab_copy_t *gop)) +{ + flush_prepared_grant_copies(hb, on_fail); +} + +struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_packet *msg, + struct netchannel2_msg_hdr *hdr, + unsigned nr_frags, + unsigned frags_off); + +int prepare_xmit_allocate_small(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +void xmit_grant(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb, + volatile void *msg); + +void queue_finish_packet_message(struct netchannel2_ring_pair *ncrp, + uint32_t id, uint8_t flags); + +int allocate_txp_slot(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +void release_txp_slot(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +/* Releases the txp slot, the grant pool, and the skb */ +void release_tx_packet(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); + +void fetch_fragment(struct netchannel2_ring_pair *ncrp, + unsigned idx, + struct netchannel2_fragment *frag, + unsigned off); + +void nc2_kick(struct netchannel2_ring_pair *ncrp); + +int nc2_map_grants(struct grant_mapping *gm, + const grant_ref_t *grefs, + unsigned nr_grefs, + domid_t remote_domain); +void nc2_unmap_grants(struct grant_mapping *gm); + +void queue_packet_to_interface(struct sk_buff *skb, + struct netchannel2_ring_pair *ncrp); + +void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); + +int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev); +int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +int prepare_xmit_allocate_resources(struct netchannel2 *nc, + struct sk_buff *skb); +void nc2_handle_finish_packet_msg(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_set_max_packets_msg(struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void drop_pending_tx_packets(struct netchannel2_ring_pair *ncrp); + +void send_finish_packet_messages(struct netchannel2_ring_pair *ncrp); +void nc2_handle_packet_msg(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr, + struct sk_buff_head *pending_rx_queue); +void advertise_max_packets(struct netchannel2_ring_pair *ncrp); +void receive_pending_skbs(struct sk_buff_head *rx_queue); +void nc2_queue_purge(struct netchannel2_ring_pair *ncrp, + struct sk_buff_head *queue); + +#endif /* !NETCHANNEL2_CORE_H__ */ diff --git a/drivers/xen/netchannel2/netchannel2_endpoint.h 
b/drivers/xen/netchannel2/netchannel2_endpoint.h new file mode 100644 index 0000000..2525f23 --- /dev/null +++ b/drivers/xen/netchannel2/netchannel2_endpoint.h @@ -0,0 +1,63 @@ +/* Interface between the endpoint implementations (netfront2.c, + netback2.c) and the netchannel2 core (chan.c and the various + transmission modes). */ +#ifndef NETCHANNEL2_ENDPOINT_H__ +#define NETCHANNEL2_ENDPOINT_H__ + +#include <linux/init.h> +#include <xen/interface/xen.h> + +struct netchannel2_sring_prod; +struct netchannel2_sring_cons; +struct netchannel2; +struct xenbus_device; + +struct netchannel2 *nc2_new(struct xenbus_device *xd); +void nc2_release(struct netchannel2 *nc); + +int nc2_attach_rings(struct netchannel2 *nc, + struct netchannel2_sring_cons *cons_sring, + const volatile void *cons_payload, + size_t cons_size, + struct netchannel2_sring_prod *prod_sring, + void *prod_payload, + size_t prod_size, + domid_t otherend_id); +void nc2_detach_rings(struct netchannel2 *nc); +#if defined(CONFIG_XEN_NETDEV2_FRONTEND) +int nc2_listen_evtchn(struct netchannel2 *nc, domid_t dom); +#endif +#if defined(CONFIG_XEN_NETDEV2_BACKEND) +int nc2_connect_evtchn(struct netchannel2 *nc, domid_t domid, + int evtchn); +#endif +int nc2_get_evtchn_port(struct netchannel2 *nc); +void nc2_suspend(struct netchannel2 *nc); + +void nc2_set_nr_tx_buffers(struct netchannel2 *nc, unsigned nr_buffers); + +/* Interface which the endpoints provide to the core. */ +#ifdef CONFIG_XEN_NETDEV2_FRONTEND +int __init netfront2_init(void); +void __exit netfront2_exit(void); +#else +static inline int netfront2_init(void) +{ + return 0; +} +static inline void netfront2_exit(void) +{ +} +#endif +#ifdef CONFIG_XEN_NETDEV2_BACKEND +int __init netback2_init(void); +#else +static inline int netback2_init(void) +{ + return 0; +} +#endif +int __init nc2_init(void); +void __exit nc2_exit(void); + +#endif /* NETCHANNEL2_ENDPOINT_H__ */ diff --git a/drivers/xen/netchannel2/netfront2.c b/drivers/xen/netchannel2/netfront2.c new file mode 100644 index 0000000..fb5d426 --- /dev/null +++ b/drivers/xen/netchannel2/netfront2.c @@ -0,0 +1,488 @@ +#include <linux/kernel.h> +#include <linux/gfp.h> +#include <linux/version.h> +#include <xen/gnttab.h> +#include <xen/xenbus.h> + +#include "netchannel2_core.h" +#include "netchannel2_endpoint.h" + +#define MAX_SRING_PAGES 4 + +struct netfront2 { +#define NETFRONT2_MAGIC 0x9268e704 + unsigned magic; + struct xenbus_device *xenbus_device; + + void *f2b_sring; + grant_ref_t f2b_grefs[MAX_SRING_PAGES]; + void *b2f_sring; + grant_ref_t b2f_grefs[MAX_SRING_PAGES]; + + struct netchannel2_frontend_shared *control_shared; + grant_ref_t control_shared_gref; + + int nr_sring_pages; + int sring_order; + + grant_ref_t rings_gref_pool; /* Some pre-allocated grant + references to cover the shared + rings. */ + + struct netchannel2 *chan; + + int attached; /* True if the shared rings are ready to go. */ +}; + +static struct netfront2 *xenbus_device_to_nf2(struct xenbus_device *xd) +{ + struct netfront2 *work = xd->dev.driver_data; + BUG_ON(work->magic != NETFRONT2_MAGIC); + return work; +} + +/* Try to revoke a bunch of grant references and return the grefs to + the rings grefs pool. Any cleared grefs are set to 0. Returns 0 + on success or <0 on error. Ignores zero entries in the @grefs + list, and zeroes any entries which are successfully ended. 
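+ (Ending foreign access fails when the peer still has the grant
+ mapped; release_rings() below deliberately leaks the pages in that
+ case rather than hand them back to the allocator.)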
*/ +static int ungrant_access_to_ring(struct netfront2 *nf, + grant_ref_t *grefs, + int nr_pages) +{ + int i; + int succ; + int failed; + + failed = 0; + + for (i = 0; i < nr_pages; i++) { + if (grefs[i]) { + succ = gnttab_end_foreign_access_ref(grefs[i]); + if (!succ) { + /* XXX we can''t recover when this + * happens. Try to do something + * vaguely plausible, but the device + * is pretty much doomed. */ + printk(KERN_WARNING "Failed to end access to gref %d\n", + i); + failed = 1; + continue; + } + gnttab_release_grant_reference(&nf->rings_gref_pool, + grefs[i]); + grefs[i] = 0; + } + } + + if (failed) + return -EBUSY; + else + return 0; +} + +/* Allocate and initialise grant references to cover a bunch of pages. + @ring should be in the direct-mapped region. The rings_gref_pool + on nf should contain at least @nr_pages references. + Already-populated slots in the @grefs list are left unchanged. */ +static void grant_access_to_ring(struct netfront2 *nf, + domid_t otherend, + void *ring, + int *grefs, + int nr_pages) +{ + void *p; + int i; + grant_ref_t ref; + + for (i = 0; i < nr_pages; i++) { + + if (grefs[i] != 0) + continue; + + p = (void *)((unsigned long)ring + PAGE_SIZE * i); + + ref = gnttab_claim_grant_reference(&nf->rings_gref_pool); + /* There should be enough grefs in the pool to handle + the rings. */ + BUG_ON(ref < 0); + gnttab_grant_foreign_access_ref(ref, + otherend, + virt_to_mfn(p), + 0); + grefs[i] = ref; + } +} + +/* Push an already-granted ring into xenstore. */ +static int publish_ring(struct xenbus_transaction xbt, + struct netfront2 *nf, + const char *prefix, + const int *grefs, + int nr_grefs) +{ + int i; + char buf[32]; + int err; + + sprintf(buf, "%s-nr-pages", prefix); + err = xenbus_printf(xbt, nf->xenbus_device->nodename, buf, + "%u", nr_grefs); + if (err) + return err; + + for (i = 0; i < nr_grefs; i++) { + BUG_ON(grefs[i] == 0); + sprintf(buf, "%s-ref-%u", prefix, i); + err = xenbus_printf(xbt, nf->xenbus_device->nodename, + buf, "%u", grefs[i]); + if (err) + return err; + } + return 0; +} + +static int publish_rings(struct netfront2 *nf) +{ + int err; + struct xenbus_transaction xbt; + const char *msg; + +again: + err = xenbus_transaction_start(&xbt); + if (err) { + xenbus_dev_fatal(nf->xenbus_device, err, + "starting transaction"); + return err; + } + + err = publish_ring(xbt, nf, "f2b-ring", nf->f2b_grefs, + nf->nr_sring_pages); + if (err) { + msg = "publishing f2b-ring"; + goto abort; + } + err = publish_ring(xbt, nf, "b2f-ring", nf->b2f_grefs, + nf->nr_sring_pages); + if (err) { + msg = "publishing b2f-ring"; + goto abort; + } + err = publish_ring(xbt, nf, "control", &nf->control_shared_gref, 1); + if (err) { + msg = "publishing control"; + goto abort; + } + err = xenbus_printf(xbt, nf->xenbus_device->nodename, + "event-channel", "%u", + nc2_get_evtchn_port(nf->chan)); + if (err) { + msg = "publishing event channel"; + goto abort; + } + + err = xenbus_transaction_end(xbt, 0); + if (err) { + if (err == -EAGAIN) + goto again; + xenbus_dev_fatal(nf->xenbus_device, err, + "completing transaction"); + } + + return err; + +abort: + xenbus_transaction_end(xbt, 1); + xenbus_dev_fatal(nf->xenbus_device, err, msg); + return err; +} + +/* Release the rings. WARNING: This will leak memory if the other end + still has the rings mapped. There isn''t really anything we can do + about that; the alternative (giving the other end access to + whatever Linux puts in the memory after we released it) is probably + worse. 
*/ +static void release_rings(struct netfront2 *nf) +{ + int have_outstanding_grants; + + have_outstanding_grants = 0; + + if (nf->f2b_sring) { + if (ungrant_access_to_ring(nf, nf->f2b_grefs, + nf->nr_sring_pages) >= 0) { + free_pages((unsigned long)nf->f2b_sring, + nf->sring_order); + } else { + have_outstanding_grants = 1; + } + nf->f2b_sring = NULL; + } + + if (nf->b2f_sring) { + if (ungrant_access_to_ring(nf, nf->b2f_grefs, + nf->nr_sring_pages) >= 0) { + free_pages((unsigned long)nf->b2f_sring, + nf->sring_order); + } else { + have_outstanding_grants = 1; + } + nf->b2f_sring = NULL; + } + + if (nf->control_shared) { + if (ungrant_access_to_ring(nf, &nf->control_shared_gref, + 1) >= 0) { + free_page((unsigned long)nf->control_shared); + } else { + have_outstanding_grants = 1; + } + nf->control_shared = NULL; + } + + if (have_outstanding_grants != 0) { + printk(KERN_WARNING + "Released shared rings while the backend still had them mapped; leaking memory\n"); + } + + /* We can''t release the gref pool if there are still + references outstanding against it. */ + if (!have_outstanding_grants) { + if (nf->rings_gref_pool) + gnttab_free_grant_references(nf->rings_gref_pool); + nf->rings_gref_pool = 0; + } + + nf->attached = 0; +} + +static int allocate_rings(struct netfront2 *nf, domid_t otherend) +{ + int err; + int max_sring_pages; + int sring_order; + int nr_sring_pages; + size_t sring_size; + + /* Figure out how big our shared rings are going to be. */ + err = xenbus_scanf(XBT_NIL, nf->xenbus_device->otherend, + "max-sring-pages", "%d", &max_sring_pages); + if (err < 0) { + xenbus_dev_fatal(nf->xenbus_device, err, + "reading %s/max-sring-pages", + nf->xenbus_device->otherend); + return err; + } + if (max_sring_pages > MAX_SRING_PAGES) + max_sring_pages = MAX_SRING_PAGES; + sring_order = order_base_2(max_sring_pages); + nr_sring_pages = 1 << sring_order; + sring_size = nr_sring_pages * PAGE_SIZE; + + release_rings(nf); + + nf->nr_sring_pages = nr_sring_pages; + nf->sring_order = sring_order; + + nf->f2b_sring = (void *)__get_free_pages(GFP_KERNEL, sring_order); + if (!nf->f2b_sring) + return -ENOMEM; + memset(nf->f2b_sring, 0, sring_size); + + nf->b2f_sring = (void *)__get_free_pages(GFP_KERNEL, sring_order); + if (!nf->b2f_sring) + return -ENOMEM; + memset(nf->b2f_sring, 0, sring_size); + + nf->control_shared = (void *)get_zeroed_page(GFP_KERNEL); + if (!nf->control_shared) + return -ENOMEM; + + /* Pre-allocate enough grant references to be sure that we can + grant access to both rings without an error. */ + err = gnttab_alloc_grant_references(nr_sring_pages * 2 + 1, + &nf->rings_gref_pool); + if (err < 0) + return err; + + grant_access_to_ring(nf, + otherend, + nf->b2f_sring, + nf->b2f_grefs, + nr_sring_pages); + grant_access_to_ring(nf, + otherend, + nf->f2b_sring, + nf->f2b_grefs, + nr_sring_pages); + grant_access_to_ring(nf, + otherend, + nf->control_shared, + &nf->control_shared_gref, + 1); + err = nc2_listen_evtchn(nf->chan, otherend); + if (err < 0) + return err; + + nf->attached = 1; + + return 0; +} + +static void backend_changed(struct xenbus_device *xd, + enum xenbus_state backend_state) +{ + struct netfront2 *nf = xenbus_device_to_nf2(xd); + int err; + + switch (backend_state) { + case XenbusStateInitialising: + /* Backend isn''t ready yet, don''t do anything. */ + break; + + case XenbusStateInitWait: + /* Backend has advertised the ring protocol. Allocate + the rings, and tell the backend about them. 
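+ (For reference, the whole handshake as implemented here and in
+ netback2.c: the backend writes max-sring-pages and enters InitWait;
+ the frontend allocates and grants the rings, publishes the grant
+ references and event channel, and switches to Initialised; the
+ backend maps them and moves to Connected; the frontend then attaches
+ and reports Connected too.)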
*/ + + err = 0; + if (!nf->attached) + err = allocate_rings(nf, xd->otherend_id); + if (err < 0) { + xenbus_dev_fatal(xd, err, "allocating shared rings"); + break; + } + err = publish_rings(nf); + if (err >= 0) + xenbus_switch_state(xd, XenbusStateInitialised); + break; + + case XenbusStateInitialised: + /* Backend isn''t supposed to use this state. */ + xenbus_dev_fatal(xd, EINVAL, + "unexpected backend state Initialised"); + break; + + case XenbusStateConnected: + /* All ready */ + err = nc2_attach_rings(nf->chan, + &nf->control_shared->cons, + nf->b2f_sring, + nf->nr_sring_pages * PAGE_SIZE, + &nf->control_shared->prod, + nf->f2b_sring, + nf->nr_sring_pages * PAGE_SIZE, + nf->xenbus_device->otherend_id); + if (err < 0) { + xenbus_dev_fatal(xd, err, + "failed to attach to rings"); + } else { + xenbus_switch_state(xd, XenbusStateConnected); + } + break; + + case XenbusStateClosing: + xenbus_switch_state(xd, XenbusStateClosing); + break; + + case XenbusStateClosed: + /* Tell the tools that it''s safe to remove the device + from the bus. */ + xenbus_frontend_closed(xd); + /* Note that we don''t release the rings here. This + means that if the backend moves to a different + domain, we won''t be able to reconnect, but it also + limits the amount of memory which can be wasted in + the release_rings() leak if the backend is faulty + or malicious. It''s not obvious which is more + useful, and so I choose the safer but less + featureful approach. */ + /* This is only a problem if you''re using driver + domains and trying to recover from a driver error + by rebooting the backend domain. The rest of the + tools don''t support that, so it''s a bit + theoretical. The memory leaks aren''t, though. */ + break; + + case XenbusStateUnknown: + /* The tools have removed the device area from the + store. Do nothing and rely on xenbus core to call + our remove method. */ + break; + + default: + /* Ignore transitions to unknown states */ + break; + } +} + +static int __devinit netfront_probe(struct xenbus_device *xd, + const struct xenbus_device_id *id) +{ + struct netfront2 *nf; + + nf = kzalloc(sizeof(*nf), GFP_KERNEL); + if (nf == NULL) + goto err; + nf->magic = NETFRONT2_MAGIC; + nf->xenbus_device = xd; + nf->chan = nc2_new(xd); + if (nf->chan == NULL) + goto err; + + xd->dev.driver_data = nf; + + return 0; + +err: + kfree(nf); + xenbus_dev_fatal(xd, ENOMEM, "probing netdev"); + return -ENOMEM; +} + +static int netfront_resume(struct xenbus_device *xd) +{ + /* We''ve been suspended and come back. The rings are + therefore dead. Tear them down. */ + /* We rely on the normal xenbus state machine to bring them + back to life. 
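+ (Releasing the rings sends us back through the handshake: after
+ reconnecting, the backend re-enters InitWait, which makes
+ backend_changed() above allocate and publish a fresh set of rings.)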
*/ + struct netfront2 *nf = xenbus_device_to_nf2(xd); + + nc2_detach_rings(nf->chan); + release_rings(nf); + + return 0; +} + +static int __devexit netfront_remove(struct xenbus_device *xd) +{ + struct netfront2 *nf = xenbus_device_to_nf2(xd); + if (nf->chan != NULL) + nc2_release(nf->chan); + release_rings(nf); + kfree(nf); + return 0; +} + +static const struct xenbus_device_id netfront_ids[] = { + { "vif2" }, + { "" } +}; +MODULE_ALIAS("xen:vif2"); + +static struct xenbus_driver netfront2 = { + .name = "vif2", + .ids = netfront_ids, + .probe = netfront_probe, + .remove = __devexit_p(netfront_remove), + .otherend_changed = backend_changed, + .resume = netfront_resume, +}; + +int __init netfront2_init(void) +{ + return xenbus_register_frontend(&netfront2); +} + +void __exit netfront2_exit(void) +{ + xenbus_unregister_driver(&netfront2); +} diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c new file mode 100644 index 0000000..4678c28 --- /dev/null +++ b/drivers/xen/netchannel2/recv_packet.c @@ -0,0 +1,216 @@ +/* Support for receiving individual packets, and all the stuff which + * goes with that. */ +#include <linux/kernel.h> +#include <linux/etherdevice.h> +#include <linux/version.h> +#include "netchannel2_core.h" + +/* Send as many finish packet messages as will fit on the ring. */ +void send_finish_packet_messages(struct netchannel2_ring_pair *ncrp) +{ + struct pending_finish_packets *pfp = &ncrp->pending_finish; + struct netchannel2_msg_finish_packet msg; + RING_IDX cons; + + while (pfp->prod != pfp->cons && + nc2_can_send_payload_bytes(&ncrp->prod_ring, sizeof(msg))) { + cons = pfp->cons; + msg.id = pfp->ids[pfp->cons % MAX_PENDING_FINISH_PACKETS]; + pfp->cons++; + nc2_send_message(&ncrp->prod_ring, + NETCHANNEL2_MSG_FINISH_PACKET, + 0, + &msg, + sizeof(msg)); + } +} + +/* Add a packet ID to the finish packet queue. The caller should + arrange that send_finish_packet_messages is sent soon to flush the + requests out. */ +void queue_finish_packet_message(struct netchannel2_ring_pair *ncrp, + uint32_t id, uint8_t flags) +{ + struct pending_finish_packets *pfp = &ncrp->pending_finish; + RING_IDX prod; + + prod = pfp->prod; + pfp->ids[prod % MAX_PENDING_FINISH_PACKETS] = id; + pfp->prod++; + + if (flags & NC2_PACKET_FLAG_need_event) + ncrp->pending_time_sensitive_messages = 1; +} + +/* Handle a packet message from the other end. On success, queues the + new skb to the pending skb list. If the packet is invalid, it is + discarded without generating a FINISH message. */ +/* Caution: this drops and re-acquires the ring lock. 
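+ (It does so when the grant-copy batch fills: the pending skbs are
+ flushed to the stack mid-call, outside the lock; see the
+ RX_GRANT_COPY_BATCH test below.)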
 */
+void nc2_handle_packet_msg(struct netchannel2 *nc,
+ struct netchannel2_ring_pair *ncrp,
+ struct netchannel2_msg_hdr *hdr,
+ struct sk_buff_head *pending_rx_queue)
+{
+ unsigned nr_frags;
+ struct netchannel2_msg_packet msg;
+ struct sk_buff *skb;
+ const unsigned frags_off = sizeof(msg);
+ unsigned frags_bytes;
+
+ if (ncrp->pending_finish.prod - ncrp->pending_finish.cons ==
+ MAX_PENDING_FINISH_PACKETS) {
+ pr_debug("Remote endpoint sent too many packets!\n");
+ nc->stats.rx_errors++;
+ return;
+ }
+
+ if (hdr->size < sizeof(msg)) {
+ pr_debug("Packet message too small (%d < %zd)\n", hdr->size,
+ sizeof(msg));
+ nc->stats.rx_errors++;
+ return;
+ }
+
+ if (hdr->size & 7) {
+ pr_debug("Packet size in ring not multiple of 8: %d\n",
+ hdr->size);
+ nc->stats.rx_errors++;
+ return;
+ }
+
+ nc2_copy_from_ring(&ncrp->cons_ring, &msg, sizeof(msg));
+
+ frags_bytes = hdr->size - sizeof(msg) - msg.prefix_size;
+ nr_frags = frags_bytes / sizeof(struct netchannel2_fragment);
+
+ switch (msg.type) {
+ case NC2_PACKET_TYPE_small:
+ if (nr_frags != 0) {
+ /* Small packets, by definition, have no
+ * fragments */
+ pr_debug("Received small packet with %d frags?\n",
+ nr_frags);
+ nc->stats.rx_errors++;
+ return;
+ }
+ /* Any of the receiver functions can handle small
+ packets as a trivial special case. Use receiver
+ copy, since that's the simplest. */
+ skb = handle_receiver_copy_packet(nc, ncrp, &msg, hdr,
+ nr_frags, frags_off);
+ /* No finish message */
+ break;
+ case NC2_PACKET_TYPE_receiver_copy:
+ skb = handle_receiver_copy_packet(nc, ncrp, &msg, hdr,
+ nr_frags, frags_off);
+ queue_finish_packet_message(ncrp, msg.id, msg.flags);
+ break;
+ default:
+ pr_debug("Unknown packet type %d\n", msg.type);
+ nc->stats.rx_errors++;
+ skb = NULL;
+ break;
+ }
+ if (skb != NULL) {
+ nc->stats.rx_bytes += skb->len;
+ nc->stats.rx_packets++;
+ skb->dev = nc->net_device;
+
+ if (ncrp->filter_mac &&
+ skb_headlen(skb) >= sizeof(struct ethhdr) &&
+ memcmp(((struct ethhdr *)skb->data)->h_source,
+ ncrp->remote_mac,
+ ETH_ALEN)) {
+ /* We're in filter MACs mode and the source
+ MAC on this packet is wrong. Drop it. */
+ /* (We know that any packet big enough to
+ contain an ethernet header at all will
+ contain it in the head space because we do
+ a pull_through at the end of the type
+ handler.) */
+ nc->stats.rx_missed_errors++;
+ goto err;
+ }
+
+ __skb_queue_tail(pending_rx_queue, skb);
+
+ if (ncrp->pending_rx_hypercalls.nr_pending_gops >=
+ RX_GRANT_COPY_BATCH) {
+ flush_prepared_grant_copies(&ncrp->pending_rx_hypercalls,
+ nc2_rscb_on_gntcopy_fail);
+ /* since receive could generate ACKs to the
+ start_xmit() function we need to release
+ the ring lock */
+ spin_unlock(&ncrp->lock);
+ /* we should receive the packet as soon as the
+ copy is complete to benefit from cache
+ locality */
+ receive_pending_skbs(pending_rx_queue);
+ spin_lock(&ncrp->lock);
+
+ }
+
+ }
+ return;
+
+err:
+ /* If the receive succeeded part-way, there may be references
+ to the skb in the hypercall batcher. Flush them out before
+ we release it. This is a slow path, so we don't care that
+ much about performance. */
+ flush_prepared_grant_copies(&ncrp->pending_rx_hypercalls,
+ nc2_rscb_on_gntcopy_fail);
+
+ /* We may need to send a FINISH message here if this was a
+ receiver-map packet. That should be handled automatically
+ by the kfree_skb(). */
+ kfree_skb(skb);
+ nc->stats.rx_errors++;
+ return;
+}
+
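Every message on the ring is padded to an 8-byte boundary
(process_messages() in chan.c rounds hdr.size up the same way before
advancing the consumer pointer), and the fragment count is simply
whatever space remains after the fixed header and the inline prefix.
An illustrative helper, not part of the patch, showing the same
arithmetic in isolation:

	static int example_count_frags(const struct netchannel2_msg_hdr *hdr,
				       const struct netchannel2_msg_packet *msg,
				       unsigned *nr_frags)
	{
		if (hdr->size < sizeof(*msg) || (hdr->size & 7))
			return -EINVAL; /* too small, or not 8-byte aligned */
		*nr_frags = (hdr->size - sizeof(*msg) - msg->prefix_size) /
			sizeof(struct netchannel2_fragment);
		return 0;
	}
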
+/* If there is space on the ring, tell the other end how many packets
+ it's allowed to send at one time and clear the
+ need_advertise_max_packets flag. */
+void advertise_max_packets(struct netchannel2_ring_pair *ncrp)
+{
+ struct netchannel2_msg_set_max_packets msg;
+
+ if (!nc2_can_send_payload_bytes(&ncrp->prod_ring, sizeof(msg)))
+ return;
+ msg.max_outstanding_packets = MAX_PENDING_FINISH_PACKETS;
+ nc2_send_message(&ncrp->prod_ring,
+ NETCHANNEL2_MSG_SET_MAX_PACKETS,
+ 0,
+ &msg,
+ sizeof(msg));
+ ncrp->need_advertise_max_packets = 0;
+ ncrp->pending_time_sensitive_messages = 1;
+}
+
+void receive_pending_skbs(struct sk_buff_head *pending_rx_queue)
+{
+ struct sk_buff *skb;
+ struct skb_cb_overlay *sco;
+ while (!skb_queue_empty(pending_rx_queue)) {
+ skb = __skb_dequeue(pending_rx_queue);
+ sco = get_skb_overlay(skb);
+ if (unlikely(sco->failed))
+ kfree_skb(skb);
+ else {
+ skb->protocol = eth_type_trans(skb, skb->dev);
+ netif_receive_skb(skb);
+ }
+ }
+}
+
+
+/* These don't really belong here, but it's as good a place as any. */
+int __init nc2_init(void)
+{
+ return 0;
+}
+
+void __exit nc2_exit(void)
+{
+}
diff --git a/drivers/xen/netchannel2/rscb.c b/drivers/xen/netchannel2/rscb.c
new file mode 100644
index 0000000..8984f90
--- /dev/null
+++ b/drivers/xen/netchannel2/rscb.c
@@ -0,0 +1,385 @@
+/* Receiver-side copy buffer support */
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <xen/gnttab.h>
+#include <xen/live_maps.h>
+
+#include "netchannel2_core.h"
+
+/* -------------------------- Receive -------------------------------- */
+
+/* This is called whenever an RSCB grant copy fails. */
+void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop)
+{
+ struct sk_buff *skb = ctxt;
+ struct skb_cb_overlay *sco = get_skb_overlay(skb);
+ if (!sco->failed && net_ratelimit())
+ printk(KERN_WARNING "Dropping RX packet because of copy error\n");
+ sco->failed = 1;
+}
+
+
+/* Copy @size bytes from @offset in grant ref @gref against domain
+ @domid and shove them on the end of @skb. Fails if the head
+ does not have enough space or if the copy would span multiple
+ pages. */
+static int nc2_grant_copy(struct netchannel2_ring_pair *ncrp,
+ struct sk_buff *skb,
+ unsigned offset,
+ unsigned size,
+ grant_ref_t gref,
+ domid_t domid)
+{
+ gnttab_copy_t *gop;
+ void *tail;
+ void *end;
+
+ if (size > PAGE_SIZE)
+ return 0;
+
+ tail = skb_tail_pointer(skb);
+ end = skb_end_pointer(skb);
+
+ if (unlikely(size > (end-tail)))
+ return 0;
+
+ if (unlikely(offset_in_page(tail) + size > PAGE_SIZE)) {
+ unsigned f1 = PAGE_SIZE - offset_in_page(tail);
+ /* Recursive, but only ever to depth 1, so okay */
+ if (!nc2_grant_copy(ncrp, skb, offset, f1, gref, domid))
+ return 0;
+ offset += f1;
+ size -= f1;
+ tail += f1;
+ }
+
+ /* Copy this fragment into the header. */
+ gop = hypercall_batcher_grant_copy(&ncrp->pending_rx_hypercalls,
+ skb,
+ nc2_rscb_on_gntcopy_fail);
+ gop->flags = GNTCOPY_source_gref;
+ gop->source.domid = domid;
+ gop->source.offset = offset;
+ gop->source.u.ref = gref;
+ gop->dest.domid = DOMID_SELF;
+ gop->dest.offset = offset_in_page(tail);
+ gop->dest.u.gmfn = virt_to_mfn(tail);
+ gop->len = size;
+
+ skb_put(skb, size);
+
+ return 1;
+}
+
+/* We've received a receiver-copy packet message from the remote.
+ Parse it up, build an sk_buff, and return it. Returns NULL on
+ error. 
*/ +struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_packet *msg, + struct netchannel2_msg_hdr *hdr, + unsigned nr_frags, + unsigned frags_off) +{ + struct netchannel2_fragment frag; + unsigned nr_bytes; + unsigned x; + struct sk_buff *skb; + unsigned skb_headsize; + int first_frag, first_frag_size; + gnttab_copy_t *gop; + struct skb_shared_info *shinfo; + struct page *new_page; + + if (msg->prefix_size > NETCHANNEL2_MAX_INLINE_BYTES) { + pr_debug("Inline prefix too big! (%d > %d)\n", + msg->prefix_size, NETCHANNEL2_MAX_INLINE_BYTES); + return NULL; + } + + /* Count the number of bytes in the packet. Be careful: the + other end can still access the packet on the ring, so the + size could change later. */ + nr_bytes = msg->prefix_size; + for (x = 0; x < nr_frags; x++) { + fetch_fragment(ncrp, x, &frag, frags_off); + nr_bytes += frag.size; + } + if (nr_bytes > NETCHANNEL2_MAX_PACKET_BYTES) { + pr_debug("Packet too big! (%d > %d)\n", nr_bytes, + NETCHANNEL2_MAX_PACKET_BYTES); + return NULL; + } + if (nr_bytes < 64) { + /* Linux sometimes has problems with very small SKBs. + Impose a minimum size of 64 bytes. */ + nr_bytes = 64; + } + + first_frag = 0; + if (nr_frags > 0) { + fetch_fragment(ncrp, 0, &frag, frags_off); + first_frag_size = frag.size; + first_frag = 1; + } else { + first_frag_size = 0; + first_frag = 0; + } + + /* We try to have both prefix and the first frag in the skb head + if they do not exceed the page size */ + skb_headsize = msg->prefix_size + first_frag_size + NET_IP_ALIGN; + if (skb_headsize > + ((PAGE_SIZE - sizeof(struct skb_shared_info) - NET_SKB_PAD) & + ~(SMP_CACHE_BYTES - 1))) { + skb_headsize = msg->prefix_size + NET_IP_ALIGN; + first_frag = 0; + } + + skb = dev_alloc_skb(skb_headsize); + if (!skb) { + /* Drop the packet. */ + pr_debug("Couldn''t allocate a %d byte skb.\n", nr_bytes); + nc->stats.rx_dropped++; + return NULL; + } + + /* Arrange that the IP header is nicely aligned in memory. */ + skb_reserve(skb, NET_IP_ALIGN); + + /* The inline prefix should always fit in the SKB head. 
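+ (It fits because skb_headsize was computed above to include
+ msg->prefix_size, which was already bounded by
+ NETCHANNEL2_MAX_INLINE_BYTES.)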
*/ + nc2_copy_from_ring_off(&ncrp->cons_ring, + skb_put(skb, msg->prefix_size), + msg->prefix_size, + frags_off + nr_frags * sizeof(frag)); + + /* copy first frag into skb head if it does not cross a + page boundary */ + if (first_frag == 1) { + fetch_fragment(ncrp, 0, &frag, frags_off); + if (!nc2_grant_copy(ncrp, skb, frag.off, frag.size, + frag.receiver_copy.gref, + ncrp->otherend_id)) { + get_skb_overlay(skb)->failed = 1; + return skb; + } + } + + shinfo = skb_shinfo(skb); + for (x = first_frag; x < nr_frags; x++) { + fetch_fragment(ncrp, x, &frag, frags_off); + + /* Allocate a new page for the fragment */ + new_page = alloc_page(GFP_ATOMIC); + if (!new_page) { + get_skb_overlay(skb)->failed = 1; + return skb; + } + + gop = hypercall_batcher_grant_copy(&ncrp->pending_rx_hypercalls, + skb, + nc2_rscb_on_gntcopy_fail); + gop->flags = GNTCOPY_source_gref; + gop->source.domid = ncrp->otherend_id; + gop->source.offset = frag.off; + gop->source.u.ref = frag.receiver_copy.gref; + gop->dest.domid = DOMID_SELF; + gop->dest.offset = 0; + gop->dest.u.gmfn = pfn_to_mfn(page_to_pfn(new_page)); + gop->len = frag.size; + + shinfo->frags[x-first_frag].page = new_page; + shinfo->frags[x-first_frag].page_offset = 0; + shinfo->frags[x-first_frag].size = frag.size; + shinfo->nr_frags++; + + skb->truesize += frag.size; + skb->data_len += frag.size; + skb->len += frag.size; + } + return skb; +} + + + +/* ------------------------------- Transmit ---------------------------- */ + +struct grant_packet_plan { + volatile struct netchannel2_fragment *out_fragment; + grant_ref_t gref_pool; + unsigned prefix_avail; +}; + +static inline int nfrags_skb(struct sk_buff *skb, int prefix_size) +{ + unsigned long start_grant; + unsigned long end_grant; + + if (skb_headlen(skb) <= prefix_size) + return skb_shinfo(skb)->nr_frags; + + start_grant = ((unsigned long)skb->data + prefix_size) & + ~(PAGE_SIZE-1); + end_grant = ((unsigned long)skb->data + + skb_headlen(skb) + PAGE_SIZE - 1) & + ~(PAGE_SIZE-1); + return ((end_grant - start_grant) >> PAGE_SHIFT) + + skb_shinfo(skb)->nr_frags; +} + +int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb) +{ + struct skb_cb_overlay *skb_co = get_skb_overlay(skb); + unsigned nr_fragments; + grant_ref_t gref_pool; + int err; + unsigned inline_prefix_size; + + if (allocate_txp_slot(ncrp, skb) < 0) + return -1; + + /* We''re going to have to get the remote to issue a grant copy + hypercall anyway, so there''s no real benefit to shoving the + headers inline. */ + /* (very small packets won''t go through here, so there''s no + chance that we could completely eliminate the grant + copy.) */ + inline_prefix_size = sizeof(struct ethhdr); + + if (skb_co->nr_fragments == 0) { + nr_fragments = nfrags_skb(skb, inline_prefix_size); + + /* No-fragments packets should be policy small, not + * policy grant. */ + BUG_ON(nr_fragments == 0); + + skb_co->nr_fragments = nr_fragments; + } + + /* Grab the grant references. */ + err = gnttab_suballoc_grant_references(skb_co->nr_fragments, + &ncrp->gref_pool, + &gref_pool); + if (err < 0) { + release_txp_slot(ncrp, skb); + /* Leave skb_co->nr_fragments set, so that we don''t + have to recompute it next time around. 
*/ + return -1; + } + skb_co->gref_pool = gref_pool; + skb_co->inline_prefix_size = inline_prefix_size; + + skb_co->type = NC2_PACKET_TYPE_receiver_copy; + + return 0; +} + +static void prepare_subpage_grant(struct netchannel2_ring_pair *ncrp, + struct page *page, + unsigned off_in_page, + unsigned size, + struct grant_packet_plan *plan) +{ + volatile struct netchannel2_fragment *frag; + domid_t trans_domid; + grant_ref_t trans_gref; + grant_ref_t gref; + + if (size <= plan->prefix_avail) { + /* This fragment is going to be inline -> nothing to + * do. */ + plan->prefix_avail -= size; + return; + } + if (plan->prefix_avail > 0) { + /* Part inline, part in payload. */ + size -= plan->prefix_avail; + off_in_page += plan->prefix_avail; + plan->prefix_avail = 0; + } + frag = plan->out_fragment; + gref = gnttab_claim_grant_reference(&plan->gref_pool); + frag->receiver_copy.gref = gref; + if (page_is_tracked(page)) { + lookup_tracker_page(page, &trans_domid, &trans_gref); + gnttab_grant_foreign_access_ref_trans(gref, + ncrp->otherend_id, + GTF_readonly, + trans_domid, + trans_gref); + } else { + gnttab_grant_foreign_access_ref_subpage(gref, + ncrp->otherend_id, + virt_to_mfn(page_address(page)), + GTF_readonly, + off_in_page, + size); + } + + frag->off = off_in_page; + frag->size = size; + plan->out_fragment++; +} + +static int grant_data_area(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb, + struct grant_packet_plan *plan) +{ + void *ptr = skb->data; + unsigned len = skb_headlen(skb); + unsigned off; + unsigned this_time; + + for (off = 0; off < len; off += this_time) { + this_time = len - off; + if (this_time + offset_in_page(ptr + off) > PAGE_SIZE) + this_time = PAGE_SIZE - offset_in_page(ptr + off); + prepare_subpage_grant(ncrp, + virt_to_page(ptr + off), + offset_in_page(ptr + off), + this_time, + plan); + } + return 0; +} + +void xmit_grant(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb, + volatile void *msg_buf) +{ + volatile struct netchannel2_msg_packet *msg = msg_buf; + struct skb_cb_overlay *skb_co = get_skb_overlay(skb); + struct grant_packet_plan plan; + unsigned x; + struct skb_shared_info *shinfo; + skb_frag_t *frag; + + memset(&plan, 0, sizeof(plan)); + plan.prefix_avail = skb_co->inline_prefix_size; + plan.out_fragment = msg->frags; + plan.gref_pool = skb_co->gref_pool; + + ncrp->count_frags_no_event += skb_co->nr_fragments; + if (ncrp->count_frags_no_event >= ncrp->max_count_frags_no_event) { + msg->flags |= NC2_PACKET_FLAG_need_event; + ncrp->count_frags_no_event = 0; + } + + grant_data_area(ncrp, skb, &plan); + + shinfo = skb_shinfo(skb); + for (x = 0; x < shinfo->nr_frags; x++) { + frag = &shinfo->frags[x]; + prepare_subpage_grant(ncrp, + frag->page, + frag->page_offset, + frag->size, + &plan); + } + + skb_co->nr_fragments = plan.out_fragment - msg->frags; +} + diff --git a/drivers/xen/netchannel2/util.c b/drivers/xen/netchannel2/util.c new file mode 100644 index 0000000..302dfc1 --- /dev/null +++ b/drivers/xen/netchannel2/util.c @@ -0,0 +1,230 @@ +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/version.h> +#ifdef CONFIG_XEN_NETDEV2_BACKEND +#include <xen/driver_util.h> +#endif +#include <xen/gnttab.h> +#include "netchannel2_core.h" + +int allocate_txp_slot(struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb) +{ + struct skb_cb_overlay *skb_co = get_skb_overlay(skb); + struct txp_slot *tp; + + BUG_ON(skb_co->tp); + + if (ncrp->head_free_tx_packet == INVALID_TXP_INDEX || + 
ncrp->nr_tx_packets_outstanding ==
+	    ncrp->max_tx_packets_outstanding) {
+		return -1;
+	}
+
+	tp = &ncrp->tx_packets[ncrp->head_free_tx_packet];
+	ncrp->head_free_tx_packet = txp_get_next_free(tp);
+
+	txp_set_skb(tp, skb);
+	skb_co->tp = tp;
+	ncrp->nr_tx_packets_outstanding++;
+	return 0;
+}
+
+static void nc2_free_skb(struct netchannel2 *nc,
+			 struct sk_buff *skb)
+{
+	dev_kfree_skb(skb);
+}
+
+void release_txp_slot(struct netchannel2_ring_pair *ncrp,
+		      struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+	struct txp_slot *tp = skb_co->tp;
+
+	BUG_ON(txp_get_skb(tp) != skb);
+
+	/* Try to keep the free TX packet list in order as far as
+	 * possible, since that gives slightly better cache behaviour.
+	 * It's not worth spending a lot of effort getting this right,
+	 * though, so just use a simple heuristic: if we're freeing a
+	 * packet, and the previous packet is already free, chain this
+	 * packet directly after it, rather than putting it at the
+	 * head of the list. This isn't perfect by any means, but
+	 * it's enough that you get nice long runs of contiguous
+	 * packets in the free list, and that's all we really need.
+	 * Runs much bigger than a cache line aren't really very
+	 * useful, anyway. */
+	if (tp != ncrp->tx_packets && !txp_slot_in_use(tp - 1)) {
+		txp_set_next_free(tp, txp_get_next_free(tp - 1));
+		txp_set_next_free(tp - 1, tp - ncrp->tx_packets);
+	} else {
+		txp_set_next_free(tp, ncrp->head_free_tx_packet);
+		ncrp->head_free_tx_packet = tp - ncrp->tx_packets;
+	}
+	skb_co->tp = NULL;
+	ncrp->nr_tx_packets_outstanding--;
+}
+
+void release_tx_packet(struct netchannel2_ring_pair *ncrp,
+		       struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+	struct txp_slot *tp = skb_co->tp;
+	grant_ref_t gref;
+	int r;
+	unsigned cntr;
+
+	if (skb_co->type == NC2_PACKET_TYPE_receiver_copy) {
+		while (1) {
+			r = gnttab_claim_grant_reference(&skb_co->gref_pool);
+			if (r == -ENOSPC)
+				break;
+			gref = (grant_ref_t)r;
+			/* It's a subpage grant reference, so Xen
+			   guarantees to release it quickly. Sit and
+			   wait for it to do so. */
+			cntr = 0;
+			while (!gnttab_end_foreign_access_ref(gref)) {
+				cpu_relax();
+				if (++cntr % 65536 == 0)
+					printk(KERN_WARNING "Having trouble ending gref %d for receiver copy.\n",
+					       gref);
+			}
+			gnttab_release_grant_reference(&ncrp->gref_pool, gref);
+		}
+	} else if (skb_co->gref_pool != 0) {
+		gnttab_subfree_grant_references(skb_co->gref_pool,
+						&ncrp->gref_pool);
+	}
+
+	if (tp != NULL)
+		release_txp_slot(ncrp, skb);
+
+	nc2_free_skb(ncrp->interface, skb);
+}
+
+void fetch_fragment(struct netchannel2_ring_pair *ncrp,
+		    unsigned idx,
+		    struct netchannel2_fragment *frag,
+		    unsigned off)
+{
+	nc2_copy_from_ring_off(&ncrp->cons_ring,
+			       frag,
+			       sizeof(*frag),
+			       off + idx * sizeof(*frag));
+}
+
+/* Copy @count bytes from the skb's data area into its head, updating
+ * the pointers as appropriate. The caller should ensure that there
+ * is actually enough space in the head.
*/ +void pull_through(struct sk_buff *skb, unsigned count) +{ + unsigned frag = 0; + unsigned this_frag; + void *buf; + void *va; + + while (count != 0 && frag < skb_shinfo(skb)->nr_frags) { + this_frag = skb_shinfo(skb)->frags[frag].size; + if (this_frag > count) + this_frag = count; + va = page_address(skb_shinfo(skb)->frags[frag].page); + buf = skb->tail; + memcpy(buf, va + skb_shinfo(skb)->frags[frag].page_offset, + this_frag); + skb->tail += this_frag; + BUG_ON(skb->tail > skb->end); + skb_shinfo(skb)->frags[frag].size -= this_frag; + skb_shinfo(skb)->frags[frag].page_offset += this_frag; + skb->data_len -= this_frag; + count -= this_frag; + frag++; + } + for (frag = 0; + frag < skb_shinfo(skb)->nr_frags && + skb_shinfo(skb)->frags[frag].size == 0; + frag++) { + put_page(skb_shinfo(skb)->frags[frag].page); + } + skb_shinfo(skb)->nr_frags -= frag; + memmove(skb_shinfo(skb)->frags, + skb_shinfo(skb)->frags+frag, + sizeof(skb_shinfo(skb)->frags[0]) * + skb_shinfo(skb)->nr_frags); +} + +#ifdef CONFIG_XEN_NETDEV2_BACKEND + +/* Zap a grant_mapping structure, releasing all mappings and the + reserved virtual address space. Prepare the grant_mapping for + re-use. */ +void nc2_unmap_grants(struct grant_mapping *gm) +{ + struct gnttab_unmap_grant_ref op[MAX_GRANT_MAP_PAGES]; + int i; + + if (gm->mapping == NULL) + return; + for (i = 0; i < gm->nr_pages; i++) { + gnttab_set_unmap_op(&op[i], + (unsigned long)gm->mapping->addr + + i * PAGE_SIZE, + GNTMAP_host_map, + gm->handles[i]); + } + if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, op, i)) + BUG(); + free_vm_area(gm->mapping); + memset(gm, 0, sizeof(*gm)); +} + +int nc2_map_grants(struct grant_mapping *gm, + const grant_ref_t *grefs, + unsigned nr_grefs, + domid_t remote_domain) +{ + struct grant_mapping work; + struct gnttab_map_grant_ref op[MAX_GRANT_MAP_PAGES]; + int i; + + memset(&work, 0, sizeof(work)); + + if (nr_grefs > MAX_GRANT_MAP_PAGES || nr_grefs == 0) + return -EINVAL; + + if (nr_grefs & (nr_grefs-1)) { + /* Must map a power-of-two number of pages. */ + return -EINVAL; + } + + work.nr_pages = nr_grefs; + work.mapping = alloc_vm_area(PAGE_SIZE * work.nr_pages); + if (!work.mapping) + return -ENOMEM; + for (i = 0; i < nr_grefs; i++) + gnttab_set_map_op(&op[i], + (unsigned long)work.mapping->addr + + i * PAGE_SIZE, + GNTMAP_host_map, + grefs[i], + remote_domain); + + if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_grefs)) + BUG(); + + for (i = 0; i < nr_grefs; i++) { + if (op[i].status) { + work.nr_pages = i; + nc2_unmap_grants(&work); + return -EFAULT; + } + work.handles[i] = op[i].handle; + } + + nc2_unmap_grants(gm); + *gm = work; + return 0; +} +#endif diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c new file mode 100644 index 0000000..92fbabf --- /dev/null +++ b/drivers/xen/netchannel2/xmit_packet.c @@ -0,0 +1,318 @@ +/* Things related to actually sending packet messages, and which is + shared across all transmit modes. */ +#include <linux/kernel.h> +#include <linux/version.h> +#include "netchannel2_core.h" + +/* We limit the number of transmitted packets which can be in flight + at any one time, as a somewhat paranoid safety catch. */ +#define MAX_TX_PACKETS MAX_PENDING_FINISH_PACKETS + +static enum transmit_policy transmit_policy(struct netchannel2 *nc, + struct sk_buff *skb) +{ + if (skb->len <= PACKET_PREFIX_SIZE && !skb_is_nonlinear(skb)) + return transmit_policy_small; + else + return transmit_policy_grant; +} + +/* Allocate resources for a small packet. 
The entire thing will be
+   transmitted in the ring. This is only called for small, linear
+   SKBs. It always succeeds, but has an int return type for symmetry
+   with the other prepare_xmit_*() functions. */
+int prepare_xmit_allocate_small(struct netchannel2_ring_pair *ncrp,
+				struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+
+	BUG_ON(skb_is_nonlinear(skb));
+	BUG_ON(skb->len > NETCHANNEL2_MAX_INLINE_BYTES);
+
+	skb_co->type = NC2_PACKET_TYPE_small;
+	skb_co->gref_pool = 0;
+	skb_co->inline_prefix_size = skb->len;
+
+	return 0;
+}
+
+/* Figure out how much space @skb will take up on the ring. */
+static unsigned get_transmitted_packet_msg_size(struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+	return (sizeof(struct netchannel2_msg_packet) +
+		sizeof(struct netchannel2_fragment) * skb_co->nr_fragments +
+		skb_co->inline_prefix_size + 7) & ~7;
+}
+
+/* Do the minimum amount of work to be certain that when we come to
+   transmit this packet we won't run out of resources. This includes
+   figuring out how we're going to fragment the packet for
+   transmission, which buffers we're going to use, etc. Return <0 if
+   insufficient resources are available right now, or 0 if we
+   succeed. */
+/* Careful: this may allocate e.g. a TXP slot and then discover that
+   it can't reserve ring space. In that case, the TXP remains
+   allocated. The expected case is that the caller will arrange for
+   us to retry the allocation later, in which case we'll pick up the
+   already-allocated buffers. */
+int prepare_xmit_allocate_resources(struct netchannel2 *nc,
+				    struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+	enum transmit_policy policy;
+	unsigned msg_size;
+	int r;
+
+	if (skb_co->policy == transmit_policy_unknown) {
+		policy = transmit_policy(nc, skb);
+		switch (policy) {
+		case transmit_policy_small:
+			r = prepare_xmit_allocate_small(&nc->rings, skb);
+			break;
+		case transmit_policy_grant:
+			r = prepare_xmit_allocate_grant(&nc->rings, skb);
+			break;
+		default:
+			BUG();
+			/* Shut the compiler up. */
+			r = -1;
+		}
+		if (r < 0)
+			return r;
+		skb_co->policy = policy;
+	}
+
+	msg_size = get_transmitted_packet_msg_size(skb);
+	if (nc2_reserve_payload_bytes(&nc->rings.prod_ring, msg_size))
+		return 0;
+
+	return -1;
+}
+
+/* Transmit a packet which has previously been prepared with
+   prepare_xmit_allocate_resources(). */
+/* Once this has been called, the ring must not be flushed until the
+   TX hypercall batcher is (assuming this ring has a hypercall
+   batcher). */
+int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp,
+			  struct sk_buff *skb)
+{
+	struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+	struct netchannel2 *nc = ncrp->interface;
+	unsigned msg_size;
+	volatile struct netchannel2_msg_packet *msg;
+
+	msg_size = get_transmitted_packet_msg_size(skb);
+	/* Un-reserve the space we reserved for the packet. */
+	BUG_ON(ncrp->prod_ring.reserve < msg_size);
+	ncrp->prod_ring.reserve -= msg_size;
+	if (!nc2_can_send_payload_bytes(&ncrp->prod_ring, msg_size)) {
+		/* Aw, crud. We had to transmit a PAD message at just
+		   the wrong time, and our attempt to reserve ring
+		   space failed. Delay transmitting this packet and
+		   make sure we redo the space reservation. */
+		ncrp->prod_ring.reserve += msg_size;
+		return 0;
+	}
+	__nc2_avoid_ring_wrap(&ncrp->prod_ring, msg_size);
+
+	/* Set up part of the message. We do the message header
+	   itself and the inline prefix. The individual xmit_*
+	   methods are responsible for the fragments.
They may also + set some more msg flags. */ + msg = __nc2_get_message_ptr(&ncrp->prod_ring); + msg->hdr.type = NETCHANNEL2_MSG_PACKET; + msg->hdr.flags = 0; + msg->id = skb_co->tp - ncrp->tx_packets; + msg->type = skb_co->type; + msg->flags = 0; + msg->prefix_size = skb_co->inline_prefix_size; + + /* We cast away the volatile to avoid compiler warnings, and + then use barrier()s to discourage gcc from using msg->frags + in CSE or somesuch. It''s kind of unlikely that it would, + but better to make sure. */ + barrier(); + memcpy((void *)(msg->frags + skb_co->nr_fragments), + skb->data, + skb_co->inline_prefix_size); + barrier(); + + switch (skb_co->policy) { + case transmit_policy_small: + /* Nothing to do */ + break; + case transmit_policy_grant: + xmit_grant(ncrp, skb, msg); + break; + default: + BUG(); + } + + /* The transmission method may have decided not to use all the + fragments it reserved, which changes the message size. */ + msg_size = get_transmitted_packet_msg_size(skb); + msg->hdr.size = msg_size; + + ncrp->prod_ring.prod_pvt += msg_size; + + BUG_ON(ncrp->prod_ring.bytes_available < msg_size); + + ncrp->prod_ring.bytes_available -= msg_size; + + ncrp->pending_time_sensitive_messages = 1; + + if (skb_co->tp) { + ncrp->expected_finish_messages++; + /* We''re now ready to accept a FINISH message for this + packet. */ + skb_co->expecting_finish = 1; + } else { + /* This packet doesn''t need a FINISH message. Queue + it up to be released as soon as we flush the + hypercall batcher and the ring. */ + nc->stats.tx_bytes += skb->len; + nc->stats.tx_packets++; + __skb_queue_tail(&ncrp->release_on_flush_batcher, skb); + } + + return 1; +} + +/* Arrange that @skb will be sent on ring @ncrp soon. Assumes that + prepare_xmit_allocate_resources() has been successfully called on + @skb already. */ +void queue_packet_to_interface(struct sk_buff *skb, + struct netchannel2_ring_pair *ncrp) +{ + __skb_queue_tail(&ncrp->pending_tx_queue, skb); + if (ncrp->pending_tx_queue.qlen == 1) + nc2_kick(ncrp); +} + +int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct netchannel2 *nc = netdev_priv(dev); + struct skb_cb_overlay *sco = get_skb_overlay(skb); + int r; + + memset(sco, 0, sizeof(*sco)); + + spin_lock_bh(&nc->rings.lock); + + if (!nc->rings.is_attached) { + spin_unlock_bh(&nc->rings.lock); + dev_kfree_skb(skb); + nc->stats.tx_dropped++; + return NETDEV_TX_OK; + } + + r = prepare_xmit_allocate_resources(nc, skb); + if (r < 0) + goto out_busy; + queue_packet_to_interface(skb, &nc->rings); + spin_unlock_bh(&nc->rings.lock); + + return NETDEV_TX_OK; + +out_busy: + /* Some more buffers may have arrived, so kick the worker + * thread to go and have a look. 
*/
+	nc2_kick(&nc->rings);
+
+	__skb_queue_tail(&nc->pending_skbs, skb);
+	nc->is_stopped = 1;
+	netif_stop_queue(dev);
+	spin_unlock_bh(&nc->rings.lock);
+	return NETDEV_TX_OK;
+}
+
+
+void nc2_handle_finish_packet_msg(struct netchannel2 *nc,
+				  struct netchannel2_ring_pair *ncrp,
+				  struct netchannel2_msg_hdr *hdr)
+{
+	struct skb_cb_overlay *sco;
+	struct netchannel2_msg_finish_packet msg;
+	struct txp_slot *tp;
+	struct sk_buff *skb;
+
+	if (hdr->size < sizeof(msg)) {
+		pr_debug("Packet finish message had strange size %d\n",
+			 hdr->size);
+		return;
+	}
+	nc2_copy_from_ring(&ncrp->cons_ring, &msg, sizeof(msg));
+	if (msg.id >= NR_TX_PACKETS) {
+		pr_debug("Other end tried to end bad packet id %d\n",
+			 msg.id);
+		return;
+	}
+	tp = &ncrp->tx_packets[msg.id];
+	skb = txp_get_skb(tp);
+	if (!skb) {
+		pr_debug("Other end tried to end packet id %d which wasn't in use\n",
+			 msg.id);
+		return;
+	}
+	sco = get_skb_overlay(skb);
+	/* Careful: if the remote is malicious, they may try to end a
+	   packet after we allocate it but before we send it (e.g. if
+	   we've had to back out because we didn't have enough ring
+	   space). */
+	if (!sco->expecting_finish) {
+		pr_debug("Other end finished packet before we sent it?\n");
+		return;
+	}
+	nc->stats.tx_bytes += skb->len;
+	nc->stats.tx_packets++;
+	release_tx_packet(ncrp, skb);
+	ncrp->expected_finish_messages--;
+}
+
+
+/* ------------------------ Control-path operations ---------------------- */
+void nc2_handle_set_max_packets_msg(struct netchannel2_ring_pair *ncrp,
+				    struct netchannel2_msg_hdr *hdr)
+{
+	struct netchannel2_msg_set_max_packets msg;
+
+	if (hdr->size != sizeof(msg)) {
+		pr_debug("Set max packets message had strange size %d\n",
+			 hdr->size);
+		return;
+	}
+	if (ncrp->max_tx_packets_outstanding != 0) {
+		pr_debug("Other end tried to change number of outstanding packets from %d.\n",
+			 ncrp->max_tx_packets_outstanding);
+		return;
+	}
+	nc2_copy_from_ring(&ncrp->cons_ring, &msg, sizeof(msg));
+	/* Limit the number of outstanding packets to something sane.
+	   This is a little bit paranoid (it should be safe to set
+	   this arbitrarily high), but limiting it avoids nasty
+	   surprises in untested configurations. */
+	if (msg.max_outstanding_packets > MAX_TX_PACKETS) {
+		pr_debug("Other end tried to set max outstanding to %d, limiting to %d.\n",
+			 msg.max_outstanding_packets, MAX_TX_PACKETS);
+		ncrp->max_tx_packets_outstanding = MAX_TX_PACKETS;
+	} else {
+		ncrp->max_tx_packets_outstanding = msg.max_outstanding_packets;
+	}
+}
+
+/* Release all packets on the transmitted and pending_tx lists. */
+void drop_pending_tx_packets(struct netchannel2_ring_pair *ncrp)
+{
+	struct sk_buff *skb;
+	unsigned x;
+
+	nc2_queue_purge(ncrp, &ncrp->pending_tx_queue);
+	for (x = 0; x < NR_TX_PACKETS; x++) {
+		skb = txp_get_skb(&ncrp->tx_packets[x]);
+		if (skb)
+			release_tx_packet(ncrp, skb);
+	}
+}
+
diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h
new file mode 100644
index 0000000..c45963e
--- /dev/null
+++ b/include/xen/interface/io/netchannel2.h
@@ -0,0 +1,106 @@
+#ifndef __NETCHANNEL2_H__
+#define __NETCHANNEL2_H__
+
+#include <xen/interface/io/uring.h>
+
+/* Tell the other end how many packets it's allowed to have
+ * simultaneously outstanding for transmission. An endpoint must not
+ * send PACKET messages which would take it over this limit.
+ *
+ * The SET_MAX_PACKETS message must be sent before any PACKET
+ * messages. It should only be sent once, unless the ring is
+ * disconnected and reconnected.
+ */
+#define NETCHANNEL2_MSG_SET_MAX_PACKETS 1
+struct netchannel2_msg_set_max_packets {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t max_outstanding_packets;
+};
+
+/* Pass a packet to the other end. The packet consists of a header,
+ * followed by a bunch of fragment descriptors, followed by an inline
+ * packet prefix. Every fragment descriptor in a packet must be the
+ * same type, and the type is determined by the header. The receiving
+ * endpoint should respond with a finished_packet message as soon as
+ * possible. The prefix may be no more than
+ * NETCHANNEL2_MAX_INLINE_BYTES. Packets may contain no more than
+ * NETCHANNEL2_MAX_PACKET_BYTES bytes of data, including all fragments
+ * and the prefix.
+ */
+#define NETCHANNEL2_MSG_PACKET 2
+#define NETCHANNEL2_MAX_PACKET_BYTES 65536
+#define NETCHANNEL2_MAX_INLINE_BYTES 256
+struct netchannel2_fragment {
+	uint16_t size;
+	/* The offset is always relative to the start of the page.
+	   For pre_posted packet types, it is not relative to the
+	   start of the buffer (although the fragment range will
+	   obviously be within the buffer range). */
+	uint16_t off;
+	union {
+		struct {
+			grant_ref_t gref;
+		} receiver_copy;
+	};
+};
+struct netchannel2_msg_packet {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t id; /* Opaque ID which is echoed into the finished
+			packet message. */
+	uint8_t type;
+	uint8_t flags;
+	uint8_t pad0;
+	uint8_t pad1;
+	uint16_t prefix_size;
+	uint16_t pad2;
+	uint16_t pad3;
+	uint16_t pad4;
+	/* Variable-size array. The number of elements is determined
+	   by the size of the message. */
+	/* Until we support scatter-gather, this will be either 0 or 1
+	   element. */
+	struct netchannel2_fragment frags[0];
+};
+
+/* If set, the transmitting domain requires an event urgently when
+ * this packet's finish message is sent. Otherwise, the event can be
+ * delayed. */
+#define NC2_PACKET_FLAG_need_event 8
+
+/* The mechanism which should be used to receive the data part of
+ * a packet:
+ *
+ * receiver_copy -- The transmitting domain has granted the receiving
+ *                  domain access to the original RX buffers using
+ *                  copy-only grant references. The receiving domain
+ *                  should copy the data out of the buffers and issue
+ *                  a FINISH message.
+ *
+ *                  Due to backend bugs, it is not safe to use this
+ *                  packet type except on bypass rings.
+ *
+ * small -- The packet does not have any fragment descriptors
+ *          (i.e. the entire thing is inline in the ring). The receiving
+ *          domain should simply copy the packet out of the ring
+ *          into a locally allocated buffer. No FINISH message is required
+ *          or allowed.
+ *
+ *          This packet type may be used on any ring.
+ *
+ * All endpoints must be able to receive all packet types, but note
+ * that it is correct to treat receiver_map and small packets as
+ * receiver_copy ones. */
+#define NC2_PACKET_TYPE_receiver_copy 1
+#define NC2_PACKET_TYPE_small 4
+
+/* Tell the other end that we're finished with a message it sent us,
+   and it can release the transmit buffers etc. This must be sent in
+   response to receiver_copy and receiver_map packets. It must not be
+   sent in response to pre_posted or small packets.
*/
+#define NETCHANNEL2_MSG_FINISH_PACKET 3
+struct netchannel2_msg_finish_packet {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t id;
+};
+
+#endif /* !__NETCHANNEL2_H__ */
diff --git a/include/xen/interface/io/uring.h b/include/xen/interface/io/uring.h
new file mode 100644
index 0000000..663c3d7
--- /dev/null
+++ b/include/xen/interface/io/uring.h
@@ -0,0 +1,426 @@
+#ifndef __XEN_PUBLIC_IO_URING_H__
+#define __XEN_PUBLIC_IO_URING_H__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/system.h>
+
+typedef unsigned RING_IDX;
+
+#define NETCHANNEL2_MSG_PAD 255
+
+/* The sring structures themselves. The _cons and _prod variants are
+   different views of the same bit of shared memory, and are supposed
+   to provide better checking of the expected use patterns. Fields in
+   the shared ring are owned by either the producer end or the
+   consumer end. If a field is owned by your end, the other end will
+   never modify it. If it's owned by the other end, the other end is
+   allowed to modify it whenever it likes, and you can never do so.
+
+   Fields owned by the other end are always const (because you can't
+   change them). They're also volatile, because there are a bunch
+   of places where we go:
+
+   local_x = sring->x;
+   validate(local_x);
+   use(local_x);
+
+   and it would be very bad if the compiler turned that into:
+
+   local_x = sring->x;
+   validate(sring->x);
+   use(local_x);
+
+   because that contains a potential TOCTOU race (hard to exploit, but
+   still present). The compiler is only allowed to do that
+   optimisation because it knows that local_x == sring->x at the start
+   of the call to validate(), and it only knows that if it can reorder
+   the read of sring->x over the sequence point at the end of the
+   first statement. In other words, it can only do the bad
+   optimisation if it knows that reads of sring->x are side-effect
+   free. volatile stops it from making that assumption.
+
+   We don't need a full memory barrier here, because it's sufficient
+   to copy the volatile data into stable guest-local storage, and
+   volatile achieves that. i.e. we don't need local_x to be precisely
+   sring->x, but we do need it to be a stable snapshot of some
+   previous value of sring->x.
+
+   Note that there are still plenty of other places where we *do* need
+   full barriers. volatile just deals with this one, specific, case.
+
+   We could also deal with it by putting compiler barriers in all over
+   the place. The downside of that approach is that you need to put
+   the barrier()s in lots of different places (basically, everywhere
+   which needs to access these fields), and it's easy to forget one.
+   barrier()s also have somewhat heavier semantics than volatile
+   (because they prevent all reordering, rather than just reordering
+   on this one field), although that's pretty much irrelevant because
+   gcc usually treats pretty much any volatile access as a call to
+   barrier().
+*/
+
+/* Messages are sent over sring pairs. Each sring in a pair provides
+ * a unidirectional byte stream which can generate events when either
+ * the producer or consumer pointers cross a particular threshold.
+ *
+ * We define both sring_prod and sring_cons structures. The two
+ * structures will always map onto the same physical bytes in memory,
+ * but they provide different views of that memory which are
+ * appropriate to either producers or consumers.
+ *
+ * Obviously, the endpoints need to agree on which end produces
+ * messages on which ring.
The endpoint which provided the memory
+ * backing the ring always produces on the first sring, and the one
+ * which just mapped the ring produces on the second. By convention,
+ * these are known as the frontend and backend, respectively.
+ */
+
+/* For both rings, the producer (consumer) pointers point at the
+ * *next* byte which is going to be produced (consumed). An endpoint
+ * must generate an event on the event channel port if it moves the
+ * producer pointer (consumer pointer) across prod_event (cons_event).
+ *
+ * i.e. if an endpoint ever updates a pointer so that the old pointer
+ * is strictly less than the event, and the new pointer is greater
+ * than or equal to the event then the remote must be notified. If
+ * the pointer overflows the ring, treat the new value as if it were
+ * (actual new value) + (1 << 32).
+ */
+struct netchannel2_sring_prod {
+	RING_IDX prod;
+	volatile const RING_IDX cons;
+	volatile const RING_IDX prod_event;
+	RING_IDX cons_event;
+	unsigned char pad[48];
+};
+
+struct netchannel2_sring_cons {
+	volatile const RING_IDX prod;
+	RING_IDX cons;
+	RING_IDX prod_event;
+	volatile const RING_IDX cons_event;
+	unsigned char pad[48];
+};
+
+struct netchannel2_frontend_shared {
+	struct netchannel2_sring_prod prod;
+	struct netchannel2_sring_cons cons;
+};
+
+struct netchannel2_backend_shared {
+	struct netchannel2_sring_cons cons;
+	struct netchannel2_sring_prod prod;
+};
+
+struct netchannel2_prod_ring {
+	struct netchannel2_sring_prod *sring;
+	void *payload;
+	RING_IDX prod_pvt;
+	/* This is the number of bytes available after prod_pvt last
+	   time we checked, minus the number of bytes which we've
+	   consumed since then. It's used to avoid a bunch of
+	   memory barriers when checking for ring space. */
+	unsigned bytes_available;
+	/* Number of bytes reserved by nc2_reserve_payload_bytes() */
+	unsigned reserve;
+	size_t payload_bytes;
+};
+
+struct netchannel2_cons_ring {
+	struct netchannel2_sring_cons *sring;
+	const volatile void *payload;
+	RING_IDX cons_pvt;
+	size_t payload_bytes;
+};
+
+/* A message header. There is one of these at the start of every
+ * message. @type is one of the #defines below, and @size is the
+ * size of the message, including the header and any padding.
+ * size should be a multiple of 8 so we avoid unaligned memory copies.
+ * structs defining message formats should have sizes multiple of 8
+ * bytes and should use padding fields if needed.
+ */
+struct netchannel2_msg_hdr {
+	uint8_t type;
+	uint8_t flags;
+	uint16_t size;
+};
+
+/* Copy some bytes from the shared ring to a stable local buffer,
+ * starting at the private consumer pointer. Does not update the
+ * private consumer pointer.
+ */
+static inline void nc2_copy_from_ring_off(struct netchannel2_cons_ring *ring,
+					  void *buf,
+					  size_t nbytes,
+					  unsigned off)
+{
+	unsigned start, end;
+
+	start = (ring->cons_pvt + off) & (ring->payload_bytes-1);
+	end = (ring->cons_pvt + nbytes + off) & (ring->payload_bytes-1);
+	/* We cast away the volatile modifier to get rid of an
+	   irritating compiler warning, and compensate with a
+	   barrier() at the end. */
+	memcpy(buf, (const void *)ring->payload + start, nbytes);
+	barrier();
+}
+
+static inline void nc2_copy_from_ring(struct netchannel2_cons_ring *ring,
+				      void *buf,
+				      size_t nbytes)
+{
+	nc2_copy_from_ring_off(ring, buf, nbytes, 0);
+}
+
+
+/* Copy some bytes to the shared ring, starting at the private
+ * producer pointer. Does not update the private pointer.
+ */ +static inline void nc2_copy_to_ring_off(struct netchannel2_prod_ring *ring, + const void *src, + unsigned nr_bytes, + unsigned off) +{ + unsigned start, end; + + start = (ring->prod_pvt + off) & (ring->payload_bytes-1); + end = (ring->prod_pvt + nr_bytes + off) & (ring->payload_bytes-1); + memcpy(ring->payload + start, src, nr_bytes); +} + +static inline void nc2_copy_to_ring(struct netchannel2_prod_ring *ring, + const void *src, + unsigned nr_bytes) +{ + nc2_copy_to_ring_off(ring, src, nr_bytes, 0); +} + +static inline void __nc2_send_pad(struct netchannel2_prod_ring *ring, + unsigned nr_bytes) +{ + struct netchannel2_msg_hdr msg; + msg.type = NETCHANNEL2_MSG_PAD; + msg.flags = 0; + msg.size = nr_bytes; + nc2_copy_to_ring(ring, &msg, sizeof(msg)); + ring->prod_pvt += nr_bytes; + ring->bytes_available -= nr_bytes; +} + +static inline int __nc2_ring_would_wrap(struct netchannel2_prod_ring *ring, + unsigned nr_bytes) +{ + RING_IDX mask; + mask = ~(ring->payload_bytes - 1); + return (ring->prod_pvt & mask) != ((ring->prod_pvt + nr_bytes) & mask); +} + +static inline unsigned __nc2_pad_needed(struct netchannel2_prod_ring *ring) +{ + return ring->payload_bytes - + (ring->prod_pvt & (ring->payload_bytes - 1)); +} + +static inline void __nc2_avoid_ring_wrap(struct netchannel2_prod_ring *ring, + unsigned nr_bytes) +{ + if (!__nc2_ring_would_wrap(ring, nr_bytes)) + return; + __nc2_send_pad(ring, __nc2_pad_needed(ring)); + +} + +/* Prepare a message for the other end and place it on the shared + * ring, updating the private producer pointer. You need to call + * nc2_flush_messages() before the message is actually made visible to + * the other end. It is permissible to send several messages in a + * batch and only flush them once. + */ +static inline void nc2_send_message(struct netchannel2_prod_ring *ring, + unsigned type, + unsigned flags, + const void *msg, + size_t size) +{ + struct netchannel2_msg_hdr *hdr = (struct netchannel2_msg_hdr *)msg; + + __nc2_avoid_ring_wrap(ring, size); + + hdr->type = type; + hdr->flags = flags; + hdr->size = size; + + nc2_copy_to_ring(ring, msg, size); + ring->prod_pvt += size; + BUG_ON(ring->bytes_available < size); + ring->bytes_available -= size; +} + +static inline volatile void *__nc2_get_message_ptr(struct netchannel2_prod_ring *ncrp) +{ + return (volatile void *)ncrp->payload + + (ncrp->prod_pvt & (ncrp->payload_bytes-1)); +} + +/* Copy the private producer pointer to the shared producer pointer, + * with a suitable memory barrier such that all messages placed on the + * ring are stable before we do the copy. This effectively pushes any + * messages which we''ve just sent out to the other end. Returns 1 if + * we need to notify the other end and 0 otherwise. + */ +static inline int nc2_flush_ring(struct netchannel2_prod_ring *ring) +{ + RING_IDX old_prod, new_prod; + + old_prod = ring->sring->prod; + new_prod = ring->prod_pvt; + + wmb(); + + ring->sring->prod = new_prod; + + /* We need the update to prod to happen before we read + * event. */ + mb(); + + /* We notify if the producer pointer moves across the event + * pointer. */ + if ((RING_IDX)(new_prod - ring->sring->prod_event) < + (RING_IDX)(new_prod - old_prod)) + return 1; + else + return 0; +} + +/* Copy the private consumer pointer to the shared consumer pointer, + * with a memory barrier so that any previous reads from the ring + * complete before the pointer is updated. 
This tells the other end + * that we''re finished with the messages, and that it can re-use the + * ring space for more messages. Returns 1 if we need to notify the + * other end and 0 otherwise. + */ +static inline int nc2_finish_messages(struct netchannel2_cons_ring *ring) +{ + RING_IDX old_cons, new_cons; + + old_cons = ring->sring->cons; + new_cons = ring->cons_pvt; + + /* Need to finish reading from the ring before updating + cons */ + mb(); + ring->sring->cons = ring->cons_pvt; + + /* Need to publish our new consumer pointer before checking + event. */ + mb(); + if ((RING_IDX)(new_cons - ring->sring->cons_event) < + (RING_IDX)(new_cons - old_cons)) + return 1; + else + return 0; +} + +/* Check whether there are any unconsumed messages left on the shared + * ring. Returns 1 if there are, and 0 if there aren''t. If there are + * no more messages, set the producer event so that we''ll get a + * notification as soon as another one gets sent. It is assumed that + * all messages up to @prod have been processed, and none of the ones + * after it have been. */ +static inline int nc2_final_check_for_messages(struct netchannel2_cons_ring *ring, + RING_IDX prod) +{ + if (prod != ring->sring->prod) + return 1; + /* Request an event when more stuff gets poked on the ring. */ + ring->sring->prod_event = prod + 1; + + /* Publish event before final check for responses. */ + mb(); + if (prod != ring->sring->prod) + return 1; + else + return 0; +} + +/* Can we send a message with @nr_bytes payload bytes? Returns 1 if + * we can or 0 if we can''t. If there isn''t space right now, set the + * consumer event so that we''ll get notified when space is + * available. */ +static inline int nc2_can_send_payload_bytes(struct netchannel2_prod_ring *ring, + unsigned nr_bytes) +{ + unsigned space; + RING_IDX cons; + BUG_ON(ring->bytes_available > ring->payload_bytes); + /* Times 2 because we might need to send a pad message */ + if (likely(ring->bytes_available > nr_bytes * 2 + ring->reserve)) + return 1; + if (__nc2_ring_would_wrap(ring, nr_bytes)) + nr_bytes += __nc2_pad_needed(ring); +retry: + cons = ring->sring->cons; + space = ring->payload_bytes - (ring->prod_pvt - cons); + if (likely(space >= nr_bytes + ring->reserve)) { + /* We have enough space to send the message. */ + + /* Need to make sure that the read of cons happens + before any following memory writes. */ + mb(); + + ring->bytes_available = space; + + return 1; + } else { + /* Not enough space available. Set an event pointer + when cons changes. We need to be sure that the + @cons used here is the same as the cons used to + calculate @space above, and the volatile modifier + on sring->cons achieves that. */ + ring->sring->cons_event = cons + 1; + + /* Check whether more space became available while we + were messing about. */ + + /* Need the event pointer to be stable before we do + the check. */ + mb(); + if (unlikely(cons != ring->sring->cons)) { + /* Cons pointer changed. Try again. */ + goto retry; + } + + /* There definitely isn''t space on the ring now, and + an event has been set such that we''ll be notified + if more space becomes available. */ + /* XXX we get a notification as soon as any more space + becomes available. We could maybe optimise by + setting the event such that we only get notified + when we know that enough space is available. The + main complication is handling the case where you + try to send a message of size A, fail due to lack + of space, and then try to send one of size B, where + B < A. 
It''s not clear whether you want to set the + event for A bytes or B bytes. The obvious answer + is B, but that means moving the event pointer + backwards, and it''s not clear that that''s always + safe. Always setting for a single byte is safe, so + stick with that for now. */ + return 0; + } +} + +static inline int nc2_reserve_payload_bytes(struct netchannel2_prod_ring *ring, + unsigned nr_bytes) +{ + if (nc2_can_send_payload_bytes(ring, nr_bytes)) { + ring->reserve += nr_bytes; + return 1; + } else { + return 0; + } +} + +#endif /* __XEN_PUBLIC_IO_URING_H__ */ -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
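An aside on the uring.h event-pointer scheme above: the notify-on-crossing test is compact enough to check in isolation. The following stand-alone sketch is not from the patch series; must_notify() is an invented name and a plain unsigned stands in for RING_IDX. It is the same comparison nc2_flush_ring() and nc2_finish_messages() use, including its behaviour when the 32-bit index wraps:

#include <stdio.h>

typedef unsigned ring_idx_t;	/* stand-in for RING_IDX */

/* Wrap-safe test: true iff @event lies in the half-open interval
   (@old_ptr, @new_ptr], i.e. this update moved the pointer across
   the event threshold and the remote must be notified. */
static int must_notify(ring_idx_t old_ptr, ring_idx_t new_ptr,
		       ring_idx_t event)
{
	return (ring_idx_t)(new_ptr - event) <
		(ring_idx_t)(new_ptr - old_ptr);
}

int main(void)
{
	printf("%d\n", must_notify(10, 20, 15));	/* 1: crossed */
	printf("%d\n", must_notify(10, 20, 25));	/* 0: not reached */
	printf("%d\n", must_notify(0xfffffff0u, 8, 4));	/* 1: crossed the wrap */
	return 0;
}

The unsigned subtractions are also why the comment in uring.h says to treat an overflowed pointer as (actual new value) + (1 << 32): modular arithmetic does that implicitly, so no explicit wrap handling is needed.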
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 10/22] Add a fall-back poller, in case finish messages get stuck somewhere.
We try to avoid the event channel notification when sending finish messages, for performance reasons, but that can lead to a deadlock if you have a lot of packets going in one direction and nothing coming the other way. Fix it by just polling for messages every second when there are unfinished packets outstanding. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/Makefile | 2 +- drivers/xen/netchannel2/chan.c | 5 ++- drivers/xen/netchannel2/netchannel2_core.h | 10 +++++ drivers/xen/netchannel2/poll.c | 59 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/xmit_packet.c | 3 + 5 files changed, 77 insertions(+), 2 deletions(-) create mode 100644 drivers/xen/netchannel2/poll.c diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index bdad6da..d6641a1 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -1,7 +1,7 @@ obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2.o netchannel2-objs := chan.o netchan2.o rscb.o util.o \ - xmit_packet.o recv_packet.o + xmit_packet.o recv_packet.o poll.o ifeq ($(CONFIG_XEN_NETDEV2_BACKEND),y) netchannel2-objs += netback2.o diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index e3ad981..a4b83a1 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -23,6 +23,7 @@ static irqreturn_t nc2_int(int irq, void *dev_id) if (ncr->irq == -1) return IRQ_HANDLED; + ncr->last_event = jiffies; if (ncr->cons_ring.sring->prod != ncr->cons_ring.cons_pvt || ncr->interface->is_stopped) nc2_kick(ncr); @@ -293,6 +294,8 @@ int init_ring_pair(struct netchannel2_ring_pair *ncrp, &ncrp->gref_pool) < 0) return -1; + nc2_init_poller(ncrp); + netif_napi_add(ncrp->interface->net_device, &ncrp->napi, process_ring, 64); napi_enable(&ncrp->napi); @@ -509,7 +512,7 @@ static void nc2_detach_ring(struct netchannel2_ring_pair *ncrp) { if (!ncrp->is_attached) return; - + nc2_stop_polling(ncrp); napi_disable(&ncrp->napi); _detach_rings(ncrp); } diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 6ae273d..7304451 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -130,6 +130,11 @@ struct netchannel2_ring_pair { struct napi_struct napi; + /* jiffies the last time the interrupt fired. Not + synchronised at all, because it doesn''t usually matter if + it''s a bit off. */ + unsigned last_event; + /* Protected by the lock. Initialised at attach_ring() time and de-initialised at detach_ring() time. */ struct netchannel2_prod_ring prod_ring; @@ -139,6 +144,7 @@ struct netchannel2_ring_pair { unsigned max_count_frags_no_event; unsigned expected_finish_messages; + struct timer_list polling_timer; domid_t otherend_id; @@ -348,4 +354,8 @@ void receive_pending_skbs(struct sk_buff_head *rx_queue); void nc2_queue_purge(struct netchannel2_ring_pair *ncrp, struct sk_buff_head *queue); +void nc2_init_poller(struct netchannel2_ring_pair *ncrp); +void nc2_start_polling(struct netchannel2_ring_pair *ncrp); +void nc2_stop_polling(struct netchannel2_ring_pair *ncrp); + #endif /* !NETCHANNEL2_CORE_H__ */ diff --git a/drivers/xen/netchannel2/poll.c b/drivers/xen/netchannel2/poll.c new file mode 100644 index 0000000..42ca0d5 --- /dev/null +++ b/drivers/xen/netchannel2/poll.c @@ -0,0 +1,59 @@ +/* There are a couple of places where we try to minimise wakeups in + ways which work in the vast majority of cases, but occasionally + cause a needed event to be lost. 
Compensate for those with a 1Hz
+   ticker. The ticker runs whenever we have outstanding TX packets.
+   Once it's running, we never try to modify it, and instead just let
+   it run out. */
+/* If we're relying on this timer for correctness then performance is
+   going to be absolutely dire, but it should be sufficient to avoid
+   outright deadlocks. */
+#include <linux/kernel.h>
+#include <linux/timer.h>
+#include "netchannel2_core.h"
+
+#define TICKER_INTERVAL (HZ)
+
+static void poll_timer(unsigned long arg)
+{
+	struct netchannel2_ring_pair *ncrp =
+		(struct netchannel2_ring_pair *)arg;
+
+	/* If the ring appears to be behaving ``normally'', increase
+	   the number of messages which we're allowed to have
+	   outstanding by some small amount. If it looks like we've
+	   deadlocked, halve it. */
+	/* Arbitrarily define ``normal'' to be at least one interrupt
+	   every 100ms, and a small amount to be 10. */
+	/* We don't synchronise against concurrent readers of
+	   max_count_frags_no_event, because it doesn't matter too
+	   much if it's slightly wrong. We don't need to worry about
+	   concurrent writers, because this timer is the only thing
+	   which can change it, and it's only ever run on one cpu at a
+	   time. */
+	if (jiffies - ncrp->last_event > HZ/10)
+		ncrp->max_count_frags_no_event /= 2;
+	else if (ncrp->max_count_frags_no_event + 10 <=
+		 MAX_MAX_COUNT_FRAGS_NO_EVENT)
+		ncrp->max_count_frags_no_event += 10;
+
+	if (ncrp->expected_finish_messages == 0)
+		return;
+	if (ncrp->cons_ring.sring->prod != ncrp->cons_ring.cons_pvt)
+		nc2_kick(ncrp);
+	nc2_start_polling(ncrp);
+}
+
+void nc2_init_poller(struct netchannel2_ring_pair *ncrp)
+{
+	setup_timer(&ncrp->polling_timer, poll_timer, (unsigned long)ncrp);
+}
+
+void nc2_start_polling(struct netchannel2_ring_pair *ncrp)
+{
+	mod_timer(&ncrp->polling_timer, jiffies + TICKER_INTERVAL);
+}
+
+void nc2_stop_polling(struct netchannel2_ring_pair *ncrp)
+{
+	del_timer_sync(&ncrp->polling_timer);
+}
diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c
index 92fbabf..a693a75 100644
--- a/drivers/xen/netchannel2/xmit_packet.c
+++ b/drivers/xen/netchannel2/xmit_packet.c
@@ -165,6 +165,9 @@ int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp,
 
 	if (skb_co->tp) {
 		ncrp->expected_finish_messages++;
+		if (ncrp->expected_finish_messages == 1 &&
+		    !timer_pending(&ncrp->polling_timer))
+			nc2_start_polling(ncrp);
 		/* We're now ready to accept a FINISH message for this
 		   packet. */
 		skb_co->expecting_finish = 1;
-- 
1.6.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
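The adjustment in poll_timer() above is effectively a small additive-increase/multiplicative-decrease controller for the event-suppression budget. Here is a stand-alone sketch of just that policy, for illustration only; MAX_BUDGET and the function names are invented, standing in for MAX_MAX_COUNT_FRAGS_NO_EVENT and the +10/halve steps in the patch:

#include <stdio.h>

#define MAX_BUDGET 512	/* stand-in for MAX_MAX_COUNT_FRAGS_NO_EVENT */

/* One ticker step: halve the no-event budget if the ring looked
   stalled since the last tick, otherwise creep it upwards by 10,
   capped at MAX_BUDGET. */
static unsigned adjust_budget(unsigned budget, int saw_recent_interrupt)
{
	if (!saw_recent_interrupt)
		return budget / 2;
	if (budget + 10 <= MAX_BUDGET)
		return budget + 10;
	return budget;
}

int main(void)
{
	unsigned budget = 64;
	int tick;

	for (tick = 0; tick < 5; tick++)
		budget = adjust_budget(budget, 1);
	printf("after five healthy ticks: %u\n", budget);	/* 114 */
	printf("after one stall: %u\n", adjust_budget(budget, 0));	/* 57 */
	return 0;
}

The asymmetry (slow growth, fast back-off) means a missed event costs at most about one ticker interval of stall before the budget halves, while sustained good behaviour is needed to earn the suppression back.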
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 11/22] Transmit and receive checksum offload support.
Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/Makefile | 2 +- drivers/xen/netchannel2/chan.c | 16 +++++ drivers/xen/netchannel2/netchannel2_core.h | 19 ++++++ drivers/xen/netchannel2/offload.c | 93 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/recv_packet.c | 30 +++++++++ drivers/xen/netchannel2/xmit_packet.c | 17 +++++ include/xen/interface/io/netchannel2.h | 44 ++++++++++++- 7 files changed, 217 insertions(+), 4 deletions(-) create mode 100644 drivers/xen/netchannel2/offload.c diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index d6641a1..565ba89 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -1,7 +1,7 @@ obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2.o netchannel2-objs := chan.o netchan2.o rscb.o util.o \ - xmit_packet.o recv_packet.o poll.o + xmit_packet.o offload.o recv_packet.o poll.o ifeq ($(CONFIG_XEN_NETDEV2_BACKEND),y) netchannel2-objs += netback2.o diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index a4b83a1..af8d028 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -82,6 +82,9 @@ retry: case NETCHANNEL2_MSG_FINISH_PACKET: nc2_handle_finish_packet_msg(nc, ncrp, &hdr); break; + case NETCHANNEL2_MSG_SET_OFFLOAD: + nc2_handle_set_offload(nc, ncrp, &hdr); + break; case NETCHANNEL2_MSG_PAD: break; default: @@ -126,6 +129,7 @@ done: event channel if necessary. */ static void flush_rings(struct netchannel2_ring_pair *ncrp) { + struct netchannel2 *nc = ncrp->interface; int need_kick; flush_hypercall_batcher(&ncrp->pending_rx_hypercalls, @@ -133,6 +137,8 @@ static void flush_rings(struct netchannel2_ring_pair *ncrp) send_finish_packet_messages(ncrp); if (ncrp->need_advertise_max_packets) advertise_max_packets(ncrp); + if (nc->need_advertise_offloads) + advertise_offloads(nc); need_kick = 0; if (nc2_finish_messages(&ncrp->cons_ring)) { @@ -366,6 +372,9 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) nc->local_trusted = local_trusted; nc->rings.filter_mac = filter_mac; + /* Default to RX csum on. */ + nc->use_rx_csum = 1; + skb_queue_head_init(&nc->pending_skbs); if (init_ring_pair(&nc->rings, nc) < 0) { nc2_release(nc); @@ -383,6 +392,7 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) netdev->features = NETIF_F_LLTX; SET_NETDEV_DEV(netdev, &xd->dev); + SET_ETHTOOL_OPS(netdev, &nc2_ethtool_ops); err = read_mac_address(xd->nodename, "remote-mac", nc->rings.remote_mac); @@ -468,6 +478,8 @@ int nc2_attach_rings(struct netchannel2 *nc, _nc2_attach_rings(&nc->rings, cons_sring, cons_payload, cons_size, prod_sring, prod_payload, prod_size, otherend_id); + nc->need_advertise_offloads = 1; + spin_unlock_bh(&nc->rings.lock); netif_carrier_on(nc->net_device); @@ -532,6 +544,10 @@ void nc2_detach_rings(struct netchannel2 *nc) if (nc->rings.irq >= 0) unbind_from_irqhandler(nc->rings.irq, &nc->rings); nc->rings.irq = -1; + + /* Disable all offloads */ + nc->net_device->features &= ~NETIF_F_IP_CSUM; + nc->allow_tx_csum_offload = 0; } #if defined(CONFIG_XEN_NETDEV2_BACKEND) diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 7304451..7e00daf 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -242,6 +242,19 @@ struct netchannel2 { /* Packets which we need to transmit soon */ struct sk_buff_head pending_skbs; + /* Task offload control. These are all protected by the + * lock. 
*/
+	/* Ethtool allows us to use RX checksumming */
+	uint8_t use_rx_csum;
+	/* The remote endpoint allows us to use TX checksumming.
+	   Whether we actually use TX checksumming is controlled by
+	   the net device feature bits. */
+	uint8_t allow_tx_csum_offload;
+	/* At some point in the past, we tried to tell the other end
+	   what our current offload policy is and failed. Try again
+	   as soon as possible. */
+	uint8_t need_advertise_offloads;
+
 	/* Flag to indicate that the interface is stopped.
 	   When the interface is stopped we need to run the tasklet
 	   after we receive an interrupt so that we can wake it up */
@@ -354,6 +367,12 @@ void receive_pending_skbs(struct sk_buff_head *rx_queue);
 void nc2_queue_purge(struct netchannel2_ring_pair *ncrp,
 		     struct sk_buff_head *queue);
 
+void advertise_offloads(struct netchannel2 *nc);
+void nc2_handle_set_offload(struct netchannel2 *nc,
+			    struct netchannel2_ring_pair *ncrp,
+			    struct netchannel2_msg_hdr *hdr);
+extern struct ethtool_ops nc2_ethtool_ops;
+
 void nc2_init_poller(struct netchannel2_ring_pair *ncrp);
 void nc2_start_polling(struct netchannel2_ring_pair *ncrp);
 void nc2_stop_polling(struct netchannel2_ring_pair *ncrp);
diff --git a/drivers/xen/netchannel2/offload.c b/drivers/xen/netchannel2/offload.c
new file mode 100644
index 0000000..90d0a54
--- /dev/null
+++ b/drivers/xen/netchannel2/offload.c
@@ -0,0 +1,93 @@
+/* All the bits used to handle enabling and disabling the various
+ * offloads. */
+#include <linux/kernel.h>
+#include <linux/ethtool.h>
+#include "netchannel2_core.h"
+
+static int nc2_set_tx_csum(struct net_device *nd, u32 val);
+
+/* ---------------- Interface to the other domain ----------------------- */
+void nc2_handle_set_offload(struct netchannel2 *nc,
+			    struct netchannel2_ring_pair *ncrp,
+			    struct netchannel2_msg_hdr *hdr)
+{
+	struct netchannel2_msg_set_offload msg;
+	if (hdr->size != sizeof(msg)) {
+		pr_debug("Strange sized offload message: %d\n",
+			 hdr->size);
+		return;
+	}
+	if (ncrp != &nc->rings) {
+		pr_debug("Setting offloads on an ancillary ring!\n");
+		return;
+	}
+	nc2_copy_from_ring(&nc->rings.cons_ring, &msg, hdr->size);
+	if (msg.csum != nc->allow_tx_csum_offload) {
+		nc->allow_tx_csum_offload = msg.csum;
+		nc2_set_tx_csum(nc->net_device, msg.csum);
+	}
+}
+
+/* Tell the other end what sort of offloads it's allowed to use.
*/ +void advertise_offloads(struct netchannel2 *nc) +{ + struct netchannel2_msg_set_offload msg; + + memset(&msg, 0, sizeof(msg)); + + if (nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg))) { + msg.csum = nc->use_rx_csum; + nc2_send_message(&nc->rings.prod_ring, + NETCHANNEL2_MSG_SET_OFFLOAD, + 0, &msg, sizeof(msg)); + nc->need_advertise_offloads = 0; + nc->rings.pending_time_sensitive_messages = 1; + } else { + nc->need_advertise_offloads = 1; + } +} + + + +/* ---------------------- Ethtool interface ---------------------------- */ + +static int nc2_set_rx_csum(struct net_device *nd, u32 val) +{ + struct netchannel2 *nc = netdev_priv(nd); + + spin_lock_bh(&nc->rings.lock); + if (nc->use_rx_csum != val) { + nc->use_rx_csum = val; + nc->need_advertise_offloads = 1; + spin_unlock_bh(&nc->rings.lock); + nc2_kick(&nc->rings); + } else { + spin_unlock_bh(&nc->rings.lock); + } + + return 0; +} + +static u32 nc2_get_rx_csum(struct net_device *nd) +{ + struct netchannel2 *nc = netdev_priv(nd); + return nc->use_rx_csum; +} + +static int nc2_set_tx_csum(struct net_device *nd, u32 val) +{ + struct netchannel2 *nc = netdev_priv(nd); + + /* Can''t turn on TX csum offload if the other end can''t do RX + csum offload. */ + if (val != 0 && !nc->allow_tx_csum_offload) + return -EOPNOTSUPP; + return ethtool_op_set_tx_csum(nd, val); +} + +struct ethtool_ops nc2_ethtool_ops = { + .get_tx_csum = ethtool_op_get_tx_csum, + .set_tx_csum = nc2_set_tx_csum, + .get_rx_csum = nc2_get_rx_csum, + .set_rx_csum = nc2_set_rx_csum, +}; diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 4678c28..0d4e593 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -132,6 +132,36 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, goto err; } + switch (msg.flags & (NC2_PACKET_FLAG_data_validated | + NC2_PACKET_FLAG_csum_blank)) { + case 0: + skb->ip_summed = CHECKSUM_NONE; + break; + case NC2_PACKET_FLAG_data_validated: + skb->ip_summed = CHECKSUM_UNNECESSARY; + break; + default: + /* csum_blank implies data_validated, so + csum_blank and csum_blank|data_validated + are equivalent. */ + skb->ip_summed = CHECKSUM_PARTIAL; + if (msg.csum_offset + 2 > skb->len) { + /* Whoops. Assuming no bugs in our + receive methods, the other end just + requested checksum calculation + beyond the end of the packet. 
*/
+			if (net_ratelimit())
+				dev_warn(&nc->net_device->dev,
+					 "csum field too far through packet (%d, skb len %d, headlen %d)\n",
+					 msg.csum_offset, skb->len,
+					 skb_headlen(skb));
+			goto err;
+		}
+		skb->csum_start = msg.csum_start + skb_headroom(skb);
+		skb->csum_offset = msg.csum_offset - msg.csum_start;
+		break;
+	}
+
 	__skb_queue_tail(pending_rx_queue, skb);
 
 	if (ncrp->pending_rx_hypercalls.nr_pending_gops >=
diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c
index a693a75..5b0ba6b 100644
--- a/drivers/xen/netchannel2/xmit_packet.c
+++ b/drivers/xen/netchannel2/xmit_packet.c
@@ -90,6 +90,21 @@ int prepare_xmit_allocate_resources(struct netchannel2 *nc,
 	return -1;
 }
 
+static void set_offload_flags(struct sk_buff *skb,
+			      volatile struct netchannel2_msg_packet *msg)
+{
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		msg->flags |=
+			NC2_PACKET_FLAG_csum_blank |
+			NC2_PACKET_FLAG_data_validated;
+		msg->csum_start = skb->csum_start - (skb->data - skb->head);
+		msg->csum_offset = msg->csum_start + skb->csum_offset;
+	}
+
+	if (skb->proto_data_valid)
+		msg->flags |= NC2_PACKET_FLAG_data_validated;
+}
+
 /* Transmit a packet which has previously been prepared with
    prepare_xmit_allocate_resources(). */
 /* Once this has been called, the ring must not be flushed until the
@@ -139,6 +154,8 @@ int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp,
 		skb_co->inline_prefix_size);
 	barrier();
 
+	set_offload_flags(skb, msg);
+
 	switch (skb_co->policy) {
 	case transmit_policy_small:
 		/* Nothing to do */
diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h
index c45963e..5a56eb9 100644
--- a/include/xen/interface/io/netchannel2.h
+++ b/include/xen/interface/io/netchannel2.h
@@ -53,15 +53,31 @@ struct netchannel2_msg_packet {
 	uint8_t pad1;
 	uint16_t prefix_size;
 	uint16_t pad2;
-	uint16_t pad3;
-	uint16_t pad4;
-	/* Variable-size array. The number of elements is determined
+	uint16_t csum_start;
+	uint16_t csum_offset;
+	/* Variable-size array. The number of elements is determined
 	   by the size of the message. */
 	/* Until we support scatter-gather, this will be either 0 or 1
 	   element. */
 	struct netchannel2_fragment frags[0];
 };
 
+/* TX csum offload. The transmitting domain has skipped a checksum
+ * calculation. Before forwarding the packet on, the receiving domain
+ * must first perform a 16 bit IP checksum on everything from
+ * csum_start to the end of the packet, and then write the result to
+ * an offset csum_offset in the packet. This should only be set if
+ * the transmitting domain has previously received a SET_OFFLOAD
+ * message with csum = 1.
+ */
+#define NC2_PACKET_FLAG_csum_blank 1
+/* RX csum offload. The transmitting domain has already validated the
+ * protocol-level checksum on this packet (i.e. TCP or UDP), so the
+ * receiving domain shouldn't bother. This does not tell you anything
+ * about the IP-level checksum. This can be set on any packet,
+ * regardless of any SET_OFFLOAD messages which may or may not have
+ * been sent. */
+#define NC2_PACKET_FLAG_data_validated 2
 /* If set, the transmitting domain requires an event urgently when
  * this packet's finish message is sent. Otherwise, the event can be
  * delayed. */
@@ -103,4 +119,26 @@ struct netchannel2_msg_finish_packet {
 	uint32_t id;
 };
 
+/* Tell the other end what sort of offloads we're going to let it use.
+ * An endpoint must not use any offload unless it has been enabled
+ * by a previous SET_OFFLOAD message.
*/ +/* Note that there is no acknowledgement for this message. This means + * that an endpoint can continue to receive PACKET messages which + * require offload support for some time after it disables task + * offloading. The endpoint is expected to handle this case correctly + * (which may just mean dropping the packet and returning a FINISH + * message, if appropriate). + */ +#define NETCHANNEL2_MSG_SET_OFFLOAD 4 +struct netchannel2_msg_set_offload { + struct netchannel2_msg_hdr hdr; + /* Checksum offload. If this is 0, the other end must + * calculate checksums before sending the packet. If it is 1, + * the other end does not have to perform the calculation. + */ + uint8_t csum; + uint8_t pad; + uint16_t reserved; +}; + #endif /* !__NETCHANNEL2_H__ */ -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
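The contract which NC2_PACKET_FLAG_csum_blank places on the receiving domain is the standard ones'-complement Internet checksum, completed over everything from csum_start to the end of the packet and stored at csum_offset (an offset from the start of the packet, not from csum_start). Here is a minimal sketch of that completion step over a linear buffer; complete_checksum() is a hypothetical illustration, not part of the patch, and an in-kernel receiver would use csum_partial() and csum_fold() instead. It assumes the usual CHECKSUM_PARTIAL convention that the field at csum_offset already holds the pseudo-header sum, so that field is simply summed along with the rest of the data:

#include <stdint.h>
#include <stddef.h>

/* Fold a 32-bit accumulator down to the final 16-bit
   ones'-complement checksum. */
static uint16_t fold_csum(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Complete the checksum for a csum_blank packet: sum every 16-bit
   word from csum_start to the end of the packet (including the
   pseudo-header sum already sitting at csum_offset) and store the
   folded result at csum_offset, in network byte order. */
static void complete_checksum(uint8_t *pkt, size_t len,
			      uint16_t csum_start, uint16_t csum_offset)
{
	uint32_t sum = 0;
	size_t i;

	for (i = csum_start; i + 1 < len; i += 2)
		sum += (uint32_t)pkt[i] << 8 | pkt[i + 1];
	if (i < len)			/* odd trailing byte */
		sum += (uint32_t)pkt[i] << 8;
	sum = fold_csum(sum);
	pkt[csum_offset] = sum >> 8;
	pkt[csum_offset + 1] = sum & 0xff;
}

This also shows why the receive path above rejects messages with msg.csum_offset + 2 > skb->len: the 16-bit store would otherwise run off the end of the packet.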
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 12/22] Scatter-gather support.
Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/chan.c | 27 ++++++++++-- drivers/xen/netchannel2/netchannel2_core.h | 35 +++++++++++++--- drivers/xen/netchannel2/offload.c | 59 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/recv_packet.c | 23 +++++++++++ drivers/xen/netchannel2/rscb.c | 18 ++++++-- drivers/xen/netchannel2/xmit_packet.c | 43 ++++++++++++-------- include/xen/interface/io/netchannel2.h | 24 ++++++++++- 7 files changed, 191 insertions(+), 38 deletions(-) diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index af8d028..ae9bdb0 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -85,6 +85,10 @@ retry: case NETCHANNEL2_MSG_SET_OFFLOAD: nc2_handle_set_offload(nc, ncrp, &hdr); break; + case NETCHANNEL2_MSG_SET_MAX_FRAGMENTS_PER_PACKET: + nc2_handle_set_max_fragments_per_packet(nc, ncrp, + &hdr); + break; case NETCHANNEL2_MSG_PAD: break; default: @@ -137,6 +141,8 @@ static void flush_rings(struct netchannel2_ring_pair *ncrp) send_finish_packet_messages(ncrp); if (ncrp->need_advertise_max_packets) advertise_max_packets(ncrp); + if (ncrp->need_advertise_max_fragments_per_packet) + advertise_max_fragments_per_packet(ncrp); if (nc->need_advertise_offloads) advertise_offloads(nc); @@ -460,6 +466,8 @@ static void _nc2_attach_rings(struct netchannel2_ring_pair *ncrp, ncrp->is_attached = 1; ncrp->need_advertise_max_packets = 1; + ncrp->need_advertise_max_fragments_per_packet = 1; + ncrp->max_fragments_per_tx_packet = 1; } /* Attach a netchannel2 structure to a ring pair. The endpoint is @@ -546,8 +554,9 @@ void nc2_detach_rings(struct netchannel2 *nc) nc->rings.irq = -1; /* Disable all offloads */ - nc->net_device->features &= ~NETIF_F_IP_CSUM; + nc->net_device->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG); nc->allow_tx_csum_offload = 0; + nc->rings.max_fragments_per_tx_packet = 1; } #if defined(CONFIG_XEN_NETDEV2_BACKEND) @@ -657,17 +666,25 @@ static int process_ring(struct napi_struct *napi, skb = skb_peek_tail(&nc->pending_skbs); if (!skb) break; - if (prepare_xmit_allocate_resources(nc, skb) < 0) { - /* Still stuck */ + switch (prepare_xmit_allocate_resources(nc, skb)) { + case PREP_XMIT_OKAY: + __skb_unlink(skb, &nc->pending_skbs); + queue_packet_to_interface(skb, ncrp); + break; + case PREP_XMIT_BUSY: + goto still_stuck; + case PREP_XMIT_DROP: + __skb_unlink(skb, &nc->pending_skbs); + release_tx_packet(ncrp, skb); break; } - __skb_unlink(skb, &nc->pending_skbs); - queue_packet_to_interface(skb, ncrp); } if (skb_queue_empty(&nc->pending_skbs)) { nc->is_stopped = 0; netif_wake_queue(nc->net_device); } +still_stuck: + ; } spin_unlock(&ncrp->lock); diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 7e00daf..b3b063c 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -199,6 +199,15 @@ struct netchannel2_ring_pair { filtering rules would suppress the event. */ uint8_t delayed_kick; + /* Set if we need to send a SET_MAX_FRAGMENTS_PER_PACKET + * message. */ + uint8_t need_advertise_max_fragments_per_packet; + + /* The maximum number of fragments which can be used in any + given packet. We have to linearise anything which is more + fragmented than this. */ + uint32_t max_fragments_per_tx_packet; + /* A list of packet IDs which we need to return to the other end as soon as there is space on the ring. Protected by the lock. 
*/ @@ -308,10 +317,18 @@ struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, unsigned nr_frags, unsigned frags_off); -int prepare_xmit_allocate_small(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb); -int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb); +enum prepare_xmit_result { + PREP_XMIT_OKAY = 0, + PREP_XMIT_BUSY = -1, + PREP_XMIT_DROP = -2, +}; + +enum prepare_xmit_result prepare_xmit_allocate_small( + struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); +enum prepare_xmit_result prepare_xmit_allocate_grant( + struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb); void xmit_grant(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb, volatile void *msg); @@ -347,9 +364,9 @@ void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev); int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb); -int prepare_xmit_allocate_resources(struct netchannel2 *nc, - struct sk_buff *skb); + struct sk_buff *skb); +enum prepare_xmit_result prepare_xmit_allocate_resources(struct netchannel2 *nc, + struct sk_buff *skb); void nc2_handle_finish_packet_msg(struct netchannel2 *nc, struct netchannel2_ring_pair *ncrp, struct netchannel2_msg_hdr *hdr); @@ -363,6 +380,10 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, struct netchannel2_msg_hdr *hdr, struct sk_buff_head *pending_rx_queue); void advertise_max_packets(struct netchannel2_ring_pair *ncrp); +void nc2_handle_set_max_fragments_per_packet(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void advertise_max_fragments_per_packet(struct netchannel2_ring_pair *ncrp); void receive_pending_skbs(struct sk_buff_head *rx_queue); void nc2_queue_purge(struct netchannel2_ring_pair *ncrp, struct sk_buff_head *queue); diff --git a/drivers/xen/netchannel2/offload.c b/drivers/xen/netchannel2/offload.c index 90d0a54..552b0ad 100644 --- a/drivers/xen/netchannel2/offload.c +++ b/drivers/xen/netchannel2/offload.c @@ -5,6 +5,7 @@ #include "netchannel2_core.h" static int nc2_set_tx_csum(struct net_device *nd, u32 val); +static int nc2_set_sg(struct net_device *nd, u32 val); /* ---------------- Interface to the other domain ----------------------- */ void nc2_handle_set_offload(struct netchannel2 *nc, @@ -25,6 +26,14 @@ void nc2_handle_set_offload(struct netchannel2 *nc, if (msg.csum != nc->allow_tx_csum_offload) { nc->allow_tx_csum_offload = msg.csum; nc2_set_tx_csum(nc->net_device, msg.csum); + /* Linux doesn''t support scatter-gather mode without + TX csum offload. We therefore need to disable SG + support whenever the remote turns off csum support. + We also elect to enable SG support whenever the + remote turns on csum support, since that''s more + likely to be useful than requiring the user to + manually enable it every time. */ + nc2_set_sg(nc->net_device, msg.csum); } } @@ -47,6 +56,37 @@ void advertise_offloads(struct netchannel2 *nc) } } +/* Not really offload-related, but it interacts with checksum offload + and is easiest to do here. 
*/ +void nc2_handle_set_max_fragments_per_packet(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_set_max_fragments_per_packet msg; + + if (hdr->size != sizeof(msg)) { + pr_debug("Set max fragments per packet message had strange size %d\n", + hdr->size); + return; + } + nc2_copy_from_ring(&ncrp->cons_ring, &msg, sizeof(msg)); + if (msg.max_frags_per_packet < 1) { + pr_debug("set max fragments per packet to %d?\n", + msg.max_frags_per_packet); + return; + } + if (ncrp == &nc->rings && + ncrp->max_fragments_per_tx_packet == 1 && + msg.max_frags_per_packet > 1) { + /* Turning on scatter-gather mode. Linux only + supports it if you''ve got TX csum offload, + though. */ + if (nc->net_device->features & NETIF_F_IP_CSUM) + nc->net_device->features |= NETIF_F_SG; + } + ncrp->max_fragments_per_tx_packet = msg.max_frags_per_packet; +} + /* ---------------------- Ethtool interface ---------------------------- */ @@ -85,9 +125,28 @@ static int nc2_set_tx_csum(struct net_device *nd, u32 val) return ethtool_op_set_tx_csum(nd, val); } +/* ethtool set_sg() handler. Linux makes sure that TX csum offload is + only enabled when scatter-gather mode is, so we don''t have to worry + about that here. */ +static int nc2_set_sg(struct net_device *nd, u32 val) +{ + struct netchannel2 *nc = netdev_priv(nd); + + if (nc->rings.max_fragments_per_tx_packet <= 1) + return -EOPNOTSUPP; + + if (val) + nd->features |= NETIF_F_SG; + else + nd->features &= ~NETIF_F_SG; + return 0; +} + struct ethtool_ops nc2_ethtool_ops = { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = nc2_set_tx_csum, .get_rx_csum = nc2_get_rx_csum, .set_rx_csum = nc2_set_rx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = nc2_set_sg, }; diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 0d4e593..958a3a6 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -83,6 +83,13 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, frags_bytes = hdr->size - sizeof(msg) - msg.prefix_size; nr_frags = frags_bytes / sizeof(struct netchannel2_fragment); + if (nr_frags > MAX_SKB_FRAGS) { + pr_debug("otherend misbehaving: %d frags > %ld\n", + nr_frags, MAX_SKB_FRAGS); + nc->stats.tx_errors++; + return; + } + switch (msg.type) { case NC2_PACKET_TYPE_small: if (nr_frags != 0) { @@ -218,6 +225,22 @@ void advertise_max_packets(struct netchannel2_ring_pair *ncrp) ncrp->pending_time_sensitive_messages = 1; } +void advertise_max_fragments_per_packet(struct netchannel2_ring_pair *ncrp) +{ + struct netchannel2_msg_set_max_fragments_per_packet msg; + + if (!nc2_can_send_payload_bytes(&ncrp->prod_ring, sizeof(msg))) + return; + msg.max_frags_per_packet = MAX_SKB_FRAGS; + nc2_send_message(&ncrp->prod_ring, + NETCHANNEL2_MSG_SET_MAX_FRAGMENTS_PER_PACKET, + 0, + &msg, + sizeof(msg)); + ncrp->need_advertise_max_fragments_per_packet = 0; + ncrp->pending_time_sensitive_messages = 1; +} + void receive_pending_skbs(struct sk_buff_head *pending_rx_queue) { struct sk_buff *skb; diff --git a/drivers/xen/netchannel2/rscb.c b/drivers/xen/netchannel2/rscb.c index 8984f90..8ad5454 100644 --- a/drivers/xen/netchannel2/rscb.c +++ b/drivers/xen/netchannel2/rscb.c @@ -229,8 +229,8 @@ static inline int nfrags_skb(struct sk_buff *skb, int prefix_size) + skb_shinfo(skb)->nr_frags; } -int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb) +enum prepare_xmit_result prepare_xmit_allocate_grant(struct 
netchannel2_ring_pair *ncrp, + struct sk_buff *skb) { struct skb_cb_overlay *skb_co = get_skb_overlay(skb); unsigned nr_fragments; @@ -239,7 +239,7 @@ int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, unsigned inline_prefix_size; if (allocate_txp_slot(ncrp, skb) < 0) - return -1; + return PREP_XMIT_BUSY; /* We''re going to have to get the remote to issue a grant copy hypercall anyway, so there''s no real benefit to shoving the @@ -256,6 +256,14 @@ int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, * policy grant. */ BUG_ON(nr_fragments == 0); + if (nr_fragments > ncrp->max_fragments_per_tx_packet) { + if (skb_linearize(skb) < 0) + return PREP_XMIT_DROP; + nr_fragments = nfrags_skb(skb, inline_prefix_size); + if (nr_fragments > ncrp->max_fragments_per_tx_packet) + return PREP_XMIT_DROP; + } + skb_co->nr_fragments = nr_fragments; } @@ -267,14 +275,14 @@ int prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, release_txp_slot(ncrp, skb); /* Leave skb_co->nr_fragments set, so that we don''t have to recompute it next time around. */ - return -1; + return PREP_XMIT_BUSY; } skb_co->gref_pool = gref_pool; skb_co->inline_prefix_size = inline_prefix_size; skb_co->type = NC2_PACKET_TYPE_receiver_copy; - return 0; + return PREP_XMIT_OKAY; } static void prepare_subpage_grant(struct netchannel2_ring_pair *ncrp, diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c index 5b0ba6b..5cebca6 100644 --- a/drivers/xen/netchannel2/xmit_packet.c +++ b/drivers/xen/netchannel2/xmit_packet.c @@ -21,8 +21,9 @@ static enum transmit_policy transmit_policy(struct netchannel2 *nc, transmitted in the ring. This is only called for small, linear SKBs. It always succeeds, but has an int return type for symmetry with the other prepare_xmit_*() functions. */ -int prepare_xmit_allocate_small(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb) +enum prepare_xmit_result prepare_xmit_allocate_small( + struct netchannel2_ring_pair *ncrp, + struct sk_buff *skb) { struct skb_cb_overlay *skb_co = get_skb_overlay(skb); @@ -33,7 +34,7 @@ int prepare_xmit_allocate_small(struct netchannel2_ring_pair *ncrp, skb_co->gref_pool = 0; skb_co->inline_prefix_size = skb->len; - return 0; + return PREP_XMIT_OKAY; } /* Figure out how much space @tp will take up on the ring. */ @@ -56,13 +57,13 @@ static unsigned get_transmitted_packet_msg_size(struct sk_buff *skb) allocated. The expected case is that the caller will arrange for us to retry the allocation later, in which case we''ll pick up the already-allocated buffers. */ -int prepare_xmit_allocate_resources(struct netchannel2 *nc, - struct sk_buff *skb) +enum prepare_xmit_result prepare_xmit_allocate_resources(struct netchannel2 *nc, + struct sk_buff *skb) { struct skb_cb_overlay *skb_co = get_skb_overlay(skb); enum transmit_policy policy; unsigned msg_size; - int r; + enum prepare_xmit_result r; if (skb_co->policy == transmit_policy_unknown) { policy = transmit_policy(nc, skb); @@ -76,18 +77,18 @@ int prepare_xmit_allocate_resources(struct netchannel2 *nc, default: BUG(); /* Shut the compiler up. 
*/ - r = -1; + r = PREP_XMIT_BUSY; } - if (r < 0) + if (r != PREP_XMIT_OKAY) return r; skb_co->policy = policy; } msg_size = get_transmitted_packet_msg_size(skb); if (nc2_reserve_payload_bytes(&nc->rings.prod_ring, msg_size)) - return 0; + return PREP_XMIT_OKAY; - return -1; + return PREP_XMIT_BUSY; } static void set_offload_flags(struct sk_buff *skb, @@ -221,21 +222,27 @@ int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev) spin_lock_bh(&nc->rings.lock); - if (!nc->rings.is_attached) { - spin_unlock_bh(&nc->rings.lock); - dev_kfree_skb(skb); - nc->stats.tx_dropped++; - return NETDEV_TX_OK; - } + if (!nc->rings.is_attached) + goto out_drop; r = prepare_xmit_allocate_resources(nc, skb); - if (r < 0) - goto out_busy; + if (r != PREP_XMIT_OKAY) { + if (r == PREP_XMIT_BUSY) + goto out_busy; + else + goto out_drop; + } queue_packet_to_interface(skb, &nc->rings); spin_unlock_bh(&nc->rings.lock); return NETDEV_TX_OK; +out_drop: + spin_unlock_bh(&nc->rings.lock); + dev_kfree_skb(skb); + nc->stats.tx_dropped++; + return NETDEV_TX_OK; + out_busy: /* Some more buffers may have arrived, so kick the worker * thread to go and have a look. */ diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h index 5a56eb9..11bb469 100644 --- a/include/xen/interface/io/netchannel2.h +++ b/include/xen/interface/io/netchannel2.h @@ -26,6 +26,11 @@ struct netchannel2_msg_set_max_packets { * NETCHANNEL2_MAX_INLINE_BYTES. Packets may contain no more than * NETCHANNEL2_MAX_PACKET_BYTES bytes of data, including all fragments * and the prefix. + * + * If a SET_MAX_FRAGMENTS_PER_PACKET message has been received, the + * number of fragments in the packet should respect that limit. + * Otherwise, there should be at most one fragment in the packet + * (there may be zero if the entire packet fits in the inline prefix). */ #define NETCHANNEL2_MSG_PACKET 2 #define NETCHANNEL2_MAX_PACKET_BYTES 65536 @@ -55,10 +60,8 @@ struct netchannel2_msg_packet { uint16_t pad2; uint16_t csum_start; uint16_t csum_offset; - /* Variable-size array. The number of elements is determined + /* Variable-size array. The number of elements is determined by the size of the message. */ - /* Until we support scatter-gather, this will be either 0 or 1 - element. */ struct netchannel2_fragment frags[0]; }; @@ -141,4 +144,19 @@ struct netchannel2_msg_set_offload { uint16_t reserved; }; +/* Set the maximum number of fragments which can be used in any packet + * (not including the inline prefix). Until this is sent, there can + * be at most one such fragment per packet. The maximum must not be + * set to zero. */ +/* Note that there is no acknowledgement for this message, and so if + * an endpoint tries to reduce the number of fragments then it may + * continue to receive over-fragmented packets for some time. The + * receiving endpoint is expected to deal with this. + */ +#define NETCHANNEL2_MSG_SET_MAX_FRAGMENTS_PER_PACKET 5 +struct netchannel2_msg_set_max_fragments_per_packet { + struct netchannel2_msg_hdr hdr; + uint32_t max_frags_per_packet; +}; + #endif /* !__NETCHANNEL2_H__ */ -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
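One detail of the fragment accounting in this patch deserves a worked example: nfrags_skb() counts page grants for the linear area by rounding the start of the post-prefix data down to a page boundary and the end of the linear data up, then adds the skb's paged fragments. Restated on its own (linear_area_frags() is a hypothetical name, for illustration only):

/* Page grants needed for the linear area of an skb once the first
   prefix_size bytes have been pulled inline into the ring.  E.g. with
   4096-byte pages, 3000 bytes of post-prefix data starting at page
   offset 2000 crosses one page boundary and so needs two grants. */
static unsigned linear_area_frags(unsigned long data, unsigned headlen,
				  unsigned prefix_size)
{
	unsigned long start = (data + prefix_size) & ~(PAGE_SIZE - 1);
	unsigned long end = (data + headlen + PAGE_SIZE - 1) &
		~(PAGE_SIZE - 1);
	return (end - start) >> PAGE_SHIFT;
}

If the resulting total exceeds the peer's advertised limit, prepare_xmit_allocate_grant() linearises the skb and, if it still does not fit, drops it; there is no per-packet negotiation to fall back on.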
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 13/22] Jumbogram support.
Most of the hard work was already done, and it just needed to be plumbed through. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/chan.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index ae9bdb0..109f1b4 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -322,6 +322,14 @@ static struct net_device_stats *nc2_get_stats(struct net_device *nd) return &nc->stats; } +static int nc2_change_mtu(struct net_device *nd, int mtu) +{ + if (mtu > NETCHANNEL2_MAX_PACKET_BYTES) + return -EINVAL; + nd->mtu = mtu; + return 0; +} + /* Create a new netchannel2 structure. Call with no locks held. Returns NULL on error. The xenbus device must remain valid for as long as the netchannel2 structure does. The core does not take out @@ -391,6 +399,7 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) netdev->stop = nc2_stop; netdev->hard_start_xmit = nc2_start_xmit; netdev->get_stats = nc2_get_stats; + netdev->change_mtu = nc2_change_mtu; /* We need to hold the ring lock in order to send messages anyway, so there's no point in Linux doing additional -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
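Nothing changes at the protocol level for jumbograms: the fragment machinery from the previous patch already carries up to NETCHANNEL2_MAX_PACKET_BYTES (65536 bytes) per packet, so once change_mtu stops rejecting large values this is purely a guest configuration matter, e.g. (illustrative only; the interface name will vary):

	ip link set eth1 mtu 9000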
This includes both TSO-send and TSO-receive support. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/chan.c | 3 +- drivers/xen/netchannel2/netchannel2_core.h | 4 +++ drivers/xen/netchannel2/offload.c | 33 ++++++++++++++++++++++++++- drivers/xen/netchannel2/recv_packet.c | 19 ++++++++++++++++ drivers/xen/netchannel2/xmit_packet.c | 8 ++++++ include/xen/interface/io/netchannel2.h | 19 +++++++++++---- 6 files changed, 78 insertions(+), 8 deletions(-) diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index 109f1b4..9bb7ce7 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -563,9 +563,10 @@ void nc2_detach_rings(struct netchannel2 *nc) nc->rings.irq = -1; /* Disable all offloads */ - nc->net_device->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG); + nc->net_device->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO); nc->allow_tx_csum_offload = 0; nc->rings.max_fragments_per_tx_packet = 1; + nc->allow_tso = 0; } #if defined(CONFIG_XEN_NETDEV2_BACKEND) diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index b3b063c..7be97ea 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -259,6 +259,10 @@ struct netchannel2 { Whether we actually use TX checksumming is controlled by the net device feature bits. */ uint8_t allow_tx_csum_offload; + /* The remote endpoint allows us to use TSO for TCPv4. As for + checksumming, we only actually use the feature if the net + device says to. */ + uint8_t allow_tso; /* At some point in the past, we tried to tell the other end what our current offload policy is and failed. Try again as soon as possible. */ diff --git a/drivers/xen/netchannel2/offload.c b/drivers/xen/netchannel2/offload.c index 552b0ad..5e9c8d0 100644 --- a/drivers/xen/netchannel2/offload.c +++ b/drivers/xen/netchannel2/offload.c @@ -6,6 +6,7 @@ static int nc2_set_tx_csum(struct net_device *nd, u32 val); static int nc2_set_sg(struct net_device *nd, u32 val); +static int nc2_set_tso(struct net_device *nd, u32 val); /* ---------------- Interface to the other domain ----------------------- */ void nc2_handle_set_offload(struct netchannel2 *nc, @@ -35,6 +36,11 @@ void nc2_handle_set_offload(struct netchannel2 *nc, manually enable it every time. */ nc2_set_sg(nc->net_device, msg.csum); } + + if (msg.tcpv4_segmentation_offload != nc->allow_tso) { + nc->allow_tso = msg.tcpv4_segmentation_offload; + nc2_set_tso(nc->net_device, msg.tcpv4_segmentation_offload); + } } /* Tell the other end what sort of offloads it''s allowed to use. */ @@ -46,6 +52,14 @@ void advertise_offloads(struct netchannel2 *nc) if (nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg))) { msg.csum = nc->use_rx_csum; + /* We always claim to be able to accept TSO packets, + and don''t provide any way of turning it off through + ethtool. We used to use the LRO flag, but that''s + not quite right: receiving an LRO packet and + receiving a TSO one are subtly different, due to + the way they get packed into the skbuff + structure. 
*/ + msg.tcpv4_segmentation_offload = 1; nc2_send_message(&nc->rings.prod_ring, NETCHANNEL2_MSG_SET_OFFLOAD, 0, &msg, sizeof(msg)); @@ -142,11 +156,26 @@ static int nc2_set_sg(struct net_device *nd, u32 val) return 0; } +static int nc2_set_tso(struct net_device *nd, u32 val) +{ + struct netchannel2 *nc = netdev_priv(nd); + /* We only allow ourselves to use TSO if the other end's + allowed us to use sufficiently many fragments per + packet. */ + if (val != 0 && + (!nc->allow_tso || + nc->rings.max_fragments_per_tx_packet < MAX_SKB_FRAGS)) + return -EOPNOTSUPP; + return ethtool_op_set_tso(nd, val); +} + struct ethtool_ops nc2_ethtool_ops = { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = nc2_set_tx_csum, .get_rx_csum = nc2_get_rx_csum, .set_rx_csum = nc2_set_rx_csum, - .get_sg = ethtool_op_get_sg, - .set_sg = nc2_set_sg, + .get_sg = ethtool_op_get_sg, + .set_sg = nc2_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = nc2_set_tso }; diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 958a3a6..80c5d5d 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -169,6 +169,25 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, break; } + switch (msg.segmentation_type) { + case NC2_PACKET_SEGMENTATION_TYPE_none: + break; + case NC2_PACKET_SEGMENTATION_TYPE_tcpv4: + if (msg.mss == 0) { + pr_debug("TSO request with mss == 0?\n"); + goto err; + } + skb_shinfo(skb)->gso_type = + SKB_GSO_TCPV4 | SKB_GSO_DODGY; + skb_shinfo(skb)->gso_size = msg.mss; + skb_shinfo(skb)->gso_segs = 0; + break; + default: + pr_debug("Unknown segmentation offload type %d!\n", + msg.segmentation_type); + goto err; + } + __skb_queue_tail(pending_rx_queue, skb); if (ncrp->pending_rx_hypercalls.nr_pending_gops >diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c index 5cebca6..7eb845d 100644 --- a/drivers/xen/netchannel2/xmit_packet.c +++ b/drivers/xen/netchannel2/xmit_packet.c @@ -104,6 +104,14 @@ static void set_offload_flags(struct sk_buff *skb, if (skb->proto_data_valid) msg->flags |= NC2_PACKET_FLAG_data_validated; + + if (skb_shinfo(skb)->gso_size != 0) { + msg->mss = skb_shinfo(skb)->gso_size; + msg->segmentation_type = NC2_PACKET_SEGMENTATION_TYPE_tcpv4; + } else { + msg->mss = 0; + msg->segmentation_type = NC2_PACKET_SEGMENTATION_TYPE_none; + } } /* Transmit a packet which has previously been prepared with diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h index 11bb469..1cca607 100644 --- a/include/xen/interface/io/netchannel2.h +++ b/include/xen/interface/io/netchannel2.h @@ -54,13 +54,13 @@ struct netchannel2_msg_packet { packet message. */ uint8_t type; uint8_t flags; - uint8_t pad0; - uint8_t pad1; + uint8_t segmentation_type; + uint8_t pad; uint16_t prefix_size; - uint16_t pad2; + uint16_t mss; uint16_t csum_start; uint16_t csum_offset; - /* Variable-size array. The number of elements is determined + /* Variable-size array. The number of elements is determined by the size of the message. */ struct netchannel2_fragment frags[0]; }; @@ -112,6 +112,9 @@ struct netchannel2_msg_packet { #define NC2_PACKET_TYPE_receiver_copy 1 #define NC2_PACKET_TYPE_small 4 +#define NC2_PACKET_SEGMENTATION_TYPE_none 0 +#define NC2_PACKET_SEGMENTATION_TYPE_tcpv4 1 + /* Tell the other end that we're finished with a message it sent us, and it can release the transmit buffers etc. This must be sent in response to receiver_copy and receiver_map packets. 
It must not be @@ -140,7 +143,13 @@ struct netchannel2_msg_set_offload { * the other end does not have to perform the calculation. */ uint8_t csum; - uint8_t pad; + /* Segmentation offload. If this is 0, the other end must not + * generate any packet messages with a segmentation type other + * than NC2_PACKET_SEGMENTATION_TYPE_none. If it is 1, the + * other end may also generate packets with a type of + * NC2_PACKET_SEGMENTATION_TYPE_tcpv4. + */ + uint8_t tcpv4_segmentation_offload; uint16_t reserved; }; -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
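Two constraints in this patch are easy to miss. First, nc2_set_tso() refuses to enable TSO unless the peer has allowed at least MAX_SKB_FRAGS fragments per packet (65536/PAGE_SIZE + 2, i.e. 18 with 4K pages on this kernel), since a maximally-sized GSO packet can occupy that many page fragments. Second, the protocol only permits a segmentation type of tcpv4 after the receiver has advertised tcpv4_segmentation_offload = 1 in a SET_OFFLOAD message. The field rules can be restated as a hypothetical receive-side helper (the patch performs the equivalent checks inline in nc2_handle_packet_msg()):

/* Restatement of the segmentation-field rules from netchannel2.h.
   @tso_advertised: did we previously send SET_OFFLOAD with
   tcpv4_segmentation_offload = 1? */
static int segmentation_fields_valid(const struct netchannel2_msg_packet *msg,
				     int tso_advertised)
{
	switch (msg->segmentation_type) {
	case NC2_PACKET_SEGMENTATION_TYPE_none:
		return 1;	/* mss is ignored for unsegmented packets */
	case NC2_PACKET_SEGMENTATION_TYPE_tcpv4:
		/* Only legal if we advertised it; a zero MSS is
		   nonsensical and gets the packet dropped. */
		return tso_advertised && msg->mss != 0;
	default:
		return 0;	/* unknown segmentation type: drop */
	}
}

Note also that the receive path tags incoming TSO packets with SKB_GSO_DODGY, which makes the Linux stack revalidate the GSO parameters before segmenting, since they came from an untrusted peer.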
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 15/22] Add support for receiver-map mode.
In this mode of operation, the receiving domain maps the sending domain''s buffers, rather than grant-copying them into local memory. This is marginally faster, but requires the receiving domain to be somewhat trusted, because: a) It can see anything else which happens to be on the same page as the transmit buffer, and b) It can just hold onto the pages indefinitely, causing a memory leak in the transmitting domain. It''s therefore only really suitable for talking to a trusted peer, and we use it in that way. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/Makefile | 3 +- drivers/xen/netchannel2/chan.c | 14 + drivers/xen/netchannel2/netchannel2_core.h | 17 +- drivers/xen/netchannel2/receiver_map.c | 786 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/recv_packet.c | 23 + drivers/xen/netchannel2/rscb.c | 46 ++- drivers/xen/netchannel2/util.c | 14 + drivers/xen/netchannel2/xmit_packet.c | 12 +- include/xen/interface/io/netchannel2.h | 20 + 9 files changed, 919 insertions(+), 16 deletions(-) create mode 100644 drivers/xen/netchannel2/receiver_map.c diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index 565ba89..d6fb796 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -1,7 +1,8 @@ obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2.o netchannel2-objs := chan.o netchan2.o rscb.o util.o \ - xmit_packet.o offload.o recv_packet.o poll.o + xmit_packet.o offload.o recv_packet.o poll.o \ + receiver_map.o ifeq ($(CONFIG_XEN_NETDEV2_BACKEND),y) netchannel2-objs += netback2.o diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index 9bb7ce7..47e1c5e 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -395,6 +395,13 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) return NULL; } + if (local_trusted) { + if (init_receive_map_mode() < 0) { + nc2_release(nc); + return NULL; + } + } + netdev->open = nc2_open; netdev->stop = nc2_stop; netdev->hard_start_xmit = nc2_start_xmit; @@ -499,6 +506,8 @@ int nc2_attach_rings(struct netchannel2 *nc, spin_unlock_bh(&nc->rings.lock); + resume_receive_map_mode(); + netif_carrier_on(nc->net_device); /* Kick it to get it going. */ @@ -630,6 +639,11 @@ int nc2_get_evtchn_port(struct netchannel2 *nc) return nc->rings.evtchn; } +void nc2_suspend(struct netchannel2 *nc) +{ + suspend_receive_map_mode(); +} + /* @ncrp has been recently nc2_kick()ed. Do all of the necessary stuff. 
*/ static int process_ring(struct napi_struct *napi, diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 7be97ea..c4de063 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -37,6 +37,7 @@ enum transmit_policy { transmit_policy_unknown = 0, transmit_policy_first = 0xf001, transmit_policy_grant = transmit_policy_first, + transmit_policy_map, transmit_policy_small, transmit_policy_last = transmit_policy_small }; @@ -320,6 +321,11 @@ struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, struct netchannel2_msg_hdr *hdr, unsigned nr_frags, unsigned frags_off); +struct sk_buff *handle_receiver_map_packet(struct netchannel2 *nc, + struct netchannel2_msg_packet *msg, + struct netchannel2_msg_hdr *hdr, + unsigned nr_frags, + unsigned frags_off); enum prepare_xmit_result { PREP_XMIT_OKAY = 0, @@ -332,9 +338,11 @@ enum prepare_xmit_result prepare_xmit_allocate_small( struct sk_buff *skb); enum prepare_xmit_result prepare_xmit_allocate_grant( struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb); + struct sk_buff *skb, + int use_subpage_grants); void xmit_grant(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb, + int use_subpage_grants, volatile void *msg); void queue_finish_packet_message(struct netchannel2_ring_pair *ncrp, @@ -353,6 +361,8 @@ void fetch_fragment(struct netchannel2_ring_pair *ncrp, struct netchannel2_fragment *frag, unsigned off); +void pull_through(struct sk_buff *skb, unsigned count); + void nc2_kick(struct netchannel2_ring_pair *ncrp); int nc2_map_grants(struct grant_mapping *gm, @@ -366,6 +376,11 @@ void queue_packet_to_interface(struct sk_buff *skb, void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); +int init_receive_map_mode(void); +void deinit_receive_map_mode(void); +void suspend_receive_map_mode(void); +void resume_receive_map_mode(void); + int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev); int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb); diff --git a/drivers/xen/netchannel2/receiver_map.c b/drivers/xen/netchannel2/receiver_map.c new file mode 100644 index 0000000..e5c4ed1 --- /dev/null +++ b/drivers/xen/netchannel2/receiver_map.c @@ -0,0 +1,786 @@ +/* Support for mapping packets into the local domain, rather than + copying them or using pre-posted buffers. We only implement + receive-side support here; for transmit-side, we use the rscb.c + implementation. */ +#include <linux/kernel.h> +#include <linux/delay.h> +#include <linux/skbuff.h> +#include <linux/netdevice.h> +#include <xen/live_maps.h> +#include <xen/gnttab.h> +#include <xen/balloon.h> +#include <xen/evtchn.h> +#include "netchannel2_core.h" + +#define MAX_MAPPED_FRAGS 1024 +#define MAX_MAPPED_PACKETS MAX_PENDING_FINISH_PACKETS +#define SKB_MIN_PAYLOAD_SIZE 128 + +static DEFINE_SPINLOCK(global_map_lock); +static struct receive_mapper *receive_mapper; + +/* How long do we leave the packets in the Linux stack before trying + to copy them, in jiffies? */ +#define PACKET_TIMEOUT (HZ/2) + +/* A slot into which we could map a fragment. */ +struct rx_map_fragment { + struct list_head list; + struct rx_map_packet *packet; + grant_handle_t handle; /* 0 if the fragment isn''t currently + * mapped */ + struct netchannel2_fragment nc_frag; +}; + +struct rx_map_packet { + struct list_head list; + struct list_head frags; + /* We take a reference for every mapped fragment associated + with the packet. 
When the refcnt goes to zero, the packet + is finished, and can be moved to the + finished_packets_list. */ + atomic_t refcnt; + unsigned id; + unsigned long expires; /* We expect Linux to have finished + with the packet by this time (in + jiffies), or we try to copy it. */ + struct netchannel2 *nc; + uint8_t flags; +}; + +struct receive_mapper { + struct page_foreign_tracker *tracker; + + struct page **pages; + + /* Nests inside the netchannel2 lock. The + finished_packets_lock nests inside this. */ + spinlock_t rm_lock; + + /* Packet fragments which we''ve mapped, or slots into which we + could map packets. The free list and count are protected + by @rm_lock. */ + struct rx_map_fragment frags[MAX_MAPPED_FRAGS]; + struct list_head free_frags; + + struct rx_map_packet packets[MAX_MAPPED_PACKETS]; + struct list_head free_packets; + struct list_head active_packets; + unsigned nr_free_packets; + + /* Packets which Linux has finished with but which we haven''t + returned to the other endpoint yet. */ + spinlock_t finished_packets_lock; /* BH-safe leaf lock, + * acquired from the page + * free callback. Nests + * inside the rm_lock. */ + struct list_head finished_packets; + + struct tasklet_struct gc_tasklet; + + struct timer_list expire_timer; + + /* Set if we''re trying to run the mapper down prior to + suspending the domain. */ + uint8_t suspending; +}; + +static void suspend_receive_mapper(struct receive_mapper *rm); + +static unsigned fragment_idx(const struct rx_map_fragment *frag) +{ + return frag - receive_mapper->frags; +} + +static int alloc_rx_frags_for_packet(unsigned nr_frags, + struct rx_map_packet *packet) +{ + struct rx_map_fragment *rmf; + unsigned x; + + INIT_LIST_HEAD(&packet->frags); + for (x = 0; x < nr_frags; x++) { + if (list_empty(&receive_mapper->free_frags)) + goto err; + rmf = list_entry(receive_mapper->free_frags.next, + struct rx_map_fragment, + list); + rmf->packet = packet; + rmf->handle = -1; + list_move(&rmf->list, &packet->frags); + } + return 0; + +err: + list_splice_init(&packet->frags, &receive_mapper->free_frags); + return -EBUSY; +} + +static struct rx_map_packet *alloc_rx_packet(struct netchannel2 *nc, + unsigned nr_frags) +{ + struct rx_map_packet *rmp; + + spin_lock(&receive_mapper->rm_lock); + if (list_empty(&receive_mapper->free_packets) || + receive_mapper->suspending) { + spin_unlock(&receive_mapper->rm_lock); + return NULL; + } + rmp = list_entry(receive_mapper->free_packets.next, + struct rx_map_packet, list); + + if (alloc_rx_frags_for_packet(nr_frags, rmp) < 0) { + spin_unlock(&receive_mapper->rm_lock); + return NULL; + } + list_del(&rmp->list); + atomic_set(&rmp->refcnt, nr_frags); + rmp->nc = nc; + receive_mapper->nr_free_packets--; + + spin_unlock(&receive_mapper->rm_lock); + + return rmp; +} + +struct grant_unmapper { + unsigned nr_gops; + gnttab_unmap_grant_ref_t gop_queue[32]; +}; + +static void do_unmaps(struct grant_unmapper *unmapper) +{ + int ret; + unsigned x; + + if (unmapper->nr_gops != 0) { + ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, + unmapper->gop_queue, + unmapper->nr_gops); + BUG_ON(ret); + for (x = 0; x < unmapper->nr_gops; x++) { + set_phys_to_machine( + __pa(unmapper->gop_queue[x].host_addr) >> + PAGE_SHIFT, + INVALID_P2M_ENTRY); + } + } + unmapper->nr_gops = 0; +} + +static void grant_unmap(struct grant_unmapper *unmapper, + void *va, + int handle) +{ + gnttab_unmap_grant_ref_t *gop; + if (unmapper->nr_gops == ARRAY_SIZE(unmapper->gop_queue)) + do_unmaps(unmapper); + gop = 
&unmapper->gop_queue[unmapper->nr_gops]; + gnttab_set_unmap_op(gop, (unsigned long)va, GNTMAP_host_map, handle); + unmapper->nr_gops++; +} + +/* A tasklet which is invoked shortly after a packet is released so + that we can send the FINISH_PACKET message. */ +static void gc_tasklet(unsigned long _rm) +{ + struct list_head packets; + struct rx_map_packet *packet; + struct rx_map_fragment *rx_frag; + struct list_head released_fragments; + unsigned nr_released_packets; + unsigned idx; + struct grant_unmapper unmapper; + struct page *page; + struct netchannel2 *locked_nc; + + INIT_LIST_HEAD(&packets); + + spin_lock(&receive_mapper->finished_packets_lock); + list_splice_init(&receive_mapper->finished_packets, &packets); + spin_unlock(&receive_mapper->finished_packets_lock); + + /* Unmap the fragments. */ + unmapper.nr_gops = 0; + BUG_ON(packets.next == NULL); + list_for_each_entry(packet, &packets, list) { + BUG_ON(packet->list.next == NULL); + BUG_ON(atomic_read(&packet->refcnt) != 0); + BUG_ON(packet->frags.next == NULL); + list_for_each_entry(rx_frag, &packet->frags, list) { + BUG_ON(rx_frag->list.next == NULL); + if (rx_frag->handle == -1) + continue; + idx = fragment_idx(rx_frag); + page = receive_mapper->pages[idx]; + stop_tracking_page(page); + grant_unmap(&unmapper, page_address(page), + rx_frag->handle); + } + } + do_unmaps(&unmapper); + + /* Tell the other end that the packets are finished, and + accumulate the fragments into a local free list. */ + INIT_LIST_HEAD(&released_fragments); + nr_released_packets = 0; + + locked_nc = NULL; + list_for_each_entry(packet, &packets, list) { + if (locked_nc != packet->nc) { + if (locked_nc) { + spin_unlock(&locked_nc->rings.lock); + nc2_kick(&locked_nc->rings); + } + spin_lock(&packet->nc->rings.lock); + locked_nc = packet->nc; + } + BUG_ON(packet->frags.next == NULL); + list_for_each_entry(rx_frag, &packet->frags, list) { + BUG_ON(rx_frag->list.next == NULL); + idx = fragment_idx(rx_frag); + gnttab_reset_grant_page(receive_mapper->pages[idx]); + } + nr_released_packets++; + list_splice_init(&packet->frags, &released_fragments); + queue_finish_packet_message(&locked_nc->rings, packet->id, + packet->flags); + } + + if (locked_nc) { + spin_unlock(&locked_nc->rings.lock); + nc2_kick(&locked_nc->rings); + locked_nc = NULL; + + spin_lock(&receive_mapper->rm_lock); + list_splice(&packets, &receive_mapper->free_packets); + list_splice(&released_fragments, &receive_mapper->free_frags); + receive_mapper->nr_free_packets += nr_released_packets; + + /* Reprogram the expire timer. */ + if (!list_empty(&receive_mapper->active_packets)) { + mod_timer(&receive_mapper->expire_timer, + list_entry(receive_mapper->active_packets.next, + struct rx_map_packet, + list)->expires); + } + spin_unlock(&receive_mapper->rm_lock); + } +} + +/* Decrement the refcnt on @rmp and, if necessary, move it to the + finished packets list and schedule the GC tasklet. */ +static void put_rx_map_packet(struct rx_map_packet *rmp) +{ + if (atomic_dec_and_test(&rmp->refcnt)) { + /* Remove it from the active list. */ + spin_lock_bh(&receive_mapper->rm_lock); + list_del(&rmp->list); + spin_unlock_bh(&receive_mapper->rm_lock); + + /* Add it to the finished list. 
*/ + spin_lock_bh(&receive_mapper->finished_packets_lock); + list_add_tail(&rmp->list, &receive_mapper->finished_packets); + spin_unlock_bh(&receive_mapper->finished_packets_lock); + + tasklet_schedule(&receive_mapper->gc_tasklet); + } +} + + +/* The page @page, which was previously part of a receiver-mapped SKB, + * has been released. If it was the last page involved in its SKB, + * the packet is finished and we can tell the other end that it''s + * finished. + */ +static void netchan2_page_release(struct page *page, unsigned order) +{ + struct rx_map_fragment *frag; + struct rx_map_packet *rmp; + + BUG_ON(order != 0); + + frag = (struct rx_map_fragment *)page->mapping; + rmp = frag->packet; + + put_rx_map_packet(rmp); +} + +/* Unmap the packet, removing all other references to it. The caller + * should take an additional reference to the packet before calling + * this, to stop it disappearing underneath us. The only way of + * checking whether this succeeded is to look at the packet''s + * reference count after it returns. + */ +static void unmap_this_packet(struct rx_map_packet *rmp) +{ + struct rx_map_fragment *rx_frag; + unsigned idx; + int r; + int cnt; + + /* Unmap every fragment in the packet. We don''t fail the whole + function just because gnttab_copy_grant_page() failed, + because success or failure will be inferable from the + reference count on the packet (this makes it easier to + handle the case where some pages have already been copied, + for instance). */ + cnt = 0; + list_for_each_entry(rx_frag, &rmp->frags, list) { + idx = fragment_idx(rx_frag); + if (rx_frag->handle != -1) { + r = gnttab_copy_grant_page(rx_frag->handle, + &receive_mapper->pages[idx]); + if (r == 0) { + /* We copied the page, so it''s not really + mapped any more. */ + rx_frag->handle = -1; + atomic_dec(&rmp->refcnt); + } + } + cnt++; + } + + /* Caller should hold a reference. */ + BUG_ON(atomic_read(&rmp->refcnt) == 0); +} + +static void unmap_all_packets(void) +{ + struct rx_map_packet *rmp; + struct rx_map_packet *next; + struct list_head finished_packets; + int need_tasklet; + + INIT_LIST_HEAD(&finished_packets); + + spin_lock_bh(&receive_mapper->rm_lock); + + list_for_each_entry_safe(rmp, next, &receive_mapper->active_packets, + list) { + atomic_inc(&rmp->refcnt); + unmap_this_packet(rmp); + if (atomic_dec_and_test(&rmp->refcnt)) + list_move(&rmp->list, finished_packets.prev); + } + spin_unlock_bh(&receive_mapper->rm_lock); + + need_tasklet = !list_empty(&finished_packets); + + spin_lock_bh(&receive_mapper->finished_packets_lock); + list_splice(&finished_packets, receive_mapper->finished_packets.prev); + spin_unlock_bh(&receive_mapper->finished_packets_lock); + + if (need_tasklet) + tasklet_schedule(&receive_mapper->gc_tasklet); +} + +static void free_receive_mapper(struct receive_mapper *rm) +{ + unsigned x; + + /* Get rid of any packets which are currently mapped. */ + suspend_receive_mapper(rm); + + /* Stop the expiry timer. We know it won''t get requeued + * because there are no packets outstanding and rm->suspending + * is set (because of suspend_receive_mapper()). */ + del_timer_sync(&rm->expire_timer); + + /* Wait for any last instances of the tasklet to finish. 
*/ + tasklet_kill(&rm->gc_tasklet); + + if (rm->pages != NULL) { + for (x = 0; x < MAX_MAPPED_FRAGS; x++) { + if (PageForeign(rm->pages[x])) + ClearPageForeign(rm->pages[x]); + rm->pages[x]->mapping = NULL; + } + free_empty_pages_and_pagevec(rm->pages, MAX_MAPPED_FRAGS); + } + if (rm->tracker != NULL) + free_page_foreign_tracker(rm->tracker); + kfree(rm); +} + +/* Timer invoked shortly after a packet expires, so that we can copy + the data and get it back from Linux. This is necessary if a packet + gets stuck in a socket RX queue somewhere, or you risk a + deadlock. */ +static void expire_timer(unsigned long data) +{ + struct rx_map_packet *rmp, *next; + struct list_head finished_packets; + int need_tasklet; + + INIT_LIST_HEAD(&finished_packets); + + spin_lock(&receive_mapper->rm_lock); + list_for_each_entry_safe(rmp, next, &receive_mapper->active_packets, + list) { + if (time_after(rmp->expires, jiffies)) { + mod_timer(&receive_mapper->expire_timer, rmp->expires); + break; + } + atomic_inc(&rmp->refcnt); + unmap_this_packet(rmp); + if (atomic_dec_and_test(&rmp->refcnt)) { + list_move(&rmp->list, finished_packets.prev); + } else { + /* Couldn''t unmap the packet, either because + it''s in use by real hardware or we''ve run + out of memory. Send the packet to the end + of the queue and update the expiry time so + that we try again later. */ + /* Note that this can make the active packet + list slightly out of order. Oh well; it + won''t be by more than a few jiffies, and it + doesn''t really matter that much. */ + rmp->expires = jiffies + PACKET_TIMEOUT; + list_move(&rmp->list, + receive_mapper->active_packets.prev); + } + } + spin_unlock(&receive_mapper->rm_lock); + + need_tasklet = !list_empty(&finished_packets); + + spin_lock(&receive_mapper->finished_packets_lock); + list_splice(&finished_packets, receive_mapper->finished_packets.prev); + spin_unlock(&receive_mapper->finished_packets_lock); + + if (need_tasklet) + tasklet_schedule(&receive_mapper->gc_tasklet); +} + +static struct receive_mapper *new_receive_mapper(void) +{ + struct receive_mapper *rm; + unsigned x; + + rm = kzalloc(sizeof(*rm), GFP_KERNEL); + if (!rm) + goto err; + INIT_LIST_HEAD(&rm->free_frags); + INIT_LIST_HEAD(&rm->free_packets); + INIT_LIST_HEAD(&rm->active_packets); + INIT_LIST_HEAD(&rm->finished_packets); + spin_lock_init(&rm->rm_lock); + spin_lock_init(&rm->finished_packets_lock); + for (x = 0; x < MAX_MAPPED_FRAGS; x++) + list_add_tail(&rm->frags[x].list, &rm->free_frags); + for (x = 0; x < MAX_MAPPED_PACKETS; x++) + list_add_tail(&rm->packets[x].list, &rm->free_packets); + rm->nr_free_packets = MAX_MAPPED_PACKETS; + + setup_timer(&rm->expire_timer, expire_timer, 0); + tasklet_init(&rm->gc_tasklet, gc_tasklet, 0); + + rm->tracker = alloc_page_foreign_tracker(MAX_MAPPED_FRAGS); + if (!rm->tracker) + goto err; + rm->pages = alloc_empty_pages_and_pagevec(MAX_MAPPED_FRAGS); + if (!rm->pages) + goto err; + for (x = 0; x < MAX_MAPPED_FRAGS; x++) { + SetPageForeign(rm->pages[x], netchan2_page_release); + rm->pages[x]->mapping = (void *)&rm->frags[x]; + } + + return rm; + +err: + if (rm != NULL) + free_receive_mapper(rm); + return NULL; +} + +static void attach_frag_to_skb(struct sk_buff *skb, + struct rx_map_fragment *frag) +{ + unsigned idx; + struct skb_shared_info *shinfo; + skb_frag_t *sk_frag; + + shinfo = skb_shinfo(skb); + sk_frag = &shinfo->frags[shinfo->nr_frags]; + idx = fragment_idx(frag); + sk_frag->page = receive_mapper->pages[idx]; + sk_frag->page_offset = frag->nc_frag.off; + sk_frag->size = 
frag->nc_frag.size; + shinfo->nr_frags++; +} + +struct rx_plan { + int is_failed; + unsigned nr_mops; + gnttab_map_grant_ref_t mops[8]; + struct rx_map_fragment *frags[8]; +}; + +static void flush_grant_operations(struct rx_plan *rp) +{ + unsigned x; + int ret; + gnttab_map_grant_ref_t *mop; + + if (rp->nr_mops == 0) + return; + if (!rp->is_failed) { + ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, + rp->mops, + rp->nr_mops); + BUG_ON(ret); + for (x = 0; x < rp->nr_mops; x++) { + mop = &rp->mops[x]; + if (mop->status != 0) { + rp->is_failed = 1; + } else { + rp->frags[x]->handle = mop->handle; + set_phys_to_machine( + __pa(mop->host_addr) >> PAGE_SHIFT, + FOREIGN_FRAME(mop->dev_bus_addr >> + PAGE_SHIFT)); + } + } + } + rp->nr_mops = 0; +} + +static void map_fragment(struct rx_plan *rp, + struct rx_map_fragment *rx_frag, + struct netchannel2 *nc) +{ + unsigned idx = fragment_idx(rx_frag); + gnttab_map_grant_ref_t *mop; + + if (rp->nr_mops == ARRAY_SIZE(rp->mops)) + flush_grant_operations(rp); + mop = &rp->mops[rp->nr_mops]; + gnttab_set_map_op(mop, + (unsigned long)page_address(receive_mapper->pages[idx]), + GNTMAP_host_map | GNTMAP_readonly, + rx_frag->nc_frag.receiver_map.gref, + nc->rings.otherend_id); + rp->frags[rp->nr_mops] = rx_frag; + rp->nr_mops++; +} + +/* Unmap a packet which has been half-mapped. */ +static void unmap_partial_packet(struct rx_map_packet *rmp) +{ + unsigned idx; + struct rx_map_fragment *rx_frag; + struct grant_unmapper unmapper; + + unmapper.nr_gops = 0; + list_for_each_entry(rx_frag, &rmp->frags, list) { + if (rx_frag->handle == -1) + continue; + idx = fragment_idx(rx_frag); + grant_unmap(&unmapper, + page_address(receive_mapper->pages[idx]), + rx_frag->handle); + } + do_unmaps(&unmapper); +} + +struct sk_buff *handle_receiver_map_packet(struct netchannel2 *nc, + struct netchannel2_msg_packet *msg, + struct netchannel2_msg_hdr *hdr, + unsigned nr_frags, + unsigned frags_off) +{ + struct sk_buff *skb; + struct rx_map_fragment *rx_frag; + unsigned x; + unsigned len; + struct rx_map_packet *rmp; + unsigned idx; + struct rx_plan plan; + unsigned prefix_size; + + memset(&plan, 0, sizeof(plan)); + + rmp = alloc_rx_packet(nc, nr_frags); + if (rmp == NULL) + return NULL; + + if (msg->prefix_size < SKB_MIN_PAYLOAD_SIZE) + prefix_size = SKB_MIN_PAYLOAD_SIZE; + else + prefix_size = msg->prefix_size; + /* As in posted_buffers.c, we don''t limit the total size of + the packet, because we don''t need to allocate more memory + for very large packets. The prefix is safe because it''s + only a 16 bit number. A 64k allocation won''t always + succeed, but it''s unlikely to trigger the OOM killer or + otherwise interfere with the normal operation of the local + domain. 
*/ + skb = dev_alloc_skb(prefix_size + NET_IP_ALIGN); + if (skb == NULL) { + spin_lock(&receive_mapper->rm_lock); + list_splice(&rmp->frags, &receive_mapper->free_frags); + list_add(&rmp->list, &receive_mapper->free_packets); + receive_mapper->nr_free_packets++; + spin_unlock(&receive_mapper->rm_lock); + return NULL; + } + skb_reserve(skb, NET_IP_ALIGN); + + rmp->id = msg->id; + rmp->flags = msg->flags; + + rx_frag = list_entry(rmp->frags.next, struct rx_map_fragment, list); + for (x = 0; x < nr_frags; x++) { + fetch_fragment(&nc->rings, x, &rx_frag->nc_frag, frags_off); + if (rx_frag->nc_frag.size > PAGE_SIZE || + rx_frag->nc_frag.off >= PAGE_SIZE || + rx_frag->nc_frag.size + rx_frag->nc_frag.off > PAGE_SIZE) { + plan.is_failed = 1; + break; + } + map_fragment(&plan, rx_frag, nc); + rx_frag = list_entry(rx_frag->list.next, + struct rx_map_fragment, + list); + } + + flush_grant_operations(&plan); + if (plan.is_failed) + goto fail_and_unmap; + + /* Grab the prefix off of the ring. */ + nc2_copy_from_ring_off(&nc->rings.cons_ring, + skb_put(skb, msg->prefix_size), + msg->prefix_size, + frags_off + + nr_frags * sizeof(struct netchannel2_fragment)); + + /* All fragments mapped, so we know that this is going to + work. Transfer the receive slots into the SKB. */ + len = 0; + list_for_each_entry(rx_frag, &rmp->frags, list) { + attach_frag_to_skb(skb, rx_frag); + idx = fragment_idx(rx_frag); + start_tracking_page(receive_mapper->tracker, + receive_mapper->pages[idx], + nc->rings.otherend_id, + rx_frag->nc_frag.receiver_map.gref, + idx, + nc); + len += rx_frag->nc_frag.size; + } + + skb->len += len; + skb->data_len += len; + skb->truesize += len; + + spin_lock(&receive_mapper->rm_lock); + list_add_tail(&rmp->list, &receive_mapper->active_packets); + rmp->expires = jiffies + PACKET_TIMEOUT; + if (rmp == list_entry(receive_mapper->active_packets.next, + struct rx_map_packet, + list)) + mod_timer(&receive_mapper->expire_timer, rmp->expires); + spin_unlock(&receive_mapper->rm_lock); + + if (skb_headlen(skb) < SKB_MIN_PAYLOAD_SIZE) + pull_through(skb, + SKB_MIN_PAYLOAD_SIZE - skb_headlen(skb)); + + return skb; + +fail_and_unmap: + pr_debug("Failed to map received packet!\n"); + unmap_partial_packet(rmp); + + spin_lock(&receive_mapper->rm_lock); + list_splice(&rmp->frags, &receive_mapper->free_frags); + list_add_tail(&rmp->list, &receive_mapper->free_packets); + receive_mapper->nr_free_packets++; + spin_unlock(&receive_mapper->rm_lock); + + kfree_skb(skb); + return NULL; +} + +static void suspend_receive_mapper(struct receive_mapper *rm) +{ + spin_lock_bh(&rm->rm_lock); + /* Stop any more packets coming in. */ + rm->suspending = 1; + + /* Wait for Linux to give back all of the SKBs which we''ve + given it. 
*/ + while (rm->nr_free_packets != MAX_MAPPED_PACKETS) { + spin_unlock_bh(&rm->rm_lock); + unmap_all_packets(); + msleep(100); + spin_lock_bh(&rm->rm_lock); + } + spin_unlock_bh(&rm->rm_lock); +} + +static void resume_receive_mapper(void) +{ + spin_lock_bh(&receive_mapper->rm_lock); + receive_mapper->suspending = 0; + spin_unlock_bh(&receive_mapper->rm_lock); +} + + +int init_receive_map_mode(void) +{ + struct receive_mapper *new_rm; + spin_lock(&global_map_lock); + while (receive_mapper == NULL) { + spin_unlock(&global_map_lock); + new_rm = new_receive_mapper(); + if (new_rm == NULL) + return -ENOMEM; + spin_lock(&global_map_lock); + if (receive_mapper == NULL) { + receive_mapper = new_rm; + } else { + spin_unlock(&global_map_lock); + free_receive_mapper(new_rm); + spin_lock(&global_map_lock); + } + } + spin_unlock(&global_map_lock); + return 0; +} + +void deinit_receive_map_mode(void) +{ + if (!receive_mapper) + return; + BUG_ON(spin_is_locked(&global_map_lock)); + free_receive_mapper(receive_mapper); + receive_mapper = NULL; +} + +void suspend_receive_map_mode(void) +{ + if (!receive_mapper) + return; + suspend_receive_mapper(receive_mapper); +} + +void resume_receive_map_mode(void) +{ + if (!receive_mapper) + return; + resume_receive_mapper(); +} + +struct netchannel2 *nc2_get_interface_for_page(struct page *p) +{ + BUG_ON(!page_is_tracked(p)); + if (!receive_mapper || + tracker_for_page(p) != receive_mapper->tracker) + return NULL; + return get_page_tracker_ctxt(p); +} diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 80c5d5d..8c38788 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -112,6 +112,28 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, nr_frags, frags_off); queue_finish_packet_message(ncrp, msg.id, msg.flags); break; + case NC2_PACKET_TYPE_receiver_map: + if (!nc->local_trusted) { + /* The remote doesn''t trust us, so they + shouldn''t be sending us receiver-map + packets. Just treat it as an RSCB + packet. */ + skb = NULL; + } else { + skb = handle_receiver_map_packet(nc, &msg, hdr, + nr_frags, + frags_off); + /* Finish message will be sent when we unmap + * the packet. */ + } + if (skb == NULL) { + /* We can''t currently map this skb. Use a + receiver copy instead. 
*/ + skb = handle_receiver_copy_packet(nc, ncrp, &msg, hdr, + nr_frags, frags_off); + queue_finish_packet_message(ncrp, msg.id, msg.flags); + } + break; default: pr_debug("Unknown packet type %d\n", msg.type); nc->stats.rx_errors++; @@ -285,4 +307,5 @@ int __init nc2_init(void) void __exit nc2_exit(void) { + deinit_receive_map_mode(); } diff --git a/drivers/xen/netchannel2/rscb.c b/drivers/xen/netchannel2/rscb.c index 8ad5454..cdcb116 100644 --- a/drivers/xen/netchannel2/rscb.c +++ b/drivers/xen/netchannel2/rscb.c @@ -209,6 +209,7 @@ struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, struct grant_packet_plan { volatile struct netchannel2_fragment *out_fragment; grant_ref_t gref_pool; + int use_subpage_grants; unsigned prefix_avail; }; @@ -223,14 +224,15 @@ static inline int nfrags_skb(struct sk_buff *skb, int prefix_size) start_grant = ((unsigned long)skb->data + prefix_size) & ~(PAGE_SIZE-1); end_grant = ((unsigned long)skb->data + - skb_headlen(skb) + PAGE_SIZE - 1) & + skb_headlen(skb) + PAGE_SIZE - 1) & ~(PAGE_SIZE-1); return ((end_grant - start_grant) >> PAGE_SHIFT) + skb_shinfo(skb)->nr_frags; } enum prepare_xmit_result prepare_xmit_allocate_grant(struct netchannel2_ring_pair *ncrp, - struct sk_buff *skb) + struct sk_buff *skb, + int use_subpage_grants) { struct skb_cb_overlay *skb_co = get_skb_overlay(skb); unsigned nr_fragments; @@ -241,13 +243,23 @@ enum prepare_xmit_result prepare_xmit_allocate_grant(struct netchannel2_ring_pai if (allocate_txp_slot(ncrp, skb) < 0) return PREP_XMIT_BUSY; - /* We''re going to have to get the remote to issue a grant copy - hypercall anyway, so there''s no real benefit to shoving the - headers inline. */ - /* (very small packets won''t go through here, so there''s no - chance that we could completely eliminate the grant - copy.) */ - inline_prefix_size = sizeof(struct ethhdr); + if (use_subpage_grants) { + /* We''re going to have to get the remote to issue a + grant copy hypercall anyway, so there''s no real + benefit to shoving the headers inline. */ + /* (very small packets won''t go through here, so + there''s no chance that we could completely + eliminate the grant copy.) */ + inline_prefix_size = sizeof(struct ethhdr); + } else { + /* If we''re going off-box (and we probably are, if the + remote is trusted), putting the header in the ring + potentially saves a TLB miss in the bridge, which + is worth doing. */ + inline_prefix_size = PACKET_PREFIX_SIZE; + if (skb_headlen(skb) < inline_prefix_size) + inline_prefix_size = skb_headlen(skb); + } if (skb_co->nr_fragments == 0) { nr_fragments = nfrags_skb(skb, inline_prefix_size); @@ -277,10 +289,14 @@ enum prepare_xmit_result prepare_xmit_allocate_grant(struct netchannel2_ring_pai have to recompute it next time around. 
*/ return PREP_XMIT_BUSY; } + skb_co->gref_pool = gref_pool; skb_co->inline_prefix_size = inline_prefix_size; - skb_co->type = NC2_PACKET_TYPE_receiver_copy; + if (use_subpage_grants) + skb_co->type = NC2_PACKET_TYPE_receiver_copy; + else + skb_co->type = NC2_PACKET_TYPE_receiver_map; return PREP_XMIT_OKAY; } @@ -318,15 +334,19 @@ static void prepare_subpage_grant(struct netchannel2_ring_pair *ncrp, GTF_readonly, trans_domid, trans_gref); - } else { + } else if (plan->use_subpage_grants) { gnttab_grant_foreign_access_ref_subpage(gref, ncrp->otherend_id, virt_to_mfn(page_address(page)), GTF_readonly, off_in_page, size); + } else { + gnttab_grant_foreign_access_ref(gref, + ncrp->otherend_id, + virt_to_mfn(page_address(page)), + GTF_readonly); } - frag->off = off_in_page; frag->size = size; plan->out_fragment++; @@ -356,6 +376,7 @@ static int grant_data_area(struct netchannel2_ring_pair *ncrp, void xmit_grant(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb, + int use_subpage_grants, volatile void *msg_buf) { volatile struct netchannel2_msg_packet *msg = msg_buf; @@ -366,6 +387,7 @@ void xmit_grant(struct netchannel2_ring_pair *ncrp, skb_frag_t *frag; memset(&plan, 0, sizeof(plan)); + plan.use_subpage_grants = use_subpage_grants; plan.prefix_avail = skb_co->inline_prefix_size; plan.out_fragment = msg->frags; plan.gref_pool = skb_co->gref_pool; diff --git a/drivers/xen/netchannel2/util.c b/drivers/xen/netchannel2/util.c index 302dfc1..79d9f09 100644 --- a/drivers/xen/netchannel2/util.c +++ b/drivers/xen/netchannel2/util.c @@ -94,6 +94,20 @@ void release_tx_packet(struct netchannel2_ring_pair *ncrp, } gnttab_release_grant_reference(&ncrp->gref_pool, gref); } + } else if (skb_co->type == NC2_PACKET_TYPE_receiver_map) { + while (1) { + r = gnttab_claim_grant_reference(&skb_co->gref_pool); + if (r == -ENOSPC) + break; + gref = (grant_ref_t)r; + r = gnttab_end_foreign_access_ref(gref); + if (r == 0) { + printk(KERN_WARNING "Failed to end remote access to packet memory.\n"); + } else { + gnttab_release_grant_reference(&ncrp->gref_pool, + gref); + } + } } else if (skb_co->gref_pool != 0) { gnttab_subfree_grant_references(skb_co->gref_pool, &ncrp->gref_pool); diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c index 7eb845d..d95ad09 100644 --- a/drivers/xen/netchannel2/xmit_packet.c +++ b/drivers/xen/netchannel2/xmit_packet.c @@ -13,6 +13,8 @@ static enum transmit_policy transmit_policy(struct netchannel2 *nc, { if (skb->len <= PACKET_PREFIX_SIZE && !skb_is_nonlinear(skb)) return transmit_policy_small; + else if (nc->remote_trusted) + return transmit_policy_map; else return transmit_policy_grant; } @@ -72,7 +74,10 @@ enum prepare_xmit_result prepare_xmit_allocate_resources(struct netchannel2 *nc, r = prepare_xmit_allocate_small(&nc->rings, skb); break; case transmit_policy_grant: - r = prepare_xmit_allocate_grant(&nc->rings, skb); + r = prepare_xmit_allocate_grant(&nc->rings, skb, 1); + break; + case transmit_policy_map: + r = prepare_xmit_allocate_grant(&nc->rings, skb, 0); break; default: BUG(); @@ -170,7 +175,10 @@ int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp, /* Nothing to do */ break; case transmit_policy_grant: - xmit_grant(ncrp, skb, msg); + xmit_grant(ncrp, skb, 1, msg); + break; + case transmit_policy_map: + xmit_grant(ncrp, skb, 0, msg); break; default: BUG(); diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h index 1cca607..f264995 100644 --- a/include/xen/interface/io/netchannel2.h +++ 
b/include/xen/interface/io/netchannel2.h @@ -46,6 +46,9 @@ struct netchannel2_fragment { struct { grant_ref_t gref; } receiver_copy; + struct { + grant_ref_t gref; + } receiver_map; }; }; struct netchannel2_msg_packet { @@ -98,6 +101,22 @@ struct netchannel2_msg_packet { * Due to backend bugs, it is not safe to use this * packet type except on bypass rings. * + * receiver_map -- The transmitting domain has granted the receiving + * domain access to the original RX buffers using + * full (mappable) grant references. This can be + * treated the same way as receiver_copy, but the + * receiving domain also has the option of mapping + * the fragments, rather than copying them. If it + * decides to do so, it should ensure that the fragments + * will be unmapped in a reasonably timely fashion, + * and don''t e.g. become stuck in a receive buffer + * somewhere. In general, anything longer than about + * a second is likely to cause problems. Once all + * grant references have been unmapped, the receiving + * domain should send a FINISH message. + * + * This packet type may not be used on bypass rings. + * * small -- The packet does not have any fragment descriptors * (i.e. the entire thing is inline in the ring). The receiving * domain should simply copy the packet out of the ring * that it is correct to treat receiver_map and small packets as * receiver_copy ones. */ #define NC2_PACKET_TYPE_receiver_copy 1 +#define NC2_PACKET_TYPE_receiver_map 3 #define NC2_PACKET_TYPE_small 4 #define NC2_PACKET_SEGMENTATION_TYPE_none 0 -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
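For readers following the receiver_map changes above, the transmit-side policy choice is worth seeing in isolation. The stand-alone C sketch below mirrors the three-way decision this patch adds to xmit_packet.c: small linear packets go inline in the ring, a trusted peer gets full (mappable) grants, and an untrusted peer gets subpage read-only grants. It is a user-space model under stated assumptions, not driver code: the value of PACKET_PREFIX_SIZE is a placeholder, and struct fake_packet / struct fake_chan are illustrative stand-ins for sk_buff and struct netchannel2.

/* Stand-alone model of the transmit_policy() choice from xmit_packet.c.
   Build with: cc -o policy policy.c */
#include <stdio.h>

#define PACKET_PREFIX_SIZE 64	/* placeholder; the driver defines the real value */

enum transmit_policy {
	transmit_policy_small,	/* whole packet inline in the ring */
	transmit_policy_map,	/* full (mappable) grants; trusted peers only */
	transmit_policy_grant,	/* subpage, read-only grants */
};

struct fake_packet {		/* illustrative stand-in for sk_buff */
	unsigned int len;
	int nonlinear;
};

struct fake_chan {		/* illustrative stand-in for struct netchannel2 */
	int remote_trusted;
};

static enum transmit_policy transmit_policy(const struct fake_chan *nc,
					    const struct fake_packet *pkt)
{
	if (pkt->len <= PACKET_PREFIX_SIZE && !pkt->nonlinear)
		return transmit_policy_small;
	else if (nc->remote_trusted)
		return transmit_policy_map;
	else
		return transmit_policy_grant;
}

int main(void)
{
	const struct fake_chan trusted = { 1 }, untrusted = { 0 };
	const struct fake_packet tiny = { 60, 0 }, big = { 1500, 1 };

	printf("tiny packet, trusted peer:  %d\n", transmit_policy(&trusted, &tiny));
	printf("big packet, trusted peer:   %d\n", transmit_policy(&trusted, &big));
	printf("big packet, untrusted peer: %d\n", transmit_policy(&untrusted, &big));
	return 0;
}

The middle branch is the point of the patch: a peer trusted to unmap promptly can be handed full grant references (NC2_PACKET_TYPE_receiver_map) and left to choose between mapping and copying, while an untrusted peer only ever sees subpage grants.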
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 16/22] Bypass support, for both frontend and backend.
A bypass is an auxiliary ring attached to a netchannel2 interface which is used to communicate with a particular remote guest, completely bypassing the bridge in dom0. This is quite a bit faster, and can also help to prevent dom0 from becoming a bottleneck on large systems. Bypasses are inherently incompatible with packet filtering in domain 0. This is a moderately unusual configuration (there''ll usually be a firewall protecting the dom0 host stack, but bridge filtering is less common), and we rely on the user turning off bypasses if they''re doing it. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/Kconfig | 19 + drivers/xen/netchannel2/Makefile | 8 + drivers/xen/netchannel2/bypass.c | 794 ++++++++++++++++++++++++++ drivers/xen/netchannel2/bypassee.c | 737 ++++++++++++++++++++++++ drivers/xen/netchannel2/chan.c | 137 ++++- drivers/xen/netchannel2/netback2.c | 128 +++++ drivers/xen/netchannel2/netchannel2_core.h | 278 +++++++++- drivers/xen/netchannel2/netchannel2_uspace.h | 17 + drivers/xen/netchannel2/netfront2.c | 25 + drivers/xen/netchannel2/recv_packet.c | 9 + drivers/xen/netchannel2/xmit_packet.c | 17 +- include/xen/interface/io/netchannel2.h | 138 +++++ 12 files changed, 2291 insertions(+), 16 deletions(-) create mode 100644 drivers/xen/netchannel2/bypass.c create mode 100644 drivers/xen/netchannel2/bypassee.c create mode 100644 drivers/xen/netchannel2/netchannel2_uspace.h diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index a081b73..d4265f6 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -234,6 +234,25 @@ config XEN_NETDEV2_FRONTEND depends on XEN_NETCHANNEL2 default y +config XEN_NETDEV2_BYPASSABLE + bool "Net channel 2 bypassee support" + depends on XEN_NETDEV2_BACKEND + default y + help + This option allows net channel 2 endpoints in this domain to + be bypassed. If this domain is acting as a bridge between + domains on a single host, bypass support will allow faster + inter-domain communication and reduce load in this domain. + +config XEN_NETDEV2_BYPASS_ENDPOINT + bool "Net channel 2 bypass endpoint support" + depends on XEN_NETDEV2_BACKEND && XEN_NETDEV2_FRONTEND + default y + help + Support for acting as the endpoint of a netchannel2 bypass. + Bypasses allow faster inter-domain communication, provided + every VM supports them. + config XEN_GRANT_DEV tristate "User-space granted page access driver" default XEN_PRIVILEGED_GUEST diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index d6fb796..5aa3410 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -11,3 +11,11 @@ endif ifeq ($(CONFIG_XEN_NETDEV2_FRONTEND),y) netchannel2-objs += netfront2.o endif + +ifeq ($(CONFIG_XEN_NETDEV2_BYPASSABLE),y) +netchannel2-objs += bypassee.o +endif + +ifeq ($(CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT),y) +netchannel2-objs += bypass.o +endif diff --git a/drivers/xen/netchannel2/bypass.c b/drivers/xen/netchannel2/bypass.c new file mode 100644 index 0000000..c907b48 --- /dev/null +++ b/drivers/xen/netchannel2/bypass.c @@ -0,0 +1,794 @@ +#include <linux/kernel.h> +#include <linux/kthread.h> +#include <linux/interrupt.h> +#include <linux/delay.h> +#include <xen/evtchn.h> +#include <xen/driver_util.h> +#include "netchannel2_core.h" + +/* Can we send this packet on this bypass? True if the destination + MAC address matches. 
*/ +static int can_bypass_packet(struct nc2_alternate_ring *ncr, + struct sk_buff *skb) +{ + struct ethhdr *eh; + + if (skb_headlen(skb) < sizeof(*eh)) + return 0; + eh = (struct ethhdr *)skb->data; + if (memcmp(eh->h_dest, ncr->rings.remote_mac, ETH_ALEN)) + return 0; + else + return 1; +} + +/* Called from the netdev start_xmit method. We''re holding the master + nc ring lock, but not the bypass ring lock. */ +int bypass_xmit_packet(struct netchannel2 *nc, + struct nc2_alternate_ring *ncr, + struct sk_buff *skb) +{ + struct netchannel2_ring_pair *rings = &ncr->rings; + struct skb_cb_overlay *skb_co = get_skb_overlay(skb); + size_t msg_size; + enum transmit_policy policy; + int r; + + if (!can_bypass_packet(ncr, skb)) + return 0; + + spin_lock(&rings->lock); + if (ncr->state != nc2_alt_ring_ready) { + spin_unlock(&rings->lock); + return 0; + } + /* We''re now committed to either transmitting this packet on + this ring or dropping it outright. */ + if (skb->len <= PACKET_PREFIX_SIZE && !skb_is_nonlinear(skb)) { + r = prepare_xmit_allocate_small(rings, skb); + policy = transmit_policy_small; + } else { + r = prepare_xmit_allocate_grant(rings, skb, 1); + policy = transmit_policy_grant; + } + if (r < 0) { + spin_unlock(&rings->lock); + dev_kfree_skb(skb); + return 1; + } + + skb_co->policy = policy; + msg_size = get_transmitted_packet_msg_size(skb); + if (!nc2_reserve_payload_bytes(&rings->prod_ring, msg_size)) { + /* Uh oh. */ + release_tx_packet(rings, skb); + spin_unlock(&rings->lock); + return 1; + } + + queue_packet_to_interface(skb, rings); + + spin_unlock(&rings->lock); + + return 1; +} + +void nc2_aux_ring_start_disable_sequence(struct nc2_alternate_ring *nar) +{ + spin_lock(&nar->rings.lock); + if (nar->state < nc2_alt_ring_disabling) { + nar->state = nc2_alt_ring_disabling; + nc2_kick(&nar->rings); + } + spin_unlock(&nar->rings.lock); +} + +static void start_detach_worker(struct work_struct *ws) +{ + struct nc2_alternate_ring *ncr = + container_of(ws, struct nc2_alternate_ring, detach_work_item); + + /* Detach from the ring. Note that it may still be running at + this point. In that case, we need to stop it and then go + and discard any outstanding messages on it. */ + + /* Stop the IRQ and change state. This will prevent us from + being added to the schedule list again, but we may still be + on it for other reasons, so we need to get back into the + worker thread to finish up. */ + + /* We defer actually unmapping the rings to + nc2_advertise_rings(), since that''s on the worker thread + and we therefore know we''re not going to race anything + doing it there. */ + + if (ncr->rings.irq >= 0) + unbind_from_irqhandler(ncr->rings.irq, &ncr->rings); + ncr->rings.irq = -1; + + spin_lock_bh(&ncr->rings.lock); + ncr->state = nc2_alt_ring_detached_pending; + ncr->rings.interface->need_aux_ring_state_machine = 1; + nc2_kick(&ncr->rings.interface->rings); + spin_unlock_bh(&ncr->rings.lock); +} + +void nc2_aux_ring_start_detach_sequence(struct nc2_alternate_ring *nar) +{ + spin_lock(&nar->rings.lock); + if (nar->state >= nc2_alt_ring_detaching) { + spin_unlock(&nar->rings.lock); + return; + } + nar->state = nc2_alt_ring_detaching; + spin_unlock(&nar->rings.lock); + + /* We can''t do unbind_from_irqhandler() from a tasklet, so + punt it to a workitem. */ + INIT_WORK(&nar->detach_work_item, + start_detach_worker); + schedule_work(&nar->detach_work_item); +}
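As an aside, the destination-MAC comparison in can_bypass_packet() above is the whole of the routing decision for a bypass: a packet may use the auxiliary ring only if its Ethernet destination matches the remote_mac recorded for the bypass peer. Here is a minimal user-space sketch of the same check, assuming the usual 14-byte Ethernet header; struct fake_ethhdr copies the layout of struct ethhdr, and the addresses in main() are arbitrary examples.

/* Stand-alone model of the MAC filter in can_bypass_packet(). */
#include <stdio.h>
#include <string.h>

#define ETH_ALEN 6

struct fake_ethhdr {			/* same layout as struct ethhdr */
	unsigned char h_dest[ETH_ALEN];
	unsigned char h_source[ETH_ALEN];
	unsigned short h_proto;
};

/* Return 1 if the frame is addressed to the bypass peer, 0 otherwise;
   runt frames too short to hold an Ethernet header never match. */
static int can_bypass_frame(const unsigned char *data, size_t len,
			    const unsigned char *peer_mac)
{
	struct fake_ethhdr eh;

	if (len < sizeof(eh))
		return 0;
	memcpy(&eh, data, sizeof(eh));	/* avoid alignment assumptions */
	return memcmp(eh.h_dest, peer_mac, ETH_ALEN) == 0;
}

int main(void)
{
	unsigned char peer[ETH_ALEN] = { 0x00, 0x16, 0x3e, 0x01, 0x02, 0x03 };
	unsigned char frame[60] = { 0x00, 0x16, 0x3e, 0x01, 0x02, 0x03 };

	printf("addressed to peer: %d\n",
	       can_bypass_frame(frame, sizeof(frame), peer));
	frame[5] = 0xff;		/* different destination: no match */
	printf("addressed to peer: %d\n",
	       can_bypass_frame(frame, sizeof(frame), peer));
	return 0;
}

A frame which fails this check makes bypass_xmit_packet() return 0, and the caller falls back to the ordinary path through the bridge in dom0.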
+ +/* Crank through the auxiliary ring state machine. Called holding the + * master ring lock. */ +void _nc2_crank_aux_ring_state_machine(struct netchannel2 *nc) +{ + struct nc2_alternate_ring *nar; + struct nc2_alternate_ring *next_nar; + struct netchannel2_msg_bypass_disabled disabled_msg; + struct netchannel2_msg_bypass_detached detached_msg; + struct netchannel2_msg_bypass_frontend_ready frontend_ready_msg; + + memset(&disabled_msg, 0, sizeof(disabled_msg)); + memset(&detached_msg, 0, sizeof(detached_msg)); + memset(&frontend_ready_msg, 0, sizeof(frontend_ready_msg)); + + if (nc->pending_bypass_error) { + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, + sizeof(frontend_ready_msg))) + return; + frontend_ready_msg.port = -1; + nc2_send_message(&nc->rings.prod_ring, + NETCHANNEL2_MSG_BYPASS_FRONTEND_READY, + 0, + &frontend_ready_msg, + sizeof(frontend_ready_msg)); + nc->rings.pending_time_sensitive_messages = 1; + nc->pending_bypass_error = 0; + } + + list_for_each_entry_safe(nar, next_nar, &nc->alternate_rings, + rings_by_interface) { + + spin_lock(&nar->rings.lock); + if (nar->state == nc2_alt_ring_frontend_send_ready_pending) { + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, + sizeof(frontend_ready_msg))) { + spin_unlock(&nar->rings.lock); + return; + } + frontend_ready_msg.port = + irq_to_evtchn_port(nar->rings.irq); + nc2_send_message(&nc->rings.prod_ring, + NETCHANNEL2_MSG_BYPASS_FRONTEND_READY, + 0, + &frontend_ready_msg, + sizeof(frontend_ready_msg)); + nar->state = nc2_alt_ring_frontend_sent_ready; + nc->rings.pending_time_sensitive_messages = 1; + } + if (nar->state == nc2_alt_ring_disabled_pending) { + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, + sizeof(disabled_msg))) { + spin_unlock(&nar->rings.lock); + return; + } + disabled_msg.handle = nar->handle; + nc2_send_message(&nc->rings.prod_ring, + NETCHANNEL2_MSG_BYPASS_DISABLED, + 0, + &disabled_msg, + sizeof(disabled_msg)); + nar->state = nc2_alt_ring_disabled; + nc->rings.pending_time_sensitive_messages = 1; + } + if (nar->state == nc2_alt_ring_detached_pending) { + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, + sizeof(detached_msg))) { + spin_unlock(&nar->rings.lock); + return; + } + + /* If we get here then we know that nobody + else is going to touch the ring, because + that''s what detached_pending means. 
*/ + /* Deferred from start_detach_worker() */ + nc2_unmap_grants(&nar->prod_mapper); + nc2_unmap_grants(&nar->cons_mapper); + nc2_unmap_grants(&nar->control_mapper); + + detached_msg.handle = nar->handle; + nc2_send_message(&nc->rings.prod_ring, + NETCHANNEL2_MSG_BYPASS_DETACHED, + 0, + &detached_msg, + sizeof(detached_msg)); + nc->rings.pending_time_sensitive_messages = 1; + + list_del(&nar->rings_by_interface); + + spin_unlock(&nar->rings.lock); + + kfree(nar); + } else { + spin_unlock(&nar->rings.lock); + } + } + nc->need_aux_ring_state_machine = 0; +} + +static int map_rings_common(struct nc2_alternate_ring *ncr, + struct netchannel2_msg_bypass_common *msg) +{ + int err; + + if (msg->ring_domid == DOMID_SELF) + msg->ring_domid = ncr->rings.interface->rings.otherend_id; + + err = nc2_map_grants(&ncr->prod_mapper, + ncr->prod_grefs, + msg->ring_pages, + msg->ring_domid); + if (err < 0) { + printk(KERN_ERR "%d mapping producer ring", err); + return err; + } + + err = nc2_map_grants(&ncr->cons_mapper, + ncr->cons_grefs, + msg->ring_pages, + msg->ring_domid); + if (err < 0) { + printk(KERN_ERR "%d mapping consumer ring", err); + return err; + } + + err = nc2_map_grants(&ncr->control_mapper, + &msg->control_gref, + 1, + msg->ring_domid); + if (err < 0) + printk(KERN_ERR "%d mapping control ring", err); + return err; +} + +static int map_rings_frontend(struct nc2_alternate_ring *ncr) +{ + struct netchannel2_frontend_shared *nfs; + struct netchannel2_sring_prod *prod_sring; + struct netchannel2_sring_cons *cons_sring; + int err; + + err = map_rings_common(ncr, &ncr->frontend_setup_msg.common); + if (err < 0) + return err; + + nfs = ncr->control_mapper.mapping->addr; + cons_sring = &nfs->cons; + prod_sring = &nfs->prod; + _nc2_attach_rings(&ncr->rings, + cons_sring, + ncr->cons_mapper.mapping->addr, + ncr->frontend_setup_msg.common.ring_pages * PAGE_SIZE, + prod_sring, + ncr->prod_mapper.mapping->addr, + ncr->frontend_setup_msg.common.ring_pages * PAGE_SIZE, + ncr->frontend_setup_msg.common.peer_domid); + + return 0; +} + +static int map_rings_backend(struct nc2_alternate_ring *ncr) +{ + struct netchannel2_backend_shared *nbs; + struct netchannel2_sring_prod *prod_sring; + struct netchannel2_sring_cons *cons_sring; + int err; + + err = map_rings_common(ncr, &ncr->backend_setup_msg.common); + if (err < 0) + return err; + + nbs = ncr->control_mapper.mapping->addr; + cons_sring = &nbs->cons; + prod_sring = &nbs->prod; + _nc2_attach_rings(&ncr->rings, + cons_sring, + ncr->cons_mapper.mapping->addr, + ncr->backend_setup_msg.common.ring_pages * PAGE_SIZE, + prod_sring, + ncr->prod_mapper.mapping->addr, + ncr->backend_setup_msg.common.ring_pages * PAGE_SIZE, + ncr->backend_setup_msg.common.peer_domid); + + return 0; +} + +static void send_ready_message(struct nc2_alternate_ring *ncr) +{ + struct netchannel2_msg_bypass_ready msg; + + memset(&msg, 0, sizeof(msg)); + if (nc2_can_send_payload_bytes(&ncr->rings.prod_ring, sizeof(msg))) { + nc2_send_message(&ncr->rings.prod_ring, + NETCHANNEL2_MSG_BYPASS_READY, + 0, &msg, sizeof(msg)); + if (nc2_flush_ring(&ncr->rings.prod_ring)) + notify_remote_via_irq(ncr->rings.irq); + } else { + /* This shouldn''t happen, because the producer ring + should be essentially empty at this stage. If it + does, it probably means the other end is playing + silly buggers with the ring indexes. Drop the + message. 
*/ + printk(KERN_WARNING "Failed to send bypass ring ready message.\n"); + } +} + +void nc2_handle_bypass_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct nc2_alternate_ring *ncr; + + if (ncrp == &nc->rings) { + pr_debug("bypass ready on principal interface?\n"); + return; + } + ncr = container_of(ncrp, struct nc2_alternate_ring, rings); + /* We''re now allowed to start sending packets over this + * ring. */ + if (ncr->state == nc2_alt_ring_frontend_sent_ready) + ncr->state = nc2_alt_ring_ready; +} + +/* Called holding the aux ring lock. */ +void _nc2_alternate_ring_disable_finish(struct nc2_alternate_ring *ncr) +{ + /* No more packets will ever come out of this ring -> it is + now disabled. */ + ncr->state = nc2_alt_ring_disabled_pending; + ncr->rings.interface->need_aux_ring_state_machine = 1; + nc2_kick(&ncr->rings.interface->rings); +} + +static void initialise_bypass_frontend_work_item(struct work_struct *ws) +{ + struct nc2_alternate_ring *ncr = + container_of(ws, struct nc2_alternate_ring, work_item); + struct netchannel2 *interface = ncr->rings.interface; + int err; + + memcpy(&ncr->rings.remote_mac, + ncr->frontend_setup_msg.common.remote_mac, 6); + err = map_rings_frontend(ncr); + if (err < 0) + goto err; + + BUG_ON(ncr->rings.cons_ring.sring == NULL); + + err = bind_listening_port_to_irqhandler(ncr->rings.otherend_id, + nc2_int, + 0, + "netchannel2_bypass", + &ncr->rings); + if (err < 0) + goto err; + ncr->rings.irq = err; + + /* Get it going. */ + nc2_kick(&ncr->rings); + + /* And get the master ring to send a FRONTEND_READY message */ + ncr->state = nc2_alt_ring_frontend_send_ready_pending; + spin_lock_bh(&interface->rings.lock); + interface->need_aux_ring_state_machine = 1; + nc2_kick(&interface->rings); + spin_unlock_bh(&interface->rings.lock); + + return; + +err: + printk(KERN_ERR "Error %d setting up bypass ring!\n", err); + + spin_lock_bh(&interface->rings.lock); + interface->pending_bypass_error = 1; + interface->need_aux_ring_state_machine = 1; + nc2_kick(&interface->rings); + list_del(&ncr->rings_by_interface); + spin_unlock_bh(&interface->rings.lock); + + nc2_unmap_grants(&ncr->prod_mapper); + nc2_unmap_grants(&ncr->cons_mapper); + nc2_unmap_grants(&ncr->control_mapper); + kfree(ncr); + return; +} + +static void initialise_bypass_backend_work_item(struct work_struct *ws) +{ + struct nc2_alternate_ring *ncr = + container_of(ws, struct nc2_alternate_ring, work_item); + struct netchannel2 *interface = ncr->rings.interface; + int err; + + memcpy(&ncr->rings.remote_mac, + ncr->backend_setup_msg.common.remote_mac, 6); + err = map_rings_backend(ncr); + if (err < 0) + goto err; + + err = bind_interdomain_evtchn_to_irqhandler(ncr->rings.otherend_id, + ncr->backend_setup_msg.port, + nc2_int, + 0, + "netchannel2_bypass", + &ncr->rings); + if (err < 0) + goto err; + ncr->rings.irq = err; + + send_ready_message(ncr); + + spin_lock_bh(&ncr->rings.lock); + ncr->state = nc2_alt_ring_ready; + spin_unlock_bh(&ncr->rings.lock); + + nc2_kick(&ncr->rings); + + return; + +err: + printk(KERN_ERR "Error %d setting up bypass ring!\n", err); + + spin_lock_bh(&interface->rings.lock); + list_del(&ncr->rings_by_interface); + spin_unlock_bh(&interface->rings.lock); + + nc2_unmap_grants(&ncr->prod_mapper); + nc2_unmap_grants(&ncr->cons_mapper); + nc2_unmap_grants(&ncr->control_mapper); + kfree(ncr); + return; +} + +void nc2_handle_bypass_frontend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr 
*hdr) +{ + struct nc2_alternate_ring *work; + + if (hdr->size < sizeof(work->frontend_setup_msg)) { + pr_debug("Bypass message had strange size %d\n", hdr->size); + return; + } + if (ncrp != &nc->rings) { + pr_debug("Bypass message on ancillary ring!\n"); + return; + } + if (!nc->remote_trusted) { + pr_debug("Untrusted domain tried to set up a bypass.\n"); + return; + } + if (nc->pending_bypass_error) { + pr_debug("Remote tried to establish a bypass when we already had a pending error\n"); + return; + } + work = kzalloc(sizeof(*work), GFP_ATOMIC); + if (!work) { + printk(KERN_WARNING "no memory for alternative ring pair!\n"); + nc->pending_bypass_error = 1; + nc->need_aux_ring_state_machine = 1; + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &work->frontend_setup_msg, + sizeof(work->frontend_setup_msg)); + if (hdr->size != sizeof(work->frontend_setup_msg) + + sizeof(uint32_t) * 2 * + work->frontend_setup_msg.common.ring_pages) { + printk(KERN_WARNING "inconsistent bypass message size (%d for %d pages)\n", + hdr->size, work->frontend_setup_msg.common.ring_pages); + goto err; + } + if (work->frontend_setup_msg.common.ring_pages > + MAX_BYPASS_RING_PAGES_MAPPABLE) { + printk(KERN_WARNING "too many ring pages: %d > %d\n", + work->frontend_setup_msg.common.ring_pages, + MAX_BYPASS_RING_PAGES_MAPPABLE); +err: + kfree(work); + nc->pending_bypass_error = 1; + nc->need_aux_ring_state_machine = 1; + return; + } + nc2_copy_from_ring_off(&ncrp->cons_ring, + &work->prod_grefs, + sizeof(uint32_t) * + work->frontend_setup_msg.common.ring_pages, + sizeof(work->frontend_setup_msg)); + nc2_copy_from_ring_off(&ncrp->cons_ring, + &work->cons_grefs, + sizeof(uint32_t) * + work->frontend_setup_msg.common.ring_pages, + sizeof(work->frontend_setup_msg) + + sizeof(uint32_t) * + work->frontend_setup_msg.common.ring_pages); + + work->state = nc2_alt_ring_frontend_preparing; + init_waitqueue_head(&work->eventq); + work->handle = work->frontend_setup_msg.common.handle; + INIT_WORK(&work->work_item, initialise_bypass_frontend_work_item); + if (init_ring_pair(&work->rings, nc) < 0) + goto err; + work->rings.filter_mac = 1; + + list_add(&work->rings_by_interface, &nc->alternate_rings); + schedule_work(&work->work_item); +} + +void nc2_handle_bypass_backend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct nc2_alternate_ring *work; + + if (hdr->size < sizeof(work->backend_setup_msg)) { + pr_debug("Bypass message had strange size %d\n", hdr->size); + return; + } + if (ncrp != &nc->rings) { + pr_debug("Bypass message on ancillary ring!\n"); + return; + } + if (!nc->remote_trusted) { + pr_debug("Untrusted domain tried to set up a bypass.\n"); + return; + } + work = kzalloc(sizeof(*work), GFP_ATOMIC); + if (!work) { + printk(KERN_WARNING "no memory for alternative ring pair!\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &work->backend_setup_msg, + sizeof(work->backend_setup_msg)); + if (hdr->size != sizeof(work->backend_setup_msg) + + sizeof(uint32_t) * 2 * + work->backend_setup_msg.common.ring_pages) { + printk(KERN_WARNING "inconsistent bypass message size (%d for %d pages)\n", + hdr->size, work->backend_setup_msg.common.ring_pages); + goto err; + } + if (work->backend_setup_msg.common.ring_pages > + MAX_BYPASS_RING_PAGES_MAPPABLE) { + printk(KERN_WARNING "too many ring pages: %d > %d\n", + work->backend_setup_msg.common.ring_pages, + MAX_BYPASS_RING_PAGES_MAPPABLE); +err: + kfree(work); + return; + } + 
nc2_copy_from_ring_off(&ncrp->cons_ring, + &work->prod_grefs, + sizeof(uint32_t) * + work->backend_setup_msg.common.ring_pages, + sizeof(work->backend_setup_msg)); + nc2_copy_from_ring_off(&ncrp->cons_ring, + &work->cons_grefs, + sizeof(uint32_t) * + work->backend_setup_msg.common.ring_pages, + sizeof(work->backend_setup_msg) + + sizeof(uint32_t) * + work->backend_setup_msg.common.ring_pages); + + work->state = nc2_alt_ring_backend_preparing; + init_waitqueue_head(&work->eventq); + work->handle = work->backend_setup_msg.common.handle; + INIT_WORK(&work->work_item, initialise_bypass_backend_work_item); + if (init_ring_pair(&work->rings, nc) < 0) + goto err; + work->rings.filter_mac = 1; + + list_add(&work->rings_by_interface, &nc->alternate_rings); + schedule_work(&work->work_item); +} + +/* Called under the nc master ring. */ +static struct nc2_alternate_ring *find_ring_by_handle(struct netchannel2 *nc, + uint32_t handle) +{ + struct nc2_alternate_ring *nar; + list_for_each_entry(nar, &nc->alternate_rings, rings_by_interface) { + if (nar->handle == handle) + return nar; + } + return NULL; +} + +void nc2_handle_bypass_disable(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_bypass_disable msg; + struct nc2_alternate_ring *nar; + + if (ncrp != &nc->rings) { + pr_debug("Bypass disable on ancillary ring!\n"); + return; + } + if (!nc->remote_trusted) { + pr_debug("Untrusted remote requested bypass disable.\n"); + return; + } + if (hdr->size != sizeof(msg)) { + printk(KERN_WARNING "Strange size bypass disable message; %d != %zd.\n", + hdr->size, sizeof(msg)); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + nar = find_ring_by_handle(nc, msg.handle); + if (nar == NULL) { + printk(KERN_WARNING "Request to disable unknown alternate ring %d.\n", + msg.handle); + return; + } + nc2_aux_ring_start_disable_sequence(nar); +} + +/* We''ve received a BYPASS_DETACH message on the master ring. Do + what''s needed to process it. */ +/* Called from the tasklet holding the master ring lock. */ +void nc2_handle_bypass_detach(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_bypass_detach msg; + struct nc2_alternate_ring *nar; + + if (ncrp != &nc->rings) { + pr_debug("Bypass detach on wrong ring.\n"); + return; + } + if (!nc->remote_trusted) { + pr_debug("Detach request from untrusted peer.\n"); + return; + } + if (hdr->size != sizeof(msg)) { + printk(KERN_WARNING "Strange size bypass detach message; %d != %zd.\n", + hdr->size, sizeof(msg)); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + nar = find_ring_by_handle(nc, msg.handle); + if (nar == NULL) { + printk(KERN_WARNING "Request to detach from unknown alternate ring %d.\n", + msg.handle); + return; + } + + nc2_aux_ring_start_detach_sequence(nar); +} + +/* This is only called once the irqs have been stopped and the + interfaces have been de-pended, so it shouldn''t have to worry about + any async activity. 
*/ +static void release_alt_ring(struct nc2_alternate_ring *nar) +{ + flush_scheduled_work(); + + nc2_unmap_grants(&nar->prod_mapper); + nc2_unmap_grants(&nar->cons_mapper); + nc2_unmap_grants(&nar->control_mapper); + + cleanup_ring_pair(&nar->rings); +} + +void nc2_release_alt_rings(struct netchannel2 *nc) +{ + struct nc2_alternate_ring *nar, *next_nar; + + list_for_each_entry_safe(nar, next_nar, &nc->alternate_rings, + rings_by_interface) { + release_alt_ring(nar); + } +} + +/* This is called from a suspend callback just before the VM goes down + for suspend/resume. When it returns, we must have unmapped all + bypass rings. There is no possibility of failing. */ +void detach_all_bypasses(struct netchannel2 *nc) +{ + struct nc2_alternate_ring *nar; + + int cntr; + + spin_lock_bh(&nc->rings.lock); + cntr = 0; + while (!list_empty(&nc->alternate_rings) && cntr < 500) { + list_for_each_entry(nar, &nc->alternate_rings, + rings_by_interface) { + spin_lock(&nar->rings.lock); + /* If we''re currently in an operating state, + pretend we received a DISABLE message, so + we eventually generate a DISABLED message. + The peer will then start the detach state + machine, which will eventually destroy the + bypass. */ + /* nc2_alt_ring_frontend_sent_ready is a bit + odd. We are frontend-like, and we''ve told + the backend who we are, but we haven''t yet + received a READY from the backend. We + don''t necessarily trust the backend, so we + can''t wait for it. The best we can do is + to tell the peer that we''ve disabled, and + let it drive the backend into shutdown. */ + if (nar->state == nc2_alt_ring_frontend_sent_ready || + nar->state == nc2_alt_ring_ready) { + nar->state = nc2_alt_ring_disabling; + nc2_kick(&nar->rings); + } + spin_unlock(&nar->rings.lock); + } + spin_unlock_bh(&nc->rings.lock); + /* Bit of a hack... */ + msleep(10); + cntr++; + spin_lock_bh(&nc->rings.lock); + } + spin_unlock_bh(&nc->rings.lock); + + if (cntr < 500) + return; + + /* Okay, doing it the nice way didn''t work. This can happen + if the domain at the other end of the bypass isn''t picking + up messages, so we can''t flush through all of our pending + packets and disable ourselves cleanly. Force it through + instead, by pretending that we''ve received a DETACH message + from the parent. */ + printk(KERN_WARNING "timed out trying to disable a bypass nicely, being more forceful\n"); + spin_lock_bh(&nc->rings.lock); + cntr = 0; + while (!list_empty(&nc->alternate_rings)) { + list_for_each_entry(nar, &nc->alternate_rings, + rings_by_interface) { + spin_lock(&nar->rings.lock); + if (nar->state >= nc2_alt_ring_detaching) { + /* Okay, we''re already detaching, and + we''re waiting either for our work + item to run or for an opportunity + to tell the parent that we''re + detached. The parent is trusted, + so just wait for whatever it is + that we''re waiting for to + happen. 
*/ + spin_unlock(&nar->rings.lock); + continue; + } + nar->state = nc2_alt_ring_detaching; + spin_unlock(&nar->rings.lock); + INIT_WORK(&nar->detach_work_item, + start_detach_worker); + schedule_work(&nar->detach_work_item); + } + spin_unlock_bh(&nc->rings.lock); + msleep(10); + cntr++; + if (cntr % 100 == 0) + printk(KERN_WARNING + "taking a long time to detach from bypasses (%d)\n", + cntr); + spin_lock_bh(&nc->rings.lock); + } + spin_unlock_bh(&nc->rings.lock); +} diff --git a/drivers/xen/netchannel2/bypassee.c b/drivers/xen/netchannel2/bypassee.c new file mode 100644 index 0000000..95ec681 --- /dev/null +++ b/drivers/xen/netchannel2/bypassee.c @@ -0,0 +1,737 @@ +/* All the bits which allow a domain to be bypassed. */ +#include <linux/kernel.h> +#include <linux/delay.h> +#include <linux/spinlock.h> +#include "netchannel2_core.h" + +/* Bypass disable is a bit tricky. Enable is relatively easy: + + 1) We decide to establish a bypass between two interfaces. + 2) We allocate the pages for the rings and grant them to + the relevant domains. + 3) We nominate one endpoint as the ``backend''''. + 4) We send both endpoints BYPASS messages. + 5) As far as we''re concerned, the bypass is now ready. The + endpoints will do the rest of the negotiation without any help + from us. + + Disable is harder. Each bypass endpoint can be in one of three + states: + + -- Running normally. + -- Disabled. + -- Detached. + + A disabled endpoint won''t generate any new operations (which means + that it can''t send packets, but can send FINISHED_PACKET messages + and so forth). A detached endpoint is one which has no longer + mapped the ring pages, so it can neither send nor receive. There + is no provision for transitioning ``backwards'''' i.e. from Disabled + to Running, Detached to Running, or Detached to Disabled. There + are a couple of messages relevant to changing state: + + -- DISABLE -- go to state Disabled if we''re in Running. Ignored in + other states (we won''t even get an ACK). We send this to the + endpoint. + -- DISABLED -- endpoint has transitioned to Disabled, whether of + its own accord or due to a DISABLE message. We receive this + from the endpoint. + -- DETACH -- go to state Detached if we''re in Running or Disabled. + Ignore in other states (without an ACK). Sent to the endpoint. + -- DETACHED -- endpoint has transitioned to DETACHED. Received + from the endpoint. + + A bypass in which both endpoints are Detached can be safely + destroyed. + + Once either endpoint has transitioned out of Running, the bypass is + pretty useless, so we try to push things so that we go to + Detached/Detached as quickly as possible. In particular: + + A state B state Action + Running Disabled Send A a DISABLE + Running Detached Send A a DETACH + Disabled Disabled Send both endpoints DETACH + Disabled Detached Send A a DETACH + Detached Detached Destroy the interface + + (And the obvious mirror images) + + There''s some filtering so that we never send a given endpoint more + than one DISABLE message or more than one DETACH message. If we + want to tear the bypass down from this end, we send both endpoints + DISABLE messages and let the state machine take things from + there. + + The core state machine is implemented in + crank_bypass_state_machine(). +*/ + +/* A list of all currently-live nc2_bypass interfaces. Only touched + from the worker thread. */ +static LIST_HEAD(all_bypasses); + +/* Bottom-half safe lock protecting pretty much all of the bypass + state, across all interfaces. 
The pending_list_lock is sometimes + acquired while this is held. It is acquired while holding the ring + lock. */ +static DEFINE_SPINLOCK(bypasses_lock); + +/* Encourage the endpoint to detach as soon as possible. */ +/* Called under the bypass lock. */ +static void schedule_detach(struct nc2_bypass_endpoint *ep) +{ + if (!ep->detached && !ep->need_detach && !ep->detach_sent) { + BUG_ON(ep->nc2 == NULL); + ep->need_detach = 1; + ep->nc2->need_advertise_bypasses = 1; + nc2_kick(&ep->nc2->rings); + } +} + +/* Encourage the endpoint to disable as soon as possible. */ +/* Called under the bypass lock. */ +static void schedule_disable(struct nc2_bypass_endpoint *ep) +{ + if (!ep->disabled && !ep->need_disable && !ep->disable_sent) { + BUG_ON(ep->detached); + BUG_ON(ep->nc2 == NULL); + ep->need_disable = 1; + ep->nc2->need_advertise_bypasses = 1; + nc2_kick(&ep->nc2->rings); + } +} + +static void grant_end(grant_ref_t *gref) +{ + if (*gref && gnttab_end_foreign_access_ref(*gref)) { + gnttab_free_grant_reference(*gref); + *gref = 0; + } +} + +/* Release all resources associated with the bypass. It is assumed + that the caller has ensured that nobody else is going to access it + any more. */ +static void release_bypass(struct nc2_bypass *bypass) +{ + int i; + + BUG_ON(atomic_read(&bypass->refcnt) != 0); + + for (i = 0; i < bypass->nr_ring_pages; i++) { + grant_end(&bypass->ep_a.incoming_grefs[i]); + grant_end(&bypass->ep_b.incoming_grefs[i]); + grant_end(&bypass->ep_a.outgoing_grefs[i]); + grant_end(&bypass->ep_b.outgoing_grefs[i]); + if (bypass->ep_a.incoming_pages[i] && + !bypass->ep_a.incoming_grefs[i] && + !bypass->ep_b.outgoing_grefs[i]) + free_page(bypass->ep_a.incoming_pages[i]); + if (bypass->ep_b.incoming_pages[i] && + !bypass->ep_b.incoming_grefs[i] && + !bypass->ep_a.outgoing_grefs[i]) + free_page(bypass->ep_b.incoming_pages[i]); + } + grant_end(&bypass->ep_a.control_gref); + grant_end(&bypass->ep_b.control_gref); + if (bypass->control_page && + !bypass->ep_a.control_gref && + !bypass->ep_b.control_gref) + free_page(bypass->control_page); + + kfree(bypass); +} + +static void put_bypass(struct nc2_bypass *bypass) +{ + if (atomic_dec_and_test(&bypass->refcnt)) + release_bypass(bypass); +} + +/* The state of one of the bypass endpoints has changed. Crank + through the state machine, scheduling any messages which are + needed. Tear the bypass down if both ends have detached. */ +/* Called under the bypass lock. */ +static void crank_bypass_state_machine(struct nc2_bypass *bypass) +{ + if (bypass->ep_a.disabled != bypass->ep_b.disabled) { + schedule_disable(&bypass->ep_a); + schedule_disable(&bypass->ep_b); + } + if (bypass->ep_a.disabled && bypass->ep_b.disabled) { + schedule_detach(&bypass->ep_b); + schedule_detach(&bypass->ep_a); + } + if (bypass->ep_a.detached != bypass->ep_b.detached) { + schedule_detach(&bypass->ep_b); + schedule_detach(&bypass->ep_a); + } + if (bypass->ep_a.detached && bypass->ep_b.detached) { + /* Okay, neither endpoint knows about the bypass any + more. It is therefore dead. */ + /* XXX: Should there be a concept of zombie bypasses? + * i.e. keep the bypass around until userspace + * explicitly reaps it, so as to avoid the usual ID + * reuse races. */ + list_del_init(&bypass->list); + wake_up_all(&bypass->detach_waitq); + put_bypass(bypass); + } +} + +/* A bypass disabled message has been received on @ncrp (which should + be the main ring for @nc, or someone''s misbehaving). */ +/* Called from the tasklet. 
*/ +void nc2_handle_bypass_disabled(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_bypass_disabled msg; + struct nc2_bypass *bypass; + + if (hdr->size != sizeof(msg)) { + pr_debug("Strange size bypass disabled message; %d != %zd.\n", + hdr->size, sizeof(msg)); + return; + } + if (ncrp != &nc->rings) { + pr_debug("bypass_disabled on wrong ring.\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + spin_lock(&bypasses_lock); + list_for_each_entry(bypass, &nc->bypasses_a, ep_a.list) { + if (bypass->handle == msg.handle) { + bypass->ep_a.disabled = 1; + crank_bypass_state_machine(bypass); + spin_unlock(&bypasses_lock); + return; + } + } + list_for_each_entry(bypass, &nc->bypasses_b, ep_b.list) { + if (bypass->handle == msg.handle) { + bypass->ep_b.disabled = 1; + crank_bypass_state_machine(bypass); + spin_unlock(&bypasses_lock); + return; + } + } + spin_unlock(&bypasses_lock); + + pr_debug("Disabled message was on the wrong ring (%d)?\n", + msg.handle); + return; +} + +static void detach(struct nc2_bypass_endpoint *ep) +{ + if (ep->detached) + return; + list_del_init(&ep->list); + ep->disabled = ep->detached = 1; + ep->nc2->extant_bypasses--; + ep->nc2 = NULL; +} + +/* One of our peers has sent us a bypass detached message i.e. it was + previously bypassing us, and it''s not any more. Do the appropriate + thing. */ +void nc2_handle_bypass_detached(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_bypass_detached msg; + struct nc2_bypass *bypass; + + if (hdr->size != sizeof(msg)) { + pr_debug("Strange size bypass detached message; %d != %zd.\n", + hdr->size, sizeof(msg)); + return; + } + if (ncrp != &nc->rings) { + pr_debug("bypass_disabled on wrong ring.\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + spin_lock(&bypasses_lock); + list_for_each_entry(bypass, &nc->bypasses_a, ep_a.list) { + if (bypass->handle == msg.handle) { + detach(&bypass->ep_a); + crank_bypass_state_machine(bypass); + spin_unlock(&bypasses_lock); + return; + } + } + list_for_each_entry(bypass, &nc->bypasses_b, ep_b.list) { + if (bypass->handle == msg.handle) { + detach(&bypass->ep_b); + crank_bypass_state_machine(bypass); + spin_unlock(&bypasses_lock); + return; + } + } + spin_unlock(&bypasses_lock); + pr_debug("Detached message was on the wrong ring (%d)?\n", + msg.handle); +} + +static int send_disable_bypass_msg(struct netchannel2 *nc, + struct nc2_bypass *bypass) +{ + struct netchannel2_msg_bypass_disable msg = { + .handle = bypass->handle + }; + + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg))) + return 1; + nc2_send_message(&nc->rings.prod_ring, NETCHANNEL2_MSG_BYPASS_DISABLE, + 0, &msg, sizeof(msg)); + nc->rings.pending_time_sensitive_messages = 1; + return 0; +} + +static int send_detach_bypass_msg(struct netchannel2 *nc, + struct nc2_bypass *bypass) +{ + struct netchannel2_msg_bypass_detach msg = { + .handle = bypass->handle + }; + + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg))) + return 1; + nc2_send_message(&nc->rings.prod_ring, NETCHANNEL2_MSG_BYPASS_DETACH, + 0, &msg, sizeof(msg)); + nc->rings.pending_time_sensitive_messages = 1; + return 0; +} + +static void init_bypass_msg_common(struct netchannel2_msg_bypass_common *msg, + struct nc2_bypass_endpoint *this_ep, + struct netchannel2 *remote, + struct nc2_bypass *bypass) +{ + msg->control_gref = 
this_ep->control_gref; + + msg->ring_domid = DOMID_SELF; + msg->ring_pages = bypass->nr_ring_pages;; + msg->peer_domid = remote->rings.otherend_id; + msg->peer_trusted = remote->remote_trusted; + msg->handle = bypass->handle; + memcpy(msg->remote_mac, remote->rings.remote_mac, ETH_ALEN); +} + +static int advertise_bypass_frontend(struct netchannel2 *nc, + struct nc2_bypass *bypass) +{ + struct netchannel2_msg_bypass_frontend msg; + unsigned msg_size; + + BUG_ON(nc != bypass->ep_a.nc2); + + msg_size = sizeof(msg) + bypass->nr_ring_pages * 2 * sizeof(uint32_t); + if (!nc->current_bypass_frontend && + !nc2_can_send_payload_bytes(&nc->rings.prod_ring, msg_size)) + return 1; + + memset(&msg, 0, sizeof(msg)); + + init_bypass_msg_common(&msg.common, &bypass->ep_a, bypass->ep_b.nc2, + bypass); + + nc->current_bypass_frontend = bypass; + + /* Send the message. nc2_send_message doesn''t support the + right kind of scatter gather, so do it by hand. */ + __nc2_avoid_ring_wrap(&nc->rings.prod_ring, msg_size); + msg.hdr.type = NETCHANNEL2_MSG_BYPASS_FRONTEND; + msg.hdr.size = msg_size; + nc2_copy_to_ring(&nc->rings.prod_ring, &msg, sizeof(msg)); + nc2_copy_to_ring_off(&nc->rings.prod_ring, + bypass->ep_a.outgoing_grefs, + sizeof(uint32_t) * bypass->nr_ring_pages, + sizeof(msg)); + nc2_copy_to_ring_off(&nc->rings.prod_ring, + bypass->ep_a.incoming_grefs, + sizeof(uint32_t) * bypass->nr_ring_pages, + sizeof(msg) + sizeof(uint32_t) * bypass->nr_ring_pages); + nc->rings.prod_ring.prod_pvt += msg_size; + nc->rings.prod_ring.bytes_available -= msg_size; + nc->rings.pending_time_sensitive_messages = 1; + return 0; +} + +static int advertise_bypass_backend(struct netchannel2 *nc, + struct nc2_bypass *bypass) +{ + struct netchannel2_msg_bypass_backend msg; + unsigned msg_size; + + BUG_ON(nc != bypass->ep_b.nc2); + + msg_size = sizeof(msg) + bypass->nr_ring_pages * 2 * sizeof(uint32_t); + if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, msg_size)) + return 1; + + memset(&msg, 0, sizeof(msg)); + + init_bypass_msg_common(&msg.common, &bypass->ep_b, bypass->ep_a.nc2, + bypass); + + BUG_ON(bypass->evtchn_port == 0); + msg.port = bypass->evtchn_port; + msg.hdr.type = NETCHANNEL2_MSG_BYPASS_BACKEND; + msg.hdr.size = msg_size; + nc2_copy_to_ring(&nc->rings.prod_ring, &msg, sizeof(msg)); + nc2_copy_to_ring_off(&nc->rings.prod_ring, + bypass->ep_b.outgoing_grefs, + sizeof(uint32_t) * bypass->nr_ring_pages, + sizeof(msg)); + nc2_copy_to_ring_off(&nc->rings.prod_ring, + bypass->ep_b.incoming_grefs, + sizeof(uint32_t) * bypass->nr_ring_pages, + sizeof(msg) + sizeof(uint32_t) * bypass->nr_ring_pages); + nc->rings.prod_ring.prod_pvt += msg_size; + nc->rings.prod_ring.bytes_available -= msg_size; + nc->rings.pending_time_sensitive_messages = 1; + return 0; +} + +/* Called from the tasklet, holding the ring lock for nc and the + bypass lock. */ +static int advertise_bypass(struct netchannel2 *nc, struct nc2_bypass *bypass) +{ + if (nc == bypass->ep_a.nc2) + return advertise_bypass_frontend(nc, bypass); + else + return advertise_bypass_backend(nc, bypass); +} + +/* Called from the tasklet holding the ring and bypass locks. 
*/ +static int nc2_do_bypass_advertise_work(struct nc2_bypass_endpoint *ep, + struct netchannel2 *nc, + struct nc2_bypass *bypass) +{ + if (ep->need_advertise) { + if (advertise_bypass(nc, bypass)) + return 0; + ep->need_advertise = 0; + } + if (ep->need_disable) { + if (send_disable_bypass_msg(nc, bypass)) + return 0; + ep->need_disable = 0; + ep->disable_sent = 1; + } + if (ep->need_detach) { + if (send_detach_bypass_msg(nc, bypass)) + return 0; + ep->need_detach = 0; + ep->detach_sent = 1; + } + return 1; +} + +/* Called from the tasklet holding the ring lock. */ +void _nc2_advertise_bypasses(struct netchannel2 *nc) +{ + struct nc2_bypass *bypass; + int success; + + spin_lock(&bypasses_lock); + success = 1; + list_for_each_entry(bypass, &nc->bypasses_a, ep_a.list) { + success &= nc2_do_bypass_advertise_work(&bypass->ep_a, + nc, + bypass); + } + list_for_each_entry(bypass, &nc->bypasses_b, ep_b.list) { + success &= nc2_do_bypass_advertise_work(&bypass->ep_b, + nc, + bypass); + } + if (success) + nc->need_advertise_bypasses = 0; + spin_unlock(&bypasses_lock); +} + +void nc2_handle_bypass_frontend_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_bypass_frontend_ready msg; + struct nc2_bypass *bypass; + + if (hdr->size != sizeof(msg) || ncrp != &nc->rings || + !nc->current_bypass_frontend) + return; + bypass = nc->current_bypass_frontend; + nc->current_bypass_frontend = NULL; + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + spin_lock(&bypasses_lock); + if (msg.port <= 0) { + printk(KERN_WARNING "%d from frontend trying to establish bypass\n", + msg.port); + detach(&bypass->ep_a); + detach(&bypass->ep_b); + crank_bypass_state_machine(bypass); + spin_unlock(&bypasses_lock); + return; + } + + bypass->evtchn_port = msg.port; + bypass->ep_b.need_advertise = 1; + bypass->ep_b.nc2->need_advertise_bypasses = 1; + nc2_kick(&bypass->ep_b.nc2->rings); + spin_unlock(&bypasses_lock); +} + +/* Called from an ioctl not holding any locks. */ +static int build_bypass_page(int *gref_pool, + int *grefp_a, + int *grefp_b, + domid_t domid_a, + domid_t domid_b, + unsigned long *pagep) +{ + int gref_a, gref_b; + unsigned long page; + + page = get_zeroed_page(GFP_ATOMIC); + if (page == 0) + return -ENOMEM; + gref_a = gnttab_claim_grant_reference(gref_pool); + gref_b = gnttab_claim_grant_reference(gref_pool); + BUG_ON(gref_a < 0); + BUG_ON(gref_b < 0); + gnttab_grant_foreign_access_ref(gref_a, domid_a, virt_to_mfn(page), 0); + gnttab_grant_foreign_access_ref(gref_b, domid_b, virt_to_mfn(page), 0); + + *pagep = page; + *grefp_a = gref_a; + *grefp_b = gref_b; + return 0; +} + +/* Called from an ioctl or work queue item not holding any locks. */ +int nc2_establish_bypass(struct netchannel2 *a, struct netchannel2 *b) +{ + struct nc2_bypass *work; + struct nc2_bypass *other_bypass; + int err; + grant_ref_t gref_pool; + int i; + static atomic_t next_handle; + int handle; + unsigned nr_pages; + + /* Can''t establish a bypass unless we''re trusted by both of + the remote endpoints. */ + if (!a->local_trusted || !b->local_trusted) + return -EPERM; + + /* Can''t establish a bypass unless it''s allowed by both + * endpoints. 
*/ + if (!a->bypass_max_pages || !b->bypass_max_pages) + return -EOPNOTSUPP; + + if (a->extant_bypasses >= a->max_bypasses || + b->extant_bypasses >= b->max_bypasses) + return -EMFILE; + + nr_pages = a->bypass_max_pages; + if (nr_pages > b->bypass_max_pages) + nr_pages = b->bypass_max_pages; + if (nr_pages > MAX_BYPASS_RING_PAGES_GRANTABLE) + nr_pages = MAX_BYPASS_RING_PAGES_GRANTABLE; + if (nr_pages == 0) { + printk(KERN_WARNING "tried to establish a null bypass ring?\n"); + return -EINVAL; + } + + work = kzalloc(sizeof(*work), GFP_ATOMIC); + if (!work) + return -ENOMEM; + atomic_set(&work->refcnt, 1); + init_waitqueue_head(&work->detach_waitq); + + work->nr_ring_pages = nr_pages; + + work->ep_a.nc2 = a; + work->ep_b.nc2 = b; + + work->ep_a.need_advertise = 1; + + handle = atomic_inc_return(&next_handle); + work->handle = handle; + + err = gnttab_alloc_grant_references(work->nr_ring_pages * 4 + 2, + &gref_pool); + if (err < 0) + goto err; + + err = -ENOMEM; + for (i = 0; i < work->nr_ring_pages; i++) { + err = build_bypass_page(&gref_pool, + &work->ep_a.incoming_grefs[i], + &work->ep_b.outgoing_grefs[i], + a->rings.otherend_id, + b->rings.otherend_id, + &work->ep_a.incoming_pages[i]); + if (err < 0) + goto err; + err = build_bypass_page(&gref_pool, + &work->ep_b.incoming_grefs[i], + &work->ep_a.outgoing_grefs[i], + b->rings.otherend_id, + a->rings.otherend_id, + &work->ep_b.incoming_pages[i]); + if (err < 0) + goto err; + } + err = build_bypass_page(&gref_pool, + &work->ep_a.control_gref, + &work->ep_b.control_gref, + a->rings.otherend_id, + b->rings.otherend_id, + &work->control_page); + if (err < 0) + goto err; + + spin_lock_bh(&bypasses_lock); + + if (work->ep_a.nc2->current_bypass_frontend) { + /* We can''t establish another bypass until this one + has finished (which might be forever, if the remote + domain is misbehaving, but that''s not a + problem). */ + err = -EBUSY; + spin_unlock_bh(&bypasses_lock); + goto err; + } + + /* Don''t allow redundant bypasses, because they''ll never be used. + This doesn''t actually matter all that much, because in order + to establish a redundant bypass, either: + + -- The user explicitly requested one, in which case they + get what they deserve, or + -- They''re using the autobypasser, in which case it''ll detect + that the bypass isn''t being used within a few seconds + and tear it down. + + Still, it''s better to avoid it (if only so the user gets a + sensible error message), and so we do a quick check here. + */ + list_for_each_entry(other_bypass, &a->bypasses_a, ep_a.list) { + BUG_ON(other_bypass->ep_a.nc2 != a); + if (other_bypass->ep_b.nc2 == b) { + err = -EEXIST; + spin_unlock_bh(&bypasses_lock); + goto err; + } + } + list_for_each_entry(other_bypass, &a->bypasses_b, ep_b.list) { + BUG_ON(other_bypass->ep_b.nc2 != a); + if (other_bypass->ep_a.nc2 == b) { + err = -EEXIST; + spin_unlock_bh(&bypasses_lock); + goto err; + } + } + + list_add(&work->ep_a.list, &a->bypasses_a); + INIT_LIST_HEAD(&work->ep_b.list); + a->need_advertise_bypasses = 1; + list_add(&work->ep_b.list, &b->bypasses_b); + list_add_tail(&work->list, &all_bypasses); + + a->extant_bypasses++; + b->extant_bypasses++; + + spin_unlock_bh(&bypasses_lock); + + nc2_kick(&a->rings); + + return handle; + +err: + gnttab_free_grant_references(gref_pool); + put_bypass(work); + return err; +} + +/* Called from an ioctl holding the bypass lock. 
*/ +static struct nc2_bypass *get_bypass(uint32_t handle) +{ + struct nc2_bypass *bypass; + + list_for_each_entry(bypass, &all_bypasses, list) { + if (bypass->handle == handle) { + atomic_inc(&bypass->refcnt); + return bypass; + } + } + return NULL; +} + +static int bypass_fully_detached(struct nc2_bypass *bypass) +{ + int res; + spin_lock_bh(&bypasses_lock); + res = bypass->ep_a.detached && bypass->ep_b.detached; + spin_unlock_bh(&bypasses_lock); + return res; +} + +int nc2_destroy_bypass(int handle) +{ + struct nc2_bypass *bypass; + int r; + + spin_lock_bh(&bypasses_lock); + bypass = get_bypass(handle); + if (bypass == NULL) { + spin_unlock_bh(&bypasses_lock); + return -ESRCH; + } + schedule_disable(&bypass->ep_a); + schedule_disable(&bypass->ep_b); + spin_unlock_bh(&bypasses_lock); + + r = wait_event_interruptible_timeout(bypass->detach_waitq, + bypass_fully_detached(bypass), + 5 * HZ); + put_bypass(bypass); + if (r < 0) { + printk(KERN_WARNING "Failed to destroy a bypass (%d).\n", + r); + } + return r; +} + +/* We''re guaranteed to be the only thing accessing @nc at this point, + but we don''t know what''s happening to the other endpoints of any + bypasses which it might have attached. */ +void release_bypasses(struct netchannel2 *nc) +{ + struct nc2_bypass *bypass, *next_bypass; + + spin_lock(&bypasses_lock); + list_for_each_entry_safe(bypass, next_bypass, &nc->bypasses_a, + ep_a.list) { + detach(&bypass->ep_a); + crank_bypass_state_machine(bypass); + } + list_for_each_entry_safe(bypass, next_bypass, &nc->bypasses_b, + ep_b.list) { + detach(&bypass->ep_b); + crank_bypass_state_machine(bypass); + } + spin_unlock(&bypasses_lock); + + BUG_ON(!list_empty(&nc->bypasses_a)); + BUG_ON(!list_empty(&nc->bypasses_b)); + + flush_scheduled_work(); +} diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index 47e1c5e..357ad18 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -17,7 +17,7 @@ static int process_ring(struct napi_struct *napi, int work_avail); -static irqreturn_t nc2_int(int irq, void *dev_id) +irqreturn_t nc2_int(int irq, void *dev_id) { struct netchannel2_ring_pair *ncr = dev_id; @@ -89,6 +89,30 @@ retry: nc2_handle_set_max_fragments_per_packet(nc, ncrp, &hdr); break; + case NETCHANNEL2_MSG_BYPASS_FRONTEND: + nc2_handle_bypass_frontend(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_BACKEND: + nc2_handle_bypass_backend(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_FRONTEND_READY: + nc2_handle_bypass_frontend_ready(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_DISABLE: + nc2_handle_bypass_disable(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_DISABLED: + nc2_handle_bypass_disabled(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_DETACH: + nc2_handle_bypass_detach(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_DETACHED: + nc2_handle_bypass_detached(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_BYPASS_READY: + nc2_handle_bypass_ready(nc, ncrp, &hdr); + break; case NETCHANNEL2_MSG_PAD: break; default: @@ -143,8 +167,15 @@ static void flush_rings(struct netchannel2_ring_pair *ncrp) advertise_max_packets(ncrp); if (ncrp->need_advertise_max_fragments_per_packet) advertise_max_fragments_per_packet(ncrp); - if (nc->need_advertise_offloads) - advertise_offloads(nc); + + if (ncrp == &nc->rings) { + if (nc->need_advertise_offloads) + advertise_offloads(nc); + nc2_advertise_bypasses(nc); + nc2_crank_aux_ring_state_machine(nc); + } else { + nc2_alternate_ring_disable_finish(ncrp); + } 
need_kick = 0; if (nc2_finish_messages(&ncrp->cons_ring)) { @@ -343,6 +374,9 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) int local_trusted; int remote_trusted; int filter_mac; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + int max_bypasses; +#endif if (!gnttab_subpage_grants_available()) { printk(KERN_ERR "netchannel2 needs version 2 grant tables\n"); @@ -355,6 +389,17 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) local_trusted = 1; } +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + max_bypasses = 0; + if (local_trusted) { + if (xenbus_scanf(XBT_NIL, xd->nodename, "max-bypasses", + "%d", &max_bypasses) != 1) { + printk(KERN_WARNING "Can''t get maximum bypass count; assuming 0.\n"); + max_bypasses = 0; + } + } +#endif + if (xenbus_scanf(XBT_NIL, xd->nodename, "remote-trusted", "%d", &remote_trusted) != 1) { printk(KERN_WARNING "Can''t tell whether local endpoint is trusted; assuming it isn''t.\n"); @@ -389,6 +434,15 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) /* Default to RX csum on. */ nc->use_rx_csum = 1; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + INIT_LIST_HEAD(&nc->bypasses_a); + INIT_LIST_HEAD(&nc->bypasses_b); + nc->max_bypasses = max_bypasses; +#endif +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + INIT_LIST_HEAD(&nc->alternate_rings); +#endif + skb_queue_head_init(&nc->pending_skbs); if (init_ring_pair(&nc->rings, nc) < 0) { nc2_release(nc); @@ -447,21 +501,25 @@ void nc2_release(struct netchannel2 *nc) we''re now the only thing accessing this netchannel2 structure and we can tear it down with impunity. */ + nc2_release_alt_rings(nc); + cleanup_ring_pair(&nc->rings); nc2_queue_purge(&nc->rings, &nc->pending_skbs); + release_bypasses(nc); + free_netdev(nc->net_device); } -static void _nc2_attach_rings(struct netchannel2_ring_pair *ncrp, - struct netchannel2_sring_cons *cons_sring, - const volatile void *cons_payload, - size_t cons_size, - struct netchannel2_sring_prod *prod_sring, - void *prod_payload, - size_t prod_size, - domid_t otherend_id) +void _nc2_attach_rings(struct netchannel2_ring_pair *ncrp, + struct netchannel2_sring_cons *cons_sring, + const volatile void *cons_payload, + size_t cons_size, + struct netchannel2_sring_prod *prod_sring, + void *prod_payload, + size_t prod_size, + domid_t otherend_id) { BUG_ON(prod_sring == NULL); BUG_ON(cons_sring == NULL); @@ -498,6 +556,28 @@ int nc2_attach_rings(struct netchannel2 *nc, size_t prod_size, domid_t otherend_id) { +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + int feature_bypass; + int max_bypass_pages; + + if (xenbus_scanf(XBT_NIL, nc->xenbus_device->otherend, + "feature-bypass", "%d", &feature_bypass) < 0) + feature_bypass = 0; + if (feature_bypass) { + if (xenbus_scanf(XBT_NIL, nc->xenbus_device->otherend, + "feature-bypass-max-pages", "%d", + &max_bypass_pages) < 0) { + printk(KERN_WARNING "other end claimed to support bypasses, but didn''t expose max-pages?\n"); + /* Bypasses disabled for this ring. */ + nc->max_bypasses = 0; + } else { + nc->bypass_max_pages = max_bypass_pages; + } + } else { + nc->max_bypasses = 0; + } +#endif + spin_lock_bh(&nc->rings.lock); _nc2_attach_rings(&nc->rings, cons_sring, cons_payload, cons_size, prod_sring, prod_payload, prod_size, otherend_id); @@ -539,6 +619,24 @@ static void _detach_rings(struct netchannel2_ring_pair *ncrp) ncrp->is_attached = 0; spin_unlock_bh(&ncrp->lock); + +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + { + struct nc2_alternate_ring *nar; + + /* Walk the alternate rings list and detach all of + them as well. 
This is recursive, but it''s only + ever going to recur one deep, so it''s okay. */ + /* Don''t need to worry about synchronisation because + the interface has been stopped. */ + if (ncrp == &ncrp->interface->rings) { + list_for_each_entry(nar, + &ncrp->interface->alternate_rings, + rings_by_interface) + _detach_rings(&nar->rings); + } + } +#endif } /* Detach from the rings. This includes unmapping them and stopping @@ -570,6 +668,20 @@ void nc2_detach_rings(struct netchannel2 *nc) if (nc->rings.irq >= 0) unbind_from_irqhandler(nc->rings.irq, &nc->rings); nc->rings.irq = -1; +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + { + struct nc2_alternate_ring *ncr; + + list_for_each_entry(ncr, &nc->alternate_rings, + rings_by_interface) { + if (ncr->rings.irq >= 0) { + unbind_from_irqhandler(ncr->rings.irq, + &ncr->rings); + ncr->rings.irq = -1; + } + } + } +#endif /* Disable all offloads */ nc->net_device->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO); @@ -641,6 +753,7 @@ int nc2_get_evtchn_port(struct netchannel2 *nc) void nc2_suspend(struct netchannel2 *nc) { + detach_all_bypasses(nc); suspend_receive_map_mode(); } @@ -682,7 +795,7 @@ static int process_ring(struct napi_struct *napi, release_tx_packet(ncrp, skb); } - if (nc->is_stopped) { + if (ncrp == &nc->rings && nc->is_stopped) { /* If the other end has processed some messages, there may be space on the ring for a delayed send from earlier. Process it now. */ diff --git a/drivers/xen/netchannel2/netback2.c b/drivers/xen/netchannel2/netback2.c index fd6f238..cf52839 100644 --- a/drivers/xen/netchannel2/netback2.c +++ b/drivers/xen/netchannel2/netback2.c @@ -1,18 +1,30 @@ #include <linux/kernel.h> #include <linux/gfp.h> #include <linux/vmalloc.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> #include <xen/gnttab.h> #include <xen/xenbus.h> #include <xen/interface/io/netchannel2.h> #include "netchannel2_core.h" #include "netchannel2_endpoint.h" +#include "netchannel2_uspace.h" + +static atomic_t next_handle; +/* A list of all currently-live netback2 interfaces. */ +static LIST_HEAD(all_netbacks); +/* A lock to protect the above list. */ +static DEFINE_MUTEX(all_netbacks_lock); #define NETBACK2_MAGIC 0xb5e99485 struct netback2 { unsigned magic; struct xenbus_device *xenbus_device; + int handle; + struct list_head list; + struct netchannel2 *chan; struct grant_mapping b2f_mapping; @@ -182,6 +194,14 @@ static void frontend_changed(struct xenbus_device *xd, xenbus_printf(XBT_NIL, nb->xenbus_device->nodename, "max-sring-pages", "%d", MAX_GRANT_MAP_PAGES); +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + xenbus_printf(XBT_NIL, nb->xenbus_device->nodename, + "feature-bypass", "1"); + xenbus_printf(XBT_NIL, nb->xenbus_device->nodename, + "feature-bypass-max-pages", "%d", + MAX_BYPASS_RING_PAGES_GRANTABLE); +#endif + /* Start the device bring-up bit of the state * machine. 
*/ xenbus_switch_state(nb->xenbus_device, XenbusStateInitWait); @@ -296,6 +316,11 @@ static int netback2_probe(struct xenbus_device *xd, xd->dev.driver_data = nb; + nb->handle = atomic_inc_return(&next_handle); + mutex_lock(&all_netbacks_lock); + list_add(&nb->list, &all_netbacks); + mutex_unlock(&all_netbacks_lock); + kobject_uevent(&xd->dev.kobj, KOBJ_ONLINE); return 0; @@ -315,6 +340,9 @@ static int netback2_remove(struct xenbus_device *xd) { struct netback2 *nb = xenbus_device_to_nb2(xd); kobject_uevent(&xd->dev.kobj, KOBJ_OFFLINE); + mutex_lock(&all_netbacks_lock); + list_del(&nb->list); + mutex_unlock(&all_netbacks_lock); if (nb->chan != NULL) nc2_release(nb->chan); if (nb->have_shutdown_watch) @@ -341,14 +369,114 @@ static struct xenbus_driver netback2 = { .uevent = netback2_uevent, }; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE +static struct netback2 *find_netback_by_handle_locked(unsigned handle) +{ + struct netback2 *nb; + + list_for_each_entry(nb, &all_netbacks, list) { + if (nb->handle == handle) + return nb; + } + return NULL; +} + +static struct netback2 *find_netback_by_remote_mac_locked(const char *mac) +{ + struct netback2 *nb; + + list_for_each_entry(nb, &all_netbacks, list) { + if (!memcmp(nb->chan->rings.remote_mac, mac, ETH_ALEN)) + return nb; + } + return NULL; +} + +static long netchannel2_ioctl_establish_bypass(struct netchannel2_ioctl_establish_bypass __user *argsp) +{ + struct netchannel2_ioctl_establish_bypass args; + struct netback2 *a, *b; + int res; + + if (copy_from_user(&args, argsp, sizeof(args))) + return -EFAULT; + + mutex_lock(&all_netbacks_lock); + a = find_netback_by_handle_locked(args.handle_a); + b = find_netback_by_handle_locked(args.handle_b); + if (a && b) + res = nc2_establish_bypass(a->chan, b->chan); + else + res = -EINVAL; + mutex_unlock(&all_netbacks_lock); + + return res; +} + +void nb2_handle_suggested_bypass(struct netchannel2 *a_chan, const char *mac_b) +{ + struct netback2 *b; + mutex_lock(&all_netbacks_lock); + b = find_netback_by_remote_mac_locked(mac_b); + if (b != NULL) + nc2_establish_bypass(a_chan, b->chan); + mutex_unlock(&all_netbacks_lock); +} + +static long netchannel2_ioctl_destroy_bypass(struct netchannel2_ioctl_destroy_bypass __user *argsp) +{ + struct netchannel2_ioctl_destroy_bypass args; + + if (copy_from_user(&args, argsp, sizeof(args))) + return -EFAULT; + + return nc2_destroy_bypass(args.handle); +} +#endif + +static long misc_dev_unlocked_ioctl(struct file *filp, unsigned cmd, + unsigned long data) +{ + switch (cmd) { +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + case NETCHANNEL2_IOCTL_ESTABLISH_BYPASS: + return netchannel2_ioctl_establish_bypass( + (struct netchannel2_ioctl_establish_bypass __user *)data); + case NETCHANNEL2_IOCTL_DESTROY_BYPASS: + return netchannel2_ioctl_destroy_bypass( + (struct netchannel2_ioctl_destroy_bypass __user *)data); +#endif + default: + return -EINVAL; + } +} + +static struct file_operations misc_dev_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = misc_dev_unlocked_ioctl +}; + +static struct miscdevice netback2_misc_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "netback2", + .fops = &misc_dev_fops +}; + int __init netback2_init(void) { int r; + r = misc_register(&netback2_misc_dev); + if (r < 0) { + printk(KERN_ERR "Error %d registering control device.\n", + r); + return r; + } r = xenbus_register_backend(&netback2); if (r < 0) { printk(KERN_ERR "error %d registering backend driver.\n", r); + misc_deregister(&netback2_misc_dev); } return r; } diff --git 
a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index c4de063..ed94098 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -42,7 +42,7 @@ enum transmit_policy { transmit_policy_last = transmit_policy_small }; -/* When we send a packet message, we need to tag it with an ID. That +/* When we send a packet message, we need to tag it with an ID. That ID is an index into the TXP slot array. Each slot contains either a pointer to an sk_buff (if it''s in use), or the index of the next free slot (if it isn''t). A slot is in use if the contents is > @@ -107,7 +107,6 @@ static inline struct skb_cb_overlay *get_skb_overlay(struct sk_buff *skb) return (struct skb_cb_overlay *)skb->cb; } - /* Packets for which we need to send FINISH_PACKET messages for as soon as possible. */ struct pending_finish_packets { @@ -153,7 +152,7 @@ struct netchannel2_ring_pair { /* The IRQ corresponding to the event channel which is connected to the other end. This only changes from the - xenbus state change handler. It is notified from lots of + xenbus state change handler. It is notified from lots of other places. Fortunately, it''s safe to notify on an irq after it''s been released, so the lack of synchronisation doesn''t matter. */ @@ -247,6 +246,15 @@ struct netchannel2 { it''s useful for optimisation. */ int local_trusted; +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + /* Alternate rings for this interface. Protected by the + master rings lock. */ + struct list_head alternate_rings; + uint8_t need_aux_ring_state_machine; + + uint8_t pending_bypass_error; +#endif + struct netchannel2_ring_pair rings; /* Packets which we need to transmit soon */ @@ -274,12 +282,261 @@ struct netchannel2 { after we receive an interrupt so that we can wake it up */ uint8_t is_stopped; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + /* Bypass support. */ + /* There''s some unadvertised bypass in one of the lists. */ + uint8_t need_advertise_bypasses; + uint8_t bypass_max_pages; + uint16_t max_bypasses; + uint16_t extant_bypasses; + struct list_head bypasses_a; + struct list_head bypasses_b; + + struct nc2_bypass *current_bypass_frontend; +#endif + /* Updates are protected by the lock. This can be read at any * time without holding any locks, and the rest of Linux is * expected to cope. */ struct net_device_stats stats; }; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE +#define MAX_BYPASS_RING_PAGES_GRANTABLE 4 +struct nc2_bypass_endpoint { + struct list_head list; /* Always ``valid'''', but won''t actually + be in any list if we''re detached (it + gets set to the empty list). */ + struct netchannel2 *nc2; /* Valid provided detached isn''t + * set */ + grant_ref_t incoming_grefs[MAX_BYPASS_RING_PAGES_GRANTABLE]; + grant_ref_t outgoing_grefs[MAX_BYPASS_RING_PAGES_GRANTABLE]; + grant_ref_t control_gref; + unsigned long incoming_pages[MAX_BYPASS_RING_PAGES_GRANTABLE]; + + uint8_t need_advertise; + uint8_t need_disable; + uint8_t disable_sent; + uint8_t disabled; + uint8_t need_detach; + uint8_t detach_sent; + uint8_t detached; +}; + +/* This is the representation of a bypass in the bypassed domain. */ +struct nc2_bypass { + /* Cleared to an empty list if both endpoints are detached. */ + struct list_head list; + + /* Reference count. Being on the big list, threaded through + @list, counts as a single reference. 
*/ + atomic_t refcnt; + + struct nc2_bypass_endpoint ep_a; + struct nc2_bypass_endpoint ep_b; + unsigned long control_page; + unsigned nr_ring_pages; + + unsigned handle; + int evtchn_port; + + wait_queue_head_t detach_waitq; +}; + +int nc2_establish_bypass(struct netchannel2 *a, struct netchannel2 *b); +int nc2_destroy_bypass(int handle); +void _nc2_advertise_bypasses(struct netchannel2 *nc); +static inline void nc2_advertise_bypasses(struct netchannel2 *nc) +{ + if (nc->need_advertise_bypasses) + _nc2_advertise_bypasses(nc); +} +void nc2_handle_bypass_disabled(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_detached(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_frontend_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_disabled(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_detached(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void release_bypasses(struct netchannel2 *nc); +void nb2_handle_suggested_bypass(struct netchannel2 *a_chan, + const char *mac_b); +#else +static inline void release_bypasses(struct netchannel2 *nc) +{ +} +static inline void nc2_advertise_bypasses(struct netchannel2 *nc) +{ +} +static inline void nc2_handle_bypass_frontend_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void nc2_handle_bypass_disabled(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void nc2_handle_bypass_detached(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +#endif + +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT +#define MAX_BYPASS_RING_PAGES_MAPPABLE 4 +/* This is the representation of a bypass from the point of view of + one of the endpoint domains. */ +struct nc2_alternate_ring { + /* List of all alternate rings on a given interface. Dangles + * off of alternate_rings in struct netchannel2. Protected by + * the netchannel2 master ring lock. */ + struct list_head rings_by_interface; + /* The state of the alternate ring. This only ever goes + * forwards. It is protected by the auxiliary ring lock. */ + enum { + /* This is a frontend, it''s just been allocated and + doesn''t yet have a port. */ + nc2_alt_ring_frontend_preparing = 0xf001, + /* This is a frontend, it has a port but hasn''t told + the parent yet. */ + nc2_alt_ring_frontend_send_ready_pending, + /* We''ve sent the FRONTEND_READY message and are + waiting for the backend to say it''s ready. */ + nc2_alt_ring_frontend_sent_ready, + /* This is a backend. In theory, we know what port to + use, but we haven''t tried to bind to it yet. */ + nc2_alt_ring_backend_preparing, + /* Running normally */ + nc2_alt_ring_ready, + /* Can''t be used for more PACKETs, will disable as + soon as all FINISHes arrive. */ + nc2_alt_ring_disabling, + /* All FINISHes arrived, waiting to send DISABLED */ + nc2_alt_ring_disabled_pending, + /* DISABLED sent. */ + nc2_alt_ring_disabled, + /* DETACH received */ + nc2_alt_ring_detaching, + /* Ring has been detached, waiting to send the + DETACHED message. 
*/ + nc2_alt_ring_detached_pending + } state; + struct work_struct work_item; + struct work_struct detach_work_item; + + struct grant_mapping prod_mapper; + struct grant_mapping cons_mapper; + struct grant_mapping control_mapper; + + struct netchannel2_ring_pair rings; + + /* A lower bound on the number of times we''ve called + disable_irq() on the irq. The interrupt handler guarantees + to notify the eventq quickly if this increases. It + increases whenever there is work for the worker thread to + do. */ + atomic_t irq_disable_count; + wait_queue_head_t eventq; + uint32_t handle; + + struct netchannel2_msg_bypass_frontend frontend_setup_msg; + struct netchannel2_msg_bypass_backend backend_setup_msg; + uint32_t cons_grefs[MAX_BYPASS_RING_PAGES_MAPPABLE]; + uint32_t prod_grefs[MAX_BYPASS_RING_PAGES_MAPPABLE]; +}; + +void nc2_handle_bypass_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +int bypass_xmit_packet(struct netchannel2 *nc, + struct nc2_alternate_ring *ncr, + struct sk_buff *skb); +void _nc2_alternate_ring_disable_finish(struct nc2_alternate_ring *ncr); +static inline void nc2_alternate_ring_disable_finish(struct netchannel2_ring_pair *ncrp) +{ + struct nc2_alternate_ring *nar; + nar = container_of(ncrp, struct nc2_alternate_ring, rings); + if (nar->state == nc2_alt_ring_disabling && + ncrp->nr_tx_packets_outstanding == 0) + _nc2_alternate_ring_disable_finish(nar); +} +void _nc2_crank_aux_ring_state_machine(struct netchannel2 *nc); +static inline void nc2_crank_aux_ring_state_machine(struct netchannel2 *nc) +{ + if (nc->need_aux_ring_state_machine) + _nc2_crank_aux_ring_state_machine(nc); +} +void nc2_release_alt_rings(struct netchannel2 *nc); +void detach_all_bypasses(struct netchannel2 *nc); +void nc2_handle_bypass_frontend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_backend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_disable(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_detach(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_bypass_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_aux_ring_start_disable_sequence(struct nc2_alternate_ring *nar); +void nc2_aux_ring_start_detach_sequence(struct nc2_alternate_ring *nar); +#else +static inline void detach_all_bypasses(struct netchannel2 *nc) +{ +} +static inline void nc2_crank_aux_ring_state_machine(struct netchannel2 *nc) +{ +} +static inline void nc2_alternate_ring_disable_finish(struct netchannel2_ring_pair *ncrp) +{ +} +static inline void nc2_release_alt_rings(struct netchannel2 *nc) +{ +} +static inline void nc2_handle_bypass_frontend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void nc2_handle_bypass_backend(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void nc2_handle_bypass_disable(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void nc2_handle_bypass_detach(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +static inline void 
nc2_handle_bypass_ready(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} +#endif + + static inline void flush_prepared_grant_copies(struct hypercall_batcher *hb, void (*on_fail)(void *ctxt, gnttab_copy_t *gop)) @@ -371,9 +628,24 @@ int nc2_map_grants(struct grant_mapping *gm, domid_t remote_domain); void nc2_unmap_grants(struct grant_mapping *gm); +void _nc2_attach_rings(struct netchannel2_ring_pair *ncrp, + struct netchannel2_sring_cons *cons_sring, + const volatile void *cons_payload, + size_t cons_size, + struct netchannel2_sring_prod *prod_sring, + void *prod_payload, + size_t prod_size, + domid_t otherend_id); void queue_packet_to_interface(struct sk_buff *skb, struct netchannel2_ring_pair *ncrp); +unsigned get_transmitted_packet_msg_size(struct sk_buff *skb); +int init_ring_pair(struct netchannel2_ring_pair *ncrp, + struct netchannel2 *nc); + +irqreturn_t nc2_int(int irq, void *dev_id); + +void cleanup_ring_pair(struct netchannel2_ring_pair *ncrp); void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); int init_receive_map_mode(void); diff --git a/drivers/xen/netchannel2/netchannel2_uspace.h b/drivers/xen/netchannel2/netchannel2_uspace.h new file mode 100644 index 0000000..f4d06ca --- /dev/null +++ b/drivers/xen/netchannel2/netchannel2_uspace.h @@ -0,0 +1,17 @@ +#ifndef NETCHANNEL2_USPACE_H__ +#define NETCHANNEL2_USPACE_H__ + +#include <linux/ioctl.h> + +struct netchannel2_ioctl_establish_bypass { + unsigned handle_a; + unsigned handle_b; +}; +#define NETCHANNEL2_IOCTL_ESTABLISH_BYPASS _IOW(''N'', 0, struct netchannel2_ioctl_establish_bypass) + +struct netchannel2_ioctl_destroy_bypass { + unsigned handle; +}; +#define NETCHANNEL2_IOCTL_DESTROY_BYPASS _IOW(''N'', 1, struct netchannel2_ioctl_destroy_bypass) + +#endif /* !NETCHANNEL2_USPACE_H__ */ diff --git a/drivers/xen/netchannel2/netfront2.c b/drivers/xen/netchannel2/netfront2.c index fb5d426..9b2e2ec 100644 --- a/drivers/xen/netchannel2/netfront2.c +++ b/drivers/xen/netchannel2/netfront2.c @@ -180,6 +180,19 @@ again: goto abort; } +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + err = xenbus_printf(xbt, nf->xenbus_device->nodename, + "feature-bypass", "1"); + if (!err) + err = xenbus_printf(xbt, nf->xenbus_device->nodename, + "feature-bypass-max-pages", "%d", + MAX_BYPASS_RING_PAGES_MAPPABLE); + if (err) { + msg = "publishing bypass info"; + goto abort; + } +#endif + err = xenbus_transaction_end(xbt, 0); if (err) { if (err == -EAGAIN) @@ -438,6 +451,17 @@ err: return -ENOMEM; } +static int netfront_suspend(struct xenbus_device *xd) +{ + /* We''re about to suspend. Do the minimum amount of work to + make that safe. */ + struct netfront2 *nf = xenbus_device_to_nf2(xd); + + nc2_suspend(nf->chan); + + return 0; +} + static int netfront_resume(struct xenbus_device *xd) { /* We''ve been suspended and come back. 
The rings are @@ -475,6 +499,7 @@ static struct xenbus_driver netfront2 = { .remove = __devexit_p(netfront_remove), .otherend_changed = backend_changed, .resume = netfront_resume, + .suspend = netfront_suspend, }; int __init netfront2_init(void) diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 8c38788..749c70e 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -80,6 +80,15 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, nc2_copy_from_ring(&ncrp->cons_ring, &msg, sizeof(msg)); + if (msg.type != NC2_PACKET_TYPE_receiver_copy && + msg.type != NC2_PACKET_TYPE_small && + ncrp != &nc->rings) { + pr_debug("Received strange packet type %d on bypass ring.\n", + msg.type); + nc->stats.tx_errors++; + return; + } + frags_bytes = hdr->size - sizeof(msg) - msg.prefix_size; nr_frags = frags_bytes / sizeof(struct netchannel2_fragment); diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c index d95ad09..a24105a 100644 --- a/drivers/xen/netchannel2/xmit_packet.c +++ b/drivers/xen/netchannel2/xmit_packet.c @@ -40,7 +40,7 @@ enum prepare_xmit_result prepare_xmit_allocate_small( } /* Figure out how much space @tp will take up on the ring. */ -static unsigned get_transmitted_packet_msg_size(struct sk_buff *skb) +unsigned get_transmitted_packet_msg_size(struct sk_buff *skb) { struct skb_cb_overlay *skb_co = get_skb_overlay(skb); return (sizeof(struct netchannel2_msg_packet) + @@ -238,6 +238,21 @@ int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev) spin_lock_bh(&nc->rings.lock); + /* If we have a bypass suitable for this packet then we prefer + * that to the main ring pair. */ +#ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT + { + struct nc2_alternate_ring *ncr; + list_for_each_entry(ncr, &nc->alternate_rings, + rings_by_interface) { + if (bypass_xmit_packet(nc, ncr, skb)) { + spin_unlock_bh(&nc->rings.lock); + return NETDEV_TX_OK; + } + } + } +#endif + if (!nc->rings.is_attached) goto out_drop; diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h index f264995..f3cabe8 100644 --- a/include/xen/interface/io/netchannel2.h +++ b/include/xen/interface/io/netchannel2.h @@ -188,4 +188,142 @@ struct netchannel2_msg_set_max_fragments_per_packet { uint32_t max_frags_per_packet; }; +/* Attach to a bypass ring as a frontend. The receiving domain should + * map the bypass ring (which will be in the sending domain''s memory) + * and attach to it in the same way as it attached to the original ring. + * This bypass ring will, once it''s been successfully set up, be used + * for all packets destined for @remote_mac (excluding broadcasts). + * + * @ring_domid indicates which domain allocated the ring pages, and + * hence which domain should be specified when grant mapping + * @control_gref, @prod_gref, and @cons_gref. It can be set to + * DOMID_SELF, in which case the domain ID of the domain sending the + * message should be used. + * + * @peer_domid indicates the domain ID of the domain on the other end + * of the ring. + * + * @handle gives a unique handle for the bypass which will be used in + * future messages. + * + * @peer_trusted is true if the peer should be trusted by the domain + * which sent the bypass message. + * + * @ring_pages gives the number of valid grefs in the @prod_grefs and + * @cons_grefs arrays. + * + * @is_backend_like indicates which ring attach the receiving domain + * should use.
If @is_backend_like is set, the receiving domain + * should interpret the control area as a netchannel2_backend_shared. + * Otherwise, it''s a netchannel2_frontend_shared. Also, a + * backend-like endpoint should receive an event channel from the peer + * domain, while a frontend-like one should send one. Once + * established, the ring is symmetrical. + * + * + * BYPASS messages can only be sent by a trusted endpoint. They may + * not be sent over bypass rings. + * + * No packets may be sent over the ring until a READY message is + * received. Until that point, all packets must be sent over the + * parent ring. + */ +struct netchannel2_msg_bypass_common { + uint16_t ring_domid; + uint16_t peer_domid; + uint32_t handle; + + uint8_t remote_mac[6]; + uint8_t peer_trusted; + uint8_t ring_pages; + + uint32_t control_gref; + uint32_t pad; + + /* Followed by a run of @ring_pages uint32_t producer ring + grant references, then a run of @ring_pages uint32_t + consumer ring grant references */ +}; + +#define NETCHANNEL2_MSG_BYPASS_FRONTEND 9 +struct netchannel2_msg_bypass_frontend { + struct netchannel2_msg_hdr hdr; + uint32_t pad; + struct netchannel2_msg_bypass_common common; +}; + +#define NETCHANNEL2_MSG_BYPASS_BACKEND 10 +struct netchannel2_msg_bypass_backend { + struct netchannel2_msg_hdr hdr; + uint32_t port; + struct netchannel2_msg_bypass_common common; +}; + +#define NETCHANNEL2_MSG_BYPASS_FRONTEND_READY 11 +struct netchannel2_msg_bypass_frontend_ready { + struct netchannel2_msg_hdr hdr; + int32_t port; +}; + +/* This message is sent on a bypass ring once the sending domain is + * ready to receive packets. Until it has been received, the bypass + * ring cannot be used to transmit packets. It may only be sent once. + * + * Note that it is valid to send packet messages before *sending* a + * BYPASS_READY message, provided a BYPASS_READY message has been + * *received*. + * + * This message can only be sent on a bypass ring. + */ +#define NETCHANNEL2_MSG_BYPASS_READY 12 +struct netchannel2_msg_bypass_ready { + struct netchannel2_msg_hdr hdr; + uint32_t pad; +}; + +/* Disable an existing bypass. This is sent over the *parent* ring, + * in the same direction as the original BYPASS message, when the + * bypassed domain wishes to disable the ring. The receiving domain + * should stop sending PACKET messages over the ring, wait for FINISH + * messages for any outstanding PACKETs, and then acknowledge this + * message with a DISABLED message. + * + * This message may not be sent on bypass rings. + */ +#define NETCHANNEL2_MSG_BYPASS_DISABLE 13 +struct netchannel2_msg_bypass_disable { + struct netchannel2_msg_hdr hdr; + uint32_t handle; +}; +#define NETCHANNEL2_MSG_BYPASS_DISABLED 14 +struct netchannel2_msg_bypass_disabled { + struct netchannel2_msg_hdr hdr; + uint32_t handle; +}; + +/* Detach from an existing bypass. This is sent over the *parent* in + * the same direction as the original BYPASS message, when the + * bypassed domain wishes to destroy the ring. The receiving domain + * should immediately unmap the ring and respond with a DETACHED + * message. Any PACKET messages which haven''t already received a + * FINISH message are dropped. + * + * During a normal shutdown, this message will be sent after DISABLED + * messages have been received from both endpoints. However, it can + * also be sent without a preceding DISABLE message if the other + * endpoint appears to be misbehaving or has crashed. + * + * This message may not be sent on bypass rings. 
+ */ +#define NETCHANNEL2_MSG_BYPASS_DETACH 15 +struct netchannel2_msg_bypass_detach { + struct netchannel2_msg_hdr hdr; + uint32_t handle; +}; +#define NETCHANNEL2_MSG_BYPASS_DETACHED 16 +struct netchannel2_msg_bypass_detached { + struct netchannel2_msg_hdr hdr; + uint32_t handle; +}; + #endif /* !__NETCHANNEL2_H__ */ -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
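As an aside for readers decoding the wire format: the layout comment on netchannel2_msg_bypass_common above fully determines where the two grant-reference runs sit. Below is a minimal sketch of the size and offset arithmetic, assuming only the structures from this patch; the helper names are invented for illustration and are not part of the series.

#include <stdint.h>
#include <stddef.h>
/* struct netchannel2_msg_bypass_common as added to
   include/xen/interface/io/netchannel2.h above. */

/* Size of the variable-length common part: the fixed header plus
   @ring_pages producer grefs and @ring_pages consumer grefs. */
static inline size_t
bypass_common_size(const struct netchannel2_msg_bypass_common *c)
{
	return sizeof(*c) + 2 * c->ring_pages * sizeof(uint32_t);
}

/* The producer grant references start immediately after the fixed
   part of the structure... */
static inline const uint32_t *
bypass_prod_grefs(const struct netchannel2_msg_bypass_common *c)
{
	return (const uint32_t *)(c + 1);
}

/* ...and the consumer references follow the @ring_pages producer
   ones. */
static inline const uint32_t *
bypass_cons_grefs(const struct netchannel2_msg_bypass_common *c)
{
	return bypass_prod_grefs(c) + c->ring_pages;
}

A complete BYPASS_FRONTEND or BYPASS_BACKEND message is then the corresponding wrapper structure followed by those two runs; since the common part is the last member of each wrapper, the same pointer arithmetic applies.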
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 17/22] Add some userspace tools for managing the creation and destruction of bypass rings.
These are pretty skanky. It''s not clear where they should live, or even whether they should live at all, but they''re handy for testing. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/tools/destroy_bypass.c | 25 +++++++++++++++++ drivers/xen/netchannel2/tools/establish_bypass.c | 31 ++++++++++++++++++++++ 2 files changed, 56 insertions(+), 0 deletions(-) create mode 100644 drivers/xen/netchannel2/tools/destroy_bypass.c create mode 100644 drivers/xen/netchannel2/tools/establish_bypass.c diff --git a/drivers/xen/netchannel2/tools/destroy_bypass.c b/drivers/xen/netchannel2/tools/destroy_bypass.c new file mode 100644 index 0000000..93b82e0 --- /dev/null +++ b/drivers/xen/netchannel2/tools/destroy_bypass.c @@ -0,0 +1,25 @@ +#include <sys/ioctl.h> +#include <err.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include "../netchannel2_uspace.h" + +int +main(int argc, char *argv[]) +{ + int fd; + struct netchannel2_ioctl_destroy_bypass ioc; + int r; + + fd = open("/dev/netback2", O_RDWR); + if (fd < 0) + err(1, "opening /dev/netback2"); + ioc.handle = atoi(argv[1]); + + r = ioctl(fd, NETCHANNEL2_IOCTL_DESTROY_BYPASS, &ioc); + if (r < 0) + err(1, "destroying bypass"); + return 0; +} diff --git a/drivers/xen/netchannel2/tools/establish_bypass.c b/drivers/xen/netchannel2/tools/establish_bypass.c new file mode 100644 index 0000000..bdd326c --- /dev/null +++ b/drivers/xen/netchannel2/tools/establish_bypass.c @@ -0,0 +1,31 @@ +#include <sys/ioctl.h> +#include <err.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include "../netchannel2_uspace.h" + +int +main(int argc, char *argv[]) +{ + int fd; + unsigned a; + unsigned b; + struct netchannel2_ioctl_establish_bypass ioc; + int r; + + fd = open("/dev/netback2", O_RDWR); + if (fd < 0) + err(1, "opening /dev/netback2"); + a = atoi(argv[1]); + b = atoi(argv[2]); + ioc.handle_a = a; + ioc.handle_b = b; + + r = ioctl(fd, NETCHANNEL2_IOCTL_ESTABLISH_BYPASS, &ioc); + if (r < 0) + err(1, "establishing bypass"); + printf("%d\n", r); + return 0; +} -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
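For illustration, a plausible test session with these tools (bypass handles are the small integers netback2 assigns to interfaces at probe time, so the exact numbers below are hypothetical):

	# establish_bypass 1 2
	3
	# destroy_bypass 3

establish_bypass prints the non-negative value returned by the ESTABLISH_BYPASS ioctl, presumably the handle of the newly created bypass, and destroy_bypass takes that value back as its only argument. Neither tool validates argc before reading argv, in keeping with their testing-only status.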
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 18/22] Add support for automatically creating and destroying bypass rings in response to observed traffic.
This is designed to minimise the overhead of the autobypass machine, and in particular to minimise the overhead in dom0, potentially at the cost of not always detecting that a bypass would be useful. In particular, it isn''t triggered by transmit_policy_small packets, and so if you have a lot of very small packets then no bypass will be created. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/Kconfig | 8 + drivers/xen/netchannel2/Makefile | 4 + drivers/xen/netchannel2/autobypass.c | 316 ++++++++++++++++++++++ drivers/xen/netchannel2/bypass.c | 40 +++- drivers/xen/netchannel2/bypassee.c | 65 ++++++ drivers/xen/netchannel2/chan.c | 6 + drivers/xen/netchannel2/netchannel2_core.h | 108 ++++++++++- drivers/xen/netchannel2/recv_packet.c | 13 ++ drivers/xen/netchannel2/rscb.c | 19 ++ include/xen/interface/io/netchannel2.h | 13 ++ 10 files changed, 585 insertions(+), 7 deletions(-) create mode 100644 drivers/xen/netchannel2/autobypass.c diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index d4265f6..a7e5b5c 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -253,6 +253,14 @@ config XEN_NETDEV2_BYPASS_ENDPOINT Bypasses allow faster inter-domain communication, provided every VM supports them. +config XEN_NETDEV2_AUTOMATIC_BYPASS + bool "Automatically manage netchannel2 bypasses" + depends on XEN_NETDEV2_BYPASS_ENDPOINT + default y + help + Try to detect when bypasses would be useful, and manage + them automatically. + config XEN_GRANT_DEV tristate "User-space granted page access driver" default XEN_PRIVILEGED_GUEST diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index 5aa3410..9c4f97a 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -19,3 +19,7 @@ endif ifeq ($(CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT),y) netchannel2-objs += bypass.o endif + +ifeq ($(CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS),y) +netchannel2-objs += autobypass.o +endif diff --git a/drivers/xen/netchannel2/autobypass.c b/drivers/xen/netchannel2/autobypass.c new file mode 100644 index 0000000..c83dac6 --- /dev/null +++ b/drivers/xen/netchannel2/autobypass.c @@ -0,0 +1,316 @@ +#include <linux/kernel.h> +#include <linux/jiffies.h> +#include "netchannel2_core.h" + +/* The state machine works like this: + + -- We start in state NORMAL. In this state, we count how many + bypass and non-bypass packets we receive, and don''t do anything + else. + + -- After receiving AUTOBYPASS_PERIOD packets, we look at the + bypass-candidate to non-bypass-candidate ratio. If the number + of non-bypass packets exceeds the number of bypass packets by + more than a factor of AUTOBYPASS_RATIO, reset the counters and + go back to state NORMAL. Otherwise, go to state CONSIDERING. + We also reset and go back to normal if it took more than + AUTOBYPASS_MAX_PERIOD_JIFFIES jiffies to get here. + + -- In state CONSIDERING, continue to count up the bypass and + non-bypass packets. In addition, whenever we get a bypass + packet, pull the source MAC address out of the header and + compare it to the hot list. If it''s in the hot list, increment + that entry''s count. + + -- After another AUTOBYPASS_PERIOD, check the packet counts again. + Provided the total bypass ratio is good enough (see the NORMAL + exit criteria), walk the hot list, and if any entry accounts for + more than 1/AUTOBYPASS_RATIO2 of the total traffic, suggest to + dom0 that it create a new bypass for us. Then go to DEBOUNCE.
+ + -- In DEBOUNCE, wait until we''ve received at least + AUTOBYPASS_DEBOUNCE_PERIOD bypass packets, then go to NORMAL. + + So, we establish a bypass if total traffic > PERIOD/MAX_PERIOD + packets per second, of which at least PERIOD/(MAX_PERIOD*(RATIO+1)) + are bypass candidates and PERIOD/(MAX_PERIOD*(RATIO2+1)) are for + one specific bypass. This needs to be sustained for at least + PERIOD*2 before we actually establish a bypass. +*/ + +/* If you increase this past 65536, consider changing the type of + auto_bypass.hot_macs[...].count, to avoid overflow. */ +#define AUTOBYPASS_PERIOD 1024 +#define AUTOBYPASS_RATIO 2 +#define AUTOBYPASS_RATIO2 4 +#define AUTOBYPASS_DEBOUNCE_PERIOD 1024 +#define AUTOBYPASS_MAX_PERIOD_JIFFIES (HZ/2) + + +#define TEARDOWN_PERIOD_JIFFIES (HZ*5) +#define TEARDOWN_MIN_PACKETS (256*TEARDOWN_PERIOD_JIFFIES) + +static void autoteardown_timer_fn(unsigned long ignore); + +static DEFINE_SPINLOCK(autoteardown_lock); +static LIST_HEAD(autoteardown_list); +static DEFINE_TIMER(autoteardown_timer, autoteardown_timer_fn, 0, 0); + +static void autoteardown_timer_fn(unsigned long ignore) +{ + struct nc2_alternate_ring *nar; + + spin_lock(&autoteardown_lock); + list_for_each_entry(nar, &autoteardown_list, + autoteardown.autoteardown_list) { + if (nar->autoteardown.seen_count < 2) { + /* Give it at least two periods to get started, + to avoid flapping. */ + /* One period isn''t enough, because we reset + the seen_count without holding the teardown + lock from + nc2_aux_ring_start_disable_sequence, and + there''s a risk that we''ll see it non-zero + when it should be zero. However, the + chances of that happening twice in a row + are so small that we can ignore them. Even + if it does go wrong twice, the worst case + is that we drop a few packets by forcing a + detach when the remote is behaving + correctly. */ + nar->autoteardown.seen_count++; + continue; + } + switch (nar->state) { + case nc2_alt_ring_frontend_sent_ready: + /* Interesting. We''re ready to go, but the + backend isn''t. Furthermore, this isn''t the + first time we''ve seen this interface, so + we''ve been trying to establish it for at + least TEARDOWN_PERIOD_JIFFIES. Conclude + that the backend is misbehaving and start a + disable sequence. */ + nc2_aux_ring_start_disable_sequence(nar); + break; + case nc2_alt_ring_ready: + if (nar->autoteardown.nr_packets < + TEARDOWN_MIN_PACKETS) { + /* This interface isn''t busy enough -> + needs to be torn down. */ + nc2_aux_ring_start_disable_sequence(nar); + } + nar->autoteardown.nr_packets = 0; + break; + case nc2_alt_ring_disabling: + /* We seem to have gotten stuck trying to + disable the ring, probably because the + remote isn''t sending FINISH messages fast + enough. Be a bit more aggressive. */ + nc2_aux_ring_start_detach_sequence(nar); + break; + default: + /* Other states are waiting either for the + local operating system to complete work + items, or for the upstream interface to + process messages. Upstream is always + trusted, so just assume that this''ll fix + itself sooner or later. 
*/ + break; + } + } + if (!list_empty(&autoteardown_list)) { + mod_timer(&autoteardown_timer, + jiffies + TEARDOWN_PERIOD_JIFFIES); + } + spin_unlock(&autoteardown_lock); +} + +void nc2_register_bypass_for_autoteardown(struct nc2_alternate_ring *nar) +{ + spin_lock_bh(&autoteardown_lock); + if (list_empty(&autoteardown_list)) + mod_timer(&autoteardown_timer, + jiffies + TEARDOWN_PERIOD_JIFFIES); + list_move(&nar->autoteardown.autoteardown_list, &autoteardown_list); + spin_unlock_bh(&autoteardown_lock); +} + +void nc2_unregister_bypass_for_autoteardown(struct nc2_alternate_ring *nar) +{ + spin_lock_bh(&autoteardown_lock); + list_del_init(&nar->autoteardown.autoteardown_list); + if (list_empty(&autoteardown_list)) + del_timer(&autoteardown_timer); + spin_unlock_bh(&autoteardown_lock); +} + +static int busy_enough_for_bypass(struct netchannel2 *nc) +{ + uint64_t nr_non_bypass; + unsigned long start_jiffies; + + nr_non_bypass = nc->auto_bypass.nr_non_bypass_packets; + start_jiffies = nc->auto_bypass.start_jiffies; + nc->auto_bypass.nr_non_bypass_packets = 0; + nc->auto_bypass.nr_bypass_packets = 0; + if (nr_non_bypass > AUTOBYPASS_PERIOD * AUTOBYPASS_RATIO || + jiffies - start_jiffies > AUTOBYPASS_MAX_PERIOD_JIFFIES) { + /* Either took too long to collect the bypass + packets, or too many non-bypass relative to + number of bypasses. Either way, not a good + time to consider doing bypasses. */ + nc->auto_bypass.start_jiffies = jiffies; + return 0; + } else { + return 1; + } +} + +static void record_source_mac(struct netchannel2 *nc, struct sk_buff *skb) +{ + struct ethhdr *eh; + unsigned x; + + if (skb_headlen(skb) < sizeof(struct ethhdr)) + return; + eh = (struct ethhdr *)skb->data; + for (x = 0; x < nc->auto_bypass.nr_hot_macs; x++) { + if (!memcmp(eh->h_source, nc->auto_bypass.hot_macs[x].mac, + sizeof(eh->h_source))) { + nc->auto_bypass.hot_macs[x].count++; + return; + } + } + if (x == AUTOBYPASS_MAX_HOT_MACS) { + /* Communicating with too many bypass candidates -> + can''t keep track of them all -> drop a couple at + random. */ + return; + } + nc->auto_bypass.hot_macs[x].count = 1; + memcpy(nc->auto_bypass.hot_macs[x].mac, + eh->h_source, + sizeof(eh->h_source)); + nc->auto_bypass.nr_hot_macs++; +} + +static void queue_suggested_bypass(struct netchannel2 *nc, + const char *mac) +{ + int ind; + + ind = nc->auto_bypass.suggestion_head % AUTOBYPASS_SUGG_QUEUE_SIZE; + if (nc->auto_bypass.suggestion_head == + nc->auto_bypass.suggestion_tail + AUTOBYPASS_SUGG_QUEUE_SIZE) { + /* We''ve overflowed the suggestion queue. That means + that, even though we''re receiving a massive number + of packets, we''ve never had enough free ring space + to actually send a suggestion message. I''m not + convinced that''s actually possible, but it''s + trivial to handle, so we might as well. */ + /* Drop the oldest pending suggestion, since it''s the + most likely to be out of date and therefore + useless.
*/ + nc->auto_bypass.suggestion_tail++; + } + nc->auto_bypass.suggestion_head++; + memcpy(&nc->auto_bypass.suggestions[ind], + mac, + ETH_ALEN); +} + +static void suggest_bypasses(struct netchannel2 *nc) +{ + unsigned x; + unsigned threshold; + + BUG_ON(nc->auto_bypass.nr_hot_macs == 0); + threshold = + (nc->auto_bypass.nr_non_bypass_packets + + nc->auto_bypass.nr_bypass_packets) / AUTOBYPASS_RATIO2; + for (x = 0; x < nc->auto_bypass.nr_hot_macs; x++) { + if (nc->auto_bypass.hot_macs[x].count > threshold) { + queue_suggested_bypass( + nc, + nc->auto_bypass.hot_macs[x].mac); + } + } +} + +/* Called under the master ring lock whenever we receive a packet with + NC2_PACKET_FLAG_bypass_candidate set. */ +void nc2_received_bypass_candidate_packet(struct netchannel2 *nc, + struct sk_buff *skb) +{ + nc->auto_bypass.nr_bypass_packets++; + switch (nc->auto_bypass.state) { + case autobypass_state_normal: + if (nc->auto_bypass.nr_bypass_packets != AUTOBYPASS_PERIOD) + return; + if (!busy_enough_for_bypass(nc)) + return; + nc->auto_bypass.nr_hot_macs = 0; + nc->auto_bypass.state = autobypass_state_considering; + break; + case autobypass_state_considering: + record_source_mac(nc, skb); + if (nc->auto_bypass.nr_bypass_packets != AUTOBYPASS_PERIOD) + return; + if (busy_enough_for_bypass(nc)) + suggest_bypasses(nc); + nc->auto_bypass.state = autobypass_state_debounce; + break; + case autobypass_state_debounce: + if (nc->auto_bypass.nr_bypass_packets == AUTOBYPASS_PERIOD) { + nc->auto_bypass.state = autobypass_state_normal; + nc->auto_bypass.nr_non_bypass_packets = 0; + nc->auto_bypass.nr_bypass_packets = 0; + nc->auto_bypass.start_jiffies = jiffies; + } + break; + } +} + +static int send_suggestion(struct netchannel2_ring_pair *ncrp, + const char *mac) +{ + struct netchannel2_msg_suggest_bypass msg; + + if (!nc2_can_send_payload_bytes(&ncrp->prod_ring, sizeof(msg))) + return 0; + + memset(&msg, 0, sizeof(msg)); + memcpy(msg.mac, mac, ETH_ALEN); + nc2_send_message(&ncrp->prod_ring, + NETCHANNEL2_MSG_SUGGEST_BYPASS, + 0, + &msg, + sizeof(msg)); + ncrp->pending_time_sensitive_messages = 1; + return 1; +} + +void _nc2_autobypass_make_suggestions(struct netchannel2 *nc) +{ + struct nc2_auto_bypass *nab = &nc->auto_bypass; + struct netchannel2_ring_pair *ncrp = &nc->rings; + unsigned ind; + + while (nab->suggestion_tail != nab->suggestion_head) { + BUG_ON(nab->suggestion_head - nab->suggestion_tail > + AUTOBYPASS_SUGG_QUEUE_SIZE); + ind = nab->suggestion_tail % AUTOBYPASS_SUGG_QUEUE_SIZE; + if (!send_suggestion(ncrp, nab->suggestions[ind].mac)) + break; + nab->suggestion_tail++; + } +} + +void nc2_shutdown_autoteardown(void) +{ + /* There shouldn''t be any interfaces at all, so there + certainly won''t be any bypasses, and we don''t have to worry + about the timer getting requeued. Make sure it''s finished + and then get out.
*/ + del_timer_sync(&autoteardown_timer); +} diff --git a/drivers/xen/netchannel2/bypass.c b/drivers/xen/netchannel2/bypass.c index c907b48..227824a 100644 --- a/drivers/xen/netchannel2/bypass.c +++ b/drivers/xen/netchannel2/bypass.c @@ -66,6 +66,10 @@ int bypass_xmit_packet(struct netchannel2 *nc, return 1; } +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + ncr->autoteardown.nr_packets++; +#endif + queue_packet_to_interface(skb, rings); spin_unlock(&rings->lock); @@ -77,6 +81,12 @@ void nc2_aux_ring_start_disable_sequence(struct nc2_alternate_ring *nar) { spin_lock(&nar->rings.lock); if (nar->state < nc2_alt_ring_disabling) { +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + /* We should really hold the autoteardown lock for + this, but see the big comment in + autoteardown_timer_fn() */ + nar->autoteardown.seen_count = 0; +#endif nar->state = nc2_alt_ring_disabling; nc2_kick(&nar->rings); } @@ -106,6 +116,8 @@ static void start_detach_worker(struct work_struct *ws) unbind_from_irqhandler(ncr->rings.irq, &ncr->rings); ncr->rings.irq = -1; + nc2_unregister_bypass_for_autoteardown(ncr); + spin_lock_bh(&ncr->rings.lock); ncr->state = nc2_alt_ring_detached_pending; ncr->rings.interface->need_aux_ring_state_machine = 1; @@ -330,7 +342,7 @@ static void send_ready_message(struct nc2_alternate_ring *ncr) /* This shouldn''t happen, because the producer ring should be essentially empty at this stage. If it does, it probably means the other end is playing - silly buggers with the ring indexes. Drop the + silly buggers with the ring indexes. Drop the message. */ printk(KERN_WARNING "Failed to send bypass ring ready message.\n"); } @@ -349,8 +361,12 @@ void nc2_handle_bypass_ready(struct netchannel2 *nc, ncr = container_of(ncrp, struct nc2_alternate_ring, rings); /* We''re now allowed to start sending packets over this * ring. */ - if (ncr->state == nc2_alt_ring_frontend_sent_ready) + if (ncr->state == nc2_alt_ring_frontend_sent_ready) { ncr->state = nc2_alt_ring_ready; +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + ncr->autoteardown.seen_count = 0; +#endif + } } /* Called holding the aux ring lock. 
*/ @@ -397,6 +413,8 @@ static void initialise_bypass_frontend_work_item(struct work_struct *ws) nc2_kick(&interface->rings); spin_unlock_bh(&interface->rings.lock); + nc2_register_bypass_for_autoteardown(ncr); + return; err: @@ -441,12 +459,18 @@ static void initialise_bypass_backend_work_item(struct work_struct *ws) send_ready_message(ncr); +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + ncr->autoteardown.seen_count = 0; +#endif + spin_lock_bh(&ncr->rings.lock); ncr->state = nc2_alt_ring_ready; spin_unlock_bh(&ncr->rings.lock); nc2_kick(&ncr->rings); + nc2_register_bypass_for_autoteardown(ncr); + return; err: @@ -526,11 +550,14 @@ err: work->frontend_setup_msg.common.ring_pages); work->state = nc2_alt_ring_frontend_preparing; +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + INIT_LIST_HEAD(&work->autoteardown.autoteardown_list); +#endif init_waitqueue_head(&work->eventq); work->handle = work->frontend_setup_msg.common.handle; INIT_WORK(&work->work_item, initialise_bypass_frontend_work_item); if (init_ring_pair(&work->rings, nc) < 0) - goto err; + goto err; work->rings.filter_mac = 1; list_add(&work->rings_by_interface, &nc->alternate_rings); @@ -591,12 +618,15 @@ err: sizeof(uint32_t) * work->backend_setup_msg.common.ring_pages); +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + INIT_LIST_HEAD(&work->autoteardown.autoteardown_list); +#endif work->state = nc2_alt_ring_backend_preparing; init_waitqueue_head(&work->eventq); work->handle = work->backend_setup_msg.common.handle; INIT_WORK(&work->work_item, initialise_bypass_backend_work_item); if (init_ring_pair(&work->rings, nc) < 0) - goto err; + goto err; work->rings.filter_mac = 1; list_add(&work->rings_by_interface, &nc->alternate_rings); @@ -725,7 +755,7 @@ void detach_all_bypasses(struct netchannel2 *nc) machine, which will eventually destroy the bypass. */ /* nc2_alt_ring_frontend_sent_ready is a bit - odd. We are frontend-like, and we''ve told + odd. We are frontend-like, and we''ve told the backend who we are, but we haven''t yet received a READY from the backend. 
We don''t necessarily trust the backend, so we diff --git a/drivers/xen/netchannel2/bypassee.c b/drivers/xen/netchannel2/bypassee.c index 95ec681..0379c8e 100644 --- a/drivers/xen/netchannel2/bypassee.c +++ b/drivers/xen/netchannel2/bypassee.c @@ -276,6 +276,63 @@ void nc2_handle_bypass_detached(struct netchannel2 *nc, msg.handle); } +static void process_suggestion_queue_workitem(struct work_struct *ws) +{ + struct netchannel2 *nc + container_of(ws, struct netchannel2, + incoming_bypass_suggestions.workitem); + struct nc2_incoming_bypass_suggestions *sugg + &nc->incoming_bypass_suggestions; + unsigned ind; + unsigned char mac[ETH_ALEN]; + + spin_lock_bh(&sugg->lock); + while (sugg->tail != sugg->head) { + ind = sugg->tail % NC2_BYPASS_SUGG_QUEUE_SIZE; + memcpy(mac, sugg->queue[ind].mac, ETH_ALEN); + sugg->tail++; + spin_unlock_bh(&sugg->lock); + + nb2_handle_suggested_bypass(nc, mac); + + spin_lock_bh(&sugg->lock); + } + spin_unlock_bh(&sugg->lock); +} + +void nc2_handle_suggest_bypass(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct nc2_incoming_bypass_suggestions *sugg + &nc->incoming_bypass_suggestions; + struct netchannel2_msg_suggest_bypass msg; + unsigned ind; + + if (hdr->size != sizeof(msg)) { + pr_debug("strange size suggest bypass message; %d != %zd\n", + hdr->size, sizeof(msg)); + return; + } + if (ncrp != &nc->rings) { + pr_debug("suggest bypass on bypass ring?\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg)); + + spin_lock(&sugg->lock); + ind = sugg->head % NC2_BYPASS_SUGG_QUEUE_SIZE; + /* Drop if we''ve overflowed the queue */ + if (sugg->head == sugg->tail + NC2_BYPASS_SUGG_QUEUE_SIZE) + sugg->tail++; + memcpy(&sugg->queue[ind].mac, msg.mac, ETH_ALEN); + if (sugg->head == sugg->tail) + schedule_work(&sugg->workitem); + sugg->head++; + spin_unlock(&sugg->lock); +} + + static int send_disable_bypass_msg(struct netchannel2 *nc, struct nc2_bypass *bypass) { @@ -735,3 +792,11 @@ void release_bypasses(struct netchannel2 *nc) flush_scheduled_work(); } + +void nc2_init_incoming_bypass_suggestions( + struct netchannel2 *nc2, + struct nc2_incoming_bypass_suggestions *nibs) +{ + spin_lock_init(&nibs->lock); + INIT_WORK(&nibs->workitem, process_suggestion_queue_workitem); +} diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index 357ad18..fa52353 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -113,6 +113,9 @@ retry: case NETCHANNEL2_MSG_BYPASS_READY: nc2_handle_bypass_ready(nc, ncrp, &hdr); break; + case NETCHANNEL2_MSG_SUGGEST_BYPASS: + nc2_handle_suggest_bypass(nc, ncrp, &hdr); + break; case NETCHANNEL2_MSG_PAD: break; default: @@ -173,6 +176,7 @@ static void flush_rings(struct netchannel2_ring_pair *ncrp) advertise_offloads(nc); nc2_advertise_bypasses(nc); nc2_crank_aux_ring_state_machine(nc); + nc2_autobypass_make_suggestions(nc); } else { nc2_alternate_ring_disable_finish(ncrp); } @@ -437,6 +441,8 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) #ifdef CONFIG_XEN_NETDEV2_BYPASSABLE INIT_LIST_HEAD(&nc->bypasses_a); INIT_LIST_HEAD(&nc->bypasses_b); + nc2_init_incoming_bypass_suggestions(nc, + &nc->incoming_bypass_suggestions); nc->max_bypasses = max_bypasses; #endif #ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index ed94098..2a5ed06 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ 
b/drivers/xen/netchannel2/netchannel2_core.h @@ -107,6 +107,79 @@ static inline struct skb_cb_overlay *get_skb_overlay(struct sk_buff *skb) return (struct skb_cb_overlay *)skb->cb; } +struct nc2_alternate_ring; +struct netchannel2; + +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS +#define AUTOBYPASS_MAX_HOT_MACS 8 +#define AUTOBYPASS_SUGG_QUEUE_SIZE 8 +struct nc2_auto_bypass { + enum { + autobypass_state_normal, + autobypass_state_considering, + autobypass_state_debounce + } state; + uint32_t nr_bypass_packets; + uint64_t nr_non_bypass_packets; + unsigned long start_jiffies; + unsigned nr_hot_macs; + struct { + unsigned char mac[ETH_ALEN]; + /* This won''t overflow because the autobypass period + is less than 65536. */ + uint16_t count; + } hot_macs[AUTOBYPASS_MAX_HOT_MACS]; + unsigned suggestion_head; + unsigned suggestion_tail; + struct { + unsigned char mac[ETH_ALEN]; + } suggestions[AUTOBYPASS_SUGG_QUEUE_SIZE]; +}; +void nc2_received_bypass_candidate_packet(struct netchannel2 *nc, + struct sk_buff *skb); + +struct nc2_bypass_autoteardown { + struct list_head autoteardown_list; + uint64_t nr_packets; + unsigned seen_count; +}; + +void nc2_register_bypass_for_autoteardown(struct nc2_alternate_ring *nar); +void nc2_unregister_bypass_for_autoteardown(struct nc2_alternate_ring *nar); +void nc2_shutdown_autoteardown(void); +#else +static inline void nc2_shutdown_autoteardown(void) +{ +} +static inline void nc2_register_bypass_for_autoteardown(struct nc2_alternate_ring *nar) +{ +} +static inline void nc2_unregister_bypass_for_autoteardown(struct nc2_alternate_ring *nar) +{ +} +#endif + +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE +#define NC2_BYPASS_SUGG_QUEUE_SIZE 8 +struct nc2_incoming_bypass_suggestions { + spinlock_t lock; + + unsigned head; + unsigned tail; + + struct work_struct workitem; + + struct { + unsigned char mac[ETH_ALEN]; + } queue[NC2_BYPASS_SUGG_QUEUE_SIZE]; +}; + +void nc2_init_incoming_bypass_suggestions( + struct netchannel2 *nc, + struct nc2_incoming_bypass_suggestions *nibs); +#endif + + /* Packets for which we need to send FINISH_PACKET messages for as soon as possible. */ struct pending_finish_packets { @@ -151,8 +224,8 @@ struct netchannel2_ring_pair { grant_ref_t gref_pool; /* The IRQ corresponding to the event channel which is - connected to the other end. This only changes from the - xenbus state change handler. It is notified from lots of + connected to the other end. This only changes from the + xenbus state change handler. It is notified from lots of other places. Fortunately, it''s safe to notify on an irq after it''s been released, so the lack of synchronisation doesn''t matter. */ @@ -293,12 +366,17 @@ struct netchannel2 { struct list_head bypasses_b; struct nc2_bypass *current_bypass_frontend; + struct nc2_incoming_bypass_suggestions incoming_bypass_suggestions; #endif /* Updates are protected by the lock. This can be read at any * time without holding any locks, and the rest of Linux is * expected to cope. 
*/ struct net_device_stats stats; + +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + struct nc2_auto_bypass auto_bypass; +#endif }; #ifdef CONFIG_XEN_NETDEV2_BYPASSABLE @@ -366,6 +444,9 @@ void nc2_handle_bypass_disabled(struct netchannel2 *nc, void nc2_handle_bypass_detached(struct netchannel2 *nc, struct netchannel2_ring_pair *ncrp, struct netchannel2_msg_hdr *hdr); +void nc2_handle_suggest_bypass(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); void release_bypasses(struct netchannel2 *nc); void nb2_handle_suggested_bypass(struct netchannel2 *a_chan, const char *mac_b); @@ -391,6 +472,11 @@ static inline void nc2_handle_bypass_detached(struct netchannel2 *nc, struct netchannel2_msg_hdr *hdr) { } +static inline void nc2_handle_suggest_bypass(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ +} #endif #ifdef CONFIG_XEN_NETDEV2_BYPASS_ENDPOINT @@ -454,6 +540,10 @@ struct nc2_alternate_ring { struct netchannel2_msg_bypass_backend backend_setup_msg; uint32_t cons_grefs[MAX_BYPASS_RING_PAGES_MAPPABLE]; uint32_t prod_grefs[MAX_BYPASS_RING_PAGES_MAPPABLE]; + +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + struct nc2_bypass_autoteardown autoteardown; +#endif }; void nc2_handle_bypass_ready(struct netchannel2 *nc, @@ -536,6 +626,18 @@ static inline void nc2_handle_bypass_ready(struct netchannel2 *nc, } #endif +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS +void _nc2_autobypass_make_suggestions(struct netchannel2 *nc); +static inline void nc2_autobypass_make_suggestions(struct netchannel2 *nc) +{ + if (nc->auto_bypass.suggestion_tail != nc->auto_bypass.suggestion_head) + _nc2_autobypass_make_suggestions(nc); +} +#else +static inline void nc2_autobypass_make_suggestions(struct netchannel2 *nc) +{ +} +#endif static inline void flush_prepared_grant_copies(struct hypercall_batcher *hb, void (*on_fail)(void *ctxt, @@ -653,6 +755,8 @@ void deinit_receive_map_mode(void); void suspend_receive_map_mode(void); void resume_receive_map_mode(void); +struct netchannel2 *nc2_get_interface_for_page(struct page *p); + int nc2_start_xmit(struct sk_buff *skb, struct net_device *dev); int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb); diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c index 749c70e..94aa127 100644 --- a/drivers/xen/netchannel2/recv_packet.c +++ b/drivers/xen/netchannel2/recv_packet.c @@ -200,6 +200,18 @@ void nc2_handle_packet_msg(struct netchannel2 *nc, break; } +#ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS + if (ncrp == &nc->rings) { + if (msg.flags & NC2_PACKET_FLAG_bypass_candidate) + nc2_received_bypass_candidate_packet(nc, skb); + else + nc->auto_bypass.nr_non_bypass_packets++; + } else { + container_of(ncrp, struct nc2_alternate_ring, rings)-> + autoteardown.nr_packets++; + } +#endif + switch (msg.segmentation_type) { case NC2_PACKET_SEGMENTATION_TYPE_none: break; @@ -316,5 +328,6 @@ int __init nc2_init(void) void __exit nc2_exit(void) { + nc2_shutdown_autoteardown(); deinit_receive_map_mode(); } diff --git a/drivers/xen/netchannel2/rscb.c b/drivers/xen/netchannel2/rscb.c index cdcb116..f47cbdb 100644 --- a/drivers/xen/netchannel2/rscb.c +++ b/drivers/xen/netchannel2/rscb.c @@ -211,6 +211,9 @@ struct grant_packet_plan { grant_ref_t gref_pool; int use_subpage_grants; unsigned prefix_avail; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + int could_have_used_bypass; +#endif }; static inline int nfrags_skb(struct sk_buff *skb, int prefix_size) @@ 
-311,6 +314,9 @@ static void prepare_subpage_grant(struct netchannel2_ring_pair *ncrp, domid_t trans_domid; grant_ref_t trans_gref; grant_ref_t gref; +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + struct netchannel2 *orig_iface; +#endif if (size <= plan->prefix_avail) { /* This fragment is going to be inline -> nothing to @@ -328,6 +334,12 @@ static void prepare_subpage_grant(struct netchannel2_ring_pair *ncrp, gref = gnttab_claim_grant_reference(&plan->gref_pool); frag->receiver_copy.gref = gref; if (page_is_tracked(page)) { +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + orig_iface = nc2_get_interface_for_page(page); + if (orig_iface && + orig_iface->extant_bypasses < orig_iface->max_bypasses) + plan->could_have_used_bypass = 1; +#endif lookup_tracker_page(page, &trans_domid, &trans_gref); gnttab_grant_foreign_access_ref_trans(gref, ncrp->otherend_id, @@ -411,5 +423,12 @@ void xmit_grant(struct netchannel2_ring_pair *ncrp, } skb_co->nr_fragments = plan.out_fragment - msg->frags; + +#ifdef CONFIG_XEN_NETDEV2_BYPASSABLE + if (plan.could_have_used_bypass && + ncrp == &ncrp->interface->rings && + ncrp->interface->extant_bypasses < ncrp->interface->max_bypasses) + msg->flags |= NC2_PACKET_FLAG_bypass_candidate; +#endif } diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h index f3cabe8..075658d 100644 --- a/include/xen/interface/io/netchannel2.h +++ b/include/xen/interface/io/netchannel2.h @@ -84,6 +84,11 @@ struct netchannel2_msg_packet { * regardless of any SET_OFFLOAD messages which may or may not have * been sent. */ #define NC2_PACKET_FLAG_data_validated 2 +/* If set, this flag indicates that this packet could have used a + * bypass if one had been available, and so it should be sent to the + * autobypass state machine. + */ +#define NC2_PACKET_FLAG_bypass_candidate 4 /* If set, the transmitting domain requires an event urgently when * this packet''s finish message is sent. Otherwise, the event can be * delayed. */ @@ -326,4 +331,12 @@ struct netchannel2_msg_bypass_detached { uint32_t handle; }; +#define NETCHANNEL2_MSG_SUGGEST_BYPASS 17 +struct netchannel2_msg_suggest_bypass { + struct netchannel2_msg_hdr hdr; + unsigned char mac[6]; + uint16_t pad1; + uint32_t pad2; +}; + #endif /* !__NETCHANNEL2_H__ */ -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
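To put concrete numbers on the thresholds in this patch: with AUTOBYPASS_PERIOD = 1024, AUTOBYPASS_MAX_PERIOD_JIFFIES = HZ/2 and AUTOBYPASS_RATIO = 2, the NORMAL state advances to CONSIDERING only if 1024 bypass-candidate packets arrive within half a second (a sustained rate of roughly 2048 candidates per second) accompanied by at most 2048 non-candidate packets. In CONSIDERING, with AUTOBYPASS_RATIO2 = 4, a suggestion is meant to go out only for a source MAC accounting for more than a quarter of the traffic in that window, and DEBOUNCE then swallows a further 1024 candidate packets before the machine re-arms. On the teardown side, an established ring is disabled again if it carries fewer than TEARDOWN_MIN_PACKETS = 256 * TEARDOWN_PERIOD_JIFFIES packets over each five-second period, i.e. it must sustain 256 packets per jiffy to stay up.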
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 19/22] Add the basic VMQ APIs. Nobody uses or implements them at the moment, but that will change shortly.
This includes various bits of patches which were Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Steven Smith <steven.smith@citrix.com> All bugs are mine, of course. --- include/linux/netdevice.h | 5 + include/linux/netvmq.h | 399 +++++++++++++++++++++++++++++++++++++++++++++ net/Kconfig | 6 + 3 files changed, 410 insertions(+), 0 deletions(-) create mode 100644 include/linux/netvmq.h diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2b7b804..f439800 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -753,6 +753,11 @@ struct net_device #define GSO_MAX_SIZE 65536 unsigned int gso_max_size; +#ifdef CONFIG_NET_VMQ + /* multi-queue for virtualization */ + struct net_vmq *vmq; +#endif + #ifdef CONFIG_DCBNL /* Data Center Bridging netlink ops */ struct dcbnl_rtnl_ops *dcbnl_ops; diff --git a/include/linux/netvmq.h b/include/linux/netvmq.h new file mode 100644 index 0000000..108807b --- /dev/null +++ b/include/linux/netvmq.h @@ -0,0 +1,399 @@ +/****************************************************************************** + * netvmq.h + * + * Interface between the I/O virtualization layer and multi-queue devices to + * enable direct data placement in guest memory + * + * Copyright (c) 2008, Jose Renato Santos, Hewlett-Packard Co. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +/* + * This file defines the vmq API for Linux network device drivers + * to enable the use of multi-queue NICs for virtualization. + * The goal is to enable network device drivers to dedicate + * each RX queue to a specific guest. This means network + * drivers should be able to allocate physical memory from + * the set of memory pages assigned to a specific guest. 
+ * + * The interface between network device drivers and the virtualization + * layer has two components: + * 1) A set of functions implemented by the virtualization layer that + * can be called from new multi-queue network device drivers + * 2) A set of new functions implemented by the device drivers to support + * multi-queue + */ + +#ifndef _NETVMQ_H +#define _NETVMQ_H + +#ifdef CONFIG_NET_VMQ + +#include <linux/netdevice.h> + +/* status flags for vmq_queue struct */ +/* allocated/free queue*/ +#define _VMQ_queue_allocated (0) +#define VMQ_queue_allocated (1U<<_VMQ_queue_allocated) + +/* queue type. RX/TX */ +#define _VMQ_queue_rx (1) +#define VMQ_queue_rx (1U<<_VMQ_queue_rx) + +/* enabled/disabled queue */ +#define _VMQ_queue_enabled (2) +#define VMQ_queue_enabled (1U<<_VMQ_queue_enabled) + +/* queue type used to allocate or check number of available queues */ +#define VMQ_TYPE_RX (1) +#define VMQ_TYPE_TX (2) +#define VMQ_TYPE_TX_RX (VMQ_TYPE_RX | VMQ_TYPE_TX) + + +struct vmq_queue { + /* queue flags - VMQ_queue_* */ + unsigned int flags; + /* pointer to opaque struct with guest information */ + /* format is specific to the virtualization layer used */ + void *guest; + /* pointer to opaque struct in device driver */ + void *devqueue; +}; +typedef struct vmq_queue vmq_queue_t; + +struct net_vmq { + /* pointer to device driver specific functions for multi-queue */ + + int (*avail_queues)(struct net_device *netdev, + unsigned int queue_type); + int (*alloc_queue)(struct net_device *netdev, + unsigned int queue_type); + int (*free_queue)(struct net_device *netdev, int queue); + int (*get_maxsize)(struct net_device *netdev); + int (*get_size)(struct net_device *netdev, int queue); + int (*set_size)(struct net_device *netdev, int queue, int size); + int (*set_mac)(struct net_device *netdev, int queue, u8 *mac_addr); + int (*set_vlan)(struct net_device *netdev, int queue, int vlan_id); + int (*enable)(struct net_device *netdev, int queue); + int (*disable)(struct net_device *netdev, int queue); + + /* maximum number of vm queues that device can allocate */ + int nvmq; + + /* Variable size Vector with queues info */ + /* nvmq defines the vector size */ + vmq_queue_t *queue; +}; +typedef struct net_vmq net_vmq_t; + +/** + * alloc_vmq - Allocate net_vmq struct used for multi-queue devices + * @max_queue: Maximum number of queues that can be allocated + * for virtualization + */ +static inline net_vmq_t *alloc_vmq(int max_queues) +{ + net_vmq_t *vmq; + vmq = kzalloc(sizeof(net_vmq_t), GFP_KERNEL); + if (!vmq) + return NULL; + vmq->queue = kzalloc(max_queues * sizeof(vmq_queue_t), GFP_KERNEL); + if (!vmq->queue) { + kfree(vmq); + return NULL; + } + return vmq; +} + +/** + * free_vmq - Free net_vmq struct + * @vmq: pointer to net_vmq struct + */ +static inline void free_vmq(net_vmq_t *vmq) +{ + kfree(vmq->queue); + kfree(vmq); +} + +/*================================================================* + * 1) Functions provided by the virtualization layer to support * + * multi-queue devices. * + * Device drivers that support multi-queue should use these new * + * functions instead of the ones they replace * + *================================================================*/ + + +/* vmq_alloc_skb : This function should be used instead of the usual + * netdev_alloc_skb() in order to post RX buffers to a RX queue + * dedicated to a guest. 
Queues not dedicated to a guest should
+ * use the regular netdev_alloc_skb() function
+ *
+ * It will return buffers from memory belonging to a given guest
+ * The device driver should not try to change the data alignment
+ * or change the skb data pointer in any way.
+ * The function should already return an skb with the right alignment
+ *
+ * The device driver should be prepared to handle a NULL return value
+ * indicating no memory for that guest is currently available. In this case
+ * the device driver should only postpone the buffer allocation
+ * (probably until the next buffer is used by the device) and continue
+ * operating with the previously posted buffers
+ *
+ * netdev: network device allocating the skb
+ * queue: Queue id of a queue dedicated to a guest
+ *        individual queues are identified by an integer in
+ *        the range [0, MAX-1]. Negative values are used to indicate error
+ *        The maximum number of queues (MAX) is determined by the device
+ *
+ * length: size to allocate
+ */
+struct sk_buff *vmq_alloc_skb(struct net_device *netdev, int queue,
+			      unsigned int length);
+
+
+/* vmq_free_skb : Free an skb allocated with vmq_alloc_skb()
+ *
+ * skb: socket buffer to be freed
+ * queue: Queue id of a queue dedicated to a guest
+ * We could add a qid field in the sk_buff struct and avoid passing it
+ * as a parameter in vmq_free_skb() and vmq_netif_rx()
+ */
+void vmq_free_skb(struct sk_buff *skb, int queue);
+
+/* vmq_alloc_page : Allocate full pages from guest memory.
+ * This can only be used when the device MTU is larger than a page
+ * and multiple pages are needed to receive a packet.
+ *
+ * Similarly to vmq_alloc_skb(),
+ * the device driver should be prepared to handle a NULL return value
+ * indicating no memory for that guest is currently available. In this case
+ * the device driver should only postpone the buffer allocation
+ * (probably until the next buffer is used by the device) and continue
+ * operating with the previously posted buffers
+ *
+ * netdev: network device allocating the skb
+ * queue: Queue id of a queue dedicated to a guest
+ *        individual queues are identified by an integer in
+ *        the range [0, MAX-1]. Negative values are used to indicate error
+ *        The maximum number of queues (MAX) is determined by the device
+ */
+struct page *vmq_alloc_page(struct net_device *netdev, int queue);
+
+/* vmq_free_page : Free a guest page allocated with vmq_alloc_page()
+ *
+ * page: page to be freed
+ * queue: Queue id of a queue dedicated to a guest
+ */
+void vmq_free_page(struct page *page, int queue);
+
+/*
+ * vmq_netif_rx: This function is a replacement for the generic netif_rx()
+ * and allows packets received on a particular queue to be forwarded directly
+ * to a particular guest, bypassing the regular network stack (bridge in xen).
+ * In Xen this function will be implemented by the Xen netback driver.
+ * The use of this function by the driver is optional and may be configured
+ * using a kernel CONFIG option (CONFIG option to be defined)
+ *
+ * skb: Received socket buffer
+ * queue: Queue id of a queue dedicated to a guest
+ */
+int vmq_netif_rx(struct sk_buff *skb, int queue);
+
+/*==============================================================*
+ * 2) New device driver functions for multi-queue devices       *
+ *==============================================================*/
+
+/* vmq_avail_queues: Returns number of available queues that can be allocated
+ * It does not include already allocated queues or queues used for receive
+ * side scaling. It should return 0 when vmq_alloc_queue() would fail
+ *
+ * netdev: network device
+ * queue_type: Queue type, (VMQ_TYPE_*)
+ * RETURN VALUE:
+ * number of available queues;
+ * a negative value indicates error
+ */
+static inline int vmq_avail_queues(struct net_device *netdev,
+				   unsigned int queue_type)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->avail_queues(netdev, queue_type);
+}
+
+/* vmq_alloc_queue: allocate a queue
+ *
+ * netdev: network device
+ * queue_type: Queue type, (VMQ_TYPE_*)
+ * RETURN VALUE:
+ * queue id of the allocated queue (an integer in the range [0, MAX-1],
+ * where MAX is the maximum number of queues);
+ * a negative value indicates error
+ */
+static inline int vmq_alloc_queue(struct net_device *netdev,
+				  unsigned int queue_type)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->alloc_queue(netdev, queue_type);
+}
+
+/* vmq_free_queue: free a queue previously allocated with vmq_alloc_queue()
+ *
+ * netdev: network device
+ * queue: id of queue to be freed
+ * RETURN VALUE:
+ * a negative value indicates error;
+ * returns 0 on success
+ */
+static inline int vmq_free_queue(struct net_device *netdev, int queue)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->free_queue(netdev, queue);
+}
+
+/* vmq_get_maxsize: Get maximum size that can be set for a queue
+ * (max number of HW descriptors)
+ *
+ * netdev: network device
+ * RETURN VALUE:
+ * max size of a queue
+ * a negative value indicates error
+ */
+static inline int vmq_get_maxsize(struct net_device *netdev)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->get_maxsize(netdev);
+}
+
+/* vmq_get_size: Get size of queue (number of HW descriptors)
+ *
+ * netdev: network device
+ * queue: queue id
+ * RETURN VALUE:
+ * size of queue
+ * a negative value indicates error
+ */
+static inline int vmq_get_size(struct net_device *netdev, int queue)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->get_size(netdev, queue);
+}
+
+/* vmq_set_size: Set size of queue (number of HW descriptors)
+ * It can return an error if the size exceeds the maximum hw capability
+ * We will probably need a function to return the maximum
+ * HW queue size, but we can live without it for now
+ * netdev: network device
+ * queue: queue id
+ * size: Queue size (number of HW descriptors)
+ * RETURN VALUE:
+ * a negative value indicates error,
+ * returns 0 on success
+ */
+static inline int vmq_set_size(struct net_device *netdev, int queue, int size)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->set_size(netdev, queue, size);
+}
+
+/* vmq_set_mac: Set MAC address filter for a queue
+ *
+ * netdev: network device
+ * queue: queue id
+ * mac_addr: pointer to a 6 byte array with the MAC address
+ *           MAC address FF:FF:FF:FF:FF:FF is used to reset the filter
+ * RETURN VALUE:
+ * a negative value indicates error,
+ * returns 0 on success
+ */
+static inline int vmq_set_mac(struct net_device *netdev, int queue,
+			      u8 *mac_addr)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->set_mac(netdev, queue, mac_addr);
+}
+
+/* vmq_set_vlan: Set VLAN filter for a queue
+ *
+ * netdev: network device
+ * queue: queue id
+ * vlan_id: VLAN id
+ *          The invalid VLAN id -1 is used to reset the VLAN filter
+ * RETURN VALUE:
+ * a negative value indicates error,
+ * returns 0 on success
+ */
+static inline int vmq_set_vlan(struct net_device *netdev, int queue,
+			       int vlan_id)
+{
+	if (!netdev->vmq)
+		return -EINVAL;
+	return netdev->vmq->set_vlan(netdev, queue, vlan_id);
+}
+ +/* vmq_enable_queue: Enable queue + * For receive queues this will trigger allocating and posting buffers + * + * netdev: network device + * queue: queue id + * RETURN VALUE: + * a negative value indicates error, + * returns 0 on success + */ +static inline int vmq_enable_queue(struct net_device *netdev, int queue) +{ + if (!netdev->vmq) + return -EINVAL; + return netdev->vmq->enable(netdev, queue); +} + +/* vmq_disable_queue: Disable queue + * This will flush all buffers in the queue and will free the respective + * skb''s or fragment pages + * + * netdev: network device + * queue_id: queue id + * RETURN VALUE: + * a negative value indicates error, + * returns 0 on success + */ +static inline int vmq_disable_queue(struct net_device *netdev, int queue) +{ + if (!netdev->vmq) + return -EINVAL; + return netdev->vmq->disable(netdev, queue); +} + +#endif /* CONFIG_NET_VMQ */ + +#endif /* _NETVMQ_H */ diff --git a/net/Kconfig b/net/Kconfig index 0732cb3..7837a9e 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -37,6 +37,12 @@ source "net/unix/Kconfig" source "net/xfrm/Kconfig" source "net/iucv/Kconfig" +config NET_VMQ + bool "Virtual-machine multi-queue support" + default n + help + Add support for the VMQ features of certain modern network cards. + config INET bool "TCP/IP networking" ---help--- -- 1.6.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 20/22] Posted buffer mode support.
In this mode, domains are expected to pre-post a number of receive buffers to their peer, and the peer will then copy packets into those buffers when it wants to transmit. This is similar to the way netchannel1 worked. This isn''t particularly useful by itself, because the software-only implementation is slower than the other transmission modes, and is disabled unless you set a #define, but it''s necessary for VMQ support. Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/xen/netchannel2/Makefile | 2 +- drivers/xen/netchannel2/chan.c | 37 ++ drivers/xen/netchannel2/netback2.c | 12 + drivers/xen/netchannel2/netchannel2_core.h | 92 ++++ drivers/xen/netchannel2/netfront2.c | 2 + drivers/xen/netchannel2/posted_buffers.c | 781 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/recv_packet.c | 5 + drivers/xen/netchannel2/xmit_packet.c | 21 + include/xen/interface/io/netchannel2.h | 71 +++ 9 files changed, 1022 insertions(+), 1 deletions(-) create mode 100644 drivers/xen/netchannel2/posted_buffers.c diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index 9c4f97a..11a257e 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -1,7 +1,7 @@ obj-$(CONFIG_XEN_NETCHANNEL2) += netchannel2.o netchannel2-objs := chan.o netchan2.o rscb.o util.o \ - xmit_packet.o offload.o recv_packet.o poll.o \ + posted_buffers.o xmit_packet.o offload.o recv_packet.o poll.o \ receiver_map.o ifeq ($(CONFIG_XEN_NETDEV2_BACKEND),y) diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index fa52353..060b49b 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -89,6 +89,15 @@ retry: nc2_handle_set_max_fragments_per_packet(nc, ncrp, &hdr); break; + case NETCHANNEL2_MSG_POST_BUFFER: + nc2_handle_post_buffer(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_RETURN_POSTED_BUFFER: + nc2_handle_return_posted_buffer(nc, ncrp, &hdr); + break; + case NETCHANNEL2_MSG_SET_NR_POSTED_BUFFERS: + nc2_handle_set_nr_posted_buffers(nc, ncrp, &hdr); + break; case NETCHANNEL2_MSG_BYPASS_FRONTEND: nc2_handle_bypass_frontend(nc, ncrp, &hdr); break; @@ -172,8 +181,12 @@ static void flush_rings(struct netchannel2_ring_pair *ncrp) advertise_max_fragments_per_packet(ncrp); if (ncrp == &nc->rings) { + nc2_replenish_rx_buffers(nc); + nc2_return_pending_posted_buffers(nc); if (nc->need_advertise_offloads) advertise_offloads(nc); + if (nc->need_advertise_tx_buffers) + nc2_advertise_tx_buffers(nc); nc2_advertise_bypasses(nc); nc2_crank_aux_ring_state_machine(nc); nc2_autobypass_make_suggestions(nc); @@ -454,6 +467,13 @@ struct netchannel2 *nc2_new(struct xenbus_device *xd) nc2_release(nc); return NULL; } + INIT_LIST_HEAD(&nc->rx_buffers); + INIT_LIST_HEAD(&nc->unused_rx_buffers); + INIT_LIST_HEAD(&nc->unposted_rx_buffers); + INIT_LIST_HEAD(&nc->avail_tx_buffers); + nc->nr_avail_tx_buffers = 0; + INIT_LIST_HEAD(&nc->unused_tx_buffer_slots); + INIT_LIST_HEAD(&nc->pending_tx_buffer_return); if (local_trusted) { if (init_receive_map_mode() < 0) { @@ -513,8 +533,13 @@ void nc2_release(struct netchannel2 *nc) nc2_queue_purge(&nc->rings, &nc->pending_skbs); + /* Should have been released when we detached. 
*/ + BUG_ON(nc->rx_buffer_structs); + release_bypasses(nc); + unprepare_tx_buffers(nc); + free_netdev(nc->net_device); } @@ -604,6 +629,9 @@ int nc2_attach_rings(struct netchannel2 *nc, static void _detach_rings(struct netchannel2_ring_pair *ncrp) { + if (ncrp == &ncrp->interface->rings) + nc2_posted_buffer_rx_forget(ncrp->interface); + spin_lock_bh(&ncrp->lock); /* We need to release all of the pending transmission packets, because they''re never going to complete now that we''ve lost @@ -795,6 +823,15 @@ static int process_ring(struct napi_struct *napi, skb = __skb_dequeue(&ncrp->pending_tx_queue); } while (skb != NULL); + /* If we''ve transmitted on the main ring then we may + have made use of the hypercall batcher. Flush it. + This must happen before we flush the rings, since + that''s when the PACKET messages will be made + visible to the other end. */ + if (ncrp == &nc->rings) + flush_hypercall_batcher(&nc->batcher, + nc2_posted_on_gntcopy_fail); + flush_rings(ncrp); while ((skb = __skb_dequeue(&ncrp->release_on_flush_batcher))) diff --git a/drivers/xen/netchannel2/netback2.c b/drivers/xen/netchannel2/netback2.c index cf52839..129ef81 100644 --- a/drivers/xen/netchannel2/netback2.c +++ b/drivers/xen/netchannel2/netback2.c @@ -11,6 +11,8 @@ #include "netchannel2_endpoint.h" #include "netchannel2_uspace.h" +#define NR_TX_BUFS 256 + static atomic_t next_handle; /* A list of all currently-live netback2 interfaces. */ static LIST_HEAD(all_netbacks); @@ -172,6 +174,11 @@ static int attach_to_frontend(struct netback2 *nd) return 0; } +static void nb2_shutdown(struct netchannel2 *nc) +{ + nc2_set_nr_tx_buffers(nc, 0); +} + static void frontend_changed(struct xenbus_device *xd, enum xenbus_state frontend_state) { @@ -189,6 +196,8 @@ static void frontend_changed(struct xenbus_device *xd, * detached, and this is pointless but harmless.) */ detach_from_frontend(nb); + nc2_set_nr_tx_buffers(nb->chan, NR_TX_BUFS); + /* Tell the frontend what sort of rings we''re willing to accept. */ xenbus_printf(XBT_NIL, nb->xenbus_device->nodename, @@ -222,6 +231,7 @@ static void frontend_changed(struct xenbus_device *xd, break; case XenbusStateClosing: + nb2_shutdown(nb->chan); detach_from_frontend(nb); xenbus_switch_state(xd, XenbusStateClosed); break; @@ -257,6 +267,8 @@ static int netback2_uevent(struct xenbus_device *xd, static void netback2_shutdown(struct xenbus_device *xd) { + struct netback2 *nb = xenbus_device_to_nb2(xd); + nb2_shutdown(nb->chan); xenbus_switch_state(xd, XenbusStateClosing); } diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 2a5ed06..1939cbb 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -23,6 +23,10 @@ * pointer; see the txp_slot stuff later. */ #define NR_TX_PACKETS 256 +/* No matter what the other end wants, we never post more than this + number of RX buffers to it. */ +#define MAX_POSTED_BUFFERS (2048+256) + /* A way of keeping track of a mapping of a bunch of grant references into a contigous chunk of virtual address space. This is used for things like multi-page rings. 
*/ @@ -37,6 +41,7 @@ enum transmit_policy { transmit_policy_unknown = 0, transmit_policy_first = 0xf001, transmit_policy_grant = transmit_policy_first, + transmit_policy_post, transmit_policy_map, transmit_policy_small, transmit_policy_last = transmit_policy_small @@ -89,6 +94,8 @@ static inline nc2_txp_index_t txp_get_next_free(struct txp_slot *slot) /* This goes in struct sk_buff::cb */ struct skb_cb_overlay { + struct list_head buffers; /* Only if we''re using the posted + buffer strategy. */ struct txp_slot *tp; unsigned nr_fragments; grant_ref_t gref_pool; @@ -369,11 +376,67 @@ struct netchannel2 { struct nc2_incoming_bypass_suggestions incoming_bypass_suggestions; #endif + /* Infrastructure for managing buffers which we''ve posted to + the other end. These are all protected by the lock. */ + /* A list of nx2_rx_buffer structures, threaded on list, which + we''ve posted to the other end. */ + struct list_head rx_buffers; + /* Buffers which we''ve allocated but not yet sent to the other + end. */ + struct list_head unposted_rx_buffers; + /* Buffers which are available but not yet allocated. */ + struct list_head unused_rx_buffers; + /* The number of buffers in the rx_buffers list. */ + unsigned nr_rx_buffers; + /* The maximum number of buffers which we can ever have + outstanding, and the size of the rx_buffer_structs + array. */ + unsigned max_nr_rx_buffers; + /* A bunch of nc2_rx_buffer structures which can be used for + RX buffers. */ + struct nc2_rx_buffer *rx_buffer_structs; + /* Set if we''re sufficiently far through device shutdown that + posting more RX buffers would be a bad idea. */ + uint8_t dont_post_buffers; + + /* Infrastructure for managing buffers which the other end has + posted to us. Protected by the lock. */ + /* A list of nc2_tx_buffer structures, threaded on list, which + contains all tx buffers which have been posted by the + remote. */ + struct list_head avail_tx_buffers; + /* A list of nc2_tx_buffer structures which the other end + hasn''t populated yet. */ + struct list_head unused_tx_buffer_slots; + /* A list of nc2_tx_buffer structures which we need to return + to the other end. */ + struct list_head pending_tx_buffer_return; + /* Some pre-allocated nc2_tx_buffer structures. We have to + pre-allocate, because we always need to be able to respond + to a POST_BUFFER message (up to some limit). */ + struct nc2_tx_buffer *tx_buffers; + /* Non-zero if we need to send the other end a + SET_NR_POSTED_BUFFERS message. */ + uint8_t need_advertise_tx_buffers; + /* Number of tx buffers. This is the actual number of slots + in the @tx_buffers array. */ + uint32_t nr_tx_buffers; + /* Number of available tx buffers. The length of the + * avail_tx_buffers list. */ + uint32_t nr_avail_tx_buffers; + /* ``Configured'''' number of tx buffers. We only actually + allocate any TX buffers when the local interface is up, but + this is set to the desired number of buffers all the + time. */ + uint32_t configured_nr_tx_buffers; + /* Updates are protected by the lock. This can be read at any * time without holding any locks, and the rest of Linux is * expected to cope. 
*/ struct net_device_stats stats; + struct hypercall_batcher batcher; + #ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS struct nc2_auto_bypass auto_bypass; #endif @@ -680,11 +743,26 @@ struct sk_buff *handle_receiver_copy_packet(struct netchannel2 *nc, struct netchannel2_msg_hdr *hdr, unsigned nr_frags, unsigned frags_off); +struct sk_buff *handle_pre_posted_packet(struct netchannel2 *nc, + struct netchannel2_msg_packet *msg, + struct netchannel2_msg_hdr *hdr, + unsigned nr_frags, + unsigned frags_off); struct sk_buff *handle_receiver_map_packet(struct netchannel2 *nc, struct netchannel2_msg_packet *msg, struct netchannel2_msg_hdr *hdr, unsigned nr_frags, unsigned frags_off); +void nc2_handle_return_posted_buffer(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_post_buffer(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_handle_set_nr_posted_buffers(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr); +void nc2_advertise_tx_buffers(struct netchannel2 *nc); enum prepare_xmit_result { PREP_XMIT_OKAY = 0, @@ -704,9 +782,20 @@ void xmit_grant(struct netchannel2_ring_pair *ncrp, int use_subpage_grants, volatile void *msg); +int prepare_xmit_allocate_post(struct netchannel2 *nc, + struct sk_buff *skb); +void xmit_post(struct netchannel2 *nc, + struct sk_buff *skb, + volatile void *msg); + +void nc2_replenish_rx_buffers(struct netchannel2 *nc); + void queue_finish_packet_message(struct netchannel2_ring_pair *ncrp, uint32_t id, uint8_t flags); +void nc2_return_pending_posted_buffers(struct netchannel2 *nc); +void nc2_posted_buffer_rx_forget(struct netchannel2 *nc); + int allocate_txp_slot(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb); void release_txp_slot(struct netchannel2_ring_pair *ncrp, @@ -715,6 +804,8 @@ void release_txp_slot(struct netchannel2_ring_pair *ncrp, void release_tx_packet(struct netchannel2_ring_pair *ncrp, struct sk_buff *skb); +void unprepare_tx_buffers(struct netchannel2 *nc); + void fetch_fragment(struct netchannel2_ring_pair *ncrp, unsigned idx, struct netchannel2_fragment *frag, @@ -749,6 +840,7 @@ irqreturn_t nc2_int(int irq, void *dev_id); void cleanup_ring_pair(struct netchannel2_ring_pair *ncrp); void nc2_rscb_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); +void nc2_posted_on_gntcopy_fail(void *ctxt, gnttab_copy_t *gop); int init_receive_map_mode(void); void deinit_receive_map_mode(void); diff --git a/drivers/xen/netchannel2/netfront2.c b/drivers/xen/netchannel2/netfront2.c index 9b2e2ec..e06fa77 100644 --- a/drivers/xen/netchannel2/netfront2.c +++ b/drivers/xen/netchannel2/netfront2.c @@ -356,6 +356,8 @@ static void backend_changed(struct xenbus_device *xd, /* Backend has advertised the ring protocol. Allocate the rings, and tell the backend about them. 
*/ + nc2_set_nr_tx_buffers(nf->chan, 0); + err = 0; if (!nf->attached) err = allocate_rings(nf, xd->otherend_id); diff --git a/drivers/xen/netchannel2/posted_buffers.c b/drivers/xen/netchannel2/posted_buffers.c new file mode 100644 index 0000000..96de7da --- /dev/null +++ b/drivers/xen/netchannel2/posted_buffers.c @@ -0,0 +1,781 @@ +/* Support for receiver-posted buffers */ +#include <linux/kernel.h> +#include <linux/delay.h> +#include <linux/skbuff.h> +#include <linux/netdevice.h> +#include <xen/evtchn.h> +#include <xen/gnttab.h> +#include <xen/xenbus.h> +#include <xen/live_maps.h> +#include "netchannel2_endpoint.h" +#include "netchannel2_core.h" + +#define POSTED_BUFFER_SIZE PAGE_SIZE + +/* A poison value to make certain buffer management errors more + * obvious. */ +#define RX_BUFFER_BIAS 0xbeef0000 + +static void prepare_tx_buffers(struct netchannel2 *nc); + +/* --------------------------- Receive -------------------------------- */ + +/* A buffer which we have allocated for the other end to send us + packets in. */ +struct nc2_rx_buffer { + struct list_head list; + void *buffer; + grant_ref_t gref; + uint8_t is_posted; /* Set if this buffer is available to the + other end. */ +}; + +/* The other end just sent us a buffer id. Convert it back to an + nc2_rx_buffer structure. Returns NULL if the id is invalid, or if + it isn''t currently owned by the other end. */ +static struct nc2_rx_buffer *find_rx_buffer(struct netchannel2 *nc, + uint32_t id) +{ + struct nc2_rx_buffer *rxb; + id -= RX_BUFFER_BIAS; + if (id >= nc->max_nr_rx_buffers) + return NULL; + rxb = &nc->rx_buffer_structs[id]; + if (rxb->is_posted) + return rxb; + else + return NULL; +} + +/* Post a buffer to the other endpoint immediately. Assumes that the + caller has already checked that there is enough space available on + the ring. */ +static void _nc2_post_buffer(struct netchannel2 *nc, + struct nc2_rx_buffer *rxb) +{ + struct netchannel2_msg_post_buffer msg; + + BUG_ON(!nc->remote_trusted); + + msg.id = rxb - nc->rx_buffer_structs + RX_BUFFER_BIAS; + msg.gref = rxb->gref; + msg.off_in_page = offset_in_page(rxb->buffer); + msg.size = POSTED_BUFFER_SIZE; + + nc2_send_message(&nc->rings.prod_ring, NETCHANNEL2_MSG_POST_BUFFER, + 0, &msg, sizeof(msg)); +} + +/* Push out all pending buffer posts, until the ring becomes full or + we run out of buffers to post. Called under the lock. */ +static void push_rx_buffer_posts(struct netchannel2 *nc) +{ + struct nc2_rx_buffer *buf; + + while (!list_empty(&nc->unposted_rx_buffers) && + nc2_can_send_payload_bytes(&nc->rings.prod_ring, + sizeof(struct netchannel2_msg_post_buffer))) { + buf = list_entry(nc->unposted_rx_buffers.next, + struct nc2_rx_buffer, + list); + _nc2_post_buffer(nc, buf); + buf->is_posted = 1; + list_move(&buf->list, &nc->rx_buffers); + nc->nr_rx_buffers++; + + nc->rings.pending_time_sensitive_messages = 1; + } +} + +/* Allocate more RX buffers until we reach our target number of RX + buffers and post them to the other endpoint. Call under the + lock. 
 */
+void nc2_replenish_rx_buffers(struct netchannel2 *nc)
+{
+	struct nc2_rx_buffer *rb;
+
+	if (nc->dont_post_buffers || !nc->remote_trusted)
+		return;
+
+	while (!list_empty(&nc->unused_rx_buffers)) {
+		rb = list_entry(nc->unused_rx_buffers.next,
+				struct nc2_rx_buffer,
+				list);
+		rb->buffer = (void *)__get_free_pages(GFP_ATOMIC|__GFP_NOWARN,
+						      0);
+		if (!rb->buffer)
+			break;
+		rb->gref =
+			gnttab_grant_foreign_access(nc->rings.otherend_id,
+						    virt_to_mfn(rb->buffer),
+						    0);
+		if ((int)rb->gref < 0) {
+			free_page((unsigned long)rb->buffer);
+			break;
+		}
+
+		list_move(&rb->list, &nc->unposted_rx_buffers);
+	}
+
+	push_rx_buffer_posts(nc);
+}
+
+/* The other endpoint has used @rxb to transmit part of the packet
+   which we're going to represent by @skb.  Attach it to the packet's
+   fragment list.  The caller should make sure that @skb currently has
+   less than MAX_SKB_FRAGS in its shinfo area, and that @size and
+   @offset are appropriate for the buffer.  @size gives the size of
+   the fragment, and @offset gives its offset relative to the start of
+   the receive buffer. */
+/* This effectively transfers ownership of the buffer's page from @rxb
+   to @skb. */
+static void attach_buffer_to_skb(struct sk_buff *skb,
+				 struct nc2_rx_buffer *rxb,
+				 unsigned size,
+				 unsigned offset)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	skb_frag_t *frag = &shinfo->frags[shinfo->nr_frags];
+
+	BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
+
+	frag->page = virt_to_page(rxb->buffer);
+	frag->page_offset = offset_in_page(rxb->buffer) + offset;
+	frag->size = size;
+	skb->truesize += size;
+	skb->data_len += size;
+	skb->len += size;
+
+	shinfo->nr_frags++;
+}
+
+/* The other end has sent us a packet using pre-posted buffers.  Parse
+   it up and return an skb representing the packet, or NULL on
+   error. */
+struct sk_buff *handle_pre_posted_packet(struct netchannel2 *nc,
+					 struct netchannel2_msg_packet *msg,
+					 struct netchannel2_msg_hdr *hdr,
+					 unsigned nr_frags,
+					 unsigned frags_off)
+{
+	struct netchannel2_fragment frag;
+	struct sk_buff *skb;
+	unsigned x;
+	struct nc2_rx_buffer *rxb;
+	int is_bad;
+	int dropped;
+	unsigned prefix_len;
+
+#define SKB_MIN_PAYLOAD_SIZE 128
+
+	dropped = 0;
+	is_bad = 0;
+	if (msg->prefix_size < SKB_MIN_PAYLOAD_SIZE)
+		prefix_len = SKB_MIN_PAYLOAD_SIZE;
+	else
+		prefix_len = msg->prefix_size;
+	/* We don't enforce the MAX_PACKET_BYTES limit here.  That's
+	   okay, because the amount of memory which the other end can
+	   cause us to allocate is still limited, which is all that's
+	   really needed.
*/ + skb = dev_alloc_skb(prefix_len + NET_IP_ALIGN); + if (skb == NULL) { + is_bad = 1; + dropped = 1; + } else { + skb_reserve(skb, NET_IP_ALIGN); + nc2_copy_from_ring_off(&nc->rings.cons_ring, + skb_put(skb, msg->prefix_size), + msg->prefix_size, + frags_off + nr_frags * sizeof(frag)); + } + + for (x = 0; x < nr_frags; x++) { + fetch_fragment(&nc->rings, x, &frag, frags_off); + rxb = find_rx_buffer(nc, frag.pre_post.id); + if (rxb == NULL) { + pr_debug("RX in bad frag %d.\n", frag.pre_post.id); + is_bad = 1; + continue; + } + + if (!is_bad && + frag.size <= PAGE_SIZE && + frag.off < PAGE_SIZE && + frag.size + frag.off <= POSTED_BUFFER_SIZE && + gnttab_end_foreign_access_ref(rxb->gref)) { + gnttab_free_grant_reference(rxb->gref); + attach_buffer_to_skb(skb, rxb, frag.size, + frag.off); + + } else { + is_bad = 1; + gnttab_end_foreign_access(rxb->gref, + (unsigned long)rxb->buffer); + } + rxb->gref = 0; + rxb->buffer = NULL; + rxb->is_posted = 0; + nc->nr_rx_buffers--; + list_move(&rxb->list, &nc->unused_rx_buffers); + } + + if (is_bad) { + pr_debug("Received skb is bad!\n"); + if (skb) + kfree_skb(skb); + skb = NULL; + if (dropped) + nc->stats.rx_dropped++; + else + nc->stats.rx_errors++; + } else { + if (skb_headlen(skb) < SKB_MIN_PAYLOAD_SIZE) + pull_through(skb, + SKB_MIN_PAYLOAD_SIZE - skb_headlen(skb)); + } + + return skb; +} + +/* Release a single RX buffer and return it to the unused list. */ +static void release_rx_buffer(struct netchannel2 *nc, + struct nc2_rx_buffer *rxb) +{ + rxb->is_posted = 0; + gnttab_end_foreign_access(rxb->gref, + (unsigned long)rxb->buffer); + nc->nr_rx_buffers--; + list_move(&rxb->list, &nc->unused_rx_buffers); +} + +/* The other endpoint has finished with one of our RX buffers. Do + something suitable with it. */ +void nc2_handle_return_posted_buffer(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_return_posted_buffer msg; + struct nc2_rx_buffer *rxb; + + if (hdr->size != sizeof(msg)) { + pr_debug("return rx buffer message wrong size %d != %zd\n", + hdr->size, sizeof(msg)); + return; + } + if (ncrp != &nc->rings) { + pr_debug("Return a posted buffer on an ancillary ring!\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, hdr->size); + rxb = find_rx_buffer(nc, msg.id); + if (!rxb) { + pr_debug("Other end returned buffer id %d which we didn''t know about.\n", + msg.id); + return; + } + release_rx_buffer(nc, rxb); +} + +/* Tear down any remaining RX buffers. The caller should have done + something to make sure that the other end isn''t going to try and + use them any more. 
*/ +void nc2_posted_buffer_rx_forget(struct netchannel2 *nc) +{ + struct nc2_rx_buffer *rxb, *next; + + spin_lock_bh(&nc->rings.lock); + list_for_each_entry_safe(rxb, next, &nc->rx_buffers, list) + release_rx_buffer(nc, rxb); + list_for_each_entry_safe(rxb, next, &nc->unposted_rx_buffers, list) + release_rx_buffer(nc, rxb); + + BUG_ON(!list_empty(&nc->rx_buffers)); + BUG_ON(!list_empty(&nc->unposted_rx_buffers)); + + INIT_LIST_HEAD(&nc->unused_rx_buffers); + kfree(nc->rx_buffer_structs); + nc->rx_buffer_structs = NULL; + nc->max_nr_rx_buffers = 0; + spin_unlock_bh(&nc->rings.lock); +} + +void nc2_handle_set_nr_posted_buffers(struct netchannel2 *nc, + struct netchannel2_ring_pair *ncrp, + struct netchannel2_msg_hdr *hdr) +{ + struct netchannel2_msg_set_nr_posted_buffers msg; + struct nc2_rx_buffer *buffer_structs; + unsigned x; + unsigned nr_buffers; + + if (ncrp != &nc->rings) { + pr_debug("set_nr_posted_buffers on an ancillary ring!\n"); + return; + } + if (hdr->size != sizeof(msg)) { + pr_debug("set nr posted buffers message wrong size %d != %zd\n", + hdr->size, sizeof(msg)); + return; + } + if (nc->rx_buffer_structs != NULL) { + pr_debug("Other end tried to change posted buffer settings when they were already set.\n"); + return; + } + nc2_copy_from_ring(&nc->rings.cons_ring, &msg, hdr->size); + if (msg.nr_buffers <= MAX_POSTED_BUFFERS) { + nr_buffers = msg.nr_buffers; + } else { + pr_debug("remote recommended %d buffers, using %d\n", + msg.nr_buffers, MAX_POSTED_BUFFERS); + nr_buffers = MAX_POSTED_BUFFERS; + } + + buffer_structs = kzalloc(sizeof(struct nc2_rx_buffer) * nr_buffers, + GFP_ATOMIC); + if (buffer_structs == NULL) { + printk(KERN_WARNING "failed to allocate %d rx buffers", + nr_buffers); + return; + } + + for (x = 0; x < nr_buffers; x++) + list_add_tail(&buffer_structs[x].list, + &nc->unused_rx_buffers); + nc->max_nr_rx_buffers = nr_buffers; + nc->rx_buffer_structs = buffer_structs; + nc->dont_post_buffers = 0; +} + + +/* -------------------------- Transmit ------------------------------- */ + +/* A buffer which the other end has provided us which we can use to + transmit packets to it. */ +struct nc2_tx_buffer { + struct list_head list; + uint32_t id; /* ID assigned by the remote endpoint. */ + grant_ref_t gref; + uint16_t off_in_page; + uint16_t size; + grant_handle_t grant_handle; +}; + +/* A representation of a packet which is halfway through being + prepared for transmission. */ +struct post_packet_plan { + unsigned off_in_cur_buffer; + struct nc2_tx_buffer *cur_buffer; + + /* We assemble the next fragment in work_frag, and then copy + to output_frag once it''s done. */ + struct netchannel2_fragment work_frag; + volatile struct netchannel2_fragment *output_frag; +}; + +/* add a buffer slot to list of unused buffer slots after it has been + * returned to other end */ +static void free_tx_buffer(struct netchannel2 *nc, + struct nc2_tx_buffer *buffer) +{ + list_add(&buffer->list, &nc->unused_tx_buffer_slots); +} + +/* A grant copy failed while we were transmitting a packet. That + indicates that the *receiving* domain gave us a bad RX buffer. + We''re too late to send them an error, so there isn''t really + anything we can do to help them. Oh well, nevermind. 
 */
+void nc2_posted_on_gntcopy_fail(void *ctxt,
+				gnttab_copy_t *gop)
+{
+	printk(KERN_WARNING "Grant copy failed for transmit; domain provided bad RX buffer (source %x, %x, %x, dest %x, %x, %x, len %x, flags %x, status %d).\n",
+	       gop->source.u.ref, gop->source.domid, gop->source.offset,
+	       gop->dest.u.ref, gop->dest.domid, gop->dest.offset,
+	       gop->len, gop->flags, gop->status);
+}
+
+/* Advance to the next transmit buffer/fragment in the packet. */
+static void advance_to_next_buffer(struct post_packet_plan *plan)
+{
+	BUG_ON(plan->off_in_cur_buffer < plan->cur_buffer->size);
+	plan->cur_buffer = list_entry(plan->cur_buffer->list.next,
+				      struct nc2_tx_buffer,
+				      list);
+	plan->off_in_cur_buffer = 0;
+
+	*plan->output_frag = plan->work_frag;
+	plan->output_frag++;
+	memset(&plan->work_frag, 0, sizeof(plan->work_frag));
+	plan->work_frag.pre_post.id = plan->cur_buffer->id;
+}
+
+/* Schedule a copy from a range of bytes in a local page into the
+   packet we're building in @plan.  This cannot cross page or TX
+   buffer boundaries. */
+static void prepare_grant_copy(struct netchannel2 *nc,
+			       struct post_packet_plan *plan,
+			       struct page *page,
+			       unsigned page_off,
+			       unsigned count,
+			       domid_t domid)
+{
+	gnttab_copy_t *gop;
+
+	/* XXX: We don't do any error checking on this grant copy.
+	   That's okay.  There are only two ways a grant copy can
+	   fail:
+
+	   -- The source is bad.  But the source is either in our
+	      local memory (so must be good), or something we've
+	      already mapped (so the grant reference must be good, and
+	      must already be pinned so it can't go bad).  Therefore,
+	      the source must always be good, and we can't fail
+	      because of a bad source.
+
+	   -- The destination is bad.  This could happen if the
+	      receiving domain sent us a bad page to use as an RX
+	      buffer.  In that case, we'll tell the receiving domain
+	      that it received some data in a page when the page is
+	      actually uninitialised.  The worst case is that the
+	      receiving domain ends up copying its own uninitialised
+	      memory to its own userspace.  That's not a problem for
+	      us (because it can't see *our* uninitialised memory),
+	      and if it's a problem for the receiving domain then it
+	      should have been more careful about what memory it gave
+	      us to use as RX buffers.
+
+	   Therefore, the lack of error checking is actually perfectly
+	   safe.
+
+	   (Even if it isn't exactly great software engineering
+	   practice.)
+	*/
+	gop = hypercall_batcher_grant_copy(&nc->batcher,
+					   NULL,
+					   nc2_posted_on_gntcopy_fail);
+	gop->flags = GNTCOPY_dest_gref;
+	if (page_is_tracked(page)) {
+		lookup_tracker_page(page,
+				    &gop->source.domid,
+				    &gop->source.u.ref);
+		gop->flags |= GNTCOPY_source_gref;
+	} else {
+		gop->source.domid = DOMID_SELF;
+		gop->source.u.gmfn = virt_to_mfn(page_address(page));
+	}
+	gop->source.offset = page_off;
+	gop->dest.domid = domid;
+	gop->dest.offset =
+		plan->cur_buffer->off_in_page + plan->off_in_cur_buffer;
+	gop->dest.u.ref = plan->cur_buffer->gref;
+	gop->len = count;
+}
+
+/* Add the bytes from @ptr to @ptr + @size to the packet we're
+   preparing in @plan.  This cannot handle page-crossing local
+   buffers, but will correctly handle buffer-crossing operations. */
+static void prepare_subpage_post(struct netchannel2 *nc,
+				 struct page *page,
+				 unsigned off_in_page,
+				 unsigned size,
+				 struct post_packet_plan *plan)
+{
+	unsigned remaining_in_buffer;
+	unsigned this_time;
+
+	BUG_ON(off_in_page + size > PAGE_SIZE);
+	while (size != 0) {
+		remaining_in_buffer =
+			plan->cur_buffer->size -
+			plan->off_in_cur_buffer;
+		if (remaining_in_buffer == 0) {
+			advance_to_next_buffer(plan);
+			remaining_in_buffer = plan->cur_buffer->size;
+		}
+
+		this_time = size;
+		if (this_time > remaining_in_buffer)
+			this_time = remaining_in_buffer;
+		prepare_grant_copy(nc,
+				   plan,
+				   page,
+				   off_in_page,
+				   this_time,
+				   nc->rings.otherend_id);
+		plan->work_frag.size += this_time;
+		plan->off_in_cur_buffer += this_time;
+
+		size -= this_time;
+		off_in_page += this_time;
+	}
+}
+
+/* Add @skb->data to @skb->tail to the packet which is being prepared
+   in @plan. */
+static void prepare_data_area_post(struct netchannel2 *nc, struct sk_buff *skb,
+				   struct post_packet_plan *plan)
+{
+	void *ptr = skb->data;
+	unsigned len = skb_headlen(skb);
+	unsigned off;
+	unsigned this_time;
+
+	for (off = 0; off < len; off += this_time) {
+		this_time = len - off;
+		if (this_time + offset_in_page(ptr + off) > PAGE_SIZE)
+			this_time = PAGE_SIZE - offset_in_page(ptr + off);
+		prepare_subpage_post(nc,
+				     virt_to_page(ptr + off),
+				     offset_in_page(ptr + off),
+				     this_time,
+				     plan);
+	}
+}
+
+/* Allocate some TX buffers suitable for transmitting @skb out of
+   @nc's pool.  The buffers are chained on @fragments.  On success,
+   returns the number of buffers allocated.  Returns -1 if
+   insufficient buffers are available, in which case no buffers are
+   allocated.  We assume that the packet will be offset by
+   NET_IP_ALIGN bytes in the first fragment so that everything after
+   the ethernet header is properly aligned. */
+static int grab_tx_buffers(struct netchannel2 *nc,
+			   struct sk_buff *skb,
+			   struct list_head *fragments)
+{
+	unsigned bytes_to_transmit;
+	unsigned bytes_planned;
+	struct nc2_tx_buffer *current_buffer, *next;
+	int count;
+
+	INIT_LIST_HEAD(fragments);
+	bytes_planned = 0;
+	bytes_to_transmit = skb->len + NET_IP_ALIGN;
+	count = 0;
+	list_for_each_entry_safe(current_buffer, next, &nc->avail_tx_buffers,
+				 list) {
+		count++;
+		bytes_planned += current_buffer->size;
+		list_move(&current_buffer->list, fragments);
+		if (bytes_planned >= bytes_to_transmit) {
+			BUG_ON(nc->nr_avail_tx_buffers < count);
+			nc->nr_avail_tx_buffers -= count;
+			return count;
+		}
+	}
+	BUG_ON(nc->nr_avail_tx_buffers != count);
+	list_splice_init(fragments, &nc->avail_tx_buffers);
+	return -1;
+}
+
+int prepare_xmit_allocate_post(struct netchannel2 *nc, struct sk_buff *skb)
+{
+	struct skb_cb_overlay *scb;
+	int nr_fragments;
+
+	scb = get_skb_overlay(skb);
+	nr_fragments = grab_tx_buffers(nc, skb, &scb->buffers);
+	if (nr_fragments < 0)
+		return -1;
+	scb->nr_fragments = nr_fragments;
+	scb->type = NC2_PACKET_TYPE_pre_posted;
+
+	return 0;
+}
+
+void xmit_post(struct netchannel2 *nc, struct sk_buff *skb,
+	       volatile void *msg_buf)
+{
+	volatile struct netchannel2_msg_packet *msg = msg_buf;
+	struct skb_cb_overlay *scb;
+	struct skb_shared_info *shinfo;
+	skb_frag_t *frag;
+	unsigned x;
+	struct post_packet_plan plan;
+
+	scb = get_skb_overlay(skb);
+	memset(&plan, 0, sizeof(plan));
+
+	plan.cur_buffer = list_entry(scb->buffers.next,
+				     struct nc2_tx_buffer,
+				     list);
+	plan.output_frag = msg->frags;
+	memset(&plan.work_frag, 0, sizeof(plan.work_frag));
+	plan.work_frag.pre_post.id = plan.cur_buffer->id;
+
+	/* Burn a couple of bytes at the start of the packet so that we
+	   get better alignment in the body. */
+	plan.work_frag.off = NET_IP_ALIGN;
+	plan.off_in_cur_buffer = NET_IP_ALIGN;
+
+	prepare_data_area_post(nc, skb, &plan);
+	shinfo = skb_shinfo(skb);
+	for (x = 0; x < shinfo->nr_frags; x++) {
+		frag = &shinfo->frags[x];
+		prepare_subpage_post(nc,
+				     frag->page,
+				     frag->page_offset,
+				     frag->size,
+				     &plan);
+	}
+
+	*plan.output_frag = plan.work_frag;
+
+	/* All of the buffer slots which have been used in
+	   this packet are now available for the other end to
+	   fill with new buffers. */
+	list_splice(&scb->buffers, &nc->unused_tx_buffer_slots);
+}
+
+/* The other endpoint has sent us a transmit buffer.  Add it to the
+   list.  Called under the lock. */
+void nc2_handle_post_buffer(struct netchannel2 *nc,
+			    struct netchannel2_ring_pair *ncrp,
+			    struct netchannel2_msg_hdr *hdr)
+{
+	struct netchannel2_msg_post_buffer msg;
+	struct nc2_tx_buffer *txb;
+
+	if (hdr->size != sizeof(msg)) {
+		pr_debug("Strange sized rx buffer post %d\n", hdr->size);
+		return;
+	}
+	if (ncrp != &nc->rings) {
+		pr_debug("Posted buffer on an ancillary ring!\n");
+		return;
+	}
+	nc2_copy_from_ring(&nc->rings.cons_ring, &msg, sizeof(msg));
+	if (list_empty(&nc->unused_tx_buffer_slots) ||
+	    msg.size > PAGE_SIZE ||
+	    msg.off_in_page > PAGE_SIZE ||
+	    msg.size + msg.off_in_page > PAGE_SIZE ||
+	    msg.size < 64) {
+		pr_debug("Other end posted too many buffers, or this buffer was strange (%d,%d)\n",
+			 msg.off_in_page, msg.size);
+		return;
+	}
+
+	txb = list_entry(nc->unused_tx_buffer_slots.next,
+			 struct nc2_tx_buffer,
+			 list);
+	txb->id = msg.id;
+	txb->gref = msg.gref;
+	txb->off_in_page = msg.off_in_page;
+	txb->size = msg.size;
+
+	nc->nr_avail_tx_buffers++;
+
+	list_move(&txb->list, &nc->avail_tx_buffers);
+}
+
+/* Process the pending TX buffer return list and push as many as
+   possible onto the ring.  Called under the lock.  Does not
+   automatically flush the ring; that's the caller's
+   responsibility. */
+void nc2_return_pending_posted_buffers(struct netchannel2 *nc)
+{
+	struct netchannel2_msg_return_posted_buffer msg;
+	struct nc2_tx_buffer *txb;
+
+	memset(&msg, 0, sizeof(msg));
+	while (!list_empty(&nc->pending_tx_buffer_return) &&
+	       nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg))) {
+		txb = list_entry(nc->pending_tx_buffer_return.next,
+				 struct nc2_tx_buffer,
+				 list);
+		list_del(&txb->list);
+		free_tx_buffer(nc, txb);
+		msg.id = txb->id;
+		nc2_send_message(&nc->rings.prod_ring,
+				 NETCHANNEL2_MSG_RETURN_POSTED_BUFFER,
+				 0,
+				 &msg,
+				 sizeof(msg));
+	}
+}
+
+/* If there is space on the ring, tell the other end how many RX
+   buffers we want it to post (i.e. how many TX buffers we're allowed
+   to accept).  Called under the lock. */
+void nc2_advertise_tx_buffers(struct netchannel2 *nc)
+{
+	struct netchannel2_msg_set_nr_posted_buffers msg;
+
+	if (!nc2_can_send_payload_bytes(&nc->rings.prod_ring, sizeof(msg)))
+		return;
+	msg.nr_buffers = nc->nr_tx_buffers;
+	nc2_send_message(&nc->rings.prod_ring,
+			 NETCHANNEL2_MSG_SET_NR_POSTED_BUFFERS,
+			 0, &msg, sizeof(msg));
+	nc->need_advertise_tx_buffers = 0;
+	nc->rings.pending_time_sensitive_messages = 1;
+}
+
+/* Set the target number of TX buffers. */
+void nc2_set_nr_tx_buffers(struct netchannel2 *nc, unsigned nr_buffers)
+{
+	int changed;
+
+	spin_lock_bh(&nc->rings.lock);
+	changed = (nc->configured_nr_tx_buffers != nr_buffers);
+	nc->configured_nr_tx_buffers = nr_buffers;
+	spin_unlock_bh(&nc->rings.lock);
+	if (changed)
+		prepare_tx_buffers(nc);
+}
+
+/* The local ethX interface just came up.  Set up the TX buffers. */
+static void prepare_tx_buffers(struct netchannel2 *nc)
+{
+	struct nc2_tx_buffer *buffers;
+	unsigned x;
+	unsigned nr_buffers;
+
+	nr_buffers = nc->configured_nr_tx_buffers;
+	if (nr_buffers == 0) {
+		/* Trying to shut down TX in posted buffers. */
+		unprepare_tx_buffers(nc);
+		return;
+	}
+
+	buffers = kzalloc(sizeof(struct nc2_tx_buffer) * nr_buffers,
+			  GFP_KERNEL);
+	if (buffers == NULL) {
+		printk(KERN_ERR "Cannot allocate %d tx buffer slots, posted tx disabled.\n",
+		       nr_buffers);
+		return;
+	}
+
+	spin_lock_bh(&nc->rings.lock);
+
+	/* nc->tx_buffers should be NULL, because starting and
+	   stopping the TX buffer management should alternate. */
+	BUG_ON(nc->tx_buffers);
+
+	INIT_LIST_HEAD(&nc->avail_tx_buffers);
+	nc->nr_avail_tx_buffers = 0;
+	for (x = 0; x < nr_buffers; x++)
+		list_add_tail(&buffers[x].list, &nc->unused_tx_buffer_slots);
+	nc->tx_buffers = buffers;
+	nc->nr_tx_buffers = nr_buffers;
+	nc->need_advertise_tx_buffers = 1;
+	spin_unlock_bh(&nc->rings.lock);
+}
+
+/* The local ethX interface is going down.  Release the TX buffers
+   allocated by prepare_tx_buffers().  Note that the poll() method has
+   already been stopped, so messages posted by the other end will not
+   be processed. */
+void unprepare_tx_buffers(struct netchannel2 *nc)
+{
+	spin_lock_bh(&nc->rings.lock);
+	INIT_LIST_HEAD(&nc->pending_tx_buffer_return);
+	INIT_LIST_HEAD(&nc->unused_tx_buffer_slots);
+	INIT_LIST_HEAD(&nc->avail_tx_buffers);
+	nc->nr_tx_buffers = 0;
+	nc->nr_avail_tx_buffers = 0;
+	nc->need_advertise_tx_buffers = 1;
+	kfree(nc->tx_buffers);
+	nc->tx_buffers = NULL;
+	spin_unlock_bh(&nc->rings.lock);
+}
diff --git a/drivers/xen/netchannel2/recv_packet.c b/drivers/xen/netchannel2/recv_packet.c
index 94aa127..4501723 100644
--- a/drivers/xen/netchannel2/recv_packet.c
+++ b/drivers/xen/netchannel2/recv_packet.c
@@ -121,6 +121,11 @@ void nc2_handle_packet_msg(struct netchannel2 *nc,
 						  nr_frags, frags_off);
 		queue_finish_packet_message(ncrp, msg.id, msg.flags);
 		break;
+	case NC2_PACKET_TYPE_pre_posted:
+		skb = handle_pre_posted_packet(nc, &msg, hdr, nr_frags,
+					       frags_off);
+		/* No finish message */
+		break;
 	case NC2_PACKET_TYPE_receiver_map:
 		if (!nc->local_trusted) {
 			/* The remote doesn't trust us, so they
diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c
index a24105a..1a879aa 100644
--- a/drivers/xen/netchannel2/xmit_packet.c
+++ b/drivers/xen/netchannel2/xmit_packet.c
@@ -4,6 +4,11 @@
 #include <linux/version.h>
 #include "netchannel2_core.h"
 
+/* You don't normally want to transmit in posted buffers mode, because
+   grant mode is usually faster, but it's sometimes useful for testing
+   the VMQ receiver when you don't have VMQ-capable hardware. */
+#define PREFER_POSTED_BUFFERS 0
+
 /* We limit the number of transmitted packets which can be in flight
    at any one time, as a somewhat paranoid safety catch. */
 #define MAX_TX_PACKETS MAX_PENDING_FINISH_PACKETS
@@ -15,6 +20,16 @@ static enum transmit_policy transmit_policy(struct netchannel2 *nc,
 		return transmit_policy_small;
 	else if (nc->remote_trusted)
 		return transmit_policy_map;
+	else if (PREFER_POSTED_BUFFERS &&
+		 /* We approximate the number of buffers needed by
+		    skb_shinfo(skb)->nr_frags, which isn't entirely
+		    correct, but isn't that far off, either.  Getting
+		    it wrong just means we'll delay transmission
+		    waiting for more buffers when we should have gone
+		    ahead with the grant policy; not ideal, but hardly
+		    a disaster. */
+		 nc->nr_avail_tx_buffers > skb_shinfo(skb)->nr_frags)
+		return transmit_policy_post;
 	else
 		return transmit_policy_grant;
 }
@@ -76,6 +91,9 @@ enum prepare_xmit_result prepare_xmit_allocate_resources(struct netchannel2 *nc,
 	case transmit_policy_grant:
 		r = prepare_xmit_allocate_grant(&nc->rings, skb, 1);
 		break;
+	case transmit_policy_post:
+		r = prepare_xmit_allocate_post(nc, skb);
+		break;
 	case transmit_policy_map:
 		r = prepare_xmit_allocate_grant(&nc->rings, skb, 0);
 		break;
@@ -177,6 +195,9 @@ int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp,
 	case transmit_policy_grant:
 		xmit_grant(ncrp, skb, 1, msg);
 		break;
+	case transmit_policy_post:
+		xmit_post(nc, skb, msg);
+		break;
 	case transmit_policy_map:
 		xmit_grant(ncrp, skb, 0, msg);
 		break;
diff --git a/include/xen/interface/io/netchannel2.h b/include/xen/interface/io/netchannel2.h
index 075658d..554635c 100644
--- a/include/xen/interface/io/netchannel2.h
+++ b/include/xen/interface/io/netchannel2.h
@@ -47,6 +47,11 @@ struct netchannel2_fragment {
 			grant_ref_t gref;
 		} receiver_copy;
 		struct {
+			/* The id of a buffer which was previously posted
+			   in a POST_BUFFER message. */
+			uint32_t id;
+		} pre_post;
+		struct {
 			grant_ref_t gref;
 		} receiver_map;
 	};
@@ -106,6 +111,13 @@ struct netchannel2_msg_packet {
  *                  Due to backend bugs, it is not safe to use this
  *                  packet type except on bypass rings.
  *
+ * pre_posted -- The transmitting domain has copied the packet to
+ *               buffers which were previously provided in POST_BUFFER
+ *               messages.  No FINISH message is required, and it is
+ *               an error to send one.
+ *
+ *               This packet type may not be used on bypass rings.
+ *
  * receiver_map -- The transmitting domain has granted the receiving
  *                 domain access to the original RX buffers using
  *                 full (mappable) grant references.  This can be
@@ -134,6 +146,7 @@
  * that it is correct to treat receiver_map and small packets as
  * receiver_copy ones. */
 #define NC2_PACKET_TYPE_receiver_copy 1
+#define NC2_PACKET_TYPE_pre_posted 2
 #define NC2_PACKET_TYPE_receiver_map 3
 #define NC2_PACKET_TYPE_small 4
 
@@ -193,6 +206,64 @@ struct netchannel2_msg_set_max_fragments_per_packet {
 	uint32_t max_frags_per_packet;
 };
 
+/* Provide a buffer to the other end.  The buffer is initially empty.
+ * The other end is expected to either:
+ *
+ * -- Put some packet data in it, and return it as part of a
+ *    pre_posted PACKET message, or
+ * -- Not do anything with it, and return it in a
+ *    RETURN_POSTED_BUFFER message.
+ *
+ * The other end is allowed to hold on to the buffer for as long as it
+ * wants before returning the buffer.  Buffers may be used out of
+ * order.
+ *
+ * This message cannot be sent unless the VM has received a
+ * SET_NR_POSTED_BUFFERS message.  The total number of outstanding
+ * buffers must not exceed the limit specified in the
+ * SET_NR_POSTED_BUFFERS message.
+ *
+ * The grant reference should be a whole-page reference, and not a
+ * subpage reference, because the receiving domain may need to map it
+ * in order to make the buffer available to hardware.  The current
+ * Linux implementation doesn't do this, but a future version will.
+ */
+#define NETCHANNEL2_MSG_POST_BUFFER 6
+struct netchannel2_msg_post_buffer {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t id;
+	grant_ref_t gref;
+	uint16_t off_in_page;
+	uint16_t size;
+};
+
+/* The other end has decided not to use the buffer for some reason
+ * (usually because it's shutting down).  The buffer is returned
+ * containing no data.
+ */
+#define NETCHANNEL2_MSG_RETURN_POSTED_BUFFER 7
+struct netchannel2_msg_return_posted_buffer {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t id;
+};
+
+/* The other end is allowing us to post up to @nr_buffers buffers to
+ * it.  If @nr_buffers is 0, the use of posted buffers is disabled.
+ *
+ * If there are buffers outstanding, a SET_NR_POSTED_BUFFERS message
+ * implicitly returns all of them, as if they had been returned with a
+ * run of RETURN_POSTED_BUFFER messages.  This is true even if
+ * @nr_buffers is unchanged.
+ *
+ * @nr_buffers only ever provides an upper bound on the number of
+ * buffers posted; an endpoint may elect to post fewer than that.
+ */
+#define NETCHANNEL2_MSG_SET_NR_POSTED_BUFFERS 8
+struct netchannel2_msg_set_nr_posted_buffers {
+	struct netchannel2_msg_hdr hdr;
+	uint32_t nr_buffers;
+};
+
 /* Attach to a bypass ring as a frontend.  The receiving domain should
  * map the bypass ring (which will be in the sending domain's memory)
  * and attach to it in the same way as it attached to the original ring.
-- 
1.6.3.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
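For illustration only (not part of the patch): transmit_policy() above
approximates the number of posted buffers a packet needs by
skb_shinfo(skb)->nr_frags.  Under the extra assumption that every
posted buffer holds a full POSTED_BUFFER_SIZE bytes (the protocol only
guarantees a minimum of 64), an exact byte-based estimate might look
like the following; nc2_buffers_needed() is hypothetical.

static unsigned nc2_buffers_needed(const struct sk_buff *skb)
{
	/* NET_IP_ALIGN bytes are burnt at the front of the first
	   buffer so that the payload after the ethernet header ends
	   up aligned. */
	unsigned bytes = skb->len + NET_IP_ALIGN;

	return DIV_ROUND_UP(bytes, POSTED_BUFFER_SIZE);
}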
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 21/22] NC2 VMQ support.
This only includes the transmit half, because the receiver uses an unmodified posted buffers mode implementation. This includes various bits of patches which were Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Steven Smith <steven.smith@citrix.com> All bugs are mine, of course. --- drivers/xen/Kconfig | 5 + drivers/xen/netchannel2/Makefile | 4 + drivers/xen/netchannel2/chan.c | 7 +- drivers/xen/netchannel2/netback2.c | 9 + drivers/xen/netchannel2/netchannel2_core.h | 10 + drivers/xen/netchannel2/posted_buffer.h | 50 ++ drivers/xen/netchannel2/posted_buffers.c | 20 +- drivers/xen/netchannel2/util.c | 8 +- drivers/xen/netchannel2/vmq.c | 805 ++++++++++++++++++++++++++++ drivers/xen/netchannel2/vmq.h | 58 ++ drivers/xen/netchannel2/vmq_def.h | 68 +++ drivers/xen/netchannel2/xmit_packet.c | 6 + 12 files changed, 1029 insertions(+), 21 deletions(-) create mode 100644 drivers/xen/netchannel2/posted_buffer.h create mode 100644 drivers/xen/netchannel2/vmq.c create mode 100644 drivers/xen/netchannel2/vmq.h create mode 100644 drivers/xen/netchannel2/vmq_def.h diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index a7e5b5c..a37b0cd 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -234,6 +234,11 @@ config XEN_NETDEV2_FRONTEND depends on XEN_NETCHANNEL2 default y +config XEN_NETDEV2_VMQ + bool "Net channel 2 support for multi-queue devices" + depends on XEN_NETDEV2_BACKEND && NET_VMQ + default y + config XEN_NETDEV2_BYPASSABLE bool "Net channel 2 bypassee support" depends on XEN_NETDEV2_BACKEND diff --git a/drivers/xen/netchannel2/Makefile b/drivers/xen/netchannel2/Makefile index 11a257e..918d8d8 100644 --- a/drivers/xen/netchannel2/Makefile +++ b/drivers/xen/netchannel2/Makefile @@ -12,6 +12,10 @@ ifeq ($(CONFIG_XEN_NETDEV2_FRONTEND),y) netchannel2-objs += netfront2.o endif +ifeq ($(CONFIG_XEN_NETDEV2_VMQ),y) +netchannel2-objs += vmq.o +endif + ifeq ($(CONFIG_XEN_NETDEV2_BYPASSABLE),y) netchannel2-objs += bypassee.o endif diff --git a/drivers/xen/netchannel2/chan.c b/drivers/xen/netchannel2/chan.c index 060b49b..8dad6fe 100644 --- a/drivers/xen/netchannel2/chan.c +++ b/drivers/xen/netchannel2/chan.c @@ -13,6 +13,7 @@ #include "netchannel2_endpoint.h" #include "netchannel2_core.h" +#include "vmq.h" static int process_ring(struct napi_struct *napi, int work_avail); @@ -810,6 +811,8 @@ static int process_ring(struct napi_struct *napi, /* Pick up incoming messages. */ work_done = nc2_poll(ncrp, work_avail, &rx_queue); + do_vmq_work(nc); + /* Transmit pending packets. */ if (!skb_queue_empty(&ncrp->pending_tx_queue)) { skb = __skb_dequeue(&ncrp->pending_tx_queue); @@ -828,9 +831,11 @@ static int process_ring(struct napi_struct *napi, This must happen before we flush the rings, since that''s when the PACKET messages will be made visible to the other end. 
*/ - if (ncrp == &nc->rings) + if (ncrp == &nc->rings) { flush_hypercall_batcher(&nc->batcher, nc2_posted_on_gntcopy_fail); + vmq_flush_unmap_hypercall(); + } flush_rings(ncrp); diff --git a/drivers/xen/netchannel2/netback2.c b/drivers/xen/netchannel2/netback2.c index 129ef81..eb2a781 100644 --- a/drivers/xen/netchannel2/netback2.c +++ b/drivers/xen/netchannel2/netback2.c @@ -10,8 +10,13 @@ #include "netchannel2_core.h" #include "netchannel2_endpoint.h" #include "netchannel2_uspace.h" +#include "vmq.h" +#ifdef CONFIG_XEN_NETDEV2_VMQ +#define NR_TX_BUFS (VMQ_MAX_BUFFERS+256) +#else #define NR_TX_BUFS 256 +#endif static atomic_t next_handle; /* A list of all currently-live netback2 interfaces. */ @@ -168,6 +173,8 @@ static int attach_to_frontend(struct netback2 *nd) return err; } + nc2_vmq_connect(nc); + /* All done */ nd->attached = 1; @@ -176,6 +183,8 @@ static int attach_to_frontend(struct netback2 *nd) static void nb2_shutdown(struct netchannel2 *nc) { + nc2_vmq_disconnect(nc); + nc2_set_nr_tx_buffers(nc, 0); } diff --git a/drivers/xen/netchannel2/netchannel2_core.h b/drivers/xen/netchannel2/netchannel2_core.h index 1939cbb..8e1657d 100644 --- a/drivers/xen/netchannel2/netchannel2_core.h +++ b/drivers/xen/netchannel2/netchannel2_core.h @@ -7,6 +7,8 @@ #include <linux/skbuff.h> #include <linux/netdevice.h> +#include "vmq_def.h" + /* After we send this number of frags, we request the other end to * notify us when sending the corresponding finish packet message */ #define MAX_MAX_COUNT_FRAGS_NO_EVENT 192 @@ -43,6 +45,9 @@ enum transmit_policy { transmit_policy_grant = transmit_policy_first, transmit_policy_post, transmit_policy_map, +#ifdef CONFIG_XEN_NETDEV2_VMQ + transmit_policy_vmq, +#endif transmit_policy_small, transmit_policy_last = transmit_policy_small }; @@ -437,6 +442,11 @@ struct netchannel2 { struct hypercall_batcher batcher; +#ifdef CONFIG_XEN_NETDEV2_VMQ + /* vmq data for supporting multi-queue devices */ + nc2_vmq_t vmq; +#endif + #ifdef CONFIG_XEN_NETDEV2_AUTOMATIC_BYPASS struct nc2_auto_bypass auto_bypass; #endif diff --git a/drivers/xen/netchannel2/posted_buffer.h b/drivers/xen/netchannel2/posted_buffer.h new file mode 100644 index 0000000..e249777 --- /dev/null +++ b/drivers/xen/netchannel2/posted_buffer.h @@ -0,0 +1,50 @@ +/* Buffer management related bits, shared between vmq.c and + * posted_buffer.c */ +#ifndef NC2_POSTED_BUFFER_H__ +#define NC2_POSTED_BUFFER_H__ + +/* A buffer which the other end has provided us which we can use to + transmit packets to it. */ +struct nc2_tx_buffer { + struct list_head list; + uint32_t id; /* ID assigned by the remote endpoint. 
*/ + grant_ref_t gref; + uint16_t off_in_page; + uint16_t size; + grant_handle_t grant_handle; +}; + +/* add a buffer to the pending list to be returned to the other end buffer */ +static inline void return_tx_buffer(struct netchannel2 *nc, + struct nc2_tx_buffer *buffer) +{ + list_add(&buffer->list, &nc->pending_tx_buffer_return); +} + +static inline struct nc2_tx_buffer *_get_tx_buffer(struct netchannel2 *nc) +{ + struct nc2_tx_buffer *buffer; + struct list_head *entry = nc->avail_tx_buffers.next; + list_del(entry); + buffer = list_entry(entry, struct nc2_tx_buffer, list); + nc->nr_avail_tx_buffers--; + return buffer; +} + +/* recycle a posted buffer: return it to the list of available buffers */ +static inline void recycle_tx_buffer(struct netchannel2 *nc, + struct nc2_tx_buffer *buffer) +{ + list_add(&buffer->list, &nc->avail_tx_buffers); + nc->nr_avail_tx_buffers++; +} + +/* add a buffer slot to list of unused buffer slots after it has been + * returned to other end */ +static inline void free_tx_buffer(struct netchannel2 *nc, + struct nc2_tx_buffer *buffer) +{ + list_add(&buffer->list, &nc->unused_tx_buffer_slots); +} + +#endif /* !NC2_POSTED_BUFFER_H__ */ diff --git a/drivers/xen/netchannel2/posted_buffers.c b/drivers/xen/netchannel2/posted_buffers.c index 96de7da..9fb7570 100644 --- a/drivers/xen/netchannel2/posted_buffers.c +++ b/drivers/xen/netchannel2/posted_buffers.c @@ -9,6 +9,7 @@ #include <xen/live_maps.h> #include "netchannel2_endpoint.h" #include "netchannel2_core.h" +#include "posted_buffer.h" #define POSTED_BUFFER_SIZE PAGE_SIZE @@ -350,17 +351,6 @@ void nc2_handle_set_nr_posted_buffers(struct netchannel2 *nc, /* -------------------------- Transmit ------------------------------- */ -/* A buffer which the other end has provided us which we can use to - transmit packets to it. */ -struct nc2_tx_buffer { - struct list_head list; - uint32_t id; /* ID assigned by the remote endpoint. */ - grant_ref_t gref; - uint16_t off_in_page; - uint16_t size; - grant_handle_t grant_handle; -}; - /* A representation of a packet which is halfway through being prepared for transmission. */ struct post_packet_plan { @@ -373,14 +363,6 @@ struct post_packet_plan { volatile struct netchannel2_fragment *output_frag; }; -/* add a buffer slot to list of unused buffer slots after it has been - * returned to other end */ -static void free_tx_buffer(struct netchannel2 *nc, - struct nc2_tx_buffer *buffer) -{ - list_add(&buffer->list, &nc->unused_tx_buffer_slots); -} - /* A grant copy failed while we were transmitting a packet. That indicates that the *receiving* domain gave us a bad RX buffer. 
We''re too late to send them an error, so there isn''t really diff --git a/drivers/xen/netchannel2/util.c b/drivers/xen/netchannel2/util.c index 79d9f09..1d96256 100644 --- a/drivers/xen/netchannel2/util.c +++ b/drivers/xen/netchannel2/util.c @@ -34,7 +34,13 @@ int allocate_txp_slot(struct netchannel2_ring_pair *ncrp, static void nc2_free_skb(struct netchannel2 *nc, struct sk_buff *skb) { - dev_kfree_skb(skb); +#ifdef CONFIG_XEN_NETDEV2_VMQ + nc2_vmq_t *vmq = &nc->vmq; + if (get_skb_overlay(skb)->policy == transmit_policy_vmq) + skb_queue_tail(&vmq->dealloc_queue, skb); + else +#endif + dev_kfree_skb(skb); } void release_txp_slot(struct netchannel2_ring_pair *ncrp, diff --git a/drivers/xen/netchannel2/vmq.c b/drivers/xen/netchannel2/vmq.c new file mode 100644 index 0000000..e36962b --- /dev/null +++ b/drivers/xen/netchannel2/vmq.c @@ -0,0 +1,805 @@ +/***************************************************************************** + * vmq.c + * + * Support multi-queue network devices. + * + * Copyright (c) 2008, Kaushik Kumar Ram, Rice University. + * Copyright (c) 2008, Jose Renato Santos, Hewlett-Packard Co. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ +/* This only implements the transmit half of the method; receive is + * handled by posted_buffers.c */ +#include <linux/kernel.h> +#include <linux/netvmq.h> +#include <linux/skbuff.h> +#include <xen/xenbus.h> +#include <xen/balloon.h> +#include "netchannel2_core.h" + +#include "posted_buffer.h" +#include "vmq.h" + +/* state of device queue when operating in vmq mode */ +#define VMQ_QUEUE_DISABLED 0 +#define VMQ_QUEUE_STARTING 1 +#define VMQ_QUEUE_ENABLED 2 +#define VMQ_QUEUE_CLOSING 3 + +#define VMQ_MAX_UNMAP_OPS 256 +struct vmq_unmap_grants { + unsigned n; + gnttab_unmap_grant_ref_t gop[VMQ_MAX_UNMAP_OPS]; +}; +typedef struct vmq_unmap_grants vmq_unmap_grants_t; + +vmq_unmap_grants_t vmq_unmap_grants; + +static inline void vmq_flush_unmap_grants(void) +{ + if (vmq_unmap_grants.n == 0) + return; + + if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, + vmq_unmap_grants.gop, + vmq_unmap_grants.n)) + BUG(); + vmq_unmap_grants.n = 0; +} + +static inline gnttab_unmap_grant_ref_t *vmq_next_unmap_gop(void) +{ + if (vmq_unmap_grants.n == VMQ_MAX_UNMAP_OPS) + vmq_flush_unmap_grants(); + return &vmq_unmap_grants.gop[vmq_unmap_grants.n++]; +} + +void vmq_flush_unmap_hypercall(void) +{ + vmq_flush_unmap_grants(); +} + +static inline unsigned long vmq_idx_to_pfn(nc2_vmq_t *vmq, unsigned int idx) +{ + return page_to_pfn(vmq->pages[idx]); +} + +static inline unsigned long vmq_idx_to_kaddr(nc2_vmq_t *vmq, unsigned int idx) +{ + return (unsigned long)pfn_to_kaddr(vmq_idx_to_pfn(vmq, idx)); +} + +/* get vmq idx from page struct */ +static long nc2_vmq_page_index(struct page *page) +{ + nc2_vmq_buf_t *vmq_buf; + vmq_buf = (nc2_vmq_buf_t *)page->mapping; + return vmq_buf - vmq_buf->nc->vmq.buffer; +} + +/* Read a physical device name from xenstore and + * returns a pointer to the associated net_device structure. + * Returns NULL on error. 
*/ +static struct net_device *read_pdev(struct xenbus_device *dev) +{ + char *pdevstr; + struct net_device *pdev = NULL; + + pdevstr = xenbus_read(XBT_NIL, dev->nodename, "pdev", NULL); + if (IS_ERR(pdevstr)) + return NULL; + + if (pdevstr) + pdev = dev_get_by_name(&init_net, pdevstr); + + kfree(pdevstr); + + return pdev; +} + +static void nc2_vmq_page_release(struct page *page, unsigned int order) +{ + printk(KERN_CRIT "%s: ERROR: Unexpected release of netchannel2 vmq page", + __func__); + BUG_ON(1); +} + +static inline int nc2_vmq_is_disabled(struct netchannel2 *nc) +{ + return nc->vmq.vmq_state == VMQ_QUEUE_DISABLED; +} + +static inline int nc2_vmq_is_starting(struct netchannel2 *nc) +{ + return nc->vmq.vmq_state == VMQ_QUEUE_STARTING; +} + +static inline int nc2_vmq_is_enabled(struct netchannel2 *nc) +{ + return nc->vmq.vmq_state == VMQ_QUEUE_ENABLED; +} + +static inline int nc2_vmq_is_closing(struct netchannel2 *nc) +{ + return nc->vmq.vmq_state == VMQ_QUEUE_CLOSING; +} + +static inline void nc2_vmq_enable(struct netchannel2 *nc) +{ + nc2_vmq_t *vmq = &nc->vmq; + vmq_get(vmq); + vmq_enable_queue(vmq->pdev, vmq->vmq_id); + vmq->vmq_state = VMQ_QUEUE_ENABLED; +} + +void nc2_vmq_disconnect(struct netchannel2 *nc) +{ + nc2_vmq_t *vmq = &nc->vmq; + + if (nc2_vmq_is_enabled(nc)) { + vmq_disable_queue(vmq->pdev, vmq->vmq_id); + vmq_free_queue(vmq->pdev, vmq->vmq_id); + vmq->vmq_state = VMQ_QUEUE_CLOSING; + /* wait until all buffers have been returned by dev driver */ + wait_event(vmq->waiting_to_free, + atomic_read(&vmq->refcnt) == 0); + return; + } + + if (nc2_vmq_is_starting(nc)) { + vmq_free_queue(vmq->pdev, vmq->vmq_id); + vmq->vmq_state = VMQ_QUEUE_CLOSING; + return; + } + +} + + +static void nc2_vmq_end_map_buffers(gnttab_map_grant_ref_t *mop, int count, + struct netchannel2 *nc, u16 *alloc_idx) +{ + int i, err; + u16 idx; + unsigned int prod; + nc2_vmq_t *vmq = &nc->vmq; + + prod = vmq->mapped_pages_prod; + + for (i = 0; i < count; i++) { + idx = alloc_idx[i]; + + /* Check error status */ + err = mop->status; + if (likely(!err)) { + set_phys_to_machine( + __pa(vmq_idx_to_kaddr(vmq, idx)) + >> PAGE_SHIFT, + FOREIGN_FRAME(mop->dev_bus_addr + >> PAGE_SHIFT)); + /* Store the handle */ + vmq->buffer[idx].buf->grant_handle = mop->handle; + + /* Add it to the mapped pages list */ + vmq->mapped_pages[VMQ_IDX_MASK(prod++)] = idx; + mop++; + continue; + } + + /* Error mapping page: return posted buffer to other end. + * TODO: We might need an error field on the return buffer + * message */ + return_tx_buffer(nc, vmq->buffer[idx].buf); + + /* Add the page back to the free list */ + vmq->unmapped_pages[VMQ_IDX_MASK(vmq->unmapped_pages_prod++)] + = idx; + + mop++; + } + + smp_wmb(); + vmq->mapped_pages_prod = prod; + + return; +} + +/* Map guest buffers and place them in the mapped buffers list. The mapped + * pages in this list are used when allocating a skb (vmq_alloc_skb()). + */ +static void nc2_vmq_map_buffers(struct netchannel2 *nc) +{ + u16 idx; + int count = 0; + unsigned int cons; + int nbufs; + int buf_avail; + struct nc2_tx_buffer *buf; + struct nc2_vmq *vmq = &nc->vmq; + int n_mapped = nr_vmq_bufs(nc); + + + /* + * Putting hundreds of bytes on the stack is considered rude. + * Static works because a tasklet can only be on one CPU at any time. 
+ */ + static gnttab_map_grant_ref_t rx_map_ops[VMQ_MAX_BUFFERS]; + static u16 alloc_idx[VMQ_MAX_BUFFERS]; + + /* If there is at least VMQ_MIN_BUFFERS buffers, no work to do */ + if (n_mapped >= VMQ_MIN_BUFFERS) + return; + + /* Try to get VMQ_MAX_BUFFERS mapped buffers, if there are + sufficient buffers posted by the other end */ + nbufs = VMQ_MAX_BUFFERS - n_mapped; + buf_avail = nc->nr_avail_tx_buffers; + if (nbufs > buf_avail) + nbufs = buf_avail; + + /* Xen cannot handle more than 512 grant ops in a single hypercall */ + if (nbufs > 512) + nbufs = 512; + + /* give up if there are no buffers available */ + if (nbufs <= 0) + return; + + /* Note that we *should* have free pages to consume here + * and no checks are needed. + */ + cons = vmq->unmapped_pages_cons; + + while (count < nbufs) { + idx = vmq->unmapped_pages[VMQ_IDX_MASK(cons++)]; + buf = vmq->buffer[idx].buf = _get_tx_buffer(nc); + /* Setup grant map operation */ + gnttab_set_map_op(&rx_map_ops[count], + vmq_idx_to_kaddr(vmq, idx), + GNTMAP_host_map, + buf->gref, + nc->rings.otherend_id); + alloc_idx[count] = idx; + count++; + } + + vmq->unmapped_pages_cons = cons; + + /* Map all the pages */ + BUG_ON(HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, + rx_map_ops, nbufs)); + + /* Finalize buffer mapping after checking if the grant operations + succeeded */ + nc2_vmq_end_map_buffers(rx_map_ops, nbufs, nc, alloc_idx); + + vmq->nbufs += nbufs; +} + +static void nc2_vmq_unmap_buf(struct netchannel2 *nc, + unsigned int idx, int recycle) +{ + nc2_vmq_t *vmq = &nc->vmq; + unsigned long pfn; + gnttab_unmap_grant_ref_t *gop; + unsigned prod; + + pfn = vmq_idx_to_pfn(vmq, idx); + /* Already unmapped? */ + if (!phys_to_machine_mapping_valid(pfn)) + return; + + gop = vmq_next_unmap_gop(); + gnttab_set_unmap_op(gop, vmq_idx_to_kaddr(vmq, idx), + GNTMAP_host_map, + vmq->buffer[idx].buf->grant_handle); + + vmq->nbufs--; + + set_phys_to_machine(__pa(vmq_idx_to_kaddr(vmq, idx)) >> + PAGE_SHIFT, + INVALID_P2M_ENTRY); + /* Ready for next use. 
*/ + gnttab_reset_grant_page(vmq->pages[idx]); + /* Add the page back to the unmapped list */ + prod = vmq->unmapped_pages_prod; + vmq->unmapped_pages[VMQ_IDX_MASK(prod++)] = idx; + if (recycle) + recycle_tx_buffer(nc, vmq->buffer[idx].buf); + else + free_tx_buffer(nc, vmq->buffer[idx].buf); + smp_wmb(); + vmq->unmapped_pages_prod = prod; +} + +static void nc2_vmq_free_mapped_bufs(struct netchannel2 *nc) +{ + nc2_vmq_t *vmq = &nc->vmq; + unsigned int idx; + unsigned prod, cons; + + /* The queue should be disabled before this function is called */ + BUG_ON(vmq->vmq_state == VMQ_QUEUE_ENABLED); + + cons = vmq->mapped_pages_cons; + prod = vmq->mapped_pages_prod; + smp_rmb(); + + while (cons != prod) { + idx = vmq->mapped_pages[VMQ_IDX_MASK(cons++)]; + nc2_vmq_unmap_buf(nc, idx, 1); + } + + vmq_flush_unmap_grants(); + + vmq->mapped_pages_cons = cons; + +} + +static void nc2_vmq_free_skb(struct sk_buff *skb) +{ + struct netchannel2 *nc; + nc2_vmq_t *vmq; + unsigned int idx; + int nr_frags, i; + struct skb_shared_info *shinfo = skb_shinfo(skb); + skb_frag_t *frags = shinfo->frags; + + nc = netdev_priv(skb->dev); + vmq = &nc->vmq; + + nr_frags = shinfo->nr_frags; + for (i = 0; i < nr_frags; i++) { + idx = nc2_vmq_page_index(frags[i].page); + nc2_vmq_unmap_buf(nc, idx, 1); + } + + vmq_flush_unmap_grants(); + + shinfo->frag_list = NULL; + shinfo->nr_frags = 0; + + /* Add the skb back to the free pool */ + skb_queue_tail(&vmq->free_skb_list, skb); +} + +/* Initialize the free socket buffer list */ +static int vmq_init_free_skb_list(int n, struct sk_buff_head *free_skb_list) +{ + int i; + struct sk_buff *skb; + + skb_queue_head_init(free_skb_list); + + for (i = 0; i < n; i++) { + skb = alloc_skb(VMQ_SKB_SIZE, GFP_ATOMIC); + if (!skb) { + printk("Netchannel2 vmq: Failed to allocate socket " + "buffer %d (max=%d)\n", i, (int)n); + goto error; + } + skb_queue_tail(free_skb_list, skb); + } + + return 0; +error: + /* Free all the allocated buffers and return Error */ + while (!skb_queue_empty(free_skb_list)) + kfree_skb(skb_dequeue(free_skb_list)); + + return -1; +} + +/* Initialize vmq. 
Return 1 if vmq is used and 0 otherwise */ +int nc2_vmq_connect(struct netchannel2 *nc) +{ + nc2_vmq_t *vmq = &nc->vmq; + struct page *page; + int q_id; + int size; + int i; + + vmq->vmq_mode = 0; + vmq->pdev = read_pdev(nc->xenbus_device); + + /* cannot use vmq mode if physical device not found */ + if (!vmq->pdev) + return 0; + + /* Allocate a RX queue */ + q_id = vmq_alloc_queue(vmq->pdev, VMQ_TYPE_RX); + if (q_id < 0) + /* Allocation failed, cannot use multi-queue */ + goto free_pdev; + + vmq->vmq_id = q_id; + + /* Set the size of the queue */ + size = vmq_get_maxsize(vmq->pdev); + if (size > VMQ_QUEUE_SIZE) + size = VMQ_QUEUE_SIZE; + if (vmq_set_size(vmq->pdev, q_id, size) < 0) { + /* Failure, free up the queue and return error */ + printk(KERN_ERR "%s: could not set queue size on net device\n", + __func__); + goto free_queue; + } + vmq->vmq_size = size; + + /* Set the mac address of the queue */ + if (vmq_set_mac(vmq->pdev, q_id, nc->rings.remote_mac) < 0) { + /* Failure, free up the queue and return error */ + printk(KERN_ERR "%s: could not set MAC address for net device queue\n", + __func__); + goto free_queue; + } + + vmq->pages = alloc_empty_pages_and_pagevec(VMQ_MAX_BUFFERS); + if (vmq->pages == NULL) { + printk(KERN_ERR "%s: out of memory\n", __func__); + goto free_queue; + } + + skb_queue_head_init(&vmq->dealloc_queue); + skb_queue_head_init(&vmq->rx_queue); + + if (vmq_init_free_skb_list(VMQ_MAX_BUFFERS, + &vmq->free_skb_list)) { + printk(KERN_ERR "%s: Could not allocate free socket buffers", + __func__); + goto free_pagevec; + } + + for (i = 0; i < VMQ_MAX_BUFFERS; i++) { + vmq->buffer[i].nc = nc; + page = vmq->pages[i]; + SetPageForeign(page, nc2_vmq_page_release); + page->mapping = (void *)&vmq->buffer[i]; + vmq->unmapped_pages[i] = i; + } + + vmq->unmapped_pages_prod = VMQ_MAX_BUFFERS; + vmq->unmapped_pages_cons = 0; + + vmq->mapped_pages_prod = 0; + vmq->mapped_pages_cons = 0; + + vmq->nbufs = 0; + vmq->vmq_mode = 1; + + /* Store the pointer to netchannel2 device in pdev */ + BUG_ON((vmq->pdev->vmq == NULL) || (vmq->pdev->vmq->queue == NULL)); + vmq->pdev->vmq->queue[q_id].guest = (void *)nc->net_device; + + atomic_set(&vmq->refcnt, 0); + init_waitqueue_head(&vmq->waiting_to_free); + + printk(KERN_INFO "Netchannel2 using vmq mode for guest %d\n", + nc->xenbus_device->otherend_id); + + vmq->vmq_state = VMQ_QUEUE_STARTING; + + return 1; /* Success */ + + +free_pagevec: + free_empty_pages_and_pagevec(vmq->pages, VMQ_MAX_BUFFERS); +free_queue: + vmq_free_queue(vmq->pdev, vmq->vmq_id); +free_pdev: + dev_put(vmq->pdev); + vmq->pdev = NULL; + return 0; +} + +void nc2_vmq_shutdown(struct netchannel2 *nc) +{ + nc2_vmq_t *vmq = &nc->vmq; + int i; + + if (!vmq->vmq_mode) + return; + + /* All posted bufs should have been returned */ + BUG_ON(nr_vmq_bufs(nc) != nr_vmq_mapped_bufs(nc)); + + /* free the mapped bufs */ + nc2_vmq_free_mapped_bufs(nc); + + /* Free the vmq pages */ + if (vmq->pages) { + for (i = 0; i < VMQ_MAX_BUFFERS; i++) { + if (PageForeign(vmq->pages[i])) + ClearPageForeign(vmq->pages[i]); + vmq->pages[i]->mapping = NULL; + } + free_empty_pages_and_pagevec(vmq->pages, VMQ_MAX_BUFFERS); + vmq->pages = NULL; + } + + while (!skb_queue_empty(&vmq->free_skb_list)) { + /* Free the socket buffer pool */ + kfree_skb(skb_dequeue(&vmq->free_skb_list)); + } + vmq->vmq_state = VMQ_QUEUE_DISABLED; + vmq->vmq_mode = 0; + + if (vmq->pdev) { + dev_put(vmq->pdev); + vmq->pdev = NULL; + } + + vmq_put(vmq); +} + +static int prepare_xmit_allocate_vmq(struct netchannel2 *nc, + struct 
sk_buff *skb)
+{
+ unsigned msg_size;
+
+ msg_size = get_transmitted_packet_msg_size(skb);
+ if (!nc2_reserve_payload_bytes(&nc->rings.prod_ring, msg_size))
+ return -1;
+ return 0;
+}
+
+void do_vmq_work(struct netchannel2 *nc)
+{
+ nc2_vmq_t *vmq = &nc->vmq;
+ struct sk_buff *skb;
+ unsigned long flags;
+
+ /* if not in vmq mode do nothing */
+ if (!nc2_in_vmq_mode(nc))
+ return;
+
+ /* Map guest buffers for dedicated NIC RX queue if needed */
+ if (nr_vmq_bufs(nc) < VMQ_MIN_BUFFERS) {
+ nc2_vmq_map_buffers(nc);
+ /* We delay enabling the queue until we have enough
+ posted buffers. Check if it is time to enable it */
+ if (nc2_vmq_is_starting(nc) &&
+ (nr_vmq_bufs(nc) >= VMQ_MIN_BUFFERS)) {
+ nc2_vmq_enable(nc);
+ }
+ }
+
+ /* free vmq skbs returned by the physical device driver */
+ while (!skb_queue_empty(&nc->vmq.dealloc_queue))
+ nc2_vmq_free_skb(skb_dequeue(&nc->vmq.dealloc_queue));
+
+ /* complete vmq closing after all packets returned by physical
+ * device driver */
+
+ if (nc2_vmq_is_closing(nc) &&
+ (nr_vmq_bufs(nc) == nr_vmq_mapped_bufs(nc))) {
+ nc->vmq.vmq_state = VMQ_QUEUE_DISABLED;
+ nc2_vmq_shutdown(nc);
+ }
+
+ spin_lock_irqsave(&vmq->rx_queue.lock, flags);
+ while (!skb_queue_empty(&vmq->rx_queue)) {
+ skb = __skb_dequeue(&nc->vmq.rx_queue);
+ if (prepare_xmit_allocate_vmq(nc, skb) < 0) {
+ __skb_queue_head(&vmq->rx_queue, skb);
+ spin_unlock_irqrestore(&vmq->rx_queue.lock, flags);
+ return;
+ }
+ __skb_queue_tail(&nc->rings.pending_tx_queue, skb);
+ }
+ spin_unlock_irqrestore(&vmq->rx_queue.lock, flags);
+}
+
+/* Return the netchannel2 device corresponding to the given queue in pdev */
+static inline struct net_device *nc2_vmq_queue_to_vif(struct net_device *pdev,
+ int queue_id)
+{
+ net_vmq_t *n_vmq;
+ vmq_queue_t *vmq_q;
+
+ n_vmq = pdev->vmq;
+ BUG_ON(n_vmq == NULL);
+ vmq_q = &n_vmq->queue[queue_id];
+ BUG_ON(vmq_q == NULL);
+
+ return (struct net_device *)vmq_q->guest;
+}
+
+/* Handle incoming vmq packet */
+int vmq_netif_rx(struct sk_buff *skb, int queue_id)
+{
+ struct skb_cb_overlay *skb_co = get_skb_overlay(skb);
+ struct net_device *dev;
+ struct netchannel2 *nc;
+ nc2_vmq_t *vmq;
+
+ memset(skb_co, 0, sizeof(*skb_co));
+
+ skb_co->nr_fragments = skb_shinfo(skb)->nr_frags;
+ skb_co->type = NC2_PACKET_TYPE_pre_posted;
+ skb_co->policy = transmit_policy_vmq;
+
+ /* get the netchannel2 interface corresponding to this queue */
+ dev = nc2_vmq_queue_to_vif(skb->dev, queue_id);
+ nc = netdev_priv(dev);
+ vmq = &nc->vmq;
+
+ /* replace source dev with destination dev */
+ skb->dev = dev;
+ /* add skb to rx_queue */
+ skb_queue_tail(&vmq->rx_queue, skb);
+
+ /* Trigger thread execution to process new packets */
+ nc2_kick(&nc->rings);
+
+ return 0;
+}
+EXPORT_SYMBOL(vmq_netif_rx);
+
+
+/* Allocate a socket buffer from the free list, get a guest posted
+ * buffer, attach it to the skb, and return it.
+ */
+struct sk_buff *vmq_alloc_skb(struct net_device *netdevice, int queue_id,
+ unsigned int length)
+{
+ struct sk_buff *skb;
+ struct netchannel2 *nc;
+ nc2_vmq_t *vmq;
+ unsigned int idx;
+ int nr_bufs, i;
+ unsigned int cons;
+ unsigned int prod;
+
+ /* get the netchannel2 interface corresponding to this queue */
+ nc = netdev_priv(nc2_vmq_queue_to_vif(netdevice, queue_id));
+
+ vmq = &nc->vmq;
+
+ /* Get a free buffer from the pool */
+ if (skb_queue_empty(&vmq->free_skb_list)) {
+ /* No buffers to allocate */
+ return NULL;
+ }
+
+
+ skb = skb_dequeue(&vmq->free_skb_list);
+ BUG_ON(skb == NULL);
+
+ nr_bufs = VMQ_NUM_BUFFERS(length);
+
+ cons = vmq->mapped_pages_cons;
+ prod = vmq->mapped_pages_prod;
+ smp_rmb();
+
+ if (nr_bufs > (prod - cons))
+ /* Not enough mapped buffers in the pool */
+ goto kick_nc2;
+
+ if (nr_bufs > MAX_SKB_FRAGS)
+ goto error;
+
+ for (i = 0; i < nr_bufs; i++) {
+ idx = vmq->mapped_pages[VMQ_IDX_MASK(cons)];
+ /* FIX ME: This can be simplified */
+ skb_shinfo(skb)->frags[i].page =
+ virt_to_page(vmq_idx_to_kaddr(vmq, idx));
+ skb_shinfo(skb)->frags[i].page_offset = 0;
+ skb_shinfo(skb)->frags[i].size = PAGE_SIZE;
+ skb_shinfo(skb)->nr_frags++;
+ skb->dev = netdevice;
+ cons++;
+ }
+
+ vmq->mapped_pages_cons = cons;
+
+ /* if the number of buffers gets low, run the tasklet to map more */
+ if (nr_vmq_bufs(nc) < VMQ_MIN_BUFFERS)
+ nc2_kick(&nc->rings);
+
+ return skb;
+
+kick_nc2:
+ /* kick netchannel2 interface to get any recently posted buffers */
+ nc2_kick(&nc->rings);
+error:
+ /* Add the skb back to the free pool */
+ skb_queue_tail(&vmq->free_skb_list, skb);
+ return NULL;
+}
+EXPORT_SYMBOL(vmq_alloc_skb);
+
+/* Detach the guest pages and free the socket buffer */
+void vmq_free_skb(struct sk_buff *skb, int queue_id)
+{
+ struct net_device *dev;
+ struct netchannel2 *nc;
+ nc2_vmq_t *vmq;
+
+ /* get the netchannel2 interface corresponding to this queue */
+ dev = nc2_vmq_queue_to_vif(skb->dev, queue_id);
+
+ nc = netdev_priv(dev);
+ vmq = &nc->vmq;
+
+ /* Add skb to the dealloc queue */
+ skb->dev = dev;
+ skb_queue_tail(&vmq->dealloc_queue, skb);
+
+ /* kick netchannel2 interface */
+ nc2_kick(&nc->rings);
+
+}
+EXPORT_SYMBOL(vmq_free_skb);
+
+int nc2_is_vmq_packet(struct netchannel2 *nc, struct sk_buff *skb)
+{
+ int nr_frags;
+ long idx;
+ nc2_vmq_t *vmq = &nc->vmq;
+
+ nr_frags = skb_shinfo(skb)->nr_frags;
+ if (vmq->vmq_mode && nr_frags &&
+ PageForeign(skb_shinfo(skb)->frags[0].page)) {
+ idx = nc2_vmq_page_index(skb_shinfo(skb)->frags[0].page);
+ if ((idx >= 0) && (idx < VMQ_MAX_BUFFERS))
+ return 1;
+ }
+
+ return 0;
+}
+
+/* Prepare to transmit a vmq packet */
+void xmit_vmq(struct netchannel2 *nc, struct sk_buff *skb,
+ volatile void *msg_buf)
+{
+ volatile struct netchannel2_msg_packet *msg = msg_buf;
+ volatile struct netchannel2_fragment *out_frag;
+ nc2_vmq_t *vmq = &nc->vmq;
+ skb_frag_t *frag;
+ struct nc2_tx_buffer *txbuf;
+ int nr_frags;
+ unsigned int idx;
+ unsigned x;
+
+ nr_frags = skb_shinfo(skb)->nr_frags;
+ for (x = 0; x < nr_frags; x++) {
+ frag = &skb_shinfo(skb)->frags[x];
+ out_frag = &msg->frags[x];
+
+ idx = nc2_vmq_page_index(frag->page);
+ txbuf = vmq->buffer[idx].buf;
+ out_frag->pre_post.id = txbuf->id;
+ out_frag->off = frag->page_offset;
+ out_frag->size = frag->size;
+ /* TODO: need to batch unmap grants */
+ nc2_vmq_unmap_buf(nc, idx, 0);
+ }
+
+ /* Avoid unmapping frag grants when the skb is freed later */
+ /* by nc2_vmq_free_skb() */
+ skb_shinfo(skb)->nr_frags = 0;
+}
+
diff --git a/drivers/xen/netchannel2/vmq.h b/drivers/xen/netchannel2/vmq.h
new file mode 100644
index 0000000..fa1cc8a
--- /dev/null
+++ b/drivers/xen/netchannel2/vmq.h
@@ -0,0 +1,58 @@
+#ifndef VMQ_H__
+#define VMQ_H__
+
+#include "netchannel2_core.h"
+
+#ifdef CONFIG_XEN_NETDEV2_VMQ
+
+int nc2_vmq_connect(struct netchannel2 *nc);
+void nc2_vmq_disconnect(struct netchannel2 *nc);
+void do_vmq_work(struct netchannel2 *nc);
+int nc2_is_vmq_packet(struct netchannel2 *nc, struct sk_buff *skb);
+void xmit_vmq(struct netchannel2 *nc, struct sk_buff *skb,
+ volatile void *msg);
+void vmq_flush_unmap_hypercall(void);
+
+#define vmq_get(_b) \
+ atomic_inc(&(_b)->refcnt);
+
+#define vmq_put(_b) \
+ do { \
+ if (atomic_dec_and_test(&(_b)->refcnt)) { \
+ wake_up(&(_b)->waiting_to_free); \
+ } \
+ } while (0)
+
+static inline int nr_vmq_mapped_bufs(struct netchannel2 *nc)
+{
+ return nc->vmq.mapped_pages_prod -
+ nc->vmq.mapped_pages_cons;
+}
+
+static inline int nr_vmq_bufs(struct netchannel2 *nc)
+{
+ return nc->vmq.nbufs;
+}
+
+static inline int nc2_in_vmq_mode(struct netchannel2 *nc)
+{
+ return nc->vmq.vmq_mode;
+}
+
+#else
+static inline int nc2_vmq_connect(struct netchannel2 *nc)
+{
+ return 0;
+}
+static inline void nc2_vmq_disconnect(struct netchannel2 *nc)
+{
+}
+static inline void do_vmq_work(struct netchannel2 *nc)
+{
+}
+static inline void vmq_flush_unmap_hypercall(void)
+{
+}
+#endif /* CONFIG_XEN_NETDEV2_VMQ */
+
+#endif /* !VMQ_H__ */
diff --git a/drivers/xen/netchannel2/vmq_def.h b/drivers/xen/netchannel2/vmq_def.h
new file mode 100644
index 0000000..60f1ccb
--- /dev/null
+++ b/drivers/xen/netchannel2/vmq_def.h
@@ -0,0 +1,68 @@
+#ifndef VMQ_DEF_H__
+#define VMQ_DEF_H__
+
+
+/* size of HW queue in VMQ device */
+#define VMQ_QUEUE_SIZE 1024
+
+/* Minimum number of buffers needed for VMQ
+ * This is the low-water mark that triggers mapping more guest buffers
+ * Should be larger than the queue size to allow for in-flight packets
+ */
+#define VMQ_MIN_BUFFERS 1920
+
+/* Maximum number of posted buffers which are reserved for VMQ
+ * Should be less than MAX_POSTED_BUFFERS. For now, the difference can be used
+ * for intra-node guest to guest traffic. When we map guest buffers we try to
+ * have VMQ_MAX_BUFFERS mapped.
+ * The difference (VMQ_MAX_BUFFERS-VMQ_MIN_BUFFERS)
+ * helps batch multiple grant map operations.
+ * VMQ_QUEUE_SIZE < VMQ_MIN_BUFFERS < VMQ_MAX_BUFFERS < MAX_POSTED_BUFFERS
+ * VMQ_MAX_BUFFERS must be a power of 2
+ */
+#define VMQ_MAX_BUFFERS 2048
+
+/* skb size is zero since packet data uses fragments */
+#define VMQ_SKB_SIZE 0
+
+#define VMQ_NUM_BUFFERS(len) ((len + PAGE_SIZE - 1) / PAGE_SIZE)
+
+#define VMQ_IDX_MASK(_i) ((_i)&(VMQ_MAX_BUFFERS-1))
+
+typedef struct nc2_vmq_buf {
+ struct nc2_tx_buffer *buf;
+ struct netchannel2 *nc;
+} nc2_vmq_buf_t;
+
+typedef struct nc2_vmq {
+ struct net_device *pdev; /* Pointer to physical device */
+ int vmq_mode; /* indicate if vif is in vmq mode */
+ struct page **pages; /* pages for mapping guest RX bufs */
+ struct sk_buff_head free_skb_list; /* Free socket buffer pool */
+ struct sk_buff_head dealloc_queue; /* list of skbs to be freed */
+ struct sk_buff_head rx_queue; /* list of received packets */
+
+ /* guest mapped buffers */
+ nc2_vmq_buf_t buffer[VMQ_MAX_BUFFERS];
+
+ /* Ring with free pages available for mapping guest RX buffers */
+ u16 unmapped_pages[VMQ_MAX_BUFFERS];
+ unsigned int unmapped_pages_prod;
+ unsigned int unmapped_pages_cons;
+
+ /* Ring of mapped RX pages available for the vmq device */
+ u16 mapped_pages[VMQ_MAX_BUFFERS];
+ unsigned int mapped_pages_prod;
+ unsigned int mapped_pages_cons;
+
+ unsigned int nbufs; /* number of vmq buffers: posted to */
+ /* HW queue or available to be posted */
+ int vmq_id; /* Queue id */
+ int vmq_size; /* Queue size */
+ int vmq_state; /* queue state */
+
+ atomic_t refcnt;
+ wait_queue_head_t waiting_to_free;
+
+} nc2_vmq_t;
+
+#endif /* !VMQ_DEF_H__ */
diff --git a/drivers/xen/netchannel2/xmit_packet.c b/drivers/xen/netchannel2/xmit_packet.c
index 1a879aa..09827fc 100644
--- a/drivers/xen/netchannel2/xmit_packet.c
+++ b/drivers/xen/netchannel2/xmit_packet.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/version.h>
 #include "netchannel2_core.h"
+#include "vmq.h"
 
 /* You don't normally want to transmit in posted buffers mode, because
    grant mode is usually faster, but it's sometimes useful for testing
@@ -189,6 +190,11 @@ int nc2_really_start_xmit(struct netchannel2_ring_pair *ncrp,
 	set_offload_flags(skb, msg);
 
 	switch (skb_co->policy) {
+#ifdef CONFIG_XEN_NETDEV2_VMQ
+	case transmit_policy_vmq:
+		xmit_vmq(nc, skb, msg);
+		break;
+#endif
 	case transmit_policy_small:
 		/* Nothing to do */
 		break;
--
1.6.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
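One design point worth spelling out from the patch above: vmq.c circulates page slots between two index rings, unmapped_pages and mapped_pages, using free-running producer/consumer counters that are only masked into the power-of-two array when indexing (VMQ_IDX_MASK, which is why VMQ_MAX_BUFFERS must be a power of 2). A minimal userspace model of that ring discipline, with an illustrative size rather than the patch's 2048:

#include <stdio.h>

/* Toy model of the vmq index rings; not kernel code.  RING_SIZE and
 * the helper names are illustrative only. */
#define RING_SIZE 8                     /* must be a power of two */
#define IDX_MASK(i) ((i) & (RING_SIZE - 1))

struct idx_ring {
	unsigned short slot[RING_SIZE];
	unsigned prod, cons;            /* free-running; never masked */
};

static void ring_put(struct idx_ring *r, unsigned short idx)
{
	r->slot[IDX_MASK(r->prod++)] = idx;
}

static int ring_get(struct idx_ring *r, unsigned short *idx)
{
	if (r->cons == r->prod)
		return -1;              /* ring empty */
	*idx = r->slot[IDX_MASK(r->cons++)];
	return 0;
}

int main(void)
{
	struct idx_ring unmapped = {0}, mapped = {0};
	unsigned short idx;

	/* Initially every page slot sits on the unmapped ring... */
	for (idx = 0; idx < RING_SIZE; idx++)
		ring_put(&unmapped, idx);

	/* ...and mapping a guest buffer moves a slot across, much as
	 * nc2_vmq_map_buffers() does after a successful grant map. */
	while (ring_get(&unmapped, &idx) == 0)
		ring_put(&mapped, idx);

	printf("mapped ring holds %u slots\n", mapped.prod - mapped.cons);
	return 0;
}

Because the counters are masked only when indexing, prod - cons gives the occupancy directly, which is exactly how nr_vmq_mapped_bufs() in vmq.h computes the number of mapped buffers.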
<steven.smith@citrix.com>
2009-Oct-04 15:04 UTC
[Xen-devel] [PATCH 22/22] Add netchannel2 VMQ support to an old version of the ixgbe driver.
This is a bit of a mess, and doesn''t really want to be applied as-is, but might be useful for testing. The VMQ patch which I have is against version 1.3.56.5 of the driver, whereas the current 2.6.27 tree has version 2.0.34.3. I don''t currently have access to any VMQ-capable hardware, and won''t be at Citrix long enough to acquire any, so this patch just rolls the driver back to 1.3.56.5 and adds VMQ support to that. The original VMQ patch was Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> My only contribution was to run combinediff, but FWIW that''s Signed-off-by: Steven Smith <steven.smith@citrix.com> --- drivers/net/ixgbe/Makefile | 4 +- drivers/net/ixgbe/ixgbe.h | 220 +- drivers/net/ixgbe/ixgbe_82598.c | 487 ++-- drivers/net/ixgbe/ixgbe_82599.c | 2626 -------------------- drivers/net/ixgbe/ixgbe_api.c | 169 +-- drivers/net/ixgbe/ixgbe_api.h | 57 +- drivers/net/ixgbe/ixgbe_common.c | 622 ++---- drivers/net/ixgbe/ixgbe_common.h | 12 +- drivers/net/ixgbe/ixgbe_dcb.c | 19 +- drivers/net/ixgbe/ixgbe_dcb.h | 28 +- drivers/net/ixgbe/ixgbe_dcb_82598.c | 7 +- drivers/net/ixgbe/ixgbe_dcb_82598.h | 2 +- drivers/net/ixgbe/ixgbe_dcb_82599.c | 508 ---- drivers/net/ixgbe/ixgbe_dcb_82599.h | 125 - drivers/net/ixgbe/ixgbe_dcb_nl.c | 555 +----- drivers/net/ixgbe/ixgbe_ethtool.c | 561 ++---- drivers/net/ixgbe/ixgbe_main.c | 4486 +++++++++++++---------------------- drivers/net/ixgbe/ixgbe_osdep.h | 15 +- drivers/net/ixgbe/ixgbe_param.c | 524 +---- drivers/net/ixgbe/ixgbe_phy.c | 792 +------ drivers/net/ixgbe/ixgbe_phy.h | 27 +- drivers/net/ixgbe/ixgbe_type.h | 938 +-------- drivers/net/ixgbe/kcompat.c | 195 +-- drivers/net/ixgbe/kcompat_ethtool.c | 10 +- drivers/xen/netchannel2/vmq.c | 3 + 25 files changed, 2457 insertions(+), 10535 deletions(-) delete mode 100644 drivers/net/ixgbe/ixgbe_82599.c delete mode 100644 drivers/net/ixgbe/ixgbe_dcb_82599.c delete mode 100644 drivers/net/ixgbe/ixgbe_dcb_82599.h diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile index aa7eaac..afc940d 100644 --- a/drivers/net/ixgbe/Makefile +++ b/drivers/net/ixgbe/Makefile @@ -33,8 +33,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o CFILES = ixgbe_main.c ixgbe_common.c ixgbe_api.c ixgbe_param.c \ - ixgbe_ethtool.c kcompat.c ixgbe_82598.c ixgbe_82599.c \ - ixgbe_dcb.c ixgbe_dcb_nl.c ixgbe_dcb_82598.c ixgbe_dcb_82599.c \ + ixgbe_ethtool.c kcompat.c ixgbe_82598.c \ + ixgbe_dcb.c ixgbe_dcb_nl.c ixgbe_dcb_82598.c \ ixgbe_phy.c ixgbe_sysfs.c ixgbe-objs := $(CFILES:.c=.o) diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h index f6c6d26..d76fd88 100644 --- a/drivers/net/ixgbe/ixgbe.h +++ b/drivers/net/ixgbe/ixgbe.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -35,6 +35,9 @@ #include <linux/pci.h> #include <linux/netdevice.h> #include <linux/vmalloc.h> +#ifdef CONFIG_XEN_NETDEV2_VMQ +#include <linux/netvmq.h> +#endif #ifdef SIOCETHTOOL #include <linux/ethtool.h> @@ -50,12 +53,20 @@ #include "ixgbe_dcb.h" - #include "kcompat.h" - #include "ixgbe_api.h" +#define IXGBE_NO_INET_LRO +#ifndef IXGBE_NO_LRO +#if defined(CONFIG_INET_LRO) || defined(CONFIG_INET_LRO_MODULE) +#include <linux/inet_lro.h> +#define IXGBE_MAX_LRO_DESCRIPTORS 8 +#undef IXGBE_NO_INET_LRO +#define IXGBE_NO_LRO +#endif +#endif /* IXGBE_NO_LRO */ + #define PFX "ixgbe: " #define DPRINTK(nlevel, klevel, fmt, args...) \ ((void)((NETIF_MSG_##nlevel & adapter->msg_enable) && \ @@ -88,17 +99,14 @@ #define IXGBE_RXBUFFER_128 128 /* Used for packet split */ #define IXGBE_RXBUFFER_256 256 /* Used for packet split */ #define IXGBE_RXBUFFER_2048 2048 -#define IXGBE_RXBUFFER_4096 4096 -#define IXGBE_RXBUFFER_8192 8192 -#define IXGBE_MAX_RXBUFFER 16384 /* largest size for single descriptor */ #define IXGBE_RX_HDR_SIZE IXGBE_RXBUFFER_256 #define MAXIMUM_ETHERNET_VLAN_SIZE (VLAN_ETH_FRAME_LEN + ETH_FCS_LEN) -#if defined(IXGBE_DCB) || defined(IXGBE_RSS) || \ - defined(IXGBE_VMDQ) -#define IXGBE_MQ +#if defined(CONFIG_IXGBE_DCB) || defined(CONFIG_IXGBE_RSS) || \ + defined(CONFIG_IXGBE_VMDQ) +#define CONFIG_IXGBE_MQ #endif /* How many Rx Buffers do we bundle into one write to the hardware ? */ @@ -108,8 +116,6 @@ #define IXGBE_TX_FLAGS_VLAN (u32)(1 << 1) #define IXGBE_TX_FLAGS_TSO (u32)(1 << 2) #define IXGBE_TX_FLAGS_IPV4 (u32)(1 << 3) -#define IXGBE_TX_FLAGS_FCOE (u32)(1 << 4) -#define IXGBE_TX_FLAGS_FSO (u32)(1 << 5) #define IXGBE_TX_FLAGS_VLAN_MASK 0xffff0000 #define IXGBE_TX_FLAGS_VLAN_PRIO_MASK 0x0000e000 #define IXGBE_TX_FLAGS_VLAN_SHIFT 16 @@ -121,34 +127,37 @@ struct ixgbe_lro_stats { u32 flushed; u32 coal; - u32 recycled; }; struct ixgbe_lro_desc { struct hlist_node lro_node; struct sk_buff *skb; + struct sk_buff *last_skb; + int timestamp; + u32 tsval; + u32 tsecr; u32 source_ip; u32 dest_ip; - u16 source_port; - u16 dest_port; - u16 vlan_tag; - u16 len; u32 next_seq; u32 ack_seq; u16 window; + u16 source_port; + u16 dest_port; + u16 append_cnt; u16 mss; - u16 opt_bytes; - u16 psh:1; - u32 tsval; - u32 tsecr; - u32 append_cnt; + u32 data_size; /*TCP data size*/ + u16 vlan_tag; +}; + +struct ixgbe_lro_info { + struct ixgbe_lro_stats stats; + int max; /*Maximum number of packet to coalesce.*/ }; struct ixgbe_lro_list { struct hlist_head active; struct hlist_head free; int active_cnt; - struct ixgbe_lro_stats stats; }; #endif /* IXGBE_NO_LRO */ @@ -177,18 +186,17 @@ struct ixgbe_queue_stats { struct ixgbe_ring { void *desc; /* descriptor ring memory */ + dma_addr_t dma; /* phys. 
address of descriptor ring */ + unsigned int size; /* length in bytes */ + unsigned int count; /* amount of descriptors */ + unsigned int next_to_use; + unsigned int next_to_clean; + + int queue_index; /* needed for multiqueue queue management */ union { struct ixgbe_tx_buffer *tx_buffer_info; struct ixgbe_rx_buffer *rx_buffer_info; }; - u8 atr_sample_rate; - u8 atr_count; - u16 count; /* amount of descriptors */ - u16 rx_buf_len; - u16 next_to_use; - u16 next_to_clean; - - u8 queue_index; /* needed for multiqueue queue management */ u16 head; u16 tail; @@ -196,43 +204,47 @@ struct ixgbe_ring { unsigned int total_bytes; unsigned int total_packets; + u16 reg_idx; /* holds the special value that gets the hardware register + * offset associated with this ring, which is different + * for DCB and RSS modes */ + #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) /* cpu for tx queue */ int cpu; #endif - u16 work_limit; /* max work per interrupt */ - u16 reg_idx; /* holds the special value that gets the - * hardware register offset associated - * with this ring, which is different - * for DCB and RSS modes */ struct ixgbe_queue_stats stats; - unsigned long reinit_state; - u64 rsc_count; /* stat for coalesced packets */ - unsigned int size; /* length in bytes */ - dma_addr_t dma; /* phys. address of descriptor ring */ -}; - -enum ixgbe_ring_f_enum { - RING_F_NONE = 0, - RING_F_DCB, - RING_F_VMDQ, - RING_F_RSS, - RING_F_FDIR, - RING_F_ARRAY_SIZE /* must be last in enum set */ + u16 v_idx; /* maps directly to the index for this ring in the hardware + * vector array, can also be used for finding the bit in EICR + * and friends that represents the vector for this ring */ +#ifndef IXGBE_NO_LRO + /* LRO list for rx queue */ + struct ixgbe_lro_list *lrolist; +#endif +#ifndef IXGBE_NO_INET_LRO + struct net_lro_mgr lro_mgr; + bool lro_used; +#endif + u16 work_limit; /* max work per interrupt */ + u16 rx_buf_len; + u8 mac_addr[ETH_ALEN]; + u8 active; + u8 allocated; }; +#define RING_F_DCB 0 +#define RING_F_VMDQ 1 +#define RING_F_RSS 2 #define IXGBE_MAX_DCB_INDICES 8 #define IXGBE_MAX_RSS_INDICES 16 -#define IXGBE_MAX_VMDQ_INDICES 64 -#define IXGBE_MAX_FDIR_INDICES 64 +#define IXGBE_MAX_VMDQ_INDICES 16 struct ixgbe_ring_feature { int indices; int mask; }; -#define MAX_RX_QUEUES 128 -#define MAX_TX_QUEUES 128 +#define MAX_RX_QUEUES 64 +#define MAX_TX_QUEUES 32 #define MAX_RX_PACKET_BUFFERS ((adapter->flags & IXGBE_FLAG_DCB_ENABLED) \ ? 8 : 1) @@ -243,9 +255,6 @@ struct ixgbe_ring_feature { */ struct ixgbe_q_vector { struct ixgbe_adapter *adapter; - unsigned int v_idx; /* index of q_vector within array, also used for - * finding the bit in EICR and friends that - * represents the vector for this ring */ #ifdef CONFIG_IXGBE_NAPI struct napi_struct napi; #endif @@ -256,19 +265,14 @@ struct ixgbe_q_vector { u8 tx_itr; u8 rx_itr; u32 eitr; -#ifndef IXGBE_NO_LRO - struct ixgbe_lro_list *lrolist; /* LRO list for queue vector*/ -#endif - char name[IFNAMSIZ + 9]; }; /* Helper macros to switch between ints/sec and what the register uses. - * And yes, it''s the same math going both ways. The lowest value - * supported by all of the ixgbe hardware is 8. + * And yes, it''s the same math going both ways. */ #define EITR_INTS_PER_SEC_TO_REG(_eitr) \ - ((_eitr) ? (1000000000 / ((_eitr) * 256)) : 8) + ((_eitr) ? 
(1000000000 / ((_eitr) * 256)) : 0) #define EITR_REG_TO_INTS_PER_SEC EITR_INTS_PER_SEC_TO_REG #define IXGBE_DESC_UNUSED(R) \ @@ -295,21 +299,9 @@ struct ixgbe_q_vector { #define OTHER_VECTOR 1 #define NON_Q_VECTORS (OTHER_VECTOR + TCP_TIMER_VECTOR) -#define IXGBE_MAX_MSIX_VECTORS_82599 64 -#define IXGBE_MAX_MSIX_Q_VECTORS_82599 64 -#define IXGBE_MAX_MSIX_Q_VECTORS_82598 16 -#define IXGBE_MAX_MSIX_VECTORS_82598 18 - -/* - * Only for array allocations in our adapter struct. On 82598, there will be - * unused entries in the array, but that''s not a big deal. Also, in 82599, - * we can actually assign 64 queue vectors based on our extended-extended - * interrupt registers. This is different than 82598, which is limited to 16. - */ -#define MAX_MSIX_Q_VECTORS IXGBE_MAX_MSIX_Q_VECTORS_82599 -#define MAX_MSIX_COUNT IXGBE_MAX_MSIX_VECTORS_82599 - +#define MAX_MSIX_Q_VECTORS 16 #define MIN_MSIX_Q_VECTORS 2 +#define MAX_MSIX_COUNT (MAX_MSIX_Q_VECTORS + NON_Q_VECTORS) #define MIN_MSIX_COUNT (MIN_MSIX_Q_VECTORS + NON_Q_VECTORS) /* board specific private data structure */ @@ -320,11 +312,11 @@ struct ixgbe_adapter { #endif u16 bd_number; struct work_struct reset_task; - struct ixgbe_q_vector *q_vector[MAX_MSIX_Q_VECTORS]; + struct ixgbe_q_vector q_vector[MAX_MSIX_Q_VECTORS]; + char name[MAX_MSIX_COUNT][IFNAMSIZ + 5]; struct ixgbe_dcb_config dcb_cfg; struct ixgbe_dcb_config temp_dcb_cfg; u8 dcb_set_bitmap; - enum ixgbe_fc_mode last_lfc_mode; /* Interrupt Throttle Rate */ u32 itr_setting; @@ -345,24 +337,21 @@ struct ixgbe_adapter { /* RX */ struct ixgbe_ring *rx_ring; /* One per active queue */ int num_rx_queues; - int num_rx_pools; /* == num_rx_queues in 82598 */ - int num_rx_queues_per_pool; /* 1 if 82598, can be many if 82599 */ u64 hw_csum_rx_error; - u64 hw_rx_no_dma_resources; u64 hw_csum_rx_good; u64 non_eop_descs; #ifndef CONFIG_IXGBE_NAPI u64 rx_dropped_backlog; /* count drops from rx intr handler */ #endif int num_msix_vectors; - int max_msix_q_vectors; /* true count of q_vectors for device */ - struct ixgbe_ring_feature ring_feature[RING_F_ARRAY_SIZE]; + struct ixgbe_ring_feature ring_feature[3]; struct msix_entry *msix_entries; #ifdef IXGBE_TCP_TIMER irqreturn_t (*msix_handlers[MAX_MSIX_COUNT])(int irq, void *data, struct pt_regs *regs); #endif + u64 rx_hdr_split; u32 alloc_rx_page_failed; u32 alloc_rx_buff_failed; @@ -384,7 +373,7 @@ struct ixgbe_adapter { #define IXGBE_FLAG_IN_NETPOLL (u32)(1 << 9) #define IXGBE_FLAG_DCA_ENABLED (u32)(1 << 10) #define IXGBE_FLAG_DCA_CAPABLE (u32)(1 << 11) -#define IXGBE_FLAG_DCA_ENABLED_DATA (u32)(1 << 12) +#define IXGBE_FLAG_IMIR_ENABLED (u32)(1 << 12) #define IXGBE_FLAG_MQ_CAPABLE (u32)(1 << 13) #define IXGBE_FLAG_DCB_ENABLED (u32)(1 << 14) #define IXGBE_FLAG_DCB_CAPABLE (u32)(1 << 15) @@ -395,19 +384,7 @@ struct ixgbe_adapter { #define IXGBE_FLAG_FAN_FAIL_CAPABLE (u32)(1 << 20) #define IXGBE_FLAG_NEED_LINK_UPDATE (u32)(1 << 22) #define IXGBE_FLAG_IN_WATCHDOG_TASK (u32)(1 << 23) -#define IXGBE_FLAG_IN_SFP_LINK_TASK (u32)(1 << 24) -#define IXGBE_FLAG_IN_SFP_MOD_TASK (u32)(1 << 25) -#define IXGBE_FLAG_FDIR_HASH_CAPABLE (u32)(1 << 26) -#define IXGBE_FLAG_FDIR_PERFECT_CAPABLE (u32)(1 << 27) - u32 flags2; -#ifndef IXGBE_NO_HW_RSC -#define IXGBE_FLAG2_RSC_CAPABLE (u32)(1) -#define IXGBE_FLAG2_RSC_ENABLED (u32)(1 << 1) -#endif /* IXGBE_NO_HW_RSC */ -#ifndef IXGBE_NO_LRO -#define IXGBE_FLAG2_SWLRO_ENABLED (u32)(1 << 2) -#endif /* IXGBE_NO_LRO */ -#define IXGBE_FLAG2_VMDQ_DEFAULT_OVERRIDE (u32)(1 << 3) + /* default to trying for four seconds */ #define 
IXGBE_TRY_LINK_TIMEOUT (4 * HZ) @@ -416,7 +393,7 @@ struct ixgbe_adapter { struct pci_dev *pdev; struct net_device_stats net_stats; #ifndef IXGBE_NO_LRO - struct ixgbe_lro_stats lro_stats; + struct ixgbe_lro_info lro_data; #endif #ifdef ETHTOOL_TEST @@ -433,15 +410,23 @@ struct ixgbe_adapter { u32 lli_port; u32 lli_size; u64 lli_int; - u32 lli_etype; - u32 lli_vlan_pri; -#endif /* IXGBE_NO_LLI */ +#endif /* Interrupt Throttle Rate */ u32 eitr_param; unsigned long state; u32 *config_space; u64 tx_busy; +#ifndef IXGBE_NO_INET_LRO + unsigned int lro_max_aggr; + unsigned int lro_aggregated; + unsigned int lro_flushed; + unsigned int lro_no_desc; +#endif +#ifdef CONFIG_XEN_NETDEV2_VMQ + struct net_vmq *vmq; + u32 rx_queues_allocated; +#endif unsigned int tx_ring_count; unsigned int rx_ring_count; @@ -452,41 +437,19 @@ struct ixgbe_adapter { struct work_struct watchdog_task; struct work_struct sfp_task; struct timer_list sfp_timer; - struct work_struct multispeed_fiber_task; - struct work_struct sfp_config_module_task; - u64 flm; - u32 fdir_pballoc; - u32 atr_sample_rate; - spinlock_t fdir_perfect_lock; - struct work_struct fdir_reinit_task; - u64 rsc_count; - u32 wol; - u16 eeprom_version; - bool netdev_registered; - char lsc_int_name[IFNAMSIZ + 9]; -#ifdef IXGBE_TCP_TIMER - char tcp_timer_name[IFNAMSIZ + 9]; -#endif }; enum ixbge_state_t { __IXGBE_TESTING, __IXGBE_RESETTING, __IXGBE_DOWN, - __IXGBE_FDIR_INIT_DONE, __IXGBE_SFP_MODULE_NOT_FOUND }; -#ifdef CONFIG_DCB -extern struct dcbnl_rtnl_ops dcbnl_ops; -extern int ixgbe_copy_dcb_cfg(struct ixgbe_dcb_config *src_dcb_cfg, - struct ixgbe_dcb_config *dst_dcb_cfg, int tc_max); -#endif /* needed by ixgbe_main.c */ extern int ixgbe_validate_mac_addr(u8 *mc_addr); extern void ixgbe_check_options(struct ixgbe_adapter *adapter); -extern void ixgbe_assign_netdev_ops(struct net_device *netdev); /* needed by ixgbe_ethtool.c */ extern char ixgbe_driver_name[]; @@ -502,8 +465,10 @@ extern int ixgbe_setup_tx_resources(struct ixgbe_adapter *,struct ixgbe_ring *); extern void ixgbe_free_rx_resources(struct ixgbe_adapter *,struct ixgbe_ring *); extern void ixgbe_free_tx_resources(struct ixgbe_adapter *,struct ixgbe_ring *); extern void ixgbe_update_stats(struct ixgbe_adapter *adapter); + +/* needed by ixgbe_dcb_nl.c */ +extern void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter); extern int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter); -extern void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter); extern bool ixgbe_is_ixgbe(struct pci_dev *pcidev); #ifdef ETHTOOL_OPS_COMPAT @@ -513,5 +478,12 @@ extern int ethtool_ioctl(struct ifreq *ifr); extern int ixgbe_dcb_netlink_register(void); extern int ixgbe_dcb_netlink_unregister(void); +extern int ixgbe_sysfs_create(struct ixgbe_adapter *adapter); +extern void ixgbe_sysfs_remove(struct ixgbe_adapter *adapter); + +#ifdef CONFIG_IXGBE_NAPI +extern void ixgbe_napi_add_all(struct ixgbe_adapter *adapter); +extern void ixgbe_napi_del_all(struct ixgbe_adapter *adapter); +#endif #endif /* _IXGBE_H_ */ diff --git a/drivers/net/ixgbe/ixgbe_82598.c b/drivers/net/ixgbe/ixgbe_82598.c index ae6490b..1059032 100644 --- a/drivers/net/ixgbe/ixgbe_82598.c +++ b/drivers/net/ixgbe/ixgbe_82598.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -30,12 +30,15 @@ #include "ixgbe_common.h" #include "ixgbe_phy.h" -u32 ixgbe_get_pcie_msix_count_82598(struct ixgbe_hw *hw); s32 ixgbe_init_ops_82598(struct ixgbe_hw *hw); static s32 ixgbe_get_link_capabilities_82598(struct ixgbe_hw *hw, ixgbe_link_speed *speed, bool *autoneg); +s32 ixgbe_get_copper_link_capabilities_82598(struct ixgbe_hw *hw, + ixgbe_link_speed *speed, + bool *autoneg); static enum ixgbe_media_type ixgbe_get_media_type_82598(struct ixgbe_hw *hw); +s32 ixgbe_setup_fc_82598(struct ixgbe_hw *hw, s32 packetbuf_num); s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num); static s32 ixgbe_setup_mac_link_82598(struct ixgbe_hw *hw); static s32 ixgbe_check_mac_link_82598(struct ixgbe_hw *hw, @@ -56,37 +59,13 @@ static s32 ixgbe_clear_vmdq_82598(struct ixgbe_hw *hw, u32 rar, u32 vmdq); s32 ixgbe_set_vfta_82598(struct ixgbe_hw *hw, u32 vlan, u32 vind, bool vlan_on); static s32 ixgbe_clear_vfta_82598(struct ixgbe_hw *hw); +static s32 ixgbe_blink_led_stop_82598(struct ixgbe_hw *hw, u32 index); +static s32 ixgbe_blink_led_start_82598(struct ixgbe_hw *hw, u32 index); s32 ixgbe_read_analog_reg8_82598(struct ixgbe_hw *hw, u32 reg, u8 *val); s32 ixgbe_write_analog_reg8_82598(struct ixgbe_hw *hw, u32 reg, u8 val); s32 ixgbe_read_i2c_eeprom_82598(struct ixgbe_hw *hw, u8 byte_offset, u8 *eeprom_data); u32 ixgbe_get_supported_physical_layer_82598(struct ixgbe_hw *hw); -s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw); -void ixgbe_set_lan_id_multi_port_pcie_82598(struct ixgbe_hw *hw); - - -/** - * ixgbe_get_pcie_msix_count_82598 - Gets MSI-X vector count - * @hw: pointer to hardware structure - * - * Read PCIe configuration space, and get the MSI-X vector count from - * the capabilities table. 
- **/ -u32 ixgbe_get_pcie_msix_count_82598(struct ixgbe_hw *hw) -{ - u32 msix_count = 18; - - if (hw->mac.msix_vectors_from_pcie) { - msix_count = IXGBE_READ_PCIE_WORD(hw, - IXGBE_PCIE_MSIX_82598_CAPS); - msix_count &= IXGBE_PCIE_MSIX_TBL_SZ_MASK; - - /* MSI-X count is zero-based in HW, so increment to give - * proper value */ - msix_count++; - } - return msix_count; -} /** * ixgbe_init_ops_82598 - Inits func ptrs and MAC type @@ -100,13 +79,11 @@ s32 ixgbe_init_ops_82598(struct ixgbe_hw *hw) struct ixgbe_mac_info *mac = &hw->mac; struct ixgbe_phy_info *phy = &hw->phy; s32 ret_val; + u16 list_offset, data_offset; ret_val = ixgbe_init_phy_ops_generic(hw); ret_val = ixgbe_init_ops_generic(hw); - /* PHY */ - phy->ops.init = &ixgbe_init_phy_ops_82598; - /* MAC */ mac->ops.reset_hw = &ixgbe_reset_hw_82598; mac->ops.get_media_type = &ixgbe_get_media_type_82598; @@ -114,7 +91,10 @@ s32 ixgbe_init_ops_82598(struct ixgbe_hw *hw) &ixgbe_get_supported_physical_layer_82598; mac->ops.read_analog_reg8 = &ixgbe_read_analog_reg8_82598; mac->ops.write_analog_reg8 = &ixgbe_write_analog_reg8_82598; - mac->ops.set_lan_id = &ixgbe_set_lan_id_multi_port_pcie_82598; + + /* LEDs */ + mac->ops.blink_led_start = &ixgbe_blink_led_start_82598; + mac->ops.blink_led_stop = &ixgbe_blink_led_stop_82598; /* RAR, Multicast, VLAN */ mac->ops.set_vmdq = &ixgbe_set_vmdq_82598; @@ -123,67 +103,42 @@ s32 ixgbe_init_ops_82598(struct ixgbe_hw *hw) mac->ops.clear_vfta = &ixgbe_clear_vfta_82598; /* Flow Control */ - mac->ops.fc_enable = &ixgbe_fc_enable_82598; + mac->ops.setup_fc = &ixgbe_setup_fc_82598; + + /* Link */ + mac->ops.check_link = &ixgbe_check_mac_link_82598; + if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper) { + mac->ops.setup_link = &ixgbe_setup_copper_link_82598; + mac->ops.setup_link_speed + &ixgbe_setup_copper_link_speed_82598; + mac->ops.get_link_capabilities + &ixgbe_get_copper_link_capabilities_82598; + } else { + mac->ops.setup_link = &ixgbe_setup_mac_link_82598; + mac->ops.setup_link_speed = &ixgbe_setup_mac_link_speed_82598; + mac->ops.get_link_capabilities + &ixgbe_get_link_capabilities_82598; + } mac->mcft_size = 128; mac->vft_size = 128; mac->num_rar_entries = 16; mac->max_tx_queues = 32; mac->max_rx_queues = 64; - mac->max_msix_vectors = ixgbe_get_pcie_msix_count_82598(hw); /* SFP+ Module */ phy->ops.read_i2c_eeprom = &ixgbe_read_i2c_eeprom_82598; - /* Link */ - mac->ops.check_link = &ixgbe_check_mac_link_82598; - mac->ops.setup_link = &ixgbe_setup_mac_link_82598; - mac->ops.setup_link_speed = &ixgbe_setup_mac_link_speed_82598; - mac->ops.get_link_capabilities - &ixgbe_get_link_capabilities_82598; - - return ret_val; -} - -/** - * ixgbe_init_phy_ops_82598 - PHY/SFP specific init - * @hw: pointer to hardware structure - * - * Initialize any function pointers that were not able to be - * set during init_shared_code because the PHY/SFP type was - * not known. Perform the SFP init if necessary. 
- * - **/ -s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw) -{ - struct ixgbe_mac_info *mac = &hw->mac; - struct ixgbe_phy_info *phy = &hw->phy; - s32 ret_val = 0; - u16 list_offset, data_offset; - - - /* Identify the PHY */ + /* Call PHY identify routine to get the phy type */ phy->ops.identify(hw); - /* Overwrite the link function pointers if copper PHY */ - if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper) { - mac->ops.setup_link = &ixgbe_setup_copper_link_82598; - mac->ops.setup_link_speed - &ixgbe_setup_copper_link_speed_82598; - mac->ops.get_link_capabilities - &ixgbe_get_copper_link_capabilities_generic; - } - + /* PHY Init */ switch (hw->phy.type) { case ixgbe_phy_tn: phy->ops.check_link = &ixgbe_check_phy_link_tnx; phy->ops.get_firmware_version &ixgbe_get_phy_firmware_version_tnx; break; - case ixgbe_phy_aq: - phy->ops.get_firmware_version - &ixgbe_get_phy_firmware_version_aq; - break; case ixgbe_phy_nl: phy->ops.reset = &ixgbe_reset_phy_nl; @@ -226,19 +181,12 @@ static s32 ixgbe_get_link_capabilities_82598(struct ixgbe_hw *hw, bool *autoneg) { s32 status = 0; - u32 autoc = 0; /* * Determine link capabilities based on the stored value of AUTOC, - * which represents EEPROM defaults. If AUTOC value has not been - * stored, use the current register value. + * which represents EEPROM defaults. */ - if (hw->mac.orig_link_settings_stored) - autoc = hw->mac.orig_autoc; - else - autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); - - switch (autoc & IXGBE_AUTOC_LMS_MASK) { + switch (hw->mac.orig_autoc & IXGBE_AUTOC_LMS_MASK) { case IXGBE_AUTOC_LMS_1G_LINK_NO_AN: *speed = IXGBE_LINK_SPEED_1GB_FULL; *autoneg = false; @@ -257,9 +205,9 @@ static s32 ixgbe_get_link_capabilities_82598(struct ixgbe_hw *hw, case IXGBE_AUTOC_LMS_KX4_AN: case IXGBE_AUTOC_LMS_KX4_AN_1G_AN: *speed = IXGBE_LINK_SPEED_UNKNOWN; - if (autoc & IXGBE_AUTOC_KX4_SUPP) + if (hw->mac.orig_autoc & IXGBE_AUTOC_KX4_SUPP) *speed |= IXGBE_LINK_SPEED_10GB_FULL; - if (autoc & IXGBE_AUTOC_KX_SUPP) + if (hw->mac.orig_autoc & IXGBE_AUTOC_KX_SUPP) *speed |= IXGBE_LINK_SPEED_1GB_FULL; *autoneg = true; break; @@ -273,6 +221,38 @@ static s32 ixgbe_get_link_capabilities_82598(struct ixgbe_hw *hw, } /** + * ixgbe_get_copper_link_capabilities_82598 - Determines link capabilities + * @hw: pointer to hardware structure + * @speed: pointer to link speed + * @autoneg: boolean auto-negotiation value + * + * Determines the link capabilities by reading the AUTOC register. + **/ +s32 ixgbe_get_copper_link_capabilities_82598(struct ixgbe_hw *hw, + ixgbe_link_speed *speed, + bool *autoneg) +{ + s32 status = IXGBE_ERR_LINK_SETUP; + u16 speed_ability; + + *speed = 0; + *autoneg = true; + + status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_PHY_SPEED_ABILITY, + IXGBE_MDIO_PMA_PMD_DEV_TYPE, + &speed_ability); + + if (status == 0) { + if (speed_ability & IXGBE_MDIO_PHY_SPEED_10G) + *speed |= IXGBE_LINK_SPEED_10GB_FULL; + if (speed_ability & IXGBE_MDIO_PHY_SPEED_1G) + *speed |= IXGBE_LINK_SPEED_1GB_FULL; + } + + return status; +} + +/** * ixgbe_get_media_type_82598 - Determines media type * @hw: pointer to hardware structure * @@ -282,18 +262,9 @@ static enum ixgbe_media_type ixgbe_get_media_type_82598(struct ixgbe_hw *hw) { enum ixgbe_media_type media_type; - /* Detect if there is a copper PHY attached. 
*/ - if (hw->phy.type == ixgbe_phy_cu_unknown || - hw->phy.type == ixgbe_phy_tn || - hw->phy.type == ixgbe_phy_aq) { - media_type = ixgbe_media_type_copper; - goto out; - } - /* Media type for I82598 is based on device ID */ switch (hw->device_id) { case IXGBE_DEV_ID_82598: - case IXGBE_DEV_ID_82598_BX: /* Default device ID is mezzanine card KX/KX4 */ media_type = ixgbe_media_type_backplane; break; @@ -314,7 +285,7 @@ static enum ixgbe_media_type ixgbe_get_media_type_82598(struct ixgbe_hw *hw) media_type = ixgbe_media_type_unknown; break; } -out: + return media_type; } @@ -332,17 +303,6 @@ s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num) u32 rmcs_reg; u32 reg; -#ifdef CONFIG_DCB - if (hw->fc.requested_mode == ixgbe_fc_pfc) - goto out; - -#endif /* CONFIG_DCB */ - /* Negotiate the fc mode to use */ - ret_val = ixgbe_fc_autoneg(hw); - if (ret_val) - goto out; - - /* Disable any previous flow control settings */ fctrl_reg = IXGBE_READ_REG(hw, IXGBE_FCTRL); fctrl_reg &= ~(IXGBE_FCTRL_RFCE | IXGBE_FCTRL_RPFCE); @@ -354,19 +314,14 @@ s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num) * 0: Flow control is completely disabled * 1: Rx flow control is enabled (we can receive pause frames, * but not send pause frames). - * 2: Tx flow control is enabled (we can send pause frames but + * 2: Tx flow control is enabled (we can send pause frames but * we do not support receiving pause frames). * 3: Both Rx and Tx flow control (symmetric) are enabled. -#ifdef CONFIG_DCB - * 4: Priority Flow Control is enabled. -#endif * other: Invalid. */ switch (hw->fc.current_mode) { case ixgbe_fc_none: - /* Flow control is disabled by software override or autoneg. - * The code below will actually disable it in the HW. - */ + /* Flow control completely disabled by software override. */ break; case ixgbe_fc_rx_pause: /* @@ -391,11 +346,6 @@ s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num) fctrl_reg |= IXGBE_FCTRL_RFCE; rmcs_reg |= IXGBE_RMCS_TFCE_802_3X; break; -#ifdef CONFIG_DCB - case ixgbe_fc_pfc: - goto out; - break; -#endif /* CONFIG_DCB */ default: hw_dbg(hw, "Flow control param set incorrectly\n"); ret_val = -IXGBE_ERR_CONFIG; @@ -403,8 +353,7 @@ s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num) break; } - /* Set 802.3x based flow control settings. */ - fctrl_reg |= IXGBE_FCTRL_DPF; + /* Enable 802.3x based flow control settings. */ IXGBE_WRITE_REG(hw, IXGBE_FCTRL, fctrl_reg); IXGBE_WRITE_REG(hw, IXGBE_RMCS, rmcs_reg); @@ -423,7 +372,7 @@ s32 ixgbe_fc_enable_82598(struct ixgbe_hw *hw, s32 packetbuf_num) } /* Configure pause time (2 TCs per register) */ - reg = IXGBE_READ_REG(hw, IXGBE_FCTTV(packetbuf_num / 2)); + reg = IXGBE_READ_REG(hw, IXGBE_FCTTV(packetbuf_num)); if ((packetbuf_num & 1) == 0) reg = (reg & 0xFFFF0000) | hw->fc.pause_time; else @@ -437,6 +386,64 @@ out: } /** + * ixgbe_setup_fc_82598 - Set up flow control + * @hw: pointer to hardware structure + * + * Sets up flow control. + **/ +s32 ixgbe_setup_fc_82598(struct ixgbe_hw *hw, s32 packetbuf_num) +{ + s32 ret_val = 0; + + /* Validate the packetbuf configuration */ + if (packetbuf_num < 0 || packetbuf_num > 7) { + hw_dbg(hw, "Invalid packet buffer number [%d], expected range is" + " 0-7\n", packetbuf_num); + ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; + goto out; + } + + /* + * Validate the water mark configuration. Zero water marks are invalid + * because it causes the controller to just blast out fc packets. 
+ */ + if (!hw->fc.low_water || !hw->fc.high_water || !hw->fc.pause_time) { + hw_dbg(hw, "Invalid water mark configuration\n"); + ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; + goto out; + } + + /* + * Validate the requested mode. Strict IEEE mode does not allow + * ixgbe_fc_rx_pause because it will cause testing anomalies. + */ + if (hw->fc.strict_ieee && hw->fc.requested_mode == ixgbe_fc_rx_pause) { + hw_dbg(hw, "ixgbe_fc_rx_pause not valid in strict IEEE mode\n"); + ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; + goto out; + } + + /* + * 10gig parts do not have a word in the EEPROM to determine the + * default flow control setting, so we explicitly set it to full. + */ + if (hw->fc.requested_mode == ixgbe_fc_default) + hw->fc.requested_mode = ixgbe_fc_full; + + /* + * Save off the requested flow control mode for use later. Depending + * on the link partner''s capabilities, we may or may not use this mode. + */ + hw->fc.current_mode = hw->fc.requested_mode; + + + ret_val = ixgbe_fc_enable_82598(hw, packetbuf_num); + +out: + return ret_val; +} + +/** * ixgbe_setup_mac_link_82598 - Configures MAC link settings * @hw: pointer to hardware structure * @@ -475,6 +482,9 @@ static s32 ixgbe_setup_mac_link_82598(struct ixgbe_hw *hw) } } + /* Set up flow control */ + status = ixgbe_setup_fc_82598(hw, 0); + /* Add delay to filter out noises during initial link setup */ msleep(50); @@ -562,11 +572,6 @@ static s32 ixgbe_check_mac_link_82598(struct ixgbe_hw *hw, else *speed = IXGBE_LINK_SPEED_1GB_FULL; - /* if link is down, zero out the current_mode */ - if (*link_up == false) { - hw->fc.current_mode = ixgbe_fc_none; - hw->fc.fc_was_autonegged = false; - } out: return 0; } @@ -636,10 +641,22 @@ static s32 ixgbe_setup_mac_link_speed_82598(struct ixgbe_hw *hw, static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw) { s32 status; + u32 curr_autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); + u32 autoc = curr_autoc; /* Restart autonegotiation on PHY */ status = hw->phy.ops.setup_link(hw); + /* Set MAC to KX/KX4 autoneg, which defaults to Parallel detection */ + autoc &= ~IXGBE_AUTOC_LMS_MASK; + autoc |= IXGBE_AUTOC_LMS_KX4_AN; + + autoc &= ~(IXGBE_AUTOC_1G_PMA_PMD_MASK | IXGBE_AUTOC_10G_PMA_PMD_MASK); + autoc |= (IXGBE_AUTOC_10G_KX4 | IXGBE_AUTOC_1G_KX); + + if (autoc != curr_autoc) + IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc); + /* Set up MAC */ ixgbe_setup_mac_link_82598(hw); @@ -661,10 +678,23 @@ static s32 ixgbe_setup_copper_link_speed_82598(struct ixgbe_hw *hw, bool autoneg_wait_to_complete) { s32 status; + u32 curr_autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); + u32 autoc = curr_autoc; /* Setup the PHY according to input speed */ status = hw->phy.ops.setup_link_speed(hw, speed, autoneg, autoneg_wait_to_complete); + + /* Set MAC to KX/KX4 autoneg, which defaults to Parallel detection */ + autoc &= ~IXGBE_AUTOC_LMS_MASK; + autoc |= IXGBE_AUTOC_LMS_KX4_AN; + + autoc &= ~(IXGBE_AUTOC_1G_PMA_PMD_MASK | IXGBE_AUTOC_10G_PMA_PMD_MASK); + autoc |= (IXGBE_AUTOC_10G_KX4 | IXGBE_AUTOC_1G_KX); + + if (autoc != curr_autoc) + IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc); + /* Set up MAC */ ixgbe_setup_mac_link_82598(hw); @@ -682,7 +712,6 @@ static s32 ixgbe_setup_copper_link_speed_82598(struct ixgbe_hw *hw, static s32 ixgbe_reset_hw_82598(struct ixgbe_hw *hw) { s32 status = 0; - s32 phy_status = 0; u32 ctrl; u32 gheccr; u32 i; @@ -726,26 +755,14 @@ static s32 ixgbe_reset_hw_82598(struct ixgbe_hw *hw) } /* Reset PHY */ - if (hw->phy.reset_disable == false) { - /* PHY ops must be identified and initialized prior to reset */ - - /* Init 
PHY and function pointers, perform SFP setup */ - phy_status = hw->phy.ops.init(hw); - if (phy_status == IXGBE_ERR_SFP_NOT_SUPPORTED) - goto reset_hw_out; - else if (phy_status == IXGBE_ERR_SFP_NOT_PRESENT) - goto no_phy_reset; - + if (hw->phy.reset_disable == false) hw->phy.ops.reset(hw); - } -no_phy_reset: /* * Prevent the PCI-E bus from from hanging by disabling PCI-E master * access and verify no pending requests before reset */ - status = ixgbe_disable_pcie_master(hw); - if (status != 0) { + if (ixgbe_disable_pcie_master(hw) != 0) { status = IXGBE_ERR_MASTER_REQUESTS_PENDING; hw_dbg(hw, "PCI-E Master disable polling has failed.\n"); } @@ -785,23 +802,14 @@ no_phy_reset: if (hw->mac.orig_link_settings_stored == false) { hw->mac.orig_autoc = autoc; hw->mac.orig_link_settings_stored = true; - } else if (autoc != hw->mac.orig_autoc) + } + else if (autoc != hw->mac.orig_autoc) { IXGBE_WRITE_REG(hw, IXGBE_AUTOC, hw->mac.orig_autoc); + } /* Store the permanent mac address */ hw->mac.ops.get_mac_addr(hw, hw->mac.perm_addr); - /* - * Store MAC address from RAR0, clear receive address registers, and - * clear the multicast table - */ - hw->mac.ops.init_rx_addrs(hw); - - - -reset_hw_out: - if (phy_status != 0) - status = phy_status; return status; } @@ -918,6 +926,61 @@ static s32 ixgbe_clear_vfta_82598(struct ixgbe_hw *hw) } /** + * ixgbe_blink_led_start_82598 - Blink LED based on index. + * @hw: pointer to hardware structure + * @index: led number to blink + **/ +static s32 ixgbe_blink_led_start_82598(struct ixgbe_hw *hw, u32 index) +{ + ixgbe_link_speed speed = 0; + bool link_up = 0; + u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC); + u32 led_reg = IXGBE_READ_REG(hw, IXGBE_LEDCTL); + + /* + * Link must be up to auto-blink the LEDs on the 82598EB MAC; + * force it if link is down. + */ + hw->mac.ops.check_link(hw, &speed, &link_up, false); + + if (!link_up) { + autoc_reg |= IXGBE_AUTOC_FLU; + IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg); + msleep(10); + } + + led_reg &= ~IXGBE_LED_MODE_MASK(index); + led_reg |= IXGBE_LED_BLINK(index); + IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, led_reg); + IXGBE_WRITE_FLUSH(hw); + + return 0; +} + +/** + * ixgbe_blink_led_stop_82598 - Stop blinking LED based on index. 
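(The LEDCTL accesses above all follow one mask/shift idiom over per-index LED fields. A hypothetical standalone illustration; the field width and bit positions here are assumptions for the sketch, the real macros live in ixgbe_type.h:

    #include <stdint.h>

    /* Assumed layout: one control byte per LED index. */
    #define LED_MODE_SHIFT(i) (8u * (i))
    #define LED_MODE_MASK(i)  (0x0Fu << LED_MODE_SHIFT(i))

    /* Replace the mode field for LED 'i' without disturbing the others. */
    static uint32_t led_set_mode(uint32_t ledctl, unsigned i, uint32_t mode)
    {
        ledctl &= ~LED_MODE_MASK(i);
        ledctl |= (mode << LED_MODE_SHIFT(i)) & LED_MODE_MASK(i);
        return ledctl;
    }
)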
+ * @hw: pointer to hardware structure + * @index: led number to stop blinking + **/ +static s32 ixgbe_blink_led_stop_82598(struct ixgbe_hw *hw, u32 index) +{ + u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC); + u32 led_reg = IXGBE_READ_REG(hw, IXGBE_LEDCTL); + + autoc_reg &= ~IXGBE_AUTOC_FLU; + autoc_reg |= IXGBE_AUTOC_AN_RESTART; + IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg); + + led_reg &= ~IXGBE_LED_MODE_MASK(index); + led_reg &= ~IXGBE_LED_BLINK(index); + led_reg |= IXGBE_LED_LINK_ACTIVE << IXGBE_LED_MODE_SHIFT(index); + IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, led_reg); + IXGBE_WRITE_FLUSH(hw); + + return 0; +} + +/** * ixgbe_read_analog_reg8_82598 - Reads 8 bit Atlas analog register * @hw: pointer to hardware structure * @reg: analog register to read @@ -1030,56 +1093,33 @@ out: u32 ixgbe_get_supported_physical_layer_82598(struct ixgbe_hw *hw) { u32 physical_layer = IXGBE_PHYSICAL_LAYER_UNKNOWN; - u32 autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); - u32 pma_pmd_10g = autoc & IXGBE_AUTOC_10G_PMA_PMD_MASK; - u32 pma_pmd_1g = autoc & IXGBE_AUTOC_1G_PMA_PMD_MASK; - u16 ext_ability = 0; - - hw->phy.ops.identify(hw); - - /* Copper PHY must be checked before AUTOC LMS to determine correct - * physical layer because 10GBase-T PHYs use LMS = KX4/KX */ - if (hw->phy.type == ixgbe_phy_tn || - hw->phy.type == ixgbe_phy_cu_unknown) { - hw->phy.ops.read_reg(hw, IXGBE_MDIO_PHY_EXT_ABILITY, - IXGBE_MDIO_PMA_PMD_DEV_TYPE, &ext_ability); - if (ext_ability & IXGBE_MDIO_PHY_10GBASET_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_10GBASE_T; - if (ext_ability & IXGBE_MDIO_PHY_1000BASET_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_1000BASE_T; - if (ext_ability & IXGBE_MDIO_PHY_100BASETX_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_100BASE_TX; - goto out; - } - switch (autoc & IXGBE_AUTOC_LMS_MASK) { - case IXGBE_AUTOC_LMS_1G_AN: - case IXGBE_AUTOC_LMS_1G_LINK_NO_AN: - if (pma_pmd_1g == IXGBE_AUTOC_1G_KX) - physical_layer = IXGBE_PHYSICAL_LAYER_1000BASE_KX; - else - physical_layer = IXGBE_PHYSICAL_LAYER_1000BASE_BX; + switch (hw->device_id) { + case IXGBE_DEV_ID_82598: + /* Default device ID is mezzanine card KX/KX4 */ + physical_layer = (IXGBE_PHYSICAL_LAYER_10GBASE_KX4 | + IXGBE_PHYSICAL_LAYER_1000BASE_KX); break; - case IXGBE_AUTOC_LMS_10G_LINK_NO_AN: - if (pma_pmd_10g == IXGBE_AUTOC_10G_CX4) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_CX4; - else if (pma_pmd_10g == IXGBE_AUTOC_10G_KX4) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_KX4; - else /* XAUI */ - physical_layer = IXGBE_PHYSICAL_LAYER_UNKNOWN; + case IXGBE_DEV_ID_82598EB_CX4: + case IXGBE_DEV_ID_82598_CX4_DUAL_PORT: + physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_CX4; break; - case IXGBE_AUTOC_LMS_KX4_AN: - case IXGBE_AUTOC_LMS_KX4_AN_1G_AN: - if (autoc & IXGBE_AUTOC_KX_SUPP) - physical_layer |= IXGBE_PHYSICAL_LAYER_1000BASE_KX; - if (autoc & IXGBE_AUTOC_KX4_SUPP) - physical_layer |= IXGBE_PHYSICAL_LAYER_10GBASE_KX4; + case IXGBE_DEV_ID_82598_DA_DUAL_PORT: + physical_layer = IXGBE_PHYSICAL_LAYER_SFP_PLUS_CU; break; - default: + case IXGBE_DEV_ID_82598AF_DUAL_PORT: + case IXGBE_DEV_ID_82598AF_SINGLE_PORT: + case IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM: + physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_SR; break; - } - - if (hw->phy.type == ixgbe_phy_nl) { + case IXGBE_DEV_ID_82598EB_XF_LR: + physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_LR; + break; + case IXGBE_DEV_ID_82598AT: + physical_layer = (IXGBE_PHYSICAL_LAYER_10GBASE_T | + IXGBE_PHYSICAL_LAYER_1000BASE_T); + break; + case IXGBE_DEV_ID_82598EB_SFP_LOM: hw->phy.ops.identify_sfp(hw); switch 
(hw->phy.sfp_type) { @@ -1096,57 +1136,12 @@ u32 ixgbe_get_supported_physical_layer_82598(struct ixgbe_hw *hw) physical_layer = IXGBE_PHYSICAL_LAYER_UNKNOWN; break; } - } - - switch (hw->device_id) { - case IXGBE_DEV_ID_82598_DA_DUAL_PORT: - physical_layer = IXGBE_PHYSICAL_LAYER_SFP_PLUS_CU; - break; - case IXGBE_DEV_ID_82598AF_DUAL_PORT: - case IXGBE_DEV_ID_82598AF_SINGLE_PORT: - case IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM: - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_SR; - break; - case IXGBE_DEV_ID_82598EB_XF_LR: - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_LR; break; + default: + physical_layer = IXGBE_PHYSICAL_LAYER_UNKNOWN; break; } -out: return physical_layer; } - -/** - * ixgbe_set_lan_id_multi_port_pcie_82598 - Set LAN id for PCIe multiple - * port devices. - * @hw: pointer to the HW structure - * - * Calls common function and corrects issue with some single port devices - * that enable LAN1 but not LAN0. - **/ -void ixgbe_set_lan_id_multi_port_pcie_82598(struct ixgbe_hw *hw) -{ - struct ixgbe_bus_info *bus = &hw->bus; - u16 pci_gen, pci_ctrl2; - - ixgbe_set_lan_id_multi_port_pcie(hw); - - /* check if LAN0 is disabled */ - hw->eeprom.ops.read(hw, IXGBE_PCIE_GENERAL_PTR, &pci_gen); - if ((pci_gen != 0) && (pci_gen != 0xFFFF)) { - - hw->eeprom.ops.read(hw, pci_gen + IXGBE_PCIE_CTRL2, &pci_ctrl2); - - /* if LAN0 is completely disabled force function to 0 */ - if ((pci_ctrl2 & IXGBE_PCIE_CTRL2_LAN_DISABLE) && - !(pci_ctrl2 & IXGBE_PCIE_CTRL2_DISABLE_SELECT) && - !(pci_ctrl2 & IXGBE_PCIE_CTRL2_DUMMY_ENABLE)) { - - bus->func = 0; - } - } -} - - diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c deleted file mode 100644 index 8040d0b..0000000 --- a/drivers/net/ixgbe/ixgbe_82599.c +++ /dev/null @@ -1,2626 +0,0 @@ -/******************************************************************************* - - Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. - - This program is free software; you can redistribute it and/or modify it - under the terms and conditions of the GNU General Public License, - version 2, as published by the Free Software Foundation. - - This program is distributed in the hope it will be useful, but WITHOUT - ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - more details. - - You should have received a copy of the GNU General Public License along with - this program; if not, write to the Free Software Foundation, Inc., - 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. - - The full GNU General Public License is included in this distribution in - the file called "COPYING". - - Contact Information: - e1000-devel Mailing List <e1000-devel@lists.sourceforge.net> - Intel Corporation, 5200 N.E. 
Elam Young Parkway, Hillsboro, OR 97124-6497 - -*******************************************************************************/ - -#include "ixgbe_type.h" -#include "ixgbe_api.h" -#include "ixgbe_common.h" -#include "ixgbe_phy.h" - -u32 ixgbe_get_pcie_msix_count_82599(struct ixgbe_hw *hw); -s32 ixgbe_init_ops_82599(struct ixgbe_hw *hw); -s32 ixgbe_get_link_capabilities_82599(struct ixgbe_hw *hw, - ixgbe_link_speed *speed, - bool *autoneg); -enum ixgbe_media_type ixgbe_get_media_type_82599(struct ixgbe_hw *hw); -s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw); -s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw, - ixgbe_link_speed speed, bool autoneg, - bool autoneg_wait_to_complete); -s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw); -s32 ixgbe_check_mac_link_82599(struct ixgbe_hw *hw, - ixgbe_link_speed *speed, - bool *link_up, bool link_up_wait_to_complete); -s32 ixgbe_setup_mac_link_speed_82599(struct ixgbe_hw *hw, - ixgbe_link_speed speed, - bool autoneg, - bool autoneg_wait_to_complete); -static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw); -static s32 ixgbe_setup_copper_link_speed_82599(struct ixgbe_hw *hw, - ixgbe_link_speed speed, - bool autoneg, - bool autoneg_wait_to_complete); -s32 ixgbe_setup_sfp_modules_82599(struct ixgbe_hw *hw); -void ixgbe_init_mac_link_ops_82599(struct ixgbe_hw *hw); -s32 ixgbe_reset_hw_82599(struct ixgbe_hw *hw); -s32 ixgbe_set_vmdq_82599(struct ixgbe_hw *hw, u32 rar, u32 vmdq); -s32 ixgbe_clear_vmdq_82599(struct ixgbe_hw *hw, u32 rar, u32 vmdq); -s32 ixgbe_insert_mac_addr_82599(struct ixgbe_hw *hw, u8 *addr, u32 vmdq); -s32 ixgbe_set_vfta_82599(struct ixgbe_hw *hw, u32 vlan, - u32 vind, bool vlan_on); -s32 ixgbe_clear_vfta_82599(struct ixgbe_hw *hw); -s32 ixgbe_init_uta_tables_82599(struct ixgbe_hw *hw); -s32 ixgbe_read_analog_reg8_82599(struct ixgbe_hw *hw, u32 reg, u8 *val); -s32 ixgbe_write_analog_reg8_82599(struct ixgbe_hw *hw, u32 reg, u8 val); -s32 ixgbe_start_hw_rev_1_82599(struct ixgbe_hw *hw); -s32 ixgbe_identify_phy_82599(struct ixgbe_hw *hw); -s32 ixgbe_init_phy_ops_82599(struct ixgbe_hw *hw); -u32 ixgbe_get_supported_physical_layer_82599(struct ixgbe_hw *hw); -s32 ixgbe_enable_rx_dma_82599(struct ixgbe_hw *hw, u32 regval); -s32 ixgbe_get_san_mac_addr_offset_82599(struct ixgbe_hw *hw, - u16 *san_mac_offset); -s32 ixgbe_get_san_mac_addr_82599(struct ixgbe_hw *hw, u8 *san_mac_addr); -s32 ixgbe_set_san_mac_addr_82599(struct ixgbe_hw *hw, u8 *san_mac_addr); -s32 ixgbe_get_device_caps_82599(struct ixgbe_hw *hw, u16 *device_caps); -static s32 ixgbe_verify_fw_version_82599(struct ixgbe_hw *hw); - - -void ixgbe_init_mac_link_ops_82599(struct ixgbe_hw *hw) -{ - struct ixgbe_mac_info *mac = &hw->mac; - - if (hw->phy.multispeed_fiber) { - /* Set up dual speed SFP+ support */ - mac->ops.setup_link - &ixgbe_setup_mac_link_multispeed_fiber; - mac->ops.setup_link_speed - &ixgbe_setup_mac_link_speed_multispeed_fiber; - } else { - mac->ops.setup_link - &ixgbe_setup_mac_link_82599; - mac->ops.setup_link_speed - &ixgbe_setup_mac_link_speed_82599; - } -} - -/** - * ixgbe_init_phy_ops_82599 - PHY/SFP specific init - * @hw: pointer to hardware structure - * - * Initialize any function pointers that were not able to be - * set during init_shared_code because the PHY/SFP type was - * not known. Perform the SFP init if necessary. 
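(The function-pointer selection in ixgbe_init_mac_link_ops_82599() above is the standard ops-table pattern: probe the hardware once, then dispatch indirectly ever after. A toy sketch of the shape, with illustrative names:

    #include <stdbool.h>

    struct link_ops {
        int (*setup_link)(void *hw);
    };

    static int setup_link_fixed(void *hw)      { (void)hw; return 0; }
    static int setup_link_multispeed(void *hw) { (void)hw; return 0; }

    /* Pick the implementation once, based on the probed module type. */
    static void init_link_ops(struct link_ops *ops, bool multispeed_fiber)
    {
        ops->setup_link = multispeed_fiber ? setup_link_multispeed
                                           : setup_link_fixed;
    }
)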
- * - **/ -s32 ixgbe_init_phy_ops_82599(struct ixgbe_hw *hw) -{ - struct ixgbe_mac_info *mac = &hw->mac; - struct ixgbe_phy_info *phy = &hw->phy; - s32 ret_val = 0; - - /* Identify the PHY or SFP module */ - ret_val = phy->ops.identify(hw); - if (ret_val == IXGBE_ERR_SFP_NOT_SUPPORTED) - goto init_phy_ops_out; - - /* Setup function pointers based on detected SFP module and speeds */ - ixgbe_init_mac_link_ops_82599(hw); - if (hw->phy.sfp_type != ixgbe_sfp_type_unknown) - hw->phy.ops.reset = NULL; - - /* If copper media, overwrite with copper function pointers */ - if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper) { - mac->ops.setup_link = &ixgbe_setup_copper_link_82599; - mac->ops.setup_link_speed - &ixgbe_setup_copper_link_speed_82599; - mac->ops.get_link_capabilities - &ixgbe_get_copper_link_capabilities_generic; - } - - /* Set necessary function pointers based on phy type */ - switch (hw->phy.type) { - case ixgbe_phy_tn: - phy->ops.check_link = &ixgbe_check_phy_link_tnx; - phy->ops.get_firmware_version - &ixgbe_get_phy_firmware_version_tnx; - break; - case ixgbe_phy_aq: - phy->ops.get_firmware_version - &ixgbe_get_phy_firmware_version_aq; - break; - default: - break; - } -init_phy_ops_out: - return ret_val; -} - -s32 ixgbe_setup_sfp_modules_82599(struct ixgbe_hw *hw) -{ - s32 ret_val = 0; - u16 list_offset, data_offset, data_value; - - if (hw->phy.sfp_type != ixgbe_sfp_type_unknown) { - ixgbe_init_mac_link_ops_82599(hw); - - hw->phy.ops.reset = NULL; - - ret_val = ixgbe_get_sfp_init_sequence_offsets(hw, &list_offset, - &data_offset); - if (ret_val != 0) - goto setup_sfp_out; - - /* PHY config will finish before releasing the semaphore */ - ret_val = ixgbe_acquire_swfw_sync(hw, IXGBE_GSSR_MAC_CSR_SM); - if (ret_val != 0) { - ret_val = IXGBE_ERR_SWFW_SYNC; - goto setup_sfp_out; - } - - hw->eeprom.ops.read(hw, ++data_offset, &data_value); - while (data_value != 0xffff) { - IXGBE_WRITE_REG(hw, IXGBE_CORECTL, data_value); - IXGBE_WRITE_FLUSH(hw); - hw->eeprom.ops.read(hw, ++data_offset, &data_value); - } - /* Now restart DSP by setting Restart_AN */ - IXGBE_WRITE_REG(hw, IXGBE_AUTOC, - (IXGBE_READ_REG(hw, IXGBE_AUTOC) | IXGBE_AUTOC_AN_RESTART)); - - /* Release the semaphore */ - ixgbe_release_swfw_sync(hw, IXGBE_GSSR_MAC_CSR_SM); - /* Delay obtaining semaphore again to allow FW access */ - msleep(hw->eeprom.semaphore_delay); - } - -setup_sfp_out: - return ret_val; -} - -/** - * ixgbe_get_pcie_msix_count_82599 - Gets MSI-X vector count - * @hw: pointer to hardware structure - * - * Read PCIe configuration space, and get the MSI-X vector count from - * the capabilities table. - **/ -u32 ixgbe_get_pcie_msix_count_82599(struct ixgbe_hw *hw) -{ - u32 msix_count = 64; - - if (hw->mac.msix_vectors_from_pcie) { - msix_count = IXGBE_READ_PCIE_WORD(hw, - IXGBE_PCIE_MSIX_82599_CAPS); - msix_count &= IXGBE_PCIE_MSIX_TBL_SZ_MASK; - - /* MSI-X count is zero-based in HW, so increment to give - * proper value */ - msix_count++; - } - - return msix_count; -} - -/** - * ixgbe_init_ops_82599 - Inits func ptrs and MAC type - * @hw: pointer to hardware structure - * - * Initialize the function pointers and assign the MAC type for 82599. - * Does not touch the hardware. 
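(Note the msix_count++ in ixgbe_get_pcie_msix_count_82599() above: the PCIe MSI-X table-size field is zero-based, so a raw field value of N means N + 1 vectors. A standalone restatement; the 11-bit mask matches the MSI-X capability's Message Control layout:

    #include <stdint.h>

    #define MSIX_TBL_SZ_MASK 0x7FFu  /* bits 10:0 of MSI-X Message Control */

    static unsigned msix_vector_count(uint16_t msix_control)
    {
        /* Zero-based in hardware: a raw field of 63 -> 64 usable vectors. */
        return (msix_control & MSIX_TBL_SZ_MASK) + 1u;
    }
)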
- **/ - -s32 ixgbe_init_ops_82599(struct ixgbe_hw *hw) -{ - struct ixgbe_mac_info *mac = &hw->mac; - struct ixgbe_phy_info *phy = &hw->phy; - s32 ret_val; - - ret_val = ixgbe_init_phy_ops_generic(hw); - ret_val = ixgbe_init_ops_generic(hw); - - /* PHY */ - phy->ops.identify = &ixgbe_identify_phy_82599; - phy->ops.init = &ixgbe_init_phy_ops_82599; - - /* MAC */ - mac->ops.reset_hw = &ixgbe_reset_hw_82599; - mac->ops.get_media_type = &ixgbe_get_media_type_82599; - mac->ops.get_supported_physical_layer - &ixgbe_get_supported_physical_layer_82599; - mac->ops.enable_rx_dma = &ixgbe_enable_rx_dma_82599; - mac->ops.read_analog_reg8 = &ixgbe_read_analog_reg8_82599; - mac->ops.write_analog_reg8 = &ixgbe_write_analog_reg8_82599; - mac->ops.start_hw = &ixgbe_start_hw_rev_1_82599; - mac->ops.get_san_mac_addr = &ixgbe_get_san_mac_addr_82599; - mac->ops.set_san_mac_addr = &ixgbe_set_san_mac_addr_82599; - mac->ops.get_device_caps = &ixgbe_get_device_caps_82599; - - /* RAR, Multicast, VLAN */ - mac->ops.set_vmdq = &ixgbe_set_vmdq_82599; - mac->ops.clear_vmdq = &ixgbe_clear_vmdq_82599; - mac->ops.insert_mac_addr = &ixgbe_insert_mac_addr_82599; - mac->rar_highwater = 1; - mac->ops.set_vfta = &ixgbe_set_vfta_82599; - mac->ops.clear_vfta = &ixgbe_clear_vfta_82599; - mac->ops.init_uta_tables = &ixgbe_init_uta_tables_82599; - mac->ops.setup_sfp = &ixgbe_setup_sfp_modules_82599; - - /* Link */ - mac->ops.get_link_capabilities = &ixgbe_get_link_capabilities_82599; - mac->ops.check_link = &ixgbe_check_mac_link_82599; - ixgbe_init_mac_link_ops_82599(hw); - - mac->mcft_size = 128; - mac->vft_size = 128; - mac->num_rar_entries = 128; - mac->max_tx_queues = 128; - mac->max_rx_queues = 128; - mac->max_msix_vectors = ixgbe_get_pcie_msix_count_82599(hw); - - - return ret_val; -} - -/** - * ixgbe_get_link_capabilities_82599 - Determines link capabilities - * @hw: pointer to hardware structure - * @speed: pointer to link speed - * @negotiation: true when autoneg or autotry is enabled - * - * Determines the link capabilities by reading the AUTOC register. - **/ -s32 ixgbe_get_link_capabilities_82599(struct ixgbe_hw *hw, - ixgbe_link_speed *speed, - bool *negotiation) -{ - s32 status = 0; - u32 autoc = 0; - - /* - * Determine link capabilities based on the stored value of AUTOC, - * which represents EEPROM defaults. If AUTOC value has not - * been stored, use the current register values. 
- */ - if (hw->mac.orig_link_settings_stored) - autoc = hw->mac.orig_autoc; - else - autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); - - switch (autoc & IXGBE_AUTOC_LMS_MASK) { - case IXGBE_AUTOC_LMS_1G_LINK_NO_AN: - *speed = IXGBE_LINK_SPEED_1GB_FULL; - *negotiation = false; - break; - - case IXGBE_AUTOC_LMS_10G_LINK_NO_AN: - *speed = IXGBE_LINK_SPEED_10GB_FULL; - *negotiation = false; - break; - - case IXGBE_AUTOC_LMS_1G_AN: - *speed = IXGBE_LINK_SPEED_1GB_FULL; - *negotiation = true; - break; - - case IXGBE_AUTOC_LMS_10G_SERIAL: - *speed = IXGBE_LINK_SPEED_10GB_FULL; - *negotiation = false; - break; - - case IXGBE_AUTOC_LMS_KX4_KX_KR: - case IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN: - *speed = IXGBE_LINK_SPEED_UNKNOWN; - if (autoc & IXGBE_AUTOC_KR_SUPP) - *speed |= IXGBE_LINK_SPEED_10GB_FULL; - if (autoc & IXGBE_AUTOC_KX4_SUPP) - *speed |= IXGBE_LINK_SPEED_10GB_FULL; - if (autoc & IXGBE_AUTOC_KX_SUPP) - *speed |= IXGBE_LINK_SPEED_1GB_FULL; - *negotiation = true; - break; - - case IXGBE_AUTOC_LMS_KX4_KX_KR_SGMII: - *speed = IXGBE_LINK_SPEED_100_FULL; - if (autoc & IXGBE_AUTOC_KR_SUPP) - *speed |= IXGBE_LINK_SPEED_10GB_FULL; - if (autoc & IXGBE_AUTOC_KX4_SUPP) - *speed |= IXGBE_LINK_SPEED_10GB_FULL; - if (autoc & IXGBE_AUTOC_KX_SUPP) - *speed |= IXGBE_LINK_SPEED_1GB_FULL; - *negotiation = true; - break; - - case IXGBE_AUTOC_LMS_SGMII_1G_100M: - *speed = IXGBE_LINK_SPEED_1GB_FULL | IXGBE_LINK_SPEED_100_FULL; - *negotiation = false; - break; - - default: - status = IXGBE_ERR_LINK_SETUP; - goto out; - break; - } - - if (hw->phy.multispeed_fiber) { - *speed |= IXGBE_LINK_SPEED_10GB_FULL | - IXGBE_LINK_SPEED_1GB_FULL; - *negotiation = true; - } - -out: - return status; -} - -/** - * ixgbe_get_media_type_82599 - Get media type - * @hw: pointer to hardware structure - * - * Returns the media type (fiber, copper, backplane) - **/ -enum ixgbe_media_type ixgbe_get_media_type_82599(struct ixgbe_hw *hw) -{ - enum ixgbe_media_type media_type; - - /* Detect if there is a copper PHY attached. */ - if (hw->phy.type == ixgbe_phy_cu_unknown || - hw->phy.type == ixgbe_phy_tn || - hw->phy.type == ixgbe_phy_aq) { - media_type = ixgbe_media_type_copper; - goto out; - } - - switch (hw->device_id) { - case IXGBE_DEV_ID_82599_KX4: - case IXGBE_DEV_ID_82599_XAUI_LOM: - /* Default device ID is mezzanine card KX/KX4 */ - media_type = ixgbe_media_type_backplane; - break; - case IXGBE_DEV_ID_82599_SFP: - media_type = ixgbe_media_type_fiber; - break; - default: - media_type = ixgbe_media_type_unknown; - break; - } -out: - return media_type; -} - -/** - * ixgbe_setup_mac_link_82599 - Setup MAC link settings - * @hw: pointer to hardware structure - * - * Configures link settings based on values in the ixgbe_hw struct. - * Restarts the link. Performs autonegotiation if needed. 
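(The switch above is easier to see stripped down: the LMS field either fixes one speed with no negotiation, or selects a negotiated set assembled from the KX/KX4/KR support bits. A condensed sketch with illustrative constants:

    #include <stdbool.h>
    #include <stdint.h>

    #define SPEED_1G  0x1u
    #define SPEED_10G 0x2u

    enum lms { LMS_1G_NO_AN, LMS_10G_NO_AN, LMS_KX4_KX_KR };

    struct link_caps { uint32_t speeds; bool autoneg; };

    static struct link_caps decode_lms(enum lms mode, bool kx4, bool kx)
    {
        struct link_caps c = { 0, false };

        switch (mode) {
        case LMS_1G_NO_AN:  c.speeds = SPEED_1G;  break; /* fixed 1G  */
        case LMS_10G_NO_AN: c.speeds = SPEED_10G; break; /* fixed 10G */
        case LMS_KX4_KX_KR:              /* backplane autonegotiation */
            if (kx4) c.speeds |= SPEED_10G;
            if (kx)  c.speeds |= SPEED_1G;
            c.autoneg = true;
            break;
        }
        return c;
    }
)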
- **/ -s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw) -{ - u32 autoc_reg; - u32 links_reg; - u32 i; - s32 status = 0; - - - /* Restart link */ - autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC); - autoc_reg |= IXGBE_AUTOC_AN_RESTART; - IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg); - - /* Only poll for autoneg to complete if specified to do so */ - if (hw->phy.autoneg_wait_to_complete) { - if ((autoc_reg & IXGBE_AUTOC_LMS_MASK) =- IXGBE_AUTOC_LMS_KX4_KX_KR || - (autoc_reg & IXGBE_AUTOC_LMS_MASK) =- IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN - || (autoc_reg & IXGBE_AUTOC_LMS_MASK) =- IXGBE_AUTOC_LMS_KX4_KX_KR_SGMII) { - links_reg = 0; /* Just in case Autoneg time = 0 */ - for (i = 0; i < IXGBE_AUTO_NEG_TIME; i++) { - links_reg = IXGBE_READ_REG(hw, IXGBE_LINKS); - if (links_reg & IXGBE_LINKS_KX_AN_COMP) - break; - msleep(100); - } - if (!(links_reg & IXGBE_LINKS_KX_AN_COMP)) { - status = IXGBE_ERR_AUTONEG_NOT_COMPLETE; - hw_dbg(hw, "Autoneg did not complete.\n"); - } - } - } - - /* Add delay to filter out noises during initial link setup */ - msleep(50); - - return status; -} - -/** - * ixgbe_setup_mac_link_multispeed_fiber - Setup MAC link settings - * @hw: pointer to hardware structure - * - * Configures link settings based on values in the ixgbe_hw struct. - * Restarts the link for multi-speed fiber at 1G speed, if link - * fails at 10G. - * Performs autonegotiation if needed. - **/ -s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw) -{ - s32 status = 0; - ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_82599_AUTONEG; - status = ixgbe_setup_mac_link_speed_multispeed_fiber(hw, - link_speed, true, true); - return status; -} - -/** - * ixgbe_setup_mac_link_speed_multispeed_fiber - Set MAC link speed - * @hw: pointer to hardware structure - * @speed: new link speed - * @autoneg: true if autonegotiation enabled - * @autoneg_wait_to_complete: true when waiting for completion is needed - * - * Set the link speed in the AUTOC register and restarts link. - **/ -s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw, - ixgbe_link_speed speed, bool autoneg, - bool autoneg_wait_to_complete) -{ - s32 status = 0; - ixgbe_link_speed link_speed; - ixgbe_link_speed highest_link_speed = IXGBE_LINK_SPEED_UNKNOWN; - u32 speedcnt = 0; - u32 esdp_reg = IXGBE_READ_REG(hw, IXGBE_ESDP); - u32 i = 0; - bool link_up = false; - bool negotiation; - - /* Mask off requested but non-supported speeds */ - status = ixgbe_get_link_capabilities(hw, &link_speed, &negotiation); - if (status != 0) - goto out; - - speed &= link_speed; - - /* Set autoneg_advertised value based on input link speed */ - hw->phy.autoneg_advertised = 0; - - if (speed & IXGBE_LINK_SPEED_10GB_FULL) - hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_10GB_FULL; - - if (speed & IXGBE_LINK_SPEED_1GB_FULL) - hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_1GB_FULL; - - /* - * When the driver changes the link speeds that it can support, - * it sets autotry_restart to true to indicate that we need to - * initiate a new autotry session with the link partner. To do - * so, we set the speed then disable and re-enable the tx laser, to - * alert the link partner that it also needs to restart autotry on its - * end. This is consistent with true clause 37 autoneg, which also - * involves a loss of signal. - */ - - /* - * Try each speed one by one, highest priority first. We do this in - * software because 10gb fiber doesn''t support speed autonegotiation. 
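(In outline, the software speed walk that follows is: set the module speed, flap the Tx laser so the link partner restarts its autotry, poll briefly for link, then fall back to the next slower speed. A compressed sketch; every helper here is a stand-in for the ESDP/AUTOC manipulation in the real code:

    #include <stdbool.h>

    static void set_module_speed(int speed) { (void)speed; /* ESDP SDP5 */ }
    static void flap_tx_laser(void) { /* SDP3: ~100us dark, ~2ms relight */ }
    static bool poll_link(int tries) { (void)tries; return false; }

    /* Try each speed fastest-first; return the speed that linked, or -1. */
    static int bring_up_multispeed(const int *speeds, int n)
    {
        for (int i = 0; i < n; i++) {
            set_module_speed(speeds[i]);
            flap_tx_laser();
            if (poll_link(5))        /* up to ~500ms at 10G */
                return speeds[i];
        }
        return -1;
    }
)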
- */ - if (speed & IXGBE_LINK_SPEED_10GB_FULL) { - speedcnt++; - highest_link_speed = IXGBE_LINK_SPEED_10GB_FULL; - - /* If we already have link at this speed, just jump out */ - status = ixgbe_check_link(hw, &link_speed, &link_up, false); - if (status != 0) - goto out; - - if ((link_speed == IXGBE_LINK_SPEED_10GB_FULL) && link_up) - goto out; - - /* Set the module link speed */ - esdp_reg |= (IXGBE_ESDP_SDP5_DIR | IXGBE_ESDP_SDP5); - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - - /* Allow module to change analog characteristics (1G->10G) */ - msleep(40); - - status = ixgbe_setup_mac_link_speed_82599( - hw, IXGBE_LINK_SPEED_10GB_FULL, autoneg, - autoneg_wait_to_complete); - if (status != 0) - goto out; - - /* Flap the tx laser if it has not already been done */ - if (hw->mac.autotry_restart) { - /* Disable tx laser; allow 100us to go dark per spec */ - esdp_reg |= IXGBE_ESDP_SDP3; - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - udelay(100); - - /* Enable tx laser; allow 2ms to light up per spec */ - esdp_reg &= ~IXGBE_ESDP_SDP3; - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - msleep(2); - - hw->mac.autotry_restart = false; - } - - /* The controller may take up to 500ms at 10g to acquire link */ - for (i = 0; i < 5; i++) { - /* Wait for the link partner to also set speed */ - msleep(100); - - /* If we have link, just jump out */ - status = ixgbe_check_link(hw, &link_speed, - &link_up, false); - if (status != 0) - goto out; - - if (link_up) - goto out; - } - } - - if (speed & IXGBE_LINK_SPEED_1GB_FULL) { - speedcnt++; - if (highest_link_speed == IXGBE_LINK_SPEED_UNKNOWN) - highest_link_speed = IXGBE_LINK_SPEED_1GB_FULL; - - /* If we already have link at this speed, just jump out */ - status = ixgbe_check_link(hw, &link_speed, &link_up, false); - if (status != 0) - goto out; - - if ((link_speed == IXGBE_LINK_SPEED_1GB_FULL) && link_up) - goto out; - - /* Set the module link speed */ - esdp_reg &= ~IXGBE_ESDP_SDP5; - esdp_reg |= IXGBE_ESDP_SDP5_DIR; - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - - /* Allow module to change analog characteristics (10G->1G) */ - msleep(40); - - status = ixgbe_setup_mac_link_speed_82599( - hw, IXGBE_LINK_SPEED_1GB_FULL, autoneg, - autoneg_wait_to_complete); - if (status != 0) - goto out; - - /* Flap the tx laser if it has not already been done */ - if (hw->mac.autotry_restart) { - /* Disable tx laser; allow 100us to go dark per spec */ - esdp_reg |= IXGBE_ESDP_SDP3; - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - udelay(100); - - /* Enable tx laser; allow 2ms to light up per spec */ - esdp_reg &= ~IXGBE_ESDP_SDP3; - IXGBE_WRITE_REG(hw, IXGBE_ESDP, esdp_reg); - msleep(2); - - hw->mac.autotry_restart = false; - } - - /* Wait for the link partner to also set speed */ - msleep(100); - - /* If we have link, just jump out */ - status = ixgbe_check_link(hw, &link_speed, &link_up, false); - if (status != 0) - goto out; - - if (link_up) - goto out; - } - - /* - * We didn''t get link. Configure back to the highest speed we tried, - * (if there was more than one). We call ourselves back with just the - * single highest speed that the user requested. 
- */
-	if (speedcnt > 1)
-		status = ixgbe_setup_mac_link_speed_multispeed_fiber(hw,
-			highest_link_speed, autoneg, autoneg_wait_to_complete);
-
-out:
-	return status;
-}
-
-/**
- * ixgbe_check_mac_link_82599 - Determine link and speed status
- * @hw: pointer to hardware structure
- * @speed: pointer to link speed
- * @link_up: true when link is up
- * @link_up_wait_to_complete: bool used to wait for link up or not
- *
- * Reads the links register to determine if link is up and the current speed
- **/
-s32 ixgbe_check_mac_link_82599(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
-                               bool *link_up, bool link_up_wait_to_complete)
-{
-	u32 links_reg;
-	u32 i;
-
-	links_reg = IXGBE_READ_REG(hw, IXGBE_LINKS);
-	if (link_up_wait_to_complete) {
-		for (i = 0; i < IXGBE_LINK_UP_TIME; i++) {
-			if (links_reg & IXGBE_LINKS_UP) {
-				*link_up = true;
-				break;
-			} else {
-				*link_up = false;
-			}
-			msleep(100);
-			links_reg = IXGBE_READ_REG(hw, IXGBE_LINKS);
-		}
-	} else {
-		if (links_reg & IXGBE_LINKS_UP)
-			*link_up = true;
-		else
-			*link_up = false;
-	}
-
-	if ((links_reg & IXGBE_LINKS_SPEED_82599) ==
-	    IXGBE_LINKS_SPEED_10G_82599)
-		*speed = IXGBE_LINK_SPEED_10GB_FULL;
-	else if ((links_reg & IXGBE_LINKS_SPEED_82599) ==
-	         IXGBE_LINKS_SPEED_1G_82599)
-		*speed = IXGBE_LINK_SPEED_1GB_FULL;
-	else
-		*speed = IXGBE_LINK_SPEED_100_FULL;
-
-	/* if link is down, zero out the current_mode */
-	if (*link_up == false) {
-		hw->fc.current_mode = ixgbe_fc_none;
-		hw->fc.fc_was_autonegged = false;
-	}
-
-	return 0;
-}
-
-/**
- * ixgbe_setup_mac_link_speed_82599 - Set MAC link speed
- * @hw: pointer to hardware structure
- * @speed: new link speed
- * @autoneg: true if autonegotiation enabled
- * @autoneg_wait_to_complete: true when waiting for completion is needed
- *
- * Set the link speed in the AUTOC register and restarts link.
- **/
-s32 ixgbe_setup_mac_link_speed_82599(struct ixgbe_hw *hw,
-                                     ixgbe_link_speed speed, bool autoneg,
-                                     bool autoneg_wait_to_complete)
-{
-	s32 status = 0;
-	u32 autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC);
-	u32 autoc2 = IXGBE_READ_REG(hw, IXGBE_AUTOC2);
-	u32 start_autoc = autoc;
-	u32 orig_autoc = 0;
-	u32 link_mode = autoc & IXGBE_AUTOC_LMS_MASK;
-	u32 pma_pmd_1g = autoc & IXGBE_AUTOC_1G_PMA_PMD_MASK;
-	u32 pma_pmd_10g_serial = autoc2 & IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_MASK;
-	u32 links_reg;
-	u32 i;
-	ixgbe_link_speed link_capabilities = IXGBE_LINK_SPEED_UNKNOWN;
-
-	/* Check to see if speed passed in is supported.
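(The LINKS decode in ixgbe_check_mac_link_82599() above is just one status bit plus a small speed field. Standalone restatement with stand-in bit positions:

    #include <stdbool.h>
    #include <stdint.h>

    #define LNK_UP        (1u << 30)  /* stand-ins for the LINKS layout */
    #define LNK_SPEED_MSK (3u << 28)
    #define LNK_SPEED_10G (1u << 28)

    /* Returns link state; writes the decoded speed in Mb/s. */
    static bool decode_links(uint32_t links, unsigned *mbps)
    {
        *mbps = ((links & LNK_SPEED_MSK) == LNK_SPEED_10G) ? 10000 : 1000;
        return (links & LNK_UP) != 0;
    }
)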
*/ - status = ixgbe_get_link_capabilities(hw, &link_capabilities, &autoneg); - if (status != 0) - goto out; - - speed &= link_capabilities; - - if (speed == IXGBE_LINK_SPEED_UNKNOWN) { - status = IXGBE_ERR_LINK_SETUP; - goto out; - } - - /* Use stored value (EEPROM defaults) of AUTOC to find KR/KX4 support*/ - if (hw->mac.orig_link_settings_stored) - orig_autoc = hw->mac.orig_autoc; - else - orig_autoc = autoc; - - if (link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR || - link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN || - link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR_SGMII) { - /* Set KX4/KX/KR support according to speed requested */ - autoc &= ~(IXGBE_AUTOC_KX4_KX_SUPP_MASK | IXGBE_AUTOC_KR_SUPP); - if (speed & IXGBE_LINK_SPEED_10GB_FULL) - if (orig_autoc & IXGBE_AUTOC_KX4_SUPP) - autoc |= IXGBE_AUTOC_KX4_SUPP; - if (orig_autoc & IXGBE_AUTOC_KR_SUPP) - autoc |= IXGBE_AUTOC_KR_SUPP; - if (speed & IXGBE_LINK_SPEED_1GB_FULL) - autoc |= IXGBE_AUTOC_KX_SUPP; - } else if ((pma_pmd_1g == IXGBE_AUTOC_1G_SFI) && - (link_mode == IXGBE_AUTOC_LMS_1G_LINK_NO_AN || - link_mode == IXGBE_AUTOC_LMS_1G_AN)) { - /* Switch from 1G SFI to 10G SFI if requested */ - if ((speed == IXGBE_LINK_SPEED_10GB_FULL) && - (pma_pmd_10g_serial == IXGBE_AUTOC2_10G_SFI)) { - autoc &= ~IXGBE_AUTOC_LMS_MASK; - autoc |= IXGBE_AUTOC_LMS_10G_SERIAL; - } - } else if ((pma_pmd_10g_serial == IXGBE_AUTOC2_10G_SFI) && - (link_mode == IXGBE_AUTOC_LMS_10G_SERIAL)) { - /* Switch from 10G SFI to 1G SFI if requested */ - if ((speed == IXGBE_LINK_SPEED_1GB_FULL) && - (pma_pmd_1g == IXGBE_AUTOC_1G_SFI)) { - autoc &= ~IXGBE_AUTOC_LMS_MASK; - if (autoneg) - autoc |= IXGBE_AUTOC_LMS_1G_AN; - else - autoc |= IXGBE_AUTOC_LMS_1G_LINK_NO_AN; - } - } - - if (autoc != start_autoc) { - - /* Restart link */ - autoc |= IXGBE_AUTOC_AN_RESTART; - IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc); - - /* Only poll for autoneg to complete if specified to do so */ - if (autoneg_wait_to_complete) { - if (link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR || - link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN || - link_mode == IXGBE_AUTOC_LMS_KX4_KX_KR_SGMII) { - links_reg = 0; /*Just in case Autoneg time=0*/ - for (i = 0; i < IXGBE_AUTO_NEG_TIME; i++) { - links_reg - IXGBE_READ_REG(hw, IXGBE_LINKS); - if (links_reg & IXGBE_LINKS_KX_AN_COMP) - break; - msleep(100); - } - if (!(links_reg & IXGBE_LINKS_KX_AN_COMP)) { - status - IXGBE_ERR_AUTONEG_NOT_COMPLETE; - hw_dbg(hw, "Autoneg did not complete.\n"); - } - } - } - - /* Add delay to filter out noises during initial link setup */ - msleep(50); - } - -out: - return status; -} - -/** - * ixgbe_setup_copper_link_82599 - Setup copper link settings - * @hw: pointer to hardware structure - * - * Restarts the link on PHY and then MAC. Performs autonegotiation if needed. - **/ -static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw) -{ - s32 status; - - /* Restart autonegotiation on PHY */ - status = hw->phy.ops.setup_link(hw); - - /* Set up MAC */ - ixgbe_setup_mac_link_82599(hw); - - return status; -} - -/** - * ixgbe_setup_copper_link_speed_82599 - Set the PHY autoneg advertised field - * @hw: pointer to hardware structure - * @speed: new link speed - * @autoneg: true if autonegotiation enabled - * @autoneg_wait_to_complete: true if waiting is needed to complete - * - * Restarts link on PHY and MAC based on settings passed in. 
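(The KX4/KX/KR handling in ixgbe_setup_mac_link_speed_82599() above boils down to: clamp the requested speeds to what the link reports, then re-derive the per-lane support bits from the EEPROM-default AUTOC. Sketch with illustrative bit names:

    #include <stdint.h>

    #define SPEED_1G  0x1u
    #define SPEED_10G 0x2u
    #define KX_SUPP   (1u << 0)  /* illustrative: 1G backplane lane */
    #define KX4_SUPP  (1u << 1)  /* illustrative: 10G, four lanes   */
    #define KR_SUPP   (1u << 2)  /* illustrative: 10G, single lane  */

    static uint32_t rebuild_supp(uint32_t req, uint32_t caps, uint32_t eeprom)
    {
        uint32_t supp = 0;

        req &= caps;                  /* drop unsupported speeds        */
        if (req & SPEED_10G)          /* advertise only lanes the       */
            supp |= eeprom & (KX4_SUPP | KR_SUPP); /* EEPROM allows     */
        if (req & SPEED_1G)
            supp |= KX_SUPP;
        return supp;
    }
)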
- **/ -static s32 ixgbe_setup_copper_link_speed_82599(struct ixgbe_hw *hw, - ixgbe_link_speed speed, - bool autoneg, - bool autoneg_wait_to_complete) -{ - s32 status; - - /* Setup the PHY according to input speed */ - status = hw->phy.ops.setup_link_speed(hw, speed, autoneg, - autoneg_wait_to_complete); - /* Set up MAC */ - ixgbe_setup_mac_link_82599(hw); - - return status; -} -/** - * ixgbe_reset_hw_82599 - Perform hardware reset - * @hw: pointer to hardware structure - * - * Resets the hardware by resetting the transmit and receive units, masks - * and clears all interrupts, perform a PHY reset, and perform a link (MAC) - * reset. - **/ -s32 ixgbe_reset_hw_82599(struct ixgbe_hw *hw) -{ - s32 status = 0; - u32 ctrl, ctrl_ext; - u32 i; - u32 autoc; - u32 autoc2; - - /* Call adapter stop to disable tx/rx and clear interrupts */ - hw->mac.ops.stop_adapter(hw); - - /* PHY ops must be identified and initialized prior to reset */ - - /* Identify PHY and related function pointers */ - status = hw->phy.ops.init(hw); - - if (status == IXGBE_ERR_SFP_NOT_SUPPORTED) - goto reset_hw_out; - - - /* Setup SFP module if there is one present. */ - if (hw->phy.sfp_setup_needed) { - status = hw->mac.ops.setup_sfp(hw); - hw->phy.sfp_setup_needed = false; - } - - if (status == IXGBE_ERR_SFP_NOT_SUPPORTED) - goto reset_hw_out; - - /* Reset PHY */ - if (hw->phy.reset_disable == false && hw->phy.ops.reset != NULL) - hw->phy.ops.reset(hw); - - /* - * Prevent the PCI-E bus from from hanging by disabling PCI-E master - * access and verify no pending requests before reset - */ - status = ixgbe_disable_pcie_master(hw); - if (status != 0) { - status = IXGBE_ERR_MASTER_REQUESTS_PENDING; - hw_dbg(hw, "PCI-E Master disable polling has failed.\n"); - } - - /* - * Issue global reset to the MAC. This needs to be a SW reset. - * If link reset is used, it might reset the MAC when mng is using it - */ - ctrl = IXGBE_READ_REG(hw, IXGBE_CTRL); - IXGBE_WRITE_REG(hw, IXGBE_CTRL, (ctrl | IXGBE_CTRL_RST)); - IXGBE_WRITE_FLUSH(hw); - - /* Poll for reset bit to self-clear indicating reset is complete */ - for (i = 0; i < 10; i++) { - udelay(1); - ctrl = IXGBE_READ_REG(hw, IXGBE_CTRL); - if (!(ctrl & IXGBE_CTRL_RST)) - break; - } - if (ctrl & IXGBE_CTRL_RST) { - status = IXGBE_ERR_RESET_FAILED; - hw_dbg(hw, "Reset polling failed to complete.\n"); - } - /* Clear PF Reset Done bit so PF/VF Mail Ops can work */ - ctrl_ext = IXGBE_READ_REG(hw, IXGBE_CTRL_EXT); - ctrl_ext |= IXGBE_CTRL_EXT_PFRSTD; - IXGBE_WRITE_REG(hw, IXGBE_CTRL_EXT, ctrl_ext); - - msleep(50); - - - - /* - * Store the original AUTOC/AUTOC2 values if they have not been - * stored off yet. Otherwise restore the stored original - * values since the reset operation sets back to defaults. 
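(The CTRL.RST sequence above is the usual write-then-poll-for-self-clear handshake. Generic sketch with stand-in register accessors and an assumed bit position; the real code goes through IXGBE_READ_REG/IXGBE_WRITE_REG and udelay():

    #include <stdbool.h>
    #include <stdint.h>

    static uint32_t fake_ctrl;                  /* stands in for CTRL */
    static uint32_t rd(void)       { return fake_ctrl; }
    static void     wr(uint32_t v) { fake_ctrl = v; }

    #define RST_BIT (1u << 26)  /* assumed position of the reset bit */

    static bool soft_reset(void)
    {
        wr(rd() | RST_BIT);             /* request the reset         */
        for (int i = 0; i < 10; i++) {  /* HW clears RST when done   */
            if (!(rd() & RST_BIT))
                return true;
            /* udelay(1) here in the driver */
        }
        return false;                   /* poll timed out            */
    }
)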
- */
-	autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC);
-	autoc2 = IXGBE_READ_REG(hw, IXGBE_AUTOC2);
-	if (hw->mac.orig_link_settings_stored == false) {
-		hw->mac.orig_autoc = autoc;
-		hw->mac.orig_autoc2 = autoc2;
-		hw->mac.orig_link_settings_stored = true;
-	} else {
-		if (autoc != hw->mac.orig_autoc)
-			IXGBE_WRITE_REG(hw, IXGBE_AUTOC, (hw->mac.orig_autoc |
-					IXGBE_AUTOC_AN_RESTART));
-
-		if ((autoc2 & IXGBE_AUTOC2_UPPER_MASK) !=
-		    (hw->mac.orig_autoc2 & IXGBE_AUTOC2_UPPER_MASK)) {
-			autoc2 &= ~IXGBE_AUTOC2_UPPER_MASK;
-			autoc2 |= (hw->mac.orig_autoc2 &
-				   IXGBE_AUTOC2_UPPER_MASK);
-			IXGBE_WRITE_REG(hw, IXGBE_AUTOC2, autoc2);
-		}
-	}
-
-	/* Store the permanent mac address */
-	hw->mac.ops.get_mac_addr(hw, hw->mac.perm_addr);
-
-	/*
-	 * Store MAC address from RAR0, clear receive address registers, and
-	 * clear the multicast table.  Also reset num_rar_entries to 128,
-	 * since we modify this value when programming the SAN MAC address.
-	 */
-	hw->mac.num_rar_entries = 128;
-	hw->mac.ops.init_rx_addrs(hw);
-
-
-
-	/* Store the permanent SAN mac address */
-	hw->mac.ops.get_san_mac_addr(hw, hw->mac.san_addr);
-
-	/* Add the SAN MAC address to the RAR only if it's a valid address */
-	if (ixgbe_validate_mac_addr(hw->mac.san_addr) == 0) {
-		hw->mac.ops.set_rar(hw, hw->mac.num_rar_entries - 1,
-				    hw->mac.san_addr, 0, IXGBE_RAH_AV);
-
-		/* Reserve the last RAR for the SAN MAC address */
-		hw->mac.num_rar_entries--;
-	}
-
-reset_hw_out:
-	return status;
-}
-
-/**
- * ixgbe_insert_mac_addr_82599 - Find a RAR for this mac address
- * @hw: pointer to hardware structure
- * @addr: Address to put into receive address register
- * @vmdq: VMDq pool to assign
- *
- * Puts an ethernet address into a receive address register, or
- * finds the rar that it is already in; adds to the pool list
- **/
-s32 ixgbe_insert_mac_addr_82599(struct ixgbe_hw *hw, u8 *addr, u32 vmdq)
-{
-	static const u32 NO_EMPTY_RAR_FOUND = 0xFFFFFFFF;
-	u32 first_empty_rar = NO_EMPTY_RAR_FOUND;
-	u32 rar;
-	u32 rar_low, rar_high;
-	u32 addr_low, addr_high;
-
-	/* swap bytes for HW little endian */
-	addr_low = addr[0] | (addr[1] << 8)
-		 | (addr[2] << 16)
-		 | (addr[3] << 24);
-	addr_high = addr[4] | (addr[5] << 8);
-
-	/*
-	 * Either find the mac_id in rar or find the first empty space.
-	 * rar_highwater points to just after the highest currently used
-	 * rar in order to shorten the search.  It grows when we add a new
-	 * rar to the top.
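(Worked example of the little-endian byte packing above: the six MAC bytes split into a 4-byte RAL word and a 2-byte RAH word.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t a[6] = { 0x00, 0x1b, 0x21, 0x12, 0x34, 0x56 };
        uint32_t lo = a[0] | a[1] << 8 | a[2] << 16 | (uint32_t)a[3] << 24;
        uint32_t hi = a[4] | a[5] << 8;

        /* prints RAL=0x12211b00 RAH=0x5634 */
        printf("RAL=0x%08x RAH=0x%04x\n", (unsigned)lo, (unsigned)hi);
        return 0;
    }
)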
- */ - for (rar = 0; rar < hw->mac.rar_highwater; rar++) { - rar_high = IXGBE_READ_REG(hw, IXGBE_RAH(rar)); - - if (((IXGBE_RAH_AV & rar_high) == 0) - && first_empty_rar == NO_EMPTY_RAR_FOUND) { - first_empty_rar = rar; - } else if ((rar_high & 0xFFFF) == addr_high) { - rar_low = IXGBE_READ_REG(hw, IXGBE_RAL(rar)); - if (rar_low == addr_low) - break; /* found it already in the rars */ - } - } - - if (rar < hw->mac.rar_highwater) { - /* already there so just add to the pool bits */ - ixgbe_set_vmdq(hw, rar, vmdq); - } else if (first_empty_rar != NO_EMPTY_RAR_FOUND) { - /* stick it into first empty RAR slot we found */ - rar = first_empty_rar; - ixgbe_set_rar(hw, rar, addr, vmdq, IXGBE_RAH_AV); - } else if (rar == hw->mac.rar_highwater) { - /* add it to the top of the list and inc the highwater mark */ - ixgbe_set_rar(hw, rar, addr, vmdq, IXGBE_RAH_AV); - hw->mac.rar_highwater++; - } else if (rar >= hw->mac.num_rar_entries) { - return IXGBE_ERR_INVALID_MAC_ADDR; - } - - /* - * If we found rar[0], make sure the default pool bit (we use pool 0) - * remains cleared to be sure default pool packets will get delivered - */ - if (rar == 0) - ixgbe_clear_vmdq(hw, rar, 0); - - return rar; -} - -/** - * ixgbe_clear_vmdq_82599 - Disassociate a VMDq pool index from a rx address - * @hw: pointer to hardware struct - * @rar: receive address register index to disassociate - * @vmdq: VMDq pool index to remove from the rar - **/ -s32 ixgbe_clear_vmdq_82599(struct ixgbe_hw *hw, u32 rar, u32 vmdq) -{ - u32 mpsar_lo, mpsar_hi; - u32 rar_entries = hw->mac.num_rar_entries; - - if (rar < rar_entries) { - mpsar_lo = IXGBE_READ_REG(hw, IXGBE_MPSAR_LO(rar)); - mpsar_hi = IXGBE_READ_REG(hw, IXGBE_MPSAR_HI(rar)); - - if (!mpsar_lo && !mpsar_hi) - goto done; - - if (vmdq == IXGBE_CLEAR_VMDQ_ALL) { - if (mpsar_lo) { - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(rar), 0); - mpsar_lo = 0; - } - if (mpsar_hi) { - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(rar), 0); - mpsar_hi = 0; - } - } else if (vmdq < 32) { - mpsar_lo &= ~(1 << vmdq); - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(rar), mpsar_lo); - } else { - mpsar_hi &= ~(1 << (vmdq - 32)); - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(rar), mpsar_hi); - } - - /* was that the last pool using this rar? */ - if (mpsar_lo == 0 && mpsar_hi == 0 && rar != 0) - hw->mac.ops.clear_rar(hw, rar); - } else { - hw_dbg(hw, "RAR index %d is out of range.\n", rar); - } - -done: - return 0; -} - -/** - * ixgbe_set_vmdq_82599 - Associate a VMDq pool index with a rx address - * @hw: pointer to hardware struct - * @rar: receive address register index to associate with a VMDq index - * @vmdq: VMDq pool index - **/ -s32 ixgbe_set_vmdq_82599(struct ixgbe_hw *hw, u32 rar, u32 vmdq) -{ - u32 mpsar; - u32 rar_entries = hw->mac.num_rar_entries; - - if (rar < rar_entries) { - if (vmdq < 32) { - mpsar = IXGBE_READ_REG(hw, IXGBE_MPSAR_LO(rar)); - mpsar |= 1 << vmdq; - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(rar), mpsar); - } else { - mpsar = IXGBE_READ_REG(hw, IXGBE_MPSAR_HI(rar)); - mpsar |= 1 << (vmdq - 32); - IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(rar), mpsar); - } - } else { - hw_dbg(hw, "RAR index %d is out of range.\n", rar); - } - return 0; -} - -/** - * ixgbe_set_vfta_82599 - Set VLAN filter table - * @hw: pointer to hardware structure - * @vlan: VLAN id to write to VLAN filter - * @vind: VMDq output index that maps queue to VLAN id in VFVFB - * @vlan_on: boolean flag to turn on/off VLAN in VFVF - * - * Turn on/off specified VLAN in the VLAN filter table. 
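(The MPSAR manipulation above is a 64-bit pool bitmap stored as two 32-bit registers; the lo/hi split is the only subtlety. Standalone restatement:

    #include <stdint.h>

    struct mpsar { uint32_t lo, hi; };

    static void pool_set(struct mpsar *m, unsigned vmdq)   /* vmdq < 64 */
    {
        if (vmdq < 32)
            m->lo |= 1u << vmdq;
        else
            m->hi |= 1u << (vmdq - 32);
    }

    static void pool_clear(struct mpsar *m, unsigned vmdq)
    {
        if (vmdq < 32)
            m->lo &= ~(1u << vmdq);
        else
            m->hi &= ~(1u << (vmdq - 32));
    }
)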
- **/ -s32 ixgbe_set_vfta_82599(struct ixgbe_hw *hw, u32 vlan, u32 vind, - bool vlan_on) -{ - u32 regindex; - u32 bitindex; - u32 bits; - u32 first_empty_slot; - - if (vlan > 4095) - return IXGBE_ERR_PARAM; - - /* - * this is a 2 part operation - first the VFTA, then the - * VLVF and VLVFB if vind is set - */ - - /* Part 1 - * The VFTA is a bitstring made up of 128 32-bit registers - * that enable the particular VLAN id, much like the MTA: - * bits[11-5]: which register - * bits[4-0]: which bit in the register - */ - regindex = (vlan >> 5) & 0x7F; - bitindex = vlan & 0x1F; - bits = IXGBE_READ_REG(hw, IXGBE_VFTA(regindex)); - if (vlan_on) - bits |= (1 << bitindex); - else - bits &= ~(1 << bitindex); - IXGBE_WRITE_REG(hw, IXGBE_VFTA(regindex), bits); - - - /* Part 2 - * If the vind is set - * Either vlan_on - * make sure the vlan is in VLVF - * set the vind bit in the matching VLVFB - * Or !vlan_on - * clear the pool bit and possibly the vind - */ - if (vind) { - /* find the vlanid or the first empty slot */ - first_empty_slot = 0; - - for (regindex = 1; regindex < IXGBE_VLVF_ENTRIES; regindex++) { - bits = IXGBE_READ_REG(hw, IXGBE_VLVF(regindex)); - if (!bits && !first_empty_slot) - first_empty_slot = regindex; - else if ((bits & 0x0FFF) == vlan) - break; - } - - if (regindex >= IXGBE_VLVF_ENTRIES) { - if (first_empty_slot) - regindex = first_empty_slot; - else { - hw_dbg(hw, "No space in VLVF.\n"); - goto out; - } - } - - - if (vlan_on) { - /* set the pool bit */ - if (vind < 32) { - bits - IXGBE_READ_REG(hw, IXGBE_VLVFB(regindex*2)); - bits |= (1 << vind); - IXGBE_WRITE_REG(hw, - IXGBE_VLVFB(regindex*2), bits); - } else { - bits = IXGBE_READ_REG(hw, - IXGBE_VLVFB((regindex*2)+1)); - bits |= (1 << vind); - IXGBE_WRITE_REG(hw, - IXGBE_VLVFB((regindex*2)+1), bits); - } - } else { - /* clear the pool bit */ - if (vind < 32) { - bits = IXGBE_READ_REG(hw, - IXGBE_VLVFB(regindex*2)); - bits &= ~(1 << vind); - IXGBE_WRITE_REG(hw, - IXGBE_VLVFB(regindex*2), bits); - bits |= IXGBE_READ_REG(hw, - IXGBE_VLVFB((regindex*2)+1)); - } else { - bits = IXGBE_READ_REG(hw, - IXGBE_VLVFB((regindex*2)+1)); - bits &= ~(1 << vind); - IXGBE_WRITE_REG(hw, - IXGBE_VLVFB((regindex*2)+1), bits); - bits |= IXGBE_READ_REG(hw, - IXGBE_VLVFB(regindex*2)); - } - } - - if (bits) - IXGBE_WRITE_REG(hw, IXGBE_VLVF(regindex), - (IXGBE_VLVF_VIEN | vlan)); - else - IXGBE_WRITE_REG(hw, IXGBE_VLVF(regindex), 0); - } - -out: - return 0; -} - -/** - * ixgbe_clear_vfta_82599 - Clear VLAN filter table - * @hw: pointer to hardware structure - * - * Clears the VLAN filer table, and the VMDq index associated with the filter - **/ -s32 ixgbe_clear_vfta_82599(struct ixgbe_hw *hw) -{ - u32 offset; - - for (offset = 0; offset < hw->mac.vft_size; offset++) - IXGBE_WRITE_REG(hw, IXGBE_VFTA(offset), 0); - - for (offset = 0; offset < IXGBE_VLVF_ENTRIES; offset++) { - IXGBE_WRITE_REG(hw, IXGBE_VLVF(offset), 0); - IXGBE_WRITE_REG(hw, IXGBE_VLVFB(offset*2), 0); - IXGBE_WRITE_REG(hw, IXGBE_VLVFB((offset*2)+1), 0); - } - - return 0; -} - -/** - * ixgbe_init_uta_tables_82599 - Initialize the Unicast Table Array - * @hw: pointer to hardware structure - **/ -s32 ixgbe_init_uta_tables_82599(struct ixgbe_hw *hw) -{ - int i; - hw_dbg(hw, " Clearing UTA\n"); - - for (i = 0; i < 128; i++) - IXGBE_WRITE_REG(hw, IXGBE_UTA(i), 0); - - return 0; -} - -/** - * ixgbe_reinit_fdir_tables_82599 - Reinitialize Flow Director tables. 
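(Worked example of the VFTA indexing described above: VLAN 2050 lands in register (2050 >> 5) & 0x7F = 64, bit 2050 & 0x1F = 2.

    #include <stdint.h>

    /* 4096 VLAN bits = 128 registers x 32 bits. */
    static void vfta_index(uint16_t vlan, uint32_t *reg, uint32_t *bit)
    {
        *reg = (vlan >> 5) & 0x7F;   /* which 32-bit register */
        *bit = vlan & 0x1F;          /* which bit within it   */
    }
)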
- * @hw: pointer to hardware structure - **/ -s32 ixgbe_reinit_fdir_tables_82599(struct ixgbe_hw *hw) -{ - int i; - u32 fdirctrl = IXGBE_READ_REG(hw, IXGBE_FDIRCTRL); - fdirctrl &= ~IXGBE_FDIRCTRL_INIT_DONE; - - /* - * Before starting reinitialization process, - * FDIRCMD.CMD must be zero. - */ - for (i = 0; i < IXGBE_FDIRCMD_CMD_POLL; i++) { - if (!(IXGBE_READ_REG(hw, IXGBE_FDIRCMD) & - IXGBE_FDIRCMD_CMD_MASK)) - break; - udelay(10); - } - if (i >= IXGBE_FDIRCMD_CMD_POLL) { - hw_dbg(hw, "Flow Director previous command isn''t complete, " - "aborting table re-initialization. \n"); - return IXGBE_ERR_FDIR_REINIT_FAILED; - } - - IXGBE_WRITE_REG(hw, IXGBE_FDIRFREE, 0); - IXGBE_WRITE_FLUSH(hw); - /* - * 82599 adapters flow director init flow cannot be restarted, - * Workaround 82599 silicon errata by performing the following steps - * before re-writing the FDIRCTRL control register with the same value. - * - write 1 to bit 8 of FDIRCMD register & - * - write 0 to bit 8 of FDIRCMD register - */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRCMD, - (IXGBE_READ_REG(hw, IXGBE_FDIRCMD) | - IXGBE_FDIRCMD_CLEARHT)); - IXGBE_WRITE_FLUSH(hw); - IXGBE_WRITE_REG(hw, IXGBE_FDIRCMD, - (IXGBE_READ_REG(hw, IXGBE_FDIRCMD) & - ~IXGBE_FDIRCMD_CLEARHT)); - IXGBE_WRITE_FLUSH(hw); - /* - * Clear FDIR Hash register to clear any leftover hashes - * waiting to be programmed. - */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRHASH, 0x00); - IXGBE_WRITE_FLUSH(hw); - - IXGBE_WRITE_REG(hw, IXGBE_FDIRCTRL, fdirctrl); - IXGBE_WRITE_FLUSH(hw); - - /* Poll init-done after we write FDIRCTRL register */ - for (i = 0; i < IXGBE_FDIR_INIT_DONE_POLL; i++) { - if (IXGBE_READ_REG(hw, IXGBE_FDIRCTRL) & - IXGBE_FDIRCTRL_INIT_DONE) - break; - udelay(10); - } - if (i >= IXGBE_FDIR_INIT_DONE_POLL) { - hw_dbg(hw, "Flow Director Signature poll time exceeded!\n"); - return IXGBE_ERR_FDIR_REINIT_FAILED; - } - - /* Clear FDIR statistics registers (read to clear) */ - IXGBE_READ_REG(hw, IXGBE_FDIRUSTAT); - IXGBE_READ_REG(hw, IXGBE_FDIRFSTAT); - IXGBE_READ_REG(hw, IXGBE_FDIRMATCH); - IXGBE_READ_REG(hw, IXGBE_FDIRMISS); - IXGBE_READ_REG(hw, IXGBE_FDIRLEN); - - return 0; -} - -/** - * ixgbe_init_fdir_signature_82599 - Initialize Flow Director signature filters - * @hw: pointer to hardware structure - * @pballoc: which mode to allocate filters with - **/ -s32 ixgbe_init_fdir_signature_82599(struct ixgbe_hw *hw, u32 pballoc) -{ - u32 fdirctrl = 0; - u32 pbsize; - int i; - - /* - * Before enabling Flow Director, the Rx Packet Buffer size - * must be reduced. The new value is the current size minus - * flow director memory usage size. - */ - pbsize = (1 << (IXGBE_FDIR_PBALLOC_SIZE_SHIFT + pballoc)); - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(0), - (IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(0)) - pbsize)); - - /* - * The defaults in the HW for RX PB 1-7 are not zero and so should be - * intialized to zero for non DCB mode otherwise actual total RX PB - * would be bigger than programmed and filter space would run into - * the PB 0 region. 
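(The carve-out above is pure shift arithmetic: Rx packet buffer 0 shrinks by 1 << (IXGBE_FDIR_PBALLOC_SIZE_SHIFT + pballoc) bytes. A standalone restatement; the shift value here is an assumed placeholder, the real constant is in ixgbe_type.h:

    #include <stdint.h>

    #define PBALLOC_SIZE_SHIFT 15u   /* assumed placeholder value */

    /* Shrink Rx packet buffer 0 by the flow-director filter space. */
    static uint32_t shrink_rxpb0(uint32_t rxpb0, uint32_t pballoc)
    {
        return rxpb0 - (1u << (PBALLOC_SIZE_SHIFT + pballoc));
    }
)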
- */ - for (i = 1; i < 8; i++) - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), 0); - - /* Send interrupt when 64 filters are left */ - fdirctrl |= 4 << IXGBE_FDIRCTRL_FULL_THRESH_SHIFT; - - /* Set the maximum length per hash bucket to 0xA filters */ - fdirctrl |= 0xA << IXGBE_FDIRCTRL_MAX_LENGTH_SHIFT; - - switch (pballoc) { - case IXGBE_FDIR_PBALLOC_64K: - /* 8k - 1 signature filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_64K; - break; - case IXGBE_FDIR_PBALLOC_128K: - /* 16k - 1 signature filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_128K; - break; - case IXGBE_FDIR_PBALLOC_256K: - /* 32k - 1 signature filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_256K; - break; - default: - /* bad value */ - return IXGBE_ERR_CONFIG; - }; - - /* Move the flexible bytes to use the ethertype - shift 6 words */ - fdirctrl |= (0x6 << IXGBE_FDIRCTRL_FLEX_SHIFT); - - - /* Prime the keys for hashing */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRHKEY, - IXGBE_HTONL(IXGBE_ATR_BUCKET_HASH_KEY)); - IXGBE_WRITE_REG(hw, IXGBE_FDIRSKEY, - IXGBE_HTONL(IXGBE_ATR_SIGNATURE_HASH_KEY)); - - /* - * Poll init-done after we write the register. Estimated times: - * 10G: PBALLOC = 11b, timing is 60us - * 1G: PBALLOC = 11b, timing is 600us - * 100M: PBALLOC = 11b, timing is 6ms - * - * Multiple these timings by 4 if under full Rx load - * - * So we''ll poll for IXGBE_FDIR_INIT_DONE_POLL times, sleeping for - * 1 msec per poll time. If we''re at line rate and drop to 100M, then - * this might not finish in our poll time, but we can live with that - * for now. - */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRCTRL, fdirctrl); - IXGBE_WRITE_FLUSH(hw); - for (i = 0; i < IXGBE_FDIR_INIT_DONE_POLL; i++) { - if (IXGBE_READ_REG(hw, IXGBE_FDIRCTRL) & - IXGBE_FDIRCTRL_INIT_DONE) - break; - msleep(1); - } - if (i >= IXGBE_FDIR_INIT_DONE_POLL) - hw_dbg(hw, "Flow Director Signature poll time exceeded!\n"); - - return 0; -} - -/** - * ixgbe_init_fdir_perfect_82599 - Initialize Flow Director perfect filters - * @hw: pointer to hardware structure - * @pballoc: which mode to allocate filters with - **/ -s32 ixgbe_init_fdir_perfect_82599(struct ixgbe_hw *hw, u32 pballoc) -{ - u32 fdirctrl = 0; - u32 pbsize; - int i; - - /* - * Before enabling Flow Director, the Rx Packet Buffer size - * must be reduced. The new value is the current size minus - * flow director memory usage size. - */ - - pbsize = (1 << (IXGBE_FDIR_PBALLOC_SIZE_SHIFT + pballoc)); - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(0), - (IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(0)) - pbsize)); - - /* - * The defaults in the HW for RX PB 1-7 are not zero and so should be - * intialized to zero for non DCB mode otherwise actual total RX PB - * would be bigger than programmed and filter space would run into - * the PB 0 region. 
- */ - for (i = 1; i < 8; i++) - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), 0); - - /* Send interrupt when 64 filters are left */ - fdirctrl |= 4 << IXGBE_FDIRCTRL_FULL_THRESH_SHIFT; - - switch (pballoc) { - case IXGBE_FDIR_PBALLOC_64K: - /* 2k - 1 perfect filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_64K; - break; - case IXGBE_FDIR_PBALLOC_128K: - /* 4k - 1 perfect filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_128K; - break; - case IXGBE_FDIR_PBALLOC_256K: - /* 8k - 1 perfect filters */ - fdirctrl |= IXGBE_FDIRCTRL_PBALLOC_256K; - break; - default: - /* bad value */ - return IXGBE_ERR_CONFIG; - }; - - /* Turn perfect match filtering on */ - fdirctrl |= IXGBE_FDIRCTRL_PERFECT_MATCH; - fdirctrl |= IXGBE_FDIRCTRL_REPORT_STATUS; - - /* Move the flexible bytes to use the ethertype - shift 6 words */ - fdirctrl |= (0x6 << IXGBE_FDIRCTRL_FLEX_SHIFT); - - /* Prime the keys for hashing */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRHKEY, - IXGBE_HTONL(IXGBE_ATR_BUCKET_HASH_KEY)); - IXGBE_WRITE_REG(hw, IXGBE_FDIRSKEY, - IXGBE_HTONL(IXGBE_ATR_SIGNATURE_HASH_KEY)); - - /* - * Poll init-done after we write the register. Estimated times: - * 10G: PBALLOC = 11b, timing is 60us - * 1G: PBALLOC = 11b, timing is 600us - * 100M: PBALLOC = 11b, timing is 6ms - * - * Multiple these timings by 4 if under full Rx load - * - * So we''ll poll for IXGBE_FDIR_INIT_DONE_POLL times, sleeping for - * 1 msec per poll time. If we''re at line rate and drop to 100M, then - * this might not finish in our poll time, but we can live with that - * for now. - */ - - /* Set the maximum length per hash bucket to 0xA filters */ - fdirctrl |= (0xA << IXGBE_FDIRCTRL_MAX_LENGTH_SHIFT); - - IXGBE_WRITE_REG(hw, IXGBE_FDIRCTRL, fdirctrl); - IXGBE_WRITE_FLUSH(hw); - for (i = 0; i < IXGBE_FDIR_INIT_DONE_POLL; i++) { - if (IXGBE_READ_REG(hw, IXGBE_FDIRCTRL) & - IXGBE_FDIRCTRL_INIT_DONE) - break; - msleep(1); - } - if (i >= IXGBE_FDIR_INIT_DONE_POLL) - hw_dbg(hw, "Flow Director Perfect poll time exceeded!\n"); - - return 0; -} - - -/** - * ixgbe_atr_compute_hash_82599 - Compute the hashes for SW ATR - * @stream: input bitstream to compute the hash on - * @key: 32-bit hash key - **/ -u16 ixgbe_atr_compute_hash_82599(struct ixgbe_atr_input *atr_input, u32 key) -{ - /* - * The algorithm is as follows: - * Hash[15:0] = Sum { S[n] x K[n+16] }, n = 0...350 - * where Sum {A[n]}, n = 0...n is bitwise XOR of A[0], A[1]...A[n] - * and A[n] x B[n] is bitwise AND between same length strings - * - * K[n] is 16 bits, defined as: - * for n modulo 32 >= 15, K[n] = K[n % 32 : (n % 32) - 15] - * for n modulo 32 < 15, K[n] - * K[(n % 32:0) | (31:31 - (14 - (n % 32)))] - * - * S[n] is 16 bits, defined as: - * for n >= 15, S[n] = S[n:n - 15] - * for n < 15, S[n] = S[(n:0) | (350:350 - (14 - n))] - * - * To simplify for programming, the algorithm is implemented - * in software this way: - * - * Key[31:0], Stream[335:0] - * - * tmp_key[11 * 32 - 1:0] = 11{Key[31:0] = key concatenated 11 times - * int_key[350:0] = tmp_key[351:1] - * int_stream[365:0] = Stream[14:0] | Stream[335:0] | Stream[335:321] - * - * hash[15:0] = 0; - * for (i = 0; i < 351; i++) { - * if (int_key[i]) - * hash ^= int_stream[(i + 15):i]; - * } - */ - - union { - u64 fill[6]; - u32 key[11]; - u8 key_stream[44]; - } tmp_key; - - u8 *stream = (u8 *)atr_input; - u8 int_key[44]; /* upper-most bit unused */ - u8 hash_str[46]; /* upper-most 2 bits unused */ - u16 hash_result = 0; - int i, j, k, h; - - /* - * Initialize the fill member to prevent warnings - * on some compilers - */ - tmp_key.fill[0] = 0; - - /* 
First load the temporary key stream */ - for (i = 0; i < 6; i++) { - u64 fillkey = ((u64)key << 32) | key; - tmp_key.fill[i] = fillkey; - } - - /* - * Set the interim key for the hashing. Bit 352 is unused, so we must - * shift and compensate when building the key. - */ - - int_key[0] = tmp_key.key_stream[0] >> 1; - for (i = 1, j = 0; i < 44; i++) { - unsigned int this_key = tmp_key.key_stream[j] << 7; - j++; - int_key[i] = (u8)(this_key | (tmp_key.key_stream[j] >> 1)); - } - - /* - * Set the interim bit string for the hashing. Bits 368 and 367 are - * unused, so shift and compensate when building the string. - */ - hash_str[0] = (stream[40] & 0x7f) >> 1; - for (i = 1, j = 40; i < 46; i++) { - unsigned int this_str = stream[j] << 7; - j++; - if (j > 41) - j = 0; - hash_str[i] = (u8)(this_str | (stream[j] >> 1)); - } - - /* - * Now compute the hash. i is the index into hash_str, j is into our - * key stream, k is counting the number of bits, and h iterates within - * each byte. - */ - for (i = 45, j = 43, k = 0; k < 351 && i >= 2 && j >= 0; i--, j--) { - for (h = 0; h < 8 && k < 351; h++, k++) { - if (int_key[j] & (1 << h)) { - /* - * Key bit is set, XOR in the current 16-bit - * string. Example of processing: - * h = 0, - * tmp = (hash_str[i - 2] & 0 << 16) | - * (hash_str[i - 1] & 0xff << 8) | - * (hash_str[i] & 0xff >> 0) - * So tmp = hash_str[15 + k:k], since the - * i + 2 clause rolls off the 16-bit value - * h = 7, - * tmp = (hash_str[i - 2] & 0x7f << 9) | - * (hash_str[i - 1] & 0xff << 1) | - * (hash_str[i] & 0x80 >> 7) - */ - int tmp = (hash_str[i] >> h); - tmp |= (hash_str[i - 1] << (8 - h)); - tmp |= (int)(hash_str[i - 2] & ((1 << h) - 1)) - << (16 - h); - hash_result ^= (u16)tmp; - } - } - } - - return hash_result; -} - -/** - * ixgbe_atr_set_vlan_id_82599 - Sets the VLAN id in the ATR input stream - * @input: input stream to modify - * @vlan: the VLAN id to load - **/ -s32 ixgbe_atr_set_vlan_id_82599(struct ixgbe_atr_input *input, u16 vlan) -{ - input->byte_stream[IXGBE_ATR_VLAN_OFFSET + 1] = vlan >> 8; - input->byte_stream[IXGBE_ATR_VLAN_OFFSET] = vlan & 0xff; - - return 0; -} - -/** - * ixgbe_atr_set_src_ipv4_82599 - Sets the source IPv4 address - * @input: input stream to modify - * @src_addr: the IP address to load - **/ -s32 ixgbe_atr_set_src_ipv4_82599(struct ixgbe_atr_input *input, u32 src_addr) -{ - input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 3] = src_addr >> 24; - input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 2] = - (src_addr >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 1] = - (src_addr >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET] = src_addr & 0xff; - - return 0; -} - -/** - * ixgbe_atr_set_dst_ipv4_82599 - Sets the destination IPv4 address - * @input: input stream to modify - * @dst_addr: the IP address to load - **/ -s32 ixgbe_atr_set_dst_ipv4_82599(struct ixgbe_atr_input *input, u32 dst_addr) -{ - input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 3] = dst_addr >> 24; - input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 2] = - (dst_addr >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 1] = - (dst_addr >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET] = dst_addr & 0xff; - - return 0; -} -
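
As an aside for readers following ixgbe_atr_compute_hash_82599 above: the computation is a plain XOR fold. For each stream bit position i whose corresponding bit of the (endlessly repeated) key is set, the 16-bit window of the stream starting at bit i is XORed into the result. The sketch below shows just that structure, with simplified LSB-first indexing and no end-of-stream wraparound, so it is illustrative only and deliberately not bit-exact with the 82599 hardware; xor_fold_hash is a hypothetical name, and the caller is assumed to pad the stream buffer with two trailing zero bytes.

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative XOR fold: whenever the repeating key has a set bit at
     * stream position i, XOR in the 16-bit window that starts at bit i.
     * LSB-first, no wraparound: NOT bit-exact with the 82599 hash.  The
     * stream buffer must carry two extra zero bytes of padding. */
    static uint16_t xor_fold_hash(const uint8_t *stream, size_t nbits,
                                  const uint8_t *key, size_t key_bits)
    {
        uint16_t hash = 0;
        size_t i;

        for (i = 0; i + 16 <= nbits; i++) {
            size_t k = i % key_bits;      /* key repeats across the stream */
            if (key[k / 8] & (1u << (k % 8))) {
                /* bits [i, i+15] of the stream */
                uint32_t w = stream[i / 8] |
                             ((uint32_t)stream[i / 8 + 1] << 8) |
                             ((uint32_t)stream[i / 8 + 2] << 16);
                hash ^= (uint16_t)(w >> (i % 8));
            }
        }
        return hash;
    }

The removed routine arrives at the same kind of fold through the i/j/k/h index juggling because it walks the stream in the MSB-first order the hardware defines and must honour the wraparound rules spelled out in the algorithm comment.
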
-/** - * ixgbe_atr_set_src_ipv6_82599 - Sets the source IPv6 address - * @input: input stream to modify - * @src_addr_1: the first 4 bytes of the IP address to load - * @src_addr_2: the second 4 bytes of the IP address to load - * @src_addr_3: the third 4 bytes of the IP address to load - * @src_addr_4: the fourth 4 bytes of the IP address to load - **/ -s32 ixgbe_atr_set_src_ipv6_82599(struct ixgbe_atr_input *input, - u32 src_addr_1, u32 src_addr_2, - u32 src_addr_3, u32 src_addr_4) -{ - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET] = src_addr_4 & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 1] = - (src_addr_4 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 2] = - (src_addr_4 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 3] = src_addr_4 >> 24; - - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 4] = src_addr_3 & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 5] = - (src_addr_3 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 6] = - (src_addr_3 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 7] = src_addr_3 >> 24; - - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 8] = src_addr_2 & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 9] = - (src_addr_2 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 10] = - (src_addr_2 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 11] = src_addr_2 >> 24; - - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 12] = src_addr_1 & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 13] = - (src_addr_1 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 14] = - (src_addr_1 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 15] = src_addr_1 >> 24; - - return 0; -} - -/** - * ixgbe_atr_set_dst_ipv6_82599 - Sets the destination IPv6 address - * @input: input stream to modify - * @dst_addr_1: the first 4 bytes of the IP address to load - * @dst_addr_2: the second 4 bytes of the IP address to load - * @dst_addr_3: the third 4 bytes of the IP address to load - * @dst_addr_4: the fourth 4 bytes of the IP address to load - **/ -s32 ixgbe_atr_set_dst_ipv6_82599(struct ixgbe_atr_input *input, - u32 dst_addr_1, u32 dst_addr_2, - u32 dst_addr_3, u32 dst_addr_4) -{ - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET] = dst_addr_4 & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 1] = - (dst_addr_4 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 2] = - (dst_addr_4 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 3] = dst_addr_4 >> 24; - - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 4] = dst_addr_3 & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 5] = - (dst_addr_3 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 6] = - (dst_addr_3 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 7] = dst_addr_3 >> 24; - - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 8] = dst_addr_2 & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 9] = - (dst_addr_2 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 10] = - (dst_addr_2 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 11] = dst_addr_2 >> 24; - - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 12] = dst_addr_1 & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 13] = - (dst_addr_1 >> 8) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 14] = - (dst_addr_1 >> 16) & 0xff; - input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 15] = dst_addr_1 >> 24; - - return 0; -} - -/** - * ixgbe_atr_set_src_port_82599 - Sets the source port - * @input: input stream to modify - * @src_port: the source port to load - **/ -s32 ixgbe_atr_set_src_port_82599(struct ixgbe_atr_input *input, u16 src_port) -{ - input->byte_stream[IXGBE_ATR_SRC_PORT_OFFSET + 1] = src_port >> 8; - input->byte_stream[IXGBE_ATR_SRC_PORT_OFFSET] = src_port & 0xff; - - return 0; -} - -/** - * 
ixgbe_atr_set_dst_port_82599 - Sets the destination port - * @input: input stream to modify - * @dst_port: the destination port to load - **/ -s32 ixgbe_atr_set_dst_port_82599(struct ixgbe_atr_input *input, u16 dst_port) -{ - input->byte_stream[IXGBE_ATR_DST_PORT_OFFSET + 1] = dst_port >> 8; - input->byte_stream[IXGBE_ATR_DST_PORT_OFFSET] = dst_port & 0xff; - - return 0; -} - -/** - * ixgbe_atr_set_flex_byte_82599 - Sets the flexible bytes - * @input: input stream to modify - * @flex_bytes: the flexible bytes to load - **/ -s32 ixgbe_atr_set_flex_byte_82599(struct ixgbe_atr_input *input, u16 flex_byte) -{ - input->byte_stream[IXGBE_ATR_FLEX_BYTE_OFFSET + 1] = flex_byte >> 8; - input->byte_stream[IXGBE_ATR_FLEX_BYTE_OFFSET] = flex_byte & 0xff; - - return 0; -} - -/** - * ixgbe_atr_set_vm_pool_82599 - Sets the Virtual Machine pool - * @input: input stream to modify - * @vm_pool: the Virtual Machine pool to load - **/ -s32 ixgbe_atr_set_vm_pool_82599(struct ixgbe_atr_input *input, u8 vm_pool) -{ - input->byte_stream[IXGBE_ATR_VM_POOL_OFFSET] = vm_pool; - - return 0; -} - -/** - * ixgbe_atr_set_l4type_82599 - Sets the layer 4 packet type - * @input: input stream to modify - * @l4type: the layer 4 type value to load - **/ -s32 ixgbe_atr_set_l4type_82599(struct ixgbe_atr_input *input, u8 l4type) -{ - input->byte_stream[IXGBE_ATR_L4TYPE_OFFSET] = l4type; - - return 0; -} - -/** - * ixgbe_atr_get_vlan_id_82599 - Gets the VLAN id from the ATR input stream - * @input: input stream to search - * @vlan: the VLAN id to load - **/ -s32 ixgbe_atr_get_vlan_id_82599(struct ixgbe_atr_input *input, u16 *vlan) -{ - *vlan = input->byte_stream[IXGBE_ATR_VLAN_OFFSET]; - *vlan |= input->byte_stream[IXGBE_ATR_VLAN_OFFSET + 1] << 8; - - return 0; -} - -/** - * ixgbe_atr_get_src_ipv4_82599 - Gets the source IPv4 address - * @input: input stream to search - * @src_addr: the IP address to load - **/ -s32 ixgbe_atr_get_src_ipv4_82599(struct ixgbe_atr_input *input, u32 *src_addr) -{ - *src_addr = input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET]; - *src_addr |= input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 1] << 8; - *src_addr |= input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 2] << 16; - *src_addr |= input->byte_stream[IXGBE_ATR_SRC_IPV4_OFFSET + 3] << 24; - - return 0; -} - -/** - * ixgbe_atr_get_dst_ipv4_82599 - Gets the destination IPv4 address - * @input: input stream to search - * @dst_addr: the IP address to load - **/ -s32 ixgbe_atr_get_dst_ipv4_82599(struct ixgbe_atr_input *input, u32 *dst_addr) -{ - *dst_addr = input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET]; - *dst_addr |= input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 1] << 8; - *dst_addr |= input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 2] << 16; - *dst_addr |= input->byte_stream[IXGBE_ATR_DST_IPV4_OFFSET + 3] << 24; - - return 0; -} - -/** - * ixgbe_atr_get_src_ipv6_82599 - Gets the source IPv6 address - * @input: input stream to search - * @src_addr_1: the first 4 bytes of the IP address to load - * @src_addr_2: the second 4 bytes of the IP address to load - * @src_addr_3: the third 4 bytes of the IP address to load - * @src_addr_4: the fourth 4 bytes of the IP address to load - **/ -s32 ixgbe_atr_get_src_ipv6_82599(struct ixgbe_atr_input *input, - u32 *src_addr_1, u32 *src_addr_2, - u32 *src_addr_3, u32 *src_addr_4) -{ - *src_addr_1 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 12]; - *src_addr_1 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 13] << 8; - *src_addr_1 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 14] << 16; - *src_addr_1 = 
input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 15] << 24; - - *src_addr_2 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 8]; - *src_addr_2 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 9] << 8; - *src_addr_2 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 10] << 16; - *src_addr_2 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 11] << 24; - - *src_addr_3 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 4]; - *src_addr_3 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 5] << 8; - *src_addr_3 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 6] << 16; - *src_addr_3 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 7] << 24; - - *src_addr_4 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET]; - *src_addr_4 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 1] << 8; - *src_addr_4 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 2] << 16; - *src_addr_4 = input->byte_stream[IXGBE_ATR_SRC_IPV6_OFFSET + 3] << 24; - - return 0; -} - -/** - * ixgbe_atr_get_dst_ipv6_82599 - Gets the destination IPv6 address - * @input: input stream to search - * @dst_addr_1: the first 4 bytes of the IP address to load - * @dst_addr_2: the second 4 bytes of the IP address to load - * @dst_addr_3: the third 4 bytes of the IP address to load - * @dst_addr_4: the fourth 4 bytes of the IP address to load - **/ -s32 ixgbe_atr_get_dst_ipv6_82599(struct ixgbe_atr_input *input, - u32 *dst_addr_1, u32 *dst_addr_2, - u32 *dst_addr_3, u32 *dst_addr_4) -{ - *dst_addr_1 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 12]; - *dst_addr_1 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 13] << 8; - *dst_addr_1 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 14] << 16; - *dst_addr_1 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 15] << 24; - - *dst_addr_2 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 8]; - *dst_addr_2 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 9] << 8; - *dst_addr_2 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 10] << 16; - *dst_addr_2 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 11] << 24; - - *dst_addr_3 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 4]; - *dst_addr_3 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 5] << 8; - *dst_addr_3 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 6] << 16; - *dst_addr_3 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 7] << 24; - - *dst_addr_4 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET]; - *dst_addr_4 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 1] << 8; - *dst_addr_4 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 2] << 16; - *dst_addr_4 = input->byte_stream[IXGBE_ATR_DST_IPV6_OFFSET + 3] << 24; - - return 0; -} - -/** - * ixgbe_atr_get_src_port_82599 - Gets the source port - * @input: input stream to modify - * @src_port: the source port to load - * - * Even though the input is given in big-endian, the FDIRPORT registers - * expect the ports to be programmed in little-endian. Hence the need to swap - * endianness when retrieving the data. This can be confusing since the - * internal hash engine expects it to be big-endian. - **/ -s32 ixgbe_atr_get_src_port_82599(struct ixgbe_atr_input *input, u16 *src_port) -{ - *src_port = input->byte_stream[IXGBE_ATR_SRC_PORT_OFFSET] << 8; - *src_port |= input->byte_stream[IXGBE_ATR_SRC_PORT_OFFSET + 1]; - - return 0; -} - -/** - * ixgbe_atr_get_dst_port_82599 - Gets the destination port - * @input: input stream to modify - * @dst_port: the destination port to load - * - * Even though the input is given in big-endian, the FDIRPORT registers - * expect the ports to be programmed in little-endian. 
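
To make this byte-order note concrete, take a hypothetical port value of 0x1234 through the two accessors; stream stands in for input->byte_stream, and the annotated values follow directly from the set_*_port code earlier in the file and the get_*_port code just below:

    u16 port = 0x1234;                              /* example value */

    /* the set_*_port accessors store the value low byte first:      */
    stream[IXGBE_ATR_SRC_PORT_OFFSET]     = port & 0xff;    /* 0x34  */
    stream[IXGBE_ATR_SRC_PORT_OFFSET + 1] = port >> 8;      /* 0x12  */

    /* the get_*_port accessors deliberately swap while extracting:  */
    u16 reg = stream[IXGBE_ATR_SRC_PORT_OFFSET] << 8
            | stream[IXGBE_ATR_SRC_PORT_OFFSET + 1];        /* 0x3412 */

So the value handed back is already in the little-endian layout the FDIRPORT register expects, even though the hash engine consumes the bytes exactly as stored.
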
Hence the need to swap - * endianness when retrieving the data. This can be confusing since the - * internal hash engine expects it to be big-endian. - **/ -s32 ixgbe_atr_get_dst_port_82599(struct ixgbe_atr_input *input, u16 *dst_port) -{ - *dst_port = input->byte_stream[IXGBE_ATR_DST_PORT_OFFSET] << 8; - *dst_port |= input->byte_stream[IXGBE_ATR_DST_PORT_OFFSET + 1]; - - return 0; -} - -/** - * ixgbe_atr_get_flex_byte_82599 - Gets the flexible bytes - * @input: input stream to modify - * @flex_bytes: the flexible bytes to load - **/ -s32 ixgbe_atr_get_flex_byte_82599(struct ixgbe_atr_input *input, u16 *flex_byte) -{ - *flex_byte = input->byte_stream[IXGBE_ATR_FLEX_BYTE_OFFSET]; - *flex_byte |= input->byte_stream[IXGBE_ATR_FLEX_BYTE_OFFSET + 1] << 8; - - return 0; -} - -/** - * ixgbe_atr_get_vm_pool_82599 - Gets the Virtual Machine pool - * @input: input stream to modify - * @vm_pool: the Virtual Machine pool to load - **/ -s32 ixgbe_atr_get_vm_pool_82599(struct ixgbe_atr_input *input, u8 *vm_pool) -{ - *vm_pool = input->byte_stream[IXGBE_ATR_VM_POOL_OFFSET]; - - return 0; -} - -/** - * ixgbe_atr_get_l4type_82599 - Gets the layer 4 packet type - * @input: input stream to modify - * @l4type: the layer 4 type value to load - **/ -s32 ixgbe_atr_get_l4type_82599(struct ixgbe_atr_input *input, u8 *l4type) -{ - *l4type = input->byte_stream[IXGBE_ATR_L4TYPE_OFFSET]; - - return 0; -} - -/** - * ixgbe_atr_add_signature_filter_82599 - Adds a signature hash filter - * @hw: pointer to hardware structure - * @stream: input bitstream - * @queue: queue index to direct traffic to - **/ -s32 ixgbe_fdir_add_signature_filter_82599(struct ixgbe_hw *hw, - struct ixgbe_atr_input *input, - u8 queue) -{ - u64 fdirhashcmd; - u64 fdircmd; - u32 fdirhash; - u16 bucket_hash, sig_hash; - u8 l4type; - - bucket_hash = ixgbe_atr_compute_hash_82599(input, - IXGBE_ATR_BUCKET_HASH_KEY); - - /* bucket_hash is only 15 bits */ - bucket_hash &= IXGBE_ATR_HASH_MASK; - - sig_hash = ixgbe_atr_compute_hash_82599(input, - IXGBE_ATR_SIGNATURE_HASH_KEY); - - /* Get the l4type in order to program FDIRCMD properly */ - /* lowest 2 bits are FDIRCMD.L4TYPE, third lowest bit is FDIRCMD.IPV6 */ - ixgbe_atr_get_l4type_82599(input, &l4type); - - /* - * The lower 32-bits of fdirhashcmd is for FDIRHASH, the upper 32-bits - * is for FDIRCMD. Then do a 64-bit register write from FDIRHASH. 
- */ - fdirhash = sig_hash << IXGBE_FDIRHASH_SIG_SW_INDEX_SHIFT | bucket_hash; - - fdircmd = (IXGBE_FDIRCMD_CMD_ADD_FLOW | IXGBE_FDIRCMD_FILTER_UPDATE | - IXGBE_FDIRCMD_LAST | IXGBE_FDIRCMD_QUEUE_EN); - - switch (l4type & IXGBE_ATR_L4TYPE_MASK) { - case IXGBE_ATR_L4TYPE_TCP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_TCP; - break; - case IXGBE_ATR_L4TYPE_UDP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_UDP; - break; - case IXGBE_ATR_L4TYPE_SCTP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_SCTP; - break; - default: - hw_dbg(hw, " Error on l4type input\n"); - return IXGBE_ERR_CONFIG; - } - - if (l4type & IXGBE_ATR_L4TYPE_IPV6_MASK) - fdircmd |= IXGBE_FDIRCMD_IPV6; - - fdircmd |= ((u64)queue << IXGBE_FDIRCMD_RX_QUEUE_SHIFT); - fdirhashcmd = ((fdircmd << 32) | fdirhash); - - hw_dbg(hw, "Tx Queue=%x hash=%x\n", queue, fdirhash & 0x7FFF7FFF); - IXGBE_WRITE_REG64(hw, IXGBE_FDIRHASH, fdirhashcmd); - - return 0; -} - -/** - * ixgbe_fdir_add_perfect_filter_82599 - Adds a perfect filter - * @hw: pointer to hardware structure - * @input: input bitstream - * @queue: queue index to direct traffic to - * - * Note that the caller to this function must lock before calling, since the - * hardware writes must be protected from one another. - **/ -s32 ixgbe_fdir_add_perfect_filter_82599(struct ixgbe_hw *hw, - struct ixgbe_atr_input *input, - u16 soft_id, - u8 queue) -{ - u32 fdircmd = 0; - u32 fdirhash; - u32 src_ipv4, dst_ipv4; - u32 src_ipv6_1, src_ipv6_2, src_ipv6_3, src_ipv6_4; - u16 src_port, dst_port, vlan_id, flex_bytes; - u16 bucket_hash; - u8 l4type; - - /* Get our input values */ - ixgbe_atr_get_l4type_82599(input, &l4type); - - /* - * Check l4type formatting, and bail out before we touch the hardware - * if there''s a configuration issue - */ - switch (l4type & IXGBE_ATR_L4TYPE_MASK) { - case IXGBE_ATR_L4TYPE_TCP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_TCP; - break; - case IXGBE_ATR_L4TYPE_UDP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_UDP; - break; - case IXGBE_ATR_L4TYPE_SCTP: - fdircmd |= IXGBE_FDIRCMD_L4TYPE_SCTP; - break; - default: - hw_dbg(hw, " Error on l4type input\n"); - return IXGBE_ERR_CONFIG; - } - - bucket_hash = ixgbe_atr_compute_hash_82599(input, - IXGBE_ATR_BUCKET_HASH_KEY); - - /* bucket_hash is only 15 bits */ - bucket_hash &= IXGBE_ATR_HASH_MASK; - - ixgbe_atr_get_vlan_id_82599(input, &vlan_id); - ixgbe_atr_get_src_port_82599(input, &src_port); - ixgbe_atr_get_dst_port_82599(input, &dst_port); - ixgbe_atr_get_flex_byte_82599(input, &flex_bytes); - - fdirhash = soft_id << IXGBE_FDIRHASH_SIG_SW_INDEX_SHIFT | bucket_hash; - - /* Now figure out if we''re IPv4 or IPv6 */ - if (l4type & IXGBE_ATR_L4TYPE_IPV6_MASK) { - /* IPv6 */ - ixgbe_atr_get_src_ipv6_82599(input, &src_ipv6_1, &src_ipv6_2, - &src_ipv6_3, &src_ipv6_4); - - IXGBE_WRITE_REG(hw, IXGBE_FDIRSIPv6(0), src_ipv6_1); - IXGBE_WRITE_REG(hw, IXGBE_FDIRSIPv6(1), src_ipv6_2); - IXGBE_WRITE_REG(hw, IXGBE_FDIRSIPv6(2), src_ipv6_3); - /* The last 4 bytes is the same register as IPv4 */ - IXGBE_WRITE_REG(hw, IXGBE_FDIRIPSA, src_ipv6_4); - - fdircmd |= IXGBE_FDIRCMD_IPV6; - fdircmd |= IXGBE_FDIRCMD_IPv6DMATCH; - } else { - /* IPv4 */ - ixgbe_atr_get_src_ipv4_82599(input, &src_ipv4); - IXGBE_WRITE_REG(hw, IXGBE_FDIRIPSA, src_ipv4); - - } - - ixgbe_atr_get_dst_ipv4_82599(input, &dst_ipv4); - IXGBE_WRITE_REG(hw, IXGBE_FDIRIPDA, dst_ipv4); - - IXGBE_WRITE_REG(hw, IXGBE_FDIRVLAN, (vlan_id | - (flex_bytes << IXGBE_FDIRVLAN_FLEX_SHIFT))); - IXGBE_WRITE_REG(hw, IXGBE_FDIRPORT, (src_port | - (dst_port << IXGBE_FDIRPORT_DESTINATION_SHIFT))); - - fdircmd |= IXGBE_FDIRCMD_CMD_ADD_FLOW; - 
fdircmd |= IXGBE_FDIRCMD_FILTER_UPDATE; - fdircmd |= IXGBE_FDIRCMD_LAST; - fdircmd |= IXGBE_FDIRCMD_QUEUE_EN; - fdircmd |= queue << IXGBE_FDIRCMD_RX_QUEUE_SHIFT; - - IXGBE_WRITE_REG(hw, IXGBE_FDIRHASH, fdirhash); - IXGBE_WRITE_REG(hw, IXGBE_FDIRCMD, fdircmd); - - return 0; -} - -/** - * ixgbe_read_analog_reg8_82599 - Reads 8 bit Omer analog register - * @hw: pointer to hardware structure - * @reg: analog register to read - * @val: read value - * - * Performs read operation to Omer analog register specified. - **/ -s32 ixgbe_read_analog_reg8_82599(struct ixgbe_hw *hw, u32 reg, u8 *val) -{ - u32 core_ctl; - - IXGBE_WRITE_REG(hw, IXGBE_CORECTL, IXGBE_CORECTL_WRITE_CMD | - (reg << 8)); - IXGBE_WRITE_FLUSH(hw); - udelay(10); - core_ctl = IXGBE_READ_REG(hw, IXGBE_CORECTL); - *val = (u8)core_ctl; - - return 0; -} - -/** - * ixgbe_write_analog_reg8_82599 - Writes 8 bit Omer analog register - * @hw: pointer to hardware structure - * @reg: atlas register to write - * @val: value to write - * - * Performs write operation to Omer analog register specified. - **/ -s32 ixgbe_write_analog_reg8_82599(struct ixgbe_hw *hw, u32 reg, u8 val) -{ - u32 core_ctl; - - core_ctl = (reg << 8) | val; - IXGBE_WRITE_REG(hw, IXGBE_CORECTL, core_ctl); - IXGBE_WRITE_FLUSH(hw); - udelay(10); - - return 0; -} - -/** - * ixgbe_start_hw_rev_1_82599 - Prepare hardware for Tx/Rx - * @hw: pointer to hardware structure - * - * Starts the hardware using the generic start_hw function. - * Then performs revision-specific operations: - * Clears the rate limiter registers. - **/ -s32 ixgbe_start_hw_rev_1_82599(struct ixgbe_hw *hw) -{ - u32 q_num; - s32 ret_val = 0; - - ret_val = ixgbe_start_hw_generic(hw); - - /* Clear the rate limiters */ - for (q_num = 0; q_num < hw->mac.max_tx_queues; q_num++) { - IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, q_num); - IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRC, 0); - } - IXGBE_WRITE_FLUSH(hw); - - /* We need to run link autotry after the driver loads */ - hw->mac.autotry_restart = true; - - if (ret_val == 0) - ret_val = ixgbe_verify_fw_version_82599(hw); - return ret_val; -} - -/** - * ixgbe_identify_phy_82599 - Get physical layer module - * @hw: pointer to hardware structure - * - * Determines the physical layer module found on the current adapter. - * If PHY already detected, maintains current PHY type in hw struct, - * otherwise executes the PHY detection routine. - **/ -s32 ixgbe_identify_phy_82599(struct ixgbe_hw *hw) -{ - s32 status = IXGBE_ERR_PHY_ADDR_INVALID; - - /* Detect PHY if not unknown - returns success if already detected. */ - status = ixgbe_identify_phy_generic(hw); - if (status != 0) - status = ixgbe_identify_sfp_module_generic(hw); - /* Set PHY type none if no PHY detected */ - if (hw->phy.type == ixgbe_phy_unknown) { - hw->phy.type = ixgbe_phy_none; - status = 0; - } - - /* Return error if SFP module has been detected but is not supported */ - if (hw->phy.type == ixgbe_phy_sfp_unsupported) - status = IXGBE_ERR_SFP_NOT_SUPPORTED; - - return status; -} - -/** - * ixgbe_get_supported_physical_layer_82599 - Returns physical layer type - * @hw: pointer to hardware structure - * - * Determines physical layer capabilities of the current configuration. 
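
One note before the function body below: the value it returns is a bitmask rather than an enumeration. In the backplane auto-negotiation cases several IXGBE_PHYSICAL_LAYER_* flags are ORed together, so a caller tests membership with a bitwise AND; a minimal usage sketch:

    u32 layers = ixgbe_get_supported_physical_layer_82599(hw);

    if (layers & IXGBE_PHYSICAL_LAYER_10GBASE_KR) {
            /* KR is one of possibly several supported layers */
    }
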
- **/ -u32 ixgbe_get_supported_physical_layer_82599(struct ixgbe_hw *hw) -{ - u32 physical_layer = IXGBE_PHYSICAL_LAYER_UNKNOWN; - u32 autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC); - u32 autoc2 = IXGBE_READ_REG(hw, IXGBE_AUTOC2); - u32 pma_pmd_10g_serial = autoc2 & IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_MASK; - u32 pma_pmd_10g_parallel = autoc & IXGBE_AUTOC_10G_PMA_PMD_MASK; - u32 pma_pmd_1g = autoc & IXGBE_AUTOC_1G_PMA_PMD_MASK; - u16 ext_ability = 0; - u8 comp_codes_10g = 0; - - hw->phy.ops.identify(hw); - - if (hw->phy.type == ixgbe_phy_tn || - hw->phy.type == ixgbe_phy_aq || - hw->phy.type == ixgbe_phy_cu_unknown) { - hw->phy.ops.read_reg(hw, IXGBE_MDIO_PHY_EXT_ABILITY, - IXGBE_MDIO_PMA_PMD_DEV_TYPE, &ext_ability); - if (ext_ability & IXGBE_MDIO_PHY_10GBASET_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_10GBASE_T; - if (ext_ability & IXGBE_MDIO_PHY_1000BASET_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_1000BASE_T; - if (ext_ability & IXGBE_MDIO_PHY_100BASETX_ABILITY) - physical_layer |= IXGBE_PHYSICAL_LAYER_100BASE_TX; - goto out; - } - - switch (autoc & IXGBE_AUTOC_LMS_MASK) { - case IXGBE_AUTOC_LMS_1G_AN: - case IXGBE_AUTOC_LMS_1G_LINK_NO_AN: - if (pma_pmd_1g == IXGBE_AUTOC_1G_KX_BX) { - physical_layer = IXGBE_PHYSICAL_LAYER_1000BASE_KX | - IXGBE_PHYSICAL_LAYER_1000BASE_BX; - goto out; - } else - /* SFI mode so read SFP module */ - goto sfp_check; - break; - case IXGBE_AUTOC_LMS_10G_LINK_NO_AN: - if (pma_pmd_10g_parallel == IXGBE_AUTOC_10G_CX4) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_CX4; - else if (pma_pmd_10g_parallel == IXGBE_AUTOC_10G_KX4) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_KX4; - else if (pma_pmd_10g_parallel == IXGBE_AUTOC_10G_XAUI) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_XAUI; - goto out; - break; - case IXGBE_AUTOC_LMS_10G_SERIAL: - if (pma_pmd_10g_serial == IXGBE_AUTOC2_10G_KR) { - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_KR; - goto out; - } else if (pma_pmd_10g_serial == IXGBE_AUTOC2_10G_SFI) - goto sfp_check; - break; - case IXGBE_AUTOC_LMS_KX4_KX_KR: - case IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN: - if (autoc & IXGBE_AUTOC_KX_SUPP) - physical_layer |= IXGBE_PHYSICAL_LAYER_1000BASE_KX; - if (autoc & IXGBE_AUTOC_KX4_SUPP) - physical_layer |= IXGBE_PHYSICAL_LAYER_10GBASE_KX4; - if (autoc & IXGBE_AUTOC_KR_SUPP) - physical_layer |= IXGBE_PHYSICAL_LAYER_10GBASE_KR; - goto out; - break; - default: - goto out; - break; - } - -sfp_check: - /* SFP check must be done last since DA modules are sometimes used to - * test KR mode - we need to id KR mode correctly before SFP module. 
- * Call identify_sfp because the pluggable module may have changed */ - hw->phy.ops.identify_sfp(hw); - if (hw->phy.sfp_type == ixgbe_sfp_type_not_present) - goto out; - - switch (hw->phy.type) { - case ixgbe_phy_tw_tyco: - case ixgbe_phy_tw_unknown: - physical_layer = IXGBE_PHYSICAL_LAYER_SFP_PLUS_CU; - break; - case ixgbe_phy_sfp_avago: - case ixgbe_phy_sfp_ftl: - case ixgbe_phy_sfp_intel: - case ixgbe_phy_sfp_unknown: - hw->phy.ops.read_i2c_eeprom(hw, - IXGBE_SFF_10GBE_COMP_CODES, &comp_codes_10g); - if (comp_codes_10g & IXGBE_SFF_10GBASESR_CAPABLE) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_SR; - else if (comp_codes_10g & IXGBE_SFF_10GBASELR_CAPABLE) - physical_layer = IXGBE_PHYSICAL_LAYER_10GBASE_LR; - break; - default: - break; - } - -out: - return physical_layer; -} - -/** - * ixgbe_enable_rx_dma_82599 - Enable the Rx DMA unit on 82599 - * @hw: pointer to hardware structure - * @regval: register value to write to RXCTRL - * - * Enables the Rx DMA unit for 82599 - **/ -s32 ixgbe_enable_rx_dma_82599(struct ixgbe_hw *hw, u32 regval) -{ -#define IXGBE_MAX_SECRX_POLL 30 - int i; - int secrxreg; - - /* - * Workaround for 82599 silicon errata when enabling the Rx datapath. - * If traffic is incoming before we enable the Rx unit, it could hang - * the Rx DMA unit. Therefore, make sure the security engine is - * completely disabled prior to enabling the Rx unit. - */ - secrxreg = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL); - secrxreg |= IXGBE_SECRXCTRL_RX_DIS; - IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, secrxreg); - for (i = 0; i < IXGBE_MAX_SECRX_POLL; i++) { - secrxreg = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT); - if (secrxreg & IXGBE_SECRXSTAT_SECRX_RDY) - break; - else - /* Use interrupt-safe sleep just in case */ - udelay(10); - } - - /* For informational purposes only */ - if (i >= IXGBE_MAX_SECRX_POLL) - hw_dbg(hw, "Rx unit being enabled before security " - "path fully disabled. Continuing with init.\n"); - - IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, regval); - secrxreg = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL); - secrxreg &= ~IXGBE_SECRXCTRL_RX_DIS; - IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, secrxreg); - IXGBE_WRITE_FLUSH(hw); - - return 0; -} - -/** - * ixgbe_get_device_caps_82599 - Get additional device capabilities - * @hw: pointer to hardware structure - * @device_caps: the EEPROM word with the extra device capabilities - * - * This function will read the EEPROM location for the device capabilities, - * and return the word through device_caps. - **/ -s32 ixgbe_get_device_caps_82599(struct ixgbe_hw *hw, u16 *device_caps) -{ - hw->eeprom.ops.read(hw, IXGBE_DEVICE_CAPS, device_caps); - - return 0; -} - -/** - * ixgbe_get_san_mac_addr_offset_82599 - SAN MAC address offset for 82599 - * @hw: pointer to hardware structure - * @san_mac_offset: SAN MAC address offset - * - * This function will read the EEPROM location for the SAN MAC address - * pointer, and returns the value at that location. This is used in both - * get and set mac_addr routines. - **/ -s32 ixgbe_get_san_mac_addr_offset_82599(struct ixgbe_hw *hw, - u16 *san_mac_offset) -{ - /* - * First read the EEPROM pointer to see if the MAC addresses are - * available. - */ - hw->eeprom.ops.read(hw, IXGBE_SAN_MAC_ADDR_PTR, san_mac_offset); - - return 0; -} - -/** - * ixgbe_get_san_mac_addr_82599 - SAN MAC address retrieval for 82599 - * @hw: pointer to hardware structure - * @san_mac_addr: SAN MAC address - * - * Reads the SAN MAC address from the EEPROM, if it''s available. This is - * per-port, so set_lan_id() must be called before reading the addresses. 
- * set_lan_id() is called by identify_sfp(), but this cannot be relied - * upon for non-SFP connections, so we must call it here. - **/ -s32 ixgbe_get_san_mac_addr_82599(struct ixgbe_hw *hw, u8 *san_mac_addr) -{ - u16 san_mac_data, san_mac_offset; - u8 i; - - /* - * First read the EEPROM pointer to see if the MAC addresses are - * available. If they''re not, no point in calling set_lan_id() here. - */ - ixgbe_get_san_mac_addr_offset_82599(hw, &san_mac_offset); - - if ((san_mac_offset == 0) || (san_mac_offset == 0xFFFF)) { - /* - * No addresses available in this EEPROM. It''s not an - * error though, so just wipe the local address and return. - */ - for (i = 0; i < 6; i++) - san_mac_addr[i] = 0xFF; - - goto san_mac_addr_out; - } - - /* make sure we know which port we need to program */ - hw->mac.ops.set_lan_id(hw); - /* apply the port offset to the address offset */ - (hw->bus.func) ? (san_mac_offset += IXGBE_SAN_MAC_ADDR_PORT1_OFFSET) : - (san_mac_offset += IXGBE_SAN_MAC_ADDR_PORT0_OFFSET); - for (i = 0; i < 3; i++) { - hw->eeprom.ops.read(hw, san_mac_offset, &san_mac_data); - san_mac_addr[i * 2] = (u8)(san_mac_data); - san_mac_addr[i * 2 + 1] = (u8)(san_mac_data >> 8); - san_mac_offset++; - } - -san_mac_addr_out: - return 0; -} - -/** - * ixgbe_set_san_mac_addr_82599 - Write the SAN MAC address to the EEPROM - * @hw: pointer to hardware structure - * @san_mac_addr: SAN MAC address - * - * Write a SAN MAC address to the EEPROM. - **/ -s32 ixgbe_set_san_mac_addr_82599(struct ixgbe_hw *hw, u8 *san_mac_addr) -{ - s32 status = 0; - u16 san_mac_data, san_mac_offset; - u8 i; - - /* Look for SAN mac address pointer. If not defined, return */ - ixgbe_get_san_mac_addr_offset_82599(hw, &san_mac_offset); - - if ((san_mac_offset == 0) || (san_mac_offset == 0xFFFF)) { - status = IXGBE_ERR_NO_SAN_ADDR_PTR; - goto san_mac_addr_out; - } - - /* Make sure we know which port we need to write */ - hw->mac.ops.set_lan_id(hw); - /* Apply the port offset to the address offset */ - (hw->bus.func) ? (san_mac_offset += IXGBE_SAN_MAC_ADDR_PORT1_OFFSET) : - (san_mac_offset += IXGBE_SAN_MAC_ADDR_PORT0_OFFSET); - - for (i = 0; i < 3; i++) { - san_mac_data = (u16)((u16)(san_mac_addr[i * 2 + 1]) << 8); - san_mac_data |= (u16)(san_mac_addr[i * 2]); - hw->eeprom.ops.write(hw, san_mac_offset, san_mac_data); - san_mac_offset++; - } - -san_mac_addr_out: - return status; -} - -/** - * ixgbe_verify_fw_version_82599 - verify fw version for 82599 - * @hw: pointer to hardware structure - * - * Verifies that installed the firmware version is 0.6 or higher - * for SFI devices. All 82599 SFI devices should have version 0.6 or higher. - * - * Returns IXGBE_ERR_EEPROM_VERSION if the FW is not present or - * if the FW version is not supported. 
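
Both the SAN MAC lookup above and the firmware check below walk EEPROM pointer chains in which a word of 0x0000 or 0xFFFF means the field was never programmed. The recurring validity test amounts to the following one-liner; eeprom_ptr_valid is a hypothetical helper for illustration, not a driver symbol:

    /* An unprogrammed EEPROM word reads back as all ones (0xFFFF);
     * zero is likewise treated as "no pointer here". */
    static inline int eeprom_ptr_valid(u16 word)
    {
            return word != 0x0000 && word != 0xFFFF;
    }
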
- **/ -static s32 ixgbe_verify_fw_version_82599(struct ixgbe_hw *hw) -{ - s32 status = IXGBE_ERR_EEPROM_VERSION; - u16 fw_offset, fw_ptp_cfg_offset; - u16 fw_version = 0; - - /* firmware check is only necessary for SFI devices */ - if (hw->phy.media_type != ixgbe_media_type_fiber) { - status = 0; - goto fw_version_out; - } - - /* get the offset to the Firmware Module block */ - hw->eeprom.ops.read(hw, IXGBE_FW_PTR, &fw_offset); - - if ((fw_offset == 0) || (fw_offset == 0xFFFF)) - goto fw_version_out; - - /* get the offset to the Pass Through Patch Configuration block */ - hw->eeprom.ops.read(hw, (fw_offset + - IXGBE_FW_PASSTHROUGH_PATCH_CONFIG_PTR), - &fw_ptp_cfg_offset); - - if ((fw_ptp_cfg_offset == 0) || (fw_ptp_cfg_offset == 0xFFFF)) - goto fw_version_out; - - /* get the firmware version */ - hw->eeprom.ops.read(hw, (fw_ptp_cfg_offset + - IXGBE_FW_PATCH_VERSION_4), - &fw_version); - - if (fw_version > 0x5) - status = 0; - -fw_version_out: - return status; -} diff --git a/drivers/net/ixgbe/ixgbe_api.c b/drivers/net/ixgbe/ixgbe_api.c index 89bfb76..3967594 100644 --- a/drivers/net/ixgbe/ixgbe_api.c +++ b/drivers/net/ixgbe/ixgbe_api.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -29,7 +29,6 @@ #include "ixgbe_common.h" extern s32 ixgbe_init_ops_82598(struct ixgbe_hw *hw); -extern s32 ixgbe_init_ops_82599(struct ixgbe_hw *hw); /** * ixgbe_init_shared_code - Initialize the shared code @@ -56,9 +55,6 @@ s32 ixgbe_init_shared_code(struct ixgbe_hw *hw) case ixgbe_mac_82598EB: status = ixgbe_init_ops_82598(hw); break; - case ixgbe_mac_82599EB: - status = ixgbe_init_ops_82599(hw); - break; default: status = IXGBE_ERR_DEVICE_NOT_SUPPORTED; break; @@ -81,7 +77,6 @@ s32 ixgbe_set_mac_type(struct ixgbe_hw *hw) if (hw->vendor_id == IXGBE_INTEL_VENDOR_ID) { switch (hw->device_id) { case IXGBE_DEV_ID_82598: - case IXGBE_DEV_ID_82598_BX: case IXGBE_DEV_ID_82598AF_SINGLE_PORT: case IXGBE_DEV_ID_82598AF_DUAL_PORT: case IXGBE_DEV_ID_82598AT: @@ -93,11 +88,6 @@ s32 ixgbe_set_mac_type(struct ixgbe_hw *hw) case IXGBE_DEV_ID_82598EB_SFP_LOM: hw->mac.type = ixgbe_mac_82598EB; break; - case IXGBE_DEV_ID_82599_KX4: - case IXGBE_DEV_ID_82599_XAUI_LOM: - case IXGBE_DEV_ID_82599_SFP: - hw->mac.type = ixgbe_mac_82599EB; - break; default: ret_val = IXGBE_ERR_DEVICE_NOT_SUPPORTED; break; @@ -194,46 +184,6 @@ s32 ixgbe_get_mac_addr(struct ixgbe_hw *hw, u8 *mac_addr) } /** - * ixgbe_get_san_mac_addr - Get SAN MAC address - * @hw: pointer to hardware structure - * @san_mac_addr: SAN MAC address - * - * Reads the SAN MAC address from the EEPROM, if it''s available. This is - * per-port, so set_lan_id() must be called before reading the addresses. - **/ -s32 ixgbe_get_san_mac_addr(struct ixgbe_hw *hw, u8 *san_mac_addr) -{ - return ixgbe_call_func(hw, hw->mac.ops.get_san_mac_addr, - (hw, san_mac_addr), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_set_san_mac_addr - Write a SAN MAC address - * @hw: pointer to hardware structure - * @san_mac_addr: SAN MAC address - * - * Writes A SAN MAC address to the EEPROM. 
- **/ -s32 ixgbe_set_san_mac_addr(struct ixgbe_hw *hw, u8 *san_mac_addr) -{ - return ixgbe_call_func(hw, hw->mac.ops.set_san_mac_addr, - (hw, san_mac_addr), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_get_device_caps - Get additional device capabilities - * @hw: pointer to hardware structure - * @device_caps: the EEPROM word for device capabilities - * - * Reads the extra device capabilities from the EEPROM - **/ -s32 ixgbe_get_device_caps(struct ixgbe_hw *hw, u16 *device_caps) -{ - return ixgbe_call_func(hw, hw->mac.ops.get_device_caps, - (hw, device_caps), IXGBE_NOT_IMPLEMENTED); -} - -/** * ixgbe_get_bus_info - Set PCI bus info * @hw: pointer to hardware structure * @@ -360,9 +310,6 @@ s32 ixgbe_get_phy_firmware_version(struct ixgbe_hw *hw, u16 *firmware_version) s32 ixgbe_read_phy_reg(struct ixgbe_hw *hw, u32 reg_addr, u32 device_type, u16 *phy_data) { - if (hw->phy.id == 0) - ixgbe_identify_phy(hw); - return ixgbe_call_func(hw, hw->phy.ops.read_reg, (hw, reg_addr, device_type, phy_data), IXGBE_NOT_IMPLEMENTED); } @@ -378,9 +325,6 @@ s32 ixgbe_read_phy_reg(struct ixgbe_hw *hw, u32 reg_addr, u32 device_type, s32 ixgbe_write_phy_reg(struct ixgbe_hw *hw, u32 reg_addr, u32 device_type, u16 phy_data) { - if (hw->phy.id == 0) - ixgbe_identify_phy(hw); - return ixgbe_call_func(hw, hw->phy.ops.write_reg, (hw, reg_addr, device_type, phy_data), IXGBE_NOT_IMPLEMENTED); } @@ -604,22 +548,6 @@ s32 ixgbe_update_eeprom_checksum(struct ixgbe_hw *hw) } /** - * ixgbe_insert_mac_addr - Find a RAR for this mac address - * @hw: pointer to hardware structure - * @addr: Address to put into receive address register - * @vmdq: VMDq pool to assign - * - * Puts an ethernet address into a receive address register, or - * finds the rar that it is aleady in; adds to the pool list - **/ -s32 ixgbe_insert_mac_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq) -{ - return ixgbe_call_func(hw, hw->mac.ops.insert_mac_addr, - (hw, addr, vmdq), - IXGBE_NOT_IMPLEMENTED); -} - -/** * ixgbe_set_rar - Set Rx address register * @hw: pointer to hardware structure * @index: Receive address register to write @@ -787,15 +715,15 @@ s32 ixgbe_set_vfta(struct ixgbe_hw *hw, u32 vlan, u32 vind, bool vlan_on) } /** - * ixgbe_fc_enable - Enable flow control + * ixgbe_setup_fc - Set flow control * @hw: pointer to hardware structure * @packetbuf_num: packet buffer number (0-7) * * Configures the flow control settings based on SW configuration. **/ -s32 ixgbe_fc_enable(struct ixgbe_hw *hw, s32 packetbuf_num) +s32 ixgbe_setup_fc(struct ixgbe_hw *hw, s32 packetbuf_num) { - return ixgbe_call_func(hw, hw->mac.ops.fc_enable, (hw, packetbuf_num), + return ixgbe_call_func(hw, hw->mac.ops.setup_fc, (hw, packetbuf_num), IXGBE_NOT_IMPLEMENTED); } @@ -841,53 +769,6 @@ s32 ixgbe_init_uta_tables(struct ixgbe_hw *hw) } /** - * ixgbe_read_i2c_byte - Reads 8 bit word over I2C at specified device address - * @hw: pointer to hardware structure - * @byte_offset: byte offset to read - * @data: value read - * - * Performs byte read operation to SFP module''s EEPROM over I2C interface. 
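
All of the ixgbe_api.c wrappers being removed in this hunk follow one dispatch pattern: call through the hw->mac.ops / hw->phy.ops function pointer when it is populated, otherwise report IXGBE_NOT_IMPLEMENTED. In sketch form (the idiom, not necessarily the exact ixgbe_call_func definition in ixgbe_api.h):

    #define ixgbe_call_func(hw, func, params, error) \
            ((func) != NULL ? (func) params : (error))
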
- **/ -s32 ixgbe_read_i2c_byte(struct ixgbe_hw *hw, u8 byte_offset, u8 dev_addr, - u8 *data) -{ - return ixgbe_call_func(hw, hw->phy.ops.read_i2c_byte, (hw, byte_offset, - dev_addr, data), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_write_i2c_byte - Writes 8 bit word over I2C - * @hw: pointer to hardware structure - * @byte_offset: byte offset to write - * @data: value to write - * - * Performs byte write operation to SFP module''s EEPROM over I2C interface - * at a specified device address. - **/ -s32 ixgbe_write_i2c_byte(struct ixgbe_hw *hw, u8 byte_offset, u8 dev_addr, - u8 data) -{ - return ixgbe_call_func(hw, hw->phy.ops.write_i2c_byte, (hw, byte_offset, - dev_addr, data), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_write_i2c_eeprom - Writes 8 bit EEPROM word over I2C interface - * @hw: pointer to hardware structure - * @byte_offset: EEPROM byte offset to write - * @eeprom_data: value to write - * - * Performs byte write operation to SFP module''s EEPROM over I2C interface. - **/ -s32 ixgbe_write_i2c_eeprom(struct ixgbe_hw *hw, - u8 byte_offset, u8 eeprom_data) -{ - return ixgbe_call_func(hw, hw->phy.ops.write_i2c_eeprom, - (hw, byte_offset, eeprom_data), - IXGBE_NOT_IMPLEMENTED); -} - -/** * ixgbe_read_i2c_eeprom - Reads 8 bit EEPROM word over I2C interface * @hw: pointer to hardware structure * @byte_offset: EEPROM byte offset to read @@ -913,45 +794,3 @@ u32 ixgbe_get_supported_physical_layer(struct ixgbe_hw *hw) return ixgbe_call_func(hw, hw->mac.ops.get_supported_physical_layer, (hw), IXGBE_PHYSICAL_LAYER_UNKNOWN); } - -/** - * ixgbe_enable_rx_dma - Enables Rx DMA unit, dependant on device specifics - * @hw: pointer to hardware structure - * @regval: bitfield to write to the Rx DMA register - * - * Enables the Rx DMA unit of the device. - **/ -s32 ixgbe_enable_rx_dma(struct ixgbe_hw *hw, u32 regval) -{ - return ixgbe_call_func(hw, hw->mac.ops.enable_rx_dma, - (hw, regval), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_acquire_swfw_semaphore - Acquire SWFW semaphore - * @hw: pointer to hardware structure - * @mask: Mask to specify which semaphore to acquire - * - * Acquires the SWFW semaphore through SW_FW_SYNC register for the specified - * function (CSR, PHY0, PHY1, EEPROM, Flash) - **/ -s32 ixgbe_acquire_swfw_semaphore(struct ixgbe_hw *hw, u16 mask) -{ - return ixgbe_call_func(hw, hw->mac.ops.acquire_swfw_sync, - (hw, mask), IXGBE_NOT_IMPLEMENTED); -} - -/** - * ixgbe_release_swfw_semaphore - Release SWFW semaphore - * @hw: pointer to hardware structure - * @mask: Mask to specify which semaphore to release - * - * Releases the SWFW semaphore through SW_FW_SYNC register for the specified - * function (CSR, PHY0, PHY1, EEPROM, Flash) - **/ -void ixgbe_release_swfw_semaphore(struct ixgbe_hw *hw, u16 mask) -{ - if (hw->mac.ops.release_swfw_sync) - hw->mac.ops.release_swfw_sync(hw, mask); -} - diff --git a/drivers/net/ixgbe/ixgbe_api.h b/drivers/net/ixgbe/ixgbe_api.h index 3552f79..ab9df90 100644 --- a/drivers/net/ixgbe/ixgbe_api.h +++ b/drivers/net/ixgbe/ixgbe_api.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -78,7 +78,6 @@ s32 ixgbe_read_eeprom(struct ixgbe_hw *hw, u16 offset, u16 *data); s32 ixgbe_validate_eeprom_checksum(struct ixgbe_hw *hw, u16 *checksum_val); s32 ixgbe_update_eeprom_checksum(struct ixgbe_hw *hw); -s32 ixgbe_insert_mac_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq); s32 ixgbe_set_rar(struct ixgbe_hw *hw, u32 index, u8 *addr, u32 vmdq, u32 enable_addr); s32 ixgbe_clear_rar(struct ixgbe_hw *hw, u32 index); @@ -96,7 +95,7 @@ s32 ixgbe_clear_vfta(struct ixgbe_hw *hw); s32 ixgbe_set_vfta(struct ixgbe_hw *hw, u32 vlan, u32 vind, bool vlan_on); -s32 ixgbe_fc_enable(struct ixgbe_hw *hw, s32 packetbuf_num); +s32 ixgbe_setup_fc(struct ixgbe_hw *hw, s32 packetbuf_num); void ixgbe_set_mta(struct ixgbe_hw *hw, u8 *mc_addr); s32 ixgbe_get_phy_firmware_version(struct ixgbe_hw *hw, @@ -106,57 +105,5 @@ s32 ixgbe_write_analog_reg8(struct ixgbe_hw *hw, u32 reg, u8 val); s32 ixgbe_init_uta_tables(struct ixgbe_hw *hw); s32 ixgbe_read_i2c_eeprom(struct ixgbe_hw *hw, u8 byte_offset, u8 *eeprom_data); u32 ixgbe_get_supported_physical_layer(struct ixgbe_hw *hw); -s32 ixgbe_enable_rx_dma(struct ixgbe_hw *hw, u32 regval); -s32 ixgbe_reinit_fdir_tables_82599(struct ixgbe_hw *hw); -s32 ixgbe_init_fdir_signature_82599(struct ixgbe_hw *hw, u32 pballoc); -s32 ixgbe_init_fdir_perfect_82599(struct ixgbe_hw *hw, u32 pballoc); -s32 ixgbe_fdir_add_signature_filter_82599(struct ixgbe_hw *hw, - struct ixgbe_atr_input *input, - u8 queue); -s32 ixgbe_fdir_add_perfect_filter_82599(struct ixgbe_hw *hw, - struct ixgbe_atr_input *input, - u16 soft_id, - u8 queue); -u16 ixgbe_atr_compute_hash_82599(struct ixgbe_atr_input *input, u32 key); -s32 ixgbe_atr_set_vlan_id_82599(struct ixgbe_atr_input *input, u16 vlan_id); -s32 ixgbe_atr_set_src_ipv4_82599(struct ixgbe_atr_input *input, u32 src_addr); -s32 ixgbe_atr_set_dst_ipv4_82599(struct ixgbe_atr_input *input, u32 dst_addr); -s32 ixgbe_atr_set_src_ipv6_82599(struct ixgbe_atr_input *input, u32 src_addr_1, - u32 src_addr_2, u32 src_addr_3, - u32 src_addr_4); -s32 ixgbe_atr_set_dst_ipv6_82599(struct ixgbe_atr_input *input, u32 dst_addr_1, - u32 dst_addr_2, u32 dst_addr_3, - u32 dst_addr_4); -s32 ixgbe_atr_set_src_port_82599(struct ixgbe_atr_input *input, u16 src_port); -s32 ixgbe_atr_set_dst_port_82599(struct ixgbe_atr_input *input, u16 dst_port); -s32 ixgbe_atr_set_flex_byte_82599(struct ixgbe_atr_input *input, u16 flex_byte); -s32 ixgbe_atr_set_vm_pool_82599(struct ixgbe_atr_input *input, u8 vm_pool); -s32 ixgbe_atr_set_l4type_82599(struct ixgbe_atr_input *input, u8 l4type); -s32 ixgbe_atr_get_vlan_id_82599(struct ixgbe_atr_input *input, u16 *vlan_id); -s32 ixgbe_atr_get_src_ipv4_82599(struct ixgbe_atr_input *input, u32 *src_addr); -s32 ixgbe_atr_get_dst_ipv4_82599(struct ixgbe_atr_input *input, u32 *dst_addr); -s32 ixgbe_atr_get_src_ipv6_82599(struct ixgbe_atr_input *input, u32 *src_addr_1, - u32 *src_addr_2, u32 *src_addr_3, - u32 *src_addr_4); -s32 ixgbe_atr_get_dst_ipv6_82599(struct ixgbe_atr_input *input, u32 *dst_addr_1, - u32 *dst_addr_2, u32 *dst_addr_3, - u32 *dst_addr_4); -s32 ixgbe_atr_get_src_port_82599(struct ixgbe_atr_input *input, u16 *src_port); -s32 ixgbe_atr_get_dst_port_82599(struct ixgbe_atr_input *input, u16 *dst_port); -s32 ixgbe_atr_get_flex_byte_82599(struct ixgbe_atr_input *input, - u16 *flex_byte); -s32 ixgbe_atr_get_vm_pool_82599(struct ixgbe_atr_input *input, u8 *vm_pool); -s32 
ixgbe_atr_get_l4type_82599(struct ixgbe_atr_input *input, u8 *l4type); -s32 ixgbe_read_i2c_byte(struct ixgbe_hw *hw, u8 byte_offset, u8 dev_addr, - u8 *data); -s32 ixgbe_write_i2c_byte(struct ixgbe_hw *hw, u8 byte_offset, u8 dev_addr, - u8 data); -s32 ixgbe_write_i2c_eeprom(struct ixgbe_hw *hw, u8 byte_offset, u8 eeprom_data); -s32 ixgbe_get_san_mac_addr(struct ixgbe_hw *hw, u8 *san_mac_addr); -s32 ixgbe_set_san_mac_addr(struct ixgbe_hw *hw, u8 *san_mac_addr); -s32 ixgbe_get_device_caps(struct ixgbe_hw *hw, u16 *device_caps); -s32 ixgbe_acquire_swfw_semaphore(struct ixgbe_hw *hw, u16 mask); -void ixgbe_release_swfw_semaphore(struct ixgbe_hw *hw, u16 mask); - #endif /* _IXGBE_API_H_ */ diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c index e4b3055..8801042 100644 --- a/drivers/net/ixgbe/ixgbe_common.c +++ b/drivers/net/ixgbe/ixgbe_common.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -42,7 +42,11 @@ static void ixgbe_lower_eeprom_clk(struct ixgbe_hw *hw, u32 *eec); static void ixgbe_release_eeprom(struct ixgbe_hw *hw); static u16 ixgbe_calc_eeprom_checksum(struct ixgbe_hw *hw); +static void ixgbe_enable_rar(struct ixgbe_hw *hw, u32 index); +static void ixgbe_disable_rar(struct ixgbe_hw *hw, u32 index); static s32 ixgbe_mta_vector(struct ixgbe_hw *hw, u8 *mc_addr); +void ixgbe_add_mc_addr(struct ixgbe_hw *hw, u8 *mc_addr); +void ixgbe_add_uc_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq); /** * ixgbe_init_ops_generic - Inits function ptrs @@ -75,24 +79,20 @@ s32 ixgbe_init_ops_generic(struct ixgbe_hw *hw) mac->ops.clear_hw_cntrs = &ixgbe_clear_hw_cntrs_generic; mac->ops.get_media_type = NULL; mac->ops.get_supported_physical_layer = NULL; - mac->ops.enable_rx_dma = &ixgbe_enable_rx_dma_generic; mac->ops.get_mac_addr = &ixgbe_get_mac_addr_generic; mac->ops.stop_adapter = &ixgbe_stop_adapter_generic; mac->ops.get_bus_info = &ixgbe_get_bus_info_generic; mac->ops.set_lan_id = &ixgbe_set_lan_id_multi_port_pcie; - mac->ops.acquire_swfw_sync = &ixgbe_acquire_swfw_sync; - mac->ops.release_swfw_sync = &ixgbe_release_swfw_sync; /* LEDs */ mac->ops.led_on = &ixgbe_led_on_generic; mac->ops.led_off = &ixgbe_led_off_generic; - mac->ops.blink_led_start = &ixgbe_blink_led_start_generic; - mac->ops.blink_led_stop = &ixgbe_blink_led_stop_generic; + mac->ops.blink_led_start = NULL; + mac->ops.blink_led_stop = NULL; /* RAR, Multicast, VLAN */ mac->ops.set_rar = &ixgbe_set_rar_generic; mac->ops.clear_rar = &ixgbe_clear_rar_generic; - mac->ops.insert_mac_addr = NULL; mac->ops.set_vmdq = NULL; mac->ops.clear_vmdq = NULL; mac->ops.init_rx_addrs = &ixgbe_init_rx_addrs_generic; @@ -104,8 +104,6 @@ s32 ixgbe_init_ops_generic(struct ixgbe_hw *hw) mac->ops.set_vfta = NULL; mac->ops.init_uta_tables = NULL; - /* Flow Control */ - mac->ops.fc_enable = &ixgbe_fc_enable_generic; /* Link */ mac->ops.get_link_capabilities = NULL; @@ -128,16 +126,28 @@ s32 ixgbe_init_ops_generic(struct ixgbe_hw *hw) s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw) { u32 ctrl_ext; - s32 ret_val = 0; /* Set the media type */ hw->phy.media_type = hw->mac.ops.get_media_type(hw); - /* PHY ops initialization must be done in reset_hw() */ + /* Set bus info */ + hw->mac.ops.get_bus_info(hw); + + /* 
Identify the PHY */ + hw->phy.ops.identify(hw); + + /* + * Store MAC address from RAR0, clear receive address registers, and + * clear the multicast table + */ + hw->mac.ops.init_rx_addrs(hw); /* Clear the VLAN filter table */ hw->mac.ops.clear_vfta(hw); + /* Set up link */ + hw->mac.ops.setup_link(hw); + /* Clear statistics registers */ hw->mac.ops.clear_hw_cntrs(hw); @@ -147,13 +157,10 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw) IXGBE_WRITE_REG(hw, IXGBE_CTRL_EXT, ctrl_ext); IXGBE_WRITE_FLUSH(hw); - /* Setup flow control */ - ixgbe_setup_fc(hw, 0); - /* Clear adapter stopped flag */ hw->adapter_stopped = false; - return ret_val; + return 0; } /** @@ -168,17 +175,13 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw) **/ s32 ixgbe_init_hw_generic(struct ixgbe_hw *hw) { - s32 status = 0; - /* Reset the hardware */ - status = hw->mac.ops.reset_hw(hw); + hw->mac.ops.reset_hw(hw); - if (status == 0) { - /* Start the HW */ - status = hw->mac.ops.start_hw(hw); - } + /* Start the HW */ + hw->mac.ops.start_hw(hw); - return status; + return 0; } /** @@ -204,28 +207,15 @@ s32 ixgbe_clear_hw_cntrs_generic(struct ixgbe_hw *hw) IXGBE_READ_REG(hw, IXGBE_RLEC); IXGBE_READ_REG(hw, IXGBE_LXONTXC); IXGBE_READ_REG(hw, IXGBE_LXOFFTXC); - if (hw->mac.type >= ixgbe_mac_82599EB) { - IXGBE_READ_REG(hw, IXGBE_LXONRXCNT); - IXGBE_READ_REG(hw, IXGBE_LXOFFRXCNT); - } else { - IXGBE_READ_REG(hw, IXGBE_LXONRXC); - IXGBE_READ_REG(hw, IXGBE_LXOFFRXC); - } + IXGBE_READ_REG(hw, IXGBE_LXONRXC); + IXGBE_READ_REG(hw, IXGBE_LXOFFRXC); for (i = 0; i < 8; i++) { IXGBE_READ_REG(hw, IXGBE_PXONTXC(i)); IXGBE_READ_REG(hw, IXGBE_PXOFFTXC(i)); - if (hw->mac.type >= ixgbe_mac_82599EB) { - IXGBE_READ_REG(hw, IXGBE_PXONRXCNT(i)); - IXGBE_READ_REG(hw, IXGBE_PXOFFRXCNT(i)); - } else { - IXGBE_READ_REG(hw, IXGBE_PXONRXC(i)); - IXGBE_READ_REG(hw, IXGBE_PXOFFRXC(i)); - } + IXGBE_READ_REG(hw, IXGBE_PXONRXC(i)); + IXGBE_READ_REG(hw, IXGBE_PXOFFRXC(i)); } - if (hw->mac.type >= ixgbe_mac_82599EB) - for (i = 0; i < 8; i++) - IXGBE_READ_REG(hw, IXGBE_PXON2OFFCNT(i)); IXGBE_READ_REG(hw, IXGBE_PRC64); IXGBE_READ_REG(hw, IXGBE_PRC127); IXGBE_READ_REG(hw, IXGBE_PRC255); @@ -392,7 +382,6 @@ void ixgbe_set_lan_id_multi_port_pcie(struct ixgbe_hw *hw) reg = IXGBE_READ_REG(hw, IXGBE_STATUS); bus->func = (reg & IXGBE_STATUS_LAN_ID) >> IXGBE_STATUS_LAN_ID_SHIFT; - bus->lan_id = bus->func; /* check for a port swap */ reg = IXGBE_READ_REG(hw, IXGBE_FACTPS); @@ -597,6 +586,7 @@ s32 ixgbe_write_eeprom_generic(struct ixgbe_hw *hw, u16 offset, u16 data) ixgbe_shift_out_eeprom_bits(hw, data, 16); ixgbe_standby_eeprom(hw); + msleep(hw->eeprom.semaphore_delay); /* Done with writing - release the EEPROM */ ixgbe_release_eeprom(hw); } @@ -785,10 +775,13 @@ static s32 ixgbe_acquire_eeprom(struct ixgbe_hw *hw) static s32 ixgbe_get_eeprom_semaphore(struct ixgbe_hw *hw) { s32 status = IXGBE_ERR_EEPROM; - u32 timeout = 2000; + u32 timeout; u32 i; u32 swsm; + /* Set timeout value based on size of EEPROM */ + timeout = hw->eeprom.word_size + 1; + /* Get SMBI software semaphore between device drivers first */ for (i = 0; i < timeout; i++) { /* @@ -800,7 +793,7 @@ static s32 ixgbe_get_eeprom_semaphore(struct ixgbe_hw *hw) status = 0; break; } - udelay(50); + msleep(1); } /* Now get the semaphore between SW/FW through the SWESMBI bit */ @@ -828,14 +821,11 @@ static s32 ixgbe_get_eeprom_semaphore(struct ixgbe_hw *hw) * was not granted because we don''t have access to the EEPROM */ if (i >= timeout) { - hw_dbg(hw, "SWESMBI Software EEPROM semaphore " + hw_dbg(hw, "Driver 
can''t access the Eeprom - Semaphore " "not granted.\n"); ixgbe_release_eeprom_semaphore(hw); status = IXGBE_ERR_EEPROM; } - } else { - hw_dbg(hw, "Software semaphore SMBI between device drivers " - "not granted.\n"); } return status; @@ -1068,9 +1058,6 @@ static void ixgbe_release_eeprom(struct ixgbe_hw *hw) IXGBE_WRITE_REG(hw, IXGBE_EEC, eec); ixgbe_release_swfw_sync(hw, IXGBE_GSSR_EEP_SM); - - /* Delay before attempt to obtain semaphore again to allow FW access */ - msleep(hw->eeprom.semaphore_delay); } /** @@ -1300,6 +1287,38 @@ s32 ixgbe_clear_rar_generic(struct ixgbe_hw *hw, u32 index) } /** + * ixgbe_enable_rar - Enable Rx address register + * @hw: pointer to hardware structure + * @index: index into the RAR table + * + * Enables the select receive address register. + **/ +static void ixgbe_enable_rar(struct ixgbe_hw *hw, u32 index) +{ + u32 rar_high; + + rar_high = IXGBE_READ_REG(hw, IXGBE_RAH(index)); + rar_high |= IXGBE_RAH_AV; + IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high); +} + +/** + * ixgbe_disable_rar - Disable Rx address register + * @hw: pointer to hardware structure + * @index: index into the RAR table + * + * Disables the select receive address register. + **/ +static void ixgbe_disable_rar(struct ixgbe_hw *hw, u32 index) +{ + u32 rar_high; + + rar_high = IXGBE_READ_REG(hw, IXGBE_RAH(index)); + rar_high &= (~IXGBE_RAH_AV); + IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high); +} + +/** * ixgbe_init_rx_addrs_generic - Initializes receive address filters. * @hw: pointer to hardware structure * @@ -1350,6 +1369,7 @@ s32 ixgbe_init_rx_addrs_generic(struct ixgbe_hw *hw) } /* Clear the MTA */ + hw->addr_ctrl.mc_addr_in_rar_count = 0; hw->addr_ctrl.mta_in_use = 0; IXGBE_WRITE_REG(hw, IXGBE_MCSTCTRL, hw->mac.mc_filter_type); @@ -1382,7 +1402,8 @@ void ixgbe_add_uc_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq) * else put the controller into promiscuous mode */ if (hw->addr_ctrl.rar_used_count < rar_entries) { - rar = hw->addr_ctrl.rar_used_count; + rar = hw->addr_ctrl.rar_used_count - + hw->addr_ctrl.mc_addr_in_rar_count; hw->mac.ops.set_rar(hw, rar, addr, vmdq, IXGBE_RAH_AV); hw_dbg(hw, "Added a secondary address to RAR[%d]\n", rar); hw->addr_ctrl.rar_used_count++; @@ -1421,13 +1442,14 @@ s32 ixgbe_update_uc_addr_list_generic(struct ixgbe_hw *hw, u8 *addr_list, * Clear accounting of old secondary address list, * don''t count RAR[0] */ - uc_addr_in_use = hw->addr_ctrl.rar_used_count - 1; + uc_addr_in_use = hw->addr_ctrl.rar_used_count - + hw->addr_ctrl.mc_addr_in_rar_count - 1; hw->addr_ctrl.rar_used_count -= uc_addr_in_use; hw->addr_ctrl.overflow_promisc = 0; /* Zero out the other receive addresses */ - hw_dbg(hw, "Clearing RAR[1-%d]\n", hw->addr_ctrl.rar_used_count); - for (i = 1; i <= hw->addr_ctrl.rar_used_count; i++) { + hw_dbg(hw, "Clearing RAR[1-%d]\n", uc_addr_in_use); + for (i = 1; i <= uc_addr_in_use; i++) { IXGBE_WRITE_REG(hw, IXGBE_RAL(i), 0); IXGBE_WRITE_REG(hw, IXGBE_RAH(i), 0); } @@ -1536,6 +1558,40 @@ void ixgbe_set_mta(struct ixgbe_hw *hw, u8 *mc_addr) } /** + * ixgbe_add_mc_addr - Adds a multicast address. + * @hw: pointer to hardware structure + * @mc_addr: new multicast address + * + * Adds it to unused receive address register or to the multicast table. 
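
The helper below, together with ixgbe_add_uc_addr earlier in this hunk, fills the RAR table from both ends: unicast entries grow upward from RAR[0] (the station address) while multicast entries grow downward from the top, and the MTA hash table takes the overflow once the RARs run out. A worked example with a hypothetical 16-entry table holding two secondary unicast and two multicast addresses (so rar_used_count is 5):

    /*   RAR[0]       station MAC
     *   RAR[1..2]    secondary unicast (fills upward)
     *   RAR[14..15]  multicast         (fills downward)        */
    u32 next_uc = hw->addr_ctrl.rar_used_count -
                  hw->addr_ctrl.mc_addr_in_rar_count;      /* 3  */
    u32 next_mc = rar_entries -
                  hw->addr_ctrl.mc_addr_in_rar_count - 1;  /* 13 */
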
+ **/ +void ixgbe_add_mc_addr(struct ixgbe_hw *hw, u8 *mc_addr) +{ + u32 rar_entries = hw->mac.num_rar_entries; + u32 rar; + + hw_dbg(hw, " MC Addr =%.2X %.2X %.2X %.2X %.2X %.2X\n", + mc_addr[0], mc_addr[1], mc_addr[2], + mc_addr[3], mc_addr[4], mc_addr[5]); + + /* + * Place this multicast address in the RAR if there is room, + * else put it in the MTA + */ + if (hw->addr_ctrl.rar_used_count < rar_entries) { + /* use RAR from the end up for multicast */ + rar = rar_entries - hw->addr_ctrl.mc_addr_in_rar_count - 1; + hw->mac.ops.set_rar(hw, rar, mc_addr, 0, IXGBE_RAH_AV); + hw_dbg(hw, "Added a multicast address to RAR[%d]\n", rar); + hw->addr_ctrl.rar_used_count++; + hw->addr_ctrl.mc_addr_in_rar_count++; + } else { + ixgbe_set_mta(hw, mc_addr); + } + + hw_dbg(hw, "ixgbe_add_mc_addr Complete\n"); +} + +/** * ixgbe_update_mc_addr_list_generic - Updates MAC list of multicast addresses * @hw: pointer to hardware structure * @mc_addr_list: the list of new multicast addresses @@ -1551,6 +1607,7 @@ s32 ixgbe_update_mc_addr_list_generic(struct ixgbe_hw *hw, u8 *mc_addr_list, u32 mc_addr_count, ixgbe_mc_addr_itr next) { u32 i; + u32 rar_entries = hw->mac.num_rar_entries; u32 vmdq; /* @@ -1558,8 +1615,18 @@ s32 ixgbe_update_mc_addr_list_generic(struct ixgbe_hw *hw, u8 *mc_addr_list, * use. */ hw->addr_ctrl.num_mc_addrs = mc_addr_count; + hw->addr_ctrl.rar_used_count -= hw->addr_ctrl.mc_addr_in_rar_count; + hw->addr_ctrl.mc_addr_in_rar_count = 0; hw->addr_ctrl.mta_in_use = 0; + /* Zero out the other receive addresses. */ + hw_dbg(hw, "Clearing RAR[%d-%d]\n", hw->addr_ctrl.rar_used_count, + rar_entries - 1); + for (i = hw->addr_ctrl.rar_used_count; i < rar_entries; i++) { + IXGBE_WRITE_REG(hw, IXGBE_RAL(i), 0); + IXGBE_WRITE_REG(hw, IXGBE_RAH(i), 0); + } + /* Clear the MTA */ hw_dbg(hw, " Clearing MTA\n"); for (i = 0; i < hw->mac.mcft_size; i++) @@ -1568,7 +1635,7 @@ s32 ixgbe_update_mc_addr_list_generic(struct ixgbe_hw *hw, u8 *mc_addr_list, /* Add the new addresses */ for (i = 0; i < mc_addr_count; i++) { hw_dbg(hw, " Adding the multicast addresses:\n"); - ixgbe_set_mta(hw, next(hw, &mc_addr_list, &vmdq)); + ixgbe_add_mc_addr(hw, next(hw, &mc_addr_list, &vmdq)); } /* Enable mta */ @@ -1588,8 +1655,15 @@ s32 ixgbe_update_mc_addr_list_generic(struct ixgbe_hw *hw, u8 *mc_addr_list, **/ s32 ixgbe_enable_mc_generic(struct ixgbe_hw *hw) { + u32 i; + u32 rar_entries = hw->mac.num_rar_entries; struct ixgbe_addr_filter_info *a = &hw->addr_ctrl; + if (a->mc_addr_in_rar_count > 0) + for (i = (rar_entries - a->mc_addr_in_rar_count); + i < rar_entries; i++) + ixgbe_enable_rar(hw, i); + if (a->mta_in_use > 0) IXGBE_WRITE_REG(hw, IXGBE_MCSTCTRL, IXGBE_MCSTCTRL_MFE | hw->mac.mc_filter_type); @@ -1605,369 +1679,23 @@ s32 ixgbe_enable_mc_generic(struct ixgbe_hw *hw) **/ s32 ixgbe_disable_mc_generic(struct ixgbe_hw *hw) { + u32 i; + u32 rar_entries = hw->mac.num_rar_entries; struct ixgbe_addr_filter_info *a = &hw->addr_ctrl; + if (a->mc_addr_in_rar_count > 0) + for (i = (rar_entries - a->mc_addr_in_rar_count); + i < rar_entries; i++) + ixgbe_disable_rar(hw, i); + if (a->mta_in_use > 0) IXGBE_WRITE_REG(hw, IXGBE_MCSTCTRL, hw->mac.mc_filter_type); return 0; } -/** - * ixgbe_fc_enable_generic - Enable flow control - * @hw: pointer to hardware structure - * @packetbuf_num: packet buffer number (0-7) - * - * Enable flow control according to the current settings. 
- **/ -s32 ixgbe_fc_enable_generic(struct ixgbe_hw *hw, s32 packetbuf_num) -{ - s32 ret_val = 0; - u32 mflcn_reg, fccfg_reg; - u32 reg; - u32 rx_pba_size; - -#ifdef CONFIG_DCB - if (hw->fc.requested_mode == ixgbe_fc_pfc) - goto out; - -#endif /* CONFIG_DCB */ - /* Negotiate the fc mode to use */ - ret_val = ixgbe_fc_autoneg(hw); - if (ret_val) - goto out; - - /* Disable any previous flow control settings */ - mflcn_reg = IXGBE_READ_REG(hw, IXGBE_MFLCN); - mflcn_reg &= ~(IXGBE_MFLCN_RFCE | IXGBE_MFLCN_RPFCE); - - fccfg_reg = IXGBE_READ_REG(hw, IXGBE_FCCFG); - fccfg_reg &= ~(IXGBE_FCCFG_TFCE_802_3X | IXGBE_FCCFG_TFCE_PRIORITY); - - /* - * The possible values of fc.current_mode are: - * 0: Flow control is completely disabled - * 1: Rx flow control is enabled (we can receive pause frames, - * but not send pause frames). - * 2: Tx flow control is enabled (we can send pause frames but - * we do not support receiving pause frames). - * 3: Both Rx and Tx flow control (symmetric) are enabled. -#ifdef CONFIG_DCB - * 4: Priority Flow Control is enabled. -#endif - * other: Invalid. - */ - switch (hw->fc.current_mode) { - case ixgbe_fc_none: - /* Flow control is disabled by software override or autoneg. - * The code below will actually disable it in the HW. - */ - break; - case ixgbe_fc_rx_pause: - /* - * Rx Flow control is enabled and Tx Flow control is - * disabled by software override. Since there really - * isn''t a way to advertise that we are capable of RX - * Pause ONLY, we will advertise that we support both - * symmetric and asymmetric Rx PAUSE. Later, we will - * disable the adapter''s ability to send PAUSE frames. - */ - mflcn_reg |= IXGBE_MFLCN_RFCE; - break; - case ixgbe_fc_tx_pause: - /* - * Tx Flow control is enabled, and Rx Flow control is - * disabled by software override. - */ - fccfg_reg |= IXGBE_FCCFG_TFCE_802_3X; - break; - case ixgbe_fc_full: - /* Flow control (both Rx and Tx) is enabled by SW override. */ - mflcn_reg |= IXGBE_MFLCN_RFCE; - fccfg_reg |= IXGBE_FCCFG_TFCE_802_3X; - break; -#ifdef CONFIG_DCB - case ixgbe_fc_pfc: - goto out; - break; -#endif /* CONFIG_DCB */ - default: - hw_dbg(hw, "Flow control param set incorrectly\n"); - ret_val = -IXGBE_ERR_CONFIG; - goto out; - break; - } - - /* Set 802.3x based flow control settings. */ - mflcn_reg |= IXGBE_MFLCN_DPF; - IXGBE_WRITE_REG(hw, IXGBE_MFLCN, mflcn_reg); - IXGBE_WRITE_REG(hw, IXGBE_FCCFG, fccfg_reg); - - reg = IXGBE_READ_REG(hw, IXGBE_MTQC); - /* Thresholds are different for link flow control when in DCB mode */ - if (reg & IXGBE_MTQC_RT_ENA) { - rx_pba_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(packetbuf_num)); - - /* Always disable XON for LFC when in DCB mode */ - reg = (rx_pba_size >> 5) & 0xFFE0; - IXGBE_WRITE_REG(hw, IXGBE_FCRTL_82599(packetbuf_num), reg); - - reg = (rx_pba_size >> 2) & 0xFFE0; - if (hw->fc.current_mode & ixgbe_fc_tx_pause) - reg |= IXGBE_FCRTH_FCEN; - IXGBE_WRITE_REG(hw, IXGBE_FCRTH_82599(packetbuf_num), reg); - } else { - /* Set up and enable Rx high/low water mark thresholds, - * enable XON. 
*/ - if (hw->fc.current_mode & ixgbe_fc_tx_pause) { - if (hw->fc.send_xon) { - IXGBE_WRITE_REG(hw, - IXGBE_FCRTL_82599(packetbuf_num), - (hw->fc.low_water | - IXGBE_FCRTL_XONE)); - } else { - IXGBE_WRITE_REG(hw, - IXGBE_FCRTL_82599(packetbuf_num), - hw->fc.low_water); - } - - IXGBE_WRITE_REG(hw, IXGBE_FCRTH_82599(packetbuf_num), - (hw->fc.high_water | IXGBE_FCRTH_FCEN)); - } - } - - /* Configure pause time (2 TCs per register) */ - reg = IXGBE_READ_REG(hw, IXGBE_FCTTV(packetbuf_num / 2)); - if ((packetbuf_num & 1) == 0) - reg = (reg & 0xFFFF0000) | hw->fc.pause_time; - else - reg = (reg & 0x0000FFFF) | (hw->fc.pause_time << 16); - IXGBE_WRITE_REG(hw, IXGBE_FCTTV(packetbuf_num / 2), reg); - - IXGBE_WRITE_REG(hw, IXGBE_FCRTV, (hw->fc.pause_time >> 1)); - -out: - return ret_val; -} - -/** - * ixgbe_fc_autoneg - Configure flow control - * @hw: pointer to hardware structure - * - * Compares our advertised flow control capabilities to those advertised by - * our link partner, and determines the proper flow control mode to use. - **/ -s32 ixgbe_fc_autoneg(struct ixgbe_hw *hw) -{ - s32 ret_val = 0; - ixgbe_link_speed speed; - u32 pcs_anadv_reg, pcs_lpab_reg, linkstat; - bool link_up; - - /* - * AN should have completed when the cable was plugged in. - * Look for reasons to bail out. Bail out if: - * - FC autoneg is disabled, or if - * - we don''t have multispeed fiber, or if - * - we''re not running at 1G, or if - * - link is not up, or if - * - link is up but AN did not complete, or if - * - link is up and AN completed but timed out - * - * Since we''re being called from an LSC, link is already know to be up. - * So use link_up_wait_to_complete=false. - */ - hw->mac.ops.check_link(hw, &speed, &link_up, false); - linkstat = IXGBE_READ_REG(hw, IXGBE_PCS1GLSTA); - - if (hw->fc.disable_fc_autoneg || - !hw->phy.multispeed_fiber || - (speed != IXGBE_LINK_SPEED_1GB_FULL) || - !link_up || - ((linkstat & IXGBE_PCS1GLSTA_AN_COMPLETE) == 0) || - ((linkstat & IXGBE_PCS1GLSTA_AN_TIMED_OUT) == 1)) { - hw->fc.fc_was_autonegged = false; - hw->fc.current_mode = hw->fc.requested_mode; - hw_dbg(hw, "Autoneg FC was skipped.\n"); - goto out; - } - - /* - * Read the AN advertisement and LP ability registers and resolve - * local flow control settings accordingly - */ - pcs_anadv_reg = IXGBE_READ_REG(hw, IXGBE_PCS1GANA); - pcs_lpab_reg = IXGBE_READ_REG(hw, IXGBE_PCS1GANLP); - if ((pcs_anadv_reg & IXGBE_PCS1GANA_SYM_PAUSE) && - (pcs_lpab_reg & IXGBE_PCS1GANA_SYM_PAUSE)) { - /* - * Now we need to check if the user selected Rx ONLY - * of pause frames. In this case, we had to advertise - * FULL flow control because we could not advertise RX - * ONLY. Hence, we must now check to see if we need to - * turn OFF the TRANSMISSION of PAUSE frames. 
- */ - if (hw->fc.requested_mode == ixgbe_fc_full) { - hw->fc.current_mode = ixgbe_fc_full; - hw_dbg(hw, "Flow Control = FULL.\n"); - } else { - hw->fc.current_mode = ixgbe_fc_rx_pause; - hw_dbg(hw, "Flow Control = RX PAUSE frames only.\n"); - } - } else if (!(pcs_anadv_reg & IXGBE_PCS1GANA_SYM_PAUSE) && - (pcs_anadv_reg & IXGBE_PCS1GANA_ASM_PAUSE) && - (pcs_lpab_reg & IXGBE_PCS1GANA_SYM_PAUSE) && - (pcs_lpab_reg & IXGBE_PCS1GANA_ASM_PAUSE)) { - hw->fc.current_mode = ixgbe_fc_tx_pause; - hw_dbg(hw, "Flow Control = TX PAUSE frames only.\n"); - } else if ((pcs_anadv_reg & IXGBE_PCS1GANA_SYM_PAUSE) && - (pcs_anadv_reg & IXGBE_PCS1GANA_ASM_PAUSE) && - !(pcs_lpab_reg & IXGBE_PCS1GANA_SYM_PAUSE) && - (pcs_lpab_reg & IXGBE_PCS1GANA_ASM_PAUSE)) { - hw->fc.current_mode = ixgbe_fc_rx_pause; - hw_dbg(hw, "Flow Control = RX PAUSE frames only.\n"); - } else { - hw->fc.current_mode = ixgbe_fc_none; - hw_dbg(hw, "Flow Control = NONE.\n"); - } - - /* Record that current_mode is the result of a successful autoneg */ - hw->fc.fc_was_autonegged = true; - -out: - return ret_val; -} - -/** - * ixgbe_setup_fc - Set up flow control - * @hw: pointer to hardware structure - * - * Called at init time to set up flow control. - **/ -s32 ixgbe_setup_fc(struct ixgbe_hw *hw, s32 packetbuf_num) -{ - s32 ret_val = 0; - u32 reg; - -#ifdef CONFIG_DCB - if (hw->fc.requested_mode == ixgbe_fc_pfc) { - hw->fc.current_mode = hw->fc.requested_mode; - goto out; - } - -#endif /* CONFIG_DCB */ - - /* Validate the packetbuf configuration */ - if (packetbuf_num < 0 || packetbuf_num > 7) { - hw_dbg(hw, "Invalid packet buffer number [%d], expected range is" - " 0-7\n", packetbuf_num); - ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; - goto out; - } - - /* - * Validate the water mark configuration. Zero water marks are invalid - * because it causes the controller to just blast out fc packets. - */ - if (!hw->fc.low_water || !hw->fc.high_water || !hw->fc.pause_time) { - hw_dbg(hw, "Invalid water mark configuration\n"); - ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; - goto out; - } - - /* - * Validate the requested mode. Strict IEEE mode does not allow - * ixgbe_fc_rx_pause because it will cause us to fail at UNH. - */ - if (hw->fc.strict_ieee && hw->fc.requested_mode == ixgbe_fc_rx_pause) { - hw_dbg(hw, "ixgbe_fc_rx_pause not valid in strict IEEE mode\n"); - ret_val = IXGBE_ERR_INVALID_LINK_SETTINGS; - goto out; - } - - /* - * 10gig parts do not have a word in the EEPROM to determine the - * default flow control setting, so we explicitly set it to full. - */ - if (hw->fc.requested_mode == ixgbe_fc_default) - hw->fc.requested_mode = ixgbe_fc_full; - - /* - * Set up the 1G flow control advertisement registers so the HW will be - * able to do fc autoneg once the cable is plugged in. If we end up - * using 10g instead, this is harmless. - */ - reg = IXGBE_READ_REG(hw, IXGBE_PCS1GANA); - - /* - * The possible values of fc.requested_mode are: - * 0: Flow control is completely disabled - * 1: Rx flow control is enabled (we can receive pause frames, - * but not send pause frames). - * 2: Tx flow control is enabled (we can send pause frames but - * we do not support receiving pause frames). - * 3: Both Rx and Tx flow control (symmetric) are enabled. -#ifdef CONFIG_DCB - * 4: Priority Flow Control is enabled. -#endif - * other: Invalid. - */ - switch (hw->fc.requested_mode) { - case ixgbe_fc_none: - /* Flow control completely disabled by software override. 
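
Side note (not part of the patch): the resolution chain in the removed ixgbe_fc_autoneg() above is the usual IEEE 802.3 pause resolution over the local and link-partner Sym/Asym PAUSE advertisement bits. The same decision table as a compact sketch, where SYM and ASM stand in for IXGBE_PCS1GANA_SYM_PAUSE and IXGBE_PCS1GANA_ASM_PAUSE and the enum values come from the driver's ixgbe_type.h:

static enum ixgbe_fc_mode resolve_pause(u32 local, u32 partner,
					bool requested_full)
{
	if ((local & SYM) && (partner & SYM))
		/* both ends symmetric: full, unless user wanted Rx only */
		return requested_full ? ixgbe_fc_full : ixgbe_fc_rx_pause;
	if (!(local & SYM) && (local & ASM) &&
	    (partner & SYM) && (partner & ASM))
		return ixgbe_fc_tx_pause;
	if ((local & SYM) && (local & ASM) &&
	    !(partner & SYM) && (partner & ASM))
		return ixgbe_fc_rx_pause;
	return ixgbe_fc_none;
}

The four branches map one-to-one onto the removed code; the rest of that function only decides whether the autoneg result can be trusted at all.
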
*/ - reg &= ~(IXGBE_PCS1GANA_SYM_PAUSE | IXGBE_PCS1GANA_ASM_PAUSE); - break; - case ixgbe_fc_rx_pause: - /* - * Rx Flow control is enabled and Tx Flow control is - * disabled by software override. Since there really - * isn''t a way to advertise that we are capable of RX - * Pause ONLY, we will advertise that we support both - * symmetric and asymmetric Rx PAUSE. Later, we will - * disable the adapter''s ability to send PAUSE frames. - */ - reg |= (IXGBE_PCS1GANA_SYM_PAUSE | IXGBE_PCS1GANA_ASM_PAUSE); - break; - case ixgbe_fc_tx_pause: - /* - * Tx Flow control is enabled, and Rx Flow control is - * disabled by software override. - */ - reg |= (IXGBE_PCS1GANA_ASM_PAUSE); - reg &= ~(IXGBE_PCS1GANA_SYM_PAUSE); - break; - case ixgbe_fc_full: - /* Flow control (both Rx and Tx) is enabled by SW override. */ - reg |= (IXGBE_PCS1GANA_SYM_PAUSE | IXGBE_PCS1GANA_ASM_PAUSE); - break; -#ifdef CONFIG_DCB - case ixgbe_fc_pfc: - goto out; - break; -#endif /* CONFIG_DCB */ - default: - hw_dbg(hw, "Flow control param set incorrectly\n"); - ret_val = -IXGBE_ERR_CONFIG; - goto out; - break; - } - - IXGBE_WRITE_REG(hw, IXGBE_PCS1GANA, reg); - reg = IXGBE_READ_REG(hw, IXGBE_PCS1GLCTL); - - /* Enable and restart autoneg to inform the link partner */ - reg |= IXGBE_PCS1GLCTL_AN_ENABLE | IXGBE_PCS1GLCTL_AN_RESTART; - /* Disable AN timeout */ - if (hw->fc.strict_ieee) - reg &= ~IXGBE_PCS1GLCTL_AN_1G_TIMEOUT_EN; - IXGBE_WRITE_REG(hw, IXGBE_PCS1GLCTL, reg); - hw_dbg(hw, "Set up FC; PCS1GLCTL = 0x%08X\n", reg); - -out: - return ret_val; -} /** * ixgbe_disable_pcie_master - Disable PCI-express master access @@ -2027,10 +1755,6 @@ s32 ixgbe_acquire_swfw_sync(struct ixgbe_hw *hw, u16 mask) s32 timeout = 200; while (timeout) { - /* - * SW EEPROM semaphore bit is used for access to all - * SW_FW_SYNC/GSSR bits (not just EEPROM) - */ if (ixgbe_get_eeprom_semaphore(hw)) return -IXGBE_ERR_SWFW_SYNC; @@ -2048,7 +1772,7 @@ s32 ixgbe_acquire_swfw_sync(struct ixgbe_hw *hw, u16 mask) } if (!timeout) { - hw_dbg(hw, "Driver can''t access resource, SW_FW_SYNC timeout.\n"); + hw_dbg(hw, "Driver can''t access resource, GSSR timeout.\n"); return -IXGBE_ERR_SWFW_SYNC; } @@ -2081,75 +1805,3 @@ void ixgbe_release_swfw_sync(struct ixgbe_hw *hw, u16 mask) ixgbe_release_eeprom_semaphore(hw); } -/** - * ixgbe_enable_rx_dma_generic - Enable the Rx DMA unit - * @hw: pointer to hardware structure - * @regval: register value to write to RXCTRL - * - * Enables the Rx DMA unit - **/ -s32 ixgbe_enable_rx_dma_generic(struct ixgbe_hw *hw, u32 regval) -{ - IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, regval); - - return 0; -} - -/** - * ixgbe_blink_led_start_generic - Blink LED based on index. - * @hw: pointer to hardware structure - * @index: led number to blink - **/ -s32 ixgbe_blink_led_start_generic(struct ixgbe_hw *hw, u32 index) -{ - ixgbe_link_speed speed = 0; - bool link_up = 0; - u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC); - u32 led_reg = IXGBE_READ_REG(hw, IXGBE_LEDCTL); - - /* - * Link must be up to auto-blink the LEDs; - * Force it if link is down. - */ - hw->mac.ops.check_link(hw, &speed, &link_up, false); - - if (!link_up) { - - autoc_reg |= IXGBE_AUTOC_AN_RESTART; - autoc_reg |= IXGBE_AUTOC_FLU; - IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg); - msleep(10); - } - - led_reg &= ~IXGBE_LED_MODE_MASK(index); - led_reg |= IXGBE_LED_BLINK(index); - IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, led_reg); - IXGBE_WRITE_FLUSH(hw); - - return 0; -} - -/** - * ixgbe_blink_led_stop_generic - Stop blinking LED based on index. 
- * @hw: pointer to hardware structure - * @index: led number to stop blinking - **/ -s32 ixgbe_blink_led_stop_generic(struct ixgbe_hw *hw, u32 index) -{ - u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC); - u32 led_reg = IXGBE_READ_REG(hw, IXGBE_LEDCTL); - - - autoc_reg &= ~IXGBE_AUTOC_FLU; - autoc_reg |= IXGBE_AUTOC_AN_RESTART; - IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg); - - led_reg &= ~IXGBE_LED_MODE_MASK(index); - led_reg &= ~IXGBE_LED_BLINK(index); - led_reg |= IXGBE_LED_LINK_ACTIVE << IXGBE_LED_MODE_SHIFT(index); - IXGBE_WRITE_REG(hw, IXGBE_LEDCTL, led_reg); - IXGBE_WRITE_FLUSH(hw); - - return 0; -} - diff --git a/drivers/net/ixgbe/ixgbe_common.h b/drivers/net/ixgbe/ixgbe_common.h index 5045656..a6a08f5 100644 --- a/drivers/net/ixgbe/ixgbe_common.h +++ b/drivers/net/ixgbe/ixgbe_common.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -61,14 +61,11 @@ s32 ixgbe_update_mc_addr_list_generic(struct ixgbe_hw *hw, u8 *mc_addr_list, ixgbe_mc_addr_itr func); s32 ixgbe_update_uc_addr_list_generic(struct ixgbe_hw *hw, u8 *addr_list, u32 addr_count, ixgbe_mc_addr_itr func); -void ixgbe_add_uc_addr(struct ixgbe_hw *hw, u8 *addr, u32 vmdq); s32 ixgbe_enable_mc_generic(struct ixgbe_hw *hw); s32 ixgbe_disable_mc_generic(struct ixgbe_hw *hw); -s32 ixgbe_enable_rx_dma_generic(struct ixgbe_hw *hw, u32 regval); -s32 ixgbe_setup_fc(struct ixgbe_hw *hw, s32 packetbuf_num); -s32 ixgbe_fc_enable_generic(struct ixgbe_hw *hw, s32 packtetbuf_num); -s32 ixgbe_fc_autoneg(struct ixgbe_hw *hw); +s32 ixgbe_setup_fc_generic(struct ixgbe_hw *hw, s32 packetbuf_num); +s32 ixgbe_fc_enable(struct ixgbe_hw *hw, s32 packtetbuf_num); s32 ixgbe_validate_mac_addr(u8 *mac_addr); s32 ixgbe_acquire_swfw_sync(struct ixgbe_hw *hw, u16 mask); @@ -77,7 +74,4 @@ s32 ixgbe_disable_pcie_master(struct ixgbe_hw *hw); s32 ixgbe_read_analog_reg8_generic(struct ixgbe_hw *hw, u32 reg, u8 *val); s32 ixgbe_write_analog_reg8_generic(struct ixgbe_hw *hw, u32 reg, u8 val); -s32 ixgbe_blink_led_start_generic(struct ixgbe_hw *hw, u32 index); -s32 ixgbe_blink_led_stop_generic(struct ixgbe_hw *hw, u32 index); - #endif /* IXGBE_COMMON */ diff --git a/drivers/net/ixgbe/ixgbe_dcb.c b/drivers/net/ixgbe/ixgbe_dcb.c index dd2df31..3a16e94 100644 --- a/drivers/net/ixgbe/ixgbe_dcb.c +++ b/drivers/net/ixgbe/ixgbe_dcb.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -29,7 +29,6 @@ #include "ixgbe_type.h" #include "ixgbe_dcb.h" #include "ixgbe_dcb_82598.h" -#include "ixgbe_dcb_82599.h" /** * ixgbe_dcb_config - Struct containing DCB settings. 
@@ -218,8 +217,6 @@ s32 ixgbe_dcb_get_tc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_get_tc_stats_82598(hw, stats, tc_count); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_get_tc_stats_82599(hw, stats, tc_count); return ret; } @@ -237,8 +234,6 @@ s32 ixgbe_dcb_get_pfc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_get_pfc_stats_82598(hw, stats, tc_count); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_get_pfc_stats_82599(hw, stats, tc_count); return ret; } @@ -255,8 +250,6 @@ s32 ixgbe_dcb_config_rx_arbiter(struct ixgbe_hw *hw, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_config_rx_arbiter_82598(hw, dcb_config); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_config_rx_arbiter_82599(hw, dcb_config); return ret; } @@ -273,8 +266,6 @@ s32 ixgbe_dcb_config_tx_desc_arbiter(struct ixgbe_hw *hw, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_config_tx_desc_arbiter_82598(hw, dcb_config); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_config_tx_desc_arbiter_82599(hw, dcb_config); return ret; } @@ -291,8 +282,6 @@ s32 ixgbe_dcb_config_tx_data_arbiter(struct ixgbe_hw *hw, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_config_tx_data_arbiter_82598(hw, dcb_config); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_config_tx_data_arbiter_82599(hw, dcb_config); return ret; } @@ -309,8 +298,6 @@ s32 ixgbe_dcb_config_pfc(struct ixgbe_hw *hw, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_config_pfc_82598(hw, dcb_config); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_config_pfc_82599(hw, dcb_config); return ret; } @@ -326,8 +313,6 @@ s32 ixgbe_dcb_config_tc_stats(struct ixgbe_hw *hw) s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_config_tc_stats_82598(hw); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_config_tc_stats_82599(hw); return ret; } @@ -344,7 +329,5 @@ s32 ixgbe_dcb_hw_config(struct ixgbe_hw *hw, s32 ret = 0; if (hw->mac.type == ixgbe_mac_82598EB) ret = ixgbe_dcb_hw_config_82598(hw, dcb_config); - else if (hw->mac.type == ixgbe_mac_82599EB) - ret = ixgbe_dcb_hw_config_82599(hw, dcb_config); return ret; } diff --git a/drivers/net/ixgbe/ixgbe_dcb.h b/drivers/net/ixgbe/ixgbe_dcb.h index 112c641..206c9f2 100644 --- a/drivers/net/ixgbe/ixgbe_dcb.h +++ b/drivers/net/ixgbe/ixgbe_dcb.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -74,26 +74,6 @@ enum strict_prio_type { prio_link }; -/* DCB capability definitions */ -#define IXGBE_DCB_PG_SUPPORT 0x00000001 -#define IXGBE_DCB_PFC_SUPPORT 0x00000002 -#define IXGBE_DCB_BCN_SUPPORT 0x00000004 -#define IXGBE_DCB_UP2TC_SUPPORT 0x00000008 -#define IXGBE_DCB_GSP_SUPPORT 0x00000010 - -#define IXGBE_DCB_8_TC_SUPPORT 0x80 - -struct dcb_support { - /* DCB capabilities */ - u32 capabilities; - - /* Each bit represents a number of TCs configurable in the hw. - * If 8 traffic classes can be configured, the value is 0x80. 
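
Side note (not part of the patch): in the dcb_support struct removed above, traffic_classes appears to be a bitmap in which bit n set means n+1 traffic classes are configurable, so 0x80 encodes 8 TCs and a 4-TC part would presumably report 0x08. If that reading is right, the decode is one call to the kernel's fls():

#include <linux/bitops.h>

static u8 max_traffic_classes(u8 traffic_classes)
{
	return fls(traffic_classes);	/* fls(0x80) == 8, fls(0) == 0 */
}
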
- */ - u8 traffic_classes; - u8 pfc_traffic_classes; -}; - /* Traffic class bandwidth allocation per direction */ struct tc_bw_alloc { u8 bwg_id; /* Bandwidth Group (BWG) ID */ @@ -127,15 +107,9 @@ enum dcb_rx_pba_cfg { pba_80_48 /* PBA[0-3] each use 80KB, PBA[4-7] each use 48KB */ }; -struct dcb_num_tcs { - u8 pg_tcs; - u8 pfc_tcs; -}; struct ixgbe_dcb_config { struct tc_configuration tc_config[MAX_TRAFFIC_CLASS]; - struct dcb_support support; - struct dcb_num_tcs num_tcs; u8 bw_percentage[2][MAX_BW_GROUP]; /* One each for Tx/Rx */ bool pfc_mode_enable; bool round_robin_enable; diff --git a/drivers/net/ixgbe/ixgbe_dcb_82598.c b/drivers/net/ixgbe/ixgbe_dcb_82598.c index ace0a1f..9f937a0 100644 --- a/drivers/net/ixgbe/ixgbe_dcb_82598.c +++ b/drivers/net/ixgbe/ixgbe_dcb_82598.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -299,13 +299,11 @@ s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw, u32 reg, rx_pba_size; u8 i; - if (!dcb_config->pfc_mode_enable) - goto out; - /* Enable Transmit Priority Flow Control */ reg = IXGBE_READ_REG(hw, IXGBE_RMCS); reg &= ~IXGBE_RMCS_TFCE_802_3X; /* correct the reporting of our flow control status */ + hw->fc.current_mode = ixgbe_fc_none; reg |= IXGBE_RMCS_TFCE_PRIORITY; IXGBE_WRITE_REG(hw, IXGBE_RMCS, reg); @@ -349,7 +347,6 @@ s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw, /* Configure flow control refresh threshold value */ IXGBE_WRITE_REG(hw, IXGBE_FCRTV, 0x3400); -out: return 0; } diff --git a/drivers/net/ixgbe/ixgbe_dcb_82598.h b/drivers/net/ixgbe/ixgbe_dcb_82598.h index 247192c..592b0f8 100644 --- a/drivers/net/ixgbe/ixgbe_dcb_82598.h +++ b/drivers/net/ixgbe/ixgbe_dcb_82598.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, diff --git a/drivers/net/ixgbe/ixgbe_dcb_82599.c b/drivers/net/ixgbe/ixgbe_dcb_82599.c deleted file mode 100644 index 8dd78b0..0000000 --- a/drivers/net/ixgbe/ixgbe_dcb_82599.c +++ /dev/null @@ -1,508 +0,0 @@ -/******************************************************************************* - - Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. - - This program is free software; you can redistribute it and/or modify it - under the terms and conditions of the GNU General Public License, - version 2, as published by the Free Software Foundation. - - This program is distributed in the hope it will be useful, but WITHOUT - ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - more details. - - You should have received a copy of the GNU General Public License along with - this program; if not, write to the Free Software Foundation, Inc., - 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. - - The full GNU General Public License is included in this distribution in - the file called "COPYING". 
- - Contact Information: - e1000-devel Mailing List <e1000-devel@lists.sourceforge.net> - Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497 - -*******************************************************************************/ - - -#include "ixgbe_type.h" -#include "ixgbe_dcb.h" -#include "ixgbe_dcb_82599.h" - -/** - * ixgbe_dcb_get_tc_stats_82599 - Returns status for each traffic class - * @hw: pointer to hardware structure - * @stats: pointer to statistics structure - * @tc_count: Number of elements in bwg_array. - * - * This function returns the status data for each of the Traffic Classes in use. - */ -s32 ixgbe_dcb_get_tc_stats_82599(struct ixgbe_hw *hw, - struct ixgbe_hw_stats *stats, - u8 tc_count) -{ - int tc; - - if (tc_count > MAX_TRAFFIC_CLASS) - return DCB_ERR_PARAM; - /* Statistics pertaining to each traffic class */ - for (tc = 0; tc < tc_count; tc++) { - /* Transmitted Packets */ - stats->qptc[tc] += IXGBE_READ_REG(hw, IXGBE_QPTC(tc)); - /* Transmitted Bytes */ - stats->qbtc[tc] += IXGBE_READ_REG(hw, IXGBE_QBTC(tc)); - /* Received Packets */ - stats->qprc[tc] += IXGBE_READ_REG(hw, IXGBE_QPRC(tc)); - /* Received Bytes */ - stats->qbrc[tc] += IXGBE_READ_REG(hw, IXGBE_QBRC(tc)); - -#if 0 - /* Can we get rid of these?? Consequently, getting rid - * of the tc_stats structure. - */ - tc_stats_array[up]->in_overflow_discards = 0; - tc_stats_array[up]->out_overflow_discards = 0; -#endif - } - - return 0; -} - -/** - * ixgbe_dcb_get_pfc_stats_82599 - Return CBFC status data - * @hw: pointer to hardware structure - * @stats: pointer to statistics structure - * @tc_count: Number of elements in bwg_array. - * - * This function returns the CBFC status data for each of the Traffic Classes. - */ -s32 ixgbe_dcb_get_pfc_stats_82599(struct ixgbe_hw *hw, - struct ixgbe_hw_stats *stats, - u8 tc_count) -{ - int tc; - - if (tc_count > MAX_TRAFFIC_CLASS) - return DCB_ERR_PARAM; - for (tc = 0; tc < tc_count; tc++) { - /* Priority XOFF Transmitted */ - stats->pxofftxc[tc] += IXGBE_READ_REG(hw, IXGBE_PXOFFTXC(tc)); - /* Priority XOFF Received */ - stats->pxoffrxc[tc] += IXGBE_READ_REG(hw, IXGBE_PXOFFRXCNT(tc)); - } - - return 0; -} - -/** - * ixgbe_dcb_config_packet_buffers_82599 - Configure DCB packet buffers - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure packet buffers for DCB mode. - */ -s32 ixgbe_dcb_config_packet_buffers_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - s32 ret_val = 0; - u32 value = IXGBE_RXPBSIZE_64KB; - u8 i = 0; - - /* Setup Rx packet buffer sizes */ - switch (dcb_config->rx_pba_cfg) { - case pba_80_48: - /* Setup the first four at 80KB */ - value = IXGBE_RXPBSIZE_80KB; - for (; i < 4; i++) - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), value); - /* Setup the last four at 48KB...don''t re-init i */ - value = IXGBE_RXPBSIZE_48KB; - /* Fall Through */ - case pba_equal: - default: - for (; i < IXGBE_MAX_PACKET_BUFFERS; i++) - IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), value); - - /* Setup Tx packet buffer sizes */ - for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) { - IXGBE_WRITE_REG(hw, IXGBE_TXPBSIZE(i), - IXGBE_TXPBSIZE_20KB); - IXGBE_WRITE_REG(hw, IXGBE_TXPBTHRESH(i), - IXGBE_TXPBTHRESH_DCB); - } - break; - } - - return ret_val; -} - -/** - * ixgbe_dcb_config_rx_arbiter_82599 - Config Rx Data arbiter - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure Rx Packet Arbiter and credits for each traffic class. 
- */ -s32 ixgbe_dcb_config_rx_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - struct tc_bw_alloc *p; - u32 reg = 0; - u32 credit_refill = 0; - u32 credit_max = 0; - u8 i = 0; - - /* - * Disable the arbiter before changing parameters - * (always enable recycle mode; WSP) - */ - reg = IXGBE_RTRPCS_RRM | IXGBE_RTRPCS_RAC | IXGBE_RTRPCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTRPCS, reg); - - /* Map all traffic classes to their UP, 1 to 1 */ - reg = 0; - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) - reg |= (i << (i * IXGBE_RTRUP2TC_UP_SHIFT)); - IXGBE_WRITE_REG(hw, IXGBE_RTRUP2TC, reg); - - /* Configure traffic class credits and priority */ - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) { - p = &dcb_config->tc_config[i].path[DCB_RX_CONFIG]; - - credit_refill = p->data_credits_refill; - credit_max = p->data_credits_max; - reg = credit_refill | (credit_max << IXGBE_RTRPT4C_MCL_SHIFT); - - reg |= (u32)(p->bwg_id) << IXGBE_RTRPT4C_BWG_SHIFT; - - if (p->prio_type == prio_link) - reg |= IXGBE_RTRPT4C_LSP; - - IXGBE_WRITE_REG(hw, IXGBE_RTRPT4C(i), reg); - } - - /* - * Configure Rx packet plane (recycle mode; WSP) and - * enable arbiter - */ - reg = IXGBE_RTRPCS_RRM | IXGBE_RTRPCS_RAC; - IXGBE_WRITE_REG(hw, IXGBE_RTRPCS, reg); - - return 0; -} - -/** - * ixgbe_dcb_config_tx_desc_arbiter_82599 - Config Tx Desc. arbiter - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure Tx Descriptor Arbiter and credits for each traffic class. - */ -s32 ixgbe_dcb_config_tx_desc_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - struct tc_bw_alloc *p; - u32 reg, max_credits; - u8 i; - - /* - * Disable the arbiter before changing parameters - * (always enable recycle mode; WSP) - */ - reg = IXGBE_RTTDCS_TDPAC | IXGBE_RTTDCS_TDRM | IXGBE_RTTDCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, reg); - - /* Clear the per-Tx queue credits; we use per-TC instead */ - for (i = 0; i < 128; i++) { - IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, i); - IXGBE_WRITE_REG(hw, IXGBE_RTTDT1C, 0); - } - - /* Configure traffic class credits and priority */ - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) { - p = &dcb_config->tc_config[i].path[DCB_TX_CONFIG]; - max_credits = dcb_config->tc_config[i].desc_credits_max; - reg = max_credits << IXGBE_RTTDT2C_MCL_SHIFT; - reg |= p->data_credits_refill; - reg |= (u32)(p->bwg_id) << IXGBE_RTTDT2C_BWG_SHIFT; - - if (p->prio_type == prio_group) - reg |= IXGBE_RTTDT2C_GSP; - - if (p->prio_type == prio_link) - reg |= IXGBE_RTTDT2C_LSP; - - IXGBE_WRITE_REG(hw, IXGBE_RTTDT2C(i), reg); - } - - /* - * Configure Tx descriptor plane (recycle mode; WSP) and - * enable arbiter - */ - reg = IXGBE_RTTDCS_TDPAC | IXGBE_RTTDCS_TDRM; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, reg); - - return 0; -} - -/** - * ixgbe_dcb_config_tx_data_arbiter_82599 - Config Tx Data arbiter - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure Tx Packet Arbiter and credits for each traffic class. 
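
Side note (not part of the patch): the Rx and Tx descriptor arbiter routines above, and the Tx data arbiter that follows, all pack per-TC credits into the same register layout, using the *_MCL_SHIFT/*_BWG_SHIFT/GSP/LSP values defined in ixgbe_dcb_82599.h further down. A sketch of the packing, assuming the driver's strict_prio_type enum:

static u32 pack_tc_credit_reg(u32 refill, u32 max_credits, u32 bwg_id,
			      enum strict_prio_type prio)
{
	u32 reg = refill;		/* refill credits in the low bits */

	reg |= max_credits << 12;	/* *_MCL_SHIFT: max credit limit */
	reg |= bwg_id << 9;		/* *_BWG_SHIFT: bandwidth group */
	if (prio == prio_group)
		reg |= 0x40000000;	/* GSP: group strict priority */
	if (prio == prio_link)
		reg |= 0x80000000;	/* LSP: link strict priority */
	return reg;
}

The resulting value is what the loops above write to RTRPT4C(i), RTTDT2C(i) or RTTPT2C(i), depending on the arbiter being programmed.
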
- */ -s32 ixgbe_dcb_config_tx_data_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - struct tc_bw_alloc *p; - u32 reg; - u8 i; - - /* - * Disable the arbiter before changing parameters - * (always enable recycle mode; SP; arb delay) - */ - reg = IXGBE_RTTPCS_TPPAC | IXGBE_RTTPCS_TPRM | - (IXGBE_RTTPCS_ARBD_DCB << IXGBE_RTTPCS_ARBD_SHIFT) | - IXGBE_RTTPCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTPCS, reg); - - /* Map all traffic classes to their UP, 1 to 1 */ - reg = 0; - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) - reg |= (i << (i * IXGBE_RTTUP2TC_UP_SHIFT)); - IXGBE_WRITE_REG(hw, IXGBE_RTTUP2TC, reg); - - /* Configure traffic class credits and priority */ - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) { - p = &dcb_config->tc_config[i].path[DCB_TX_CONFIG]; - reg = p->data_credits_refill; - reg |= (u32)(p->data_credits_max) << IXGBE_RTTPT2C_MCL_SHIFT; - reg |= (u32)(p->bwg_id) << IXGBE_RTTPT2C_BWG_SHIFT; - - if (p->prio_type == prio_group) - reg |= IXGBE_RTTPT2C_GSP; - - if (p->prio_type == prio_link) - reg |= IXGBE_RTTPT2C_LSP; - - IXGBE_WRITE_REG(hw, IXGBE_RTTPT2C(i), reg); - } - - /* - * Configure Tx packet plane (recycle mode; SP; arb delay) and - * enable arbiter - */ - reg = IXGBE_RTTPCS_TPPAC | IXGBE_RTTPCS_TPRM | - (IXGBE_RTTPCS_ARBD_DCB << IXGBE_RTTPCS_ARBD_SHIFT); - IXGBE_WRITE_REG(hw, IXGBE_RTTPCS, reg); - - return 0; -} - -/** - * ixgbe_dcb_config_pfc_82599 - Configure priority flow control - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure Priority Flow Control (PFC) for each traffic class. - */ -s32 ixgbe_dcb_config_pfc_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - u32 i, reg, rx_pba_size; - - /* If PFC is disabled globally then fall back to LFC. */ - if (!dcb_config->pfc_mode_enable) { - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) - hw->mac.ops.fc_enable(hw, i); - goto out; - } - - /* Configure PFC Tx thresholds per TC */ - for (i = 0; i < MAX_TRAFFIC_CLASS; i++) { - if (dcb_config->rx_pba_cfg == pba_equal) - rx_pba_size = IXGBE_RXPBSIZE_64KB; - else - rx_pba_size = (i < 4) ? IXGBE_RXPBSIZE_80KB - : IXGBE_RXPBSIZE_48KB; - - reg = ((rx_pba_size >> 5) & 0xFFE0); - if (dcb_config->tc_config[i].dcb_pfc == pfc_enabled_full || - dcb_config->tc_config[i].dcb_pfc == pfc_enabled_tx) - reg |= IXGBE_FCRTL_XONE; - IXGBE_WRITE_REG(hw, IXGBE_FCRTL_82599(i), reg); - - reg = ((rx_pba_size >> 2) & 0xFFE0); - if (dcb_config->tc_config[i].dcb_pfc == pfc_enabled_full || - dcb_config->tc_config[i].dcb_pfc == pfc_enabled_tx) - reg |= IXGBE_FCRTH_FCEN; - IXGBE_WRITE_REG(hw, IXGBE_FCRTH_82599(i), reg); - } - - /* Configure pause time (2 TCs per register) */ - reg = hw->fc.pause_time | (hw->fc.pause_time << 16); - for (i = 0; i < (MAX_TRAFFIC_CLASS / 2); i++) - IXGBE_WRITE_REG(hw, IXGBE_FCTTV(i), reg); - - /* Configure flow control refresh threshold value */ - IXGBE_WRITE_REG(hw, IXGBE_FCRTV, hw->fc.pause_time / 2); - - /* Enable Transmit PFC */ - reg = IXGBE_FCCFG_TFCE_PRIORITY; - IXGBE_WRITE_REG(hw, IXGBE_FCCFG, reg); - - /* - * Enable Receive PFC - * We will always honor XOFF frames we receive when - * we are in PFC mode. 
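
Side note (not part of the patch): the threshold computation above derives both PFC watermarks from the per-TC Rx packet buffer size, roughly 1/32 of the buffer for the low/XON mark and 1/4 for the high/XOFF mark, masked to the registers' granularity before the XONE/FCEN enable bits are ORed in. As a sketch:

static void pfc_watermarks(u32 rx_pba_size, u32 *low, u32 *high)
{
	*low  = (rx_pba_size >> 5) & 0xFFE0;	/* FCRTL value, pre-XONE */
	*high = (rx_pba_size >> 2) & 0xFFE0;	/* FCRTH value, pre-FCEN */
}

With the 80KB buffers used for TCs 0-3, that puts the XOFF threshold at 20KB and the XON threshold at 2.5KB.
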
- */ - reg = IXGBE_READ_REG(hw, IXGBE_MFLCN); - reg &= ~IXGBE_MFLCN_RFCE; - reg |= IXGBE_MFLCN_RPFCE; - IXGBE_WRITE_REG(hw, IXGBE_MFLCN, reg); -out: - return 0; -} - -/** - * ixgbe_dcb_config_tc_stats_82599 - Config traffic class statistics - * @hw: pointer to hardware structure - * - * Configure queue statistics registers, all queues belonging to same traffic - * class uses a single set of queue statistics counters. - */ -s32 ixgbe_dcb_config_tc_stats_82599(struct ixgbe_hw *hw) -{ - u32 reg = 0; - u8 i = 0; - - /* - * Receive Queues stats setting - * 32 RQSMR registers, each configuring 4 queues. - * Set all 16 queues of each TC to the same stat - * with TC ''n'' going to stat ''n''. - */ - for (i = 0; i < 32; i++) { - reg = 0x01010101 * (i / 4); - IXGBE_WRITE_REG(hw, IXGBE_RQSMR(i), reg); - } - /* - * Transmit Queues stats setting - * 32 TQSM registers, each controlling 4 queues. - * Set all queues of each TC to the same stat - * with TC ''n'' going to stat ''n''. - * Tx queues are allocated non-uniformly to TCs: - * 32, 32, 16, 16, 8, 8, 8, 8. - */ - for (i = 0; i < 32; i++) { - if (i < 8) - reg = 0x00000000; - else if (i < 16) - reg = 0x01010101; - else if (i < 20) - reg = 0x02020202; - else if (i < 24) - reg = 0x03030303; - else if (i < 26) - reg = 0x04040404; - else if (i < 28) - reg = 0x05050505; - else if (i < 30) - reg = 0x06060606; - else - reg = 0x07070707; - IXGBE_WRITE_REG(hw, IXGBE_TQSM(i), reg); - } - - return 0; -} - -/** - * ixgbe_dcb_config_82599 - Configure general DCB parameters - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure general DCB parameters. - */ -s32 ixgbe_dcb_config_82599(struct ixgbe_hw *hw) -{ - u32 reg; - u32 q; - - /* Disable the Tx desc arbiter so that MTQC can be changed */ - reg = IXGBE_READ_REG(hw, IXGBE_RTTDCS); - reg |= IXGBE_RTTDCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, reg); - - /* Enable DCB for Rx with 8 TCs */ - reg = IXGBE_READ_REG(hw, IXGBE_MRQC); - switch (reg & IXGBE_MRQC_MRQE_MASK) { - case 0: - case IXGBE_MRQC_RT4TCEN: - /* RSS disabled cases */ - reg = (reg & ~IXGBE_MRQC_MRQE_MASK) | IXGBE_MRQC_RT8TCEN; - break; - case IXGBE_MRQC_RSSEN: - case IXGBE_MRQC_RTRSS4TCEN: - /* RSS enabled cases */ - reg = (reg & ~IXGBE_MRQC_MRQE_MASK) | IXGBE_MRQC_RTRSS8TCEN; - break; - default: - /* Unsupported value, assume stale data, overwrite no RSS */ - reg = (reg & ~IXGBE_MRQC_MRQE_MASK) | IXGBE_MRQC_RT8TCEN; - } - IXGBE_WRITE_REG(hw, IXGBE_MRQC, reg); - - /* Enable DCB for Tx with 8 TCs */ - reg = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ; - IXGBE_WRITE_REG(hw, IXGBE_MTQC, reg); - - /* Disable drop for all queues */ - for (q=0; q < 128; q++) { - IXGBE_WRITE_REG(hw, IXGBE_QDE, q << IXGBE_QDE_IDX_SHIFT); - } - - /* Enable the Tx desc arbiter */ - reg = IXGBE_READ_REG(hw, IXGBE_RTTDCS); - reg &= ~IXGBE_RTTDCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, reg); - - return 0; -} - -/** - * ixgbe_dcb_hw_config_82599 - Configure and enable DCB - * @hw: pointer to hardware structure - * @dcb_config: pointer to ixgbe_dcb_config structure - * - * Configure dcb settings and enable dcb mode. 
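
Side note (not part of the patch): the TQSM programming above encodes the 82599's non-uniform Tx queue allocation, 32/32/16/16/8/8/8/8 queues for TCs 0 through 7, with each TQSM register covering four queues. Restated as a queue-to-TC mapping:

static u8 txq_to_tc(unsigned int q)	/* q in [0, 127] */
{
	if (q < 64)
		return q / 32;			/* TCs 0-1: 32 queues each */
	if (q < 96)
		return 2 + (q - 64) / 16;	/* TCs 2-3: 16 queues each */
	return 4 + (q - 96) / 8;		/* TCs 4-7: 8 queues each */
}

Each 0xNNNNNNNN constant in the loop is just that TC number replicated into the four per-queue byte lanes of one 32-bit register.
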
- */ -s32 ixgbe_dcb_hw_config_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config) -{ - u32 pap = 0; - - ixgbe_dcb_config_packet_buffers_82599(hw, dcb_config); - ixgbe_dcb_config_82599(hw); - ixgbe_dcb_config_rx_arbiter_82599(hw, dcb_config); - ixgbe_dcb_config_tx_desc_arbiter_82599(hw, dcb_config); - ixgbe_dcb_config_tx_data_arbiter_82599(hw, dcb_config); - ixgbe_dcb_config_pfc_82599(hw, dcb_config); - ixgbe_dcb_config_tc_stats_82599(hw); - - /* - * TODO: For DCB SV purpose only, - * remove it before product release - */ - if (dcb_config->link_speed > 0 && dcb_config->link_speed <= 9) { - pap = IXGBE_READ_REG(hw, IXGBE_PAP); - pap |= (dcb_config->link_speed << 16); - IXGBE_WRITE_REG(hw, IXGBE_PAP, pap); - } - - return 0; -} - diff --git a/drivers/net/ixgbe/ixgbe_dcb_82599.h b/drivers/net/ixgbe/ixgbe_dcb_82599.h deleted file mode 100644 index 00cf7da..0000000 --- a/drivers/net/ixgbe/ixgbe_dcb_82599.h +++ /dev/null @@ -1,125 +0,0 @@ -/******************************************************************************* - - Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. - - This program is free software; you can redistribute it and/or modify it - under the terms and conditions of the GNU General Public License, - version 2, as published by the Free Software Foundation. - - This program is distributed in the hope it will be useful, but WITHOUT - ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - more details. - - You should have received a copy of the GNU General Public License along with - this program; if not, write to the Free Software Foundation, Inc., - 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. - - The full GNU General Public License is included in this distribution in - the file called "COPYING". - - Contact Information: - e1000-devel Mailing List <e1000-devel@lists.sourceforge.net> - Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497 - -*******************************************************************************/ - -#ifndef _DCB_82599_CONFIG_H_ -#define _DCB_82599_CONFIG_H_ - -/* DCB register definitions */ -#define IXGBE_RTTDCS_TDPAC 0x00000001 /* 0 Round Robin, - * 1 WSP - Weighted Strict Priority - */ -#define IXGBE_RTTDCS_VMPAC 0x00000002 /* 0 Round Robin, - * 1 WRR - Weighted Round Robin - */ -#define IXGBE_RTTDCS_TDRM 0x00000010 /* Transmit Recycle Mode */ -#define IXGBE_RTTDCS_BDPM 0x00400000 /* Bypass Data Pipe - must clear! */ -#define IXGBE_RTTDCS_BPBFSM 0x00800000 /* Bypass PB Free Space - must - * clear! 
- */ -#define IXGBE_RTTDCS_SPEED_CHG 0x80000000 /* Link speed change */ - -/* Receive UP2TC mapping */ -#define IXGBE_RTRUP2TC_UP_SHIFT 3 -/* Transmit UP2TC mapping */ -#define IXGBE_RTTUP2TC_UP_SHIFT 3 - -#define IXGBE_RTRPT4C_MCL_SHIFT 12 /* Offset to Max Credit Limit setting */ -#define IXGBE_RTRPT4C_BWG_SHIFT 9 /* Offset to BWG index */ -#define IXGBE_RTRPT4C_GSP 0x40000000 /* GSP enable bit */ -#define IXGBE_RTRPT4C_LSP 0x80000000 /* LSP enable bit */ - -#define IXGBE_RDRXCTL_MPBEN 0x00000010 /* DMA config for multiple packet - * buffers enable - */ -#define IXGBE_RDRXCTL_MCEN 0x00000040 /* DMA config for multiple cores - * (RSS) enable - */ - -/* RTRPCS Bit Masks */ -#define IXGBE_RTRPCS_RRM 0x00000002 /* Receive Recycle Mode enable */ -/* Receive Arbitration Control: 0 Round Robin, 1 DFP */ -#define IXGBE_RTRPCS_RAC 0x00000004 -#define IXGBE_RTRPCS_ARBDIS 0x00000040 /* Arbitration disable bit */ - -/* RTTDT2C Bit Masks */ -#define IXGBE_RTTDT2C_MCL_SHIFT 12 -#define IXGBE_RTTDT2C_BWG_SHIFT 9 -#define IXGBE_RTTDT2C_GSP 0x40000000 -#define IXGBE_RTTDT2C_LSP 0x80000000 - -#define IXGBE_RTTPT2C_MCL_SHIFT 12 -#define IXGBE_RTTPT2C_BWG_SHIFT 9 -#define IXGBE_RTTPT2C_GSP 0x40000000 -#define IXGBE_RTTPT2C_LSP 0x80000000 - -/* RTTPCS Bit Masks */ -#define IXGBE_RTTPCS_TPPAC 0x00000020 /* 0 Round Robin, - * 1 SP - Strict Priority - */ -#define IXGBE_RTTPCS_ARBDIS 0x00000040 /* Arbiter disable */ -#define IXGBE_RTTPCS_TPRM 0x00000100 /* Transmit Recycle Mode enable */ -#define IXGBE_RTTPCS_ARBD_SHIFT 22 -#define IXGBE_RTTPCS_ARBD_DCB 0x4 /* Arbitration delay in DCB mode */ - -#define IXGBE_TXPBSIZE_20KB 0x00005000 /* 20KB Packet Buffer */ -#define IXGBE_TXPBSIZE_40KB 0x0000A000 /* 40KB Packet Buffer */ -#define IXGBE_RXPBSIZE_48KB 0x0000C000 /* 48KB Packet Buffer */ -#define IXGBE_RXPBSIZE_64KB 0x00010000 /* 64KB Packet Buffer */ -#define IXGBE_RXPBSIZE_80KB 0x00014000 /* 80KB Packet Buffer */ -#define IXGBE_RXPBSIZE_128KB 0x00020000 /* 128KB Packet Buffer */ - -#define IXGBE_TXPBTHRESH_DCB 0xA /* THRESH value for DCB mode */ - - -/* DCB hardware-specific driver APIs */ - -/* DCB PFC functions */ -s32 ixgbe_dcb_config_pfc_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config); -s32 ixgbe_dcb_get_pfc_stats_82599(struct ixgbe_hw *hw, - struct ixgbe_hw_stats *stats, - u8 tc_count); - -/* DCB traffic class stats */ -s32 ixgbe_dcb_config_tc_stats_82599(struct ixgbe_hw *hw); -s32 ixgbe_dcb_get_tc_stats_82599(struct ixgbe_hw *hw, - struct ixgbe_hw_stats *stats, - u8 tc_count); - -/* DCB config arbiters */ -s32 ixgbe_dcb_config_tx_desc_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config); -s32 ixgbe_dcb_config_tx_data_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config); -s32 ixgbe_dcb_config_rx_arbiter_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *dcb_config); - -/* DCB hw initialization */ -s32 ixgbe_dcb_hw_config_82599(struct ixgbe_hw *hw, - struct ixgbe_dcb_config *config); - -#endif /* _DCB_82599_CONFIG_H */ diff --git a/drivers/net/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ixgbe/ixgbe_dcb_nl.c index 043045d..f275114 100644 --- a/drivers/net/ixgbe/ixgbe_dcb_nl.c +++ b/drivers/net/ixgbe/ixgbe_dcb_nl.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -27,31 +27,19 @@ #include "ixgbe.h" -#ifdef CONFIG_DCB -#include <linux/dcbnl.h> -#include "ixgbe_dcb_82598.h" -#include "ixgbe_dcb_82599.h" -#else #include <linux/netlink.h> #include <linux/genetlink.h> #include <net/genetlink.h> #include <linux/netdevice.h> -#endif /* Callbacks for DCB netlink in the kernel */ #define BIT_DCB_MODE 0x01 #define BIT_PFC 0x02 #define BIT_PG_RX 0x04 #define BIT_PG_TX 0x08 -#define BIT_RESETLINK 0x40 +#define BIT_BCN 0x10 #define BIT_LINKSPEED 0x80 -/* Responses for the DCB_C_SET_ALL command */ -#define DCB_HW_CHG_RST 0 /* DCB configuration changed with reset */ -#define DCB_NO_HW_CHG 1 /* DCB configuration did not change */ -#define DCB_HW_CHG 2 /* DCB configuration changed, no reset */ - -#ifndef CONFIG_DCB /* DCB configuration commands */ enum { DCB_C_UNDEFINED, @@ -267,66 +255,7 @@ static int ixgbe_dcb_check_adapter(struct net_device *netdev) else return -EINVAL; } -#endif - -#ifdef CONFIG_DCB -int ixgbe_copy_dcb_cfg(struct ixgbe_dcb_config *src_dcb_cfg, - struct ixgbe_dcb_config *dst_dcb_cfg, int tc_max) -{ - struct tc_configuration *src_tc_cfg = NULL; - struct tc_configuration *dst_tc_cfg = NULL; - int i; - - if (!src_dcb_cfg || !dst_dcb_cfg) - return -EINVAL; - - for (i = DCB_PG_ATTR_TC_0; i < tc_max + DCB_PG_ATTR_TC_0; i++) { - src_tc_cfg = &src_dcb_cfg->tc_config[i - DCB_PG_ATTR_TC_0]; - dst_tc_cfg = &dst_dcb_cfg->tc_config[i - DCB_PG_ATTR_TC_0]; - dst_tc_cfg->path[DCB_TX_CONFIG].prio_type = src_tc_cfg->path[DCB_TX_CONFIG].prio_type; - - dst_tc_cfg->path[DCB_TX_CONFIG].bwg_id = src_tc_cfg->path[DCB_TX_CONFIG].bwg_id; - - dst_tc_cfg->path[DCB_TX_CONFIG].bwg_percent = src_tc_cfg->path[DCB_TX_CONFIG].bwg_percent; - - dst_tc_cfg->path[DCB_TX_CONFIG].up_to_tc_bitmap = src_tc_cfg->path[DCB_TX_CONFIG].up_to_tc_bitmap; - - dst_tc_cfg->path[DCB_RX_CONFIG].prio_type = src_tc_cfg->path[DCB_RX_CONFIG].prio_type; - - dst_tc_cfg->path[DCB_RX_CONFIG].bwg_id = src_tc_cfg->path[DCB_RX_CONFIG].bwg_id; - - dst_tc_cfg->path[DCB_RX_CONFIG].bwg_percent = src_tc_cfg->path[DCB_RX_CONFIG].bwg_percent; - - dst_tc_cfg->path[DCB_RX_CONFIG].up_to_tc_bitmap = src_tc_cfg->path[DCB_RX_CONFIG].up_to_tc_bitmap; - } - - for (i = DCB_PG_ATTR_BW_ID_0; i < DCB_PG_ATTR_BW_ID_MAX; i++) { - dst_dcb_cfg->bw_percentage[DCB_TX_CONFIG] - [i-DCB_PG_ATTR_BW_ID_0] = src_dcb_cfg->bw_percentage - [DCB_TX_CONFIG][i-DCB_PG_ATTR_BW_ID_0]; - dst_dcb_cfg->bw_percentage[DCB_RX_CONFIG] - [i-DCB_PG_ATTR_BW_ID_0] = src_dcb_cfg->bw_percentage - [DCB_RX_CONFIG][i-DCB_PG_ATTR_BW_ID_0]; - } - - for (i = DCB_PFC_UP_ATTR_0; i < DCB_PFC_UP_ATTR_MAX; i++) { - dst_dcb_cfg->tc_config[i - DCB_PFC_UP_ATTR_0].dcb_pfc = src_dcb_cfg->tc_config[i - DCB_PFC_UP_ATTR_0].dcb_pfc; - } - dst_dcb_cfg->pfc_mode_enable = src_dcb_cfg->pfc_mode_enable; - - return 0; -} -#else static int ixgbe_copy_dcb_cfg(struct ixgbe_dcb_config *src_dcb_cfg, struct ixgbe_dcb_config *dst_dcb_cfg, int tc_max) { @@ -413,90 +342,7 @@ err: kfree(dcb_skb); return -EINVAL; } -#endif - -#ifdef CONFIG_DCB -static u8 ixgbe_dcbnl_get_state(struct net_device *netdev) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - DPRINTK(DRV, INFO, "Get DCB Admin Mode.\n"); - - return !!(adapter->flags & IXGBE_FLAG_DCB_ENABLED); -} - -static u8 ixgbe_dcbnl_set_state(struct net_device *netdev, u8 state) -{ - u8 err = 0; - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - DPRINTK(DRV, INFO, "Set DCB Admin Mode.\n"); - if
(state > 0) { - /* Turn on DCB */ - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) - goto out; - - if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) { - DPRINTK(DRV, ERR, "Enable failed, needs MSI-X\n"); - err = 1; - goto out; - } - - if (netif_running(netdev)) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_stop(netdev); -#else - netdev->stop(netdev); -#endif - ixgbe_clear_interrupt_scheme(adapter); - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - adapter->last_lfc_mode = adapter->hw.fc.current_mode; - adapter->hw.fc.requested_mode = ixgbe_fc_none; - } - adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - DPRINTK(DRV, INFO, "DCB enabled, " - "disabling Flow Director\n"); - adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE; - adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE; - } - adapter->flags |= IXGBE_FLAG_DCB_ENABLED; - ixgbe_init_interrupt_scheme(adapter); - if (netif_running(netdev)) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_open(netdev); -#else - netdev->open(netdev); -#endif - } else { - /* Turn off DCB */ - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { - if (netif_running(netdev)) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_stop(netdev); -#else - netdev->stop(netdev); -#endif - ixgbe_clear_interrupt_scheme(adapter); - adapter->hw.fc.requested_mode = adapter->last_lfc_mode; - adapter->temp_dcb_cfg.pfc_mode_enable = false; - adapter->dcb_cfg.pfc_mode_enable = false; - adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED; - adapter->flags |= IXGBE_FLAG_RSS_ENABLED; - ixgbe_init_interrupt_scheme(adapter); - if (netif_running(netdev)) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_open(netdev); -#else - netdev->open(netdev); -#endif - } - } -out: - return err; -} -#else static int ixgbe_dcb_gstate(struct sk_buff *skb, struct genl_info *info) { int ret = -ENOMEM; @@ -529,6 +375,9 @@ err_out: return ret; } +extern void ixgbe_napi_add_all(struct ixgbe_adapter *); +extern void ixgbe_napi_del_all(struct ixgbe_adapter *); + static int ixgbe_dcb_sstate(struct sk_buff *skb, struct genl_info *info) { struct net_device *netdev = NULL; @@ -558,25 +407,27 @@ static int ixgbe_dcb_sstate(struct sk_buff *skb, struct genl_info *info) case 0: if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { if (netdev->flags & IFF_UP) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_stop(netdev); -#else netdev->stop(netdev); + ixgbe_reset_interrupt_capability(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_del_all(adapter); #endif - ixgbe_clear_interrupt_scheme(adapter); + kfree(adapter->tx_ring); + kfree(adapter->rx_ring); + adapter->tx_ring = NULL; + adapter->rx_ring = NULL; adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED; if (adapter->flags & IXGBE_FLAG_RSS_CAPABLE) adapter->flags |= IXGBE_FLAG_RSS_ENABLED; ixgbe_init_interrupt_scheme(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_add_all(adapter); +#endif ixgbe_reset(adapter); if (netdev->flags & IFF_UP) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_open(netdev); -#else netdev->open(netdev); -#endif break; } else { /* Nothing to do, already off */ @@ -593,37 +444,26 @@ static int ixgbe_dcb_sstate(struct sk_buff *skb, struct genl_info *info) goto err_out; } else { if (netdev->flags & IFF_UP) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_stop(netdev); -#else netdev->stop(netdev); + ixgbe_reset_interrupt_capability(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_del_all(adapter); #endif - ixgbe_clear_interrupt_scheme(adapter); + kfree(adapter->tx_ring); + kfree(adapter->rx_ring); +
adapter->tx_ring = NULL; + adapter->rx_ring = NULL; adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; adapter->flags |= IXGBE_FLAG_DCB_ENABLED; - adapter->dcb_cfg.support.capabilities = (IXGBE_DCB_PG_SUPPORT | IXGBE_DCB_PFC_SUPPORT | IXGBE_DCB_GSP_SUPPORT); - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - DPRINTK(DRV, INFO, "DCB enabled, " - "disabling Flow Director\n"); - adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE; - adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE; - adapter->dcb_cfg.support.capabilities |= IXGBE_DCB_UP2TC_SUPPORT; - } adapter->ring_feature[RING_F_DCB].indices = 8; ixgbe_init_interrupt_scheme(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_add_all(adapter); +#endif ixgbe_reset(adapter); if (netdev->flags & IFF_UP) -#ifdef HAVE_NET_DEVICE_OPS - netdev->netdev_ops->ndo_open(netdev); -#else netdev->open(netdev); -#endif break; } } @@ -721,24 +561,7 @@ err_out: err: return ret; } -#endif - -#ifdef CONFIG_DCB -static void ixgbe_dcbnl_get_perm_hw_addr(struct net_device *netdev, - u8 *perm_addr) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - int i, j; - - for (i = 0; i < netdev->addr_len; i++) - perm_addr[i] = adapter->hw.mac.perm_addr[i]; - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - for (j = 0; j < netdev->addr_len; j++, i++) - perm_addr[i] = adapter->hw.mac.san_addr[j]; - } -} -#else static int ixgbe_dcb_gperm_hwaddr(struct sk_buff *skb, struct genl_info *info) { void *data; @@ -816,137 +639,7 @@ err_out: dev_put(netdev); return ret; } -#endif - -#ifdef CONFIG_DCB -static void ixgbe_dcbnl_set_pg_tc_cfg_tx(struct net_device *netdev, int tc, - u8 prio, u8 bwg_id, u8 bw_pct, - u8 up_map) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - if (prio != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[0].prio_type = prio; - if (bwg_id != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[0].bwg_id = bwg_id; - if (bw_pct != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[0].bwg_percent = bw_pct; - if (up_map != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[0].up_to_tc_bitmap = up_map; - - if ((adapter->temp_dcb_cfg.tc_config[tc].path[0].prio_type != adapter->dcb_cfg.tc_config[tc].path[0].prio_type) || - (adapter->temp_dcb_cfg.tc_config[tc].path[0].bwg_id != adapter->dcb_cfg.tc_config[tc].path[0].bwg_id) || - (adapter->temp_dcb_cfg.tc_config[tc].path[0].bwg_percent != adapter->dcb_cfg.tc_config[tc].path[0].bwg_percent) || - (adapter->temp_dcb_cfg.tc_config[tc].path[0].up_to_tc_bitmap != adapter->dcb_cfg.tc_config[tc].path[0].up_to_tc_bitmap)) { - adapter->dcb_set_bitmap |= BIT_PG_TX; - adapter->dcb_set_bitmap |= BIT_RESETLINK; - } -} - -static void ixgbe_dcbnl_set_pg_bwg_cfg_tx(struct net_device *netdev, int bwg_id, - u8 bw_pct) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - adapter->temp_dcb_cfg.bw_percentage[0][bwg_id] = bw_pct; - - if (adapter->temp_dcb_cfg.bw_percentage[0][bwg_id] != adapter->dcb_cfg.bw_percentage[0][bwg_id]) { - adapter->dcb_set_bitmap |= BIT_PG_RX; - adapter->dcb_set_bitmap |= BIT_RESETLINK; - } -} - -static void ixgbe_dcbnl_set_pg_tc_cfg_rx(struct net_device *netdev, int tc, - u8 prio, u8 bwg_id, u8 bw_pct, - u8 up_map) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - if (prio != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[1].prio_type = prio; - if (bwg_id != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[1].bwg_id = bwg_id; - if (bw_pct != DCB_ATTR_VALUE_UNDEFINED) -
adapter->temp_dcb_cfg.tc_config[tc].path[1].bwg_percent = bw_pct; - if (up_map != DCB_ATTR_VALUE_UNDEFINED) - adapter->temp_dcb_cfg.tc_config[tc].path[1].up_to_tc_bitmap = up_map; - - if ((adapter->temp_dcb_cfg.tc_config[tc].path[1].prio_type != adapter->dcb_cfg.tc_config[tc].path[1].prio_type) || - (adapter->temp_dcb_cfg.tc_config[tc].path[1].bwg_id != adapter->dcb_cfg.tc_config[tc].path[1].bwg_id) || - (adapter->temp_dcb_cfg.tc_config[tc].path[1].bwg_percent != adapter->dcb_cfg.tc_config[tc].path[1].bwg_percent) || - (adapter->temp_dcb_cfg.tc_config[tc].path[1].up_to_tc_bitmap != adapter->dcb_cfg.tc_config[tc].path[1].up_to_tc_bitmap)) { - adapter->dcb_set_bitmap |= BIT_PG_RX; - adapter->dcb_set_bitmap |= BIT_RESETLINK; - } -} - -static void ixgbe_dcbnl_set_pg_bwg_cfg_rx(struct net_device *netdev, int bwg_id, - u8 bw_pct) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - adapter->temp_dcb_cfg.bw_percentage[1][bwg_id] = bw_pct; - - if (adapter->temp_dcb_cfg.bw_percentage[1][bwg_id] != adapter->dcb_cfg.bw_percentage[1][bwg_id]) { - adapter->dcb_set_bitmap |= BIT_PG_RX; - adapter->dcb_set_bitmap |= BIT_RESETLINK; - } -} -static void ixgbe_dcbnl_get_pg_tc_cfg_tx(struct net_device *netdev, int tc, - u8 *prio, u8 *bwg_id, u8 *bw_pct, - u8 *up_map) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - *prio = adapter->dcb_cfg.tc_config[tc].path[0].prio_type; - *bwg_id = adapter->dcb_cfg.tc_config[tc].path[0].bwg_id; - *bw_pct = adapter->dcb_cfg.tc_config[tc].path[0].bwg_percent; - *up_map = adapter->dcb_cfg.tc_config[tc].path[0].up_to_tc_bitmap; -} - -static void ixgbe_dcbnl_get_pg_bwg_cfg_tx(struct net_device *netdev, int bwg_id, - u8 *bw_pct) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - *bw_pct = adapter->dcb_cfg.bw_percentage[0][bwg_id]; -} - -static void ixgbe_dcbnl_get_pg_tc_cfg_rx(struct net_device *netdev, int tc, - u8 *prio, u8 *bwg_id, u8 *bw_pct, - u8 *up_map) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - *prio = adapter->dcb_cfg.tc_config[tc].path[1].prio_type; - *bwg_id = adapter->dcb_cfg.tc_config[tc].path[1].bwg_id; - *bw_pct = adapter->dcb_cfg.tc_config[tc].path[1].bwg_percent; - *up_map = adapter->dcb_cfg.tc_config[tc].path[1].up_to_tc_bitmap; -} - -static void ixgbe_dcbnl_get_pg_bwg_cfg_rx(struct net_device *netdev, int bwg_id, - u8 *bw_pct) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - *bw_pct = adapter->dcb_cfg.bw_percentage[1][bwg_id]; -} -#else static int ixgbe_dcb_pg_scfg(struct sk_buff *skb, struct genl_info *info, int dir) { @@ -1045,7 +738,6 @@ static int ixgbe_dcb_pg_scfg(struct sk_buff *skb, struct genl_info *info, adapter->dcb_set_bitmap |= BIT_PG_TX; else adapter->dcb_set_bitmap |= BIT_PG_RX; - adapter->dcb_set_bitmap |= BIT_RESETLINK; DPRINTK(DRV, INFO, "Set DCB PG\n"); } else { @@ -1215,29 +907,7 @@ static int ixgbe_dcb_pgrx_gcfg(struct sk_buff *skb, struct genl_info *info) { return ixgbe_dcb_pg_gcfg(skb, info, DCB_RX_CONFIG); } -#endif - -#ifdef CONFIG_DCB -static void ixgbe_dcbnl_set_pfc_cfg(struct net_device *netdev, int priority, - u8 setting) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - adapter->temp_dcb_cfg.tc_config[priority].dcb_pfc = setting; - if (adapter->temp_dcb_cfg.tc_config[priority].dcb_pfc != adapter->dcb_cfg.tc_config[priority].dcb_pfc) { - adapter->dcb_set_bitmap |= BIT_PFC; - } -} -static void ixgbe_dcbnl_get_pfc_cfg(struct net_device *netdev, int priority, - u8 *setting) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - *setting =
adapter->dcb_cfg.tc_config[priority].dcb_pfc; -} -#else static int ixgbe_dcb_spfccfg(struct sk_buff *skb, struct genl_info *info) { struct nlattr *tb[IXGBE_DCB_PFC_A_UP_MAX + 1]; @@ -1381,70 +1051,7 @@ err_out: dev_put(netdev); return ret; } -#endif - -#ifdef CONFIG_DCB -static u8 ixgbe_dcbnl_set_all(struct net_device *netdev) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - int ret; - - if (!adapter->dcb_set_bitmap) - return DCB_NO_HW_CHG; - - /* Only take down the adapter if the configuration change - * requires a reset. - */ - if (adapter->dcb_set_bitmap & BIT_RESETLINK) { - while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state)) - msleep(1); - if (netif_running(netdev)) - ixgbe_down(adapter); - } - - ret = ixgbe_copy_dcb_cfg(&adapter->temp_dcb_cfg, &adapter->dcb_cfg, - adapter->ring_feature[RING_F_DCB].indices); - if (ret) { - if (adapter->dcb_set_bitmap & BIT_RESETLINK) - clear_bit(__IXGBE_RESETTING, &adapter->state); - return DCB_NO_HW_CHG; - } - - if (adapter->dcb_cfg.pfc_mode_enable) { - if ((adapter->hw.mac.type != ixgbe_mac_82598EB) && - (adapter->hw.fc.current_mode != ixgbe_fc_pfc)) - adapter->last_lfc_mode = adapter->hw.fc.current_mode; - adapter->hw.fc.requested_mode = ixgbe_fc_pfc; - } else { - if (adapter->hw.mac.type != ixgbe_mac_82598EB) - adapter->hw.fc.requested_mode = adapter->last_lfc_mode; - else - adapter->hw.fc.requested_mode = ixgbe_fc_none; - } - - if (adapter->dcb_set_bitmap & BIT_RESETLINK) { - if (netif_running(netdev)) - ixgbe_up(adapter); - ret = DCB_HW_CHG_RST; - } else if (adapter->dcb_set_bitmap & BIT_PFC) { - if (adapter->hw.mac.type == ixgbe_mac_82598EB) - ixgbe_dcb_config_pfc_82598(&adapter->hw, - &adapter->dcb_cfg); - else if (adapter->hw.mac.type == ixgbe_mac_82599EB) - ixgbe_dcb_config_pfc_82599(&adapter->hw, - &adapter->dcb_cfg); - ret = DCB_HW_CHG; - } - if (adapter->dcb_cfg.pfc_mode_enable) - adapter->hw.fc.current_mode = ixgbe_fc_pfc; - - if (adapter->dcb_set_bitmap & BIT_RESETLINK) - clear_bit(__IXGBE_RESETTING, &adapter->state); - adapter->dcb_set_bitmap = 0x00; - return ret; -} -#else static int ixgbe_dcb_set_all(struct sk_buff *skb, struct genl_info *info) { struct net_device *netdev = NULL; @@ -1510,119 +1117,8 @@ err_out: err: return ret; } -#endif -#ifdef CONFIG_DCB -static u8 ixgbe_dcbnl_getcap(struct net_device *netdev, int capid, u8 *cap) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - u8 rval = 0; - - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { - switch (capid) { - case DCB_CAP_ATTR_PG: - *cap = true; - break; - case DCB_CAP_ATTR_PFC: - *cap = true; - break; - case DCB_CAP_ATTR_UP2TC: - *cap = false; - break; - case DCB_CAP_ATTR_PG_TCS: - *cap = 0x80; - break; - case DCB_CAP_ATTR_PFC_TCS: - *cap = 0x80; - break; - case DCB_CAP_ATTR_GSP: - *cap = true; - break; - default: - rval = -EINVAL; - break; - } - } else { - rval = -EINVAL; - } - - return rval; -} -static u8 ixgbe_dcbnl_getnumtcs(struct net_device *netdev, int tcid, u8 *num) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - u8 rval = 0; - - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { - switch (tcid) { - case DCB_NUMTCS_ATTR_PG: - *num = MAX_TRAFFIC_CLASS; - break; - case DCB_NUMTCS_ATTR_PFC: - *num = MAX_TRAFFIC_CLASS; - break; - default: - rval = -EINVAL; - break; - } - } else { - rval = -EINVAL; - } - - return rval; -} - -static u8 ixgbe_dcbnl_setnumtcs(struct net_device *netdev, int tcid, u8 num) -{ - return -EINVAL; -} - -static u8 ixgbe_dcbnl_getpfcstate(struct net_device *netdev) -{ - struct ixgbe_adapter *adapter = 
netdev_priv(netdev); - - return adapter->dcb_cfg.pfc_mode_enable; -} - -static void ixgbe_dcbnl_setpfcstate(struct net_device *netdev, u8 state) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - DPRINTK(DRV, INFO, "Setting PFC state to %d.\n", state); - adapter->temp_dcb_cfg.pfc_mode_enable = state; - if (adapter->temp_dcb_cfg.pfc_mode_enable != - adapter->dcb_cfg.pfc_mode_enable) - adapter->dcb_set_bitmap |= BIT_PFC; - return; -} - -#else -#endif - -#ifdef CONFIG_DCB -struct dcbnl_rtnl_ops dcbnl_ops = { - .getstate = ixgbe_dcbnl_get_state, - .setstate = ixgbe_dcbnl_set_state, - .getpermhwaddr = ixgbe_dcbnl_get_perm_hw_addr, - .setpgtccfgtx = ixgbe_dcbnl_set_pg_tc_cfg_tx, - .setpgbwgcfgtx = ixgbe_dcbnl_set_pg_bwg_cfg_tx, - .setpgtccfgrx = ixgbe_dcbnl_set_pg_tc_cfg_rx, - .setpgbwgcfgrx = ixgbe_dcbnl_set_pg_bwg_cfg_rx, - .getpgtccfgtx = ixgbe_dcbnl_get_pg_tc_cfg_tx, - .getpgbwgcfgtx = ixgbe_dcbnl_get_pg_bwg_cfg_tx, - .getpgtccfgrx = ixgbe_dcbnl_get_pg_tc_cfg_rx, - .getpgbwgcfgrx = ixgbe_dcbnl_get_pg_bwg_cfg_rx, - .setpfccfg = ixgbe_dcbnl_set_pfc_cfg, - .getpfccfg = ixgbe_dcbnl_get_pfc_cfg, - .setall = ixgbe_dcbnl_set_all, - .getcap = ixgbe_dcbnl_getcap, - .getnumtcs = ixgbe_dcbnl_getnumtcs, - .setnumtcs = ixgbe_dcbnl_setnumtcs, - .getpfcstate = ixgbe_dcbnl_getpfcstate, - .setpfcstate = ixgbe_dcbnl_setpfcstate, -}; -#else /* DCB Generic NETLINK command Definitions */ /* Get DCB Admin Mode */ static struct genl_ops ixgbe_dcb_genl_c_gstate = { @@ -1817,4 +1313,3 @@ int ixgbe_dcb_netlink_unregister(void) { return genl_unregister_family(&dcb_family); } -#endif diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c index 26d6f83..e9763dd 100644 --- a/drivers/net/ixgbe/ixgbe_ethtool.c +++ b/drivers/net/ixgbe/ixgbe_ethtool.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -101,23 +101,20 @@ static struct ixgbe_stats ixgbe_gstrings_stats[] = { {"rx_csum_offload_good", IXGBE_STAT(hw_csum_rx_good)}, {"rx_csum_offload_errors", IXGBE_STAT(hw_csum_rx_error)}, {"tx_csum_offload_ctxt", IXGBE_STAT(hw_csum_tx_good)}, + {"rx_header_split", IXGBE_STAT(rx_hdr_split)}, #ifndef IXGBE_NO_LLI {"low_latency_interrupt", IXGBE_STAT(lli_int)}, #endif {"alloc_rx_page_failed", IXGBE_STAT(alloc_rx_page_failed)}, {"alloc_rx_buff_failed", IXGBE_STAT(alloc_rx_buff_failed)}, #ifndef IXGBE_NO_LRO - {"lro_aggregated", IXGBE_STAT(lro_stats.coal)}, - {"lro_flushed", IXGBE_STAT(lro_stats.flushed)}, - {"lro_recycled", IXGBE_STAT(lro_stats.recycled)}, + {"lro_aggregated", IXGBE_STAT(lro_data.stats.coal)}, + {"lro_flushed", IXGBE_STAT(lro_data.stats.flushed)}, #endif /* IXGBE_NO_LRO */ - {"rx_no_dma_resources", IXGBE_STAT(hw_rx_no_dma_resources)}, -#ifndef IXGBE_NO_HW_RSC - {"hw_rsc_count", IXGBE_STAT(rsc_count)}, +#ifndef IXGBE_NO_INET_LRO + {"lro_aggregated", IXGBE_STAT(lro_aggregated)}, + {"lro_flushed", IXGBE_STAT(lro_flushed)}, #endif - {"rx_flm", IXGBE_STAT(flm)}, - {"fdir_match", IXGBE_STAT(stats.fdirmatch)}, - {"fdir_miss", IXGBE_STAT(stats.fdirmiss)}, }; #define IXGBE_QUEUE_STATS_LEN \ @@ -155,55 +152,17 @@ static int ixgbe_get_settings(struct net_device *netdev, ecmd->supported = SUPPORTED_10000baseT_Full; ecmd->autoneg = AUTONEG_ENABLE; ecmd->transceiver = XCVR_EXTERNAL; - if ((hw->phy.media_type == ixgbe_media_type_copper) || - (hw->mac.type == ixgbe_mac_82599EB)) { + if (hw->phy.media_type == ixgbe_media_type_copper) { ecmd->supported |= (SUPPORTED_1000baseT_Full | - SUPPORTED_Autoneg); + SUPPORTED_TP | SUPPORTED_Autoneg); - ecmd->advertising = ADVERTISED_Autoneg; + ecmd->advertising = (ADVERTISED_TP | ADVERTISED_Autoneg); if (hw->phy.autoneg_advertised & IXGBE_LINK_SPEED_10GB_FULL) ecmd->advertising |= ADVERTISED_10000baseT_Full; if (hw->phy.autoneg_advertised & IXGBE_LINK_SPEED_1GB_FULL) ecmd->advertising |= ADVERTISED_1000baseT_Full; - /* - * It''s possible that phy.autoneg_advertised may not be - * set yet. If so display what the default would be - - * both 1G and 10G supported. 
- */ - if (!(ecmd->advertising & (ADVERTISED_1000baseT_Full | - ADVERTISED_10000baseT_Full))) - ecmd->advertising |= (ADVERTISED_10000baseT_Full | - ADVERTISED_1000baseT_Full); - - if (hw->phy.media_type == ixgbe_media_type_copper) { - ecmd->supported |= SUPPORTED_TP; - ecmd->advertising |= ADVERTISED_TP; - ecmd->port = PORT_TP; - } else { - ecmd->supported |= SUPPORTED_FIBRE; - ecmd->advertising |= ADVERTISED_FIBRE; - ecmd->port = PORT_FIBRE; - } - } else if (hw->phy.media_type == ixgbe_media_type_backplane) { - /* Set as FIBRE until SERDES defined in kernel */ - switch (hw->device_id) { - case IXGBE_DEV_ID_82598: - ecmd->supported |= (SUPPORTED_1000baseT_Full | - SUPPORTED_FIBRE); - ecmd->advertising = (ADVERTISED_10000baseT_Full | - ADVERTISED_1000baseT_Full | - ADVERTISED_FIBRE); - ecmd->port = PORT_FIBRE; - break; - case IXGBE_DEV_ID_82598_BX: - ecmd->supported = (SUPPORTED_1000baseT_Full | - SUPPORTED_FIBRE); - ecmd->advertising = (ADVERTISED_1000baseT_Full | - ADVERTISED_FIBRE); - ecmd->port = PORT_FIBRE; - ecmd->autoneg = AUTONEG_DISABLE; - break; - } + + ecmd->port = PORT_TP; } else { ecmd->supported |= SUPPORTED_FIBRE; ecmd->advertising = (ADVERTISED_10000baseT_Full | @@ -241,10 +200,16 @@ static int ixgbe_set_settings(struct net_device *netdev, struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; u32 advertised, old; - s32 err = 0; + s32 err; - if ((hw->phy.media_type == ixgbe_media_type_copper) || - (hw->mac.type == ixgbe_mac_82599EB)) { + switch (hw->phy.media_type) { + case ixgbe_media_type_fiber: + if ((ecmd->autoneg == AUTONEG_ENABLE) || + (ecmd->speed + ecmd->duplex != SPEED_10000 + DUPLEX_FULL)) + return -EINVAL; + /* in this case we currently only support 10Gb/FULL */ + break; + case ixgbe_media_type_copper: /* 10000/copper and 1000/copper must autoneg * this function does not support any duplex forcing, but can * limit the advertising of the adapter to only 10000 or 1000 */ @@ -260,23 +225,20 @@ static int ixgbe_set_settings(struct net_device *netdev, advertised |= IXGBE_LINK_SPEED_1GB_FULL; if (old == advertised) - return err; + break; /* this sets the link speed and restarts auto-neg */ - hw->mac.autotry_restart = true; err = hw->mac.ops.setup_link_speed(hw, advertised, true, true); if (err) { DPRINTK(PROBE, INFO, "setup link failed with code %d\n", err); hw->mac.ops.setup_link_speed(hw, old, true, true); } - } else { - /* in this case we currently only support 10Gb/FULL */ - if ((ecmd->autoneg == AUTONEG_ENABLE) || - (ecmd->speed + ecmd->duplex != SPEED_10000 + DUPLEX_FULL)) - return -EINVAL; + break; + default: + break; } - return err; + return 0; } static void ixgbe_get_pauseparam(struct net_device *netdev, @@ -285,23 +247,7 @@ static void ixgbe_get_pauseparam(struct net_device *netdev, struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - /* - * Flow Control Autoneg isn''t on if - * - we didn''t ask for it OR - * - it failed, we know this by tx & rx being off - */ - if (hw->fc.disable_fc_autoneg || (hw->fc.current_mode == ixgbe_fc_none)) - pause->autoneg = 0; - else - pause->autoneg = 1; - -#ifdef CONFIG_DCB - if (hw->fc.current_mode == ixgbe_fc_pfc) { - pause->rx_pause = 0; - pause->tx_pause = 0; - return; - } -#endif + pause->autoneg = (hw->fc.current_mode == ixgbe_fc_full ? 
1 : 0); if (hw->fc.current_mode == ixgbe_fc_rx_pause) { pause->rx_pause = 1; @@ -318,41 +264,25 @@ static int ixgbe_set_pauseparam(struct net_device *netdev, { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - struct ixgbe_fc_info fc; - - if (adapter->dcb_cfg.pfc_mode_enable || - ((hw->mac.type == ixgbe_mac_82598EB) && - (adapter->flags & IXGBE_FLAG_DCB_ENABLED))) - return -EINVAL; - - fc = hw->fc; - if (pause->autoneg != AUTONEG_ENABLE) - fc.disable_fc_autoneg = true; - else - fc.disable_fc_autoneg = false; - - if (pause->rx_pause && pause->tx_pause) - fc.requested_mode = ixgbe_fc_full; + if ((pause->autoneg == AUTONEG_ENABLE) || + (pause->rx_pause && pause->tx_pause)) + hw->fc.current_mode = ixgbe_fc_full; else if (pause->rx_pause && !pause->tx_pause) - fc.requested_mode = ixgbe_fc_rx_pause; + hw->fc.current_mode = ixgbe_fc_rx_pause; else if (!pause->rx_pause && pause->tx_pause) - fc.requested_mode = ixgbe_fc_tx_pause; + hw->fc.current_mode = ixgbe_fc_tx_pause; else if (!pause->rx_pause && !pause->tx_pause) - fc.requested_mode = ixgbe_fc_none; + hw->fc.current_mode = ixgbe_fc_none; else return -EINVAL; - adapter->last_lfc_mode = fc.requested_mode; + hw->fc.requested_mode = hw->fc.current_mode; - /* if the thing changed then we''ll update and use new autoneg */ - if (memcmp(&fc, &hw->fc, sizeof(struct ixgbe_fc_info))) { - hw->fc = fc; - if (netif_running(netdev)) - ixgbe_reinit_locked(adapter); - else - ixgbe_reset(adapter); - } + if (netif_running(netdev)) + ixgbe_reinit_locked(adapter); + else + ixgbe_reset(adapter); return 0; } @@ -521,15 +451,9 @@ static void ixgbe_get_regs(struct net_device *netdev, struct ethtool_regs *regs, regs_buff[33] = IXGBE_READ_REG(hw, IXGBE_FCTTV(2)); regs_buff[34] = IXGBE_READ_REG(hw, IXGBE_FCTTV(3)); for (i = 0; i < 8; i++) - if (hw->mac.type == ixgbe_mac_82599EB) - regs_buff[35 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTL_82599(i)); - else - regs_buff[35 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTL(i)); + regs_buff[35 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTL(i)); for (i = 0; i < 8; i++) - if (hw->mac.type == ixgbe_mac_82599EB) - regs_buff[43 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTH_82599(i)); - else - regs_buff[43 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTH(i)); + regs_buff[43 + i] = IXGBE_READ_REG(hw, IXGBE_FCRTH(i)); regs_buff[51] = IXGBE_READ_REG(hw, IXGBE_FCRTV); regs_buff[52] = IXGBE_READ_REG(hw, IXGBE_TFCS); @@ -870,17 +794,10 @@ static void ixgbe_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) { struct ixgbe_adapter *adapter = netdev_priv(netdev); - char firmware_version[32]; strncpy(drvinfo->driver, ixgbe_driver_name, 32); strncpy(drvinfo->version, ixgbe_driver_version, 32); - - sprintf(firmware_version, "%d.%d-%d", - (adapter->eeprom_version & 0xF000) >> 12, - (adapter->eeprom_version & 0x0FF0) >> 4, - adapter->eeprom_version & 0x000F); - - strncpy(drvinfo->fw_version, firmware_version, 32); + strncpy(drvinfo->fw_version, "N/A", 32); strncpy(drvinfo->bus_info, pci_name(adapter->pdev), 32); drvinfo->n_stats = IXGBE_STATS_LEN; drvinfo->testinfo_len = IXGBE_TEST_LEN; @@ -908,10 +825,9 @@ static int ixgbe_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring) { struct ixgbe_adapter *adapter = netdev_priv(netdev); - struct ixgbe_ring *temp_tx_ring, *temp_rx_ring; + struct ixgbe_ring *temp_ring; int i, err; u32 new_rx_count, new_tx_count; - bool need_update = false; if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending)) return -EINVAL; @@ -930,92 +846,84 @@ static int ixgbe_set_ringparam(struct 
net_device *netdev, return 0; } + if (adapter->num_tx_queues > adapter->num_rx_queues) + temp_ring = vmalloc(adapter->num_tx_queues * + sizeof(struct ixgbe_ring)); + else + temp_ring = vmalloc(adapter->num_rx_queues * + sizeof(struct ixgbe_ring)); + if (!temp_ring) + return -ENOMEM; + while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state)) msleep(1); - temp_tx_ring = kcalloc(adapter->num_tx_queues, - sizeof(struct ixgbe_ring), GFP_KERNEL); - if (!temp_tx_ring) { - err = -ENOMEM; - goto err_setup; - } + if (netif_running(netdev)) + ixgbe_down(adapter); - if (new_tx_count != adapter->tx_ring_count) { - memcpy(temp_tx_ring, adapter->tx_ring, + /* + * We can''t just free everything and then setup again, + * because the ISRs in MSI-X mode get passed pointers + * to the tx and rx ring structs. + */ + if (new_tx_count != adapter->tx_ring->count) { + memcpy(temp_ring, adapter->tx_ring, adapter->num_tx_queues * sizeof(struct ixgbe_ring)); + for (i = 0; i < adapter->num_tx_queues; i++) { - temp_tx_ring[i].count = new_tx_count; - err = ixgbe_setup_tx_resources(adapter, - &temp_tx_ring[i]); + temp_ring[i].count = new_tx_count; + err = ixgbe_setup_tx_resources(adapter, &temp_ring[i]); if (err) { while (i) { i--; ixgbe_free_tx_resources(adapter, - &temp_tx_ring[i]); + &temp_ring[i]); } goto err_setup; } } - need_update = true; - } - temp_rx_ring = kcalloc(adapter->num_rx_queues, - sizeof(struct ixgbe_ring), GFP_KERNEL); - if ((!temp_rx_ring) && (need_update)) { for (i = 0; i < adapter->num_tx_queues; i++) - ixgbe_free_tx_resources(adapter, &temp_tx_ring[i]); - kfree(temp_tx_ring); - err = -ENOMEM; - goto err_setup; + ixgbe_free_tx_resources(adapter, &adapter->tx_ring[i]); + + memcpy(adapter->tx_ring, temp_ring, + adapter->num_tx_queues * sizeof(struct ixgbe_ring)); + + adapter->tx_ring_count = new_tx_count; } - if (new_rx_count != adapter->rx_ring_count) { - memcpy(temp_rx_ring, adapter->rx_ring, + if (new_rx_count != adapter->rx_ring->count) { + memcpy(temp_ring, adapter->rx_ring, adapter->num_rx_queues * sizeof(struct ixgbe_ring)); + for (i = 0; i < adapter->num_rx_queues; i++) { - temp_rx_ring[i].count = new_rx_count; - err = ixgbe_setup_rx_resources(adapter, - &temp_rx_ring[i]); + temp_ring[i].count = new_rx_count; + err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]); if (err) { while (i) { i--; ixgbe_free_rx_resources(adapter, - &temp_rx_ring[i]); + &temp_ring[i]); } goto err_setup; } } - need_update = true; - } - /* if rings need to be updated, here''s the place to do it in one shot */ - if (need_update) { - if (netif_running(netdev)) - ixgbe_down(adapter); - - /* tx */ - if (new_tx_count != adapter->tx_ring_count) { - kfree(adapter->tx_ring); - adapter->tx_ring = temp_tx_ring; - temp_tx_ring = NULL; - adapter->tx_ring_count = new_tx_count; - } + for (i = 0; i < adapter->num_rx_queues; i++) + ixgbe_free_rx_resources(adapter, &adapter->rx_ring[i]); - /* rx */ - if (new_rx_count != adapter->rx_ring_count) { - kfree(adapter->rx_ring); - adapter->rx_ring = temp_rx_ring; - temp_rx_ring = NULL; - adapter->rx_ring_count = new_rx_count; - } + memcpy(adapter->rx_ring, temp_ring, + adapter->num_rx_queues * sizeof(struct ixgbe_ring)); + + adapter->rx_ring_count = new_rx_count; } /* success! 
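A note on the locking idiom this hunk leans on: the __IXGBE_RESETTING flag acts as a bit-level mutex. test_and_set_bit() claims it atomically, a loser of the race sleeps a millisecond and retries, and clear_bit() releases it once the rings are swapped. A minimal sketch of the pattern, generic rather than tied to this driver:

	/* claim the exclusive 'resetting' state; sleep and retry until free */
	while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state))
		msleep(1);

	/* ... rebuild the descriptor rings with the device quiesced ... */

	/* release; pairs with the successful test_and_set_bit() above */
	clear_bit(__IXGBE_RESETTING, &adapter->state);

The reason for the in-place memcpy() of the ring structs, rather than freeing and reallocating them, is the one the comment above gives: the MSI-X handlers were handed pointers to these structs at request time, so their addresses must not change while interrupts are live.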
*/ err = 0; +err_setup: if (netif_running(netdev)) ixgbe_up(adapter); -err_setup: clear_bit(__IXGBE_RESETTING, &adapter->state); return err; } @@ -1034,6 +942,19 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev, int j, k; int i; +#ifndef IXGBE_NO_INET_LRO + unsigned int aggregated = 0, flushed = 0, no_desc = 0; + + for (i = 0; i < adapter->num_rx_queues; i++) { + aggregated += adapter->rx_ring[i].lro_mgr.stats.aggregated; + flushed += adapter->rx_ring[i].lro_mgr.stats.flushed; + no_desc += adapter->rx_ring[i].lro_mgr.stats.no_desc; + } + adapter->lro_aggregated = aggregated; + adapter->lro_flushed = flushed; + adapter->lro_no_desc = no_desc; + +#endif ixgbe_update_stats(adapter); for (i = 0; i < IXGBE_GLOBAL_STATS_LEN; i++) { char *p = (char *)adapter + ixgbe_gstrings_stats[i].stat_offset; @@ -1154,55 +1075,31 @@ struct ixgbe_reg_test { #define TABLE64_TEST_LO 5 #define TABLE64_TEST_HI 6 -/* default 82599 register test */ -static struct ixgbe_reg_test reg_test_82599[] = { - { IXGBE_FCRTL_82599(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_FCRTH_82599(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_PFCTOP, 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_VLNCTRL, 1, PATTERN_TEST, 0x00000000, 0x00000000 }, - { IXGBE_RDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFF80 }, - { IXGBE_RDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_RDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFFFF }, - { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, IXGBE_RXDCTL_ENABLE }, - { IXGBE_RDT(0), 4, PATTERN_TEST, 0x0000FFFF, 0x0000FFFF }, - { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, 0 }, - { IXGBE_FCRTH(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_FCTTV(0), 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_TDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFFFF }, - { IXGBE_TDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_TDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFF80 }, - { IXGBE_RXCTRL, 1, SET_READ_TEST, 0x00000001, 0x00000001 }, - { IXGBE_RAL(0), 16, TABLE64_TEST_LO, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_RAL(0), 16, TABLE64_TEST_HI, 0x8001FFFF, 0x800CFFFF }, - { IXGBE_MTA(0), 128, TABLE32_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { 0, 0, 0, 0 } -}; - -/* default 82598 register test */ +/* default register test */ static struct ixgbe_reg_test reg_test_82598[] = { - { IXGBE_FCRTL(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_FCRTH(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_PFCTOP, 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_VLNCTRL, 1, PATTERN_TEST, 0x00000000, 0x00000000 }, - { IXGBE_RDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFFFF }, - { IXGBE_RDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_RDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFFFF }, + { IXGBE_FCRTL(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, + { IXGBE_FCRTH(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, + { IXGBE_PFCTOP, 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_VLNCTRL, 1, PATTERN_TEST, 0x00000000, 0x00000000 }, + { IXGBE_RDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFFFF }, + { IXGBE_RDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_RDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFFFF }, /* Enable all four RX queues before testing. */ - { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, IXGBE_RXDCTL_ENABLE }, + { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, IXGBE_RXDCTL_ENABLE }, /* RDH is read-only for 82598, only test RDT. 
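For readers new to these self-test tables: a PATTERN_TEST entry walks the register through a few alternating bit patterns and checks that exactly the writable bits stick. A minimal sketch of the idea follows; the driver's real REG_PATTERN_TEST macro is equivalent in spirit, with mask and write taken from the table row:

	static int reg_pattern_test_sketch(struct ixgbe_hw *hw, u32 reg,
	                                   u32 mask, u32 write)
	{
	        static const u32 patterns[] = {
	                0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF
	        };
	        u32 before = IXGBE_READ_REG(hw, reg);   /* save current value */
	        int i;

	        for (i = 0; i < 4; i++) {
	                u32 val;

	                IXGBE_WRITE_REG(hw, reg, patterns[i] & write);
	                val = IXGBE_READ_REG(hw, reg);
	                /* only the bits covered by the read mask must stick */
	                if (val != (patterns[i] & write & mask)) {
	                        IXGBE_WRITE_REG(hw, reg, before);
	                        return 1;       /* register failed */
	                }
	        }
	        IXGBE_WRITE_REG(hw, reg, before);       /* restore and pass */
	        return 0;
	}

SET_READ_TEST entries are the simpler variant: set the given bits, read them back, and clear them again.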
*/ - { IXGBE_RDT(0), 4, PATTERN_TEST, 0x0000FFFF, 0x0000FFFF }, - { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, 0 }, - { IXGBE_FCRTH(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, - { IXGBE_FCTTV(0), 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_TIPG, 1, PATTERN_TEST, 0x000000FF, 0x000000FF }, - { IXGBE_TDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFFFF }, - { IXGBE_TDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_TDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFFFF }, - { IXGBE_RXCTRL, 1, SET_READ_TEST, 0x00000003, 0x00000003 }, - { IXGBE_DTXCTL, 1, SET_READ_TEST, 0x00000005, 0x00000005 }, - { IXGBE_RAL(0), 16, TABLE64_TEST_LO, 0xFFFFFFFF, 0xFFFFFFFF }, - { IXGBE_RAL(0), 16, TABLE64_TEST_HI, 0x800CFFFF, 0x800CFFFF }, - { IXGBE_MTA(0), 128, TABLE32_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_RDT(0), 4, PATTERN_TEST, 0x0000FFFF, 0x0000FFFF }, + { IXGBE_RXDCTL(0), 4, WRITE_NO_TEST, 0, 0 }, + { IXGBE_FCRTH(0), 1, PATTERN_TEST, 0x8007FFF0, 0x8007FFF0 }, + { IXGBE_FCTTV(0), 1, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_TIPG, 1, PATTERN_TEST, 0x000000FF, 0x000000FF }, + { IXGBE_TDBAL(0), 4, PATTERN_TEST, 0xFFFFFF80, 0xFFFFFFFF }, + { IXGBE_TDBAH(0), 4, PATTERN_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_TDLEN(0), 4, PATTERN_TEST, 0x000FFF80, 0x000FFFFF }, + { IXGBE_RXCTRL, 1, SET_READ_TEST, 0x00000003, 0x00000003 }, + { IXGBE_DTXCTL, 1, SET_READ_TEST, 0x00000005, 0x00000005 }, + { IXGBE_RAL(0), 16, TABLE64_TEST_LO, 0xFFFFFFFF, 0xFFFFFFFF }, + { IXGBE_RAL(0), 16, TABLE64_TEST_HI, 0x800CFFFF, 0x800CFFFF }, + { IXGBE_MTA(0), 128, TABLE32_TEST, 0xFFFFFFFF, 0xFFFFFFFF }, { 0, 0, 0, 0 } }; @@ -1248,13 +1145,8 @@ static int ixgbe_reg_test(struct ixgbe_adapter *adapter, u64 *data) u32 value, before, after; u32 i, toggle; - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - toggle = 0x7FFFF30F; - test = reg_test_82599; - } else { - toggle = 0x7FFFF3FF; - test = reg_test_82598; - } + toggle = 0x7FFFF3FF; + test = reg_test_82598; /* * Because the status register is such a special case, @@ -1452,42 +1344,16 @@ static void ixgbe_free_desc_rings(struct ixgbe_adapter *adapter) { struct ixgbe_ring *tx_ring = &adapter->test_tx_ring; struct ixgbe_ring *rx_ring = &adapter->test_rx_ring; - struct ixgbe_hw *hw = &adapter->hw; struct pci_dev *pdev = adapter->pdev; - u32 reg_ctl; int i; - /* shut down the DMA engines now so they can be reinitialized later */ - - /* first Rx */ - reg_ctl = IXGBE_READ_REG(hw, IXGBE_RXCTRL); - reg_ctl &= ~IXGBE_RXCTRL_RXEN; - IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, reg_ctl); - reg_ctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(0)); - reg_ctl &= ~IXGBE_RXDCTL_ENABLE; - IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(0), reg_ctl); - - /* now Tx */ - reg_ctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(0)); - reg_ctl &= ~IXGBE_TXDCTL_ENABLE; - IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(0), reg_ctl); - if (hw->mac.type == ixgbe_mac_82599EB) { - reg_ctl = IXGBE_READ_REG(hw, IXGBE_DMATXCTL); - reg_ctl &= ~IXGBE_DMATXCTL_TE; - IXGBE_WRITE_REG(hw, IXGBE_DMATXCTL, reg_ctl); - } - - ixgbe_reset(adapter); - if (tx_ring->desc && tx_ring->tx_buffer_info) { for (i = 0; i < tx_ring->count; i++) { struct ixgbe_tx_buffer *buf &(tx_ring->tx_buffer_info[i]); - if (buf->dma) { + if (buf->dma) pci_unmap_single(pdev, buf->dma, buf->length, PCI_DMA_TODEVICE); - buf->dma = 0; - } if (buf->skb) dev_kfree_skb(buf->skb); } @@ -1497,12 +1363,10 @@ static void ixgbe_free_desc_rings(struct ixgbe_adapter *adapter) for (i = 0; i < rx_ring->count; i++) { struct ixgbe_rx_buffer *buf &(rx_ring->rx_buffer_info[i]); - if (buf->dma) { + if (buf->dma) 
pci_unmap_single(pdev, buf->dma, IXGBE_RXBUFFER_2048, PCI_DMA_FROMDEVICE); - buf->dma = 0; - } if (buf->skb) dev_kfree_skb(buf->skb); } @@ -1570,11 +1434,6 @@ static int ixgbe_setup_desc_rings(struct ixgbe_adapter *adapter) reg_data |= IXGBE_HLREG0_TXPADEN; IXGBE_WRITE_REG(&adapter->hw, IXGBE_HLREG0, reg_data); - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_DMATXCTL); - reg_data |= IXGBE_DMATXCTL_TE; - IXGBE_WRITE_REG(&adapter->hw, IXGBE_DMATXCTL, reg_data); - } reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_TXDCTL(0)); reg_data |= IXGBE_TXDCTL_ENABLE; IXGBE_WRITE_REG(&adapter->hw, IXGBE_TXDCTL(0), reg_data); @@ -1658,17 +1517,6 @@ static int ixgbe_setup_desc_rings(struct ixgbe_adapter *adapter) reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_RXDCTL(0)); reg_data |= IXGBE_RXDCTL_ENABLE; IXGBE_WRITE_REG(&adapter->hw, IXGBE_RXDCTL(0), reg_data); - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - int j = adapter->rx_ring[0].reg_idx; - u32 k; - for (k = 0; k < 10; k++) { - if (IXGBE_READ_REG(&adapter->hw, - IXGBE_RXDCTL(j)) & IXGBE_RXDCTL_ENABLE) - break; - else - msleep(1); - } - } rctl |= IXGBE_RXCTRL_RXEN | IXGBE_RXCTRL_DMBYPS; IXGBE_WRITE_REG(&adapter->hw, IXGBE_RXCTRL, rctl); @@ -1939,75 +1787,15 @@ static void ixgbe_diag_test(struct net_device *netdev, msleep_interruptible(4 * 1000); } -static int ixgbe_wol_exclusion(struct ixgbe_adapter *adapter, - struct ethtool_wolinfo *wol) -{ - struct ixgbe_hw *hw = &adapter->hw; - int retval = 1; - - switch(hw->device_id) { - case IXGBE_DEV_ID_82599_KX4: - retval = 0; - break; - default: - wol->supported = 0; - retval = 0; - } - - return retval; -} - static void ixgbe_get_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) { - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - wol->supported = WAKE_UCAST | WAKE_MCAST | - WAKE_BCAST | WAKE_MAGIC; + wol->supported = 0; wol->wolopts = 0; - if (ixgbe_wol_exclusion(adapter, wol) || - !device_can_wakeup(&adapter->pdev->dev)) - return; - - if (adapter->wol & IXGBE_WUFC_EX) - wol->wolopts |= WAKE_UCAST; - if (adapter->wol & IXGBE_WUFC_MC) - wol->wolopts |= WAKE_MCAST; - if (adapter->wol & IXGBE_WUFC_BC) - wol->wolopts |= WAKE_BCAST; - if (adapter->wol & IXGBE_WUFC_MAG) - wol->wolopts |= WAKE_MAGIC; - return; } -static int ixgbe_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) -{ - struct ixgbe_adapter *adapter = netdev_priv(netdev); - - if (wol->wolopts & (WAKE_PHY | WAKE_ARP | WAKE_MAGICSECURE)) - return -EOPNOTSUPP; - - if (ixgbe_wol_exclusion(adapter, wol)) - return wol->wolopts ? 
-EOPNOTSUPP : 0; - - adapter->wol = 0; - - if (wol->wolopts & WAKE_UCAST) - adapter->wol |= IXGBE_WUFC_EX; - if (wol->wolopts & WAKE_MCAST) - adapter->wol |= IXGBE_WUFC_MC; - if (wol->wolopts & WAKE_BCAST) - adapter->wol |= IXGBE_WUFC_BC; - if (wol->wolopts & WAKE_MAGIC) - adapter->wol |= IXGBE_WUFC_MAG; - - device_set_wakeup_enable(&adapter->pdev->dev, adapter->wol); - - return 0; -} - static int ixgbe_nway_reset(struct net_device *netdev) { struct ixgbe_adapter *adapter = netdev_priv(netdev); @@ -2051,30 +1839,16 @@ static int ixgbe_get_coalesce(struct net_device *netdev, #endif /* only valid if in constant ITR mode */ - switch (adapter->itr_setting) { - case 0: - /* throttling disabled */ - ec->rx_coalesce_usecs = 0; - break; - case 1: - /* dynamic ITR mode */ - ec->rx_coalesce_usecs = 1; - break; - default: - /* fixed interrupt rate mode */ + if (adapter->itr_setting == 0) ec->rx_coalesce_usecs = 1000000/adapter->eitr_param; - break; - } + return 0; } -extern void ixgbe_write_eitr(struct ixgbe_q_vector *q_vector); - static int ixgbe_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec) { struct ixgbe_adapter *adapter = netdev_priv(netdev); - int i; if (ec->tx_max_coalesced_frames_irq) adapter->tx_ring[0].work_limit = ec->tx_max_coalesced_frames_irq; @@ -2084,81 +1858,37 @@ static int ixgbe_set_coalesce(struct net_device *netdev, adapter->rx_ring[0].work_limit = ec->rx_max_coalesced_frames_irq; #endif - if (ec->rx_coalesce_usecs > 1) { - /* check the limits */ - if ((1000000/ec->rx_coalesce_usecs > IXGBE_MAX_INT_RATE) || - (1000000/ec->rx_coalesce_usecs < IXGBE_MIN_INT_RATE)) - return -EINVAL; - + if (ec->rx_coalesce_usecs > 3) { + struct ixgbe_hw *hw = &adapter->hw; + int i; /* store the value in ints/second */ adapter->eitr_param = 1000000/ec->rx_coalesce_usecs; + for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++){ + struct ixgbe_q_vector *q_vector = &adapter->q_vector[i]; + if (q_vector->txr_count && !q_vector->rxr_count) + q_vector->eitr = (adapter->eitr_param >> 1); + else + /* rx only */ + q_vector->eitr = adapter->eitr_param; + IXGBE_WRITE_REG(hw, IXGBE_EITR(i), + EITR_INTS_PER_SEC_TO_REG(q_vector->eitr)); + } + /* static value of interrupt rate */ adapter->itr_setting = adapter->eitr_param; - /* clear the lower bit as its used for dynamic state */ - adapter->itr_setting &= ~1; - } else if (ec->rx_coalesce_usecs == 1) { - /* 1 means dynamic mode */ - adapter->eitr_param = 20000; - adapter->itr_setting = 1; } else { - /* - * any other value means disable eitr, which is best - * served by setting the interrupt rate very high - */ - adapter->eitr_param = IXGBE_MAX_INT_RATE; - adapter->itr_setting = 0; + /* 1,2,3 means dynamic mode */ + adapter->itr_setting = ec->rx_coalesce_usecs; } - for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[i]; - if (q_vector->txr_count && !q_vector->rxr_count) - /* tx vector gets half the rate */ - q_vector->eitr = (adapter->eitr_param >> 1); - else if (q_vector->rxr_count) - /* rx only or mixed */ - q_vector->eitr = adapter->eitr_param; - ixgbe_write_eitr(q_vector); - } + if (netif_running(netdev)) + ixgbe_reinit_locked(adapter); return 0; } -#ifdef ETHTOOL_GFLAGS -static int ixgbe_set_flags(struct net_device *netdev, u32 data) -{ -#if !defined(IXGBE_NO_HW_RSC) || !defined(IXGBE_NO_LRO) - struct ixgbe_adapter *adapter = netdev_priv(netdev); -#endif - ethtool_op_set_flags(netdev, data); - -#ifndef IXGBE_NO_HW_RSC - /* if state changes we need to update 
adapter->flags and reset */ - if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE) { - /* cast both to bool and verify if they are set the same */ - if ((!!(data & ETH_FLAG_LRO)) !- (!!(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED))) { - adapter->flags2 ^= IXGBE_FLAG2_RSC_ENABLED; - if (netif_running(netdev)) - ixgbe_reinit_locked(adapter); - else - ixgbe_reset(adapter); - } - return 0; - } -#endif /* IXGBE_NO_HW_RSC */ -#ifndef IXGBE_NO_LRO - /* cast both to bool and verify if they are set the same */ - if ((!!(data & ETH_FLAG_LRO)) != - (!!(adapter->flags2 & IXGBE_FLAG2_SWLRO_ENABLED))) - adapter->flags2 ^= IXGBE_FLAG2_SWLRO_ENABLED; - -#endif /* IXGBE_NO_LRO */ - return 0; - -} -#endif /* ETHTOOL_GFLAGS */ static struct ethtool_ops ixgbe_ethtool_ops = { .get_settings = ixgbe_get_settings, .set_settings = ixgbe_set_settings, @@ -2166,7 +1896,6 @@ static struct ethtool_ops ixgbe_ethtool_ops = { .get_regs_len = ixgbe_get_regs_len, .get_regs = ixgbe_get_regs, .get_wol = ixgbe_get_wol, - .set_wol = ixgbe_set_wol, .nway_reset = ixgbe_nway_reset, .get_link = ethtool_op_get_link, .get_eeprom_len = ixgbe_get_eeprom_len, @@ -2199,9 +1928,9 @@ static struct ethtool_ops ixgbe_ethtool_ops = { #endif .get_coalesce = ixgbe_get_coalesce, .set_coalesce = ixgbe_set_coalesce, -#ifdef ETHTOOL_GFLAGS +#ifndef IXGBE_NO_INET_LRO .get_flags = ethtool_op_get_flags, - .set_flags = ixgbe_set_flags, + .set_flags = ethtool_op_set_flags, #endif }; diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c index 6670774..ec2fe1a 100644 --- a/drivers/net/ixgbe/ixgbe_main.c +++ b/drivers/net/ixgbe/ixgbe_main.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -29,6 +29,7 @@ /****************************************************************************** Copyright (c)2006 - 2007 Myricom, Inc. 
for some LRO specific code ******************************************************************************/ + #include <linux/types.h> #include <linux/module.h> #include <linux/pci.h> @@ -52,7 +53,6 @@ #include <linux/if_vlan.h> #endif - #include "ixgbe.h" char ixgbe_driver_name[] = "ixgbe"; @@ -66,11 +66,9 @@ static const char ixgbe_driver_string[] #define DRIVERNAPI "-NAPI" #endif -#define FPGA - -#define DRV_VERSION "2.0.34.3" DRIVERNAPI DRV_HW_PERF FPGA +#define DRV_VERSION "1.3.56.5-vmq" DRIVERNAPI DRV_HW_PERF const char ixgbe_driver_version[] = DRV_VERSION; -static char ixgbe_copyright[] = "Copyright (c) 1999-2009 Intel Corporation."; +static char ixgbe_copyright[] = "Copyright (c) 1999-2008 Intel Corporation."; /* ixgbe_pci_tbl - PCI Device ID Table * * Wildcard entries (PCI_ANY_ID) should come last @@ -81,7 +79,6 @@ static char ixgbe_copyright[] = "Copyright (c) 1999-2009 Intel Corporation."; */ static struct pci_device_id ixgbe_pci_tbl[] = { {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598)}, - {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598_BX)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598AF_DUAL_PORT)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598AF_SINGLE_PORT)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598AT)}, @@ -91,9 +88,6 @@ static struct pci_device_id ixgbe_pci_tbl[] = { {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598EB_XF_LR)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82598EB_SFP_LOM)}, - {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82599_KX4)}, - {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82599_XAUI_LOM)}, - {PCI_DEVICE(PCI_VENDOR_ID_INTEL, IXGBE_DEV_ID_82599_SFP)}, /* required last entry */ {0, } }; @@ -107,8 +101,8 @@ static struct notifier_block dca_notifier = { .next = NULL, .priority = 0 }; - #endif + MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>"); MODULE_DESCRIPTION("Intel(R) 10 Gigabit PCI Express Network Driver"); MODULE_LICENSE("GPL"); @@ -136,69 +130,17 @@ static void ixgbe_get_hw_control(struct ixgbe_adapter *adapter) ctrl_ext | IXGBE_CTRL_EXT_DRV_LOAD); } -/* - * ixgbe_set_ivar - set the IVAR registers, mapping interrupt causes to vectors - * @adapter: pointer to adapter struct - * @direction: 0 for Rx, 1 for Tx, -1 for other causes - * @queue: queue to map the corresponding interrupt to - * @msix_vector: the vector to map to the corresponding queue - * - */ -static void ixgbe_set_ivar(struct ixgbe_adapter *adapter, s8 direction, - u8 queue, u8 msix_vector) +static void ixgbe_set_ivar(struct ixgbe_adapter *adapter, u16 int_alloc_entry, + u8 msix_vector) { u32 ivar, index; - struct ixgbe_hw *hw = &adapter->hw; - switch (hw->mac.type) { - case ixgbe_mac_82598EB: - msix_vector |= IXGBE_IVAR_ALLOC_VAL; - if (direction == -1) - direction = 0; - index = (((direction * 64) + queue) >> 2) & 0x1F; - ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index)); - ivar &= ~(0xFF << (8 * (queue & 0x3))); - ivar |= (msix_vector << (8 * (queue & 0x3))); - IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar); - break; - case ixgbe_mac_82599EB: - if (direction == -1) { - /* other causes */ - msix_vector |= IXGBE_IVAR_ALLOC_VAL; - index = ((queue & 1) * 8); - ivar = IXGBE_READ_REG(&adapter->hw, IXGBE_IVAR_MISC); - ivar &= ~(0xFF << index); - ivar |= (msix_vector << index); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_IVAR_MISC, ivar); - break; - } else { - /* tx or rx causes */ - msix_vector |= IXGBE_IVAR_ALLOC_VAL; - index = ((16 * (queue & 1)) + (8 * direction)); - ivar = 
IXGBE_READ_REG(hw, IXGBE_IVAR(queue >> 1)); - ivar &= ~(0xFF << index); - ivar |= (msix_vector << index); - IXGBE_WRITE_REG(hw, IXGBE_IVAR(queue >> 1), ivar); - break; - } - default: - break; - } -} -static inline void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, - u64 qmask) -{ - u32 mask; - - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - mask = (IXGBE_EIMS_RTX_QUEUE & qmask); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask); - } else { - mask = (qmask & 0xFFFFFFFF); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(0), mask); - mask = (qmask >> 32); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(1), mask); - } + msix_vector |= IXGBE_IVAR_ALLOC_VAL; + index = (int_alloc_entry >> 2) & 0x1F; + ivar = IXGBE_READ_REG(&adapter->hw, IXGBE_IVAR(index)); + ivar &= ~(0xFF << (8 * (int_alloc_entry & 0x3))); + ivar |= (msix_vector << (8 * (int_alloc_entry & 0x3))); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_IVAR(index), ivar); } static void ixgbe_unmap_and_free_tx_resource(struct ixgbe_adapter *adapter, @@ -267,38 +209,39 @@ static inline bool ixgbe_check_tx_hang(struct ixgbe_adapter *adapter, #define DESC_NEEDED TXD_USE_COUNT(IXGBE_MAX_DATA_PER_TXD) #endif +#define GET_TX_HEAD_FROM_RING(ring) (\ + *(volatile u32 *) \ + ((union ixgbe_adv_tx_desc *)(ring)->desc + (ring)->count)) static void ixgbe_tx_timeout(struct net_device *netdev); /** * ixgbe_clean_tx_irq - Reclaim resources after transmit completes - * @q_vector: structure containing interrupt and ring information + * @adapter: board private structure * @tx_ring: tx ring to clean **/ -static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, +static bool ixgbe_clean_tx_irq(struct ixgbe_adapter *adapter, struct ixgbe_ring *tx_ring) { - struct ixgbe_adapter *adapter = q_vector->adapter; - struct net_device *netdev = adapter->netdev; - union ixgbe_adv_tx_desc *tx_desc, *eop_desc; + union ixgbe_adv_tx_desc *tx_desc; struct ixgbe_tx_buffer *tx_buffer_info; - unsigned int i, eop, count = 0; + struct net_device *netdev = adapter->netdev; + struct sk_buff *skb; + unsigned int i; + u32 head, oldhead; + unsigned int count = 0; unsigned int total_bytes = 0, total_packets = 0; + rmb(); + head = GET_TX_HEAD_FROM_RING(tx_ring); + head = le32_to_cpu(head); i = tx_ring->next_to_clean; - eop = tx_ring->tx_buffer_info[i].next_to_watch; - eop_desc = IXGBE_TX_DESC_ADV(*tx_ring, eop); - - while ((eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)) && - (count < tx_ring->work_limit)) { - bool cleaned = false; - for ( ; !cleaned; count++) { - struct sk_buff *skb; + while (1) { + while (i != head) { tx_desc = IXGBE_TX_DESC_ADV(*tx_ring, i); tx_buffer_info = &tx_ring->tx_buffer_info[i]; - cleaned = (i == eop); skb = tx_buffer_info->skb; - if (cleaned && skb) { + if (skb) { #ifdef NETIF_F_TSO unsigned int segs, bytecount; @@ -318,17 +261,23 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, ixgbe_unmap_and_free_tx_resource(adapter, tx_buffer_info); - tx_desc->wb.status = 0; - i++; if (i == tx_ring->count) i = 0; - } - - eop = tx_ring->tx_buffer_info[i].next_to_watch; - eop_desc = IXGBE_TX_DESC_ADV(*tx_ring, eop); - } + count++; + if (count == tx_ring->count) + goto done_cleaning; + } + oldhead = head; + rmb(); + head = GET_TX_HEAD_FROM_RING(tx_ring); + head = le32_to_cpu(head); + if (head == oldhead) + goto done_cleaning; + } /* while (1) */ + +done_cleaning: tx_ring->next_to_clean = i; #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2) @@ -363,20 +312,18 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, } } -#ifndef 
CONFIG_IXGBE_NAPI /* re-arm the interrupt */ - if ((count >= tx_ring->work_limit) && - (!test_bit(__IXGBE_DOWN, &adapter->state))) - ixgbe_irq_rearm_queues(adapter, ((u64)1 << q_vector->v_idx)); + if ((total_packets >= tx_ring->work_limit) || + (count == tx_ring->count)) + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, tx_ring->v_idx); -#endif tx_ring->total_bytes += total_bytes; tx_ring->total_packets += total_packets; tx_ring->stats.packets += total_packets; tx_ring->stats.bytes += total_bytes; adapter->net_stats.tx_bytes += total_bytes; adapter->net_stats.tx_packets += total_packets; - return (count < tx_ring->work_limit); + return (total_packets ? true : false); } #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) @@ -386,34 +333,17 @@ static void ixgbe_update_rx_dca(struct ixgbe_adapter *adapter, u32 rxctrl; int cpu = get_cpu(); int q = rx_ring - adapter->rx_ring; - struct ixgbe_hw *hw = &adapter->hw; if (rx_ring->cpu != cpu) { - rxctrl = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(q)); - if (hw->mac.type == ixgbe_mac_82598EB) { - rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK; - rxctrl |= dca3_get_tag(&adapter->pdev->dev, cpu); - } else if (hw->mac.type == ixgbe_mac_82599EB) { - rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK_82599; - rxctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) << - IXGBE_DCA_RXCTRL_CPUID_SHIFT_82599); - } + rxctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_DCA_RXCTRL(q)); + rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK; + rxctrl |= dca3_get_tag(&adapter->pdev->dev, cpu); rxctrl |= IXGBE_DCA_RXCTRL_DESC_DCA_EN; rxctrl |= IXGBE_DCA_RXCTRL_HEAD_DCA_EN; - if (adapter->flags & IXGBE_FLAG_DCA_ENABLED_DATA) { - /* just do the header data when in Packet Split mode */ - if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) - rxctrl |= IXGBE_DCA_RXCTRL_HEAD_DCA_EN; - else - rxctrl |= IXGBE_DCA_RXCTRL_DATA_DCA_EN; - } - rxctrl &= ~(IXGBE_DCA_RXCTRL_DESC_RRO_EN); - rxctrl &= ~(IXGBE_DCA_RXCTRL_DESC_WRO_EN | - IXGBE_DCA_RXCTRL_DESC_HSRO_EN); - IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(q), rxctrl); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_RXCTRL(q), rxctrl); rx_ring->cpu = cpu; } - put_cpu_no_resched(); + put_cpu(); } static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter, @@ -422,23 +352,13 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter, u32 txctrl; int cpu = get_cpu(); int q = tx_ring - adapter->tx_ring; - struct ixgbe_hw *hw = &adapter->hw; if (tx_ring->cpu != cpu) { - if (hw->mac.type == ixgbe_mac_82598EB) { - txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(q)); - txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK; - txctrl |= dca3_get_tag(&adapter->pdev->dev, cpu); - txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN; - IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(q), txctrl); - } else if (hw->mac.type == ixgbe_mac_82599EB) { - txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(q)); - txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK_82599; - txctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) << - IXGBE_DCA_TXCTRL_CPUID_SHIFT_82599); - txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN; - IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(q), txctrl); - } + txctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_DCA_TXCTRL(q)); + txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK; + txctrl |= dca3_get_tag(&adapter->pdev->dev, cpu); + txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN; + IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_TXCTRL(q), txctrl); tx_ring->cpu = cpu; } put_cpu(); @@ -451,9 +371,6 @@ static void ixgbe_setup_dca(struct ixgbe_adapter *adapter) if (!(adapter->flags & IXGBE_FLAG_DCA_ENABLED)) return; - /* Always use CB2 mode, difference is masked in the CB driver. 
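The ordering behind that comment matters: the chipset has to be put into CB2 mode before the driver registers itself as a DCA requester, and the requester must be dropped again when the provider unloads. A condensed sketch of the notifier flow these hunks converge on, assuming the dca_add_requester()/dca_remove_requester() API from <linux/dca.h>, with error paths trimmed:

	static int dca_notify_sketch(struct ixgbe_adapter *adapter,
	                             struct device *dev, unsigned long event)
	{
	        switch (event) {
	        case DCA_PROVIDER_ADD:
	                if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
	                        break;          /* already on, nothing to do */
	                /* CB2 mode first, then register with the DCA core */
	                IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_CTRL, 2);
	                if (dca_add_requester(dev) == 0) {
	                        adapter->flags |= IXGBE_FLAG_DCA_ENABLED;
	                        ixgbe_setup_dca(adapter);  /* retag every ring */
	                }
	                break;
	        case DCA_PROVIDER_REMOVE:
	                if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) {
	                        dca_remove_requester(dev);
	                        adapter->flags &= ~IXGBE_FLAG_DCA_ENABLED;
	                        /* back to legacy (CB1) tagging */
	                        IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_CTRL, 1);
	                }
	                break;
	        }
	        return 0;
	}

The per-ring update helpers above then only rewrite DCA_RXCTRL/DCA_TXCTRL when a ring actually migrates to a different CPU, which keeps the hot path down to a single compare.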
*/ - IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_CTRL, 2); - for (i = 0; i < adapter->num_tx_queues; i++) { adapter->tx_ring[i].cpu = -1; ixgbe_update_tx_dca(adapter, &adapter->tx_ring[i]); @@ -475,6 +392,9 @@ static int __ixgbe_notify_dca(struct device *dev, void *data) /* if we''re already enabled, don''t do it again */ if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) break; + /* Always use CB2 mode, difference is masked + * in the CB driver. */ + IXGBE_WRITE_REG(&adapter->hw, IXGBE_DCA_CTRL, 2); if (dca_add_requester(dev) == 0) { adapter->flags |= IXGBE_FLAG_DCA_ENABLED; ixgbe_setup_dca(adapter); @@ -496,35 +416,58 @@ static int __ixgbe_notify_dca(struct device *dev, void *data) #endif /* CONFIG_DCA or CONFIG_DCA_MODULE */ /** * ixgbe_receive_skb - Send a completed packet up the stack - * @q_vector: structure containing interrupt and ring information + * @adapter: board private structure * @skb: packet to send up - * @vlan_tag: vlan tag for packet + * @status: hardware indication of status of receive + * @rx_ring: rx descriptor ring (for a specific queue) to setup + * @rx_desc: rx descriptor **/ -static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector, - struct sk_buff *skb, u16 vlan_tag) +static void ixgbe_receive_skb(struct ixgbe_adapter *adapter, + struct sk_buff *skb, u8 status, + struct ixgbe_ring *ring, + union ixgbe_adv_rx_desc *rx_desc) { - struct ixgbe_adapter *adapter = q_vector->adapter; int ret; + bool is_vlan = (status & IXGBE_RXD_STAT_VP); + u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan); +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && ring->queue_index) { + /* This is a VMDq packet destined for a VM. */ + vmq_netif_rx(skb, ring->queue_index); + return; + } +#endif +#ifndef IXGBE_NO_INET_LRO + if (adapter->netdev->features & NETIF_F_LRO && + skb->ip_summed == CHECKSUM_UNNECESSARY) { +#ifdef NETIF_F_HW_VLAN_TX + if (adapter->vlgrp && is_vlan && (tag != 0)) + lro_vlan_hwaccel_receive_skb(&ring->lro_mgr, skb, + adapter->vlgrp, tag, + rx_desc); + else +#endif + lro_receive_skb(&ring->lro_mgr, skb, rx_desc); + ring->lro_used = true; + } else { +#endif /* IXGBE_NO_INET_LRO */ #ifdef CONFIG_IXGBE_NAPI if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) { #ifdef NETIF_F_HW_VLAN_TX - if (adapter->vlgrp && vlan_tag) - vlan_gro_receive(&q_vector->napi, - adapter->vlgrp, - vlan_tag, skb); + if (adapter->vlgrp && is_vlan && (tag != 0)) + vlan_hwaccel_receive_skb(skb, adapter->vlgrp, tag); else - napi_gro_receive(&q_vector->napi, skb); + netif_receive_skb(skb); #else - napi_gro_receive(&q_vector->napi, skb); + netif_receive_skb(skb); #endif } else { #endif /* CONFIG_IXGBE_NAPI */ #ifdef NETIF_F_HW_VLAN_TX - if (adapter->vlgrp && vlan_tag) - ret = vlan_hwaccel_rx(skb, adapter->vlgrp, - vlan_tag); + if (adapter->vlgrp && is_vlan && (tag != 0)) + ret = vlan_hwaccel_rx(skb, adapter->vlgrp, tag); else ret = netif_rx(skb); #else @@ -537,19 +480,20 @@ static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector, #ifdef CONFIG_IXGBE_NAPI } #endif /* CONFIG_IXGBE_NAPI */ +#ifndef IXGBE_NO_INET_LRO + } +#endif } /** * ixgbe_rx_checksum - indicate in skb if hw indicated a good cksum * @adapter: address of board private structure - * @rx_desc: current Rx descriptor being processed + * @status_err: hardware indication of status of receive * @skb: skb currently being received and modified **/ static inline void ixgbe_rx_checksum(struct ixgbe_adapter *adapter, - union ixgbe_adv_rx_desc *rx_desc, - struct sk_buff *skb) + u32 status_err, struct sk_buff *skb) { - u32 
status_err = le32_to_cpu(rx_desc->wb.upper.status_error); skb->ip_summed = CHECKSUM_NONE; /* Rx csum disabled */ @@ -576,19 +520,6 @@ static inline void ixgbe_rx_checksum(struct ixgbe_adapter *adapter, adapter->hw_csum_rx_good++; } -static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw, - struct ixgbe_ring *rx_ring, u32 val) -{ - /* - * Force memory writes to complete before letting h/w - * know there are new descriptors to fetch. (Only - * applicable for weak-ordered memory model archs, - * such as IA-64). - */ - wmb(); - writel(val, hw->hw_addr + rx_ring->tail); -} - /** * ixgbe_alloc_rx_buffers - Replace used receive buffers; packet split * @adapter: address of board private structure @@ -605,6 +536,11 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter, i = rx_ring->next_to_use; bi = &rx_ring->rx_buffer_info[i]; +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + (!rx_ring->active)) + return; +#endif while (cleaned_count--) { rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i); @@ -630,28 +566,50 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter, } if (!bi->skb) { - struct sk_buff *skb = netdev_alloc_skb(adapter->netdev, - bufsz); + struct sk_buff *skb; +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + rx_ring->queue_index) { + skb = vmq_alloc_skb(adapter->netdev, + rx_ring->queue_index, + bufsz); + if (!skb) { + adapter->alloc_rx_buff_failed++; + goto no_buffers; + } + bi->skb = skb; + bi->dma = pci_map_page(pdev, + skb_shinfo(skb)->frags[0].page, + skb_shinfo(skb)->frags[0].page_offset, + skb_shinfo(skb)->frags[0].size, + PCI_DMA_FROMDEVICE); + } else { +#endif + skb = netdev_alloc_skb(adapter->netdev, bufsz); - if (!skb) { - adapter->alloc_rx_buff_failed++; - goto no_buffers; - } + if (!skb) { + adapter->alloc_rx_buff_failed++; + goto no_buffers; + } - /* - * Make buffer alignment 2 beyond a 16 byte boundary - * this will result in a 16 byte aligned IP header after - * the 14 byte MAC header is removed - */ - skb_reserve(skb, NET_IP_ALIGN); + skb->dev = adapter->netdev; + + /* + * Make buffer alignment 2 beyond a 16 + * byte boundary this will result in a + * 16 byte aligned IP header after the + * 14 byte MAC header is removed + */ + skb_reserve(skb, NET_IP_ALIGN); - bi->skb = skb; + bi->skb = skb; + bi->dma = pci_map_single(pdev, skb->data, bufsz, + PCI_DMA_FROMDEVICE); + } +#ifdef CONFIG_XEN_NETDEV2_VMQ } +#endif - if (!bi->dma) - bi->dma = pci_map_single(pdev, bi->skb->data, rx_ring->rx_buf_len, - PCI_DMA_FROMDEVICE); - /* Refresh the desc even if buffer_addrs didn''t change because * each write-back erases this info. */ if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) { @@ -673,7 +631,14 @@ no_buffers: if (i-- == 0) i = (rx_ring->count - 1); - ixgbe_release_rx_desc(&adapter->hw, rx_ring, i); + /* + * Force memory writes to complete before letting h/w + * know there are new descriptors to fetch. (Only + * applicable for weak-ordered memory model archs, + * such as IA-64). + */ + wmb(); + writel(i, adapter->hw.hw_addr + rx_ring->tail); } } @@ -682,78 +647,39 @@ static inline u16 ixgbe_get_hdr_info(union ixgbe_adv_rx_desc *rx_desc) return rx_desc->wb.lower.lo_dword.hs_rss.hdr_info; } -#if !defined(IXGBE_NO_LRO) || !defined(IXGBE_NO_HW_RSC) -/** - * ixgbe_transform_rsc_queue - change rsc queue into a full packet - * @skb: pointer to the last skb in the rsc queue - * - * This function changes a queue full of hw rsc buffers into a completed - * packet. 
It uses the ->prev pointers to find the first packet and then - * turns it into the frag list owner. - **/ -static inline struct sk_buff *ixgbe_transform_rsc_queue(struct sk_buff *skb) +static inline u16 ixgbe_get_pkt_info(union ixgbe_adv_rx_desc *rx_desc) { - unsigned int frag_list_size = 0; - - while (skb->prev) { - struct sk_buff *prev = skb->prev; - frag_list_size += skb->len; - skb->prev = NULL; - skb = prev; - } - - skb_shinfo(skb)->frag_list = skb->next; - skb->next = NULL; - skb->len += frag_list_size; - skb->data_len += frag_list_size; - skb->truesize += frag_list_size; - return skb; + return rx_desc->wb.lower.lo_dword.hs_rss.pkt_info; } -#endif /* !IXGBE_NO_LRO || !IXGBE_NO_HW_RSC */ #ifndef IXGBE_NO_LRO -/** - * ixgbe_can_lro - returns true if packet is TCP/IPV4 and LRO is enabled - * @adapter: board private structure - * @rx_desc: pointer to the rx descriptor - * - **/ -static inline bool ixgbe_can_lro(struct ixgbe_adapter *adapter, - union ixgbe_adv_rx_desc *rx_desc) -{ - u16 pkt_info = rx_desc->wb.lower.lo_dword.hs_rss.pkt_info; - - return (adapter->flags2 & IXGBE_FLAG2_SWLRO_ENABLED) && - !(adapter->netdev->flags & IFF_PROMISC) && - (pkt_info & IXGBE_RXDADV_PKTTYPE_IPV4) && - (pkt_info & IXGBE_RXDADV_PKTTYPE_TCP); -} +static int lromax = 44; /** - * ixgbe_lro_flush - Indicate packets to upper layer. + * ixgbe_lro_ring_flush - Indicate packets to upper layer. * * Update IP and TCP header part of head skb if more than one * skb''s chained and indicate packets to upper layer. **/ -static void ixgbe_lro_flush(struct ixgbe_q_vector *q_vector, - struct ixgbe_lro_desc *lrod) +static void ixgbe_lro_ring_flush(struct ixgbe_lro_list *lrolist, + struct ixgbe_adapter *adapter, + struct ixgbe_lro_desc *lrod, u8 status, + struct ixgbe_ring *rx_ring, + union ixgbe_adv_rx_desc *rx_desc) { - struct ixgbe_lro_list *lrolist = q_vector->lrolist; struct iphdr *iph; struct tcphdr *th; struct sk_buff *skb; u32 *ts_ptr; + struct ixgbe_lro_info *lro_data = &adapter->lro_data; + struct net_device *netdev = adapter->netdev; hlist_del(&lrod->lro_node); lrolist->active_cnt--; skb = lrod->skb; - lrod->skb = NULL; if (lrod->append_cnt) { - /* take the lro queue and convert to skb format */ - skb = ixgbe_transform_rsc_queue(skb); - /* incorporate ip header and re-calculate checksum */ iph = (struct iphdr *)skb->data; iph->tot_len = ntohs(skb->len); @@ -763,12 +689,10 @@ static void ixgbe_lro_flush(struct ixgbe_q_vector *q_vector, /* incorporate the latest ack into the tcp header */ th = (struct tcphdr *) ((char *)skb->data + sizeof(*iph)); th->ack_seq = lrod->ack_seq; - th->psh = lrod->psh; th->window = lrod->window; - th->check = 0; /* incorporate latest timestamp into the tcp header */ - if (lrod->opt_bytes) { + if (lrod->timestamp) { ts_ptr = (u32 *)(th + 1); ts_ptr[1] = htonl(lrod->tsval); ts_ptr[2] = lrod->tsecr; @@ -778,27 +702,38 @@ static void ixgbe_lro_flush(struct ixgbe_q_vector *q_vector, #ifdef NETIF_F_TSO skb_shinfo(skb)->gso_size = lrod->mss; #endif - ixgbe_receive_skb(q_vector, skb, lrod->vlan_tag); - lrolist->stats.flushed++; + ixgbe_receive_skb(adapter, skb, status, rx_ring, rx_desc); - + netdev->last_rx = jiffies; + lro_data->stats.coal += lrod->append_cnt + 1; + lro_data->stats.flushed++; + + lrod->skb = NULL; + lrod->last_skb = NULL; + lrod->timestamp = 0; + lrod->append_cnt = 0; + lrod->data_size = 0; hlist_add_head(&lrod->lro_node, &lrolist->free); } -static void ixgbe_lro_flush_all(struct ixgbe_q_vector *q_vector) +static void ixgbe_lro_ring_flush_all(struct ixgbe_lro_list *lrolist, 
+ struct ixgbe_adapter *adapter, u8 status, + struct ixgbe_ring *rx_ring, + union ixgbe_adv_rx_desc *rx_desc) { struct ixgbe_lro_desc *lrod; struct hlist_node *node, *node2; - struct ixgbe_lro_list *lrolist = q_vector->lrolist; hlist_for_each_entry_safe(lrod, node, node2, &lrolist->active, lro_node) - ixgbe_lro_flush(q_vector, lrod); + ixgbe_lro_ring_flush(lrolist, adapter, lrod, status, rx_ring, + rx_desc); } /* * ixgbe_lro_header_ok - Main LRO function. **/ -static u16 ixgbe_lro_header_ok(struct sk_buff *new_skb, struct iphdr *iph, +static int ixgbe_lro_header_ok(struct ixgbe_lro_info *lro_data, + struct sk_buff *new_skb, struct iphdr *iph, struct tcphdr *th) { int opt_bytes, tcp_data_len; @@ -843,135 +778,154 @@ static u16 ixgbe_lro_header_ok(struct sk_buff *new_skb, struct iphdr *iph, tcp_data_len = ntohs(iph->tot_len) - (th->doff << 2) - sizeof(*iph); + if (tcp_data_len == 0) + return -1; + return tcp_data_len; } /** - * ixgbe_lro_queue - if able, queue skb into lro chain - * @q_vector: structure containing interrupt and ring information + * ixgbe_lro_ring_queue - if able, queue skb into lro chain + * @lrolist: pointer to structure for lro entries + * @adapter: address of board private structure * @new_skb: pointer to current skb being checked - * @tag: vlan tag for skb + * @status: hardware indication of status of receive + * @rx_ring: rx descriptor ring (for a specific queue) to setup + * @rx_desc: rx descriptor * * Checks whether the skb given is eligible for LRO and if that''s * fine chains it to the existing lro_skb based on flowid. If an LRO for * the flow doesn''t exist create one. **/ -static struct sk_buff *ixgbe_lro_queue(struct ixgbe_q_vector *q_vector, - struct sk_buff *new_skb, - u16 tag) +static int ixgbe_lro_ring_queue(struct ixgbe_lro_list *lrolist, + struct ixgbe_adapter *adapter, + struct sk_buff *new_skb, u8 status, + struct ixgbe_ring *rx_ring, + union ixgbe_adv_rx_desc *rx_desc) { + struct ethhdr *eh; + struct iphdr *iph; + struct tcphdr *th, *header_th; + int opt_bytes, header_ok = 1; + u32 *ts_ptr = NULL; struct sk_buff *lro_skb; struct ixgbe_lro_desc *lrod; struct hlist_node *node; - struct skb_shared_info *new_skb_info = skb_shinfo(new_skb); - struct ixgbe_lro_list *lrolist = q_vector->lrolist; - struct iphdr *iph = (struct iphdr *)new_skb->data; - struct tcphdr *th = (struct tcphdr *)(iph + 1); - int tcp_data_len = ixgbe_lro_header_ok(new_skb, iph, th); - u16 opt_bytes = (th->doff << 2) - sizeof(*th); - u32 *ts_ptr = (opt_bytes ? 
(u32 *)(th + 1) : NULL); - u32 seq = ntohl(th->seq); + u32 seq; + struct ixgbe_lro_info *lro_data = &adapter->lro_data; + int tcp_data_len; + u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan); + + /* Disable LRO when in promiscuous mode, useful for debugging LRO */ + if (adapter->netdev->flags & IFF_PROMISC) + return -1; + + eh = (struct ethhdr *)skb_mac_header(new_skb); + iph = (struct iphdr *)(eh + 1); + + /* check to see if it is IPv4/TCP */ + if (!((ixgbe_get_pkt_info(rx_desc) & IXGBE_RXDADV_PKTTYPE_IPV4) && + (ixgbe_get_pkt_info(rx_desc) & IXGBE_RXDADV_PKTTYPE_TCP))) + return -1; + + /* find the TCP header */ + th = (struct tcphdr *) (iph + 1); + + tcp_data_len = ixgbe_lro_header_ok(lro_data, new_skb, iph, th); + if (tcp_data_len == -1) + header_ok = 0; + /* make sure any packet we are about to chain doesn''t include any pad */ + skb_trim(new_skb, ntohs(iph->tot_len)); + + opt_bytes = (th->doff << 2) - sizeof(*th); + if (opt_bytes != 0) + ts_ptr = (u32 *)(th + 1); + + seq = ntohl(th->seq); /* * we have a packet that might be eligible for LRO, * so see if it matches anything we might expect */ hlist_for_each_entry(lrod, node, &lrolist->active, lro_node) { - if (lrod->source_port != th->source || - lrod->dest_port != th->dest || - lrod->source_ip != iph->saddr || - lrod->dest_ip != iph->daddr || - lrod->vlan_tag != tag) - continue; - - /* malformed header, or resultant packet would be too large */ - if (tcp_data_len < 0 || (tcp_data_len + lrod->len) > 65535) { - ixgbe_lro_flush(q_vector, lrod); - break; - } - - /* out of order packet */ - if (seq != lrod->next_seq) { - ixgbe_lro_flush(q_vector, lrod); - tcp_data_len = -1; - break; - } + if (lrod->source_port == th->source && + lrod->dest_port == th->dest && + lrod->source_ip == iph->saddr && + lrod->dest_ip == iph->daddr && + lrod->vlan_tag == tag) { + + if (!header_ok) { + ixgbe_lro_ring_flush(lrolist, adapter, lrod, + status, rx_ring, rx_desc); + return -1; + } - if (lrod->opt_bytes || opt_bytes) { - u32 tsval = ntohl(*(ts_ptr + 1)); - /* make sure timestamp values are increasing */ - if (opt_bytes != lrod->opt_bytes || - lrod->tsval > tsval || *(ts_ptr + 2) == 0) { - ixgbe_lro_flush(q_vector, lrod); - tcp_data_len = -1; - break; + if (seq != lrod->next_seq) { + /* out of order packet */ + ixgbe_lro_ring_flush(lrolist, adapter, lrod, + status, rx_ring, rx_desc); + return -1; } - - lrod->tsval = tsval; - lrod->tsecr = *(ts_ptr + 2); - } - /* remove any padding from the end of the skb */ - __pskb_trim(new_skb, ntohs(iph->tot_len)); - /* Remove IP and TCP header*/ - skb_pull(new_skb, ntohs(iph->tot_len) - tcp_data_len); + if (lrod->timestamp) { + u32 tsval = ntohl(*(ts_ptr + 1)); + /* make sure timestamp values are increasing */ + if (lrod->tsval > tsval || *(ts_ptr + 2) == 0) { + ixgbe_lro_ring_flush(lrolist, adapter, + lrod, status, + rx_ring, rx_desc); + return -1; + } + lrod->tsval = tsval; + lrod->tsecr = *(ts_ptr + 2); + } - lrod->next_seq += tcp_data_len; - lrod->ack_seq = th->ack_seq; - lrod->window = th->window; - lrod->len += tcp_data_len; - lrod->psh |= th->psh; - lrod->append_cnt++; - lrolist->stats.coal++; + lro_skb = lrod->skb; - if (tcp_data_len > lrod->mss) - lrod->mss = tcp_data_len; + lro_skb->len += tcp_data_len; + lro_skb->data_len += tcp_data_len; + lro_skb->truesize += tcp_data_len; - lro_skb = lrod->skb; + lrod->next_seq += tcp_data_len; + lrod->ack_seq = th->ack_seq; + lrod->window = th->window; + lrod->data_size += tcp_data_len; + if (tcp_data_len > lrod->mss) + lrod->mss = tcp_data_len; - /* if header is empty pull 
- /* if header is empty pull pages into current skb */ - if (!skb_headlen(new_skb) && - ((skb_shinfo(lro_skb)->nr_frags + - skb_shinfo(new_skb)->nr_frags) <= MAX_SKB_FRAGS )) { - struct skb_shared_info *lro_skb_info = skb_shinfo(lro_skb); + /* Remove IP and TCP header*/ + skb_pull(new_skb, ntohs(iph->tot_len) - tcp_data_len); - /* copy frags into the last skb */ - memcpy(lro_skb_info->frags + lro_skb_info->nr_frags, - new_skb_info->frags, - new_skb_info->nr_frags * sizeof(skb_frag_t)); + /* Chain this new skb in frag_list */ + if (skb_shinfo(lro_skb)->frag_list != NULL ) + lrod->last_skb->next = new_skb; + else + skb_shinfo(lro_skb)->frag_list = new_skb; - lro_skb_info->nr_frags += new_skb_info->nr_frags; - lro_skb->len += tcp_data_len; - lro_skb->data_len += tcp_data_len; - lro_skb->truesize += tcp_data_len; + lrod->last_skb = new_skb ; - new_skb_info->nr_frags = 0; - new_skb->truesize -= tcp_data_len; - new_skb->len = new_skb->data_len = 0; - } else if (tcp_data_len) { - /* Chain this new skb in frag_list */ - new_skb->prev = lro_skb; - lro_skb->next = new_skb; - lrod->skb = new_skb ; - } + lrod->append_cnt++; - if (lrod->psh) - ixgbe_lro_flush(q_vector, lrod); + /* New packet with push flag, flush the whole packet. */ + if (th->psh) { + header_th = + (struct tcphdr *)(lro_skb->data + sizeof(*iph)); + header_th->psh |= th->psh; + ixgbe_lro_ring_flush(lrolist, adapter, lrod, + status, rx_ring, rx_desc); + return 0; + } - /* return the skb if it is empty for recycling */ - if (!new_skb->len) { - new_skb->data = skb_mac_header(new_skb); - __pskb_trim(new_skb, 0); - new_skb->protocol = 0; - lrolist->stats.recycled++; - return new_skb; - } + if (lrod->append_cnt >= lro_data->max) + ixgbe_lro_ring_flush(lrolist, adapter, lrod, + status, rx_ring, rx_desc); - return NULL; + return 0; + } /* End of if */ } /* start a new packet */ - if (tcp_data_len > 0 && !hlist_empty(&lrolist->free) && !th->psh) { + if (header_ok && !hlist_empty(&lrolist->free)) { lrod = hlist_entry(lrolist->free.first, struct ixgbe_lro_desc, lro_node); @@ -980,18 +934,16 @@ static struct sk_buff *ixgbe_lro_queue(struct ixgbe_q_vector *q_vector, lrod->dest_ip = iph->daddr; lrod->source_port = th->source; lrod->dest_port = th->dest; - lrod->vlan_tag = tag; - lrod->len = new_skb->len; lrod->next_seq = seq + tcp_data_len; + lrod->mss = tcp_data_len; lrod->ack_seq = th->ack_seq; lrod->window = th->window; - lrod->mss = tcp_data_len; - lrod->opt_bytes = opt_bytes; - lrod->psh = 0; - lrod->append_cnt = 0; + lrod->data_size = tcp_data_len; + lrod->vlan_tag = tag; /* record timestamp if it is present */ if (opt_bytes) { + lrod->timestamp = 1; lrod->tsval = ntohl(*(ts_ptr + 1)); lrod->tsecr = *(ts_ptr + 2); } @@ -1000,13 +952,11 @@ static struct sk_buff *ixgbe_lro_queue(struct ixgbe_q_vector *q_vector, /* .. 
and insert at the front of the active list */ hlist_add_head(&lrod->lro_node, &lrolist->active); lrolist->active_cnt++; - lrolist->stats.coal++; - return NULL; + + return 0; } - /* packet not handled by any of the above, pass it to the stack */ - ixgbe_receive_skb(q_vector, new_skb, tag); - return NULL; + return -1; } static void ixgbe_lro_ring_exit(struct ixgbe_lro_list *lrolist) @@ -1027,7 +977,8 @@ static void ixgbe_lro_ring_exit(struct ixgbe_lro_list *lrolist) } } -static void ixgbe_lro_ring_init(struct ixgbe_lro_list *lrolist) +static void ixgbe_lro_ring_init(struct ixgbe_lro_list *lrolist, + struct ixgbe_adapter *adapter) { int j, bytes; struct ixgbe_lro_desc *lrod; @@ -1042,62 +993,30 @@ static void ixgbe_lro_ring_init(struct ixgbe_lro_list *lrolist) if (lrod != NULL) { INIT_HLIST_NODE(&lrod->lro_node); hlist_add_head(&lrod->lro_node, &lrolist->free); + } else { + DPRINTK(PROBE, ERR, + "Allocation for LRO descriptor %u failed\n", j); } } } #endif /* IXGBE_NO_LRO */ - -#ifndef IXGBE_NO_HW_RSC -static inline u32 ixgbe_get_rsc_count(union ixgbe_adv_rx_desc *rx_desc) -{ - return (le32_to_cpu(rx_desc->wb.lower.lo_dword.data) & - IXGBE_RXDADV_RSCCNT_MASK) >> - IXGBE_RXDADV_RSCCNT_SHIFT; -} - -#endif /* IXGBE_NO_HW_RSC */ - -static void ixgbe_rx_status_indication(u32 staterr, - struct ixgbe_adapter *adapter) -{ - switch (adapter->hw.mac.type) { - case ixgbe_mac_82599EB: - if (staterr & IXGBE_RXD_STAT_FLM) - adapter->flm++; -#ifndef IXGBE_NO_LLI - if (staterr & IXGBE_RXD_STAT_DYNINT) - adapter->lli_int++; -#endif /* IXGBE_NO_LLI */ - break; - case ixgbe_mac_82598EB: -#ifndef IXGBE_NO_LLI - if (staterr & IXGBE_RXD_STAT_DYNINT) - adapter->lli_int++; -#endif /* IXGBE_NO_LLI */ - break; - default: - break; - } -} - #ifdef CONFIG_IXGBE_NAPI -static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, +static bool ixgbe_clean_rx_irq(struct ixgbe_adapter *adapter, struct ixgbe_ring *rx_ring, int *work_done, int work_to_do) #else -static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, +static bool ixgbe_clean_rx_irq(struct ixgbe_adapter *adapter, struct ixgbe_ring *rx_ring) #endif { - struct ixgbe_adapter *adapter = q_vector->adapter; struct pci_dev *pdev = adapter->pdev; union ixgbe_adv_rx_desc *rx_desc, *next_rxd; struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer; struct sk_buff *skb; - unsigned int i, rsc_count = 0; + unsigned int i; u32 len, staterr; - u16 hdr_info, vlan_tag; + u16 hdr_info; bool cleaned = false; int cleaned_count = 0; #ifndef CONFIG_IXGBE_NAPI @@ -1121,23 +1040,44 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, hdr_info = le16_to_cpu(ixgbe_get_hdr_info(rx_desc)); len = (hdr_info & IXGBE_RXDADV_HDRBUFLEN_MASK) >> IXGBE_RXDADV_HDRBUFLEN_SHIFT; + if (hdr_info & IXGBE_RXDADV_SPH) + adapter->rx_hdr_split++; if (len > IXGBE_RX_HDR_SIZE) len = IXGBE_RX_HDR_SIZE; upper_len = le16_to_cpu(rx_desc->wb.upper.length); } else { len = le16_to_cpu(rx_desc->wb.upper.length); } + +#ifndef IXGBE_NO_LLI + if (staterr & IXGBE_RXD_STAT_DYNINT) + adapter->lli_int++; +#endif + cleaned = true; skb = rx_buffer_info->skb; - prefetch(skb->data - NET_IP_ALIGN); rx_buffer_info->skb = NULL; - - /* if this is a skb from previous receive dma will be 0 */ - if (rx_buffer_info->dma) { +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + rx_ring->queue_index) { + /* for Xen VMDq, packet data goes in first page of + * skb, instead of data. 
+ */ + /* TODO this is broke for jumbos > 4k */ + pci_unmap_page(pdev, rx_buffer_info->dma, + PAGE_SIZE, PCI_DMA_FROMDEVICE); + skb->len += len; + skb_shinfo(skb)->frags[0].size = len; + } else { + prefetch(skb->data - NET_IP_ALIGN); + } +#else + prefetch(skb->data - NET_IP_ALIGN); +#endif + if (len && !skb_shinfo(skb)->nr_frags) { pci_unmap_single(pdev, rx_buffer_info->dma, - rx_ring->rx_buf_len, + rx_ring->rx_buf_len + NET_IP_ALIGN, PCI_DMA_FROMDEVICE); - rx_buffer_info->dma = 0; skb_put(skb, len); } @@ -1150,7 +1090,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, rx_buffer_info->page_offset, upper_len); - if (page_count(rx_buffer_info->page) != 1) + if ((rx_ring->rx_buf_len > (PAGE_SIZE / 2)) || + (page_count(rx_buffer_info->page) != 1)) rx_buffer_info->page = NULL; else get_page(rx_buffer_info->page); @@ -1163,72 +1104,60 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, i++; if (i == rx_ring->count) i = 0; + next_buffer = &rx_ring->rx_buffer_info[i]; next_rxd = IXGBE_RX_DESC_ADV(*rx_ring, i); prefetch(next_rxd); - cleaned_count++; - -#ifndef IXGBE_NO_HW_RSC - if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE) - rsc_count = ixgbe_get_rsc_count(rx_desc); - -#endif - if (rsc_count) { - u32 nextp = (staterr & IXGBE_RXDADV_NEXTP_MASK) >> - IXGBE_RXDADV_NEXTP_SHIFT; - next_buffer = &rx_ring->rx_buffer_info[nextp]; - rx_ring->rsc_count += (rsc_count - 1); - } else { - next_buffer = &rx_ring->rx_buffer_info[i]; - } + cleaned_count++; if (staterr & IXGBE_RXD_STAT_EOP) { - ixgbe_rx_status_indication(staterr, adapter); -#ifndef IXGBE_NO_HW_RSC - if (skb->prev) - skb = ixgbe_transform_rsc_queue(skb); -#endif rx_ring->stats.packets++; rx_ring->stats.bytes += skb->len; } else { - if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) { - rx_buffer_info->skb = next_buffer->skb; - rx_buffer_info->dma = next_buffer->dma; - next_buffer->skb = skb; - next_buffer->dma = 0; - } else { - skb->next = next_buffer->skb; - skb->next->prev = skb; - } + rx_buffer_info->skb = next_buffer->skb; + rx_buffer_info->dma = next_buffer->dma; + next_buffer->skb = skb; + next_buffer->dma = 0; adapter->non_eop_descs++; goto next_desc; } /* ERR_MASK will only have valid bits if EOP set */ if (unlikely(staterr & IXGBE_RXDADV_ERR_FRAME_ERR_MASK)) { - /* trim packet back to size 0 and recycle it */ - __pskb_trim(skb, 0); - rx_buffer_info->skb = skb; +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + rx_ring->queue_index) + vmq_free_skb(skb, rx_ring->queue_index); + else +#endif + dev_kfree_skb_irq(skb); goto next_desc; } - ixgbe_rx_checksum(adapter, rx_desc, skb); + ixgbe_rx_checksum(adapter, staterr, skb); /* probably a little skewed due to removing CRC */ total_rx_bytes += skb->len; total_rx_packets++; - - skb->protocol = eth_type_trans(skb, adapter->netdev); - vlan_tag = ((staterr & IXGBE_RXD_STAT_VP) ? 
- le16_to_cpu(rx_desc->wb.upper.vlan) : 0); +#ifdef CONFIG_XEN_NETDEV2_VMQ + if (!((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + rx_ring->queue_index)) +#endif + skb->protocol = eth_type_trans(skb, adapter->netdev); #ifndef IXGBE_NO_LRO - if (ixgbe_can_lro(adapter, rx_desc)) - rx_buffer_info->skb = ixgbe_lro_queue(q_vector, skb, vlan_tag); - else + if (ixgbe_lro_ring_queue(rx_ring->lrolist, + adapter, skb, staterr, rx_ring, rx_desc) == 0) { + adapter->netdev->last_rx = jiffies; + rx_ring->stats.packets++; + if (upper_len) + rx_ring->stats.bytes += upper_len; + else + rx_ring->stats.bytes += skb->len; + goto next_desc; + } #endif - ixgbe_receive_skb(q_vector, skb, vlan_tag); - + ixgbe_receive_skb(adapter, skb, staterr, rx_ring, rx_desc); adapter->netdev->last_rx = jiffies; next_desc: @@ -1242,17 +1171,23 @@ next_desc: /* use prefetched values */ rx_desc = next_rxd; - rx_buffer_info = &rx_ring->rx_buffer_info[i]; + rx_buffer_info = next_buffer; staterr = le32_to_cpu(rx_desc->wb.upper.status_error); } + rx_ring->next_to_clean = i; #ifndef IXGBE_NO_LRO - if (adapter->flags2 & IXGBE_FLAG2_SWLRO_ENABLED) - ixgbe_lro_flush_all(q_vector); + ixgbe_lro_ring_flush_all(rx_ring->lrolist, adapter, + staterr, rx_ring, rx_desc); #endif /* IXGBE_NO_LRO */ - rx_ring->next_to_clean = i; cleaned_count = IXGBE_DESC_UNUSED(rx_ring); +#ifndef IXGBE_NO_INET_LRO + if (rx_ring->lro_used) { + lro_flush_all(&rx_ring->lro_mgr); + rx_ring->lro_used = false; + } +#endif if (cleaned_count) ixgbe_alloc_rx_buffers(adapter, rx_ring, cleaned_count); @@ -1264,41 +1199,12 @@ next_desc: #ifndef CONFIG_IXGBE_NAPI /* re-arm the interrupt if we had to bail early and have more work */ - if ((*work_done >= work_to_do) && - (!test_bit(__IXGBE_DOWN, &adapter->state))) - ixgbe_irq_rearm_queues(adapter, ((u64)1 << q_vector->v_idx)); + if (*work_done >= work_to_do) + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, rx_ring->v_idx); #endif return cleaned; } -/** - * ixgbe_write_eitr - write EITR register in hardware specific way - * @q_vector: structure containing interrupt and ring information - * - * This function is made to be called by ethtool and by the driver - * when it needs to update EITR registers at runtime. Hardware - * specific quirks/differences are taken care of here. - */ -void ixgbe_write_eitr(struct ixgbe_q_vector *q_vector) -{ - struct ixgbe_adapter *adapter = q_vector->adapter; - struct ixgbe_hw *hw = &adapter->hw; - int v_idx = q_vector->v_idx; - u32 itr_reg = EITR_INTS_PER_SEC_TO_REG(q_vector->eitr); - - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - /* must write high and low 16 bits to reset counter */ - itr_reg |= (itr_reg << 16); - } else if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - /* - * set the WDIS bit to not clear the timer bits and cause an - * immediate assertion of the interrupt - */ - itr_reg |= IXGBE_EITR_CNT_WDIS; - } - IXGBE_WRITE_REG(hw, IXGBE_EITR(v_idx), itr_reg); -} - #ifdef CONFIG_IXGBE_NAPI static int ixgbe_clean_rxonly(struct napi_struct *, int); #endif @@ -1317,19 +1223,18 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter) q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; - /* - * Populate the IVAR table and set the ITR values to the + /* Populate the IVAR table and set the ITR values to the * corresponding register. */ for (v_idx = 0; v_idx < q_vectors; v_idx++) { - q_vector = adapter->q_vector[v_idx]; + q_vector = &adapter->q_vector[v_idx]; /* XXX for_each_bit(...) 
*/ r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); for (i = 0; i < q_vector->rxr_count; i++) { j = adapter->rx_ring[r_idx].reg_idx; - ixgbe_set_ivar(adapter, 0, j, v_idx); + ixgbe_set_ivar(adapter, IXGBE_IVAR_RX_QUEUE(j), v_idx); r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues, r_idx + 1); @@ -1339,7 +1244,7 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter) for (i = 0; i < q_vector->txr_count; i++) { j = adapter->tx_ring[r_idx].reg_idx; - ixgbe_set_ivar(adapter, 1, j, v_idx); + ixgbe_set_ivar(adapter, IXGBE_IVAR_TX_QUEUE(j), v_idx); r_idx = find_next_bit(q_vector->txr_idx, adapter->num_tx_queues, r_idx + 1); @@ -1348,22 +1253,19 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter) /* if this is a tx only vector halve the interrupt rate */ if (q_vector->txr_count && !q_vector->rxr_count) q_vector->eitr = (adapter->eitr_param >> 1); - else if (q_vector->rxr_count) + else /* rx only */ q_vector->eitr = adapter->eitr_param; - ixgbe_write_eitr(q_vector); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EITR(v_idx), + EITR_INTS_PER_SEC_TO_REG(q_vector->eitr)); } - if (adapter->hw.mac.type == ixgbe_mac_82598EB) - ixgbe_set_ivar(adapter, -1, IXGBE_IVAR_OTHER_CAUSES_INDEX, - v_idx); - else if (adapter->hw.mac.type == ixgbe_mac_82599EB) - ixgbe_set_ivar(adapter, -1, 1, v_idx); + ixgbe_set_ivar(adapter, IXGBE_IVAR_OTHER_CAUSES_INDEX, v_idx); IXGBE_WRITE_REG(&adapter->hw, IXGBE_EITR(v_idx), 1950); #ifdef IXGBE_TCP_TIMER - ixgbe_set_ivar(adapter, -1, 0, ++v_idx); -#endif /* IXGBE_TCP_TIMER */ + ixgbe_set_ivar(adapter, IXGBE_IVAR_TCP_TIMER_INDEX, ++v_idx); +#endif /* set up to autoclear timer, and the vectors */ mask = IXGBE_EIMS_ENABLE_MASK; @@ -1445,10 +1347,12 @@ update_itr_done: static void ixgbe_set_itr_msix(struct ixgbe_q_vector *q_vector) { struct ixgbe_adapter *adapter = q_vector->adapter; + struct ixgbe_hw *hw = &adapter->hw; u32 new_itr; u8 current_itr, ret_itr; - int i, r_idx; - struct ixgbe_ring *rx_ring = NULL, *tx_ring = NULL; + int i, r_idx, v_idx = ((void *)q_vector - (void *)(adapter->q_vector)) / + sizeof(struct ixgbe_q_vector); + struct ixgbe_ring *rx_ring, *tx_ring; r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues); for (i = 0; i < q_vector->txr_count; i++) { @@ -1497,14 +1401,14 @@ static void ixgbe_set_itr_msix(struct ixgbe_q_vector *q_vector) } if (new_itr != q_vector->eitr) { - + u32 itr_reg; /* do an exponential smoothing */ new_itr = ((q_vector->eitr * 90)/100) + ((new_itr * 10)/100); - - /* save the algorithm value here */ q_vector->eitr = new_itr; - - ixgbe_write_eitr(q_vector); + itr_reg = EITR_INTS_PER_SEC_TO_REG(new_itr); + /* must write high and low 16 bits to reset counter */ + DPRINTK(TX_ERR, DEBUG, "writing eitr(%d): %08X\n", v_idx, itr_reg); + IXGBE_WRITE_REG(hw, IXGBE_EITR(v_idx), itr_reg | (itr_reg)<<16); } return; @@ -1522,26 +1426,6 @@ static void ixgbe_check_fan_failure(struct ixgbe_adapter *adapter, u32 eicr) } } -static void ixgbe_check_sfp_event(struct ixgbe_adapter *adapter, u32 eicr) -{ - struct ixgbe_hw *hw = &adapter->hw; - - if (eicr & IXGBE_EICR_GPI_SDP1) { - /* Clear the interrupt */ - IXGBE_WRITE_REG(hw, IXGBE_EICR, IXGBE_EICR_GPI_SDP1); - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - schedule_work(&adapter->multispeed_fiber_task); - } else if (eicr & IXGBE_EICR_GPI_SDP2) { - /* Clear the interrupt */ - IXGBE_WRITE_REG(hw, IXGBE_EICR, IXGBE_EICR_GPI_SDP2); - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - schedule_work(&adapter->sfp_config_module_task); - } else { - /* Interrupt 
isn't for us... */ - return; - } -} - static void ixgbe_check_lsc(struct ixgbe_adapter *adapter) { struct ixgbe_hw *hw = &adapter->hw; @@ -1560,50 +1444,15 @@ static irqreturn_t ixgbe_msix_lsc(int irq, void *data) struct net_device *netdev = data; struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - u32 eicr; - - /* - * Workaround of Silicon errata on 82598. Use clear-by-write - * instead of clear-by-read to clear EICR , reading EICS gives the - * value of EICR without read-clear of EICR - */ - eicr = IXGBE_READ_REG(hw, IXGBE_EICS); - IXGBE_WRITE_REG(hw, IXGBE_EICR, eicr); + u32 eicr = IXGBE_READ_REG(hw, IXGBE_EICR); if (eicr & IXGBE_EICR_LSC) ixgbe_check_lsc(adapter); - if (hw->mac.type == ixgbe_mac_82599EB) { - if (eicr & IXGBE_EICR_ECC) { - DPRINTK(LINK, INFO, "Received unrecoverable ECC Err, " - "please reboot\n"); - IXGBE_WRITE_REG(hw, IXGBE_EICR, IXGBE_EICR_ECC); - } - /* Handle Flow Director Full threshold interrupt */ - if (eicr & IXGBE_EICR_FLOW_DIR) { - int i; - IXGBE_WRITE_REG(hw, IXGBE_EICR, IXGBE_EICR_FLOW_DIR); - /* Disable transmits before FDIR Re-initialization */ - netif_tx_stop_all_queues(netdev); - for (i = 0; i < adapter->num_tx_queues; i++) { - struct ixgbe_ring *tx_ring = - &adapter->tx_ring[i]; - if (test_and_clear_bit(__IXGBE_FDIR_INIT_DONE, - &tx_ring->reinit_state)) - schedule_work(&adapter->fdir_reinit_task); - } - } - } - ixgbe_check_fan_failure(adapter, eicr); - if (hw->mac.type == ixgbe_mac_82599EB) - ixgbe_check_sfp_event(adapter, eicr); - - /* re-enable the original interrupt state, no lsc, no queues */ if (!test_bit(__IXGBE_DOWN, &adapter->state)) - IXGBE_WRITE_REG(hw, IXGBE_EIMS, eicr & - ~(IXGBE_EIMS_LSC | IXGBE_EIMS_RTX_QUEUE)); + IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_OTHER); return IRQ_HANDLED; } @@ -1638,40 +1487,6 @@ static irqreturn_t ixgbe_msix_tcp_timer(int irq, void *data) } #endif /* IXGBE_TCP_TIMER */ -static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter, - u64 qmask) -{ - u32 mask; - - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - mask = (IXGBE_EIMS_RTX_QUEUE & qmask); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS, mask); - } else { - mask = (qmask & 0xFFFFFFFF); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS_EX(0), mask); - mask = (qmask >> 32); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS_EX(1), mask); - } - /* skip the flush */ -} - -static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, - u64 qmask) -{ - u32 mask; - - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - mask = (IXGBE_EIMS_RTX_QUEUE & qmask); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask); - } else { - mask = (qmask & 0xFFFFFFFF); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC_EX(0), mask); - mask = (qmask >> 32); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC_EX(1), mask); - } - /* skip the flush */ -} - static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data) { struct ixgbe_q_vector *q_vector = data; @@ -1685,28 +1500,21 @@ static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data) r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues); for (i = 0; i < q_vector->txr_count; i++) { tx_ring = &(adapter->tx_ring[r_idx]); - tx_ring->total_bytes = 0; - tx_ring->total_packets = 0; -#ifndef CONFIG_IXGBE_NAPI - ixgbe_clean_tx_irq(q_vector, tx_ring); #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) ixgbe_update_tx_dca(adapter, tx_ring); #endif -#endif + tx_ring->total_bytes = 0; + tx_ring->total_packets = 0; + ixgbe_clean_tx_irq(adapter, tx_ring);
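/*
 * Aside, illustrative only -- not part of the patch.  Each q_vector keeps
 * a bitmap (txr_idx/rxr_idx) of the ring indices it services, and the
 * per-vector handlers walk it with the usual find_first_bit()/
 * find_next_bit() pattern:
 *
 *	idx = find_first_bit(map, nbits);
 *	for (i = 0; i < count; i++) {
 *		service(ring[idx]);
 *		idx = find_next_bit(map, nbits, idx + 1);
 *	}
 *
 * e.g. with a hypothetical txr_idx of 0b0101 and txr_count == 2,
 * ixgbe_msix_clean_tx() cleans tx rings 0 and 2 and touches nothing else.
 */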
r_idx = find_next_bit(q_vector->txr_idx, adapter->num_tx_queues, r_idx + 1); } -#ifdef CONFIG_IXGBE_NAPI - /* disable interrupts on this vector only */ - ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx)); - napi_schedule(&q_vector->napi); -#endif /* * possibly later we can enable tx auto-adjustment if necessary * - if (adapter->itr_setting & 1) + if (adapter->itr_setting & 3) ixgbe_set_itr_msix(q_vector); */ @@ -1729,10 +1537,12 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data) r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); for (i = 0; i < q_vector->rxr_count; i++) { rx_ring = &(adapter->rx_ring[r_idx]); + if (!rx_ring->active) + continue; rx_ring->total_bytes = 0; rx_ring->total_packets = 0; #ifndef CONFIG_IXGBE_NAPI - ixgbe_clean_rx_irq(q_vector, rx_ring); + ixgbe_clean_rx_irq(adapter, rx_ring); #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) @@ -1743,7 +1553,7 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data) r_idx + 1); } - if (adapter->itr_setting & 1) + if (adapter->itr_setting & 3) ixgbe_set_itr_msix(q_vector); #else r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues, @@ -1753,9 +1563,13 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data) if (!q_vector->rxr_count) return IRQ_HANDLED; + r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); + rx_ring = &(adapter->rx_ring[r_idx]); + if (!rx_ring->active) + return IRQ_HANDLED; /* disable interrupts on this vector only */ - ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx)); - napi_schedule(&q_vector->napi); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, rx_ring->v_idx); + netif_rx_schedule(adapter->netdev, &q_vector->napi); #endif return IRQ_HANDLED; @@ -1763,59 +1577,8 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data) static irqreturn_t ixgbe_msix_clean_many(int irq, void *data) { - struct ixgbe_q_vector *q_vector = data; - struct ixgbe_adapter *adapter = q_vector->adapter; - struct ixgbe_ring *ring; - int r_idx; - int i; - - if (!q_vector->txr_count && !q_vector->rxr_count) - return IRQ_HANDLED; - - r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues); - for (i = 0; i < q_vector->txr_count; i++) { - ring = &(adapter->tx_ring[r_idx]); - ring->total_bytes = 0; - ring->total_packets = 0; -#ifndef CONFIG_IXGBE_NAPI - ixgbe_clean_tx_irq(q_vector, ring); -#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) - if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) - ixgbe_update_tx_dca(adapter, ring); -#endif -#endif - r_idx = find_next_bit(q_vector->txr_idx, adapter->num_tx_queues, - r_idx + 1); - } - - r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); - for (i = 0; i < q_vector->rxr_count; i++) { - ring = &(adapter->rx_ring[r_idx]); - ring->total_bytes = 0; - ring->total_packets = 0; -#ifndef CONFIG_IXGBE_NAPI - ixgbe_clean_rx_irq(q_vector, ring); - -#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) - if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) - ixgbe_update_rx_dca(adapter, ring); - -#endif - r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues, - r_idx + 1); - } - - if (adapter->itr_setting & 1) - ixgbe_set_itr_msix(q_vector); -#else - r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues, - r_idx + 1); - } - - /* disable interrupts on this vector only */ - ixgbe_irq_disable_queues(adapter, ((u64)1 << q_vector->v_idx)); - napi_schedule(&q_vector->napi); -#endif + ixgbe_msix_clean_rx(irq, data); + ixgbe_msix_clean_tx(irq, 
data); return IRQ_HANDLED; } @@ -1845,54 +1608,39 @@ static int ixgbe_clean_rxonly(struct napi_struct *napi, int budget) ixgbe_update_rx_dca(adapter, rx_ring); #endif - ixgbe_clean_rx_irq(q_vector, rx_ring, &work_done, budget); + if (rx_ring->active) + ixgbe_clean_rx_irq(adapter, rx_ring, &work_done, budget); -#ifndef HAVE_NETDEV_NAPI_LIST - if (!netif_running(adapter->netdev)) - work_done = 0; - -#endif /* If all Rx work done, exit the polling mode */ - if (work_done < budget) { - napi_complete(napi); - if (adapter->itr_setting & 1) + if ((work_done == 0) || !netif_running(adapter->netdev)) { + netif_rx_complete(adapter->netdev, napi); + if (adapter->itr_setting & 3) ixgbe_set_itr_msix(q_vector); if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx)); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS, rx_ring->v_idx); + return 0; } return work_done; } /** - * ixgbe_clean_rxtx_many - msix (aka one shot) rx clean routine + * ixgbe_clean_rxonly_many - msix (aka one shot) rx clean routine * @napi: napi struct with our devices info in it * @budget: amount of work driver is allowed to do this pass, in packets * * This function will clean more than one rx queue associated with a * q_vector. **/ -static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget) +static int ixgbe_clean_rxonly_many(struct napi_struct *napi, int budget) { struct ixgbe_q_vector *q_vector = container_of(napi, struct ixgbe_q_vector, napi); struct ixgbe_adapter *adapter = q_vector->adapter; - struct ixgbe_ring *ring = NULL; + struct ixgbe_ring *rx_ring = NULL; int work_done = 0, i; long r_idx; - bool tx_clean_complete = true; - - r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues); - for (i = 0; i < q_vector->txr_count; i++) { - ring = &(adapter->tx_ring[r_idx]); -#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) - if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) - ixgbe_update_tx_dca(adapter, ring); -#endif - tx_clean_complete &= ixgbe_clean_tx_irq(q_vector, ring); - r_idx = find_next_bit(q_vector->txr_idx, adapter->num_tx_queues, - r_idx + 1); - } + u16 enable_mask = 0; /* attempt to distribute budget to each queue fairly, but don't allow * the budget to go below 1 because we'll exit polling */ @@ -1900,75 +1648,29 @@ budget = max(budget, 1); r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); for (i = 0; i < q_vector->rxr_count; i++) { - ring = &(adapter->rx_ring[r_idx]); + rx_ring = &(adapter->rx_ring[r_idx]); #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) - ixgbe_update_rx_dca(adapter, ring); + ixgbe_update_rx_dca(adapter, rx_ring); #endif - ixgbe_clean_rx_irq(q_vector, ring, &work_done, budget); + if (rx_ring->active) + ixgbe_clean_rx_irq(adapter, rx_ring, + &work_done, budget); + enable_mask |= rx_ring->v_idx; r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues, r_idx + 1); } - if (!tx_clean_complete) - work_done = budget; - -#ifndef HAVE_NETDEV_NAPI_LIST - if (!netif_running(adapter->netdev)) - work_done = 0; - -#endif - /* If all Rx work done, exit the polling mode */ - if (work_done < budget) { - napi_complete(napi); - if (adapter->itr_setting & 1) - ixgbe_set_itr_msix(q_vector); - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx)); - } - - return work_done; -} -
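/*
 * Aside, illustrative only -- not part of the patch.  The "distribute
 * budget fairly" comment above amounts to integer division of the NAPI
 * budget across the vector's rx rings: with a hypothetical budget of 64
 * and rxr_count == 3, each ring is polled with 64 / 3 == 21 packets of
 * budget, and the max(budget, 1) clamp only matters when a vector has
 * more rings than budget, so every ring still gets polled at least once.
 */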
-/** - * ixgbe_clean_txonly - msix (aka one shot) tx clean routine - * @napi: napi struct with our devices info in it - * @budget: amount of work driver is allowed to do this pass, in packets - * - * This function is optimized for cleaning one queue only on a single - * q_vector!!! - **/ -static int ixgbe_clean_txonly(struct napi_struct *napi, int budget) -{ - struct ixgbe_q_vector *q_vector = - container_of(napi, struct ixgbe_q_vector, napi); - struct ixgbe_adapter *adapter = q_vector->adapter; - struct ixgbe_ring *tx_ring = NULL; - int work_done = 0; - long r_idx; - - r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues); - tx_ring = &(adapter->tx_ring[r_idx]); -#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) - if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) - ixgbe_update_tx_dca(adapter, tx_ring); -#endif - - if (!ixgbe_clean_tx_irq(q_vector, tx_ring)) - work_done = budget; - -#ifndef HAVE_NETDEV_NAPI_LIST - if (!netif_running(adapter->netdev)) - work_done = 0; - -#endif + r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues); + rx_ring = &(adapter->rx_ring[r_idx]); /* If all Rx work done, exit the polling mode */ - if (work_done < budget) { - napi_complete(napi); - if (adapter->itr_setting & 1) + if ((work_done == 0) || !netif_running(adapter->netdev)) { + netif_rx_complete(adapter->netdev, napi); + if (adapter->itr_setting & 3) ixgbe_set_itr_msix(q_vector); if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx)); + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS, enable_mask); + return 0; } return work_done; @@ -1978,24 +1680,26 @@ static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx, int r_idx) { - struct ixgbe_q_vector *q_vector = a->q_vector[v_idx]; + a->q_vector[v_idx].adapter = a; + set_bit(r_idx, a->q_vector[v_idx].rxr_idx); + a->q_vector[v_idx].rxr_count++; + a->rx_ring[r_idx].v_idx = 1 << v_idx; - set_bit(r_idx, q_vector->rxr_idx); - q_vector->rxr_count++; } static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx, - int t_idx) + int r_idx) { - struct ixgbe_q_vector *q_vector = a->q_vector[v_idx]; - - set_bit(t_idx, q_vector->txr_idx); - q_vector->txr_count++; + a->q_vector[v_idx].adapter = a; + set_bit(r_idx, a->q_vector[v_idx].txr_idx); + a->q_vector[v_idx].txr_count++; + a->tx_ring[r_idx].v_idx = 1 << v_idx; } /** * ixgbe_map_rings_to_vectors - Maps descriptor rings to vectors * @adapter: board private structure to initialize + * @vectors: allotted vector count for descriptor rings * * This function maps descriptor rings to the queue-specific vectors * we were allotted through the MSI-X enabling code. Ideally, we'd have @@ -2003,9 +1707,8 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx, * group the rings as "efficiently" as possible. You would add new * mapping configurations in here. **/ -static int ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter) +static int ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter, int vectors) { - int q_vectors; int v_start = 0; int rxr_idx = 0, txr_idx = 0; int rxr_remaining = adapter->num_rx_queues; @@ -2018,18 +1721,17 @@ if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) goto out; - q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; - /* * The ideal configuration... * We have enough vectors to map one per queue. 
*/ - if (q_vectors == adapter->num_rx_queues + adapter->num_tx_queues) { + if (vectors == adapter->num_rx_queues + adapter->num_tx_queues) { for (; rxr_idx < rxr_remaining; v_start++, rxr_idx++) map_vector_to_rxq(adapter, v_start, rxr_idx); for (; txr_idx < txr_remaining; v_start++, txr_idx++) map_vector_to_txq(adapter, v_start, txr_idx); + goto out; } @@ -2039,16 +1741,16 @@ static int ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter) * multiple queues per vector. */ /* Re-adjusting *qpv takes care of the remainder. */ - for (i = v_start; i < q_vectors; i++) { - rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - i); + for (i = v_start; i < vectors; i++) { + rqpv = DIV_ROUND_UP(rxr_remaining, vectors - i); for (j = 0; j < rqpv; j++) { map_vector_to_rxq(adapter, i, rxr_idx); rxr_idx++; rxr_remaining--; } } - for (i = v_start; i < q_vectors; i++) { - tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - i); + for (i = v_start; i < vectors; i++) { + tqpv = DIV_ROUND_UP(txr_remaining, vectors - i); for (j = 0; j < tqpv; j++) { map_vector_to_txq(adapter, i, txr_idx); txr_idx++; @@ -2077,31 +1779,30 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter) /* Decrement for Other and TCP Timer vectors */ q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; -#define SET_HANDLER(_v) (((_v)->rxr_count && (_v)->txr_count) \ - ? &ixgbe_msix_clean_many : \ - (_v)->rxr_count ? &ixgbe_msix_clean_rx : \ - (_v)->txr_count ? &ixgbe_msix_clean_tx : \ - NULL) + /* Map the Tx/Rx rings to the vectors we were allotted. */ + err = ixgbe_map_rings_to_vectors(adapter, q_vectors); + if (err) + goto out; + +#define SET_HANDLER(_v) ((!(_v)->rxr_count) ? &ixgbe_msix_clean_tx : \ + (!(_v)->txr_count) ? &ixgbe_msix_clean_rx : \ + &ixgbe_msix_clean_many) for (vector = 0; vector < q_vectors; vector++) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[vector]; - handler = SET_HANDLER(q_vector); + handler = SET_HANDLER(&adapter->q_vector[vector]); if (handler == &ixgbe_msix_clean_rx) { - sprintf(q_vector->name, "%s-%s-%d", + sprintf(adapter->name[vector], "%s-%s-%d", netdev->name, "rx", ri++); } else if (handler == &ixgbe_msix_clean_tx) { - sprintf(q_vector->name, "%s-%s-%d", + sprintf(adapter->name[vector], "%s-%s-%d", netdev->name, "tx", ti++); - } else if (handler == &ixgbe_msix_clean_many) { - sprintf(q_vector->name, "%s-%s-%d", - netdev->name, "TxRx", vector); } else { - /* skip this unused q_vector */ - continue; + sprintf(adapter->name[vector], "%s-%s-%d", + netdev->name, "TxRx", vector); } err = request_irq(adapter->msix_entries[vector].vector, - handler, 0, q_vector->name, - q_vector); + handler, 0, adapter->name[vector], + &(adapter->q_vector[vector])); if (err) { DPRINTK(PROBE, ERR, "request_irq failed for MSIX interrupt " @@ -2110,9 +1811,9 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter) } } - sprintf(adapter->lsc_int_name, "%s:lsc", netdev->name); + sprintf(adapter->name[vector], "%s:lsc", netdev->name); err = request_irq(adapter->msix_entries[vector].vector, - &ixgbe_msix_lsc, 0, adapter->lsc_int_name, netdev); + &ixgbe_msix_lsc, 0, adapter->name[vector], netdev); if (err) { DPRINTK(PROBE, ERR, "request_irq for msix_lsc failed: %d\n", err); @@ -2121,9 +1822,9 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter) #ifdef IXGBE_TCP_TIMER vector++; - sprintf(adapter->tcp_timer_name, "%s:timer", netdev->name); + sprintf(adapter->name[vector], "%s:timer", netdev->name); err = request_irq(adapter->msix_entries[vector].vector, - &ixgbe_msix_tcp_timer, 0, 
adapter->tcp_timer_name, + &ixgbe_msix_tcp_timer, 0, adapter->name[vector], netdev); if (err) { DPRINTK(PROBE, ERR, @@ -2139,17 +1840,19 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter) free_queue_irqs: for (i = vector - 1; i >= 0; i--) free_irq(adapter->msix_entries[--vector].vector, - adapter->q_vector[i]); + &(adapter->q_vector[i])); adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED; pci_disable_msix(adapter->pdev); kfree(adapter->msix_entries); adapter->msix_entries = NULL; +out: return err; } static void ixgbe_set_itr(struct ixgbe_adapter *adapter) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[0]; + struct ixgbe_hw *hw = &adapter->hw; + struct ixgbe_q_vector *q_vector = adapter->q_vector; u8 current_itr; u32 new_itr = q_vector->eitr; struct ixgbe_ring *rx_ring = &adapter->rx_ring[0]; @@ -2182,14 +1885,13 @@ static void ixgbe_set_itr(struct ixgbe_adapter *adapter) } if (new_itr != q_vector->eitr) { - + u32 itr_reg; /* do an exponential smoothing */ new_itr = ((q_vector->eitr * 90)/100) + ((new_itr * 10)/100); - - /* save the algorithm value here */ q_vector->eitr = new_itr; - - ixgbe_write_eitr(q_vector); + itr_reg = EITR_INTS_PER_SEC_TO_REG(new_itr); + /* must write high and low 16 bits to reset counter */ + IXGBE_WRITE_REG(hw, IXGBE_EITR(0), itr_reg | (itr_reg)<<16); } return; @@ -2199,117 +1901,70 @@ static void ixgbe_set_itr(struct ixgbe_adapter *adapter) * ixgbe_irq_enable - Enable default interrupt generation settings * @adapter: board private structure **/ -static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter, bool queues, bool flush) +static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter) { u32 mask; - u64 qmask; - - mask = (IXGBE_EIMS_ENABLE_MASK & ~IXGBE_EIMS_RTX_QUEUE); - qmask = ~0; - - /* don''t reenable LSC while waiting for link */ - if (adapter->flags & IXGBE_FLAG_NEED_LINK_UPDATE) - mask &= ~IXGBE_EIMS_LSC; + mask = IXGBE_EIMS_ENABLE_MASK; if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE) mask |= IXGBE_EIMS_GPI_SDP1; - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - mask |= IXGBE_EIMS_ECC; - mask |= IXGBE_EIMS_GPI_SDP1; - mask |= IXGBE_EIMS_GPI_SDP2; - } - if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE || - adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE) - mask |= IXGBE_EIMS_FLOW_DIR; - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS, mask); - if (queues) - ixgbe_irq_enable_queues(adapter, qmask); - if (flush) - IXGBE_WRITE_FLUSH(&adapter->hw); + IXGBE_WRITE_FLUSH(&adapter->hw); } + /** * ixgbe_intr - legacy mode Interrupt Handler * @irq: interrupt number * @data: pointer to a network interface device structure + * @pt_regs: CPU registers structure **/ static irqreturn_t ixgbe_intr(int irq, void *data) { struct net_device *netdev = data; struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - struct ixgbe_q_vector *q_vector = adapter->q_vector[0]; u32 eicr; - /* - * Workaround of Silicon errata on 82598. Mask the interrupt - * before the read of EICR. - */ - IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_IRQ_CLEAR_MASK); - /* for NAPI, using EIAM to auto-mask tx/rx interrupt bits on read * therefore no explict interrupt disable is necessary */ eicr = IXGBE_READ_REG(hw, IXGBE_EICR); if (!eicr) { - /* - * shared interrupt alert! +#ifdef CONFIG_IXGBE_NAPI + /* shared interrupt alert! * make sure interrupts are enabled because the read will - * have disabled interrupts due to EIAM - * finish the workaround of silicon errata on 82598. Unmask - * the interrupt that we masked before the EICR read. 
- */ - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable(adapter, true, true); + * have disabled interrupts due to EIAM */ + ixgbe_irq_enable(adapter); +#endif return IRQ_NONE; /* Not our interrupt */ } if (eicr & IXGBE_EICR_LSC) ixgbe_check_lsc(adapter); - if (hw->mac.type == ixgbe_mac_82599EB) { - if (eicr & IXGBE_EICR_ECC) - DPRINTK(LINK, INFO, "Received unrecoverable ECC Err, " - "please reboot\n"); - ixgbe_check_sfp_event(adapter, eicr); - } - ixgbe_check_fan_failure(adapter, eicr); #ifdef CONFIG_IXGBE_NAPI - if (napi_schedule_prep(&(q_vector->napi))) { + if (netif_rx_schedule_prep(netdev, &adapter->q_vector[0].napi)) { adapter->tx_ring[0].total_packets = 0; adapter->tx_ring[0].total_bytes = 0; adapter->rx_ring[0].total_packets = 0; adapter->rx_ring[0].total_bytes = 0; /* would disable interrupts here but EIAM disabled it */ - __napi_schedule(&(q_vector->napi)); + __netif_rx_schedule(netdev, &adapter->q_vector[0].napi); } - /* - * re-enable link(maybe) and non-queue interrupts, no flush. - * ixgbe_poll will re-enable the queue interrupts - */ - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable(adapter, false, false); #else adapter->tx_ring[0].total_packets = 0; adapter->tx_ring[0].total_bytes = 0; adapter->rx_ring[0].total_packets = 0; adapter->rx_ring[0].total_bytes = 0; - ixgbe_clean_tx_irq(q_vector, adapter->tx_ring); - ixgbe_clean_rx_irq(q_vector, adapter->rx_ring); + ixgbe_clean_rx_irq(adapter, adapter->rx_ring); + ixgbe_clean_tx_irq(adapter, adapter->tx_ring); /* dynamically adjust throttle */ - if (adapter->itr_setting & 1) + if (adapter->itr_setting & 3) ixgbe_set_itr(adapter); - /* - * Workaround of Silicon errata on 82598. Unmask - * the interrupt that we masked before the EICR read - * no flush of the re-enable is necessary here - */ - if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable(adapter, true, false); #endif return IRQ_HANDLED; } @@ -2319,12 +1974,11 @@ static inline void ixgbe_reset_q_vectors(struct ixgbe_adapter *adapter) int i, q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; for (i = 0; i < q_vectors; i++) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[i]; + struct ixgbe_q_vector *q_vector = &adapter->q_vector[i]; bitmap_zero(q_vector->rxr_idx, MAX_RX_QUEUES); bitmap_zero(q_vector->txr_idx, MAX_TX_QUEUES); q_vector->rxr_count = 0; q_vector->txr_count = 0; - q_vector->eitr = adapter->eitr_param; } } @@ -2371,11 +2025,11 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter) i--; #endif free_irq(adapter->msix_entries[i].vector, netdev); - i--; + i--; for (; i >= 0; i--) { free_irq(adapter->msix_entries[i].vector, - adapter->q_vector[i]); + &(adapter->q_vector[i])); } ixgbe_reset_q_vectors(adapter); @@ -2390,13 +2044,7 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter) **/ static inline void ixgbe_irq_disable(struct ixgbe_adapter *adapter) { - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, ~0); - } else { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, 0xFFFF0000); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC_EX(0), ~0); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC_EX(1), ~0); - } + IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, ~0); IXGBE_WRITE_FLUSH(&adapter->hw); if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { int i; @@ -2407,6 +2055,13 @@ static inline void ixgbe_irq_disable(struct ixgbe_adapter *adapter) } } +static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter) +{ + u32 mask = IXGBE_EIMS_RTX_QUEUE; + IXGBE_WRITE_REG(&adapter->hw, 
IXGBE_EIMS, mask); + /* skip the flush */ +} + /** * ixgbe_configure_msi_and_legacy - Initialize PIN (INTA...) and MSI interrupts * @@ -2418,8 +2073,8 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter) IXGBE_WRITE_REG(hw, IXGBE_EITR(0), EITR_INTS_PER_SEC_TO_REG(adapter->eitr_param)); - ixgbe_set_ivar(adapter, 0, 0, 0); - ixgbe_set_ivar(adapter, 1, 0, 0); + ixgbe_set_ivar(adapter, IXGBE_IVAR_RX_QUEUE(0), 0); + ixgbe_set_ivar(adapter, IXGBE_IVAR_TX_QUEUE(0), 0); map_vector_to_rxq(adapter, 0, 0); map_vector_to_txq(adapter, 0, 0); @@ -2435,7 +2090,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter) **/ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter) { - u64 tdba; + u64 tdba, tdwba; struct ixgbe_hw *hw = &adapter->hw; u32 i, j, tdlen, txctrl; @@ -2448,6 +2103,11 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter) IXGBE_WRITE_REG(hw, IXGBE_TDBAL(j), (tdba & DMA_32BIT_MASK)); IXGBE_WRITE_REG(hw, IXGBE_TDBAH(j), (tdba >> 32)); + tdwba = ring->dma + + (ring->count * sizeof(union ixgbe_adv_tx_desc)); + tdwba |= IXGBE_TDWBAL_HEAD_WB_ENABLE; + IXGBE_WRITE_REG(hw, IXGBE_TDWBAL(j), tdwba & DMA_32BIT_MASK); + IXGBE_WRITE_REG(hw, IXGBE_TDWBAH(j), (tdwba >> 32)); IXGBE_WRITE_REG(hw, IXGBE_TDLEN(j), tdlen); IXGBE_WRITE_REG(hw, IXGBE_TDH(j), 0); IXGBE_WRITE_REG(hw, IXGBE_TDT(j), 0); @@ -2460,39 +2120,6 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter) txctrl &= ~IXGBE_DCA_TXCTRL_TX_WB_RO_EN; IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(j), txctrl); } - - if (hw->mac.type == ixgbe_mac_82599EB) { - u32 rttdcs; - - /* disable the arbiter while setting MTQC */ - rttdcs = IXGBE_READ_REG(hw, IXGBE_RTTDCS); - rttdcs |= IXGBE_RTTDCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, rttdcs); - - /* set transmit pool layout */ - switch (adapter->flags & - (IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_DCB_ENABLED)) - { - - case (IXGBE_FLAG_VMDQ_ENABLED): - IXGBE_WRITE_REG(hw, IXGBE_MTQC, - (IXGBE_MTQC_VT_ENA | IXGBE_MTQC_64VF)); - break; - - case (IXGBE_FLAG_DCB_ENABLED): - IXGBE_WRITE_REG(hw, IXGBE_MTQC, - (IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ)); - break; - - default: - IXGBE_WRITE_REG(hw, IXGBE_MTQC, IXGBE_MTQC_64Q_1PB); - break; - } - - /* re-eable the arbiter */ - rttdcs &= ~IXGBE_RTTDCS_ARBDIS; - IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, rttdcs); - } } #define IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT 2 @@ -2501,41 +2128,24 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, int index) { struct ixgbe_ring *rx_ring; u32 srrctl; - int queue0 = 0; + int queue0; unsigned long mask; - struct ixgbe_ring_feature *feature = adapter->ring_feature; - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { - int dcb_i = feature[RING_F_DCB].indices; - if (dcb_i == 8) - queue0 = index >> 4; - else if (dcb_i == 4) - queue0 = index >> 5; - else - DPRINTK(PROBE, ERR, "Invalid DCB configuration"); - } else { - queue0 = index; - } + /* program one srrctl register per VMDq index */ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { + long shift, len; + mask = (unsigned long) adapter->ring_feature[RING_F_VMDQ].mask; + len = sizeof(adapter->ring_feature[RING_F_VMDQ].mask) * 8; + shift = find_first_bit(&mask, len); + queue0 = (index & mask); + index = (index & mask) >> shift; + /* if VMDq is not active we must program one srrctl register per + * RSS queue since we have enabled RDRXCTL.MVMEN + */ } else { - /* program one srrctl register per VMDq index */ - if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { - long shift, len; 
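/*
 * Aside, illustrative only -- not part of the patch.  The SRRCTL size
 * fields programmed in this function are encoded values, not byte counts;
 * assuming the usual 82598 register layout (packet buffer size in 1 KB
 * units in the low bits, header buffer size in 64-byte units starting at
 * bit 8), the shifts below work out as:
 *
 *	IXGBE_RXBUFFER_2048 >> IXGBE_SRRCTL_BSIZEPKT_SHIFT
 *		(2048 >> 10 == 2, i.e. a 2 KB packet buffer)
 *	IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT
 *		(256 << 2 == 0x400, i.e. 256/64 == 4 placed at bit 8)
 */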
- mask = (unsigned long) feature[RING_F_VMDQ].mask; - len = sizeof(feature[RING_F_VMDQ].mask) * 8; - shift = find_first_bit(&mask, len); - queue0 = (index & mask); - index = (index & mask) >> shift; - } else { - /* - * if VMDq is not active we must program one srrctl - * register per RSS queue since we have enabled - * RDRXCTL.MVMEN - */ - mask = (unsigned long) feature[RING_F_RSS].mask; - queue0 = index & mask; - index = index & mask; - } + mask = (unsigned long) adapter->ring_feature[RING_F_RSS].mask; + queue0 = index & mask; + index = index & mask; } rx_ring = &adapter->rx_ring[queue0]; @@ -2545,63 +2155,54 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, int index) srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK; srrctl &= ~IXGBE_SRRCTL_BSIZEPKT_MASK; - srrctl |= (IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) & - IXGBE_SRRCTL_BSIZEHDR_MASK; - if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) { -#if (PAGE_SIZE / 2) > IXGBE_MAX_RXBUFFER - srrctl |= IXGBE_MAX_RXBUFFER >> IXGBE_SRRCTL_BSIZEPKT_SHIFT; -#else - srrctl |= (PAGE_SIZE / 2) >> IXGBE_SRRCTL_BSIZEPKT_SHIFT; -#endif + srrctl |= IXGBE_RXBUFFER_2048 >> IXGBE_SRRCTL_BSIZEPKT_SHIFT; srrctl |= IXGBE_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS; + srrctl |= ((IXGBE_RX_HDR_SIZE << + IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) & + IXGBE_SRRCTL_BSIZEHDR_MASK); } else { - srrctl |= ALIGN(rx_ring->rx_buf_len, 1024) >> - IXGBE_SRRCTL_BSIZEPKT_SHIFT; srrctl |= IXGBE_SRRCTL_DESCTYPE_ADV_ONEBUF; + + if (rx_ring->rx_buf_len == MAXIMUM_ETHERNET_VLAN_SIZE) + srrctl |= IXGBE_RXBUFFER_2048 >> + IXGBE_SRRCTL_BSIZEPKT_SHIFT; + else + srrctl |= rx_ring->rx_buf_len >> + IXGBE_SRRCTL_BSIZEPKT_SHIFT; } IXGBE_WRITE_REG(&adapter->hw, IXGBE_SRRCTL(index), srrctl); } - -static u32 ixgbe_setup_mrqc(struct ixgbe_adapter *adapter) +#ifndef IXGBE_NO_INET_LRO +/** + * ixgbe_get_skb_hdr - helper function for LRO header processing + * @skb: pointer to sk_buff to be added to LRO packet + * @iphdr: pointer to ip header structure + * @tcph: pointer to tcp header structure + * @hdr_flags: pointer to header flags + * @priv: private data + **/ +static int ixgbe_get_skb_hdr(struct sk_buff *skb, void **iphdr, void **tcph, + u64 *hdr_flags, void *priv) { - u32 mrqc = 0; - int mask; - - if (!(adapter->hw.mac.type == ixgbe_mac_82599EB)) - return mrqc; + union ixgbe_adv_rx_desc *rx_desc = priv; - mask = adapter->flags & (IXGBE_FLAG_RSS_ENABLED - | IXGBE_FLAG_DCB_ENABLED - | IXGBE_FLAG_VMDQ_ENABLED - ); - - switch (mask) { - case (IXGBE_FLAG_RSS_ENABLED): - mrqc = IXGBE_MRQC_RSSEN; - break; - case (IXGBE_FLAG_VMDQ_ENABLED): - mrqc = IXGBE_MRQC_VMDQEN; - break; - case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED): - if (adapter->ring_feature[RING_F_RSS].indices == 4) - mrqc = IXGBE_MRQC_VMDQRSS32EN; - else if (adapter->ring_feature[RING_F_RSS].indices == 2) - mrqc = IXGBE_MRQC_VMDQRSS64EN; - else - mrqc = IXGBE_MRQC_VMDQEN; - break; - case (IXGBE_FLAG_DCB_ENABLED): - mrqc = IXGBE_MRQC_RT8TCEN; - break; - default: - break; - } + /* Verify that this is a valid IPv4 TCP packet */ + if (!((ixgbe_get_pkt_info(rx_desc) & IXGBE_RXDADV_PKTTYPE_IPV4) && + (ixgbe_get_pkt_info(rx_desc) & IXGBE_RXDADV_PKTTYPE_TCP))) + return -1; - return mrqc; + /* Set network headers */ + skb_reset_network_header(skb); + skb_set_transport_header(skb, ip_hdrlen(skb)); + *iphdr = ip_hdr(skb); + *tcph = tcp_hdr(skb); + *hdr_flags = LRO_IPV4 | LRO_TCP; + return 0; } +#endif /* IXGBE_NO_INET_LRO */ /** * ixgbe_configure_rx - Configure 8259x Receive Unit after Reset * @adapter: board private structure @@ -2620,15 
+2221,18 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter) 0xA54F2BEC, 0xEA49AF7C, 0xE214AD3D, 0xB855AABE, 0x6A3E67EA, 0x14364D17, 0x3BED200D}; u32 fctrl, hlreg0; - u32 reta = 0, mrqc = 0; + u32 reta = 0, mrqc; u32 vmdctl; - int pool; u32 rdrxctl; -#ifndef IXGBE_NO_HW_RSC - u32 rscctrl; -#endif /* IXGBE_NO_HW_RSC */ int rx_buf_len; +#ifndef IXGBE_NO_LRO + adapter->lro_data.max = lromax; + + if (lromax * netdev->mtu > (1 << 16)) + adapter->lro_data.max = ((1 << 16) / netdev->mtu) - 1; + +#endif /* Decide whether to use packet split mode or not */ if (netdev->mtu > ETH_DATA_LEN) { if (adapter->flags & IXGBE_FLAG_RX_PS_CAPABLE) @@ -2645,22 +2249,8 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter) /* Set the RX buffer length according to the mode */ if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) { rx_buf_len = IXGBE_RX_HDR_SIZE; - if (hw->mac.type == ixgbe_mac_82599EB) { - /* PSRTYPE must be initialized in 82599 */ - u32 psrtype = IXGBE_PSRTYPE_TCPHDR | - IXGBE_PSRTYPE_UDPHDR | - IXGBE_PSRTYPE_IPV4HDR | - IXGBE_PSRTYPE_IPV6HDR | - IXGBE_PSRTYPE_L2HDR; - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(0), psrtype); - } } else { -#ifndef IXGBE_NO_HW_RSC - if (!(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) && - (netdev->mtu <= ETH_DATA_LEN)) -#else if (netdev->mtu <= ETH_DATA_LEN) -#endif /* IXGBE_NO_HW_RSC */ rx_buf_len = MAXIMUM_ETHERNET_VLAN_SIZE; else rx_buf_len = ALIGN(max_frame, 1024); @@ -2679,37 +2269,13 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter) hlreg0 |= IXGBE_HLREG0_JUMBOEN; IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0); + rdlen = adapter->rx_ring[0].count * sizeof(union ixgbe_adv_rx_desc); /* disable receives while setting up the descriptors */ rxctrl = IXGBE_READ_REG(hw, IXGBE_RXCTRL); IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, rxctrl & ~IXGBE_RXCTRL_RXEN); - if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && - (hw->mac.type == ixgbe_mac_82599EB)) { - int pool; - for (pool = 0; pool < adapter->num_rx_pools; pool++) { - u32 vmolr; - - if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) { - u32 psrtype = IXGBE_READ_REG(hw, IXGBE_PSRTYPE(pool)); - psrtype |= (adapter->num_rx_queues_per_pool << 29); - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(pool), psrtype); - } - - /* - * accept untagged packets until a vlan tag - * is specifically set for the VMDQ queue/pool - */ - vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(pool)); - vmolr |= IXGBE_VMOLR_AUPE; - IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr); - } - } - - rdlen = adapter->rx_ring[0].count * sizeof(union ixgbe_adv_rx_desc); - /* - * Setup the HW Rx Head and Tail Descriptor Pointers and - * the Base and Length of the Rx Descriptor Ring - */ + /* Setup the HW Rx Head and Tail Descriptor Pointers and + * the Base and Length of the Rx Descriptor Ring */ for (i = 0; i < adapter->num_rx_queues; i++) { rdba = adapter->rx_ring[i].dma; j = adapter->rx_ring[i].reg_idx; @@ -2720,51 +2286,60 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter) IXGBE_WRITE_REG(hw, IXGBE_RDT(j), 0); adapter->rx_ring[i].head = IXGBE_RDH(j); adapter->rx_ring[i].tail = IXGBE_RDT(j); - adapter->rx_ring[i].rx_buf_len = rx_buf_len; +#ifndef CONFIG_XEN_NETDEV2_VMQ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { + /* Reserve VMDq set 1 for FCoE, using 3k buffers */ + if ((i & adapter->ring_feature[RING_F_VMDQ].mask) == 1) + adapter->rx_ring[i].rx_buf_len = 3072; + else + adapter->rx_ring[i].rx_buf_len = rx_buf_len; + } else { + adapter->rx_ring[i].rx_buf_len = rx_buf_len; + } +#else + adapter->rx_ring[i].rx_buf_len = rx_buf_len; +#endif /* 
CONFIG_XEN_NETDEV2_VMQ */ + +#ifndef IXGBE_NO_INET_LRO + /* Initial LRO Settings */ + adapter->rx_ring[i].lro_mgr.max_aggr = adapter->lro_max_aggr; + adapter->rx_ring[i].lro_mgr.max_desc = IXGBE_MAX_LRO_DESCRIPTORS; + adapter->rx_ring[i].lro_mgr.get_skb_header = ixgbe_get_skb_hdr; + adapter->rx_ring[i].lro_mgr.features = LRO_F_EXTRACT_VLAN_ID; +#ifdef CONFIG_IXGBE_NAPI + if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) + adapter->rx_ring[i].lro_mgr.features |= LRO_F_NAPI; +#endif + adapter->rx_ring[i].lro_mgr.dev = adapter->netdev; + adapter->rx_ring[i].lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; + adapter->rx_ring[i].lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY; + +#endif ixgbe_configure_srrctl(adapter, j); } - if (hw->mac.type == ixgbe_mac_82598EB) { - /* - * For VMDq support of different descriptor types or - * buffer sizes through the use of multiple SRRCTL - * registers, RDRXCTL.MVMEN must be set to 1 - * - * also, the manual doesn't mention it clearly but DCA hints - * will only use queue 0's tags unless this bit is set. Side - * effects of setting this bit are only that SRRCTL must be - * fully programmed [0..15] - */ - rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL); - rdrxctl |= IXGBE_RDRXCTL_MVMEN; - IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl); - } + /* + * For VMDq support of different descriptor types or + * buffer sizes through the use of multiple SRRCTL + * registers, RDRXCTL.MVMEN must be set to 1 + * + * also, the manual doesn't mention it clearly but DCA hints + * will only use queue 0's tags unless this bit is set. Side + * effects of setting this bit are only that SRRCTL must be + * fully programmed [0..15] + */ + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL); + rdrxctl |= IXGBE_RDRXCTL_MVMEN; + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl); if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { - u32 vt_reg; - u32 vt_reg_bits; - if (hw->mac.type == ixgbe_mac_82599EB) { - vt_reg = IXGBE_VT_CTL; - vt_reg_bits = IXGBE_VMD_CTL_VMDQ_EN - | IXGBE_VT_CTL_REPLEN; - } else { - vt_reg = IXGBE_VMD_CTL; - vt_reg_bits = IXGBE_VMD_CTL_VMDQ_EN; - } - vmdctl = IXGBE_READ_REG(hw, vt_reg); - IXGBE_WRITE_REG(hw, vt_reg, vmdctl | vt_reg_bits); IXGBE_WRITE_REG(hw, IXGBE_MRQC, 0); - - IXGBE_WRITE_REG(hw, IXGBE_VFRE(0), 0xFFFFFFFF); - IXGBE_WRITE_REG(hw, IXGBE_VFRE(1), 0xFFFFFFFF); - IXGBE_WRITE_REG(hw, IXGBE_VFTE(0), 0xFFFFFFFF); - IXGBE_WRITE_REG(hw, IXGBE_VFTE(1), 0xFFFFFFFF); + vmdctl = IXGBE_READ_REG(hw, IXGBE_VMD_CTL); + IXGBE_WRITE_REG(hw, IXGBE_VMD_CTL, + vmdctl | IXGBE_VMD_CTL_VMDQ_EN); } - /* Program MRQC for the distribution of queues */ - mrqc = ixgbe_setup_mrqc(adapter); - if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) { /* Fill out redirection table */ for (i = 0, j = 0; i < 128; i++, j++) { @@ -2781,17 +2356,19 @@ for (i = 0; i < 10; i++) IXGBE_WRITE_REG(hw, IXGBE_RSSRK(i), seed[i]); - if (hw->mac.type == ixgbe_mac_82598EB) - mrqc |= IXGBE_MRQC_RSSEN; + mrqc = IXGBE_MRQC_RSSEN /* Perform hash on these packet types */ - mrqc |= IXGBE_MRQC_RSS_FIELD_IPV4 - | IXGBE_MRQC_RSS_FIELD_IPV4_TCP - | IXGBE_MRQC_RSS_FIELD_IPV4_UDP - | IXGBE_MRQC_RSS_FIELD_IPV6 - | IXGBE_MRQC_RSS_FIELD_IPV6_TCP - | IXGBE_MRQC_RSS_FIELD_IPV6_UDP; + | IXGBE_MRQC_RSS_FIELD_IPV4 + | IXGBE_MRQC_RSS_FIELD_IPV4_TCP + | IXGBE_MRQC_RSS_FIELD_IPV4_UDP + | IXGBE_MRQC_RSS_FIELD_IPV6_EX_TCP + | IXGBE_MRQC_RSS_FIELD_IPV6_EX + | IXGBE_MRQC_RSS_FIELD_IPV6 + | IXGBE_MRQC_RSS_FIELD_IPV6_TCP + | IXGBE_MRQC_RSS_FIELD_IPV6_UDP + | IXGBE_MRQC_RSS_FIELD_IPV6_EX_UDP; + 
IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc); } - IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc); rxcsum = IXGBE_READ_REG(hw, IXGBE_RXCSUM); @@ -2808,71 +2385,6 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter) } IXGBE_WRITE_REG(hw, IXGBE_RXCSUM, rxcsum); - - if (hw->mac.type == ixgbe_mac_82599EB) { - rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL); -#ifndef IXGBE_NO_HW_RSC - if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) - rdrxctl &= ~IXGBE_RDRXCTL_RSCFRSTSIZE; -#endif /* IXGBE_NO_HW_RSC */ - rdrxctl |= IXGBE_RDRXCTL_CRCSTRIP; - IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl); - } - -#ifndef IXGBE_NO_HW_RSC - if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) { - /* Enable 82599 HW RSC */ - for (i = 0; i < adapter->num_rx_queues; i++) { - j = adapter->rx_ring[i].reg_idx; - rscctrl = IXGBE_READ_REG(hw, IXGBE_RSCCTL(j)); - rscctrl |= IXGBE_RSCCTL_RSCEN; - /* - * we must limit the number of descriptors so that - * the total size of max desc * buf_len is not greater - * than 65535 - */ - if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED) { -#if (MAX_SKB_FRAGS > 16) - rscctrl |= IXGBE_RSCCTL_MAXDESC_16; -#elif (MAX_SKB_FRAGS > 8) - rscctrl |= IXGBE_RSCCTL_MAXDESC_8; -#elif (MAX_SKB_FRAGS > 4) - rscctrl |= IXGBE_RSCCTL_MAXDESC_4; -#else - rscctrl |= IXGBE_RSCCTL_MAXDESC_1; -#endif - } else { - if (rx_buf_len < IXGBE_RXBUFFER_4096) - rscctrl |= IXGBE_RSCCTL_MAXDESC_16; - else if (rx_buf_len < IXGBE_RXBUFFER_8192) - rscctrl |= IXGBE_RSCCTL_MAXDESC_8; - else - rscctrl |= IXGBE_RSCCTL_MAXDESC_4; - } - - if (adapter->num_rx_queues_per_pool == 1) - pool = j / 2; - else - pool = j / adapter->num_rx_queues_per_pool; - - if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(pool), - (IXGBE_READ_REG(hw, IXGBE_PSRTYPE(pool)) | - IXGBE_PSRTYPE_TCPHDR)); - - IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(j), rscctrl); - - } - /* Enable TCP header recognition in PSRTYPE */ - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(0), - (IXGBE_READ_REG(hw, IXGBE_PSRTYPE(0)) | - IXGBE_PSRTYPE_TCPHDR)); - - /* Disable RSC for ACK packets */ - IXGBE_WRITE_REG(hw, IXGBE_RSCDBU, - (IXGBE_RSCDBU_RSCACKDIS | IXGBE_READ_REG(hw, IXGBE_RSCDBU))); - } -#endif /* IXGBE_NO_HW_RSC */ } #ifdef NETIF_F_HW_VLAN_TX @@ -2881,55 +2393,43 @@ static void ixgbe_vlan_rx_register(struct net_device *netdev, { struct ixgbe_adapter *adapter = netdev_priv(netdev); u32 ctrl; - int i, j; if (!test_bit(__IXGBE_DOWN, &adapter->state)) ixgbe_irq_disable(adapter); adapter->vlgrp = grp; - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - /* always enable VLAN tag insert/strip */ - ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_VLNCTRL); - ctrl |= IXGBE_VLNCTRL_VME | IXGBE_VLNCTRL_VFE; - ctrl &= ~IXGBE_VLNCTRL_CFIEN; - IXGBE_WRITE_REG(&adapter->hw, IXGBE_VLNCTRL, ctrl); - } else if (adapter->hw.mac.type == ixgbe_mac_82599EB) { + /* + * For a DCB driver, always enable VLAN tag stripping so we can + * still receive traffic from a DCB-enabled host. 
+ */ + ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_VLNCTRL); + ctrl |= IXGBE_VLNCTRL_VME; + ctrl &= ~IXGBE_VLNCTRL_CFIEN; + IXGBE_WRITE_REG(&adapter->hw, IXGBE_VLNCTRL, ctrl); + + if (grp) { /* enable VLAN tag insert/strip */ ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_VLNCTRL); - ctrl |= IXGBE_VLNCTRL_VFE; + ctrl |= IXGBE_VLNCTRL_VME; ctrl &= ~IXGBE_VLNCTRL_CFIEN; IXGBE_WRITE_REG(&adapter->hw, IXGBE_VLNCTRL, ctrl); - for (i = 0; i < adapter->num_rx_queues; i++) { - j = adapter->rx_ring[i].reg_idx; - ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_RXDCTL(j)); - ctrl |= IXGBE_RXDCTL_VME; - IXGBE_WRITE_REG(&adapter->hw, IXGBE_RXDCTL(j), ctrl); - } } if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable(adapter, true, true); + ixgbe_irq_enable(adapter); } static void ixgbe_vlan_rx_add_vid(struct net_device *netdev, u16 vid) { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - int i; #ifndef HAVE_NETDEV_VLAN_FEATURES struct net_device *v_netdev; #endif /* HAVE_NETDEV_VLAN_FEATURES */ /* add VID to filter table */ - if (hw->mac.ops.set_vfta) { + if (hw->mac.ops.set_vfta) hw->mac.ops.set_vfta(hw, vid, 0, true); - if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && - (adapter->hw.mac.type == ixgbe_mac_82599EB)) { - /* enable vlan id for all pools */ - for (i = 1; i < adapter->num_rx_pools; i++) - hw->mac.ops.set_vfta(hw, vid, i, true); - } - } #ifndef HAVE_NETDEV_VLAN_FEATURES /* * Copy feature flags from netdev to the vlan netdev for this vid. @@ -2945,11 +2445,6 @@ static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid) { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; - int i; - - /* User is not allowed to remove vlan ID 0 */ - if (!vid) - return; if (!test_bit(__IXGBE_DOWN, &adapter->state)) ixgbe_irq_disable(adapter); @@ -2957,29 +2452,16 @@ static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid) vlan_group_set_device(adapter->vlgrp, vid, NULL); if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable(adapter, true, true); + ixgbe_irq_enable(adapter); /* remove VID from filter table */ - if (hw->mac.ops.set_vfta) { + if (hw->mac.ops.set_vfta) hw->mac.ops.set_vfta(hw, vid, 0, false); - if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && - (adapter->hw.mac.type == ixgbe_mac_82599EB)) { - /* remove vlan id from all pools */ - for (i = 1; i < adapter->num_rx_pools; i++) - hw->mac.ops.set_vfta(hw, vid, i, false); - } - } } static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter) { - struct ixgbe_hw *hw = &adapter->hw; - ixgbe_vlan_rx_register(adapter->netdev, adapter->vlgrp); - /* add vlan ID 0 so we always accept priority-tagged traffic */ - if (hw->mac.ops.set_vfta) - hw->mac.ops.set_vfta(hw, 0, 0, true); - if (adapter->vlgrp) { u16 vid; for (vid = 0; vid < VLAN_GROUP_ARRAY_LEN; vid++) { @@ -2991,11 +2473,44 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter) } #endif +#ifndef CONFIG_XEN_NETDEV2_VMQ +/** + * compare_ether_oui - Compare two OUIs + * @addr1: pointer to a 6 byte array containing an Ethernet address + * @addr2: pointer to a 6 byte array containing an Ethernet address + * + * Compare the Organizationally Unique Identifiers from two Ethernet addresses, + * returns 0 if equal + */ +static inline int compare_ether_oui(const u8 *a, const u8 *b) +{ + return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0; +} + +/** + * is_fcoe_ether_addr - Compare an Ethernet address to FCoE OUI + * @addr1: pointer to a 6 byte array containing an Ethernet address + * 
@addr2: pointer to a 6 byte array containing an Ethernet address + * + * Compare the Organizationally Unique Identifier from an Ethernet addresses + * with the well known Fibre Channel over Ethernet OUI + * + * Returns 1 if the address has an FCoE OUI + */ +static inline int is_fcoe_ether_addr(const u8 *addr) +{ + static const u8 fcoe_oui[] = { 0x0e, 0xfc, 0x00 }; + return compare_ether_oui(addr, fcoe_oui) == 0; +} +#endif /* CONFIG_XEN_NETDEV2_VMQ */ + static u8 *ixgbe_addr_list_itr(struct ixgbe_hw *hw, u8 **mc_addr_ptr, u32 *vmdq) { +#ifndef CONFIG_XEN_NETDEV2_VMQ + struct ixgbe_adapter *adapter = hw->back; +#endif struct dev_mc_list *mc_ptr; u8 *addr = *mc_addr_ptr; - *vmdq = 0; mc_ptr = container_of(addr, struct dev_mc_list, dmi_addr[0]); @@ -3003,7 +2518,27 @@ static u8 *ixgbe_addr_list_itr(struct ixgbe_hw *hw, u8 **mc_addr_ptr, u32 *vmdq) *mc_addr_ptr = mc_ptr->next->dmi_addr; else *mc_addr_ptr = NULL; - +#ifndef CONFIG_XEN_NETDEV2_VMQ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { + /* VMDQ set 1 is used for FCoE */ + if (adapter->ring_feature[RING_F_VMDQ].indices) + *vmdq = is_fcoe_ether_addr(addr) ? 1 : 0; + if (*vmdq == 1) { + u32 hlreg0, mhadd; + + /* Make sure that jumbo frames are enabled */ + hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0); + hlreg0 |= IXGBE_HLREG0_JUMBOEN; + IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0); + + /* set the max frame size to pass receive filtering */ + mhadd = IXGBE_READ_REG(hw, IXGBE_MHADD); + mhadd &= IXGBE_MHADD_MFS_MASK; + mhadd |= 3072 << IXGBE_MHADD_MFS_SHIFT; + IXGBE_WRITE_REG(hw, IXGBE_MHADD, mhadd); + } + } +#endif return addr; } @@ -3042,7 +2577,6 @@ static void ixgbe_set_rx_mode(struct net_device *netdev) } vlnctrl |= IXGBE_VLNCTRL_VFE; hw->addr_ctrl.user_set_promisc = 0; - fctrl &= ~(IXGBE_FCTRL_UPE | IXGBE_FCTRL_MPE); } IXGBE_WRITE_REG(hw, IXGBE_FCTRL, fctrl); @@ -3080,20 +2614,17 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter) for (q_idx = 0; q_idx < q_vectors; q_idx++) { struct napi_struct *napi; - q_vector = adapter->q_vector[q_idx]; + q_vector = &adapter->q_vector[q_idx]; + if (!q_vector->rxr_count) + continue; napi = &q_vector->napi; - if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { - if (!q_vector->rxr_count || !q_vector->txr_count) { - if (q_vector->txr_count == 1) - napi->poll = &ixgbe_clean_txonly; - else if (q_vector->rxr_count == 1) - napi->poll = &ixgbe_clean_rxonly; - } - } + if ((adapter->flags & IXGBE_FLAG_MSIX_ENABLED) && + (q_vector->rxr_count > 1)) + napi->poll = &ixgbe_clean_rxonly_many; napi_enable(napi); } -#endif /* CONFIG_IXGBE_NAPI */ +#endif } static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter) @@ -3108,7 +2639,9 @@ static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter) q_vectors = 1; for (q_idx = 0; q_idx < q_vectors; q_idx++) { - q_vector = adapter->q_vector[q_idx]; + q_vector = &adapter->q_vector[q_idx]; + if (!q_vector->rxr_count) + continue; napi_disable(&q_vector->napi); } #endif @@ -3151,88 +2684,14 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter) } /* Enable VLAN tag insert/strip */ vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL); - if (hw->mac.type == ixgbe_mac_82598EB) { - vlnctrl |= IXGBE_VLNCTRL_VME | IXGBE_VLNCTRL_VFE; - vlnctrl &= ~IXGBE_VLNCTRL_CFIEN; - IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl); - } else if (hw->mac.type == ixgbe_mac_82599EB) { - vlnctrl |= IXGBE_VLNCTRL_VFE; - vlnctrl &= ~IXGBE_VLNCTRL_CFIEN; - IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl); - for (i = 0; i < adapter->num_rx_queues; i++) { - j = adapter->rx_ring[i].reg_idx; - 
vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j)); - vlnctrl |= IXGBE_RXDCTL_VME; - IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(j), vlnctrl); - } - } + vlnctrl |= IXGBE_VLNCTRL_VME | IXGBE_VLNCTRL_VFE; + vlnctrl &= ~IXGBE_VLNCTRL_CFIEN; + IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl); if (hw->mac.ops.set_vfta) hw->mac.ops.set_vfta(hw, 0, 0, true); } #ifndef IXGBE_NO_LLI -static void ixgbe_configure_lli_82599(struct ixgbe_adapter *adapter) -{ - u16 port; - - if (adapter->lli_etype) { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_L34T_IMIR(0), - (IXGBE_IMIR_LLI_EN_82599 | IXGBE_IMIR_SIZE_BP_82599 | - IXGBE_IMIR_CTRL_BP_82599)); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_ETQS(0), IXGBE_ETQS_LLI); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_ETQF(0), - (adapter->lli_etype | IXGBE_ETQF_FILTER_EN)); - } - - if (adapter->lli_port) { - port = ntohs((u16)adapter->lli_port); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_L34T_IMIR(0), - (IXGBE_IMIR_LLI_EN_82599 | IXGBE_IMIR_SIZE_BP_82599 | - IXGBE_IMIR_CTRL_BP_82599)); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_FTQF(0), - (IXGBE_FTQF_POOL_MASK_EN | - (IXGBE_FTQF_PRIORITY_MASK << - IXGBE_FTQF_PRIORITY_SHIFT) | - (IXGBE_FTQF_DEST_PORT_MASK << - IXGBE_FTQF_5TUPLE_MASK_SHIFT))); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_SDPQF(0), (port << 16)); - } - - if (adapter->flags & IXGBE_FLAG_LLI_PUSH) { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_L34T_IMIR(0), - (IXGBE_IMIR_LLI_EN_82599 | IXGBE_IMIR_SIZE_BP_82599 | - IXGBE_IMIR_CTRL_PSH_82599 | IXGBE_IMIR_CTRL_SYN_82599 | - IXGBE_IMIR_CTRL_URG_82599 | IXGBE_IMIR_CTRL_ACK_82599 | - IXGBE_IMIR_CTRL_RST_82599 | IXGBE_IMIR_CTRL_FIN_82599)); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_FTQF(0), - (IXGBE_FTQF_POOL_MASK_EN | - (IXGBE_FTQF_PRIORITY_MASK << - IXGBE_FTQF_PRIORITY_SHIFT) | - (IXGBE_FTQF_5TUPLE_MASK_MASK << - IXGBE_FTQF_5TUPLE_MASK_SHIFT))); - - IXGBE_WRITE_REG(&adapter->hw, IXGBE_LLITHRESH, 0xfc000000); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_SYNQF, 0x80000100); - } - - if (adapter->lli_size) { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_L34T_IMIR(0), - (IXGBE_IMIR_LLI_EN_82599 | IXGBE_IMIR_CTRL_BP_82599)); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_LLITHRESH, adapter->lli_size); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_FTQF(0), - (IXGBE_FTQF_POOL_MASK_EN | - (IXGBE_FTQF_PRIORITY_MASK << - IXGBE_FTQF_PRIORITY_SHIFT) | - (IXGBE_FTQF_5TUPLE_MASK_MASK << - IXGBE_FTQF_5TUPLE_MASK_SHIFT))); - } - - if (adapter->lli_vlan_pri) { - IXGBE_WRITE_REG(&adapter->hw, IXGBE_IMIRVP, - (IXGBE_IMIRVP_PRIORITY_EN | adapter->lli_vlan_pri)); - } -} - static void ixgbe_configure_lli(struct ixgbe_adapter *adapter) { u16 port; @@ -3270,7 +2729,6 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter) { struct net_device *netdev = adapter->netdev; int i; - struct ixgbe_hw *hw = &adapter->hw; ixgbe_set_rx_mode(netdev); @@ -3284,116 +2742,12 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter) netif_set_gso_max_size(netdev, 65536); } - if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) - ixgbe_init_fdir_signature_82599(hw, adapter->fdir_pballoc); - else if (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE) - ixgbe_init_fdir_perfect_82599(hw, adapter->fdir_pballoc); - ixgbe_configure_tx(adapter); ixgbe_configure_rx(adapter); - for (i = 0; i < adapter->num_rx_queues; i++) { - struct ixgbe_ring *ring = &adapter->rx_ring[i]; - ixgbe_alloc_rx_buffers(adapter, ring, IXGBE_DESC_UNUSED(ring)); - } -} - -static inline bool ixgbe_is_sfp(struct ixgbe_hw *hw) -{ - switch (hw->phy.type) { - case ixgbe_phy_sfp_avago: - case ixgbe_phy_sfp_ftl: - case ixgbe_phy_sfp_intel: - case ixgbe_phy_sfp_unknown: - case 
ixgbe_phy_tw_tyco:
-	case ixgbe_phy_tw_unknown:
-		return true;
-	default:
-		return false;
-	}
-}
-
-/**
- * ixgbe_sfp_link_config - set up SFP+ link
- * @adapter: pointer to private adapter struct
- **/
-static void ixgbe_sfp_link_config(struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	if (hw->phy.multispeed_fiber) {
-		/*
-		 * In multispeed fiber setups, the device may not have
-		 * had a physical connection when the driver loaded.
-		 * If that's the case, the initial link configuration
-		 * couldn't get the MAC into 10G or 1G mode, so we'll
-		 * never have a link status change interrupt fire.
-		 * We need to try and force an autonegotiation
-		 * session, then bring up link.
-		 */
-		hw->mac.ops.setup_sfp(hw);
-		if (!(adapter->flags & IXGBE_FLAG_IN_SFP_LINK_TASK))
-			schedule_work(&adapter->multispeed_fiber_task);
-	} else {
-		/*
-		 * Direct Attach Cu and non-multispeed fiber modules
-		 * still need to be configured properly prior to
-		 * attempting link.
-		 */
-		if (!(adapter->flags & IXGBE_FLAG_IN_SFP_MOD_TASK))
-			schedule_work(&adapter->sfp_config_module_task);
-	}
-}
-
-/**
- * ixgbe_non_sfp_link_config - set up non-SFP+ link
- * @hw: pointer to private hardware struct
- *
- * Returns 0 on success, negative on failure
- **/
-static int ixgbe_non_sfp_link_config(struct ixgbe_hw *hw)
-{
-	u32 autoneg;
-	bool link_up = false;
-	u32 ret = IXGBE_ERR_LINK_SETUP;
-
-	if (hw->mac.ops.check_link)
-		ret = hw->mac.ops.check_link(hw, &autoneg, &link_up, false);
-
-	if (ret)
-		goto link_cfg_out;
-
-	if (hw->mac.ops.get_link_capabilities)
-		ret = hw->mac.ops.get_link_capabilities(hw, &autoneg,
-							&hw->mac.autoneg);
-	if (ret)
-		goto link_cfg_out;
-
-	if (hw->mac.ops.setup_link_speed)
-		ret = hw->mac.ops.setup_link_speed(hw, autoneg, true, link_up);
-link_cfg_out:
-	return ret;
-}
-
-#define IXGBE_MAX_RX_DESC_POLL 10
-static inline void ixgbe_rx_desc_queue_enable(struct ixgbe_adapter *adapter,
-					      int rxr)
-{
-	int j = adapter->rx_ring[rxr].reg_idx;
-	int k;
-
-	for (k = 0; k < IXGBE_MAX_RX_DESC_POLL; k++) {
-		if (IXGBE_READ_REG(&adapter->hw,
-				   IXGBE_RXDCTL(j)) & IXGBE_RXDCTL_ENABLE)
-			break;
-		else
-			msleep(1);
-	}
-	if (k >= IXGBE_MAX_RX_DESC_POLL) {
-		DPRINTK(DRV, ERR, "RXDCTL.ENABLE on Rx queue %d "
-			"not set within the polling period\n", rxr);
-	}
-	ixgbe_release_rx_desc(&adapter->hw, &adapter->rx_ring[rxr],
-			      (adapter->rx_ring[rxr].count - 1));
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		if (adapter->rx_ring[i].active)
+			ixgbe_alloc_rx_buffers(adapter, &adapter->rx_ring[i],
+					IXGBE_DESC_UNUSED(&adapter->rx_ring[i]));
 }

 static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
@@ -3401,14 +2755,11 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	int i, j = 0;
-	int num_rx_rings = adapter->num_rx_queues;
 	int max_frame = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
-	int err;
 #ifdef IXGBE_TCP_TIMER
 	u32 tcp_timer;
 #endif
 	u32 txdctl, rxdctl, mhadd;
-	u32 dmatxctl;
 	u32 gpie;

 	ixgbe_get_hw_control(adapter);
@@ -3450,20 +2801,13 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
 	}
 #endif

-	/* Enable fan failure interrupt */
+	/* Enable fan failure interrupt if media type is copper */
 	if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE) {
 		gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 		gpie |= IXGBE_SDP1_GPIEN;
 		IXGBE_WRITE_REG(hw, IXGBE_GPIE, gpie);
 	}
-	if (hw->mac.type == ixgbe_mac_82599EB) {
-		gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
-		gpie |= IXGBE_SDP1_GPIEN;
-		gpie |= IXGBE_SDP2_GPIEN;
-
IXGBE_WRITE_REG(hw, IXGBE_GPIE, gpie); - } - mhadd = IXGBE_READ_REG(hw, IXGBE_MHADD); if (max_frame != (mhadd >> IXGBE_MHADD_MFS_SHIFT)) { mhadd &= ~IXGBE_MHADD_MFS_MASK; @@ -3477,42 +2821,25 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter) txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(j)); /* enable WTHRESH=8 descriptors, to encourage burst writeback */ txdctl |= (8 << 16); - IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j), txdctl); - } - - if (hw->mac.type == ixgbe_mac_82599EB) { - /* DMATXCTL.EN must be set after all Tx queue config is done */ - dmatxctl = IXGBE_READ_REG(hw, IXGBE_DMATXCTL); - dmatxctl |= IXGBE_DMATXCTL_TE; - IXGBE_WRITE_REG(hw, IXGBE_DMATXCTL, dmatxctl); - } - - for (i = 0; i < adapter->num_tx_queues; i++) { - j = adapter->tx_ring[i].reg_idx; - txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(j)); txdctl |= IXGBE_TXDCTL_ENABLE; IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j), txdctl); } - for (i = 0; i < num_rx_rings; i++) { + for (i = 0; i < adapter->num_rx_queues; i++) { j = adapter->rx_ring[i].reg_idx; rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j)); /* enable PTHRESH=32 descriptors (half the internal cache) * and HTHRESH=0 descriptors (to minimize latency on fetch), * this also removes a pesky rx_no_buffer_count increment */ rxdctl |= 0x0020; - rxdctl |= IXGBE_RXDCTL_ENABLE; + if (adapter->rx_ring[i].active) + rxdctl |= IXGBE_RXDCTL_ENABLE; IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(j), rxdctl); - if (hw->mac.type == ixgbe_mac_82599EB) - ixgbe_rx_desc_queue_enable(adapter, i); } /* enable all receives */ rxdctl = IXGBE_READ_REG(hw, IXGBE_RXCTRL); - if (hw->mac.type == ixgbe_mac_82598EB) - rxdctl |= (IXGBE_RXCTRL_DMBYPS | IXGBE_RXCTRL_RXEN); - else - rxdctl |= IXGBE_RXCTRL_RXEN; - ixgbe_enable_rx_dma(hw, rxdctl); + rxdctl |= (IXGBE_RXCTRL_DMBYPS | IXGBE_RXCTRL_RXEN); + IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, rxdctl); if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) ixgbe_configure_msix(adapter); @@ -3521,51 +2848,23 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter) #ifndef IXGBE_NO_LLI /* lli should only be enabled with MSI-X and MSI */ if (adapter->flags & IXGBE_FLAG_MSI_ENABLED || - adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { - if (adapter->hw.mac.type == ixgbe_mac_82599EB) - ixgbe_configure_lli_82599(adapter); - else + adapter->flags & IXGBE_FLAG_MSIX_ENABLED) ixgbe_configure_lli(adapter); - } - #endif + clear_bit(__IXGBE_DOWN, &adapter->state); ixgbe_napi_enable_all(adapter); - /* - * For hot-pluggable SFP+ devices, a new SFP+ module may have - * arrived before interrupts were enabled. We need to kick off - * the SFP+ module setup first, then try to bring up link. - * If we''re not hot-pluggable SFP+, we just need to configure link - * and bring it up. 
- */ - err = hw->phy.ops.identify_sfp(hw); - if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { - DPRINTK(PROBE, ERR, "failed to load because an " - "unsupported SFP+ module type was detected.\n"); - ixgbe_down(adapter); - return err; - } - - if (ixgbe_is_sfp(hw)) { - ixgbe_sfp_link_config(adapter); - } else { - err = ixgbe_non_sfp_link_config(hw); - if (err) - DPRINTK(PROBE, ERR, "link_config FAILED %d\n", err); - } + /* clear any pending interrupts, may auto mask */ + IXGBE_READ_REG(hw, IXGBE_EICR); - /* enable transmits */ - netif_tx_start_all_queues(netdev); + ixgbe_irq_enable(adapter); /* bring the link up in the watchdog, this could race with our first * link up interrupt but shouldn''t be a problem */ adapter->flags |= IXGBE_FLAG_NEED_LINK_UPDATE; adapter->link_check_timeout = jiffies; mod_timer(&adapter->watchdog_timer, jiffies); - for (i = 0; i < adapter->num_tx_queues; i++) - set_bit(__IXGBE_FDIR_INIT_DONE, - &(adapter->tx_ring[i].reinit_state)); return 0; } @@ -3581,44 +2880,16 @@ void ixgbe_reinit_locked(struct ixgbe_adapter *adapter) int ixgbe_up(struct ixgbe_adapter *adapter) { - int err; - struct ixgbe_hw *hw = &adapter->hw; - ixgbe_configure(adapter); - err = ixgbe_up_complete(adapter); - - /* clear any pending interrupts, may auto mask */ - IXGBE_READ_REG(hw, IXGBE_EICR); - ixgbe_irq_enable(adapter, true, true); - - return err; + return ixgbe_up_complete(adapter); } void ixgbe_reset(struct ixgbe_adapter *adapter) { struct ixgbe_hw *hw = &adapter->hw; - int err; - - err = hw->mac.ops.init_hw(hw); - switch (err) { - case 0: - case IXGBE_ERR_SFP_NOT_PRESENT: - break; - case IXGBE_ERR_MASTER_REQUESTS_PENDING: - DPRINTK(HW, INFO, "master disable timed out\n"); - break; - case IXGBE_ERR_EEPROM_VERSION: - /* We are running on a pre-production device, log a warning */ - DPRINTK(PROBE, INFO, "This device is a pre-production adapter/" - "LOM. Please be aware there may be issues associated " - "with your hardware. If you are experiencing problems " - "please contact your Intel or hardware representative " - "who provided you with this hardware.\n"); - break; - default: - DPRINTK(PROBE, ERR, "Hardware Error: %d\n", err); - } + if (hw->mac.ops.init_hw(hw)) + DPRINTK(PROBE, ERR, "Hardware Error\n"); /* reprogram the RAR[0] in case user changed it. 
*/
 	if (hw->mac.ops.set_rar)
@@ -3643,21 +2914,28 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_rx_buffer *rx_buffer_info;

 		rx_buffer_info = &rx_ring->rx_buffer_info[i];
+		if (rx_buffer_info->skb) {
+#ifdef CONFIG_XEN_NETDEV2_VMQ
+			if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) &&
+			    rx_ring->queue_index) {
+				pci_unmap_page(pdev, rx_buffer_info->dma,
+					       PAGE_SIZE,
+					       PCI_DMA_FROMDEVICE);
+				vmq_free_skb(rx_buffer_info->skb,
+					     rx_ring->queue_index);
+				rx_buffer_info->dma = 0;
+			} else
+#endif
+				dev_kfree_skb(rx_buffer_info->skb);
+			rx_buffer_info->skb = NULL;
+		}
+
 		if (rx_buffer_info->dma) {
 			pci_unmap_single(pdev, rx_buffer_info->dma,
-					 rx_ring->rx_buf_len,
+					 rx_ring->rx_buf_len + NET_IP_ALIGN,
 					 PCI_DMA_FROMDEVICE);
 			rx_buffer_info->dma = 0;
 		}
-		if (rx_buffer_info->skb) {
-			struct sk_buff *skb = rx_buffer_info->skb;
-			rx_buffer_info->skb = NULL;
-			do {
-				struct sk_buff *this = skb;
-				skb = skb->prev;
-				dev_kfree_skb(this);
-			} while (skb);
-		}
 		if (!rx_buffer_info->page)
 			continue;
 		pci_unmap_page(pdev, rx_buffer_info->page_dma, PAGE_SIZE / 2,
@@ -3677,10 +2955,8 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;

-	if (rx_ring->head)
-		writel(0, adapter->hw.hw_addr + rx_ring->head);
-	if (rx_ring->tail)
-		writel(0, adapter->hw.hw_addr + rx_ring->tail);
+	writel(0, adapter->hw.hw_addr + rx_ring->head);
+	writel(0, adapter->hw.hw_addr + rx_ring->tail);
 }

 /**
@@ -3710,10 +2986,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_adapter *adapter,
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;

-	if (tx_ring->head)
-		writel(0, adapter->hw.hw_addr + tx_ring->head);
-	if (tx_ring->tail)
-		writel(0, adapter->hw.hw_addr + tx_ring->tail);
+	writel(0, adapter->hw.hw_addr + tx_ring->head);
+	writel(0, adapter->hw.hw_addr + tx_ring->tail);
 }

 /**
@@ -3772,9 +3046,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	 * holding */
 	while (adapter->flags & IXGBE_FLAG_IN_WATCHDOG_TASK)
 		msleep(1);
-	if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE ||
-	    adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
-		cancel_work_sync(&adapter->fdir_reinit_task);

 	/* disable transmits in the hardware now that interrupts are off */
 	for (i = 0; i < adapter->num_tx_queues; i++) {
@@ -3783,11 +3054,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j),
 				(txdctl & ~IXGBE_TXDCTL_ENABLE));
 	}
-	/* Disable the Tx DMA engine on 82599 */
-	if (hw->mac.type == ixgbe_mac_82599EB)
-		IXGBE_WRITE_REG(hw, IXGBE_DMATXCTL,
-				(IXGBE_READ_REG(hw, IXGBE_DMATXCTL) &
-				 ~IXGBE_DMATXCTL_TE));

 	netif_carrier_off(netdev);
@@ -3807,7 +3073,15 @@ void ixgbe_down(struct ixgbe_adapter *adapter)

 #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
 	/* since we reset the hardware DCA settings were cleared */
-	ixgbe_setup_dca(adapter);
+	if (adapter->flags & IXGBE_FLAG_DCA_CAPABLE) {
+		if (dca_add_requester(&adapter->pdev->dev) == 0) {
+			adapter->flags |= IXGBE_FLAG_DCA_ENABLED;
+			/* always use CB2 mode, difference is masked
+			 * in the CB driver */
+			IXGBE_WRITE_REG(hw, IXGBE_DCA_CTRL, 2);
+			ixgbe_setup_dca(adapter);
+		}
+	}
 #endif
 }

@@ -3824,7 +3098,7 @@ static int ixgbe_poll(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 	                       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	int tx_clean_complete, work_done = 0;
+	int tx_cleaned, work_done = 0;

 #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) {
@@ -3833,24 +3107,20
@@ static int ixgbe_poll(struct napi_struct *napi, int budget) } #endif - tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring); - ixgbe_clean_rx_irq(q_vector, adapter->rx_ring, &work_done, budget); + tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring); + ixgbe_clean_rx_irq(adapter, adapter->rx_ring, &work_done, budget); - if (!tx_clean_complete) + if (tx_cleaned) work_done = budget; -#ifndef HAVE_NETDEV_NAPI_LIST - if (!netif_running(adapter->netdev)) - work_done = 0; - -#endif /* If no Tx and not enough Rx work done, exit the polling mode */ - if (work_done < budget) { - napi_complete(napi); - if (adapter->itr_setting & 1) + if ((work_done == 0) || !netif_running(adapter->netdev)) { + netif_rx_complete(adapter->netdev, napi); + if (adapter->itr_setting & 3) ixgbe_set_itr(adapter); if (!test_bit(__IXGBE_DOWN, &adapter->state)) - ixgbe_irq_enable_queues(adapter, IXGBE_EIMS_RTX_QUEUE); + ixgbe_irq_enable_queues(adapter); + return 0; } return work_done; } @@ -3883,197 +3153,120 @@ static void ixgbe_reset_task(struct work_struct *work) ixgbe_reinit_locked(adapter); } - -/** - * ixgbe_set_dcb_queues: Allocate queues for a DCB-enabled device - * @adapter: board private structure to initialize - * - * When DCB (Data Center Bridging) is enabled, allocate queues for - * each traffic class. If multiqueue isn''t availabe, then abort DCB - * initialization. - * - **/ -static inline bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter) +static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter) { - bool ret = false; - struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_DCB]; - - if (!(adapter->flags & IXGBE_FLAG_DCB_ENABLED)) - return ret; + int nrq = 1, ntq = 1; + int feature_mask = 0, rss_i, rss_m; + int dcb_i, dcb_m; + int vmdq_i, vmdq_m; + /* Number of supported queues */ + switch (adapter->hw.mac.type) { + case ixgbe_mac_82598EB: + dcb_i = adapter->ring_feature[RING_F_DCB].indices; + dcb_m = 0; + vmdq_i = adapter->ring_feature[RING_F_VMDQ].indices; + vmdq_m = 0; + rss_i = adapter->ring_feature[RING_F_RSS].indices; + rss_m = 0; + feature_mask |= IXGBE_FLAG_DCB_ENABLED; + feature_mask |= IXGBE_FLAG_VMDQ_ENABLED; + feature_mask |= IXGBE_FLAG_RSS_ENABLED; + + switch (adapter->flags & feature_mask) { + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_DCB_ENABLED | + IXGBE_FLAG_VMDQ_ENABLED): + dcb_m = 0x7 << 3; + vmdq_i = min(2, vmdq_i); + vmdq_m = 0x1 << 2; + rss_i = min(4, rss_i); + rss_m = 0x3; + nrq = dcb_i * vmdq_i * rss_i; + ntq = dcb_i * vmdq_i; + break; + case (IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_DCB_ENABLED): + dcb_m = 0x7 << 3; + vmdq_i = min(8, vmdq_i); + vmdq_m = 0x7; + nrq = dcb_i * vmdq_i; + ntq = min(MAX_TX_QUEUES, dcb_i * vmdq_i); + break; + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_DCB_ENABLED): + dcb_m = 0x7 << 3; + rss_i = min(8, rss_i); + rss_m = 0x7; + nrq = dcb_i * rss_i; + ntq = min(MAX_TX_QUEUES, dcb_i * rss_i); + break; + case (IXGBE_FLAG_DCB_ENABLED): #ifdef HAVE_TX_MQ - f->mask = 0x7 << 3; - adapter->num_rx_queues = f->indices; - adapter->num_tx_queues = f->indices; - ret = true; + dcb_m = 0x7 << 3; + nrq = dcb_i; + ntq = dcb_i; #else - DPRINTK(DRV, INFO, "Kernel has no multiqueue support, disabling DCB\n"); - f->mask = 0; - f->indices = 0; + DPRINTK(DRV, INFO, "Kernel has no multiqueue " + "support, disabling DCB.\n"); + /* Fall back onto RSS */ + rss_m = 0xF; + nrq = rss_i; + ntq = 1; + dcb_m = 0; + dcb_i = 0; #endif - - return ret; -} - -/** - * ixgbe_set_vmdq_queues: Allocate queues for VMDq devices - * @adapter: board private structure to 
initialize - * - * When VMDq (Virtual Machine Devices queue) is enabled, allocate queues - * and VM pools where appropriate. If RSS is available, then also try and - * enable RSS and map accordingly. - * - **/ -static inline bool ixgbe_set_vmdq_queues(struct ixgbe_adapter *adapter) -{ - int vmdq_i = adapter->ring_feature[RING_F_VMDQ].indices; - int vmdq_m = 0; - int rss_i = adapter->ring_feature[RING_F_RSS].indices; - int rss_m = adapter->ring_feature[RING_F_RSS].mask; - unsigned long i; - int rss_shift; - bool ret = false; - - switch (adapter->flags & - (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED)) { - - case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED): - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - vmdq_i = min(IXGBE_MAX_VMDQ_INDICES, vmdq_i); - if (vmdq_i > 32) - rss_i = 2; - else - rss_i = 4; - i = rss_i; - rss_shift = find_first_bit(&i, sizeof(i) * 8); - rss_m = (rss_i - 1); - vmdq_m = ((IXGBE_MAX_VMDQ_INDICES - 1) << - rss_shift) & (MAX_RX_QUEUES - 1); - } - adapter->num_rx_queues = vmdq_i * rss_i; - adapter->num_tx_queues = min(MAX_TX_QUEUES, vmdq_i * rss_i); - ret = true; - break; - - case (IXGBE_FLAG_VMDQ_ENABLED): - if (adapter->hw.mac.type == ixgbe_mac_82599EB) - vmdq_m = (IXGBE_MAX_VMDQ_INDICES - 1) << 1; - else - vmdq_m = (IXGBE_MAX_VMDQ_INDICES - 1); - adapter->num_rx_queues = vmdq_i; - adapter->num_tx_queues = vmdq_i; - ret = true; - break; - - default: - ret = false; - goto vmdq_queues_out; - } - - if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { - adapter->num_rx_pools = vmdq_i; - adapter->num_rx_queues_per_pool = adapter->num_rx_queues / - vmdq_i; - } else { - adapter->num_rx_pools = adapter->num_rx_queues; - adapter->num_rx_queues_per_pool = 1; - } - /* save the mask for later use */ - adapter->ring_feature[RING_F_VMDQ].mask = vmdq_m; -vmdq_queues_out: - return ret; -} - -/** - * ixgbe_set_rss_queues: Allocate queues for RSS - * @adapter: board private structure to initialize - * - * This is our "base" multiqueue mode. RSS (Receive Side Scaling) will try - * to allocate one Rx queue per CPU, and if available, one Tx queue per CPU. - * - **/ -static inline bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter) -{ - bool ret = false; - struct ixgbe_ring_feature *f = &adapter->ring_feature[RING_F_RSS]; - - if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) { - f->mask = 0xF; - adapter->num_rx_queues = f->indices; + break; + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED): + vmdq_i = min(4, vmdq_i); + vmdq_m = 0x3 << 3; + rss_m = 0xF; + nrq = vmdq_i * rss_i; + ntq = min(MAX_TX_QUEUES, vmdq_i * rss_i); + break; + case (IXGBE_FLAG_VMDQ_ENABLED): + vmdq_m = 0xF; + nrq = vmdq_i; + ntq = vmdq_i; + break; + case (IXGBE_FLAG_RSS_ENABLED): + rss_m = 0xF; + nrq = rss_i; #ifdef HAVE_TX_MQ - adapter->num_tx_queues = f->indices; + ntq = rss_i; +#else + ntq = 1; #endif - ret = true; - } - - return ret; -} - -/** - * ixgbe_set_fdir_queues: Allocate queues for Flow Director - * @adapter: board private structure to initialize - * - * Flow Director is an advanced Rx filter, attempting to get Rx flows back - * to the original CPU that initiated the Tx session. This runs in addition - * to RSS, so if a packet doesn''t match an FDIR filter, we can still spread the - * Rx load across CPUs using RSS. 
- * - **/ -static bool inline ixgbe_set_fdir_queues(struct ixgbe_adapter *adapter) -{ - bool ret = false; - struct ixgbe_ring_feature *f_fdir = &adapter->ring_feature[RING_F_FDIR]; + break; + case 0: + default: + dcb_i = 0; + dcb_m = 0; + rss_i = 0; + rss_m = 0; + vmdq_i = 0; + vmdq_m = 0; + nrq = 1; + ntq = 1; + break; + } - f_fdir->indices = min((int)num_online_cpus(), f_fdir->indices); - f_fdir->mask = 0; + /* sanity check, we should never have zero queues */ + nrq = (nrq ?:1); + ntq = (ntq ?:1); - /* Flow Director must have RSS enabled */ - if (adapter->flags & IXGBE_FLAG_RSS_ENABLED && - ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE || - (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)))) { - adapter->num_rx_queues = f_fdir->indices; -#ifdef HAVE_TX_MQ - adapter->num_tx_queues = f_fdir->indices; -#endif - ret = true; - } else { - adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE; - adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE; + adapter->ring_feature[RING_F_DCB].indices = dcb_i; + adapter->ring_feature[RING_F_DCB].mask = dcb_m; + adapter->ring_feature[RING_F_VMDQ].indices = vmdq_i; + adapter->ring_feature[RING_F_VMDQ].mask = vmdq_m; + adapter->ring_feature[RING_F_RSS].indices = rss_i; + adapter->ring_feature[RING_F_RSS].mask = rss_m; + break; + default: + nrq = 1; + ntq = 1; + break; } - return ret; -} - -/* - * ixgbe_set_num_queues: Allocate queues for device, feature dependant - * @adapter: board private structure to initialize - * - * This is the top level queue allocation routine. The order here is very - * important, starting with the "most" number of features turned on at once, - * and ending with the smallest set of features. This way large combinations - * can be allocated if they''re turned on, and smaller combinations are the - * fallthrough conditions. - * - **/ -static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter) -{ - /* Start with base case */ - adapter->num_rx_queues = 1; - adapter->num_tx_queues = 1; - adapter->num_rx_pools = adapter->num_rx_queues; - adapter->num_rx_queues_per_pool = 1; - if (ixgbe_set_vmdq_queues(adapter)) - return; - - if (ixgbe_set_dcb_queues(adapter)) - return; - - if (ixgbe_set_fdir_queues(adapter)) - return; - - - if (ixgbe_set_rss_queues(adapter)) - return; + adapter->num_rx_queues = nrq; + adapter->num_tx_queues = ntq; } static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter, @@ -4114,222 +3307,131 @@ static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter, adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED; kfree(adapter->msix_entries); adapter->msix_entries = NULL; + adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED; + adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; + ixgbe_set_num_queues(adapter); } else { adapter->flags |= IXGBE_FLAG_MSIX_ENABLED; /* Woot! */ - /* - * Adjust for only the vectors we''ll use, which is minimum - * of max_msix_q_vectors + NON_Q_VECTORS, or the number of - * vectors we were allocated. - */ - adapter->num_msix_vectors = min(vectors, - adapter->max_msix_q_vectors + NON_Q_VECTORS); + adapter->num_msix_vectors = vectors; } } /** - * ixgbe_cache_ring_rss - Descriptor ring to register mapping for RSS - * @adapter: board private structure to initialize - * - * Cache the descriptor ring offsets for RSS to the assigned rings. 
- * - **/ -static inline bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter) -{ - int i; - - if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED)) - return false; - - for (i = 0; i < adapter->num_rx_queues; i++) - adapter->rx_ring[i].reg_idx = i; - for (i = 0; i < adapter->num_tx_queues; i++) - adapter->tx_ring[i].reg_idx = i; - - return true; -} - -/** - * ixgbe_cache_ring_dcb - Descriptor ring to register mapping for DCB + * ixgbe_cache_ring_register - Descriptor ring to register mapping * @adapter: board private structure to initialize * - * Cache the descriptor ring offsets for DCB to the assigned rings. - * + * Once we know the feature-set enabled for the device, we''ll cache + * the register offset the descriptor ring is assigned to. **/ -static inline bool ixgbe_cache_ring_dcb(struct ixgbe_adapter *adapter) +static void __devinit ixgbe_cache_ring_register(struct ixgbe_adapter *adapter) { - int i; - bool ret = false; - int dcb_i = adapter->ring_feature[RING_F_DCB].indices; - - if (!(adapter->flags & IXGBE_FLAG_DCB_ENABLED)) - return false; + int feature_mask = 0, rss_i; + int i, txr_idx, rxr_idx; + int dcb_i; + int vmdq_i, k; - /* the number of queues is assumed to be symmetric */ - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - for (i = 0; i < dcb_i; i++) { - adapter->rx_ring[i].reg_idx = i << 3; - adapter->tx_ring[i].reg_idx = i << 2; - } - ret = true; - } else if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - if (dcb_i == 8) { - /* - * Tx TC0 starts at: descriptor queue 0 - * Tx TC1 starts at: descriptor queue 32 - * Tx TC2 starts at: descriptor queue 64 - * Tx TC3 starts at: descriptor queue 80 - * Tx TC4 starts at: descriptor queue 96 - * Tx TC5 starts at: descriptor queue 104 - * Tx TC6 starts at: descriptor queue 112 - * Tx TC7 starts at: descriptor queue 120 - * - * Rx TC0-TC7 are offset by 16 queues each - */ - for (i = 0; i < 3; i++) { - adapter->tx_ring[i].reg_idx = i << 5; - adapter->rx_ring[i].reg_idx = i << 4; + /* Number of supported queues */ + switch (adapter->hw.mac.type) { + case ixgbe_mac_82598EB: + dcb_i = adapter->ring_feature[RING_F_DCB].indices; + vmdq_i = adapter->ring_feature[RING_F_VMDQ].indices; + rss_i = adapter->ring_feature[RING_F_RSS].indices; + txr_idx = 0; + rxr_idx = 0; + feature_mask |= IXGBE_FLAG_DCB_ENABLED; + feature_mask |= IXGBE_FLAG_VMDQ_ENABLED; + feature_mask |= IXGBE_FLAG_RSS_ENABLED; + switch (adapter->flags & feature_mask) { + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_DCB_ENABLED | + IXGBE_FLAG_VMDQ_ENABLED): + for (i = 0; i < dcb_i; i++) { + int j; + for (j = 0; j < vmdq_i; j++) { + for (k = 0; k < rss_i; k++) { + adapter->rx_ring[rxr_idx].reg_idx = i << 3 | + j << 2 | + k; + rxr_idx++; } - for ( ; i < 5; i++) { - adapter->tx_ring[i].reg_idx = ((i + 2) << 4); - adapter->rx_ring[i].reg_idx = i << 4; + adapter->tx_ring[txr_idx].reg_idx = i << 2 | j; + txr_idx++; } - for ( ; i < dcb_i; i++) { - adapter->tx_ring[i].reg_idx = ((i + 8) << 3); - adapter->rx_ring[i].reg_idx = i << 4; } - ret = true; - } else if (dcb_i == 4) { - /* - * Tx TC0 starts at: descriptor queue 0 - * Tx TC1 starts at: descriptor queue 64 - * Tx TC2 starts at: descriptor queue 96 - * Tx TC3 starts at: descriptor queue 112 - * - * Rx TC0-TC3 are offset by 32 queues each - */ - adapter->tx_ring[0].reg_idx = 0; - adapter->tx_ring[1].reg_idx = 64; - adapter->tx_ring[2].reg_idx = 96; - adapter->tx_ring[3].reg_idx = 112; - for (i = 0 ; i < dcb_i; i++) - adapter->rx_ring[i].reg_idx = i << 5; - ret = true; - } - } - - return ret; -} - -/** - * ixgbe_cache_ring_vmdq - 
Descriptor ring to register mapping for VMDq - * @adapter: board private structure to initialize - * - * Cache the descriptor ring offsets for VMDq to the assigned rings. It - * will also try to cache the proper offsets if RSS is enabled along with - * VMDq. - * - **/ -static inline bool ixgbe_cache_ring_vmdq(struct ixgbe_adapter *adapter) -{ - int i; - bool ret = false; - - switch (adapter->flags & - (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED)) { - - case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED): - if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - /* since the # of rss queues per vmdq pool is - * limited to either 2 or 4, there is no index - * skipping and we can set them up with no - * funky mapping - */ + break; + case (IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_DCB_ENABLED): + for (i = 0; i < dcb_i; i++) { + int j; + for (j = 0; j < vmdq_i; j++) { + adapter->rx_ring[rxr_idx].reg_idx = i << 3 | j; + adapter->tx_ring[txr_idx].reg_idx = i << 2 | + (j >> 1); + rxr_idx++; + if (j & 1) + txr_idx++; + } + } + break; + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_DCB_ENABLED): + for (i = 0; i < dcb_i; i++) { + int j; + /* Rx first */ + for (j = 0; j < adapter->num_rx_queues; j++) { + adapter->rx_ring[rxr_idx].reg_idx = i << 3 | j; + rxr_idx++; + } + /* Tx now */ + for (j = 0; j < adapter->num_tx_queues; j++) { + adapter->tx_ring[txr_idx].reg_idx = i << 2 | + (j >> 1); + if (j & 1) + txr_idx++; + } + } + break; + case (IXGBE_FLAG_DCB_ENABLED): + /* the number of queues is assumed to be symmetric */ + for (i = 0; i < dcb_i; i++) { + adapter->rx_ring[i].reg_idx = i << 3; + adapter->tx_ring[i].reg_idx = i << 2; + } + break; + case (IXGBE_FLAG_RSS_ENABLED | IXGBE_FLAG_VMDQ_ENABLED): + for (i = 0; i < vmdq_i; i++) { + int j; + for (j = 0; j < rss_i; j++) { + adapter->rx_ring[rxr_idx].reg_idx = i << 4 | j; + adapter->tx_ring[txr_idx].reg_idx = i << 3 | + (j >> 1); + rxr_idx++; + if (j & 1) + txr_idx++; + } + } + break; + case (IXGBE_FLAG_VMDQ_ENABLED): for (i = 0; i < adapter->num_rx_queues; i++) adapter->rx_ring[i].reg_idx = i; - for (i = 0; i < adapter->num_tx_queues; i++) + for (i = 0; i < adapter->num_tx_queues; i++) adapter->tx_ring[i].reg_idx = i; - ret = true; - } - break; - - case (IXGBE_FLAG_VMDQ_ENABLED): - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { + break; + case (IXGBE_FLAG_RSS_ENABLED): for (i = 0; i < adapter->num_rx_queues; i++) adapter->rx_ring[i].reg_idx = i; for (i = 0; i < adapter->num_tx_queues; i++) adapter->tx_ring[i].reg_idx = i; - ret = true; - } else if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - /* even without rss, there are 2 queues per - * pool, the odd numbered ones are unused. - */ - for (i = 0; i < adapter->num_rx_queues; i++) - adapter->rx_ring[i].reg_idx = i * 2; - for (i = 0; i < adapter->num_tx_queues; i++) - adapter->tx_ring[i].reg_idx = i * 2; - ret = true; + break; + case 0: + default: + break; } break; + default: + break; } - - return ret; -} - -/** - * ixgbe_cache_ring_fdir - Descriptor ring to register mapping for Flow Director - * @adapter: board private structure to initialize - * - * Cache the descriptor ring offsets for Flow Director to the assigned rings. 
- * - **/ -static bool inline ixgbe_cache_ring_fdir(struct ixgbe_adapter *adapter) -{ - int i; - bool ret = false; - - if (adapter->flags & IXGBE_FLAG_RSS_ENABLED && - ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) || - (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))) { - for (i = 0; i < adapter->num_rx_queues; i++) - adapter->rx_ring[i].reg_idx = i; - for (i = 0; i < adapter->num_tx_queues; i++) - adapter->tx_ring[i].reg_idx = i; - ret = true; - } - - return ret; } -/** - * ixgbe_cache_ring_register - Descriptor ring to register mapping - * @adapter: board private structure to initialize - * - * Once we know the feature-set enabled for the device, we''ll cache - * the register offset the descriptor ring is assigned to. - * - * Note, the order the various feature calls is important. It must start with - * the "most" features enabled at the same time, then trickle down to the - * least amount of features turned on at once. - **/ -static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter) -{ - /* start with default case */ - adapter->rx_ring[0].reg_idx = 0; - adapter->tx_ring[0].reg_idx = 0; - - if (ixgbe_cache_ring_vmdq(adapter)) - return; - - if (ixgbe_cache_ring_dcb(adapter)) - return; - - if (ixgbe_cache_ring_fdir(adapter)) - return; - - if (ixgbe_cache_ring_rss(adapter)) - return; - -} /** * ixgbe_alloc_queues - Allocate memory for all rings @@ -4350,15 +3452,12 @@ static int ixgbe_alloc_queues(struct ixgbe_adapter *adapter) adapter->rx_ring = kcalloc(adapter->num_rx_queues, sizeof(struct ixgbe_ring), GFP_KERNEL); - if (!adapter->rx_ring) goto err_rx_ring_allocation; for (i = 0; i < adapter->num_tx_queues; i++) { adapter->tx_ring[i].count = adapter->tx_ring_count; adapter->tx_ring[i].queue_index = i; - adapter->tx_ring[i].atr_sample_rate = adapter->atr_sample_rate; - adapter->tx_ring[i].atr_count = 0; } for (i = 0; i < adapter->num_rx_queues; i++) { @@ -4385,7 +3484,6 @@ err_tx_ring_allocation: **/ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter) { - struct ixgbe_hw *hw = &adapter->hw; int err = 0; int vector, v_budget; @@ -4403,36 +3501,42 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter) /* * At the same time, hardware can only support a maximum of - * hw.mac->max_msix_vectors vectors. With features - * such as RSS and VMDq, we can easily surpass the number of Rx and Tx - * descriptor queues supported by our device. Thus, we cap it off in - * those rare cases where the cpu count also exceeds our vector limit. + * MAX_MSIX_COUNT vectors. With features such as RSS and VMDq, + * we can easily reach upwards of 64 Rx descriptor queues and + * 32 Tx queues. Thus, we cap it off in those rare cases where + * the cpu count also exceeds our vector limit. */ - v_budget = min(v_budget, (int)hw->mac.max_msix_vectors); + v_budget = min(v_budget, MAX_MSIX_COUNT); /* A failure in MSI-X entry allocation isn''t fatal, but it does * mean we disable MSI-X capabilities of the adapter. 
*/ adapter->msix_entries = kcalloc(v_budget, sizeof(struct msix_entry), GFP_KERNEL); - if (adapter->msix_entries) { - for (vector = 0; vector < v_budget; vector++) - adapter->msix_entries[vector].entry = vector; - - ixgbe_acquire_msix_vectors(adapter, v_budget); - - if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) + if (!adapter->msix_entries) { + adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED; + adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; + ixgbe_set_num_queues(adapter); + kfree(adapter->tx_ring); + kfree(adapter->rx_ring); + err = ixgbe_alloc_queues(adapter); + if (err) { + DPRINTK(PROBE, ERR, "Unable to allocate memory " + "for queues\n"); goto out; + } + goto try_msi; } - adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED; - adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; - adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE; - adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE; - adapter->atr_sample_rate = 0; - adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; - adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; - ixgbe_set_num_queues(adapter); + for (vector = 0; vector < v_budget; vector++) + adapter->msix_entries[vector].entry = vector; + + ixgbe_acquire_msix_vectors(adapter, v_budget); + + if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) + goto out; try_msi: if (!(adapter->flags & IXGBE_FLAG_MSI_CAPABLE)) @@ -4450,130 +3554,17 @@ try_msi: out: #ifdef HAVE_TX_MQ - /* Notify the stack of the (possibly) reduced Tx Queue count. */ #ifdef CONFIG_NETDEVICES_MULTIQUEUE + /* Notify the stack of the (possibly) reduced Tx Queue count. */ adapter->netdev->egress_subqueue_count = adapter->num_tx_queues; -#else +#else /* CONFIG_NETDEVICES_MULTIQUEUE */ adapter->netdev->real_num_tx_queues = adapter->num_tx_queues; -#endif +#endif /* CONFIG_NETDEVICES_MULTIQUEUE */ #endif /* HAVE_TX_MQ */ return err; } -/** - * ixgbe_alloc_q_vectors - Allocate memory for interrupt vectors - * @adapter: board private structure to initialize - * - * We allocate one q_vector per queue interrupt. If allocation fails we - * return -ENOMEM. 
- **/ -static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter) -{ - int v_idx, num_q_vectors; - struct ixgbe_q_vector *q_vector; - int rx_vectors; -#ifdef CONFIG_IXGBE_NAPI - int (*poll)(struct napi_struct *, int); -#endif - - if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { - num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; - rx_vectors = adapter->num_rx_queues; -#ifdef CONFIG_IXGBE_NAPI - poll = &ixgbe_clean_rxtx_many; -#endif - } else { - num_q_vectors = 1; - rx_vectors = 1; -#ifdef CONFIG_IXGBE_NAPI - poll = &ixgbe_poll; -#endif - } - - for (v_idx = 0; v_idx < num_q_vectors; v_idx++) { - q_vector = kzalloc(sizeof(struct ixgbe_q_vector), GFP_KERNEL); - if (!q_vector) - goto err_out; - q_vector->adapter = adapter; - q_vector->eitr = adapter->eitr_param; - q_vector->v_idx = v_idx; -#ifndef IXGBE_NO_LRO - if (v_idx < rx_vectors) { - int size = sizeof(struct ixgbe_lro_list); - q_vector->lrolist = vmalloc(size); - if (!q_vector->lrolist) { - kfree(q_vector); - goto err_out; - } - memset(q_vector->lrolist, 0, size); - ixgbe_lro_ring_init(q_vector->lrolist); - } -#endif -#ifdef CONFIG_IXGBE_NAPI - netif_napi_add(adapter->netdev, &q_vector->napi, (*poll), 64); -#endif - adapter->q_vector[v_idx] = q_vector; - } - - return 0; - -err_out: - while (v_idx) { - v_idx--; - q_vector = adapter->q_vector[v_idx]; -#ifdef CONFIG_IXGBE_NAPI - netif_napi_del(&q_vector->napi); -#endif -#ifndef IXGBE_NO_LRO - if (q_vector->lrolist) { - ixgbe_lro_ring_exit(q_vector->lrolist); - vfree(q_vector->lrolist); - q_vector->lrolist = NULL; - } -#endif - kfree(q_vector); - adapter->q_vector[v_idx] = NULL; - } - return -ENOMEM; -} - -/** - * ixgbe_free_q_vectors - Free memory allocated for interrupt vectors - * @adapter: board private structure to initialize - * - * This function frees the memory allocated to the q_vectors. In addition if - * NAPI is enabled it will delete any references to the NAPI struct prior - * to freeing the q_vector. 
- **/ -static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter) -{ - int v_idx, num_q_vectors; - - if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { - num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; - } else { - num_q_vectors = 1; - } - - for (v_idx = 0; v_idx < num_q_vectors; v_idx++) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[v_idx]; - - adapter->q_vector[v_idx] = NULL; -#ifdef CONFIG_IXGBE_NAPI - netif_napi_del(&q_vector->napi); -#endif -#ifndef IXGBE_NO_LRO - if (q_vector->lrolist) { - ixgbe_lro_ring_exit(q_vector->lrolist); - vfree(q_vector->lrolist); - q_vector->lrolist = NULL; - } -#endif - kfree(q_vector); - } -} - -static void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter) +void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter) { if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED; @@ -4604,25 +3595,18 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter) /* Number of supported queues */ ixgbe_set_num_queues(adapter); - err = ixgbe_set_interrupt_capability(adapter); - if (err) { - DPRINTK(PROBE, ERR, "Unable to setup interrupt capabilities\n"); - goto err_set_interrupt; - } - - err = ixgbe_alloc_q_vectors(adapter); - if (err) { - DPRINTK(PROBE, ERR, "Unable to allocate memory for queue " - "vectors\n"); - goto err_alloc_q_vectors; - } - err = ixgbe_alloc_queues(adapter); if (err) { DPRINTK(PROBE, ERR, "Unable to allocate memory for queues\n"); goto err_alloc_queues; } + err = ixgbe_set_interrupt_capability(adapter); + if (err) { + DPRINTK(PROBE, ERR, "Unable to setup interrupt capabilities\n"); + goto err_set_interrupt; + } + DPRINTK(DRV, INFO, "Multiqueue %s: Rx Queue count = %u, " "Tx Queue count = %u\n", (adapter->num_rx_queues > 1) ? 
"Enabled" : @@ -4631,30 +3615,12 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter) set_bit(__IXGBE_DOWN, &adapter->state); return 0; -err_alloc_queues: - ixgbe_free_q_vectors(adapter); -err_alloc_q_vectors: - ixgbe_reset_interrupt_capability(adapter); -err_set_interrupt: - return err; -} -/** - * ixgbe_clear_interrupt_scheme - Clear the current interrupt scheme settings - * @adapter: board private structure to clear interrupt scheme on - * - * We go through and clear interrupt specific resources and reset the structure - * to pre-load conditions - **/ -void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter) -{ +err_set_interrupt: kfree(adapter->tx_ring); kfree(adapter->rx_ring); - adapter->tx_ring = NULL; - adapter->rx_ring = NULL; - - ixgbe_free_q_vectors(adapter); - ixgbe_reset_interrupt_capability(adapter); +err_alloc_queues: + return err; } /** @@ -4684,7 +3650,7 @@ static void ixgbe_sfp_task(struct work_struct *work) if ((hw->phy.type == ixgbe_phy_nl) && (hw->phy.sfp_type == ixgbe_sfp_type_not_present)) { s32 ret = hw->phy.ops.identify_sfp(hw); - if (ret && ret != IXGBE_ERR_SFP_NOT_SUPPORTED) + if (ret) goto reschedule; ret = hw->phy.ops.reset(hw); if (ret == IXGBE_ERR_SFP_NOT_SUPPORTED) { @@ -4693,7 +3659,6 @@ static void ixgbe_sfp_task(struct work_struct *work) "Reload the driver after installing a " "supported module.\n"); unregister_netdev(adapter->netdev); - adapter->netdev_registered = false; } else { DPRINTK(PROBE, INFO, "detected SFP+: %d\n", hw->phy.sfp_type); @@ -4731,7 +3696,17 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter) hw->subsystem_device_id = pdev->subsystem_device; err = ixgbe_init_shared_code(hw); - if (err) { + if (err == IXGBE_ERR_SFP_NOT_PRESENT) { + /* start a kernel thread to watch for a module to arrive */ + set_bit(__IXGBE_SFP_MODULE_NOT_FOUND, &adapter->state); + mod_timer(&adapter->sfp_timer, + round_jiffies(jiffies + (2 * HZ))); + err = 0; + } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { + DPRINTK(PROBE, ERR, "failed to load because an " + "unsupported SFP+ module type was detected.\n"); + goto out; + } else if (err) { DPRINTK(PROBE, ERR, "init_shared_code failed: %d\n", err); goto out; } @@ -4739,7 +3714,8 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter) /* Set capability flags */ switch (hw->mac.type) { case ixgbe_mac_82598EB: - if (hw->device_id == IXGBE_DEV_ID_82598AT) + if (hw->mac.ops.get_media_type && + (hw->mac.ops.get_media_type(hw) == ixgbe_media_type_copper)) adapter->flags |= IXGBE_FLAG_FAN_FAIL_CAPABLE; adapter->flags |= IXGBE_FLAG_DCA_CAPABLE; adapter->flags |= IXGBE_FLAG_MSI_CAPABLE; @@ -4748,35 +3724,12 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter) adapter->flags |= IXGBE_FLAG_MQ_CAPABLE; if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) adapter->flags |= IXGBE_FLAG_DCB_CAPABLE; -#ifdef IXGBE_RSS +#ifdef CONFIG_IXGBE_RSS if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) adapter->flags |= IXGBE_FLAG_RSS_CAPABLE; #endif if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) adapter->flags |= IXGBE_FLAG_VMDQ_CAPABLE; -#ifndef IXGBE_NO_HW_RSC - adapter->flags2 &= ~IXGBE_FLAG2_RSC_CAPABLE; -#endif - adapter->max_msix_q_vectors = IXGBE_MAX_MSIX_Q_VECTORS_82598; - break; - case ixgbe_mac_82599EB: -#ifndef IXGBE_NO_HW_RSC - adapter->flags2 |= IXGBE_FLAG2_RSC_CAPABLE; -#endif - adapter->flags |= IXGBE_FLAG_DCA_CAPABLE; - adapter->flags |= IXGBE_FLAG_MSI_CAPABLE; - adapter->flags |= IXGBE_FLAG_MSIX_CAPABLE; - if (adapter->flags & IXGBE_FLAG_MSIX_CAPABLE) - adapter->flags |= 
IXGBE_FLAG_MQ_CAPABLE; - if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) - adapter->flags |= IXGBE_FLAG_DCB_CAPABLE; -#ifdef IXGBE_RSS - if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) - adapter->flags |= IXGBE_FLAG_RSS_CAPABLE; -#endif - if (adapter->flags & IXGBE_FLAG_MQ_CAPABLE) - adapter->flags |= IXGBE_FLAG_VMDQ_CAPABLE; - adapter->max_msix_q_vectors = IXGBE_MAX_MSIX_Q_VECTORS_82599; break; default: break; @@ -4798,24 +3751,17 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter) adapter->dcb_cfg.bw_percentage[DCB_TX_CONFIG][0] = 100; adapter->dcb_cfg.bw_percentage[DCB_RX_CONFIG][0] = 100; adapter->dcb_cfg.rx_pba_cfg = pba_equal; - adapter->dcb_cfg.pfc_mode_enable = false; adapter->dcb_cfg.round_robin_enable = false; adapter->dcb_set_bitmap = 0x00; } -#ifdef CONFIG_DCB - ixgbe_copy_dcb_cfg(&adapter->dcb_cfg, &adapter->temp_dcb_cfg, - adapter->ring_feature[RING_F_DCB].indices); -#endif /* default flow control settings */ - hw->fc.requested_mode = ixgbe_fc_full; - hw->fc.current_mode = ixgbe_fc_full; /* init for ethtool output */ - adapter->last_lfc_mode = hw->fc.current_mode; + hw->fc.current_mode = ixgbe_fc_none; + hw->fc.requested_mode = ixgbe_fc_none; hw->fc.high_water = IXGBE_DEFAULT_FCRTH; hw->fc.low_water = IXGBE_DEFAULT_FCRTL; hw->fc.pause_time = IXGBE_DEFAULT_FCPAUSE; hw->fc.send_xon = true; - hw->fc.disable_fc_autoneg = false; /* set defaults for eitr in MegaBytes */ adapter->eitr_low = 10; @@ -4853,7 +3799,8 @@ int ixgbe_setup_tx_resources(struct ixgbe_adapter *adapter, memset(tx_ring->tx_buffer_info, 0, size); /* round up to nearest 4K */ - tx_ring->size = tx_ring->count * sizeof(union ixgbe_adv_tx_desc); + tx_ring->size = tx_ring->count * sizeof(union ixgbe_adv_tx_desc) + + sizeof(u32); tx_ring->size = ALIGN(tx_ring->size, 4096); tx_ring->desc = pci_alloc_consistent(pdev, tx_ring->size, @@ -4875,30 +3822,6 @@ err: } /** - * ixgbe_setup_all_tx_resources - allocate all queues Tx resources - * @adapter: board private structure - * - * If this function returns with an error, then it''s possible one or - * more of the rings is populated (while the rest are not). It is the - * callers duty to clean those orphaned rings. 
- * - * Return 0 on success, negative on failure - **/ -static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter) -{ - int i, err = 0; - - for (i = 0; i < adapter->num_tx_queues; i++) { - err = ixgbe_setup_tx_resources(adapter, &adapter->tx_ring[i]); - if (!err) - continue; - DPRINTK(PROBE, ERR, "Allocation for Tx Queue %u failed\n", i); - break; - } - return err; -} - -/** * ixgbe_setup_rx_resources - allocate Rx resources (Descriptors) * @adapter: board private structure * @rx_ring: rx descriptor ring (for a specific queue) to setup @@ -4911,6 +3834,22 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter, struct pci_dev *pdev = adapter->pdev; int size; +#ifndef IXGBE_NO_INET_LRO + size = sizeof(struct net_lro_desc) * IXGBE_MAX_LRO_DESCRIPTORS; + rx_ring->lro_mgr.lro_arr = vmalloc(size); + if (!rx_ring->lro_mgr.lro_arr) + return -ENOMEM; + memset(rx_ring->lro_mgr.lro_arr, 0, size); + +#endif /* IXGBE_NO_INET_LRO */ +#ifndef IXGBE_NO_LRO + size = sizeof(struct ixgbe_lro_list); + rx_ring->lrolist = vmalloc(size); + if (!rx_ring->lrolist) + return -ENOMEM; + memset(rx_ring->lrolist, 0, size); + +#endif size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count; rx_ring->rx_buffer_info = vmalloc(size); if (!rx_ring->rx_buffer_info) { @@ -4942,36 +3881,36 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter, rx_ring->work_limit = rx_ring->count / 2; #endif +#ifdef CONFIG_XEN_NETDEV2_VMQ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + rx_ring->queue_index) { + rx_ring->active = 0; + rx_ring->allocated = 0; + } else { +#endif + rx_ring->active = 1; + rx_ring->allocated = 1; +#ifdef CONFIG_XEN_NETDEV2_VMQ + } +#endif + +#ifndef IXGBE_NO_LRO + ixgbe_lro_ring_init(rx_ring->lrolist, adapter); +#endif return 0; alloc_failed: +#ifndef IXGBE_NO_INET_LRO + vfree(rx_ring->lro_mgr.lro_arr); + rx_ring->lro_mgr.lro_arr = NULL; +#endif +#ifndef IXGBE_NO_LRO + vfree(rx_ring->lrolist); + rx_ring->lrolist = NULL; +#endif return -ENOMEM; } /** - * ixgbe_setup_all_rx_resources - allocate all queues Rx resources - * @adapter: board private structure - * - * If this function returns with an error, then it''s possible one or - * more of the rings is populated (while the rest are not). It is the - * callers duty to clean those orphaned rings. 
- * - * Return 0 on success, negative on failure - **/ -static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter) -{ - int i, err = 0; - - for (i = 0; i < adapter->num_rx_queues; i++) { - err = ixgbe_setup_rx_resources(adapter, &adapter->rx_ring[i]); - if (!err) - continue; - DPRINTK(PROBE, ERR, "Allocation for Rx Queue %u failed\n", i); - break; - } - return err; -} - -/** * ixgbe_free_tx_resources - Free Tx Resources per Queue * @adapter: board private structure * @tx_ring: Tx descriptor ring for a specific queue @@ -5004,8 +3943,7 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter) int i; for (i = 0; i < adapter->num_tx_queues; i++) - if (adapter->tx_ring[i].desc) - ixgbe_free_tx_resources(adapter, &adapter->tx_ring[i]); + ixgbe_free_tx_resources(adapter, &adapter->tx_ring[i]); } /** @@ -5020,6 +3958,16 @@ void ixgbe_free_rx_resources(struct ixgbe_adapter *adapter, { struct pci_dev *pdev = adapter->pdev; +#ifndef IXGBE_NO_INET_LRO + vfree(rx_ring->lro_mgr.lro_arr); + rx_ring->lro_mgr.lro_arr = NULL; +#endif +#ifndef IXGBE_NO_LRO + if (rx_ring->lrolist) + ixgbe_lro_ring_exit(rx_ring->lrolist); + vfree(rx_ring->lrolist); + rx_ring->lrolist = NULL; +#endif ixgbe_clean_rx_ring(adapter, rx_ring); vfree(rx_ring->rx_buffer_info); @@ -5041,8 +3989,58 @@ static void ixgbe_free_all_rx_resources(struct ixgbe_adapter *adapter) int i; for (i = 0; i < adapter->num_rx_queues; i++) - if (adapter->rx_ring[i].desc) - ixgbe_free_rx_resources(adapter, &adapter->rx_ring[i]); + ixgbe_free_rx_resources(adapter, &adapter->rx_ring[i]); +} + +/** + * ixgbe_setup_all_rx_resources - allocate all queues Rx resources + * @adapter: board private structure + * + * If this function returns with an error, then it''s possible one or + * more of the rings is populated (while the rest are not). It is the + * callers duty to clean those orphaned rings. + * + * Return 0 on success, negative on failure + **/ +static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter) +{ + int i, err = 0; + + for (i = 0; i < adapter->num_rx_queues; i++) { + err = ixgbe_setup_rx_resources(adapter, &adapter->rx_ring[i]); + if (!err) + continue; + DPRINTK(PROBE, ERR, "Allocation for Rx Queue %u failed\n", i); + break; + } +#ifdef CONFIG_XEN_NETDEV2_VMQ + adapter->rx_queues_allocated = 0; +#endif + return err; +} + +/** + * ixgbe_setup_all_tx_resources - allocate all queues Tx resources + * @adapter: board private structure + * + * If this function returns with an error, then it''s possible one or + * more of the rings is populated (while the rest are not). It is the + * callers duty to clean those orphaned rings. 
+ * + * Return 0 on success, negative on failure + **/ +static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter) +{ + int i, err = 0; + + for (i = 0; i < adapter->num_tx_queues; i++) { + err = ixgbe_setup_tx_resources(adapter, &adapter->tx_ring[i]); + if (!err) + continue; + DPRINTK(PROBE, ERR, "Allocation for Tx Queue %u failed\n", i); + break; + } + return err; } /** @@ -5061,6 +4059,12 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu) if ((new_mtu < 68) || (max_frame > IXGBE_MAX_JUMBO_FRAME_SIZE)) return -EINVAL; +#ifdef CONFIG_XEN_NETDEV2_VMQ + /* Jumbo frames not currently supported in VMDq mode under Xen */ + if ((adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) && + (max_frame > ETH_FRAME_LEN)) + return -EINVAL; +#endif DPRINTK(PROBE, INFO, "changing MTU from %d to %d\n", netdev->mtu, new_mtu); /* must set new MTU before calling down or up */ @@ -5087,7 +4091,6 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu) static int ixgbe_open(struct net_device *netdev) { struct ixgbe_adapter *adapter = netdev_priv(netdev); - struct ixgbe_hw *hw = &adapter->hw; int err; /* disallow open during test */ @@ -5106,48 +4109,26 @@ static int ixgbe_open(struct net_device *netdev) ixgbe_configure(adapter); - /* - * Map the Tx/Rx rings to the vectors we were allotted. - * if request_irq will be called in this function map_rings - * must be called *before* up_complete - */ - ixgbe_map_rings_to_vectors(adapter); - - err = ixgbe_up_complete(adapter); - if (err) - goto err_setup_rx; - - /* clear any pending interrupts, may auto mask */ - IXGBE_READ_REG(hw, IXGBE_EICR); - err = ixgbe_request_irq(adapter); if (err) goto err_req_irq; - ixgbe_irq_enable(adapter, true, true); - - /* - * If this adapter has a fan, check to see if we had a failure - * before we enabled the interrupt. - */ - if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE) { - u32 esdp = IXGBE_READ_REG(hw, IXGBE_ESDP); - if (esdp & IXGBE_ESDP_SDP1) - DPRINTK(DRV, CRIT, - "Fan has stopped, replace the adapter\n"); - } + err = ixgbe_up_complete(adapter); + if (err) + goto err_up; + netif_tx_start_all_queues(netdev); return 0; -err_req_irq: - ixgbe_down(adapter); +err_up: ixgbe_release_hw_control(adapter); ixgbe_free_irq(adapter); -err_setup_rx: +err_req_irq: ixgbe_free_all_rx_resources(adapter); -err_setup_tx: +err_setup_rx: ixgbe_free_all_tx_resources(adapter); +err_setup_tx: ixgbe_reset(adapter); return err; @@ -5179,6 +4160,52 @@ static int ixgbe_close(struct net_device *netdev) return 0; } +#ifdef CONFIG_IXGBE_NAPI +/** + * ixgbe_napi_add_all - prep napi structs for use + * @adapter: private struct + * + * helper function to napi_add each possible q_vector->napi + */ +void ixgbe_napi_add_all(struct ixgbe_adapter *adapter) +{ + int q_idx, q_vectors; + int (*poll)(struct napi_struct *, int); + + if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) { + poll = &ixgbe_clean_rxonly; + /* Only enable as many vectors as we have rx queues. 
*/ + q_vectors = adapter->num_rx_queues; + } else { + poll = &ixgbe_poll; + /* only one q_vector for legacy modes */ + q_vectors = 1; + } + + for (q_idx = 0; q_idx < q_vectors; q_idx++) { + struct ixgbe_q_vector *q_vector = &adapter->q_vector[q_idx]; + netif_napi_add(adapter->netdev, &q_vector->napi, (*poll), 64); + } +} + +void ixgbe_napi_del_all(struct ixgbe_adapter *adapter) +{ + int q_idx; + int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; + + /* legacy and MSI only use one vector */ + if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) + q_vectors = 1; + + for (q_idx = 0; q_idx < q_vectors; q_idx++) { + struct ixgbe_q_vector *q_vector = &adapter->q_vector[q_idx]; + if (!q_vector->rxr_count) + continue; + netif_napi_del(&q_vector->napi); + } +} + +#endif #ifdef CONFIG_PM static int ixgbe_resume(struct pci_dev *pdev) { @@ -5196,7 +4223,8 @@ static int ixgbe_resume(struct pci_dev *pdev) } pci_set_master(pdev); - pci_wake_from_d3(pdev, false); + pci_enable_wake(pdev, PCI_D3hot, 0); + pci_enable_wake(pdev, PCI_D3cold, 0); err = ixgbe_init_interrupt_scheme(adapter); if (err) { @@ -5205,9 +4233,11 @@ static int ixgbe_resume(struct pci_dev *pdev) return err; } - ixgbe_reset(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_add_all(adapter); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_WUS, ~0); +#endif + ixgbe_reset(adapter); if (netif_running(netdev)) { err = ixgbe_open(adapter->netdev); @@ -5219,14 +4249,12 @@ static int ixgbe_resume(struct pci_dev *pdev) return 0; } + #endif /* CONFIG_PM */ -static int __ixgbe_shutdown(struct pci_dev *pdev, bool *enable_wake) +static int ixgbe_suspend(struct pci_dev *pdev, pm_message_t state) { struct net_device *netdev = pci_get_drvdata(pdev); struct ixgbe_adapter *adapter = netdev_priv(netdev); - struct ixgbe_hw *hw = &adapter->hw; - u32 ctrl, fctrl; - u32 wufc = adapter->wol; #ifdef CONFIG_PM int retval = 0; #endif @@ -5239,84 +4267,40 @@ static int __ixgbe_shutdown(struct pci_dev *pdev, bool *enable_wake) ixgbe_free_all_tx_resources(adapter); ixgbe_free_all_rx_resources(adapter); } + ixgbe_reset_interrupt_capability(adapter); - ixgbe_clear_interrupt_scheme(adapter); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_del_all(adapter); +#endif + kfree(adapter->tx_ring); + kfree(adapter->rx_ring); #ifdef CONFIG_PM retval = pci_save_state(pdev); if (retval) return retval; - #endif - if (wufc) { - ixgbe_set_rx_mode(netdev); - - /* turn on all-multi mode if wake on multicast is enabled */ - if (wufc & IXGBE_WUFC_MC) { - fctrl = IXGBE_READ_REG(hw, IXGBE_FCTRL); - fctrl |= IXGBE_FCTRL_MPE; - IXGBE_WRITE_REG(hw, IXGBE_FCTRL, fctrl); - } - - ctrl = IXGBE_READ_REG(hw, IXGBE_CTRL); - ctrl |= IXGBE_CTRL_GIO_DIS; - IXGBE_WRITE_REG(hw, IXGBE_CTRL, ctrl); - - IXGBE_WRITE_REG(hw, IXGBE_WUFC, wufc); - } else { - IXGBE_WRITE_REG(hw, IXGBE_WUC, 0); - IXGBE_WRITE_REG(hw, IXGBE_WUFC, 0); - } - - if (wufc && hw->mac.type == ixgbe_mac_82599EB) - pci_wake_from_d3(pdev, true); - else - pci_wake_from_d3(pdev, false); - *enable_wake = !!wufc; + pci_enable_wake(pdev, PCI_D3hot, 0); + pci_enable_wake(pdev, PCI_D3cold, 0); ixgbe_release_hw_control(adapter); pci_disable_device(pdev); - return 0; -} - -#ifdef CONFIG_PM -static int ixgbe_suspend(struct pci_dev *pdev, pm_message_t state) -{ - int retval; - bool wake; - - retval = __ixgbe_shutdown(pdev, &wake); - if (retval) - return retval; - - if (wake) { - pci_prepare_to_sleep(pdev); - } else { - pci_wake_from_d3(pdev, false); - pci_set_power_state(pdev, PCI_D3hot); - } + pci_set_power_state(pdev, pci_choose_state(pdev, state)); return 0; } 
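(A note for reviewers on how the PM entry points here hang together: ixgbe_suspend()/ixgbe_resume() are the legacy pm_message_t-based pci_driver callbacks, and ixgbe_shutdown(), in the next hunk, simply reuses the suspend path. Purely as an illustrative sketch of the 2.6.27-era hookup, not part of the patch; the ID table and remaining fields are elided:

static struct pci_driver ixgbe_driver_sketch = {
	.name     = "ixgbe",
	.probe    = ixgbe_probe,
	.remove   = __devexit_p(ixgbe_remove),
#ifdef CONFIG_PM
	/* legacy pm_message_t-based callbacks, as reverted above */
	.suspend  = ixgbe_suspend,
	.resume   = ixgbe_resume,
#endif
#ifndef USE_REBOOT_NOTIFIER
	/* ixgbe_shutdown() just wraps ixgbe_suspend(pdev, PMSG_SUSPEND) */
	.shutdown = ixgbe_shutdown,
#endif
};
)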
-#endif /* CONFIG_PM */ #ifndef USE_REBOOT_NOTIFIER static void ixgbe_shutdown(struct pci_dev *pdev) { - bool wake; - - __ixgbe_shutdown(pdev, &wake); - - if (system_state == SYSTEM_POWER_OFF) { - pci_wake_from_d3(pdev, wake); - pci_set_power_state(pdev, PCI_D3hot); - } + ixgbe_suspend(pdev, PMSG_SUSPEND); } #endif + /** * ixgbe_update_stats - Update the board statistics counters. * @adapter: board private structure @@ -5326,37 +4310,7 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter) struct ixgbe_hw *hw = &adapter->hw; u64 total_mpc = 0; u32 i, missed_rx = 0, mpc, bprc, lxon, lxoff, xon_off_tot; -#ifndef IXGBE_NO_LRO - u32 flushed = 0, coal = 0, recycled = 0; - int num_q_vectors = 1; - if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) - num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS; -#endif - - if (hw->mac.type == ixgbe_mac_82599EB) { - u64 rsc_count = 0; - for (i = 0; i < 16; i++) - adapter->hw_rx_no_dma_resources += IXGBE_READ_REG(hw, IXGBE_QPRDC(i)); - for (i = 0; i < adapter->num_rx_queues; i++) - rsc_count += adapter->rx_ring[i].rsc_count; - adapter->rsc_count = rsc_count; - } - -#ifndef IXGBE_NO_LRO - for (i = 0; i < num_q_vectors; i++) { - struct ixgbe_q_vector *q_vector = adapter->q_vector[i]; - if (!q_vector || !q_vector->lrolist) - continue; - flushed += q_vector->lrolist->stats.flushed; - coal += q_vector->lrolist->stats.coal; - recycled += q_vector->lrolist->stats.recycled; - } - adapter->lro_stats.flushed = flushed; - adapter->lro_stats.coal = coal; - adapter->lro_stats.recycled = recycled; - -#endif adapter->stats.crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS); for (i = 0; i < 8; i++) { /* for packet buffers not used, the register should read 0 */ @@ -5364,52 +4318,32 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter) missed_rx += mpc; adapter->stats.mpc[i] += mpc; total_mpc += adapter->stats.mpc[i]; - if (hw->mac.type == ixgbe_mac_82598EB) - adapter->stats.rnbc[i] += IXGBE_READ_REG(hw, IXGBE_RNBC(i)); + adapter->stats.rnbc[i] += IXGBE_READ_REG(hw, IXGBE_RNBC(i)); adapter->stats.qptc[i] += IXGBE_READ_REG(hw, IXGBE_QPTC(i)); adapter->stats.qbtc[i] += IXGBE_READ_REG(hw, IXGBE_QBTC(i)); adapter->stats.qprc[i] += IXGBE_READ_REG(hw, IXGBE_QPRC(i)); adapter->stats.qbrc[i] += IXGBE_READ_REG(hw, IXGBE_QBRC(i)); - if (hw->mac.type == ixgbe_mac_82599EB) { - adapter->stats.pxonrxc[i] += IXGBE_READ_REG(hw, - IXGBE_PXONRXCNT(i)); - adapter->stats.pxoffrxc[i] += IXGBE_READ_REG(hw, - IXGBE_PXOFFRXCNT(i)); - } else { - adapter->stats.pxonrxc[i] += IXGBE_READ_REG(hw, - IXGBE_PXONRXC(i)); - adapter->stats.pxoffrxc[i] += IXGBE_READ_REG(hw, - IXGBE_PXOFFRXC(i)); - } + adapter->stats.pxonrxc[i] += IXGBE_READ_REG(hw, + IXGBE_PXONRXC(i)); + adapter->stats.pxontxc[i] += IXGBE_READ_REG(hw, + IXGBE_PXONTXC(i)); + adapter->stats.pxoffrxc[i] += IXGBE_READ_REG(hw, + IXGBE_PXOFFRXC(i)); + adapter->stats.pxofftxc[i] += IXGBE_READ_REG(hw, + IXGBE_PXOFFTXC(i)); } adapter->stats.gprc += IXGBE_READ_REG(hw, IXGBE_GPRC); /* work around hardware counting issue */ adapter->stats.gprc -= missed_rx; /* 82598 hardware only has a 32 bit counter in the high register */ - if (hw->mac.type == ixgbe_mac_82599EB) { - adapter->stats.gorc += IXGBE_READ_REG(hw, IXGBE_GORCL); - IXGBE_READ_REG(hw, IXGBE_GORCH); /* to clear */ - adapter->stats.gotc += IXGBE_READ_REG(hw, IXGBE_GOTCL); - IXGBE_READ_REG(hw, IXGBE_GOTCH); /* to clear */ - adapter->stats.tor += IXGBE_READ_REG(hw, IXGBE_TORL); - IXGBE_READ_REG(hw, IXGBE_TORH); /* to clear */ - adapter->stats.lxonrxc += IXGBE_READ_REG(hw, IXGBE_LXONRXCNT); - 
adapter->stats.lxoffrxc += IXGBE_READ_REG(hw, IXGBE_LXOFFRXCNT);
- adapter->stats.fdirmatch += IXGBE_READ_REG(hw, IXGBE_FDIRMATCH);
- adapter->stats.fdirmiss += IXGBE_READ_REG(hw, IXGBE_FDIRMISS);
- } else {
- adapter->stats.lxonrxc += IXGBE_READ_REG(hw, IXGBE_LXONRXC);
- adapter->stats.lxoffrxc += IXGBE_READ_REG(hw, IXGBE_LXOFFRXC);
- adapter->stats.gorc += IXGBE_READ_REG(hw, IXGBE_GORCH);
- adapter->stats.gotc += IXGBE_READ_REG(hw, IXGBE_GOTCH);
- adapter->stats.tor += IXGBE_READ_REG(hw, IXGBE_TORH);
- }
+ adapter->stats.gorc += IXGBE_READ_REG(hw, IXGBE_GORCH);
+ adapter->stats.gotc += IXGBE_READ_REG(hw, IXGBE_GOTCH);
+ adapter->stats.tor += IXGBE_READ_REG(hw, IXGBE_TORH);
 bprc = IXGBE_READ_REG(hw, IXGBE_BPRC);
 adapter->stats.bprc += bprc;
 adapter->stats.mprc += IXGBE_READ_REG(hw, IXGBE_MPRC);
- if (hw->mac.type == ixgbe_mac_82598EB)
- adapter->stats.mprc -= bprc;
+ adapter->stats.mprc -= bprc;
 adapter->stats.roc += IXGBE_READ_REG(hw, IXGBE_ROC);
 adapter->stats.prc64 += IXGBE_READ_REG(hw, IXGBE_PRC64);
 adapter->stats.prc127 += IXGBE_READ_REG(hw, IXGBE_PRC127);
@@ -5418,6 +4352,8 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 adapter->stats.prc1023 += IXGBE_READ_REG(hw, IXGBE_PRC1023);
 adapter->stats.prc1522 += IXGBE_READ_REG(hw, IXGBE_PRC1522);
 adapter->stats.rlec += IXGBE_READ_REG(hw, IXGBE_RLEC);
+ adapter->stats.lxonrxc += IXGBE_READ_REG(hw, IXGBE_LXONRXC);
+ adapter->stats.lxoffrxc += IXGBE_READ_REG(hw, IXGBE_LXOFFRXC);
 lxon = IXGBE_READ_REG(hw, IXGBE_LXONTXC);
 adapter->stats.lxontxc += lxon;
 lxoff = IXGBE_READ_REG(hw, IXGBE_LXOFFTXC);
@@ -5465,135 +4401,31 @@ static void ixgbe_watchdog(unsigned long data)
{
 struct ixgbe_adapter *adapter = (struct ixgbe_adapter *)data;
 struct ixgbe_hw *hw = &adapter->hw;
- u64 eics = 0;
- int i;
- /*
- * Do the watchdog outside of interrupt context due to the lovely
- * delays that some of the newer hardware requires
- */
-
- if (test_bit(__IXGBE_DOWN, &adapter->state))
- goto watchdog_short_circuit;
-
-
- if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
- /*
- * for legacy and MSI interrupts don't set any bits
- * that are enabled for EIAM, because this operation
- * would set *both* EIMS and EICS for any bit in EIAM
- */
- IXGBE_WRITE_REG(hw, IXGBE_EICS,
- (IXGBE_EICS_TCP_TIMER | IXGBE_EICS_OTHER));
- goto watchdog_reschedule;
- }
-
- /* get one bit for every active tx/rx interrupt vector */
- for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++) {
- struct ixgbe_q_vector *qv = adapter->q_vector[i];
- if (qv->rxr_count || qv->txr_count)
- eics |= ((u64)1 << i);
+ /* Do the watchdog outside of interrupt context due to the lovely
+ * delays that some of the newer hardware requires */
+ if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
+ /* Cause software interrupt to ensure rx rings are cleaned */
+ if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
+ u32 eics =
+ (1 << (adapter->num_msix_vectors - NON_Q_VECTORS)) - 1;
+ IXGBE_WRITE_REG(hw, IXGBE_EICS, eics);
+ } else {
+ /* for legacy and MSI interrupts don't set any bits that
+ * are enabled for EIAM, because this operation would
+ * set *both* EIMS and EICS for any bit in EIAM */
+ IXGBE_WRITE_REG(hw, IXGBE_EICS,
+ (IXGBE_EICS_TCP_TIMER | IXGBE_EICS_OTHER));
+ }
+ /* Reset the timer */
+ mod_timer(&adapter->watchdog_timer,
+ round_jiffies(jiffies + 2 * HZ));
 }
- /* Cause software interrupt to ensure rings are cleaned */
- ixgbe_irq_rearm_queues(adapter, eics);
-
-watchdog_reschedule:
- /* Reset the timer */
- mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
- 
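(On the EICS value computed in the new watchdog above -- the removed half of the hunk continues below: with NON_Q_VECTORS reserved for non-queue interrupts, (1 << n) - 1 simply builds a mask with one bit set per queue vector, so writing it to EICS raises a software interrupt on every rx/tx vector at once. A self-contained sketch, with made-up example values:

#include <stdio.h>

/* Illustrative only; NON_Q_VECTORS and the vector count are example values. */
#define NON_Q_VECTORS 1

int main(void)
{
	int num_msix_vectors = 9;	/* e.g. 8 queue vectors + 1 "other" vector */
	unsigned int eics = (1 << (num_msix_vectors - NON_Q_VECTORS)) - 1;

	/* prints eics = 0x00ff: bits 0..7 set, one EICS bit per queue vector */
	printf("eics = 0x%04x\n", eics);
	return 0;
}
)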
-watchdog_short_circuit: schedule_work(&adapter->watchdog_task); } /** - * ixgbe_multispeed_fiber_task - worker thread to configure multispeed fiber - * @work: pointer to work_struct containing our data - **/ -static void ixgbe_multispeed_fiber_task(struct work_struct *work) -{ - struct ixgbe_adapter *adapter = container_of(work, - struct ixgbe_adapter, - multispeed_fiber_task); - struct ixgbe_hw *hw = &adapter->hw; - u32 autoneg; - - adapter->flags |= IXGBE_FLAG_IN_SFP_LINK_TASK; - if (hw->mac.ops.get_link_capabilities) - hw->mac.ops.get_link_capabilities(hw, &autoneg, - &hw->mac.autoneg); - if (hw->mac.ops.setup_link_speed) - hw->mac.ops.setup_link_speed(hw, autoneg, true, true); - adapter->flags |= IXGBE_FLAG_NEED_LINK_UPDATE; - adapter->flags &= ~IXGBE_FLAG_IN_SFP_LINK_TASK; -} - -/** - * ixgbe_sfp_config_module_task - worker thread to configure a new SFP+ module - * @work: pointer to work_struct containing our data - **/ -static void ixgbe_sfp_config_module_task(struct work_struct *work) -{ - struct ixgbe_adapter *adapter = container_of(work, - struct ixgbe_adapter, - sfp_config_module_task); - struct ixgbe_hw *hw = &adapter->hw; - u32 err; - - adapter->flags |= IXGBE_FLAG_IN_SFP_MOD_TASK; - err = hw->phy.ops.identify_sfp(hw); - if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { - DPRINTK(PROBE, ERR, "failed to load because an " - "unsupported SFP+ module type was detected.\n"); - unregister_netdev(adapter->netdev); - adapter->netdev_registered = false; - return; - } - /* - * A module may be identified correctly, but the EEPROM may not have - * support for that module. setup_sfp() will fail in that case, so - * we should not allow that module to load. - */ - err = hw->mac.ops.setup_sfp(hw); - if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { - DPRINTK(PROBE, ERR, "failed to load because an " - "unsupported SFP+ module type was detected.\n"); - unregister_netdev(adapter->netdev); - adapter->netdev_registered = false; - return; - } - - if (!(adapter->flags & IXGBE_FLAG_IN_SFP_LINK_TASK)) - /* This will also work for DA Twinax connections */ - schedule_work(&adapter->multispeed_fiber_task); - adapter->flags &= ~IXGBE_FLAG_IN_SFP_MOD_TASK; -} - -/** - * ixgbe_fdir_reinit_task - worker thread to reinit FDIR filter table - * @work: pointer to work_struct containing our data - **/ -static void ixgbe_fdir_reinit_task(struct work_struct *work) -{ - struct ixgbe_adapter *adapter = container_of(work, - struct ixgbe_adapter, - fdir_reinit_task); - struct ixgbe_hw *hw = &adapter->hw; - int i; - - if (ixgbe_reinit_fdir_tables_82599(hw) == 0) { - for (i = 0; i < adapter->num_tx_queues; i++) - set_bit(__IXGBE_FDIR_INIT_DONE, - &(adapter->tx_ring[i].reinit_state)); - } else { - DPRINTK(PROBE, ERR, "failed to finish FDIR re-initialization, " - "ignored adding FDIR ATR filters \n"); - } - /* Done FDIR Re-initialization, enable transmits */ - netif_tx_start_all_queues(adapter->netdev); -} - -/** * ixgbe_watchdog_task - worker thread to bring link up * @work: pointer to work_struct containing our data **/ @@ -5606,9 +4438,6 @@ static void ixgbe_watchdog_task(struct work_struct *work) struct ixgbe_hw *hw = &adapter->hw; u32 link_speed = adapter->link_speed; bool link_up = adapter->link_up; - int i; - struct ixgbe_ring *tx_ring; - int some_tx_pending = 0; adapter->flags |= IXGBE_FLAG_IN_WATCHDOG_TASK; @@ -5620,20 +4449,11 @@ static void ixgbe_watchdog_task(struct work_struct *work) link_speed = IXGBE_LINK_SPEED_10GB_FULL; link_up = true; } - if (link_up) { - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) { - for (i = 0; i < 
MAX_TRAFFIC_CLASS; i++) - hw->mac.ops.fc_enable(hw, i); - } else { - hw->mac.ops.fc_enable(hw, 0); - } - } - if (link_up || time_after(jiffies, (adapter->link_check_timeout + IXGBE_TRY_LINK_TIMEOUT))) { - adapter->flags &= ~IXGBE_FLAG_NEED_LINK_UPDATE; IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMC_LSC); + adapter->flags &= ~IXGBE_FLAG_NEED_LINK_UPDATE; } adapter->link_up = link_up; adapter->link_speed = link_speed; @@ -5641,28 +4461,19 @@ static void ixgbe_watchdog_task(struct work_struct *work) if (link_up) { if (!netif_carrier_ok(netdev)) { - bool flow_rx, flow_tx; - - if (hw->mac.type == ixgbe_mac_82599EB) { - u32 mflcn = IXGBE_READ_REG(hw, IXGBE_MFLCN); - u32 fccfg = IXGBE_READ_REG(hw, IXGBE_FCCFG); - flow_rx = (mflcn & IXGBE_MFLCN_RFCE); - flow_tx = (fccfg & IXGBE_FCCFG_TFCE_802_3X); - } else { - u32 frctl = IXGBE_READ_REG(hw, IXGBE_FCTRL); - u32 rmcs = IXGBE_READ_REG(hw, IXGBE_RMCS); - flow_rx = (frctl & IXGBE_FCTRL_RFCE); - flow_tx = (rmcs & IXGBE_RMCS_TFCE_802_3X); - } + u32 frctl = IXGBE_READ_REG(hw, IXGBE_FCTRL); + u32 rmcs = IXGBE_READ_REG(hw, IXGBE_RMCS); +#define FLOW_RX (frctl & IXGBE_FCTRL_RFCE) +#define FLOW_TX (rmcs & IXGBE_RMCS_TFCE_802_3X) DPRINTK(LINK, INFO, "NIC Link is Up %s, " "Flow Control: %s\n", (link_speed == IXGBE_LINK_SPEED_10GB_FULL ? "10 Gbps" : (link_speed == IXGBE_LINK_SPEED_1GB_FULL ? "1 Gbps" : "unknown speed")), - ((flow_rx && flow_tx) ? "RX/TX" : - (flow_rx ? "RX" : - (flow_tx ? "TX" : "None")))); + ((FLOW_RX && FLOW_TX) ? "RX/TX" : + (FLOW_RX ? "RX" : + (FLOW_TX ? "TX" : "None")))); netif_carrier_on(netdev); netif_tx_wake_all_queues(netdev); @@ -5680,33 +4491,8 @@ static void ixgbe_watchdog_task(struct work_struct *work) } } - if (!netif_carrier_ok(netdev)) { - for (i = 0; i < adapter->num_tx_queues; i++) { - tx_ring = &adapter->tx_ring[i]; - if (tx_ring->next_to_use != tx_ring->next_to_clean) { - some_tx_pending = 1; - break; - } - } - - if (some_tx_pending) { - /* We''ve lost link, so the controller stops DMA, - * but we''ve got queued Tx work that''s never going - * to get done, so reset controller to flush Tx. - * (Do the reset outside of interrupt context). 
- */
- schedule_work(&adapter->reset_task);
- }
- }
-
 ixgbe_update_stats(adapter);
 adapter->flags &= ~IXGBE_FLAG_IN_WATCHDOG_TASK;
-
- if (adapter->flags & IXGBE_FLAG_NEED_LINK_UPDATE) {
- /* poll faster when waiting for link */
- mod_timer(&adapter->watchdog_timer, jiffies + (HZ/10));
- }
-
 }
 
static int ixgbe_tso(struct ixgbe_adapter *adapter, struct ixgbe_ring *tx_ring,
@@ -5740,11 +4526,12 @@ static int ixgbe_tso(struct ixgbe_adapter *adapter, struct ixgbe_ring *tx_ring,
 adapter->hw_tso_ctxt++;
 #ifdef NETIF_F_TSO6
 } else if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6) {
- ipv6_hdr(skb)->payload_len = 0;
- tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
- &ipv6_hdr(skb)->daddr,
- 0, IPPROTO_TCP, 0);
+ struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+ ipv6h->payload_len = 0;
+ tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6h->saddr,
+ &ipv6h->daddr,
+ 0, IPPROTO_TCP,
+ 0);
 adapter->hw_tso6_ctxt++;
 #endif
 }
@@ -5876,35 +4663,33 @@ static bool ixgbe_tx_csum(struct ixgbe_adapter *adapter,
 }
 
 static int ixgbe_tx_map(struct ixgbe_adapter *adapter,
- struct ixgbe_ring *tx_ring,
- struct sk_buff *skb, u32 tx_flags,
+ struct ixgbe_ring *tx_ring, struct sk_buff *skb,
 unsigned int first)
{
 struct ixgbe_tx_buffer *tx_buffer_info;
- unsigned int len;
- unsigned int total = skb->len;
+ unsigned int len = skb->len;
 unsigned int offset = 0, size, count = 0, i;
 #ifdef MAX_SKB_FRAGS
 unsigned int nr_frags = skb_shinfo(skb)->nr_frags;
 unsigned int f;
+
+ len -= skb->data_len;
 #endif
 i = tx_ring->next_to_use;
- len = min(skb_headlen(skb), total);
 while (len) {
 tx_buffer_info = &tx_ring->tx_buffer_info[i];
 size = min(len, (unsigned int)IXGBE_MAX_DATA_PER_TXD);
 tx_buffer_info->length = size;
 tx_buffer_info->dma = pci_map_single(adapter->pdev,
- skb->data + offset,
- size, PCI_DMA_TODEVICE);
+ skb->data + offset, size,
+ PCI_DMA_TODEVICE);
 tx_buffer_info->time_stamp = jiffies;
 tx_buffer_info->next_to_watch = i;
 len -= size;
- total -= size;
 offset += size;
 count++;
 i++;
@@ -5917,7 +4702,7 @@ static int ixgbe_tx_map(struct ixgbe_adapter *adapter,
 struct skb_frag_struct *frag;
 frag = &skb_shinfo(skb)->frags[f];
- len = min( (unsigned int)frag->size, total);
+ len = frag->size;
 offset = frag->page_offset;
 while (len) {
@@ -5926,24 +4711,21 @@ static int ixgbe_tx_map(struct ixgbe_adapter *adapter,
 tx_buffer_info->length = size;
 tx_buffer_info->dma = pci_map_page(adapter->pdev,
- frag->page,
- offset,
+ frag->page, offset,
 size, PCI_DMA_TODEVICE);
 tx_buffer_info->time_stamp = jiffies;
 tx_buffer_info->next_to_watch = i;
 len -= size;
- total -= size;
 offset += size;
 count++;
 i++;
 if (i == tx_ring->count)
 i = 0;
 }
- if (total == 0)
- break;
 }
+
 #endif
 if (i == 0)
 i = tx_ring->count - 1;
@@ -5988,6 +4770,7 @@ static void ixgbe_tx_queue(struct ixgbe_adapter *adapter,
 } else if (tx_flags & IXGBE_TX_FLAGS_CSUM)
 olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
 IXGBE_ADVTXD_POPTS_SHIFT;
+
 olinfo_status |= ((paylen - hdr_len) <<
 IXGBE_ADVTXD_PAYLEN_SHIFT);
 i = tx_ring->next_to_use;
@@ -6017,64 +4800,13 @@ static void ixgbe_tx_queue(struct ixgbe_adapter *adapter,
 writel(i, adapter->hw.hw_addr + tx_ring->tail);
 }
-static void ixgbe_atr(struct ixgbe_adapter *adapter, struct sk_buff *skb,
- int queue, u32 tx_flags)
-{
- /* Right now, we support IPv4 only */
- struct ixgbe_atr_input atr_input;
- struct tcphdr *th;
- struct udphdr *uh;
- struct iphdr *iph = ip_hdr(skb);
- struct ethhdr *eth = (struct ethhdr *)skb->data;
- u16 vlan_id, src_port, dst_port, flex_bytes;
- u32 src_ipv4_addr, dst_ipv4_addr;
- u8 l4type = 0;
-
- /* check if 
we''re UDP or TCP */ - if (iph->protocol == IPPROTO_TCP) { - th = tcp_hdr(skb); - src_port = th->source; - dst_port = th->dest; - l4type |= IXGBE_ATR_L4TYPE_TCP; - /* l4type IPv4 type is 0, no need to assign */ - } else if(iph->protocol == IPPROTO_UDP) { - uh = udp_hdr(skb); - src_port = uh->source; - dst_port = uh->dest; - l4type |= IXGBE_ATR_L4TYPE_UDP; - /* l4type IPv4 type is 0, no need to assign */ - } else { - /* Unsupported L4 header, just bail here */ - return; - } - - memset(&atr_input, 0, sizeof(struct ixgbe_atr_input)); - - vlan_id = (tx_flags & IXGBE_TX_FLAGS_VLAN_MASK) >> - IXGBE_TX_FLAGS_VLAN_SHIFT; - src_ipv4_addr = iph->saddr; - dst_ipv4_addr = iph->daddr; - flex_bytes = eth->h_proto; - - ixgbe_atr_set_vlan_id_82599(&atr_input, vlan_id); - ixgbe_atr_set_src_port_82599(&atr_input, dst_port); - ixgbe_atr_set_dst_port_82599(&atr_input, src_port); - ixgbe_atr_set_flex_byte_82599(&atr_input, flex_bytes); - ixgbe_atr_set_l4type_82599(&atr_input, l4type); - /* src and dst are inverted, think how the receiver sees them */ - ixgbe_atr_set_src_ipv4_82599(&atr_input, dst_ipv4_addr); - ixgbe_atr_set_dst_ipv4_82599(&atr_input, src_ipv4_addr); - - /* This assumes the Rx queue and Tx queue are bound to the same CPU */ - ixgbe_fdir_add_signature_filter_82599(&adapter->hw, &atr_input, queue); -} - static int __ixgbe_maybe_stop_tx(struct net_device *netdev, struct ixgbe_ring *tx_ring, int size) { struct ixgbe_adapter *adapter = netdev_priv(netdev); netif_stop_subqueue(netdev, tx_ring->queue_index); + /* Herbert''s original patch had: * smp_mb__after_netif_stop_queue(); * but since that doesn''t exist yet, just open code it. */ @@ -6112,8 +4844,9 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) #ifdef MAX_SKB_FRAGS unsigned int f; #endif + #ifdef HAVE_TX_MQ - r_idx = skb->queue_mapping; + r_idx = (adapter->num_tx_queues - 1) & skb->queue_mapping; #endif tx_ring = &adapter->tx_ring[r_idx]; @@ -6136,25 +4869,26 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) #endif } #endif - /* four things can cause us to need a context descriptor */ + /* three things can cause us to need a context descriptor */ if (skb_is_gso(skb) || (skb->ip_summed == CHECKSUM_PARTIAL) || (tx_flags & IXGBE_TX_FLAGS_VLAN)) count++; + count += TXD_USE_COUNT(skb_headlen(skb)); #ifdef MAX_SKB_FRAGS for (f = 0; f < skb_shinfo(skb)->nr_frags; f++) count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size); - #endif + if (ixgbe_maybe_stop_tx(netdev, tx_ring, count)) { adapter->tx_busy++; return NETDEV_TX_BUSY; } - first = tx_ring->next_to_use; if (skb->protocol == htons(ETH_P_IP)) tx_flags |= IXGBE_TX_FLAGS_IPV4; + first = tx_ring->next_to_use; tso = ixgbe_tso(adapter, tx_ring, skb, tx_flags, &hdr_len); if (tso < 0) { dev_kfree_skb_any(skb); @@ -6167,18 +4901,9 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) (skb->ip_summed == CHECKSUM_PARTIAL)) tx_flags |= IXGBE_TX_FLAGS_CSUM; - /* add the ATR filter if ATR is on */ - if (tx_ring->atr_sample_rate) { - ++tx_ring->atr_count; - if ((tx_ring->atr_count >= tx_ring->atr_sample_rate) && - test_bit(__IXGBE_FDIR_INIT_DONE, &tx_ring->reinit_state)) { - ixgbe_atr(adapter, skb, tx_ring->queue_index, tx_flags); - tx_ring->atr_count = 0; - } - } ixgbe_tx_queue(adapter, tx_ring, tx_flags, - ixgbe_tx_map(adapter, tx_ring, skb, tx_flags, first), - skb->len, hdr_len); + ixgbe_tx_map(adapter, tx_ring, skb, first), + skb->len, hdr_len); netdev->trans_start = jiffies; @@ -6227,50 +4952,6 @@ static int 
ixgbe_set_mac(struct net_device *netdev, void *p) return 0; } -#if defined(HAVE_NETDEV_STORAGE_ADDRESS) && defined(NETDEV_HW_ADDR_T_SAN) -/** - * ixgbe_add_sanmac_netdev - Add the SAN MAC address to the corresponding - * netdev->dev_addr_list - * @netdev: network interface device structure - * - * Returns non-zero on failure - **/ -static int ixgbe_add_sanmac_netdev(struct net_device *dev) -{ - int err = 0; - struct ixgbe_adapter *adapter = netdev_priv(dev); - struct ixgbe_mac_info *mac = &adapter->hw.mac; - - if (is_valid_ether_addr(mac->san_addr)) { - rtnl_lock(); - err = dev_addr_add(dev, mac->san_addr, NETDEV_HW_ADDR_T_SAN); - rtnl_unlock(); - } - return err; -} - -/** - * ixgbe_del_sanmac_netdev - Removes the SAN MAC address to the corresponding - * netdev->dev_addr_list - * @netdev: network interface device structure - * - * Returns non-zero on failure - **/ -static int ixgbe_del_sanmac_netdev(struct net_device *dev) -{ - int err = 0; - struct ixgbe_adapter *adapter = netdev_priv(dev); - struct ixgbe_mac_info *mac = &adapter->hw.mac; - - if (is_valid_ether_addr(mac->san_addr)) { - rtnl_lock(); - err = dev_addr_del(dev, mac->san_addr, NETDEV_HW_ADDR_T_SAN); - rtnl_unlock(); - } - return err; -} - -#endif /* (HAVE_NETDEV_STORAGE_ADDRESS) && defined(NETDEV_HW_ADDR_T_SAN) */ #ifdef ETHTOOL_OPS_COMPAT /** * ixgbe_ioctl - @@ -6289,6 +4970,190 @@ static int ixgbe_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) } #endif + +#ifdef CONFIG_XEN_NETDEV2_VMQ +int ixgbe_get_avail_queues(struct net_device *netdev, unsigned int queue_type) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + if (queue_type == VMQ_TYPE_RX) + return (adapter->num_rx_queues - adapter->rx_queues_allocated) - 1; + else if (queue_type == VMQ_TYPE_TX) + return 0; + else + return 0; +} + +int ixgbe_get_vmq_maxsize(struct net_device *netdev) +{ + return IXGBE_MAX_TXD; +} + +int ixgbe_alloc_vmq_queue(struct net_device *netdev, unsigned int queue_type) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + + if (queue_type == VMQ_TYPE_TX) + return -EINVAL; + + if (adapter->rx_queues_allocated >= adapter->num_rx_queues) { + return -EINVAL; + } else { + int i; + for (i = 1; i < adapter->num_rx_queues; i++) { + if (!adapter->rx_ring[i].allocated) { + adapter->rx_ring[i].allocated = TRUE; + adapter->rx_queues_allocated++; + return i; + } + } + return -EINVAL; + } +} + +int ixgbe_free_vmq_queue(struct net_device *netdev, int queue) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + + if (queue >= adapter->num_rx_queues) + return -EINVAL; + + if (!adapter->rx_ring[queue].allocated) + return -EINVAL; + + adapter->rx_ring[queue].allocated = FALSE; + adapter->rx_queues_allocated--; + ixgbe_clean_rx_ring(adapter, &adapter->rx_ring[queue]); + + return 0; +} + +int ixgbe_set_rxqueue_macfilter(struct net_device *netdev, int queue, + u8 *mac_addr) +{ + int err = 0; + u32 rah; + struct ixgbe_adapter *adapter = netdev_priv(netdev); + struct ixgbe_hw *hw = &adapter->hw; + struct ixgbe_ring *rx_ring = &adapter->rx_ring[queue]; + + if ((queue < 0) || (queue > adapter->num_rx_queues)) + return -EADDRNOTAVAIL; + + /* Note: Broadcast address is used to disable the MAC filter*/ + if (!is_valid_ether_addr(mac_addr)) { + + memset(rx_ring->mac_addr, 0xFF, ETH_ALEN); + + /* Clear RAR */ + IXGBE_WRITE_REG(hw, IXGBE_RAL(queue), 0); + IXGBE_WRITE_FLUSH(hw); + IXGBE_WRITE_REG(hw, IXGBE_RAH(queue), 0); + IXGBE_WRITE_FLUSH(hw); + + return -EADDRNOTAVAIL; + } + + /* Store in ring */ + memcpy(rx_ring->mac_addr, mac_addr, 
ETH_ALEN); + + err = ixgbe_set_rar(&adapter->hw, queue, rx_ring->mac_addr, 1, + IXGBE_RAH_AV); + + if (!err) { + /* Set the VIND for the indicated queue''s RAR Entry */ + rah = IXGBE_READ_REG(hw, IXGBE_RAH(queue)); + rah &= ~IXGBE_RAH_VIND_MASK; + rah |= (queue << IXGBE_RAH_VIND_SHIFT); + IXGBE_WRITE_REG(hw, IXGBE_RAH(queue), rah); + IXGBE_WRITE_FLUSH(hw); + } + + return err; +} + +int ixgbe_get_vmq_size(struct net_device *netdev, int queue) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + + if (queue >= adapter->num_rx_queues) + return -EINVAL; + return adapter->rx_ring[queue].count; +} + +int ixgbe_set_vmq_size(struct net_device *netdev, int queue, int size) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + /* Not implemented yet, so just return count. */ + return adapter->rx_ring[queue].count; +} + +int ixgbe_set_vmq_vlan(struct net_device *netdev, int queue, int vlan_id) +{ + return 0; /* not implemented */ +} + +int ixgbe_vmq_enable(struct net_device *netdev, int queue) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + struct ixgbe_hw *hw = &adapter->hw; + u32 rxdctl; + + if (queue >= adapter->num_rx_queues) + return -EINVAL; + + if (!adapter->rx_ring[queue].allocated) + return -EINVAL; + adapter->rx_ring[queue].active = 1; + rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(queue)); + rxdctl |= IXGBE_RXDCTL_ENABLE; + IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(queue), rxdctl); + IXGBE_WRITE_FLUSH(hw); + ixgbe_alloc_rx_buffers(adapter, + &adapter->rx_ring[queue], + IXGBE_DESC_UNUSED(&adapter->rx_ring[queue])); + return 0; +} +int ixgbe_vmq_disable(struct net_device *netdev, int queue) +{ + struct ixgbe_adapter *adapter = netdev_priv(netdev); + struct ixgbe_hw *hw = &adapter->hw; + u32 rxdctl; + + if (queue >= adapter->num_rx_queues) + return -EINVAL; + + if (!adapter->rx_ring[queue].allocated) + return -EINVAL; + + adapter->rx_ring[queue].active = 0; + rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(queue)); + rxdctl &= ~IXGBE_RXDCTL_ENABLE; + IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(queue), rxdctl); + return 0; +} + +static void ixgbe_setup_vmq(struct ixgbe_adapter *adapter) +{ + net_vmq_t *vmq; + + vmq = alloc_vmq(adapter->num_rx_queues); + if (vmq) { + vmq->avail_queues = ixgbe_get_avail_queues; + vmq->alloc_queue = ixgbe_alloc_vmq_queue; + vmq->free_queue = ixgbe_free_vmq_queue; + vmq->get_maxsize = ixgbe_get_vmq_maxsize; + vmq->get_size = ixgbe_get_vmq_size; + vmq->set_size = ixgbe_set_vmq_size; + vmq->set_mac = ixgbe_set_rxqueue_macfilter; + vmq->set_vlan = ixgbe_set_vmq_vlan; + vmq->enable = ixgbe_vmq_enable; + vmq->disable = ixgbe_vmq_disable; + vmq->nvmq = adapter->num_rx_queues; + adapter->netdev->vmq = vmq; + } +} +#endif /* CONFIG_XEN_NETDEV2_VMQ */ + #ifdef CONFIG_NET_POLL_CONTROLLER /* * Polling ''interrupt'' - used by things like netconsole to send skbs @@ -6304,88 +5169,39 @@ static void ixgbe_netpoll(struct net_device *netdev) adapter->flags |= IXGBE_FLAG_IN_NETPOLL; ixgbe_intr(adapter->pdev->irq, netdev); adapter->flags &= ~IXGBE_FLAG_IN_NETPOLL; - ixgbe_irq_enable(adapter, true, true); + ixgbe_irq_enable(adapter); } #endif -#ifdef HAVE_NETDEV_SELECT_QUEUE -static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb) + +/** + * ixgbe_link_config - set up initial link with default speed and duplex + * @hw: pointer to private hardware struct + * + * Returns 0 on success, negative on failure + **/ +static int ixgbe_link_config(struct ixgbe_hw *hw) { - struct ixgbe_adapter *adapter = netdev_priv(dev); + u32 autoneg; + bool link_up = false; + u32 ret = 
IXGBE_ERR_LINK_SETUP; - if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) - return smp_processor_id(); + if (hw->mac.ops.check_link) + ret = hw->mac.ops.check_link(hw, &autoneg, &link_up, false); - if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) - return 0; /* all untagged traffic should default to TC 0 */ - - return skb_tx_hash(dev, skb); -} - -#endif /* HAVE_NETDEV_SELECT_QUEUE */ -#ifdef HAVE_NET_DEVICE_OPS -static const struct net_device_ops ixgbe_netdev_ops = { - .ndo_open = &ixgbe_open, - .ndo_stop = &ixgbe_close, - .ndo_start_xmit = &ixgbe_xmit_frame, - .ndo_get_stats = &ixgbe_get_stats, - .ndo_set_rx_mode = &ixgbe_set_rx_mode, - .ndo_set_multicast_list = &ixgbe_set_rx_mode, - .ndo_validate_addr = eth_validate_addr, - .ndo_set_mac_address = &ixgbe_set_mac, - .ndo_change_mtu = &ixgbe_change_mtu, -#ifdef ETHTOOL_OPS_COMPAT - .ndo_do_ioctl = &ixgbe_ioctl, -#endif - .ndo_tx_timeout = &ixgbe_tx_timeout, - .ndo_vlan_rx_register = &ixgbe_vlan_rx_register, - .ndo_vlan_rx_add_vid = &ixgbe_vlan_rx_add_vid, - .ndo_vlan_rx_kill_vid = &ixgbe_vlan_rx_kill_vid, -#ifdef CONFIG_NET_POLL_CONTROLLER - .ndo_poll_controller = &ixgbe_netpoll, -#endif - .ndo_select_queue = &ixgbe_select_queue, -}; + if (ret || !link_up) + goto link_cfg_out; -#endif /* HAVE_NET_DEVICE_OPS */ + if (hw->mac.ops.get_link_capabilities) + ret = hw->mac.ops.get_link_capabilities(hw, &autoneg, + &hw->mac.autoneg); + if (ret) + goto link_cfg_out; -void ixgbe_assign_netdev_ops(struct net_device *dev) -{ - struct ixgbe_adapter *adapter; - adapter = netdev_priv(dev); -#ifdef HAVE_NET_DEVICE_OPS - dev->netdev_ops = &ixgbe_netdev_ops; -#else /* HAVE_NET_DEVICE_OPS */ - dev->open = &ixgbe_open; - dev->stop = &ixgbe_close; - dev->hard_start_xmit = &ixgbe_xmit_frame; - dev->get_stats = &ixgbe_get_stats; -#ifdef HAVE_SET_RX_MODE - dev->set_rx_mode = &ixgbe_set_rx_mode; -#endif - dev->set_multicast_list = &ixgbe_set_rx_mode; - dev->set_mac_address = &ixgbe_set_mac; - dev->change_mtu = &ixgbe_change_mtu; -#ifdef ETHTOOL_OPS_COMPAT - dev->do_ioctl = &ixgbe_ioctl; -#endif -#ifdef HAVE_TX_TIMEOUT - dev->tx_timeout = &ixgbe_tx_timeout; -#endif -#ifdef NETIF_F_HW_VLAN_TX - dev->vlan_rx_register = &ixgbe_vlan_rx_register; - dev->vlan_rx_add_vid = &ixgbe_vlan_rx_add_vid; - dev->vlan_rx_kill_vid = &ixgbe_vlan_rx_kill_vid; -#endif -#ifdef CONFIG_NET_POLL_CONTROLLER - dev->poll_controller = &ixgbe_netpoll; -#endif -#ifdef HAVE_NETDEV_SELECT_QUEUE - dev->select_queue = &ixgbe_select_queue; -#endif /* HAVE_NETDEV_SELECT_QUEUE */ -#endif /* HAVE_NET_DEVICE_OPS */ - ixgbe_set_ethtool_ops(dev); - dev->watchdog_timeo = 5 * HZ; + if (hw->mac.ops.setup_link_speed) + ret = hw->mac.ops.setup_link_speed(hw, autoneg, true, true); +link_cfg_out: + return ret; } /** @@ -6407,7 +5223,6 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, struct ixgbe_hw *hw = NULL; static int cards_found; int i, err, pci_using_dac; - u32 part_num; err = pci_enable_device(pdev); if (err) @@ -6435,28 +5250,6 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, goto err_pci_reg; } - /* - * Workaround of Silicon errata on 82598. Disable LOs in the PCI switch - * port to which the 82598 is connected to prevent duplicate - * completions caused by LOs. We need the mac type so that we only - * do this on 82598 devices, ixgbe_set_mac_type does this for us if - * we set it''s device ID. 
- */ - hw = vmalloc(sizeof(struct ixgbe_hw)); - if (!hw) { - printk(KERN_INFO "Unable to allocate memory for LOs fix " - "- not checked\n"); - } else { - hw->vendor_id = pdev->vendor; - hw->device_id = pdev->device; - ixgbe_set_mac_type(hw); - if (hw->mac.type == ixgbe_mac_82598EB) - pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S); - vfree(hw); - } - - pci_enable_pcie_error_reporting(pdev); - pci_set_master(pdev); #ifdef HAVE_TX_MQ @@ -6481,12 +5274,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, adapter->msg_enable = (1 << DEFAULT_DEBUG_LEVEL_SHIFT) - 1; #ifdef HAVE_PCI_ERS - /* - * call save state here in standalone driver because it relies on - * adapter struct to exist, and needs to call netdev_priv - */ pci_save_state(pdev); - #endif hw->hw_addr = ioremap(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0)); @@ -6500,8 +5288,32 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, continue; } - ixgbe_assign_netdev_ops(netdev); - + netdev->open = &ixgbe_open; + netdev->stop = &ixgbe_close; + netdev->hard_start_xmit = &ixgbe_xmit_frame; + netdev->get_stats = &ixgbe_get_stats; +#ifdef HAVE_SET_RX_MODE + netdev->set_rx_mode = &ixgbe_set_rx_mode; +#endif + netdev->set_multicast_list = &ixgbe_set_rx_mode; + netdev->set_mac_address = &ixgbe_set_mac; + netdev->change_mtu = &ixgbe_change_mtu; +#ifdef ETHTOOL_OPS_COMPAT + netdev->do_ioctl = &ixgbe_ioctl; +#endif + ixgbe_set_ethtool_ops(netdev); +#ifdef HAVE_TX_TIMEOUT + netdev->tx_timeout = &ixgbe_tx_timeout; + netdev->watchdog_timeo = 5 * HZ; +#endif +#ifdef NETIF_F_HW_VLAN_TX + netdev->vlan_rx_register = ixgbe_vlan_rx_register; + netdev->vlan_rx_add_vid = ixgbe_vlan_rx_add_vid; + netdev->vlan_rx_kill_vid = ixgbe_vlan_rx_kill_vid; +#endif +#ifdef CONFIG_NET_POLL_CONTROLLER + netdev->poll_controller = ixgbe_netpoll; +#endif strcpy(netdev->name, pci_name(pdev)); adapter->bd_number = cards_found; @@ -6524,47 +5336,14 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, INIT_WORK(&adapter->sfp_task, ixgbe_sfp_task); - /* multispeed fiber has its own tasklet, called from GPI SDP1 context */ - INIT_WORK(&adapter->multispeed_fiber_task, ixgbe_multispeed_fiber_task); - - /* a new SFP+ module arrival, called from GPI SDP2 context */ - INIT_WORK(&adapter->sfp_config_module_task, - ixgbe_sfp_config_module_task); - /* setup the private structure */ err = ixgbe_sw_init(adapter); if (err) goto err_sw_init; - /* - * If we have a fan, this is as early we know, warn if we - * have had a failure. - */ - if (adapter->flags & IXGBE_FLAG_FAN_FAIL_CAPABLE) { - u32 esdp = IXGBE_READ_REG(hw, IXGBE_ESDP); - if (esdp & IXGBE_ESDP_SDP1) - DPRINTK(PROBE, CRIT, - "Fan has stopped, replace the adapter\n"); - } - /* reset_hw fills in the perm_addr as well */ err = hw->mac.ops.reset_hw(hw); - if (err == IXGBE_ERR_SFP_NOT_PRESENT && - hw->mac.type == ixgbe_mac_82598EB) { - /* - * Start a kernel thread to watch for a module to arrive. - * Only do this for 82598, since 82599 will generate interrupts - * on module arrival. 
- */ - set_bit(__IXGBE_SFP_MODULE_NOT_FOUND, &adapter->state); - mod_timer(&adapter->sfp_timer, - round_jiffies(jiffies + (2 * HZ))); - err = 0; - } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) { - DPRINTK(PROBE, ERR, "failed to load because an " - "unsupported SFP+ module type was detected.\n"); - goto err_sw_init; - } else if (err) { + if (err) { DPRINTK(PROBE, ERR, "HW Init failed: %d\n", err); goto err_sw_init; } @@ -6595,34 +5374,11 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, netdev->features |= NETIF_F_TSO6; #endif /* NETIF_F_TSO6 */ #endif /* NETIF_F_TSO */ -#ifdef NETIF_F_GRO - netdev->features |= NETIF_F_GRO; -#endif /* NETIF_F_GRO */ if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; - if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) - adapter->flags &= ~(IXGBE_FLAG_FDIR_HASH_CAPABLE - | IXGBE_FLAG_FDIR_PERFECT_CAPABLE); -#ifndef IXGBE_NO_HW_RSC - if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE) { -#ifdef NETIF_F_LRO - netdev->features |= NETIF_F_LRO; -#endif -#ifndef IXGBE_NO_LRO - adapter->flags2 &= ~IXGBE_FLAG2_SWLRO_ENABLED; -#endif - adapter->flags2 |= IXGBE_FLAG2_RSC_ENABLED; - } else { -#endif -#ifndef IXGBE_NO_LRO -#ifdef NETIF_F_LRO - netdev->features |= NETIF_F_LRO; -#endif - adapter->flags2 |= IXGBE_FLAG2_SWLRO_ENABLED; -#endif -#ifndef IXGBE_NO_HW_RSC - adapter->flags2 &= ~IXGBE_FLAG2_RSC_ENABLED; - } +#ifndef IXGBE_NO_INET_LRO + netdev->features |= NETIF_F_LRO; + #endif #ifdef HAVE_NETDEV_VLAN_FEATURES #ifdef NETIF_F_TSO @@ -6635,10 +5391,6 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, netdev->vlan_features |= NETIF_F_SG; #endif /* HAVE_NETDEV_VLAN_FEATURES */ -#ifdef CONFIG_DCB - netdev->dcbnl_ops = &dcbnl_ops; -#endif - if (pci_using_dac) netdev->features |= NETIF_F_HIGHDMA; @@ -6668,6 +5420,9 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, } #endif + if (hw->mac.ops.get_bus_info) + hw->mac.ops.get_bus_info(hw); + init_timer(&adapter->watchdog_timer); adapter->watchdog_timer.function = &ixgbe_watchdog; adapter->watchdog_timer.data = (unsigned long)adapter; @@ -6679,61 +5434,45 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, if (err) goto err_sw_init; - switch (pdev->device) { - case IXGBE_DEV_ID_82599_KX4: - adapter->wol = (IXGBE_WUFC_MAG | IXGBE_WUFC_EX | - IXGBE_WUFC_MC | IXGBE_WUFC_BC); - /* Enable ACPI wakeup in GRC */ - IXGBE_WRITE_REG(hw, IXGBE_GRC, - (IXGBE_READ_REG(hw, IXGBE_GRC) & ~IXGBE_GRC_APME)); - break; - default: - adapter->wol = 0; - break; - } - device_init_wakeup(&adapter->pdev->dev, true); - device_set_wakeup_enable(&adapter->pdev->dev, adapter->wol); - - /* save off EEPROM version number */ - ixgbe_read_eeprom(hw, 0x29, &adapter->eeprom_version); - /* reset the hardware with the new settings */ - err = hw->mac.ops.start_hw(hw); - if (err == IXGBE_ERR_EEPROM_VERSION) { - /* We are running on a pre-production device, log a warning */ - DPRINTK(PROBE, INFO, "This device is a pre-production adapter/" - "LOM. Please be aware there may be issues associated " - "with your hardware. 
If you are experiencing problems " - "please contact your Intel or hardware representative " - "who provided you with this hardware.\n"); - } - /* pick up the PCI bus settings for reporting later */ - if (hw->mac.ops.get_bus_info) - hw->mac.ops.get_bus_info(hw); + hw->mac.ops.start_hw(hw); + /* link_config depends on ixgbe_start_hw being called at least once */ + err = ixgbe_link_config(hw); + if (err) { + DPRINTK(PROBE, ERR, "setup_link_speed FAILED %d\n", err); + goto err_register; + } netif_carrier_off(netdev); netif_tx_stop_all_queues(netdev); - if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE || - adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE) - INIT_WORK(&adapter->fdir_reinit_task, ixgbe_fdir_reinit_task); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_add_all(adapter); +#endif strcpy(netdev->name, "eth%d"); +#ifdef CONFIG_XEN_NETDEV2_VMQ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) + ixgbe_setup_vmq(adapter); +#endif err = register_netdev(netdev); if (err) goto err_register; - adapter->netdev_registered = true; +#ifndef CONFIG_XEN_NETDEV2_VMQ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) + ixgbe_sysfs_create(adapter); +#endif + #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) if (adapter->flags & IXGBE_FLAG_DCA_CAPABLE) { - err = dca_add_requester(&pdev->dev); - if (err == 0) { + if (dca_add_requester(&pdev->dev) == 0) { adapter->flags |= IXGBE_FLAG_DCA_ENABLED; + /* always use CB2 mode, difference is masked + * in the CB driver */ + IXGBE_WRITE_REG(hw, IXGBE_DCA_CTRL, 2); ixgbe_setup_dca(adapter); - } else { - DPRINTK(PROBE, INFO, "DCA registration failed: %d\n", - err); } } @@ -6741,8 +5480,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, /* print all messages at the end so that we use our eth%d name */ /* print bus type/speed/width info */ DPRINTK(PROBE, INFO, "(PCI Express:%s:%s) ", - ((hw->bus.speed == ixgbe_bus_speed_5000) ? "5.0Gb/s": - (hw->bus.speed == ixgbe_bus_speed_2500) ? "2.5Gb/s":"Unknown"), + ((hw->bus.speed == ixgbe_bus_speed_2500) ? "2.5Gb/s":"Unknown"), (hw->bus.width == ixgbe_bus_width_pcie_x8) ? "Width x8" : (hw->bus.width == ixgbe_bus_width_pcie_x4) ? "Width x4" : (hw->bus.width == ixgbe_bus_width_pcie_x1) ? "Width x1" : @@ -6752,15 +5490,13 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, for (i = 0; i < 6; i++) printk("%2.2x%c", netdev->dev_addr[i], i == 5 ? 
'\n' : ':');
 
- ixgbe_read_pba_num(hw, &part_num);
- if (ixgbe_is_sfp(hw) && hw->phy.sfp_type != ixgbe_sfp_type_not_present)
- DPRINTK(PROBE, INFO, "MAC: %d, PHY: %d, SFP+: %d, PBA No: %06x-%03x\n",
- hw->mac.type, hw->phy.type, hw->phy.sfp_type,
- (part_num >> 8), (part_num & 0xff));
+ if ((hw->phy.type == ixgbe_phy_nl) &&
+ (hw->phy.sfp_type != ixgbe_sfp_type_not_present))
+ DPRINTK(PROBE, INFO, "MAC: %d, PHY: %d, SFP+: %d\n",
+ hw->mac.type, hw->phy.type, hw->phy.sfp_type);
 else
- DPRINTK(PROBE, INFO, "MAC: %d, PHY: %d, PBA No: %06x-%03x\n",
- hw->mac.type, hw->phy.type,
- (part_num >> 8), (part_num & 0xff));
+ DPRINTK(PROBE, INFO, "MAC: %d, PHY: %d\n",
+ hw->mac.type, hw->phy.type);
 
 if (hw->bus.width <= ixgbe_bus_width_pcie_x4) {
 DPRINTK(PROBE, WARNING, "PCI-Express bandwidth available for "
@@ -6770,37 +5506,26 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 "PCI-Express slot is required.\n");
 }
 
+#ifndef IXGBE_NO_INET_LRO
+ DPRINTK(PROBE, INFO, "In-kernel LRO is enabled \n");
+#else
 #ifndef IXGBE_NO_LRO
- if (adapter->flags2 & IXGBE_FLAG2_SWLRO_ENABLED)
- DPRINTK(PROBE, INFO, "Internal LRO is enabled \n");
- else
- DPRINTK(PROBE, INFO, "LRO is disabled \n");
+ DPRINTK(PROBE, INFO, "Internal LRO is enabled \n");
+#else
+ DPRINTK(PROBE, INFO, "LRO is disabled \n");
 #endif
-#ifndef IXGBE_NO_HW_RSC
- if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)
- DPRINTK(PROBE, INFO, "HW RSC is enabled \n");
 #endif
-#if defined(HAVE_NETDEV_STORAGE_ADDRESS) && defined(NETDEV_HW_ADDR_T_SAN)
- /* add san mac addr to netdev */
- ixgbe_add_sanmac_netdev(netdev);
-
-#endif /* (HAVE_NETDEV_STORAGE_ADDRESS) && (NETDEV_HW_ADDR_T_SAN) */
 DPRINTK(PROBE, INFO, "Intel(R) 10 Gigabit Network Connection\n");
 cards_found++;
 return 0;
 
err_register:
- if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE ||
- adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
- cancel_work_sync(&adapter->fdir_reinit_task);
- ixgbe_clear_interrupt_scheme(adapter);
 ixgbe_release_hw_control(adapter);
err_sw_init:
 clear_bit(__IXGBE_SFP_MODULE_NOT_FOUND, &adapter->state);
 del_timer_sync(&adapter->sfp_timer);
 cancel_work_sync(&adapter->sfp_task);
- cancel_work_sync(&adapter->multispeed_fiber_task);
- cancel_work_sync(&adapter->sfp_config_module_task);
+ ixgbe_reset_interrupt_capability(adapter);
 #ifdef IXGBE_TCP_TIMER
 iounmap(adapter->msix_addr);
err_map_msix:
@@ -6831,20 +5556,13 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 set_bit(__IXGBE_DOWN, &adapter->state);
- /*
- * clear the module not found bit to make sure the worker won't
- * reschedule
- */
+ /* clear the module not found bit to make sure the worker won't
+ * reschedule */
 clear_bit(__IXGBE_SFP_MODULE_NOT_FOUND, &adapter->state);
 del_timer_sync(&adapter->watchdog_timer);
 del_timer_sync(&adapter->sfp_timer);
 cancel_work_sync(&adapter->watchdog_task);
 cancel_work_sync(&adapter->sfp_task);
- if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE ||
- adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
- cancel_work_sync(&adapter->fdir_reinit_task);
- cancel_work_sync(&adapter->multispeed_fiber_task);
- cancel_work_sync(&adapter->sfp_config_module_task);
 flush_scheduled_work();
 
#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
@@ -6855,17 +5573,22 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 }
 
#endif
-#if defined(HAVE_NETDEV_STORAGE_ADDRESS) && defined(NETDEV_HW_ADDR_T_SAN)
- /* remove the added san mac */
- ixgbe_del_sanmac_netdev(netdev);
+#ifdef CONFIG_XEN_NETDEV2_VMQ
+ if (netdev->vmq) {
+ 
free_vmq(netdev->vmq); + netdev->vmq = 0; + } +#endif -#endif /* (HAVE_NETDEV_STORAGE_ADDRESS) && (NETDEV_HW_ADDR_T_SAN) */ - if (adapter->netdev_registered) { +#ifndef CONFIG_XEN_NETDEV2_VMQ + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) + ixgbe_sysfs_remove(adapter); +#endif + if (netdev->reg_state == NETREG_REGISTERED) unregister_netdev(netdev); - adapter->netdev_registered = false; - } - ixgbe_clear_interrupt_scheme(adapter); + ixgbe_reset_interrupt_capability(adapter); + ixgbe_release_hw_control(adapter); #ifdef IXGBE_TCP_TIMER @@ -6875,9 +5598,13 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev) pci_release_regions(pdev); DPRINTK(PROBE, INFO, "complete\n"); - free_netdev(netdev); +#ifdef CONFIG_IXGBE_NAPI + ixgbe_napi_del_all(adapter); +#endif + kfree(adapter->tx_ring); + kfree(adapter->rx_ring); - pci_disable_pcie_error_reporting(pdev); + free_netdev(netdev); pci_disable_device(pdev); } @@ -6891,13 +5618,6 @@ u16 ixgbe_read_pci_cfg_word(struct ixgbe_hw *hw, u32 reg) return value; } -void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value) -{ - struct ixgbe_adapter *adapter = hw->back; - - pci_write_config_word(adapter->pdev, reg, value); -} - #ifdef HAVE_PCI_ERS /** * ixgbe_io_error_detected - called when PCI error is detected @@ -6919,7 +5639,7 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev, ixgbe_down(adapter); pci_disable_device(pdev); - /* Request a slot reset. */ + /* Request a slot reset */ return PCI_ERS_RESULT_NEED_RESET; } @@ -6933,26 +5653,21 @@ static pci_ers_result_t ixgbe_io_slot_reset(struct pci_dev *pdev) { struct net_device *netdev = pci_get_drvdata(pdev); struct ixgbe_adapter *adapter = netdev_priv(netdev); - pci_ers_result_t result; if (pci_enable_device(pdev)) { DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after reset.\n"); - result = PCI_ERS_RESULT_DISCONNECT; - } else { - pci_set_master(pdev); - pci_restore_state(pdev); - - pci_wake_from_d3(pdev, false); - - ixgbe_reset(adapter); - IXGBE_WRITE_REG(&adapter->hw, IXGBE_WUS, ~0); - result = PCI_ERS_RESULT_RECOVERED; + return PCI_ERS_RESULT_DISCONNECT; } + pci_set_master(pdev); + pci_restore_state(pdev); - pci_cleanup_aer_uncorrect_error_status(pdev); + pci_enable_wake(pdev, PCI_D3hot, 0); + pci_enable_wake(pdev, PCI_D3cold, 0); - return result; + ixgbe_reset(adapter); + + return PCI_ERS_RESULT_RECOVERED; } /** @@ -7017,21 +5732,17 @@ bool ixgbe_is_ixgbe(struct pci_dev *pcidev) **/ static int __init ixgbe_init_module(void) { - int ret; printk(KERN_INFO "ixgbe: %s - version %s\n", ixgbe_driver_string, ixgbe_driver_version); printk(KERN_INFO "%s\n", ixgbe_copyright); -#ifndef CONFIG_DCB ixgbe_dcb_netlink_register(); -#endif #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) dca_register_notify(&dca_notifier); #endif - ret = pci_register_driver(&ixgbe_driver); - return ret; + return pci_register_driver(&ixgbe_driver); } module_init(ixgbe_init_module); @@ -7047,9 +5758,7 @@ static void __exit ixgbe_exit_module(void) #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) dca_unregister_notify(&dca_notifier); #endif -#ifndef CONFIG_DCB ixgbe_dcb_netlink_unregister(); -#endif pci_unregister_driver(&ixgbe_driver); } @@ -7068,4 +5777,3 @@ static int ixgbe_notify_dca(struct notifier_block *nb, unsigned long event, module_exit(ixgbe_exit_module); /* ixgbe_main.c */ - diff --git a/drivers/net/ixgbe/ixgbe_osdep.h b/drivers/net/ixgbe/ixgbe_osdep.h index eafde20..50da4d4 100644 --- a/drivers/net/ixgbe/ixgbe_osdep.h +++ b/drivers/net/ixgbe/ixgbe_osdep.h @@ -1,7 +1,7 @@ 
/******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -54,7 +54,7 @@ #undef ASSERT #ifdef DBG -#define hw_dbg(hw, S, A...) printk(KERN_DEBUG S, ## A) +#define hw_dbg(hw, S, A...) printk(KERN_DEBUG S, A) #else #define hw_dbg(hw, S, A...) do {} while (0) #endif @@ -87,21 +87,10 @@ #define IXGBE_READ_REG_ARRAY(a, reg, offset) ( \ readl((a)->hw_addr + (reg) + ((offset) << 2))) -#ifndef writeq -#define writeq(val, addr) writel((u32) (val), addr); \ - writel((u32) (val >> 32), (addr + 4)); -#endif - -#define IXGBE_WRITE_REG64(a, reg, value) writeq((value), ((a)->hw_addr + (reg))) - #define IXGBE_WRITE_FLUSH(a) IXGBE_READ_REG(a, IXGBE_STATUS) struct ixgbe_hw; extern u16 ixgbe_read_pci_cfg_word(struct ixgbe_hw *hw, u32 reg); -extern void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value); #define IXGBE_READ_PCIE_WORD ixgbe_read_pci_cfg_word -#define IXGBE_WRITE_PCIE_WORD ixgbe_write_pci_cfg_word #define IXGBE_EEPROM_GRANT_ATTEMPS 100 -#define IXGBE_HTONL(_i) htonl(_i) -#define IXGBE_HTONS(_i) htons(_i) #endif /* _IXGBE_OSDEP_H_ */ diff --git a/drivers/net/ixgbe/ixgbe_param.c b/drivers/net/ixgbe/ixgbe_param.c index d6ace0c..ba97102 100644 --- a/drivers/net/ixgbe/ixgbe_param.c +++ b/drivers/net/ixgbe/ixgbe_param.c @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -96,31 +96,16 @@ IXGBE_PARAM(InterruptType, "Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), de IXGBE_PARAM(MQ, "Disable or enable Multiple Queues, default 1"); #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) -/* DCA - Direct Cache Access (DCA) Control +/* DCA - Direct Cache Access (DCA) Enable/Disable * - * This option allows the device to hint to DCA enabled processors - * which CPU should have its cache warmed with the data being - * transferred over PCIe. This can increase performance by reducing - * cache misses. 
ixgbe hardware supports DCA for: - * tx descriptor writeback - * rx descriptor writeback - * rx data - * rx data header only (in packet split mode) - * - * enabling option 2 can cause cache thrash in some tests, particularly - * if the CPU is completely utilized - * - * Valid Range: 0 - 2 + * Valid Range: 0, 1 * - 0 - disables DCA * - 1 - enables DCA - * - 2 - enables DCA with rx data included * - * Default Value: 2 + * Default Value: 1 */ -#define IXGBE_MAX_DCA 2 - -IXGBE_PARAM(DCA, "Disable or enable Direct Cache Access, 0=disabled, 1=descriptor only, 2=descriptor and data"); +IXGBE_PARAM(DCA, "Disable or enable Direct Cache Access, default 1"); #endif /* RSS - Receive-Side Scaling (RSS) Descriptor Queues @@ -148,14 +133,14 @@ IXGBE_PARAM(VMDQ, "Number of Virtual Machine Device Queues: 0/1 = disable (defau /* Interrupt Throttle Rate (interrupts/sec) * - * Valid Range: 956-488281 (0=off, 1=dynamic) + * Valid Range: 100-500000 (0=off) * * Default Value: 8000 */ #define DEFAULT_ITR 8000 -IXGBE_PARAM(InterruptThrottleRate, "Maximum interrupts per second, per vector, (956-488281), default 8000"); -#define MAX_ITR IXGBE_MAX_INT_RATE -#define MIN_ITR IXGBE_MIN_INT_RATE +IXGBE_PARAM(InterruptThrottleRate, "Maximum interrupts per second, per vector, (100-500000), default 8000"); +#define MAX_ITR 500000 +#define MIN_ITR 100 #ifndef IXGBE_NO_LLI /* LLIPort (Low Latency Interrupt TCP Port) @@ -193,32 +178,22 @@ IXGBE_PARAM(LLISize, "Low Latency Interrupt on Packet Size (0-1500)"); #define DEFAULT_LLISIZE 0 #define MAX_LLISIZE 1500 #define MIN_LLISIZE 0 +#endif /* IXGBE_NO_LLI */ -/* LLIEType (Low Latency Interrupt Ethernet Type) - * - * Valid Range: 0 - 0x8fff - * - * Default Value: 0 (disabled) - */ -IXGBE_PARAM(LLIEType, "Low Latency Interrupt Ethernet Protocol Type"); - -#define DEFAULT_LLIETYPE 0 -#define MAX_LLIETYPE 0x8fff -#define MIN_LLIETYPE 0 - -/* LLIVLANP (Low Latency Interrupt on VLAN priority threshold) +#ifndef IXGBE_NO_INET_LRO +/* LROAggr (Large Receive Offload) * - * Valid Range: 0 - 7 + * Valid Range: 2 - 44 * - * Default Value: 0 (disabled) + * Default Value: 32 */ -IXGBE_PARAM(LLIVLANP, "Low Latency Interrupt on VLAN priority threshold"); +IXGBE_PARAM(LROAggr, "LRO - Maximum packets to aggregate"); -#define DEFAULT_LLIVLANP 0 -#define MAX_LLIVLANP 7 -#define MIN_LLIVLANP 0 +#define DEFAULT_LRO_AGGR 32 +#define MAX_LRO_AGGR 44 +#define MIN_LRO_AGGR 2 -#endif /* IXGBE_NO_LLI */ +#endif /* Rx buffer mode * * Valid Range: 0-2 0 = 1buf_mode_always, 1 = ps_mode_always and 2 = optimal @@ -234,51 +209,7 @@ IXGBE_PARAM(RxBufferMode, "0=1 descriptor per packet,\n" #define IXGBE_RXBUFMODE_OPTIMAL 2 #define IXGBE_DEFAULT_RXBUFMODE IXGBE_RXBUFMODE_OPTIMAL -/* Flow Director filtering mode - * - * Valid Range: 0-2 0 = off, 1 = Hashing (ATR), and 2 = perfect filters - * - * Default Value: 1 (ATR) - */ -IXGBE_PARAM(FdirMode, "Flow Director filtering modes:\n" - "\t\t\t0 = Filtering off\n" - "\t\t\t1 = Signature Hashing filters (SW ATR)\n" - "\t\t\t2 = Perfect Filters"); - -#define IXGBE_FDIR_FILTER_OFF 0 -#define IXGBE_FDIR_FILTER_HASH 1 -#define IXGBE_FDIR_FILTER_PERFECT 2 -#define IXGBE_DEFAULT_FDIR_FILTER IXGBE_FDIR_FILTER_HASH - -/* Flow Director packet buffer allocation level - * - * Valid Range: 0-2 0 = 8k hash/2k perfect, 1 = 16k hash/4k perfect, - * 2 = 32k hash/8k perfect - * - * Default Value: 0 - */ -IXGBE_PARAM(FdirPballoc, "Flow Director packet buffer allocation level:\n" - "\t\t\t0 = 8k hash filters or 2k perfect filters\n" - "\t\t\t1 = 16k hash filters or 4k perfect filters\n" - 
"\t\t\t2 = 32k hash filters or 8k perfect filters"); -#define IXGBE_FDIR_PBALLOC_64K 0 -#define IXGBE_FDIR_PBALLOC_128K 1 -#define IXGBE_FDIR_PBALLOC_256K 2 -#define IXGBE_DEFAULT_FDIR_PBALLOC IXGBE_FDIR_PBALLOC_64K - -/* Software ATR packet sample rate - * - * Valid Range: 0-100 0 = off, 1-100 = rate of Tx packet inspection - * - * Default Value: 20 - */ -IXGBE_PARAM(AtrSampleRate, "Software ATR Tx packet sample rate"); - -#define IXGBE_MAX_ATR_SAMPLE_RATE 100 -#define IXGBE_MIN_ATR_SAMPLE_RATE 1 -#define IXGBE_ATR_SAMPLE_RATE_OFF 0 -#define IXGBE_DEFAULT_ATR_SAMPLE_RATE 20 struct ixgbe_option { enum { enable_option, range_option, list_option } type; @@ -292,7 +223,7 @@ struct ixgbe_option { } r; struct { /* list_option info */ int nr; - const struct ixgbe_opt_list { + struct ixgbe_opt_list { int i; char *str; } *p; @@ -327,7 +258,7 @@ static int __devinit ixgbe_validate_option(unsigned int *value, break; case list_option: { int i; - const struct ixgbe_opt_list *ent; + struct ixgbe_opt_list *ent; for (i = 0; i < opt->arg.l.nr; i++) { ent = &opt->arg.l.p[i]; @@ -363,8 +294,6 @@ static int __devinit ixgbe_validate_option(unsigned int *value, void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) { int bd = adapter->bd_number; - u32 *aflags = &adapter->flags; - struct ixgbe_ring_feature *feature = adapter->ring_feature; if (bd >= IXGBE_MAX_NIC) { printk(KERN_NOTICE @@ -394,32 +323,32 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) ixgbe_validate_option(&i_type, &opt); switch (i_type) { case IXGBE_INT_MSIX: - if (!(*aflags & IXGBE_FLAG_MSIX_CAPABLE)) + if (!adapter->flags & IXGBE_FLAG_MSIX_CAPABLE) printk(KERN_INFO "Ignoring MSI-X setting; " - "support unavailable\n"); + "support unavailable.\n"); break; case IXGBE_INT_MSI: - if (!(*aflags & IXGBE_FLAG_MSI_CAPABLE)) { + if (!adapter->flags & IXGBE_FLAG_MSI_CAPABLE) { printk(KERN_INFO "Ignoring MSI setting; " - "support unavailable\n"); + "support unavailable.\n"); } else { - *aflags &= ~IXGBE_FLAG_MSIX_CAPABLE; - *aflags &= ~IXGBE_FLAG_DCB_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MSIX_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; } break; case IXGBE_INT_LEGACY: default: - *aflags &= ~IXGBE_FLAG_MSIX_CAPABLE; - *aflags &= ~IXGBE_FLAG_MSI_CAPABLE; - *aflags &= ~IXGBE_FLAG_DCB_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MSIX_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MSI_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; break; } #ifdef module_param_array } else { - *aflags |= IXGBE_FLAG_MSIX_CAPABLE; - *aflags |= IXGBE_FLAG_MSI_CAPABLE; + adapter->flags |= IXGBE_FLAG_MSIX_CAPABLE; + adapter->flags |= IXGBE_FLAG_MSI_CAPABLE; } #endif } @@ -437,35 +366,33 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) unsigned int mq = MQ[bd]; ixgbe_validate_option(&mq, &opt); if (mq) - *aflags |= IXGBE_FLAG_MQ_CAPABLE; + adapter->flags |= IXGBE_FLAG_MQ_CAPABLE; else - *aflags &= ~IXGBE_FLAG_MQ_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MQ_CAPABLE; #ifdef module_param_array } else { if (opt.def == OPTION_ENABLED) - *aflags |= IXGBE_FLAG_MQ_CAPABLE; + adapter->flags |= IXGBE_FLAG_MQ_CAPABLE; else - *aflags &= ~IXGBE_FLAG_MQ_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MQ_CAPABLE; } #endif /* Check Interoperability */ - if ((*aflags & IXGBE_FLAG_MQ_CAPABLE) && - !(*aflags & IXGBE_FLAG_MSIX_CAPABLE)) { + if ((adapter->flags & IXGBE_FLAG_MQ_CAPABLE) && + !(adapter->flags & IXGBE_FLAG_MSIX_CAPABLE)) { DPRINTK(PROBE, INFO, "Multiple queues are not supported while MSI-X " "is disabled. 
Disabling Multiple Queues.\n"); - *aflags &= ~IXGBE_FLAG_MQ_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_MQ_CAPABLE; } } #if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE) { /* Direct Cache Access (DCA) */ static struct ixgbe_option opt = { - .type = range_option, + .type = enable_option, .name = "Direct Cache Access (DCA)", .err = "defaulting to Enabled", - .def = IXGBE_MAX_DCA, - .arg = { .r = { .min = OPTION_DISABLED, - .max = IXGBE_MAX_DCA}} + .def = OPTION_ENABLED }; unsigned int dca = opt.def; @@ -475,29 +402,21 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) dca = DCA[bd]; ixgbe_validate_option(&dca, &opt); if (!dca) - *aflags &= ~IXGBE_FLAG_DCA_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_DCA_CAPABLE; /* Check Interoperability */ - if (!(*aflags & IXGBE_FLAG_DCA_CAPABLE)) { + if (!(adapter->flags & IXGBE_FLAG_DCA_CAPABLE)) { DPRINTK(PROBE, INFO, "DCA is disabled\n"); - *aflags &= ~IXGBE_FLAG_DCA_ENABLED; - } - - if (dca == IXGBE_MAX_DCA) { - DPRINTK(PROBE, INFO, - "DCA enabled for rx data\n"); - adapter->flags |= IXGBE_FLAG_DCA_ENABLED_DATA; + adapter->flags &= ~IXGBE_FLAG_DCA_ENABLED; } #ifdef module_param_array } else { /* make sure to clear the capability flag if the * option is disabled by default above */ if (opt.def == OPTION_DISABLED) - *aflags &= ~IXGBE_FLAG_DCA_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_DCA_CAPABLE; } #endif - if (dca == IXGBE_MAX_DCA) - adapter->flags |= IXGBE_FLAG_DCA_ENABLED_DATA; } #endif /* CONFIG_DCA or CONFIG_DCA_MODULE */ { /* Receive-Side Scaling (RSS) */ @@ -527,42 +446,39 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) ixgbe_validate_option(&rss, &opt); break; } - feature[RING_F_RSS].indices = rss; + adapter->ring_feature[RING_F_RSS].indices = rss; if (rss) - *aflags |= IXGBE_FLAG_RSS_ENABLED; + adapter->flags |= IXGBE_FLAG_RSS_ENABLED; else - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; #ifdef module_param_array } else { if (opt.def == OPTION_DISABLED) { - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; } else { rss = min(IXGBE_MAX_RSS_INDICES, (int)num_online_cpus()); - feature[RING_F_RSS].indices = rss; - if (rss) - *aflags |= IXGBE_FLAG_RSS_ENABLED; - else - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->ring_feature[RING_F_RSS].indices = rss; + adapter->flags |= IXGBE_FLAG_RSS_ENABLED; } } #endif /* Check Interoperability */ - if (*aflags & IXGBE_FLAG_RSS_ENABLED) { - if (!(*aflags & IXGBE_FLAG_RSS_CAPABLE)) { + if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) { + if (!(adapter->flags & IXGBE_FLAG_RSS_CAPABLE)) { DPRINTK(PROBE, INFO, "RSS is not supported on this " "hardware. Disabling RSS.\n"); - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; - feature[RING_F_RSS].indices = 0; - } else if (!(*aflags & IXGBE_FLAG_MQ_CAPABLE)) { + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->ring_feature[RING_F_RSS].indices = 0; + } else if (!(adapter->flags & IXGBE_FLAG_MQ_CAPABLE)) { DPRINTK(PROBE, INFO, "RSS is not supported while multiple " "queues are disabled. 
" "Disabling RSS.\n"); - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; - *aflags &= ~IXGBE_FLAG_DCB_CAPABLE; - feature[RING_F_RSS].indices = 0; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; + adapter->flags &= ~IXGBE_FLAG_DCB_CAPABLE; + adapter->ring_feature[RING_F_RSS].indices = 0; } } } @@ -581,51 +497,42 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) #endif unsigned int vmdq = VMDQ[bd]; ixgbe_validate_option(&vmdq, &opt); - feature[RING_F_VMDQ].indices = vmdq; - adapter->flags2 |= IXGBE_FLAG2_VMDQ_DEFAULT_OVERRIDE; + adapter->ring_feature[RING_F_VMDQ].indices = vmdq; /* zero or one both mean disabled from our driver''s * perspective */ if (vmdq > 1) - *aflags |= IXGBE_FLAG_VMDQ_ENABLED; + adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED; else - *aflags &= ~IXGBE_FLAG_VMDQ_ENABLED; + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; #ifdef module_param_array } else { if (opt.def == OPTION_DISABLED) { - *aflags &= ~IXGBE_FLAG_VMDQ_ENABLED; + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; } else { - feature[RING_F_VMDQ].indices = 8; - *aflags |= IXGBE_FLAG_VMDQ_ENABLED; + adapter->ring_feature[RING_F_VMDQ].indices = 8; + adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED; } } #endif /* Check Interoperability */ - if (*aflags & IXGBE_FLAG_VMDQ_ENABLED) { - if (!(*aflags & IXGBE_FLAG_VMDQ_CAPABLE)) { + if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) { + if (!(adapter->flags & IXGBE_FLAG_VMDQ_CAPABLE)) { DPRINTK(PROBE, INFO, "VMDQ is not supported on this " "hardware. Disabling VMDQ.\n"); - *aflags &= ~IXGBE_FLAG_VMDQ_ENABLED; - feature[RING_F_VMDQ].indices = 0; - } else if (!(*aflags & IXGBE_FLAG_MQ_CAPABLE)) { + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; + adapter->ring_feature[RING_F_VMDQ].indices = 0; + } else if (!(adapter->flags & IXGBE_FLAG_MQ_CAPABLE)) { DPRINTK(PROBE, INFO, "VMDQ is not supported while multiple " "queues are disabled. 
" "Disabling VMDQ.\n"); - *aflags &= ~IXGBE_FLAG_VMDQ_ENABLED; - feature[RING_F_VMDQ].indices = 0; - } - if (adapter->hw.mac.type == ixgbe_mac_82598EB) { - /* for now, disable RSS when using VMDQ mode */ - *aflags &= ~IXGBE_FLAG_RSS_CAPABLE; - *aflags &= ~IXGBE_FLAG_RSS_ENABLED; - } else if (adapter->hw.mac.type == ixgbe_mac_82599EB) { - if (feature[RING_F_RSS].indices > 2 - && feature[RING_F_VMDQ].indices > 32) - feature[RING_F_RSS].indices = 2; - else if (feature[RING_F_RSS].indices != 0) - feature[RING_F_RSS].indices = 4; + adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED; + adapter->ring_feature[RING_F_VMDQ].indices = 0; } + /* for now, disable RSS when using VMDQ mode */ + adapter->flags &= ~IXGBE_FLAG_RSS_CAPABLE; + adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED; } } { /* Interrupt Throttling Rate */ @@ -637,42 +544,38 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) .arg = { .r = { .min = MIN_ITR, .max = MAX_ITR }} }; + u32 eitr; #ifdef module_param_array if (num_InterruptThrottleRate > bd) { #endif - u32 eitr = InterruptThrottleRate[bd]; + eitr = InterruptThrottleRate[bd]; switch (eitr) { case 0: DPRINTK(PROBE, INFO, "%s turned off\n", opt.name); - /* - * zero is a special value, we don''t want to + /* zero is a special value, we don''t want to * turn off ITR completely, just set it to an - * insane interrupt rate - */ - adapter->eitr_param = IXGBE_MAX_INT_RATE; - adapter->itr_setting = 0; + * insane interrupt rate (like 3.5 Million + * ints/s */ + eitr = EITR_REG_TO_INTS_PER_SEC(1); break; case 1: DPRINTK(PROBE, INFO, "dynamic interrupt " "throttling enabled\n"); - adapter->eitr_param = 20000; adapter->itr_setting = 1; + eitr = DEFAULT_ITR; break; default: ixgbe_validate_option(&eitr, &opt); - adapter->eitr_param = eitr; - /* the first bit is used as control */ - adapter->itr_setting = eitr & ~1; break; } #ifdef module_param_array } else { - adapter->eitr_param = DEFAULT_ITR; - adapter->itr_setting = DEFAULT_ITR; + eitr = DEFAULT_ITR; } #endif + adapter->eitr_param = eitr; } #ifndef IXGBE_NO_LLI { /* Low Latency Interrupt TCP Port*/ @@ -743,73 +646,47 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter) unsigned int lli_push = LLIPush[bd]; ixgbe_validate_option(&lli_push, &opt); if (lli_push) - *aflags |= IXGBE_FLAG_LLI_PUSH; + adapter->flags |= IXGBE_FLAG_LLI_PUSH; else - *aflags &= ~IXGBE_FLAG_LLI_PUSH; + adapter->flags &= ~IXGBE_FLAG_LLI_PUSH; #ifdef module_param_array } else { if (opt.def == OPTION_ENABLED) - *aflags |= IXGBE_FLAG_LLI_PUSH; + adapter->flags |= IXGBE_FLAG_LLI_PUSH; else - *aflags &= ~IXGBE_FLAG_LLI_PUSH; + adapter->flags &= ~IXGBE_FLAG_LLI_PUSH; } #endif } - { /* Low Latency Interrupt EtherType*/ - static struct ixgbe_option opt = { - .type = range_option, - .name = "Low Latency Interrupt on Ethernet Protocol Type", - .err = "using default of " - __MODULE_STRING(DEFAULT_LLIETYPE), - .def = DEFAULT_LLIETYPE, - .arg = { .r = { .min = MIN_LLIETYPE, - .max = MAX_LLIETYPE }} - }; - -#ifdef module_param_array - if (num_LLIEType > bd) { -#endif - adapter->lli_etype = LLIEType[bd]; - if (adapter->lli_etype) { - ixgbe_validate_option(&adapter->lli_etype, &opt); - } else { - DPRINTK(PROBE, INFO, "%s turned off\n", - opt.name); - } -#ifdef module_param_array - } else { - adapter->lli_etype = opt.def; - } -#endif - } - { /* LLI VLAN Priority */ +#endif /* IXGBE_NO_LLI */ +#ifndef IXGBE_NO_INET_LRO + { /* Large Receive Offload - Maximum packets to aggregate */ static struct ixgbe_option opt = { .type = range_option, - .name = "Low Latency Interrupt 
on VLAN priority threashold",
-			.err  = "using default of "
-				__MODULE_STRING(DEFAULT_LLIVLANP),
-			.def  = DEFAULT_LLIVLANP,
-			.arg  = { .r = { .min = MIN_LLIVLANP,
-					 .max = MAX_LLIVLANP }}
+			.name = "LRO - Maximum packets to aggregate",
+			.err  = "using default of " __MODULE_STRING(DEFAULT_LRO_AGGR),
+			.def  = DEFAULT_LRO_AGGR,
+			.arg  = { .r = { .min = MIN_LRO_AGGR,
+					 .max = MAX_LRO_AGGR }}
 		};
 #ifdef module_param_array
-		if (num_LLIVLANP > bd) {
+		if (num_LROAggr > bd) {
 #endif
-			adapter->lli_vlan_pri = LLIVLANP[bd];
-			if (adapter->lli_vlan_pri) {
-				ixgbe_validate_option(&adapter->lli_vlan_pri, &opt);
+			adapter->lro_max_aggr = LROAggr[bd];
+			if (adapter->lro_max_aggr) {
+				ixgbe_validate_option(&adapter->lro_max_aggr, &opt);
 			} else {
 				DPRINTK(PROBE, INFO, "%s turned off\n",
 					opt.name);
 			}
 #ifdef module_param_array
 		} else {
-			adapter->lli_vlan_pri = opt.def;
+			adapter->lro_max_aggr = opt.def;
 		}
 #endif
 	}
-#endif /* IXGBE_NO_LLI */
+#endif /* IXGBE_NO_INET_LRO */
 	{ /* Rx buffer mode */
 		unsigned int rx_buf_mode;
 		static struct ixgbe_option opt = {
@@ -829,202 +706,31 @@ void __devinit ixgbe_check_options(struct ixgbe_adapter *adapter)
 			ixgbe_validate_option(&rx_buf_mode, &opt);
 			switch (rx_buf_mode) {
 			case IXGBE_RXBUFMODE_OPTIMAL:
-				*aflags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
-				*aflags |= IXGBE_FLAG_RX_PS_CAPABLE;
+				adapter->flags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
+				adapter->flags |= IXGBE_FLAG_RX_PS_CAPABLE;
 				break;
 			case IXGBE_RXBUFMODE_PS_ALWAYS:
-				*aflags |= IXGBE_FLAG_RX_PS_CAPABLE;
+				adapter->flags |= IXGBE_FLAG_RX_PS_CAPABLE;
 				break;
 			case IXGBE_RXBUFMODE_1BUF_ALWAYS:
-				*aflags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
+				adapter->flags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
 			default:
 				break;
 			}
 #ifdef module_param_array
 		} else {
-			*aflags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
-			*aflags |= IXGBE_FLAG_RX_PS_CAPABLE;
+			adapter->flags |= IXGBE_FLAG_RX_1BUF_CAPABLE;
+			adapter->flags |= IXGBE_FLAG_RX_PS_CAPABLE;
 		}
 #endif
+#ifdef CONFIG_XEN_NETDEV2_VMQ
+		if ((adapter->flags &
+		     (IXGBE_FLAG_RX_PS_CAPABLE | IXGBE_FLAG_VMDQ_ENABLED)) ==
+		    (IXGBE_FLAG_RX_PS_CAPABLE | IXGBE_FLAG_VMDQ_ENABLED)) {
+			printk(KERN_INFO "ixgbe: packet split disabled for Xen VMDQ\n");
+			adapter->flags &= ~IXGBE_FLAG_RX_PS_CAPABLE;
 	}
-	{ /* Flow Director filtering mode */
-		unsigned int fdir_filter_mode;
-		static struct ixgbe_option opt = {
-			.type = range_option,
-			.name = "Flow Director filtering mode",
-			.err  = "using default of "
-				__MODULE_STRING(IXGBE_DEFAULT_FDIR_FILTER),
-			.def  = IXGBE_DEFAULT_FDIR_FILTER,
-			.arg  = {.r = {.min = IXGBE_FDIR_FILTER_OFF,
-				       .max = IXGBE_FDIR_FILTER_PERFECT}}
-		};
-
-		*aflags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-		*aflags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-		if (adapter->hw.mac.type == ixgbe_mac_82598EB)
-			goto no_flow_director;
-#ifdef module_param_array
-		if (num_FdirMode > bd) {
 #endif
-#ifdef HAVE_TX_MQ
-			fdir_filter_mode = FdirMode[bd];
-#else
-			fdir_filter_mode = IXGBE_FDIR_FILTER_OFF;
-#endif /* HAVE_TX_MQ */
-			ixgbe_validate_option(&fdir_filter_mode, &opt);
-
-			switch (fdir_filter_mode) {
-			case IXGBE_FDIR_FILTER_OFF:
-				DPRINTK(PROBE, INFO, "Flow Director disabled\n");
-				break;
-			case IXGBE_FDIR_FILTER_HASH:
-				*aflags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
-				*aflags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-				feature[RING_F_FDIR].indices =
-					IXGBE_MAX_FDIR_INDICES;
-				DPRINTK(PROBE, INFO,
-					"Flow Director hash filtering enabled\n");
-				break;
-			case IXGBE_FDIR_FILTER_PERFECT:
-				*aflags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-				*aflags |= IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-				feature[RING_F_FDIR].indices =
-					IXGBE_MAX_FDIR_INDICES;
-				spin_lock_init(&adapter->fdir_perfect_lock);
-				DPRINTK(PROBE, INFO,
-					"Flow Director perfect filtering enabled\n");
-				break;
-			default:
-				break;
-			}
 #ifdef module_param_array
-		} else {
-#ifdef HAVE_TX_MQ
-			*aflags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
-			feature[RING_F_FDIR].indices = IXGBE_MAX_FDIR_INDICES;
-			DPRINTK(PROBE, INFO,
-				"Flow Director hash filtering enabled\n");
-#else
-			*aflags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-			*aflags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-			feature[RING_F_FDIR].indices = 0;
-			DPRINTK(PROBE, INFO,
-				"Flow Director hash filtering disabled\n");
-#endif /* HAVE_TX_MQ */
-		}
-		/* Check interoperability */
-		if ((*aflags & IXGBE_FLAG_FDIR_HASH_CAPABLE) ||
-		    (*aflags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)) {
-			if (!(*aflags & IXGBE_FLAG_MQ_CAPABLE)) {
-				DPRINTK(PROBE, INFO,
-					"Flow Director is not supported "
-					"while multiple queues are disabled. "
-					"Disabling Flow Director\n");
-				*aflags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-				*aflags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-			}
-		}
-#endif
-no_flow_director:
-		/* empty code line with semi-colon */ ;
-	}
-	{ /* Flow Director packet buffer allocation */
-		unsigned int fdir_pballoc_mode;
-		static struct ixgbe_option opt = {
-			.type = range_option,
-			.name = "Flow Director packet buffer allocation",
-			.err  = "using default of "
-				__MODULE_STRING(IXGBE_DEFAULT_FDIR_PBALLOC),
-			.def  = IXGBE_DEFAULT_FDIR_PBALLOC,
-			.arg  = {.r = {.min = IXGBE_FDIR_PBALLOC_64K,
-				       .max = IXGBE_FDIR_PBALLOC_256K}}
-		};
-
-		if ((adapter->hw.mac.type == ixgbe_mac_82598EB) ||
-		    (!(*aflags & (IXGBE_FLAG_FDIR_HASH_CAPABLE |
-				  IXGBE_FLAG_FDIR_PERFECT_CAPABLE))))
-			goto no_fdir_pballoc;
-#ifdef module_param_array
-		if (num_FdirPballoc > bd) {
-#endif
-			char pstring[10];
-			fdir_pballoc_mode = FdirPballoc[bd];
-			ixgbe_validate_option(&fdir_pballoc_mode, &opt);
-			switch (fdir_pballoc_mode) {
-			case IXGBE_FDIR_PBALLOC_64K:
-				adapter->fdir_pballoc = IXGBE_FDIR_PBALLOC_64K;
-				sprintf(pstring, "64kB");
-				break;
-			case IXGBE_FDIR_PBALLOC_128K:
-				adapter->fdir_pballoc = IXGBE_FDIR_PBALLOC_128K;
-				sprintf(pstring, "128kB");
-				break;
-			case IXGBE_FDIR_PBALLOC_256K:
-				adapter->fdir_pballoc = IXGBE_FDIR_PBALLOC_256K;
-				sprintf(pstring, "256kB");
-				break;
-			default:
-				break;
-			}
-			DPRINTK(PROBE, INFO,
-				"Flow Director allocated %s of packet buffer\n",
-				pstring);
-
-#ifdef module_param_array
-		} else {
-			adapter->fdir_pballoc = opt.def;
-			DPRINTK(PROBE, INFO,
-				"Flow Director allocated 64kB of packet buffer\n");
-
-		}
-#endif
-no_fdir_pballoc:
-		/* empty code line with semi-colon */ ;
-	}
-	{ /* Flow Director ATR Tx sample packet rate */
-		static struct ixgbe_option opt = {
-			.type = range_option,
-			.name = "Software ATR Tx packet sample rate",
-			.err  = "using default of "
-				__MODULE_STRING(IXGBE_DEFAULT_ATR_SAMPLE_RATE),
-			.def  = IXGBE_DEFAULT_ATR_SAMPLE_RATE,
-			.arg  = {.r = {.min = IXGBE_ATR_SAMPLE_RATE_OFF,
-				       .max = IXGBE_MAX_ATR_SAMPLE_RATE}}
-		};
-		static const char atr_string[] =
-			"ATR Tx Packet sample rate set to";
-
-		adapter->atr_sample_rate = IXGBE_ATR_SAMPLE_RATE_OFF;
-		if (adapter->hw.mac.type == ixgbe_mac_82598EB)
-			goto no_fdir_sample;
-
-		/* no sample rate for perfect filtering */
-		if (*aflags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
-			goto no_fdir_sample;
-#ifdef module_param_array
-		if (num_AtrSampleRate > bd) {
-#endif
-			/* Only enable the sample rate if hashing (ATR) is on */
-			if (*aflags & IXGBE_FLAG_FDIR_HASH_CAPABLE)
-				adapter->atr_sample_rate = AtrSampleRate[bd];
-
-			if (adapter->atr_sample_rate) {
-				ixgbe_validate_option(&adapter->atr_sample_rate,
-						      &opt);
-				DPRINTK(PROBE, INFO, "%s %d\n", atr_string,
-					adapter->atr_sample_rate);
-			}
 #ifdef module_param_array
-		} else {
-			/* Only enable the sample rate if hashing (ATR) is on */
-			if (*aflags & IXGBE_FLAG_FDIR_HASH_CAPABLE)
-				adapter->atr_sample_rate = opt.def;
-
-			DPRINTK(PROBE, INFO, "%s default of %d\n", atr_string,
-				adapter->atr_sample_rate);
-		}
-#endif
-no_fdir_sample:
-		/* empty code line with semi-colon */ ;
 	}
 }
+
diff --git a/drivers/net/ixgbe/ixgbe_phy.c b/drivers/net/ixgbe/ixgbe_phy.c
index 530d858..a8f6af2 100644
--- a/drivers/net/ixgbe/ixgbe_phy.c
+++ b/drivers/net/ixgbe/ixgbe_phy.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
   Intel 10 Gigabit PCI Express Linux driver
-  Copyright(c) 1999 - 2009 Intel Corporation.
+  Copyright(c) 1999 - 2008 Intel Corporation.
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
@@ -29,19 +29,6 @@
 #include "ixgbe_common.h"
 #include "ixgbe_phy.h"
-static void ixgbe_i2c_start(struct ixgbe_hw *hw);
-static void ixgbe_i2c_stop(struct ixgbe_hw *hw);
-static s32 ixgbe_clock_in_i2c_byte(struct ixgbe_hw *hw, u8 *data);
-static s32 ixgbe_clock_out_i2c_byte(struct ixgbe_hw *hw, u8 data);
-static s32 ixgbe_get_i2c_ack(struct ixgbe_hw *hw);
-static s32 ixgbe_clock_in_i2c_bit(struct ixgbe_hw *hw, bool *data);
-static s32 ixgbe_clock_out_i2c_bit(struct ixgbe_hw *hw, bool data);
-static s32 ixgbe_raise_i2c_clk(struct ixgbe_hw *hw, u32 *i2cctl);
-static void ixgbe_lower_i2c_clk(struct ixgbe_hw *hw, u32 *i2cctl);
-static s32 ixgbe_set_i2c_data(struct ixgbe_hw *hw, u32 *i2cctl, bool data);
-static bool ixgbe_get_i2c_data(u32 *i2cctl);
-void ixgbe_i2c_bus_clear(struct ixgbe_hw *hw);
-
 /**
  * ixgbe_init_phy_ops_generic - Inits PHY function ptrs
  * @hw: pointer to the hardware structure
@@ -61,11 +48,6 @@ s32 ixgbe_init_phy_ops_generic(struct ixgbe_hw *hw)
 	phy->ops.setup_link_speed = &ixgbe_setup_phy_link_speed_generic;
 	phy->ops.check_link = NULL;
 	phy->ops.get_firmware_version = NULL;
-	phy->ops.read_i2c_byte = &ixgbe_read_i2c_byte_generic;
-	phy->ops.write_i2c_byte = &ixgbe_write_i2c_byte_generic;
-	phy->ops.read_i2c_eeprom = &ixgbe_read_i2c_eeprom_generic;
-	phy->ops.write_i2c_eeprom = &ixgbe_write_i2c_eeprom_generic;
-	phy->ops.i2c_bus_clear = &ixgbe_i2c_bus_clear;
 	phy->ops.identify_sfp = &ixgbe_identify_sfp_module_generic;
 	phy->sfp_type = ixgbe_sfp_type_unknown;
@@ -82,7 +64,6 @@ s32 ixgbe_identify_phy_generic(struct ixgbe_hw *hw)
 {
 	s32 status = IXGBE_ERR_PHY_ADDR_INVALID;
 	u32 phy_addr;
-	u16 ext_ability = 0;
 	if (hw->phy.type == ixgbe_phy_unknown) {
 		for (phy_addr = 0; phy_addr < IXGBE_MAX_PHY_ADDR; phy_addr++) {
@@ -91,29 +72,10 @@ s32 ixgbe_identify_phy_generic(struct ixgbe_hw *hw)
 				ixgbe_get_phy_id(hw);
 				hw->phy.type =
 					ixgbe_get_phy_type_from_id(hw->phy.id);
-
-				if (hw->phy.type == ixgbe_phy_unknown) {
-					hw->phy.ops.read_reg(hw,
-						IXGBE_MDIO_PHY_EXT_ABILITY,
-						IXGBE_MDIO_PMA_PMD_DEV_TYPE,
-						&ext_ability);
-					if (ext_ability &
-					    IXGBE_MDIO_PHY_10GBASET_ABILITY ||
-					    ext_ability &
-					    IXGBE_MDIO_PHY_1000BASET_ABILITY)
-						hw->phy.type =
-							ixgbe_phy_cu_unknown;
-					else
-						hw->phy.type =
-							ixgbe_phy_generic;
-				}
-
 				status = 0;
 				break;
 			}
 		}
-		if (status != 0)
-			hw->phy.addr = 0;
 	} else {
 		status = 0;
 	}
@@ -180,9 +142,6 @@ enum ixgbe_phy_type ixgbe_get_phy_type_from_id(u32 phy_id)
 	case TN1010_PHY_ID:
 		phy_type = ixgbe_phy_tn;
 		break;
-	case AQ1002_PHY_ID:
-		phy_type = ixgbe_phy_aq;
-		break;
 	case QT2022_PHY_ID:
 		phy_type = ixgbe_phy_qt;
 		break;
@@
-204,40 +163,13 @@ enum ixgbe_phy_type ixgbe_get_phy_type_from_id(u32 phy_id) **/ s32 ixgbe_reset_phy_generic(struct ixgbe_hw *hw) { - u32 i; - u16 ctrl = 0; - s32 status = 0; - - if (hw->phy.type == ixgbe_phy_unknown) - status = ixgbe_identify_phy_generic(hw); - - if (status != 0 || hw->phy.type == ixgbe_phy_none) - goto out; - /* * Perform soft PHY reset to the PHY_XS. * This will cause a soft reset to the PHY */ - hw->phy.ops.write_reg(hw, IXGBE_MDIO_PHY_XS_CONTROL, - IXGBE_MDIO_PHY_XS_DEV_TYPE, - IXGBE_MDIO_PHY_XS_RESET); - - /* Poll for reset bit to self-clear indicating reset is complete */ - for (i = 0; i < 500; i++) { - msleep(1); - hw->phy.ops.read_reg(hw, IXGBE_MDIO_PHY_XS_CONTROL, - IXGBE_MDIO_PHY_XS_DEV_TYPE, &ctrl); - if (!(ctrl & IXGBE_MDIO_PHY_XS_RESET)) - break; - } - - if (ctrl & IXGBE_MDIO_PHY_XS_RESET) { - status = IXGBE_ERR_RESET_FAILED; - hw_dbg(hw, "PHY reset polling failed to complete.\n"); - } - -out: - return status; + return hw->phy.ops.write_reg(hw, IXGBE_MDIO_PHY_XS_CONTROL, + IXGBE_MDIO_PHY_XS_DEV_TYPE, + IXGBE_MDIO_PHY_XS_RESET); } /** @@ -437,7 +369,7 @@ s32 ixgbe_write_phy_reg_generic(struct ixgbe_hw *hw, u32 reg_addr, **/ s32 ixgbe_setup_phy_link_generic(struct ixgbe_hw *hw) { - s32 status = 0; + s32 status = IXGBE_NOT_IMPLEMENTED; u32 time_out; u32 max_time_out = 10; u16 autoneg_reg = IXGBE_MII_AUTONEG_REG; @@ -478,6 +410,7 @@ s32 ixgbe_setup_phy_link_generic(struct ixgbe_hw *hw) autoneg_reg &= IXGBE_MII_AUTONEG_COMPLETE; if (autoneg_reg == IXGBE_MII_AUTONEG_COMPLETE) { + status = 0; break; } } @@ -512,9 +445,6 @@ s32 ixgbe_setup_phy_link_speed_generic(struct ixgbe_hw *hw, if (speed & IXGBE_LINK_SPEED_1GB_FULL) hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_1GB_FULL; - if (speed & IXGBE_LINK_SPEED_100_FULL) - hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_100_FULL; - /* Setup link based on the new speed settings */ hw->phy.ops.setup_link(hw); @@ -522,40 +452,6 @@ s32 ixgbe_setup_phy_link_speed_generic(struct ixgbe_hw *hw, } /** - * ixgbe_get_copper_link_capabilities_generic - Determines link capabilities - * @hw: pointer to hardware structure - * @speed: pointer to link speed - * @autoneg: boolean auto-negotiation value - * - * Determines the link capabilities by reading the AUTOC register. 
- **/
-s32 ixgbe_get_copper_link_capabilities_generic(struct ixgbe_hw *hw,
-                                               ixgbe_link_speed *speed,
-                                               bool *autoneg)
-{
-	s32 status = IXGBE_ERR_LINK_SETUP;
-	u16 speed_ability;
-
-	*speed = 0;
-	*autoneg = true;
-
-	status = hw->phy.ops.read_reg(hw, IXGBE_MDIO_PHY_SPEED_ABILITY,
-	                              IXGBE_MDIO_PMA_PMD_DEV_TYPE,
-	                              &speed_ability);
-
-	if (status == 0) {
-		if (speed_ability & IXGBE_MDIO_PHY_SPEED_10G)
-			*speed |= IXGBE_LINK_SPEED_10GB_FULL;
-		if (speed_ability & IXGBE_MDIO_PHY_SPEED_1G)
-			*speed |= IXGBE_LINK_SPEED_1GB_FULL;
-		if (speed_ability & IXGBE_MDIO_PHY_SPEED_100M)
-			*speed |= IXGBE_LINK_SPEED_100_FULL;
-	}
-
-	return status;
-}
-
-/**
  * ixgbe_check_phy_link_tnx - Determine link and speed status
  * @hw: pointer to hardware structure
  *
@@ -620,24 +516,6 @@ s32 ixgbe_get_phy_firmware_version_tnx(struct ixgbe_hw *hw,
 	return status;
 }
-
-/**
- * ixgbe_get_phy_firmware_version_aq - Gets the PHY Firmware Version
- * @hw: pointer to hardware structure
- * @firmware_version: pointer to the PHY Firmware Version
- **/
-s32 ixgbe_get_phy_firmware_version_aq(struct ixgbe_hw *hw,
-                                      u16 *firmware_version)
-{
-	s32 status = 0;
-
-	status = hw->phy.ops.read_reg(hw, AQ_FW_REV,
-	                              IXGBE_MDIO_VENDOR_SPECIFIC_1_DEV_TYPE,
-	                              firmware_version);
-
-	return status;
-}
-
 /**
  * ixgbe_reset_phy_nl - Performs a PHY reset
  * @hw: pointer to hardware structure
@@ -745,101 +623,45 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
 {
 	s32 status = IXGBE_ERR_PHY_ADDR_INVALID;
 	u32 vendor_oui = 0;
-	enum ixgbe_sfp_type stored_sfp_type = hw->phy.sfp_type;
 	u8 identifier = 0;
 	u8 comp_codes_1g = 0;
 	u8 comp_codes_10g = 0;
-	u8 oui_bytes[3] = {0, 0, 0};
-	u8 cable_tech = 0;
-	u16 enforce_sfp = 0;
-
-	if (hw->mac.ops.get_media_type(hw) != ixgbe_media_type_fiber) {
-		hw->phy.sfp_type = ixgbe_sfp_type_not_present;
-		status = IXGBE_ERR_SFP_NOT_PRESENT;
-		goto out;
-	}
+	u8 oui_bytes[4] = {0, 0, 0, 0};
+	u8 transmission_media = 0;
 	status = hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
 	                                     &identifier);
-	if (status == IXGBE_ERR_SFP_NOT_PRESENT || status == IXGBE_ERR_I2C) {
-		status = IXGBE_ERR_SFP_NOT_PRESENT;
+	if (status == IXGBE_ERR_SFP_NOT_PRESENT) {
 		hw->phy.sfp_type = ixgbe_sfp_type_not_present;
-		if (hw->phy.type != ixgbe_phy_nl) {
-			hw->phy.id = 0;
-			hw->phy.type = ixgbe_phy_unknown;
-		}
 		goto out;
 	}
-	/* LAN ID is needed for sfp_type determination */
-	hw->mac.ops.set_lan_id(hw);
-
 	if (identifier == IXGBE_SFF_IDENTIFIER_SFP) {
 		hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_1GBE_COMP_CODES,
 		                            &comp_codes_1g);
 		hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_10GBE_COMP_CODES,
 		                            &comp_codes_10g);
-		hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_CABLE_TECHNOLOGY,
-		                            &cable_tech);
+		hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_TRANSMISSION_MEDIA,
+		                            &transmission_media);
 		/* ID Module
 		 * ========
 		 * 0  SFP_DA_CU
 		 * 1  SFP_SR
 		 * 2  SFP_LR
-		 * 3  SFP_DA_CORE0 - 82599-specific
-		 * 4  SFP_DA_CORE1 - 82599-specific
-		 * 5  SFP_SR/LR_CORE0 - 82599-specific
-		 * 6  SFP_SR/LR_CORE1 - 82599-specific
 		 */
-		if (hw->mac.type == ixgbe_mac_82598EB) {
-			if (cable_tech & IXGBE_SFF_DA_PASSIVE_CABLE)
-				hw->phy.sfp_type = ixgbe_sfp_type_da_cu;
-			else if (comp_codes_10g & IXGBE_SFF_10GBASESR_CAPABLE)
-				hw->phy.sfp_type = ixgbe_sfp_type_sr;
-			else if (comp_codes_10g & IXGBE_SFF_10GBASELR_CAPABLE)
-				hw->phy.sfp_type = ixgbe_sfp_type_lr;
-			else
-				hw->phy.sfp_type = ixgbe_sfp_type_unknown;
-		} else if (hw->mac.type == ixgbe_mac_82599EB) {
-			if (cable_tech & IXGBE_SFF_DA_PASSIVE_CABLE)
-				if (hw->bus.lan_id == 0)
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_da_cu_core0;
-				else
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_da_cu_core1;
-			else if (comp_codes_10g & IXGBE_SFF_10GBASESR_CAPABLE)
-				if (hw->bus.lan_id == 0)
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_srlr_core0;
-				else
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_srlr_core1;
-			else if (comp_codes_10g & IXGBE_SFF_10GBASELR_CAPABLE)
-				if (hw->bus.lan_id == 0)
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_srlr_core0;
-				else
-					hw->phy.sfp_type =
-						ixgbe_sfp_type_srlr_core1;
-			else
-				hw->phy.sfp_type = ixgbe_sfp_type_unknown;
-		}
-
-		if (hw->phy.sfp_type != stored_sfp_type)
-			hw->phy.sfp_setup_needed = true;
+		if (transmission_media & IXGBE_SFF_TWIN_AX_CAPABLE)
+			hw->phy.sfp_type = ixgbe_sfp_type_da_cu;
+		else if (comp_codes_10g & IXGBE_SFF_10GBASESR_CAPABLE)
+			hw->phy.sfp_type = ixgbe_sfp_type_sr;
+		else if (comp_codes_10g & IXGBE_SFF_10GBASELR_CAPABLE)
+			hw->phy.sfp_type = ixgbe_sfp_type_lr;
+		else
+			hw->phy.sfp_type = ixgbe_sfp_type_unknown;
-		/* Determine if the SFP+ PHY is dual speed or not. */
-		hw->phy.multispeed_fiber = false;
-		if (((comp_codes_1g & IXGBE_SFF_1GBASESX_CAPABLE) &&
-		     (comp_codes_10g & IXGBE_SFF_10GBASESR_CAPABLE)) ||
-		    ((comp_codes_1g & IXGBE_SFF_1GBASELX_CAPABLE) &&
-		     (comp_codes_10g & IXGBE_SFF_10GBASELR_CAPABLE)))
-			hw->phy.multispeed_fiber = true;
 		/* Determine PHY vendor */
-		if (hw->phy.type != ixgbe_phy_nl) {
+		if (hw->phy.type == ixgbe_phy_unknown) {
 			hw->phy.id = identifier;
 			hw->phy.ops.read_i2c_eeprom(hw,
 			                            IXGBE_SFF_VENDOR_OUI_BYTE0,
@@ -858,7 +680,8 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
 			switch (vendor_oui) {
 			case IXGBE_SFF_VENDOR_OUI_TYCO:
-				if (cable_tech & IXGBE_SFF_DA_PASSIVE_CABLE)
+				if (transmission_media &
+				    IXGBE_SFF_TWIN_AX_CAPABLE)
 					hw->phy.type = ixgbe_phy_tw_tyco;
 				break;
 			case IXGBE_SFF_VENDOR_OUI_FTL:
@@ -867,50 +690,16 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw)
 			case IXGBE_SFF_VENDOR_OUI_AVAGO:
 				hw->phy.type = ixgbe_phy_sfp_avago;
 				break;
-			case IXGBE_SFF_VENDOR_OUI_INTEL:
-				hw->phy.type = ixgbe_phy_sfp_intel;
-				break;
 			default:
-				if (cable_tech & IXGBE_SFF_DA_PASSIVE_CABLE)
+				if (transmission_media &
+				    IXGBE_SFF_TWIN_AX_CAPABLE)
 					hw->phy.type = ixgbe_phy_tw_unknown;
 				else
 					hw->phy.type = ixgbe_phy_sfp_unknown;
 				break;
 			}
 		}
-
-		/* All passive DA cables are supported */
-		if (cable_tech & IXGBE_SFF_DA_PASSIVE_CABLE) {
-			status = 0;
-			goto out;
-		}
-
-		/* 1G SFP modules are not supported */
-		if (comp_codes_10g == 0) {
-			hw->phy.type = ixgbe_phy_sfp_unsupported;
-			status = IXGBE_ERR_SFP_NOT_SUPPORTED;
-			goto out;
-		}
-
-		/* Anything else 82598-based is supported */
-		if (hw->mac.type == ixgbe_mac_82598EB) {
-			status = 0;
-			goto out;
-		}
-
-		ixgbe_get_device_caps(hw, &enforce_sfp);
-		if (!(enforce_sfp & IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP)) {
-			/* Make sure we're a supported PHY type */
-			if (hw->phy.type == ixgbe_phy_sfp_intel) {
-				status = 0;
-			} else {
-				hw_dbg(hw, "SFP+ module not supported\n");
-				hw->phy.type = ixgbe_phy_sfp_unsupported;
-				status = IXGBE_ERR_SFP_NOT_SUPPORTED;
-			}
-		} else {
-			status = 0;
-		}
+		status = 0;
 	}
 out:
@@ -946,7 +735,7 @@ s32 ixgbe_get_sfp_init_sequence_offsets(struct ixgbe_hw *hw,
 	hw->eeprom.ops.read(hw, IXGBE_PHY_INIT_OFFSET_NL, list_offset);
 	if ((!*list_offset) || (*list_offset == 0xFFFF))
-		return IXGBE_ERR_SFP_NO_INIT_SEQ_PRESENT;
+		return IXGBE_ERR_PHY;
 	/* Shift offset to first ID word */
 	(*list_offset)++;
@@ -982,532 +771,3 @@ s32 ixgbe_get_sfp_init_sequence_offsets(struct ixgbe_hw *hw,
 	return 0;
 }
-/**
- * ixgbe_read_i2c_eeprom_generic - Reads 8 bit EEPROM word over I2C interface
- * @hw: pointer to hardware structure
- * @byte_offset:
EEPROM byte offset to read - * @eeprom_data: value read - * - * Performs byte read operation to SFP module''s EEPROM over I2C interface. - **/ -s32 ixgbe_read_i2c_eeprom_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 *eeprom_data) -{ - return hw->phy.ops.read_i2c_byte(hw, byte_offset, - IXGBE_I2C_EEPROM_DEV_ADDR, - eeprom_data); -} - -/** - * ixgbe_write_i2c_eeprom_generic - Writes 8 bit EEPROM word over I2C interface - * @hw: pointer to hardware structure - * @byte_offset: EEPROM byte offset to write - * @eeprom_data: value to write - * - * Performs byte write operation to SFP module''s EEPROM over I2C interface. - **/ -s32 ixgbe_write_i2c_eeprom_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 eeprom_data) -{ - return hw->phy.ops.write_i2c_byte(hw, byte_offset, - IXGBE_I2C_EEPROM_DEV_ADDR, - eeprom_data); -} - -/** - * ixgbe_read_i2c_byte_generic - Reads 8 bit word over I2C - * @hw: pointer to hardware structure - * @byte_offset: byte offset to read - * @data: value read - * - * Performs byte read operation to SFP module''s EEPROM over I2C interface at - * a specified deivce address. - **/ -s32 ixgbe_read_i2c_byte_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 dev_addr, u8 *data) -{ - s32 status = 0; - u32 max_retry = 10; - u32 retry = 0; - u16 swfw_mask = 0; - bool nack = 1; - - if (IXGBE_READ_REG(hw, IXGBE_STATUS) & IXGBE_STATUS_LAN_ID_1) - swfw_mask = IXGBE_GSSR_PHY1_SM; - else - swfw_mask = IXGBE_GSSR_PHY0_SM; - - - do { - if (ixgbe_acquire_swfw_sync(hw, swfw_mask) != 0) { - status = IXGBE_ERR_SWFW_SYNC; - goto read_byte_out; - } - - ixgbe_i2c_start(hw); - - /* Device Address and write indication */ - status = ixgbe_clock_out_i2c_byte(hw, dev_addr); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - status = ixgbe_clock_out_i2c_byte(hw, byte_offset); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - ixgbe_i2c_start(hw); - - /* Device Address and read indication */ - status = ixgbe_clock_out_i2c_byte(hw, (dev_addr | 0x1)); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - status = ixgbe_clock_in_i2c_byte(hw, data); - if (status != 0) - goto fail; - - status = ixgbe_clock_out_i2c_bit(hw, nack); - if (status != 0) - goto fail; - - ixgbe_i2c_stop(hw); - break; - -fail: - ixgbe_release_swfw_sync(hw, swfw_mask); - msleep(100); - ixgbe_i2c_bus_clear(hw); - retry++; - if (retry < max_retry) - hw_dbg(hw, "I2C byte read error - Retrying.\n"); - else - hw_dbg(hw, "I2C byte read error.\n"); - - } while (retry < max_retry); - - ixgbe_release_swfw_sync(hw, swfw_mask); - -read_byte_out: - return status; -} - -/** - * ixgbe_write_i2c_byte_generic - Writes 8 bit word over I2C - * @hw: pointer to hardware structure - * @byte_offset: byte offset to write - * @data: value to write - * - * Performs byte write operation to SFP module''s EEPROM over I2C interface at - * a specified device address. 
- **/ -s32 ixgbe_write_i2c_byte_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 dev_addr, u8 data) -{ - s32 status = 0; - u32 max_retry = 1; - u32 retry = 0; - u16 swfw_mask = 0; - - if (IXGBE_READ_REG(hw, IXGBE_STATUS) & IXGBE_STATUS_LAN_ID_1) - swfw_mask = IXGBE_GSSR_PHY1_SM; - else - swfw_mask = IXGBE_GSSR_PHY0_SM; - - if (ixgbe_acquire_swfw_sync(hw, swfw_mask) != 0) { - status = IXGBE_ERR_SWFW_SYNC; - goto write_byte_out; - } - - do { - ixgbe_i2c_start(hw); - - status = ixgbe_clock_out_i2c_byte(hw, dev_addr); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - status = ixgbe_clock_out_i2c_byte(hw, byte_offset); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - status = ixgbe_clock_out_i2c_byte(hw, data); - if (status != 0) - goto fail; - - status = ixgbe_get_i2c_ack(hw); - if (status != 0) - goto fail; - - ixgbe_i2c_stop(hw); - break; - -fail: - ixgbe_i2c_bus_clear(hw); - retry++; - if (retry < max_retry) - hw_dbg(hw, "I2C byte write error - Retrying.\n"); - else - hw_dbg(hw, "I2C byte write error.\n"); - } while (retry < max_retry); - - ixgbe_release_swfw_sync(hw, swfw_mask); - -write_byte_out: - return status; -} - -/** - * ixgbe_i2c_start - Sets I2C start condition - * @hw: pointer to hardware structure - * - * Sets I2C start condition (High -> Low on SDA while SCL is High) - **/ -static void ixgbe_i2c_start(struct ixgbe_hw *hw) -{ - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - - /* Start condition must begin with data and clock high */ - ixgbe_set_i2c_data(hw, &i2cctl, 1); - ixgbe_raise_i2c_clk(hw, &i2cctl); - - /* Setup time for start condition (4.7us) */ - udelay(IXGBE_I2C_T_SU_STA); - - ixgbe_set_i2c_data(hw, &i2cctl, 0); - - /* Hold time for start condition (4us) */ - udelay(IXGBE_I2C_T_HD_STA); - - ixgbe_lower_i2c_clk(hw, &i2cctl); - - /* Minimum low period of clock is 4.7 us */ - udelay(IXGBE_I2C_T_LOW); - -} - -/** - * ixgbe_i2c_stop - Sets I2C stop condition - * @hw: pointer to hardware structure - * - * Sets I2C stop condition (Low -> High on SDA while SCL is High) - **/ -static void ixgbe_i2c_stop(struct ixgbe_hw *hw) -{ - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - - /* Stop condition must begin with data low and clock high */ - ixgbe_set_i2c_data(hw, &i2cctl, 0); - ixgbe_raise_i2c_clk(hw, &i2cctl); - - /* Setup time for stop condition (4us) */ - udelay(IXGBE_I2C_T_SU_STO); - - ixgbe_set_i2c_data(hw, &i2cctl, 1); - - /* bus free time between stop and start (4.7us)*/ - udelay(IXGBE_I2C_T_BUF); -} - -/** - * ixgbe_clock_in_i2c_byte - Clocks in one byte via I2C - * @hw: pointer to hardware structure - * @data: data byte to clock in - * - * Clocks in one byte data via I2C data/clock - **/ -static s32 ixgbe_clock_in_i2c_byte(struct ixgbe_hw *hw, u8 *data) -{ - s32 status = 0; - s32 i; - bool bit = 0; - - for (i = 7; i >= 0; i--) { - status = ixgbe_clock_in_i2c_bit(hw, &bit); - *data |= bit<<i; - - if (status != 0) - break; - } - - return status; -} - -/** - * ixgbe_clock_out_i2c_byte - Clocks out one byte via I2C - * @hw: pointer to hardware structure - * @data: data byte clocked out - * - * Clocks out one byte data via I2C data/clock - **/ -static s32 ixgbe_clock_out_i2c_byte(struct ixgbe_hw *hw, u8 data) -{ - s32 status = 0; - s32 i; - u32 i2cctl; - bool bit = 0; - - for (i = 7; i >= 0; i--) { - bit = (data >> i) & 0x1; - status = ixgbe_clock_out_i2c_bit(hw, bit); - - if (status != 0) - break; - } - - /* Release SDA line (set high) */ - i2cctl = 
IXGBE_READ_REG(hw, IXGBE_I2CCTL); - i2cctl |= IXGBE_I2C_DATA_OUT; - IXGBE_WRITE_REG(hw, IXGBE_I2CCTL, i2cctl); - - return status; -} - -/** - * ixgbe_get_i2c_ack - Polls for I2C ACK - * @hw: pointer to hardware structure - * - * Clocks in/out one bit via I2C data/clock - **/ -static s32 ixgbe_get_i2c_ack(struct ixgbe_hw *hw) -{ - s32 status; - u32 i = 0; - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - u32 timeout = 10; - bool ack = 1; - - status = ixgbe_raise_i2c_clk(hw, &i2cctl); - - if (status != 0) - goto out; - - /* Minimum high period of clock is 4us */ - udelay(IXGBE_I2C_T_HIGH); - - /* Poll for ACK. Note that ACK in I2C spec is - * transition from 1 to 0 */ - for (i = 0; i < timeout; i++) { - i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - ack = ixgbe_get_i2c_data(&i2cctl); - - udelay(1); - if (ack == 0) - break; - } - - if (ack == 1) { - hw_dbg(hw, "I2C ack was not received.\n"); - status = IXGBE_ERR_I2C; - } - - ixgbe_lower_i2c_clk(hw, &i2cctl); - - /* Minimum low period of clock is 4.7 us */ - udelay(IXGBE_I2C_T_LOW); - -out: - return status; -} - -/** - * ixgbe_clock_in_i2c_bit - Clocks in one bit via I2C data/clock - * @hw: pointer to hardware structure - * @data: read data value - * - * Clocks in one bit via I2C data/clock - **/ -static s32 ixgbe_clock_in_i2c_bit(struct ixgbe_hw *hw, bool *data) -{ - s32 status; - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - - status = ixgbe_raise_i2c_clk(hw, &i2cctl); - - /* Minimum high period of clock is 4us */ - udelay(IXGBE_I2C_T_HIGH); - - i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - *data = ixgbe_get_i2c_data(&i2cctl); - - ixgbe_lower_i2c_clk(hw, &i2cctl); - - /* Minimum low period of clock is 4.7 us */ - udelay(IXGBE_I2C_T_LOW); - - return status; -} - -/** - * ixgbe_clock_out_i2c_bit - Clocks in/out one bit via I2C data/clock - * @hw: pointer to hardware structure - * @data: data value to write - * - * Clocks out one bit via I2C data/clock - **/ -static s32 ixgbe_clock_out_i2c_bit(struct ixgbe_hw *hw, bool data) -{ - s32 status; - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - - status = ixgbe_set_i2c_data(hw, &i2cctl, data); - if (status == 0) { - status = ixgbe_raise_i2c_clk(hw, &i2cctl); - - /* Minimum high period of clock is 4us */ - udelay(IXGBE_I2C_T_HIGH); - - ixgbe_lower_i2c_clk(hw, &i2cctl); - - /* Minimum low period of clock is 4.7 us. - * This also takes care of the data hold time. 
- */ - udelay(IXGBE_I2C_T_LOW); - } else { - status = IXGBE_ERR_I2C; - hw_dbg(hw, "I2C data was not set to %X\n", data); - } - - return status; -} -/** - * ixgbe_raise_i2c_clk - Raises the I2C SCL clock - * @hw: pointer to hardware structure - * @i2cctl: Current value of I2CCTL register - * - * Raises the I2C clock line ''0''->''1'' - **/ -static s32 ixgbe_raise_i2c_clk(struct ixgbe_hw *hw, u32 *i2cctl) -{ - s32 status = 0; - - *i2cctl |= IXGBE_I2C_CLK_OUT; - - IXGBE_WRITE_REG(hw, IXGBE_I2CCTL, *i2cctl); - - /* SCL rise time (1000ns) */ - udelay(IXGBE_I2C_T_RISE); - - return status; -} - -/** - * ixgbe_lower_i2c_clk - Lowers the I2C SCL clock - * @hw: pointer to hardware structure - * @i2cctl: Current value of I2CCTL register - * - * Lowers the I2C clock line ''1''->''0'' - **/ -static void ixgbe_lower_i2c_clk(struct ixgbe_hw *hw, u32 *i2cctl) -{ - - *i2cctl &= ~IXGBE_I2C_CLK_OUT; - - IXGBE_WRITE_REG(hw, IXGBE_I2CCTL, *i2cctl); - - /* SCL fall time (300ns) */ - udelay(IXGBE_I2C_T_FALL); -} - -/** - * ixgbe_set_i2c_data - Sets the I2C data bit - * @hw: pointer to hardware structure - * @i2cctl: Current value of I2CCTL register - * @data: I2C data value (0 or 1) to set - * - * Sets the I2C data bit - **/ -static s32 ixgbe_set_i2c_data(struct ixgbe_hw *hw, u32 *i2cctl, bool data) -{ - s32 status = 0; - - if (data) - *i2cctl |= IXGBE_I2C_DATA_OUT; - else - *i2cctl &= ~IXGBE_I2C_DATA_OUT; - - IXGBE_WRITE_REG(hw, IXGBE_I2CCTL, *i2cctl); - - /* Data rise/fall (1000ns/300ns) and set-up time (250ns) */ - udelay(IXGBE_I2C_T_RISE + IXGBE_I2C_T_FALL + IXGBE_I2C_T_SU_DATA); - - /* Verify data was set correctly */ - *i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - if (data != ixgbe_get_i2c_data(i2cctl)) { - status = IXGBE_ERR_I2C; - hw_dbg(hw, "Error - I2C data was not set to %X.\n", data); - } - - return status; -} - -/** - * ixgbe_get_i2c_data - Reads the I2C SDA data bit - * @hw: pointer to hardware structure - * @i2cctl: Current value of I2CCTL register - * - * Returns the I2C data bit value - **/ -static bool ixgbe_get_i2c_data(u32 *i2cctl) -{ - bool data; - - if (*i2cctl & IXGBE_I2C_DATA_IN) - data = 1; - else - data = 0; - - return data; -} - -/** - * ixgbe_i2c_bus_clear - Clears the I2C bus - * @hw: pointer to hardware structure - * - * Clears the I2C bus by sending nine clock pulses. - * Used when data line is stuck low. - **/ -void ixgbe_i2c_bus_clear(struct ixgbe_hw *hw) -{ - u32 i2cctl = IXGBE_READ_REG(hw, IXGBE_I2CCTL); - u32 i; - - ixgbe_i2c_start(hw); - - ixgbe_set_i2c_data(hw, &i2cctl, 1); - - for (i = 0; i < 9; i++) { - ixgbe_raise_i2c_clk(hw, &i2cctl); - - /* Min high period of clock is 4us */ - udelay(IXGBE_I2C_T_HIGH); - - ixgbe_lower_i2c_clk(hw, &i2cctl); - - /* Min low period of clock is 4.7us*/ - udelay(IXGBE_I2C_T_LOW); - } - - ixgbe_i2c_start(hw); - - /* Put the i2c bus back to default state */ - ixgbe_i2c_stop(hw); -} diff --git a/drivers/net/ixgbe/ixgbe_phy.h b/drivers/net/ixgbe/ixgbe_phy.h index 773775e..2b985f8 100644 --- a/drivers/net/ixgbe/ixgbe_phy.h +++ b/drivers/net/ixgbe/ixgbe_phy.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -39,12 +39,11 @@ #define IXGBE_SFF_VENDOR_OUI_BYTE2 0x27 #define IXGBE_SFF_1GBE_COMP_CODES 0x6 #define IXGBE_SFF_10GBE_COMP_CODES 0x3 -#define IXGBE_SFF_CABLE_TECHNOLOGY 0x8 +#define IXGBE_SFF_TRANSMISSION_MEDIA 0x9 /* Bitmasks */ -#define IXGBE_SFF_DA_PASSIVE_CABLE 0x4 +#define IXGBE_SFF_TWIN_AX_CAPABLE 0x80 #define IXGBE_SFF_1GBASESX_CAPABLE 0x1 -#define IXGBE_SFF_1GBASELX_CAPABLE 0x2 #define IXGBE_SFF_10GBASESR_CAPABLE 0x10 #define IXGBE_SFF_10GBASELR_CAPABLE 0x20 #define IXGBE_I2C_EEPROM_READ_MASK 0x100 @@ -55,15 +54,14 @@ #define IXGBE_I2C_EEPROM_STATUS_IN_PROGRESS 0x3 /* Bit-shift macros */ -#define IXGBE_SFF_VENDOR_OUI_BYTE0_SHIFT 24 -#define IXGBE_SFF_VENDOR_OUI_BYTE1_SHIFT 16 -#define IXGBE_SFF_VENDOR_OUI_BYTE2_SHIFT 8 +#define IXGBE_SFF_VENDOR_OUI_BYTE0_SHIFT 12 +#define IXGBE_SFF_VENDOR_OUI_BYTE1_SHIFT 8 +#define IXGBE_SFF_VENDOR_OUI_BYTE2_SHIFT 4 /* Vendor OUIs: format of OUI is 0x[byte0][byte1][byte2][00] */ #define IXGBE_SFF_VENDOR_OUI_TYCO 0x00407600 #define IXGBE_SFF_VENDOR_OUI_FTL 0x00906500 #define IXGBE_SFF_VENDOR_OUI_AVAGO 0x00176A00 -#define IXGBE_SFF_VENDOR_OUI_INTEL 0x001B2100 /* I2C SDA and SCL timing parameters for standard mode */ #define IXGBE_I2C_T_HD_STA 4 @@ -93,9 +91,6 @@ s32 ixgbe_setup_phy_link_speed_generic(struct ixgbe_hw *hw, ixgbe_link_speed speed, bool autoneg, bool autoneg_wait_to_complete); -s32 ixgbe_get_copper_link_capabilities_generic(struct ixgbe_hw *hw, - ixgbe_link_speed *speed, - bool *autoneg); /* PHY specific */ s32 ixgbe_check_phy_link_tnx(struct ixgbe_hw *hw, @@ -103,20 +98,10 @@ s32 ixgbe_check_phy_link_tnx(struct ixgbe_hw *hw, bool *link_up); s32 ixgbe_get_phy_firmware_version_tnx(struct ixgbe_hw *hw, u16 *firmware_version); -s32 ixgbe_get_phy_firmware_version_aq(struct ixgbe_hw *hw, - u16 *firmware_version); s32 ixgbe_reset_phy_nl(struct ixgbe_hw *hw); s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw); s32 ixgbe_get_sfp_init_sequence_offsets(struct ixgbe_hw *hw, u16 *list_offset, u16 *data_offset); -s32 ixgbe_read_i2c_byte_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 dev_addr, u8 *data); -s32 ixgbe_write_i2c_byte_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 dev_addr, u8 data); -s32 ixgbe_read_i2c_eeprom_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 *eeprom_data); -s32 ixgbe_write_i2c_eeprom_generic(struct ixgbe_hw *hw, u8 byte_offset, - u8 eeprom_data); #endif /* _IXGBE_PHY_H_ */ diff --git a/drivers/net/ixgbe/ixgbe_type.h b/drivers/net/ixgbe/ixgbe_type.h index 2cf6b71..9387965 100644 --- a/drivers/net/ixgbe/ixgbe_type.h +++ b/drivers/net/ixgbe/ixgbe_type.h @@ -1,7 +1,7 @@ /******************************************************************************* Intel 10 Gigabit PCI Express Linux driver - Copyright(c) 1999 - 2009 Intel Corporation. + Copyright(c) 1999 - 2008 Intel Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, @@ -30,13 +30,11 @@ #include "ixgbe_osdep.h" - /* Vendor ID */ #define IXGBE_INTEL_VENDOR_ID 0x8086 /* Device IDs */ #define IXGBE_DEV_ID_82598 0x10B6 -#define IXGBE_DEV_ID_82598_BX 0x1508 #define IXGBE_DEV_ID_82598AF_DUAL_PORT 0x10C6 #define IXGBE_DEV_ID_82598AF_SINGLE_PORT 0x10C7 #define IXGBE_DEV_ID_82598AT 0x10C8 @@ -46,9 +44,6 @@ #define IXGBE_DEV_ID_82598_DA_DUAL_PORT 0x10F1 #define IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM 0x10E1 #define IXGBE_DEV_ID_82598EB_XF_LR 0x10F4 -#define IXGBE_DEV_ID_82599_KX4 0x10F7 -#define IXGBE_DEV_ID_82599_SFP 0x10FB -#define IXGBE_DEV_ID_82599_XAUI_LOM 0x10FC /* General Registers */ #define IXGBE_CTRL 0x00000 @@ -56,12 +51,9 @@ #define IXGBE_CTRL_EXT 0x00018 #define IXGBE_ESDP 0x00020 #define IXGBE_EODSDP 0x00028 -#define IXGBE_I2CCTL 0x00028 #define IXGBE_LEDCTL 0x00200 #define IXGBE_FRTIMER 0x00048 #define IXGBE_TCPTIMER 0x0004C -#define IXGBE_CORESPARE 0x00600 -#define IXGBE_EXVET 0x05078 /* NVM Registers */ #define IXGBE_EEC 0x10010 @@ -75,19 +67,6 @@ #define IXGBE_FLOP 0x1013C #define IXGBE_GRC 0x10200 -/* General Receive Control */ -#define IXGBE_GRC_MNG 0x00000001 /* Manageability Enable */ -#define IXGBE_GRC_APME 0x00000002 /* Advanced Power Management Enable */ - -#define IXGBE_VPDDIAG0 0x10204 -#define IXGBE_VPDDIAG1 0x10208 - -/* I2CCTL Bit Masks */ -#define IXGBE_I2C_CLK_IN 0x00000001 -#define IXGBE_I2C_CLK_OUT 0x00000002 -#define IXGBE_I2C_DATA_IN 0x00000004 -#define IXGBE_I2C_DATA_OUT 0x00000008 - /* Interrupt Registers */ #define IXGBE_EICR 0x00800 #define IXGBE_EICS 0x00808 @@ -95,45 +74,21 @@ #define IXGBE_EIMC 0x00888 #define IXGBE_EIAC 0x00810 #define IXGBE_EIAM 0x00890 -#define IXGBE_EICS_EX(_i) (0x00A90 + (_i) * 4) -#define IXGBE_EIMS_EX(_i) (0x00AA0 + (_i) * 4) -#define IXGBE_EIMC_EX(_i) (0x00AB0 + (_i) * 4) -#define IXGBE_EIAM_EX(_i) (0x00AD0 + (_i) * 4) -/* 82599 EITR is only 12 bits, with the lower 3 always zero */ -/* - * 82598 EITR is 16 bits but set the limits based on the max - * supported by all ixgbe hardware - */ -#define IXGBE_MAX_INT_RATE 488281 -#define IXGBE_MIN_INT_RATE 956 -#define IXGBE_MAX_EITR 0x00000FF8 -#define IXGBE_MIN_EITR 8 #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \ - (0x012300 + (((_i) - 24) * 4))) -#define IXGBE_EITR_ITR_INT_MASK 0x00000FF8 -#define IXGBE_EITR_LLI_MOD 0x00008000 -#define IXGBE_EITR_CNT_WDIS 0x80000000 + (0x012300 + ((_i) * 4))) +#define IXGBE_EITR_ITR_INT_MASK 0x00000FFF #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at 0x900-0x960 */ -#define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt causes */ -#define IXGBE_EITRSEL 0x00894 #define IXGBE_MSIXT 0x00000 /* MSI-X Table. 0x0000 - 0x01C */ #define IXGBE_MSIXPBA 0x02000 /* MSI-X Pending bit array */ #define IXGBE_PBACL(_i) (((_i) == 0) ? 
(0x11068) : (0x110C0 + ((_i) * 4))) #define IXGBE_GPIE 0x00898 /* Flow Control Registers */ -#define IXGBE_FCADBUL 0x03210 -#define IXGBE_FCADBUH 0x03214 -#define IXGBE_FCAMACL 0x04328 -#define IXGBE_FCAMACH 0x0432C -#define IXGBE_FCRTH_82599(_i) (0x03260 + ((_i) * 4)) /* 8 of these (0-7) */ -#define IXGBE_FCRTL_82599(_i) (0x03220 + ((_i) * 4)) /* 8 of these (0-7) */ #define IXGBE_PFCTOP 0x03008 #define IXGBE_FCTTV(_i) (0x03200 + ((_i) * 4)) /* 4 of these (0-3) */ #define IXGBE_FCRTL(_i) (0x03220 + ((_i) * 8)) /* 8 of these (0-7) */ #define IXGBE_FCRTH(_i) (0x03260 + ((_i) * 8)) /* 8 of these (0-7) */ #define IXGBE_FCRTV 0x032A0 -#define IXGBE_FCCFG 0x03D00 #define IXGBE_TFCS 0x0CE00 /* Receive DMA Registers */ @@ -149,12 +104,6 @@ (0x0D018 + ((_i - 64) * 0x40))) #define IXGBE_RXDCTL(_i) (((_i) < 64) ? (0x01028 + ((_i) * 0x40)) : \ (0x0D028 + ((_i - 64) * 0x40))) -#define IXGBE_RSCCTL(_i) (((_i) < 64) ? (0x0102C + ((_i) * 0x40)) : \ - (0x0D02C + ((_i - 64) * 0x40))) -#define IXGBE_RSCDBU 0x03028 -#define IXGBE_RDDCC 0x02F20 -#define IXGBE_RXMEMWRAP 0x03190 -#define IXGBE_STARCTRL 0x03024 /* * Split and Replication Receive Control Registers * 00-15 : 0x02100 + n*4 @@ -174,7 +123,6 @@ (((_i) < 64) ? (0x0100C + ((_i) * 0x40)) : \ (0x0D00C + ((_i - 64) * 0x40)))) #define IXGBE_RDRXCTL 0x02F00 -#define IXGBE_RDRXCTL_RSC_PUSH 0x80 #define IXGBE_RXPBSIZE(_i) (0x03C00 + ((_i) * 4)) /* 8 of these 0x03C00 - 0x03C1C */ #define IXGBE_RXCTRL 0x03000 @@ -192,8 +140,6 @@ (0x0A200 + ((_i) * 8))) #define IXGBE_RAH(_i) (((_i) <= 15) ? (0x05404 + ((_i) * 8)) : \ (0x0A204 + ((_i) * 8))) -#define IXGBE_MPSAR_LO(_i) (0x0A600 + ((_i) * 8)) -#define IXGBE_MPSAR_HI(_i) (0x0A604 + ((_i) * 8)) /* Packet split receive type */ #define IXGBE_PSRTYPE(_i) (((_i) <= 15) ? (0x05480 + ((_i) * 4)) : \ (0x0EA00 + ((_i) * 4))) @@ -205,28 +151,6 @@ #define IXGBE_VLNCTRL 0x05088 #define IXGBE_MCSTCTRL 0x05090 #define IXGBE_MRQC 0x05818 -#define IXGBE_SAQF(_i) (0x0E000 + ((_i) * 4)) /* Source Address Queue Filter */ -#define IXGBE_DAQF(_i) (0x0E200 + ((_i) * 4)) /* Dest. Address Queue Filter */ -#define IXGBE_SDPQF(_i) (0x0E400 + ((_i) * 4)) /* Src Dest. 
Addr Queue Filter */ -#define IXGBE_FTQF(_i) (0x0E600 + ((_i) * 4)) /* Five Tuple Queue Filter */ -#define IXGBE_ETQF(_i) (0x05128 + ((_i) * 4)) /* EType Queue Filter */ -#define IXGBE_ETQS(_i) (0x0EC00 + ((_i) * 4)) /* EType Queue Select */ -#define IXGBE_SYNQF 0x0EC30 /* SYN Packet Queue Filter */ -#define IXGBE_RQTC 0x0EC70 -#define IXGBE_MTQC 0x08120 -#define IXGBE_VLVF(_i) (0x0F100 + ((_i) * 4)) /* 64 of these (0-63) */ -#define IXGBE_VLVFB(_i) (0x0F200 + ((_i) * 4)) /* 128 of these (0-127) */ -#define IXGBE_VT_CTL 0x051B0 -#define IXGBE_VFRE(_i) (0x051E0 + ((_i) * 4)) -#define IXGBE_VFTE(_i) (0x08110 + ((_i) * 4)) -#define IXGBE_QDE 0x2F04 -#define IXGBE_VMOLR(_i) (0x0F000 + ((_i) * 4)) /* 64 total */ -#define IXGBE_UTA(_i) (0x0F400 + ((_i) * 4)) -#define IXGBE_VMRCTL(_i) (0x0F600 + ((_i) * 4)) -#define IXGBE_VMRVLAN(_i) (0x0F610 + ((_i) * 4)) -#define IXGBE_VMRVM(_i) (0x0F630 + ((_i) * 4)) -#define IXGBE_L34T_IMIR(_i) (0x0E800 + ((_i) * 4)) /*128 of these (0-127)*/ -#define IXGBE_LLITHRESH 0x0EC90 #define IXGBE_IMIR(_i) (0x05A80 + ((_i) * 4)) /* 8 of these (0-7) */ #define IXGBE_IMIREXT(_i) (0x05AA0 + ((_i) * 4)) /* 8 of these (0-7) */ #define IXGBE_IMIRVP 0x05AC0 @@ -234,33 +158,6 @@ #define IXGBE_RETA(_i) (0x05C00 + ((_i) * 4)) /* 32 of these (0-31) */ #define IXGBE_RSSRK(_i) (0x05C80 + ((_i) * 4)) /* 10 of these (0-9) */ -/* Flow Director registers */ -#define IXGBE_FDIRCTRL 0x0EE00 -#define IXGBE_FDIRHKEY 0x0EE68 -#define IXGBE_FDIRSKEY 0x0EE6C -#define IXGBE_FDIRDIP4M 0x0EE3C -#define IXGBE_FDIRSIP4M 0x0EE40 -#define IXGBE_FDIRTCPM 0x0EE44 -#define IXGBE_FDIRUDPM 0x0EE48 -#define IXGBE_FDIRIP6M 0x0EE74 -#define IXGBE_FDIRM 0x0EE70 - -/* Flow Director Stats registers */ -#define IXGBE_FDIRFREE 0x0EE38 -#define IXGBE_FDIRLEN 0x0EE4C -#define IXGBE_FDIRUSTAT 0x0EE50 -#define IXGBE_FDIRFSTAT 0x0EE54 -#define IXGBE_FDIRMATCH 0x0EE58 -#define IXGBE_FDIRMISS 0x0EE5C - -/* Flow Director Programming registers */ -#define IXGBE_FDIRSIPv6(_i) (0x0EE0C + ((_i) * 4)) /* 3 of these (0-2) */ -#define IXGBE_FDIRIPSA 0x0EE18 -#define IXGBE_FDIRIPDA 0x0EE1C -#define IXGBE_FDIRPORT 0x0EE20 -#define IXGBE_FDIRVLAN 0x0EE24 -#define IXGBE_FDIRHASH 0x0EE28 -#define IXGBE_FDIRCMD 0x0EE2C /* Transmit DMA registers */ #define IXGBE_TDBAL(_i) (0x06000 + ((_i) * 0x40)) /* 32 of these (0-31)*/ @@ -273,23 +170,7 @@ #define IXGBE_TDWBAH(_i) (0x0603C + ((_i) * 0x40)) #define IXGBE_DTXCTL 0x07E00 -#define IXGBE_DMATXCTL 0x04A80 -#define IXGBE_PFDTXGSWC 0x08220 -#define IXGBE_DTXMXSZRQ 0x08100 -#define IXGBE_DTXTCPFLGL 0x04A88 -#define IXGBE_DTXTCPFLGH 0x04A8C -#define IXGBE_LBDRPEN 0x0CA00 -#define IXGBE_TXPBTHRESH(_i) (0x04950 + ((_i) * 4)) /* 8 of these 0 - 7 */ - -#define IXGBE_DMATXCTL_TE 0x1 /* Transmit Enable */ -#define IXGBE_DMATXCTL_NS 0x2 /* No Snoop LSO hdr buffer */ -#define IXGBE_DMATXCTL_GDV 0x8 /* Global Double VLAN */ -#define IXGBE_DMATXCTL_VT_SHIFT 16 /* VLAN EtherType */ - -#define IXGBE_PFDTXGSWC_VT_LBEN 0x1 /* Local L2 VT switch enable */ #define IXGBE_DCA_TXCTRL(_i) (0x07200 + ((_i) * 4)) /* 16 of these (0-15) */ -/* Tx DCA Control register : 128 of these (0-127) */ -#define IXGBE_DCA_TXCTRL_82599(_i) (0x0600C + ((_i) * 0x40)) #define IXGBE_TIPG 0x0CB00 #define IXGBE_TXPBSIZE(_i) (0x0CC00 + ((_i) * 4)) /* 8 of these */ #define IXGBE_MNGTXMAP 0x0CD10 @@ -381,181 +262,6 @@ #define IXGBE_TDPT2TCSR(_i) (0x0CD40 + ((_i) * 4)) /* 8 of these (0-7) */ -/* Security Control Registers */ -#define IXGBE_SECTXCTRL 0x08800 -#define IXGBE_SECTXSTAT 0x08804 -#define IXGBE_SECTXBUFFAF 0x08808 -#define 
IXGBE_SECTXMINIFG 0x08810
-#define IXGBE_SECTXSTAT 0x08804
-#define IXGBE_SECRXCTRL 0x08D00
-#define IXGBE_SECRXSTAT 0x08D04
-
-/* Security Bit Fields and Masks */
-#define IXGBE_SECTXCTRL_SECTX_DIS 0x00000001
-#define IXGBE_SECTXCTRL_TX_DIS 0x00000002
-#define IXGBE_SECTXCTRL_STORE_FORWARD 0x00000004
-
-#define IXGBE_SECTXSTAT_SECTX_RDY 0x00000001
-#define IXGBE_SECTXSTAT_ECC_TXERR 0x00000002
-
-#define IXGBE_SECRXCTRL_SECRX_DIS 0x00000001
-#define IXGBE_SECRXCTRL_RX_DIS 0x00000002
-
-#define IXGBE_SECRXSTAT_SECRX_RDY 0x00000001
-#define IXGBE_SECRXSTAT_ECC_RXERR 0x00000002
-
-/* LinkSec (MacSec) Registers */
-#define IXGBE_LSECTXCAP 0x08A00
-#define IXGBE_LSECRXCAP 0x08F00
-#define IXGBE_LSECTXCTRL 0x08A04
-#define IXGBE_LSECTXSCL 0x08A08 /* SCI Low */
-#define IXGBE_LSECTXSCH 0x08A0C /* SCI High */
-#define IXGBE_LSECTXSA 0x08A10
-#define IXGBE_LSECTXPN0 0x08A14
-#define IXGBE_LSECTXPN1 0x08A18
-#define IXGBE_LSECTXKEY0(_n) (0x08A1C + (4 * (_n))) /* 4 of these (0-3) */
-#define IXGBE_LSECTXKEY1(_n) (0x08A2C + (4 * (_n))) /* 4 of these (0-3) */
-#define IXGBE_LSECRXCTRL 0x08F04
-#define IXGBE_LSECRXSCL 0x08F08
-#define IXGBE_LSECRXSCH 0x08F0C
-#define IXGBE_LSECRXSA(_i) (0x08F10 + (4 * (_i))) /* 2 of these (0-1) */
-#define IXGBE_LSECRXPN(_i) (0x08F18 + (4 * (_i))) /* 2 of these (0-1) */
-#define IXGBE_LSECRXKEY(_n, _m) (0x08F20 + ((0x10 * (_n)) + (4 * (_m))))
-#define IXGBE_LSECTXUT 0x08A3C /* OutPktsUntagged */
-#define IXGBE_LSECTXPKTE 0x08A40 /* OutPktsEncrypted */
-#define IXGBE_LSECTXPKTP 0x08A44 /* OutPktsProtected */
-#define IXGBE_LSECTXOCTE 0x08A48 /* OutOctetsEncrypted */
-#define IXGBE_LSECTXOCTP 0x08A4C /* OutOctetsProtected */
-#define IXGBE_LSECRXUT 0x08F40 /* InPktsUntagged/InPktsNoTag */
-#define IXGBE_LSECRXOCTD 0x08F44 /* InOctetsDecrypted */
-#define IXGBE_LSECRXOCTV 0x08F48 /* InOctetsValidated */
-#define IXGBE_LSECRXBAD 0x08F4C /* InPktsBadTag */
-#define IXGBE_LSECRXNOSCI 0x08F50 /* InPktsNoSci */
-#define IXGBE_LSECRXUNSCI 0x08F54 /* InPktsUnknownSci */
-#define IXGBE_LSECRXUNCH 0x08F58 /* InPktsUnchecked */
-#define IXGBE_LSECRXDELAY 0x08F5C /* InPktsDelayed */
-#define IXGBE_LSECRXLATE 0x08F60 /* InPktsLate */
-#define IXGBE_LSECRXOK(_n) (0x08F64 + (0x04 * (_n))) /* InPktsOk */
-#define IXGBE_LSECRXINV(_n) (0x08F6C + (0x04 * (_n))) /* InPktsInvalid */
-#define IXGBE_LSECRXNV(_n) (0x08F74 + (0x04 * (_n))) /* InPktsNotValid */
-#define IXGBE_LSECRXUNSA 0x08F7C /* InPktsUnusedSa */
-#define IXGBE_LSECRXNUSA 0x08F80 /* InPktsNotUsingSa */
-
-/* LinkSec (MacSec) Bit Fields and Masks */
-#define IXGBE_LSECTXCAP_SUM_MASK 0x00FF0000
-#define IXGBE_LSECTXCAP_SUM_SHIFT 16
-#define IXGBE_LSECRXCAP_SUM_MASK 0x00FF0000
-#define IXGBE_LSECRXCAP_SUM_SHIFT 16
-
-#define IXGBE_LSECTXCTRL_EN_MASK 0x00000003
-#define IXGBE_LSECTXCTRL_DISABLE 0x0
-#define IXGBE_LSECTXCTRL_AUTH 0x1
-#define IXGBE_LSECTXCTRL_AUTH_ENCRYPT 0x2
-#define IXGBE_LSECTXCTRL_AISCI 0x00000020
-#define IXGBE_LSECTXCTRL_PNTHRSH_MASK 0xFFFFFF00
-#define IXGBE_LSECTXCTRL_RSV_MASK 0x000000D8
-
-#define IXGBE_LSECRXCTRL_EN_MASK 0x0000000C
-#define IXGBE_LSECRXCTRL_EN_SHIFT 2
-#define IXGBE_LSECRXCTRL_DISABLE 0x0
-#define IXGBE_LSECRXCTRL_CHECK 0x1
-#define IXGBE_LSECRXCTRL_STRICT 0x2
-#define IXGBE_LSECRXCTRL_DROP 0x3
-#define IXGBE_LSECRXCTRL_PLSH 0x00000040
-#define IXGBE_LSECRXCTRL_RP 0x00000080
-#define IXGBE_LSECRXCTRL_RSV_MASK 0xFFFFFF33
-
-/* IpSec Registers */
-#define IXGBE_IPSTXIDX 0x08900
-#define IXGBE_IPSTXSALT 0x08904
-#define IXGBE_IPSTXKEY(_i) (0x08908 + (4 * (_i))) /* 4 of these (0-3) */
-#define IXGBE_IPSRXIDX 0x08E00
-#define IXGBE_IPSRXIPADDR(_i) (0x08E04 + (4 * (_i))) /* 4 of these (0-3) */
-#define IXGBE_IPSRXSPI 0x08E14
-#define IXGBE_IPSRXIPIDX 0x08E18
-#define IXGBE_IPSRXKEY(_i) (0x08E1C + (4 * (_i))) /* 4 of these (0-3) */
-#define IXGBE_IPSRXSALT 0x08E2C
-#define IXGBE_IPSRXMOD 0x08E30
-
-#define IXGBE_SECTXCTRL_STORE_FORWARD_ENABLE 0x4
-
-/* DCB registers */
-#define IXGBE_RTRPCS 0x02430
-#define IXGBE_RTTDCS 0x04900
-#define IXGBE_RTTDCS_ARBDIS 0x00000040 /* DCB arbiter disable */
-#define IXGBE_RTTPCS 0x0CD00
-#define IXGBE_RTRUP2TC 0x03020
-#define IXGBE_RTTUP2TC 0x0C800
-#define IXGBE_RTRPT4C(_i) (0x02140 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTRPT4S(_i) (0x02160 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTTDT2C(_i) (0x04910 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTTDT2S(_i) (0x04930 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTTPT2C(_i) (0x0CD20 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTTPT2S(_i) (0x0CD40 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_RTTDQSEL 0x04904
-#define IXGBE_RTTDT1C 0x04908
-#define IXGBE_RTTDT1S 0x0490C
-#define IXGBE_RTTDTECC 0x04990
-#define IXGBE_RTTDTECC_NO_BCN 0x00000100
-
-#define IXGBE_RTTBCNRC 0x04984
-
-
-/* FCoE DMA Context Registers */
-#define IXGBE_FCPTRL 0x02410 /* FC User Desc. PTR Low */
-#define IXGBE_FCPTRH 0x02414 /* FC USer Desc. PTR High */
-#define IXGBE_FCBUFF 0x02418 /* FC Buffer Control */
-#define IXGBE_FCDMARW 0x02420 /* FC Receive DMA RW */
-#define IXGBE_FCINVST0 0x03FC0 /* FC Invalid DMA Context Status Reg 0 */
-#define IXGBE_FCINVST(_i) (IXGBE_FCINVST0 + ((_i) * 4))
-#define IXGBE_FCBUFF_VALID (1 << 0) /* DMA Context Valid */
-#define IXGBE_FCBUFF_BUFFSIZE (3 << 3) /* User Buffer Size */
-#define IXGBE_FCBUFF_WRCONTX (1 << 7) /* 0: Initiator, 1: Target */
-#define IXGBE_FCBUFF_BUFFCNT 0x0000ff00 /* Number of User Buffers */
-#define IXGBE_FCBUFF_OFFSET 0xffff0000 /* User Buffer Offset */
-#define IXGBE_FCBUFF_BUFFSIZE_SHIFT 3
-#define IXGBE_FCBUFF_BUFFCNT_SHIFT 8
-#define IXGBE_FCBUFF_OFFSET_SHIFT 16
-#define IXGBE_FCDMARW_WE (1 << 14) /* Write enable */
-#define IXGBE_FCDMARW_RE (1 << 15) /* Read enable */
-#define IXGBE_FCDMARW_FCOESEL 0x000001ff /* FC X_ID: 11 bits */
-#define IXGBE_FCDMARW_LASTSIZE 0xffff0000 /* Last User Buffer Size */
-#define IXGBE_FCDMARW_LASTSIZE_SHIFT 16
-/* FCoE SOF/EOF */
-#define IXGBE_TEOFF 0x04A94 /* Tx FC EOF */
-#define IXGBE_TSOFF 0x04A98 /* Tx FC SOF */
-#define IXGBE_REOFF 0x05158 /* Rx FC EOF */
-#define IXGBE_RSOFF 0x051F8 /* Rx FC SOF */
-/* FCoE Filter Context Registers */
-#define IXGBE_FCFLT 0x05108 /* FC FLT Context */
-#define IXGBE_FCFLTRW 0x05110 /* FC Filter RW Control */
-#define IXGBE_FCPARAM 0x051d8 /* FC Offset Parameter */
-#define IXGBE_FCFLT_VALID (1 << 0) /* Filter Context Valid */
-#define IXGBE_FCFLT_FIRST (1 << 1) /* Filter First */
-#define IXGBE_FCFLT_SEQID 0x00ff0000 /* Sequence ID */
-#define IXGBE_FCFLT_SEQCNT 0xff000000 /* Sequence Count */
-#define IXGBE_FCFLTRW_RVALDT (1 << 13) /* Fast Re-Validation */
-#define IXGBE_FCFLTRW_WE (1 << 14) /* Write Enable */
-#define IXGBE_FCFLTRW_RE (1 << 15) /* Read Enable */
-/* FCoE Receive Control */
-#define IXGBE_FCRXCTRL 0x05100 /* FC Receive Control */
-#define IXGBE_FCRXCTRL_FCOELLI (1 << 0) /* Low latency interrupt */
-#define IXGBE_FCRXCTRL_SAVBAD (1 << 1) /* Save Bad Frames */
-#define IXGBE_FCRXCTRL_FRSTRDH (1 << 2) /* EN 1st Read Header */
-#define IXGBE_FCRXCTRL_LASTSEQH (1 << 3) /* EN Last Header in Seq */
-#define IXGBE_FCRXCTRL_ALLH (1 << 4) /* EN All Headers */
-#define IXGBE_FCRXCTRL_FRSTSEQH (1 << 5) /* EN 1st Seq. Header */
-#define IXGBE_FCRXCTRL_ICRC (1 << 6) /* Ignore Bad FC CRC */
-#define IXGBE_FCRXCTRL_FCCRCBO (1 << 7) /* FC CRC Byte Ordering */
-#define IXGBE_FCRXCTRL_FCOEVER 0x00000f00 /* FCoE Version: 4 bits */
-#define IXGBE_FCRXCTRL_FCOEVER_SHIFT 8
-/* FCoE Redirection */
-#define IXGBE_FCRECTL 0x0ED00 /* FC Redirection Control */
-#define IXGBE_FCRETA0 0x0ED10 /* FC Redirection Table 0 */
-#define IXGBE_FCRETA(_i) (IXGBE_FCRETA0 + ((_i) * 4)) /* FCoE Redir */
-#define IXGBE_FCRECTL_ENA 0x1 /* FCoE Redir Table Enable */
-#define IXGBE_FCRETA_SIZE 8 /* Max entries in FCRETA */
-#define IXGBE_FCRETA_ENTRY_MASK 0x0000007f /* 7 bits for the queue index */
 
 /* Stats registers */
 #define IXGBE_CRCERRS 0x04000
@@ -570,11 +276,6 @@
 #define IXGBE_LXONRXC 0x0CF60
 #define IXGBE_LXOFFTXC 0x03F68
 #define IXGBE_LXOFFRXC 0x0CF68
-#define IXGBE_LXONRXCNT 0x041A4
-#define IXGBE_LXOFFRXCNT 0x041A8
-#define IXGBE_PXONRXCNT(_i) (0x04140 + ((_i) * 4)) /* 8 of these */
-#define IXGBE_PXOFFRXCNT(_i) (0x04160 + ((_i) * 4)) /* 8 of these */
-#define IXGBE_PXON2OFFCNT(_i) (0x03240 + ((_i) * 4)) /* 8 of these */
 #define IXGBE_PXONTXC(_i) (0x03F00 + ((_i) * 4)) /* 8 of these 3F00-3F1C*/
 #define IXGBE_PXONRXC(_i) (0x0CF00 + ((_i) * 4)) /* 8 of these CF00-CF1C*/
 #define IXGBE_PXOFFTXC(_i) (0x03F20 + ((_i) * 4)) /* 8 of these 3F20-3F3C*/
@@ -614,29 +315,15 @@
 #define IXGBE_MPTC 0x040F0
 #define IXGBE_BPTC 0x040F4
 #define IXGBE_XEC 0x04120
-#define IXGBE_SSVPC 0x08780
 #define IXGBE_RQSMR(_i) (0x02300 + ((_i) * 4))
 #define IXGBE_TQSMR(_i) (((_i) <= 7) ? (0x07300 + ((_i) * 4)) : \
                          (0x08600 + ((_i) * 4)))
-#define IXGBE_TQSM(_i) (0x08600 + ((_i) * 4))
 #define IXGBE_QPRC(_i) (0x01030 + ((_i) * 0x40)) /* 16 of these */
 #define IXGBE_QPTC(_i) (0x06030 + ((_i) * 0x40)) /* 16 of these */
 #define IXGBE_QBRC(_i) (0x01034 + ((_i) * 0x40)) /* 16 of these */
 #define IXGBE_QBTC(_i) (0x06034 + ((_i) * 0x40)) /* 16 of these */
-#define IXGBE_QPRDC(_i) (0x01430 + ((_i) * 0x40)) /* 16 of these */
-#define IXGBE_QBTC_L(_i) (0x08700 + ((_i) * 0x8)) /* 16 of these */
-#define IXGBE_QBTC_H(_i) (0x08704 + ((_i) * 0x8)) /* 16 of these */
-#define IXGBE_FCCRC 0x05118 /* Count of Good Eth CRC w/ Bad FC CRC */
-#define IXGBE_FCOERPDC 0x0241C /* FCoE Rx Packets Dropped Count */
-#define IXGBE_FCLAST 0x02424 /* FCoE Last Error Count */
-#define IXGBE_FCOEPRC 0x02428 /* Number of FCoE Packets Received */
-#define IXGBE_FCOEDWRC 0x0242C /* Number of FCoE DWords Received */
-#define IXGBE_FCOEPTC 0x08784 /* Number of FCoE Packets Transmitted */
-#define IXGBE_FCOEDWTC 0x08788 /* Number of FCoE DWords Transmitted */
-#define IXGBE_FCCRC_CNT_MASK 0x0000FFFF /* CRC_CNT: bit 0 - 15 */
-#define IXGBE_FCLAST_CNT_MASK 0x0000FFFF /* Last_CNT: bit 0 - 15 */
 
 /* Management */
 #define IXGBE_MAVTV(_i) (0x05010 + ((_i) * 4)) /* 8 of these (0-7) */
@@ -649,9 +336,6 @@
 #define IXGBE_MMAL(_i) (0x05910 + ((_i) * 8)) /* 4 of these (0-3) */
 #define IXGBE_MMAH(_i) (0x05914 + ((_i) * 8)) /* 4 of these (0-3) */
 #define IXGBE_FTFT 0x09400 /* 0x9400-0x97FC */
-#define IXGBE_METF(_i) (0x05190 + ((_i) * 4)) /* 4 of these (0-3) */
-#define IXGBE_MDEF_EXT(_i) (0x05160 + ((_i) * 4)) /* 8 of these (0-7) */
-#define IXGBE_LSWFW 0x15014
 
 /* ARC Subsystem registers */
 #define IXGBE_HICR 0x15F00
@@ -684,65 +368,16 @@
 #define IXGBE_DCA_ID 0x11070
 #define IXGBE_DCA_CTRL 0x11074
 
-/* PCI-E registers 82599-Specific */
-#define IXGBE_GCR_EXT 0x11050
-#define IXGBE_GSCL_5_82599 0x11030
-#define IXGBE_GSCL_6_82599 0x11034
-#define IXGBE_GSCL_7_82599 0x11038
-#define IXGBE_GSCL_8_82599 0x1103C
-#define IXGBE_PHYADR_82599 0x11040
-#define IXGBE_PHYDAT_82599 0x11044
-#define IXGBE_PHYCTL_82599 0x11048
-#define IXGBE_PBACLR_82599 0x11068
-#define IXGBE_CIAA_82599 0x11088
-#define IXGBE_CIAD_82599 0x1108C
-#define IXGBE_PCIE_DIAG_0_82599 0x11090
-#define IXGBE_PCIE_DIAG_1_82599 0x11094
-#define IXGBE_PCIE_DIAG_2_82599 0x11098
-#define IXGBE_PCIE_DIAG_3_82599 0x1109C
-#define IXGBE_PCIE_DIAG_4_82599 0x110A0
-#define IXGBE_PCIE_DIAG_5_82599 0x110A4
-#define IXGBE_PCIE_DIAG_6_82599 0x110A8
-#define IXGBE_PCIE_DIAG_7_82599 0x110C0
-#define IXGBE_INTRPT_CSR_82599 0x110B0
-#define IXGBE_INTRPT_MASK_82599 0x110B8
-#define IXGBE_CDQ_MBR_82599 0x110B4
-#define IXGBE_MISC_REG_82599 0x110F0
-#define IXGBE_ECC_CTRL_0_82599 0x11100
-#define IXGBE_ECC_CTRL_1_82599 0x11104
-#define IXGBE_ECC_STATUS_82599 0x110E0
-#define IXGBE_BAR_CTRL_82599 0x110F4
-
-/* Time Sync Registers */
-#define IXGBE_TSYNCRXCTL 0x05188 /* Rx Time Sync Control register - RW */
-#define IXGBE_TSYNCTXCTL 0x08C00 /* Tx Time Sync Control register - RW */
-#define IXGBE_RXSTMPL 0x051E8 /* Rx timestamp Low - RO */
-#define IXGBE_RXSTMPH 0x051A4 /* Rx timestamp High - RO */
-#define IXGBE_RXSATRL 0x051A0 /* Rx timestamp attribute low - RO */
-#define IXGBE_RXSATRH 0x051A8 /* Rx timestamp attribute high - RO */
-#define IXGBE_RXMTRL 0x05120 /* RX message type register low - RW */
-#define IXGBE_TXSTMPL 0x08C04 /* Tx timestamp value Low - RO */
-#define IXGBE_TXSTMPH 0x08C08 /* Tx timestamp value High - RO */
-#define IXGBE_SYSTIML 0x08C0C /* System time register Low - RO */
-#define IXGBE_SYSTIMH 0x08C10 /* System time register High - RO */
-#define IXGBE_TIMINCA 0x08C14 /* Increment attributes register - RW */
-#define IXGBE_RXUDP 0x08C1C /* Time Sync Rx UDP Port - RW */
-
 /* Diagnostic Registers */
 #define IXGBE_RDSTATCTL 0x02C20
 #define IXGBE_RDSTAT(_i) (0x02C00 + ((_i) * 4)) /* 0x02C00-0x02C1C */
 #define IXGBE_RDHMPN 0x02F08
 #define IXGBE_RIC_DW(_i) (0x02F10 + ((_i) * 4))
 #define IXGBE_RDPROBE 0x02F20
-#define IXGBE_RDMAM 0x02F30
-#define IXGBE_RDMAD 0x02F34
 #define IXGBE_TDSTATCTL 0x07C20
 #define IXGBE_TDSTAT(_i) (0x07C00 + ((_i) * 4)) /* 0x07C00 - 0x07C1C */
 #define IXGBE_TDHMPN 0x07F08
-#define IXGBE_TDHMPN2 0x082FC
-#define IXGBE_TXDESCIC 0x082CC
 #define IXGBE_TIC_DW(_i) (0x07F10 + ((_i) * 4))
-#define IXGBE_TIC_DW2(_i) (0x082B0 + ((_i) * 4))
 #define IXGBE_TDPROBE 0x07F20
 #define IXGBE_TXBUFCTRL 0x0C600
 #define IXGBE_TXBUFDATA0 0x0C610
@@ -770,10 +405,6 @@
 #define IXGBE_TXDATARDPTR(_i) (0x0C720 + ((_i) * 4)) /* 8 of these C720-C72C*/
 #define IXGBE_TXDESCRDPTR(_i) (0x0C730 + ((_i) * 4)) /* 8 of these C730-C73C*/
 #define IXGBE_PCIEECCCTL 0x1106C
-#define IXGBE_PCIEECCCTL0 0x11100
-#define IXGBE_PCIEECCCTL1 0x11104
-#define IXGBE_RXDBUECC 0x03F70
-#define IXGBE_TXDBUECC 0x0CF70
 #define IXGBE_PBTXECC 0x0C300
 #define IXGBE_PBRXECC 0x03300
 #define IXGBE_GHECCR 0x110B0
@@ -799,74 +430,24 @@
 #define IXGBE_MSRWD 0x04260
 #define IXGBE_MLADD 0x04264
 #define IXGBE_MHADD 0x04268
-#define IXGBE_MAXFRS 0x04268
 #define IXGBE_TREG 0x0426C
 #define IXGBE_PCSS1 0x04288
 #define IXGBE_PCSS2 0x0428C
 #define IXGBE_XPCSS 0x04290
-#define IXGBE_MFLCN 0x04294
 #define IXGBE_SERDESC 0x04298
 #define IXGBE_MACS 0x0429C
 #define IXGBE_AUTOC 0x042A0
 #define IXGBE_LINKS 0x042A4
-#define IXGBE_LINKS2 0x04324
 #define IXGBE_AUTOC2 0x042A8
 #define IXGBE_AUTOC3 0x042AC
 #define IXGBE_ANLP1 0x042B0
 #define IXGBE_ANLP2 0x042B4
 #define IXGBE_ATLASCTL 0x04800
-#define IXGBE_MMNGC 0x042D0
-#define IXGBE_ANLPNP1 0x042D4
-#define IXGBE_ANLPNP2 0x042D8
-#define IXGBE_KRPCSFC 0x042E0
-#define IXGBE_KRPCSS 0x042E4
-#define IXGBE_FECS1 0x042E8
-#define IXGBE_FECS2 0x042EC
-#define IXGBE_SMADARCTL 0x14F10
-#define IXGBE_MPVC 0x04318
-#define IXGBE_SGMIIC 0x04314
-
-/* Omer CORECTL */
-#define IXGBE_CORECTL 0x014F00
-/* BARCTRL */
-#define IXGBE_BARCTRL 0x110F4
-#define IXGBE_BARCTRL_FLSIZE 0x0700
-#define IXGBE_BARCTRL_CSRSIZE 0x2000
-
-/* RSCCTL Bit Masks */
-#define IXGBE_RSCCTL_RSCEN 0x01
-#define IXGBE_RSCCTL_MAXDESC_1 0x00
-#define IXGBE_RSCCTL_MAXDESC_4 0x04
-#define IXGBE_RSCCTL_MAXDESC_8 0x08
-#define IXGBE_RSCCTL_MAXDESC_16 0x0C
-
-/* RSCDBU Bit Masks */
-#define IXGBE_RSCDBU_RSCSMALDIS_MASK 0x0000007F
-#define IXGBE_RSCDBU_RSCACKDIS 0x00000080
 
 /* RDRXCTL Bit Masks */
 #define IXGBE_RDRXCTL_RDMTS_1_2 0x00000000 /* Rx Desc Min Threshold Size */
-#define IXGBE_RDRXCTL_CRCSTRIP 0x00000002 /* CRC Strip */
 #define IXGBE_RDRXCTL_MVMEN 0x00000020
 #define IXGBE_RDRXCTL_DMAIDONE 0x00000008 /* DMA init cycle done */
-#define IXGBE_RDRXCTL_AGGDIS 0x00010000 /* Aggregation disable */
-#define IXGBE_RDRXCTL_RSCFRSTSIZE 0x003E0000 /* RSC First packet size */
-#define IXGBE_RDRXCTL_RSCLLIDIS 0x00800000 /* Disable RSC compl on LLI */
-
-/* RQTC Bit Masks and Shifts */
-#define IXGBE_RQTC_SHIFT_TC(_i) ((_i) * 4)
-#define IXGBE_RQTC_TC0_MASK (0x7 << 0)
-#define IXGBE_RQTC_TC1_MASK (0x7 << 4)
-#define IXGBE_RQTC_TC2_MASK (0x7 << 8)
-#define IXGBE_RQTC_TC3_MASK (0x7 << 12)
-#define IXGBE_RQTC_TC4_MASK (0x7 << 16)
-#define IXGBE_RQTC_TC5_MASK (0x7 << 20)
-#define IXGBE_RQTC_TC6_MASK (0x7 << 24)
-#define IXGBE_RQTC_TC7_MASK (0x7 << 28)
-
-/* PSRTYPE.RQPL Bit masks and shift */
-#define IXGBE_PSRTYPE_RQPL_MASK 0x7
-#define IXGBE_PSRTYPE_RQPL_SHIFT 29
 
 /* CTRL Bit Masks */
 #define IXGBE_CTRL_GIO_DIS 0x00000004 /* Global IO Master Disable bit */
@@ -894,18 +475,11 @@
 #define IXGBE_DCA_CTRL_DCA_MODE_CB2 0x02 /* DCA Mode CB2 */
 
 #define IXGBE_DCA_RXCTRL_CPUID_MASK 0x0000001F /* Rx CPUID Mask */
-#define IXGBE_DCA_RXCTRL_CPUID_MASK_82599 0xFF000000 /* Rx CPUID Mask */
-#define IXGBE_DCA_RXCTRL_CPUID_SHIFT_82599 24 /* Rx CPUID Shift */
 #define IXGBE_DCA_RXCTRL_DESC_DCA_EN (1 << 5) /* DCA Rx Desc enable */
 #define IXGBE_DCA_RXCTRL_HEAD_DCA_EN (1 << 6) /* DCA Rx Desc header enable */
 #define IXGBE_DCA_RXCTRL_DATA_DCA_EN (1 << 7) /* DCA Rx Desc payload enable */
-#define IXGBE_DCA_RXCTRL_DESC_RRO_EN (1 << 9) /* DCA Rx rd Desc Relax Order */
-#define IXGBE_DCA_RXCTRL_DESC_WRO_EN (1 << 13) /* DCA Rx wr Desc Relax Order */
-#define IXGBE_DCA_RXCTRL_DESC_HSRO_EN (1 << 15) /* DCA Rx Split Header RO */
 
 #define IXGBE_DCA_TXCTRL_CPUID_MASK 0x0000001F /* Tx CPUID Mask */
-#define IXGBE_DCA_TXCTRL_CPUID_MASK_82599 0xFF000000 /* Tx CPUID Mask */
-#define IXGBE_DCA_TXCTRL_CPUID_SHIFT_82599 24 /* Tx CPUID Shift */
 #define IXGBE_DCA_TXCTRL_DESC_DCA_EN (1 << 5) /* DCA Tx Desc enable */
 #define IXGBE_DCA_TXCTRL_TX_WB_RO_EN (1 << 11) /* Tx Desc writeback RO bit */
 #define IXGBE_DCA_MAX_QUEUES_82598 16 /* DCA regs only on 16 queues */
@@ -949,8 +523,6 @@
 #define IXGBE_ATLAS_PDN_TX_1G_QL_ALL 0xF0
 #define IXGBE_ATLAS_PDN_TX_AN_QL_ALL 0xF0
 
-/* Omer bit masks */
-#define IXGBE_CORECTL_WRITE_CMD 0x00010000
 
 /* Device Type definitions for new protocol MDIO commands */
 #define IXGBE_MDIO_PMA_PMD_DEV_TYPE 0x1
@@ -978,11 +550,6 @@
 #define IXGBE_MDIO_PHY_SPEED_ABILITY 0x4 /* Speed Ability Reg */
 #define IXGBE_MDIO_PHY_SPEED_10G 0x0001 /* 10G capable */
 #define IXGBE_MDIO_PHY_SPEED_1G 0x0010 /* 1G capable */
-#define IXGBE_MDIO_PHY_SPEED_100M 0x0020 /* 100M capable */
-#define IXGBE_MDIO_PHY_EXT_ABILITY 0xB /* Ext Ability Reg */
-#define IXGBE_MDIO_PHY_10GBASET_ABILITY 0x0004 /* 10GBaseT capable */
-#define IXGBE_MDIO_PHY_1000BASET_ABILITY 0x0020 /* 1000BaseT capable */
-#define IXGBE_MDIO_PHY_100BASETX_ABILITY 0x0080 /* 100BaseTX capable */
 
 #define IXGBE_MDIO_PMA_PMD_SDA_SCL_ADDR 0xC30A /* PHY_XS SDA/SCL Addr Reg */
 #define IXGBE_MDIO_PMA_PMD_SDA_SCL_DATA 0xC30B /* PHY_XS SDA/SCL Data Reg */
@@ -1002,8 +569,6 @@
 /* PHY IDs*/
 #define TN1010_PHY_ID 0x00A19410
 #define TNX_FW_REV 0xB
-#define AQ1002_PHY_ID 0x03A1B420
-#define AQ_FW_REV 0x20
 #define QT2022_PHY_ID 0x0043A400
 #define ATH_PHY_ID 0x03429050
@@ -1025,17 +590,11 @@
 /* General purpose Interrupt Enable */
 #define IXGBE_SDP0_GPIEN 0x00000001 /* SDP0 */
 #define IXGBE_SDP1_GPIEN 0x00000002 /* SDP1 */
-#define IXGBE_SDP2_GPIEN 0x00000004 /* SDP2 */
 #define IXGBE_GPIE_MSIX_MODE 0x00000010 /* MSI-X mode */
 #define IXGBE_GPIE_OCD 0x00000020 /* Other Clear Disable */
 #define IXGBE_GPIE_EIMEN 0x00000040 /* Immediate Interrupt Enable */
 #define IXGBE_GPIE_EIAME 0x40000000
 #define IXGBE_GPIE_PBA_SUPPORT 0x80000000
-#define IXGBE_GPIE_RSC_DELAY_SHIFT 11
-#define IXGBE_GPIE_VTMODE_MASK 0x0000C000 /* VT Mode Mask */
-#define IXGBE_GPIE_VTMODE_16 0x00004000 /* 16 VFs 8 queues per VF */
-#define IXGBE_GPIE_VTMODE_32 0x00008000 /* 32 VFs 4 queues per VF */
-#define IXGBE_GPIE_VTMODE_64 0x0000C000 /* 64 VFs 2 queues per VF */
 
 /* Transmit Flow Control status */
 #define IXGBE_TFCS_TXOFF 0x00000001
@@ -1076,25 +635,6 @@
 #define IXGBE_VMD_CTL_VMDQ_EN 0x00000001
 #define IXGBE_VMD_CTL_VMDQ_FILTER 0x00000002
 
-/* VT_CTL bitmasks */
-#define IXGBE_VT_CTL_DIS_DEFPL 0x20000000 /* disable default pool */
-#define IXGBE_VT_CTL_REPLEN 0x40000000 /* replication enabled */
-#define IXGBE_VT_CTL_VT_ENABLE 0x00000001 /* Enable VT Mode */
-#define IXGBE_VT_CTL_POOL_SHIFT 7
-#define IXGBE_VT_CTL_POOL_MASK (0x3F << IXGBE_VT_CTL_POOL_SHIFT)
-
-/* VMOLR bitmasks */
-#define IXGBE_VMOLR_AUPE 0x01000000 /* accept untagged packets */
-#define IXGBE_VMOLR_ROMPE 0x02000000 /* accept packets in MTA tbl */
-#define IXGBE_VMOLR_ROPE 0x04000000 /* accept packets in UC tbl */
-#define IXGBE_VMOLR_BAM 0x08000000 /* accept broadcast packets */
-#define IXGBE_VMOLR_MPE 0x10000000 /* multicast promiscuous */
-
-/* VFRE bitmask */
-#define IXGBE_VFRE_ENABLE_ALL 0xFFFFFFFF
-
-#define IXGBE_VF_INIT_TIMEOUT 200 /* Number of retries to clear RSTI */
-
 /* RDHMPN and TDHMPN bitmasks */
 #define IXGBE_RDHMPN_RDICADDR 0x007FF800
 #define IXGBE_RDHMPN_RDICRDREQ 0x00800000
@@ -1103,41 +643,6 @@
 #define IXGBE_TDHMPN_TDICRDREQ 0x00800000
 #define IXGBE_TDHMPN_TDICADDR_SHIFT 11
 
-#define IXGBE_RDMAM_MEM_SEL_SHIFT 13
-#define IXGBE_RDMAM_DWORD_SHIFT 9
-#define IXGBE_RDMAM_DESC_COMP_FIFO 1
-#define IXGBE_RDMAM_DFC_CMD_FIFO 2
-#define IXGBE_RDMAM_RSC_HEADER_ADDR 3
-#define IXGBE_RDMAM_TCN_STATUS_RAM 4
-#define IXGBE_RDMAM_WB_COLL_FIFO 5
-#define IXGBE_RDMAM_QSC_CNT_RAM 6
-#define IXGBE_RDMAM_QSC_FCOE_RAM 7
-#define IXGBE_RDMAM_QSC_QUEUE_CNT 8
-#define IXGBE_RDMAM_QSC_QUEUE_RAM 0xA
-#define IXGBE_RDMAM_QSC_RSC_RAM 0xB
-#define IXGBE_RDMAM_DESC_COM_FIFO_RANGE 135
-#define IXGBE_RDMAM_DESC_COM_FIFO_COUNT 4
-#define IXGBE_RDMAM_DFC_CMD_FIFO_RANGE 48
-#define IXGBE_RDMAM_DFC_CMD_FIFO_COUNT 7
-#define IXGBE_RDMAM_RSC_HEADER_ADDR_RANGE 32
-#define IXGBE_RDMAM_RSC_HEADER_ADDR_COUNT 4
-#define IXGBE_RDMAM_TCN_STATUS_RAM_RANGE 256
-#define IXGBE_RDMAM_TCN_STATUS_RAM_COUNT 9
-#define IXGBE_RDMAM_WB_COLL_FIFO_RANGE 8
-#define IXGBE_RDMAM_WB_COLL_FIFO_COUNT 4
-#define IXGBE_RDMAM_QSC_CNT_RAM_RANGE 64
-#define IXGBE_RDMAM_QSC_CNT_RAM_COUNT 4
-#define IXGBE_RDMAM_QSC_FCOE_RAM_RANGE 512
-#define IXGBE_RDMAM_QSC_FCOE_RAM_COUNT 5
-#define IXGBE_RDMAM_QSC_QUEUE_CNT_RANGE 32
-#define IXGBE_RDMAM_QSC_QUEUE_CNT_COUNT 4
-#define IXGBE_RDMAM_QSC_QUEUE_RAM_RANGE 128
-#define IXGBE_RDMAM_QSC_QUEUE_RAM_COUNT 8
-#define IXGBE_RDMAM_QSC_RSC_RAM_RANGE 32
-#define IXGBE_RDMAM_QSC_RSC_RAM_COUNT 8
-
-#define IXGBE_TXDESCIC_READY 0x80000000
-
 /* Receive Checksum Control */
 #define IXGBE_RXCSUM_IPPCSE 0x00001000 /* IP payload checksum enable */
 #define IXGBE_RXCSUM_PCSD 0x00002000 /* packet checksum disabled */
@@ -1158,25 +663,15 @@
 #define IXGBE_RMCS_TFCE_PRIORITY 0x00000010 /* Tx Priority FC ena */
 #define IXGBE_RMCS_ARBDIS 0x00000040 /* Arbitration disable bit */
 
-/* FCCFG Bit Masks */
-#define IXGBE_FCCFG_TFCE_802_3X 0x00000008 /* Tx link FC enable */
-#define IXGBE_FCCFG_TFCE_PRIORITY 0x00000010 /* Tx priority FC enable */
 
 /* Interrupt register bitmasks */
 
 /* Extended Interrupt Cause Read */
 #define IXGBE_EICR_RTX_QUEUE 0x0000FFFF /* RTx Queue Interrupt */
-#define IXGBE_EICR_FLOW_DIR 0x00010000 /* FDir Exception */
-#define IXGBE_EICR_RX_MISS 0x00020000 /* Packet Buffer Overrun */
-#define IXGBE_EICR_PCI 0x00040000 /* PCI Exception */
-#define IXGBE_EICR_MAILBOX 0x00080000 /* VF to PF Mailbox Interrupt */
 #define IXGBE_EICR_LSC 0x00100000 /* Link Status Change */
-#define IXGBE_EICR_LINKSEC 0x00200000 /* PN Threshold */
 #define IXGBE_EICR_MNG 0x00400000 /* Manageability Event Interrupt */
 #define IXGBE_EICR_GPI_SDP0 0x01000000 /* Gen Purpose Interrupt on SDP0 */
 #define IXGBE_EICR_GPI_SDP1 0x02000000 /* Gen Purpose Interrupt on SDP1 */
-#define IXGBE_EICR_GPI_SDP2 0x04000000 /* Gen Purpose Interrupt on SDP2 */
-#define IXGBE_EICR_ECC 0x10000000 /* ECC Error */
 #define IXGBE_EICR_PBUR 0x10000000 /* Packet Buffer Handler Error */
 #define IXGBE_EICR_DHER 0x20000000 /* Descriptor Handler Error */
 #define IXGBE_EICR_TCP_TIMER 0x40000000 /* TCP Timer */
@@ -1184,16 +679,10 @@
 
 /* Extended Interrupt Cause Set */
 #define IXGBE_EICS_RTX_QUEUE IXGBE_EICR_RTX_QUEUE /* RTx Queue Interrupt */
-#define IXGBE_EICS_FLOW_DIR IXGBE_EICR_FLOW_DIR /* FDir Exception */
-#define IXGBE_EICS_RX_MISS IXGBE_EICR_RX_MISS /* Pkt Buffer Overrun */
-#define IXGBE_EICS_PCI IXGBE_EICR_PCI /* PCI Exception */
-#define IXGBE_EICS_MAILBOX IXGBE_EICR_MAILBOX /* VF to PF Mailbox Int */
 #define IXGBE_EICS_LSC IXGBE_EICR_LSC /* Link Status Change */
 #define IXGBE_EICS_MNG IXGBE_EICR_MNG /* MNG Event Interrupt */
 #define IXGBE_EICS_GPI_SDP0 IXGBE_EICR_GPI_SDP0 /* SDP0 Gen Purpose Int */
 #define IXGBE_EICS_GPI_SDP1 IXGBE_EICR_GPI_SDP1 /* SDP1 Gen Purpose Int */
-#define IXGBE_EICS_GPI_SDP2 IXGBE_EICR_GPI_SDP2 /* SDP2 Gen Purpose Int */
-#define IXGBE_EICS_ECC IXGBE_EICR_ECC /* ECC Error */
 #define IXGBE_EICS_PBUR IXGBE_EICR_PBUR /* Pkt Buf Handler Err */
 #define IXGBE_EICS_DHER IXGBE_EICR_DHER /* Desc Handler Error */
 #define IXGBE_EICS_TCP_TIMER IXGBE_EICR_TCP_TIMER /* TCP Timer */
@@ -1201,16 +690,10 @@
 
 /* Extended Interrupt Mask Set */
 #define IXGBE_EIMS_RTX_QUEUE IXGBE_EICR_RTX_QUEUE /* RTx Queue Interrupt */
-#define IXGBE_EIMS_FLOW_DIR IXGBE_EICR_FLOW_DIR /* FDir Exception */
-#define IXGBE_EIMS_RX_MISS IXGBE_EICR_RX_MISS /* Packet Buffer Overrun */
-#define IXGBE_EIMS_PCI IXGBE_EICR_PCI /* PCI Exception */
-#define IXGBE_EIMS_MAILBOX IXGBE_EICR_MAILBOX /* VF to PF Mailbox Int */
 #define IXGBE_EIMS_LSC IXGBE_EICR_LSC /* Link Status Change */
 #define IXGBE_EIMS_MNG IXGBE_EICR_MNG /* MNG Event Interrupt */
 #define IXGBE_EIMS_GPI_SDP0 IXGBE_EICR_GPI_SDP0 /* SDP0 Gen Purpose Int */
 #define IXGBE_EIMS_GPI_SDP1 IXGBE_EICR_GPI_SDP1 /* SDP1 Gen Purpose Int */
-#define IXGBE_EIMS_GPI_SDP2 IXGBE_EICR_GPI_SDP2 /* SDP2 Gen Purpose Int */
-#define IXGBE_EIMS_ECC IXGBE_EICR_ECC /* ECC Error */
 #define IXGBE_EIMS_PBUR IXGBE_EICR_PBUR /* Pkt Buf Handler Err */
 #define IXGBE_EIMS_DHER IXGBE_EICR_DHER /* Descr Handler Error */
 #define IXGBE_EIMS_TCP_TIMER IXGBE_EICR_TCP_TIMER /* TCP Timer */
@@ -1218,16 +701,10 @@
 
 /* Extended Interrupt Mask Clear */
 #define IXGBE_EIMC_RTX_QUEUE IXGBE_EICR_RTX_QUEUE /* RTx Queue Interrupt */
-#define IXGBE_EIMC_FLOW_DIR IXGBE_EICR_FLOW_DIR /* FDir Exception */
-#define IXGBE_EIMC_RX_MISS IXGBE_EICR_RX_MISS /* Packet Buffer Overrun */
-#define IXGBE_EIMC_PCI IXGBE_EICR_PCI /* PCI Exception */
-#define IXGBE_EIMC_MAILBOX IXGBE_EICR_MAILBOX /* VF to PF Mailbox Int */
 #define IXGBE_EIMC_LSC IXGBE_EICR_LSC /* Link Status Change */
 #define IXGBE_EIMC_MNG IXGBE_EICR_MNG /* MNG Event Interrupt */
 #define IXGBE_EIMC_GPI_SDP0 IXGBE_EICR_GPI_SDP0 /* SDP0 Gen Purpose Int */
 #define IXGBE_EIMC_GPI_SDP1 IXGBE_EICR_GPI_SDP1 /* SDP1 Gen Purpose Int */
-#define IXGBE_EIMC_GPI_SDP2 IXGBE_EICR_GPI_SDP2 /* SDP2 Gen Purpose Int */
-#define IXGBE_EIMC_ECC IXGBE_EICR_ECC /* ECC Error */
 #define IXGBE_EIMC_PBUR IXGBE_EICR_PBUR /* Pkt Buf Handler Err */
 #define IXGBE_EIMC_DHER IXGBE_EICR_DHER /* Desc Handler Err */
 #define IXGBE_EIMC_TCP_TIMER IXGBE_EICR_TCP_TIMER /* TCP Timer */
@@ -1250,45 +727,12 @@
 #define IXGBE_IMIREXT_CTRL_SYN 0x00020000 /* Check SYN bit in header */
 #define IXGBE_IMIREXT_CTRL_FIN 0x00040000 /* Check FIN bit in header */
 #define IXGBE_IMIREXT_CTRL_BP 0x00080000 /* Bypass check of control bits */
-#define IXGBE_IMIR_SIZE_BP_82599 0x00001000 /* Packet size bypass */
-#define IXGBE_IMIR_CTRL_URG_82599 0x00002000 /* Check URG bit in header */
-#define IXGBE_IMIR_CTRL_ACK_82599 0x00004000 /* Check ACK bit in header */
-#define IXGBE_IMIR_CTRL_PSH_82599 0x00008000 /* Check PSH bit in header */
-#define IXGBE_IMIR_CTRL_RST_82599 0x00010000 /* Check RST bit in header */
-#define IXGBE_IMIR_CTRL_SYN_82599 0x00020000 /* Check SYN bit in header */
-#define IXGBE_IMIR_CTRL_FIN_82599 0x00040000 /* Check FIN bit in header */
-#define IXGBE_IMIR_CTRL_BP_82599 0x00080000 /* Bypass check of control bits */
-#define IXGBE_IMIR_LLI_EN_82599 0x00100000 /* Enables low latency Int */
-#define IXGBE_IMIR_RX_QUEUE_MASK_82599 0x0000007F /* Rx Queue Mask */
-#define IXGBE_IMIR_RX_QUEUE_SHIFT_82599 21 /* Rx Queue Shift */
-#define IXGBE_IMIRVP_PRIORITY_MASK 0x00000007 /* VLAN priority mask */
-#define IXGBE_IMIRVP_PRIORITY_EN 0x00000008 /* VLAN priority enable */
-
-#define IXGBE_MAX_FTQF_FILTERS 128
-#define IXGBE_FTQF_PROTOCOL_MASK 0x00000003
-#define IXGBE_FTQF_PROTOCOL_TCP 0x00000000
-#define IXGBE_FTQF_PROTOCOL_UDP 0x00000001
-#define IXGBE_FTQF_PROTOCOL_SCTP 2
-#define IXGBE_FTQF_PRIORITY_MASK 0x00000007
-#define IXGBE_FTQF_PRIORITY_SHIFT 2
-#define IXGBE_FTQF_POOL_MASK 0x0000003F
-#define IXGBE_FTQF_POOL_SHIFT 8
-#define IXGBE_FTQF_5TUPLE_MASK_MASK 0x0000001F
-#define IXGBE_FTQF_5TUPLE_MASK_SHIFT 25
-#define IXGBE_FTQF_SOURCE_ADDR_MASK 0x1E
-#define IXGBE_FTQF_DEST_ADDR_MASK 0x1D
-#define IXGBE_FTQF_SOURCE_PORT_MASK 0x1B
-#define IXGBE_FTQF_DEST_PORT_MASK 0x17
-#define IXGBE_FTQF_PROTOCOL_COMP_MASK 0x0F
-#define IXGBE_FTQF_POOL_MASK_EN 0x40000000
-#define IXGBE_FTQF_QUEUE_ENABLE 0x80000000
 
 /* Interrupt clear mask */
 #define IXGBE_IRQ_CLEAR_MASK 0xFFFFFFFF
 
 /* Interrupt Vector Allocation Registers */
 #define IXGBE_IVAR_REG_NUM 25
-#define IXGBE_IVAR_REG_NUM_82599 64
 #define IXGBE_IVAR_TXRX_ENTRY 96
 #define IXGBE_IVAR_RX_ENTRY 64
 #define IXGBE_IVAR_RX_QUEUE(_i) (0 + (_i))
@@ -1302,32 +746,6 @@
 #define IXGBE_IVAR_ALLOC_VAL 0x80 /* Interrupt Allocation valid */
 
-/* ETYPE Queue Filter/Select Bit Masks */
-#define IXGBE_MAX_ETQF_FILTERS 8
-#define IXGBE_ETQF_FCOE 0x08000000 /* bit 27 */
-#define IXGBE_ETQF_BCN 0x10000000 /* bit 28 */
-#define IXGBE_ETQF_1588 0x40000000 /* bit 30 */
-#define IXGBE_ETQF_FILTER_EN 0x80000000 /* bit 31 */
-#define IXGBE_ETQF_POOL_ENABLE (1 << 26) /* bit 26 */
-
-#define IXGBE_ETQS_RX_QUEUE 0x007F0000 /* bits 22:16 */
-#define IXGBE_ETQS_RX_QUEUE_SHIFT 16
-#define IXGBE_ETQS_LLI 0x20000000 /* bit 29 */
-#define IXGBE_ETQS_QUEUE_EN 0x80000000 /* bit 31 */
-
-/*
- * ETQF filter list: one static filter per filter consumer. This is
- *       to avoid filter collisions later. Add new filters
- *       here!!
- *
- * Current filters:
- *    EAPOL 802.1x (0x888e): Filter 0
- *    FCoE (0x8906): Filter 2
- *    1588 (0x88f7): Filter 3
- */
-#define IXGBE_ETQF_FILTER_EAPOL 0
-#define IXGBE_ETQF_FILTER_FCOE 2
-#define IXGBE_ETQF_FILTER_1588 3
 
 /* VLAN Control Bit Masks */
 #define IXGBE_VLNCTRL_VET 0x0000FFFF /* bits 0-15 */
 #define IXGBE_VLNCTRL_CFI 0x10000000 /* bit 28 */
@@ -1335,9 +753,6 @@
 #define IXGBE_VLNCTRL_VFE 0x40000000 /* bit 30 */
 #define IXGBE_VLNCTRL_VME 0x80000000 /* bit 31 */
 
-/* VLAN pool filtering masks */
-#define IXGBE_VLVF_VIEN 0x80000000 /* filter is valid */
-#define IXGBE_VLVF_ENTRIES 64
 
 #define IXGBE_ETHERNET_IEEE_VLAN_TYPE 0x8100 /* 802.1q protocol */
 
@@ -1350,10 +765,7 @@
 #define IXGBE_STATUS_LAN_ID_1 0x00000004 /* LAN ID 1 */
 
 /* ESDP Bit Masks */
-#define IXGBE_ESDP_SDP0 0x00000001 /* SDP0 Data Value */
-#define IXGBE_ESDP_SDP1 0x00000002 /* SDP1 Data Value */
-#define IXGBE_ESDP_SDP2 0x00000004 /* SDP2 Data Value */
-#define IXGBE_ESDP_SDP3 0x00000008 /* SDP3 Data Value */
+#define IXGBE_ESDP_SDP1 0x00000001
 #define IXGBE_ESDP_SDP4 0x00000010 /* SDP4 Data Value */
 #define IXGBE_ESDP_SDP5 0x00000020 /* SDP5 Data Value */
 #define IXGBE_ESDP_SDP6 0x00000040 /* SDP6 Data Value */
@@ -1390,17 +802,9 @@
 #define IXGBE_AUTOC_AN_RX_LOOSE 0x01000000
 #define IXGBE_AUTOC_AN_RX_DRIFT 0x00800000
 #define IXGBE_AUTOC_AN_RX_ALIGN 0x007C0000
-#define IXGBE_AUTOC_FECA 0x00040000
-#define IXGBE_AUTOC_FECR 0x00020000
-#define IXGBE_AUTOC_KR_SUPP 0x00010000
 #define IXGBE_AUTOC_AN_RESTART 0x00001000
 #define IXGBE_AUTOC_FLU 0x00000001
 #define IXGBE_AUTOC_LMS_SHIFT 13
-#define IXGBE_AUTOC_LMS_10G_SERIAL (0x3 << IXGBE_AUTOC_LMS_SHIFT)
-#define IXGBE_AUTOC_LMS_KX4_KX_KR (0x4 << IXGBE_AUTOC_LMS_SHIFT)
-#define IXGBE_AUTOC_LMS_SGMII_1G_100M (0x5 << IXGBE_AUTOC_LMS_SHIFT)
-#define IXGBE_AUTOC_LMS_KX4_KX_KR_1G_AN (0x6 << IXGBE_AUTOC_LMS_SHIFT)
-#define IXGBE_AUTOC_LMS_KX4_KX_KR_SGMII (0x7 << IXGBE_AUTOC_LMS_SHIFT)
 #define IXGBE_AUTOC_LMS_MASK (0x7 << IXGBE_AUTOC_LMS_SHIFT)
 #define IXGBE_AUTOC_LMS_1G_LINK_NO_AN (0x0 << IXGBE_AUTOC_LMS_SHIFT)
 #define IXGBE_AUTOC_LMS_10G_LINK_NO_AN (0x1 << IXGBE_AUTOC_LMS_SHIFT)
@@ -1418,15 +822,6 @@
 #define IXGBE_AUTOC_10G_CX4 (0x2 << IXGBE_AUTOC_10G_PMA_PMD_SHIFT)
 #define IXGBE_AUTOC_1G_BX (0x0 << IXGBE_AUTOC_1G_PMA_PMD_SHIFT)
 #define IXGBE_AUTOC_1G_KX (0x1 << IXGBE_AUTOC_1G_PMA_PMD_SHIFT)
-#define IXGBE_AUTOC_1G_SFI (0x0 << IXGBE_AUTOC_1G_PMA_PMD_SHIFT)
-#define IXGBE_AUTOC_1G_KX_BX (0x1 << IXGBE_AUTOC_1G_PMA_PMD_SHIFT)
-
-#define IXGBE_AUTOC2_UPPER_MASK 0xFFFF0000
-#define IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_MASK 0x00030000
-#define IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_SHIFT 16
-#define IXGBE_AUTOC2_10G_KR (0x0 << IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_SHIFT)
-#define IXGBE_AUTOC2_10G_XFI (0x1 << IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_SHIFT)
-#define IXGBE_AUTOC2_10G_SFI (0x2 << IXGBE_AUTOC2_10G_SERIAL_PMA_PMD_SHIFT)
 
 /* LINKS Bit Masks */
 #define IXGBE_LINKS_KX_AN_COMP 0x80000000
@@ -1436,7 +831,6 @@
 #define IXGBE_LINKS_RX_MODE 0x06000000
 #define IXGBE_LINKS_TX_MODE 0x01800000
 #define IXGBE_LINKS_XGXS_EN 0x00400000
-#define IXGBE_LINKS_SGMII_EN 0x02000000
 #define IXGBE_LINKS_PCS_1G_EN 0x00200000
 #define IXGBE_LINKS_1G_AN_EN 0x00100000
 #define IXGBE_LINKS_KX_AN_IDLE 0x00080000
@@ -1446,13 +840,11 @@
 #define IXGBE_LINKS_TL_FAULT 0x00001000
 #define IXGBE_LINKS_SIGNAL 0x00000F00
 
-#define IXGBE_LINKS_SPEED_82599 0x30000000
-#define IXGBE_LINKS_SPEED_10G_82599 0x30000000
-#define IXGBE_LINKS_SPEED_1G_82599 0x20000000
-#define IXGBE_LINKS_SPEED_100_82599 0x10000000
 #define IXGBE_LINK_UP_TIME 90 /* 9.0 Seconds */
 #define IXGBE_AUTO_NEG_TIME 45 /* 4.5 Seconds */
 
+#define FIBER_LINK_UP_LIMIT 50
+
 /* PCS1GLSTA Bit Masks */
 #define IXGBE_PCS1GLSTA_LINK_OK 1
 #define IXGBE_PCS1GLSTA_SYNK_OK 0x10
@@ -1524,14 +916,6 @@
 #define IXGBE_FW_PTR 0x0F
 #define IXGBE_PBANUM0_PTR 0x15
 #define IXGBE_PBANUM1_PTR 0x16
-#define IXGBE_SAN_MAC_ADDR_PTR 0x28
-#define IXGBE_DEVICE_CAPS 0x2C
-#define IXGBE_SERIAL_NUMBER_MAC_ADDR 0x11
-#define IXGBE_PCIE_MSIX_82599_CAPS 0x72
-#define IXGBE_PCIE_MSIX_82598_CAPS 0x62
-
-/* MSI-X capability fields masks */
-#define IXGBE_PCIE_MSIX_TBL_SZ_MASK 0x7FF
 
 /* Legacy EEPROM word offsets */
 #define IXGBE_ISCSI_BOOT_CAPS 0x0033
@@ -1570,18 +954,6 @@
 #define IXGBE_EERD_ATTEMPTS 100000
 #endif
 
-#define IXGBE_PCIE_CTRL2 0x5 /* PCIe Control 2 Offset */
-#define IXGBE_PCIE_CTRL2_DUMMY_ENABLE 0x8 /* Dummy Function Enable */
-#define IXGBE_PCIE_CTRL2_LAN_DISABLE 0x2 /* LAN PCI Disable */
-#define IXGBE_PCIE_CTRL2_DISABLE_SELECT 0x1 /* LAN Disable Select */
-
-#define IXGBE_SAN_MAC_ADDR_PORT0_OFFSET 0x0
-#define IXGBE_SAN_MAC_ADDR_PORT1_OFFSET 0x3
-#define IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP 0x1
-#define IXGBE_DEVICE_CAPS_FCOE_OFFLOADS 0x2
-#define IXGBE_FW_PASSTHROUGH_PATCH_CONFIG_PTR 0x4
-#define IXGBE_FW_PATCH_VERSION_4 0x7
-
 /* PCI Bus Info */
 #define IXGBE_PCI_LINK_STATUS 0xB2
 #define IXGBE_PCI_LINK_WIDTH 0x3F0
@@ -1646,7 +1018,6 @@
 #define IXGBE_RXCTRL_RXEN 0x00000001 /* Enable Receiver */
 #define IXGBE_RXCTRL_DMBYPS 0x00000002 /* Descriptor Monitor Bypass */
 #define IXGBE_RXDCTL_ENABLE 0x02000000 /* Enable specific Rx Queue */
-#define IXGBE_RXDCTL_VME 0x40000000 /* VLAN mode enable */
 
 #define IXGBE_FCTRL_SBP 0x00000002 /* Store Bad Packet */
 #define IXGBE_FCTRL_MPE 0x00000100 /* Multicast Promiscuous Ena*/
@@ -1657,23 +1028,9 @@
 /* Receive Priority Flow Control Enable */
 #define IXGBE_FCTRL_RPFCE 0x00004000
 #define IXGBE_FCTRL_RFCE 0x00008000 /* Receive Flow Control Ena */
-#define IXGBE_MFLCN_PMCF 0x00000001 /* Pass MAC Control Frames */
-#define IXGBE_MFLCN_DPF 0x00000002 /* Discard Pause Frame */
-#define IXGBE_MFLCN_RPFCE 0x00000004 /* Receive Priority FC Enable */
-#define IXGBE_MFLCN_RFCE 0x00000008 /* Receive FC Enable */
 
 /* Multiple Receive Queue Control */
 #define IXGBE_MRQC_RSSEN 0x00000001 /* RSS Enable */
-#define IXGBE_MRQC_MRQE_MASK 0xF /* Bits 3:0 */
-#define IXGBE_MRQC_RT8TCEN 0x00000002 /* 8 TC no RSS */
-#define IXGBE_MRQC_RT4TCEN 0x00000003 /* 4 TC no RSS */
-#define IXGBE_MRQC_RTRSS8TCEN 0x00000004 /* 8 TC w/ RSS */
-#define IXGBE_MRQC_RTRSS4TCEN 0x00000005 /* 4 TC w/ RSS */
-#define IXGBE_MRQC_VMDQEN 0x00000008 /* VMDq2 64 pools no RSS */
-#define IXGBE_MRQC_VMDQRSS32EN 0x0000000A /* VMDq2 32 pools w/ RSS */
-#define IXGBE_MRQC_VMDQRSS64EN 0x0000000B /* VMDq2 64 pools w/ RSS */
-#define IXGBE_MRQC_VMDQRT8TCEN 0x0000000C /* VMDq2/RT 16 pool 8 TC */
-#define IXGBE_MRQC_VMDQRT4TCEN 0x0000000D /* VMDq2/RT 32 pool 4 TC */
 #define IXGBE_MRQC_RSS_FIELD_MASK 0xFFFF0000
 #define IXGBE_MRQC_RSS_FIELD_IPV4_TCP 0x00010000
 #define IXGBE_MRQC_RSS_FIELD_IPV4 0x00020000
@@ -1684,12 +1041,6 @@
 #define IXGBE_MRQC_RSS_FIELD_IPV4_UDP 0x00400000
 #define IXGBE_MRQC_RSS_FIELD_IPV6_UDP 0x00800000
 #define IXGBE_MRQC_RSS_FIELD_IPV6_EX_UDP 0x01000000
-#define IXGBE_MRQC_L3L4TXSWEN 0x00008000
-
-/* Queue Drop Enable */
-#define IXGBE_QDE_ENABLE 0x00000001
-#define IXGBE_QDE_IDX_MASK 0x00007F00
-#define IXGBE_QDE_IDX_SHIFT 8
 
 #define IXGBE_TXD_POPTS_IXSM 0x01 /* Insert IP checksum */
 #define IXGBE_TXD_POPTS_TXSM 0x02 /* Insert TCP/UDP checksum */
@@ -1701,26 +1052,10 @@
 #define IXGBE_TXD_CMD_VLE 0x40000000 /* Add VLAN tag */
 #define IXGBE_TXD_STAT_DD 0x00000001 /* Descriptor Done */
 
-#define IXGBE_RXDADV_IPSEC_STATUS_SECP 0x00020000
-#define IXGBE_RXDADV_IPSEC_ERROR_INVALID_PROTOCOL 0x08000000
-#define IXGBE_RXDADV_IPSEC_ERROR_INVALID_LENGTH 0x10000000
-#define IXGBE_RXDADV_IPSEC_ERROR_AUTH_FAILED 0x18000000
-#define IXGBE_RXDADV_IPSEC_ERROR_BIT_MASK 0x18000000
-/* Multiple Transmit Queue Command Register */
-#define IXGBE_MTQC_RT_ENA 0x1 /* DCB Enable */
-#define IXGBE_MTQC_VT_ENA 0x2 /* VMDQ2 Enable */
-#define IXGBE_MTQC_64Q_1PB 0x0 /* 64 queues 1 pack buffer */
-#define IXGBE_MTQC_32VF 0x8 /* 4 TX Queues per pool w/32VF's */
-#define IXGBE_MTQC_64VF 0x4 /* 2 TX Queues per pool w/64VF's */
-#define IXGBE_MTQC_8TC_8TQ 0xC /* 8 TC if RT_ENA or 8 TQ if VT_ENA */
-
 /* Receive Descriptor bit definitions */
 #define IXGBE_RXD_STAT_DD 0x01 /* Descriptor Done */
 #define IXGBE_RXD_STAT_EOP 0x02 /* End of Packet */
-#define IXGBE_RXD_STAT_FLM 0x04 /* FDir Match */
 #define IXGBE_RXD_STAT_VP 0x08 /* IEEE VLAN Packet */
-#define IXGBE_RXDADV_NEXTP_MASK 0x000FFFF0 /* Next Descriptor Index */
-#define IXGBE_RXDADV_NEXTP_SHIFT 0x00000004
 #define IXGBE_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */
 #define IXGBE_RXD_STAT_L4CS 0x20 /* L4 xsum calculated */
 #define IXGBE_RXD_STAT_IPCS 0x40 /* IP xsum calculated */
@@ -1729,10 +1064,6 @@
 #define IXGBE_RXD_STAT_VEXT 0x200 /* 1st VLAN found */
 #define IXGBE_RXD_STAT_UDPV 0x400 /* Valid UDP checksum */
 #define IXGBE_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */
-#define IXGBE_RXD_STAT_LLINT 0x800 /* Pkt caused Low Latency Interrupt */
-#define IXGBE_RXD_STAT_TS 0x10000 /* Time Stamp */
-#define IXGBE_RXD_STAT_SECP 0x20000 /* Security Processing */
-#define IXGBE_RXD_STAT_LB 0x40000 /* Loopback Status */
 #define IXGBE_RXD_STAT_ACK 0x8000 /* ACK Packet indication */
 #define IXGBE_RXD_ERR_CE 0x01 /* CRC Error */
 #define IXGBE_RXD_ERR_LE 0x02 /* Length Error */
@@ -1741,13 +1072,6 @@
 #define IXGBE_RXD_ERR_USE 0x20 /* Undersize Error */
 #define IXGBE_RXD_ERR_TCPE 0x40 /* TCP/UDP Checksum Error */
 #define IXGBE_RXD_ERR_IPE 0x80 /* IP Checksum Error */
-#define IXGBE_RXDADV_ERR_MASK 0xfff00000 /* RDESC.ERRORS mask */
-#define IXGBE_RXDADV_ERR_SHIFT 20 /* RDESC.ERRORS shift */
-#define IXGBE_RXDADV_ERR_FCEOFE 0x80000000 /* FCoEFe/IPE */
-#define IXGBE_RXDADV_ERR_FCERR 0x00700000 /* FCERR/FDIRERR */
-#define IXGBE_RXDADV_ERR_FDIR_LEN 0x00100000 /* FDIR Length error */
-#define IXGBE_RXDADV_ERR_FDIR_DROP 0x00200000 /* FDIR Drop error */
-#define IXGBE_RXDADV_ERR_FDIR_COLL 0x00400000 /* FDIR Collision error */
 #define IXGBE_RXDADV_ERR_HBO 0x00800000 /*Header Buffer Overflow */
 #define IXGBE_RXDADV_ERR_CE 0x01000000 /* CRC Error */
 #define IXGBE_RXDADV_ERR_LE 0x02000000 /* Length Error */
@@ -1762,30 +1086,9 @@
 #define IXGBE_RXD_CFI_MASK 0x1000 /* CFI is bit 12 */
 #define IXGBE_RXD_CFI_SHIFT 12
 
-#define IXGBE_RXDADV_STAT_DD IXGBE_RXD_STAT_DD /* Done */
-#define IXGBE_RXDADV_STAT_EOP IXGBE_RXD_STAT_EOP /* End of Packet */
-#define IXGBE_RXDADV_STAT_FLM IXGBE_RXD_STAT_FLM /* FDir Match */
-#define IXGBE_RXDADV_STAT_VP IXGBE_RXD_STAT_VP /* IEEE VLAN Pkt */
-#define IXGBE_RXDADV_STAT_MASK 0x000fffff /* Stat/NEXTP: bit 0-19 */
-#define IXGBE_RXDADV_STAT_FCEOFS 0x00000040 /* FCoE EOF/SOF Stat */
-#define IXGBE_RXDADV_STAT_FCSTAT 0x00000030 /* FCoE Pkt Stat */
-#define IXGBE_RXDADV_STAT_FCSTAT_NOMTCH 0x00000000 /* 00: No Ctxt Match */
-#define IXGBE_RXDADV_STAT_FCSTAT_NODDP 0x00000010 /* 01: Ctxt w/o DDP */
-#define IXGBE_RXDADV_STAT_FCSTAT_FCPRSP 0x00000020 /* 10: Recv. FCP_RSP */
-#define IXGBE_RXDADV_STAT_FCSTAT_DDP 0x00000030 /* 11: Ctxt w/ DDP */
-
-/* PSRTYPE bit definitions */
-#define IXGBE_PSRTYPE_TCPHDR 0x00000010
-#define IXGBE_PSRTYPE_UDPHDR 0x00000020
-#define IXGBE_PSRTYPE_IPV4HDR 0x00000100
-#define IXGBE_PSRTYPE_IPV6HDR 0x00000200
-#define IXGBE_PSRTYPE_L2HDR 0x00001000
 
 /* SRRCTL bit definitions */
 #define IXGBE_SRRCTL_BSIZEPKT_SHIFT 10 /* so many KBs */
-#define IXGBE_SRRCTL_RDMTS_SHIFT 22
-#define IXGBE_SRRCTL_RDMTS_MASK 0x01C00000
-#define IXGBE_SRRCTL_DROP_EN 0x10000000
 #define IXGBE_SRRCTL_BSIZEPKT_MASK 0x0000007F
 #define IXGBE_SRRCTL_BSIZEHDR_MASK 0x00003F00
 #define IXGBE_SRRCTL_DESCTYPE_LEGACY 0x00000000
@@ -1800,10 +1103,7 @@
 #define IXGBE_RXDADV_RSSTYPE_MASK 0x0000000F
 #define IXGBE_RXDADV_PKTTYPE_MASK 0x0000FFF0
-#define IXGBE_RXDADV_PKTTYPE_MASK_EX 0x0001FFF0
 #define IXGBE_RXDADV_HDRBUFLEN_MASK 0x00007FE0
-#define IXGBE_RXDADV_RSCCNT_MASK 0x001E0000
-#define IXGBE_RXDADV_RSCCNT_SHIFT 17
 #define IXGBE_RXDADV_HDRBUFLEN_SHIFT 5
 #define IXGBE_RXDADV_SPLITHEADER_EN 0x00001000
 #define IXGBE_RXDADV_SPH 0x8000
@@ -1830,20 +1130,6 @@
 #define IXGBE_RXDADV_PKTTYPE_UDP 0x00000200 /* UDP hdr present */
 #define IXGBE_RXDADV_PKTTYPE_SCTP 0x00000400 /* SCTP hdr present */
 #define IXGBE_RXDADV_PKTTYPE_NFS 0x00000800 /* NFS hdr present */
-#define IXGBE_RXDADV_PKTTYPE_IPSEC_ESP 0x00001000 /* IPSec ESP */
-#define IXGBE_RXDADV_PKTTYPE_IPSEC_AH 0x00002000 /* IPSec AH */
-#define IXGBE_RXDADV_PKTTYPE_LINKSEC 0x00004000 /* LinkSec Encap */
-#define IXGBE_RXDADV_PKTTYPE_ETQF 0x00008000 /* PKTTYPE is ETQF index */
-#define IXGBE_RXDADV_PKTTYPE_ETQF_MASK 0x00000070 /* ETQF has 8 indices */
-#define IXGBE_RXDADV_PKTTYPE_ETQF_SHIFT 4 /* Right-shift 4 bits */
-
-/* Security Processing bit Indication */
-#define IXGBE_RXDADV_LNKSEC_STATUS_SECP 0x00020000
-#define IXGBE_RXDADV_LNKSEC_ERROR_NO_SA_MATCH 0x08000000
-#define IXGBE_RXDADV_LNKSEC_ERROR_REPLAY_ERROR 0x10000000
-#define IXGBE_RXDADV_LNKSEC_ERROR_BIT_MASK 0x18000000
-#define IXGBE_RXDADV_LNKSEC_ERROR_BAD_SIG 0x18000000
-
 /* Masks to determine if packets should be dropped due to frame errors */
 #define IXGBE_RXD_ERR_FRAME_ERR_MASK ( \
 	IXGBE_RXD_ERR_CE | \
@@ -1873,20 +1159,10 @@
 #define IXGBE_RX_DESC_SPECIAL_PRI_SHIFT 0x000D /* Priority in upper 3 of 16 */
 #define IXGBE_TX_DESC_SPECIAL_PRI_SHIFT IXGBE_RX_DESC_SPECIAL_PRI_SHIFT
 
-/* SR-IOV specific macros */
-#define IXGBE_MBVFICR_INDEX(vf_number) (vf_number >> 4)
-#define IXGBE_MBVFICR(_i) (0x00710 + (_i * 4))
-#define IXGBE_VFLRE(_i) (((_i & 1) ? 0x001C0 : 0x00600))
-#define IXGBE_VFLREC(_i) (0x00700 + (_i * 4))
-
-/* Little Endian defines */
 #ifndef __le16
+/* Little Endian defines */
 #define __le16 u16
-#endif
-#ifndef __le32
 #define __le32 u32
-#endif
-#ifndef __le64
 #define __le64 u64
 #endif
 
@@ -1897,81 +1173,6 @@
 #define __be64 u64
 #endif
 
-enum ixgbe_fdir_pballoc_type {
-	IXGBE_FDIR_PBALLOC_64K = 0,
-	IXGBE_FDIR_PBALLOC_128K,
-	IXGBE_FDIR_PBALLOC_256K,
-};
-#define IXGBE_FDIR_PBALLOC_SIZE_SHIFT 16
-
-/* Flow Director register values */
-#define IXGBE_FDIRCTRL_PBALLOC_64K 0x00000001
-#define IXGBE_FDIRCTRL_PBALLOC_128K 0x00000002
-#define IXGBE_FDIRCTRL_PBALLOC_256K 0x00000003
-#define IXGBE_FDIRCTRL_INIT_DONE 0x00000008
-#define IXGBE_FDIRCTRL_PERFECT_MATCH 0x00000010
-#define IXGBE_FDIRCTRL_REPORT_STATUS 0x00000020
-#define IXGBE_FDIRCTRL_REPORT_STATUS_ALWAYS 0x00000080
-#define IXGBE_FDIRCTRL_DROP_Q_SHIFT 8
-#define IXGBE_FDIRCTRL_FLEX_SHIFT 16
-#define IXGBE_FDIRCTRL_SEARCHLIM 0x00800000
-#define IXGBE_FDIRCTRL_MAX_LENGTH_SHIFT 24
-#define IXGBE_FDIRCTRL_FULL_THRESH_MASK 0xF0000000
-#define IXGBE_FDIRCTRL_FULL_THRESH_SHIFT 28
-
-#define IXGBE_FDIRTCPM_DPORTM_SHIFT 16
-#define IXGBE_FDIRUDPM_DPORTM_SHIFT 16
-#define IXGBE_FDIRIP6M_DIPM_SHIFT 16
-#define IXGBE_FDIRM_VLANID 0x00000001
-#define IXGBE_FDIRM_VLANP 0x00000002
-#define IXGBE_FDIRM_POOL 0x00000004
-#define IXGBE_FDIRM_L3P 0x00000008
-#define IXGBE_FDIRM_L4P 0x00000010
-#define IXGBE_FDIRM_FLEX 0x00000020
-#define IXGBE_FDIRM_DIPv6 0x00000040
-
-#define IXGBE_FDIRFREE_FREE_MASK 0xFFFF
-#define IXGBE_FDIRFREE_FREE_SHIFT 0
-#define IXGBE_FDIRFREE_COLL_MASK 0x7FFF0000
-#define IXGBE_FDIRFREE_COLL_SHIFT 16
-#define IXGBE_FDIRLEN_MAXLEN_MASK 0x3F
-#define IXGBE_FDIRLEN_MAXLEN_SHIFT 0
-#define IXGBE_FDIRLEN_MAXHASH_MASK 0x7FFF0000
-#define IXGBE_FDIRLEN_MAXHASH_SHIFT 16
-#define IXGBE_FDIRUSTAT_ADD_MASK 0xFFFF
-#define IXGBE_FDIRUSTAT_ADD_SHIFT 0
-#define IXGBE_FDIRUSTAT_REMOVE_MASK 0xFFFF0000
-#define IXGBE_FDIRUSTAT_REMOVE_SHIFT 16
-#define IXGBE_FDIRFSTAT_FADD_MASK 0x00FF
-#define IXGBE_FDIRFSTAT_FADD_SHIFT 0
-#define IXGBE_FDIRFSTAT_FREMOVE_MASK 0xFF00
-#define IXGBE_FDIRFSTAT_FREMOVE_SHIFT 8
-#define IXGBE_FDIRPORT_DESTINATION_SHIFT 16
-#define IXGBE_FDIRVLAN_FLEX_SHIFT 16
-#define IXGBE_FDIRHASH_BUCKET_VALID_SHIFT 15
-#define IXGBE_FDIRHASH_SIG_SW_INDEX_SHIFT 16
-
-#define IXGBE_FDIRCMD_CMD_MASK 0x00000003
-#define IXGBE_FDIRCMD_CMD_ADD_FLOW 0x00000001
-#define IXGBE_FDIRCMD_CMD_REMOVE_FLOW 0x00000002
-#define IXGBE_FDIRCMD_CMD_QUERY_REM_FILT 0x00000003
-#define IXGBE_FDIRCMD_CMD_QUERY_REM_HASH 0x00000007
-#define IXGBE_FDIRCMD_FILTER_UPDATE 0x00000008
-#define IXGBE_FDIRCMD_IPv6DMATCH 0x00000010
-#define IXGBE_FDIRCMD_L4TYPE_UDP 0x00000020
-#define IXGBE_FDIRCMD_L4TYPE_TCP 0x00000040
-#define IXGBE_FDIRCMD_L4TYPE_SCTP 0x00000060
-#define IXGBE_FDIRCMD_IPV6 0x00000080
-#define IXGBE_FDIRCMD_CLEARHT 0x00000100
-#define IXGBE_FDIRCMD_DROP 0x00000200
-#define IXGBE_FDIRCMD_INT 0x00000400
-#define IXGBE_FDIRCMD_LAST 0x00000800
-#define IXGBE_FDIRCMD_COLLISION 0x00001000
-#define IXGBE_FDIRCMD_QUEUE_EN 0x00008000
-#define IXGBE_FDIRCMD_RX_QUEUE_SHIFT 16
-#define IXGBE_FDIRCMD_VT_POOL_SHIFT 24
-#define IXGBE_FDIR_INIT_DONE_POLL 10
-#define IXGBE_FDIRCMD_CMD_POLL 10
 
 /* Transmit Descriptor - Legacy */
 struct ixgbe_legacy_tx_desc {
@@ -2059,9 +1260,6 @@ struct ixgbe_adv_tx_context_desc {
 
 /* Adv Transmit Descriptor Config Masks */
 #define IXGBE_ADVTXD_DTALEN_MASK 0x0000FFFF /* Data buf length(bytes) */
-#define IXGBE_ADVTXD_MAC_LINKSEC 0x00040000 /* Insert LinkSec */
-#define IXGBE_ADVTXD_IPSEC_SA_INDEX_MASK 0x000003FF /* IPSec SA index */
-#define IXGBE_ADVTXD_IPSEC_ESP_LEN_MASK 0x000001FF /* IPSec ESP length */
 #define IXGBE_ADVTXD_DTYP_MASK 0x00F00000 /* DTYP mask */
 #define IXGBE_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Desc */
 #define IXGBE_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */
@@ -2096,19 +1294,6 @@ struct ixgbe_adv_tx_context_desc {
 #define IXGBE_ADVTXD_TUCMD_L4T_TCP 0x00000800 /* L4 Packet TYPE of TCP */
 #define IXGBE_ADVTXD_TUCMD_L4T_SCTP 0x00001000 /* L4 Packet TYPE of SCTP */
 #define IXGBE_ADVTXD_TUCMD_MKRREQ 0x00002000 /*Req requires Markers and CRC*/
-#define IXGBE_ADVTXD_POPTS_IPSEC 0x00000400 /* IPSec offload request */
-#define IXGBE_ADVTXD_TUCMD_IPSEC_TYPE_ESP 0x00002000 /* IPSec Type ESP */
-#define IXGBE_ADVTXD_TUCMD_IPSEC_ENCRYPT_EN 0x00004000/* ESP Encrypt Enable */
-#define IXGBE_ADVTXT_TUCMD_FCOE 0x00008000 /* FCoE Frame Type */
-#define IXGBE_ADVTXD_FCOEF_EOF_MASK (0x3 << 10) /* FC EOF index */
-#define IXGBE_ADVTXD_FCOEF_SOF ((1 << 2) << 10) /* FC SOF index */
-#define IXGBE_ADVTXD_FCOEF_PARINC ((1 << 3) << 10) /* Rel_Off in F_CTL */
-#define IXGBE_ADVTXD_FCOEF_ORIE ((1 << 4) << 10) /* Orientation: End */
-#define IXGBE_ADVTXD_FCOEF_ORIS ((1 << 5) << 10) /* Orientation: Start */
-#define IXGBE_ADVTXD_FCOEF_EOF_N (0x0 << 10) /* 00: EOFn */
-#define IXGBE_ADVTXD_FCOEF_EOF_T (0x1 << 10) /* 01: EOFt */
-#define IXGBE_ADVTXD_FCOEF_EOF_NI (0x2 << 10) /* 10: EOFni */
-#define IXGBE_ADVTXD_FCOEF_EOF_A (0x3 << 10) /* 11: EOFa */
 #define IXGBE_ADVTXD_L4LEN_SHIFT 8 /* Adv ctxt L4LEN shift */
 #define IXGBE_ADVTXD_MSS_SHIFT 16 /* Adv ctxt MSS shift */
 
@@ -2122,17 +1307,13 @@ typedef u32 ixgbe_link_speed;
 #define IXGBE_LINK_SPEED_10GB_FULL 0x0080
 #define IXGBE_LINK_SPEED_82598_AUTONEG (IXGBE_LINK_SPEED_1GB_FULL | \
                                         IXGBE_LINK_SPEED_10GB_FULL)
-#define IXGBE_LINK_SPEED_82599_AUTONEG (IXGBE_LINK_SPEED_100_FULL | \
-                                        IXGBE_LINK_SPEED_1GB_FULL | \
-                                        IXGBE_LINK_SPEED_10GB_FULL)
-
 
 /* Physical layer type */
 typedef u32 ixgbe_physical_layer;
 #define IXGBE_PHYSICAL_LAYER_UNKNOWN 0
 #define IXGBE_PHYSICAL_LAYER_10GBASE_T 0x0001
 #define IXGBE_PHYSICAL_LAYER_1000BASE_T 0x0002
-#define IXGBE_PHYSICAL_LAYER_100BASE_TX 0x0004
+#define IXGBE_PHYSICAL_LAYER_100BASE_T 0x0004
 #define IXGBE_PHYSICAL_LAYER_SFP_PLUS_CU 0x0008
 #define IXGBE_PHYSICAL_LAYER_10GBASE_LR 0x0010
 #define IXGBE_PHYSICAL_LAYER_10GBASE_LRM 0x0020
@@ -2141,47 +1322,7 @@ typedef u32 ixgbe_physical_layer;
 #define IXGBE_PHYSICAL_LAYER_10GBASE_CX4 0x0100
 #define IXGBE_PHYSICAL_LAYER_1000BASE_KX 0x0200
 #define IXGBE_PHYSICAL_LAYER_1000BASE_BX 0x0400
-#define IXGBE_PHYSICAL_LAYER_10GBASE_KR 0x0800
-#define IXGBE_PHYSICAL_LAYER_10GBASE_XAUI 0x1000
-
-/* Software ATR hash keys */
-#define IXGBE_ATR_BUCKET_HASH_KEY 0xE214AD3D
-#define IXGBE_ATR_SIGNATURE_HASH_KEY 0x14364D17
-
-/* Software ATR input stream offsets and masks */
-#define IXGBE_ATR_VLAN_OFFSET 0
-#define IXGBE_ATR_SRC_IPV6_OFFSET 2
-#define IXGBE_ATR_SRC_IPV4_OFFSET 14
-#define IXGBE_ATR_DST_IPV6_OFFSET 18
-#define IXGBE_ATR_DST_IPV4_OFFSET 30
-#define IXGBE_ATR_SRC_PORT_OFFSET 34
-#define IXGBE_ATR_DST_PORT_OFFSET 36
-#define IXGBE_ATR_FLEX_BYTE_OFFSET 38
-#define IXGBE_ATR_VM_POOL_OFFSET 40
-#define IXGBE_ATR_L4TYPE_OFFSET 41
-
-#define IXGBE_ATR_L4TYPE_MASK 0x3
-#define IXGBE_ATR_L4TYPE_IPV6_MASK 0x4
-#define IXGBE_ATR_L4TYPE_UDP 0x1
-#define IXGBE_ATR_L4TYPE_TCP 0x2
-#define IXGBE_ATR_L4TYPE_SCTP 0x3
-#define IXGBE_ATR_HASH_MASK 0x7fff
-
-/* Flow Director ATR input struct. */
-struct ixgbe_atr_input {
-	/* Byte layout in order, all values with MSB first:
	 *
	 * vlan_id - 2 bytes
	 * src_ip - 16 bytes
	 * dst_ip - 16 bytes
	 * src_port - 2 bytes
	 * dst_port - 2 bytes
	 * flex_bytes - 2 bytes
	 * vm_pool - 1 byte
	 * l4type - 1 byte
	 */
-	u8 byte_stream[42];
-};
+
 
 enum ixgbe_eeprom_type {
 	ixgbe_eeprom_uninitialized = 0,
@@ -2192,16 +1333,12 @@ enum ixgbe_eeprom_type {
 
 enum ixgbe_mac_type {
 	ixgbe_mac_unknown = 0,
 	ixgbe_mac_82598EB,
-	ixgbe_mac_82599EB,
 	ixgbe_num_macs
 };
 
 enum ixgbe_phy_type {
 	ixgbe_phy_unknown = 0,
-	ixgbe_phy_none,
 	ixgbe_phy_tn,
-	ixgbe_phy_aq,
-	ixgbe_phy_cu_unknown,
 	ixgbe_phy_qt,
 	ixgbe_phy_xaui,
 	ixgbe_phy_nl,
@@ -2210,8 +1347,6 @@ enum ixgbe_phy_type {
 	ixgbe_phy_sfp_avago,
 	ixgbe_phy_sfp_ftl,
 	ixgbe_phy_sfp_unknown,
-	ixgbe_phy_sfp_intel,
-	ixgbe_phy_sfp_unsupported, /*Enforce bit set with unsupported module*/
 	ixgbe_phy_generic
 };
 
@@ -2223,19 +1358,11 @@ enum ixgbe_phy_type {
 /*
  * SFP+ module type IDs:
  *
  * 0   SFP_DA_CU
  * 1   SFP_SR
  * 2   SFP_LR
- * 3   SFP_DA_CU_CORE0 - 82599-specific
- * 4   SFP_DA_CU_CORE1 - 82599-specific
- * 5   SFP_SR/LR_CORE0 - 82599-specific
- * 6   SFP_SR/LR_CORE1 - 82599-specific
  */
 enum ixgbe_sfp_type {
 	ixgbe_sfp_type_da_cu = 0,
 	ixgbe_sfp_type_sr = 1,
 	ixgbe_sfp_type_lr = 2,
-	ixgbe_sfp_type_da_cu_core0 = 3,
-	ixgbe_sfp_type_da_cu_core1 = 4,
-	ixgbe_sfp_type_srlr_core0 = 5,
-	ixgbe_sfp_type_srlr_core1 = 6,
 	ixgbe_sfp_type_not_present = 0xFFFE,
 	ixgbe_sfp_type_unknown = 0xFFFF
 };
@@ -2254,9 +1381,6 @@ enum ixgbe_fc_mode {
 	ixgbe_fc_rx_pause,
 	ixgbe_fc_tx_pause,
 	ixgbe_fc_full,
-#ifdef CONFIG_DCB
-	ixgbe_fc_pfc,
-#endif
 	ixgbe_fc_default
 };
 
@@ -2297,6 +1421,7 @@ enum ixgbe_bus_width {
 struct ixgbe_addr_filter_info {
 	u32 num_mc_addrs;
 	u32 rar_used_count;
+	u32 mc_addr_in_rar_count;
 	u32 mta_in_use;
 	u32 overflow_promisc;
 	bool user_set_promisc;
@@ -2309,7 +1434,6 @@ struct ixgbe_bus_info {
 	enum ixgbe_bus_type type;
 
 	u16 func;
-	u16 lan_id;
 };
 
 /* Flow control parameters */
@@ -2319,8 +1443,6 @@ struct ixgbe_fc_info {
 	u16 pause_time; /* Flow Control Pause timer */
 	bool send_xon; /* Flow control send XON */
 	bool strict_ieee; /* Strict IEEE mode */
-	bool disable_fc_autoneg; /* Do not autonegotiate FC */
-	bool fc_was_autonegged; /* Is current_mode the result of autonegging? */
 	enum ixgbe_fc_mode current_mode; /* FC mode in effect */
 	enum ixgbe_fc_mode requested_mode; /* FC mode requested by caller */
 };
@@ -2382,21 +1504,6 @@ struct ixgbe_hw_stats {
 	u64 qptc[16];
 	u64 qbrc[16];
 	u64 qbtc[16];
-	u64 qprdc[16];
-	u64 pxon2offc[8];
-	u64 fdirustat_add;
-	u64 fdirustat_remove;
-	u64 fdirfstat_fadd;
-	u64 fdirfstat_fremove;
-	u64 fdirmatch;
-	u64 fdirmiss;
-	u64 fccrc;
-	u64 fclast;
-	u64 fcoerpdc;
-	u64 fcoeprc;
-	u64 fcoeptc;
-	u64 fcoedwrc;
-	u64 fcoedwtc;
 };
 
 /* forward declaration */
@@ -2423,18 +1530,11 @@ struct ixgbe_mac_operations {
 	enum ixgbe_media_type (*get_media_type)(struct ixgbe_hw *);
 	u32 (*get_supported_physical_layer)(struct ixgbe_hw *);
 	s32 (*get_mac_addr)(struct ixgbe_hw *, u8 *);
-	s32 (*get_san_mac_addr)(struct ixgbe_hw *, u8 *);
-	s32 (*set_san_mac_addr)(struct ixgbe_hw *, u8 *);
-	s32 (*get_device_caps)(struct ixgbe_hw *, u16 *);
	s32 (*stop_adapter)(struct ixgbe_hw *);
 	s32 (*get_bus_info)(struct ixgbe_hw *);
 	void (*set_lan_id)(struct ixgbe_hw *);
 	s32 (*read_analog_reg8)(struct ixgbe_hw*, u32, u8*);
 	s32 (*write_analog_reg8)(struct ixgbe_hw*, u32, u8);
-	s32 (*setup_sfp)(struct ixgbe_hw *);
-	s32 (*enable_rx_dma)(struct ixgbe_hw *, u32);
-	s32 (*acquire_swfw_sync)(struct ixgbe_hw *, u16);
-	void (*release_swfw_sync)(struct ixgbe_hw *, u16);
 
 	/* Link */
 	s32 (*setup_link)(struct ixgbe_hw *);
@@ -2453,7 +1553,6 @@ struct ixgbe_mac_operations {
 	/* RAR, Multicast, VLAN */
 	s32 (*set_rar)(struct ixgbe_hw *, u32, u8 *, u32, u32);
 	s32 (*clear_rar)(struct ixgbe_hw *, u32);
-	s32 (*insert_mac_addr)(struct ixgbe_hw *, u8 *, u32);
 	s32 (*set_vmdq)(struct ixgbe_hw *, u32, u32);
 	s32 (*clear_vmdq)(struct ixgbe_hw *, u32, u32);
 	s32 (*init_rx_addrs)(struct ixgbe_hw *);
@@ -2468,13 +1567,12 @@ struct ixgbe_mac_operations {
 	s32 (*init_uta_tables)(struct ixgbe_hw *);
 
 	/* Flow Control */
-	s32 (*fc_enable)(struct ixgbe_hw *, s32);
+	s32 (*setup_fc)(struct ixgbe_hw *, s32);
 };
 
 struct ixgbe_phy_operations {
 	s32 (*identify)(struct ixgbe_hw *);
 	s32 (*identify_sfp)(struct ixgbe_hw *);
-	s32 (*init)(struct ixgbe_hw *);
 	s32 (*reset)(struct ixgbe_hw *);
 	s32 (*read_reg)(struct ixgbe_hw *, u32, u32, u16 *);
 	s32 (*write_reg)(struct ixgbe_hw *, u32, u32, u16);
@@ -2487,7 +1585,6 @@ struct ixgbe_phy_operations {
 	s32 (*write_i2c_byte)(struct ixgbe_hw *, u8, u8, u8);
 	s32 (*read_i2c_eeprom)(struct ixgbe_hw *, u8 , u8 *);
 	s32 (*write_i2c_eeprom)(struct ixgbe_hw *, u8, u8);
-	void (*i2c_bus_clear)(struct ixgbe_hw *);
 };
 
 struct ixgbe_eeprom_info {
@@ -2503,22 +1600,16 @@ struct ixgbe_mac_info {
 	enum ixgbe_mac_type type;
 	u8 addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
 	u8 perm_addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
-	u8 san_addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
 	s32 mc_filter_type;
 	u32 mcft_size;
 	u32 vft_size;
 	u32 num_rar_entries;
-	u32 rar_highwater;
 	u32 max_tx_queues;
 	u32 max_rx_queues;
-	u32 max_msix_vectors;
-	bool msix_vectors_from_pcie;
 	u32 orig_autoc;
-	u32 orig_autoc2;
 	bool orig_link_settings_stored;
 	bool autoneg;
 	bool autoneg_succeeded;
-	bool autotry_restart;
 };
 
 struct ixgbe_phy_info {
@@ -2527,13 +1618,11 @@ struct ixgbe_phy_info {
 	u32 addr;
 	u32 id;
 	enum ixgbe_sfp_type sfp_type;
-	bool sfp_setup_needed;
 	u32 revision;
 	enum ixgbe_media_type media_type;
 	bool reset_disable;
 	ixgbe_autoneg_advertised autoneg_advertised;
 	bool autoneg_wait_to_complete;
-	bool multispeed_fiber;
 };
 
 struct ixgbe_hw {
@@ -2556,7 +1645,6 @@ struct ixgbe_hw {
 #define ixgbe_call_func(hw, func, params, error) \
                 (func != NULL) ? func params : error
-
 /* Error Codes */
 #define IXGBE_ERR_EEPROM -1
 #define IXGBE_ERR_EEPROM_CHECKSUM -2
@@ -2578,10 +1666,6 @@ struct ixgbe_hw {
 #define IXGBE_ERR_I2C -18
 #define IXGBE_ERR_SFP_NOT_SUPPORTED -19
 #define IXGBE_ERR_SFP_NOT_PRESENT -20
-#define IXGBE_ERR_SFP_NO_INIT_SEQ_PRESENT -21
-#define IXGBE_ERR_NO_SAN_ADDR_PTR -22
-#define IXGBE_ERR_FDIR_REINIT_FAILED -23
-#define IXGBE_ERR_EEPROM_VERSION -24
 
 #define IXGBE_NOT_IMPLEMENTED 0x7FFFFFFF
 
diff --git a/drivers/net/ixgbe/kcompat.c b/drivers/net/ixgbe/kcompat.c
index b8dbbaa..1923dc4 100644
--- a/drivers/net/ixgbe/kcompat.c
+++ b/drivers/net/ixgbe/kcompat.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel 10 Gigabit PCI Express Linux driver
-  Copyright(c) 1999 - 2009 Intel Corporation.
+  Copyright(c) 1999 - 2008 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
@@ -25,30 +25,17 @@
 *******************************************************************************/
 
+
+
+
+
+
+
+#ifdef DRIVER_IXGBE
 #include "ixgbe.h"
-#include "kcompat.h"
+#endif
 
-/*****************************************************************************/
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,4,21) )
-struct sk_buff *
-_kc_skb_pad(struct sk_buff *skb, int pad)
-{
-	struct sk_buff *nskb;
-
-	/* If the skbuff is non linear tailroom is always zero.. */
-	if(skb_tailroom(skb) >= pad)
-	{
-		memset(skb->data+skb->len, 0, pad);
-		return skb;
-	}
-
-	nskb = skb_copy_expand(skb, skb_headroom(skb), skb_tailroom(skb) + pad, GFP_ATOMIC);
-	kfree_skb(skb);
-	if(nskb)
-		memset(nskb->data+nskb->len, 0, pad);
-	return nskb;
-}
-#endif /* < 2.4.21 */
+#include "kcompat.h"
 
 /*****************************************************************************/
 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,4,13) )
@@ -294,7 +281,7 @@ struct sk_buff *_kc_netdev_alloc_skb(struct net_device *dev,
 /*****************************************************************************/
 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) )
 int _kc_pci_save_state(struct pci_dev *pdev)
-{
+{ 
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct adapter_struct *adapter = netdev_priv(netdev);
 	int size = PCI_CONFIG_SPACE_LEN, i;
@@ -308,7 +295,7 @@ int _kc_pci_save_state(struct pci_dev *pdev)
 		size = PCIE_CONFIG_SPACE_LEN;
 	}
 	pci_config_space_ich8lan();
-#ifdef HAVE_PCI_ERS
+#ifdef HAVE_PCI_ERS 
 	if (adapter->config_space == NULL)
 #else
 	WARN_ON(adapter->config_space != NULL);
@@ -333,12 +320,12 @@ void _kc_pci_restore_state(struct pci_dev * pdev)
 	if (adapter->config_space != NULL) {
 		pcie_cap_offset = pci_find_capability(pdev, PCI_CAP_ID_EXP);
-		if (pcie_cap_offset &&
+		if (pcie_cap_offset && 
 		    !pci_read_config_word(pdev, pcie_cap_offset + PCIE_LINK_STATUS,
 		                          &pcie_link_status))
 			size = PCIE_CONFIG_SPACE_LEN;
-
+ 
 		pci_config_space_ich8lan();
 		for (i = 0; i < (size / 4); i++)
 		pci_write_config_dword(pdev, i * 4, adapter->config_space[i]);
@@ -373,6 +360,16 @@ void _kc_free_netdev(struct net_device *netdev)
 
 /*****************************************************************************/
 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23) )
+#ifdef DRIVER_IXGBE
+int ixgbe_sysfs_create(struct ixgbe_adapter *adapter)
+{
+	return 0;
+}
+
+void ixgbe_sysfs_remove(struct ixgbe_adapter *adapter)
+{
+	return;
+}
 
 int ixgbe_dcb_netlink_register()
 {
@@ -383,38 +380,27 @@ int ixgbe_dcb_netlink_unregister()
 {
 	return 0;
 }
+#endif /* DRIVER_IXGBE */
 #endif /* < 2.6.23 */
 /*****************************************************************************/
 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,24) )
 #ifdef NAPI
-/* this function returns the true netdev of the napi struct */
-struct net_device * napi_to_netdev(struct napi_struct *napi)
-{
-	struct adapter_q_vector *q_vector = container_of(napi,
-	                                                 struct adapter_q_vector,
-	                                                 napi);
-	struct adapter_struct *adapter = q_vector->adapter;
-
-	return adapter->netdev;
-}
-
-int _kc_napi_schedule_prep(struct napi_struct *napi)
-{
-	return (netif_running(napi_to_netdev(napi)) &&
-	        netif_rx_schedule_prep(napi_to_poll_dev(napi)));
-}
-
 int __kc_adapter_clean(struct net_device *netdev, int *budget)
 {
 	int work_done;
 	int work_to_do = min(*budget, netdev->quota);
+#ifdef DRIVER_IXGBE
 	/* kcompat.h netif_napi_add puts napi struct in "fake netdev->priv" */
 	struct napi_struct *napi = netdev->priv;
+#else
+	struct adapter_struct *adapter = netdev_priv(netdev);
+	struct napi_struct *napi = &adapter->rx_ring[0].napi;
+#endif
 	work_done = napi->poll(napi, work_to_do);
 	*budget -= work_done;
 	netdev->quota -= work_done;
-	return (work_done >= work_to_do) ? 1 : 0;
+	return work_done ? 1 : 0;
 }
 #endif /* NAPI */
 #endif /* <= 2.6.24 */
@@ -453,121 +439,4 @@ void _kc_netif_tx_start_all_queues(struct net_device *netdev)
 		netif_start_subqueue(netdev, i);
 }
 #endif /* HAVE_TX_MQ */
-#endif /* < 2.6.27 */
-
-/*****************************************************************************/
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28) )
-
-int
-_kc_pci_prepare_to_sleep(struct pci_dev *dev)
-{
-	pci_power_t target_state;
-	int error;
-
-	target_state = pci_choose_state(dev, PMSG_SUSPEND);
-
-	pci_enable_wake(dev, target_state, true);
-
-	error = pci_set_power_state(dev, target_state);
-
-	if (error)
-		pci_enable_wake(dev, target_state, false);
-
-	return error;
-}
-
-int
-_kc_pci_wake_from_d3(struct pci_dev *dev, bool enable)
-{
-	int err;
-
-	err = pci_enable_wake(dev, PCI_D3cold, enable);
-	if (err)
-		goto out;
-
-	err = pci_enable_wake(dev, PCI_D3hot, enable);
-
-out:
-	return err;
-}
-#endif /* < 2.6.28 */
-
-/*****************************************************************************/
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,29) )
-void _kc_pci_disable_link_state(struct pci_dev *pdev, int state)
-{
-	struct pci_dev *parent = pdev->bus->self;
-	u16 link_state;
-	int pos;
-
-	if (!parent)
-		return;
-
-	pos = pci_find_capability(parent, PCI_CAP_ID_EXP);
-	if (pos) {
-		pci_read_config_word(parent, pos + PCI_EXP_LNKCTL, &link_state);
-		link_state &= ~state;
-		pci_write_config_word(parent, pos + PCI_EXP_LNKCTL, link_state);
-	}
-}
-#endif /* < 2.6.29 */
-
-/*****************************************************************************/
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) )
-#ifdef HAVE_NETDEV_SELECT_QUEUE
-#include <net/ip.h>
-u32 _kc_simple_tx_hashrnd;
-u32 _kc_simple_tx_hashrnd_initialized = 0;
-
-u16 _kc_skb_tx_hash(struct net_device *dev, struct sk_buff *skb)
-{
-	u32 addr1, addr2, ports;
-	u32 hash, ihl;
-	u8 ip_proto = 0;
-
-	if (unlikely(!_kc_simple_tx_hashrnd_initialized)) {
-		get_random_bytes(&_kc_simple_tx_hashrnd, 4);
-		_kc_simple_tx_hashrnd_initialized = 1;
-	}
-
-	switch (skb->protocol) {
-	case htons(ETH_P_IP):
-		if (!(ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)))
-			ip_proto = ip_hdr(skb)->protocol;
-		addr1 = ip_hdr(skb)->saddr;
-		addr2 = ip_hdr(skb)->daddr;
-		ihl = ip_hdr(skb)->ihl;
-		break;
-	case htons(ETH_P_IPV6):
-		ip_proto = ipv6_hdr(skb)->nexthdr;
-		addr1 = ipv6_hdr(skb)->saddr.s6_addr32[3];
-		addr2 = ipv6_hdr(skb)->daddr.s6_addr32[3];
-		ihl = (40 >> 2);
-		break;
-	default:
-		return 0;
-	}
-
-
-	switch (ip_proto) {
-	case IPPROTO_TCP:
-	case IPPROTO_UDP:
-	case IPPROTO_DCCP:
-	case IPPROTO_ESP:
-	case IPPROTO_AH:
-	case IPPROTO_SCTP:
-	case IPPROTO_UDPLITE:
-		ports = *((u32 *) (skb_network_header(skb) + (ihl * 4)));
-		break;
-
-	default:
-		ports = 0;
-		break;
-	}
-
-	hash = jhash_3words(addr1, addr2, ports, _kc_simple_tx_hashrnd);
-
-	return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
-}
-#endif /* HAVE_NETDEV_SELECT_QUEUE */
-#endif /* < 2.6.30 */
+#endif /* <= 2.6.27 */
diff --git a/drivers/net/ixgbe/kcompat_ethtool.c b/drivers/net/ixgbe/kcompat_ethtool.c
index 388fb21..786d42e 100644
--- a/drivers/net/ixgbe/kcompat_ethtool.c
+++ b/drivers/net/ixgbe/kcompat_ethtool.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel 10 Gigabit PCI Express Linux driver
-  Copyright(c) 1999 - 2009 Intel Corporation.
+  Copyright(c) 1999 - 2008 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
@@ -958,7 +958,7 @@ int _kc_mii_ethtool_gset(struct mii_if_info *mii, struct ethtool_cmd *ecmd)
 	if (bmcr & BMCR_ANENABLE) {
 		ecmd->advertising |= ADVERTISED_Autoneg;
 		ecmd->autoneg = AUTONEG_ENABLE;
-
+ 
 		nego = mii_nway_result(advert & lpa);
 		if (nego == LPA_100FULL || nego == LPA_100HALF)
 			ecmd->speed = SPEED_100;
@@ -999,9 +999,9 @@ int _kc_mii_ethtool_sset(struct mii_if_info *mii, struct ethtool_cmd *ecmd)
 		return -EINVAL;
 	if (ecmd->autoneg != AUTONEG_DISABLE && ecmd->autoneg != AUTONEG_ENABLE)
 		return -EINVAL;
-
+ 
 	/* ignore supported, maxtxpkt, maxrxpkt */
-
+ 
 	if (ecmd->autoneg == AUTONEG_ENABLE) {
 		u32 bmcr, advert, tmp;
@@ -1026,7 +1026,7 @@ int _kc_mii_ethtool_sset(struct mii_if_info *mii, struct ethtool_cmd *ecmd)
 			mii->mdio_write(dev, mii->phy_id, MII_ADVERTISE, tmp);
 			mii->advertising = tmp;
 		}
-
+ 
 		/* turn on autonegotiation, and force a renegotiate */
 		bmcr = mii->mdio_read(dev, mii->phy_id, MII_BMCR);
 		bmcr |= (BMCR_ANENABLE | BMCR_ANRESTART);
diff --git a/drivers/xen/netchannel2/vmq.c b/drivers/xen/netchannel2/vmq.c
index e36962b..aecfbf7 100644
--- a/drivers/xen/netchannel2/vmq.c
+++ b/drivers/xen/netchannel2/vmq.c
@@ -637,6 +637,9 @@ int vmq_netif_rx(struct sk_buff *skb, int queue_id)
 
 	memset(skb_co, 0, sizeof(*skb_co));
 
+	if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+		skb->proto_data_valid = 1;
+
 	skb_co->nr_fragments = skb_shinfo(skb)->nr_frags;
 	skb_co->type = NC2_PACKET_TYPE_pre_posted;
 	skb_co->policy = transmit_policy_vmq;
-- 
1.6.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel