Herbert Xu
2007-Mar-20 04:46 UTC
[Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
Hi Keir: These two patches remove the need for netloop by performing the copying in netback and only if it is necessary. The rationale is that most packets will be processed without delay allowing them to be freed without copying at all. So instead of copying every packet destined to dom0 we''ll only copy those that linger longer than a specified amount of time (currently 0.5s). As it is netloop doesn''t take care of all delays anyway. For instance packets delayed by qdisc or netfilter can hold up resources without any limits. Also if bridging isn''t used then traffic to dom0 does not even go through netloop. Testing shows that these patches do eliminate the copying for bulk transfers. In fact, bulk transfer throughput from domU to dom0 are increased by around 50%. Even when the copying path is taken the performance is roughly equal to that of netloop despite the unoptimised copying path. The copying is achieved through a new grant table operation. I''ve only implemented it for x86. However, there is a fallback path for other platforms so they should continue to work. It shouldn''t be too hard to implement this for ia64/ppc (for someone with access to them). In future I intend to exntend this idea to support lazy copying for dom0 to domU as well which should give us a complete zero-copy path from one domU to another. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-20 04:50 UTC
[Xen-devel] RFC: [1/2] [XEN] gnttab: Add new op unmap_and_replace
Hi: [XEN] gnttab: Add new op unmap_and_replace The operation unmap_and_replace is an extension of unmap_grant_ref. A new argument in the form of a virtual address (for PV) is given. Instead of modifying the PTE for the mapped grant table entry to null, we change it to the PTE for the new address. In turn we point the new address to null. As it stands grant table entries once mapped cannot be remapped by the guest OS (it can however perform a new mapping on the same entry but that is within our control). Therefore it''s safe to manipulate the mapped PTE entry to redirect it to a normal page where we''ve copied the contents. It''s intended to be used as follows: 1) map_grant_ref to v1 2) ... 3) alloc page at v2 4) copy the page at v1 to v2 5) unmap_and_replace v1 with v2 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3ac19fda0bc2 linux-2.6-xen-sparse/include/xen/gnttab.h --- a/linux-2.6-xen-sparse/include/xen/gnttab.h Fri Mar 02 12:11:52 2007 +0000 +++ b/linux-2.6-xen-sparse/include/xen/gnttab.h Tue Mar 20 14:08:40 2007 +1100 @@ -135,4 +135,19 @@ gnttab_set_unmap_op(struct gnttab_unmap_ unmap->dev_bus_addr = 0; } +static inline void +gnttab_set_replace_op(struct gnttab_unmap_and_replace *unmap, maddr_t addr, + maddr_t new_addr, grant_handle_t handle) +{ + if (xen_feature(XENFEAT_auto_translated_physmap)) { + unmap->host_addr = __pa(addr); + unmap->new_addr = __pa(new_addr); + } else { + unmap->host_addr = addr; + unmap->new_addr = new_addr; + } + + unmap->handle = handle; +} + #endif /* __ASM_GNTTAB_H__ */ diff -r 3ac19fda0bc2 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/arch/x86/mm.c Tue Mar 20 14:08:40 2007 +1100 @@ -2594,8 +2594,8 @@ static int create_grant_va_mapping( return GNTST_okay; } -static int destroy_grant_va_mapping( - unsigned long addr, unsigned long frame, struct vcpu *v) +static int replace_grant_va_mapping( + unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v) { l1_pgentry_t *pl1e, ol1e; unsigned long gl1mfn; @@ -2619,7 +2619,7 @@ static int destroy_grant_va_mapping( } /* Delete pagetable entry. */ - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), gl1mfn, v)) ) + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v)) ) { MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); rc = GNTST_general_error; @@ -2629,6 +2629,12 @@ static int destroy_grant_va_mapping( out: guest_unmap_l1e(v, pl1e); return rc; +} + +static int destroy_grant_va_mapping( + unsigned long addr, unsigned long frame, struct vcpu *v) +{ + return replace_grant_va_mapping(addr, frame, l1e_empty(), v); } int create_grant_host_mapping( @@ -2652,6 +2658,44 @@ int destroy_grant_host_mapping( if ( flags & GNTMAP_contains_pte ) return destroy_grant_pte_mapping(addr, frame, current->domain); return destroy_grant_va_mapping(addr, frame, current); +} + +int replace_grant_host_mapping( + uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags) +{ + l1_pgentry_t *pl1e, ol1e; + unsigned long gl1mfn; + int rc; + + if ( flags & GNTMAP_contains_pte ) + { + MEM_LOG("Unsupported grant table operation"); + return GNTST_general_error; + } + + pl1e = guest_map_l1e(current, new_addr, &gl1mfn); + if ( !pl1e ) + { + MEM_LOG("Could not find L1 PTE for address %lx", + (unsigned long)new_addr); + return GNTST_general_error; + } + ol1e = *pl1e; + + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), gl1mfn, current)) ) + { + MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); + guest_unmap_l1e(current, pl1e); + return GNTST_general_error; + } + + guest_unmap_l1e(current, pl1e); + + rc = replace_grant_va_mapping(addr, frame, ol1e, current); + if ( rc && !paging_mode_refcounts(current->domain) ) + put_page_from_l1e(ol1e, current->domain); + + return rc; } int steal_page( diff -r 3ac19fda0bc2 xen/common/grant_table.c --- a/xen/common/grant_table.c Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/common/grant_table.c Tue Mar 20 14:08:40 2007 +1100 @@ -540,6 +540,126 @@ fault: return -EFAULT; } +static void +__gnttab_unmap_and_replace( + struct gnttab_unmap_and_replace *op) +{ + domid_t dom; + grant_ref_t ref; + struct domain *ld, *rd; + struct active_grant_entry *act; + grant_entry_t *sha; + struct grant_mapping *map; + u16 flags; + s16 rc = 0; + unsigned long frame; + + ld = current->domain; + + if ( unlikely(op->handle >= ld->grant_table->maptrack_limit) ) + { + gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle); + op->status = GNTST_bad_handle; + return; + } + + map = &maptrack_entry(ld->grant_table, op->handle); + + if ( unlikely(!map->flags) ) + { + gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle); + op->status = GNTST_bad_handle; + return; + } + + dom = map->domid; + ref = map->ref; + flags = map->flags; + + if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) ) + { + /* This can happen when a grant is implicitly unmapped. */ + gdprintk(XENLOG_INFO, "Could not find domain %d\n", dom); + domain_crash(ld); /* naughty... */ + return; + } + + TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom); + + spin_lock(&rd->grant_table->lock); + + act = &active_entry(rd->grant_table, ref); + sha = &shared_entry(rd->grant_table, ref); + + frame = act->frame; + + if ( flags & GNTMAP_host_map ) + { + if ( (rc = replace_grant_host_mapping(op->host_addr, + frame, op->new_addr, flags)) < 0 ) + goto unmap_out; + + ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask)); + map->flags &= ~GNTMAP_host_map; + if ( flags & GNTMAP_readonly ) + { + act->pin -= GNTPIN_hstr_inc; + put_page(mfn_to_page(frame)); + } + else + { + act->pin -= GNTPIN_hstw_inc; + put_page_and_type(mfn_to_page(frame)); + } + } + + if ( (map->flags & (GNTMAP_device_map|GNTMAP_host_map)) == 0 ) + { + map->flags = 0; + put_maptrack_handle(ld->grant_table, op->handle); + } + + /* If just unmapped a writable mapping, mark as dirtied */ + if ( !(flags & GNTMAP_readonly) ) + gnttab_mark_dirty(rd, frame); + + if ( ((act->pin & (GNTPIN_devw_mask|GNTPIN_hstw_mask)) == 0) && + !(flags & GNTMAP_readonly) ) + gnttab_clear_flag(_GTF_writing, &sha->flags); + + if ( act->pin == 0 ) + gnttab_clear_flag(_GTF_reading, &sha->flags); + + unmap_out: + op->status = rc; + spin_unlock(&rd->grant_table->lock); + rcu_unlock_domain(rd); +} + +static long +gnttab_unmap_and_replace( + XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t) uop, unsigned int count) +{ + int i; + struct gnttab_unmap_and_replace op; + + for ( i = 0; i < count; i++ ) + { + if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) ) + goto fault; + __gnttab_unmap_and_replace(&op); + if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) ) + goto fault; + } + + flush_tlb_mask(current->domain->domain_dirty_cpumask); + return 0; + +fault: + flush_tlb_mask(current->domain->domain_dirty_cpumask); + return -EFAULT; +} + int gnttab_grow_table(struct domain *d, unsigned int req_nr_frames) { @@ -1201,6 +1321,21 @@ do_grant_table_op( if ( unlikely(!grant_operation_permitted(d)) ) goto out; rc = gnttab_unmap_grant_ref(unmap, count); + break; + } + case GNTTABOP_unmap_and_replace: + { + XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t) unmap + guest_handle_cast(uop, gnttab_unmap_and_replace_t); + if ( unlikely(!guest_handle_okay(unmap, count)) ) + goto out; + rc = -EPERM; + if ( unlikely(!grant_operation_permitted(d)) ) + goto out; + rc = -ENOSYS; + if ( unlikely(!replace_grant_supported()) ) + goto out; + rc = gnttab_unmap_and_replace(unmap, count); break; } case GNTTABOP_setup_table: diff -r 3ac19fda0bc2 xen/include/asm-ia64/grant_table.h --- a/xen/include/asm-ia64/grant_table.h Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/include/asm-ia64/grant_table.h Tue Mar 20 14:08:40 2007 +1100 @@ -64,4 +64,14 @@ static inline void gnttab_clear_flag(uns clear_bit(nr, addr); } +static inline int replace_grant_host_mapping(unsigned long gpaddr, unsigned long mfn, unsigned long new_gpaddr, unsigned int flags) +{ + return 0; +} + +static inline int replace_grant_supported(void) +{ + return 0; +} + #endif /* __ASM_GRANT_TABLE_H__ */ diff -r 3ac19fda0bc2 xen/include/asm-powerpc/grant_table.h --- a/xen/include/asm-powerpc/grant_table.h Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/include/asm-powerpc/grant_table.h Tue Mar 20 14:08:40 2007 +1100 @@ -69,4 +69,14 @@ static inline uint cpu_foreign_map_order /* 16 GiB */ return 34 - PAGE_SHIFT; } + +static inline int replace_grant_host_mapping(unsigned long addr, unsigned long frame, unsigned long new_addr, unsigned int flags) +{ + return 0; +} + +static inline int replace_grant_supported(void) +{ + return 0; +} #endif /* __ASM_PPC_GRANT_TABLE_H__ */ diff -r 3ac19fda0bc2 xen/include/asm-x86/grant_table.h --- a/xen/include/asm-x86/grant_table.h Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/include/asm-x86/grant_table.h Tue Mar 20 14:08:40 2007 +1100 @@ -17,6 +17,8 @@ int create_grant_host_mapping( uint64_t addr, unsigned long frame, unsigned int flags); int destroy_grant_host_mapping( uint64_t addr, unsigned long frame, unsigned int flags); +int replace_grant_host_mapping( + uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags); #define gnttab_create_shared_page(d, t, i) \ do { \ @@ -38,4 +40,9 @@ static inline void gnttab_clear_flag(uns clear_bit(nr, addr); } +static inline int replace_grant_supported(void) +{ + return 1; +} + #endif /* __ASM_GRANT_TABLE_H__ */ diff -r 3ac19fda0bc2 xen/include/public/grant_table.h --- a/xen/include/public/grant_table.h Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/include/public/grant_table.h Tue Mar 20 14:08:40 2007 +1100 @@ -327,6 +327,29 @@ struct gnttab_query_size { }; typedef struct gnttab_query_size gnttab_query_size_t; DEFINE_XEN_GUEST_HANDLE(gnttab_query_size_t); + +/* + * GNTTABOP_unmap_and_replace: Destroy one or more grant-reference mappings + * tracked by <handle> but atomically replace the page table entry with one + * pointing to the machine address under <new_addr>. <new_addr> will be + * redirected to the null entry. + * NOTES: + * 1. The call may fail in an undefined manner if either mapping is not + * tracked by <handle>. + * 2. After executing a batch of unmaps, it is guaranteed that no stale + * mappings will remain in the device or host TLBs. + */ +#define GNTTABOP_unmap_and_replace 7 +struct gnttab_unmap_and_replace { + /* IN parameters. */ + uint64_t host_addr; + uint64_t new_addr; + grant_handle_t handle; + /* OUT parameters. */ + int16_t status; /* GNTST_* */ +}; +typedef struct gnttab_unmap_and_replace gnttab_unmap_and_replace_t; +DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_and_replace_t); /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] back: Add lazy copying This patch adds lazy copying using the new unmap_and_replace grant table operation. We keep a list of pending entries sorted by arrival order. We''ll process this list every time net_tx_action is invoked. We ensure that net_tx_action is invoked within one second of the arrival of the first packet in the list. When we process the list any entry that has been around for more than half a second is copied. This allows up to free the grant table entry and return it to domU. If the new grant table operation is not available (e.g., old HV or architectures that don''t support it yet) we simply copy each packet as we receive them using skb_linearize. We also disable SG/TSO if this is the case. By default the new code is disabled. In order to enable it, the module needs to be loaded with the argument copy_skb=1. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3ac19fda0bc2 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Mar 02 12:11:52 2007 +0000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Tue Mar 20 14:08:40 2007 +1100 @@ -102,6 +102,14 @@ typedef struct netif_st { wait_queue_head_t waiting_to_free; } netif_t; +enum { + NETBK_DONT_COPY_SKB, + NETBK_DELAYED_COPY_SKB, + NETBK_ALWAYS_COPY_SKB, +}; + +extern int netbk_copy_skb_mode; + #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE) diff -r 3ac19fda0bc2 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Mar 02 12:11:52 2007 +0000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Tue Mar 20 14:08:40 2007 +1100 @@ -46,6 +46,11 @@ struct netbk_rx_meta { int copy:1; }; +struct netbk_tx_pending_inuse { + struct list_head list; + unsigned long alloc_time; +}; + static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, @@ -65,6 +70,7 @@ static DECLARE_TASKLET(net_rx_tasklet, n static DECLARE_TASKLET(net_rx_tasklet, net_rx_action, 0); static struct timer_list net_timer; +static struct timer_list netbk_tx_pending_timer; #define MAX_PENDING_REQS 256 @@ -92,6 +98,10 @@ static u16 dealloc_ring[MAX_PENDING_REQS static u16 dealloc_ring[MAX_PENDING_REQS]; static PEND_RING_IDX dealloc_prod, dealloc_cons; +/* Doubly-linked list of in-use pending entries. */ +static struct netbk_tx_pending_inuse pending_inuse[MAX_PENDING_REQS]; +static LIST_HEAD(pending_inuse_head); + static struct sk_buff_head tx_queue; static grant_handle_t grant_tx_handle[MAX_PENDING_REQS]; @@ -104,6 +114,13 @@ static spinlock_t net_schedule_list_lock #define MAX_MFN_ALLOC 64 static unsigned long mfn_list[MAX_MFN_ALLOC]; static unsigned int alloc_index = 0; + +/* Setting this allows the safe use of this driver without netloop. */ +static int MODPARM_copy_skb; +module_param_named(copy_skb, MODPARM_copy_skb, bool, 0); +MODULE_PARM_DESC(copy_skb, "Copy data received from netfront without netloop"); + +int netbk_copy_skb_mode; static inline unsigned long alloc_mfn(void) { @@ -710,6 +727,11 @@ static void net_alarm(unsigned long unus tasklet_schedule(&net_rx_tasklet); } +static void netbk_tx_pending_timeout(unsigned long unused) +{ + tasklet_schedule(&net_tx_tasklet); +} + struct net_device_stats *netif_be_get_stats(struct net_device *dev) { netif_t *netif = netdev_priv(dev); @@ -803,46 +825,140 @@ static void tx_credit_callback(unsigned netif_schedule_work(netif); } +/* Perform a delayed copy. This is slow-path only. */ +static int copy_pending_req(PEND_RING_IDX pending_idx) +{ + struct gnttab_unmap_and_replace unmap; + mmu_update_t mmu; + struct page *page; + struct page *new_page; + void *new_addr; + void *addr; + unsigned long pfn; + unsigned long new_mfn; + int err; + + page = mmap_pages[pending_idx]; + if (!get_page_unless_zero(page)) + return -ENOENT; + + new_page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!new_page) + return -ENOMEM; + + new_addr = page_address(new_page); + addr = page_address(page); + memcpy(new_addr, addr, PAGE_SIZE); + + pfn = page_to_pfn(page); + new_mfn = virt_to_mfn(new_addr); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + set_phys_to_machine(pfn, new_mfn); + set_phys_to_machine(page_to_pfn(new_page), INVALID_P2M_ENTRY); + } + + gnttab_set_replace_op(&unmap, (unsigned long)addr, + (unsigned long)new_addr, + grant_tx_handle[pending_idx]); + + err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_and_replace, + &unmap, 1); + BUG_ON(err); + BUG_ON(unmap.status); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + mmu.ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | + MMU_MACHPHYS_UPDATE; + mmu.val = pfn; + err = HYPERVISOR_mmu_update(&mmu, 1, NULL, DOMID_SELF); + BUG_ON(err); + } + + ClearPageForeign(page); + put_page(page); + + SetPageForeign(new_page, netif_page_release); + new_page->index = pending_idx; + mmap_pages[pending_idx] = new_page; + + return 0; +} + inline static void net_tx_action_dealloc(void) { + struct netbk_tx_pending_inuse *inuse, *n; gnttab_unmap_grant_ref_t *gop; u16 pending_idx; PEND_RING_IDX dc, dp; netif_t *netif; int ret; + LIST_HEAD(list); dc = dealloc_cons; - dp = dealloc_prod; - - /* Ensure we see all indexes enqueued by netif_idx_release(). */ - smp_rmb(); + gop = tx_unmap_ops; /* * Free up any grants we have finished using */ - gop = tx_unmap_ops; - while (dc != dp) { - pending_idx = dealloc_ring[MASK_PEND_IDX(dc++)]; - gnttab_set_unmap_op(gop, idx_to_kaddr(pending_idx), - GNTMAP_host_map, - grant_tx_handle[pending_idx]); - gop++; - } + do { + dp = dealloc_prod; + + /* Ensure we see all indices enqueued by netif_idx_release(). */ + smp_rmb(); + + while (dc != dp) { + pending_idx = dealloc_ring[MASK_PEND_IDX(dc++)]; + list_move_tail(&pending_inuse[pending_idx].list, &list); + gnttab_set_unmap_op(gop, idx_to_kaddr(pending_idx), + GNTMAP_host_map, + grant_tx_handle[pending_idx]); + gop++; + } + + if (netbk_copy_skb_mode != NETBK_DELAYED_COPY_SKB || + list_empty(&pending_inuse_head)) + break; + + /* Copy any entries that have been pending for too long. */ + list_for_each_entry_safe(inuse, n, &pending_inuse_head, list) { + if (time_after(inuse->alloc_time + HZ / 2, jiffies)) + break; + + switch (copy_pending_req(inuse - pending_inuse)) { + case 0: + list_move_tail(&inuse->list, &list); + /* fall through */ + case -ENOENT: + continue; + } + + break; + } + } while (dp != dealloc_prod); + + dealloc_cons = dc; + ret = HYPERVISOR_grant_table_op( GNTTABOP_unmap_grant_ref, tx_unmap_ops, gop - tx_unmap_ops); BUG_ON(ret); - while (dealloc_cons != dp) { - pending_idx = dealloc_ring[MASK_PEND_IDX(dealloc_cons++)]; + list_for_each_entry_safe(inuse, n, &list, list) { + pending_idx = inuse - pending_inuse; netif = pending_tx_info[pending_idx].netif; make_tx_response(netif, &pending_tx_info[pending_idx].req, NETIF_RSP_OKAY); + /* Ready for next use. */ + init_page_count(mmap_pages[pending_idx]); + pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); + + list_del_init(&inuse->list); } } @@ -1014,6 +1130,11 @@ static void netbk_fill_frags(struct sk_b unsigned long pending_idx; pending_idx = (unsigned long)frag->page; + + pending_inuse[pending_idx].alloc_time = jiffies; + list_add_tail(&pending_inuse[pending_idx].list, + &pending_inuse_head); + txp = &pending_tx_info[pending_idx].req; frag->page = virt_to_page(idx_to_kaddr(pending_idx)); frag->size = txp->size; @@ -1302,8 +1423,24 @@ static void net_tx_action(unsigned long netif->stats.rx_bytes += skb->len; netif->stats.rx_packets++; + if (unlikely(netbk_copy_skb_mode == NETBK_ALWAYS_COPY_SKB) && + unlikely(skb_linearize(skb))) { + DPRINTK("Can''t linearize skb in net_tx_action.\n"); + kfree_skb(skb); + continue; + } + netif_rx(skb); netif->dev->last_rx = jiffies; + } + + if (netbk_copy_skb_mode == NETBK_DELAYED_COPY_SKB && + !list_empty(&pending_inuse_head)) { + struct netbk_tx_pending_inuse *oldest; + + oldest = list_entry(pending_inuse_head.next, + struct netbk_tx_pending_inuse, list); + mod_timer(&netbk_tx_pending_timer, oldest->alloc_time + HZ); } } @@ -1324,9 +1461,6 @@ static void netif_idx_release(u16 pendin static void netif_page_release(struct page *page) { - /* Ready for next use. */ - init_page_count(page); - netif_idx_release(page->index); } @@ -1448,6 +1582,10 @@ static int __init netback_init(void) net_timer.data = 0; net_timer.function = net_alarm; + init_timer(&netbk_tx_pending_timer); + netbk_tx_pending_timer.data = 0; + netbk_tx_pending_timer.function = netbk_tx_pending_timeout; + mmap_pages = alloc_empty_pages_and_pagevec(MAX_PENDING_REQS); if (mmap_pages == NULL) { printk("%s: out of memory\n", __FUNCTION__); @@ -1458,6 +1596,7 @@ static int __init netback_init(void) page = mmap_pages[i]; SetPageForeign(page, netif_page_release); page->index = i; + INIT_LIST_HEAD(&pending_inuse[i].list); } pending_cons = 0; @@ -1467,6 +1606,15 @@ static int __init netback_init(void) spin_lock_init(&net_schedule_list_lock); INIT_LIST_HEAD(&net_schedule_list); + + netbk_copy_skb_mode = NETBK_DONT_COPY_SKB; + if (MODPARM_copy_skb) { + if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_and_replace, + NULL, 0)) + netbk_copy_skb_mode = NETBK_ALWAYS_COPY_SKB; + else + netbk_copy_skb_mode = NETBK_DELAYED_COPY_SKB; + } netif_xenbus_init(); diff -r 3ac19fda0bc2 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Mar 02 12:11:52 2007 +0000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Tue Mar 20 14:08:40 2007 +1100 @@ -62,6 +62,7 @@ static int netback_probe(struct xenbus_d const char *message; struct xenbus_transaction xbt; int err; + int sg; struct backend_info *be = kzalloc(sizeof(struct backend_info), GFP_KERNEL); if (!be) { @@ -73,6 +74,10 @@ static int netback_probe(struct xenbus_d be->dev = dev; dev->dev.driver_data = be; + sg = 1; + if (netbk_copy_skb_mode == NETBK_ALWAYS_COPY_SKB) + sg = 0; + do { err = xenbus_transaction_start(&xbt); if (err) { @@ -80,14 +85,14 @@ static int netback_probe(struct xenbus_d goto fail; } - err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", sg); if (err) { message = "writing feature-sg"; goto abort_transaction; } err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", - "%d", 1); + "%d", sg); if (err) { message = "writing feature-gso-tcpv4"; goto abort_transaction; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-20 07:10 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
Sounds good. I''m not sure about putting this in for 3.0.5 since that''s only a couple of weeks away. Also I''m loathe to add yet another darned grant-table operation if we can avoid it. In this case I have my doubts it''s required, if we''re prepared to hook off a slow path in the page-fault handler (unhandled kernel fault). If we could get a callback into netback in that case for some range of kernel virtual addresses then we could fix up lazily in the case that an access races replacement of foreign mapping with local mapping. We would like this anyway so we can do zero-copy-to-the-wire (this may additionally require us to rev our interface to copy the packet headers in the rings, or we could see how far just a small grant-copy operation gets us). The idea would be to point the skb at a bunch of empty mappings in the netback mapping area: if the areas have to be filled in (due to firewall rules, filtering, PIO, etc) then we do it on demand from the page-fault handler. -- Keir On 20/3/07 04:46, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> These two patches remove the need for netloop by performing the > copying in netback and only if it is necessary. The rationale > is that most packets will be processed without delay allowing > them to be freed without copying at all. So instead of copying > every packet destined to dom0 we''ll only copy those that linger > longer than a specified amount of time (currently 0.5s). > > As it is netloop doesn''t take care of all delays anyway. For > instance packets delayed by qdisc or netfilter can hold up > resources without any limits. Also if bridging isn''t used > then traffic to dom0 does not even go through netloop. > > Testing shows that these patches do eliminate the copying for > bulk transfers. In fact, bulk transfer throughput from domU > to dom0 are increased by around 50%. Even when the copying > path is taken the performance is roughly equal to that of > netloop despite the unoptimised copying path._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-20 10:11 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 20, 2007 at 07:10:27AM +0000, Keir Fraser wrote:> Sounds good. I''m not sure about putting this in for 3.0.5 since that''s only > a couple of weeks away. Also I''m loathe to add yet another darned > grant-table operation if we can avoid it. In this case I have my doubts it''s > required, if we''re prepared to hook off a slow path in the page-fault > handler (unhandled kernel fault). If we could get a callback into netback in > that case for some range of kernel virtual addresses then we could fix upYes that''s a good idea and would avoid the grant table addition. My only concern would be future acceptance into the mainstream Linux kernel, especially on the domU side since I''d like to see this idea used for dom0 => domU as well. However, I''m certainly happy to explore doing it in the way you suggested. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-20 10:18 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 20/3/07 10:11, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> Yes that''s a good idea and would avoid the grant table addition. > > My only concern would be future acceptance into the mainstream > Linux kernel, especially on the domU side since I''d like to see > this idea used for dom0 => domU as well. > > However, I''m certainly happy to explore doing it in the way you > suggested.That''s quite a valid concern, but I think that the required addition to the #PF handler (certainly for i386 and x86/64) will be clean and small, and it will not affect #PF critical-path latencies. I''d be fairly optimistic about it getting accepted upstream, perhaps modulo concerns over whether we''d need to implement it for *every* architecture. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-20 10:22 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 20/3/07 04:46, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> These two patches remove the need for netloop by performing the > copying in netback and only if it is necessary. The rationale > is that most packets will be processed without delay allowing > them to be freed without copying at all. So instead of copying > every packet destined to dom0 we''ll only copy those that linger > longer than a specified amount of time (currently 0.5s).One other thought I had -- does the Linux network stack ever modify date residing outside the main skbuff data area (i.e., data in fragments)? If it ever does, wouldn''t your delayed copy in netback potentially race such updates? If the Linux network stack guarantees never to do such a thing, that would be nice for us. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-20 10:27 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 20/3/07 10:22, "Keir Fraser" <keir@xensource.com> wrote:> One other thought I had -- does the Linux network stack ever modify date > residing outside the main skbuff data area (i.e., data in fragments)? If it > ever does, wouldn''t your delayed copy in netback potentially race such > updates? If the Linux network stack guarantees never to do such a thing, > that would be nice for us. :-)Scrub that question: we already map the grants read-only, so the network stack can''t be trying to write to fragments. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Edmondson
2007-Mar-20 11:50 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 20, 2007 at 10:27:48AM +0000, Keir Fraser wrote:> On 20/3/07 10:22, "Keir Fraser" <keir@xensource.com> wrote: > > One other thought I had -- does the Linux network stack ever modify date > > residing outside the main skbuff data area (i.e., data in fragments)? If it > > ever does, wouldn''t your delayed copy in netback potentially race such > > updates? If the Linux network stack guarantees never to do such a thing, > > that would be nice for us. :-) > > Scrub that question: we already map the grants read-only, so the network > stack can''t be trying to write to fragments.The Solaris stack expects to be able to write to received packets and I have a hack in our front and back end drivers to permit a writable mapping of front->back buffers in preference to a copy (it''s declared using a feature flag, so a guest that doesn''t do it still works). I''ve resisted meddling in the fault handler, but I guess that I should spend some time there in case it''s a straightforward addition... dme. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-22 10:50 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 20, 2007 at 03:18:54AM -0700, Keir Fraser wrote:> > That''s quite a valid concern, but I think that the required addition to the > #PF handler (certainly for i386 and x86/64) will be clean and small, and it > will not affect #PF critical-path latencies. I''d be fairly optimistic about > it getting accepted upstream, perhaps modulo concerns over whether we''d need > to implement it for *every* architecture.OK I''ve given it a spin and it''s pretty straight-forward for x86 as you said. However, we''ll need a bit more work for ia64/ppc either on the kernel side or on the hypervisor side as there is no way currently to swap entries in the P2M table. Any better suggestions for dealing with those two? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-22 15:40 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> On Tue, Mar 20, 2007 at 03:18:54AM -0700, Keir Fraser wrote: >> >> That''s quite a valid concern, but I think that the required addition to the >> #PF handler (certainly for i386 and x86/64) will be clean and small, and it >> will not affect #PF critical-path latencies. I''d be fairly optimistic about >> it getting accepted upstream, perhaps modulo concerns over whether we''d need >> to implement it for *every* architecture. > > OK I''ve given it a spin and it''s pretty straight-forward for > x86 as you said. However, we''ll need a bit more work for > ia64/ppc either on the kernel side or on the hypervisor side > as there is no way currently to swap entries in the P2M table. > > Any better suggestions for dealing with those two?I didn''t look super closely at the precise set of steps on x86 either. Do you take a normal page of memory in the guest''s address space and simply give it an extra mapping in the netback area? Or do you take a page with no pseudophysical address and assign it a pseudophysical address corresponding to its place in the netback area? I certainly assumed (a), and I think that would work fine on ia64 and powerpc, as the page would already have a pseudophysical address? Maybe I''m misunderstanding something... :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-23 03:17 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Thu, Mar 22, 2007 at 03:40:59PM +0000, Keir Fraser wrote:> > I didn''t look super closely at the precise set of steps on x86 either. Do > you take a normal page of memory in the guest''s address space and simply > give it an extra mapping in the netback area? Or do you take a page with no > pseudophysical address and assign it a pseudophysical address corresponding > to its place in the netback area?I allocate a new page in dom0, i.e., an mfn/pfn pair, take away the mfn and give it to the original netback pfn which would have no mfn after the grant table entry has been unmapped. The newly mfn-less pfn is then put back into the netback area in the place of the old one which is now a normal page.> I certainly assumed (a), and I think that would work fine on ia64 and > powerpc, as the page would already have a pseudophysical address?a) would be good but it doesn''t look easy. The reason is that we have to fit the result into skb frags which takes a page_struct and kmaps it on demand. In other words once the skb is in injected into the stack the pseudo-physical address has to remain fixed whether we have a grant table entry mapped there or not. So changing the guest PTE in the auto-translted case (ia64/ppc and EPT/NPT in future) isn''t possible. As we either have to change the guest PTE or the host P2M, this leaves us with only the latter as an option. We can''t do that currently as the hypervisor doesn''t have such an operation. So it looks like we''ll need a new hypercall (although not a grant table operation) to do it this way on an auto-translated dom0. PS Check out the To/Cc fields, the number of Keirs is getting out of control :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-23 10:32 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 23/3/07 03:17, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> a) would be good but it doesn''t look easy. The reason is that we have > to fit the result into skb frags which takes a page_struct and kmaps it > on demand. In other words once the skb is in injected into the stack > the pseudo-physical address has to remain fixed whether we have a grant > table entry mapped there or not. > > So changing the guest PTE in the auto-translted case (ia64/ppc and EPT/NPT > in future) isn''t possible.It still sounds like it would work. The fragment''s ''struct page *'' will map to a particular kernel virtual addres. That kernel virtual address can be transformed by arithmetic back to the ''struct page *''. The fact that the pte that maps that kernel virtual address actually points over at some other poor unsuspecting pfn (which already has a struct page *, thank you very much) doesn''t actually matter, does it? Does anyone ever go look at the pte contents and try to work out the ''struct page *'' from that? I doubt it -- or our netback driver would not work right now on x86 (as the mach-to-phys entry is garbage from the p.o.v. of dom0, so any attempt to translate the pte contents into something meaningful in pseudophys space would fail). PS. I''ve stripped my name from the To/Cc lists! :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-23 11:42 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Fri, Mar 23, 2007 at 10:32:37AM +0000, Keir Fraser wrote:> > It still sounds like it would work. The fragment''s ''struct page *'' will map > to a particular kernel virtual addres. That kernel virtual address can be > transformed by arithmetic back to the ''struct page *''. The fact that the pte > that maps that kernel virtual address actually points over at some other > poor unsuspecting pfn (which already has a struct page *, thank you very > much) doesn''t actually matter, does it? Does anyone ever go look at the pte > contents and try to work out the ''struct page *'' from that? I doubt it -- or > our netback driver would not work right now on x86 (as the mach-to-phys > entry is garbage from the p.o.v. of dom0, so any attempt to translate the > pte contents into something meaningful in pseudophys space would fail).You''re right, it''s not as bad as I thought. I was worried about this because if anybody calls virt_to_page or something equivalent on that virtual address they''ll get the old struct page/pseudo-physical address rather than the new one. However, this is OK because as long as the access is within the guest they''ll still have to convert it back to that virtual address. The only catch is that I wanted to have the same lazy-copy mechanism for dom0=>domU. In that case we can''t rely on domU to trigger the copy so dom0 (or the driver domain) will need to initiate it. The easiest way to do that from outside domU is through changing its P2M table. I suppose doing it this way doesn''t necessarily preclude a different solution for dom0=>domU. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-23 11:47 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 23/3/07 11:42, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> The only catch is that I wanted to have the same lazy-copy > mechanism for dom0=>domU. In that case we can''t rely on domU > to trigger the copy so dom0 (or the driver domain) will need > to initiate it. The easiest way to do that from outside domU > is through changing its P2M table.Hmmmmm..... :-)> I suppose doing it this way doesn''t necessarily preclude a > different solution for dom0=>domU.We should cross this bridge when we come to it. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Pratt
2007-Mar-23 13:24 UTC
RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> The only catch is that I wanted to have the same lazy-copy > mechanism for dom0=>domU. In that case we can''t rely on domU > to trigger the copy so dom0 (or the driver domain) will need > to initiate it. The easiest way to do that from outside domU > is through changing its P2M table. > > I suppose doing it this way doesn''t necessarily preclude a > different solution for dom0=>domU.I think we can make the guest TX path work rather better by putting the tx packet header off into a separate fragment and placing it within one of a set of pages that are ''permanently'' mapped between the front and back ends. The backend would not typically grant map the payload fragment(s), but only do so if it catches a page fault trying to access them. The way I''d do the ''permanent'' mappings is to have a bit in the grant reference passed across the ring that indicates that the backend is free to leave the grant mapped. The bit should be set whenever the grant entry is used (the front end can''t mix and match). The backend thus just needs to keep track of which grant refs it has mapped via this permanent scheme so that it can spot (valid) reuse, and clean up afterwards. Since the front end will typically use a block of grant refs for this purpose the backend code can be optimised for a simple range check. The frontend is free to pack multiple header fragments on to a page to save memory (rather than using larger payload sized frags). Jose Renato investigated something similar to this last year, but AFAIK we didn''t take it the whole way of introducing permanent mappings with no (default) mapping of payloads. BTW: when adding fragment support you ''borrowed'' the descriptor id field which wasn''t being used at the time. This means we can no longer return free buffers to the rx ring out of order. This is rather unfortunate as its very useful for dealing with some kinds of smart NIC. I expect we''ll have to introduce a net ring v2 format in the not too distant future unless we can think of a cunning way of stealing them back? Thanks, Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Santos, Jose Renato G
2007-Mar-23 23:07 UTC
RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> -----Original Message----- > From: Ian Pratt [mailto:Ian.Pratt@cl.cam.ac.uk] > Sent: Friday, March 23, 2007 6:24 AM > To: Herbert Xu; Keir Fraser > Cc: Xen Development Mailing List; Santos, Jose Renato G; > ian.pratt@cl.cam.ac.uk > Subject: RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy > copying in netback > > > The only catch is that I wanted to have the same lazy-copy > mechanism > > for dom0=>domU. In that case we can''t rely on domU to trigger the > > copy so dom0 (or the driver domain) will need to initiate it. The > > easiest way to do that from outside domU is through > changing its P2M > > table. > > > > I suppose doing it this way doesn''t necessarily preclude a > different > > solution for dom0=>domU. > > I think we can make the guest TX path work rather better by > putting the tx packet header off into a separate fragment and > placing it within one of a set of pages that are > ''permanently'' mapped between the front and back ends. The > backend would not typically grant map the payload > fragment(s), but only do so if it catches a page fault trying > to access them. > > The way I''d do the ''permanent'' mappings is to have a bit in > the grant reference passed across the ring that indicates > that the backend is free to leave the grant mapped. The bit > should be set whenever the grant entry is used (the front end > can''t mix and match). The backend thus just needs to keep > track of which grant refs it has mapped via this permanent > scheme so that it can spot (valid) reuse, and clean up > afterwards. Since the front end will typically use a block of > grant refs for this purpose the backend code can be optimised > for a simple range check. The frontend is free to pack > multiple header fragments on to a page to save memory (rather > than using larger payload sized frags). > > Jose Renato investigated something similar to this last year, > but AFAIK we didn''t take it the whole way of introducing > permanent mappings with no (default) mapping of payloads. > >The way I did it was slightly different. The shared pages used to store the packet headers are mapped at driver initialization at the same time the I/O ring is initialized (You can think of these pages as an extension to the I/O ring and mapped and unmapped at the same time). This shared header area is divided into fixed size slots large enough to store all packet headers (MAC/IP/TCP). Netfront copies the header of the packet to a free slot on the shared header buffer area and passes the offset of the header on the first I/O ring request for the packet (instead of passing a grant reference). Payload data are transferred as additional requests on I/O ring using grant references as usual. Netback assumes the first request of all packets point to headers in the shared area and copies the header into the main socket buffer data area. Payload data (in additional I/O ring requests) are associated with skb fragments but not mapped into dom0 memory unless a page fault happens (indicating that dom0 is trying to access that data). If the packet is sent out on the wire, the payload is never touched by the CPU (assuming no netfilter rules, etc...) and no memory mapping is necessary. I have put this code on the shelf for a while as I have been busy on analyzing/optimizing the receive path. I plan to port my patch to the current xen-unstable and get some performance data for the Xen summit after I finish my experiments on the receive side. I think we should include this on future releases, but I would like to hold on until I have some other results that may influence how we do this. This would be a good topic for discussion on the summit Regards Renato> BTW: when adding fragment support you ''borrowed'' the > descriptor id field which wasn''t being used at the time. > This means we can no longer return free buffers to the rx > ring out of order. This is rather unfortunate as its very > useful for dealing with some kinds of smart NIC. I expect > we''ll have to introduce a net ring v2 format in the not too > distant future unless we can think of a cunning way of > stealing them back? > > Thanks, > Ian > > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Pratt
2007-Mar-23 23:29 UTC
RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> > I think we can make the guest TX path work rather better by > > putting the tx packet header off into a separate fragment and > > placing it within one of a set of pages that are > > ''permanently'' mapped between the front and back ends. The > > backend would not typically grant map the payload > > fragment(s), but only do so if it catches a page fault trying > > to access them. > > > > The way I''d do the ''permanent'' mappings is to have a bit in > > the grant reference passed across the ring that indicates > > that the backend is free to leave the grant mapped. The bit > > should be set whenever the grant entry is used (the front end > > can''t mix and match). The backend thus just needs to keep > > track of which grant refs it has mapped via this permanent > > scheme so that it can spot (valid) reuse, and clean up > > afterwards. Since the front end will typically use a block of > > grant refs for this purpose the backend code can be optimised > > for a simple range check. The frontend is free to pack > > multiple header fragments on to a page to save memory (rather > > than using larger payload sized frags). > > The way I did it was slightly different. > The shared pages used to store the packet headers are > mapped at driver initialization at the same time the > I/O ring is initialized (You can think of these pages > as an extension to the I/O ring and mapped and unmapped at > the same time). This shared header area > is divided into fixed size slots large enough to store all > packet headers (MAC/IP/TCP). Netfront copies the > header of the packet to a free slot on the shared header > buffer area and passes the offset of the header on the > first I/O ring request for the packet (instead of > passing a grant reference). Payload data are transferred > as additional requests on I/O ring using grant > references as usual. Netback assumes the first request > of all packets point to headers in the shared area and > copies the header into the main socket buffer data area. > Payload data (in additional I/O ring requests) are associated > with skb fragments but not mapped into dom0 memory > unless a page fault happens > (indicating that dom0 is trying to access that data). > If the packet is sent out on the wire, the payload is never > touched by the CPU (assuming no netfilter rules, etc...) > and no memory mapping is necessary.Yep, the approach is similar, but I think having the ''permanent'' grants mechanism is probably a bit more flexible for managing the header fragments. It would be good if you could post your patch as it has relevance to what Herbert is currently working on. BTW: I''m pretty convinced its time to redefine the net ring format to ensure each fragment has an id, flags, grant, offset and length fields. Things are just getting messy with the current format. Best, Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-25 11:41 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Fri, Mar 23, 2007 at 10:42:17PM +1100, Herbert Xu wrote:> On Fri, Mar 23, 2007 at 10:32:37AM +0000, Keir Fraser wrote: > > > > It still sounds like it would work. The fragment''s ''struct page *'' will map > > to a particular kernel virtual addres. That kernel virtual address can be > > transformed by arithmetic back to the ''struct page *''. The fact that the pte > > that maps that kernel virtual address actually points over at some other > > poor unsuspecting pfn (which already has a struct page *, thank you very > > much) doesn''t actually matter, does it? Does anyone ever go look at the pte > > contents and try to work out the ''struct page *'' from that? I doubt it -- or > > our netback driver would not work right now on x86 (as the mach-to-phys > > entry is garbage from the p.o.v. of dom0, so any attempt to translate the > > pte contents into something meaningful in pseudophys space would fail). > > You''re right, it''s not as bad as I thought.Actually looking into it further for ia64 there is still a problem. We use large pages for the identity mapping on ia64. In order to modify the PTE we''d have to break the large pages. Am I missing something obvious? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-25 12:27 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 25/3/07 12:41, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> Actually looking into it further for ia64 there is still a problem. > We use large pages for the identity mapping on ia64. In order to > modify the PTE we''d have to break the large pages. > > Am I missing something obvious?No. Actually I changed my mind -- your original scheme (strip the original pseudophysical frame of its RAM, and add that original pseudophys frame into the netback set of frames) has the very nice advantage that we don''t end up with queued packets taking space in the netback frame pool. I missed that your scheme essentially switches the roles of a normal RAM frame and a netback frame, so that you end up with a freed-up netback frame at the end of the swap. Sorry about that. So we''re back to the problem of doing this switch when Xen is doing the p2m translation (as on ia64 for example). On x86 we have a XENMEM_add_to_physmap hypercall. This could be generalised to other architectures and extended. For example, we could add a XENMAPSPACE_gpfn -- which would mean take the ''thing'' currently mapped at the specified gpfn and map it at the new gpfn location instead. I''d certainly personally rather see add_to_physmap() extended than add extra single-purpose crap to the grant-table interfaces. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-26 02:19 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Sun, Mar 25, 2007 at 01:27:04PM +0100, Keir Fraser wrote:> > So we''re back to the problem of doing this switch when Xen is doing the p2m > translation (as on ia64 for example). On x86 we have a XENMEM_add_to_physmap > hypercall. This could be generalised to other architectures and extended. > For example, we could add a XENMAPSPACE_gpfn -- which would mean take the > ''thing'' currently mapped at the specified gpfn and map it at the new gpfn > location instead. I''d certainly personally rather see add_to_physmap() > extended than add extra single-purpose crap to the grant-table interfaces.I''ve had a look at this and it seems that 1) We don''t have the underlying operations suited for this. We need something that can replace a p2m entry atomically and more importantly swap two p2m entries rather than setting one and unmapping the other. The former is because we can''t easily process p2m page faults in the guest. The latter is because we stlil need to unmap the grant table entry after this operation so we have to keep the entry around. This is actually one of the reasons I picked the grant tables interface originally in that we could unmap it at the same time rather than doing a full swap followed by an unmap. So are you OK with adding underlying operations that allows a full swap of two p2m entries? This would then be used as follows in translated mode: a) new_addr = alloc_page b) memcpy(new_addr, addr, len) c) p2m_swap(__pa(new_addr), __pa(addr)) d) grant_unmap(__pa(new_addr)) 2) I''m unsure what you want me to do for non-translated mode, i.e., x86. Are you happy with the new grant table operation or do you want to follow a swap mode as above? The swapping code would look like: a) new_addr = alloc_page b) memcpy(new_addr, addr, len) c) pte_swap(new_addr, addr) d) grant_unmap(new_addr) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-26 18:36 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 26/3/07 03:19, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> We need something that can replace a p2m entry atomically and more > importantly swap two p2m entries rather than setting one and unmapping > the other. The former is because we can''t easily process p2m page > faults in the guest. The latter is because we stlil need to unmap the > grant table entry after this operation so we have to keep the entry > around.Can''t we wrap the ''swap around'' critical section in an irq-safe spinlock? All we''d need to do from the page-fault handler then is a barrier on that spinlock (i.e, wait for it to be released). Netback can simply copy the page to new memory frame, unmap the grant, then relocate the new memory frame''s pseudophysical address. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-26 21:08 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Mon, Mar 26, 2007 at 07:36:16PM +0100, Keir Fraser wrote:> On 26/3/07 03:19, "Herbert Xu" <herbert@gondor.apana.org.au> wrote: > > > We need something that can replace a p2m entry atomically and more > > importantly swap two p2m entries rather than setting one and unmapping > > the other. The former is because we can''t easily process p2m page > > faults in the guest. The latter is because we stlil need to unmap the > > grant table entry after this operation so we have to keep the entry > > around. > > Can''t we wrap the ''swap around'' critical section in an irq-safe spinlock? > All we''d need to do from the page-fault handler then is a barrier on that > spinlock (i.e, wait for it to be released). Netback can simply copy the page > to new memory frame, unmap the grant, then relocate the new memory frame''s > pseudophysical address.That works fine for the x86 case. But when it''s auto-translated, you won''t even get a page fault in the guest because the guest PTE is unchanged and completely valid. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Santos, Jose Renato G
2007-Mar-26 23:58 UTC
RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> > Yep, the approach is similar, but I think having the > ''permanent'' grants mechanism is probably a bit more flexible > for managing the header fragments. >Not sure why it would be more flexible. Do you mean dealing with headers that are in fragments? If headers are in fragments they will be linearized when copied into the shared header area by netfront. Also, it seems to me that having a fixed set of pages at initialization that never change is easier to manage. I probably did not understand you correctly...> It would be good if you could post your patch as it has > relevance to what Herbert is currently working on. >Ok. Here are the patches. A few disclaimers. They were not intended to be distributed as is, so there may be some cleanups/optimizations needed, although they were tested on an older version and used to work. They also do not apply cleanly to the current xen-unstable. I am not sure they will be very helpful to Herbert, but since you asked, here they are.> BTW: I''m pretty convinced its time to redefine the net ring > format to ensure each fragment has an id, flags, grant, > offset and length fields. > Things are just getting messy with the current format.I agree that we will need a new format soon, but I am not sure I understood your concerns above. Currently all fragments do have an id, flag, grant, offset and length fields. Maybe you mean that we need a better way to represent multiple fragments belonging to the same packet. Any way, the performance analysis that I am doing may indicate that we might need some architectural changes on the device model. I would like to discuss this before we settle on a format. I am trying to get some data in time for the summit, but I am racing the clock... Regards Renato =======================Patches tx_header_copy: Create and uses set of shared pages between netfront and netback to copy packet headers on network TX path. tx_lazy_map: Do not create a host mapping for packet fragments Tx_handle_page_fault: Handle page fault caused by dom0 trying to access unmapped skb fragment. Creates the mapping in that case> > Best, > Ian > > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 00:33 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 26/3/07 22:08, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>> Can''t we wrap the ''swap around'' critical section in an irq-safe spinlock? >> All we''d need to do from the page-fault handler then is a barrier on that >> spinlock (i.e, wait for it to be released). Netback can simply copy the page >> to new memory frame, unmap the grant, then relocate the new memory frame''s >> pseudophysical address. > > That works fine for the x86 case. But when it''s auto-translated, > you won''t even get a page fault in the guest because the guest PTE > is unchanged and completely valid.How about you invalidate the PTE for the duration of the critical section. It''s a bit skanky, but would work around this issue quite nicely! -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 00:35 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 01:33:58AM +0100, Keir Fraser wrote:> > >> Can''t we wrap the ''swap around'' critical section in an irq-safe spinlock? > >> All we''d need to do from the page-fault handler then is a barrier on that > >> spinlock (i.e, wait for it to be released). Netback can simply copy the page > >> to new memory frame, unmap the grant, then relocate the new memory frame''s > >> pseudophysical address. > > > > That works fine for the x86 case. But when it''s auto-translated, > > you won''t even get a page fault in the guest because the guest PTE > > is unchanged and completely valid. > > How about you invalidate the PTE for the duration of the critical section. > It''s a bit skanky, but would work around this issue quite nicely!Maybe I am missing something. Because I thought we''ve agreed that we can''t (or don''t want to) do the PTE trick because of the large pages (16MB at least) used for the identity mapping. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 00:45 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 01:33, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:>> That works fine for the x86 case. But when it''s auto-translated, >> you won''t even get a page fault in the guest because the guest PTE >> is unchanged and completely valid. > > How about you invalidate the PTE for the duration of the critical section. > It''s a bit skanky, but would work around this issue quite nicely!As an arguably better alternative, perhaps we could simply make XENMEM_add_to_physmap an atomic switch operation (from old mapping to new mapping)? That is probably not a very hard thing to guarantee. So you would XENMEM_add_to_physmap over the top of the existing grant mapping, and expect the grant to simply ''fall away'' as a result of this operation (i.e., implicit removal from physmap triggers release of the grant). This sounds neat to me. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 00:46 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 01:35, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>> How about you invalidate the PTE for the duration of the critical section. >> It''s a bit skanky, but would work around this issue quite nicely! > > Maybe I am missing something. Because I thought we''ve agreed that > we can''t (or don''t want to) do the PTE trick because of the large > pages (16MB at least) used for the identity mapping.Oh yes, good point. Well, see my follow-up email for another suggestion. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 00:52 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 01:45:54AM +0100, Keir Fraser wrote:> > As an arguably better alternative, perhaps we could simply make > XENMEM_add_to_physmap an atomic switch operation (from old mapping to new > mapping)? That is probably not a very hard thing to guarantee. > > So you would XENMEM_add_to_physmap over the top of the existing grant > mapping, and expect the grant to simply ''fall away'' as a result of this > operation (i.e., implicit removal from physmap triggers release of the > grant). > > This sounds neat to me.It sounds good but looking further there is still an issue. The problem is that the gpfn itself is not sufficient if we want to release the grant table entry. We need to have the grant handle too in order to free the entry completely. If we did provide the handle then this will just be the same operation that I was trying to add except that now we''re adding it under XENMEM_add_to_physmap :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 01:02 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 01:52, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> It sounds good but looking further there is still an issue. > > The problem is that the gpfn itself is not sufficient if we want to > release the grant table entry. We need to have the grant handle too > in order to free the entry completely. If we did provide the handle > then this will just be the same operation that I was trying to add > except that now we''re adding it under XENMEM_add_to_physmap :)Well, it could be a bit more work to change, but what does a p2m entry look like for a grant mapping? The p2m table should be a very convenient place to have typed entries, with one of those types being grant-mapping, and including the maptrack handle. The MFN can be derived from the maptrack handle where that is required. This would then conveniently be enough information to perform implicit unmaps (the lack of support of which is an annoying limitation for non-auto-translate guests which it would be really nice to avoid for auto-translate guests, particularly since it ought to be not really all that hard afaics). This is pretty much the avenue (hopefully not a dead end!) that I pushed Isaku Yamahata down when he queried physmap semantics a week or so ago. Sadly we''ve had no need for grant mappings in x86 auto-translate guests yet, so there''s no example x86 implementation to lead the way. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 01:08 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 02:02:39AM +0100, Keir Fraser wrote:> > Well, it could be a bit more work to change, but what does a p2m entry look > like for a grant mapping? The p2m table should be a very convenient place to > have typed entries, with one of those types being grant-mapping, and > including the maptrack handle. The MFN can be derived from the maptrack > handle where that is required. This would then conveniently be enough > information to perform implicit unmaps (the lack of support of which is an > annoying limitation for non-auto-translate guests which it would be really > nice to avoid for auto-translate guests, particularly since it ought to be > not really all that hard afaics).OK I''ll look into this for the auto-translated case. BTW, what do you want me to do for the non-translated case? Should I go back to the original grant table operation or use the guest page-fault method? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 01:17 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 11:08:10AM +1000, Herbert Xu wrote:> On Tue, Mar 27, 2007 at 02:02:39AM +0100, Keir Fraser wrote: > > > > Well, it could be a bit more work to change, but what does a p2m entry look > > like for a grant mapping? The p2m table should be a very convenient place to > > have typed entries, with one of those types being grant-mapping, and > > including the maptrack handle. The MFN can be derived from the maptrack > > handle where that is required. This would then conveniently be enough > > information to perform implicit unmaps (the lack of support of which is an > > annoying limitation for non-auto-translate guests which it would be really > > nice to avoid for auto-translate guests, particularly since it ought to be > > not really all that hard afaics). > > OK I''ll look into this for the auto-translated case.Hmm where are we going to get space to store the grant handle? The p2m table itself is just a page table. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Isaku Yamahata
2007-Mar-27 03:34 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 02:02:39AM +0100, Keir Fraser wrote:> On 27/3/07 01:52, "Herbert Xu" <herbert@gondor.apana.org.au> wrote: > > > It sounds good but looking further there is still an issue. > > > > The problem is that the gpfn itself is not sufficient if we want to > > release the grant table entry. We need to have the grant handle too > > in order to free the entry completely. If we did provide the handle > > then this will just be the same operation that I was trying to add > > except that now we''re adding it under XENMEM_add_to_physmap :) > > Well, it could be a bit more work to change, but what does a p2m entry look > like for a grant mapping? The p2m table should be a very convenient place to > have typed entries, with one of those types being grant-mapping, and > including the maptrack handle. The MFN can be derived from the maptrack > handle where that is required. This would then conveniently be enough > information to perform implicit unmaps (the lack of support of which is an > annoying limitation for non-auto-translate guests which it would be really > nice to avoid for auto-translate guests, particularly since it ought to be > not really all that hard afaics).The ia64 p2m entry is 64bit and some bits are already used so that there is no room for storing grant table related handle. I don''t think storing the maptrack handle instead of MFN is good idea because it would make the p2m more complicated. Is it possible to look up grant table reference by MFN effectively? Probably MFN-keyed hash table whose entry is struct active_grant_table. I''m not sure whether it is effective compared to guest providing grant refernece, though.> This is pretty much the avenue (hopefully not a dead end!) that I pushed > Isaku Yamahata down when he queried physmap semantics a week or so ago. > Sadly we''ve had no need for grant mappings in x86 auto-translate guests yet, > so there''s no example x86 implementation to lead the way. :-)The ia64 p2m semantic is now fixed in xen-ia64-unstable.hg so that page_is_removable() arch hook can be removed. I''ll send the patch to remove the hook after the next pull. -- yamahata _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Isaku Yamahata
2007-Mar-27 03:41 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Mon, Mar 26, 2007 at 12:19:47PM +1000, Herbert Xu wrote:> This is actually one of the reasons I picked the grant tables interface > originally in that we could unmap it at the same time rather than doing > a full swap followed by an unmap. > > So are you OK with adding underlying operations that allows a full swap > of two p2m entries? This would then be used as follows in translated mode: > > a) new_addr = alloc_page > b) memcpy(new_addr, addr, len) > c) p2m_swap(__pa(new_addr), __pa(addr)) > d) grant_unmap(__pa(new_addr))Swaping the p2m entry on ia64 is easy. However the following tlb flush cost for the newly allocated page is extremly high on ia64. Can alloc_page() at step a) be avoided? For example batched page allocation and reusing allocated pfns by setting PG_foreign. And I want arch hook before use/release those pages to reduce tlb flush cost. -- yamahata _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 05:45 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 12:41:25PM +0900, Isaku Yamahata wrote:> > Swaping the p2m entry on ia64 is easy.Great!> However the following tlb flush cost for the newly allocated page > is extremly high on ia64.It doesn''t matter here because this is the slow path which we should never enter unless something is misbehaving or deliberately slow. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 07:36 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 04:34, "Isaku Yamahata" <yamahata@valinux.co.jp> wrote:> The ia64 p2m entry is 64bit and some bits are already used > so that there is no room for storing grant table related handle. > I don''t think storing the maptrack handle instead of MFN is good idea > because it would make the p2m more complicated. > > Is it possible to look up grant table reference by MFN effectively? > Probably MFN-keyed hash table whose entry is struct active_grant_table. > I''m not sure whether it is effective compared to guest providing > grant refernece, though.This is quite a shame, since the swap-grant-mapping-for-memory hypercall simply is not very nice. What if we wanted to swap a grant mapping for another grant mapping in future? Or some other kind of mapping? We end up needing a new, complicated, grant-table hypercall every time. Or in six months time we decide this wasn''t such a good idea, implement another scheme, but have this special-purpose grant-table hypercall hanging around unused forever more. Could we shatter the ia64 guest large mappings easily? They have no benefit on Xen anyway apart from some inconsequential memory saving. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 07:44 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 08:36:39AM +0100, Keir Fraser wrote:> > This is quite a shame, since the swap-grant-mapping-for-memory hypercall > simply is not very nice. What if we wanted to swap a grant mapping for > another grant mapping in future? Or some other kind of mapping? We end up > needing a new, complicated, grant-table hypercall every time. Or in six > months time we decide this wasn''t such a good idea, implement another > scheme, but have this special-purpose grant-table hypercall hanging around > unused forever more. > > Could we shatter the ia64 guest large mappings easily? They have no benefit > on Xen anyway apart from some inconsequential memory saving.There is another way. Instead of expanding add_to_physmap which doesn''t quite fit our semantics, let''s add a new operation that *actually* swaps two p2m entries. Then it can be used as follows: a) new_addr = alloc_page b) memcpy(new_addr, addr, len) c) p2m_swap(__pa(new_addr), __pa(addr)) d) grant_unmap(__pa(new_addr)) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 07:51 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 08:36, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:> This is quite a shame, since the swap-grant-mapping-for-memory hypercall > simply is not very nice. What if we wanted to swap a grant mapping for > another grant mapping in future? Or some other kind of mapping? We end up > needing a new, complicated, grant-table hypercall every time. Or in six > months time we decide this wasn''t such a good idea, implement another > scheme, but have this special-purpose grant-table hypercall hanging around > unused forever more. > > Could we shatter the ia64 guest large mappings easily? They have no benefit > on Xen anyway apart from some inconsequential memory saving.Also bear in mind the plan to avoid the grant mapping at all in the direct-to-wire transmit case. This also will rely on a #PF (or some other fault, potentially) if the guest tries to access that pseudophysical page of memory. If we can work out a way to make this work on ia64 then the delayed-copy implementation problems are solved too. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 07:53 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 08:51:15AM +0100, Keir Fraser wrote:> > Also bear in mind the plan to avoid the grant mapping at all in the > direct-to-wire transmit case. This also will rely on a #PF (or some other > fault, potentially) if the guest tries to access that pseudophysical page of > memory. If we can work out a way to make this work on ia64 then the > delayed-copy implementation problems are solved too.That''s to avoid the TLB flush costs, right? On ia64 (or anything other than x86-32 for that matter) you have enough virtual address bits to map the entire physical memory into dom0 (or the driver domain) so that could also be used to avoid TLB flushes. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 07:59 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 05:53:18PM +1000, Herbert Xu wrote:> On Tue, Mar 27, 2007 at 08:51:15AM +0100, Keir Fraser wrote: > > > > Also bear in mind the plan to avoid the grant mapping at all in the > > direct-to-wire transmit case. This also will rely on a #PF (or some other > > fault, potentially) if the guest tries to access that pseudophysical page of > > memory. If we can work out a way to make this work on ia64 then the > > delayed-copy implementation problems are solved too.Even with this we could still keep the large pages by using page faults on the host page table rather than the guest page table. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 08:03 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 08:53, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>> Also bear in mind the plan to avoid the grant mapping at all in the >> direct-to-wire transmit case. This also will rely on a #PF (or some other >> fault, potentially) if the guest tries to access that pseudophysical page of >> memory. If we can work out a way to make this work on ia64 then the >> delayed-copy implementation problems are solved too. > > That''s to avoid the TLB flush costs, right? > > On ia64 (or anything other than x86-32 for that matter) you have > enough virtual address bits to map the entire physical memory > into dom0 (or the driver domain) so that could also be used to > avoid TLB flushes.Once we have IOMMU support we want to completely close off driver domain access to arbitrary memory (which they currently have via DMA). Giving a mapping of all physical memory would be a step in quite the opposite direction. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 08:10 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 08:59, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>>> Also bear in mind the plan to avoid the grant mapping at all in the >>> direct-to-wire transmit case. This also will rely on a #PF (or some other >>> fault, potentially) if the guest tries to access that pseudophysical page of >>> memory. If we can work out a way to make this work on ia64 then the >>> delayed-copy implementation problems are solved too. > > Even with this we could still keep the large pages by using page > faults on the host page table rather than the guest page table.Sure. How would you expose those to the guest? If that can be solved we can define grant-table unmap to switch the p2m entry into an explicitly invalid state and we''re away. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 08:12 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 08:44, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> There is another way. Instead of expanding add_to_physmap which > doesn''t quite fit our semantics, let''s add a new operation that > *actually* swaps two p2m entries. Then it can be used as follows: > > a) new_addr = alloc_page > b) memcpy(new_addr, addr, len) > c) p2m_swap(__pa(new_addr), __pa(addr)) > d) grant_unmap(__pa(new_addr))I like this a bit more than the grant-table hypercall, mainly because I suspect the hypercall implementation is simpler. I''m not sure how easy it is to make atomic though! Would there be enough locking to ensure that one of the p2m entries being present in both (or neither) locations at some intermediate point in time would not matter? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 08:13 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 09:10:40AM +0100, Keir Fraser wrote:> > > Even with this we could still keep the large pages by using page > > faults on the host page table rather than the guest page table. > > Sure. How would you expose those to the guest? If that can be solved we can > define grant-table unmap to switch the p2m entry into an explicitly invalid > state and we''re away.I was actually thinking of Jose''s scheme where it''s mapped in on demand. So you would still do your grant table map operation, but instead of mapping it in as we do now it would simply notify the hypervisor of the intention to map this page, and the hypervisor can then map it in when it actually faults. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 08:15 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 09:12:09AM +0100, Keir Fraser wrote:> On 27/3/07 08:44, "Herbert Xu" <herbert@gondor.apana.org.au> wrote: > > > There is another way. Instead of expanding add_to_physmap which > > doesn''t quite fit our semantics, let''s add a new operation that > > *actually* swaps two p2m entries. Then it can be used as follows: > > > > a) new_addr = alloc_page > > b) memcpy(new_addr, addr, len) > > c) p2m_swap(__pa(new_addr), __pa(addr)) > > d) grant_unmap(__pa(new_addr)) > > I like this a bit more than the grant-table hypercall, mainly because I > suspect the hypercall implementation is simpler. I''m not sure how easy it is > to make atomic though! Would there be enough locking to ensure that one of > the p2m entries being present in both (or neither) locations at some > intermediate point in time would not matter?Well for my purpose I don''t need both entries to be written atomically. As long as each entry is replaced atomically wrt itself it''ll be good enough for the usage scenario above. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Isaku Yamahata
2007-Mar-27 08:15 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 05:44:51PM +1000, Herbert Xu wrote:> On Tue, Mar 27, 2007 at 08:36:39AM +0100, Keir Fraser wrote: > > > > This is quite a shame, since the swap-grant-mapping-for-memory hypercall > > simply is not very nice. What if we wanted to swap a grant mapping for > > another grant mapping in future? Or some other kind of mapping? We end up > > needing a new, complicated, grant-table hypercall every time. Or in six > > months time we decide this wasn''t such a good idea, implement another > > scheme, but have this special-purpose grant-table hypercall hanging around > > unused forever more. > > > > Could we shatter the ia64 guest large mappings easily? They have no benefit > > on Xen anyway apart from some inconsequential memory saving. > > There is another way. Instead of expanding add_to_physmap which > doesn''t quite fit our semantics, let''s add a new operation that > *actually* swaps two p2m entries. Then it can be used as follows: > > a) new_addr = alloc_page > b) memcpy(new_addr, addr, len) > c) p2m_swap(__pa(new_addr), __pa(addr)) > d) grant_unmap(__pa(new_addr))I''m thinking same thing. This would work. -- yamahata _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Isaku Yamahata
2007-Mar-27 08:22 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 05:53:18PM +1000, Herbert Xu wrote:> On Tue, Mar 27, 2007 at 08:51:15AM +0100, Keir Fraser wrote: > > > > Also bear in mind the plan to avoid the grant mapping at all in the > > direct-to-wire transmit case. This also will rely on a #PF (or some other > > fault, potentially) if the guest tries to access that pseudophysical page of > > memory. If we can work out a way to make this work on ia64 then the > > delayed-copy implementation problems are solved too. > That''s to avoid the TLB flush costs, right?Xen/IA64 has optimization there. Xen/ia64 tracks tlb inserts on grant table mapped page. If there was no tlb insert(i.e. direct-to-wire transmit case), it doesn''t issue tlb flush, but only clears the p2m entry when grant table unmap. On Xen/IA64 case, something like copy-on-demand-access can be implemented as copy-on-tlb-insert in xen, I suppose. -- yamahata _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 08:31 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 09:13, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> I was actually thinking of Jose''s scheme where it''s mapped in on > demand. So you would still do your grant table map operation, > but instead of mapping it in as we do now it would simply notify > the hypervisor of the intention to map this page, and the hypervisor > can then map it in when it actually faults.That''s not quite how Jose''s patch works. The lazy mapping is done by the guest in his patch. If we''re not going to go as far as supporting guest-visible ''p2m faults'' then I think overall your original patch is probably the best approach so far. If I view the new hypercall as a special case of the existing unmap (which is an unmap-to-zero-pte special case) then it''s not so bad. At least there should be very little new code in the hypervisor needed to implement it. We have a little while longer to think about this since the patches aren''t for 3.0.5. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 09:20 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 09:31:15AM +0100, Keir Fraser wrote:> > > I was actually thinking of Jose''s scheme where it''s mapped in on > > demand. So you would still do your grant table map operation, > > but instead of mapping it in as we do now it would simply notify > > the hypervisor of the intention to map this page, and the hypervisor > > can then map it in when it actually faults. > > That''s not quite how Jose''s patch works. The lazy mapping is done by the > guest in his patch.That would mean the breaking of large pages on ia64 or any other platform that uses them. Is there any benefit for not doing it in the hypervisor?> If we''re not going to go as far as supporting guest-visible ''p2m faults'' > then I think overall your original patch is probably the best approach so > far. If I view the new hypercall as a special case of the existing unmap > (which is an unmap-to-zero-pte special case) then it''s not so bad. At least > there should be very little new code in the hypervisor needed to implement > it. We have a little while longer to think about this since the patches > aren''t for 3.0.5.OK. How do you want to proceed from here then? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 09:58 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 10:20, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>> That''s not quite how Jose''s patch works. The lazy mapping is done by the >> guest in his patch. > > That would mean the breaking of large pages on ia64 or any other > platform that uses them. Is there any benefit for not doing it > in the hypervisor?Yet another bloody grant-table hypercall! Also there''s the question of where this lazy state gets squirrelled away until the demand mapping occurs. Perhaps the p2m is usable in this case, since I expect we have plenty of spare bits for a not-present entry... But anyway, we don''t have the luxury of a p2m to stick things in the x86 case.>> If we''re not going to go as far as supporting guest-visible ''p2m faults'' >> then I think overall your original patch is probably the best approach so >> far. If I view the new hypercall as a special case of the existing unmap >> (which is an unmap-to-zero-pte special case) then it''s not so bad. At least >> there should be very little new code in the hypervisor needed to implement >> it. We have a little while longer to think about this since the patches >> aren''t for 3.0.5. > > OK. How do you want to proceed from here then?I''ll take a closer look at your original patch. Having viewed the alternatives I don''t hate it so much now. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 10:06 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 10:58:46AM +0100, Keir Fraser wrote:> > Yet another bloody grant-table hypercall! Also there''s the question of where > this lazy state gets squirrelled away until the demand mapping occurs. > Perhaps the p2m is usable in this case, since I expect we have plenty of > spare bits for a not-present entry... But anyway, we don''t have the luxury > of a p2m to stick things in the x86 case.Sorry I don''t get it. What if we called it a page table operation, would that make it better :) In other words, what exactly is wrong with having such an operation? As for a place to store the info, we can just store the real MFN in the p2m table but mark it as not present.> I''ll take a closer look at your original patch. Having viewed the > alternatives I don''t hate it so much now. :-)Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 10:15 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 08:36:39AM +0100, Keir Fraser wrote:> > Could we shatter the ia64 guest large mappings easily? They have no benefit > on Xen anyway apart from some inconsequential memory saving.I think large pages are needed in general not to reduce memory usage, but to reduce TLB usage. I''m not sure about ia64, but for other architectures it''s quite important. So getting rid of them will add a non-trivial cost. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 10:17 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 11:06, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> Sorry I don''t get it. What if we called it a page table operation, > would that make it better :) In other words, what exactly is wrong > with having such an operation?The grant-table code is pretty complicated and a harbour for bugs. So I prefer to keep it as small and simple as possible. I''m not sure the comparison with page-table operations is a good one -- the Xen PV interface for pagetable updates doesn''t include anything as special-case as the things we are currently considering for grant tables.> As for a place to store the info, we can just store the real MFN in > the p2m table but mark it as not present.We can''t do that where there is no p2m. I suppose we could make this operation specific to auto-translate guests. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 10:19 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 11:15, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:>> Could we shatter the ia64 guest large mappings easily? They have no benefit >> on Xen anyway apart from some inconsequential memory saving. > > I think large pages are needed in general not to reduce memory usage, > but to reduce TLB usage. I''m not sure about ia64, but for other > architectures it''s quite important. > > So getting rid of them will add a non-trivial cost.Is this a virtual TLB we''re talking about? Xen will already have to implicitly shatter the large mappings before updating the real TLB, since guest machine memory map is not contiguous at the scale of large pages (even if we tried at domain build time, it certainly wouldn''t be the case when foreign grant mappings are scattered all over the place). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 10:25 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 11:17:40AM +0100, Keir Fraser wrote:> > The grant-table code is pretty complicated and a harbour for bugs. So I > prefer to keep it as small and simple as possible. I''m not sure the > comparison with page-table operations is a good one -- the Xen PV interface > for pagetable updates doesn''t include anything as special-case as the things > we are currently considering for grant tables.Fair enough. The only thing I could say in this regard is that if we moved in this direction we could replace all existing grant table ops with this and eventually phase them out.> > As for a place to store the info, we can just store the real MFN in > > the p2m table but mark it as not present. > > We can''t do that where there is no p2m. I suppose we could make this > operation specific to auto-translate guests.Actually we can just mark the PTEs as not present in that case. The hypervisor can then mark the PTEs as present on the first fault. Just as grant table operations currently work on virtual addresses for non-translated guests and physical addresses for translated guests, such an operation would work on PTEs for non-translated guests and p2m entries for translated guests. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 10:40 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 11:25, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> Actually we can just mark the PTEs as not present in that case. The > hypervisor can then mark the PTEs as present on the first fault. > > Just as grant table operations currently work on virtual addresses for > non-translated guests and physical addresses for translated guests, > such an operation would work on PTEs for non-translated guests and > p2m entries for translated guests.Yeah, but: how does Xen efficiently work out whether a not-present fault is due to a lazy grant mapping and, if so, what it is that needs to be demand mapped? PV guests can store what they like in not-present pagetable entries. Perhaps we could make use of reserved bits in ptes instead (not that there are any in non-pae x86, but I''d very much like to obsolete non-pae x86 Xen after 3.0.5 anyway -- I don''t think anyone actually ships a product with non-pae bits). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 11:09 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 11:40:00AM +0100, Keir Fraser wrote:> > Yeah, but: how does Xen efficiently work out whether a not-present fault is > due to a lazy grant mapping and, if so, what it is that needs to be demand > mapped? PV guests can store what they like in not-present pagetable entries. > Perhaps we could make use of reserved bits in ptes instead (not that there > are any in non-pae x86, but I''d very much like to obsolete non-pae x86 Xen > after 3.0.5 anyway -- I don''t think anyone actually ships a product with > non-pae bits).That''s a good point. Actually, on x86 we have the accessed bit so we can do this in a different way. Instead of setting up PTE/P2M entries that cause a page fault on the first access, we setup the real thing every time and simply check the accessed bit when we unmap the grant table entry. That will then allow us to flush the TLB iff the accessed bit is set. Now I haven''t looked at the other architectures, but I would think the accessed bit would likely to be present there as well, do you know whether this is the case for ia64/ppc? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 11:11 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 11:19:10AM +0100, Keir Fraser wrote:> > Is this a virtual TLB we''re talking about? Xen will already have to > implicitly shatter the large mappings before updating the real TLB, since > guest machine memory map is not contiguous at the scale of large pages (even > if we tried at domain build time, it certainly wouldn''t be the case when > foreign grant mappings are scattered all over the place).OK, you got me there :) Of course I''d still like to avoid this if possible since it introduces another change from Linux. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Pratt
2007-Mar-27 13:15 UTC
RE: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
> > Yep, the approach is similar, but I think having the > > ''permanent'' grants mechanism is probably a bit more flexible > > for managing the header fragments. > > > > Not sure why it would be more flexible. Do you mean dealing with > headers that are in fragments? If headers are in fragments they willbe> linearized when copied into the shared header area by netfront. Also,it> seems to me that having a fixed set of pages at initialization that > never change is easier to manage. I probably did not understand you > correctly...Netfront would copy the header out into a separate (small) fragment, allocating it from a from a frame in the permanently mapped pool. Having a bit to indicate ''permanent'' grants means that the header pool is easily expanded at run time if need be, and seems a more generic mechanism than having a bunch of frames agreed out-of-band at startup. Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Mar-27 13:50 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On 27/3/07 12:11, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:> OK, you got me there :) > > Of course I''d still like to avoid this if possible since it introduces > another change from Linux.Yeah, that''s fair enough. We''ve avoided PV-driver dependencies on changes in core architectural code so far. It would be nice to continue this, particularly given we now support PV-on-HVM (ie., PV-on-otherwise-unmodified-guest). Of course we''d only be introducing a dependency for a *backend* driver here. Your thought of using the accessed bit is rather neat. I believe it is guaranteed that if we test-and-clear a pte, and see A==0, then no TLB can cache that pte. If so, I think this could be a winner. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-27 20:46 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Tue, Mar 27, 2007 at 02:50:19PM +0100, Keir Fraser wrote:> > PV-on-otherwise-unmodified-guest). Of course we''d only be introducing a > dependency for a *backend* driver here.Well I''m still hoping to have a zero-copy dom0 => domU path :)> Your thought of using the accessed bit is rather neat. I believe it is > guaranteed that if we test-and-clear a pte, and see A==0, then no TLB can > cache that pte. If so, I think this could be a winner.Yes that assumption is used elsewhere too so it should be fine. I''ve checked again and the accessed bit is certainly present on ppc as well as ia64. The only I don''t know is if it''s present on the nested page tables on ia64 but I see no reasons why it wouldn''t be. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2007-Mar-28 13:05 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Wed, Mar 28, 2007 at 06:46:40AM +1000, Herbert Xu wrote:> > I''ve checked again and the accessed bit is certainly present on ppc > as well as ia64. The only I don''t know is if it''s present on the > nested page tables on ia64 but I see no reasons why it wouldn''t be.Here is a completely untested patch that demonstrates how this could be done on x86. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3ac19fda0bc2 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/arch/x86/mm.c Wed Mar 28 22:58:45 2007 +1000 @@ -1262,6 +1262,36 @@ static void free_l4_table(struct page_in #endif +static inline int cmpxchg_intpte(intpte_t *p, + intpte_t *old, + intpte_t new, + unsigned long mfn, + struct vcpu *v) +{ + int rv; + + for ( ; ; ) + { + intpte_t t = *old; + + rv = paging_cmpxchg_guest_entry(v, p, old, new, _mfn(mfn)); + if ( unlikely(rv == 0) ) + { + MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte + ": saw %" PRIpte, t, new, t); + break; + } + + if ( t == *old ) + break; + + /* Allowed to change in Accessed/Dirty flags only. */ + BUG_ON((t ^ *old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); + } + + return rv; +} + /* How to write an entry to the guest pagetables. * Returns 0 for failure (pointer not valid), 1 for success. */ static inline int update_intpte(intpte_t *p, @@ -1274,27 +1304,7 @@ static inline int update_intpte(intpte_t #ifndef PTE_UPDATE_WITH_CMPXCHG rv = paging_write_guest_entry(v, p, new, _mfn(mfn)); #else - { - intpte_t t = old; - for ( ; ; ) - { - rv = paging_cmpxchg_guest_entry(v, p, &t, new, _mfn(mfn)); - if ( unlikely(rv == 0) ) - { - MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte - ": saw %" PRIpte, old, new, t); - break; - } - - if ( t == old ) - break; - - /* Allowed to change in Accessed/Dirty flags only. */ - BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); - - old = t; - } - } + rv = cmpxchg_intpte(p, &old, new, mfn, v); #endif return rv; } @@ -2598,6 +2608,7 @@ static int destroy_grant_va_mapping( unsigned long addr, unsigned long frame, struct vcpu *v) { l1_pgentry_t *pl1e, ol1e; + intpte_t *ptep, opte, npte; unsigned long gl1mfn; int rc = 0; @@ -2619,12 +2630,19 @@ static int destroy_grant_va_mapping( } /* Delete pagetable entry. */ - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), gl1mfn, v)) ) + ptep = (intpte_t *)pl1e; + opte = l1e_get_intpte(ol1e); + npte = l1e_get_intpte(l1e_empty()); + + if ( unlikely(!cmpxchg_intpte(ptep, opte, npte, mfn, v)) ) { MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e); rc = GNTST_general_error; goto out; } + + if (!(opte & (intpte_t)_PAGE_ACCESSED)) + rc = GNTST_unused; out: guest_unmap_l1e(v, pl1e); diff -r 3ac19fda0bc2 xen/common/grant_table.c --- a/xen/common/grant_table.c Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/common/grant_table.c Wed Mar 28 22:58:45 2007 +1000 @@ -394,7 +394,7 @@ gnttab_map_grant_ref( return 0; } -static void +static int __gnttab_unmap_grant_ref( struct gnttab_unmap_grant_ref *op) { @@ -416,7 +416,7 @@ __gnttab_unmap_grant_ref( { gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle); op->status = GNTST_bad_handle; - return; + return 0; } map = &maptrack_entry(ld->grant_table, op->handle); @@ -425,7 +425,7 @@ __gnttab_unmap_grant_ref( { gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle); op->status = GNTST_bad_handle; - return; + return 0; } dom = map->domid; @@ -437,7 +437,7 @@ __gnttab_unmap_grant_ref( /* This can happen when a grant is implicitly unmapped. */ gdprintk(XENLOG_INFO, "Could not find domain %d\n", dom); domain_crash(ld); /* naughty... */ - return; + return 0; } TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom); @@ -511,9 +511,17 @@ __gnttab_unmap_grant_ref( gnttab_clear_flag(_GTF_reading, &sha->flags); unmap_out: - op->status = rc; spin_unlock(&rd->grant_table->lock); rcu_unlock_domain(rd); + + op->status = rc; + + if (rc == GNTST_unused) { + op->status = GNTST_okay; + return 0; + } + + return 1; } static long @@ -521,23 +529,25 @@ gnttab_unmap_grant_ref( XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t) uop, unsigned int count) { int i; + int used = 0; + int err = -EFAULT; struct gnttab_unmap_grant_ref op; for ( i = 0; i < count; i++ ) { if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) ) goto fault; - __gnttab_unmap_grant_ref(&op); + used |= __gnttab_unmap_grant_ref(&op); if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) ) goto fault; } - flush_tlb_mask(current->domain->domain_dirty_cpumask); - return 0; + err = 0; fault: - flush_tlb_mask(current->domain->domain_dirty_cpumask); - return -EFAULT; + if (used) + flush_tlb_mask(current->domain->domain_dirty_cpumask); + return err; } int diff -r 3ac19fda0bc2 xen/include/public/grant_table.h --- a/xen/include/public/grant_table.h Fri Mar 02 12:11:52 2007 +0000 +++ b/xen/include/public/grant_table.h Wed Mar 28 22:58:45 2007 +1000 @@ -360,6 +360,7 @@ DEFINE_XEN_GUEST_HANDLE(gnttab_query_siz /* * Values for error status returns. All errors are -ve. */ +#define GNTST_unused (1) /* Entry is unused (internal status). */ #define GNTST_okay (0) /* Normal return. */ #define GNTST_general_error (-1) /* General undefined error. */ #define GNTST_bad_domain (-2) /* Unrecognsed domain id. */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Isaku Yamahata
2007-Mar-29 06:08 UTC
Re: [Xen-devel] RFC: [0/2] Remove netloop by lazy copying in netback
On Wed, Mar 28, 2007 at 11:05:16PM +1000, Herbert Xu wrote:> On Wed, Mar 28, 2007 at 06:46:40AM +1000, Herbert Xu wrote: > > > > I''ve checked again and the accessed bit is certainly present on ppc > > as well as ia64. The only I don''t know is if it''s present on the > > nested page tables on ia64 but I see no reasons why it wouldn''t be. > > Here is a completely untested patch that demonstrates how this could > be done on x86.Xen/IA64 needs more than accessed bit for the optimized tlb flush so that destroy_grant_host_mapping() flushes tlb and it doesn''t set any bit of domain_dirty_cpumask. Can we make flush_tlb_mask() arch dependent and record something like GNTST_unuse in struct arch_vcpu? -- yamahata _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel