Rusty Russell
2006-Jun-06 05:35 UTC
[Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
I think this is ready for experimental inclusion. A number of people have asked about it, and I think it would benefit from exposure to bright minds. It provides additional mechanisms, and need not replace existing ones.

My main concern is that having the hypervisor decrement an arbitrary address on notification is wonderful for driver authors, but requires the x86 hypervisor to keep mappings: restricting addresses to within the shared pages would be easier for the hypervisor...

Feedback encouraged!
Rusty.
---
Subject: Xen share core

This introduces a page "share" mechanism to xen: an alternative to both cross-domain binding of event channels, and grant tables.

Dom0 can create sharable pages, which returns a handle. It can then grant access permission to other domains. Any domain which has permission can request access using that handle, which binds an event channel to that share and returns a unique peerid for that domain: this is useful for arbitration on multi-way shared pages ("you are user #3").

A watch & trigger mechanism creates a simple event mechanism: a watch on an arbitrary "watch number" associated with the share causes the hypervisor to decrement an address when a trigger is performed on that watch number; if the location is decremented to zero, an event channel is raised.

Finally, a scatter-gather list mechanism allows the domains to associate their pages with arbitrary queue numbers in the shared region, to transport bulk data (effectively by having the hypervisor do "DMA" between domains).

The patch includes an abstraction layer so architectures which don't want virtual or machine addresses from the kernel can change that, and also so that architectures can allocate the sharable pages as they wish.
diff -r d5f98d23427a xen/arch/x86/Makefile --- a/xen/arch/x86/Makefile Tue May 30 10:44:23 2006 +++ b/xen/arch/x86/Makefile Wed May 31 17:39:54 2006 @@ -30,6 +30,7 @@ obj-y += physdev.o obj-y += rwlock.o obj-y += setup.o +obj-y += share.o obj-y += shutdown.o obj-y += smp.o obj-y += smpboot.o diff -r d5f98d23427a xen/arch/x86/x86_32/entry.S --- a/xen/arch/x86/x86_32/entry.S Tue May 30 10:44:23 2006 +++ b/xen/arch/x86/x86_32/entry.S Wed May 31 17:39:54 2006 @@ -648,6 +648,7 @@ .long do_xenoprof_op .long do_event_channel_op .long do_physdev_op + .long do_share_op .rept NR_hypercalls-((.-hypercall_table)/4) .long do_ni_hypercall .endr @@ -687,6 +688,7 @@ .byte 2 /* do_xenoprof_op */ .byte 2 /* do_event_channel_op */ .byte 2 /* do_physdev_op */ + .byte 5 /* do_share_op */ .rept NR_hypercalls-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr diff -r d5f98d23427a xen/common/Makefile --- a/xen/common/Makefile Tue May 30 10:44:23 2006 +++ b/xen/common/Makefile Wed May 31 17:39:54 2006 @@ -16,6 +16,7 @@ obj-y += sched_credit.o obj-y += sched_sedf.o obj-y += schedule.o +obj-y += share.o obj-y += softirq.o obj-y += string.o obj-y += symbols.o diff -r d5f98d23427a xen/common/dom0_ops.c --- a/xen/common/dom0_ops.c Tue May 30 10:44:23 2006 +++ b/xen/common/dom0_ops.c Wed May 31 17:39:54 2006 @@ -16,6 +16,7 @@ #include <xen/domain_page.h> #include <xen/trace.h> #include <xen/console.h> +#include <xen/share.h> #include <xen/iocap.h> #include <xen/guest_access.h> #include <asm/current.h> @@ -634,6 +635,27 @@ } } break; + case DOM0_CREATESHAREDPAGES: + { + ret = create_shared_pages(op->u.createsharedpages.num); + } + break; + case DOM0_DESTROYSHAREDPAGES: + { + ret = destroy_shared_pages(op->u.destroysharedpages.share_ref); + } + break; + case DOM0_GRANTSHAREDPAGES: + { + struct domain *d; + ret = -ESRCH; + d = find_domain_by_id(op->u.grantsharedpages.domain); + if ( d != NULL ) + { + ret = grant_shared_pages(op->u.grantsharedpages.share_ref, d); + } + } + break; case 
DOM0_IRQ_PERMISSION: { diff -r d5f98d23427a xen/common/domain.c --- a/xen/common/domain.c Tue May 30 10:44:23 2006 +++ b/xen/common/domain.c Wed May 31 17:39:54 2006 @@ -16,6 +16,7 @@ #include <xen/console.h> #include <xen/softirq.h> #include <xen/domain_page.h> +#include <xen/share.h> #include <xen/rangeset.h> #include <xen/guest_access.h> #include <xen/hypercall.h> @@ -304,6 +305,7 @@ grant_table_destroy(d); arch_domain_destroy(d); + free_shares(d); free_domain(d); diff -r d5f98d23427a xen/include/asm-x86/mm.h --- a/xen/include/asm-x86/mm.h Tue May 30 10:44:23 2006 +++ b/xen/include/asm-x86/mm.h Wed May 31 17:39:54 2006 @@ -183,9 +183,53 @@ free_domheap_page(page); } +int try_shared_page(struct page_info *page, struct domain *domain); static inline int get_page(struct page_info *page, struct domain *domain) +{ + u32 x, nx, y = page->count_info; + u32 d, nd = page->u.inuse._domain; + u32 _domain = pickle_domptr(domain); + + do { + x = y; + nx = x + 1; + d = nd; + if ( unlikely((x & PGC_count_mask) == 0) || /* Not allocated? */ + unlikely((nx & PGC_count_mask) == 0) ) /* Count overflow? */ + { + if ( !_shadow_mode_refcounts(domain) ) + DPRINTK("Error pfn %lx: rd=%p, od=%p, caf=%08x, taf=%" PRtype_info "\n", + page_to_pfn(page), domain, unpickle_domptr(d), + x, page->u.inuse.type_info); + return 0; + } + if ( unlikely(d != _domain) ) /* Wrong owner? */ + return try_shared_page(page, domain); + __asm__ __volatile__( + LOCK_PREFIX "cmpxchg8b %3" + : "=d" (nd), "=a" (y), "=c" (d), + "=m" (*(volatile u64 *)(&page->count_info)) + : "0" (d), "1" (x), "c" (d), "b" (nx) ); + } + while ( unlikely(nd != d) || unlikely(y != x) ); + + return 1; +} + +static inline int arch_get_shared_page(struct page_info *page) +{ + /* Shared pages always under lock, so this is safe. */ + if (unlikely((page->count_info+1)&PGC_count_mask) == 0) + return 0; + page->count_info++; + return 1; +} + +/* Does not try to get shared pages. 
*/ +static inline int get_unshared_page(struct page_info *page, + struct domain *domain) { u32 x, nx, y = page->count_info; u32 d, nd = page->u.inuse._domain; diff -r d5f98d23427a xen/include/public/dom0_ops.h --- a/xen/include/public/dom0_ops.h Tue May 30 10:44:23 2006 +++ b/xen/include/public/dom0_ops.h Wed May 31 17:39:54 2006 @@ -513,6 +513,28 @@ }; typedef struct dom0_hypercall_init dom0_hypercall_init_t; DEFINE_XEN_GUEST_HANDLE(dom0_hypercall_init_t); + +#define DOM0_CREATESHAREDPAGES 49 +struct dom0_createsharedpages { + uint32_t num; +}; +typedef struct dom0_createsharedpages dom0_createsharedpages_t; +DEFINE_XEN_GUEST_HANDLE(dom0_createsharedpages_t); + +#define DOM0_GRANTSHAREDPAGES 50 +struct dom0_grantsharedpages { + unsigned long share_ref; + domid_t domain; +}; +typedef struct dom0_grantsharedpages dom0_grantsharedpages_t; +DEFINE_XEN_GUEST_HANDLE(dom0_grantsharedpages_t); + +#define DOM0_DESTROYSHAREDPAGES 51 +struct dom0_destroysharedpages { + unsigned long share_ref; +}; +typedef struct dom0_destroysharedpages dom0_destroysharedpages_t; +DEFINE_XEN_GUEST_HANDLE(dom0_destroysharedpages_t); struct dom0_op { uint32_t cmd; @@ -555,6 +577,9 @@ struct dom0_irq_permission irq_permission; struct dom0_iomem_permission iomem_permission; struct dom0_hypercall_init hypercall_init; + struct dom0_createsharedpages createsharedpages; + struct dom0_grantsharedpages grantsharedpages; + struct dom0_destroysharedpages destroysharedpages; uint8_t pad[128]; } u; }; diff -r d5f98d23427a xen/include/public/xen.h --- a/xen/include/public/xen.h Tue May 30 10:44:23 2006 +++ b/xen/include/public/xen.h Wed May 31 17:39:54 2006 @@ -64,6 +64,7 @@ #define __HYPERVISOR_xenoprof_op 31 #define __HYPERVISOR_event_channel_op 32 #define __HYPERVISOR_physdev_op 33 +#define __HYPERVISOR_share_op 34 /* Architecture-specific hypercall definitions.
*/ #define __HYPERVISOR_arch_0 48 diff -r d5f98d23427a xen/arch/x86/share.c --- /dev/null Tue May 30 10:44:23 2006 +++ b/xen/arch/x86/share.c Wed May 31 17:39:54 2006 @@ -0,0 +1,36 @@ +#include <xen/share.h> +#include <xen/mm.h> +#include <asm/share.h> + +struct page_info *arch_alloc_shared_pages(unsigned int order, share_ref_t *ref) +{ + struct page_info *page; + void *addr; + int i; + + /* x86 uses normal xen heap pages to share. */ + addr = alloc_xenheap_pages(order); + if (!addr) + return NULL; + + for(i=0;i<(1<<order);i++) { + clear_page(addr+i*PAGE_SIZE); + page = virt_to_page(addr+i*PAGE_SIZE); + page_set_owner(page, NULL); + /* Domain pointer must be visible before updating refcnt. */ + wmb(); + page->count_info = PGC_allocated|1; + page->u.inuse.type_info = PGT_writable_page|PGT_validated; + BUG_ON(page->u.inuse._domain); + } + + /* x86 simply uses page frame numbers as share_refs. */ + page = virt_to_page(addr); + *ref = page_to_mfn(page); + return page; +} + +void arch_free_shared_pages(struct page_info *page, unsigned int order) +{ + free_xenheap_pages(page_to_virt(page), order); +} diff -r d5f98d23427a xen/common/share.c --- /dev/null Tue May 30 10:44:23 2006 +++ b/xen/common/share.c Wed May 31 17:39:54 2006 @@ -0,0 +1,926 @@ +/* -*- Mode:C; c-basic-offset:8; tab-width:8; indent-tabs-mode:t -*- */ +/****************************************************************************** + * Page sharing and triggers for Xen. + * + * Copyright (C) 2005,2006 Rusty Russell IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#include <public/share.h> +#include <xen/share.h> +#include <xen/list.h> +#include <xen/sched.h> +#include <xen/mm.h> +#include <xen/event.h> +#include <xen/kernel.h> +#include <xen/domain_page.h> +#include <asm/page.h> +#include <asm/share.h> + +/* How many peers are we willing to share a page with? */ +#define MAX_PEERS 32 + +struct watch +{ + struct list_head list; + + /* Where am I watching? */ + u32 trigger; + + /* Where to decrement: have done map_domain_page_global on this. */ + atomic_t *decrement; + + struct peer *owner; +}; + +struct sg_list +{ + struct list_head list; + + /* Where to write size: have done map_domain_page_global on this. */ + u32 *len_pointer; + + /* Which queue am I in? */ + u32 queue; + + struct peer *owner; + + int direction; + unsigned int num_sg; + struct xen_sg sg[0]; +}; + +/* Each domain accessing this share. */ +struct peer +{ + struct list_head list; + + /* Peer ID; unique for this share. */ + unsigned int id; + + /* Share whose linked list we're in. */ + struct share *share; + + /* What domain & port to notify when something happens. */ + /* FIXME: Fix up this when vcpu goes up or down. */ + struct vcpu *vcpu; + int port; +}; + +struct share +{ + struct list_head list; + + share_ref_t share_ref; + + /* The page involved. */ + struct page_info *page; + + /* Which domains are sharing this. */ + struct list_head peers; + + /* Watches on this page */ + struct list_head watches; + + /* Scatter-gather lists on this page for this peer. */ + struct list_head sgs; + + /* Can this page be destroyed when last one unshares? */ + int destroy; + + /* Who can share this? At least creator.
*/ + unsigned int num_granted; + /* FIXME: Make this dynamic in the future */ + struct domain *granted[MAX_PEERS]; + + /* How many pages requested and what order allocation was used */ + unsigned int num_pages; + unsigned int order; +}; +static LIST_HEAD(shares); +static spinlock_t share_lock = SPIN_LOCK_UNLOCKED; + +static inline int is_sharing(const struct domain *domain, + const struct share *share) +{ + struct peer *i; + + list_for_each_entry(i, &share->peers, list) + if (i->vcpu->domain == domain) + return 1; + return 0; +} + +/* Has this domain been granted access to share? */ +static inline int allowed_to_share(const struct domain *domain, + const struct share *share) +{ + unsigned int i; + + for (i = 0; i < share->num_granted; i++) + if (share->granted[i] == domain) + return 1; + return 0; +} + +int try_shared_page(struct page_info *page, struct domain *domain) +{ + struct share *i; + + spin_lock(&share_lock); + list_for_each_entry(i, &shares, list) { + /* Does the pfn match the shared page or is the physical + * address in the range of allocated pages for this share_ref */ + if (i->page == page + || (page_to_maddr(i->page) <= page_to_maddr(page) + && page_to_maddr(page) < + page_to_maddr(i->page) + i->num_pages*PAGE_SIZE)) { + if (!is_sharing(domain, i)) + break; + if (!arch_get_shared_page(page)) + break; + spin_unlock(&share_lock); + return 1; + } + } + spin_unlock(&share_lock); + return 0; +} + +/* Like fds, guarantees lowest number available. Keeps list ordered */ +static void insert_by_peer_id(struct share *share, struct peer *peer) +{ + unsigned int id = 0; + struct peer *i; + + list_for_each_entry(i, &share->peers, list) { + if (i->id != id) { + /* Put new peer in this hole. 
*/ + list_add_tail(&peer->list, &i->list); + peer->id = id; + return; + } + id++; + } + list_add_tail(&peer->list, &share->peers); + peer->id = id; +} + +static int add_domain_to_peers(struct share *share, + struct vcpu *vcpu, int port) +{ + struct peer *peer = xmalloc(struct peer); + + if (!peer) + return -ENOMEM; + + peer->vcpu = vcpu; + peer->port = port; + peer->share = share; + insert_by_peer_id(share, peer); + return peer->id; +} + +/* Returns share id of page(s). */ +share_ref_t create_shared_pages(unsigned int num) +{ + paddr_t ret; + struct share *share; + + /* Only support up to 16 pages at the moment. */ + if (num > 16) + return -EINVAL; + + share = xmalloc(struct share); + if (!share) + return -ENOMEM; + + share->num_granted = 1; + share->granted[0] = current->domain; + share->destroy = 0; + share->num_pages = num; + share->order = get_order_from_pages(num); + + INIT_LIST_HEAD(&share->peers); + INIT_LIST_HEAD(&share->watches); + INIT_LIST_HEAD(&share->sgs); + + share->page = arch_alloc_shared_pages(share->order, &share->share_ref); + if (!share->page) { + xfree(share); + return -ENOMEM; + } + + /* Grab first to avoid potential race with free. */ + ret = share->share_ref; + + spin_lock(&share_lock); + list_add(&share->list, &shares); + spin_unlock(&share_lock); + return ret; +} + +static inline int add_grant(struct share *share, struct domain *domain) +{ + /* FIXME: Currently we statically allocate an array for peers */ + if (share->num_granted == MAX_PEERS) + return -ENOSPC; + + /* Domain must not already have access.
*/ + if (allowed_to_share(domain, share)) + return -EEXIST; + + /* Add this domain to the end of the array, Lock already held */ + share->granted[share->num_granted] = domain; + share->num_granted++; + + return 0; +} + +static struct share *find_share(share_ref_t share_ref) +{ + struct share *i; + + list_for_each_entry(i, &shares, list) + if (i->share_ref == share_ref) + return i; + return NULL; +} + +static struct share *find_share_check(paddr_t machine_addr) +{ + struct share *share = find_share(machine_addr); + + if (!share || !is_sharing(current->domain, share)) + return NULL; + return share; +} + +int grant_shared_pages(share_ref_t share_ref, struct domain *domain) +{ + struct share *share; + int err; + + spin_lock(&share_lock); + share = find_share(share_ref); + if (share) + err = add_grant(share, domain); + else + err = -ENOENT; + spin_unlock(&share_lock); + return err; +} + +static void try_free_share(struct share *share) +{ + /* Last peer out frees page. */ + if (list_empty(&share->peers) && share->destroy) { + list_del(&share->list); + arch_free_shared_pages(share->page, share->order); + xfree(share); + } +} + +int destroy_shared_pages(share_ref_t share_ref) +{ + struct share *share; + int ret; + + spin_lock(&share_lock); + share = find_share(share_ref); + if (!share) + ret = -ENOENT; + else if (share->destroy) + ret = -EINVAL; + else { + share->destroy = 1; + try_free_share(share); + ret = 0; + } + spin_unlock(&share_lock); + return ret; +} + +static int share_get(share_ref_t share_ref, int port) +{ + struct share *share; + int err; + + printk("Getting share share_ref %#lx port %i\n", share_ref, port); + spin_lock(&share_lock); + share = find_share(share_ref); + if (share) { +#if 0 + if (!allowed_to_share(current->domain, share)) { + err = -EPERM; + } else +#endif + err = add_domain_to_peers(share, current, port); + } else { + printk("No such share!\n"); + err = -ENOENT; + } + spin_unlock(&share_lock); + return err; +} + +static void free_peer(struct 
peer *peer) +{ + list_del(&peer->list); + try_free_share(peer->share); + xfree(peer); +} + +static struct peer *find_self_as_peer(share_ref_t share_ref) +{ + struct peer *i; + struct share *share; + + share = find_share(share_ref); + if (!share) + return NULL; + + list_for_each_entry(i, &share->peers, list) { + if (i->vcpu->domain == current->domain) + return i; + } + return NULL; +} + +static int still_in_use(struct peer *peer) +{ + const struct watch *w; + const struct sg_list *s; + + list_for_each_entry(w, &peer->share->watches, list) + if (w->owner == peer) + return 1; + + list_for_each_entry(s, &peer->share->sgs, list) + if (s->owner == peer) + return 1; + + return 0; +} + +static int share_drop(share_ref_t share_ref) +{ + int err; + struct peer *peer; + + spin_lock(&share_lock); + peer = find_self_as_peer(share_ref); + if (peer) { + /* FIXME: could automatically close these */ + if (still_in_use(peer)) + err = -EBUSY; + else { + free_peer(peer); + err = 0; + } + } else + err = -ENOENT; + spin_unlock(&share_lock); + return err; +} + +/* Maps a user address. Use unmap_domain_page_global() on result to free. */ +static int map_user_address(paddr_t uaddr, void **p) +{ + unsigned int pageoff; + + /* Check addr is sane. */ + if ((uaddr % __alignof__(int)) != 0) + return -EINVAL; + + /* Hold reference to the page, check it's valid.
*/ + if (!mfn_valid(paddr_to_pfn(uaddr)) + || !get_page(maddr_to_page(uaddr), current->domain)) { + return -EFAULT; + } + + pageoff = uaddr % PAGE_SIZE; + *p = map_domain_page_global(paddr_to_pfn(uaddr)) + pageoff; + return 0; +} + +static int add_watch(struct peer *peer, u32 trigger, paddr_t decaddr) +{ + struct watch *watch; + int err; + + /* FIXME: Limit */ + watch = xmalloc(struct watch); + if (!watch) + return -ENOMEM; + + err = map_user_address(decaddr, (void **)&watch->decrement); + if (err) { + xfree(watch); + return err; + } + + watch->trigger = trigger; + watch->owner = peer; + list_add(&watch->list, &peer->share->watches); + return 0; +} + +static int share_watch(share_ref_t share_ref, u32 trigger, paddr_t decaddr) +{ + struct peer *peer; + int ret; + + spin_lock(&share_lock); + peer = find_self_as_peer(share_ref); + if (!peer) + ret = -ENOENT; + else + ret = add_watch(peer, trigger, decaddr); + spin_unlock(&share_lock); + + return ret; +} + +static void free_watch(struct watch *watch) +{ + unmap_domain_page_global(watch->decrement); + list_del(&watch->list); + xfree(watch); +} + +static int share_unwatch(share_ref_t share_ref, u32 trigger) +{ + struct peer *peer; + + spin_lock(&share_lock); + peer = find_self_as_peer(share_ref); + if (peer) { + struct watch *i; + list_for_each_entry(i, &peer->share->watches, list) { + if (i->owner == peer && i->trigger == trigger) { + free_watch(i); + spin_unlock(&share_lock); + return 0; + } + } + } + spin_unlock(&share_lock); + return -ENOENT; +} + +static unsigned int do_trigger(struct share *share, u32 trigger) +{ + struct watch *i; + unsigned int count = 0; + + list_for_each_entry(i, &share->watches, list) { + if (i->trigger != trigger) + continue; + count++; + if (atomic_dec_and_test(i->decrement)) + evtchn_set_pending(i->owner->vcpu, i->owner->port); + } + return count; +} + +static int share_trigger(share_ref_t share_ref, u32 trigger) +{ + struct share *share; + int ret; + + spin_lock(&share_lock); + share =
find_share_check(share_ref); + if (share) + ret = do_trigger(share, trigger); + else + ret = -ENOENT; + spin_unlock(&share_lock); + return ret; +} + +/* Check that this domain has access to all this memory. */ +static int get_sg_list(const struct sg_list *sg) +{ + int i; + + for (i = 0; i < sg->num_sg; i++) { + struct page_info *page; + + page = maddr_to_page(sg->sg[i].addr); + + /* FIXME: What a hack! Must be same page for now. */ + if (page != maddr_to_page(sg->sg[i].addr + sg->sg[i].len - 1)) { + printk("Over a page 0x%08lx + %li\n", + sg->sg[i].addr, sg->sg[i].len); + goto fail; + } + + if (!mfn_valid(paddr_to_pfn(sg->sg[i].addr)) + || !get_unshared_page(page, current->domain)) { + printk("pfn %s\n", + mfn_valid(paddr_to_pfn(sg->sg[i].addr)) + ? "valid": "INVALID"); + goto fail; + } + } + return 1; + +fail: + /* Put all the pages. */ + while (--i >= 0) + put_page(maddr_to_page(sg->sg[i].addr)); + return 0; +} + +static void put_sg_list(const struct sg_list *sg) +{ + unsigned int i; + + for (i = 0; i < sg->num_sg; i++) + put_page(maddr_to_page(sg->sg[i].addr)); +} + +/* Caller must free this if it is used. 
*/ +static struct sg_list *next_sg_list(struct share *share, u32 queue) +{ + struct sg_list *i; + + list_for_each_entry(i, &share->sgs, list) + if (i->queue == queue) + return i; + return NULL; +} + +static int sg_register(share_ref_t share_ref, u32 queue, + unsigned int num_sgs, int dir, + struct xen_sg *usgs, paddr_t ulenaddr) +{ + struct sg_list *sg; + struct peer *me; + int ret; + + if (num_sgs == 0 || num_sgs > XEN_SG_MAX) { + printk("%i sgs bad\n", num_sgs); + return -EINVAL; + } + + if (!(dir & XEN_SG_DIR)) { + printk("dir %i bad\n", dir); + return -EINVAL; + } + + sg = xmalloc_bytes(sizeof(*sg) + num_sgs * sizeof(sg->sg[0])); + if (!sg) { + printk("Could not allocate %i sgs\n", num_sgs); + return -ENOMEM; + } + + ret = map_user_address(ulenaddr, (void **)&sg->len_pointer); + if (ret < 0) + goto free_sg; + + spin_lock(&share_lock); + me = find_self_as_peer(share_ref); + if (!me) { + ret = -ENOENT; + goto unlock_free; + } + + if (copy_from_user(sg->sg, usgs, num_sgs * sizeof(sg->sg[0])) != 0) { + printk("Faulted copying sgs from %p\n", (void *)usgs); + ret = -EFAULT; + goto unlock_free; + } + + sg->num_sg = num_sgs; + sg->direction = dir; + sg->queue = queue; + sg->owner = me; + + if (!get_sg_list(sg)) { + ret = -EFAULT; + goto unlock_free; + } + + /* We always activate trigger 0 if we were completely out of sgs. + * FIXME: don't trigger self?
+ */ + if (!next_sg_list(me->share, queue)) + do_trigger(me->share, 0); + + list_add(&sg->list, &me->share->sgs); + ret = 0; + spin_unlock(&share_lock); + return ret; + +unlock_free: + spin_unlock(&share_lock); + unmap_domain_page_global(sg->len_pointer); +free_sg: + xfree(sg); + return ret; +} + +static void free_sg_list(struct sg_list *sg_list) +{ + list_del(&sg_list->list); + unmap_domain_page_global(sg_list->len_pointer); + put_sg_list(sg_list); + xfree(sg_list); +} + +static int sg_unregister(share_ref_t share_ref, paddr_t first_addr) +{ + struct sg_list *i; + struct peer *peer; + int err; + + spin_lock(&share_lock); + peer = find_self_as_peer(share_ref); + if (!peer) + err = -ENOENT; + else { + err = -ENOENT; + list_for_each_entry(i, &peer->share->sgs, list) { + if (i->owner == peer && i->sg[0].addr == first_addr) { + free_sg_list(i); + err = 0; + break; + } + } + } + spin_unlock(&share_lock); + return err; +} + +static unsigned long from_user(paddr_t dst, paddr_t src, unsigned long len) +{ + void *dstp; + + /* Only do within this page boundary. */ + if ((dst % PAGE_SIZE) + len > PAGE_SIZE) + len = PAGE_SIZE - (dst % PAGE_SIZE); + + dstp = map_domain_page(paddr_to_pfn(dst)) + (dst % PAGE_SIZE); + if (copy_from_user(dstp, (void *)src, len) != 0) { + printk("Copying %li bytes from %p faulted!\n", len, + (void *)src); + len = 0; + } + unmap_domain_page(dstp); + return len; +} + +static unsigned long to_user(paddr_t dst, paddr_t src, unsigned long len) +{ + void *srcp; + + /* Only do within this page boundary. */ + if ((src % PAGE_SIZE) + len > PAGE_SIZE) + len = PAGE_SIZE - (src % PAGE_SIZE); + + srcp = map_domain_page(paddr_to_pfn(src)) + (src % PAGE_SIZE); + if (copy_to_user((void *)dst, srcp, len) != 0) + len = 0; + unmap_domain_page(srcp); + return len; +} + +/* Copy from src to dst, return amount copied.
*/ +static int do_copy(const struct sg_list *sgdst, const struct sg_list *sgsrc, + unsigned long (*copy)(paddr_t, paddr_t, unsigned long)) +{ + unsigned long totlen, src, dst, srcoff, dstoff; + int ret = 0; + + totlen = 0; + src = dst = 0; + srcoff = dstoff = 0; + while (src < sgsrc->num_sg) { + unsigned long len; + len = min(sgsrc->sg[src].len - srcoff, + sgdst->sg[dst].len - dstoff); + + len = copy(sgdst->sg[dst].addr+dstoff, + sgsrc->sg[src].addr+srcoff, + len); + if (len == 0) { + printk("Copying from uaddr 0x%08lx faulted\n", + sgsrc->sg[src].addr+srcoff); + return -EFAULT; + } + + totlen += len; + srcoff += len; + dstoff += len; + ret += len; + if (srcoff == sgsrc->sg[src].len) { + src++; + srcoff = 0; + } + if (dstoff == sgdst->sg[dst].len) { + dst++; + dstoff = 0; + if (dst == sgdst->num_sg) + break; + } + } + return ret; +} + +static int sg_xfer(share_ref_t share_ref, unsigned int num_sgs, int dir, + u32 queue, struct xen_sg *usgs) +{ + int ret; + struct share *share; + struct sg_list *sg; + struct { + struct sg_list sglist; + struct xen_sg sg[XEN_SG_MAX]; + } tmp; + + if (dir != XEN_SG_IN && dir != XEN_SG_OUT) + return -EINVAL; + + if (num_sgs == 0 || num_sgs > XEN_SG_MAX) + return -EINVAL; + + spin_lock(&share_lock); + share = find_share_check(share_ref); + if (!share) { + ret = -ENOENT; + goto out; + } + sg = next_sg_list(share, queue); + if (!sg) { + ret = -ENOSPC; + goto out; + } + if (copy_from_user(tmp.sg, usgs, num_sgs*sizeof(usgs[0])) != 0) { + printk("Copying %i sgs from uaddr %p faulted\n", + num_sgs, usgs); + ret = -EFAULT; + goto out; + } + tmp.sglist.num_sg = num_sgs; + /* If XEN_SG_IN, it must let us XEN_SG_OUT, and vice-versa. 
*/ + if (!(sg->direction & (dir ^ XEN_SG_DIR))) { + ret = -EPERM; + goto out; + } + + if (dir == XEN_SG_IN) + ret = do_copy(&tmp.sglist, sg, to_user); + else + ret = do_copy(sg, &tmp.sglist, from_user); + + if (ret > 0) { + *sg->len_pointer = ret; + evtchn_set_pending(sg->owner->vcpu, sg->owner->port); + free_sg_list(sg); + } + +out: + spin_unlock(&share_lock); + return ret; +} + +static void free_peer_users(struct peer *peer) +{ + struct watch *w, *wtmp; + struct sg_list *s, *stmp; + + list_for_each_entry_safe(w, wtmp, &peer->share->watches, list) + if (w->owner == peer) + free_watch(w); + + list_for_each_entry_safe(s, stmp, &peer->share->sgs, list) + if (s->owner == peer) + free_sg_list(s); +} + +void free_shares(struct domain *domain) +{ + struct share *i, *tmp; + + spin_lock(&share_lock); + list_for_each_entry_safe(i, tmp, &shares, list) { + struct peer *s; + list_for_each_entry(s, &i->peers, list) { + if (s->vcpu->domain == domain) { + free_peer_users(s); + free_peer(s); + break; + } + } + } + spin_unlock(&share_lock); +} + +int share_dump(void) +{ +#if 0 + struct share *s; + + spin_lock(&share_lock); + list_for_each_entry(s, &shares, list) { + int i; + struct peer *p; + struct watch *w; + + printk("%i: Dumping share def for %#lx(destroy==%i)[%p]\n", + current->domain->domain_id, + s->share_ref, s->destroy, + page_to_virt(s->page)); + + for(i=0; i < s->num_granted; i++) + printk("\tGranted to Domain %i\n", + s->granted[i]->domain_id); + + list_for_each_entry(p, &s->peers, list) { + struct sg_list *sg; + + printk("\tHas peer %i(share %s match)\n", p->id, + (p->share==s?"does":"doesn't")); + + list_for_each_entry(sg, &p->sgs, list) { + printk("\t\tRegistered sg [len_ptr==%#lx, " + "direction==%i, num_sg==%i]\n", + (unsigned long)sg->len_pointer, + sg->direction, sg->num_sg); + } + } + + list_for_each_entry(w, &s->watches, list) { + printk("\tHas watch [trigger==%u, decrement==%i]\n", + w->trigger, atomic_read(w->decrement)); + printk("\tOwner is peer %i\n",
w->owner->id); + + } + } + spin_unlock(&share_lock); +#endif + return 0; +} + +static inline int xen_share_sg_arg_count(unsigned long arg) +{ + return (arg & 0xFFFF) >> 2; +} + +static inline int xen_share_sg_arg_dir(unsigned long arg) +{ + return arg & XEN_SG_DIR; +} + +static inline int xen_share_sg_arg_queue(unsigned long arg) +{ + return arg >> 16; +} + +long do_share_op(unsigned int cmd, + unsigned long arg1, unsigned long arg2, unsigned long arg3, + unsigned long arg4) +{ + switch (cmd) { + case XEN_SHARE_get: + return share_get(arg1, arg2); + case XEN_SHARE_drop: + return share_drop(arg1); + case XEN_SHARE_watch: + return share_watch(arg1, arg2, arg3); + case XEN_SHARE_unwatch: + return share_unwatch(arg1, arg2); + case XEN_SHARE_trigger: + return share_trigger(arg1, arg2); + case XEN_SHARE_sg_register: + return sg_register(arg1, xen_share_sg_arg_queue(arg2), + xen_share_sg_arg_count(arg2), + xen_share_sg_arg_dir(arg2), + (struct xen_sg *)arg3, arg4); + case XEN_SHARE_sg_unregister: + return sg_unregister(arg1, arg2); + case XEN_SHARE_sg_xfer: + return sg_xfer(arg1, xen_share_sg_arg_count(arg2), + xen_share_sg_arg_dir(arg2), + xen_share_sg_arg_queue(arg2), + (struct xen_sg *)arg3); + case XEN_SHARE_dump: + return share_dump(); + default: + return -ENOSYS; + } +} diff -r d5f98d23427a xen/include/asm-x86/share.h --- /dev/null Tue May 30 10:44:23 2006 +++ b/xen/include/asm-x86/share.h Wed May 31 17:39:54 2006 @@ -0,0 +1,7 @@ +#ifndef __XEN_ASM_SHARE_H +#define __XEN_ASM_SHARE_H + +struct page_info *arch_alloc_shared_pages(unsigned int order, share_ref_t *ref); +void arch_free_shared_pages(struct page_info *pfn, unsigned int order); + +#endif /* __XEN_ASM_SHARE_H */ diff -r d5f98d23427a xen/include/public/share.h --- /dev/null Tue May 30 10:44:23 2006 +++ b/xen/include/public/share.h Wed May 31 17:39:54 2006 @@ -0,0 +1,48 @@ +/* Simple share page ops for Xen. 
*/ +#ifndef __XEN_PUBLIC_SHARE_H__ +#define __XEN_PUBLIC_SHARE_H__ + +/* Operations to share/unshare memory. */ +/* The share reference */ +typedef unsigned long share_ref_t; + +/* int get(share_ref, port). Returns unique peer id. */ +#define XEN_SHARE_get 0 +/* void drop(share_ref, peerid) */ +#define XEN_SHARE_drop 1 + +/* Watch and trigger operations */ +/* irq_t watch(share_ref, u32 triggernum, physaddr_t decaddr) */ +#define XEN_SHARE_watch 2 +/* void unwatch(share_ref, u32 triggernum) */ +#define XEN_SHARE_unwatch 3 +/* int trigger(share_ref, u32 triggernum) */ +#define XEN_SHARE_trigger 4 + +/* Scatter-gather operations. */ +#define XEN_SG_IN 0x01 +#define XEN_SG_OUT 0x02 +#define XEN_SG_DIR (XEN_SG_IN|XEN_SG_OUT) + +/* Maximum number of sg elements. */ +#define XEN_SG_MAX 16 + +struct xen_sg +{ + unsigned long addr, len; +}; + +/* We combine the count, queue and direction: the bottom two bits are + * directions XEN_SG_IN/OUT, the top 16 are the queue number. */ +#define xen_share_sg_arg(queue, count, dir) ((queue) << 16 | ((count) << 2) | (dir)) + +/* int sg_register(share_ref, queue_count_dir, struct xen_sg *sgs, physaddr_t len).*/ +#define XEN_SHARE_sg_register 5 +/* int sg_unregister(share_ref, memory_t first_sg_addr). 
*/ +#define XEN_SHARE_sg_unregister 6 +/* int sg_xfer(share_ref, queue_count_dir, struct xen_sg *sgs) */ +#define XEN_SHARE_sg_xfer 7 +/* int share_dump(void) */ +#define XEN_SHARE_dump 8 + +#endif /* __XEN_PUBLIC_SHARE_H__ */ diff -r d5f98d23427a xen/include/xen/share.h --- /dev/null Tue May 30 10:44:23 2006 +++ b/xen/include/xen/share.h Wed May 31 17:39:54 2006 @@ -0,0 +1,13 @@ +#ifndef _XEN_SHARE_H +#define _XEN_SHARE_H +#include <public/share.h> + +struct domain; +/* DOM0 ops */ +share_ref_t create_shared_pages(unsigned int num); +int grant_shared_pages(share_ref_t share_ref, struct domain *d); +int destroy_shared_pages(share_ref_t share_ref); + +/* Domain is dying, release shares */ +void free_shares(struct domain *domain); +#endif -- ccontrol: http://ccontrol.ozlabs.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Rusty Russell
2006-Jun-06 05:50 UTC
[Xen-devel] [PATCH 3/9] privcmd interface addition to support share operations from Dom0 userspace
Subject: privcmd interface addition to support share operations from Dom0 userspace We create two simple privcmd ops to create and grant access to shares. diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c --- a/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Wed May 31 05:33:38 2006 @@ -268,6 +268,7 @@ set_bit(__HYPERVISOR_xen_version, hypercall_permission_map); set_bit(__HYPERVISOR_sched_op, hypercall_permission_map); set_bit(__HYPERVISOR_sched_op_compat, hypercall_permission_map); + set_bit(__HYPERVISOR_share_op, hypercall_permission_map); set_bit(__HYPERVISOR_event_channel_op_compat, hypercall_permission_map); diff -r 6d476981e3a5 -r 07a00d96357d tools/libxc/xc_domain.c --- a/tools/libxc/xc_domain.c Sun May 28 14:49:17 2006 +++ b/tools/libxc/xc_domain.c Wed May 31 05:33:38 2006 @@ -482,6 +482,25 @@ return rc; } + +share_ref_t xc_create_shared_pages(int xc_handle, unsigned int num_pages) +{ + DECLARE_DOM0_OP; + + op.cmd = DOM0_CREATESHAREDPAGES; + op.u.createsharedpages.num = num_pages; + return do_dom0_op(xc_handle, &op); +} + +int xc_grant_shared_pages(int xc_handle, domid_t domid, share_ref_t share_ref) +{ + DECLARE_DOM0_OP; + + op.cmd = DOM0_GRANTSHAREDPAGES; + op.u.grantsharedpages.domain = domid; + op.u.grantsharedpages.share_ref = share_ref; + return do_dom0_op(xc_handle, &op); +} int xc_domain_irq_permission(int xc_handle, uint32_t domid, diff -r 6d476981e3a5 -r 07a00d96357d tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h Sun May 28 14:49:17 2006 +++ b/tools/libxc/xenctrl.h Wed May 31 05:33:38 2006 @@ -21,6 +21,7 @@ #include <xen/memory.h> #include <xen/acm.h> #include <xen/acm_ops.h> +#include <xen/share.h> #ifdef __ia64__ #define XC_PAGE_SHIFT 14 @@ -584,6 +585,10 @@ int xc_version(int xc_handle, int cmd, void *arg); +/* Create & add permissions to sharable pages.
*/ +share_ref_t xc_create_shared_pages(int xc_handle, unsigned int num_pages); +int xc_grant_shared_pages(int xc_handle, domid_t domid, share_ref_t share_ref); + /* * MMU updates. */
Rusty Russell
2006-Jun-06 05:51 UTC
[Xen-devel] [PATCH 4/9] /dev/xenshare for accessing/mapping shared pages from userspace
To manipulate shared pages from userspace, we use a simple device. Userspace can gain access to a share by handle, mmap it, place watches, trigger them, and do scatter-gather transfers. FIXME: Should use vm_insert_page these days. diff -r 125c7cd65739 linux-2.6-xen-sparse/drivers/xen/Makefile --- a/linux-2.6-xen-sparse/drivers/xen/Makefile Thu Jun 1 23:24:05 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/Makefile Fri Jun 2 09:25:42 2006 @@ -8,6 +8,7 @@ obj-y += balloon/ obj-y += privcmd/ obj-y += xenbus/ +obj-y += xenshare.o obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ obj-$(CONFIG_XEN_NETDEV_BACKEND) += netback/ diff -r 125c7cd65739 linux-2.6-xen-sparse/drivers/xen/xenshare.c --- /dev/null Thu Jun 1 23:24:05 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/xenshare.c Fri Jun 2 09:25:42 2006 @@ -0,0 +1,365 @@ +/* Userspace interface for accessing share regions. + * + * Copyright 2006 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#define DEBUG +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/ioctl.h> +#include <linux/interrupt.h> +#include <linux/mm.h> +#include <linux/device.h> +#include <asm/hypervisor.h> +#include <xen/interface/share.h> +#include <xen/public/xenshare.h> +#include <xen/evtchn.h> +#include <asm/uaccess.h> +#include <asm/io.h> +#include <asm/share.h> + +/* FIXME: %s/pr_debug(/pr_debug(/g*/ +struct share_info +{ + struct xen_share *share; + + int out_sg_used; + unsigned int out_sg_pages; + struct page *out_sg; + + /* Trigger they placed watch on (-1 == none) */ + int watch_number; + int watch_result; + + struct xen_share_handler handler; + wait_queue_head_t waiters; +}; + +/* FIXME: Should we register handlers as required? */ +static void share_io_handler(struct xen_share_handler *handler) +{ + struct share_info *info; + + info = container_of(handler, struct share_info, handler); + pr_debug("xenshare: interrupt!\n"); + wake_up_all(&info->waiters); +} + +static int get_share(struct file *file, void __user *udata) +{ + struct xenshare_get_share share; + struct share_info *info; + int err; + + if (copy_from_user(&share, udata, sizeof(share)) != 0) + return -EFAULT; + + if (file->private_data) + return -EBUSY; + + info = kmalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return -ENOMEM; + + info->out_sg = NULL; + info->watch_number = -1; + info->watch_result = 1; + init_waitqueue_head(&info->waiters); + + info->share = xen_share_get(share.share_ref, share.num_pages); + if (IS_ERR(info->share)) { + err = PTR_ERR(info->share); + pr_debug("xenshare: get_share returned %i\n", err); + goto free_info; + } + info->handler.handler = share_io_handler; + xen_share_add_handler(info->share, &info->handler); + file->private_data = info; + return 
info->share->peerid; + +free_info: + kfree(info); + return err; +} + +static int pages_to_sg(struct share_info *info, + struct xen_sg sg[], + unsigned long len) +{ + unsigned int i; + + if (len > PAGE_SIZE * XEN_SG_MAX) + return -ENOSPC; + + /* Register this length of our buffer as sg. */ + for (i = 0; i < len/PAGE_SIZE; i++) { + sg[i].addr = page_to_pfn(info->out_sg + i) << PAGE_SHIFT; + sg[i].len = PAGE_SIZE; + } + if (len % PAGE_SIZE) { + sg[i].addr = page_to_pfn(info->out_sg + i) << PAGE_SHIFT; + sg[i].len = len % PAGE_SIZE; + i++; + } + return i; +} + +static int send_sg(struct file *file, void __user *udata) +{ + struct xen_sg sg[XEN_SG_MAX]; + struct xenshare_sg send; + int err; + struct share_info *info = file->private_data; + + if (copy_from_user(&send, udata, sizeof(send)) != 0) + return -EFAULT; + + if (!info) + return -EINVAL; + + err = pages_to_sg(info, sg, send.len); + if (err >= 0) + err = xen_sg_xfer(info->share, send.queue, XEN_SG_OUT, + err, sg); + return err; +} + +static int register_sg(struct file *file, void __user *udata) +{ + struct share_info *info = file->private_data; + struct xen_sg sg[XEN_SG_MAX]; + struct xenshare_sg reg; + int err; + + if (copy_from_user(&reg, udata, sizeof(reg)) != 0) + return -EFAULT; + + if (!info) + return -EINVAL; + + err = pages_to_sg(info, sg, reg.len); + if (err < 0) + return err; + + info->out_sg_used = 0; + err = xen_sg_register(info->share, XEN_SG_IN, reg.queue, + &info->out_sg_used, err, sg); + pr_debug("xenshare: Registered sg: %i\n", err); + if (err) + info->out_sg_used = 1; + return err; +} + +static int watch(struct file *file, unsigned long trigger) +{ + struct share_info *info = file->private_data; + int err; + + if (!info) + return -EINVAL; + + pr_debug("xenshare: watch %li\n", trigger); + if (info->watch_number != -1) + return -EBUSY; + + info->watch_number = trigger; + err = xen_share_watch(info->share, trigger, &info->watch_result); + if (err) + info->watch_number = -1; + pr_debug("xenshare: watch
returned %i\n", err); + return err; +} + +static int trigger(struct file *file, unsigned long watch_number) +{ + struct share_info *info = file->private_data; + + if (!info) + return -EINVAL; + + pr_debug("xenshare: trigger %li\n", watch_number); + return xen_share_trigger(info->share, watch_number); +} + +static int xenshare_ioctl(struct inode *inode, struct file *file, + unsigned int cmd, unsigned long data) +{ + switch (cmd) { + case IOCTL_XENSHARE_GET_SHARE: + return get_share(file, (void __user *)data); + case IOCTL_XENSHARE_SG_SEND: + return send_sg(file, (void __user *)data); + case IOCTL_XENSHARE_SG_REGISTER: + return register_sg(file, (void __user *)data); + case IOCTL_XENSHARE_WATCH: + return watch(file, data); + case IOCTL_XENSHARE_TRIGGER: + return trigger(file, data); + default: + return -ENOTTY; + } +} + +/* In 2.6.12, this is how you map a kernel page. Later, use vm_insert_page. */ +static struct page *share_nopage(struct vm_area_struct *vma, + unsigned long vaddr, int *type) +{ + unsigned int pagenum = (vaddr - vma->vm_start)/PAGE_SIZE; + if (vaddr > vma->vm_end) + return NOPAGE_SIGBUS; + if (type) + *type = VM_FAULT_MINOR; + return (struct page *)vma->vm_private_data + pagenum; +} + +static struct vm_operations_struct xenshare_vm_ops = { + .nopage = share_nopage, +}; + +static void map_pages(struct vm_area_struct *vma, struct page *page) +{ + vma->vm_ops = &xenshare_vm_ops; + vma->vm_flags |= VM_DONTEXPAND | VM_RESERVED; + vma->vm_flags &= ~VM_IO; /* using shared anonymous pages */ + vma->vm_private_data = page; +} + +static int create_and_map_sg(struct share_info *info, + struct vm_area_struct *vma) +{ + unsigned long pages = (vma->vm_end - vma->vm_start) / PAGE_SIZE; + + if (!info->out_sg) { + if (pages > XEN_SG_MAX) + return -ENOSPC; + info->out_sg = alloc_pages(GFP_KERNEL, fls(pages-1)); + if (!info->out_sg) { + printk("Could not allocate %i pages\n", + 1<<fls(pages-1)); + return -ENOMEM; + } + info->out_sg_pages = pages; + /* We set this to 0
when registered with hypervisor. */ + info->out_sg_used = 1; + } + + /* Can't map more than we have. */ + if (pages > info->out_sg_pages) + return -ENOSPC; + + map_pages(vma, info->out_sg); + return 0; +} + +static int xenshare_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct share_info *info = file->private_data; + + if (!info) { + pr_debug("mmap before get_share file %p\n", file); + return -EINVAL; + } + + if (vma->vm_pgoff == 0) + return create_and_map_sg(info, vma); + if (vma->vm_pgoff == XENSHARE_MAP_SHARE_PAGE) + return xen_share_map(info->share, vma); + pr_debug("Unknown mmap offset %li\n", vma->vm_pgoff); + return -EINVAL; +} + +/* Read means wait for sg to be used / watch to be fired. */ +static ssize_t read(struct file *file, char __user *ubuf, size_t size, + loff_t *off) +{ + int err; + struct share_info *info = file->private_data; + + if (!info) + return -EINVAL; + if (size != sizeof(info->out_sg_used)) + return -EINVAL; + + err = wait_event_interruptible(info->waiters, + info->out_sg_used || !info->watch_result); + if (err) + return err; + + /* 0 or negative indicates the watch fired.
*/ + if (info->watch_result <= 0) { + int watch = -info->watch_number; + info->watch_result = 1; + pr_debug("Watch number %i\n", info->watch_number); + if (copy_to_user(ubuf, &watch, 4) != 0) + return -EFAULT; + } else { + pr_debug("sg_used %i\n", info->out_sg_used); + if (copy_to_user(ubuf, &info->out_sg_used, 4) != 0) + return -EFAULT; + } + return size; +} + +/* Free up allocated evtchn port and drop share */ +static int xenshare_release(struct inode *inode, struct file *file) +{ + struct share_info *info = file->private_data; + + /* If private_data isn't allocated, we were opened and closed + * without doing anything interesting */ + if (!info) + return 0; + + /* Unregister sg before drop */ + if (info->out_sg) + xen_sg_unregister(info->share, + page_to_pfn(info->out_sg) << PAGE_SHIFT); + + if (info->watch_number != -1) + xen_share_unwatch(info->share, info->watch_number); + + xen_share_remove_handler(info->share, &info->handler); + xen_share_put(info->share); + kfree(info); + file->private_data = NULL; + + return 0; +} + +static struct file_operations xenshare_file_ops = { + .ioctl = xenshare_ioctl, + .mmap = xenshare_mmap, + .read = read, + .release = xenshare_release, +}; + +static int init(void) +{ + struct class *xen_class; + int err; + + err = register_chrdev(0, "xenshare", &xenshare_file_ops); + if (err < 0) + return err; + + xen_class = class_create(THIS_MODULE, "xen"); + /* FIXME: save struct class_device * */ + (void*)class_device_create(xen_class, NULL, MKDEV(err, 0), NULL, "xenshare"); + return 0; +} +module_init(init); +MODULE_LICENSE("GPL"); diff -r 125c7cd65739 linux-2.6-xen-sparse/include/xen/public/xenshare.h --- /dev/null Thu Jun 1 23:24:05 2006 +++ b/linux-2.6-xen-sparse/include/xen/public/xenshare.h Fri Jun 2 09:25:42 2006 @@ -0,0 +1,77 @@ +/****************************************************************************** + * xenshare.h + * + * Interface to /dev/xenshare.
+ * + * Copyright 2006 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation + * + * This file may be distributed separately from the Linux kernel, or + * incorporated into other software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ +#ifndef __LINUX_XENSHARE_H__ +#define __LINUX_XENSHARE_H__ + +/* Device is used as follows: + * (1) IOCTL_XENSHARE_GET_SHARE is called to get the share. Then you can + * mmap at page XENSHARE_MAP_SHARE_PAGE to access it. + * (2) mmap at 0 creates a scatter-gather list. + * (3) Writing a 4-byte length to the fd registers it with the share. + * (4) Reading the fd blocks until the sg is filled. The 4-byte + * length is returned. + * (5) IOCTL_XENSHARE_SG_SEND is called to send an sg. 
+ */ + +#define XENSHARE_MAP_SHARE_PAGE 0x1000 + +struct xenshare_get_share +{ + unsigned long share_ref; + unsigned int num_pages; +}; + +struct xenshare_sg +{ + unsigned long len; + uint32_t queue; +}; + +/* After this, you can mmap XENSHARE_MAP_PAGE to access share. + * Returns peerid. */ +#define IOCTL_XENSHARE_GET_SHARE \ + _IOR('P', 100, struct xenshare_get_share) + +/* Transfers the xenshare_sg. */ +#define IOCTL_XENSHARE_SG_SEND \ + _IOR('P', 101, struct xenshare_sg) + +/* Registers the xenshare_sg */ +#define IOCTL_XENSHARE_SG_REGISTER \ + _IOR('P', 102, struct xenshare_sg) + +/* Registers a watch (currently only 1): read returns -triggernum */ +#define IOCTL_XENSHARE_WATCH \ + _IO('P', 103) + +/* Triggers a watch: returns same as hypercall */ +#define IOCTL_XENSHARE_TRIGGER \ + _IO('P', 104) + +#endif /* __LINUX_XENSHARE_H__ */
This patch adds a single entry on the end of the start_info, to provide a simplified bus called "vdevice". This is completely backwards compatible: old kernels will ignore it, old tools set it to zero. The vdevice bus is a simple array of structures: now that a share reference is sufficient to describe devices, and devices no longer map other domains' pages, this is all we need. The page is initially populated with devices (although currently it's always empty when a domain starts). Devices are added by setting a zero entry, and triggering watch number 1. They are acknowledged by setting bits in the appropriate entry and triggering watch 0. Devices are removed the same way. The majority of this patch is plumbing it into the tools. diff -r cddc595b285b xen/include/public/xen.h --- a/xen/include/public/xen.h Thu Jun 1 23:26:37 2006 +++ b/xen/include/public/xen.h Fri Jun 2 09:27:39 2006 @@ -455,6 +455,7 @@ unsigned long mod_start; /* VIRTUAL address of pre-loaded module. */ unsigned long mod_len; /* Size (bytes) of pre-loaded module. */ int8_t cmd_line[MAX_GUEST_CMDLINE]; + unsigned long vdevice_share; /* share_ref of vdevice config page. */ }; typedef struct start_info start_info_t; diff -r cddc595b285b tools/libxc/xc_linux_build.c --- a/tools/libxc/xc_linux_build.c Thu Jun 1 23:26:37 2006 +++ b/tools/libxc/xc_linux_build.c Fri Jun 2 09:27:39 2006 @@ -457,7 +457,8 @@ unsigned long flags, unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, unsigned long *console_mfn, - uint32_t required_features[XENFEAT_NR_SUBMAPS]) + uint32_t required_features[XENFEAT_NR_SUBMAPS], + unsigned long vdevice_share) { unsigned long *page_array = NULL; struct load_funcs load_funcs; @@ -549,6 +550,7 @@ start_info->store_evtchn = store_evtchn; start_info->console_mfn = nr_pages - 1; start_info->console_evtchn = console_evtchn; + start_info->vdevice_share = vdevice_share; start_info->nr_pages = nr_pages; // FIXME?: nr_pages - 2 ????
if ( initrd->len != 0 ) { @@ -612,7 +614,8 @@ unsigned long flags, unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, unsigned long *console_mfn, - uint32_t required_features[XENFEAT_NR_SUBMAPS]) + uint32_t required_features[XENFEAT_NR_SUBMAPS], + unsigned long vdevice_share) { unsigned long *page_array = NULL; unsigned long count, i, hypercall_pfn; @@ -977,6 +980,7 @@ start_info->store_evtchn = store_evtchn; start_info->console_mfn = guest_console_mfn; start_info->console_evtchn = console_evtchn; + start_info->vdevice_share = vdevice_share; if ( initrd->len != 0 ) { start_info->mod_start = vinitrd_start; @@ -1044,7 +1048,8 @@ unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn) + unsigned long *console_mfn, + unsigned long vdevice_share) { dom0_op_t launch_op; DECLARE_DOM0_OP; @@ -1097,8 +1102,8 @@ &vstack_start, ctxt, cmdline, op.u.getdomaininfo.shared_info_frame, flags, store_evtchn, store_mfn, - console_evtchn, console_mfn, - features_bitmap) < 0 ) + console_evtchn, console_mfn, features_bitmap, + vdevice_share) < 0 ) { ERROR("Error constructing guest OS"); goto error_out; @@ -1204,7 +1209,8 @@ unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn) + unsigned long *console_mfn, + unsigned long vdevice_share) { int sts; char *img_buf; @@ -1245,7 +1251,7 @@ sts = xc_linux_build_internal(xc_handle, domid, img_buf, img_len, &initrd_info, cmdline, features, flags, store_evtchn, store_mfn, - console_evtchn, console_mfn); + console_evtchn, console_mfn, vdevice_share); out: /* The inflation routines may pass back the same buffer so be */ @@ -1270,7 +1276,8 @@ unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn) + unsigned long *console_mfn, + unsigned long vdevice_share) { char *image = NULL; unsigned long image_size; @@ -1302,7 +1309,7 @@ sts = 
xc_linux_build_internal(xc_handle, domid, image, image_size, &initrd_info, cmdline, features, flags, store_evtchn, store_mfn, - console_evtchn, console_mfn); + console_evtchn, console_mfn, vdevice_share); error_out: free(image); diff -r cddc595b285b tools/libxc/xc_linux_restore.c --- a/tools/libxc/xc_linux_restore.c Thu Jun 1 23:26:37 2006 +++ b/tools/libxc/xc_linux_restore.c Fri Jun 2 09:27:39 2006 @@ -105,7 +105,8 @@ int xc_linux_restore(int xc_handle, int io_fd, uint32_t dom, unsigned long nr_pfns, unsigned int store_evtchn, unsigned long *store_mfn, - unsigned int console_evtchn, unsigned long *console_mfn) + unsigned int console_evtchn, unsigned long *console_mfn, + unsigned long vdevice_share) { DECLARE_DOM0_OP; int rc = 1, i, n; @@ -518,6 +519,7 @@ start_info->store_evtchn = store_evtchn; *console_mfn = start_info->console_mfn = p2m[start_info->console_mfn]; start_info->console_evtchn = console_evtchn; + start_info->vdevice_share = vdevice_share; munmap(start_info, PAGE_SIZE); /* Uncanonicalise each GDT frame number. 
*/ diff -r cddc595b285b tools/libxc/xenguest.h --- a/tools/libxc/xenguest.h Thu Jun 1 23:26:37 2006 +++ b/tools/libxc/xenguest.h Fri Jun 2 09:27:39 2006 @@ -40,7 +40,7 @@ int xc_linux_restore(int xc_handle, int io_fd, uint32_t dom, unsigned long nr_pfns, unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn); + unsigned long *console_mfn, unsigned long vdevice_share); /** * This function will create a domain for a paravirtualized Linux @@ -68,7 +68,8 @@ unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn); + unsigned long *console_mfn, + unsigned long vdevice_share); /** * This function will create a domain for a paravirtualized Linux @@ -100,7 +101,8 @@ unsigned int store_evtchn, unsigned long *store_mfn, unsigned int console_evtchn, - unsigned long *console_mfn); + unsigned long *console_mfn, + unsigned long vdevice_share); int xc_hvm_build(int xc_handle, uint32_t domid, diff -r cddc595b285b tools/python/xen/lowlevel/xc/xc.c --- a/tools/python/xen/lowlevel/xc/xc.c Thu Jun 1 23:26:37 2006 +++ b/tools/python/xen/lowlevel/xc/xc.c Fri Jun 2 09:27:39 2006 @@ -331,25 +331,26 @@ int store_evtchn, console_evtchn; unsigned long store_mfn = 0; unsigned long console_mfn = 0; + unsigned long vdevice_share = 0; static char *kwd_list[] = { "dom", "store_evtchn", "console_evtchn", "image", /* optional */ "ramdisk", "cmdline", "flags", - "features", NULL }; - - if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiis|ssis", kwd_list, + "features", "vdevice_share", NULL }; + + if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiis|ssisi", kwd_list, &dom, &store_evtchn, &console_evtchn, &image, /* optional */ &ramdisk, &cmdline, &flags, - &features) ) + &features, &vdevice_share) ) return NULL; if ( xc_linux_build(self->xc_handle, dom, image, ramdisk, cmdline, features, flags, store_evtchn, &store_mfn, - console_evtchn, &console_mfn) != 0 ) { + console_evtchn, &console_mfn, 
vdevice_share) != 0 ) { if (!errno) errno = EINVAL; return PyErr_SetFromErrno(xc_error); @@ -671,6 +672,38 @@ "cc_compile_date", xen_cc.compile_date); } +static PyObject *pyxc_create_shared_pages(XcObject *self, + PyObject *args, + PyObject *kwds) +{ + unsigned long mfn; + int num; + static char *kwd_list[] = { "num", NULL }; + + if( !PyArg_ParseTupleAndKeywords(args, kwds, "i", kwd_list, &num) ) + return NULL; + + mfn = xc_create_shared_pages(self->xc_handle, num); + return Py_BuildValue("l", mfn); +} + +static PyObject *pyxc_grant_shared_pages(XcObject *self, + PyObject *args, + PyObject *kwds) +{ + unsigned long mfn; + int dom; + static char *kwd_list[] = { "dom", "mfn", NULL }; + + if( !PyArg_ParseTupleAndKeywords(args, kwds, "il", kwd_list, &dom, &mfn) ) + return NULL; + + if (xc_grant_shared_pages(self->xc_handle, dom, mfn) != 0) + return PyErr_SetFromErrno(xc_error); + + Py_INCREF(zero); + return zero; +} static PyObject *pyxc_sedf_domain_set(XcObject *self, PyObject *args, @@ -1109,6 +1142,21 @@ " dom [int]: Domain whose port space to allocate from.\n" " remote_dom [int]: Remote domain to accept connections from.\n\n" "Returns: [int] Unbound event-channel port.\n" }, + + { "create_shared_pages", + (PyCFunction)pyxc_create_shared_pages, + METH_KEYWORDS, "\n" + "Allocate one or more shared pages.\n" + " num_pages [int]: Number of pages to allocate.\n" + "Returns: [int] mfn of (first) shared page\n" }, + + { "grant_shared_pages", + (PyCFunction)pyxc_grant_shared_pages, + METH_VARARGS | METH_KEYWORDS, "\n" + "Grant a domain access to shared pages.\n" + " dom [int]: Domain.\n" + " mfn [int]: Shared page mfn.\n" + "Returns: 0 on success\n" }, { "evtchn_status", (PyCFunction)pyxc_evtchn_status, diff -r cddc595b285b tools/python/xen/xend/XendDomainInfo.py --- a/tools/python/xen/xend/XendDomainInfo.py Thu Jun 1 23:26:37 2006 +++ b/tools/python/xen/xend/XendDomainInfo.py Fri Jun 2 09:27:39 2006 @@ -452,6 +452,7 @@ self.store_mfn = None self.console_port = None self.console_mfn
= None + self.vdevice_share = None self.vmWatch = None self.shutdownWatch = None @@ -730,6 +731,7 @@ f('console/ring-ref', self.console_mfn) f('store/port', self.store_port) f('store/ring-ref', self.store_mfn) + f('vdevice-share', self.vdevice_share) to_store.update(self.vcpuDomDetails()) @@ -785,6 +787,8 @@ """For use only by image.py and XendCheckpoint.py.""" return self.store_port + def getVdevicePage(self): + return self.vdevice_share def getConsolePort(self): """For use only by image.py and XendCheckpoint.py""" @@ -1285,6 +1289,7 @@ 0, 0) self.createChannels() + self.createVdevicePage() channel_details = self.image.createImage() @@ -1444,6 +1449,11 @@ log.exception("Exception in alloc_unbound(%d)", self.domid) raise + def createVdevicePage(self): + """Create a shared Vdevice page for a domain + """ + self.vdevice_share = xc.create_shared_pages(1) + xc.grant_shared_pages(dom=self.domid, mfn=self.vdevice_share) ## public: diff -r cddc595b285b tools/python/xen/xend/image.py --- a/tools/python/xen/xend/image.py Thu Jun 1 23:26:37 2006 +++ b/tools/python/xen/xend/image.py Fri Jun 2 09:27:39 2006 @@ -171,13 +171,15 @@ ostype = "linux" def buildDomain(self): - store_evtchn = self.vm.getStorePort() + store_evtchn = self.vm.getStorePort() console_evtchn = self.vm.getConsolePort() + vdevice_share = self.vm.getVdevicePage() log.debug("dom = %d", self.vm.getDomid()) log.debug("image = %s", self.kernel) log.debug("store_evtchn = %d", store_evtchn) log.debug("console_evtchn = %d", console_evtchn) + log.debug("vdevice_mfn = %#lx", vdevice_share) log.debug("cmdline = %s", self.cmdline) log.debug("ramdisk = %s", self.ramdisk) log.debug("vcpus = %d", self.vm.getVCpuCount()) @@ -189,7 +191,8 @@ console_evtchn = console_evtchn, cmdline = self.cmdline, ramdisk = self.ramdisk, - features = self.vm.getFeatures()) + features = self.vm.getFeatures(), + vdevice_share = vdevice_share) class HVMImageHandler(ImageHandler): diff -r cddc595b285b tools/xcutils/xc_restore.c ---
a/tools/xcutils/xc_restore.c Thu Jun 1 23:26:37 2006 +++ b/tools/xcutils/xc_restore.c Fri Jun 2 09:27:39 2006 @@ -19,11 +19,11 @@ { unsigned int xc_fd, io_fd, domid, nr_pfns, store_evtchn, console_evtchn; int ret; - unsigned long store_mfn, console_mfn; + unsigned long store_mfn, console_mfn, vdevice_share; - if (argc != 7) + if (argc != 8) errx(1, - "usage: %s xcfd iofd domid nr_pfns store_evtchn console_evtchn", + "usage: %s xcfd iofd domid nr_pfns store_evtchn console_evtchn vdevice_share", argv[0]); xc_fd = atoi(argv[1]); @@ -32,9 +32,10 @@ nr_pfns = atoi(argv[4]); store_evtchn = atoi(argv[5]); console_evtchn = atoi(argv[6]); + vdevice_share = atoi(argv[7]); ret = xc_linux_restore(xc_fd, io_fd, domid, nr_pfns, store_evtchn, - &store_mfn, console_evtchn, &console_mfn); + &store_mfn, console_evtchn, &console_mfn, vdevice_share); if (ret == 0) { printf("store-mfn %li\n", store_mfn); printf("console-mfn %li\n", console_mfn); diff -r cddc595b285b linux-2.6-xen-sparse/include/xen/public/io/vdevice.h --- /dev/null Thu Jun 1 23:26:37 2006 +++ b/xen/include/public/io/vdevice.h Fri Jun 2 09:27:39 2006 @@ -0,0 +1,23 @@ +#ifndef __XEN_PUBLIC_IO_VDEVICE_H__ +#define __XEN_PUBLIC_IO_VDEVICE_H__ + +/* These status bits are generic. 256 and above is device specific. */ +#define VDEVICE_S_ACKNOWLEDGE 1 /* We have seen device. */ +#define VDEVICE_S_MAPPED 2 /* We have mapped device OK */ +#define VDEVICE_S_DRIVER 4 /* We have found a driver */ +#define VDEVICE_S_DRIVER_OK 8 /* Driver says OK! */ +#define VDEVICE_S_FAILED 128 /* Something actually failed */ + +struct vdevice_id { + uint32_t type; + uint32_t features; +}; + +/* We have a page of these descriptors in the vdevice page. 
*/ +struct vdevice_desc { + struct vdevice_id id; + uint32_t nr_pages; + uint32_t status; + uint64_t shared_ref; +}; +#endif /* __XEN_PUBLIC_IO_VDEVICE_H__ */
Subject: Linux support for vdevice bus This patch provides the Linux implementation of the vdevice bus. FIXME: currently it does not support save/restore of the domain: it should call stop before shutting down, and remap shares afterwards before calling reconnect. This depends on exactly what we do with shared pages on restore. diff -r 520f3bf7d3f0 linux-2.6-xen-sparse/drivers/xen/Makefile --- a/linux-2.6-xen-sparse/drivers/xen/Makefile Fri Jun 2 05:22:39 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/Makefile Fri Jun 2 17:04:48 2006 @@ -8,6 +8,7 @@ obj-y += balloon/ obj-y += privcmd/ obj-y += xenbus/ +obj-y += vdevice/ obj-y += xenshare.o obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ diff -r 520f3bf7d3f0 linux-2.6-xen-sparse/drivers/xen/vdevice/Makefile --- /dev/null Fri Jun 2 05:22:39 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/vdevice/Makefile Fri Jun 2 17:04:48 2006 @@ -0,0 +1,1 @@ +obj-y := vdevice.o diff -r 520f3bf7d3f0 linux-2.6-xen-sparse/drivers/xen/vdevice/vdevice.c --- /dev/null Fri Jun 2 05:22:39 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/vdevice/vdevice.c Fri Jun 2 17:04:48 2006 @@ -0,0 +1,286 @@ +#define DEBUG + +#include <linux/init.h> +#include <linux/err.h> +#include <linux/kernel.h> +#include <linux/device.h> +#include <linux/vdevice.h> +#include <linux/page-flags.h> +#include <linux/vmalloc.h> +#include <linux/workqueue.h> +#include <xen/evtchn.h> +#include <asm/page.h> +#include <xen/interface/share.h> + +static struct work_struct vdevice_add; +static struct xen_share *vdevice_share; +static struct vdevice_desc *vdevices; +static int vdevice_change_counter = 1; +static struct device **devices_installed; + +static ssize_t show_ref(struct bus_type *bus, char *buf) +{ + return sprintf(buf, "0x%lx\n", xen_start_info->vdevice_share); +} +static BUS_ATTR(share_ref, 0444, show_ref, NULL); + +static ssize_t type_show(struct device *_dev, + struct device_attribute *attr, char *buf) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + return 
sprintf(buf, "%i", dev->id.type); +} +static ssize_t features_show(struct device *_dev, + struct device_attribute *attr, char *buf) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + return sprintf(buf, "%i", dev->id.features); +} +static ssize_t share_ref_show(struct device *_dev, + struct device_attribute *attr, char *buf) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + return sprintf(buf, "%li", + (long)vdevices[dev->vdevice_index].shared_ref); +} +static ssize_t status_show(struct device *_dev, + struct device_attribute *attr, char *buf) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + return sprintf(buf, "%i", vdevices[dev->vdevice_index].status); +} +static ssize_t status_store(struct device *_dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + if (sscanf(buf, "%i", &vdevices[dev->vdevice_index].status) != 1) + return -EINVAL; + return count; +} +static struct device_attribute vdevice_dev_attrs[] = { + __ATTR_RO(type), + __ATTR_RO(features), + __ATTR_RO(share_ref), + __ATTR(status, 0644, status_show, status_store), + __ATTR_NULL +}; + +static int vdevice_match(struct device *_dev, struct device_driver *_drv) +{ + const struct vdevice_id *i; + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + struct vdevice_driver *drv = container_of(_drv, struct vdevice_driver, + driver); + + for (i = drv->ids; i->type != 0; i++) { + if (dev->id.type == i->type && + (dev->id.features & i->features) == i->features) + return 1; + } + return 0; +} + +struct vdevice_bus { + struct bus_type bus; + struct vdevice dev; +}; + +static struct vdevice_bus vd_bus = { + .bus = { + .name = "vdevice", + .match = vdevice_match, + .dev_attrs = vdevice_dev_attrs, + }, + .dev.dev = { + .parent = NULL, + .bus_id = "vdevice", + } +}; + +static int vdevice_dev_probe(struct device *_dev) +{ + int ret; + struct vdevice *dev = 
container_of(_dev, struct vdevice, dev); + struct vdevice_driver *drv = container_of(dev->dev.driver, + struct vdevice_driver, driver); + struct vdevice_desc *me = &vdevices[dev->vdevice_index]; + + me->status |= VDEVICE_S_DRIVER; + + /* We only set this up when we actually probe, as userspace + * drivers don''t want this. Previous probe might have failed, + * so we could already have it mapped. */ + if (!dev->share) { + dev->share = xen_share_get(me->shared_ref, me->nr_pages); + if (IS_ERR(dev->share)) { + printk(KERN_ERR + "vdevice: failed mapping %u@%li for %i/%i\n", + me->nr_pages, (long)me->shared_ref, + dev->id.type, dev->id.features); + me->status |= VDEVICE_S_FAILED; + ret = PTR_ERR(dev->share); + dev->share = NULL; + return ret; + } + me->status |= VDEVICE_S_MAPPED; + } + + ret = drv->probe(dev, &dev->id); + if (ret == 0) + me->status |= VDEVICE_S_DRIVER_OK; + return ret; +} + +static int vdevice_dev_remove(struct device *_dev) +{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + struct vdevice_driver *drv = container_of(dev->dev.driver, + struct vdevice_driver, driver); + + if (drv && drv->remove) + drv->remove(dev); + if (dev->share) + xen_share_put(dev->share); + put_device(_dev); + return 0; +} + +int register_vdevice_driver(struct vdevice_driver *drv) +{ + drv->driver.bus = &vd_bus.bus; + drv->driver.name = drv->name; + drv->driver.owner = drv->owner; + drv->driver.probe = vdevice_dev_probe; + drv->driver.remove = vdevice_dev_remove; + + return driver_register(&drv->driver); +} +EXPORT_SYMBOL_GPL(register_vdevice_driver); + +void unregister_vdevice_driver(struct vdevice_driver *drv) +{ + driver_unregister(&drv->driver); +} +EXPORT_SYMBOL_GPL(unregister_vdevice_driver); + +static share_ref_t new_shared_page(void) +{ + dom0_op_t op = { .cmd = DOM0_CREATESHAREDPAGES, + .interface_version = DOM0_INTERFACE_VERSION, + .u.createsharedpages.num = 1 }; + + return HYPERVISOR_dom0_op(&op); +} + +static void release_vdevice(struct device *_dev) 
+{ + struct vdevice *dev = container_of(_dev, struct vdevice, dev); + + devices_installed[dev->vdevice_index] = NULL; + kfree(dev); +} + +static void add_vdevice(unsigned int num) +{ + struct vdevice *new; + + vdevices[num].status = VDEVICE_S_ACKNOWLEDGE; + new = kmalloc(sizeof(struct vdevice), GFP_KERNEL); + if (!new) { + printk(KERN_EMERG "Could not allocate vdevice %u\n", num); + vdevices[num].status |= VDEVICE_S_FAILED; + return; + } + + new->vdevice_index = num; + new->id = vdevices[num].id; + new->private = NULL; + memset(&new->dev, 0, sizeof(new->dev)); + new->dev.parent = &vd_bus.dev.dev; + new->dev.bus = &vd_bus.bus; + new->dev.release = release_vdevice; + sprintf(new->dev.bus_id, "%u", num); + new->share = NULL; + if (device_register(&new->dev) != 0) { + printk(KERN_EMERG "Could not register vdevice %u\n", num); + vdevices[num].status |= VDEVICE_S_FAILED; + kfree(new); + } + + devices_installed[num] = &new->dev; +} + +static void vdevice_work(void *unused) +{ + unsigned int i; + + /* Something changed: look for differences. */ + for (i = 0; i < PAGE_SIZE / sizeof(struct vdevice_desc); i++) { + char name[20]; + struct device *dev; + + sprintf(name, "%i", i); + dev = devices_installed[i]; + if (vdevices[i].id.type != 0 && !dev) + add_vdevice(i); + else if (dev && vdevices[i].id.type == 0) + device_unregister(dev); + } + + /* Re-arm trigger */ + vdevice_change_counter = 1; + + /* Acknowledge. */ + HYPERVISOR_share(XEN_SHARE_trigger, xen_start_info->vdevice_share, + 0, 0, 0); +} + +static void vdevice_handler(struct xen_share_handler *h) +{ + schedule_work(&vdevice_add); +} +static struct xen_share_handler handler = { + .handler = vdevice_handler, +}; + +static int __init vdevice_init(void) +{ + int err; + + if (!xen_start_info->vdevice_share) { + /* We could be dom0, in which case we can create it. 
*/ + xen_start_info->vdevice_share = new_shared_page(); + if (IS_ERR_VALUE(xen_start_info->vdevice_share)) { + printk(KERN_INFO "Vdevice bus not found\n"); + xen_start_info->vdevice_share = 0; + return 0; + } + } + printk(KERN_INFO "vdevice bus found at 0x%lx\n", xen_start_info->vdevice_share); + + vdevice_share = xen_share_get(xen_start_info->vdevice_share, 1); + BUG_ON(IS_ERR(vdevice_share)); + vdevices = vdevice_share->addr; + + /* Allocate space for the same number of devices as can fit + * on the vdevices page */ + devices_installed = kcalloc(sizeof(struct device*), + PAGE_SIZE / sizeof(struct vdevice_desc), + GFP_KERNEL); + BUG_ON(!devices_installed); + + bus_register(&vd_bus.bus); + device_register(&vd_bus.dev.dev); + bus_create_file(&vd_bus.bus, &bus_attr_share_ref); + + /* Scan bus once for existing devices before setting up interrupt */ + vdevice_work(NULL); + + INIT_WORK(&vdevice_add, vdevice_work, NULL); + xen_share_add_handler(vdevice_share, &handler); + err = xen_share_watch(vdevice_share, 1, &vdevice_change_counter); + BUG_ON(err<0); + + return 0; +} +postcore_initcall(vdevice_init); diff -r 520f3bf7d3f0 linux-2.6-xen-sparse/include/linux/vdevice.h --- /dev/null Fri Jun 2 05:22:39 2006 +++ b/linux-2.6-xen-sparse/include/linux/vdevice.h Fri Jun 2 17:04:48 2006 @@ -0,0 +1,40 @@ +#ifndef _LINUX_VDEVICE_H_ +#define _LINUX_VDEVICE_H_ + +#include <linux/device.h> +#include <xen/interface/io/vdevice.h> +#include <xen/interface/share.h> +#include <asm/share.h> + +struct vdevice { + /* Unique busid */ + int vdevice_index; + + /* Shared region for this device. */ + struct xen_share *share; + + struct device dev; + struct vdevice_id id; + + /* Driver can hang data off here. 
*/ + void *private; +}; + +struct vdevice_driver { + /* I can drive the following type of device(s) */ + char *name; + struct module *owner; + const struct vdevice_id *ids; + int (*probe)(struct vdevice *dev, const struct vdevice_id *id); + void (*remove)(struct vdevice *dev); + + void (*stop)(struct vdevice *dev); + int (*reconnect)(struct vdevice *dev); + + struct device_driver driver; +}; + +extern int register_vdevice_driver(struct vdevice_driver *drv); +extern void unregister_vdevice_driver(struct vdevice_driver *drv); + +#endif /* _LINUX_VDEVICE_H_ */ -- ccontrol: http://ccontrol.ozlabs.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
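The bus-matching rule in vdevice_match() above (types equal, and every feature bit the driver requires present on the device) can be sketched standalone. This is a minimal userspace sketch: the struct here is a cut-down stand-in for the patch's vdevice_id, keeping only the two fields the match inspects.

```c
#include <stdint.h>

/* Cut-down stand-in for the patch's vdevice_id: just the two
 * fields vdevice_match() inspects. */
struct vdevice_id {
	uint32_t type;		/* 0 terminates a driver's id table */
	uint32_t features;	/* feature bitmask */
};

/* A driver claims a device when the types are equal and every
 * feature bit the driver requires is set on the device:
 * (dev->features & drv->features) == drv->features. */
int vdevice_id_match(const struct vdevice_id *dev,
		     const struct vdevice_id *drv)
{
	return dev->type == drv->type &&
	       (dev->features & drv->features) == drv->features;
}
```

Note the asymmetry: a device may offer more features than the driver needs, but never fewer.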
Rusty Russell
2006-Jun-06 05:55 UTC
[Xen-devel] [PATCH 7/9] vdevice tool for manipulating vdevice bus
Subject: vdevice tool for manipulating vdevice bus This creates a simple C tool for listing, adding and removing vdevices. This is a demonstration (although, in my opinion, an important one, because it shows how simple this is). This functionality should also be added to the Python tools. diff -r 570d423310b5 .hgignore --- a/.hgignore Fri Jun 2 07:05:28 2006 +++ b/.hgignore Mon Jun 5 14:26:28 2006 @@ -146,6 +146,7 @@ ^tools/security/secpol_tool$ ^tools/security/xen/.*$ ^tools/tests/test_x86_emulator$ +^tools/vdevice/vdevice$ ^tools/vnet/gc$ ^tools/vnet/gc.*/.*$ ^tools/vnet/vnet-module/.*\.ko$ diff -r 570d423310b5 tools/Makefile --- a/tools/Makefile Fri Jun 2 07:05:28 2006 +++ b/tools/Makefile Mon Jun 5 14:26:28 2006 @@ -13,6 +13,7 @@ SUBDIRS += console SUBDIRS += xenmon SUBDIRS += guest-headers +SUBDIRS += vdevice ifeq ($(VTPM_TOOLS),y) SUBDIRS += vtpm_manager SUBDIRS += vtpm diff -r 570d423310b5 tools/vdevice/Makefile --- /dev/null Fri Jun 2 07:05:28 2006 +++ b/tools/vdevice/Makefile Mon Jun 5 14:26:28 2006 @@ -0,0 +1,34 @@ +XEN_ROOT=../.. +include $(XEN_ROOT)/tools/Rules.mk + +INSTALL = install +INSTALL_DATA = $(INSTALL) -m0644 +INSTALL_PROG = $(INSTALL) -m0755 +INSTALL_DIR = $(INSTALL) -d -m0755 + +PROFILE=#-pg +BASECFLAGS=-Wall -g -Werror +# Make gcc generate dependencies. +BASECFLAGS += -Wp,-MD,.$(@F).d +PROG_DEP = .*.d +BASECFLAGS+= -O3 $(PROFILE) +BASECFLAGS+= -I$(XEN_ROOT)/tools/libxc +BASECFLAGS+= -I$(XEN_ROOT)/tools/xenstore +BASECFLAGS+= -I. 
+ +CFLAGS += $(BASECFLAGS) +LDFLAGS += $(PROFILE) -L$(XEN_LIBXC) -L$(XEN_XENSTORE) + +all: vdevice + +clean: + rm -f vdevice *.o .*.d + +vdevice: vdevice.o + $(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -lxenctrl -lxenstore -o $@ + +install: vdevice + $(INSTALL_DIR) -p $(DESTDIR)/usr/sbin + $(INSTALL_PROG) vdevice $(DESTDIR)/usr/sbin + +-include $(PROG_DEP) diff -r 570d423310b5 tools/vdevice/vdevice.c --- /dev/null Fri Jun 2 07:05:28 2006 +++ b/tools/vdevice/vdevice.c Mon Jun 5 14:26:28 2006 @@ -0,0 +1,571 @@ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <err.h> +#include <fcntl.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <stdint.h> +#include <stdbool.h> +#include <assert.h> +#include <errno.h> +#include <stdint.h> +#include <sys/ioctl.h> +#include <net/ethernet.h> + +#include <xs.h> + +#include <xen/xen.h> +#include <xen/share.h> +#include <xen/linux/xenshare.h> +#include <xen/event_channel.h> +#include <xen/linux/privcmd.h> +#include <xen/io/vdevice.h> + +#include <xc_private.h> + +#define PROGRAM_NAME "vdevice" + +static int xc_fd; + +#define __unused __attribute__((unused)) + +/* FIXME: Move to xenctrl library */ +static int HYPERVISOR_share(int cmd, int arg1, int arg2, int arg3, int arg4) +{ + privcmd_hypercall_t privcmd; + + privcmd.op = __HYPERVISOR_share_op; + privcmd.arg[0] = cmd; + privcmd.arg[1] = arg1; + privcmd.arg[2] = arg2; + privcmd.arg[3] = arg3; + privcmd.arg[4] = arg4; + + return do_xen_hypercall(xc_fd, &privcmd); +} + +/* FIXME: Move to xenctrl library */ +static int add_grant(share_ref_t ref, domid_t dom) +{ + dom0_op_t op = { .cmd = DOM0_GRANTSHAREDPAGES, + .interface_version = DOM0_INTERFACE_VERSION, + .u.grantsharedpages.share_ref = ref, + .u.grantsharedpages.domain = dom }; + + /* FIXME: Skip domain 0 as it will always have access */ + if (dom == 0) + return 0; + + return do_dom0_op(xc_fd, &op); +} + +static void *map_pages(share_ref_t share_ref, unsigned int num_pages, + unsigned int 
*peer_id) +{ + struct xenshare_get_share shareget; + int shareiofd, ret; + void *sharepage; + + shareiofd = open("/dev/xenshare", O_RDWR); + if (shareiofd < 0) + err(1, "Could not open ''%s''", "/dev/xenshare"); + + shareget.share_ref = share_ref; + shareget.num_pages = num_pages; + ret = ioctl(shareiofd, IOCTL_XENSHARE_GET_SHARE, &shareget); + if (ret < 0) + err(1, "Getting shared pages gave %i", ret); + *peer_id = ret; + + /* Map shared page */ + sharepage = mmap(NULL, num_pages*getpagesize(), PROT_READ|PROT_WRITE, + MAP_SHARED, shareiofd, + XENSHARE_MAP_SHARE_PAGE * getpagesize()); + if (sharepage == MAP_FAILED) + err(1, "Failed to map shared page"); + + return sharepage; +} + +/* Munmap addr, let the xenshare interface clean up evtchns etc */ +static int unmap_pages(void *addr) +{ + int err; + + err = munmap((void *)addr, PAGE_SIZE); + if (err < 0) { + fprintf(stderr, "Failed to munmap() (%i,%i)\n", err, -errno); + return -errno; + } + + return 0; +} + +/* FIXME: Move to xenctrl library */ +static share_ref_t create_shared_pages(int num_pages, unsigned int *peer_id) +{ + share_ref_t share_ref; + int err; + void *addr; + + dom0_op_t op = { .cmd = DOM0_CREATESHAREDPAGES, + .interface_version = DOM0_INTERFACE_VERSION, + .u.createsharedpages.num = num_pages }; + + err = do_dom0_op(xc_fd, &op); + if (err < 0) + return 0; + + printf("Create page returned 0x%x\n", err); + + /* Save the share_ref */ + share_ref = err; + + /* Clear the page */ + addr = map_pages(share_ref, num_pages, peer_id); + memset(addr, 0, num_pages * getpagesize()); + unmap_pages(addr); + + return share_ref; +} + +static uint64_t get_domain_shared_ref(struct xs_handle *h, domid_t domid) +{ + unsigned int len; + unsigned long long share_ref; + char key[512]; + char *val, *endp; + + sprintf(key, "/local/domain/%i/vdevice-share", domid); + val = xs_read(h, 0, key, &len); + + if (val == NULL) + return DOMID_FIRST_RESERVED; + share_ref = strtoull(val, &endp, 0); + if (endp == val || *endp) { + errno = 
EINVAL; + free(val); + return DOMID_FIRST_RESERVED; + } + free(val); + return share_ref; +} + +/* Get dom0 vdevice_share from /sys/bus/vdevice/share_ref */ +static uint64_t get_dom0_shared_ref(void) +{ + FILE *f; + unsigned long long share_ref; + + f = fopen("/sys/bus/vdevice/share_ref", "r"); + if (!f) { + return DOMID_FIRST_RESERVED; + } + if (fscanf(f, "%llx", &share_ref) != 1) { + errno = EINVAL; + return DOMID_FIRST_RESERVED; + } + fclose(f); + return share_ref; +} + +struct vdevice_type +{ + /* Name of this device */ + const char *name; + + /* Number of pages to create for it. */ + unsigned int num_pages; + + /* Type number of this device. */ + uint32_t type; + + /* Features when creating a new one of these */ + uint32_t features; + + /* --create. Returns num args consumed. */ + int (*create)(struct vdevice_type *, + share_ref_t ref, void *map, int argc, char *argv[]); + + /* List info about this vdevice. */ + void (*list)(struct vdevice_type *, const struct vdevice_desc *vdesc); +}; + +/* Volatile is important: someone else changes it. */ +static uint32_t get_status(volatile struct vdevice_desc *vdevice) +{ + return vdevice->status; +} + +/* Returns the vdevice reference for this domain. 
*/ +static share_ref_t vdevice_ref_for_domain(domid_t domid) +{ + share_ref_t vdevice_ref; + + if (domid == 0) + vdevice_ref = get_dom0_shared_ref(); + else { + int saved_errno; + struct xs_handle *xsh = xs_daemon_open(); + if (!xsh) { + warn("Could not talk to xenstored"); + return DOMID_FIRST_RESERVED; + } + vdevice_ref = get_domain_shared_ref(xsh, domid); + saved_errno = errno; + xs_daemon_close(xsh); + errno = saved_errno; + } + return vdevice_ref; +} + +static bool add_vdevice_entry(const char *domain, + uint32_t type, uint32_t features, + unsigned int num_pages, share_ref_t share_ref, + uint32_t status_flags) +{ + struct vdevice_desc *vdevices; + unsigned int i, peer_id; + uint32_t status; + share_ref_t vdevice_ref; + long domid; + char *endp; + + domid = strtol(domain, &endp, 0); + if (domid >= DOMID_FIRST_RESERVED || endp == domain || *endp != '\0') { + warn("Invalid domain id '%s'", domain); + return false; + } + + vdevice_ref = vdevice_ref_for_domain(domid); + if (vdevice_ref == DOMID_FIRST_RESERVED) { + warnx("Could not find vdevice page for domain %li", domid); + return false; + } + + /* There is always exactly 1 page for vdevices */ + vdevices = map_pages(vdevice_ref, 1, &peer_id); + if (!vdevices) { + warn("Could not access vdevice page %#llx for domain %li", + (long long)vdevice_ref, domid); + return false; + } + + for (i = 0; vdevices[i].id.type; i++) { + if (i == (PAGE_SIZE / sizeof(struct vdevice_desc)) - 1) { + warnx("Vdevice page for domain %li is full", domid); + unmap_pages(vdevices); + return false; + } + } + + if (add_grant(share_ref, domid) != 0) { + warn("Could not grant domain %li access to device", domid); + unmap_pages(vdevices); + return false; + } + + vdevices[i].id.type = type; + vdevices[i].id.features = features; + vdevices[i].nr_pages = num_pages; + vdevices[i].shared_ref = share_ref; + vdevices[i].status = 0; + + /* FIXME: magic "1" */ + HYPERVISOR_share(XEN_SHARE_trigger, vdevice_ref, 1, 0, 0); + + /* FIXME: Use 
/dev/xenshare, rather than spinning. Timeout. */ + do { + status = get_status(&vdevices[i]); + sleep(1); + } while ((status & (VDEVICE_S_FAILED|status_flags)) == 0); + + if (status & VDEVICE_S_FAILED) { + warnx("Adding device %i to domain %li failed: status %#08x", + i, domid, status); + /* if add_device failed, the shared page is destroyed */ + vdevices[i].id.type = 0; + unmap_pages(vdevices); + return false; + } + unmap_pages(vdevices); + return true; +} + +static void remove_vdevice_entry(share_ref_t vdevice_ref, + struct vdevice_type *type, + share_ref_t share_ref) +{ + struct vdevice_desc *vdevices; + unsigned int i, peer_id; + + vdevices = map_pages(vdevice_ref, 1, &peer_id); + if (!vdevices) { + warn("Could not access vdevice page"); + return; + } + + for (i = 0; vdevices[i].shared_ref != share_ref; i++) { + if (i == (PAGE_SIZE / sizeof(struct vdevice_desc)) - 1) { + warnx("Could not find device %s (%li) in vdevice page", + type->name, share_ref); + return; + } + } + + /* FIXME: report the domid we're talking about! */ + if (vdevices[i].id.type != type->type) { + warnx("Vdevice %i using shared ref %li" + " has wrong type: %i", + i, share_ref, vdevices[i].id.type); + return; + } + memset(&vdevices[i], 0, sizeof(vdevices[i])); + + HYPERVISOR_share(XEN_SHARE_trigger, vdevice_ref, 1, 0, 0); + /* FIXME: wait for ack! */ +} + +/* FIXME: some callers need to recover, not exit if this fails... 
*/ +static share_ref_t domid_arg(const char *arg) +{ + unsigned long domain; + char *endp; + share_ref_t vdevice_ref; + + domain = strtol(arg, &endp, 0); + if (strlen(arg) == 0 || *endp != ''\0'') + errx(1, "Invalid domain id ''%s''", arg); + + vdevice_ref = vdevice_ref_for_domain(domain); + if (vdevice_ref == DOMID_FIRST_RESERVED) + err(1, "Cannot find vdevice page for domain ''%s''", arg); + return vdevice_ref; +} + +static struct vdevice_type types[] = { +}; + +#define ARRAY_SIZE(a) (sizeof(a)/sizeof(a[0])) +static struct vdevice_type *find_type(const char *type) +{ + unsigned int i; + for (i = 0; i < ARRAY_SIZE(types); i++) + if (!strcmp(types[i].name, type)) + return &types[i]; + return NULL; +} +static struct vdevice_type *find_type_err(const char *type) +{ + if (!find_type(type)) + errx(1, "unknown type ''%s''", type); + return find_type(type); +} +static struct vdevice_type *find_type_number(unsigned int num) +{ + unsigned int i; + for (i = 0; i < ARRAY_SIZE(types); i++) + if (num == types[i].type) + return &types[i]; + return NULL; +} + +static void usage(void) +{ + unsigned int i; + fprintf(stderr, "Usage:\n" + "\t%s --create <type> ...\n" + "\t%s --add <type> <share_ref> <domid>\n" + "\t%s --remove <type> <share_ref> <domid>\n" + "\t%s --delete <type> <share_ref> ...\n" + "\t%s --list <domid>\n", + PROGRAM_NAME, PROGRAM_NAME, PROGRAM_NAME, PROGRAM_NAME, + PROGRAM_NAME); + fprintf(stderr, "Available types:"); + for (i = 0; i < ARRAY_SIZE(types); i++) + fprintf(stderr, " %s", types[i].name); + fprintf(stderr, "\n"); + exit(1); +} + +static void list_devices(share_ref_t vdevice_ref) +{ + unsigned int i, peer_id; + struct vdevice_desc *vdevices; + + vdevices = map_pages(vdevice_ref, 1, &peer_id); + if (!vdevices) + err(1, "Could not access vdevice page"); + + for (i = 0; i < PAGE_SIZE / sizeof(struct vdevice_desc); i++) { + struct vdevice_type *type; + + if (!vdevices[i].id.type) + continue; + + type = find_type_number(vdevices[i].id.type); + printf("Device 
%i: %s %#x share=%#llx", i, + type ? type->name : "(unknown)", + vdevices[i].status, + (unsigned long long)vdevices[i].shared_ref); + if (type) + type->list(type, &vdevices[i]); + printf("\n"); + } +} + +static void destroy_share(share_ref_t share_ref) +{ + int olderr = errno; + dom0_op_t op = { .cmd = DOM0_DESTROYSHAREDPAGES, + .interface_version = DOM0_INTERFACE_VERSION, + .u.destroysharedpages.share_ref = share_ref }; + if (do_dom0_op(xc_fd, &op) != 0) + warn("Failed to destroy share"); + errno = olderr; +} + +/* Steal the number of pages from the command line if specified, + * otherwise use the default from the type defn. */ +static int get_num_pages(int *argc, char **argv, int num_pages) +{ + int i, j; + + for(i=0;i<*argc;i++) { + if (strcmp(argv[i], "--num_pages") == 0) { + if (i == *argc-1) + errx(1, "Specified num_pages at end of args"); + + num_pages = atoi(argv[i+1]); + if (num_pages <= 0) + errx(1, "%s is an invalid number of pages", + argv[i+1]); + + for(j=0;j+i+2<*argc;j++) { + argv[i+j] = argv[i+j+2]; + } + argv[(*argc)-1] = NULL; + argv[(*argc)-2] = NULL; + *argc -= 2; + break; + } + } + + return num_pages; +} + +static void create_device(struct vdevice_type *type, int argc, char *argv[]) +{ + unsigned int peer_id; + int argoff; + share_ref_t share_ref; + void *map; + int num_pages = get_num_pages(&argc, argv, type->num_pages); + + share_ref = create_shared_pages(num_pages, &peer_id); + if (share_ref == 0) + err(1, "Failed to create a new shared page!"); + + map = xc_map_foreign_range(xc_fd, DOMID_SELF, + PAGE_SIZE * num_pages, + PROT_READ|PROT_WRITE, share_ref); + if (!map) { + destroy_share(share_ref); + err(1, "Failed to map share %li", share_ref); + } + argoff = type->create(type, share_ref, map, argc, argv); + if (argoff < 0) { + destroy_share(share_ref); + exit(1); + } + argc -= argoff; + argv += argoff; + + while (argv[0]) { + add_vdevice_entry(argv[0], type->type, type->features, + num_pages, share_ref, VDEVICE_S_ACKNOWLEDGE); + argv++; + } 
+} + +static void add_device(struct vdevice_type *type, + share_ref_t share_ref, + const char *domain) +{ + /* FIXME: get nr_pages from vdesc? */ + if (!add_vdevice_entry(domain, type->type, type->features, + type->num_pages, share_ref, + VDEVICE_S_ACKNOWLEDGE)) + exit(1); +} + +static void delete_device(struct vdevice_type *type, share_ref_t share_ref, + int argc, char *argv[]) +{ + /* Remove domains, then destroy share. */ + while (argv[0]) { + remove_vdevice_entry(domid_arg(argv[0]), type, share_ref); + argv++; + } + + destroy_share(share_ref); +} + +static void remove_device(struct vdevice_type *type, + share_ref_t share_ref, + share_ref_t vdevices_ref) +{ + remove_vdevice_entry(vdevices_ref, type, share_ref); +} + +static uint64_t share_ref_arg(const char *arg) +{ + char *endp; + uint64_t share_ref = strtoull(arg, &endp, 0); + + if (*endp || endp == arg) + errx(1, "Invalid shared reference %s", arg); + return share_ref; +} + + +/* FIXME: Locking! what prevents 2 (or more) userspace apps clobbering each + * other's memory? 
*/ +int main(int argc, char *argv[]) +{ + if (argc < 2) + usage(); + + xc_fd = xc_interface_open(); + if (xc_fd < 0) + err(1, "Failed to open xc interface"); + + if (!strcmp(argv[1], "--list")) { + if (argc != 3) + usage(); + list_devices(domid_arg(argv[2])); + } else if (!strcmp(argv[1], "--create")) { + if (argc < 3) + usage(); + create_device(find_type_err(argv[2]), argc-3, argv+3); + } else if (!strcmp(argv[1], "--add")) { + if (argc != 5) + usage(); + add_device(find_type_err(argv[2]), share_ref_arg(argv[3]), + argv[4]); + } else if (!strcmp(argv[1], "--delete")) { + if (argc < 4) + usage(); + delete_device(find_type_err(argv[2]), share_ref_arg(argv[3]), + argc-4, argv+4); + } else if (!strcmp(argv[1], "--remove")) { + if (argc != 5) + usage(); + remove_device(find_type_err(argv[2]), share_ref_arg(argv[3]), + domid_arg(argv[4])); + } else + usage(); + return 0; +}
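The tool's add_vdevice_entry() above grows the per-domain descriptor table by scanning its single shared page for the first vdevice_desc whose type is zero, and refuses when the page is full. A minimal userspace sketch of that slot search follows; the struct layout is a simplified stand-in for the real vdevice_desc, and a 4 KiB page size is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096	/* assumed 4 KiB pages */

/* Cut-down stand-in for the patch's vdevice_desc: just the
 * fields the slot search touches. */
struct vdevice_desc {
	struct {
		uint32_t type;	/* 0 marks an empty slot */
		uint32_t features;
	} id;
	uint32_t nr_pages;
	uint64_t shared_ref;
	uint32_t status;
};

/* Return the index of the first empty slot (id.type == 0) in a
 * one-page descriptor table, or -1 when every slot is occupied,
 * mirroring the scan loop in add_vdevice_entry() above. */
int find_free_slot(const struct vdevice_desc *vdevices)
{
	size_t i, max = PAGE_SIZE / sizeof(struct vdevice_desc);

	for (i = 0; i < max; i++)
		if (vdevices[i].id.type == 0)
			return (int)i;
	return -1;
}
```

Because the table lives in a shared page with no locking (as the FIXME above notes), two tools could race to claim the same slot; a real implementation would need some arbitration.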
Subject: Xen Share Net Device This is a simple network device using the vdevice bus and the share info. It is primitive, in that it does not handle fragmented skbs (for no particularly good reason). The main feature of this device is that it demonstrates how a N:N device like a virtual intra-domain LAN can be implemented. diff -r c0c781af505d linux-2.6-xen-sparse/drivers/xen/Makefile --- a/linux-2.6-xen-sparse/drivers/xen/Makefile Mon Jun 5 04:27:31 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/Makefile Mon Jun 5 16:40:49 2006 @@ -9,6 +9,7 @@ obj-y += privcmd/ obj-y += xenbus/ obj-y += vdevice/ +obj-m += sharenet/ obj-y += xenshare.o obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ diff -r c0c781af505d tools/vdevice/vdevice.c --- a/tools/vdevice/vdevice.c Mon Jun 5 04:27:31 2006 +++ b/tools/vdevice/vdevice.c Mon Jun 5 16:40:49 2006 @@ -193,6 +193,46 @@ void (*list)(struct vdevice_type *, const struct vdevice_desc *vdesc); }; +/* --create. Returns num args consumed. */ +static int net_create(struct vdevice_type *type __unused, + share_ref_t ref __unused, void *map __unused, + int argc __unused, char *argv[] __unused) +{ + /* We don''t need to do anything to the shared page, nor wait + * for "backend". */ + return 0; +} + +/* List info about this vdevice. */ +static void net_list(struct vdevice_type *type, + const struct vdevice_desc *vdesc) +{ + unsigned int i, peer_id; + + struct xensnet_receiver + { + unsigned char mac[ETH_ALEN]; + /* Currently stores a peer''s promiscuity state */ + unsigned char flags; + }; + struct xensnet_receiver *r, empty; + + r = map_pages(vdesc->shared_ref, vdesc->nr_pages, &peer_id); + if (!r) { + printf(" *cannot map*"); + return; + } + memset(&empty, 0, sizeof(empty)); + for (i = 0; i < (vdesc->nr_pages*getpagesize())/sizeof(*r); i++) { + if (memcmp(&empty, &r[i], sizeof(empty)) != 0) + printf(" [%i]=%02x:%02x:%02x:%02x:%02x:%02x%s", + i, r[i].mac[0], r[i].mac[1], r[i].mac[2], + r[i].mac[3], r[i].mac[4], r[i].mac[5], + r[i].flags & 0x01 ? 
"(promisc)" : ""); + } + unmap_pages(r); +} + /* Volatile is important: someone else changes it. */ static uint32_t get_status(volatile struct vdevice_desc *vdevice) { @@ -346,6 +386,13 @@ } static struct vdevice_type types[] = { + { .name = "net", + .type = 1, + .features = 1, + .num_pages = 1, + .create = net_create, + .list = net_list, + }, }; #define ARRAY_SIZE(a) (sizeof(a)/sizeof(a[0])) diff -r c0c781af505d linux-2.6-xen-sparse/drivers/xen/sharenet/Makefile --- /dev/null Mon Jun 5 04:27:31 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/sharenet/Makefile Mon Jun 5 16:40:49 2006 @@ -0,0 +1,1 @@ +obj-m := xensnet.o diff -r c0c781af505d linux-2.6-xen-sparse/drivers/xen/sharenet/xensnet.c --- /dev/null Mon Jun 5 04:27:31 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/sharenet/xensnet.c Mon Jun 5 16:40:49 2006 @@ -0,0 +1,527 @@ +/* Simple Xen share network */ +// #define DEBUG +#include <xen/interface/share.h> +#include <xen/evtchn.h> +#include <asm/io.h> +#include <asm/share.h> +#include <linux/netdevice.h> +#include <linux/etherdevice.h> +#include <linux/module.h> +#include <linux/vdevice.h> + +#define SHARED_SIZE 4096 +#define DATA_SIZE 1500 +#define MAX_LANS 4 +#define NUM_SKBS 32 +#define PROMISC_BIT 0x01 + +struct xensnet_receiver +{ + unsigned char mac[ETH_ALEN]; + /* Currently stores a peer''s promiscuity state */ + unsigned char flags; +}; + +/* The skbs which are registered as sglists with the hypervisor. */ +struct xensnet_skb +{ + struct sk_buff *skb; + /* Set by Hypervisor when other end triggers */ + u32 length; +}; + +struct xensnet_info +{ + /* The shared page. */ + struct xensnet_receiver *peers; + + /* vdev->private == netdevice. */ + struct vdevice *vdev; + + struct net_device_stats stats; + + /* Receive queue. 
*/ + struct xensnet_skb skbs[NUM_SKBS]; + + /* Single cached (failed) transmission, with lock */ + spinlock_t out_lock; + struct sk_buff *out_skb; + unsigned int out_peer; + + struct xen_share_handler handler; + + /* Set to 0 when congestion relieved. And later when peers + * join/unjoin */ + u32 change_watch; + u32 max_partitions; +}; + +/* How many bytes left in this page. */ +static unsigned int rest_of_page(void *data) +{ + return PAGE_SIZE - ((unsigned long)data % PAGE_SIZE); +} + +static int transfer_packet(struct net_device *dev, + struct sk_buff *skb, + unsigned int peernum, + struct xensnet_info *info) +{ + unsigned int i = 0; + struct xen_sg sg[2+MAX_SKB_FRAGS]; /* FIXME: Check overflow */ + unsigned long offset; + int retval; + + BUG_ON(skb_headlen(skb) != skb->len); + /* FIXME: pages might not be contiguous, but if Xen did + * translation we wouldn''t have to worry about it. */ + for (offset = 0; + offset < skb_headlen(skb); + offset += rest_of_page(skb->data + offset)) { + sg[i].addr = virt_to_phys(skb->data + offset); + sg[i].len = min((unsigned)(skb_headlen(skb) - offset), + rest_of_page(skb->data + offset)); + i++; + } + + BUG_ON(skb_shinfo(skb)->nr_frags); + + pr_debug("xfer length %04x (%u)\n", htons(skb->len), skb->len); + retval = xen_sg_xfer(info->vdev->share, peernum, XEN_SG_OUT, i, sg); + if (retval < 0) { + pr_debug("Can''t xfer to peer %i: %i\n", peernum, retval); + info->stats.tx_fifo_errors++; + return retval; + } else if (retval != skb->len) { + info->stats.tx_aborted_errors++; + pr_debug("Short xfer to peer %i: %i of %i (sg %p/%li)\n", + peernum, retval, skb->len, + (void *)sg[0].addr, sg[0].len); + /* This is their problem, don''t re-xmit. 
*/ + return 0; + } else + pr_debug("xensnet: sent %u bytes in %i chunks\n", + skb->len, i); + info->stats.tx_bytes += skb->len; + info->stats.tx_packets++; + return 0; +} + +static int mac_eq(const unsigned char mac[ETH_ALEN], + struct xensnet_info *info, unsigned int peer) +{ + return memcmp(mac, info->peers[peer].mac, ETH_ALEN) == 0; +} + +static int unused_peer(struct xensnet_receiver *peers, unsigned int num) +{ + return peers[num].mac[0] == 0; +} + + +static int is_broadcast(const unsigned char dest[ETH_ALEN]) +{ + return dest[0] == 0xFF && dest[1] == 0xFF && dest[2] == 0xFF + && dest[3] == 0xFF && dest[4] == 0xFF && dest[5] == 0xFF; +} + +static int promisc(struct xensnet_info *info, unsigned int peer) +{ + return info->peers[peer].flags & PROMISC_BIT; +} + +static void xensnet_set_multicast(struct net_device *dev) +{ + struct xensnet_info *info = dev->priv; + + if (dev->flags & IFF_PROMISC) + info->peers[info->vdev->share->peerid].flags |= PROMISC_BIT; + else + info->peers[info->vdev->share->peerid].flags &= ~PROMISC_BIT; +} + +static int xensnet_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + unsigned int i; + int transferred = 0, broadcast = 0; + struct xensnet_info *info = dev->priv; + const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest; + + if (is_broadcast(dest)) + broadcast = 1; + + pr_debug("xensnet %s: xmit broadcast=%i\n", + dev->name, broadcast); + pr_debug("dest: %02x:%02x:%02x:%02x:%02x:%02x\n", + dest[0], dest[1], dest[2], dest[3], dest[4], dest[5]); + + for (i = 0; i < info->max_partitions; i++) { + if (i == info->vdev->share->peerid || unused_peer(info->peers, i)) + continue; + + if (broadcast || promisc(info, i) || mac_eq(dest, info, i)) { + unsigned long flags; + pr_debug("xensnet %s: sending from %i to %i\n", + dev->name, info->vdev->share->peerid, i); + spin_lock_irqsave(&info->out_lock, flags); + if (transfer_packet(dev, skb, i, info) == -ENOSPC + && !broadcast) { + /* Queue this packet, stop queue. 
*/ + pr_debug("Queuing packet, stopping queue\n"); + BUG_ON(info->out_skb); + info->out_skb = skb; + info->out_peer = i; + netif_stop_queue(dev); + spin_unlock_irqrestore(&info->out_lock, flags); + return 0; + } + spin_unlock_irqrestore(&info->out_lock, flags); + transferred = 1; + } + } + + if (!transferred) { + pr_debug("Can''t xfer to %02x:%02x:%02x:%02x:%02x:%02x\n", + dest[0], dest[1], dest[2], dest[3], dest[4], dest[5]); + info->stats.tx_carrier_errors++; + } + + dev_kfree_skb(skb); + return 0; +} + +static struct sk_buff *xensnet_alloc_skb(struct net_device *dev, int gfpflags) +{ + struct sk_buff *skb; + + skb = alloc_skb(16 + ETH_HLEN + DATA_SIZE, gfpflags); + if (!skb) + return NULL; + + skb->dev = dev; + skb_reserve(skb, 16); + return skb; +} + +/* Unregister scatter-gather with hypervisor. */ +static void release_skb(struct xensnet_info *info, int slot) +{ + struct sk_buff *skb = info->skbs[slot].skb; + + xen_sg_unregister(info->vdev->share, virt_to_phys(skb->data)); +} + +/* Find a new skb to put in this slot in shared mem. */ +static int fill_slot(struct net_device *dev, unsigned int slot) +{ + struct xensnet_info *info = dev->priv; + struct xen_sg sg[MAX_SKB_FRAGS+1]; + int err; + + /* Try to create and register a new one. */ + info->skbs[slot].skb = xensnet_alloc_skb(dev, GFP_ATOMIC); + if (!info->skbs[slot].skb) { + printk("xensnet: could not fill slot %i\n", slot); + return -ENOMEM; + } + + info->skbs[slot].length = 0; + + sg[0].addr = virt_to_phys(info->skbs[slot].skb->data); + sg[0].len = ETH_HLEN + DATA_SIZE; + + /* We queue up at our peerid, by convention. 
*/ + err = xen_sg_register(info->vdev->share, XEN_SG_IN, + info->vdev->share->peerid, + &info->skbs[slot].length, 1, sg); + if (err) { + dev_kfree_skb_irq(info->skbs[slot].skb); + info->skbs[slot].skb = NULL; + printk("xensnet: could not register skb for slot %i\n", slot); + return err; + } + + pr_debug("xensnet: %s populating slot %i with %p\n", dev->name, slot, + info->skbs[slot].skb); + + return 0; +} + +static int try_retransmit(struct net_device *dev, struct xensnet_info *info, + struct sk_buff *skb, unsigned int peer) +{ + int err; + + /* Nothing to re-xmit? */ + if (!skb) + return 0; + + /* Peer has gone away? */ + if (unused_peer(info->peers, peer)) { + printk("Peer %i no longer exists!\n", peer); + return 1; + } + + /* Any error other than "no buffers left". */ + err = transfer_packet(dev, skb, peer, info); + pr_debug("Transferring queued packet %i\n", err); + return err != -ENOSPC; +} + +static void xensnet_handler(struct xen_share_handler *handler) +{ + struct xensnet_info *info; + struct net_device *dev; + unsigned int i; + struct sk_buff *skb; + + info = container_of(handler, struct xensnet_info, handler); + dev = info->vdev->private; + + /* Something changed? If we have packet queued, try re-xmit. 
*/ + if (info->change_watch != 1) { + unsigned long flags; + + info->change_watch = 1; + + pr_debug("%i: try_retransmit\n", info->vdev->share->peerid); + spin_lock_irqsave(&info->out_lock, flags); + if (try_retransmit(dev, info, info->out_skb, info->out_peer)) { + dev_kfree_skb_irq(info->out_skb); + info->out_skb = NULL; + netif_wake_queue(dev); + } else + pr_debug("%i: try_retransmit failed\n", info->vdev->share->peerid); + spin_unlock_irqrestore(&info->out_lock, flags); + } + + for (i = 0; i < ARRAY_SIZE(info->skbs); i++) { + unsigned int length; + + length = info->skbs[i].length; + if (length == 0) + continue; + + skb = info->skbs[i].skb; + fill_slot(dev, i); + + if (skb) { + if (length < 14 || length > 1514) { + printk(KERN_WARNING + "xensnet: unbelievable skb len: %i\n", + length); + dev_kfree_skb(skb); + continue; + } + skb_put(skb, length); + skb->protocol = eth_type_trans(skb, dev); + /* This is a reliable transport. */ + skb->ip_summed = CHECKSUM_UNNECESSARY; + pr_debug("Receiving skb proto 0x%04x len %i type %i\n", + ntohs(skb->protocol), skb->len,skb->pkt_type); + + info->stats.rx_bytes += skb->len; + info->stats.rx_packets++; + netif_rx(skb); + } + } +} + +static int populate_page(struct net_device *dev) +{ + int i; + struct xensnet_info *info = dev->priv; + struct xensnet_receiver *me = &info->peers[info->vdev->share->peerid]; + int retval; + + pr_debug("xensnet: peer %i shared page %p me %p\n", + info->vdev->share->peerid, info->peers, me); + /* Save MAC address */ + memcpy(me->mac, dev->dev_addr, ETH_ALEN); + + me->flags = 0; + /* Turn on promisc mode if needed */ + xensnet_set_multicast(dev); + + for (i = 0; i < ARRAY_SIZE(info->skbs); i++) { + retval = fill_slot(dev, i); + + if (retval) + goto cleanup; + } + pr_debug("xensnet: allocated %i watches\n", i); + + return 0; + +cleanup: + while (--i >= 0) { + release_skb(info, i); + dev_kfree_skb(info->skbs[i].skb); + } + + return -ENOMEM; +} + +static void unpopulate_page(struct xensnet_info *info) +{ + 
unsigned int i; + struct xensnet_receiver *me = &info->peers[info->vdev->share->peerid]; + + /* Clear all trace: others might deliver packets, we'll ignore it. */ + memset(me, 0, sizeof(*me)); + mb(); + + /* Disclaim slot. */ + me->mac[0] = 0; + + /* Deregister sg lists, free up skbs, remove triggers. */ + for (i = 0; i < ARRAY_SIZE(info->skbs); i++) { + release_skb(info, i); + dev_kfree_skb(info->skbs[i].skb); + } +} + +static int xensnet_open(struct net_device *dev) +{ + return populate_page(dev); +} + +static int xensnet_close(struct net_device *dev) +{ + unpopulate_page(dev->priv); + return 0; +} + +static struct net_device_stats *xensnet_get_stats(struct net_device *dev) +{ + struct xensnet_info *info = dev->priv; + + return &info->stats; +} + +/* Setup device with page at this address. If fail, drop page and + * return ERR_PTR(-errno). */ +static struct net_device *setup_device(struct vdevice *vdev) +{ + int err; + struct net_device *dev; + struct xensnet_info *info; + + vdev->private = dev = alloc_etherdev(sizeof(struct xensnet_info)); + if (!dev) + return ERR_PTR(-ENOMEM); + + SET_MODULE_OWNER(dev); + + /* Ethernet defaults with some changes */ + ether_setup(dev); + dev->set_mac_address = NULL; + dev->mtu = DATA_SIZE; + + /* FIXME: Base initial MAC address on domain id. */ + random_ether_addr(dev->dev_addr); + /* Ensure top byte not zero */ + dev->dev_addr[0] |= 0x80; + + dev->open = xensnet_open; + dev->stop = xensnet_close; + dev->hard_start_xmit = xensnet_start_xmit; + dev->get_stats = xensnet_get_stats; + /* Turning on/off promisc will call dev->set_multicast_list. 
+ * We don't actually support multicast yet */ + dev->set_multicast_list = xensnet_set_multicast; + /* Only true for x86 where share_ref == mfn, but gives indication */ + dev->mem_start = vdev->share->share_ref << PAGE_SHIFT; + dev->mem_end = dev->mem_start + PAGE_SIZE; + dev->dma = 0; + + info = dev->priv; + info->vdev = vdev; + info->out_skb = NULL; + info->change_watch = 1; + info->peers = vdev->share->addr; + spin_lock_init(&info->out_lock); + + /* skbs allocated on open */ + memset(info->skbs, 0, sizeof(info->skbs)); + + info->handler.handler = xensnet_handler; + xen_share_add_handler(vdev->share, &info->handler); + + /* Watch offset 0 for changes. */ + err = xen_share_watch(vdev->share, 0, &info->change_watch); + if (err) { + pr_debug("xensnet: watching 0x%lx %i failed\n", + vdev->share->share_ref, + vdev->share->peerid); + goto remove_handler; + } + + err = register_netdev(dev); + if (err) { + pr_debug("xensnet: registering device failed\n"); + goto free_watch; + } + pr_debug("xensnet: registered device %s\n", dev->name); + + return dev; + +free_watch: + xen_share_unwatch(vdev->share, 0); +remove_handler: + xen_share_remove_handler(vdev->share, &info->handler); + free_netdev(dev); + return ERR_PTR(err); +} + +static void xensnet_remove(struct vdevice *vdev) +{ + struct net_device *netdev = vdev->private; + struct xensnet_info *info = netdev->priv; + + unregister_netdev(netdev); + xen_share_unwatch(vdev->share, 0); + xen_share_remove_handler(vdev->share, &info->handler); + free_netdev(netdev); +} + +static int xensnet_probe(struct vdevice *vdev, const struct vdevice_id *ent) +{ + struct net_device *netdev; + struct xensnet_info *info; + + netdev = setup_device(vdev); + if (IS_ERR(netdev)) + return PTR_ERR(netdev); + vdev->private = netdev; + info = netdev->priv; + info->max_partitions = (vdev->share->num_pages * PAGE_SIZE) / + sizeof(struct xensnet_receiver); + + printk(KERN_INFO + "xensnet: mapped lan %s at share_ref 0x%lx up to %i nodes\n", + netdev->name, 
vdev->share->share_ref, info->max_partitions); + return 0; +} + +static struct vdevice_id xensnet_ids[] = { + { .type = 1, .features = 1 }, + { .type = 0, .features = 0 }, +}; +static struct vdevice_driver xensnet_drv = { + .name = "xensnet", + .owner = THIS_MODULE, + .ids = xensnet_ids, + .probe = xensnet_probe, + .remove = xensnet_remove, + .stop = NULL, + .reconnect = NULL, +}; + +static __init int xensnet_init(void) +{ + return register_vdevice_driver(&xensnet_drv); +} + +module_init(xensnet_init); +MODULE_LICENSE("GPL"); -- ccontrol: http://ccontrol.ozlabs.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Rusty Russell
2006-Jun-06 05:58 UTC
[Xen-devel] [PATCH 2/9] Linux kernel infrastructure for Xen Share access
On the Linux kernel side, we provide some wrappers for accessing shared pages. They are currently reference-counted, because a future patch allows userspace to access shared pages, and the Xen interface will refuse the second request for access by the same domain. The entire hypercall interface is arch-wrapped, which is probably overkill, but I wasn't entirely sure of the needs of non-x86 architectures. Some of this should almost certainly be in common code. diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/share.h --- /dev/null Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/share.h Wed May 31 05:33:38 2006 @@ -0,0 +1,62 @@ +#ifndef __ASM_XEN_I386_SHARE_H +#define __ASM_XEN_I386_SHARE_H +#include <linux/types.h> +#include <linux/interrupt.h> +#include <linux/list.h> +#include <xen/interface/share.h> + +struct xen_share +{ + struct list_head list; + atomic_t use; + share_ref_t share_ref; + unsigned num_pages; + void *addr; + int event_channel; + int peerid; + int irq; + struct list_head handlers; +}; + +struct xen_share_handler +{ + struct list_head list; + void (*handler)(struct xen_share_handler *h); +}; + +/* Map a shared area. Returns PTR_ERR(errno) on fail. */ +struct xen_share *xen_share_get(share_ref_t share_ref, unsigned pages); + +/* Set up handler for events. */ +void xen_share_add_handler(struct xen_share *s, struct xen_share_handler *h); + +/* Remove handler. */ +void xen_share_remove_handler(struct xen_share *s, + struct xen_share_handler *h); + +/* Unmap a shared area (irq unbound if not done already). */ +void xen_share_put(struct xen_share *share); + +/* Register this sg list (physical kernel addresses). Returns 0 on success. */ +int xen_sg_register(struct xen_share *share, int dirmask, u32 queue, u32 *lenp, + unsigned int num_sgs, const struct xen_sg sg[]); + +/* Unregister this sg list: give first phys address of sg. 
*/ +void xen_sg_unregister(struct xen_share *share, unsigned long sgaddr); + +/* Transfer this sg list (physical kernel addresses). Returns len xferred. */ +int xen_sg_xfer(struct xen_share *share, u32 queue, int dir, + unsigned int num_sgs, const struct xen_sg sg[]); + +/* Place watch on this trigger. Returns 0 on success. */ +int xen_share_watch(struct xen_share *share, int triggernum, u32 *resultp); + +/* Remove watch on this trigger. */ +void xen_share_unwatch(struct xen_share *share, int triggernum); + +/* Trigger a watch. Returns num watching on success. */ +int xen_share_trigger(struct xen_share *share, int triggernum); + +/* Map a share into a vma (for userspace mmap). */ +int xen_share_map(struct xen_share *share, struct vm_area_struct *vma); +#endif /* __ASM_XEN_I386_SHARE_H */ diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/arch/i386/kernel/share-xen.c --- /dev/null Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/arch/i386/kernel/share-xen.c Wed May 31 05:33:38 2006 @@ -0,0 +1,280 @@ +/* x86 layer for share hypercalls. + * Copyright 2006 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#include <linux/sched.h> +#include <linux/page-flags.h> +#include <linux/vmalloc.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/mm.h> +#include <linux/spinlock.h> +#include <asm/semaphore.h> +#include <asm/share.h> +#include <asm/io.h> +#include <xen/evtchn.h> +#include <asm/hypervisor.h> + +/* We only request each area from the hypervisor once, so track them. */ +static DECLARE_MUTEX(share_lock); +static spinlock_t handler_lock = SPIN_LOCK_UNLOCKED; +static LIST_HEAD(shares); + +static int get_evtchn_port(void) +{ + int err; + struct evtchn_alloc_unbound evtchn = { .dom = DOMID_SELF, + .remote_dom = DOMID_SELF }; + + err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &evtchn); + if (err) + return err; + + return evtchn.port; +} + +static void close_evtchn_port(int port) +{ + struct evtchn_close evtchn; + evtchn.port = port; + BUG_ON(HYPERVISOR_event_channel_op(EVTCHNOP_close, &evtchn) != 0); +} + +static struct xen_share *get_share(share_ref_t share_ref) +{ + struct xen_share *i; + + list_for_each_entry(i, &shares, list) { + if (i->share_ref == share_ref) { + atomic_inc(&i->use); + return i; + } + } + return NULL; +} + +static irqreturn_t share_irq(int irq, void *share_, struct pt_regs *regs) +{ + struct xen_share *share = share_; + struct xen_share_handler *h; + + list_for_each_entry(h, &share->handlers, list) + h->handler(h); + return IRQ_HANDLED; +} + +struct xen_share *create_share(share_ref_t share_ref, unsigned pages) +{ + pgprot_t prot; + int err; + struct vm_struct *vma; + struct xen_share *share; + + share = kmalloc(sizeof(struct xen_share), GFP_KERNEL); + if (!share) { + err = -ENOMEM; + goto fail; + } + + share->share_ref = share_ref; + share->num_pages = pages; + atomic_set(&share->use, 1); + 
INIT_LIST_HEAD(&share->handlers); + vma = get_vm_area(pages * PAGE_SIZE, VM_IOREMAP); + if (!vma) { + err = -ENOMEM; + goto free_share; + } + + share->event_channel = get_evtchn_port(); + if (share->event_channel < 0) { + err = share->event_channel; + goto free_vma; + } + + err = bind_evtchn_to_irqhandler(share->event_channel, share_irq, + SA_SHIRQ, "xenshare", share); + if (err < 0) + goto close_evtchn; + share->irq = err; + + share->peerid = HYPERVISOR_share(XEN_SHARE_get, share_ref, + share->event_channel, 0, 0); + if (share->peerid < 0) { + err = share->peerid; + goto unbind_evtchn; + } + + prot = __pgprot(_PAGE_PRESENT|_PAGE_RW|_PAGE_DIRTY|_PAGE_ACCESSED); + err = direct_kernel_remap_pfn_range((unsigned long)vma->addr, + share_ref, pages * PAGE_SIZE, + prot, DOMID_SELF); + if (err) + goto put_share; + share->addr = vma->addr; + list_add(&share->list, &shares); + + return share; + +put_share: + BUG_ON(HYPERVISOR_share(XEN_SHARE_drop,share->share_ref,0,0,0) != 0); +unbind_evtchn: + unbind_from_irqhandler(share->irq, share); + goto free_vma; +close_evtchn: + close_evtchn_port(share->event_channel); +free_vma: + kfree(vma); +free_share: + kfree(share); +fail: + return ERR_PTR(err); +} + +/* Map a shared area. Returns PTR_ERR(errno) on fail. */ +struct xen_share *xen_share_get(share_ref_t share_ref, unsigned pages) +{ + struct xen_share *share; + + down(&share_lock); + share = get_share(share_ref); + if (share) + BUG_ON(share->num_pages != pages); + else + share = create_share(share_ref, pages); + up(&share_lock); + + return share; +} + +void xen_share_add_handler(struct xen_share *s, struct xen_share_handler *h) +{ + spin_lock_irq(&handler_lock); + list_add(&h->list, &s->handlers); + spin_unlock_irq(&handler_lock); +} + +/* Remove irq handler. 
*/ +void xen_share_remove_handler(struct xen_share *s, struct xen_share_handler *h) +{ + BUG_ON(list_empty(&s->handlers)); + spin_lock_irq(&handler_lock); + list_del(&h->list); + spin_unlock_irq(&handler_lock); +} + +/* Unmap a shared area. */ +void xen_share_put(struct xen_share *share) +{ + down(&share_lock); + if (atomic_dec_and_test(&share->use)) { + BUG_ON(!list_empty(&share->handlers)); + unbind_from_irqhandler(share->irq, share); + + /* This also kfrees vma. */ + vunmap(share->addr); + BUG_ON(HYPERVISOR_share(XEN_SHARE_drop, share->share_ref, 0, + 0, 0) != 0); + list_del(&share->list); + kfree(share); + } + up(&share_lock); +} + +/* Register this sg list (physical kernel addresses). Returns 0 on success. */ +int xen_sg_register(struct xen_share *s, int dirmask, u32 queue, u32 *lenp, + unsigned int num_sgs, const struct xen_sg sg[]) +{ + struct xen_sg new_sg[XEN_SG_MAX]; + unsigned int i; + + /* We feed machine addresses to hypervisor. */ + for (i = 0; i < num_sgs; i++) { + new_sg[i].addr = phys_to_machine(sg[i].addr); + new_sg[i].len = sg[i].len; + } + + return HYPERVISOR_share(XEN_SHARE_sg_register, s->share_ref, + xen_share_sg_arg(queue, num_sgs, dirmask), + (long)new_sg, + virt_to_machine(lenp)); +} + +/* Unregister this sg list. */ +void xen_sg_unregister(struct xen_share *s, unsigned long addr) +{ + BUG_ON(HYPERVISOR_share(XEN_SHARE_sg_unregister, s->share_ref, + phys_to_machine(addr), 0, 0) != 0); +} + +/* Transfer this sg list (physical kernel addresses). Returns len xferred. */ +int xen_sg_xfer(struct xen_share *s, u32 queue, int dir, + unsigned int num_sgs, const struct xen_sg sg[]) +{ + struct xen_sg new_sg[XEN_SG_MAX]; + unsigned int i; + + /* Hypervisor wants virtual addresses here. 
*/ + for (i = 0; i < num_sgs; i++) { + new_sg[i].addr = (long)phys_to_virt(sg[i].addr); + new_sg[i].len = sg[i].len; + } + + return HYPERVISOR_share(XEN_SHARE_sg_xfer, s->share_ref, + xen_share_sg_arg(queue, num_sgs, dir), + (long)new_sg, 0); +} + +/* Place watch on this trigger. Returns 0 on success. */ +int xen_share_watch(struct xen_share *s, int triggernum, u32 *resultp) +{ + return HYPERVISOR_share(XEN_SHARE_watch, s->share_ref, triggernum, + virt_to_machine(resultp), 0); +} + +/* Remove watch on this trigger. */ +void xen_share_unwatch(struct xen_share *s, int triggernum) +{ + BUG_ON(HYPERVISOR_share(XEN_SHARE_unwatch, s->share_ref, triggernum, + 0, 0) != 0); +} + +/* Trigger a watch. Returns num watching on success. */ +int xen_share_trigger(struct xen_share *s, int trigger) +{ + return HYPERVISOR_share(XEN_SHARE_trigger, s->share_ref, trigger,0,0); +} + +int xen_share_map(struct xen_share *s, struct vm_area_struct *vma) +{ + vma->vm_flags |= VM_RESERVED | VM_IO | VM_DONTCOPY; + return direct_remap_pfn_range(vma, vma->vm_start, + s->share_ref, + s->num_pages * PAGE_SIZE, + vma->vm_page_prot, DOMID_SELF); +} + +EXPORT_SYMBOL_GPL(xen_share_get); +EXPORT_SYMBOL_GPL(xen_share_put); +EXPORT_SYMBOL_GPL(xen_share_map); +EXPORT_SYMBOL_GPL(xen_share_trigger); +EXPORT_SYMBOL_GPL(xen_share_watch); +EXPORT_SYMBOL_GPL(xen_share_unwatch); +EXPORT_SYMBOL_GPL(xen_sg_xfer); +EXPORT_SYMBOL_GPL(xen_sg_register); +EXPORT_SYMBOL_GPL(xen_sg_unregister); +EXPORT_SYMBOL_GPL(xen_share_add_handler); +EXPORT_SYMBOL_GPL(xen_share_remove_handler); diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/arch/i386/kernel/Makefile --- a/linux-2.6-xen-sparse/arch/i386/kernel/Makefile Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/arch/i386/kernel/Makefile Wed May 31 05:33:38 2006 @@ -88,6 +88,7 @@ include $(srctree)/scripts/Makefile.xen obj-y += fixup.o +obj-y += share-xen.o microcode-$(subst m,y,$(CONFIG_MICROCODE)) := microcode-xen.o n-obj-xen := i8259.o timers/ reboot.o smpboot.o 
trampoline.o diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/arch/i386/mm/ioremap-xen.c --- a/linux-2.6-xen-sparse/arch/i386/mm/ioremap-xen.c Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/arch/i386/mm/ioremap-xen.c Wed May 31 05:33:38 2006 @@ -123,8 +123,11 @@ /* Same as remap_pfn_range(). */ vma->vm_flags |= VM_IO | VM_RESERVED; + /* FIXME: xenshare needs to pass DOMID_SELF. Check it's safe to remove + * the check. if (domid == DOMID_SELF) return -EINVAL; + */ return __direct_remap_pfn_range( vma->vm_mm, address, mfn, size, prot, domid); diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h --- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h Sun May 28 14:49:17 2006 +++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h Wed May 31 05:33:38 2006 @@ -359,5 +359,11 @@ return _hypercall2(int, xenoprof_op, op, arg); } +static inline long +HYPERVISOR_share( + int op, long arg1, long arg2, long arg3, long arg4) +{ + return _hypercall5(long, share_op, op, arg1, arg2, arg3, arg4); +} #endif /* __HYPERCALL_H__ */ diff -r 6d476981e3a5 -r 07a00d96357d patches/linux-2.6.12/get_vm_area.patch --- /dev/null Sun May 28 14:49:17 2006 +++ b/patches/linux-2.6.12/get_vm_area.patch Wed May 31 05:33:38 2006 @@ -0,0 +1,9 @@ +diff -Naur linux-2.6.12/mm/vmalloc.c linux-2.6.12.post/mm/vmalloc.c +--- linux-2.6.12/mm/vmalloc.c 2005-06-18 05:48:29.000000000 +1000 ++++ linux-2.6.12.post/mm/vmalloc.c 2006-01-10 16:56:36.000000000 +1100 +@@ -247,6 +247,7 @@ + { + return __get_vm_area(size, flags, VMALLOC_START, VMALLOC_END); + } ++EXPORT_SYMBOL(get_vm_area); +
Rusty Russell
2006-Jun-06 05:59 UTC
[Xen-devel] [PATCH 9/9] Simple Xenshare Block Device and userspace backend
This is a simple block device, with the backend in userspace. While performance is good, it's not production-ready because it needs to keep track of which backend process to kill when the device goes away (look for "So very, very wrong"), and should also use separate device numbers for frontend and backend. diff -r 20b744b2c4c0 .hgignore --- a/.hgignore Mon Jun 5 06:41:24 2006 +++ b/.hgignore Tue Jun 6 14:56:24 2006 @@ -147,6 +147,7 @@ ^tools/security/xen/.*$ ^tools/tests/test_x86_emulator$ ^tools/vdevice/vdevice$ +^tools/vdevice/xensblk$ ^tools/vnet/gc$ ^tools/vnet/gc.*/.*$ ^tools/vnet/vnet-module/.*\.ko$ diff -r 20b744b2c4c0 linux-2.6-xen-sparse/drivers/xen/Makefile --- a/linux-2.6-xen-sparse/drivers/xen/Makefile Mon Jun 5 06:41:24 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/Makefile Tue Jun 6 14:56:24 2006 @@ -10,6 +10,7 @@ obj-y += xenbus/ obj-y += vdevice/ obj-m += sharenet/ +obj-m += shareblock/ obj-y += xenshare.o obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ diff -r 20b744b2c4c0 tools/examples/Makefile --- a/tools/examples/Makefile Mon Jun 5 06:41:24 2006 +++ b/tools/examples/Makefile Tue Jun 6 14:56:24 2006 @@ -35,7 +35,7 @@ XEN_SCRIPT_DATA += vtpm-migration.sh XEN_HOTPLUG_DIR = /etc/hotplug -XEN_HOTPLUG_SCRIPTS = xen-backend.agent +XEN_HOTPLUG_SCRIPTS = xen-backend.agent vdevice.agent UDEV_RULES_DIR = /etc/udev UDEV_RULES = xen-backend.rules diff -r 20b744b2c4c0 tools/vdevice/Makefile --- a/tools/vdevice/Makefile Mon Jun 5 06:41:24 2006 +++ b/tools/vdevice/Makefile Tue Jun 6 14:56:24 2006 @@ -19,16 +19,21 @@ CFLAGS += $(BASECFLAGS) LDFLAGS += $(PROFILE) -L$(XEN_LIBXC) -L$(XEN_XENSTORE) -all: vdevice +all: vdevice xensblk clean: - rm -f vdevice *.o .*.d + rm -f vdevice xensblk *.o vdevice: vdevice.o $(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -lxenctrl -lxenstore -o $@ -install: vdevice +xensblk: xensblk.o + $(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -lxenctrl -o $@ + + +install: vdevice xensblk $(INSTALL_DIR) -p $(DESTDIR)/usr/sbin $(INSTALL_PROG) vdevice $(DESTDIR)/usr/sbin + 
$(INSTALL_PROG) xensblk $(DESTDIR)/usr/sbin -include $(PROG_DEP) diff -r 20b744b2c4c0 tools/vdevice/vdevice.c --- a/tools/vdevice/vdevice.c Mon Jun 5 06:41:24 2006 +++ b/tools/vdevice/vdevice.c Tue Jun 6 14:56:24 2006 @@ -22,6 +22,7 @@ #include <xen/event_channel.h> #include <xen/linux/privcmd.h> #include <xen/io/vdevice.h> +#include <xen/linux/xensblk.h> #include <xc_private.h> @@ -368,6 +369,61 @@ /* FIXME: wait for ack! */ } +/* --create. Returns argnum of first domain arg. */ +static int block_create(struct vdevice_type *type, + share_ref_t ref, void *map, + int argc, char *argv[]) +{ + struct xensblk_page *sharepage = map; + + /* We need backend and file. */ + if (argc < 2 || argc > 3) { + fprintf(stderr, "Usage:\n\t" + "%s --create block <backend-file> <backend-domid> [<frontend-domid>]\n", + PROGRAM_NAME); + return -1; + } + /* FIXME: Check length before copying to shared page */ + /* FIXME: Handle relative paths */ + if (argv[0][0] != '/') { + fprintf(stderr, "%s: backend file must be an absolute path\n", + PROGRAM_NAME); + return -1; + } + + /* FIXME: Allow other types here... */ + sharepage->device_type = XENSBLK_DEVTYPE_FILE; + sharepage->flags = 0; + strcpy(sharepage->device_specific_info, argv[0]); + + if (!add_vdevice_entry(argv[1], type->type, type->features, + type->num_pages, ref, VDEVICE_S_DRIVER_OK)) + return -1; + + return 2; +} + +/* List info about this vdevice. 
*/ +static void block_list(struct vdevice_type *type __unused, + const struct vdevice_desc *vdesc) +{ + struct xensblk_page *b; + unsigned int peer_id; + + b = map_pages(vdesc->shared_ref, vdesc->nr_pages, &peer_id); + if (!b) { + printf(" *cannot map*"); + return; + } + if (b->device_type == XENSBLK_DEVTYPE_FILE) + printf(" %.*s", sizeof(b->device_specific_info), + b->device_specific_info); + printf(" %u blocks (%llu bytes)", b->capacity_in_blocks, + b->capacity_in_blocks * 512ULL); + if (b->error) + printf(" ERR %i", b->error); +} + /* FIXME: some callers need to recover, not exit if this fails... */ static share_ref_t domid_arg(const char *arg) { @@ -392,6 +448,13 @@ .num_pages = 1, .create = net_create, .list = net_list, + }, + { .name = "block", + .type = 2, + .features = 1, + .num_pages = 1, + .create = block_create, + .list = block_list, }, }; diff -r 20b744b2c4c0 linux-2.6-xen-sparse/drivers/xen/shareblock/Makefile --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/shareblock/Makefile Tue Jun 6 14:56:24 2006 @@ -0,0 +1,1 @@ +obj-m += xensblk.o diff -r 20b744b2c4c0 linux-2.6-xen-sparse/drivers/xen/shareblock/xensblk.c --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/linux-2.6-xen-sparse/drivers/xen/shareblock/xensblk.c Tue Jun 6 14:56:24 2006 @@ -0,0 +1,293 @@ +/* A simple block driver for Xen, using share ops. + * + * Copyright 2006 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +//#define DEBUG +#include <linux/major.h> +#include <linux/blkdev.h> +#include <linux/module.h> +#include <linux/init.h> +#include <linux/spinlock.h> +#include <linux/vdevice.h> +#include <xen/public/xensblk.h> +#include <xen/interface/share.h> +#include <xen/evtchn.h> + +struct blockshare +{ + spinlock_t lock; + + struct vdevice *vdev; + + /* The disk structure for the kernel. */ + struct gendisk *disk; + + /* The major number for this disk. */ + int major; + + struct xensblk_page *xb_page; + + /* Awaiting the response from the other end. */ + struct request *req; + int op; + u32 req_len; + + /* Handler which notifies us of changes. */ + struct xen_share_handler handler; + + /* For read, this holds bytes server sent. */ + u32 bytes_read; + u32 write_done; +}; + +static void done_with_request(struct blockshare *bs, int error) +{ + pr_debug("Done with request, error %i\n", error); + end_request(bs->req, error == 0); + bs->req = NULL; + /* Reset error. */ + bs->xb_page->error = 0; + blk_start_queue(bs->disk->queue); +} + +static void blockshare_handler(struct xen_share_handler *handler) +{ + struct blockshare *bs; + unsigned long flags; + + bs = container_of(handler, struct blockshare, handler); + /* Have they sent a reply? */ + spin_lock_irqsave(&bs->lock, flags); + if (bs->req) { + int err = bs->xb_page->error; + if (err) + done_with_request(bs, err); + else if (bs->op == 1) { + if (bs->write_done == 0) { + /* Write simply fills in error field. */ + pr_debug("xensblk: write error = %i\n", + bs->xb_page->error); + done_with_request(bs, bs->xb_page->error); + } else + pr_debug("xensblk: write in progress?\n"); + } else if (bs->op == 0) { + if (bs->bytes_read != 0) { + /* Read should have transferred data. 
*/ + done_with_request(bs, + bs->bytes_read!=bs->req_len); + } else + pr_debug("xensblk: read in progress?\n"); + } else + BUG(); + } else { + printk("No request %p\n", bs->req); + } + spin_unlock_irqrestore(&bs->lock, flags); +} + +/* Returns number of sg elements used. */ +static unsigned req_to_sg(struct request *req, struct xen_sg sg[XEN_SG_MAX], + unsigned int *len) +{ + unsigned int i = 0, idx; + struct bio *bio; + + *len = 0; + rq_for_each_bio(bio, req) { + struct bio_vec *bvec; + bio_for_each_segment(bvec, bio, idx) { + BUG_ON(i == XEN_SG_MAX); + sg[i].addr = page_to_pseudophys(bvec->bv_page) + + bvec->bv_offset; + sg[i].len = bvec->bv_len; + BUG_ON(sg[i].addr / PAGE_SIZE != + (sg[i].addr + sg[i].len - 1) / PAGE_SIZE); + *len += sg[i].len; + i++; + } + } + return i; +} + +static int send_op(struct blockshare *bs, struct request *req, int is_write) +{ + int ret; + + pr_debug("send_op: %s sector %li\n", + is_write ? "WRITE" : "READ", req->sector); + + bs->xb_page->req_type = is_write; + bs->xb_page->sector = req->sector; + bs->op = 0; + bs->req = req; + + if (is_write) { + struct xen_sg out[XEN_SG_MAX]; + unsigned int num; + + num = req_to_sg(req, out, &bs->req_len); + bs->xb_page->num = bs->req_len / 512; + bs->op = 1; + bs->write_done = 1; + ret = xen_sg_xfer(bs->vdev->share, XENSBLK_SERVER_QUEUE, + XEN_SG_OUT, num, out); + pr_debug("Write xfer returned %i\n", ret); + if (ret < 0 || ret < bs->req_len) { + printk("xensblk: write xfer %i returned %i\n", + bs->req_len, ret); + goto fail; + } + } else { + struct xen_sg in[XEN_SG_MAX]; + unsigned int num; + + /* Get receive straight into req. 
*/ + num = req_to_sg(req, in, &bs->req_len); + bs->xb_page->num = bs->req_len / 512; + bs->bytes_read = 0; + ret = xen_sg_register(bs->vdev->share, XEN_SG_IN, + XENSBLK_CLIENT_QUEUE, + &bs->bytes_read, num, in); + if (ret != 0) { + printk("xensblk: could not set up receive sg: %i\n", + ret); + goto fail; + } + xen_share_trigger(bs->vdev->share, XENSBLK_OP_READY); + } + return 0; + +fail: + bs->req = NULL; + return -EIO; +} + +static void do_blkshare_request(request_queue_t *q) +{ + struct request *req; + + req = elv_next_request(q); + if (!req) + return; + + if (!blk_fs_request(req)) { + printk("Got non-command 0x%08lx\n", req->flags); + req->errors++; + end_request(req, 0); + } else { + struct blockshare *bs; + bs = req->rq_disk->private_data; + if (send_op(bs, req, rq_data_dir(req) == WRITE) < 0) { + req->errors++; + end_request(req, 0); + } else { + blk_stop_queue(q); + } + } +} + +static struct block_device_operations share_fops = { + .owner = THIS_MODULE, +}; + +static int xensblk_probe(struct vdevice *vdev, const struct vdevice_id *ent) +{ + struct blockshare *bs; + int err; + + bs = kmalloc(sizeof(*bs), GFP_KERNEL); + if (!bs) + return -ENOMEM; + + spin_lock_init(&bs->lock); + bs->vdev = vdev; + bs->xb_page = vdev->share->addr; + bs->disk = alloc_disk(1); + if (!bs->disk) { + err = -ENOMEM; + goto out_free_bs; + } + + bs->disk->queue = blk_init_queue(do_blkshare_request, &bs->lock); + if (!bs->disk->queue) { + err = -ENOMEM; + goto out_put; + } + + /* We want virtually-mapped pages. 
*/ + blk_queue_bounce_limit(bs->disk->queue, BLK_BOUNCE_HIGH); + /* We can only handle a certain number of sg entries (one for op) */ + blk_queue_max_phys_segments(bs->disk->queue, XEN_SG_MAX - 1); + /* Buffers must not cross page boundaries */ + blk_queue_segment_boundary(bs->disk->queue, PAGE_SIZE-1); + + bs->major = register_blkdev(0, "xenblock"); + if (bs->major < 0) { + err = bs->major; + goto out_cleanup_queue; + } + + bs->handler.handler = blockshare_handler; + xen_share_add_handler(vdev->share, &bs->handler); + + bs->write_done = 1; + xen_share_watch(vdev->share, XENSBLK_OP_DONE, &bs->write_done); + + printk(KERN_INFO "xen block share device %li at major %d\n", + vdev->share->share_ref, bs->major); + + bs->disk->major = bs->major; + bs->disk->first_minor = 0; + bs->disk->private_data = bs; + bs->disk->fops = &share_fops; + sprintf(bs->disk->disk_name, "xensblock%d", bs->vdev->vdevice_index); + set_capacity(bs->disk, bs->xb_page->capacity_in_blocks); + add_disk(bs->disk); + vdev->private = bs; + return 0; + +out_cleanup_queue: + blk_cleanup_queue(bs->disk->queue); +out_put: + put_disk(bs->disk); +out_free_bs: + kfree(bs); + return err; +} + +static struct vdevice_id xensblk_ids[] = { + { .type = 2, .features = 1 }, + { .type = 0 }, +}; +static struct vdevice_driver xensblk_drv = { + .name = "xensblk", + .owner = THIS_MODULE, + .ids = xensblk_ids, + .probe = xensblk_probe, + .remove = NULL, + .stop = NULL, + .reconnect = NULL, +}; +static __init int init(void) +{ + return register_vdevice_driver(&xensblk_drv); +} + +module_init(init); + +MODULE_DESCRIPTION("Xen block driver using share"); +MODULE_LICENSE("GPL"); diff -r 20b744b2c4c0 linux-2.6-xen-sparse/include/xen/public/xensblk.h --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/linux-2.6-xen-sparse/include/xen/public/xensblk.h Tue Jun 6 14:56:24 2006 @@ -0,0 +1,38 @@ +#ifndef _LINUX_XENSBLK_H +#define _LINUX_XENSBLK_H + +/* Possible device types */ +#define XENSBLK_DEVTYPE_FILE 1 /* Followed by path of file to 
serve. */ + +struct xensblk_page +{ + /* Filled in by the creation tool */ + uint16_t device_type; + uint16_t flags; + + /* Contents depends on device_type */ + char device_specific_info[256]; + + /* Filled in by server. */ + uint32_t capacity_in_blocks; + uint32_t error; + + /* Request 0 = read, 1 = write. */ + uint32_t req_type; + /* Length (sectors). */ + uint32_t num; + /* Sector to read/write (multiply by 512 for offset). */ + uint64_t sector; +}; + +/* Triggered by client when op ready (except write: xfer triggers) */ +#define XENSBLK_OP_READY 1 + +/* Triggered by server has finished op (except read: xfer triggers) */ +#define XENSBLK_OP_DONE 2 + +/* Where the server and client queue their sg lists */ +#define XENSBLK_CLIENT_QUEUE 0 +#define XENSBLK_SERVER_QUEUE 1 + +#endif /* _LINUX_XENSBLK_H */ diff -r 20b744b2c4c0 tools/examples/vdevice.agent --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/tools/examples/vdevice.agent Tue Jun 6 14:56:24 2006 @@ -0,0 +1,18 @@ +#! /bin/sh + +PATH=/etc/xen/scripts:$PATH + +exec > /tmp/log 2>&1 +echo VDEVICE called: $DEVPATH + +if [ `cat /sys/$DEVPATH/type` == 2 ]; then + case "$ACTION" in + add) + /usr/sbin/xensblk $DEVPATH >> /tmp/log 2>&1 & + ;; + remove) + # FIXME: So very, very wrong... + killall /usr/sbin/xensblk + ;; + esac +fi diff -r 20b744b2c4c0 tools/examples/vdevice.rules --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/tools/examples/vdevice.rules Tue Jun 6 14:56:24 2006 @@ -0,0 +1,3 @@ +SUBSYSTEM=="vdevice", SYSFS{type}=="2", ACTION=="add", RUN+="/usr/sbin/xensblk %p &" +# FIXME: So very, very wrong... +SUBSYSTEM=="vdevice", SYSFS{type}=="2", ACTION=="remove", RUN+="killall /usr/sbin/xensblk" diff -r 20b744b2c4c0 tools/vdevice/xensblk.c --- /dev/null Mon Jun 5 06:41:24 2006 +++ b/tools/vdevice/xensblk.c Tue Jun 6 14:56:24 2006 @@ -0,0 +1,269 @@ +/* Simple code for a userspace block device backend. 
*/ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdint.h> +#include <assert.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <sys/mman.h> +#include <stdlib.h> +#include <sys/ioctl.h> +#include <xen/linux/xenshare.h> +#include <xen/io/vdevice.h> +#include <xen/linux/xensblk.h> +#include <err.h> +#include <errno.h> +#include <stdbool.h> +#include "xenctrl.h" + +#define BUFFER_SIZE (getpagesize() * XEN_SG_MAX) + +static unsigned long get_share_ref(const char *devpath) +{ + char *fname; + FILE *f; + unsigned long share_ref; + + asprintf(&fname, "%s/%s/share_ref", "/sys", devpath); + f = fopen(fname, "r"); + if (!f) + err(1, "Could not open %s", fname); + if (fscanf(f, "%lu", &share_ref) != 1) + err(1, "Could not read share_ref from %s", fname); + fclose(f); + free(fname); + return share_ref; +} + +/* We OR in these status bits. */ +static char *status_file; +static bool add_status(uint32_t status) +{ + FILE *f; + uint32_t old_status; + + f = fopen(status_file, "r+"); + if (!f) + return false; + if (fscanf(f, "%u", &old_status) != 1) + return false; + rewind(f); + if (fprintf(f, "%u", old_status | status) < 0) + return false; + if (fclose(f) != 0) + return false; + return true; +} + +static void set_fail_status(void) +{ + add_status(VDEVICE_S_FAILED); +} + +static uint32_t size_in_blocks(int fd) +{ + struct stat st; + + fstat(fd, &st); + return st.st_size / 512; +} + +static void share_error(int err, int shareiofd, struct xensblk_page *share) +{ + share->error = err; + ioctl(shareiofd, IOCTL_XENSHARE_TRIGGER, XENSBLK_OP_DONE); +} + +static void handle_read(int backingfd, int shareiofd, + void *buffer, + uint32_t num_blocks, + struct xensblk_page *share) +{ + struct xenshare_sg send; + int ret; + uint32_t num = share->num; + uint64_t sector = share->sector; + + if ((uint64_t)num + sector > num_blocks) { + printf("xensblk READ: num=%u sector=%llu: out of range!\n", + num, sector); + share_error(ENOSPC, shareiofd, 
share); + return; + } + + lseek(backingfd, sector*512, SEEK_SET); + ret = read(backingfd, buffer, num*512); + if (ret != num*512) { + printf("xensblk READ: num=%u sector=%llu: short read: %i!\n", + num, sector, ret); + share_error(EIO, shareiofd, share); + return; + } + + send.len = num*512; + send.queue = XENSBLK_CLIENT_QUEUE; + + ret = ioctl(shareiofd, IOCTL_XENSHARE_SG_SEND, &send); + if (ret != num*512) { + printf("xensblk READ: num=%u sector=%llu: xfer returned %i: %i", + num, sector, ret, errno); + /* They'll notice a partial xfer, but if completely failed, + * we have to trigger them. + */ + if (ret < 0) + share_error(EFAULT, shareiofd, share); + } +} + +static void handle_write(int backingfd, int shareiofd, + uint32_t num_blocks, + void *inbuf, + uint32_t len, + struct xensblk_page *share) +{ + int ret; + struct xenshare_sg reg; + uint32_t num = share->num; + uint64_t sector = share->sector; + + if ((uint64_t)num + sector > num_blocks) { + printf("WRITE: num = %u, sector = %llu out of range!\n", + num, sector); + share->error = ENOSPC; + goto out; + } + + if (len != num*512) { + printf("WRITE: num = %u, sector = %llu: %i bytes transferred, not %i!\n", num, sector, len, num*512); + share->error = EINVAL; + goto out; + } + + lseek(backingfd, sector*512, SEEK_SET); + ret = write(backingfd, inbuf, num*512); + if (ret != num*512) { + printf("WRITE: num = %u, sector = %llu: Short write: %i!\n", + num, sector, ret); + share->error = EIO; + goto out; + } + +out: + /* Restore all the pages up for receiving data *before* we ack. 
*/ + reg.len = BUFFER_SIZE; + reg.queue = XENSBLK_SERVER_QUEUE; + if (ioctl(shareiofd, IOCTL_XENSHARE_SG_REGISTER, &reg) != 0) + err(1, "Failed to re-register sg"); + + ioctl(shareiofd, IOCTL_XENSHARE_TRIGGER, XENSBLK_OP_DONE); +} + +int main(int argc, char *argv[]) +{ + int backingfd, xc, shareiofd; + struct xensblk_page *sharepage; + void *buffer; + int ret; + uint32_t num_blocks; + struct xenshare_get_share shareget; + struct xenshare_sg reg; + + if (argc != 2) + err(1, "Usage: xensblk <devpath>"); + + asprintf(&status_file, "%s/%s/status", "/sys", argv[1]); + + /* We have met the driver, and it is us. */ + if (!add_status(VDEVICE_S_DRIVER)) + err(1, "Could not update status"); + atexit(set_fail_status); + + shareiofd = open("/dev/xenshare", O_RDWR); + if (shareiofd < 0) + err(1, "Could not open '%s'", "/dev/xenshare"); + + xc = xc_interface_open(); + if (xc < 0) + err(1, "Failed to open xc interface"); + + shareget.share_ref = get_share_ref(argv[1]); + shareget.num_pages = 1; + ret = ioctl(shareiofd, IOCTL_XENSHARE_GET_SHARE, &shareget); + if (ret < 0) + err(1, "Getting shared pages gave %i", ret); + printf("I am peer %i\n", ret); + + /* Map shared page */ + sharepage = mmap(NULL, getpagesize(), PROT_READ|PROT_WRITE, + MAP_SHARED, shareiofd, + XENSHARE_MAP_SHARE_PAGE * getpagesize()); + if (sharepage == MAP_FAILED) + err(1, "Failed to map shared page"); + + if (!add_status(VDEVICE_S_MAPPED)) + err(1, "Could not update status"); + + if (sharepage->device_type != XENSBLK_DEVTYPE_FILE) + errx(1, "Unknown device type %i", sharepage->device_type); + + backingfd = open(sharepage->device_specific_info, O_RDWR); + if (backingfd < 0) + err(1, "Could not open '%s'", + sharepage->device_specific_info); + + sharepage->capacity_in_blocks = num_blocks = size_in_blocks(backingfd); + sharepage->error = 0; + + /* Map input sg. 
*/ + buffer = mmap(NULL, BUFFER_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, + shareiofd, 0); + if (buffer == MAP_FAILED) + err(1, "Failed to map sg"); + + reg.len = BUFFER_SIZE; + reg.queue = XENSBLK_SERVER_QUEUE; + if (ioctl(shareiofd, IOCTL_XENSHARE_SG_REGISTER, &reg) != 0) + err(1, "Failed to register sg"); + + if (ioctl(shareiofd, IOCTL_XENSHARE_WATCH, XENSBLK_OP_READY) != 0) + err(1, "Failed to watch for op"); + + /* Now we're ready to serve... */ + if (!add_status(VDEVICE_S_DRIVER_OK)) + err(1, "Could not update status"); + + for(;;) { + int32_t len; + if (read(shareiofd, &len, sizeof(len)) != sizeof(len)) + err(1, "Short read from shareiofd"); + + if (-len == XENSBLK_OP_READY) { + if (sharepage->req_type == 0) + handle_read(backingfd, shareiofd, buffer, + num_blocks, sharepage); + else { + fprintf(stderr, "Strange, op request %i!\n", + sharepage->req_type); + share_error(EINVAL, shareiofd, sharepage); + } + } else if (len <= 0) { + fprintf(stderr, "Strange, bufsize %i!\n", len); + share_error(EINVAL, shareiofd, sharepage); + } else { + if (sharepage->req_type == 1) + handle_write(backingfd, shareiofd, + num_blocks, buffer, len, + sharepage); + else { + fprintf(stderr, "Strange, xfer request %i!\n", + sharepage->req_type); + share_error(EINVAL, shareiofd, sharepage); + } + } + } +} -- ccontrol: http://ccontrol.ozlabs.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Harry Butterworth
2006-Jun-06 14:31 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
On Tue, 2006-06-06 at 15:35 +1000, Rusty Russell wrote:

> static inline int get_page(struct page_info *page,
> struct domain *domain)
> +{
> + u32 x, nx, y = page->count_info;
> + u32 d, nd = page->u.inuse._domain;
> + u32 _domain = pickle_domptr(domain);
> +
> + do {
> + x = y;
> + nx = x + 1;
> + d = nd;
> + if ( unlikely((x & PGC_count_mask) == 0) || /* Not allocated? */
> + unlikely((nx & PGC_count_mask) == 0) ) /* Count overflow? */
> + {
> + if ( !_shadow_mode_refcounts(domain) )
> + DPRINTK("Error pfn %lx: rd=%p, od=%p, caf=%08x, taf=%" PRtype_info "\n",
> + page_to_pfn(page), domain, unpickle_domptr(d),
> + x, page->u.inuse.type_info);
> + return 0;
> + }
> + if ( unlikely(d != _domain) ) /* Wrong owner? */
> + return try_shared_page(page, domain);
> + __asm__ __volatile__(
> + LOCK_PREFIX "cmpxchg8b %3"
> + : "=d" (nd), "=a" (y), "=c" (d),
> + "=m" (*(volatile u64 *)(&page->count_info))
> + : "0" (d), "1" (x), "c" (d), "b" (nx) );
> + }
> + while ( unlikely(nd != d) || unlikely(y != x) );
> +
> + return 1;
> +}

What is the "=c" (d) there for? And doesn't cmpxchg8b modify the zero flag---is it necessary to clobber the condition code register?

> diff -r d5f98d23427a xen/include/public/xen.h
> --- a/xen/include/public/xen.h Tue May 30 10:44:23 2006
> +++ b/xen/include/public/xen.h Wed May 31 17:39:54 2006
> @@ -64,6 +64,7 @@
> #define __HYPERVISOR_xenoprof_op 31
> #define __HYPERVISOR_event_channel_op 32
> #define __HYPERVISOR_physdev_op 33
> +#define __HYPERVISOR_share_op 33

Sharing no. 33 with physdev_op?
Harry Butterworth
2006-Jun-06 14:47 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
On Tue, 2006-06-06 at 15:35 +1000, Rusty Russell wrote:

> This introduces a page "share" mechanism to xen: an alternative to
> both cross-domain binding of event channels, and grant tables.

Why do you think an alternative to event channels and grant tables is needed?

Personally, I think there is a need for a more convenient _high-level_ API that more directly meets the split-drivers' requirements for inter-domain messaging and bulk data transfer but this seems to me to be another low-level API and doesn't seem significantly more convenient to use than grant-tables and event-channels.

Is the n>2-way sharing feature the only benefit over grant-tables or do you think there are other benefits to your approach?
Rusty Russell
2006-Jun-07 02:24 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
On Tue, 2006-06-06 at 15:31 +0100, Harry Butterworth wrote:

> On Tue, 2006-06-06 at 15:35 +1000, Rusty Russell wrote:
> > + __asm__ __volatile__(
> > + LOCK_PREFIX "cmpxchg8b %3"
> > + : "=d" (nd), "=a" (y), "=c" (d),
> > + "=m" (*(volatile u64 *)(&page->count_info))
> > + : "0" (d), "1" (x), "c" (d), "b" (nx) );
> > + }
> > + while ( unlikely(nd != d) || unlikely(y != x) );
> > +
> > + return 1;
> > +}
>
> What is the "=c" (d) there for? And doesn't cmpxchg8b modify the zero
> flag---is it necessary to clobber the condition code register?

Good questions. I copied this code from below, though, so I can conveniently punt on this one...

> > diff -r d5f98d23427a xen/include/public/xen.h
> > --- a/xen/include/public/xen.h Tue May 30 10:44:23 2006
> > +++ b/xen/include/public/xen.h Wed May 31 17:39:54 2006
> > @@ -64,6 +64,7 @@
> > #define __HYPERVISOR_xenoprof_op 31
> > #define __HYPERVISOR_event_channel_op 32
> > #define __HYPERVISOR_physdev_op 33
> > +#define __HYPERVISOR_share_op 33
>
> Sharing no 33?

Oops. Good catch, this must have been my sloppy merging. I've fixed this locally.

Thanks!
Rusty.
--
ccontrol: http://ccontrol.ozlabs.org
Rusty Russell
2006-Jun-07 02:35 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
On Tue, 2006-06-06 at 15:47 +0100, Harry Butterworth wrote:

> On Tue, 2006-06-06 at 15:35 +1000, Rusty Russell wrote:
> > This introduces a page "share" mechanism to xen: an alternative to
> > both cross-domain binding of event channels, and grant tables.
>
> Why do you think an alternative to event channels and grant tables is
> needed?

Fundamentally, I think having a simplified mechanism for inter-domain communication keeps us honest: we can benchmark against it and look at the code size, if nothing else.

> Personally, I think there is a need for a more convenient _high-level_
> API that more directly meets the split-drivers' requirements for
> inter-domain messaging and bulk data transfer but this seems to me to be
> another low-level API and doesn't seem significantly more convenient to
> use than grant-tables and event-channels.

The isolation it provides forms a better basis for a higher-level mechanism, IMHO. Neither side really knows, or cares, where the events and data are really going to. There is no mapping of other domains' memory, with all the trust and coordination issues that we had to deal with (and didn't completely!) with the current mechanisms.

For example, if you wanted to use this as a remote mechanism for communication with another machine, you could. If you wanted to substitute backends without the front end knowing, you could. If the frontend or backend dies, there is no restriction on cleaning up its memory.

> Is the n>2-way sharing feature the only benefit over grant-tables or do
> you think there are other benefits to your approach?

Other than the above theoretical advantages, I found it far easier to implement a Linux driver on top of this than it was to implement it on top of grant tables, event channel binding and xenbus. Easier to implement means easier to optimize, easier to port, and easier to debug.

Hope that clarifies!
Rusty.
--
ccontrol: http://ccontrol.ozlabs.org
Jacob Gorm Hansen
2006-Jun-07 10:03 UTC
Re: [Xen-devel] [PATCH 6/9] Linux support for vdevice bus
On 6/6/06, Rusty Russell <rusty@rustcorp.com.au> wrote:

> Subject: Linux support for vdevice bus
>
> This patch provides the Linux implementation of the vdevice bus.
>
> FIXME: currently it does not support save/restore of the domain: it
> should call stop before shutting down, and remap shares afterwards
> before calling reconnect. This depends on exactly what we do with
> shared pages on restore.

In general I find the 'remember to suspend on save' approach that we are currently using for xenbus and drivers problematic, and I much favor a 'reset on resume' approach instead. Sometimes when doing a save (or migration, or, in my case, self-migration), the domain wants to continue running after the save, and then having to shut down all external devices just to immediately resume them is inelegant and often creates a lot of trouble. If we are to change the IPC/sharing mechanism (and you make some good arguments for that), I think we should design for 'reset on resume' rather than 'suspend-on-save'.

Just my DKK0.02.

Jacob
Rusty Russell
2006-Jun-07 10:58 UTC
Re: [Xen-devel] [PATCH 6/9] Linux support for vdevice bus
On Wed, 2006-06-07 at 12:03 +0200, Jacob Gorm Hansen wrote:

> On 6/6/06, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > Subject: Linux support for vdevice bus
> >
> > This patch provides the Linux implementation of the vdevice bus.
> >
> > FIXME: currently it does not support save/restore of the domain: it
> > should call stop before shutting down, and remap shares afterwards
> > before calling reconnect. This depends on exactly what we do with
> > shared pages on restore.
>
> In general I find the 'remember to suspend on save' approach that we
> are currently using for xenbus and drivers problematic, and I much
> favor a 'reset on resume' approach instead. Sometimes when doing a
> save (or migration, or, in my case, self-migration), the domain wants
> to continue running after the save, and then having to shut down all
> external devices just to immediately resume them is inelegant and
> often creates a lot of trouble. If we are to change the IPC/sharing
> mechanism (and you make some good arguments for that), I think we
> should design for 'reset on resume' rather than 'suspend-on-save'.

Thanks for this input Jacob! Things are simpler enough that this might not be an issue, but... Let's look at the shared network device as an example. It maps the share_ref it's told to in the vdevice page, and is reading and writing to that mapped page (although not in an interrupt, unlike the block driver).

On restore, the previously mapped pages will no longer be valid, and the new share_ref needs to be mapped. Even if we remap the new share_refs in place on restore before allowing interrupts, the driver might process an interrupt before being told it has to reset, and get confused.

So if we really want all the action to happen on restore, I think we need a stop!() function for drivers called before interrupts are re-enabled, and then we can call restart() later at our leisure.
The driver, other than ensuring that "stop" can be called with interrupts off, doesn't have to know whether it's called during save or restore.

Seem reasonable?
Rusty.
--
ccontrol: http://ccontrol.ozlabs.org
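The stop-before-interrupts / restart-at-leisure discipline Rusty describes can be sketched as a pair of driver callbacks. This is only an illustration of the proposed calling order: the `vdevice_driver_ops`, `demo_dev` and `restore_domain` names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callbacks: stop() must be safe to call with interrupts
 * off, and runs on restore before interrupts are re-enabled; restart()
 * runs later, once the new share_refs have been remapped. */
struct vdevice_driver_ops {
	void (*stop)(void *priv);
	void (*restart)(void *priv);
};

struct demo_dev {
	bool running;
	int stops, restarts;
};

static void demo_stop(void *priv)
{
	struct demo_dev *d = priv;
	d->running = false;	/* no further share access from here on */
	d->stops++;
}

static void demo_restart(void *priv)
{
	struct demo_dev *d = priv;
	d->running = true;	/* shares remapped; resume I/O */
	d->restarts++;
}

/* Restore path: quiesce the driver first, remap shares, then restart.
 * The driver itself need not know whether this is a save or a restore. */
static void restore_domain(const struct vdevice_driver_ops *ops, void *priv)
{
	ops->stop(priv);	/* conceptually: interrupts still off here */
	/* ... remap the new share_refs here ... */
	ops->restart(priv);	/* at leisure, after interrupts return */
}
```

The point of the split is that stop() carries no save/restore distinction, matching the "reset on resume" preference above.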
Jacob Gorm Hansen
2006-Jun-07 11:09 UTC
Re: [Xen-devel] [PATCH 6/9] Linux support for vdevice bus
On 6/7/06, Rusty Russell <rusty@rustcorp.com.au> wrote:

> So if we really want all the action to happen on restore, I think we
> need a stop!() function for drivers called before interrupts are
> re-enabled, and then we can call restart() later at our leisure.
>
> The driver, other than ensuring that "stop" can be called with
> interrupts off, doesn't have to know whether it's called during save or
> restore.
>
> Seem reasonable?

Yes, that sounds fine to me.

Jacob
Harry Butterworth
2006-Jun-07 13:31 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
On Wed, 2006-06-07 at 12:35 +1000, Rusty Russell wrote:

> On Tue, 2006-06-06 at 15:47 +0100, Harry Butterworth wrote:
> > On Tue, 2006-06-06 at 15:35 +1000, Rusty Russell wrote:
> > > This introduces a page "share" mechanism to xen: an alternative to
> > > both cross-domain binding of event channels, and grant tables.
> >
> > Why do you think an alternative to event channels and grant tables is
> > needed?

<big snip>

> If the
> frontend or backend dies, there is no restriction on cleaning up its
> memory.

This is a good answer. In my use of grant-tables for the USB driver, I ended up with the possibility of one side having to poll indefinitely for the other side to free up its memory with no guarantee of this ever happening. The memory couldn't be reused until it was freed because the other side had a writeable mapping (reusing even just a readable mapping would have been a security hole). I think a fix for this problem is a significant win for your approach over grant-tables.

> Hope that clarifies!

Yes thanks.

Harry.
Mike D. Day
2006-Jun-14 17:26 UTC
Re: [Xen-devel] [PATCH 2/9] Linux kernel infrastructure for Xen Share access
Rusty Russell wrote:

> The entire hypercall interface is arch-wrapped, which is probably
> overkill, but I wasn't entirely sure of the needs of non-x86
> architectures. Some of this should almost certainly be in common code.
>
> diff -r 6d476981e3a5 -r 07a00d96357d linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/share.h
> +struct xen_share *create_share(share_ref_t share_ref, unsigned pages)

<snip>

> +
> + prot = __pgprot(_PAGE_PRESENT|_PAGE_RW|_PAGE_DIRTY|_PAGE_ACCESSED);
> + err = direct_kernel_remap_pfn_range((unsigned long)vma->addr,
> + share_ref, pages * PAGE_SIZE,
> + prot, DOMID_SELF);

Using share_ref as the 2nd parameter to remap_pfn_range, it becomes clear that share_ref is really a page frame number. This is also made clear in xen/arch/x86/share.c. My question is: is it useful to abstract the share reference when it is always going to be a page number? Is there any architecture where it wouldn't be feasible to refer to a share by page number? Really this is just a readability issue.

Mike
Mike D. Day
2006-Jun-14 17:46 UTC
Re: [Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism
Rusty Russell wrote:

> Finally, a scatter-gather list mechanism allows the domains to
> associate their pages with arbitrary queue numbers in the shared
> region, to transport bulk data (effectively by having the hypervisor
> do "DMA" between domains).

The sg transfer mechanism avoids bounce buffers and does a straight copy from_user or to_user below. So this is an alternative to page flipping, right? It would be interesting to see how the direct copy performs compared to page flipping on various architectures. Page table updates are not free and processors are getting more efficient at bulk dma.

> +/* Copy from src to dst, return amount copied. */
> +static int do_copy(const struct sg_list *sgdst, const struct sg_list *sgsrc,
> + unsigned long (*copy)(paddr_t, paddr_t, unsigned long))
> +{
> + unsigned long totlen, src, dst, srcoff, dstoff;
> + int ret = 0;
> +
> + totlen = 0;
> + src = dst = 0;
> + srcoff = dstoff = 0;
> + while (src < sgsrc->num_sg) {
> + unsigned long len;
> + len = min(sgsrc->sg[src].len - srcoff,
> + sgdst->sg[dst].len - dstoff);
> +
> + len = copy(sgdst->sg[dst].addr+dstoff,
> + sgsrc->sg[src].addr+srcoff,
> + len);
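The two-cursor walk in the quoted do_copy() can be exercised in isolation; here is a userspace sketch with memcpy standing in for the hypervisor's copy callback. The `sg_ent`, `sg_list` and `sg_copy` names are illustrative only, not the patch's actual types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sg_ent {
	uint8_t *addr;
	unsigned long len;
};

struct sg_list {
	unsigned num_sg;
	struct sg_ent sg[8];
};

/* Same shape as do_copy(): advance the source and destination cursors
 * independently, copying min(remaining src, remaining dst) each step,
 * so the two lists' entry boundaries need not line up. */
static unsigned long sg_copy(const struct sg_list *dst,
			     const struct sg_list *src)
{
	unsigned long totlen = 0, s = 0, d = 0, soff = 0, doff = 0;

	while (s < src->num_sg && d < dst->num_sg) {
		unsigned long len = src->sg[s].len - soff;
		if (dst->sg[d].len - doff < len)
			len = dst->sg[d].len - doff;

		memcpy(dst->sg[d].addr + doff, src->sg[s].addr + soff, len);
		totlen += len;
		soff += len;
		doff += len;
		if (soff == src->sg[s].len) { s++; soff = 0; }
		if (doff == dst->sg[d].len) { d++; doff = 0; }
	}
	return totlen;
}
```

With memcpy in place of the hypervisor's checked copy callback there is of course no partial-copy handling; the real do_copy() must cope with the callback copying less than requested.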