Hi,

Could anybody (Christian?) please tell me whether we can get support for
registering watches with /proc/xen/xenbus?  (OK, I know this /proc stuff
will be changed to a device soon.)

So far we can only do read/write/rm.  I really miss the xen watch feature.

Many thanks,
Hieu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 9/8/05, NAHieu <nahieu@gmail.com> wrote:
> Could anybody (Christian?) please tell me whether we can get support for
> registering watches with /proc/xen/xenbus?  (OK, I know this /proc stuff
> will be changed to a device soon.)
>
> So far we can only do read/write/rm.  I really miss the xen watch feature.

I don't plan to add support for watches anytime soon myself.  Patches
are welcome of course!

    christian
On Thu, 2005-09-08 at 11:38 +0100, Christian Limpach wrote:
> On 9/8/05, NAHieu <nahieu@gmail.com> wrote:
> > Could anybody (Christian?) please tell me whether we can get support for
> > registering watches with /proc/xen/xenbus?  (OK, I know this /proc stuff
> > will be changed to a device soon.)
> >
> > So far we can only do read/write/rm.  I really miss the xen watch feature.

At the moment, the xenbus device is a simple hack which grabs the lock
on all store communication on open, and drops it on close.  It's not
really a general mechanism for tools in domUs to communicate with the
store.

BTW, Christian, you'll be happy to know I'm testing a patch which
changes the unnatural ioctl into a read/write interface 8)

Cheers,
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On 9/9/05, Rusty Russell <rusty@rustcorp.com.au> wrote:
> At the moment, the xenbus device is a simple hack which grabs the lock
> on all store communication on open, and drops it on close.  It's not
> really a general mechanism for tools in domUs to communicate with the
> store.

Could we reduce the time we hold the lock, so that it is held only from
before we call xb_write until both data->bytes_left and
data->awaiting_reply are 0 again?  I think this would work except for
transactions.  How do you feel about supporting multiple transactions
per connection?  Should we at least extend the interface now so that we
can add concurrent transaction support later, i.e. have transaction
start return a handle and have all operations take a handle argument?

    christian
On Tue, 2005-09-13 at 10:42 +0100, Christian Limpach wrote:
> Could we reduce the time we hold the lock, so that it is held only from
> before we call xb_write until both data->bytes_left and
> data->awaiting_reply are 0 again?  I think this would work except for
> transactions.  How do you feel about supporting multiple transactions
> per connection?  Should we at least extend the interface now so that we
> can add concurrent transaction support later, i.e. have transaction
> start return a handle and have all operations take a handle argument?

I think if we want to do this we should actually introduce a new
mechanism for communications.  There's no reason why a domain can't
introduce more pages for xenstore communication beyond the one it is
given to start with, is there?  (OK, unintroduce needs to take a shared
page instead of/as well as a domid.)

So when someone opens the xenbus dev, we introduce a new page to the
domain and xenstored uses that for comms.  When closed, the page is
released.  This actually simplifies the xenbus_dev driver a lot: now
it's just a dumb pass-through, since we don't have to worry about the
userspace program blowing chunks all over the kernel's comms mechanism.

I'll hack something up and see what you think...

Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On Wed, Sep 14, 2005 at 10:21:04AM +1000, Rusty Russell wrote:
> I think if we want to do this we should actually introduce a new
> mechanism for communications.  There's no reason why a domain can't
> introduce more pages for xenstore communication beyond the one it is
> given to start with, is there?  (OK, unintroduce needs to take a shared
> page instead of/as well as a domid.)
>
> So when someone opens the xenbus dev, we introduce a new page to the
> domain and xenstored uses that for comms.  When closed, the page is
> released.  This actually simplifies the xenbus_dev driver a lot: now
> it's just a dumb pass-through, since we don't have to worry about the
> userspace program blowing chunks all over the kernel's comms mechanism.

That would work, but there are a few reasons why this is impractical:

- xenstored needs to keep track of these pages, and this consumes
  resources in the server.  There's no way for xenstored to close a
  connection to free up resources, unless we want to start handling
  this case in all clients.  With multiple transactions, we can just
  time out the transaction after 5 minutes; we have to handle the
  timeout case in the client already anyway.

- this complicates save/restore yet again, because we'll have to
  reconnect all these pages when the domain is restored

    christian
On Wed, 2005-09-14 at 10:21 +1000, Rusty Russell wrote:
> So when someone opens the xenbus dev, we introduce a new page to the
> domain and xenstored uses that for comms.  When closed, the page is
> released.  This actually simplifies the xenbus_dev driver a lot: now
> it's just a dumb pass-through, since we don't have to worry about the
> userspace program blowing chunks all over the kernel's comms mechanism.
>
> I'll hack something up and see what you think...

OK, I realized there's a problem with suspend/resume.  Within the
kernel, we prevent suspend/resume by simply holding the xenstore_lock,
so it can't happen during normal operations, or transactions.

So let's leave this for now.  It is fairly similar to the problem of
xenstored restarts, which I'm trying to merge, so I suggest we revisit
after that...

Patch for reading only (I subbed in NULL for the default store page for
the moment, since I know there's another patch out there which touches
this).

diff -r 0d8c0db04258 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Wed Sep 14 19:16:48 2005
@@ -46,14 +46,18 @@
 
 DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
 
-static inline struct ringbuf_head *outbuf(void)
-{
-	return mfn_to_virt(xen_start_info->store_mfn);
-}
-
-static inline struct ringbuf_head *inbuf(void)
-{
-	return mfn_to_virt(xen_start_info->store_mfn) + PAGE_SIZE/2;
+static inline struct ringbuf_head *outbuf(void *page)
+{
+	if (!page)
+		return mfn_to_virt(xen_start_info->store_mfn);
+	return page;
+}
+
+static inline struct ringbuf_head *inbuf(void *page)
+{
+	if (!page)
+		return mfn_to_virt(xen_start_info->store_mfn) + PAGE_SIZE/2;
+	return page + PAGE_SIZE/2;
 }
 
 static irqreturn_t wake_waiting(int irq, void *unused, struct pt_regs *regs)
@@ -117,10 +121,10 @@
 	return avail != 0;
 }
 
-int xb_write(const void *data, unsigned len)
+int xb_write(const void *data, unsigned len, void *page)
 {
 	struct ringbuf_head h;
-	struct ringbuf_head *out = outbuf();
+	struct ringbuf_head *out = outbuf(page);
 
 	do {
 		void *dst;
@@ -151,19 +155,19 @@
 	return 0;
 }
 
-int xs_input_avail(void)
+int xs_input_avail(void *page)
 {
 	unsigned int avail;
-	struct ringbuf_head *in = inbuf();
+	struct ringbuf_head *in = inbuf(page);
 
 	get_input_chunk(in, in->buf, &avail);
 	return avail != 0;
 }
 
-int xb_read(void *data, unsigned len)
+int xb_read(void *data, unsigned len, void *page)
 {
 	struct ringbuf_head h;
-	struct ringbuf_head *in = inbuf();
+	struct ringbuf_head *in = inbuf(page);
 	int was_full;
 
 	while (len != 0) {
@@ -200,34 +204,25 @@
 	return 0;
 }
 
-/* Set up interrupt handler off store event channel. */
-int xb_init_comms(void)
+/* Set up interrupt handler off store event channel, and clear page. */
+int xb_init_comms(unsigned int evtchn, void *page)
 {
 	int err;
 
-	if (!xen_start_info->store_evtchn)
-		return 0;
-
-	err = bind_evtchn_to_irqhandler(
-		xen_start_info->store_evtchn, wake_waiting,
-		0, "xenbus", &xb_waitq);
+	err = bind_evtchn_to_irqhandler(evtchn, wake_waiting,
					0, "xenbus", &xb_waitq);
 	if (err) {
 		printk(KERN_ERR "XENBUS request irq failed %i\n", err);
-		unbind_evtchn_from_irq(xen_start_info->store_evtchn);
 		return err;
 	}
 
 	/* FIXME zero out page -- domain builder should probably do this*/
-	memset(mfn_to_virt(xen_start_info->store_mfn), 0, PAGE_SIZE);
-
+	memset(inbuf(page), 0, PAGE_SIZE/2);
+	memset(outbuf(page), 0, PAGE_SIZE/2);
 	return 0;
 }
 
-void xb_suspend_comms(void)
-{
-
-	if (!xen_start_info->store_evtchn)
-		return;
-
-	unbind_evtchn_from_irqhandler(xen_start_info->store_evtchn, &xb_waitq);
-}
+void xb_stop_comms(unsigned int evtchn)
+{
+	unbind_evtchn_from_irqhandler(evtchn, &xb_waitq);
+}
diff -r 0d8c0db04258 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Wed Sep 14 19:16:48 2005
@@ -29,13 +29,13 @@
 #define _XENBUS_COMMS_H
 
 int xs_init(void);
-int xb_init_comms(void);
-void xb_suspend_comms(void);
+int xb_init_comms(unsigned int evtchn, void *page);
+void xb_stop_comms(unsigned int evtchn);
 
-/* Low level routines. */
-int xb_write(const void *data, unsigned len);
-int xb_read(void *data, unsigned len);
-int xs_input_avail(void);
+/* Low level routines: page is NULL for default store page */
+int xb_write(const void *data, unsigned len, void *page);
+int xb_read(void *data, unsigned len, void *page);
+int xs_input_avail(void *page);
 extern wait_queue_head_t xb_waitq;
 
 #endif /* _XENBUS_COMMS_H */
diff -r 0d8c0db04258 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Wed Sep 14 19:16:48 2005
@@ -36,6 +36,7 @@
 #include <linux/notifier.h>
 #include <linux/wait.h>
 #include <linux/fs.h>
+#include <linux/spinlock.h>
 
 #include "xenbus_comms.h"
 
@@ -45,90 +46,47 @@
 #include <asm-xen/linux-public/xenstored.h>
 
 struct xenbus_dev_data {
-	/* Are there bytes left to be read in this message? */
-	int bytes_left;
-	/* Are we still waiting for the reply to a message we wrote? */
-	int awaiting_reply;
-	/* Buffer for outgoing messages. */
-	unsigned int len;
-	union {
-		struct xsd_sockmsg msg;
-		char buffer[PAGE_SIZE];
-	} u;
+	unsigned int evtchn;
+	char *page;
+	char bouncebuf[PAGE_SIZE];
 };
 
 static struct proc_dir_entry *xenbus_dev_intf;
 
-/* Reply can be long (dir, getperm): don't buffer, just examine
- * headers so we can discard rest if they die. */
 static ssize_t xenbus_dev_read(struct file *filp,
			       char __user *ubuf,
			       size_t len, loff_t *ppos)
 {
 	struct xenbus_dev_data *data = filp->private_data;
-	struct xsd_sockmsg msg;
-	int err;
+	int ret;
 
-	/* Refill empty buffer? */
-	if (data->bytes_left == 0) {
-		if (len < sizeof(msg))
-			return -EINVAL;
-
-		err = xb_read(&msg, sizeof(msg));
-		if (err)
-			return err;
-		data->bytes_left = msg.len;
-		if (ubuf && copy_to_user(ubuf, &msg, sizeof(msg)) != 0)
-			return -EFAULT;
-		/* We can receive spurious XS_WATCH_EVENT messages. */
-		if (msg.type != XS_WATCH_EVENT)
-			data->awaiting_reply = 0;
-		return sizeof(msg);
-	}
-
-	/* Don't read over next header, or over temporary buffer. */
-	if (len > sizeof(data->u.buffer))
-		len = sizeof(data->u.buffer);
-	if (len > data->bytes_left)
-		len = data->bytes_left;
-
-	err = xb_read(data->u.buffer, len);
-	if (err)
-		return err;
-
-	data->bytes_left -= len;
-	if (ubuf && copy_to_user(ubuf, data->u.buffer, len) != 0)
-		return -EFAULT;
-	return len;
+	if (len > sizeof(data->bouncebuf))
+		len = sizeof(data->bouncebuf);
+	ret = xb_read(data->bouncebuf, len, data->page);
+	if (ret)
+		return ret;
+	if (copy_to_user(ubuf, data->bouncebuf, len) != 0)
+		return -EFAULT;
+	return len;
 }
 
-/* We do v. basic sanity checking so they don't screw up kernel later. */
 static ssize_t xenbus_dev_write(struct file *filp,
				const char __user *ubuf,
				size_t len, loff_t *ppos)
 {
 	struct xenbus_dev_data *data = filp->private_data;
-	int err;
+	int ret;
 
-	/* We gather data in buffer until we're ready to send it. */
-	if (len > data->len + sizeof(data->u))
-		return -EINVAL;
-	if (copy_from_user(data->u.buffer + data->len, ubuf, len) != 0)
+	if (len > sizeof(data->bouncebuf))
+		len = sizeof(data->bouncebuf);
+	if (copy_from_user(data->bouncebuf, ubuf, len) != 0)
 		return -EFAULT;
-	data->len += len;
-	if (data->len >= sizeof(data->u.msg) + data->u.msg.len) {
-		err = xb_write(data->u.buffer, data->len);
-		if (err)
-			return err;
-		data->len = 0;
-		data->awaiting_reply = 1;
-	}
-	return len;
+	ret = xb_write(data->bouncebuf, len, data->page);
+	if (ret)
+		return ret;
+	return len;
 }
 
 static int xenbus_dev_open(struct inode *inode, struct file *filp)
 {
-	struct xenbus_dev_data *u;
+	struct xenbus_dev_data *data;
+	int err;
+	evtchn_op_t op = { .cmd = EVTCHNOP_alloc_unbound };
 
 	if (xen_start_info->store_evtchn == 0)
 		return -ENOENT;
@@ -136,34 +94,71 @@
 	/* Don't try seeking. */
 	nonseekable_open(inode, filp);
 
-	u = kmalloc(sizeof(*u), GFP_KERNEL);
-	if (u == NULL)
-		return -ENOMEM;
+	data = kmalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		goto out;
+	}
+	data->page = (void *)__get_free_page(GFP_KERNEL);
+	if (!data->page) {
+		err = -ENOMEM;
+		goto free_data;
+	}
+	op.u.alloc_unbound.dom = 0;
+	err = HYPERVISOR_event_channel_op(&op);
+	if (err)
+		goto free_page;
+	data->evtchn = op.u.alloc_unbound.port;
+	err = xb_init_comms(data->evtchn, data->page);
+	if (err)
+		goto release_evtchn;
 
-	memset(u, 0, sizeof(*u));
+	down(&xenbus_lock);
+	err = xenbus_introduce(virt_to_mfn(data->page), data->evtchn);
+	up(&xenbus_lock);
+	if (err)
+		goto release_comms;
 
-	filp->private_data = u;
+	filp->private_data = data;
+	return 0;
 
-	down(&xenbus_lock);
+release_comms:
+	xb_stop_comms(data->evtchn);
+release_evtchn:
+	op.cmd = EVTCHNOP_close;
+	op.u.close.dom = DOMID_SELF;
+	op.u.close.port = data->evtchn;
+	if (HYPERVISOR_event_channel_op(&op) != 0)
+		printk(KERN_WARNING "xenbus_dev: channel close failed\n");
+free_page:
+	free_page((unsigned long)data->page);
+free_data:
+	kfree(data);
+out:
+	return err;
-
-	return 0;
 }
 
 static int xenbus_dev_release(struct inode *inode, struct file *filp)
 {
 	struct xenbus_dev_data *data = filp->private_data;
+	evtchn_op_t op = { .cmd = EVTCHNOP_close };
 
-	/* Discard any unread replies. */
-	while (data->bytes_left || data->awaiting_reply)
-		xenbus_dev_read(filp, NULL, sizeof(data->u.buffer), NULL);
-
-	/* Harmless if no transaction in progress. */
-	xenbus_transaction_end(1);
-
+	down(&xenbus_lock);
+	if (xenbus_release(virt_to_mfn(data->page)) != 0)
+		printk(KERN_WARNING "xenbus_dev: release failed\n");
 	up(&xenbus_lock);
+	xb_stop_comms(data->evtchn);
+	op.u.close.dom = DOMID_SELF;
+	op.u.close.port = data->evtchn;
+	if (HYPERVISOR_event_channel_op(&op) != 0)
+		printk(KERN_WARNING "xenbus_dev: channel close failed\n");
+	free_page((unsigned long)data->page);
 	kfree(data);
-
 	return 0;
 }
diff -r 0d8c0db04258 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Wed Sep 14 19:16:48 2005
@@ -607,12 +607,12 @@
 	down(&xenbus_lock);
 	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, suspend_dev);
 	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, suspend_dev);
-	xb_suspend_comms();
+	xb_stop_comms(xen_start_info->store_evtchn);
 }
 
 void xenbus_resume(void)
 {
-	xb_init_comms();
+	xb_init_comms(xen_start_info->store_evtchn, NULL);
 	reregister_xenbus_watches();
 	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, resume_dev);
 	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, resume_dev);
diff -r 0d8c0db04258 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c	Wed Sep 14 19:16:48 2005
@@ -426,6 +426,44 @@
 }
 EXPORT_SYMBOL(xenbus_gather);
 
+int xenbus_introduce(unsigned long mfn, unsigned int evtchn)
+{
+	char domid_str[21];
+	char mfn_str[21];
+	char evtchn_str[21];
+	struct kvec iov[4];
+
+	sprintf(domid_str, "%u", DOMID_SELF);
+	sprintf(mfn_str, "%lu", mfn);
+	sprintf(evtchn_str, "%u", evtchn);
+
+	iov[0].iov_base = domid_str;
+	iov[0].iov_len = strlen(domid_str) + 1;
+	iov[1].iov_base = mfn_str;
+	iov[1].iov_len = strlen(mfn_str) + 1;
+	iov[2].iov_base = evtchn_str;
+	iov[2].iov_len = strlen(evtchn_str) + 1;
+	iov[3].iov_base = "";
+	iov[3].iov_len = 1;
+
+	return xs_error(xs_talkv(XS_INTRODUCE, iov, ARRAY_SIZE(iov), NULL));
+}
+
+int xenbus_release(unsigned long mfn)
+{
+	struct kvec iov[2];
+	char domid_str[21], mfn_str[21];
+
+	sprintf(domid_str, "%u", DOMID_SELF);
+	sprintf(mfn_str, "%lu", mfn);
+	iov[0].iov_base = domid_str;
+	iov[0].iov_len = strlen(domid_str) + 1;
+	iov[1].iov_base = mfn_str;
+	iov[1].iov_len = strlen(mfn_str) + 1;
+
+	return xs_error(xs_talkv(XS_RELEASE, iov, ARRAY_SIZE(iov), NULL));
+}
+
 static int xs_watch(const char *path, const char *token)
 {
 	struct kvec iov[2];
@@ -569,7 +607,7 @@
 	int err;
 	struct task_struct *watcher;
 
-	err = xb_init_comms();
+	err = xb_init_comms(xen_start_info->store_evtchn, NULL);
 	if (err)
 		return err;
diff -r 0d8c0db04258 linux-2.6-xen-sparse/include/asm-xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/asm-xen/xenbus.h	Tue Sep 13 21:52:24 2005
+++ b/linux-2.6-xen-sparse/include/asm-xen/xenbus.h	Wed Sep 14 19:16:48 2005
@@ -90,6 +90,8 @@
 int xenbus_rm(const char *dir, const char *node);
 int xenbus_transaction_start(const char *subtree);
 int xenbus_transaction_end(int abort);
+int xenbus_introduce(unsigned long mfn, unsigned int evtchn);
+int xenbus_release(unsigned long mfn);
 
 /* Single read and scanf: returns -errno or num scanned if > 0. */
 int xenbus_scanf(const char *dir, const char *node, const char *fmt, ...)
diff -r 0d8c0db04258 tools/python/xen/lowlevel/xs/xs.c
--- a/tools/python/xen/lowlevel/xs/xs.c	Tue Sep 13 21:52:24 2005
+++ b/tools/python/xen/lowlevel/xs/xs.c	Wed Sep 14 19:16:48 2005
@@ -708,7 +708,7 @@
 }
 
 #define xspy_release_domain_doc "\n"					\
-	"Tell xenstore to release its channel to a domain.\n"		\
+	"Tell xenstore to release its channel(s) to a domain.\n"	\
 	"Unless this is done the domain will not be released.\n"	\
 	" dom [int]: domain id\n"					\
 	"\n"								\
@@ -733,7 +733,7 @@
				      &dom))
        goto exit;
    Py_BEGIN_ALLOW_THREADS
-    xsval = xs_release_domain(xh, dom);
+    xsval = xs_release_domain(xh, dom, 0);
    Py_END_ALLOW_THREADS
    if (!xsval) {
        PyErr_SetFromErrno(PyExc_RuntimeError);
diff -r 0d8c0db04258 tools/xenstore/testsuite/09domain.test
--- a/tools/xenstore/testsuite/09domain.test	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/testsuite/09domain.test	Wed Sep 14 19:16:48 2005
@@ -10,10 +10,16 @@
 close
 
 # Release that domain.
-release 1
+release 1 100
 close
 
 # Introduce and release by same connection.
 expect handle is 2
 introduce 1 100 7 /my/home
-release 1
+release 1 100
+
+# Introduce another connection from a domain connection
+expect handle is 3
+# FIXME: We can't actually handle multiple domain connections in xs_test.
+introduce 1 101 7 /my/home
+release 1 0
diff -r 0d8c0db04258 tools/xenstore/testsuite/11domain-watch.test
--- a/tools/xenstore/testsuite/11domain-watch.test	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/testsuite/11domain-watch.test	Wed Sep 14 19:16:48 2005
@@ -12,7 +12,7 @@
 1 waitwatch
 1 ackwatch token
 1 unwatch /test token
-release 1
+release 1 0
 1 close
 
 # ignore watches while doing commands, should work.
@@ -26,7 +26,7 @@
 expect 1:/dir/test:token
 1 waitwatch
 1 ackwatch token
-release 1
+release 1 0
 1 close
 
 # unwatch
@@ -39,7 +39,7 @@
 expect 1:/dir/test2:token2
 1 waitwatch
 1 unwatch /dir token2
-release 1
+release 1 0
 1 close
 
 # unwatch while watch pending.
@@ -48,5 +48,5 @@
 1 watch /dir token1
 write /dir/test2 create contents
 1 unwatch /dir token1
-release 1
+release 1 0
 1 close
diff -r 0d8c0db04258 tools/xenstore/xenstored_core.c
--- a/tools/xenstore/xenstored_core.c	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xenstored_core.c	Wed Sep 14 19:16:48 2005
@@ -1285,7 +1285,7 @@
 		break;
 
 	case XS_RELEASE:
-		do_release(conn, onearg(in));
+		do_release(conn, in);
 		break;
 
 	case XS_GET_DOMAIN_PATH:
diff -r 0d8c0db04258 tools/xenstore/xenstored_domain.c
--- a/tools/xenstore/xenstored_domain.c	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xenstored_domain.c	Wed Sep 14 19:16:48 2005
@@ -51,6 +51,9 @@
 	/* Event channel port */
 	u16 port;
 
+	/* Shared page frame */
+	unsigned long mfn;
+
 	/* Domain path in store. */
 	char *path;
 
@@ -273,6 +276,7 @@
 	domain = talloc(context, struct domain);
 	domain->port = 0;
 	domain->domid = domid;
+	domain->mfn = mfn;
 	domain->path = talloc_strdup(domain, path);
 	domain->page = xc_map_foreign_range(*xc_handle, domain->domid,
					    getpagesize(),
@@ -302,6 +306,10 @@
 void do_introduce(struct connection *conn, struct buffered_data *in)
 {
 	struct domain *domain;
+	domid_t domid;
+	unsigned long mfn;
+	unsigned int evtchn;
+	char *path;
 	char *vec[4];
 
 	if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) {
@@ -309,7 +317,20 @@
 		return;
 	}
 
-	if (conn->id != 0) {
+	domid = atoi(vec[0]);
+	mfn = atol(vec[1]);
+	evtchn = atoi(vec[2]);
+	path = vec[3];
+
+	/* Domains can introduce more pages: share same path. */
+	if (conn->domain) {
+		if (domid != DOMID_SELF) {
+			send_error(conn, EACCES);
+			return;
+		}
+		domid = conn->domain->domid;
+		path = conn->domain->path;
+	} else if (conn->id != 0) {
 		send_error(conn, EACCES);
 		return;
 	}
@@ -320,13 +341,12 @@
 	}
 
 	/* Sanity check args. */
-	if ((atoi(vec[2]) <= 0) || !is_valid_nodename(vec[3])) {
+	if (mfn == 0 || !is_valid_nodename(path)) {
 		send_error(conn, EINVAL);
 		return;
 	}
 
 	/* Hang domain off "in" until we're finished. */
-	domain = new_domain(in, atoi(vec[0]), atol(vec[1]), atol(vec[2]),
-			    vec[3]);
+	domain = new_domain(in, domid, mfn, evtchn, path);
 	if (!domain) {
 		send_error(conn, errno);
 		return;
@@ -340,54 +360,69 @@
 	send_ack(conn, XS_INTRODUCE);
 }
 
-static struct domain *find_domain_by_domid(domid_t domid)
+static struct domain *find_domain(domid_t domid, unsigned long mfn)
 {
 	struct domain *i;
 
 	list_for_each_entry(i, &domains, list) {
-		if (i->domid == domid)
-			return i;
+		if (i->domid == domid) {
+			if (!mfn || i->mfn == mfn)
+				return i;
+		}
 	}
 	return NULL;
 }
 
-/* domid */
-void do_release(struct connection *conn, const char *domid_str)
+/* domid, mfn */
+void do_release(struct connection *conn, struct buffered_data *in)
 {
 	struct domain *domain;
 	domid_t domid;
-
-	if (!domid_str) {
+	unsigned long mfn;
+	char *vec[2];
+
+	if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) {
 		send_error(conn, EINVAL);
 		return;
 	}
-
-	domid = atoi(domid_str);
+	domid = atol(vec[0]);
+	mfn = atol(vec[1]);
+
 	if (!domid) {
 		send_error(conn, EINVAL);
 		return;
 	}
 
-	if (conn->id != 0) {
+	if (!conn->can_write) {
+		send_error(conn, EROFS);
+		return;
+	}
+
+	/* Domains can release own pages, but not all of them. */
+	if (conn->domain) {
+		if (domid != DOMID_SELF || !mfn) {
+			send_error(conn, EACCES);
+			return;
+		}
+		domid = conn->domain->domid;
+	} else if (conn->id != 0) {
 		send_error(conn, EACCES);
 		return;
 	}
 
-	domain = find_domain_by_domid(domid);
-	if (!domain) {
-		send_error(conn, ENOENT);
-		return;
-	}
-
-	if (!domain->conn) {
-		send_error(conn, EINVAL);
-		return;
-	}
-
-	talloc_free(domain->conn);
-
-	fire_watches(NULL, "@releaseDomain", false);
-
+	if (mfn) {
+		domain = find_domain(domid, mfn);
+		if (!domain) {
+			send_error(conn, ENOENT);
+			return;
+		}
+		talloc_free(domain->conn);
+	} else {
+		while ((domain = find_domain(domid, 0)) != NULL)
+			talloc_free(domain->conn);
+
+		fire_watches(NULL, "@releaseDomain", false);
+	}
 	send_ack(conn, XS_RELEASE);
 }
 
@@ -405,7 +440,7 @@
 	if (domid == DOMID_SELF)
 		domain = conn->domain;
 	else
-		domain = find_domain_by_domid(domid);
+		domain = find_domain(domid, 0);
 
 	if (!domain)
 		send_error(conn, ENOENT);
diff -r 0d8c0db04258 tools/xenstore/xenstored_domain.h
--- a/tools/xenstore/xenstored_domain.h	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xenstored_domain.h	Wed Sep 14 19:16:48 2005
@@ -26,7 +26,7 @@
 void do_introduce(struct connection *conn, struct buffered_data *in);
 
 /* domid */
-void do_release(struct connection *conn, const char *domid_str);
+void do_release(struct connection *conn, struct buffered_data *in);
 
 /* domid */
 void do_get_domain_path(struct connection *conn, const char *domid_str);
diff -r 0d8c0db04258 tools/xenstore/xs.c
--- a/tools/xenstore/xs.c	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xs.c	Wed Sep 14 19:16:48 2005
@@ -565,13 +565,19 @@
 	return xs_bool(xs_talkv(h, XS_INTRODUCE, iov, ARRAY_SIZE(iov), NULL));
 }
 
-bool xs_release_domain(struct xs_handle *h, domid_t domid)
-{
-	char domid_str[MAX_STRLEN(domid)];
+bool xs_release_domain(struct xs_handle *h, domid_t domid, unsigned long mfn)
+{
+	struct iovec iov[2];
+	char domid_str[MAX_STRLEN(domid)], mfn_str[MAX_STRLEN(mfn)];
 
 	sprintf(domid_str, "%u", domid);
-
-	return xs_bool(xs_single(h, XS_RELEASE, domid_str, NULL));
+	sprintf(mfn_str, "%lu", mfn);
+	iov[0].iov_base = domid_str;
+	iov[0].iov_len = strlen(domid_str) + 1;
+	iov[1].iov_base = mfn_str;
+	iov[1].iov_len = strlen(mfn_str) + 1;
+
+	return xs_bool(xs_talkv(h, XS_RELEASE, iov, ARRAY_SIZE(iov), NULL));
 }
 
 char *xs_get_domain_path(struct xs_handle *h, domid_t domid)
diff -r 0d8c0db04258 tools/xenstore/xs.h
--- a/tools/xenstore/xs.h	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xs.h	Wed Sep 14 19:16:48 2005
@@ -125,14 +125,16 @@
 /* Introduce a new domain.
  * This tells the store daemon about a shared memory page, event channel
  * and store path associated with a domain: the domain uses these to
  * communicate.
+ * The domain can use this to add its own extra pages.
  */
 bool xs_introduce_domain(struct xs_handle *h, domid_t domid,
			 unsigned long mfn, unsigned int eventchn,
			 const char *path);
 
 /* Release a domain.
  * Tells the store domain to release the memory page to the domain.
+ * mfn == 0 means to release all pages.
  */
-bool xs_release_domain(struct xs_handle *h, domid_t domid);
+bool xs_release_domain(struct xs_handle *h, domid_t domid, unsigned long mfn);
 
 /* Query the home path of a domain. */
diff -r 0d8c0db04258 tools/xenstore/xs_test.c
--- a/tools/xenstore/xs_test.c	Tue Sep 13 21:52:24 2005
+++ b/tools/xenstore/xs_test.c	Wed Sep 14 19:16:48 2005
@@ -207,6 +207,7 @@
 	       "  start <node>\n"
 	       "  abort\n"
 	       "  introduce <domid> <mfn> <eventchn> <path>\n"
+	       "  release <domid> <mfn>\n"
 	       "  commit\n"
 	       "  sleep <milliseconds>\n"
 	       "  expect <pattern>\n"
@@ -630,9 +631,9 @@
 	daemon_pid = *(int *)((void *)out + 32);
 }
 
-static void do_release(unsigned int handle, const char *domid)
-{
-	if (!xs_release_domain(handles[handle], atoi(domid)))
+static void do_release(unsigned int handle, const char *domid, const char *mfn)
+{
+	if (!xs_release_domain(handles[handle], atoi(domid), atol(mfn)))
 		failed(handle);
 }
 
@@ -809,7 +810,7 @@
 		do_introduce(handle, arg(line, 1), arg(line, 2),
			     arg(line, 3), arg(line, 4));
 	else if (streq(command, "release"))
-		do_release(handle, arg(line, 1));
+		do_release(handle, arg(line, 1), arg(line, 2));
 	else if (streq(command, "dump"))
 		dump(handle);
 	else if (streq(command, "sleep")) {
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On 9/14/05, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Patch for reading only (I subbed in NULL for the default store page for
> the moment, since I know there's another patch out there which touches
> this).

I really don't think that the multi-page approach is good, and it's also
orthogonal, i.e. we could have multiple connections but still want
concurrent transactions on the same connection.  What's wrong with
concurrent transactions on the same connection?

    christian
On Wed, 2005-09-14 at 13:55 +0100, Christian Limpach wrote:
> I really don't think that the multi-page approach is good, and it's also
> orthogonal, i.e. we could have multiple connections but still want
> concurrent transactions on the same connection.  What's wrong with
> concurrent transactions on the same connection?

You're right that it's orthogonal, but we really do want separate
connections for each client: they're logically separate, so overloading
them on one transport is going to be a hack.  Inside the kernel we do it
to a limited degree, but we already have a proxy for handling watches,
and we trust everyone to get it right and use One Big Lock.

Now, we still might want concurrent transactions for a single user, but
more likely we want to get rid of the "root of transaction" model
altogether, since no one likes it, and there's an unrelated issue with
an NFS-root mounted store (it breaks horribly) that is likely to change
our underlying (on-disk, directory-based) implementation.

Thoughts?
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On 15 Sep 2005, at 02:39, Rusty Russell wrote:
> we really do want separate connections for each client: they're
> logically separate, so overloading them on one transport is going to
> be a hack.

How does two connections being 'logically separate' imply that it is
improper for them not to also be 'physically separate'?  Multiplexing
multiple simultaneous connections/transactions onto a single underlying
page-level transport would seem fine to me!

 -- Keir
On Thu, Sep 15, 2005 at 11:39:50AM +1000, Rusty Russell wrote:
> On Wed, 2005-09-14 at 13:55 +0100, Christian Limpach wrote:
> > On 9/14/05, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > Patch for reading only (I subbed in NULL for the default store page for
> > > the moment, since I know there's another patch out there which touches
> > > this).
> >
> > I really don't think that the multi-page approach is good and it's
> > also orthogonal, i.e. we could have multiple connections but still
> > want concurrent transactions on the same connection. What's wrong
> > with concurrent transactions on the same connection?
>
> You're right it's orthogonal, but we really do want separate connections
> for each client: they're logically separate, so overloading them on one
> transport is going to be a hack. Inside the kernel we do it to a
> limited degree, but already have a proxy for handling watches and we
> trust everyone to get it right and use One Big Lock.

I think it's not even a hack by your definition, because at least for the uses for which I added the device (backend device setup from hotplug scripts), it is logically the same client.

It seems to me that the solutions we want are exact opposites:
- multiple pages, with a single connection and a single transaction per page
- a single page, with a single connection and multiple transactions

My main objections against multiple pages are:
- setup/teardown overhead: we'll have to add messages to set up and tear down a new connection
- we have to maintain state about the connection in the daemon
- save/restore becomes more complicated

> Now, we still might want concurrent transactions for a single user, but
> more likely we want to get rid of the "root of transaction" model
> altogether, since no one likes it, and there's an unrelated issue with
> NFS-root mounted store (it breaks horribly) that is likely to change our
> underlying (on-disk directory-based) implementation.
>
> Thoughts?

I'd like us to extend the message format so that we can support concurrent transactions in the future, i.e. have transaction_start return a transaction handle and have everything else take a transaction handle. We can return 0 as the handle for now and also use 0 for outside-of-transaction operation.

christian
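A rough sketch of what Christian's handle-based routing could look like on the daemon side, with 0 meaning "no transaction". This is a Python illustration, not the real xenstored code or wire format, and all names are made up; in particular it omits conflict detection, which a real store would need.

```python
# Hypothetical sketch of handle-based transaction routing in a store
# daemon. Handle 0 operates directly on the main store; each non-zero
# handle names a private snapshot. No conflict detection is attempted:
# a real store would fail conflicting commits (e.g. with EAGAIN).

NO_TRANSACTION = 0

class Store:
    def __init__(self):
        self.data = {}            # the committed store
        self.transactions = {}    # handle -> private snapshot
        self.next_handle = 1

    def transaction_start(self):
        handle = self.next_handle
        self.next_handle += 1
        # Each transaction works against a snapshot of the store.
        self.transactions[handle] = dict(self.data)
        return handle

    def _view(self, handle):
        if handle == NO_TRANSACTION:
            return self.data
        return self.transactions[handle]

    def read(self, handle, key):
        return self._view(handle).get(key)

    def write(self, handle, key, value):
        self._view(handle)[key] = value

    def transaction_end(self, handle, commit=True):
        snapshot = self.transactions.pop(handle)
        if commit:
            self.data = snapshot
```

Because every operation carries its handle, two transactions on the same connection no longer interfere, which is exactly what makes the handle worth adding to the message format now even if the daemon returns 0 for a while.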
On Thu, 2005-09-15 at 11:53 +0100, Keir Fraser wrote:
> On 15 Sep 2005, at 02:39, Rusty Russell wrote:
>
> > we really do want separate connections
> > for each client: they're logically separate, so overloading them on one
> > transport is going to be a hack.
>
> How does two connections being 'logically separate' imply that it is
> improper for them not to also be 'physically separate'? Multiplexing
> multiple simultaneous connections/transactions onto a single underlying
> page-level transport would seem fine to me!

Um, multiplexing, like any feature, adds complexity: if we don't need it, don't do it. <shrug>

We have a way of establishing new ringbuffers to talk to the store, we just currently assume one per domain. Loosening that seems simpler and more robust than introducing a multiplexing layer, unless you two can see something I can't?

Christian says:
> My main objections against multiple pages are:
> - setup/teardown overhead: we'll have to add messages to set up
>   and tear down a new connection

Which we already have, as above.

> - we have to maintain state about the connection in the daemon

But allowing multiple connections over one transport doesn't change this.

> - save/restore becomes more complicated

Actually, I think it becomes simpler: we simply force the device closed, which should be handled by libxenstore just the same as a unix domain socket closing on daemon restart, AFAICT. So it has an appeal.

What was the reason for wanting multiple transactions per connection? Changing the interface is going to be a PITA, so we should figure out if we're going to need that soon...

Thanks!
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
On 17 Sep 2005, at 09:26, Rusty Russell wrote:

>> How does two connections being 'logically separate' imply that it is
>> improper for them not to also be 'physically separate'? Multiplexing
>> multiple simultaneous connections/transactions onto a single underlying
>> page-level transport would seem fine to me!
>
> Um, multiplexing, like any feature, adds complexity: if we don't need
> it, don't do it. <shrug>

That doesn't make it a 'hack'.

> We have a way of establishing new ringbuffers to talk to the store, we
> just currently assume one per domain. Loosening that seems simpler and
> more robust than introducing a multiplexing layer, unless you two can
> see something I can't?

Multiplexing will require user-space reads/writes to be passed to the kernel rather than user space stuffing its own comms page directly. This has the advantage of being what we already do, and any performance disadvantages really don't matter. On the xenstored side it ought simply to be a matter of picking a transaction or connection id out of the message to index into some kind of state table.

If we have multiple pages, the client driver is complicated by reserving user pages and creating grant references for them, and by cleanly tearing down and dealloc'ing grant references at the appropriate point(s). I agree the daemon doesn't really get any more complicated, but I think save/restore will need extra code, either in the domain0 tools or in the guest OS, to reconnect pages through to xenstored.

Maybe there is a hidden complexity to muxing that we don't see? I guess save/restore needs some care because transaction-id state will be lost when we reconnect to a new xenstored.

-- Keir
On Sat, Sep 17, 2005 at 06:26:04PM +1000, Rusty Russell wrote:
> What was the reason for wanting multiple transactions per connection?
> Changing the interface is going to be a PITA, so we should figure out if
> we're going to need that soon...

It solves the immediate problem of making the xenbus lock in the kernel driver more fine-grained. I'm not too happy with how a userspace program can block all xenbus access for that domain at the moment, even if this is limited to the root user.

Additionally, I believe it's necessary to support reconnect to a different store after restore, allowing the store daemon or the xenbus driver to distinguish operations which are part of an incomplete transaction from operations which are not part of a transaction at all. Finally, adding transaction ids later will be a lot more difficult.

I also find the current behaviour a bit odd: we support operations outside of transactions, but once you start a transaction, every operation you make is part of that transaction. I think we should either require always being in a transaction or always allow outside-of-transaction operations.

christian
On Sat, 2005-09-17 at 09:33 +0100, Keir Fraser wrote:
> On 17 Sep 2005, at 09:26, Rusty Russell wrote:
>
> >> How does two connections being 'logically separate' imply that it is
> >> improper for them not to also be 'physically separate'? Multiplexing
> >> multiple simultaneous connections/transactions onto a single underlying
> >> page-level transport would seem fine to me!
> >
> > Um, multiplexing, like any feature, adds complexity: if we don't need
> > it, don't do it. <shrug>
>
> That doesn't make it a 'hack'.

That's quoting a little out of context. The current partial exposure of the kernel's channel is a hack, since the current model is a 1:1 mapping between the transport and the connection. The term hack is not a bad thing in itself, but it does accurately reflect that it's limited, as in this case where we're asked to add watch support.

> > We have a way of establishing new ringbuffers to talk to the store, we
> > just currently assume one per domain. Loosening that seems simpler and
> > more robust than introducing a multiplexing layer, unless you two can
> > see something I can't?
>
> Multiplexing will require user-space reads/writes to be passed to the
> kernel rather than stuffing its own comms page directly. This has the
> advantage of being what we already do, and any performance
> disadvantages really don't matter.

No, I'm proposing we keep the device in the domU kernel exactly as now, but rather than sharing the same page, use separate pages. This means we don't have to share the same lock or worry about userspace messing up, etc.

> On the xenstored side it ought simply to be a matter of picking a
> transaction or connection id out of the message to index into some kind
> of state table.

Sure, I can write multiplexing and demultiplexing code for the store daemon, separate the data structures, and duplicate that code in libxenstore and xenbus. But AFAICT it's code which simply doesn't need to exist.

> If we have multiple pages the client driver is complicated by reserving
> user pages and creating grant references for them, and cleanly tearing
> down and dealloc'ing grant references at the appropriate point(s). I
> agree the daemon doesn't really get any more complicated, but I think
> save/restore will need extra code, either in the domain0 tools or in
> the guest os, to reconnect pages through to xenstored.

No, this is the beauty of it: save/restore should look exactly the same to libxenstore as a restart of the store daemon; Christian and I have been vigorously debating semantics to get them the same (and I think we're close). Then on save, the device simply closes; on restore, the library reconnects.

So, in summary: separating the concept of transport from connection is possible, but I don't think it actually buys us anything except another layer of indirection.

Rusty.
On Sat, 2005-09-17 at 18:40 +0100, Christian Limpach wrote:
> On Sat, Sep 17, 2005 at 06:26:04PM +1000, Rusty Russell wrote:
> > What was the reason for wanting multiple transactions per connection?
> > Changing the interface is going to be a PITA, so we should figure out if
> > we're going to need that soon...
>
> It solves the immediate problem of making the xenbus lock in the kernel
> driver more fine-grained. I'm not too happy with how a userspace
> program can block all xenbus access for that domain at the moment,
> even if this is limited to the root user.

Sure, and we already have the concept of separate transports, so I think we should use them.

> Additionally, I believe
> it's necessary to support reconnect to a different store after
> restore, allowing the store daemon or the xenbus driver to distinguish
> operations which are part of an incomplete transaction from operations
> which are not part of a transaction at all.

And I think you're wrong; it's unnecessary and it doesn't actually help. The code will show.

> Finally, adding transaction
> ids later will be a lot more difficult.

Unless we introduce an xs_transaction_context() call, and leave it implicit?

> I also find the current behaviour a bit odd, we support operations
> outside of transactions but once you start a transaction, every
> operation you make is part of that transaction. i think we should
> either require to always be in a transaction or always allow outside
> of transaction operations.

We could drop implicit transactions; it would simplify the code a bit. I didn't do this because it seems a bit silly for simple monitoring tools.

The start/end transaction model is simple, but if it's demonstrably insufficient, I agree we should go for transaction ids. That's orthogonal to the multiplexing of connections over a single transport, though.

Rusty.
On 19 Sep 2005, at 01:11, Rusty Russell wrote:

> That's quoting a little out of context. The current partial exposure of
> the kernel's channel is a hack, since the current model is a 1:1 mapping
> between the transport and the connection. The term hack is not a bad
> thing in itself, but it does accurately reflect that it's limited, as in
> this case where we're asked to add watch support.

Well, I can't disagree with this. :-)

Whichever way we go, we have to keep some per-handle state on users of the xenbus device, whether that's a transaction id or a reference to a unique transport page. Either way we don't need to continuously hold the xenbus_lock (which is super gross).

>> If we have multiple pages the client driver is complicated by reserving
>> user pages and creating grant references for them, and cleanly tearing
>> down and dealloc'ing grant references at the appropriate point(s). I
>> agree the daemon doesn't really get any more complicated, but I think
>> save/restore will need extra code, either in the domain0 tools or in
>> the guest os, to reconnect pages through to xenstored.
>
> No, this is the beauty of it: save/restore should look exactly the same
> to libxenstore as a restart of the store daemon; Christian and I have
> been vigorously debating semantics to get them the same (and I think
> we're close). Then on save, the device simply closes; on restore, the
> library reconnects.

I wholeheartedly agree that restore should be equivalent to xenstored restart from the p.o.v. of the xenbus driver. That's how the block and net drivers already work. But it's orthogonal to whether we mux connections onto a single transport page.

-- Keir
On Mon, 2005-09-19 at 09:54 +0100, Keir Fraser wrote:
> I wholeheartedly agree that restore should be equivalent to xenstored
> restart from the p.o.v. of the xenbus driver. That's how the block and
> net drivers already work. But it's orthogonal to whether we mux
> connections onto a single transport page.

OK, that's a little confusing. There are three ways to talk to xenstored (dom0, domU kernel, domU userspace), and two comms problems (xenstored restart, and migration).

1) For domU kernels, we have a shared page, so xenstored restarts are completely transparent to the domains: the new xenstored just maps the page and away we go. We also hold the xenstore_lock across entire transactions, which is grabbed on save/restore, so we trivially avoid any sync issues on migration.

2) When dom0 tools connect to xenstored, they use a socket, which there's no easy way for the new xenstored to pick up (e.g. you write to the socket just as the daemon shuts down...). The idea, then, is to handle reconnecting in libxenstore itself. The recent protocol tweaks are designed to make this trivial. There's no migration of dom0 tools to worry about.

3) The new problem is domU tools. By providing a kernel device node we make it look exactly like talking through a socket, so libxenstore is basically unchanged. By each open using a separate page to talk to xenstored, we don't have to hold the kernel lock, nor worry about tool errors/crashes screwing the kernel's store page. xenstored restarts are transparent. On migration, we can simply force close (unmap page, return EBADF), which libxenstore can treat exactly like the xenstored restart case with sockets: reconnect and re-xmit.

The only issue is that, in the case of migration, the new xenstored won't know about any transaction currently in progress. We can either migrate transactions (easy for clients), or return EAGAIN for the next operation (easy for xenstored, sucks for clients).

Cheers,
Rusty.
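The reconnect-and-retransmit idea in (3) can be illustrated with a toy model. This is a Python sketch, not the real libxenstore C code; the class and method names are invented. The point is only that the library treats a forced close of the device (migration) exactly like the daemon closing a socket (restart): reconnect and resend.

```python
# Toy model of "reconnect and re-xmit": the client library hides a
# forced close (migration) or daemon restart behind a retry loop.
# All names here are illustrative, not the real libxenstore API.

class ConnectionClosed(Exception):
    """The device/socket was force-closed (daemon restart or migration)."""

class FlakyTransport:
    """A transport that drops the connection a fixed number of times."""
    def __init__(self, failures):
        self.failures = failures

    def request(self, op, arg):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionClosed()
        return ("ok", op, arg)

class Client:
    def __init__(self, transport, max_retries=3):
        self.transport = transport
        self.max_retries = max_retries

    def do(self, op, arg):
        for _ in range(self.max_retries + 1):
            try:
                return self.transport.request(op, arg)
            except ConnectionClosed:
                # Reconnect (re-open the device / socket) and retransmit.
                continue
        raise ConnectionClosed("gave up reconnecting")
```

This only works cleanly for idempotent requests outside a transaction, which is exactly why in-progress transactions are the remaining issue Rusty identifies.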
On 20 Sep 2005, at 12:01, Rusty Russell wrote:

> By providing a kernel device node we
> make it look exactly like talking through a socket, so libxenstore is
> basically unchanged. By each open using a separate page to talk to
> xenstored, we don't have to hold the kernel lock, nor worry about tool
> errors/crashes screwing the kernel's store page. xenstored restarts are
> transparent. On migration, we can simply force close (unmap page,
> return EBADF), which libxenstore can treat exactly like the xenstored
> restart case with sockets: reconnect and re-xmit.

None of this really adds weight for or against muxing versus page-per-transaction. If accesses from domU userspace are always communicated via the kernel device node, userspace gets no direct access to the transport page and so the kernel can ensure the page does not get corrupted. Also, there's no need to hold the kernel lock for the duration of the user-space transaction if we mux multiple transactions onto a single page -- as independent transactions, it is up to xenstored to deal with sync issues between them. We would need to hold the lock for short periods during the transaction (e.g., while accessing the shared transport page), but we would not be holding it continuously.

As for client handling of migration/restart -- it's an API issue that is independent of the underlying 'wire' transport.

-- Keir
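Keir's locking point can be sketched concretely: the lock protects only the shared transport page, never a whole transaction, so concurrent transactions interleave freely and the daemon sorts them out by transaction id. This is a Python illustration (the "ring" is just a list standing in for the shared page; all names are invented, and the real code would be kernel C).

```python
# Sketch of short-critical-section locking: the lock is held only while
# touching the shared "page", never across a whole transaction.

import threading

class Ring:
    def __init__(self):
        self.lock = threading.Lock()   # guards only the shared page
        self.messages = []

    def send(self, tx_id, op):
        # Short critical section: just long enough to enqueue one message.
        with self.lock:
            self.messages.append((tx_id, op))

def run_transaction(ring, tx_id, ops):
    # No lock is held between operations; concurrent transactions can
    # interleave, and the daemon demuxes them by tx_id.
    for op in ops:
        ring.send(tx_id, op)

ring = Ring()
t1 = threading.Thread(target=run_transaction, args=(ring, 1, ["read a", "write b"]))
t2 = threading.Thread(target=run_transaction, args=(ring, 2, ["read c"]))
t1.start(); t2.start(); t1.join(); t2.join()
```

Contrast this with the status quo described earlier in the thread, where the xenbus lock is grabbed on open and held until close, serialising everything.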
On 20 Sep 2005, at 12:01, Rusty Russell wrote:

> The only issue is that, in the case of migration, the new xenstored
> won't know about any transaction currently in progress. We can either
> migrate transactions (easy for clients), or return EAGAIN for the next
> operation (easy for xenstored, sucks for clients).

Well, you know we already disagree very strongly on this.

Either we allow clients to lock down sections of the xenstore hierarchy for unbounded periods of time (unacceptable since we do not trust all clients) or we have to handle transaction failure in the clients. The only exception to this I can think of is read-only transactions, where you can take a read-only consistent snapshot of the store when the transaction begins and guarantee eventual success (even here we may want a timeout to avoid resource hogging). Apart from that, on failure the client *has* to execute its transactional code again -- the values it writes to the store may be dependent on values it reads as part of the same transaction. If the transaction is failed because some of those values it read are now stale (or might be stale, because a leased lock was revoked), the transaction replay has to include re-execution of the client code -- it cannot be hidden in the transactional API (e.g., just replaying the transactional writes from the previous failed attempt may be incorrect if those writes are based on stale reads from the previous failed attempt).

This is the price for providing ACID guarantees (well, A, C and I at least, and without atomicity, consistency and isolation you are not implementing transactions as understood by every computer scientist).

If we have to make failure visible to clients anyway (and I'm sure we do), it does at least greatly simplify things like xenstored restart and migration. Transactions are simply failed.

-- Keir
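Keir's "re-execute the client code" rule is the familiar optimistic-retry pattern. Here is a hedged sketch of it in Python; the function names and the version-counter conflict check are illustrative, not the xenstore API. The key property is that on failure the whole transaction body re-runs, reads included, so writes are never replayed against stale reads.

```python
# Optimistic retry loop: on conflict, the whole body (reads AND writes)
# re-executes against a fresh snapshot. All names are illustrative.

class Eagain(Exception):
    pass

class Store:
    def __init__(self):
        self.data = {}
        self.version = 0

    def start(self):
        # A snapshot gives the transaction an isolated, consistent view.
        return {"snapshot": dict(self.data), "writes": {},
                "version": self.version}

    def commit(self, txn):
        if txn["version"] != self.version:
            raise Eagain()           # store changed under the transaction
        self.data.update(txn["writes"])
        self.version += 1

def transactional(store, body, max_attempts=10):
    """Run body(txn) until it commits without conflict."""
    for _ in range(max_attempts):
        txn = store.start()
        body(txn)                    # client code re-executes on each try
        try:
            store.commit(txn)
            return
        except Eagain:
            continue
    raise Eagain("too much contention")
```

Note that `transactional` cannot hide the retry from the client: `body` is handed back the fresh snapshot each time, which is exactly Keir's point about replaying writes alone being incorrect.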
FWIW: I think most of the ugliness here is a result of confusion from using the store both for the storage of persistent configuration information and as a mechanism to implement configuration ABIs.

If you split out those two functions, such that there is a persistent store service which is generally useful for storage of persistent configuration information but only stores information that is private to each client, then you can allow clients to lock down their private data for unbounded periods (because it doesn't matter to any other client) and there is no need for transaction retry in the clients.

You have to provide a separate mechanism to publish configuration ABIs. The provider of a configuration ABI can sign up to a maximum service time for requests made of that ABI. If a request takes too long, it can be promoted to a domain failure of the provider domain.

So a scenario for a misbehaving domain might go as follows: the domain misbehaves and accidentally locks up part of its private region of the store. This state can persist indefinitely without affecting other domains. The super-user performs some configuration which results in a request to the domain's configuration ABI to reconfigure something. The domain attempts the reconfiguration but hangs because it has locked up a bit of its area of the store. The configuration request violates the domain's config ABI SLA. The domain is killed. The super-user's configuration request fails. The store detects the domain exit and rolls back the uncommitted transaction. The domain is restarted. The super-user can retry the configuration request.

With this approach, clients of the persistent store don't have to handle transaction failures. Clients of the configuration ABI have to handle failures of configuration requests as a result of domains exiting (which is unavoidable: failing requests could be automatically retried after a domain is restarted, but a domain might fail continually, in which case the configuration request would eventually have to be failed).

This note might not be very helpful for 3.0 given the current architecture, but you might perhaps find a mapping of the above solution onto the xenstore infrastructure.
On Wed, 2005-09-21 at 10:35 +0100, Keir Fraser wrote:
> On 20 Sep 2005, at 12:01, Rusty Russell wrote:
>
> > By providing a kernel device node we
> > make it look exactly like talking through a socket, so libxenstore is
> > basically unchanged. By each open using a separate page to talk to
> > xenstored, we don't have to hold the kernel lock, nor worry about tool
> > errors/crashes screwing the kernel's store page. xenstored restarts are
> > transparent. On migration, we can simply force close (unmap page,
> > return EBADF), which libxenstore can treat exactly like the xenstored
> > restart case with sockets: reconnect and re-xmit.
>
> None of this really adds weight for or against muxing versus
> page-per-transaction.

Exactly! So you would have me implement multiplexing code in the kernel, demultiplexing code in the daemon, and checking code in the kernel to make sure we don't corrupt the shared comms channel, for no reason.

Cheers,
Rusty.
On Wed, 2005-09-21 at 10:39 +0100, Keir Fraser wrote:
> On 20 Sep 2005, at 12:01, Rusty Russell wrote:
>
> > The only issue is that, in the case of migration, the new xenstored
> > won't know about any transaction currently in progress. We can either
> > migrate transactions (easy for clients), or return EAGAIN for the next
> > operation (easy for xenstored, sucks for clients).
>
> Well, you know we already disagree very strongly on this.

Perhaps I was unclear? It's not the *commit* that fails with EAGAIN, but *any operation* (read/write/dir, etc.), in this scenario. Unlike daemon restarts, where we can simply re-establish transactions, even if the eventual commit is doomed to fail. (I sent such code to Christian, but abandoned it because we had to change everything else anyway.)

Now, reread the paragraph you quoted. Is my question clearer now? I really do want to know what you think. Should we try to migrate transactions, or label the xenstored API clearly that any operation inside a transaction can return EAGAIN if you are inside a domU using the xenbus device? Or can you see a third way?

(BTW, your analysis of the use of locks to provide ACID is flawed, but since I've had to abandon that approach for another reason I'll not waste time now in an academic argument: that's for pubs and IRC.)

Have I unconfused the issue?
Rusty.
On 22 Sep 2005, at 03:22, Rusty Russell wrote:

> It's not the *commit* that fails with EAGAIN, but *any operation*
> (read/write/dir, etc.), in this scenario.

Well, that is more of a pain. Probably okay for Python, as an exception gets raised and you automatically unwind the stack to the full scope of the transaction. More of a pain for C code. But still, it also has the advantage that you can fail a doomed transaction earlier -- for example, you can ensure that client code doesn't execute based on inconsistent sets of read values (i.e., read a value A before a remote transaction commits; read a value B after a remote transaction commits; where that remote transaction updates both A and B).

Either we believe some clients are fragile to reading bad/inconsistent data, in which case I think EAGAIN on any read/write is a good idea (better than massively complicating xenstored). Or, if we believe clients should be robust to reading crap from the store (and many ought to be, because backends can't trust values written by frontends, for example), then we can handle doomed transactions without early report of EAGAIN as follows:
* discard writes
* return empty strings for reads (or ENOENT, or something like that).

Whatever, the client probably needs the code to realise that a bad thing has happened and to take appropriate action whichever strategy we go for. I suspect they are equivalent complexity for clients.

-- Keir
On 22 Sep 2005, at 03:07, Rusty Russell wrote:

> Exactly! So you would have me implement multiplexing code in the
> kernel, demultiplexing code in the daemon, checking code in the kernel
> to make sure we don't corrupt the shared comms channel, for no reason.

Better to implement simple mux/demux code than pain-in-the-arse save/restore code.

-- Keir
On Thu, 2005-09-22 at 10:36 +0100, Keir Fraser wrote:
> On 22 Sep 2005, at 03:07, Rusty Russell wrote:
>
> > Exactly! So you would have me implement multiplexing code in the
> > kernel, demultiplexing code in the daemon, checking code in the kernel
> > to make sure we don't corrupt the shared comms channel, for no reason.
>
> Better to implement simple mux/demux code than pain-in-the-arse
> save/restore code.

But but but... it doesn't *help*. That's the entire point!

OK, please describe, in simple terms, why you think save/restore is different if we multiplex across a single transport?

Rusty.
On Thu, 2005-09-22 at 10:35 +0100, Keir Fraser wrote:
> Whatever, the client probably needs the code to realise that a bad
> thing has happened and to take appropriate action whichever strategy we
> go for. I suspect they are equivalent complexity for clients.

I think you've summed it up well. Of these two I'm leaning towards EAGAIN (which the client can turn into a fake success if they want). But both are subtle and kinda icky.

Which is why I am pondering a bundle/unbundle interface for transactions, so we can migrate them with the domain. Summary:

1) Easy to do at the moment: we already snapshot the entire store for transactions, so we can just bundle/unbundle that. We need globally-unique transaction IDs, but that's fairly simple.
2) Each domain adds roughly 5k to the store (this will increase, say to 10k). This means migrating off a node with 100 domains means adding 1M to the data we have to send *per transaction*.
3) The store compresses extremely well (~800 bytes per domain), so we could trivially get it down to 160k/transaction in the 100 domain case.

You know I treasure simple APIs, and this makes the store API simpler and so reduces subtle errors in future.

But is it worth the complexity?
Rusty.
Rusty,

Can you explain once again why you think that migrating in-progress transactions is the right thing to do? It seems to me that our transactions are generally pretty small, and I don't imagine them getting problematically bigger in the future. If client-side transaction code is already being written to expect failures and retry when they occur, what is the argument against blowing away in-progress transactions when you migrate?

Given that the majority of current transaction code is to do with device drivers, and you disconnect/reconnect those on migration anyway, why go through the extra work of adding complexity to migration?

a.
> But but but... it doesn't *help*. That's the entire point!
>
> OK, please describe, in simple terms, why you think save/restore is
> different if we multiplex across a single transport?

Well, maybe there's not so much in it after all. I'll assume here we go for the 'xenstored forgets all state, and clients get EAGAIN at the first available opportunity' approach.

If we mux on a single transport:
1. The shared transport page is set up automatically in xenstored when the domain is restored. Xenstored has forgotten about any in-progress transactions.
2. The xenbus driver marks all file handles (or transaction structures, or whatever it uses to track local state for each local transaction) as doomed. Any further activity on those transactions returns EAGAIN rather than passing through to xenstored.
3. That's it! Clients detect failure and retry.

If we have a page per transaction:
1. Same as (1) above.
2. Same as (2) above, but free the per-transaction transport page.
3. Same as (3) above.

However, I'm not clear yet what each separate transport page represents. Is it a single transaction, or a connection that stores multiple watches and one transaction at a time? If the latter, save/restore gets a bit harder, as the transport pages must be automatically re-registered and the watches re-registered...

-- Keir
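Step (2) of Keir's sequence amounts to a small piece of driver bookkeeping. Here is a speculative stand-in in Python (the real thing would be kernel C, and every name here is invented): mark every open handle doomed on restore, and fail further operations with EAGAIN until the client starts over.

```python
# Stand-in sketch of marking in-flight transactions "doomed" on restore:
# xenstored has forgotten everything, so any further use of an old
# handle fails with EAGAIN. Illustrative only.

import errno

class Handle:
    def __init__(self):
        self.doomed = False

class XenbusDriver:
    def __init__(self):
        self.handles = []

    def open(self):
        h = Handle()
        self.handles.append(h)
        return h

    def op(self, handle, request):
        if handle.doomed:
            return -errno.EAGAIN      # client must retry from scratch
        return 0                      # would forward request to xenstored

    def restored(self):
        # Called after migration: every in-flight transaction is doomed.
        for h in self.handles:
            h.doomed = True
```

Note this is identical for the mux and page-per-transaction designs except for where the transport page is freed, which is Keir's point that there is "not so much in it".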
On 23 Sep 2005, at 00:51, Rusty Russell wrote:> Which is why I am pondering a bundle/unbundle interface for > transactions, so we can migrate them with the domain. Summary:How do you guarantee consistency of the migrated transaction? It holds no locks or anything on the values it has read while it is not running, so they could change under its feet? Correctness-wise, it sounds to me as though bundle/unbundle has the same correctness guarantees as drop all writes, ENOENT on reads, and EAGAIN on commit. Or is the bundling simply so that you can continue to feed consistent snapshotted values to the transaction from its 'shadow store', even though it is ultimately a doomed transaction? Do xenstored's performance problems stem from copying the store for every transaction? -- Keir
On Thu, 2005-09-22 at 18:01 -0700, Andrew Warfield wrote:> Rusty, > > Can you explain once again why you think that migrating in-progress > transactions is the right thing to do? It seems to me that our > transactions are generally pretty small, and I don't imagine them > getting problematically bigger in the future. If client-side > transaction code is already being written to expect failures and retry > when they occur, what is the argument against blowing away in-progress > transactions when you migrate. > > Given that the majority of current transaction code is to do with > device drivers and you disconnect/reconnect those on migration anyway, > why go through the extra work of adding complexity to migration?Actually, with more thought on your excellent points I've changed my mind: we should wait for any transactions to complete before saving the domain and sidestep the whole issue. To recap, the problem is that if we are halfway through a transaction when we migrate the domain, it's hard to know what to do on the next op (eg. "xs_write", "xs_read" etc). Without migrating the transaction, we can fail the next op with EAGAIN or return "dummy" values, neither of which is pleasant. (Failing with EAGAIN on commit is different, and something the caller needs to handle anyway). Now, we already have this "domain won't save until transactions are done" simply because we use a single big lock, but this discussion started because we want to get rid of that lock for /proc/xen/xenbus (it's fine for drivers). I think we should do so, but keep this wont-save-during-transactions semantic; it means a waitqueue etc, but I don't think it's too bad. As you say, our transactions are pretty small. Do people like this more? Rusty.
On Fri, 2005-09-23 at 10:24 +0100, Keir Fraser wrote:> Or is the bundling simply so that you can continue to feed consistent > snapshotted values to the transaction from its 'shadow store', even > though it is ultimately a doomed transaction?Yes, that was the plan, although see other mail; I now prefer not allowing save during transactions at all, to avoid this issue.> Do xenstored's performance problems stem from copying the store for > every transaction?Yes. Using TDB means it's now copying a single file, rather than a whole directory tree, but if someone puts enough domains in the store it will become a problem and we'll need a more sophisticated implementation. It would be a fun problem to work on, but I don't think I can justify spending cycles on it yet: I can create around 10,000 domain-style directories in the store on my laptop in 22 seconds (using 10,000 separate transactions). The advantages of the "whole copy" approach are simplicity and robustness: if you SIGKILL xenstored your chances of recovery are excellent, as commit is done as a rename(2). Cheers! Rusty.
On Fri, 2005-09-23 at 10:17 +0100, Keir Fraser wrote:> However, I'm not clear yet what each separate transport page > represents. Is it a single transaction, or a connection that stores > multiple watches and one transaction at a time? If the latter, > save/restore gets a bit harder as the transport pages must be > automatically re-registered and watches re-registered...I was assuming per-connection, ie. every time a tool opens /proc/xen/xenbus we map a new page/event channel, and free on close. ie. the drivers in the domU kernel share the store comms page created with the domain, tools in the domU get a separate page each, and tools in dom0 each connect to the unix domain socket. To the store, one connection, one transport. (Each connection can set up multiple watches of course, and Christian has been hinting that he wants multiple transactions too, but that's another argument). So my proposal is (for those who came in late):

[xenstored]
 |
 |--> /var/run/xenstored/socket <-- libxenstore <-- tool in dom0
 |                              <-- libxenstore <-- tool in dom0
 |    ...
 |--> /var/run/xenstored/socket_ro <-- libxenstore <-- tool in dom0
 |                                 <-- libxenstore <-- tool in dom0
 |    ...
 |--> shared page 100/evtchn 100 <-- xenbus/xenbus_xs.c <-- domU 1 kernel
 |--> shared page 200/evtchn 200 <-- xenbus/xenbus_xs.c <-- domU 2 kernel
 |--> shared page 300/evtchn 300 <-- xenbus/xenbus_xs.c <-- domU 3 kernel
 |    ...
 |--> shared page 1000/evtchn 1000 <-- xenbus/xenbus_dev.c <-- libxenstore <-- tool in domU 1
 |--> shared page 1001/evtchn 1001 <-- xenbus/xenbus_dev.c <-- libxenstore <-- tool in domU 1
 |--> shared page 1002/evtchn 1002 <-- xenbus/xenbus_dev.c <-- libxenstore <-- tool in domU 1
 |    ...

This differs from what we have: xenbus/xenbus_dev.c uses the same shared page as xenbus/xenbus_xs.c, and there's one big lock (xenbus_lock) so drivers can't use xenbus while the device is open. We all agree that is suboptimal. So on domain save, we just force close everyone who has /proc/xen/xenbus open. 
This closing, like normal closing, frees all those pages and channels, and sends XS_UNINTRODUCE to the store to tell it to unmap. On restore, libxenstore gets EBADF from the /proc/xen/xenbus fd. It reopens the fd, re-registers watches, and continues. This turns out to be *exactly* the same as the case where a tool in dom0, using libxenstore to talk to xenstored via socket, sees the xenstored restart. ie. we need that code anyway. In summary, I think this is the cleanest way to make /proc/xen/xenbus a first-class citizen and more robust, involves minimal changes, and doesn't complicate save/restore much at all. Cheers! Rusty.
On 25 Sep 2005, at 04:29, Rusty Russell wrote:> > I was assuming per-connection, ie. every time a tool > opens /proc/xen/xenbus we map a new page/event channel, and free on > close. ie. the drivers in the domU kernel share the store comms page > created with the domain, tools in the domU get a separate page each, > tools in dom0 connect to the unix domain socket each. To the store, > one > connection, one transport. (Each connection can set up multiple > watches > of course, and Christian has been hinting that he wants multiple > transactions too, but that's another argument).Yeah, I can live with this, although: What about multiple transactions within the kernel? Do you plan to continue serialising them (e.g., on a waitqueue)? An advantage of mux/demux would be that concurrent kernel transactions could easily use the same mechanism. Your scheme places restart mechanisms in user space, so they're out of reach for kernel transactions. -- Keir
On 25 Sep 2005, at 01:57, Rusty Russell wrote:> Now, we already have this "domain won't save until transactions are > done" simply because we use a single big lock, but this discussion > started because we want to get rid of that lock for /proc/xen/xenbus > (it's fine for drivers). I think we should do so, but keep this > wont-save-during-transactions semantic; it means a waitqueue etc, but I > don't think it's too bad. As you say, our transactions are pretty > small. > > Do people like this more?Blocking system progress on arbitrary user apps doesn't sound particularly attractive to me, but I guess it is at least simple. I'm more inclined to EAGAIN on read/write, maybe sugared by exceptions in languages that support them, or setjmp/longjmp to get you back to the outermost scope of the transaction. I agree that's not pretty either, though. How will you handle 'xenstored restart'? You can't really guarantee that to always happen at opportune moments with no transactions in flight. -- Keir
On 25 Sep 2005, at 12:02, Keir Fraser wrote:> Yeah, I can live with this, although: What about multiple transactions > within the kernel? Do you plan to continue serialising them (e.g., on > a waitqueue)? An advantage of mux/demux would be that concurrent > kernel transactions could easily use the same mechanism. Your scheme > places restart mechanisms in user space, so they're out of reach for > kernel transactions.Also, page-per-connection won't entirely avoid sharing of state/resource in xenstored. At some point we'll want to add per-domain access policy, and space/bandwidth quotas (to prevent DoS). All of those must be shared between the multiple connections of a domain -- so the separate connections aren't as independent as you might like. -- Keir
On Sun, Sep 25, 2005 at 12:33:33PM +0100, Keir Fraser wrote:> > On 25 Sep 2005, at 12:02, Keir Fraser wrote: > > >Yeah, I can live with this, although: What about multiple transactions > >within the kernel? Do you plan to continue serialising them (e.g., on > >a waitqueue)? An advantage of mux/demux would be that concurrent > >kernel transactions could easily use the same mechanism. Your scheme > >places restart mechanisms in user space, so they're out of reach for > >kernel transactions. > > Also, page-per-connection won't entirely avoid sharing of > state/resource in xenstored. At some point we'll want to add per-domain > access policy, and space/bandwidth quotas (to prevent DoS). All of > those must be shared between the multiple connections of a domain -- so > the separate connections aren't as independent as you might like.I believe that the only thing we really _need_ at the moment is support for multiple concurrent transactions. This avoids having to hold the lock except for very short periods, and it allows the server to return EAGAIN on operations where it doesn't have the state corresponding to the transaction anymore. Since we need to add some kind of transaction identifier to the interface to support this, we should make this change now. christian
On Sun, 2005-09-25 at 12:09 +0100, Keir Fraser wrote:> How will you handle 'xenstored restart'? You can't really guarantee > that to always happen at opportune moments with no transactions in > flight.Transaction IDs: when you create a transaction, instead of returning "OK" it returns an ID (this also helps if later we want multiple transactions at a time on a single connection). If you were in a transaction when you got EBADF, you simply ask for it back before continuing: this can be done by libxenstore. It's simple enough to implement, because xenstored keeps each transaction in a separate file: it just has to pick up those files again on restart. I'll start sending patches... Cheers, Rusty.
On Sun, 2005-09-25 at 12:33 +0100, Keir Fraser wrote:> On 25 Sep 2005, at 12:02, Keir Fraser wrote: > > > Yeah, I can live with this, although: What about multiple transactions > > within the kernel? Do you plan to continue serialising them (e.g., on > > a waitqueue)? An advantage of mux/demux would be that concurrent > > kernel transactions could easily use the same mechanism. Your scheme > > places restart mechanisms in user space, so they're out of reach for > > kernel transactions.We already have the mechanism: xenbus_lock. I don't think we want to go for parallelism within the kernel for xenstore comms: it'd be a fair amount of pain for something which isn't exactly speed critical. Like Andrew said, I can't see transactions getting significantly longer.> Also, page-per-connection won't entirely avoid sharing of > state/resource in xenstored. At some point we'll want to add per-domain > access policy, and space/bandwidth quotas (to prevent DoS). All of > those must be shared between the multiple connections of a domain -- so > the separate connections aren't as independent as you might like.We already have a permissions model based on domid (although not actually enforced due to a bug: we can fix this with one line, but it will require xend fixups I imagine). Space quotas will have to be by ID, too, not by the connection(s) which created them: in the case of migration, the store will be recreated by the tools, but should still be counted against the ID which owns them. So even if we multiplexed all the connections together for one domain they would still have to be separate. Bandwidth quotas are an interesting idea: I was thinking of a dumb fairness scheme. We almost do this: we rotate the list of connections, but there's a FIXME about the unfair way we service domain pages. Or we could just measure the time we spend servicing each connection, and put the slowest ones at the tail... (socket connections would be immune, since we trust dom0 tools). 
I haven't thought too hard about it. Thanks, I'll update the TODO file... Rusty.
On Sun, 2005-09-25 at 19:55 +0100, Christian Limpach wrote:> On Sun, Sep 25, 2005 at 12:33:33PM +0100, Keir Fraser wrote: > > Also, page-per-connection won't entirely avoid sharing of > > state/resource in xenstored. At some point we'll want to add per-domain > > access policy, and space/bandwidth quotas (to prevent DoS). All of > > those must be shared between the multiple connections of a domain -- so > > the separate connections aren't as independent as you might like. > > I believe that the only thing we really _need_ at the moment > is support for multiple concurrent transactions.You mean, on the same connection, I assume?> This avoids > having to hold the lock except for very short periodsTrue, but is holding the lock a problem? If we remove /proc/xen/xenbus from the equation, I don't think it is. In fact, I think the lock is a very good thing, which is why I put it there.> and it > allows the server to return EAGAIN on operations where it doesn't > have the state corresponding to the transaction anymore.Non sequitur, AFAICT. We can have the server return EAGAIN on any operation, but that's independent of the kernel's locking strategy. But you will have to make all the kernel code deal with it, as well as everyone using libxenstore. Frankly, that's just stupid if we can easily avoid it.> Since we > need to add some kind of transaction identifier to the interface > to support this, we should make this change now.Or, alternately, since we don't need it, we shouldn't. Rusty.
On 26 Sep 2005, at 07:36, Rusty Russell wrote:>> Since we >> need to add some kind of transaction identifier to the interface >> to support this, we should make this change now. > > Or, alternately, since we don't need it, we shouldn't.Holding xenbus_lock even across kernel transactions is undesirable if we can avoid it, I think. Remind me again why mux/demux is harder or more code than spawning multiple pages per domain? Both approaches require extra code to be written after all, and it's not clear to me that the extra code required in xenstored for mux/demux is very much or particularly tricky to write. -- Keir
On Mon, Sep 26, 2005 at 04:36:11PM +1000, Rusty Russell wrote:> > I believe that the only thing we really _need_ at the moment > > is support for multiple concurrent transactions. > > You mean, on the same connection, I assume?Yes. Before I reply to the rest of your comments, I'd like to point out that I don't agree with your assumption that we can delay suspend/resume until we're outside of a transaction. This seems to be the fundamental difference driving the different solutions.> > This avoids > > having to hold the lock except for very short periods > > True, but is holding the lock a problem? If we remove /proc/xen/xenbus > from the equation, I don't think it is. In fact, I think the lock is a > very good thing, which is why I put it there.I think holding the lock is not desirable. For live migration, it might even turn out to be a bottleneck if we have to serialize device reconnect.> > and it > > allows the server to return EAGAIN on operations where it doesn't > > have the state corresponding to the transaction anymore. > > Non sequitur, AFAICT. We can have the server return EAGAIN on any > operation, but that's independent of the kernel's locking strategy. But > you will have to make all the kernel code deal with it, as well as > everyone using libxenstore. Frankly, that's just stupid if we can > easily avoid it.You read what I wrote in the wrong context: transaction ids are necessary to distinguish operations which are part of a transaction from those which are outside of transactions. On suspend/resume, you need to be able to decide, either at the xenstored level or better on the client side (xenbus driver or libxenstore), whether you need to fail an operation or not. You can achieve the same by only supporting operations within a transaction.> > Since we > > need to add some kind of transaction identifier to the interface > > to support this, we should make this change now. 
> > Or, alternately, since we don't need it, we shouldn't.I think we need them since it's the simplest solution to the whole multi-page/multi-connection issue for a saner xenbus_dev implementation: - lock only held around xs_talkv - transaction ids - single point for demultiplexing watch events christian
On 26 Sep 2005, at 19:51, Christian Limpach wrote:>>> Since we >>> need to add some kind of transaction identifier to the interface >>> to support this, we should make this change now. >> >> Or, alternately, since we don't need it, we shouldn't. > > I think we need them since it's the simplest solution to the whole > multi-page/multi-connection issue for a saner xenbus_dev > implementation: > - lock only held around xs_talkv > - transaction ids > - single point for demultiplexing watch eventsThis is precisely how I expected that xenbus was going to be structured in the first place. It seems the simplest, most natural implementation and happens to avoid a lot of potential unnecessary blocking and serialisation. And not even at much cost in xenstored (how hard can the demultiplex be?). -- Keir
On Mon, 2005-09-26 at 20:30 +0100, Keir Fraser wrote:> On 26 Sep 2005, at 19:51, Christian Limpach wrote: > > >>> Since we > >>> need to add some kind of transaction identifier to the interface > >>> to support this, we should make this change now. > >> > >> Or, alternately, since we don't need it, we shouldn't. > > > > I think we need them since it's the simplest solution to the whole > > multi-page/multi-connection issue for a saner xenbus_dev > > implementation: > > - lock only held around xs_talkv > > - transaction ids > > - single point for demultiplexing watch events > > This is precisely how I expected that xenbus was going to be structured > in the first place. It seems the simplest, most natural implementation > and happens to avoid a lot of potential unnecessary blocking and > serialisation. And not even at much cost in xenstored (how hard can the > demultiplex be?).Well, here's the simple patch that modifies introductions to allow the domain to introduce new pages, and another one that tests it. I'm away for the next week and a half on other stuff, then I'm in Cambridge. Modifying the xenbus dev to use this should be fairly easy, if you choose to go this route. 
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> diff -r 0e368f851f6a tools/python/xen/lowlevel/xs/xs.c --- a/tools/python/xen/lowlevel/xs/xs.c Tue Sep 27 00:55:56 2005 +++ b/tools/python/xen/lowlevel/xs/xs.c Tue Sep 27 16:36:19 2005 @@ -728,7 +728,7 @@ &dom)) goto exit; Py_BEGIN_ALLOW_THREADS - xsval = xs_release_domain(xh, dom); + xsval = xs_release_domain(xh, dom, 0); Py_END_ALLOW_THREADS if (!xsval) { PyErr_SetFromErrno(PyExc_RuntimeError); diff -r 0e368f851f6a tools/xenstore/xenstored_core.c --- a/tools/xenstore/xenstored_core.c Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/xenstored_core.c Tue Sep 27 16:36:19 2005 @@ -1136,7 +1136,7 @@ break; case XS_RELEASE: - do_release(conn, onearg(in)); + do_release(conn, in); break; case XS_GET_DOMAIN_PATH: diff -r 0e368f851f6a tools/xenstore/xenstored_domain.c --- a/tools/xenstore/xenstored_domain.c Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/xenstored_domain.c Tue Sep 27 16:36:19 2005 @@ -50,6 +50,9 @@ /* Event channel port */ u16 port; + + /* Page number. */ + unsigned long mfn; /* Domain path in store. */ char *path; @@ -282,6 +285,7 @@ domain->port = 0; domain->shutdown = 0; domain->domid = domid; + domain->mfn = mfn; domain->path = talloc_strdup(domain, path); domain->page = xc_map_foreign_range(*xc_handle, domain->domid, getpagesize(), @@ -312,15 +316,28 @@ { struct domain *domain; char *vec[4]; + domid_t domid; if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) { send_error(conn, EINVAL); return; } - if (conn->id != 0 || !conn->can_write) { + domid = atoi(vec[0]); + if (!conn->can_write || (conn->id != 0 && domid != DOMID_SELF)) { send_error(conn, EACCES); return; + } + + /* Domains can introduce more comms pages to store. FIXME: limit. */ + if (domid == DOMID_SELF) { + if (!conn->domain) { + send_error(conn, EINVAL); + return; + } + /* Same domid and path. */ + domid = conn->domain->domid; + vec[3] = conn->domain->path; } /* Sanity check args. 
*/ @@ -329,8 +346,7 @@ return; } /* Hang domain off "in" until we're finished. */ - domain = new_domain(in, atoi(vec[0]), atol(vec[1]), atol(vec[2]), - vec[3]); + domain = new_domain(in, domid, atol(vec[1]), atol(vec[2]), vec[3]); if (!domain) { send_error(conn, errno); return; } @@ -339,58 +355,69 @@ /* Now domain belongs to its connection. */ talloc_steal(domain->conn, domain); - fire_watches(conn, "@introduceDomain", false); + if (domid != DOMID_SELF) + fire_watches(conn, "@introduceDomain", false); send_ack(conn, XS_INTRODUCE); } -static struct domain *find_domain_by_domid(domid_t domid) +static struct domain *find_domain(domid_t domid, unsigned long mfn) { struct domain *i; list_for_each_entry(i, &domains, list) { - if (i->domid == domid) + if (i->domid == domid && (!mfn || i->mfn == mfn)) return i; } return NULL; } -/* domid */ -void do_release(struct connection *conn, const char *domid_str) +/* domid, mfn */ +void do_release(struct connection *conn, struct buffered_data *in) { struct domain *domain; domid_t domid; - - if (!domid_str) { + unsigned long mfn; + bool released = false; + char *vec[2]; + + if (get_strings(in, vec, ARRAY_SIZE(vec)) < ARRAY_SIZE(vec)) { send_error(conn, EINVAL); return; } - - domid = atoi(domid_str); + domid = atoi(vec[0]); + mfn = atol(vec[1]); if (!domid) { send_error(conn, EINVAL); return; } - if (conn->id != 0) { - send_error(conn, EACCES); - return; - } - - domain = find_domain_by_domid(domid); - if (!domain) { + if (domid == DOMID_SELF) { + if (!conn->domain || mfn == 0) { + send_error(conn, EINVAL); + return; + } + domid = conn->domain->domid; + } else { + if (conn->id != 0) { + send_error(conn, EACCES); + return; + } + } + + /* Can release multiple if mfn == 0 */ + while ((domain = find_domain(domid, mfn)) != NULL) { + talloc_free(domain->conn); + released = true; + } + + if (!released) { send_error(conn, ENOENT); return; } - if (!domain->conn) { - send_error(conn, EINVAL); - return; - } - - talloc_free(domain->conn); - - 
fire_watches(conn, "@releaseDomain", false); + if (domid != DOMID_SELF) + fire_watches(conn, "@releaseDomain", false); send_ack(conn, XS_RELEASE); } @@ -409,7 +436,7 @@ if (domid == DOMID_SELF) domain = conn->domain; else - domain = find_domain_by_domid(domid); + domain = find_domain(domid, 0); if (!domain) send_error(conn, ENOENT); diff -r 0e368f851f6a tools/xenstore/xenstored_domain.h --- a/tools/xenstore/xenstored_domain.h Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/xenstored_domain.h Tue Sep 27 16:36:19 2005 @@ -25,8 +25,8 @@ /* domid, mfn, eventchn, path */ void do_introduce(struct connection *conn, struct buffered_data *in); -/* domid */ -void do_release(struct connection *conn, const char *domid_str); +/* domid, mfn */ +void do_release(struct connection *conn, struct buffered_data *in); /* domid */ void do_get_domain_path(struct connection *conn, const char *domid_str); diff -r 0e368f851f6a tools/xenstore/xs.c --- a/tools/xenstore/xs.c Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/xs.c Tue Sep 27 16:36:19 2005 @@ -676,13 +676,21 @@ return xs_bool(xs_talkv(h, XS_INTRODUCE, iov, ARRAY_SIZE(iov), NULL)); } -bool xs_release_domain(struct xs_handle *h, domid_t domid) +bool xs_release_domain(struct xs_handle *h, domid_t domid, unsigned long mfn) { char domid_str[MAX_STRLEN(domid)]; + char mfn_str[MAX_STRLEN(mfn)]; + struct iovec iov[2]; sprintf(domid_str, "%u", domid); - - return xs_bool(xs_single(h, XS_RELEASE, domid_str, NULL)); + sprintf(mfn_str, "%lu", mfn); + + iov[0].iov_base = domid_str; + iov[0].iov_len = strlen(domid_str) + 1; + iov[1].iov_base = mfn_str; + iov[1].iov_len = strlen(mfn_str) + 1; + + return xs_bool(xs_talkv(h, XS_RELEASE, iov, ARRAY_SIZE(iov), NULL)); } char *xs_get_domain_path(struct xs_handle *h, domid_t domid) diff -r 0e368f851f6a tools/xenstore/xs.h --- a/tools/xenstore/xs.h Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/xs.h Tue Sep 27 16:36:19 2005 @@ -130,8 +130,9 @@ /* Release a domain. 
* Tells the store domain to release the memory page to the domain. + * mfn is 0 to release all of them. */ -bool xs_release_domain(struct xs_handle *h, domid_t domid); +bool xs_release_domain(struct xs_handle *h, domid_t domid, unsigned long mfn); /* Query the home path of a domain. */ Test code for previous patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> diff -r 0e368f851f6a tools/xenstore/fake_libxc.c --- a/tools/xenstore/fake_libxc.c Tue Sep 27 00:55:56 2005 +++ b/tools/xenstore/fake_libxc.c Tue Sep 27 16:36:19 2005 @@ -44,42 +44,38 @@ return 0; } -void *xc_map_foreign_range(int xc_handle, u32 dom __attribute__((unused)), +void *xc_map_foreign_range(int xc_handle __attribute__((unused)), + u32 dom __attribute__((unused)), int size, int prot, unsigned long mfn __attribute__((unused))) { void *ret; + int *extra; + int fd; - ret = mmap(NULL, size, prot, MAP_SHARED, xc_handle, 0); + fd = open("/tmp/xcmap", O_RDWR); + if (fd < 0) + return NULL; + + /* We actually get extra page, for comms with xs_test. */ + ret = mmap(NULL, size + getpagesize(), prot, MAP_SHARED, fd, 0); if (ret == MAP_FAILED) return NULL; - /* xs_test tells us pid and port by putting it in buffer, we reply. 
 */
-	xs_test_pid = *(int *)(ret + 32);
-	port = *(int *)(ret + 36);
-	*(int *)(ret + 32) = getpid();
+	extra = ret + size;
+	xs_test_pid = extra[0];
+	port = extra[1];
+	extra[2] = getpid();
 	return ret;
 }
 
 int xc_interface_open(void)
 {
-	int fd;
-	char page[getpagesize()];
-
-	fd = open("/tmp/xcmap", O_RDWR|O_CREAT|O_TRUNC, 0600);
-	if (fd < 0)
-		return fd;
-
-	memset(page, 0, sizeof(page));
-	if (!xs_write_all(fd, page, sizeof(page)))
-		barf_perror("Failed to write /tmp/xcmap page");
-
-	return fd;
+	return 1;
 }
 
-int xc_interface_close(int xc_handle)
+int xc_interface_close(int xc_handle __attribute__((unused)))
 {
-	close(xc_handle);
 	return 0;
 }
diff -r 0e368f851f6a tools/xenstore/testsuite/09domain.test
--- a/tools/xenstore/testsuite/09domain.test	Tue Sep 27 00:55:56 2005
+++ b/tools/xenstore/testsuite/09domain.test	Tue Sep 27 16:36:19 2005
@@ -17,3 +17,24 @@
 expect handle is 2
 introduce 1 100 7 /my/home
 release 1
+
+# Introduce sub-connection
+write /my/home/entry contents
+close
+expect handle is 3
+introduce 1 100 7 /my/home
+expect handle is 4
+3 introduce-self 120 7 /my/home
+
+# Check home is correct
+expect 4:contents
+4 read entry
+
+# Release 4 from 3
+3 release-self 120
+
+# Introduce a new one and release both at once.
+expect handle is 5
+3 introduce-self 120 7 /my/home
+
+release 1
diff -r 0e368f851f6a tools/xenstore/xs_test.c
--- a/tools/xenstore/xs_test.c	Tue Sep 27 00:55:56 2005
+++ b/tools/xenstore/xs_test.c	Tue Sep 27 16:36:19 2005
@@ -59,6 +59,7 @@
 static struct ringbuf_head *out, *in;
 static unsigned int ringbuf_datasize;
+static int event_channel;
 static int daemon_pid;
 
 /* FIXME: Mark connection as broken (close it?) when this happens.
 */
@@ -208,6 +209,9 @@
 	"  start <node>\n"
 	"  abort\n"
 	"  introduce <domid> <mfn> <eventchn> <path>\n"
+	"  introduce-self <mfn> <eventchn> <path>\n"
+	"  release <domid>\n"
+	"  release-self <mfn>\n"
 	"  commit\n"
 	"  sleep <milliseconds>\n"
 	"  expect <pattern>\n"
@@ -575,16 +579,22 @@
 }
 
 static void do_introduce(unsigned int handle,
-			 const char *domid,
+			 int domid,
 			 const char *mfn,
 			 const char *eventchn,
 			 const char *path)
 {
 	unsigned int i;
-	int fd;
+	int fd, *extra;
+	char pages[getpagesize()*2];
+
+	fd = open("/tmp/xcmap", O_RDWR|O_CREAT, 0600);
+	memset(pages, 0, sizeof(pages));
+	write(fd, pages, sizeof(pages));
 
 	/* This mechanism is v. slow w. valgrind running. */
-	timeout_ms = 5000;
+	if (timeout_ms)
+		timeout_ms = 5000;
 
 	/* We poll, so ignore signal */
 	signal(SIGUSR2, SIG_IGN);
@@ -592,22 +602,24 @@
 		if (!handles[i])
 			break;
 
-	fd = open("/tmp/xcmap", O_RDWR);
 	/* Set in and out pointers. */
-	out = mmap(NULL, getpagesize(), PROT_WRITE|PROT_READ, MAP_SHARED,fd,0);
+	out = mmap(NULL, sizeof(pages), PROT_WRITE|PROT_READ, MAP_SHARED,fd,0);
 	if (out == MAP_FAILED)
 		barf_perror("Failed to map /tmp/xcmap page");
 	in = (void *)out + getpagesize() / 2;
 	close(fd);
 
+	event_channel = atoi(eventchn);
+
 	/* Tell them the event channel and our PID. */
-	*(int *)((void *)out + 32) = getpid();
-	*(u16 *)((void *)out + 36) = atoi(eventchn);
-
-	if (!xs_introduce_domain(handles[handle], atoi(domid),
+	extra = (void *)out + getpagesize();
+	extra[0] = getpid();
+	extra[1] = event_channel;
+
+	if (!xs_introduce_domain(handles[handle], domid,
 				 atol(mfn), atoi(eventchn), path)) {
 		failed(handle);
-		munmap(out, getpagesize());
+		munmap(out, getpagesize()*2);
 		return;
 	}
 	output("handle is %i\n", i);
@@ -622,12 +634,18 @@
 	handles[i]->committing = false;
 
 	/* Read in daemon pid.
 */
-	daemon_pid = *(int *)((void *)out + 32);
+	daemon_pid = extra[2];
 }
 
 static void do_release(unsigned int handle, const char *domid)
 {
-	if (!xs_release_domain(handles[handle], atoi(domid)))
+	if (!xs_release_domain(handles[handle], atoi(domid), 0))
+		failed(handle);
+}
+
+static void do_release_self(unsigned int handle, const char *mfn)
+{
+	if (!xs_release_domain(handles[handle], DOMID_SELF, atol(mfn)))
 		failed(handle);
 }
 
@@ -802,10 +820,15 @@
 	else if (streq(command, "abort"))
 		do_end(handle, true);
 	else if (streq(command, "introduce"))
-		do_introduce(handle, arg(line, 1), arg(line, 2),
+		do_introduce(handle, atoi(arg(line, 1)), arg(line, 2),
 			     arg(line, 3), arg(line, 4));
+	else if (streq(command, "introduce-self"))
+		do_introduce(handle, DOMID_SELF,
+			     arg(line, 1), arg(line, 2), arg(line, 3));
 	else if (streq(command, "release"))
 		do_release(handle, arg(line, 1));
+	else if (streq(command, "release-self"))
+		do_release_self(handle, arg(line, 1));
 	else if (streq(command, "dump"))
 		dump(handle);
 	else if (streq(command, "sleep")) {

-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
On Mon, 2005-09-26 at 19:51 +0100, Christian Limpach wrote:
> On Mon, Sep 26, 2005 at 04:36:11PM +1000, Rusty Russell wrote:
> > > I believe that the only thing we really _need_ at the moment
> > > is support for multiple concurrent transactions.
> >
> > You mean, on the same connection, I assume?
>
> Yes.
>
> Before I reply to the rest of your comments, I'd like to point out that
> I don't agree with your assumption that we can delay suspend/resume
> until we're outside of a transaction.  This seems to be the fundamental
> difference driving different solutions.

Absolutely.  The delays will be immeasurably short.  The code will be
simple.  The API will be simple.  Locking will be simple.  Drivers will
be simple.  Debugging will be simple.

Parallelism is not going to make restore faster; in fact, the current
"winner takes all" transaction model means they'll be much, much slower.

Now, since this is really my last email before I head off on leave,
you're going to decide this without my bitching and moaning (yeah, I
know how much you'll miss it!).  Nonetheless, I'm happy to hand
xenstored to your capable hands.

When I come back, I might start writing tests or something equally
uncontroversial 8)

Cheers!
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
Rusty Russell wrote:
> On Mon, 2005-09-26 at 19:51 +0100, Christian Limpach wrote:
> > On Mon, Sep 26, 2005 at 04:36:11PM +1000, Rusty Russell wrote:
> >
> > Before I reply to the rest of your comments, I'd like to point out that
> > I don't agree with your assumption that we can delay suspend/resume
> > until we're outside of a transaction.  This seems to be the fundamental
> > difference driving different solutions.
>
> Absolutely.  The delays will be immeasurably short.

What about denial of service (intentional or due to bugs in guest
OSes/tools)?

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>