Aurelien Chartier
2013-May-16 12:34 UTC
[PATCH V4 0/2] xenbus: Fix S3 frontend resume when xenstored is not running
Hi,

This patch series fixes the S3 resume of a domain running xenstored and a
frontend over xenbus (xen-netfront in my use case).

Because device resume happens before process resume, the xenbus frontend
resume hangs if xenstored is not running, causing a deadlock.

This patch series fixes that issue by deferring the xenbus frontend resume
when xenstored is running in the same domain.

Aurelien Chartier (2):
  xenbus: save xenstore local status for later use
  xenbus: delay xenbus frontend resume if xenstored is not running

 drivers/xen/xenbus/xenbus_comms.h          |    1 +
 drivers/xen/xenbus/xenbus_probe.c          |   27 ++++++++++-----------
 drivers/xen/xenbus/xenbus_probe.h          |    7 ++++++
 drivers/xen/xenbus/xenbus_probe_frontend.c |   36 +++++++++++++++++++++++++++-
 include/xen/xenbus.h                       |    1 +
 5 files changed, 56 insertions(+), 16 deletions(-)

-- 
1.7.10.4
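The ordering problem in the cover letter can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (struct model, device_resume, thaw_processes, and so on) is hypothetical and is not a kernel or Xen API. It also simplifies the real mechanism: the model runs deferred items once the store is reachable again, which stands in for a workqueue worker that waits for xenstored off the resume path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum store_type { STORE_REMOTE, STORE_LOCAL };

#define MAX_DEFERRED 8

struct model {
	enum store_type type;    /* where xenstored runs relative to this domain */
	bool store_reachable;    /* false while a local xenstored is still frozen */
	int deferred[MAX_DEFERRED];
	size_t ndeferred;
	int resumed;             /* number of devices whose resume completed */
};

/*
 * Stand-in for the part of resume that talks to xenstored. The real call
 * would block if the daemon cannot answer; the model asserts instead.
 */
static void do_resume(struct model *m, int dev)
{
	assert(m->store_reachable);
	(void)dev;
	m->resumed++;
}

/* Device resume callback: defer when the store lives in this domain. */
static int device_resume(struct model *m, int dev)
{
	if (m->type == STORE_LOCAL) {
		if (m->ndeferred == MAX_DEFERRED)
			return -1;
		m->deferred[m->ndeferred++] = dev;
		return 0;
	}

	do_resume(m, dev);
	return 0;
}

/* Process resume: the local daemon is back, so deferred work can finish. */
static void thaw_processes(struct model *m)
{
	size_t i;

	m->store_reachable = true;
	for (i = 0; i < m->ndeferred; i++)
		do_resume(m, m->deferred[i]);
	m->ndeferred = 0;
}
```

With a local store, device_resume() returns immediately and the device only counts as resumed after thaw_processes(); with a remote store the resume completes inline.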
Aurelien Chartier
2013-May-16 12:34 UTC
[PATCH V4 1/2] xenbus: save xenstore local status for later use
Save the xenstore local status computed in xenbus_init. It can then be used
later to check if xenstored is running in this domain.

Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Changes in v4:
- Change variable name to xen_store_domain_type
---
 drivers/xen/xenbus/xenbus_comms.h |    1 +
 drivers/xen/xenbus/xenbus_probe.c |   27 ++++++++++++---------------
 drivers/xen/xenbus/xenbus_probe.h |    7 +++++++
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_comms.h b/drivers/xen/xenbus/xenbus_comms.h
index c8abd3b..e74f9c1 100644
--- a/drivers/xen/xenbus/xenbus_comms.h
+++ b/drivers/xen/xenbus/xenbus_comms.h
@@ -45,6 +45,7 @@ int xb_wait_for_data_to_read(void);
 int xs_input_avail(void);
 extern struct xenstore_domain_interface *xen_store_interface;
 extern int xen_store_evtchn;
+extern enum xenstore_init xen_store_domain_type;
 
 extern const struct file_operations xen_xenbus_fops;
 
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3325884..56cfaaa 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -69,6 +69,9 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn);
 struct xenstore_domain_interface *xen_store_interface;
 EXPORT_SYMBOL_GPL(xen_store_interface);
 
+enum xenstore_init xen_store_domain_type;
+EXPORT_SYMBOL_GPL(xen_store_domain_type);
+
 static unsigned long xen_store_mfn;
 
 static BLOCKING_NOTIFIER_HEAD(xenstore_chain);
@@ -719,17 +722,11 @@ static int __init xenstored_local_init(void)
 	return err;
 }
 
-enum xenstore_init {
-	UNKNOWN,
-	PV,
-	HVM,
-	LOCAL,
-};
 static int __init xenbus_init(void)
 {
 	int err = 0;
-	enum xenstore_init usage = UNKNOWN;
 	uint64_t v = 0;
+	xen_store_domain_type = XS_UNKNOWN;
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -737,29 +734,29 @@ static int __init xenbus_init(void)
 	xenbus_ring_ops_init();
 
 	if (xen_pv_domain())
-		usage = PV;
+		xen_store_domain_type = XS_PV;
 	if (xen_hvm_domain())
-		usage = HVM;
+		xen_store_domain_type = XS_HVM;
 	if (xen_hvm_domain() && xen_initial_domain())
-		usage = LOCAL;
+		xen_store_domain_type = XS_LOCAL;
 	if (xen_pv_domain() && !xen_start_info->store_evtchn)
-		usage = LOCAL;
+		xen_store_domain_type = XS_LOCAL;
 	if (xen_pv_domain() && xen_start_info->store_evtchn)
 		xenstored_ready = 1;
 
-	switch (usage) {
-	case LOCAL:
+	switch (xen_store_domain_type) {
+	case XS_LOCAL:
 		err = xenstored_local_init();
 		if (err)
 			goto out_error;
 		xen_store_interface = mfn_to_virt(xen_store_mfn);
 		break;
-	case PV:
+	case XS_PV:
 		xen_store_evtchn = xen_start_info->store_evtchn;
 		xen_store_mfn = xen_start_info->store_mfn;
 		xen_store_interface = mfn_to_virt(xen_store_mfn);
 		break;
-	case HVM:
+	case XS_HVM:
 		err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
 		if (err)
 			goto out_error;
diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h
index bb4f92e..146f857 100644
--- a/drivers/xen/xenbus/xenbus_probe.h
+++ b/drivers/xen/xenbus/xenbus_probe.h
@@ -47,6 +47,13 @@ struct xen_bus_type {
 	struct bus_type bus;
 };
 
+enum xenstore_init {
+	XS_UNKNOWN,
+	XS_PV,
+	XS_HVM,
+	XS_LOCAL,
+};
+
 extern struct device_attribute xenbus_dev_attrs[];
 
 extern int xenbus_match(struct device *_dev, struct device_driver *_drv);
-- 
1.7.10.4
Aurelien Chartier
2013-May-16 12:34 UTC
[PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
If the xenbus frontend is located in a domain running xenstored, the device
resume is hanging because it is happening before the process resume. This
patch adds extra logic to the resume code to check if we are the domain
running xenstored and delay the resume if needed.

Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com>

Changes in v2:
- Instead of bypassing the resume, process it in a workqueue

Changes in v3:
- Add a struct work in xenbus_device to avoid dynamic allocation
- Several small code fixes

Changes in v4:
- Use a dedicated workqueue
---
 drivers/xen/xenbus/xenbus_probe_frontend.c |   36 +++++++++++++++++++++++++++-
 include/xen/xenbus.h                       |    1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 3159a37..6275be8 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -29,6 +29,8 @@
 #include "xenbus_probe.h"
 
 
+static struct workqueue_struct *xenbus_frontend_wq;
+
 /* device/<type>/<id> => <type>-<id> */
 static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename)
 {
@@ -89,9 +91,34 @@ static void backend_changed(struct xenbus_watch *watch,
 	xenbus_otherend_changed(watch, vec, len, 1);
 }
 
+static void xenbus_frontend_delayed_resume(struct work_struct *w)
+{
+	struct xenbus_device *xdev = container_of(w, struct xenbus_device, work);
+
+	xenbus_dev_resume(&xdev->dev);
+}
+
+static int xenbus_frontend_dev_resume(struct device *dev)
+{
+	/*
+	 * If xenstored is running in this domain, we cannot access the backend
+	 * state at the moment, so we need to defer xenbus_dev_resume
+	 */
+	if (xen_store_domain_type == XS_LOCAL) {
+		struct xenbus_device *xdev = to_xenbus_device(dev);
+
+		INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume);
+		queue_work(xenbus_frontend_wq, &xdev->work);
+
+		return 0;
+	}
+
+	return xenbus_dev_resume(dev);
+}
+
 static const struct dev_pm_ops xenbus_pm_ops = {
 	.suspend	= xenbus_dev_suspend,
-	.resume		= xenbus_dev_resume,
+	.resume		= xenbus_frontend_dev_resume,
 	.freeze		= xenbus_dev_suspend,
 	.thaw		= xenbus_dev_cancel,
 	.restore	= xenbus_dev_resume,
@@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
+	if (!xenbus_frontend_wq) {
+		printk(KERN_ERR "%s: create "
+			"xenbus_frontend_workqueue failed\n", __func__);
+		return -EFAULT;
+	}
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_frontend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 0a7515c..569c07f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -70,6 +70,7 @@ struct xenbus_device {
 	struct device dev;
 	enum xenbus_state state;
 	struct completion down;
+	struct work_struct work;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
1.7.10.4
Jan Beulich
2013-May-16 13:32 UTC
Re: [PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
>>> On 16.05.13 at 14:34, Aurelien Chartier <aurelien.chartier@citrix.com> wrote:
> @@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
>
>  	register_xenstore_notifier(&xenstore_notifier);
>
> +	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
> +	if (!xenbus_frontend_wq) {
> +		printk(KERN_ERR "%s: create "
> +			"xenbus_frontend_workqueue failed\n", __func__);

pr_err() should be the norm these days.

And personally I consider it quite bad a habit to put function names
in non-debugging log messages - this doesn't really help with
anything (as long as the rest of the message is meaningful), but
clutters the log.

And finally, you need to do proper error handling here - this code
can be built as a module, and hence leaving notifier and bus
registered upon failure sets up the kernel for crashing. Moving
the code ahead of register_xenstore_notifier() will take care of
one half of the problem, but you'll nevertheless need to call
bus_unregister() if you really intend to make this a fatal error
condition (which by itself is questionable since in most scenarios
you won't need the work queue at all).

Jan

> +		return -EFAULT;
> +	}
> +
>  	return 0;
>  }
>  subsys_initcall(xenbus_probe_frontend_init);
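The unwinding described in this review can be sketched as a userspace model. It assumes the fatal-error variant: the workqueue is created before the notifier is registered, so a failure only has to undo the bus registration. All names here (struct init_state, the model_* helpers, MODEL_ENOMEM) are hypothetical stand-ins, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error value for the model; not taken from the patch. */
#define MODEL_ENOMEM 12

/* Records which init steps are currently in effect. */
struct init_state {
	bool bus_registered;
	bool notifier_registered;
	bool wq_created;
	bool fail_wq;            /* test knob: make workqueue creation fail */
};

static int model_bus_register(struct init_state *s)
{
	s->bus_registered = true;
	return 0;
}

static void model_bus_unregister(struct init_state *s)
{
	/* Only ever called to undo a successful registration. */
	assert(s->bus_registered);
	s->bus_registered = false;
}

static int model_create_wq(struct init_state *s)
{
	if (s->fail_wq)
		return -MODEL_ENOMEM;
	s->wq_created = true;
	return 0;
}

static void model_register_notifier(struct init_state *s)
{
	s->notifier_registered = true;
}

static int model_frontend_init(struct init_state *s)
{
	int err;

	err = model_bus_register(s);
	if (err)
		return err;

	/* Created before the notifier, so failure leaves no notifier behind. */
	err = model_create_wq(s);
	if (err)
		goto out_unregister_bus;

	model_register_notifier(s);
	return 0;

out_unregister_bus:
	model_bus_unregister(s);
	return err;
}
```

After a failed model_frontend_init() nothing remains registered, which is the property the review asks for when the code is built as a module.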
Aurelien Chartier
2013-May-16 14:59 UTC
Re: [PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On 16/05/13 14:32, Jan Beulich wrote:
>>>> On 16.05.13 at 14:34, Aurelien Chartier <aurelien.chartier@citrix.com> wrote:
>> @@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
>>
>>  	register_xenstore_notifier(&xenstore_notifier);
>>
>> +	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
>> +	if (!xenbus_frontend_wq) {
>> +		printk(KERN_ERR "%s: create "
>> +			"xenbus_frontend_workqueue failed\n", __func__);
> pr_err() should be the norm these days.
>
> And personally I consider it quite bad a habit to put function names
> in non-debugging log messages - this doesn't really help with
> anything (as long as the rest of the message is meaningful), but
> clutters the log.

I was using the same format as pciback error handling, but I can switch
to pr_err.

> And finally, you need to do proper error handling here - this code
> can be built as a module, and hence leaving notifier and bus
> registered upon failure sets up the kernel for crashing. Moving
> the code ahead of register_xenstore_notifier() will take care of
> one half of the problem, but you'll nevertheless need to call
> bus_unregister() if you really intend to make this a fatal error
> condition (which by itself is questionable since in most scenarios
> you won't need the work queue at all).

Right, I will move the error handling to the xenbus frontend resume
function.

Aurelien.
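One way the resume-time check mentioned above could look is sketched below as a userspace decision function. This is an assumption about the shape of the change, not the v5 code, which is not part of this thread. The names (struct frontend, choose_resume_path, the RESUME_* values) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the resume callback in this model. */
enum resume_path {
	RESUME_DIRECT,    /* store is remote: resume inline */
	RESUME_DEFERRED,  /* store is local: hand the resume to the workqueue */
	RESUME_FAILED     /* store is local but no workqueue exists */
};

struct frontend {
	bool store_is_local;  /* models xen_store_domain_type == XS_LOCAL */
	bool have_wq;         /* models a non-NULL workqueue pointer */
};

/*
 * A missing workqueue is only an error on the path that needs it, so
 * init can stay non-fatal and the check moves to resume time.
 */
static enum resume_path choose_resume_path(const struct frontend *f)
{
	assert(f != NULL);

	if (!f->store_is_local)
		return RESUME_DIRECT;
	if (!f->have_wq)
		return RESUME_FAILED;
	return RESUME_DEFERRED;
}
```

The point of this arrangement is that domains without a local xenstored never depend on the workqueue, which matches the remark that most scenarios will not need it.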
Konrad Rzeszutek Wilk
2013-May-28 12:45 UTC
Re: [PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On Thu, May 16, 2013 at 03:59:48PM +0100, Aurelien Chartier wrote:
> On 16/05/13 14:32, Jan Beulich wrote:
> >>>> On 16.05.13 at 14:34, Aurelien Chartier <aurelien.chartier@citrix.com> wrote:
> >> @@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
> >>
> >>  	register_xenstore_notifier(&xenstore_notifier);
> >>
> >> +	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
> >> +	if (!xenbus_frontend_wq) {
> >> +		printk(KERN_ERR "%s: create "
> >> +			"xenbus_frontend_workqueue failed\n", __func__);
> > pr_err() should be the norm these days.
> >
> > And personally I consider it quite bad a habit to put function names
> > in non-debugging log messages - this doesn't really help with
> > anything (as long as the rest of the message is meaningful), but
> > clutters the log.
> I was using the same format as pciback error handling, but I can switch
> to pr_err.
>
> > And finally, you need to do proper error handling here - this code
> > can be built as a module, and hence leaving notifier and bus
> > registered upon failure sets up the kernel for crashing. Moving
> > the code ahead of register_xenstore_notifier() will take care of
> > one half of the problem, but you'll nevertheless need to call
> > bus_unregister() if you really intend to make this a fatal error
> > condition (which by itself is questionable since in most scenarios
> > you won't need the work queue at all).
> Right, I will move the error handling to the xenbus frontend resume
> function.

Any ETA when the v5 with these changes will be posted? Thanks.

>
> Aurelien.
Aurelien Chartier
2013-May-28 12:59 UTC
Re: [PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On 28/05/13 13:45, Konrad Rzeszutek Wilk wrote:
> On Thu, May 16, 2013 at 03:59:48PM +0100, Aurelien Chartier wrote:
>> On 16/05/13 14:32, Jan Beulich wrote:
>>>>>> On 16.05.13 at 14:34, Aurelien Chartier <aurelien.chartier@citrix.com> wrote:
>>>> @@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
>>>>
>>>>  	register_xenstore_notifier(&xenstore_notifier);
>>>>
>>>> +	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
>>>> +	if (!xenbus_frontend_wq) {
>>>> +		printk(KERN_ERR "%s: create "
>>>> +			"xenbus_frontend_workqueue failed\n", __func__);
>>> pr_err() should be the norm these days.
>>>
>>> And personally I consider it quite bad a habit to put function names
>>> in non-debugging log messages - this doesn't really help with
>>> anything (as long as the rest of the message is meaningful), but
>>> clutters the log.
>> I was using the same format as pciback error handling, but I can switch
>> to pr_err.
>>
>>> And finally, you need to do proper error handling here - this code
>>> can be built as a module, and hence leaving notifier and bus
>>> registered upon failure sets up the kernel for crashing. Moving
>>> the code ahead of register_xenstore_notifier() will take care of
>>> one half of the problem, but you'll nevertheless need to call
>>> bus_unregister() if you really intend to make this a fatal error
>>> condition (which by itself is questionable since in most scenarios
>>> you won't need the work queue at all).
>> Right, I will move the error handling to the xenbus frontend resume
>> function.
> Any ETA when the v5 with these changes will be posted? Thanks.

I was on holiday last week, but the v5 was almost ready. I will send it
today or tomorrow.
Konrad Rzeszutek Wilk
2013-May-28 16:18 UTC
Re: [PATCH V4 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On Tue, May 28, 2013 at 01:59:53PM +0100, Aurelien Chartier wrote:
> On 28/05/13 13:45, Konrad Rzeszutek Wilk wrote:
> > On Thu, May 16, 2013 at 03:59:48PM +0100, Aurelien Chartier wrote:
> >> On 16/05/13 14:32, Jan Beulich wrote:
> >>>>>> On 16.05.13 at 14:34, Aurelien Chartier <aurelien.chartier@citrix.com> wrote:
> >>>> @@ -440,6 +467,13 @@ static int __init xenbus_probe_frontend_init(void)
> >>>>
> >>>>  	register_xenstore_notifier(&xenstore_notifier);
> >>>>
> >>>> +	xenbus_frontend_wq = create_workqueue("xenbus_frontend");
> >>>> +	if (!xenbus_frontend_wq) {
> >>>> +		printk(KERN_ERR "%s: create "
> >>>> +			"xenbus_frontend_workqueue failed\n", __func__);
> >>> pr_err() should be the norm these days.
> >>>
> >>> And personally I consider it quite bad a habit to put function names
> >>> in non-debugging log messages - this doesn't really help with
> >>> anything (as long as the rest of the message is meaningful), but
> >>> clutters the log.
> >> I was using the same format as pciback error handling, but I can switch
> >> to pr_err.
> >>
> >>> And finally, you need to do proper error handling here - this code
> >>> can be built as a module, and hence leaving notifier and bus
> >>> registered upon failure sets up the kernel for crashing. Moving
> >>> the code ahead of register_xenstore_notifier() will take care of
> >>> one half of the problem, but you'll nevertheless need to call
> >>> bus_unregister() if you really intend to make this a fatal error
> >>> condition (which by itself is questionable since in most scenarios
> >>> you won't need the work queue at all).
> >> Right, I will move the error handling to the xenbus frontend resume
> >> function.
> > Any ETA when the v5 with these changes will be posted? Thanks.
> I was on holiday last week, but the v5 was almost ready. I will send it
> today or tomorrow.

OK. Thanks!