Aurelien Chartier
2013-May-28 17:09 UTC
[PATCH V5 0/2] xenbus: Fix S3 frontend resume when xenstored is not running
Hi, This patch series fixes the S3 resume of a domain running xenstored and a frontend over xenbus (xen-netfront in my use case). As device resume is happening before process resume, the xenbus frontend resume is hanging if xenstored is not running, thus causing a deadlock. This patch series is fixing that issue by deferring the xenbus frontend resume when we are running xenstored in that same domain. Aurelien Chartier (2): xenbus: save xenstore local status for later use xenbus: delay xenbus frontend resume if xenstored is not running drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 +++++++++----------- drivers/xen/xenbus/xenbus_probe.h | 7 ++++++ drivers/xen/xenbus/xenbus_probe_frontend.c | 37 +++++++++++++++++++++++++++- include/xen/xenbus.h | 1 + 5 files changed, 57 insertions(+), 16 deletions(-) -- 1.7.10.4
Aurelien Chartier
2013-May-28 17:09 UTC
[PATCH V5 1/2] xenbus: save xenstore local status for later use
Save the xenstore local status computed in xenbus_init. It can then be used later to check if xenstored is running in this domain. Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Changes in v4: - Change variable name to xen_store_domain_type --- drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_comms.h b/drivers/xen/xenbus/xenbus_comms.h index c8abd3b..e74f9c1 100644 --- a/drivers/xen/xenbus/xenbus_comms.h +++ b/drivers/xen/xenbus/xenbus_comms.h @@ -45,6 +45,7 @@ int xb_wait_for_data_to_read(void); int xs_input_avail(void); extern struct xenstore_domain_interface *xen_store_interface; extern int xen_store_evtchn; +extern enum xenstore_init xen_store_domain_type; extern const struct file_operations xen_xenbus_fops; diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3325884..56cfaaa 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -69,6 +69,9 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn); struct xenstore_domain_interface *xen_store_interface; EXPORT_SYMBOL_GPL(xen_store_interface); +enum xenstore_init xen_store_domain_type; +EXPORT_SYMBOL_GPL(xen_store_domain_type); + static unsigned long xen_store_mfn; static BLOCKING_NOTIFIER_HEAD(xenstore_chain); @@ -719,17 +722,11 @@ static int __init xenstored_local_init(void) return err; } -enum xenstore_init { - UNKNOWN, - PV, - HVM, - LOCAL, -}; static int __init xenbus_init(void) { int err = 0; - enum xenstore_init usage = UNKNOWN; uint64_t v = 0; + xen_store_domain_type = XS_UNKNOWN; if (!xen_domain()) return -ENODEV; @@ -737,29 +734,29 @@ static int __init xenbus_init(void) xenbus_ring_ops_init(); if (xen_pv_domain()) - usage = PV; + xen_store_domain_type = XS_PV; if (xen_hvm_domain()) - usage = HVM; + xen_store_domain_type = XS_HVM; if (xen_hvm_domain() && xen_initial_domain()) - usage = LOCAL; + xen_store_domain_type = XS_LOCAL; if (xen_pv_domain() && !xen_start_info->store_evtchn) - usage = LOCAL; + xen_store_domain_type = XS_LOCAL; if (xen_pv_domain() && xen_start_info->store_evtchn) xenstored_ready = 1; - switch (usage) { - case LOCAL: + switch (xen_store_domain_type) { + case XS_LOCAL: err = xenstored_local_init(); if (err) goto out_error; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case PV: + case XS_PV: xen_store_evtchn = xen_start_info->store_evtchn; xen_store_mfn = xen_start_info->store_mfn; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case HVM: + case XS_HVM: err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v); if (err) goto out_error; diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index bb4f92e..146f857 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -47,6 +47,13 @@ struct xen_bus_type { struct bus_type bus; }; +enum xenstore_init { + XS_UNKNOWN, + XS_PV, + XS_HVM, + XS_LOCAL, +}; + extern struct device_attribute xenbus_dev_attrs[]; extern int xenbus_match(struct device *_dev, struct device_driver *_drv); -- 1.7.10.4
Aurelien Chartier
2013-May-28 17:09 UTC
[PATCH V5 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
If the xenbus frontend is located in a domain running xenstored, the device resume is hanging because it is happening before the process resume. This patch adds extra logic to the resume code to check if we are the domain running xenstored and delay the resume if needed. Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> Changes in v2: - Instead of bypassing the resume, process it in a workqueue Changes in v3: - Add a struct work in xenbus_device to avoid dynamic allocation - Several small code fixes Changes in v4: - Use a dedicated workqueue Changes in v5: - Move create_workqueue error handling to xenbus_frontend_dev_resume --- drivers/xen/xenbus/xenbus_probe_frontend.c | 37 +++++++++++++++++++++++++++- include/xen/xenbus.h | 1 + 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index 3159a37..3374aee 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -29,6 +29,8 @@ #include "xenbus_probe.h" +static struct workqueue_struct *xenbus_frontend_wq; + /* device/<type>/<id> => <type>-<id> */ static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename) { @@ -89,9 +91,40 @@ static void backend_changed(struct xenbus_watch *watch, xenbus_otherend_changed(watch, vec, len, 1); } +static void xenbus_frontend_delayed_resume(struct work_struct *w) +{ + struct xenbus_device *xdev = container_of(w, struct xenbus_device, work); + + xenbus_dev_resume(&xdev->dev); +} + +static int xenbus_frontend_dev_resume(struct device *dev) +{ + /* + * If xenstored is running in this domain, we cannot access the backend + * state at the moment, so we need to defer xenbus_dev_resume + */ + if (xen_store_domain_type == XS_LOCAL) { + struct xenbus_device *xdev = to_xenbus_device(dev); + + if (!xenbus_frontend_wq) { + pr_err("%s: no workqueue to process delayed resume\n", + xdev->nodename); + return -EFAULT; + } + + INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume); + queue_work(xenbus_frontend_wq, &xdev->work); + + return 0; + } + + return xenbus_dev_resume(dev); +} + static const struct dev_pm_ops xenbus_pm_ops = { .suspend = xenbus_dev_suspend, - .resume = xenbus_dev_resume, + .resume = xenbus_frontend_dev_resume, .freeze = xenbus_dev_suspend, .thaw = xenbus_dev_cancel, .restore = xenbus_dev_resume, @@ -440,6 +473,8 @@ static int __init xenbus_probe_frontend_init(void) register_xenstore_notifier(&xenstore_notifier); + xenbus_frontend_wq = create_workqueue("xenbus_frontend"); + return 0; } subsys_initcall(xenbus_probe_frontend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 0a7515c..569c07f 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -70,6 +70,7 @@ struct xenbus_device { struct device dev; enum xenbus_state state; struct completion down; + struct work_struct work; }; static inline struct xenbus_device *to_xenbus_device(struct device *dev) -- 1.7.10.4
Jan Beulich
2013-May-29 07:03 UTC
Re: [PATCH V5 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
>>> On 28.05.13 at 19:09, Aurelien Chartier <aurelien.chartier@citrix.com> wrote: > If the xenbus frontend is located in a domain running xenstored, the device > resume is hanging because it is happening before the process resume. This > patch adds extra logic to the resume code to check if we are the domain > running xenstored and delay the resume if needed. > > Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> > > Changes in v2: > - Instead of bypassing the resume, process it in a workqueue > Changes in v3: > - Add a struct work in xenbus_device to avoid dynamic allocation > - Several small code fixes > Changes in v4: > - Use a dedicated workqueue > Changes in v5: > - Move create_workqueue error handling to xenbus_frontend_dev_resumeAcked-by: Jan Beulich <jbeulich@suse.com> Yet nevertheless I''m still seeing room for improvement:> +static int xenbus_frontend_dev_resume(struct device *dev) > +{ > + /* > + * If xenstored is running in this domain, we cannot access the backend > + * state at the moment, so we need to defer xenbus_dev_resume > + */ > + if (xen_store_domain_type == XS_LOCAL) { > + struct xenbus_device *xdev = to_xenbus_device(dev); > + > + if (!xenbus_frontend_wq) { > + pr_err("%s: no workqueue to process delayed resume\n", > + xdev->nodename); > + return -EFAULT;Issuing a message here is fine, but I think you should also issue a pr_warn() at initialization time.> + } > + > + INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume);And I also think that this would better be done once at initialization time too.> + queue_work(xenbus_frontend_wq, &xdev->work); > + > + return 0; > + }Jan
Konrad Rzeszutek Wilk
2013-May-29 15:44 UTC
Re: [PATCH V5 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On Wed, May 29, 2013 at 08:03:13AM +0100, Jan Beulich wrote:> >>> On 28.05.13 at 19:09, Aurelien Chartier <aurelien.chartier@citrix.com> wrote: > > If the xenbus frontend is located in a domain running xenstored, the device > > resume is hanging because it is happening before the process resume. This > > patch adds extra logic to the resume code to check if we are the domain > > running xenstored and delay the resume if needed. > > > > Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> > > > > Changes in v2: > > - Instead of bypassing the resume, process it in a workqueue > > Changes in v3: > > - Add a struct work in xenbus_device to avoid dynamic allocation > > - Several small code fixes > > Changes in v4: > > - Use a dedicated workqueue > > Changes in v5: > > - Move create_workqueue error handling to xenbus_frontend_dev_resume > > Acked-by: Jan Beulich <jbeulich@suse.com> > > Yet nevertheless I''m still seeing room for improvement: > > > +static int xenbus_frontend_dev_resume(struct device *dev) > > +{ > > + /* > > + * If xenstored is running in this domain, we cannot access the backend > > + * state at the moment, so we need to defer xenbus_dev_resume > > + */ > > + if (xen_store_domain_type == XS_LOCAL) { > > + struct xenbus_device *xdev = to_xenbus_device(dev); > > + > > + if (!xenbus_frontend_wq) { > > + pr_err("%s: no workqueue to process delayed resume\n", > > + xdev->nodename); > > + return -EFAULT; > > Issuing a message here is fine, but I think you should also issue a > pr_warn() at initialization time. > > > + } > > + > > + INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume); > > And I also think that this would better be done once at initialization > time too. > > > + queue_work(xenbus_frontend_wq, &xdev->work); > > + > > + return 0; > > + } > > JanLets do that as a follow up patch. Aurlien, I''ve taken your patches in.>
Aurelien Chartier
2013-May-29 16:17 UTC
Re: [PATCH V5 2/2] xenbus: delay xenbus frontend resume if xenstored is not running
On 29/05/13 16:44, Konrad Rzeszutek Wilk wrote:> On Wed, May 29, 2013 at 08:03:13AM +0100, Jan Beulich wrote: >>>>> On 28.05.13 at 19:09, Aurelien Chartier <aurelien.chartier@citrix.com> wrote: >>> If the xenbus frontend is located in a domain running xenstored, the device >>> resume is hanging because it is happening before the process resume. This >>> patch adds extra logic to the resume code to check if we are the domain >>> running xenstored and delay the resume if needed. >>> >>> Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> >>> >>> Changes in v2: >>> - Instead of bypassing the resume, process it in a workqueue >>> Changes in v3: >>> - Add a struct work in xenbus_device to avoid dynamic allocation >>> - Several small code fixes >>> Changes in v4: >>> - Use a dedicated workqueue >>> Changes in v5: >>> - Move create_workqueue error handling to xenbus_frontend_dev_resume >> Acked-by: Jan Beulich <jbeulich@suse.com> >> >> Yet nevertheless I''m still seeing room for improvement: >> >>> +static int xenbus_frontend_dev_resume(struct device *dev) >>> +{ >>> + /* >>> + * If xenstored is running in this domain, we cannot access the backend >>> + * state at the moment, so we need to defer xenbus_dev_resume >>> + */ >>> + if (xen_store_domain_type == XS_LOCAL) { >>> + struct xenbus_device *xdev = to_xenbus_device(dev); >>> + >>> + if (!xenbus_frontend_wq) { >>> + pr_err("%s: no workqueue to process delayed resume\n", >>> + xdev->nodename); >>> + return -EFAULT; >> Issuing a message here is fine, but I think you should also issue a >> pr_warn() at initialization time. >> >>> + } >>> + >>> + INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume); >> And I also think that this would better be done once at initialization >> time too. >> >>> + queue_work(xenbus_frontend_wq, &xdev->work); >>> + >>> + return 0; >>> + } >> Jan > Lets do that as a follow up patch. Aurlien, I''ve taken your patches in.Thanks Konrad. I''ll send the follow up patch later on.