Aurelien Chartier
2013-May-08 14:32 UTC
[PATCH V2 0/2] xenbus: Fix S3 frontend resume when xenstored is not running
Hi, This patch series fixes the S3 resume of a domain running xenstored running a frontend over xenbus (xen-netfront in my use case). As device resume is happening before process resume, the xenbus frontend resume is hanging if xenstored is not running, thus causing a deadlock. This patch series is fixing that issue by deferring the xenbus frontend resume when we are running xenstored in that same domain. Aurelien Chartier (2): xenbus: save xenstore local status for later use xenbus: defer xenbus frontend resume if xenstored is not running drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++------------ drivers/xen/xenbus/xenbus_probe.h | 12 ++++++++++ drivers/xen/xenbus/xenbus_probe_frontend.c | 34 +++++++++++++++++++++++++++- 4 files changed, 58 insertions(+), 16 deletions(-) -- 1.7.10.4
Aurelien Chartier
2013-May-08 14:32 UTC
[PATCH V2 1/2] xenbus: save xenstore local status for later use
Save the xenstore local status computed in xenbus_init. It can then be used later to check if xenstored is running in that domain. Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> --- drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_comms.h b/drivers/xen/xenbus/xenbus_comms.h index c8abd3b..be329a1 100644 --- a/drivers/xen/xenbus/xenbus_comms.h +++ b/drivers/xen/xenbus/xenbus_comms.h @@ -45,6 +45,7 @@ int xb_wait_for_data_to_read(void); int xs_input_avail(void); extern struct xenstore_domain_interface *xen_store_interface; extern int xen_store_evtchn; +extern enum xenstore_init xen_store_domain; extern const struct file_operations xen_xenbus_fops; diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3325884..c3375ba 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -69,6 +69,9 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn); struct xenstore_domain_interface *xen_store_interface; EXPORT_SYMBOL_GPL(xen_store_interface); +enum xenstore_init xen_store_domain; +EXPORT_SYMBOL_GPL(xen_store_domain); + static unsigned long xen_store_mfn; static BLOCKING_NOTIFIER_HEAD(xenstore_chain); @@ -719,17 +722,11 @@ static int __init xenstored_local_init(void) return err; } -enum xenstore_init { - UNKNOWN, - PV, - HVM, - LOCAL, -}; static int __init xenbus_init(void) { int err = 0; - enum xenstore_init usage = UNKNOWN; uint64_t v = 0; + xen_store_domain = XS_UNKNOWN; if (!xen_domain()) return -ENODEV; @@ -737,29 +734,29 @@ static int __init xenbus_init(void) xenbus_ring_ops_init(); if (xen_pv_domain()) - usage = PV; + xen_store_domain = XS_PV; if (xen_hvm_domain()) - usage = HVM; + xen_store_domain = XS_HVM; if (xen_hvm_domain() && xen_initial_domain()) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && !xen_start_info->store_evtchn) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && xen_start_info->store_evtchn) xenstored_ready = 1; - switch (usage) { - case LOCAL: + switch (xen_store_domain) { + case XS_LOCAL: err = xenstored_local_init(); if (err) goto out_error; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case PV: + case XS_PV: xen_store_evtchn = xen_start_info->store_evtchn; xen_store_mfn = xen_start_info->store_mfn; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case HVM: + case XS_HVM: err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v); if (err) goto out_error; diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index bb4f92e..146f857 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -47,6 +47,13 @@ struct xen_bus_type { struct bus_type bus; }; +enum xenstore_init { + XS_UNKNOWN, + XS_PV, + XS_HVM, + XS_LOCAL, +}; + extern struct device_attribute xenbus_dev_attrs[]; extern int xenbus_match(struct device *_dev, struct device_driver *_drv); -- 1.7.10.4
Aurelien Chartier
2013-May-08 14:32 UTC
[PATCH V2 2/2] xenbus: defer xenbus frontend resume if xenstored is not running
If the xenbus frontend is running in a domain running xenstored, the device resume is hanging because it is happening before the process resume. This patch adds extra logic to the resume code to check if we are the domain running xenstored and delay the resume if needed. Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> Changes in v2: - Instead of bypassing the resume, process it in a workqueue --- drivers/xen/xenbus/xenbus_probe.h | 5 ++++ drivers/xen/xenbus/xenbus_probe_frontend.c | 34 +++++++++++++++++++++++++++- 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index 146f857..46b384c 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -47,6 +47,11 @@ struct xen_bus_type { struct bus_type bus; }; +struct xenbus_resume_work { + struct work_struct w; + struct device *dev; +}; + enum xenstore_init { XS_UNKNOWN, XS_PV, diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index 3159a37..83b83d0 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -28,6 +28,7 @@ #include "xenbus_comms.h" #include "xenbus_probe.h" +static struct workqueue_struct *xenbus_frontend_resume_wq; /* device/<type>/<id> => <type>-<id> */ static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename) @@ -89,9 +90,38 @@ static void backend_changed(struct xenbus_watch *watch, xenbus_otherend_changed(watch, vec, len, 1); } +static void xenbus_frontend_delayed_resume(struct work_struct *w) +{ + struct xenbus_resume_work *resume_work = (struct xenbus_resume_work *) w; + + xenbus_dev_resume(dev); + + kfree(w); +} + +static int xenbus_frontend_dev_resume(struct device *dev) +{ + /* + * If xenstored is running in that domain, we cannot access the backend + * state at the moment, so we need to defer xenbus_dev_resume + */ + if (xen_store_domain == XS_LOCAL) { + struct xenbus_resume_work *work + kmalloc(sizeof(struct xenbus_resume), GFP_KERNEL); + + INIT_WORK((struct work_struct *) work, xenbus_frontend_delayed_resume); + resume_work->dev = dev; + queue_work(xenbus_frontend_resume_wq, (struct work_struct *) work) + + return 0; + } + + return xenbus_dev_resume(dev); +} + static const struct dev_pm_ops xenbus_pm_ops = { .suspend = xenbus_dev_suspend, - .resume = xenbus_dev_resume, + .resume = xenbus_frontend_dev_resume, .freeze = xenbus_dev_suspend, .thaw = xenbus_dev_cancel, .restore = xenbus_dev_resume, @@ -440,6 +470,8 @@ static int __init xenbus_probe_frontend_init(void) register_xenstore_notifier(&xenstore_notifier); + xenbus_frontend_resume_wq = create_workqueue("xenbus_frontend_resume"); + return 0; } subsys_initcall(xenbus_probe_frontend_init); -- 1.7.10.4
Aurelien Chartier
2013-May-08 14:40 UTC
Re: [PATCH V2 2/2] xenbus: defer xenbus frontend resume if xenstored is not running
On 08/05/13 15:32, Aurelien Chartier wrote:> If the xenbus frontend is running in a domain running xenstored, the device > resume is hanging because it is happening before the process resume. This > patch adds extra logic to the resume code to check if we are the domain > running xenstored and delay the resume if needed. > > Signed-off-by: Aurelien Chartier <aurelien.chartier@citrix.com> > > Changes in v2: > - Instead of bypassing the resume, process it in a workqueue > --- > drivers/xen/xenbus/xenbus_probe.h | 5 ++++ > drivers/xen/xenbus/xenbus_probe_frontend.c | 34 +++++++++++++++++++++++++++- > 2 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h > index 146f857..46b384c 100644 > --- a/drivers/xen/xenbus/xenbus_probe.h > +++ b/drivers/xen/xenbus/xenbus_probe.h > @@ -47,6 +47,11 @@ struct xen_bus_type { > struct bus_type bus; > }; > > +struct xenbus_resume_work { > + struct work_struct w; > + struct device *dev; > +}; > + > enum xenstore_init { > XS_UNKNOWN, > XS_PV, > diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c > index 3159a37..83b83d0 100644 > --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > @@ -28,6 +28,7 @@ > #include "xenbus_comms.h" > #include "xenbus_probe.h" > > +static struct workqueue_struct *xenbus_frontend_resume_wq; > > /* device/<type>/<id> => <type>-<id> */ > static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename) > @@ -89,9 +90,38 @@ static void backend_changed(struct xenbus_watch *watch, > xenbus_otherend_changed(watch, vec, len, 1); > } > > +static void xenbus_frontend_delayed_resume(struct work_struct *w) > +{ > + struct xenbus_resume_work *resume_work = (struct xenbus_resume_work *) w; > + > + xenbus_dev_resume(dev); > + > + kfree(w); > +} > + > +static int xenbus_frontend_dev_resume(struct device *dev) > +{ > + /* > + * If xenstored is running in that domain, we cannot access the backend > + * state at the moment, so we need to defer xenbus_dev_resume > + */ > + if (xen_store_domain == XS_LOCAL) { > + struct xenbus_resume_work *work > + kmalloc(sizeof(struct xenbus_resume), GFP_KERNEL); > + > + INIT_WORK((struct work_struct *) work, xenbus_frontend_delayed_resume); > + resume_work->dev = dev; > + queue_work(xenbus_frontend_resume_wq, (struct work_struct *) work)Missing a semicolon here. I should have read that patch once more, apologies for that.> + > + return 0; > + } > + > + return xenbus_dev_resume(dev); > +} > + > static const struct dev_pm_ops xenbus_pm_ops = { > .suspend = xenbus_dev_suspend, > - .resume = xenbus_dev_resume, > + .resume = xenbus_frontend_dev_resume, > .freeze = xenbus_dev_suspend, > .thaw = xenbus_dev_cancel, > .restore = xenbus_dev_resume, > @@ -440,6 +470,8 @@ static int __init xenbus_probe_frontend_init(void) > > register_xenstore_notifier(&xenstore_notifier); > > + xenbus_frontend_resume_wq = create_workqueue("xenbus_frontend_resume"); > + > return 0; > } > subsys_initcall(xenbus_probe_frontend_init);
Jan Beulich
2013-May-08 14:50 UTC
Re: [PATCH V2 2/2] xenbus: defer xenbus frontend resume if xenstored is not running
>>> On 08.05.13 at 16:32, Aurelien Chartier <aurelien.chartier@citrix.com> wrote: > --- a/drivers/xen/xenbus/xenbus_probe.h > +++ b/drivers/xen/xenbus/xenbus_probe.h > @@ -47,6 +47,11 @@ struct xen_bus_type { > struct bus_type bus; > }; > > +struct xenbus_resume_work { > + struct work_struct w; > + struct device *dev; > +}; > +I don''t think this structure needs to be in a header - it''s being used in a single source file only.> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > @@ -28,6 +28,7 @@ > #include "xenbus_comms.h" > #include "xenbus_probe.h" > > +static struct workqueue_struct *xenbus_frontend_resume_wq; > > /* device/<type>/<id> => <type>-<id> */ > static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename) > @@ -89,9 +90,38 @@ static void backend_changed(struct xenbus_watch *watch, > xenbus_otherend_changed(watch, vec, len, 1); > } > > +static void xenbus_frontend_delayed_resume(struct work_struct *w) > +{ > + struct xenbus_resume_work *resume_work = (struct xenbus_resume_work *) w;container_of()> + > + xenbus_dev_resume(dev);Does this build at all? I don''t see where "dev" is being declared/ initialized?> + > + kfree(w);kfree(resume_work) - otherwise you have a hidden dependency on "w" being the first member of struct xenbus_resume_work.> +} > + > +static int xenbus_frontend_dev_resume(struct device *dev) > +{ > + /* > + * If xenstored is running in that domain, we cannot access the backend > + * state at the moment, so we need to defer xenbus_dev_resume > + */ > + if (xen_store_domain == XS_LOCAL) { > + struct xenbus_resume_work *work > + kmalloc(sizeof(struct xenbus_resume), GFP_KERNEL);Missing NULL return check (don''t know what you should do in that case). Perhaps dynamic allocation is the wrong approach here, and you want to rather add a new field to struct xenbus_device (which gets populated only for frontend devices, and - if possible - only when xen_store_domain == XS_LOCAL. Also - GFP_KERNEL really suitable here?> + > + INIT_WORK((struct work_struct *) work, xenbus_frontend_delayed_resume);&work->w (without any cast).> + resume_work->dev = dev; > + queue_work(xenbus_frontend_resume_wq, (struct work_struct *) work)Again. Jan
Aurelien Chartier
2013-May-08 16:23 UTC
Re: [PATCH V2 2/2] xenbus: defer xenbus frontend resume if xenstored is not running
On 08/05/13 15:50, Jan Beulich wrote:>>>> On 08.05.13 at 16:32, Aurelien Chartier <aurelien.chartier@citrix.com> wrote: >> --- a/drivers/xen/xenbus/xenbus_probe.h >> +++ b/drivers/xen/xenbus/xenbus_probe.h >> @@ -47,6 +47,11 @@ struct xen_bus_type { >> struct bus_type bus; >> }; >> >> +struct xenbus_resume_work { >> + struct work_struct w; >> + struct device *dev; >> +}; >> + > I don''t think this structure needs to be in a header - it''s being used > in a single source file only. > >> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >> @@ -28,6 +28,7 @@ >> #include "xenbus_comms.h" >> #include "xenbus_probe.h" >> >> +static struct workqueue_struct *xenbus_frontend_resume_wq; >> >> /* device/<type>/<id> => <type>-<id> */ >> static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename) >> @@ -89,9 +90,38 @@ static void backend_changed(struct xenbus_watch *watch, >> xenbus_otherend_changed(watch, vec, len, 1); >> } >> >> +static void xenbus_frontend_delayed_resume(struct work_struct *w) >> +{ >> + struct xenbus_resume_work *resume_work = (struct xenbus_resume_work *) w; > container_of() > >> + >> + xenbus_dev_resume(dev); > Does this build at all? I don''t see where "dev" is being declared/ > initialized?I messed up my config file and ended up with CONFIG_XEN_XENBUS_FRONTEND wiped out, so I did not catch this one. Sorry for that, it was meant to be resume_work->dev.> >> + >> + kfree(w); > kfree(resume_work) - otherwise you have a hidden dependency > on "w" being the first member of struct xenbus_resume_work. > >> +} >> + >> +static int xenbus_frontend_dev_resume(struct device *dev) >> +{ >> + /* >> + * If xenstored is running in that domain, we cannot access the backend >> + * state at the moment, so we need to defer xenbus_dev_resume >> + */ >> + if (xen_store_domain == XS_LOCAL) { >> + struct xenbus_resume_work *work >> + kmalloc(sizeof(struct xenbus_resume), GFP_KERNEL); > Missing NULL return check (don''t know what you should do in that > case). Perhaps dynamic allocation is the wrong approach here, and > you want to rather add a new field to struct xenbus_device (which > gets populated only for frontend devices, and - if possible - only > when xen_store_domain == XS_LOCAL. > > Also - GFP_KERNEL really suitable here?Adding a new field for xenbus_device for that purpose seemed too much for me, but it is probably a cleaner solution in the end. I don''t really see a way to avoid having that field for backend devices. We can have a #ifdef CONFIG_XEN_XENBUS_FRONTEND, but that won''t prevent domains with both frontend and backend to have that field set in all xenbus devices.> >> + >> + INIT_WORK((struct work_struct *) work, xenbus_frontend_delayed_resume); > &work->w (without any cast). > >> + resume_work->dev = dev; >> + queue_work(xenbus_frontend_resume_wq, (struct work_struct *) work) > Again.Thanks for the review. I will fix the errors in my next patch series. Aurelien.