Aurelien Chartier
2013-May-01 12:57 UTC
[PATCH 0/2] xenbus: Fix S3 frontend resume when xenstored is not running
Hi, This patch series fixes the S3 resume of dom0 or a Xenstore stub domain running a frontend over xenbus (xen-netfront in my use case). As device resume is happening before process resume, the xenbus frontend resume is hanging if xenstored is not running, thus causing a deadlock. This patch series is fixing that issue by bypassing the xenbus frontend resume when we are running in dom0 or a Xenstore stub domain. Aurelien Chartier (2): xenbus: save xenstore local status for later use xenbus: bypass xenbus frontend resume if xenstored is not running drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- 4 files changed, 34 insertions(+), 16 deletions(-)
Aurelien Chartier
2013-May-01 12:57 UTC
[PATCH 1/2] xenbus: save xenstore local status for later use
Save the xenstore local status computed in xenbus_init. It can then be used later to check if xenstored is running in that domain. --- drivers/xen/xenbus/xenbus_comms.h | 1 + drivers/xen/xenbus/xenbus_probe.c | 27 ++++++++++++--------------- drivers/xen/xenbus/xenbus_probe.h | 7 +++++++ 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_comms.h b/drivers/xen/xenbus/xenbus_comms.h index c8abd3b..be329a1 100644 --- a/drivers/xen/xenbus/xenbus_comms.h +++ b/drivers/xen/xenbus/xenbus_comms.h @@ -45,6 +45,7 @@ int xb_wait_for_data_to_read(void); int xs_input_avail(void); extern struct xenstore_domain_interface *xen_store_interface; extern int xen_store_evtchn; +extern enum xenstore_init xen_store_domain; extern const struct file_operations xen_xenbus_fops; diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3325884..c3375ba 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -69,6 +69,9 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn); struct xenstore_domain_interface *xen_store_interface; EXPORT_SYMBOL_GPL(xen_store_interface); +enum xenstore_init xen_store_domain; +EXPORT_SYMBOL_GPL(xen_store_domain); + static unsigned long xen_store_mfn; static BLOCKING_NOTIFIER_HEAD(xenstore_chain); @@ -719,17 +722,11 @@ static int __init xenstored_local_init(void) return err; } -enum xenstore_init { - UNKNOWN, - PV, - HVM, - LOCAL, -}; static int __init xenbus_init(void) { int err = 0; - enum xenstore_init usage = UNKNOWN; uint64_t v = 0; + xen_store_domain = XS_UNKNOWN; if (!xen_domain()) return -ENODEV; @@ -737,29 +734,29 @@ static int __init xenbus_init(void) xenbus_ring_ops_init(); if (xen_pv_domain()) - usage = PV; + xen_store_domain = XS_PV; if (xen_hvm_domain()) - usage = HVM; + xen_store_domain = XS_HVM; if (xen_hvm_domain() && xen_initial_domain()) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && !xen_start_info->store_evtchn) - usage = LOCAL; + xen_store_domain = XS_LOCAL; if (xen_pv_domain() && xen_start_info->store_evtchn) xenstored_ready = 1; - switch (usage) { - case LOCAL: + switch (xen_store_domain) { + case XS_LOCAL: err = xenstored_local_init(); if (err) goto out_error; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case PV: + case XS_PV: xen_store_evtchn = xen_start_info->store_evtchn; xen_store_mfn = xen_start_info->store_mfn; xen_store_interface = mfn_to_virt(xen_store_mfn); break; - case HVM: + case XS_HVM: err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v); if (err) goto out_error; diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index bb4f92e..146f857 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -47,6 +47,13 @@ struct xen_bus_type { struct bus_type bus; }; +enum xenstore_init { + XS_UNKNOWN, + XS_PV, + XS_HVM, + XS_LOCAL, +}; + extern struct device_attribute xenbus_dev_attrs[]; extern int xenbus_match(struct device *_dev, struct device_driver *_drv);
Aurelien Chartier
2013-May-01 12:57 UTC
[PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
If the xenbus frontend is running in a domain running xenstored or in dom0, the device resume is hanging because it is happening before the process resume. This patch adds extra logic to the resume code to check if we are the domain running xenstored or dom0. The frontend will be reconnected later, when the backend resumes from S3. This logic is working when xenstored is running in dom0, but has not been tested with a xenstore stub domain. --- drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index 3159a37..8583afe 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, xenbus_otherend_changed(watch, vec, len, 1); } +static int xenbus_frontend_dev_resume(struct device *dev) +{ + /* + * If xenstored is running in that domain, we cannot access the backend + * state at the moment. If we are running in dom0, the domain running + * xenstored is still suspended at that point + */ + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) + return 0; + + return xenbus_dev_resume(dev); +} + static const struct dev_pm_ops xenbus_pm_ops = { .suspend = xenbus_dev_suspend, - .resume = xenbus_dev_resume, + .resume = xenbus_frontend_dev_resume, .freeze = xenbus_dev_suspend, .thaw = xenbus_dev_cancel, .restore = xenbus_dev_resume,
Ian Campbell
2013-May-02 08:24 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote:> If the xenbus frontend is running in a domain running xenstored or in dom0, > the device resume is hanging because it is happening before the process > resume. This patch adds extra logic to the resume code to check if we are > the domain running xenstored or dom0. > > The frontend will be reconnected later, when the backend resumes from S3. > This logic is working when xenstored is running in dom0, but has not been > tested with a xenstore stub domain. > --- > drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c > index 3159a37..8583afe 100644 > --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > xenbus_otherend_changed(watch, vec, len, 1); > } > > +static int xenbus_frontend_dev_resume(struct device *dev) > +{ > + /* > + * If xenstored is running in that domain, we cannot access the backend > + * state at the moment. If we are running in dom0, the domain running > + * xenstored is still suspended at that point > + */ > + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > + return 0; > + > + return xenbus_dev_resume(dev);When or where does this eventually get called for the init domain or XS_LOCAL cases?> +} > + > static const struct dev_pm_ops xenbus_pm_ops = { > .suspend = xenbus_dev_suspend, > - .resume = xenbus_dev_resume, > + .resume = xenbus_frontend_dev_resume, > .freeze = xenbus_dev_suspend, > .thaw = xenbus_dev_cancel, > .restore = xenbus_dev_resume,
Jan Beulich
2013-May-02 09:21 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >> If the xenbus frontend is running in a domain running xenstored or in dom0, >> the device resume is hanging because it is happening before the process >> resume. This patch adds extra logic to the resume code to check if we are >> the domain running xenstored or dom0. >> >> The frontend will be reconnected later, when the backend resumes from S3. >> This logic is working when xenstored is running in dom0, but has not been >> tested with a xenstore stub domain. >> --- >> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > b/drivers/xen/xenbus/xenbus_probe_frontend.c >> index 3159a37..8583afe 100644 >> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >> xenbus_otherend_changed(watch, vec, len, 1); >> } >> >> +static int xenbus_frontend_dev_resume(struct device *dev) >> +{ >> + /* >> + * If xenstored is running in that domain, we cannot access the backend >> + * state at the moment. If we are running in dom0, the domain running >> + * xenstored is still suspended at that point >> + */ >> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >> + return 0; >> + >> + return xenbus_dev_resume(dev); > > When or where does this eventually get called for the init domain or > XS_LOCAL cases?I was about to ask the same question. Plus I don''t think the description here or in the overview mail really makes clear how specifically a deadlock would occur here. That''s pretty relevant to understand in the light that so far we had no indication of there being any special treatment necessary here, and resume from S3 had been working quite fine without that (at least as long as xenstored is running in Dom0 and at least with the traditional/ forward-port/non-pvops kernels). Jan
Ian Campbell
2013-May-02 09:24 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote:> >>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: > >> If the xenbus frontend is running in a domain running xenstored or in dom0, > >> the device resume is hanging because it is happening before the process > >> resume. This patch adds extra logic to the resume code to check if we are > >> the domain running xenstored or dom0. > >> > >> The frontend will be reconnected later, when the backend resumes from S3. > >> This logic is working when xenstored is running in dom0, but has not been > >> tested with a xenstore stub domain. > >> --- > >> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > >> 1 file changed, 14 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > > b/drivers/xen/xenbus/xenbus_probe_frontend.c > >> index 3159a37..8583afe 100644 > >> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > >> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > >> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > >> xenbus_otherend_changed(watch, vec, len, 1); > >> } > >> > >> +static int xenbus_frontend_dev_resume(struct device *dev) > >> +{ > >> + /* > >> + * If xenstored is running in that domain, we cannot access the backend > >> + * state at the moment. If we are running in dom0, the domain running > >> + * xenstored is still suspended at that point > >> + */ > >> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > >> + return 0; > >> + > >> + return xenbus_dev_resume(dev); > > > > When or where does this eventually get called for the init domain or > > XS_LOCAL cases? > > I was about to ask the same question. Plus I don''t think the > description here or in the overview mail really makes clear how > specifically a deadlock would occur here. That''s pretty relevant to > understand in the light that so far we had no indication of there > being any special treatment necessary here, and resume from S3 > had been working quite fine without that (at least as long as > xenstored is running in Dom0 and at least with the traditional/ > forward-port/non-pvops kernels).I think the unusual feature here is that dom0 has a netfront attached. Netfront resume is therefore hanging because it is trying to talk to the still frozen xenstored process in dom0. Ian.
Aurélien Chartier
2013-May-02 10:10 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On 02/05/13 10:24, Ian Campbell wrote:> On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >>>> If the xenbus frontend is running in a domain running xenstored or in dom0, >>>> the device resume is hanging because it is happening before the process >>>> resume. This patch adds extra logic to the resume code to check if we are >>>> the domain running xenstored or dom0. >>>> >>>> The frontend will be reconnected later, when the backend resumes from S3. >>>> This logic is working when xenstored is running in dom0, but has not been >>>> tested with a xenstore stub domain. >>>> --- >>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c >>> b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> index 3159a37..8583afe 100644 >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >>>> xenbus_otherend_changed(watch, vec, len, 1); >>>> } >>>> >>>> +static int xenbus_frontend_dev_resume(struct device *dev) >>>> +{ >>>> + /* >>>> + * If xenstored is running in that domain, we cannot access the backend >>>> + * state at the moment. If we are running in dom0, the domain running >>>> + * xenstored is still suspended at that point >>>> + */ >>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >>>> + return 0; >>>> + >>>> + return xenbus_dev_resume(dev); >>> When or where does this eventually get called for the init domain or >>> XS_LOCAL cases? >> I was about to ask the same question. Plus I don''t think the >> description here or in the overview mail really makes clear how >> specifically a deadlock would occur here. That''s pretty relevant to >> understand in the light that so far we had no indication of there >> being any special treatment necessary here, and resume from S3 >> had been working quite fine without that (at least as long as >> xenstored is running in Dom0 and at least with the traditional/ >> forward-port/non-pvops kernels). > I think the unusual feature here is that dom0 has a netfront attached. > Netfront resume is therefore hanging because it is trying to talk to the > still frozen xenstored process in dom0. > > Ian. >Yes, the unusual feature of having a netfront driver in dom0 is triggering the S3 issue I described. Ian made me realize this issue could also happen in Xenstore stub domains. The root cause of the issue is the assomption that a xenstored process is running in another domain when the xenbus frontend is being resumed from S3. This assomption is incorrect if xenstored and the xenbus frontend are running in the same domain. As Linux kernel is waiting for all devices to be resumed before resuming userland tasks, the xenbus frontend resume is blocking the userland process resume, waiting for xenstored (which cannot run as it is a userland process). The xenbus_dev_resume function for frontend devices such as nefront will not be called at all with that patch. I am relying on the fact that the network backend domain will be resumed after dom0 resume is complete. When that resume is happening, it will trigger a call to netback_changed in dom0 netfront. This call will end up resuming xenbus states in netfront. That logic is working for a dom0 netfront, as we can safely rely on the fact that the network backend domain will be resumed after dom0 resume is complete. I don''t have a Xen configuration with Xenstore stub domain, but it would probably need some extra logic to reconnect the frontend after xenstored is being resumed. The main goal of this patch is to fix the S3 resume of domains running both a xenbus frontend and xenstored. Aurelien
Ian Campbell
2013-May-02 10:30 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote:> On 02/05/13 10:24, Ian Campbell wrote: > > On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: > >>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: > >>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: > >>>> If the xenbus frontend is running in a domain running xenstored or in dom0, > >>>> the device resume is hanging because it is happening before the process > >>>> resume. This patch adds extra logic to the resume code to check if we are > >>>> the domain running xenstored or dom0. > >>>> > >>>> The frontend will be reconnected later, when the backend resumes from S3. > >>>> This logic is working when xenstored is running in dom0, but has not been > >>>> tested with a xenstore stub domain. > >>>> --- > >>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- > >>>> 1 file changed, 14 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c > >>> b/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> index 3159a37..8583afe 100644 > >>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c > >>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, > >>>> xenbus_otherend_changed(watch, vec, len, 1); > >>>> } > >>>> > >>>> +static int xenbus_frontend_dev_resume(struct device *dev) > >>>> +{ > >>>> + /* > >>>> + * If xenstored is running in that domain, we cannot access the backend > >>>> + * state at the moment. If we are running in dom0, the domain running > >>>> + * xenstored is still suspended at that point > >>>> + */ > >>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) > >>>> + return 0; > >>>> + > >>>> + return xenbus_dev_resume(dev); > >>> When or where does this eventually get called for the init domain or > >>> XS_LOCAL cases? > >> I was about to ask the same question. Plus I don't think the > >> description here or in the overview mail really makes clear how > >> specifically a deadlock would occur here. That's pretty relevant to > >> understand in the light that so far we had no indication of there > >> being any special treatment necessary here, and resume from S3 > >> had been working quite fine without that (at least as long as > >> xenstored is running in Dom0 and at least with the traditional/ > >> forward-port/non-pvops kernels). > > I think the unusual feature here is that dom0 has a netfront attached. > > Netfront resume is therefore hanging because it is trying to talk to the > > still frozen xenstored process in dom0. > > > > Ian. > > > Yes, the unusual feature of having a netfront driver in dom0 is > triggering the S3 issue I described. Ian made me realize this issue > could also happen in Xenstore stub domains. > > The root cause of the issue is the assomption that a xenstored process > is running in another domain when the xenbus frontend is being resumed > from S3. This assomption is incorrect if xenstored and the xenbus > frontend are running in the same domain. As Linux kernel is waiting for > all devices to be resumed before resuming userland tasks, the xenbus > frontend resume is blocking the userland process resume, waiting for > xenstored (which cannot run as it is a userland process). > > The xenbus_dev_resume function for frontend devices such as nefront will > not be called at all with that patch. I am relying on the fact that the > network backend domain will be resumed after dom0 resume is complete. > When that resume is happening, it will trigger a call to netback_changed > in dom0 netfront. This call will end up resuming xenbus states in netfront. > > That logic is working for a dom0 netfront, as we can safely rely on the > fact that the network backend domain will be resumed after dom0 resume > is complete. I don't have a Xen configuration with Xenstore stub domain, > but it would probably need some extra logic to reconnect the frontend > after xenstored is being resumed. The main goal of this patch is to fix > the S3 resume of domains running both a xenbus frontend and xenstored.Is the assumption that other domains are all suspended over S3 a valid one in the general case? In principal there is nothing stopping the toolstack from leaving domains running over S3, is there? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Aurelien Chartier
2013-May-02 12:32 UTC
Re: [PATCH 2/2] xenbus: bypass xenbus frontend resume if xenstored is not running
On 02/05/13 11:30, Ian Campbell wrote:> On Thu, 2013-05-02 at 11:10 +0100, Aurélien Chartier wrote: >> On 02/05/13 10:24, Ian Campbell wrote: >>> On Thu, 2013-05-02 at 10:21 +0100, Jan Beulich wrote: >>>>>>> On 02.05.13 at 10:24, Ian Campbell <Ian.Campbell@citrix.com> wrote: >>>>> On Wed, 2013-05-01 at 13:57 +0100, Aurelien Chartier wrote: >>>>>> If the xenbus frontend is running in a domain running xenstored or in dom0, >>>>>> the device resume is hanging because it is happening before the process >>>>>> resume. This patch adds extra logic to the resume code to check if we are >>>>>> the domain running xenstored or dom0. >>>>>> >>>>>> The frontend will be reconnected later, when the backend resumes from S3. >>>>>> This logic is working when xenstored is running in dom0, but has not been >>>>>> tested with a xenstore stub domain. >>>>>> --- >>>>>> drivers/xen/xenbus/xenbus_probe_frontend.c | 15 ++++++++++++++- >>>>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>> b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> index 3159a37..8583afe 100644 >>>>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c >>>>>> @@ -89,9 +89,22 @@ static void backend_changed(struct xenbus_watch *watch, >>>>>> xenbus_otherend_changed(watch, vec, len, 1); >>>>>> } >>>>>> >>>>>> +static int xenbus_frontend_dev_resume(struct device *dev) >>>>>> +{ >>>>>> + /* >>>>>> + * If xenstored is running in that domain, we cannot access the backend >>>>>> + * state at the moment. If we are running in dom0, the domain running >>>>>> + * xenstored is still suspended at that point >>>>>> + */ >>>>>> + if (xen_initial_domain() || (xen_store_domain == XS_LOCAL)) >>>>>> + return 0; >>>>>> + >>>>>> + return xenbus_dev_resume(dev); >>>>> When or where does this eventually get called for the init domain or >>>>> XS_LOCAL cases? >>>> I was about to ask the same question. Plus I don't think the >>>> description here or in the overview mail really makes clear how >>>> specifically a deadlock would occur here. That's pretty relevant to >>>> understand in the light that so far we had no indication of there >>>> being any special treatment necessary here, and resume from S3 >>>> had been working quite fine without that (at least as long as >>>> xenstored is running in Dom0 and at least with the traditional/ >>>> forward-port/non-pvops kernels). >>> I think the unusual feature here is that dom0 has a netfront attached. >>> Netfront resume is therefore hanging because it is trying to talk to the >>> still frozen xenstored process in dom0. >>> >>> Ian. >>> >> Yes, the unusual feature of having a netfront driver in dom0 is >> triggering the S3 issue I described. Ian made me realize this issue >> could also happen in Xenstore stub domains. >> >> The root cause of the issue is the assomption that a xenstored process >> is running in another domain when the xenbus frontend is being resumed >> from S3. This assomption is incorrect if xenstored and the xenbus >> frontend are running in the same domain. As Linux kernel is waiting for >> all devices to be resumed before resuming userland tasks, the xenbus >> frontend resume is blocking the userland process resume, waiting for >> xenstored (which cannot run as it is a userland process). >> >> The xenbus_dev_resume function for frontend devices such as nefront will >> not be called at all with that patch. I am relying on the fact that the >> network backend domain will be resumed after dom0 resume is complete. >> When that resume is happening, it will trigger a call to netback_changed >> in dom0 netfront. This call will end up resuming xenbus states in netfront. >> >> That logic is working for a dom0 netfront, as we can safely rely on the >> fact that the network backend domain will be resumed after dom0 resume >> is complete. I don't have a Xen configuration with Xenstore stub domain, >> but it would probably need some extra logic to reconnect the frontend >> after xenstored is being resumed. The main goal of this patch is to fix >> the S3 resume of domains running both a xenbus frontend and xenstored. > Is the assumption that other domains are all suspended over S3 a valid > one in the general case? > > In principal there is nothing stopping the toolstack from leaving > domains running over S3, is there? >That seems a valid assumption for dom0. It is probably not for Xenstore stub domains, even if I fail to see a use case for that. Another solution would be to defer the call to xenbus_dev_resume until userland processes have been resumed. Any opinion on that ? Aurelien _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel