thr3ads.net - Xen devel - [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp

If this information is useful, please help other people find it:
Share via:

Brendan Cully

2009-Mar-31 16:53 UTC

[Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1238518139 25200
# Node ID ed51e0ec7b8c08b1f191af7f89e53302f855f7d7
# Parent  4c7eb2e71e9d9d94583336c022042ffaa3ba6481
Skip vcpu_hotplug for VCPU 0 in smp_resume.
This function can occasionally take up to 2 seconds to complete,
and smp_suspend also skips VCPU 0.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff --git a/drivers/xen/core/cpu_hotplug.c b/drivers/xen/core/cpu_hotplug.c
--- a/drivers/xen/core/cpu_hotplug.c
+++ b/drivers/xen/core/cpu_hotplug.c
@@ -144,8 +144,11 @@
 {
 	unsigned int cpu;
 
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
+		if (cpu == 0)
+			continue;
 		vcpu_hotplug(cpu);
+	}
 }
 
 int cpu_up_check(unsigned int cpu)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Brendan Cully

2009-Mar-31 17:24 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Tuesday, 31 March 2009 at 09:53, Brendan Cully wrote:> Skip vcpu_hotplug for VCPU 0 in smp_resume.
> This function can occasionally take up to 2 seconds to complete,
> and smp_suspend also skips VCPU 0.
I''m doing a bit of testing ahead of releasing Remus, and I''ve
noticed
that there are a couple of places where resuming from suspension can
get stuck for a long time, which can cause Remus to think that the
domain has died. This is one of them.

The other is in the netfront accelerator. It tears down a xenstore
watch on the accelerator path at every suspend, and adds the watch
back on resume. As with any xenstore interaction, this can
occasionally take a very long time. I''m going from faulty memory here,
but I didn''t think it was necessary to tear down and restore watches
across suspend. Would it make sense to move the watch remove and add
into the resume hook (taking it completely out of suspend and
suspend_cancel)?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Mar-31 17:35 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On 31/03/2009 18:24, "Brendan Cully" <brendan@cs.ubc.ca> wrote:
> The other is in the netfront accelerator. It tears down a xenstore
> watch on the accelerator path at every suspend, and adds the watch
> back on resume. As with any xenstore interaction, this can
> occasionally take a very long time. I''m going from faulty memory
here,
> but I didn''t think it was necessary to tear down and restore
watches
> across suspend. Would it make sense to move the watch remove and add
> into the resume hook (taking it completely out of suspend and
> suspend_cancel)?
They have to be renewed on full resume, but of course they do not get reset
across a cancelled suspend. Kieran may be able to advise on changing the net
accel code.

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Brendan Cully

2009-Mar-31 20:04 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Tuesday, 31 March 2009 at 18:35, Keir Fraser wrote:> On 31/03/2009 18:24, "Brendan Cully" <brendan@cs.ubc.ca>
wrote:
> 
> > The other is in the netfront accelerator. It tears down a xenstore
> > watch on the accelerator path at every suspend, and adds the watch
> > back on resume. As with any xenstore interaction, this can
> > occasionally take a very long time. I''m going from faulty
memory here,
> > but I didn''t think it was necessary to tear down and restore
watches
> > across suspend. Would it make sense to move the watch remove and add
> > into the resume hook (taking it completely out of suspend and
> > suspend_cancel)?
> 
> They have to be renewed on full resume, but of course they do not get reset
> across a cancelled suspend. Kieran may be able to advise on changing the
net
> accel code.
Could it be as simple as this? I can''t remember what happens if
unregister_xenbus_watch is called after the xenbus connection has been
reset. Should we just free the guest structures without interacting
with xenstore at the start of the resume method?


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kieran Mansley

2009-Apr-01 10:26 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Tue, 2009-03-31 at 13:04 -0700, Brendan Cully wrote:> On Tuesday, 31 March 2009 at 18:35, Keir Fraser wrote:
> > On 31/03/2009 18:24, "Brendan Cully"
<brendan@cs.ubc.ca> wrote:
> > 
> > > The other is in the netfront accelerator. It tears down a
xenstore
> > > watch on the accelerator path at every suspend, and adds the
watch
> > > back on resume. As with any xenstore interaction, this can
> > > occasionally take a very long time. I''m going from
faulty memory here,
> > > but I didn''t think it was necessary to tear down and
restore watches
> > > across suspend. Would it make sense to move the watch remove and
add
> > > into the resume hook (taking it completely out of suspend and
> > > suspend_cancel)?
> > 
> > They have to be renewed on full resume, but of course they do not get
reset
> > across a cancelled suspend. Kieran may be able to advise on changing
the net
> > accel code.
> 
> Could it be as simple as this? I can''t remember what happens if
> unregister_xenbus_watch is called after the xenbus connection has been
> reset. Should we just free the guest structures without interacting
> with xenstore at the start of the resume method?
I don''t think that will work.  We need to remove the watch before the
suspend to ensure that the code that handles the watch will not be
activated during the migration.

The code that handles the watch may attempt to access hardware directly,
and during a migration this hardware might not be there any more.  The
approach the acceleration code takes is therefore to tear down the
accelerator''s direct hardware access completely before the suspend, and
only reinstate it after the resume completes (on potentially new
hardware) or after the suspend is cancelled.

It may be possible to synchronise the watch handler with the
suspend/resume/cancel cycle without removing the watch, but that starts
to get complicated.

Kieran


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Apr-01 10:31 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On 01/04/2009 11:26, "Kieran Mansley" <kmansley@solarflare.com>
wrote:
>> Could it be as simple as this? I can''t remember what happens
if
>> unregister_xenbus_watch is called after the xenbus connection has been
>> reset. Should we just free the guest structures without interacting
>> with xenstore at the start of the resume method?
> 
> It may be possible to synchronise the watch handler with the
> suspend/resume/cancel cycle without removing the watch, but that starts
> to get complicated.
Could we avoid any of this logic executing if there are no net accelerators?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kieran Mansley

2009-Apr-01 11:00 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Wed, 2009-04-01 at 11:31 +0100, Keir Fraser wrote:> On 01/04/2009 11:26, "Kieran Mansley"
<kmansley@solarflare.com> wrote:
> 
> >> Could it be as simple as this? I can''t remember what
happens if
> >> unregister_xenbus_watch is called after the xenbus connection has
been
> >> reset. Should we just free the guest structures without
interacting
> >> with xenstore at the start of the resume method?
> > 
> > It may be possible to synchronise the watch handler with the
> > suspend/resume/cancel cycle without removing the watch, but that
starts
> > to get complicated.
> 
> Could we avoid any of this logic executing if there are no net
accelerators?
The watch handler will try to load an accelerator if the configuration
changes, so even if there were no accelerators before the suspend,
unless you can prevent the watch from firing, you could end up with one
trying to load between the suspend and resume.

If you got rid of the feature to load the requested accelerator
automatically when the configuration changes, then yes, that might be
possible, but I think I''d rather leave that in and use an extra lock
and
some state to ignore the watch firing at bad times.  This would mean we
could leave the watch in place during the suspend/resume/cancel cycle
(refreshing on resume).  The suspend_cancel callback would still be
necessary, but it would just be acquiring a lock and modifying some
state rather than doing a xenbus watch operation.

It''s not clear to me what the source of the long delay is, and whether
that change would solve it: the extra lock would be contended with the
watch handler''s work queue, and so if the watch is the source of the
delay it''s possible that we''d just contend in a different way
and the
delay would still be there.  Brendan: can you explain the delay for me?

Kieran

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Brendan Cully

2009-Apr-01 18:39 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Wednesday, 01 April 2009 at 12:00, Kieran Mansley
wrote:> On Wed, 2009-04-01 at 11:31 +0100, Keir Fraser wrote:
> > On 01/04/2009 11:26, "Kieran Mansley"
<kmansley@solarflare.com> wrote:
> > 
> > >> Could it be as simple as this? I can''t remember what
happens if
> > >> unregister_xenbus_watch is called after the xenbus connection
has been
> > >> reset. Should we just free the guest structures without
interacting
> > >> with xenstore at the start of the resume method?
> > > 
> > > It may be possible to synchronise the watch handler with the
> > > suspend/resume/cancel cycle without removing the watch, but that
starts
> > > to get complicated.
> > 
> > Could we avoid any of this logic executing if there are no net
accelerators?
> 
> The watch handler will try to load an accelerator if the configuration
> changes, so even if there were no accelerators before the suspend,
> unless you can prevent the watch from firing, you could end up with one
> trying to load between the suspend and resume.
> 
> If you got rid of the feature to load the requested accelerator
> automatically when the configuration changes, then yes, that might be
> possible, but I think I''d rather leave that in and use an extra
lock and
> some state to ignore the watch firing at bad times.  This would mean we
> could leave the watch in place during the suspend/resume/cancel cycle
> (refreshing on resume).  The suspend_cancel callback would still be
> necessary, but it would just be acquiring a lock and modifying some
> state rather than doing a xenbus watch operation.
I''m afraid my memory of the kernel suspend mechanics has gotten a bit
rusty over the last few months, so I may be off-base here. But I
thought that xs_suspend masked watches until xs_reusume or
xs_suspend_cancel? If that is the case, isn''t that on its own enough
to protect netfront?
> It''s not clear to me what the source of the long delay is, and
whether
> that change would solve it: the extra lock would be contended with the
> watch handler''s work queue, and so if the watch is the source of
the
> delay it''s possible that we''d just contend in a different
way and the
> delay would still be there.  Brendan: can you explain the delay for me?
It''s the round trips to xenstored to manage the watches, which always
have a chance to be very slow.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ian Campbell

2009-Apr-01 20:36 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Wed, 2009-04-01 at 06:26 -0400, Kieran Mansley wrote:> On Tue, 2009-03-31 at 13:04 -0700, Brendan Cully wrote:
> > On Tuesday, 31 March 2009 at 18:35, Keir Fraser wrote:
> > > On 31/03/2009 18:24, "Brendan Cully"
<brendan@cs.ubc.ca> wrote:
> > > 
> > > > The other is in the netfront accelerator. It tears down a
xenstore
> > > > watch on the accelerator path at every suspend, and adds the
watch
> > > > back on resume. As with any xenstore interaction, this can
> > > > occasionally take a very long time. I''m going from
faulty memory here,
> > > > but I didn''t think it was necessary to tear down
and restore watches
> > > > across suspend. Would it make sense to move the watch remove
and add
> > > > into the resume hook (taking it completely out of suspend
and
> > > > suspend_cancel)?
> > > 
> > > They have to be renewed on full resume, but of course they do not
get reset
> > > across a cancelled suspend. Kieran may be able to advise on
changing the net
> > > accel code.
> > 
> > Could it be as simple as this? I can''t remember what happens
if
> > unregister_xenbus_watch is called after the xenbus connection has been
> > reset. Should we just free the guest structures without interacting
> > with xenstore at the start of the resume method?
> 
> I don''t think that will work.  We need to remove the watch before
the
> suspend to ensure that the code that handles the watch will not be
> activated during the migration.
It''s worth bearing in mind for if/when you port this stuff to a newer
pvops kernel that it tries to make much greater use of the Linux device
model callbacks rather than using xenbus specific suspend/resume
methods. The Linux device model doesn''t have a suspend cancelled
callback in it, AFAIK.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kieran Mansley

2009-Apr-02 08:13 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Wed, 2009-04-01 at 11:39 -0700, Brendan Cully wrote:> I''m afraid my memory of the kernel suspend mechanics has gotten a
bit
> rusty over the last few months, so I may be off-base here. But I
> thought that xs_suspend masked watches until xs_reusume or
> xs_suspend_cancel? If that is the case, isn''t that on its own
enough
> to protect netfront?
If that''s the case, it should be enough, yes.

Do watches that would have fired during that time fire at the end of
masked period?  If not, we''d still have to access xenstore on suspend
cancel to check if we''ve missed a change.

Kieran


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Apr-02 08:27 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On 02/04/2009 09:13, "Kieran Mansley" <kmansley@solarflare.com>
wrote:
> On Wed, 2009-04-01 at 11:39 -0700, Brendan Cully wrote:
>> I''m afraid my memory of the kernel suspend mechanics has
gotten a bit
>> rusty over the last few months, so I may be off-base here. But I
>> thought that xs_suspend masked watches until xs_reusume or
>> xs_suspend_cancel? If that is the case, isn''t that on its own
enough
>> to protect netfront?
> 
> If that''s the case, it should be enough, yes.
> 
> Do watches that would have fired during that time fire at the end of
> masked period?  If not, we''d still have to access xenstore on
suspend
> cancel to check if we''ve missed a change.
I don''t think watches get lost across suspend-cancel, just delayed.

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kieran Mansley

2009-Apr-02 09:13 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Thu, 2009-04-02 at 09:27 +0100, Keir Fraser wrote:> On 02/04/2009 09:13, "Kieran Mansley"
<kmansley@solarflare.com> wrote:
> 
> > On Wed, 2009-04-01 at 11:39 -0700, Brendan Cully wrote:
> >> I''m afraid my memory of the kernel suspend mechanics has
gotten a bit
> >> rusty over the last few months, so I may be off-base here. But I
> >> thought that xs_suspend masked watches until xs_reusume or
> >> xs_suspend_cancel? If that is the case, isn''t that on its
own enough
> >> to protect netfront?
> > 
> > If that''s the case, it should be enough, yes.
> > 
> > Do watches that would have fired during that time fire at the end of
> > masked period?  If not, we''d still have to access xenstore on
suspend
> > cancel to check if we''ve missed a change.
> 
> I don''t think watches get lost across suspend-cancel, just
delayed.
OK, I''ll try and produce a patch based on the assumptions that (i)
watches won''t fire between suspend and either resume or suspend-cancel;
(ii) any that would have fired are triggered after a suspend-cancel; and
(iii) the watch must be refreshed on a resume (with no guarantees about
lost watches).

Kieran


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kieran Mansley

2009-Apr-07 09:26 UTC

head link

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

On Thu, 2009-04-02 at 10:13 +0100, Kieran Mansley wrote:> OK, I''ll try and produce a patch based on the assumptions that (i)
> watches won''t fire between suspend and either resume or
suspend-cancel;
> (ii) any that would have fired are triggered after a suspend-cancel; and
> (iii) the watch must be refreshed on a resume (with no guarantees about
> lost watches).
OK, here''s the patch.

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>

Keir: I''ll leave it up to you as to whether this is a bug fix (and so
should be included in 3.4) or just queued for unstable after 3.4 is
released.  Either is fine with us.

Thanks

Kieran 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Mar 2009 - [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

[Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume

Re: [Xen-devel] [PATCH] Skip vcpu_hotplug for VCPU 0 in smp_resume