thr3ads.net - Xen devel - [Xen-devel] cs:21768 causes guest spend more time on boot up [Jul 2010]


Zhang, Yang Z

2010-Jul-15 08:07 UTC

[Xen-devel] cs:21768 causes guest spend more time on boot up

Hi Tim
In our recent nightly tests, we found that guests take longer to boot
up. After investigating, we found that RHEL 5.3 and RHEL 5.5 guests stop
at "start udev" for a long time during boot, and that cs:21768 causes this
issue. Have you seen the same problem?

best regards
yang



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2010-Jul-15 09:48 UTC


[Xen-devel] Re: cs:21768 causes guest spend more time on boot up

Hi, 

At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
>          In our recently nightly test, we find guest will cost more
> time to boot up. After our investigation, we find that for rhel5u3 and
> rh5u5 guest it will stop at "start udev " for long time when boot
> up. And we find cs:21768 will cause this issue. Do you meet the same
> problem?

Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
seconds, but that's not unusual.  I'll try RHEL 5.3.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


Tim Deegan

2010-Jul-15 12:22 UTC


[Xen-devel] Re: cs:21768 causes guest spend more time on boot up

At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> >          In our recently nightly test, we find guest will cost more
> > time to boot up. After our investigation, we find that for rhel5u3 and
> > rh5u5 guest it will stop at "start udev " for long time when boot
> > up. And we find cs:21768 will cause this issue. Do you meet the same
> > problem?
> 
> Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> seconds, but that's not unusual.  I'll try RHEL 5.3.

I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
by the size of the SMBIOS tables - reverting the part of this cset that
adds a type 11 object fixes the boot time; then just making some of the
other SMBIOS strings longer causes it to hang again. I wonder whether
we're running into some other BIOS data structure around 0xEB180.

Tim.


Zhang, Yang Z

2010-Jul-15 15:14 UTC


[Xen-devel] RE: cs:21768 causes guest spend more time on boot up

Is type 11 data structure cover 0xEB180 ? where is the start address of type 11
structure? and the max length ?


best regards
yang



Tim Deegan

2010-Jul-15 15:20 UTC


[Xen-devel] Re: cs:21768 causes guest spend more time on boot up

At 16:14 +0100 on 15 Jul (1279210449), Zhang, Yang Z wrote:
> Is type 11 data structure cover 0xEB180 ? where is the start address
> of type 11 structure? and the max length ?

No, the type-11 table is in the middle of the other SMBIOS tables.  The
SMBIOS tables cover 0xEB000 -- 0xEB171 before my patch; after the patch
they go a little further.  Extending them to cover 0xEB000 -- 0xEB187
(just by making existing strings a little longer, not adding a type-11
table) makes CentOS 5.5 x64 hang on boot.

Cheers,

Tim.
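To make the address arithmetic in this exchange concrete, here is a small C sketch that treats the two quoted ranges as inclusive [start, end] intervals (an assumption; the thread does not say whether the end addresses are inclusive) and checks them against 0xEB180, the address Tim wondered about. The names `struct range`, `span_bytes()` and `covers()` are hypothetical helpers, not hvmloader code.

```c
#include <assert.h>
#include <stdint.h>

/* Address Tim suspected another BIOS structure might occupy. */
#define SUSPECT_ADDR 0xEB180u

/* Hypothetical helper type: an inclusive [start, end] address range. */
struct range { uint32_t start, end; };

/* Number of bytes in an inclusive range. */
static uint32_t span_bytes(struct range r)
{
    assert(r.end >= r.start);
    return r.end - r.start + 1;
}

/* Nonzero if addr lies inside the inclusive range. */
static int covers(struct range r, uint32_t addr)
{
    return addr >= r.start && addr <= r.end;
}
```

With those assumptions, the layout that boots normally (0xEB000 to 0xEB171) spans 370 bytes and stops short of 0xEB180, while the layout that hangs (0xEB000 to 0xEB187) spans 392 bytes and crosses it, which is why that address looked suspicious at this point in the thread.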

Zhang, Yang Z

2010-Jul-15 15:30 UTC


[Xen-devel] RE: cs:21768 causes guest spend more time on boot up

OK, I get it. Will you work on this issue? It has blocked our nightly tests;
we hope it can be fixed as soon as possible.

best regards
yang



Tim Deegan

2010-Jul-15 15:33 UTC


[Xen-devel] Re: cs:21768 causes guest spend more time on boot up

At 16:30 +0100 on 15 Jul (1279211440), Zhang, Yang Z wrote:
> ok, I get it. Will you work on this issue ? it has block our nightly
> test. Hope will be fixed as soon as possible.

Yes, I am still looking into it.  I think the address of the table is
not causing it, though, so it might take some time to find the real
cause.

Cheers,

Tim.

Tim Deegan

2010-Jul-19 10:55 UTC


[Xen-devel] [PATCH] Re: cs:21768 causes guest spend more time on boot up

At 16:33 +0100 on 15 Jul (1279211635), Tim Deegan wrote:
> At 16:30 +0100 on 15 Jul (1279211440), Zhang, Yang Z wrote:
> > ok, I get it. Will you work on this issue ? it has block our nightly
> > test. Hope will be fixed as soon as possible.
> 
> Yes, I am still looking into it.  I think the address of the table is
> not causing it, though, so it might take some time to find the real
> cause.

The hang turned out to be entirely unrelated to the SMBIOS tables; the
xenbus client zeroes out the xenstore ring entirely, and it looks like
newer dom0 xenbus backends can't handle that, so:


hvmloader: don't zero out the xenbus page.  
Not all xenbus backends accept that gracefully.  Instead rely on the
xenbus frontend in the guest being able to start up with non-zero ring
offsets. 

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Mon Jul 19 11:51:11 2010 +0100
@@ -53,14 +53,14 @@
            (unsigned long) rings, (unsigned long) event);
 }
 
-/* Reset the xenbus connection so the next kernel can start again. 
- * We zero out the whole ring -- the backend can handle this, and it's
- * not going to surprise any frontends since it's equivalent to never 
- * having used the rings. */
+/* Drop the Xenbus connection, leaving it ready for the next user. 
+ * There should be no messages on the ring but make sure the rsp 
+ * consumer is up to date just in case. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
-    memset(rings, 0, sizeof *rings);
+    ASSERT(rings->req_cons == rings->req_prod);
+    rings->rsp_cons = rings->rsp_prod;
     rings = NULL;
 }
 

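The difference between the original xenbus_shutdown() and this patch can be modelled in plain C. The ring layout below is modelled on `struct xenstore_domain_interface` from Xen's public `io/xs_wire.h` header of this era (treat the exact layout as an assumption), and `shutdown_zero()` / `shutdown_drain()` are hypothetical names for the old and new behaviours, not functions in hvmloader. The point is that the ring indices are free-running counters: what matters to the next user is `prod - cons`, not their absolute values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XENSTORE_RING_SIZE 1024
typedef uint32_t XENSTORE_RING_IDX;

/* Shared page between a domain's xenbus frontend and xenstored. */
struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE];            /* frontend -> xenstored */
    char rsp[XENSTORE_RING_SIZE];            /* xenstored -> frontend */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

/* Request bytes queued but not yet consumed by the backend.
 * Unsigned subtraction stays correct across index wraparound. */
static uint32_t req_pending(const struct xenstore_domain_interface *r)
{
    return r->req_prod - r->req_cons;
}

/* Response bytes the frontend has not yet read. */
static uint32_t rsp_pending(const struct xenstore_domain_interface *r)
{
    return r->rsp_prod - r->rsp_cons;
}

/* Old behaviour: wipe the whole page, indices included. */
static void shutdown_zero(struct xenstore_domain_interface *r)
{
    memset(r, 0, sizeof *r);
}

/* Patched behaviour: keep the indices, just discard unread responses. */
static void shutdown_drain(struct xenstore_domain_interface *r)
{
    assert(r->req_cons == r->req_prod);   /* no request still in flight */
    r->rsp_cons = r->rsp_prod;
}
```

For example, after `shutdown_drain()` on a ring whose request indices are both 300 and whose response indices are 280/270, both pending counts are 0 even though the indices are not. A frontend that computes `prod - cons` rather than assuming zero therefore starts with an empty ring, which is the frontend property the commit message relies on.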

Keir Fraser

2010-Jul-19 11:46 UTC


[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up

On 19/07/2010 11:55, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
>> Yes, I am still looking into it.  I think the address of the table is
>> not causing it, though, so it might take some time to find the real
>> cause.
> 
> The hang turned out to be entirely unrelated to the SMBIOS tables; the
> xenbus client zeroes out the xenstore ring entirely, and it looks like
> newer dom0 xenbus backends can't handle that, so:

What would it have to do with an in-kernel driver? Doesn't the comms page
only get looked at by [o]xenstored? In which case we could fix them.

 -- Keir
> hvmloader: don't zero out the xenbus page.
> Not all xenbus backends accept that gracefully.  Instead rely on the
> xenbus frontend in the guest being able to start up with non-zero ring
> offsets. 
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>



Tim Deegan

2010-Jul-19 12:19 UTC


[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up

At 12:46 +0100 on 19 Jul (1279543585), Keir Fraser wrote:
> On 19/07/2010 11:55, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> 
> >> Yes, I am still looking into it.  I think the address of the table is
> >> not causing it, though, so it might take some time to find the real
> >> cause.
> > 
> > The hang turned out to be entirely unrelated to the SMBIOS tables; the
> > xenbus client zeroes out the xenstore ring entirely, and it looks like
> > newer dom0 xenbus backends can't handle that, so:
> 
> What would it have to do with an in-kernel driver? Doesn't the comms page
> only get looked at by [o]xenstored? In which case we could fix them.

Ah, so it does.  I assumed it was the kernel because that's all that
changed on my test box since I last tested this stuff.  I'm using the C
xenstored and its code looks like it should work fine with the page
getting zeroed under its feet.  I'll dig further.

Cheers,

Tim.

Keir Fraser

2010-Jul-19 12:24 UTC


[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up

On 19/07/2010 13:19, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
>>> The hang turned out to be entirely unrelated to the SMBIOS tables; the
>>> xenbus client zeroes out the xenstore ring entirely, and it looks like
>>> newer dom0 xenbus backends can't handle that, so:
>> 
>> What would it have to do with an in-kernel driver? Doesn't the comms page
>> only get looked at by [o]xenstored? In which case we could fix them.
> 
> Ah, so it does.  I assumed it was the kernel because that's all that
> changed on my test box since I last tested this stuff.  I'm using the C
> xenstored and its code looks like it should work fine with the page
> getting zeroed under its feet.  I'll dig further.

Thanks. The revised patch might be acceptable, but if possible we're better
off relying on undocumented xenstored behaviour than domU frontend behaviour
since the former we have full control over (albeit we now have two daemons
to consider). Really nice would be some kind of explicit reset command or
protocol, but in this context since there are no watches or pending requests
or anything, I guess whacking the xenstore page is sufficient engineering
effort.

 -- Keir




Keir Fraser

2010-Jul-19 12:30 UTC


Re: [Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up

On 19/07/2010 13:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> Ah, so it does.  I assumed it was the kernel because that's all that
>> changed on my test box since I last tested this stuff.  I'm using the C
>> xenstored and its code looks like it should work fine with the page
>> getting zeroed under its feet.  I'll dig further.
> 
> Thanks. The revised patch might be acceptable, but if possible we're better
> off relying on undocumented xenstored behaviour than domU frontend behaviour
> since the former we have full control over (albeit we now have two daemons
> to consider). Really nice would be some kind of explicit reset command or
> protocol, but in this context since there are no watches or pending requests
> or anything, I guess whacking the xenstore page is sufficient engineering
> effort.

And, yes, I agree it looks like the existing code should just work. The C
xenstored, at least, doesn't cache ring indexes.

 -- Keir
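Keir's point about index caching can be illustrated with a toy model. In the sketch below (hypothetical names, not xenstored code), one backend re-reads `req_cons` and `req_prod` from the shared page every time, and another keeps a private copy of its consumer index. Because the indices are free-running 32-bit counters, zeroing the page under a caching backend makes `prod - cons` wrap to a huge value, whereas a backend that re-reads both indices simply sees an empty ring.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t RING_IDX;

/* Hypothetical stand-in for the shared page: just the request indices. */
struct shared_idx {
    RING_IDX req_cons, req_prod;
};

/* Backend that re-reads both indices from the shared page each time
 * (the behaviour Keir describes for the C xenstored). */
static uint32_t pending_uncached(const struct shared_idx *s)
{
    return s->req_prod - s->req_cons;
}

/* Hypothetical backend that keeps its own private copy of req_cons. */
struct cached_backend {
    RING_IDX my_cons;
};

/* Pending bytes as seen by the caching backend; wraps if the shared
 * producer index is reset below the private consumer copy. */
static uint32_t pending_cached(const struct cached_backend *b,
                               const struct shared_idx *s)
{
    return s->req_prod - b->my_cons;
}
```

With the private copy left at 100 after the page is zeroed, `pending_cached()` returns 2^32 - 100 = 4294967196, which a real backend would presumably reject or misinterpret; `pending_uncached()` returns 0.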




Tim Deegan

2010-Jul-20 10:38 UTC


[Xen-devel] [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

Here's another patch that also unsticks CentOS 5.5 boot for me, and
seems safer and saner (even if it turns out that the bug is somewhere
else and I'm just perturbing the inputs to some more complex system...)

Cheers,

Tim.

hvmloader: clear the xenbus event-channel when we're done with it.
Otherwise a later xenbus client that naively waits for the rising edge
could get stuck.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
--- a/tools/firmware/hvmloader/util.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.c	Tue Jul 20 11:34:06 2010 +0100
@@ -587,10 +587,28 @@
     return table;
 }
 
+struct shared_info *get_shared_info(void) 
+{
+    static struct shared_info *shared_info = NULL;
+    struct xen_add_to_physmap xatp;
+
+    if ( shared_info == NULL )
+    {
+        /* Map shared-info page. */
+        xatp.domid = DOMID_SELF;
+        xatp.space = XENMAPSPACE_shared_info;
+        xatp.idx   = 0;
+        xatp.gpfn  = 0xfffff;
+        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+            BUG();
+        shared_info = (struct shared_info *) (xatp.gpfn << 12);
+    }
+    return shared_info;
+}
+
 uint16_t get_cpu_mhz(void)
 {
-    struct xen_add_to_physmap xatp;
-    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
+    struct shared_info *shared_info = get_shared_info();
     struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
     uint64_t cpu_khz;
     uint32_t tsc_to_nsec_mul, version;
@@ -599,14 +617,6 @@
     static uint16_t cpu_mhz;
     if ( cpu_mhz != 0 )
         return cpu_mhz;
-
-    /* Map shared-info page. */
-    xatp.domid = DOMID_SELF;
-    xatp.space = XENMAPSPACE_shared_info;
-    xatp.idx   = 0;
-    xatp.gpfn  = (unsigned long)shared_info >> 12;
-    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-        BUG();
 
     /* Get a consistent snapshot of scale factor (multiplier and shift). */
     do {
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
--- a/tools/firmware/hvmloader/util.h	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.h	Tue Jul 20 11:34:06 2010 +0100
@@ -68,6 +68,9 @@
 #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t) val))
 #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2, (uint16_t)val))
 #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4, (uint32_t)val))
+
+/* Get a pointer to the shared-info page */
+struct shared_info *get_shared_info(void);
 
 /* Get CPU speed in MHz. */
 uint16_t get_cpu_mhz(void);
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Tue Jul 20 11:34:06 2010 +0100
@@ -53,14 +53,20 @@
            (unsigned long) rings, (unsigned long) event);
 }
 
-/* Reset the xenbus connection so the next kernel can start again. 
- * We zero out the whole ring -- the backend can handle this, and it's
- * not going to surprise any frontends since it's equivalent to never 
- * having used the rings. */
+/* Reset the xenbus connection so the next kernel can start again. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
+
+    /* We zero out the whole ring -- the backend can handle this, and it's
+     * not going to surprise any frontends since it's equivalent to never
+     * having used the rings. */
     memset(rings, 0, sizeof *rings);
+
+    /* Clear the xenbus event-channel too */
+    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
+        &= ~(1UL << ((event % sizeof (unsigned long))));    
+
     rings = NULL;
 }
 

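One detail in the v2 patch as posted: it derives the word index and bit position from `sizeof (unsigned long)`, which is a size in bytes, whereas the event-channel pending bitmap has one bit per port. The word should be `port / (bits per word)` and the bit `port % (bits per word)`; the two formulas agree only when the port number is smaller than `sizeof (unsigned long)`. A minimal sketch of bit-based indexing, assuming a plain `unsigned long` array stands in for `shared_info->evtchn_pending` (the helper names are hypothetical):

```c
#include <assert.h>
#include <limits.h>

/* Bits, not bytes, per bitmap word. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Clear the pending bit for event-channel 'port' in a bitmap stored as
 * an array of unsigned long, one bit per port. */
static void clear_evtchn_pending(unsigned long *pending, unsigned int port)
{
    pending[port / BITS_PER_ULONG] &= ~(1UL << (port % BITS_PER_ULONG));
}

/* Return nonzero if 'port' is marked pending. */
static int evtchn_is_pending(const unsigned long *pending, unsigned int port)
{
    return (int)((pending[port / BITS_PER_ULONG] >> (port % BITS_PER_ULONG)) & 1UL);
}
```

Port 9 makes the difference visible: byte-based indexing would touch word 2 on a 32-bit build (word 1 on a 64-bit one), while the pending bit actually lives in word 0.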

Keir Fraser

2010-Jul-20 11:16 UTC


[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

Ah, this is because you poll the ring and never actually use, let alone clear,
the event channel rx port? This looks like a good fix, thanks.

 -- Keir
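The "rising edge" in Tim's commit message and Keir's remark about never clearing the rx port both refer to edge-triggered notification: in the simplified model below, a vCPU is only kicked when a bit goes from 0 to 1. The model chains a per-port pending bit, a mask bit, a per-vCPU selector word and a per-vCPU upcall flag; the field names echo `struct shared_info` and `struct vcpu_info`, but this is an illustrative sketch under those assumptions, not hypervisor source. If any level is left set by hvmloader, a later client waiting for a fresh edge never sees one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS 4   /* hypothetical: room for 4 * BITS_PER_ULONG ports */

/* Simplified, hypothetical model of the edge-triggered state. */
struct evtchn_model {
    unsigned long pending[NR_WORDS]; /* one bit per port */
    unsigned long mask[NR_WORDS];    /* one bit per port */
    unsigned long pending_sel;       /* per-vCPU: one bit per pending[] word */
    bool upcall_pending;             /* per-vCPU upcall flag */
    int upcalls_delivered;           /* bookkeeping: times the vCPU was kicked */
};

/* Set a bit and report whether it was already set. */
static bool test_and_set(unsigned long *word, unsigned int bit)
{
    bool was_set = (*word >> bit) & 1UL;
    *word |= 1UL << bit;
    return was_set;
}

/* Sender side: only a 0 -> 1 transition at each level propagates. */
static void notify(struct evtchn_model *m, unsigned int port)
{
    unsigned int w = port / BITS_PER_ULONG;
    unsigned int b = port % BITS_PER_ULONG;

    assert(w < NR_WORDS);
    if (test_and_set(&m->pending[w], b))
        return;                     /* already pending: no new edge */
    if ((m->mask[w] >> b) & 1UL)
        return;                     /* masked: stays pending silently */
    if (test_and_set(&m->pending_sel, w))
        return;                     /* selector already set: no new edge */
    if (!m->upcall_pending) {
        m->upcall_pending = true;
        m->upcalls_delivered++;     /* the vCPU is kicked */
    }
}

/* Zero all modelled guest-visible state, keeping only the counter. */
static void zap_all(struct evtchn_model *m)
{
    int delivered = m->upcalls_delivered;
    memset(m, 0, sizeof *m);
    m->upcalls_delivered = delivered;
}
```

In this model, clearing only the per-port bit (as the v2 patch does) is not enough if the selector word is still set; the next `notify()` stops at the selector and no upcall arrives. Zeroing all of the modelled state restores delivery, which is the motivation for resetting more than the single bit.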


Zhang, Jianwu

2010-Jul-22 09:01 UTC


[Xen-devel] RE: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

Hi Tim,
	When we set the VCPUS parameter to more than 2 in the guest configuration
file, we hit this issue again. We tried it on cs:21837, with a RHEL 5.3 guest.

Thanks 
Jianwu

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] 
Sent: Tuesday, July 20, 2010 7:17 PM
To: Tim Deegan
Cc: Zhang, Yang Z; xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
Subject: Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

Ah, this is because you poll the ring and never actually use let alone clear
the event channel rx port? This looks like a good fix, thanks.

 -- Keir

On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com>
wrote:
> Here''s another patch that also unsticks CentOS 5.5 boot for me,
and
> seems safer and saner (even if it turns out that the bug is somewhere
> else and I''m just perturbing the inputs to some more complex
system...)
> 
> Cheers,
> 
> Tim.
> 
> hvmloader: clear the xenbus event-channel when we''re done with it.
> Otherwise a later xenbus client that naively waits for the rising edge
> could get stuck.
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
> 
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
> --- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
> @@ -587,10 +587,28 @@
>      return table;
>  }
>  
> +struct shared_info *get_shared_info(void)
> +{
> +    static struct shared_info *shared_info = NULL;
> +    struct xen_add_to_physmap xatp;
> +
> +    if ( shared_info == NULL )
> +    {
> +        /* Map shared-info page. */
> +        xatp.domid = DOMID_SELF;
> +        xatp.space = XENMAPSPACE_shared_info;
> +        xatp.idx   = 0;
> +        xatp.gpfn  = 0xfffff;
> +        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> +            BUG();
> +        shared_info = (struct shared_info *) (xatp.gpfn << 12);
> +    }
> +    return shared_info;
> +}
> +
>  uint16_t get_cpu_mhz(void)
>  {
> -    struct xen_add_to_physmap xatp;
> -    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
> +    struct shared_info *shared_info = get_shared_info();
>      struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
>      uint64_t cpu_khz;
>      uint32_t tsc_to_nsec_mul, version;
> @@ -599,14 +617,6 @@
>      static uint16_t cpu_mhz;
>      if ( cpu_mhz != 0 )
>          return cpu_mhz;
> -
> -    /* Map shared-info page. */
> -    xatp.domid = DOMID_SELF;
> -    xatp.space = XENMAPSPACE_shared_info;
> -    xatp.idx   = 0;
> -    xatp.gpfn  = (unsigned long)shared_info >> 12;
> -    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> -        BUG();
>  
>      /* Get a consistent snapshot of scale factor (multiplier and shift).
*/
>      do {
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
> --- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
> @@ -68,6 +68,9 @@
>  #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t)
val))
>  #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2,
(uint16_t)val))
>  #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4,
(uint32_t)val))
> +
> +/* Get a pointer to the shared-info page */
> +struct shared_info *get_shared_info(void);
>  
>  /* Get CPU speed in MHz. */
>  uint16_t get_cpu_mhz(void);
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
> --- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
> @@ -53,14 +53,20 @@
>             (unsigned long) rings, (unsigned long) event);
>  }
>  
> -/* Reset the xenbus connection so the next kernel can start again.
> - * We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> +/* Reset the xenbus connection so the next kernel can start again. */
>  void xenbus_shutdown(void)
>  {
>      ASSERT(rings != NULL);
> +
> +    /* We zero out the whole ring -- the backend can handle this, and it's
> +     * not going to surprise any frontends since it's equivalent to never
> +     * having used the rings. */
>      memset(rings, 0, sizeof *rings);
> +
> +    /* Clear the xenbus event-channel too */
> +    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
> +        &= ~(1UL << ((event % sizeof (unsigned long))));
> +
>      rings = NULL;
>  }
>  


Keir Fraser

2010-Jul-22 13:20 UTC

[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

I suggest we try zapping the entire shared-info page when hvmloader
finishes. There is nothing in there that is useful to keep across hvmloader
and guest OS; zapping will ensure that other flags with rising-edge
semantics such as per-vcpu evtchn selector words get reset; and doing
anything more than zeroing is pointless since, e.g., the evtchn_mask array
offset and size are dependent on whether the guest OS is 32-bit or 64-bit. If
hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
rearranged shared_info would actually mean that hvmloader has set 1s in part
of the 64-bit extended evtchn_pending array!

Just a thought...

 -- Keir

On 22/07/2010 10:01, "Zhang, Jianwu" <jianwu.zhang@intel.com> wrote:
> Hi Tim
> When we set the VCPUS parameter to more than 2 in the guest config file, we
> hit this issue again. We tried it on cs21837, and the guest OS is rhel5u3.
> 
> Thanks 
> Jianwu
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 20, 2010 7:17 PM
> To: Tim Deegan
> Cc: Zhang, Yang Z; xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
> 
> Ah, this is because you poll the ring and never actually use, let alone
> clear, the event channel rx port? This looks like a good fix, thanks.
> 
>  -- Keir
> 
> On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> 
>> Here's another patch that also unsticks CentOS 5.5 boot for me, and
>> seems safer and saner (even if it turns out that the bug is somewhere
>> else and I'm just perturbing the inputs to some more complex system...)
>> 
>> Cheers,
>> 
>> Tim.
>> 
>> hvmloader: clear the xenbus event-channel when we're done with it.
>> Otherwise a later xenbus client that naively waits for the rising edge
>> could get stuck.
>> 
>> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
>> 
> 
> 


Tim Deegan

2010-Jul-22 13:41 UTC

[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

At 14:20 +0100 on 22 Jul (1279808457), Keir Fraser wrote:
> I suggest we try zapping the entire shared-info page when hvmloader
> finishes. There is nothing in there that is useful to keep across hvmloader
> and guest OS; zapping will ensure that other flags with rising-edge
> semantics such as per-vcpu evtchn selector words get reset; and doing
> anything more than zeroing is pointless since, e.g., the evtchn_mask array
> offset and size are dependent on whether the guest OS is 32-bit or 64-bit. If
> hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
> rearranged shared_info would actually mean that hvmloader has set 1s in part
> of the 64-bit extended evtchn_pending array!
Good point.  That seems to do the trick.

hvmloader: clear the whole shared-info page when shutting down xenbus
since the contents might be in the wrong word-size for later users. 

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r e8dbc1262f52 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Wed Jul 21 09:02:10 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Thu Jul 22 14:39:28 2010 +0100
@@ -63,9 +63,8 @@ void xenbus_shutdown(void)
      * having used the rings. */
     memset(rings, 0, sizeof *rings);
 
-    /* Clear the xenbus event-channel too */
-    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
-        &= ~(1UL << ((event % sizeof (unsigned long))));    
+    /* Clear the event-channel state too. */
+    memset(get_shared_info(), 0, PAGE_SIZE);
 
     rings = NULL;
 }
