Zhang, Yang Z
2010-Jul-15 08:07 UTC
[Xen-devel] cs:21768 causes guest spend more time on boot up
Hi Tim,

In our recent nightly tests we found that guests take longer to boot. After investigating, we found that rhel5u3 and rhel5u5 guests stall at "start udev" for a long time during boot, and that cs:21768 causes this. Do you see the same problem?

best regards
yang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2010-Jul-15 09:48 UTC
[Xen-devel] Re: cs:21768 causes guest spend more time on boot up
Hi,

At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> In our recently nightly test, we find guest will cost more
> time to boot up. After our investigation, we find that for rhel5u3 and
> rh5u5 guest it will stop at "start udev" for long time when boot
> up. And we find cs:21768 will cause this issue. Do you meet the same
> problem?

Nope, works fine for me[tm]. RHEL 5.5 stops at start_udev for about 5 seconds, but that's not unusual. I'll try RHEL 5.3.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Tim Deegan
2010-Jul-15 12:22 UTC
[Xen-devel] Re: cs:21768 causes guest spend more time on boot up
At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> Nope, works fine for me[tm]. RHEL 5.5 stops at start_udev for about 5
> seconds, but that's not unusual. I'll try RHEL 5.3.

I've reproduced this slowdown on CentOS 5.5 x64. It seems to be caused by the size of the SMBIOS tables: reverting the part of this cset that adds a type 11 object fixes the boot time; then just making some of the other SMBIOS strings longer causes it to hang again. I wonder whether we're running into some other BIOS data structure around 0xEB180.

Tim.
Zhang, Yang Z
2010-Jul-15 15:14 UTC
[Xen-devel] RE: cs:21768 causes guest spend more time on boot up
Does the type 11 data structure cover 0xEB180? Where is the start address of the type 11 structure, and what is its maximum length?

best regards
yang
Tim Deegan
2010-Jul-15 15:20 UTC
[Xen-devel] Re: cs:21768 causes guest spend more time on boot up
At 16:14 +0100 on 15 Jul (1279210449), Zhang, Yang Z wrote:
> Is type 11 data structure cover 0xEB180 ? where is the start address
> of type 11 structure? and the max length ?

No, the type-11 table is in the middle of the other SMBIOS tables. The SMBIOS tables cover 0xEB000 -- 0xEB171 before my patch; after the patch they go a little further. Extending them to cover 0xEB000 -- 0xEB187 (just by making existing strings a little longer, not adding a type-11 table) makes CentOS 5.5 x64 hang on boot.

Cheers,

Tim.
Zhang, Yang Z
2010-Jul-15 15:30 UTC
[Xen-devel] RE: cs:21768 causes guest spend more time on boot up
OK, I get it. Will you work on this issue? It is blocking our nightly tests, so we hope it can be fixed as soon as possible.

best regards
yang
Tim Deegan
2010-Jul-15 15:33 UTC
[Xen-devel] Re: cs:21768 causes guest spend more time on boot up
At 16:30 +0100 on 15 Jul (1279211440), Zhang, Yang Z wrote:
> ok, I get it. Will you work on this issue ? it has block our nightly
> test. Hope will be fixed as soon as possible.

Yes, I am still looking into it. I think the address of the table is not causing it, though, so it might take some time to find the real cause.

Cheers,

Tim.
Tim Deegan
2010-Jul-19 10:55 UTC
[Xen-devel] [PATCH] Re: cs:21768 causes guest spend more time on boot up
At 16:33 +0100 on 15 Jul (1279211635), Tim Deegan wrote:
> Yes, I am still looking into it. I think the address of the table is
> not causing it, though, so it might take some time to find the real
> cause.

The hang turned out to be entirely unrelated to the SMBIOS tables; the xenbus client zeroes out the xenstore ring entirely, and it looks like newer dom0 xenbus backends can't handle that, so:

hvmloader: don't zero out the xenbus page.
Not all xenbus backends accept that gracefully. Instead rely on the
xenbus frontend in the guest being able to start up with non-zero ring
offsets.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c Mon Jul 19 11:51:11 2010 +0100
@@ -53,14 +53,14 @@
            (unsigned long) rings, (unsigned long) event);
 }

-/* Reset the xenbus connection so the next kernel can start again.
- * We zero out the whole ring -- the backend can handle this, and it's
- * not going to surprise any frontends since it's equivalent to never
- * having used the rings. */
+/* Drop the Xenbus connection, leaving it ready for the next user.
+ * There should be no messages on the ring but make sure the rsp
+ * consumer is up to date just in case. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
-    memset(rings, 0, sizeof *rings);
+    ASSERT(rings->req_cons == rings->req_prod);
+    rings->rsp_cons = rings->rsp_prod;
     rings = NULL;
 }

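The two shutdown strategies contrasted in this patch can be modelled in isolation. The following is a minimal sketch, not hvmloader code: `struct demo_ring` and the `demo_*` functions are hypothetical names, and the struct keeps only the four indexes the patch touches (the real xenstore interface page also carries the ring buffers themselves).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ring: only the four indexes used in the patch. */
struct demo_ring {
    uint32_t req_cons, req_prod;   /* request direction  */
    uint32_t rsp_cons, rsp_prod;   /* response direction */
};

/* Old behaviour: wipe the page, so every index restarts from zero. */
void demo_shutdown_by_zeroing(struct demo_ring *r)
{
    memset(r, 0, sizeof *r);
}

/* Patched behaviour: keep the indexes, only mark all responses consumed. */
void demo_shutdown_by_catching_up(struct demo_ring *r)
{
    assert(r->req_cons == r->req_prod);   /* no request still in flight */
    r->rsp_cons = r->rsp_prod;
}

/* A ring is idle when both directions have consumer == producer. */
int demo_ring_is_idle(const struct demo_ring *r)
{
    return r->req_cons == r->req_prod && r->rsp_cons == r->rsp_prod;
}
```

Both functions leave the ring idle; the difference is that the second preserves whatever non-zero offsets the previous user reached, which is why the commit message says it relies on the guest frontend coping with non-zero ring offsets.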
Keir Fraser
2010-Jul-19 11:46 UTC
[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
On 19/07/2010 11:55, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> The hang turned out to be entirely unrelated to the SMBIOS tables; the
> xenbus client zeroes out the xenstore ring entirely, and it looks like
> newer dom0 xenbus backends can't handle that, so:

What would it have to do with an in-kernel driver? Doesn't the comms page only get looked at by [o]xenstored? In which case we could fix them.

 -- Keir
Tim Deegan
2010-Jul-19 12:19 UTC
[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
At 12:46 +0100 on 19 Jul (1279543585), Keir Fraser wrote:
> What would it have to do with an in-kernel driver? Doesn't the comms page
> only get looked at by [o]xenstored? In which case we could fix them.

Ah, so it does. I assumed it was the kernel because that's all that changed on my test box since I last tested this stuff. I'm using the C xenstored and its code looks like it should work fine with the page getting zeroed under its feet. I'll dig further.

Cheers,

Tim.
Keir Fraser
2010-Jul-19 12:24 UTC
[Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
On 19/07/2010 13:19, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> Ah, so it does. I assumed it was the kernel because that's all that
> changed on my test box since I last tested this stuff. I'm using the C
> xenstored and its code looks like it should work fine with the page
> getting zeroed under its feet. I'll dig further.

Thanks. The revised patch might be acceptable, but if possible we're better off relying on undocumented xenstored behaviour than domU frontend behaviour since the former we have full control over (albeit we now have two daemons to consider). Really nice would be some kind of explicit reset command or protocol, but in this context since there are no watches or pending requests or anything, I guess whacking the xenstore page is sufficient engineering effort.

 -- Keir
Keir Fraser
2010-Jul-19 12:30 UTC
Re: [Xen-devel] Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
On 19/07/2010 13:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> I'm using the C xenstored and its code looks like it should work fine
>> with the page getting zeroed under its feet. I'll dig further.

And, yes, I agree it looks like the existing code should just work. The C xenstored, at least, doesn't cache ring indexes.

 -- Keir
Tim Deegan
2010-Jul-20 10:38 UTC
[Xen-devel] [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
Here's another patch that also unsticks CentOS 5.5 boot for me, and seems safer and saner (even if it turns out that the bug is somewhere else and I'm just perturbing the inputs to some more complex system...)

Cheers,

Tim.

hvmloader: clear the xenbus event-channel when we're done with it.
Otherwise a later xenbus client that naively waits for the rising edge
could get stuck.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
--- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
@@ -587,10 +587,28 @@
     return table;
 }

+struct shared_info *get_shared_info(void)
+{
+    static struct shared_info *shared_info = NULL;
+    struct xen_add_to_physmap xatp;
+
+    if ( shared_info == NULL )
+    {
+        /* Map shared-info page. */
+        xatp.domid = DOMID_SELF;
+        xatp.space = XENMAPSPACE_shared_info;
+        xatp.idx = 0;
+        xatp.gpfn = 0xfffff;
+        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+            BUG();
+        shared_info = (struct shared_info *) (xatp.gpfn << 12);
+    }
+    return shared_info;
+}
+
 uint16_t get_cpu_mhz(void)
 {
-    struct xen_add_to_physmap xatp;
-    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
+    struct shared_info *shared_info = get_shared_info();
     struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
     uint64_t cpu_khz;
     uint32_t tsc_to_nsec_mul, version;
@@ -599,14 +617,6 @@
     static uint16_t cpu_mhz;
     if ( cpu_mhz != 0 )
         return cpu_mhz;
-
-    /* Map shared-info page. */
-    xatp.domid = DOMID_SELF;
-    xatp.space = XENMAPSPACE_shared_info;
-    xatp.idx = 0;
-    xatp.gpfn = (unsigned long)shared_info >> 12;
-    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-        BUG();

     /* Get a consistent snapshot of scale factor (multiplier and shift). */
     do {
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
--- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
@@ -68,6 +68,9 @@
 #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t) val))
 #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2, (uint16_t)val))
 #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4, (uint32_t)val))
+
+/* Get a pointer to the shared-info page */
+struct shared_info *get_shared_info(void);

 /* Get CPU speed in MHz. */
 uint16_t get_cpu_mhz(void);
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
@@ -53,14 +53,20 @@
            (unsigned long) rings, (unsigned long) event);
 }

-/* Reset the xenbus connection so the next kernel can start again.
- * We zero out the whole ring -- the backend can handle this, and it's
- * not going to surprise any frontends since it's equivalent to never
- * having used the rings. */
+/* Reset the xenbus connection so the next kernel can start again. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
+
+    /* We zero out the whole ring -- the backend can handle this, and it's
+     * not going to surprise any frontends since it's equivalent to never
+     * having used the rings. */
     memset(rings, 0, sizeof *rings);
+
+    /* Clear the xenbus event-channel too */
+    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
+        &= ~(1UL << ((event % sizeof (unsigned long))));
+
     rings = NULL;
 }

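Clearing one port in a pending bitmap of `unsigned long` words comes down to picking a word and a bit within it. The sketch below is an illustration only, with hypothetical `demo_*` names and a plain array standing in for `evtchn_pending`. One assumption is worth stating loudly: it divides by the number of bits per word, whereas the expression in the patch as posted divides by `sizeof (unsigned long)`, which is a byte count. The thread does not discuss that difference, so treat this as the conventional bitmap idiom rather than a statement about what was eventually committed.

```c
#include <assert.h>
#include <limits.h>

/* Bits held by one bitmap word (32 or 64 depending on the build). */
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Clear the bit for 'port' in a bitmap made of unsigned long words. */
void demo_clear_pending(unsigned long *pending, unsigned int port)
{
    pending[port / DEMO_BITS_PER_WORD] &=
        ~(1UL << (port % DEMO_BITS_PER_WORD));
}

/* Return 1 if the bit for 'port' is set, 0 otherwise. */
int demo_test_pending(const unsigned long *pending, unsigned int port)
{
    return (int)((pending[port / DEMO_BITS_PER_WORD]
                  >> (port % DEMO_BITS_PER_WORD)) & 1UL);
}
```

With the bit-width divisor, every port below `DEMO_BITS_PER_WORD` lands in word 0 and only the named bit changes; neighbouring ports are left alone.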
Keir Fraser
2010-Jul-20 11:16 UTC
[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
Ah, this is because you poll the ring and never actually use, let alone clear, the event channel rx port? This looks like a good fix, thanks.

 -- Keir

On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> Here's another patch that also unsticks CentOS 5.5 boot for me, and
> seems safer and saner (even if it turns out that the bug is somewhere
> else and I'm just perturbing the inputs to some more complex system...)
>
> hvmloader: clear the xenbus event-channel when we're done with it.
> Otherwise a later xenbus client that naively waits for the rising edge
> could get stuck.
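Why a stale pending bit can strand a later client is easier to see in a toy model. The sketch below assumes, as a simplification, that a wake-up is delivered only when the pending bit goes from 0 to 1; `struct demo_port` and the `demo_*` functions are hypothetical and do not correspond to any Xen interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one event-channel port. */
struct demo_port {
    bool pending;          /* sticky pending bit                  */
    unsigned int upcalls;  /* wake-ups delivered to the receiver  */
};

/* Sender side: a wake-up is delivered only on a 0 -> 1 transition. */
void demo_notify(struct demo_port *p)
{
    if (!p->pending) {
        p->pending = true;
        p->upcalls++;
    }
}

/* What a client that is done with the port should do before handing over. */
void demo_clear(struct demo_port *p)
{
    p->pending = false;
}
```

A client that only polls the ring never looks at `pending`, so after its first notification the bit stays set. The next client then sees no rising edge and no wake-up until something calls `demo_clear`, which is the situation the v2 patch description refers to.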
Zhang, Jianwu
2010-Jul-22 09:01 UTC
[Xen-devel] RE: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
Hi Tim,

When we set the VCPUS parameter to more than 2 in the guest configuration file, we hit this issue again. We tried it on cs21837, and the guest OS is rhel5u3.

Thanks
Jianwu
Keir Fraser
2010-Jul-22 13:20 UTC
[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
I suggest to try zapping the entire shared-info page when hvmloader
finishes. There is nothing in there that is useful to keep across hvmloader
and guest OS; zapping will ensure that other flags with rising-edge
semantics such as per-vcpu evtchn selector words get reset; and doing
anything more than zeroing is pointless since e.g., the evtchn_mask array
offset and size is dependent on whether the guest OS is 32-bit or 64-bit. If
hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
rearranged shared_info would actually mean that hvmloader has set 1s in part
of the 64-bit extended evtchn_pending array!

Just a thought...

 -- Keir

On 22/07/2010 10:01, "Zhang, Jianwu" <jianwu.zhang@intel.com> wrote:

> Hi Tim
> When we set the VCPUS parameter more than 2 in guest configure file, We meet
> this issue again. We try it on cs21837, and the guest os is rhel5u3.
>
> Thanks
> Jianwu
>
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 20, 2010 7:17 PM
> To: Tim Deegan
> Cc: Zhang, Yang Z; xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
>
> Ah, this is because you poll the ring and never actually use let alone clear
> the event channel rx port? This looks like a good fix, thanks.
>
> -- Keir
>
> On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
>
>> Here's another patch that also unsticks CentOS 5.5 boot for me, and
>> seems safer and saner (even if it turns out that the bug is somewhere
>> else and I'm just perturbing the inputs to some more complex system...)
>>
>> Cheers,
>>
>> Tim.
>>
>> hvmloader: clear the xenbus event-channel when we're done with it.
>> Otherwise a later xenbus client that naively waits for the rising edge
>> could get stuck.
>>
>> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
>>
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
>> --- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
>> @@ -587,10 +587,28 @@
>>      return table;
>>  }
>>
>> +struct shared_info *get_shared_info(void)
>> +{
>> +    static struct shared_info *shared_info = NULL;
>> +    struct xen_add_to_physmap xatp;
>> +
>> +    if ( shared_info == NULL )
>> +    {
>> +        /* Map shared-info page. */
>> +        xatp.domid = DOMID_SELF;
>> +        xatp.space = XENMAPSPACE_shared_info;
>> +        xatp.idx = 0;
>> +        xatp.gpfn = 0xfffff;
>> +        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>> +            BUG();
>> +        shared_info = (struct shared_info *) (xatp.gpfn << 12);
>> +    }
>> +    return shared_info;
>> +}
>> +
>>  uint16_t get_cpu_mhz(void)
>>  {
>> -    struct xen_add_to_physmap xatp;
>> -    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
>> +    struct shared_info *shared_info = get_shared_info();
>>      struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
>>      uint64_t cpu_khz;
>>      uint32_t tsc_to_nsec_mul, version;
>> @@ -599,14 +617,6 @@
>>      static uint16_t cpu_mhz;
>>      if ( cpu_mhz != 0 )
>>          return cpu_mhz;
>> -
>> -    /* Map shared-info page. */
>> -    xatp.domid = DOMID_SELF;
>> -    xatp.space = XENMAPSPACE_shared_info;
>> -    xatp.idx = 0;
>> -    xatp.gpfn = (unsigned long)shared_info >> 12;
>> -    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>> -        BUG();
>>
>>      /* Get a consistent snapshot of scale factor (multiplier and shift). */
>>      do {
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
>> --- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
>> @@ -68,6 +68,9 @@
>>  #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t)
>> val))
>>  #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2,
>> (uint16_t)val))
>>  #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4,
>> (uint32_t)val))
>> +
>> +/* Get a pointer to the shared-info page */
>> +struct shared_info *get_shared_info(void);
>>
>>  /* Get CPU speed in MHz. */
>>  uint16_t get_cpu_mhz(void);
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
>> --- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
>> @@ -53,14 +53,20 @@
>>          (unsigned long) rings, (unsigned long) event);
>>  }
>>
>> -/* Reset the xenbus connection so the next kernel can start again.
>> - * We zero out the whole ring -- the backend can handle this, and it's
>> - * not going to surprise any frontends since it's equivalent to never
>> - * having used the rings. */
>> +/* Reset the xenbus connection so the next kernel can start again. */
>>  void xenbus_shutdown(void)
>>  {
>>      ASSERT(rings != NULL);
>> +
>> +    /* We zero out the whole ring -- the backend can handle this, and it's
>> +     * not going to surprise any frontends since it's equivalent to never
>> +     * having used the rings. */
>>      memset(rings, 0, sizeof *rings);
>> +
>> +    /* Clear the xenbus event-channel too */
>> +    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
>> +        &= ~(1UL << ((event % sizeof (unsigned long))));
>> +
>>      rings = NULL;
>>  }
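A side note on the quoted v2 hunk in xenbus.c: the event-channel pending array is a bitmap indexed by port number, so the word index and bit position are normally derived from the number of bits per word, whereas sizeof (unsigned long) gives the byte count. The following is only a minimal standalone sketch of that indexing; BITS_PER_LONG, clear_port_bit and test_port_bit are hypothetical names chosen for illustration and are not taken from hvmloader.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper names for illustration; not part of hvmloader. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Clear bit `port` in a bitmap stored as an array of unsigned long.
 * The word index and the bit position both use bits per word. */
static void clear_port_bit(unsigned long *bitmap, unsigned int port)
{
    bitmap[port / BITS_PER_LONG] &= ~(1UL << (port % BITS_PER_LONG));
}

/* Return non-zero if bit `port` is set in the bitmap. */
static int test_port_bit(const unsigned long *bitmap, unsigned int port)
{
    return (int)((bitmap[port / BITS_PER_LONG] >> (port % BITS_PER_LONG)) & 1UL);
}
```

With a byte-count divisor and a 4-byte unsigned long, ports 0 to 3 still land on the intended bit, but any port from 4 upwards would touch a different word or bit than the one the port occupies.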
Tim Deegan
2010-Jul-22 13:41 UTC
[Xen-devel] Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
At 14:20 +0100 on 22 Jul (1279808457), Keir Fraser wrote:
> I suggest to try zapping the entire shared-info page when hvmloader
> finishes. There is nothing in there that is useful to keep across hvmloader
> and guest OS; zapping will ensure that other flags with rising-edge
> semantics such as per-vcpu evtchn selector words get reset; and doing
> anything more than zeroing is pointless since e.g., the evtchn_mask array
> offset and size is dependent on whether the guest OS is 32-bit or 64-bit. If
> hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
> rearranged shared_info would actually mean that hvmloader has set 1s in part
> of the 64-bit extended evtchn_pending array!

Good point. That seems to do the trick.

hvmloader: clear the whole shared-info page when shutting down xenbus
since the contents might be in the wrong word-size for later users.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r e8dbc1262f52 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c Wed Jul 21 09:02:10 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c Thu Jul 22 14:39:28 2010 +0100
@@ -63,9 +63,8 @@ void xenbus_shutdown(void)
      * having used the rings. */
     memset(rings, 0, sizeof *rings);
 
-    /* Clear the xenbus event-channel too */
-    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
-        &= ~(1UL << ((event % sizeof (unsigned long))));
+    /* Clear the event-channel state too. */
+    memset(get_shared_info(), 0, PAGE_SIZE);
 
     rings = NULL;
 }
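The word-size argument quoted above can be checked with a little offset arithmetic. The sketch below assumes a simplified layout: a fixed-size header, then an evtchn_pending bitmap of one word per bit of the word size, then evtchn_mask. The 2048-byte header (HDR_BYTES) and all helper names are assumptions made for illustration only; the authoritative layout is in Xen's public headers.

```c
#include <stddef.h>

/* ASSUMPTION: bytes preceding evtchn_pending; illustrative value only. */
#define HDR_BYTES 2048u

/* Size in bytes of a bitmap made of (word_bytes * 8) words,
 * each word being word_bytes wide. */
static size_t bitmap_bytes(size_t word_bytes)
{
    return (word_bytes * 8) * word_bytes;
}

/* Offset of the pending bitmap; the same for both word sizes here. */
static size_t pending_offset(void)
{
    return HDR_BYTES;
}

/* Offset of the mask bitmap, which follows the pending bitmap. */
static size_t mask_offset(size_t word_bytes)
{
    return pending_offset() + bitmap_bytes(word_bytes);
}

/* Non-zero if the start of the 32-bit mask array falls inside the
 * 64-bit pending array. */
static int mask32_overlaps_pending64(void)
{
    size_t m32 = mask_offset(4);
    return m32 >= pending_offset() && m32 < mask_offset(8);
}
```

Under these assumptions the 32-bit mask array starts 128 bytes after the pending array, while the 64-bit pending array is 512 bytes long, so all-ones written at the 32-bit mask offset would be read by a 64-bit guest as pending bits. Zeroing the whole page avoids depending on either layout.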