Mukesh Rathor
2009-Dec-18 04:36 UTC
[Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
Hi,

I finally solved a hang on a 1TB box during our dom0 boot on xen 3.4.0
that I'd been working on. The hang comes from:

calibrate_delay_direct():
	....
	for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
		pre_start = 0;
		start_jiffies = jiffies;
		while (jiffies <= (start_jiffies + tick_divider)) {
			pre_start = start;
			read_current_timer(&start);
		}
		read_current_timer(&post_start);
		...

start_jiffies is set to: INITIAL_JIFFIES == 0xfffedb08

Now the timer interrupt comes in and, finding delta to be rather huge
(thanks to the page scrubbing of 1TB in xen), makes jiffies wrap around.
This causes a hang in the loop that would resolve only after, say,
several days.

	delta: 940b7d68a4, jiffies:00009f8b

I came up with a fix (is there a reason it doesn't use 64bit values?):

	while (jiffies <= (start_jiffies + tick_divider)) {
		pre_start = start;
		read_current_timer(&start);
+		if (jiffies < start_jiffies)	/* jiffies wrapped */
+			start_jiffies = jiffies;
	}

The other fix I thought of was to change INITIAL_JIFFIES to something
sooner.

Would appreciate any help, I don't understand xen time management well.

thanks,
Mukesh

PS: I'm attaching output of 'xm debug-key t'.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
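The hang described in this message can be looked at in isolation. What follows is a minimal user-space sketch, not kernel code. It assumes jiffies behaves as a 32-bit unsigned counter (consistent with the wrapped value reported above) and takes tick_divider as 1 purely for illustration; the type and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: jiffies is modelled as a 32-bit unsigned counter. */
typedef uint32_t jiffies_t;

/* The loop condition from calibrate_delay_direct(), as quoted in the report. */
static bool loop_still_spinning(jiffies_t start, jiffies_t now,
                                jiffies_t tick_divider)
{
    return now <= (jiffies_t)(start + tick_divider);
}

/*
 * Number of further ticks needed before the condition turns false.
 * Only meaningful while loop_still_spinning() returns true.
 */
static jiffies_t ticks_until_exit(jiffies_t start, jiffies_t now,
                                  jiffies_t tick_divider)
{
    return (jiffies_t)(start + tick_divider + 1u - now);
}
```

With start = 0xfffedb08 and now = 0x00009f8b (the two values from the report), the condition is still true, and the counter has to travel almost the whole 32-bit range before it turns false.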
Keir Fraser
2009-Dec-18 07:02 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> The other fix I thought of was to change INITIAL_JIFFIES to something
> sooner.
>
> Would appreciate any help, I don't understand xen time management well.

This isn't really Xen time code, but unchanged Linux time code. I don't know
which tree you quoted the code from -- 2.6.18 has similar but not identical.
Anyway, I suggest try using the jiffy-comparison macros from
<linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
work even when jiffies wraps. Feel free to send patch(es) for that, if you
test that out and it works okay.

 -- Keir
Jan Beulich
2009-Dec-18 08:42 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Keir Fraser <keir.fraser@eu.citrix.com> 18.12.09 08:02 >>>
>On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> The other fix I thought of was to change INITIAL_JIFFIES to something
>> sooner.
>>
>> Would appreciate any help, I don't understand xen time management well.
>
>This isn't really Xen time code, but unchanged Linux time code. I don't know
>which tree you quoted the code from -- 2.6.18 has similar but not identical.
>Anyway, I suggest try using the jiffy-comparison macros from
><linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
>work even when jiffies wraps. Feel free to send patch(es) for that, if you
>test that out and it works okay.

But regardless of that - shouldn't the page scrubbing really be a
background operation these days, and as such be (relatively)
performance neutral to the booting of Dom0?

Jan
Keir Fraser
2009-Dec-18 09:13 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 18/12/2009 08:42, "Jan Beulich" <JBeulich@novell.com> wrote:

>> This isn't really Xen time code, but unchanged Linux time code. I don't know
>> which tree you quoted the code from -- 2.6.18 has similar but not identical.
>> Anyway, I suggest try using the jiffy-comparison macros from
>> <linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
>> work even when jiffies wraps. Feel free to send patch(es) for that, if you
>> test that out and it works okay.
>
> But regardless of that - shouldn't the page scrubbing really be a
> background operation these days, and as such be (relatively)
> performance neutral to the booting of Dom0?

We synchronously scrub free memory before starting dom0, and then
subsequently scrub memory only for dying domains. So I don't know what
scrubbing would be going on during dom0's boot-time calibrations, on any
version of Xen, actually.

 -- Keir
Dan Magenheimer
2009-Dec-18 16:35 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> So I don't know what scrubbing would be going on during dom0's
> boot-time calibrations, on any version of Xen, actually.

Wasn't the async page scrubbing removed post 3.4.0?
(I think Mukesh's bug was seen on 3.4.0.) I see
c/s 19886 in July 2009 is "Remove page-scrub lists
and async scrubbing"... if that patch were not
applied, would Mukesh's observed bug make more sense?

Thanks,
Dan
Keir Fraser
2009-Dec-18 17:15 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 18/12/2009 16:35, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> So I don't know what scrubbing would be going on during dom0's
>> boot-time calibrations, on any version of Xen, actually.
>
> Wasn't the async page scrubbing removed post 3.4.0?
> (I think Mukesh's bug was seen on 3.4.0.) I see
> c/s 19886 in July 2009 is "Remove page-scrub lists
> and async scrubbing"... if that patch were not
> applied, would Mukesh's observed bug make more sense?

Async page scrubbing was for scrubbing pages of dying domains. No domains
are dying while dom0 is still booting.

 -- Keir
Mukesh Rathor
2009-Dec-18 19:25 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Fri, 18 Dec 2009 07:02:55 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
> > The other fix I thought of was to change INITIAL_JIFFIES to
> > something sooner.
> >
> > Would appreciate any help, I don't understand xen time management
> > well.
>
> This isn't really Xen time code, but unchanged Linux time code. I
> don't know which tree you quoted the code from -- 2.6.18 has similar
> but not identical. Anyway, I suggest try using the jiffy-comparison
> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
> These are designed to work even when jiffies wraps. Feel free to send
> patch(es) for that, if you test that out and it works okay.
>
>  -- Keir

It's from the unstable version 2.6.18 tree from
http://xenbits.xensource.com/linux-2.6.18-xen.hg
file init/calibrate.c, function calibrate_delay_direct().

I see the code exactly the same as I mentioned. Anyway, I'm testing out
the patch, trying to reproduce and make sure the fix works.

thanks,
Mukesh
Mukesh Rathor
2009-Dec-18 19:28 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Fri, 18 Dec 2009 09:13:32 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 18/12/2009 08:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> >> This isn't really Xen time code, but unchanged Linux time code. I
> >> don't know which tree you quoted the code from -- 2.6.18 has
> >> similar but not identical. Anyway, I suggest try using the
> >> jiffy-comparison macros from <linux/jiffies.h>: time_before(),
> >> time_after(), etc. These are designed to work even when jiffies
> >> wraps. Feel free to send patch(es) for that, if you test that out
> >> and it works okay.
> >
> > But regardless of that - shouldn't the page scrubbing really be a
> > background operation these days, and as such be (relatively)
> > performance neutral to the booting of Dom0?
>
> We synchronously scrub free memory before starting dom0, and then
> subsequently scrub memory only for dying domains. So I don't know what
> scrubbing would be going on during dom0's boot-time calibrations, on
> any version of Xen, actually.
>
>  -- Keir

Scrubbing is not itself the bug. It's just that the timing is just right
to expose the bug. The system boots fine with less memory.

The hyp does: create dom0, page scrub, unpause dom0. It appears that with
a large amount of scrubbing, delta in dom0 timer_interrupt() gets large
enough that jiffies wraps. Hope that makes sense.

thanks,
Mukesh
Mukesh Rathor
2009-Dec-19 04:43 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Fri, 18 Dec 2009 07:02:55 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
> > The other fix I thought of was to change INITIAL_JIFFIES to
> > something sooner.
> >
> > Would appreciate any help, I don't understand xen time management
> > well.
>
> This isn't really Xen time code, but unchanged Linux time code. I
> don't know which tree you quoted the code from -- 2.6.18 has similar
> but not identical. Anyway, I suggest try using the jiffy-comparison
> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
> These are designed to work even when jiffies wraps. Feel free to send
> patch(es) for that, if you test that out and it works okay.
>
>  -- Keir

Ok, I came up with the following patch. Jeremy, can you please take a
look also, and comment on my fix since I noticed you've got the same
issue in your tree. Here's a summary for your benefit:

init/calibrate.c : calibrate_delay_direct():

	start_jiffies = get_jiffies_64();
	while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
		pre_start = start;
		read_current_timer(&start);
	}

If the first ever timer interrupt comes after start_jiffies is set, dom0
boot may hang if delta in timer_interrupt() is so huge that it causes
jiffies to wrap. It appears delta is very large when memory is more than
512GB on certain boxes, causing wrap around.

Why is delta in dom0->timer_interrupt() related to memory on system?
Because hyp creates dom0, then page scrubs, then unpauses vcpu. So it
appears a lot of page scrubbing results in a huge delta on the first tick.

thanks,
Mukesh

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

diff --git a/init/calibrate.c b/init/calibrate.c
index 06066a6..14f62c8 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -32,7 +32,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
 {
 	unsigned long pre_start, start, post_start;
 	unsigned long pre_end, end, post_end;
-	unsigned long start_jiffies;
+	u64 start_jiffies;
 	unsigned long tsc_rate_min, tsc_rate_max;
 	unsigned long good_tsc_sum = 0;
 	unsigned long good_tsc_count = 0;
@@ -64,8 +64,8 @@ static unsigned long __devinit calibrate_delay_direct(void)
 	for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
 		pre_start = 0;
 		read_current_timer(&start);
-		start_jiffies = jiffies;
-		while (jiffies <= (start_jiffies + tick_divider)) {
+		start_jiffies = get_jiffies_64();
+		while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
 			pre_start = start;
 			read_current_timer(&start);
 		}
@@ -73,7 +73,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
 
 		pre_end = 0;
 		end = post_start;
-		while (jiffies <=
+		while (get_jiffies_64() <=
 		       (start_jiffies + tick_divider * (1 + delay_calibration_ticks))) {
 			pre_end = end;
 			read_current_timer(&end);
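The effect of switching the loop to a 64-bit counter can be sketched outside the kernel. This is a rough model under stated assumptions, not the patched code itself: the 64-bit counter is assumed to start from the same 0xfffedb08 value, the 32-bit counter is assumed to be its low 32 bits, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* INITIAL_JIFFIES as reported earlier in the thread. */
#define START_JIFFIES 0xfffedb08u

/* Old condition: 32-bit counter compared against start + tick_divider. */
static bool spins_32(uint32_t now, uint32_t tick_divider)
{
    return now <= (uint32_t)(START_JIFFIES + tick_divider);
}

/* Patched condition: the same comparison carried out on 64-bit values. */
static bool spins_64(uint64_t now, uint64_t tick_divider)
{
    return now <= (uint64_t)START_JIFFIES + tick_divider;
}

/* Hypothetical helper: the 64-bit counter after `ticks` further ticks. */
static uint64_t advance_64(uint64_t ticks)
{
    return (uint64_t)START_JIFFIES + ticks;
}
```

Advancing by 115843 ticks leaves 0x00009f8b in the low 32 bits, the wrapped value reported at the start of the thread. The 32-bit condition keeps spinning at that point, while the 64-bit condition is already false.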
Jan Beulich
2009-Dec-21 09:55 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Mukesh Rathor <mukesh.rathor@oracle.com> 19.12.09 05:43 >>>
>if first ever timer interrupt comes after start_jiffies is set, dom0 boot
>may hang if delta in timer_interrupt() is so huge that it causes jiffies
>to wrap. It appears delta is very large when memory is more than 512GB on
>certain boxes causing wrap around.
>
>why is delta in dom0->timer_interrupt() related to memory on system?
>Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
>appears lot of page scrubbing results in huge delta on first tick.

Based on prior analysis of similar problems, I'm not convinced this is the
right solution: Kernel code should not need changing here. Instead, I'd
recommend trying to insert a call to process_pending_timers() every so
many pages scrubbed (just as is done, e.g., in the P2M/M2P table
population code).

Jan
Keir Fraser
2009-Dec-21 10:44 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 19/12/2009 04:43, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same
> issue in your tree. Here's a summary for your benefit:

This patch doesn't apply to http://xenbits.xensource.com/linux-2.6.18-xen.hg
by the way. The code is different there. So I'm dropping this patch as I
have nowhere to put it.

 -- Keir
Dan Magenheimer
2009-Dec-21 18:20 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Mukesh Rathor <mukesh.rathor@oracle.com> 19.12.09 05:43 >>>
> >if first ever timer interrupt comes after start_jiffies is set, dom0 boot
> >may hang if delta in timer_interrupt() is so huge that it causes jiffies
> >to wrap. It appears delta is very large when memory is more than 512GB on
> >certain boxes causing wrap around.
> >
> >why is delta in dom0->timer_interrupt() related to memory on system?
> >Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
> >appears lot of page scrubbing results in huge delta on first tick.
>
> Based on prior analysis of similar problems, I'm not convinced this is
> the right solution: Kernel code should not need changing here.
> Instead, I'd recommend trying to insert a call to
> process_pending_timers() every so many pages scrubbed (just like is
> e.g. being done in the P2M/M2P table population code).

Mukesh has dug into this a lot deeper than I, but I think
process_pending_timers() is irrelevant here.

When dom0 is constructed, its data space is initialized in memory
and jiffies has been initialized in the data section with a fixed
value of -300 * HZ. At this point, dom0 lives in memory but has not
executed a single instruction, so is not capable of receiving
any interrupts. I *think* Xen also initializes a clocksource
(pvclock?) here.

Then scrub_heap_pages() occurs, which eats up a lot of time.

THEN dom0 is started and receives a timer interrupt and, I guess,
the clocksource code updates jiffies based on the time elapsed
and, since jiffies is unsigned, it wraps around.

So (admitting I don't understand this fully), I think
the problem is that the kernel has hardcoded into it that
it's impossible for 300 seconds to expire between the
time it is put in memory and the time the first interrupt
occurs. That seems like a kernel bug to me, maybe in
the pvclock code, but still in the kernel.

Not to say the problem can't or shouldn't be fixed in Xen.
Keir, would bad things happen if construct_dom0 is done after
scrub_heap_pages()? Other than some time wastage because
dom0's memory would get scrubbed just before it gets
overwritten (which is admittedly a much bigger problem
when dom0_mem is not specified in the Xen boot line
on a machine with ginormous memory).

Thanks,
Dan
Keir Fraser
2009-Dec-21 19:07 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Not to say the problem can't or shouldn't be fixed in Xen.
> Keir, would bad things happen if construct_dom0 is done after
> scrub_heap_pages()? Other than some time wastage because
> dom0's memory would get scrubbed just before it gets
> overwritten (which is admittedly a much bigger problem
> when dom0_mem is not specified in the Xen boot line
> on a machine with ginormous memory).

The problem is more likely that Xen system time started ticking some time
earlier during boot process. I doubt it is to do with ordering of
construct_dom0 versus boot-time scrubbing.

 -- Keir
Steve Ofsthun
2009-Dec-21 19:17 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
Mukesh Rathor wrote:
> On Fri, 18 Dec 2009 07:02:55 +0000
> Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>
>> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>>
>>> The other fix I thought of was to change INITIAL_JIFFIES to
>>> something sooner.
>>>
>>> Would appreciate any help, I don't understand xen time management
>>> well.
>> This isn't really Xen time code, but unchanged Linux time code. I
>> don't know which tree you quoted the code from -- 2.6.18 has similar
>> but not identical. Anyway, I suggest try using the jiffy-comparison
>> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
>> These are designed to work even when jiffies wraps. Feel free to send
>> patch(es) for that, if you test that out and it works okay.
>>
>>  -- Keir
>>
>
> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same
> issue in your tree. Here's a summary for your benefit:
>
> init/calibrate.c : calibrate_delay_direct():
>
>	start_jiffies = get_jiffies_64();
>	while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
>		pre_start = start;
>		read_current_timer(&start);
>	}
>

Linux time code explicitly forces jiffies (32-bit) to wrap soon after boot
to prevent other kernel code from making assumptions about jiffies wrap.
In your case, I'm guessing that the scrubbing delay is causing a
sufficient number of timer interrupts to be delayed (queued up) that it is
forcing jiffies to wrap earlier in the boot path than expected.

As Keir suggests, the correct solution is probably to use the
time_before/after macros appropriately.

The proposed code avoids the problem by accessing jiffies_64 instead.

> if first ever timer interrupt comes after start_jiffies is set, dom0 boot
> may hang if delta in timer_interrupt() is so huge that it causes jiffies
> to wrap. It appears delta is very large when memory is more than 512GB on
> certain boxes causing wrap around.
>
> why is delta in dom0->timer_interrupt() related to memory on system?
> Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
> appears lot of page scrubbing results in huge delta on first tick.

The problem here may be that timers are running in the domain while the
vcpu is not.

Steve
Mukesh Rathor
2009-Dec-21 19:52 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 19:07:39 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com>
> wrote:
>
> > Not to say the problem can't or shouldn't be fixed in Xen.
> > Keir, would bad things happen if construct_dom0 is done after
> > scrub_heap_pages()? Other than some time wastage because
> > dom0's memory would get scrubbed just before it gets
> > overwritten (which is admittedly a much bigger problem
> > when dom0_mem is not specified in the Xen boot line
> > on a machine with ginormous memory).
>
> The problem is more likely that Xen system time started ticking some
> time earlier during boot process. I doubt it is to do with ordering of
> construct_dom0 versus boot-time scrubbing.
>
>  -- Keir

The problem is exactly how Dan described it. 'delta' for the first
interrupt in dom0->timer_interrupt() goes up proportionately with the
amount of memory on the system. On this box, it appears more than 600GB
causes delta to be large enough to wrap jiffies.

	1TB  delta: 940b7d68a4
	32GB delta: 02ae56eadb

xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() -> timer_interrupt()

timer_interrupt will call do_timer delta/NS_PER_TICK number of times.

Linux initializes jiffies to -5 minutes to catch problems from jiffies
wrap early on. But like Dan said, dom0->calibrate_delay_direct() on
baremetal starts running right away and is guaranteed to run in less than
5 minutes.

We could let that assumption be true by moving page scrub before
xen->construct_dom0(), in which case the first timer interrupt in dom0
will come in a lot sooner, or just fix the loop to account for wrap.

Since jiffies just represents the lower 32 bits of jiffies_64, and
get_jiffies_64() is provided for the purpose of reading the 64-bit
version, I just avail of that.

Thanks,
Mukesh
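The two delta values quoted in this message can be turned into tick counts with a little arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the thread: delta is taken to be in nanoseconds (suggested by the name NS_PER_TICK), and HZ is taken to be 250, inferred from 0xfffedb08 being the 32-bit representation of -300 * 250. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ = 250, inferred from INITIAL_JIFFIES == 0xfffedb08. */
#define HZ            250ULL
/* Assumption: delta is in nanoseconds. */
#define NS_PER_TICK   (1000000000ULL / HZ)
/* jiffies starts 300 seconds' worth of ticks before the 32-bit wrap. */
#define TICKS_TO_WRAP (300ULL * HZ)

/* Number of do_timer() calls implied by one delta. */
static uint64_t ticks_from_delta(uint64_t delta_ns)
{
    return delta_ns / NS_PER_TICK;
}

/* True if a single interrupt with this delta pushes jiffies past the wrap. */
static bool first_tick_wraps_jiffies(uint64_t delta_ns)
{
    return ticks_from_delta(delta_ns) >= TICKS_TO_WRAP;
}
```

Under these assumptions the 1TB delta (0x940b7d68a4) is about 636 seconds, or 158961 ticks, well past the 300-second margin, while the 32GB delta (0x02ae56eadb) is about 11.5 seconds, or 2878 ticks, and stays inside it.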
Jeremy Fitzhardinge
2009-Dec-21 19:55 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 12/21/2009 11:52 AM, Mukesh Rathor wrote:
> On Mon, 21 Dec 2009 19:07:39 +0000
> Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>
>> On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com>
>> wrote:
>>
>>> Not to say the problem can't or shouldn't be fixed in Xen.
>>> Keir, would bad things happen if construct_dom0 is done after
>>> scrub_heap_pages()? Other than some time wastage because
>>> dom0's memory would get scrubbed just before it gets
>>> overwritten (which is admittedly a much bigger problem
>>> when dom0_mem is not specified in the Xen boot line
>>> on a machine with ginormous memory).
>>>
>> The problem is more likely that Xen system time started ticking some
>> time earlier during boot process. I doubt it is to do with ordering of
>> construct_dom0 versus boot-time scrubbing.
>>
>>  -- Keir
>>
> The problem is exactly how Dan described it. 'delta' for first interrupt
> in dom0->timer_interrupt() goes up proportionately with amount of memory
> on system. On this box, it appears more than 600GB causes delta to be
> large enough to wrap jiffies.
>
>	1TB  delta: 940b7d68a4
>	32GB delta: 02ae56eadb
>
> xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() -> timer_interrupt()
>
> timer_interrupt will call do_timer delta/NS_PER_TICK number of times.
>

How is it computing that delta?

Anyway, I'm not at all sure this will apply to a pvops dom0 kernel as it
does timekeeping quite differently from 2.6.18-xen.

    J
Mukesh Rathor
2009-Dec-21 22:47 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 11:55:09 -0800
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 12/21/2009 11:52 AM, Mukesh Rathor wrote:
> > On Mon, 21 Dec 2009 19:07:39 +0000
> > Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> >
> >> On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com>
> >> wrote:
> >>
> >>> Not to say the problem can't or shouldn't be fixed in Xen.
> >>> Keir, would bad things happen if construct_dom0 is done after
> >>> scrub_heap_pages()? Other than some time wastage because
> >>> dom0's memory would get scrubbed just before it gets
> >>> overwritten (which is admittedly a much bigger problem
> >>> when dom0_mem is not specified in the Xen boot line
> >>> on a machine with ginormous memory).
> >>>
> >> The problem is more likely that Xen system time started ticking
> >> some time earlier during boot process. I doubt it is to do with
> >> ordering of construct_dom0 versus boot-time scrubbing.
> >>
> >>  -- Keir
> >>
> > The problem is exactly how Dan described it. 'delta' for first
> > interrupt in dom0->timer_interrupt() goes up proportionately with
> > amount of memory on system. On this box, it appears more than 600GB
> > causes delta to be large enough to wrap jiffies.
> >
> >	1TB  delta: 940b7d68a4
> >	32GB delta: 02ae56eadb
> >
> > xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() ->
> > timer_interrupt()
> >
> > timer_interrupt will call do_timer delta/NS_PER_TICK number of
> > times.
>
> How is it computing that delta?
>
> Anyway, I'm not at all sure this will apply to a pvops dom0 kernel as
> it does timekeeping quite differently from 2.6.18-xen.
>
>     J

delta comes from:

timer_interrupt() in time-xen.c :
	...
	do {
		get_time_values_from_xen(cpu);

		/* Obtain a consistent snapshot of elapsed wallclock cycles. */
--->		delta = delta_cpu =
			shadow->system_timestamp + get_nsec_offset(shadow);
--->		delta -= processed_system_time;
		delta_cpu -= per_cpu(processed_system_time, cpu);

		/*
		 * Obtain a consistent snapshot of stolen/blocked cycles. We
		 * can use state_entry_time to detect if we get preempted here.
		 */
		do {
			sched_time = runstate->state_entry_time;
			barrier();
			stolen = runstate->time[RUNSTATE_runnable] +
				runstate->time[RUNSTATE_offline] -
				per_cpu(processed_stolen_time, cpu);
			blocked = runstate->time[RUNSTATE_blocked] -
				per_cpu(processed_blocked_time, cpu);
			barrier();
		} while (sched_time != runstate->state_entry_time);
	} while (!time_values_up_to_date(cpu));
	...

At first glance, I don't understand the above algorithm. Since you've
the same code, I assumed you could also compute delta to be a large
value when dom0 starts, in which case you may observe dom0 hang.

thanks,
Mukesh
Jeremy Fitzhardinge
2009-Dec-21 23:13 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 12/21/2009 02:47 PM, Mukesh Rathor wrote:
> delta comes from:
>
> timer_interrupt() in time-xen.c :
>	...
>	do {
>		get_time_values_from_xen(cpu);
>
>		/* Obtain a consistent snapshot of elapsed wallclock cycles. */
> --->		delta = delta_cpu =
>			shadow->system_timestamp + get_nsec_offset(shadow);
> --->		delta -= processed_system_time;
>		delta_cpu -= per_cpu(processed_system_time, cpu);
>
>		/*
>		 * Obtain a consistent snapshot of stolen/blocked cycles. We
>		 * can use state_entry_time to detect if we get preempted here.
>		 */
>		do {
>			sched_time = runstate->state_entry_time;
>			barrier();
>			stolen = runstate->time[RUNSTATE_runnable] +
>				runstate->time[RUNSTATE_offline] -
>				per_cpu(processed_stolen_time, cpu);
>			blocked = runstate->time[RUNSTATE_blocked] -
>				per_cpu(processed_blocked_time, cpu);
>			barrier();
>		} while (sched_time != runstate->state_entry_time);
>	} while (!time_values_up_to_date(cpu));
>	...
>
> At first glance, i don't understand the above algorithm. Since you've
> the same code, I assumed you could also compute delta to be a large
> value when dom0 starts, in which case you may observe dom0 hang.
>

There's some code in the pvops kernel which looks vaguely like that, but
it has nothing to do with timer interrupts. Could you be more specific
about what you're referring to?

    J
Mukesh Rathor
2009-Dec-21 23:40 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 10:44:51 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 19/12/2009 04:43, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
> > Ok, I came up with the following patch. Jeremy, can you please take
> > a look also, and comment on my fix since I noticed you've got the
> > same issue in your tree. Here's a summary for your benefit:
>
> This patch doesn't apply to
> http://xenbits.xensource.com/linux-2.6.18-xen.hg by the way. The code
> is different there. So I'm dropping this patch as I have nowhere to
> put it.
>
>  -- Keir

Actually, INITIAL_JIFFIES appears to be buggy on 64bit linux:

#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

The cast to unsigned int makes it still 0xfffedb08 instead of
0xfffffffffffedb08, which is what the intention is: that jiffies should
wrap in a few minutes.

So, if they fix it in linux in the future, my patch will still have the
same problem. Ok, I'll come up with another patch.

thanks,
Mukesh
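The effect of the intermediate cast in the INITIAL_JIFFIES definition quoted above can be checked with fixed-width types. This is only a sketch: uint64_t stands in for a 64-bit unsigned long, uint32_t for unsigned int, HZ is assumed to be 250, and the function names are hypothetical. It shows what each form evaluates to and takes no position on which value the kernel intends.

```c
#include <stdint.h>

/* Assumption: HZ = 250, so -300 * HZ == -75000. */
#define HZ 250

/* Model of ((unsigned long)(unsigned int)(-300*HZ)) with a 64-bit long. */
static uint64_t initial_jiffies_with_uint_cast(void)
{
    /* The 32-bit cast wraps modulo 2^32; widening then zero-extends. */
    return (uint64_t)(uint32_t)(-300 * HZ);
}

/* The same expression without the intermediate 32-bit cast. */
static uint64_t initial_jiffies_without_uint_cast(void)
{
    /* Sign-extends to 64 bits before the conversion to unsigned. */
    return (uint64_t)(int64_t)(-300 * HZ);
}
```

The first form gives 0xfffedb08 and the second 0xfffffffffffedb08, matching the two values named in the message.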
Dan Magenheimer
2009-Dec-21 23:57 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>
> There's some code in the pvops kernel which looks vaguely like that,
> but it has nothing to do with timer interrupts. Could you be more
> specific about what you're referring to?

I spent some time rooting through the 2.6.32 code and
ended up with my head spinning. I think the bottom line
is if there is code that may cause jiffies to increment
by a large amount from a single "tick" delivered by
Xen, it's likely the same problem can occur in 2.6.32
dom0 when running on Xen.
Mukesh Rathor
2009-Dec-22 04:00 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 14:17:57 -0500
Steve Ofsthun <steve.ofsthun@oracle.com> wrote:

> As Keir suggests, the correct solution is probably to use the
> time_before/after macros appropriately.
>
> The proposed code avoids the problem by accessing jiffies_64 instead.

Can't use time_after/before as they do signed comparisons.

	time_after(a,b): ((long)(b) - (long)(a) < 0)

Thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true, as will
time_after(0x1020, 0xFFFEDB08), as they are both after 0xFFFEDB08.

For wrapping, an unsigned comparison must be done, which is also the
jiffies data type.
Mukesh Rathor
2009-Dec-22 04:18 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 20:00:25 -0800 Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Mon, 21 Dec 2009 14:17:57 -0500
> Steve Ofsthun <steve.ofsthun@oracle.com> wrote:
>
> > As Keir suggests, the correct solution is probably to use the
> > time_before/after macros appropriately.
> >
> > The proposed code avoids the problem by accessing jiffies_64
> > instead.
>
> can't use time_after/before as they do signed comparisons.
> time_after(a,b): ((long)(b) - (long)(a) < 0)
>
> thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true as will
> time_after(0x1020, 0xFFFEDB08) as they are both after 0xFFFEDB08.
>
> For wrapping, an unsigned comparison must be done, which is also the
> jiffies data type.

actually my bad. it can't be used in an if statement to check for wrapping, but i can use it in the while loop here, as the loop only seems to care whether jiffies has gone up.
Mukesh Rathor
2009-Dec-22 04:31 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 21 Dec 2009 15:57:33 -0800 (PST) Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> > From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> >
> > There's some code in the pvops kernel which looks vaguely
> > like that, but it has nothing to do with timer interrupts.
> > Could you be more specific about what you're referring to?
>
> I spent some time rooting through the 2.6.32 code and
> ended up with my head spinning. I think the bottom line
> is if there is code that may cause jiffies to increment
> by a large amount from a single "tick" delivered by
> Xen, it's likely the same problem can occur in 2.6.32
> dom0 when running on Xen.

Right. Your calibrate_delay_direct() is the same, so in there:

    start_jiffies = jiffies;
    ... timer interrupt....
    while (jiffies <= (start_jiffies + tick_divider)) {
            pre_start = start;
            read_current_timer(&start);
    }

if a timer tick comes in after start_jiffies is set, and upon returning from the timer interrupt the while loop finds jiffies wrapped, it will hang.

i was looking at the wrong "jeremy's pvops tree", but now that i am looking at the correct one, i see that your timer_interrupt() is pretty different. so if you believe that you could also increment jiffies by more than one in timer_interrupt, you should consider my new patch when i submit it. i'm testing right now.

thanks,
mukesh
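The hang described above can be modelled outside the kernel. The following is a minimal sketch, not the kernel code: it treats jiffies as a 32-bit counter (as on a 32-bit dom0), the function names are hypothetical, and the tick_divider value of 1 used in the note below is an assumption. The wrap-safe variant is written in the spirit of the <linux/jiffies.h> comparison macros rather than copied from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Loop condition as quoted in the thread: a plain unsigned comparison
 * on a 32-bit jiffies value. Returns true while the loop keeps spinning. */
static bool keep_spinning_naive(uint32_t jiffies, uint32_t start,
                                uint32_t tick_divider)
{
    return jiffies <= (uint32_t)(start + tick_divider);
}

/* Wrap-tolerant variant: compare via the signed difference. The
 * unsigned-to-signed conversion is implementation-defined in older C
 * standards, but behaves as two's complement on mainstream compilers. */
static bool keep_spinning_wrap_safe(uint32_t jiffies, uint32_t start,
                                    uint32_t tick_divider)
{
    return (int32_t)(jiffies - (uint32_t)(start + tick_divider)) <= 0;
}

/* Number of further ticks needed before the naive condition turns false. */
static uint32_t ticks_until_naive_exit(uint32_t jiffies, uint32_t start,
                                       uint32_t tick_divider)
{
    uint32_t limit = (uint32_t)(start + tick_divider);

    /* With limit == UINT32_MAX the naive loop could never exit. */
    assert(limit != UINT32_MAX);
    if (jiffies > limit)
        return 0;
    return limit + 1 - jiffies;
}
```

With start = 0xfffedb08 and a post-interrupt jiffies of 0x00009f8b (the values reported at the top of the thread), the naive condition stays true for roughly another 2^32 ticks, while the signed-difference form exits immediately.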
Keir Fraser
2009-Dec-22 07:35 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 21/12/2009 23:40, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> Actually, INITIAL_JIFFIES appears to be buggy on 64bit linux:
>
> #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
>
> The casting to uint makes it still 0xfffedb08 instead of
> 0xfffffffffffedb08 which is what the intention is, that jiffies should
> wrap in few minutes. So, if they fix it in linux in future, my
> patch will still have the same problem.

Actually the cast to unsigned int is deliberate. They want jiffies to wrap 32 bits soon after boot, but it should pretty much never wrap 64 bits.

-- Keir
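The effect of the double cast can be checked with a small stand-alone sketch. It models a 64-bit unsigned long with uint64_t; the function names are hypothetical, and HZ = 250 is an assumption (it is the value for which -300*HZ truncates to the 0xfffedb08 quoted in this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Models (unsigned long)(unsigned int)(-300*HZ) on a 64-bit kernel:
 * truncate to 32 bits first, then zero-extend to 64 bits. */
static uint64_t initial_jiffies_with_cast(int hz)
{
    assert(hz > 0);
    return (uint64_t)(uint32_t)(-300 * hz);
}

/* Models (unsigned long)(-300*HZ) without the inner cast:
 * the negative value is sign-extended to 64 bits. */
static uint64_t initial_jiffies_without_cast(int hz)
{
    assert(hz > 0);
    return (uint64_t)(int64_t)(-300 * hz);
}

/* Ticks remaining until the low 32 bits of a jiffies value wrap to zero. */
static uint64_t ticks_until_32bit_wrap(uint64_t jiffies)
{
    return (UINT64_C(1) << 32) - (jiffies & UINT64_C(0xffffffff));
}
```

Under the HZ = 250 assumption, the first form leaves 300*HZ = 75000 ticks (five minutes) until the low 32 bits wrap, while the full 64-bit counter stays far from its own wrap point, which matches the intent Keir describes.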
Keir Fraser
2009-Dec-22 07:59 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
I'll try and make this *really* clear...

On 22/12/2009 04:00, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> can't use time_after/before as they do signed comparisions.
> time_after(a,b): ((long)(b) - (long)(a) < 0))

The whole point is to do signed comparison. This gives you (BITS_PER_LONG-1) bits in either direction to compare reliably: with 32-bit Linux that means jiffy values which do not differ by more than +/- 2^31 can be reliably compared, regardless of wrapping. Bear in mind that even at HZ=1000, it'll take 3.5 *weeks* for jiffies to increase by 2^31.

> thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true as will
> time_after(0x1020, 0xFFFEDB08) as they are both after 0xFFFEDB08.

Well yeah: anything in the ranges a=0xFFFEDB09-0xFFFFFFFF and a=0x0-0x7FFEDB09 will return true for time_after(a,0xFFFEDB08). That's how a signed 32-bit comparison works. The assumption here is that 0x1020 is derived from jiffies_64=0x100001020: in general the assumption is that the arguments to time_after() were taken within seconds/minutes/hours of each other, not days/weeks. That precludes a jiffies_64 difference of >0x7FFFFFFF, which is what would invalidate use of time_after().

> For wrapping, unsigned comparision must be done, which is also the jiffies
> data type.

If you do 32-bit unsigned comparisons, that is broken by jiffies wrapping, the fixing of which was the whole point of the comparison macros.

-- Keir
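The signed-comparison argument above can be tried out in user space. This is a minimal sketch with hypothetical helper names, modelling 32-bit jiffies with uint32_t; it is not the kernel's macro definition, and the last helper exists only to make the "samples taken close together" assumption explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit model of time_after(a, b): true when a is later than b,
 * provided the two samples are less than 2^31 ticks apart. */
static bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Plain unsigned comparison, which gives the wrong answer across a wrap. */
static bool naive_after32(uint32_t a, uint32_t b)
{
    return a > b;
}

/* Compare two 64-bit jiffies samples using only their low 32 bits.
 * The assert spells out the precondition: the samples must be less
 * than 2^31 ticks apart for the truncated comparison to be valid. */
static bool time_after_low32(uint64_t a64, uint64_t b64)
{
    uint64_t dist = a64 > b64 ? a64 - b64 : b64 - a64;

    assert(dist < (UINT64_C(1) << 31));
    return time_after32((uint32_t)a64, (uint32_t)b64);
}
```

For example, time_after32(0x1020, 0xFFFEDB08) is true while the naive comparison is false, which is the behaviour wanted when 0x1020 is the truncation of jiffies_64 = 0x100001020.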
Keir Fraser
2009-Dec-22 08:05 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 07:59, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> For wrapping, unsigned comparision must be done, which is also the jiffies
>> data type.
>
> If you do 32-bit unsigned comparisons, that is broken by jiffies wrapping,
> the fixing of which was the whole point of the comparison macros.

I'm talking about '(ulong)b<(ulong)a' here of course. '(ulong)b-(ulong)a<0' would always be false, which is even less useful.

-- Keir
Jan Beulich
2009-Dec-22 08:51 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 21.12.09 19:20 >>>
> From: Jan Beulich [mailto:JBeulich@novell.com]
>> Based on prior analysis of similar problems, I'm not convinced this is the
>> right solution: Kernel code should not need changing here. Instead, I'd
>> recommend trying to insert a call to process_pending_timers() every so
>> many pages scrubbed (just like is e.g. being done in the P2M/M2P table
>> population code).
>
>Mukesh has dug into this a lot deeper than I, but I think
>process_pending_timers() is irrelevant here. When dom0

Why would this be any different than a lot of time being consumed populating large p2m/m2p tables? All this happens when Dom0 already exists, but isn't running yet.

>is constructed, its data space is initialized in memory
>and jiffies has been initialized in the data section with
>a fixed value of -300 * HZ. At this point, dom0 lives in
>memory but has not executed a single instruction, so is
>not capable of receiving any interrupts. I *think* Xen
>also initializes a clocksource (pvclock?) here.

... and updates it each time local_time_calibration() is run, which is the missing piece (process_pending_timers() causes time_calibration() to run as needed, in turn causing TIME_CALIBRATE_SOFTIRQ to be raised as needed [and run the latest immediately before Dom0 gets passed control], in turn causing local_time_calibration() to run, updating dom0:vcpu0's system time).

>Then scrub_heap_pages() occurs which eats up a lot of time.

... and confuses Xen's own time keeping (because, depending on the platform timer used and its wrap-around interval, a wrap may be missed if process_pending_timers() isn't being executed frequently enough). But from the other mail regarding this subject I conclude that this suggestion wasn't even tried, despite me knowing that it fixed similar problems on 1TB systems. And be assured, I spent hours (if not days) analyzing the problem until I finally understood that this is entirely unrelated to the kernel.

>THEN dom0 is started and receives a timer interrupt and,
>I guess, the clocksource code updates jiffies based on
>the time elapsed and, since jiffies is unsigned, it
>wraps around.
>
>So (admitting I don't understand this fully), I think the
>problem is that the kernel has hardcoded into it that it's
>impossible for 300 seconds to expire between the time it
>is put in memory and the time the first interrupt occurs.
>That seems like a kernel bug to me, maybe in the pvclock
>code, but still in the kernel.

No, the time the kernel gets put in memory doesn't matter at all. Counting starts when the kernel starts initializing its time subsystem, and with timer interrupts being disabled initially I can't even see how multiple of them could pile up.

>Not to say the problem can't or shouldn't be fixed in Xen.
>Keir, would bad things happen if construct_dom0 is done after
>scrub_heap_pages()? Other than some time wastage because
>dom0's memory would get scrubbed just before it gets
>overwritten (which is admittedly a much bigger problem
>when dom0_mem is not specified in the Xen boot line
>on a machine with ginormous memory).

Jan
Keir Fraser
2009-Dec-22 10:20 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 08:51, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Then scrub_heap_pages() occurs which eats up a lot of time.
>
> ... and confuses Xen's own time keeping (because, depending on
> the platform timer used and it's wrap-around interval, a wrap may
> be missed if process_pending_timers() isn't being executed
> frequently enough.

process_pending_timers() has been called on every iteration of the scrub loop for as long as I can remember. I believe it was even you who added it.

-- Keir
Jan Beulich
2009-Dec-22 11:10 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 11:20 >>>
>On 22/12/2009 08:51, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>> Then scrub_heap_pages() occurs which eats up a lot of time.
>>
>> ... and confuses Xen's own time keeping (because, depending on
>> the platform timer used and it's wrap-around interval, a wrap may
>> be missed if process_pending_timers() isn't being executed
>> frequently enough.
>
>Process_pending_timers() has been called on every iteration of the scrub
>loop for as long as I can remember. I believe it was even you who added it.

Should I have overlooked it? Indeed, I did (I looked at the end of the loop, while it's sitting at the beginning). I'm really sorry for the noise then.

Nevertheless I remain convinced that the problem ought not to be fixed by a kernel change (and even less by one that modifies Xen-unspecific code). Any patch to this effect, unless I should be convinced otherwise, has my explicit up-front NAK (in case this counts for anything).

And then it should be possible to simulate the problem quite easily on a system with much less memory, by slowing down the scrub loop artificially. If I find time before the holiday break I'll try to do that and see if I can convince myself otherwise (as per above).

Jan
Keir Fraser
2009-Dec-22 13:35 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 11:10, "Jan Beulich" <JBeulich@novell.com> wrote:
> Nevertheless I remain convinced that the problem ought not to be fixed
> by a kernel change (and even less by one that modifies Xen-unspecific
> code). Any patch to this effect, unless I should be convinced otherwise,
> has my explicit up front NAK (in case this counts anything).

Well, I must say the kernel patch looked quite sensible to me, if for no other reason than reinforcing the fact that jiffy values should always be compared using the provided macros. But I'm happy to have a hypervisor patch as well, if we can work out what it should be. I'm still unclear on the reason why slow page scrubbing causes this problem: Oracle's explanation hasn't convinced me yet.

> And then it should be possible to simulate the problem quite easily on
> a system with much less memory, by slowing down the scrub loop
> artificially. If I find time before the holiday break I'll try to do that and
> see if I can convince myself otherwise (as per above).

That would be helpful, thanks. I'm particularly intrigued by how this could be seen for dom0 but not be a similar or worse issue for domU.

-- Keir
Jan Beulich
2009-Dec-22 14:17 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 14:35 >>>
>On 22/12/2009 11:10, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> Nevertheless I remain convinced that the problem ought not to be fixed
>> by a kernel change (and even less by one that modifies Xen-unspecific
>> code). Any patch to this effect, unless I should be convinced otherwise,
>> has my explicit up front NAK (in case this counts anything).
>
>Well, I must say the kernel patch looked quite sensible to me. If no other
>reason than reinforcing the fact that jiffy values should always be compared
>using the provided macros. But I'm happy to have a hypervisor patch as well,
>if we can work out what it should be. I'm still unclear on the reason why
>slow page scrubbing causes this problem - Oracle's explanation hasn't
>convinced me yet.

There's another thing that seems inconsistent with this report: jiffies itself as well as all the arithmetic in calibrate_delay_direct() is using "unsigned long", so is being done 64-bit on x86-64 (which we're talking about here). Hence I can see even less how an overflow could have happened, or how using explicit 64-bit types (or get_jiffies_64()) here can help.

Jan
Jan Beulich
2009-Dec-22 14:23 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
>There's another thing that seems inconsistent with this report: jiffies
>itself as well as all the arithmetic in calibrate_delay_direct() is using
>"unsigned long", so is being done 64-bit on x86-64 (which we're
>talking about here). Hence I can see even less how an overflow could
>have happened, or how using explicit 64-bit types (or get_jiffies_64())
>here can help.

Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't recall this having been mentioned anywhere, but maybe I just overlooked it.

Jan
Keir Fraser
2009-Dec-22 15:19 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 14:23, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
>> There's another thing that seems inconsistent with this report: jiffies
>> itself as well as all the arithmetic in calibrate_delay_direct() is using
>> "unsigned long", so is being done 64-bit on x86-64 (which we're
>> talking about here). Hence I can see even less how an overflow could
>> have happened, or how using explicit 64-bit types (or get_jiffies_64())
>> here can help.
>
> Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't
> recall this having been mentioned anywhere, but maybe I just
> overlooked it.

I'd assumed this must be the case. As you say, the issue couldn't happen as described on 64-bit Linux.

-- Keir
Dan Magenheimer
2009-Dec-22 15:30 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> >>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
> >There's another thing that seems inconsistent with this report: jiffies
> >itself as well as all the arithmetic in calibrate_delay_direct() is using
> >"unsigned long", so is being done 64-bit on x86-64 (which we're
> >talking about here). Hence I can see even less how an overflow could
> >have happened, or how using explicit 64-bit types (or get_jiffies_64())
> >here can help.
>
> Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't
> recall this having been mentioned anywhere, but maybe I just
> overlooked it.

Mukesh's work has been on a 32-bit dom0.
Jan Beulich
2009-Dec-22 15:36 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
>Mukesh's work has been on a 32-bit dom0.

Which seems quite an odd combination (1TB of memory, but a 32-bit Dom0), which is why I didn't even consider the possibility initially. I'm afraid you're asking for more trouble with this.

Jan
Dan Magenheimer
2009-Dec-22 16:05 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
> >Mukesh's work has been on a 32-bit dom0.
>
> Which seems quite odd a combination - 1Tb of memory, but a 32-bit
> Dom0 -, which is why initially I didn't even consider the
> possibility. I'm afraid you're asking for more trouble with this.

Indeed. Oracle expects to move to a 64-bit dom0 in a "future" release, but we have to make the 32-bit dom0 work until then.

Our default configuration specifies dom0_mem=xxx which I would assume would eliminate most or all of the "trouble" you are referring to, but, Mukesh, could you confirm what dom0_mem is set to?
Jan Beulich
2009-Dec-22 16:33 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 14:35 >>>
>> And then it should be possible to simulate the problem quite easily on
>> a system with much less memory, by slowing down the scrub loop
>> artificially. If I find time before the holiday break I'll try to do that and
>> see if I can convince myself otherwise (as per above).
>
>That would be helpful, thanks. I'm particularly intrigued by how this could
>be seen for dom0 but not be a similar or worse issue for domU.

Simulating the issue indeed went without problem. What I'm seeing is a Xen problem, though (as expected): Right around when scrubbing starts there is a run through time_calibration(). During and after the scrub, however, none happen until Dom0 has proceeded quite a bit into its initialization (namely, past its delay loop calibration). Only then do regular (1 second interval) time_calibration() invocations resume.

One other thing that looks irregular at first glance is that the mentioned very first run through time_calibration() does not seem to result in running local_time_calibration() on CPU0. One invocation (apparently independent of time_calibration()) happens right before Dom0 starts executing.

Jan

(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 32-bit, PAE, lsb, paddr 0x2000 -> 0x496000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000229000000->000000022a000000 (1936909 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: 00000000c0002000->00000000c0496000
(XEN) Init. ramdisk: 00000000c0496000->00000000c0759193
(XEN) Phys-Mach map: 00000000c075a000->00000000c0ec1834
(XEN) Start info: 00000000c0ec2000->00000000c0ec24b4
(XEN) Page tables: 00000000c0ec3000->00000000c0ed1000
(XEN) Boot stack: 00000000c0ed1000->00000000c0ed2000
(XEN) TOTAL: 00000000c0000000->00000000c1000000
(XEN) ENTRY ADDRESS: 00000000c0002000
(XEN) Dom0 has maximum 8 VCPUs
(XEN) tc
(XEN) ltc@4[32767:4]
(XEN) ltc@3[32767:3]
(XEN) ltc@7[32767:7]
(XEN) ltc@6[32767:6]
(XEN) Scrubbing Free RAM: ltc@2[32767:2]
(XEN) ltc@5[32767:5]
(XEN) ltc@1[32767:1]
(XEN) .....done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> Xen (type 'CTRL-q' three times to switch input to DOM0)
(XEN) Freed 160kB init memory.
(XEN) ltc@0[0:0]
Jan Beulich
2009-Dec-22 16:42 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>One other irregular at the first glance thing is that the mentioned
>very first run through time_calibration() does not seem to result in
>running local_time_calibration() on CPU0. One invocation (apparently
>independent of time_calibration()) happens right before Dom0 starts
>executing.

And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't get serviced until entry to Dom0, but CPU0 is responsible for re-arming calibration_timer. Hence there's a gap in calibrations, resulting in an excessive delta observed during the first timer interrupt in Dom0.

Jan
Jan Beulich
2009-Dec-22 17:02 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 17:05 >>>
>> From: Jan Beulich [mailto:JBeulich@novell.com]
>>
>> >>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
>> >Mukesh's work has been on a 32-bit dom0.
>>
>> Which seems quite odd a combination - 1Tb of memory, but a 32-bit
>> Dom0 -, which is why initially I didn't even consider the
>> possibility. I'm afraid you're asking for more trouble with this.
>
>Indeed. Oracle expects to move to a 64-bit dom0 in a "future"
>release, but we have to make the 32-bit dom0 work until then.
>
>Our default configuration specifies dom0_mem=xxx which I would
>assume would eliminate most or all of the "trouble" you are
>referring to, but, Mukesh, could you confirm what dom0_mem
>is set to?

No, that won't help. I'm referring to things like Dom0 accesses to the M2P table it sees, which doesn't cover even nearly all memory. I can't say whether that can go without problem, but without closely looking at it I don't think you can assume this would work. Likewise I would suspect tools issues (if you use the tools from xen-unstable et al), though I have no precise pointer right now at specific issues.

Jan
Dan Magenheimer
2009-Dec-22 17:27 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
So, checking my understanding, the underlying problem is that shadow->tsc_timestamp has essentially stopped but the hardware tsc has continued moving forward? Thus in timer_interrupt() (in time-xen.c) shadow->system_timestamp will be stale and so get_nsec_offset() returns a large number, resulting in a large delta, which in turn causes jiffies to be incremented by a large amount. If the interrupt happens by coincidence in the middle of the first while loop in calibrate_delay_direct() (in init/calibrate.c), and the large jiffies increment happens to be enough to wrap, the while loop will run for weeks.

If this is right, I'm still not clear on how it can be fixed in Xen.

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Tuesday, December 22, 2009 9:43 AM
> To: Keir Fraser; Dan Magenheimer; Mukesh Rathor
> Cc: Jeremy Fitzhardinge; Xen-devel@lists.xensource.com; Kurt Hackel
> Subject: Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on
> large 1TB system
>
> >>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
> >One other irregular at the first glance thing is that the mentioned
> >very first run through time_calibration() does not seem to result in
> >running local_time_calibration() on CPU0. One invocation (apparently
> >independent of time_calibration()) happens right before Dom0 starts
> >executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for
> re-arming calibration_timer. Hence there's a gap of calibrations,
> resulting in an excessive delta observed during the first timer
> interrupt in Dom0.
>
> Jan
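The chain from a stale timestamp to a wrapped jiffies can be sketched numerically. This is a simplified model under stated assumptions, not the time-xen.c code: the function names are hypothetical, jiffies is treated as a 32-bit counter, the whole nanosecond delta is assumed to be converted to ticks in a single interrupt, and the HZ = 250 used in the note below is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of whole ticks contained in an elapsed-time delta. */
static uint64_t ticks_in_delta(uint64_t delta_ns, uint32_t hz)
{
    assert(hz > 0 && hz <= 1000000000u);
    return delta_ns / (UINT64_C(1000000000) / hz);
}

/* 32-bit jiffies value after the interrupt has accounted for the delta.
 * The sum is computed in 64 bits and then truncated, which models the wrap. */
static uint32_t jiffies_after_catchup(uint32_t jiffies, uint64_t delta_ns,
                                      uint32_t hz)
{
    return (uint32_t)(jiffies + ticks_in_delta(delta_ns, hz));
}

/* True when accounting for the delta carries jiffies past 0xffffffff. */
static bool catchup_wraps(uint32_t jiffies, uint64_t delta_ns, uint32_t hz)
{
    return ticks_in_delta(delta_ns, hz) > (uint64_t)(UINT32_MAX - jiffies);
}
```

Starting from 0xfffedb08 with HZ = 250, any delta above 300 seconds is enough to wrap the counter in this model, whereas a delta of a few hundred milliseconds only advances it by a few dozen ticks.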
Keir Fraser
2009-Dec-22 17:48 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>> One other irregular at the first glance thing is that the mentioned
>> very first run through time_calibration() does not seem to result in
>> running local_time_calibration() on CPU0. One invocation (apparently
>> independent of time_calibration()) happens right before Dom0 starts
>> executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap of calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.

Arbitrarily delaying softirq work is probably inherently fragile. All we have to defer is SCHEDULE_SOFTIRQ as that can preempt the current context. So I will look into making a patch that changes process_pending_timers() to process_pending_softirqs().

Thanks,
Keir
Jeremy Fitzhardinge
2009-Dec-22 18:03 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 12/22/2009 09:02 AM, Jan Beulich wrote:
> No, that won't help. I'm referring to things like Dom0 accesses to the
> M2P table it sees, which doesn't cover even nearly all memory. I can't
> say whether that can go without problem, but without closely looking
> at it I don't think you can assume this would work. Likewise I would
> suspect tools issues (if you use the tools from xen-unstable et al),
> though I have no precise pointer right now at specific issues.

32-bit dom0 is the standard use model for the Citrix product, and I think people tend to run it even with xen-unstable. It's a fairly well-tested combination.

J
Keir Fraser
2009-Dec-22 18:42 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>> One other irregular at the first glance thing is that the mentioned
>> very first run through time_calibration() does not seem to result in
>> running local_time_calibration() on CPU0. One invocation (apparently
>> independent of time_calibration()) happens right before Dom0 starts
>> executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap of calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.

Please give xen-unstable:20714 a look. If that fixes the apparent problems I think it is also a good candidate for backport to the 3.4 branch.

-- Keir
Mukesh Rathor
2009-Dec-22 23:00 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Tue, 22 Dec 2009 18:42:01 +0000 Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> >>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
> >> One other irregular at the first glance thing is that the mentioned
> >> very first run through time_calibration() does not seem to result
> >> in running local_time_calibration() on CPU0. One invocation
> >> (apparently independent of time_calibration()) happens right
> >> before Dom0 starts executing.
> >
> > And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ
> > can't get serviced until entry to Dom0, but CPU0 is responsible for
> > re-arming calibration_timer. Hence there's a gap of calibrations,
> > resulting in an excessive delta observed during the first timer
> > interrupt in Dom0.
>
> Please give xen-unstable:20714 a look. If that fixes the apparent
> problems I think it is also a good candidate for backport to 3.4
> branch.
>
> -- Keir

Yup, that fixed it. Jiffies now jumps from 0xfffedb08 to 0xfffedb56 as opposed to 0x0000102c or similar.

BTW, my test was limited to just booting the box. I'm glad it got resolved before I leave on holidays for a while. So many thanks to all.

Mukesh
Jan Beulich
2010-Jan-04 08:23 UTC
Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
>On 12/22/2009 09:02 AM, Jan Beulich wrote:
>> No, that won't help. I'm referring to things like Dom0 accesses to the
>> M2P table it sees, which doesn't cover even nearly all memory. I can't
>> say whether that can go without problem, but without closely looking
>> at it I don't think you can assume this would work. Likewise I would
>> suspect tools issues (if you use the tools from xen-unstable et al),
>> though I have no precise pointer right now at specific issues.
>
>32-bit dom0 is the standard use model for Citrix product, and I think
>people tend to run it even with xen-unstable. Its a fairly well-tested
>combination.

But very unlikely with 1TB of memory, don't you think?

Jan
Dan Magenheimer
2010-Jan-04 22:07 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
> >On 12/22/2009 09:02 AM, Jan Beulich wrote:
> >> No, that won't help. I'm referring to things like Dom0 accesses to the
> >> M2P table it sees, which doesn't cover even nearly all memory. I can't
> >> say whether that can go without problem, but without closely looking
> >> at it I don't think you can assume this would work. Likewise I would
> >> suspect tools issues (if you use the tools from xen-unstable et al),
> >> though I have no precise pointer right now at specific issues.
> >
> >32-bit dom0 is the standard use model for Citrix product, and I think
> >people tend to run it even with xen-unstable. Its a fairly well-tested
> >combination.
>
> But very unlikely with 1Tb of memory, don't you think?
>
> Jan

Only because machines with 1TB are rare/unlikely. I can't speak for the Citrix product but there is NO supported configuration (yet) of the Oracle VM product with a 64-bit dom0. In other words, if a customer is using a released Oracle VM product and the machine on which they are running it has 1TB of physical memory, they ARE using a 32-bit dom0.

However, Oracle VM always specifies a dom0_mem= Xen boot parameter (which is always much smaller than 1TB).

If there are known issues with 1TB of memory in this configuration, we'd like to understand them. If 1TB with a 32-bit dom0 is rife with hidden unresolvable problems, we'd like to make a clear support statement as to what the physical memory limit is.

Thanks,
Dan
Ian Campbell
2010-Jan-04 22:21 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Mon, 2010-01-04 at 22:07 +0000, Dan Magenheimer wrote:
> > From: Jan Beulich [mailto:JBeulich@novell.com]
> >
> > >>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
> > >On 12/22/2009 09:02 AM, Jan Beulich wrote:
> > >> No, that won't help. I'm referring to things like Dom0 accesses to the
> > >> M2P table it sees, which doesn't cover even nearly all memory. I can't
> > >> say whether that can go without problem, but without closely looking
> > >> at it I don't think you can assume this would work. Likewise I would
> > >> suspect tools issues (if you use the tools from xen-unstable et al),
> > >> though I have no precise pointer right now at specific issues.
> > >
> > >32-bit dom0 is the standard use model for Citrix product, and I think
> > >people tend to run it even with xen-unstable. It's a fairly well-tested
> > >combination.
> >
> > But very unlikely with 1TB of memory, don't you think?
> >
> > Jan
>
> Only because machines with 1TB are rare/unlikely. I can't
> speak for the Citrix product but there is NO supported
> configuration (yet) of the Oracle VM product with a 64-bit dom0.
> In other words, if a customer is using a released Oracle VM
> product and the machine on which they are running it has 1TB
> of physical memory, they ARE using a 32-bit dom0.
>
> However, Oracle VM always specifies a dom0_mem= Xen boot parameter
> (which is always much smaller than 1TB).

So do XenServer and XCP, FWIW.

Ian.
Jan Beulich
2010-Jan-05 08:33 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 04.01.10 23:07 >>>
>If there are known issues with 1TB of memory in this
>configuration, we'd like to understand them. If 1TB with
>32-bit dom0 is rife with hidden unresolvable problems,
>we'd like to make a clear support statement as to what the
>physical memory limit is.

I can't say there are known problems, but I'm convinced not everything can work properly above the boundary of 168G. Nevertheless it is quite possible that most or all of the normal (not error handling) code paths work well. Page table walks e.g. during exceptions or kexec would be problem candidates. And while my knowledge of the tools is rather limited, libxc also has - iirc - several hard coded assumptions that might not hold.

What is clear though is that you also depend on the memory distribution across the (physical) address space: Contiguous (apart from the below-4G hole) memory will likely represent little problems, but sparse memory crossing the 44-bit boundary can't work in any case (since MFNs are represented as 32-bit quantities in 32-bit Dom0).

Jan
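[Editorial sketch] The 44-bit boundary above follows from simple arithmetic: a 32-bit MFN can name at most 2^32 page frames, and with 4 KiB pages that is 2^44 bytes (16 TiB) of machine address space. The C fragment below only illustrates that calculation. The helper names and SKETCH_PAGE_SHIFT are hypothetical and are not taken from Xen or Linux, and the 4 KiB page size is an assumption (it holds for x86).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (x86). Hypothetical name, not a Xen/Linux macro. */
#define SKETCH_PAGE_SHIFT 12u

/* Bytes of machine address space reachable through a frame number that is
 * mfn_bits wide. */
static uint64_t addressable_bytes(unsigned mfn_bits)
{
    assert(mfn_bits + SKETCH_PAGE_SHIFT < 64);
    return (uint64_t)1 << (mfn_bits + SKETCH_PAGE_SHIFT);
}

/* Nonzero if the frame containing machine_addr can be named by a frame
 * number that is mfn_bits wide. */
static int mfn_fits(uint64_t machine_addr, unsigned mfn_bits)
{
    assert(mfn_bits < 64);
    return (machine_addr >> SKETCH_PAGE_SHIFT) < ((uint64_t)1 << mfn_bits);
}
```

With mfn_bits set to 32, addressable_bytes() yields 2^44. A page at the 1TB mark still fits, which is why contiguous memory on a 1TB box is not affected by this particular limit, while any page at or above 2^44 cannot be represented at all.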
Dan Magenheimer
2010-Jan-05 15:46 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
> What is clear though is that you also depend on the memory distribution
> across the (physical) address space: Contiguous (apart from the below-4G
> hole) memory will likely represent little problems, but sparse memory
> crossing the 44-bit boundary can't work in any case (since MFNs are
> represented as 32-bit quantities in 32-bit Dom0).

Urk. Yes, I had forgotten about the sparse problem.

> I can't say there are known problems, but I'm convinced not everything
> can work properly above the boundary of 168G. Nevertheless it is quite
> possible that most or all of the normal (not error handling) code paths
> work well. Page table walks e.g. during exceptions or kexec would be
> problem candidates. And while my knowledge of the tools is rather
> limited, libxc also has - iirc - several hard coded
> assumptions that might not hold.

What is special about 168GB? Or is that a typo? (And if it is supposed to be 128GB, what is special about 128GB?)

Thanks,
Dan
Ian Campbell
2010-Jan-05 15:54 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
On Tue, 2010-01-05 at 15:46 +0000, Dan Magenheimer wrote:
> > What is clear though is that you also depend on the memory distribution
> > across the (physical) address space: Contiguous (apart from the below-4G
> > hole) memory will likely represent little problems, but sparse memory
> > crossing the 44-bit boundary can't work in any case (since MFNs are
> > represented as 32-bit quantities in 32-bit Dom0).
>
> Urk. Yes, I had forgotten about the sparse problem.
>
> > I can't say there are known problems, but I'm convinced not everything
> > can work properly above the boundary of 168G. Nevertheless it is quite
> > possible that most or all of the normal (not error handling) code paths
> > work well. Page table walks e.g. during exceptions or kexec would be
> > problem candidates. And while my knowledge of the tools is rather
> > limited, libxc also has - iirc - several hard coded
> > assumptions that might not hold.
>
> What is special about 168GB? Or is that a typo? (And if it
> is supposed to be 128GB, what is special about 128GB?)

It's the size of m2p you can fit into the hypervisor hole of a PAE guest running on a 64-bit hypervisor. Since the hypervisor no longer needs to reside in there, it is bigger than with a PAE guest on a PAE hypervisor.

The size of the hypervisor hole is runtime settable for many guests, but I'm not sure that is plumbed through in the tools, so who knows how well it works. Increasing the size of the hypervisor hole eats into kernel low memory though, so you would be trading off maximum per-guest RAM against maximum host RAM to some degree.

Ian.

> Thanks,
> Dan
Jan Beulich
2010-Jan-05 16:08 UTC
RE: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 05.01.10 16:46 >>>
>What is special about 168GB? Or is that a typo? (And if it
>is supposed to be 128GB, what is special about 128GB?)

No, it's not a typo. The maximum hole Xen can reserve for itself is 168M, and this is what allows it to accommodate the M2P table for 168G.

Jan
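[Editorial sketch] The step from a 168M hole to 168G of covered memory can be checked with a little arithmetic. The fragment below rests on two assumptions that are not spelled out in the thread: 4 KiB pages, and one 4-byte M2P entry per machine page as seen by a 32-bit guest. The function name and SKETCH_PAGE_SIZE are hypothetical and are not Xen identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (x86). Hypothetical name, not a Xen macro. */
#define SKETCH_PAGE_SIZE 4096u

/* Bytes of machine memory that an M2P table of table_bytes can describe,
 * given one entry of entry_bytes per machine page. */
static uint64_t m2p_coverage_bytes(uint64_t table_bytes, unsigned entry_bytes)
{
    assert(entry_bytes > 0);
    return (table_bytes / entry_bytes) * SKETCH_PAGE_SIZE;
}
```

Under these assumptions a 168 MiB table holds 42Mi entries, and 42Mi pages of 4 KiB each come to 168 GiB, matching the figures quoted above. Each entry covers 1024 times its own size, which is why the megabyte and gigabyte numbers coincide.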