Carb, Brian A
2008-May-06 15:16 UTC
[Xen-devel] Time went backwards errors booting unstable cs 17557 with 64GB memory
I've built the latest xen unstable cs 17557 to run on a multi-cell ES7000/one. When I boot with the procs and memory from a single cell (4 cores aka 8 cpus from cell0, 32gb memory from cell0), dom0 boots cleanly. However, if I use the memory from 2 cells (4 cores aka 8 cpus from cell0, 64gb memory interleaved from cell0 & cell1), dom0 boots eventually, but produces multiple Timer errors as shown below. I tried setting numa=on but the timer errors persist. FYI Xen unstable cs 17318 ran successfully without timer errors on this multi-cell configuration.

Timer ISR/0: Time went backwards: delta=-2679966594 delta_cpu=10033406 shadow=13664567457 off=530033587 processed=16874567457 cpu_processed=14184567457
 0: 14184567457
 1: 16874567457
 2: 16874567457
 3: 16874567457
 4: 16874567457
 5: 16874567457
 6: 16874567457
 7: 16874567457

The attached serial trace shows a boot with 32gb and then a boot with 64gb.

brian carb
unisys corporation - malvern, pa

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
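For readers trying to interpret the Timer ISR/0 message above, here is a minimal sketch that parses the fields and checks how they relate. The interpretation is an assumption drawn from the numbers themselves, not from the thread: the values are taken to be nanoseconds, `cpu_processed + delta_cpu` is taken to be the time CPU0 believes it is now, and `delta` is that time minus the global `processed` value. The helper name `parse_fields` is hypothetical.

```python
import re

# The message exactly as reported in the original post.
LOG = ("Timer ISR/0: Time went backwards: delta=-2679966594 delta_cpu=10033406 "
       "shadow=13664567457 off=530033587 processed=16874567457 "
       "cpu_processed=14184567457")


def parse_fields(line):
    """Return the key=value pairs of the message as a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(-?\d+)", line)}


fields = parse_fields(LOG)

# Assumed meaning: the time CPU0 thinks it is now, in nanoseconds.
now_on_cpu0 = fields["cpu_processed"] + fields["delta_cpu"]

# Recompute delta against the global processed time and compare with the log.
recomputed_delta = now_on_cpu0 - fields["processed"]

# How far CPU0's processed time lags the global value (and CPUs 1-7).
lag_ns = fields["processed"] - fields["cpu_processed"]

print("recomputed delta:", recomputed_delta)
print("reported delta:  ", fields["delta"])
print("CPU0 lag (s):    ", lag_ns / 1e9)
```

Under these assumptions the recomputed delta matches the reported one, and CPU0's processed time trails the other seven CPUs by 2.69 seconds, which is consistent with the per-CPU list printed after the message.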
Keir Fraser
2008-May-06 15:30 UTC
Re: [Xen-devel] Time went backwards errors booting unstable cs 17557 with 64GB memory
You'll probably have to binary chop to find the offending changeset. All sorts of stuff has gone in since c/s 17318.

 -- Keir
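The binary chop Keir suggests can be sketched as a plain bisection between the last known-good changeset (17318) and the first known-bad one (17557). This is only an illustration under stated assumptions: the function `first_bad` and the `is_bad` callback are hypothetical, `is_bad` stands in for "build and boot this changeset with 64gb and look for timer errors", and the sketch assumes that every changeset after the offending one is also bad and that the numbers can be treated as a linear sequence.

```python
def first_bad(good, bad, is_bad):
    """Bisect between a known-good and a known-bad changeset number.

    Returns (culprit, tests_run), where culprit is the lowest changeset
    in (good, bad] for which is_bad() returns True. Assumes that once a
    changeset is bad, every later one is bad as well.
    """
    tests_run = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests_run += 1
        if is_bad(mid):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return bad, tests_run


# Hypothetical stand-in for a real build-and-boot test: pretend the
# regression was introduced in changeset 17400.
culprit, tests_run = first_bad(17318, 17557, lambda cs: cs >= 17400)
print(culprit, tests_run)
```

With 239 changesets between the two endpoints, a bisection of this kind needs at most 8 build-and-boot cycles rather than one per changeset.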
Bill Burns
2008-May-06 15:40 UTC
Re: [Xen-devel] Time went backwards errors booting unstable cs 17557 with 64GB memory
Is the system using the pm timer? You may try clocksource=pit, or keeping more memory in the hypervisor (via xenheap or dom0_mem) as workarounds.

Bill
Carb, Brian A
2008-May-06 17:21 UTC
RE: [Xen-devel] Time went backwards errors booting unstable cs 17557 with 64GB memory
Setting clocksource=pit removes the errors and allows a clean boot, as does setting dom0_mem=4096M. What are the implications of running with a different clocksource?

FYI setting xenheap_megabytes=64 by itself does not eliminate the timer errors.

brian carb
unisys corporation - malvern, pa
Bill Burns
2008-May-06 17:31 UTC
Re: [Xen-devel] Time went backwards errors booting unstable cs 17557 with 64GB memory
Carb, Brian A wrote:
> Setting clocksource=pit removes the errors and allows a clean boot, as
> does setting dom0_mem=4096M. What are the implications of running with
> a different clocksource?

Good question. What appears to be happening is that the initial calibration is bad. If dom0 is started before a second calibration occurs, it gets the speed of CPU0 wrong and bad things happen. Delaying the start of dom0 by giving the hypervisor more memory to scrub allows for a second calibration.

Why PIT works I really don't know. The suspicion is that somehow the calibration code has an issue when there is a long outage, as in starting up a bunch of other CPUs, before the first calibration. But that is just a guess at this point. There is a bug there that needs to be fixed.

Bill