Andrew Cooper
2013-Aug-27 13:02 UTC
Request fairly urgent reversion of c/s 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e
Changeset 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e is my "Clean stacks in debug builds patch".

I applied the upstream version to XenServer, and it resulted in some spectacular failures to boot.

Curiously, the buggy V1 "$(STACK_SIZE >> 8)" works fine on all hardware, but the apparently correct "$(STACK_SIZE / 8)" causes complete deadlock on AMD boxes on AP bringup, and causes Intel boxes to only boot a single pcpu.

I have some meetings now, so can't investigate immediately.

Please could you revert until I have had time to investigate and fix.

Thanks,

~Andrew
Jan Beulich
2013-Aug-27 13:16 UTC
Re: Request fairly urgent reversion of c/s 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e
>>> On 27.08.13 at 15:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Changeset 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e is my "Clean stacks
> in debug builds patch"
>
> I applied the upstream version to XenServer, and resulted in some
> spectacular fails to boot.
>
> Curiously, the buggy V1 "$(STACK_SIZE >> 8) " works fine on all
> hardware, but the apparently correct "$(STACK_SIZE / 8)" causes complete
> deadlock on AMD boxes on AP bringup, and causes Intel boxes to only boot
> a single pcpu.

That original version was broken in the assembly version only iirc,
which should not get used in AP bring-up. Hence I'm puzzled.

> I have some meetings now, so cant investigate immediately.
>
> Please could you revert until I have had a time to investigate and fix.

Reverted.

Jan
Andrew Cooper
2013-Aug-27 14:06 UTC
Re: Request fairly urgent reversion of c/s 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e
On 27/08/13 14:16, Jan Beulich wrote:
>>>> On 27.08.13 at 15:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Changeset 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e is my "Clean stacks
>> in debug builds patch"
>>
>> I applied the upstream version to XenServer, and resulted in some
>> spectacular fails to boot.
>>
>> Curiously, the buggy V1 "$(STACK_SIZE >> 8) " works fine on all
>> hardware, but the apparently correct "$(STACK_SIZE / 8)" causes complete
>> deadlock on AMD boxes on AP bringup, and causes Intel boxes to only boot
>> a single pcpu.
> That original version was broken in the assembly version only iirc,
> which should not get used in AP bring-up. Hence I'm puzzled.

Me too. Literally changing the "$(STACK_SIZE / 8)" to "$(STACK_SIZE >> 8)" is sufficient to prevent the problem.

I am investigating now.

~Andrew
Ian Campbell
2013-Aug-27 14:44 UTC
Re: Request fairly urgent reversion of c/s 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e
On Tue, 2013-08-27 at 15:06 +0100, Andrew Cooper wrote:
> On 27/08/13 14:16, Jan Beulich wrote:
> >>>> On 27.08.13 at 15:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> Changeset 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e is my "Clean stacks
> >> in debug builds patch"
> >>
> >> I applied the upstream version to XenServer, and resulted in some
> >> spectacular fails to boot.
> >>
> >> Curiously, the buggy V1 "$(STACK_SIZE >> 8) " works fine on all
> >> hardware, but the apparently correct "$(STACK_SIZE / 8)" causes complete
> >> deadlock on AMD boxes on AP bringup, and causes Intel boxes to only boot
> >> a single pcpu.
> > That original version was broken in the assembly version only iirc,
> > which should not get used in AP bring-up. Hence I'm puzzled.
>
> Me too. Literally changing the "$(STACK_SIZE / 8)" to "$(STACK_SIZE >>
> 8)" is sufficient to prevent the problem.

"/ 8" is not the same as ">> 8". ">> 8" is "/ 256". ">> 3" is "/ 8".

(I'm lacking the context to know if this is where you went wrong though)

Ian.
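
The arithmetic in Ian's reply can be checked with a short sketch. The value of `STACK_SIZE` below is an assumption chosen for illustration (the thread does not state it), and `iterations_for` is a hypothetical helper name, not something from the Xen source.

```python
# Assumed stack size in bytes, for illustration only; not stated in the thread.
STACK_SIZE = 32768


def iterations_for(stack_size, shift):
    """Hypothetical helper: iteration count produced by `stack_size >> shift`."""
    return stack_size >> shift


# A right shift by n divides by 2**n, so ">> 8" is "/ 256" and ">> 3" is "/ 8".
v1_count = iterations_for(STACK_SIZE, 8)   # what the V1 expression computes
v2_count = iterations_for(STACK_SIZE, 3)   # same value as STACK_SIZE / 8

print("STACK_SIZE >> 8 =", v1_count, "(same as // 256:", STACK_SIZE // 256, ")")
print("STACK_SIZE >> 3 =", v2_count, "(same as // 8:  ", STACK_SIZE // 8, ")")
```

With any power-of-two stack size, the two counts differ by a factor of 32, which is the gap between the V1 and corrected expressions.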
Andrew Cooper
2013-Aug-27 14:53 UTC
Re: Request fairly urgent reversion of c/s 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e
On 27/08/13 15:44, Ian Campbell wrote:
> On Tue, 2013-08-27 at 15:06 +0100, Andrew Cooper wrote:
>> On 27/08/13 14:16, Jan Beulich wrote:
>>>>>> On 27.08.13 at 15:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> Changeset 8a3c4acc9907cfec9aae9f1bc251fbf50af6828e is my "Clean stacks
>>>> in debug builds patch"
>>>>
>>>> I applied the upstream version to XenServer, and resulted in some
>>>> spectacular fails to boot.
>>>>
>>>> Curiously, the buggy V1 "$(STACK_SIZE >> 8) " works fine on all
>>>> hardware, but the apparently correct "$(STACK_SIZE / 8)" causes complete
>>>> deadlock on AMD boxes on AP bringup, and causes Intel boxes to only boot
>>>> a single pcpu.
>>> That original version was broken in the assembly version only iirc,
>>> which should not get used in AP bring-up. Hence I'm puzzled.
>> Me too. Literally changing the "$(STACK_SIZE / 8)" to "$(STACK_SIZE >>
>> 8)" is sufficient to prevent the problem.
> "/ 8" is not the same as ">> 8". ">> 8" is "/ 256". ">> 3" is "/ 8".
>
> (I'm lacking the context to know if this is where you went wrong though)
>
> Ian.
>
>

Indeed.

I was wanting to zero STACK_SIZE bytes using `rep stosq`.

In V1, I accidentally miscalculated ecx for the `rep` as $(STACK_SIZE >> 8), which cleared too little. Jan spotted the mistake and corrected it to $(STACK_SIZE / 8), which cleared the intended number of bytes.

Unfortunately, this leads to some breakage. I had thoroughly tested my incorrect-but-safe v1, but only tokenly tested v2 on my Intel 2cpu box, where I failed to notice the lack of the second CPU.

The lack of APs is rather more noticeable on larger servers, and the fact it locks up all AMD boxes completely is another indication of suboptimal behaviour.

I am currently neck deep in the 16bit trampoline code attempting to work out why it is executing twice for each AP :)

~Andrew
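
To make the byte counts concrete, here is a rough Python model of what `rep stosq` does with each count. It is only a sketch under stated assumptions: `STACK_SIZE` is an assumed value, `rep_stosq` and `cleared_bytes` are hypothetical names, the direction flag is taken to be clear, and nothing here models the AP bring-up failure itself.

```python
QWORD = 8            # bytes stored per stosq iteration
STACK_SIZE = 32768   # assumed value for illustration; not stated in the thread


def rep_stosq(memory, start, count, value=0):
    """Model `rep stosq`: store `count` 8-byte values, ascending from `start`."""
    pattern = value.to_bytes(QWORD, "little")
    for i in range(count):
        offset = start + i * QWORD
        memory[offset:offset + QWORD] = pattern
    return count * QWORD  # total bytes written


def cleared_bytes(count):
    """Fill a fake stack with 0xcc, run the model, and count the zeroed bytes."""
    stack = bytearray(b"\xcc" * STACK_SIZE)
    rep_stosq(stack, 0, count)
    return stack.count(0)


print("V1 (STACK_SIZE >> 8):", cleared_bytes(STACK_SIZE >> 8), "bytes cleared")
print("V2 (STACK_SIZE / 8): ", cleared_bytes(STACK_SIZE // 8), "bytes cleared")
```

Under these assumptions, the V1 count zeroes only 1/32 of the stack, while the corrected count zeroes all of it. That would be consistent with V1 being "incorrect-but-safe": it leaves most of the stack contents untouched.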