Is it really intended that the stack size, specifically bumped to 4 pages in include/asm-x86/config.h for x86-64 when debugging, gets shrunk to a single page in memguard_guard_stack()? To me it would seem much more reasonable if only the first page (or at most the first two pages) were used as a guard page here.

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 28 Apr 2006, at 17:03, Jan Beulich wrote:

> Is it really intended that the stack size, specifically bumped to 4
> pages in include/asm-x86/config.h for x86-64 when debugging, gets
> shrunk to a single page in memguard_guard_stack()? To me it would seem
> much more reasonable if only the first page (or at most the first two
> pages) were used as a guard page here.

The only reason the stack is 4 pages on a 64-bit debug build is that we put syscall-entry trampolines at the start of the per-CPU stack area. Simply allocating 2 pages for a debug stack and then removing the mapping of the first page, as we do on i386, therefore does not work -- that would unmap the trampolines! Instead we allocate 4 pages (the next power of two) and zap the middle two page mappings. This has the desirable effect of placing the guard between the trampolines and the actual stack (otherwise the trampolines would get overwritten before the guard page gets trodden on!).

It should never be possible for Xen to overflow 4kB of stack. Very little is done in interrupt context, so we don't have the overflow problems that Linux has suffered.

 -- Keir
Hmm, but I found this precisely because we saw a double fault due to a stack overflow. Admittedly this was in the context of one of those IPI storms during shutdown that were fixed previously, but even that shouldn't result in a stack overflow, should it?

Jan

>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 04/28/06 6:30 PM >>>
[...]
It should never be possible for Xen to overflow 4kB of stack. Very little is done in interrupt context, so we don't have the overflow problems that Linux has suffered.
 -- Keir
On 1 May 2006, at 12:36, Jan Beulich wrote:

> Hmm, but I found this precisely because we saw a double fault due to a
> stack overflow. Admittedly this was in the context of one of those IPI
> storms during shutdown that were fixed previously, but even that
> shouldn't result in a stack overflow, should it?

How many CPUs were in the system? That code path was rather dodgy: the function forcibly enabled interrupts, so a single CPU could nest in that function up to (NR_CPUS-1) times, which, if you had say 32 CPUs in the system, could certainly cause problems. I don't think it's indicative of a wider problem in Xen -- for most interrupts (ones bound to a guest) we don't even re-enable interrupt delivery while handling them, so nested ISRs in Xen are impossible.

I would like to understand exactly what happened in the context of your IPI storm (it *is* the machine restart bug we're talking about, right?) -- if you had much fewer than 32 CPUs then I need to check exactly how much stack an invocation of machine_restart() uses.

 -- Keir
On 1 May 2006, at 15:09, Keir Fraser wrote:

> How many CPUs were in the system? That code path was rather dodgy: the
> function forcibly enabled interrupts, so a single CPU could nest in
> that function up to (NR_CPUS-1) times, which, if you had say 32 CPUs
> in the system, could certainly cause problems. [...]
>
> I would like to understand exactly what happened in the context of
> your IPI storm (it *is* the machine restart bug we're talking about,
> right?) -- if you had much fewer than 32 CPUs then I need to check
> exactly how much stack an invocation of machine_restart() uses.

Actually the behaviour is worse than I thought -- it was possible to build up stack frames unboundedly (think: CPU1 executes machine_restart() and IPIs CPU2; CPU2 executes machine_restart() and IPIs CPU1; and so on). It was essentially a race to see whether CPU0 could smp_send_stop() and quiesce the other CPUs before they blew up their stacks. :-)

Case closed.

 -- Keir
>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 01.05.06 16:09 >>>
>
> On 1 May 2006, at 12:36, Jan Beulich wrote:
>
> [...]
>
> How many CPUs were in the system? That code path was rather dodgy: the
> function forcibly enabled interrupts, so a single CPU could nest in
> that function up to (NR_CPUS-1) times, which, if you had say 32 CPUs
> in the system, could certainly cause problems. I don't think it's
> indicative of a wider problem in Xen -- for most interrupts (ones
> bound to a guest) we don't even re-enable interrupt delivery while
> handling them, so nested ISRs in Xen are impossible.

At least 16 (both stack traces we got were on CPU 15).

> I would like to understand exactly what happened in the context of
> your IPI storm (it *is* the machine restart bug we're talking about,
> right?)

Yes.

> -- if you had much fewer than 32 CPUs then I need to check exactly how
> much stack an invocation of machine_restart() uses.

A single nesting level consumed, according to the stack dump, up to 352 bytes.

Jan