Grzegorz Milos
2006-Nov-15 19:39 UTC
[Xen-devel] [PATCH] Mini-OS update of events initialisation
This patch moves initialisation of events (masking event channels) earlier during the boot process. Otherwise 64bit guests would sometimes crash. Signed-off-by: Grzegorz Milos <gm281@cam.ac.uk> Admittedly, I cannot explain the behavior I observed fully. This patch seems to fix it well, but I''d be grateful if someone on the list could explain it. The problem only appears in 64 bit guest, and only if MiniOS keeps accessing the console ring page (which it does, as all printks write to the console, even before events are initialised. Of course without sending notifications). What happens is that after a while, usually in memory initialisation stage, but at fairly random place MiniOS gets "restarted" i.e. start_kernel is called again. This leads to a crash as soon as MiniOS tries to remap shared info page. It looks like if Xen or xend decided that MiniOS domain should have masked the event channels before writing to the console page, and reboot it ... Any ideas? Thanks Gregor _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Nov-16 07:45 UTC
Re: [Xen-devel] [PATCH] Mini-OS update of events initialisation
On 15/11/06 7:39 pm, "Grzegorz Milos" <gm281@cam.ac.uk> wrote:> What happens is that after a while, usually in memory initialisation > stage, but at fairly random place MiniOS gets "restarted" i.e. > start_kernel is called again. This leads to a crash as soon as MiniOS > tries to remap shared info page. It looks like if Xen or xend decided > that MiniOS domain should have masked the event channels before writing > to the console page, and reboot it ...If you enable event delivery before registering a callback handler then your domain will crash on first event delivery. But then you would be restarted in a fresh domain with a new shared_info page; wou wouldn''t jump back to the entry code of the existing domain. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Melvin Anderson
2006-Nov-17 13:53 UTC
Re: [Xen-devel] [PATCH] Mini-OS update of events initialisation
We have been seeing the same problem with a 32bit Mini-OS guest. I suspect the problem is in the initialisation order. Looking at start_kernel() in kernel.c: arch_init(si); trap_init(); /* ENABLE EVENT DELIVERY. This is disabled at start of day. */ __sti(); <code omitted...> /* Set up events. */ init_events(); The function arch_init() registers hypervisor_callback, and __sti() enables events to be delivered. Entry point hypervisor_callback is in the assembly code in x86_32.S which calls do_hypervisor_callback() in hypervisor.c, which in turn calls do_event() in events.c. Suppose a callback occurs between the calls of __sti() and init_events(). The function do_event() calls the handler indirectly via the array ev_actions. But ev_actions is initialised in init_events(), so if do_event() is called too early, the function pointer "handler" in ev_actions will still be 0 (default for static storage). We start again at virtual address 0, which is the entry point of Mini-OS, but %esi will not now point to the start_info page. I think this explains why Mini-OS sometimes gets "restarted", and when it does the start_info page seems to be garbage. I am not convinced that Grzegorz'' patch closes the window of opportunity for a misplaced callback, but it does reduce it. Shouldn''t __sti() be after init_events(), not before it? Thanks are due to Micha Moffie, who came up with key insights. Regards, Melvin Anderson. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Grzegorz Milos
2006-Nov-17 15:29 UTC
Re: [Xen-devel] [PATCH] Mini-OS update of events initialisation
> We have been seeing the same problem with a 32bit Mini-OS guest. > > I suspect the problem is in the initialisation order. Looking at > start_kernel() in kernel.c: > > arch_init(si); > > trap_init(); > > /* ENABLE EVENT DELIVERY. This is disabled at start of day. */ > __sti(); > > <code omitted...> > > /* Set up events. */ > init_events(); > > The function arch_init() registers hypervisor_callback, and __sti() > enables events to be delivered. Entry point hypervisor_callback is in > the assembly code in x86_32.S which calls do_hypervisor_callback() in > hypervisor.c, which in turn calls do_event() in events.c.That''s all correct so far.> Suppose a callback occurs between the calls of __sti() and > init_events(). The function do_event() calls the handler indirectly via > the array ev_actions. But ev_actions is initialised in init_events(), > so if do_event() is called too early, the function pointer "handler" in > ev_actions will still be 0 (default for static storage). We start again > at virtual address 0, which is the entry point of Mini-OS, but %esi will > not now point to the start_info page. I think this explains why Mini-OS > sometimes gets "restarted", and when it does the start_info page seems > to be garbage.This seems to explain the "restart". However, my understanding was that an event port has to be explicitly unmasked in order for any event to be delivered (i.e. that all event ports are masked by default). Unmasking is done by bind_evtchn() or bind_virq() in event.s. All calls to these functions happen after init_events() (in init_time(), init_console() and init_xenbus()). I guess the best way to verify if your scenario is correct is to put ''printk'' in do_events(). Could you try this Melvin?> I am not convinced that Grzegorz'' patch closes the window of opportunity > for a misplaced callback, but it does reduce it. Shouldn''t __sti() be > after init_events(), not before it?That certainly cannot hurt. I''m attaching a patch. Keir could you apply? Patch description: Events enabled after init_events() call. This guarantees that internal datastructures are set up correctly. Problem spotted/explained by Melvin Anderson and Micha Moffie. Signed-off-by: Grzegorz Milos <gm281@cam.ac.uk>> > Thanks are due to Micha Moffie, who came up with key insights._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Melvin Anderson
2006-Nov-17 17:31 UTC
Re: [Xen-devel] [PATCH] Mini-OS update of events initialisation
On Fri, 2006-11-17 at 15:29 +0000, Grzegorz Milos wrote:> > Suppose a callback occurs between the calls of __sti() and > > init_events(). The function do_event() calls the handler indirectly via > > the array ev_actions. But ev_actions is initialised in init_events(), > > so if do_event() is called too early, the function pointer "handler" in > > ev_actions will still be 0 (default for static storage). We start again > > at virtual address 0, which is the entry point of Mini-OS, but %esi will > > not now point to the start_info page. I think this explains why Mini-OS > > sometimes gets "restarted", and when it does the start_info page seems > > to be garbage. > > This seems to explain the "restart". However, my understanding was that > an event port has to be explicitly unmasked in order for any event to > be delivered (i.e. that all event ports are masked by default). > Unmasking is done by bind_evtchn() or bind_virq() in event.s. All calls > to these functions happen after init_events() (in init_time(), > init_console() and init_xenbus()). I guess the best way to verify if > your scenario is correct is to put ''printk'' in do_events(). Could you > try this Melvin?Very good question. I have just tried calling a modified version of init_events() to set up the default_handler before __sti() is called, but my modified init_events() did not call mask_evtchn(). I find that I occasionally see an event on port 2. Looking at the debug output from xend (I ran xend with XEND_DEBUG=1 set in the environment), I see that this is the console event port. I have just looked at the domain builder code in libxc, tools/libxc/xc_linux_build.c, function setup_guest(). I see that this initialises the shared_info page with: memset(shared_info, 0, PAGE_SIZE); /* Mask all upcalls... */ for ( i = 0; i < MAX_VIRT_CPUS; i++ ) shared_info->vcpu_info[i].evtchn_upcall_mask = 1; So, the evtchn_mask array is cleared (unmasked) by the memset() call, but the per-cpu evtchn_upcall_mask is set to hold off callbacks until we are properly initialised. Function __sti() clears evtchn_upcall_mask, and as evtchn_mask array is cleared by default, that explains why the upcall on port 2 is not masked. Regards, Melvin Anderson. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel