thr3ads.net - Xen devel - [Xen-devel] Different esps [Jan 2004]

Jacob Gorm Hansen

2004-Jan-20 11:11 UTC

[Xen-devel] Different esps

hi,

I am trying to make sense of the various stack-pointers within Xen. My
problem is that my newly created domain gets killed, apparently because
Xen cannot write the stack when returning to the domain. I am not sure
which stack pointer Xen is using for the faulting write though.

I am looking at two different esp values, the ones printed by the printf
below:

execution_context_t ctxt;
memcpy(&ctxt, get_execution_context(),sizeof(execution_context_t));

printk("esp1 %08lx %08lx\n",ctxt.esp, current->thread.esp1);

What is the difference between these two stacks, and what is the
recommended way of reading and setting their values from within the
unprivileged domain?

thanks,
Jacob



-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel

Jacob Gorm Hansen

2004-Jan-20 11:29 UTC

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> execution_context_t ctxt;
> memcpy(&ctxt, get_execution_context(),sizeof(execution_context_t));
> 
> printk("esp1 %08lx %08lx\n",ctxt.esp, current->thread.esp1);
Just to add a little more info;

My problem seems to be that the value of ctxt.esp as read from inside
Xen differs, by about 0x4000 bytes in the negative direction, both from
what I set at domain creation time and from the esp I read from within
the domain.

I have tried setting the value of esp1 with the stack_switch hypercall,
but apparently this is not the one causing the Xen page fault.

best regards,
Jacob

Steven Hand

2004-Jan-20 12:25 UTC

Re: [Xen-devel] Different esps

> On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> 
> > execution_context_t ctxt;
> > memcpy(&ctxt, get_execution_context(),sizeof(execution_context_t));
> > 
> > printk("esp1 %08lx %08lx\n",ctxt.esp, current->thread.esp1);
So current->thread.esp1 is the stack pointer for ring 1 for the
current domain (aka the 'kernel' stack pointer for xenolinux).

The equivalent 'stored' version of this is the 'ring1_esp' field in
a full_execution_context_t.

ctxt is an 'execution_context_t' which holds the user level (ring 3)
stack pointer (and other registers).
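A minimal sketch of where the two stack pointers end up when a domain's state is saved. The structures below are hypothetical, heavily simplified stand-ins for execution_context_t and full_execution_context_t (the real Xen 1.x definitions have many more fields), and save_full_context() and print_stacks() are illustrative helpers, not Xen functions. The example values are the ones printed for dom 41 in the log later in this thread.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins; not the real Xen headers. */
typedef struct {
    unsigned long eip;
    unsigned long esp;     /* stack pointer saved in the register frame */
    unsigned long eflags;
} execution_context_t;

typedef struct {
    execution_context_t cpu_ctxt;  /* general registers, as above */
    unsigned long ring1_ss;        /* "kernel" (ring 1) stack segment */
    unsigned long ring1_esp;       /* "kernel" (ring 1) stack pointer */
} full_execution_context_t;

/* Combine the saved register frame with the per-domain ring-1 stack,
 * i.e. what current->thread.ss1/esp1 hold while the domain exists. */
static full_execution_context_t
save_full_context(const execution_context_t *regs,
                  unsigned long ss1, unsigned long esp1)
{
    full_execution_context_t full;
    full.cpu_ctxt = *regs;
    full.ring1_ss = ss1;
    full.ring1_esp = esp1;
    return full;
}

static void print_stacks(const full_execution_context_t *f)
{
    printf("frame esp %08lx, ring1 ss:esp %04lx:%08lx\n",
           f->cpu_ctxt.esp, f->ring1_ss, f->ring1_esp);
}
```

With the dom 41 values, the saved frame esp lies 0x128 bytes below the ring-1 stack top held in esp1.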
> I have tried setting the value of esp1 with the stack_switch hypercall,
> but apparently this is not the one causing the Xen page fault.
The stack_switch hypercall is typically used by a GuestOS to tell
Xen what its "kernel" (ring 1) stack (segment + sp) is. It's a bit
like a 'virtual TSS' for each domain (since the ss/sp for ring 1 are
updated from this on each domain schedule). For its use in XenoLinux
see ./xenolinux-2.4.24-sparse/arch/xeno/kernel/process.c:__switch_to()
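The 'virtual TSS' idea can be modelled in a few lines. This is a hypothetical sketch: struct vtss, model_stack_switch() and model_schedule() are illustrative names, not Xen's. It shows the two halves described above: the guest records a ring-1 ss/esp (as XenoLinux does from __switch_to() on each process switch), and the hypervisor copies the saved pair into the real per-CPU TSS whenever it schedules that domain.

```c
/* Hypothetical model of per-domain ring-1 stack bookkeeping. */
struct vtss { unsigned long ss1, esp1; };

struct domain {
    int id;
    struct vtss vtss;               /* last values given to stack_switch */
};

static struct vtss hw_tss;          /* stands in for the per-CPU TSS */
static struct domain *running;      /* domain currently on the CPU */

/* Guest-side request, e.g. issued from __switch_to(). */
static void model_stack_switch(struct domain *d,
                               unsigned long ss, unsigned long esp)
{
    d->vtss.ss1 = ss;
    d->vtss.esp1 = esp;
    if (d == running)               /* takes effect immediately */
        hw_tss = d->vtss;
}

/* Hypervisor-side domain switch: reload the ring-1 stack. */
static void model_schedule(struct domain *next)
{
    running = next;
    hw_tss = next->vtss;
}
```

The design point the model captures is that each domain only ever writes its own saved copy; the single hardware TSS is rewritten from that copy on every domain switch.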

The initial ring 1 ss/sp come from the full_execution_context_t in 
the builddomain_t (see ./xen/common/domain.c:final_setup_guestos()) 


Anyway, can you post -

  a) what it is you're trying to do in detail (I'm guessing it's
     to do with migration but not sure what stage you're at) and 

  b) the console output leading up to your crash (xen and XL output
     if possible/relevant). 


You might also like to look at ./tools/xc/lib/xc_linux_{save,restore}.c 
to see how it works/ed in our version. 



cheers, 

S.

Jacob Gorm Hansen

2004-Jan-20 13:37 UTC

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 13:25, Steven Hand wrote:
> > On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> 
> Anyway, can you post -
> 
>   a) what it is you're trying to do in detail (I'm guessing it's
>      to do with migration but not sure what stage you're at) and 
I have migrated the domain pages to a new domain, and am trying to get
it to resume after migration. I am currently crashing sometime shortly
after resumption, and I have instrumented Xen to dump some info about
the state of the domain.

Since I am reading all info from inside the old domain 'by hand', this
is probably a case of some CPU state I have not managed to get across. 

I am quite sure the crash is due to a wrong stack page being pointed to,
but since my recovery code is running in ring1 in a __cli() context, I
suppose the ring3 stack cannot be to blame.

best,
Jacob

Jacob Gorm Hansen

2004-Jan-20 15:40 UTC

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 15:42, Jacob Gorm Hansen wrote:
> On Tue, 2004-01-20 at 13:25, Steven Hand wrote:
> > > On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> > 
> > Anyway, can you post -
> > 
> >   a) what it is you're trying to do in detail (I'm guessing it's
> >      to do with migration but not sure what stage you're at) and

This is my recovery function, which eip points to when the new domain is
started:

static void recover(void)
{
    __cli();
    HYPERVISOR_stack_switch(__KERNEL_DS, current->thread.esp0);
    asm volatile("addl $0x0, -4(%%eax)" :
:"eax"(current->thread.esp0));
    while(1) HYPERVISOR_console_write("alive",5);

    ...

I touch the ring1 stack to make sure it is writeable (due to my
migration hacks it may not always be).

The output on the serial looks like below. The first four lines are
printed by my version of Xen as a response to SCHEDOP_exit:

exit dom 41 : esp1 c3a96000, ss 00000821
eip c00b5b6d esp c3a95ed8  eflags 296
Killing domain 41
Releasing task 41
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
DOM42: alive
fault_in_hypervisor 2
dom 42 : esp1 c3a96000, ss 00000821
eip 00000000 esp c3a92004 eip 00000000  pf-addr c3a91ff8 eflags 10286
Killing domain 42
Releasing task 42

As you can see, the crash is not happening in direct response to some
action in the domain, but rather as an effect of something happening
outside. I was speculating that perhaps I need to re-register for the
timer interrupt, or that the __cli() does not prevent Xen from trying to
deliver them?

Btw, the value of 'current' checks out, and is equal to the value before
migration.

best,
Jacob

Steven Hand

2004-Jan-20 17:08 UTC

Re: [Xen-devel] Different esps

> On Tue, 2004-01-20 at 15:42, Jacob Gorm Hansen wrote:
> > On Tue, 2004-01-20 at 13:25, Steven Hand wrote:
> > > > On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> > > 
> > > Anyway, can you post -
> > > 
> > >   a) what it is you're trying to do in detail (I'm guessing it's
> > >      to do with migration but not sure what stage you're at) and
> 
> 
> This is my recovery function, which eip points to when the new domain is
> started:
> 
> static void recover(void)
> {
>     __cli();
>     HYPERVISOR_stack_switch(__KERNEL_DS, current->thread.esp0);
>     asm volatile("addl $0x0, -4(%%eax)" : :"eax"(current->thread.esp0));
>     while(1) HYPERVISOR_console_write("alive",5);
> 
>     ...
> 
> I touch the ring1 stack to make sure it is writeable (due to my
> migration hacks it may not always be).
> 
> The output on the serial looks like below. The first four lines are
> printed by my version of Xen as a response to SCHEDOP_exit:
> 
> exit dom 41 : esp1 c3a96000, ss 00000821
> eip c00b5b6d esp c3a95ed8  eflags 296
> Killing domain 41
> Releasing task 41
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> DOM42: alive
> fault_in_hypervisor 2
> dom 42 : esp1 c3a96000, ss 00000821
> eip 00000000 esp c3a92004 eip 00000000  pf-addr c3a91ff8 eflags 10286
> Killing domain 42
> Releasing task 42
> 
> As you can see, the crash is not happening in direct response to some
> action in the domain, but rather as an effect of something happening
> outside. I was speculating that perhaps I need to re-register for the
> timer interrupt, or that the __cli() does not prevent Xen from trying to
> deliver them?
The __cli() should certainly prevent any events from being delivered. 
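One distinction worth keeping in mind here is that masking events does not mask exceptions. The sketch below is a hypothetical model, assuming (as in XenoLinux) that __cli() does not execute a real cli but only clears an "events enabled" flag that Xen checks before making an event upcall. The names struct shared_flags, deliver() and model_cli() are illustrative only. Synchronous exceptions such as page faults are caused by the instruction being executed, so the flag has no effect on them.

```c
#include <stdbool.h>

/* Hypothetical model; names and layout are illustrative. */
struct shared_flags {
    bool events_enabled;     /* cleared by __cli(), set by __sti() */
    unsigned pending;        /* events held back while masked */
};

enum outcome { UPCALL_NOW, HELD_PENDING, EXCEPTION_RAISED };

/* Asynchronous events (timer, console, ...) honour the flag;
 * synchronous exceptions are raised regardless of it. */
static enum outcome deliver(struct shared_flags *s, bool is_exception)
{
    if (is_exception)
        return EXCEPTION_RAISED;
    if (!s->events_enabled) {
        s->pending++;
        return HELD_PENDING;
    }
    return UPCALL_NOW;
}

static void model_cli(struct shared_flags *s) { s->events_enabled = false; }
```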

It's tricky to work out what the above means as you've clearly hacked
Xen to e.g print "fault_in_hypervisor 2" (is 2 the error_code? is 
there any reason you've added this stuff in place of the regular code 
in do_trap() or do_page_fault()?). Can you post the code/diffs for 
these parts of xen? 

What is at 0xc3a91ff8? Have you actually taken a page fault? I note 
that zero eips are not so good -- but OTOH I don't know what those
values you print actually are [partic given there are two eips]... 
plus if we're really multiply faulting in the hypervisor, all bets
may well be off... 

cheers, 

S.

Jacob Gorm Hansen

2004-Jan-20 17:22 UTC

head link

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 18:08, Steven Hand wrote:
> The __cli() should certainly prevent any events from being delivered. 
> 
> It's tricky to work out what the above means as you've clearly hacked
> Xen to e.g print "fault_in_hypervisor 2" (is 2 the error_code? is
> there any reason you've added this stuff in place of the regular code
> in do_trap() or do_page_fault()?). Can you post the code/diffs for 
> these parts of xen? 
I added this because otherwise Xen will just kill the domain silently,
unless there is a debugging option I have overlooked.

about line 329 in traps.c:

  fault_in_hypervisor:
+       printk("fault_in_hypervisor 2\n");
+       printk("dom %d : esp1 %08lx, ss %08lx\n",current->domain,
current->thread.esp1, current->thread.ss1);
+
+       execution_context_t ctxt;
+    memcpy(&ctxt,
+           get_execution_context(), 
+           sizeof(execution_context_t));
+
+       printk("eip %p esp %p eip %p  addr %p flags %x\n", ctxt.eip,
ctxt.esp, gtb->eip, addr, ctxt.eflags);

> 
> What is at 0xc3a91ff8? Have you actually taken a page fault? I note 
> that zero eips are not so good -- but OTOH I don't know what those
> values you print actually are [partic given there are two eips]... 
> plus if we're really multiply faulting in the hypervisor, all bets
> may well be off... 
0xc3a91ff8 appears to be where ctxt.esp is pointing if you push some
values. The dual 0 eips worry me as well, I have not registered any
event-callbacks for the domain, perhaps that is the reason? But should I
not be allowed to run without registering for interrupts in a brand-new
domain? To me this looks like an interrupt-delivery gone bad.
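The numbers in the dom 42 log can be cross-checked with a little arithmetic, assuming 4 KB x86 pages. The faulting address is 12 bytes (three 32-bit pushes) below the printed esp, and those pushes cross from the page at 0xc3a92000 into the next lower page. Measured from esp1, the faulting write is 0x4008 bytes down, which lines up with the "about 0x4000 bytes" discrepancy reported earlier in the thread. This is one reading of the figures, not a diagnosis; page_base() and depth_below() are illustrative helpers.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT32_C(1) << PAGE_SHIFT)   /* 4 KB x86 pages */

/* Values copied from the "dom 42" lines of the log. */
static const uint32_t esp1    = 0xc3a96000u;  /* ring-1 stack top */
static const uint32_t esp     = 0xc3a92004u;  /* printed esp */
static const uint32_t pf_addr = 0xc3a91ff8u;  /* printed pf-addr */

/* Start of the 4 KB page containing a 32-bit linear address. */
static uint32_t page_base(uint32_t addr)
{
    return addr & ~(PAGE_SIZE - 1);
}

/* Bytes between a stack top and an address below it. */
static uint32_t depth_below(uint32_t top, uint32_t addr)
{
    return top - addr;
}
```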

cheers,
Jacob

Keir Fraser

2004-Jan-20 18:19 UTC

head link

Re: [Xen-devel] Different esps

> about line 329 in traps.c:
> 
>   fault_in_hypervisor:
> +       printk("fault_in_hypervisor 2\n");
> +       printk("dom %d : esp1 %08lx, ss
%08lx\n",current->domain, current->thread.esp1,
current->thread.ss1);
> +
> +       execution_context_t ctxt;
> +    memcpy(&ctxt,
> +           get_execution_context(), 
> +           sizeof(execution_context_t));
> +
> +       printk("eip %p esp %p eip %p  addr %p flags %x\n",
ctxt.eip, ctxt.esp, gtb->eip, addr, ctxt.eflags);
The EIP/ESP values you are printing aren't up to date. You should be
printing regs->eip and regs->esp.

Also something like:
 struct pt_regs *guest_regs = ((struct pt_regs *)current->thread.esp1) - 1;
 <print guest_regs->esp, guest_regs->eip>
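Keir's suggestion relies on the guest's saved register frame sitting directly below the top of the ring-1 stack. In C, the cast has to happen before subtracting 1, so that the subtraction steps back a whole struct pt_regs rather than a single byte. The struct below is an illustrative frame modelled on the i386 Linux 2.4 pt_regs layout, and frame_below() is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative frame modelled on i386 Linux 2.4 struct pt_regs. */
struct pt_regs {
    long ebx, ecx, edx, esi, edi, ebp, eax;
    int  xds, xes;
    long orig_eax, eip;
    int  xcs;
    long eflags, esp;
    int  xss;
};

/* Cast first, then step back one whole structure from the stack top.
 * (struct pt_regs *)(stack_top - 1) would only move back one byte. */
static struct pt_regs *frame_below(uintptr_t stack_top)
{
    return ((struct pt_regs *)stack_top) - 1;
}
```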
> 0xc3a91ff8 appears to be where ctxt.esp is pointing if you push some
> values. The dual 0 eips worry me as well, I have not registered any
> event-callbacks for the domain, perhaps that is the reason? But should I
> not be allowed to run without registering for interrupts in a brand-new
> domain? To me this looks like an interrupt-delivery gone bad.
Maybe an exception of some kind? Did you fill in the trap_table
(virtual IDT) in full_execution_context?

 -- Keir

Jacob Gorm Hansen

2004-Jan-20 18:19 UTC

head link

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 18:08, Steven Hand wrote:
> > On Tue, 2004-01-20 at 15:42, Jacob Gorm Hansen wrote:
> > > On Tue, 2004-01-20 at 13:25, Steven Hand wrote:
> > > > > On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> > > > 
> > > > Anyway, can you post -
> > > > 
> > > >   a) what it is you're trying to do in detail (I'm guessing it's
> > > >      to do with migration but not sure what stage you're at) and

Adding the following to my recover() code changed things a bit, probably
more Xen state I need to restore before I can live happily ever after:

    HYPERVISOR_set_trap_table(trap_table);
    HYPERVISOR_set_fast_trap(SYSCALL_VECTOR);

Digging in. While I understand that this may mean I now have various
nice things such as a page fault handler, I am still puzzled how my
infinite loop could crash like that.

Jacob

Keir Fraser

2004-Jan-20 18:24 UTC

head link

Re: [Xen-devel] Different esps

> On Tue, 2004-01-20 at 18:08, Steven Hand wrote:
> > > On Tue, 2004-01-20 at 15:42, Jacob Gorm Hansen wrote:
> > > > On Tue, 2004-01-20 at 13:25, Steven Hand wrote:
> > > > > > On Tue, 2004-01-20 at 12:11, Jacob Gorm Hansen wrote:
> > > > > 
> > > > > Anyway, can you post -
> > > > > 
> > > > >   a) what it is you're trying to do in detail (I'm guessing it's
> > > > >      to do with migration but not sure what stage you're at) and
> 
> 
> Adding the following to my recover() code changed things a bit, probably
> more Xen state I need to restore before I can live happily ever after:
> 
>     HYPERVISOR_set_trap_table(trap_table);
>     HYPERVISOR_set_fast_trap(SYSCALL_VECTOR);
The alternative is to copy this info between the
full_execution_context's of the old and new domains.
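A sketch of that alternative, under loud assumptions: trap_info_t and vidt_state_t below are hypothetical stand-ins loosely modelled on the Xen 1.x trap table and on the slice of full_execution_context_t that holds it, and the field names (trap_ctxt, fast_trap_idx) should be checked against the real hypervisor interface header. The handler address is assumed here to be a guest virtual address under a flat kernel code segment. copy_vidt() carries the virtual IDT and the fast-trap vector over to the new domain's context before that domain is built.

```c
#include <string.h>

#define NR_VECTORS 256

/* Hypothetical trap-table entry; not the real Xen definition. */
typedef struct {
    unsigned char  vector;   /* x86 vector number */
    unsigned char  flags;    /* e.g. lowest privilege allowed to raise it */
    unsigned short cs;       /* handler code segment */
    unsigned long  address;  /* handler offset, assumed a guest virtual
                                address under a flat kernel segment */
} trap_info_t;

/* Hypothetical slice of full_execution_context_t. */
typedef struct {
    trap_info_t   trap_ctxt[NR_VECTORS];  /* the virtual IDT */
    unsigned long fast_trap_idx;          /* vector given to set_fast_trap */
} vidt_state_t;

/* Copy the old domain's virtual IDT into the new domain's context,
 * instead of re-registering it from inside the resumed guest. */
static void copy_vidt(vidt_state_t *dst, const vidt_state_t *src)
{
    memcpy(dst->trap_ctxt, src->trap_ctxt, sizeof dst->trap_ctxt);
    dst->fast_trap_idx = src->fast_trap_idx;
}
```

This only helps if whatever builds the new domain can read the old domain's context in the first place.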
> Digging in. While I understand that this may mean I now have various
> nice things such as a page fault handler, I am still puzzled how my
> infinite loop could crash like that.
Looks weird. Why not instrument Xenolinux's trap handlers to see which
exception you are occasionally taking. It's not hard -- most go thru
do_trap() in arch/xeno/kernel/traps.c. GPFs and page faults go thru
separate specialised functions.
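A minimal sketch of such instrumentation, assuming a hypothetical note_trap() hook called at the top of do_trap() and of the separate GPF (vector 13) and page-fault (vector 14) handlers. It counts how often each exception vector is taken and remembers the last faulting eip, so a rare exception inside an otherwise tight loop becomes visible. printf stands in for printk so the sketch builds as ordinary C.

```c
#include <stdio.h>

#define NR_EXCEPTIONS 32   /* x86 exception vectors 0..31 */

/* Hypothetical per-vector counters. */
static unsigned long trap_count[NR_EXCEPTIONS];
static unsigned long trap_last_eip[NR_EXCEPTIONS];

static void note_trap(int vector, unsigned long eip)
{
    if (vector < 0 || vector >= NR_EXCEPTIONS)
        return;                     /* ignore anything out of range */
    trap_count[vector]++;
    trap_last_eip[vector] = eip;
}

static void dump_traps(void)
{
    for (int v = 0; v < NR_EXCEPTIONS; v++)
        if (trap_count[v] != 0)
            printf("vector %2d: %lu hit(s), last eip %08lx\n",
                   v, trap_count[v], trap_last_eip[v]);
}
```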

 -- keir

Jacob Gorm Hansen

2004-Jan-20 18:29 UTC

head link

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 19:24, Keir Fraser wrote:
> >     HYPERVISOR_set_trap_table(trap_table);
> >     HYPERVISOR_set_fast_trap(SYSCALL_VECTOR);
> 
> The alternative is to copy this info between the
> full_execution_context''s of the old and new domains.
Except that with my setup I do not have access to these, unless there is
a way to map them from within each unprivileged domain.
> Looks weird. Why not instrument Xenolinux's trap handlers to see which
> exception you are occasionally taking. It's not hard -- most go thru
> do_trap() in arch/xeno/kernel/traps.c. GPFs and page faults go thru
> separate specialised functions.
A little more wiggling of things (especially installing the trap
vectors _before_ touching the ring1 stack of current) makes it run
quite a bit further, though still not very far.
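The ordering constraint can be made explicit with a small check. This is a hypothetical sketch: the enum values only label the XenoLinux calls discussed in this thread, and nothing is executed. traps_before_stack() reports whether the trap table is registered before the first access to the ring-1 stack, on the assumption that a fault raised by that access needs a guest handler to go to. The "reordered" sequence is one possible order consistent with the report above, not necessarily the exact one used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps in recover(). */
enum step {
    CLI,             /* __cli() */
    SET_TRAP_TABLE,  /* HYPERVISOR_set_trap_table(trap_table) */
    SET_FAST_TRAP,   /* HYPERVISOR_set_fast_trap(SYSCALL_VECTOR) */
    STACK_SWITCH,    /* HYPERVISOR_stack_switch(__KERNEL_DS, esp0) */
    TOUCH_STACK      /* the "addl $0x0, -4(%eax)" probe */
};

/* True if the trap table is installed before the stack is touched. */
static bool traps_before_stack(const enum step *seq, size_t n)
{
    bool traps = false;
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == SET_TRAP_TABLE)
            traps = true;
        else if (seq[i] == TOUCH_STACK && !traps)
            return false;
    }
    return true;
}

/* recover() as first posted, and one reordered variant. */
static const enum step original[] = { CLI, STACK_SWITCH, TOUCH_STACK };
static const enum step reordered[] =
    { CLI, SET_TRAP_TABLE, SET_FAST_TRAP, STACK_SWITCH, TOUCH_STACK };
```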

thanks,
Jacob

Jacob Gorm Hansen

2004-Jan-22 23:06 UTC

head link

Re: [Xen-devel] Different esps

On Tue, 2004-01-20 at 19:19, Keir Fraser wrote:
> > about line 329 in traps.c:
> > 
> Maybe an exception of some kind? Did you fill in the trap_table
> (virtual IDT) in full_execution_context?
Hi,

I cleaned up my page table remapping code, I am pretty confident it is
correct now (and the errors from Xen are fixed, thanks Steven). However,
I am still having the problem of an exception occurring apparently right
as the domain starts. I have tried copying the trap_table across and
installing it while creating the domain, but this has no effect -- for
instance I see lots of page faults (in ret_from_sys_call (xenolinux)) if
I print them from Xen, but the xenolinux pf handler is never reached.

Do I have to do anything more than just copy the trap_table into the
full-exe-ctxt before domain creation? Are the handler addresses in
virtual coordinates?

I have instrumented the GPF and general trap handlers in Xen, but they
are not called.

Perhaps I should just add a hypercall to make Xen dump the exe-context
in user space?

Jacob
