thr3ads.net - Xen devel - [Xen-devel] E1000 hanging [Jul 2005]


peter bier

2005-Jul-02 10:33 UTC

[Xen-devel] E1000 hanging

I analyzed the e1000 Ethernet driver hanging the system during reboot. I
found that the system works properly when "ms_delay" is replaced by
"ms_delay_irq". Looking into it, I saw that the process goes to sleep with
the first function, while it busy-loops for the approximate time with the
latter. Further analysis showed that the system schedules "xen_idle", which
in turn determines the "next timer".
   
I inserted quite a lot of instrumentation code both in the kernel and in
the Xen hypervisor. I found that the timer that actually gets programmed is
essentially in the infinite future, so the sleep call never wakes up. It
does wake up immediately on any key press, and from what I saw in the
scheduling code of the hypervisor (the DF_BLOCKED flag seems to be cleared
on receipt of an interrupt), this seems to be the case for any interrupt. I
managed to "fix" the problem by adding a timer a tenth of a second in the
future within "do_block" in the hypervisor. The problem seems to be that the
timer has already elapsed between the time the system decides to schedule
xen_idle and the moment it determines the "next timer".
   Admittedly, I did all of this with Xen 2.0.5. I used the 2.0.6 hypervisor
without recompiling the kernel and it showed the same behavior. I tried
xen-unstable yesterday, but the kernel failed to initialize; the last
message I got said that it was initializing the SATA disks. It hung and
showed no devices.

I will try 2.0.6 next week with a completely recompiled kernel. Looking
into the routines "xen_idle", "set_timeout_timer" and
"next_timer_interrupt", I found no changes at first sight, so I do not
expect the behavior to change.
  In the unstable version, the check for local_softirq_pending seems to be
a candidate to fix the problem, because the system seems to be woken up at
the next clock tick. And the check for pending events in do_block AFTER
setting the (now-called) _VCPUF_BLOCKED flag, clearing it again if
necessary, seems to do the job.
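
As an illustration of the ordering described above, here is a minimal C11
sketch of a block request that sets the blocked flag first and only then
checks for pending events, cancelling the block if one is found. It is not
Xen's do_block; struct vcpu_model, try_block and deliver_event are
hypothetical names, and the blocked field merely stands in for the
_VCPUF_BLOCKED flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU state; the field names are illustrative only. */
struct vcpu_model {
    atomic_bool blocked;        /* stands in for the _VCPUF_BLOCKED flag */
    atomic_bool event_pending;  /* set by the event delivery path */
};

/* Event delivery: record the event first, then unblock the VCPU. */
static void deliver_event(struct vcpu_model *v)
{
    assert(v != NULL);
    atomic_store(&v->event_pending, true);
    atomic_store(&v->blocked, false);
}

/*
 * Block request. Returns true if the VCPU should really be descheduled,
 * false if an event was already pending and the block was cancelled.
 */
static bool try_block(struct vcpu_model *v)
{
    assert(v != NULL);

    /* Step 1: announce the intention to block. */
    atomic_store(&v->blocked, true);

    /* Step 2: only now look for events; undo the block if one is pending. */
    if (atomic_load(&v->event_pending)) {
        atomic_store(&v->blocked, false);
        return false;
    }
    return true;
}
```

If the check came before the flag was set, an event arriving between the
two steps would find the VCPU not yet blocked, and the block that follows
could then sleep until some unrelated interrupt arrives.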

Did this problem not show up in "short-term" waits other than those in the
e1000 driver? It occurred there within the e1000_hw_reset routine.

Is it advisable to try the "unstable" version?

Thanks in advance 

      Peter Bier


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2005-Jul-02 15:20 UTC

Re: [Xen-devel] E1000 hanging

On 2 Jul 2005, at 11:33, peter bier wrote:
> Did this problem not show up in "short-term" waits other than those in
> the e1000 driver? It occurred there within the e1000_hw_reset routine.
>
> Is it advisable to try the "unstable" version?
The bug is fixed in 2.0.6 and in the unstable tree -- the problem 
should be gone if you use the 2.0.6 xenlinux kernel. Of course, the 
unstable tree may not work because of other problems. :-)

  -- Keir


Keir Fraser

2005-Jul-02 15:24 UTC

Re: [Xen-devel] E1000 hanging

On 2 Jul 2005, at 11:33, peter bier wrote:
> Admittedly, I did all of this with Xen 2.0.5. I used the 2.0.6 hypervisor
> without recompiling the kernel and it showed the same behavior. I tried
> xen-unstable yesterday, but the kernel failed to initialize; the last
> message I got said that it was initializing the SATA disks. It hung and
> showed no devices.
This is interesting -- probably we are messing up ACPI initialisation 
and mis-routing your SATA controller's IRQ line. Comparing verbose
output from xen+xenlinux to that you get from native linux would be 
interesting.

  -- Keir


Peter Bier

2005-Jul-02 19:41 UTC

[Xen-devel] Re: E1000 hanging

Keir Fraser <Keir.Fraser <at> cl.cam.ac.uk> writes:
>
> On 2 Jul 2005, at 11:33, peter bier wrote:
>
> > Did this problem not show up in "short-term" waits other than those in
> > the e1000 driver? It occurred there within the e1000_hw_reset routine.
> >
> > Is it advisable to try the "unstable" version?
>
> The bug is fixed in 2.0.6 and in the unstable tree -- the problem
> should be gone if you use the 2.0.6 xenlinux kernel. Of course, the
> unstable tree may not work because of other problems.
>
>   -- Keir
>
I'll try the sources of 2.0.6 on Monday or Tuesday in the office.
Could you give a hint where the problem has been fixed? It seems that
it is not within the routines I mentioned in my post above, while they
are completely redesigned in "xen-unstable".
 
Peter  
 


Keir Fraser

2005-Jul-02 22:28 UTC

Re: [Xen-devel] Re: E1000 hanging

On 2 Jul 2005, at 20:41, Peter Bier wrote:
> I'll try the sources of 2.0.6 on Monday or Tuesday in the office.
> Could you give a hint where the problem has been fixed? It seems that
> it is not within the routines I mentioned in my post above, while they
> are completely redesigned in "xen-unstable".
The problem was an overflow in jiffies_to_st() turning a small -ve 
number into a big +ve number, IIRC.

  -- Keir
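
To illustrate the kind of overflow meant here, the following C sketch
converts a tick deadline into an absolute time in nanoseconds using
unsigned arithmetic. It is not the actual jiffies_to_st() source;
naive_deadline_ns, the 32-bit tick counter and the 10 ms tick length
(HZ = 100) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length: 10 ms, i.e. HZ = 100. */
#define NS_PER_TICK 10000000ULL

/*
 * Hypothetical naive conversion of a tick deadline `j` into an absolute
 * time in nanoseconds. The tick counter is modelled as 32 bits wide.
 */
static uint64_t naive_deadline_ns(uint64_t now_ns, uint32_t jiffies_now,
                                  uint32_t j)
{
    /* Unsigned subtraction: wraps to a huge value when j < jiffies_now. */
    uint32_t delta = j - jiffies_now;
    return now_ns + (uint64_t)delta * NS_PER_TICK;
}

/* Small self-check contrasting a future deadline with a just-missed one. */
static void naive_deadline_selfcheck(void)
{
    /* One tick ahead: 10 ms from now, as intended. */
    assert(naive_deadline_ns(0, 1000, 1001) == NS_PER_TICK);
    /* One tick behind: the difference wraps to 2^32 - 1 ticks. */
    assert(naive_deadline_ns(0, 1000, 999) == 4294967295ULL * NS_PER_TICK);
}
```

With a deadline one tick in the past, the tick difference wraps to
4294967295, which at 10 ms per tick is roughly 497 days ahead: for a
short driver delay, effectively never.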


Peter Bier

2005-Jul-07 09:11 UTC

[Xen-devel] Re: E1000 hanging

Keir Fraser <Keir.Fraser <at> cl.cam.ac.uk> writes:
>
> On 2 Jul 2005, at 20:41, Peter Bier wrote:
>
> > I'll try the sources of 2.0.6 on Monday or Tuesday in the office.
> > Could you give a hint where the problem has been fixed? It seems that
> > it is not within the routines I mentioned in my post above, while they
> > are completely redesigned in "xen-unstable".
>
> The problem was an overflow in jiffies_to_st() turning a small -ve
> number into a big +ve number, IIRC.
>
>   -- Keir
>

I tested Xen 2.0.6 and the problem is resolved. When looking into the
sources, I had failed to notice the handling of the special case where
"j-jiffies" is less than 1. Thank you for the support.


   Peter 
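
A guard of the kind mentioned above can be sketched as follows: the tick
difference is treated as a signed value and anything less than 1 is clamped
to 1, so a deadline that has already passed fires on the next tick. This is
an illustration only, not the 2.0.6 source; clamped_deadline_ns, the 32-bit
tick counter and the 10 ms tick length (HZ = 100) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length: 10 ms, i.e. HZ = 100. */
#define NS_PER_TICK 10000000ULL

/* Hypothetical guarded conversion of a tick deadline to nanoseconds. */
static uint64_t clamped_deadline_ns(uint64_t now_ns, uint32_t jiffies_now,
                                    uint32_t j)
{
    /*
     * Interpret the difference as signed so that a deadline slightly in
     * the past shows up as a small negative number. The conversion relies
     * on the usual two's-complement behaviour of mainstream compilers.
     */
    int32_t delta = (int32_t)(j - jiffies_now);

    /* A deadline that is already due (or overdue) fires on the next tick. */
    if (delta < 1)
        delta = 1;

    return now_ns + (uint64_t)delta * NS_PER_TICK;
}

/* Small self-check covering future, past and wrapped deadlines. */
static void clamped_deadline_selfcheck(void)
{
    assert(clamped_deadline_ns(0, 1000, 1005) == 5 * NS_PER_TICK);
    assert(clamped_deadline_ns(0, 1000, 999) == NS_PER_TICK);
    /* Counter wrap-around between now and the deadline is still handled. */
    assert(clamped_deadline_ns(0, UINT32_MAX, 1) == 2 * NS_PER_TICK);
}
```

The cost of the clamp is that a genuinely distant deadline whose difference
no longer fits in the signed range is shortened to one tick, which only
makes the timeout fire early rather than never.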

