Chris Wright

2006-Feb-07 20:47 UTC

[Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

This is against current x86_64 defconfig build:

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <2b>
  TDT                  <31>
  next_to_use          <31>
  next_to_clean        <2b>
buffer_info[next_to_clean]
  time_stamp           <10004d5f2>
  next_to_watch        <2d>
  jiffies              <10004d7ce>
  next_to_watch.status <0>

... repeat until eventually ...

NETDEV WATCHDOG: eth0: transmit timed out

this is on a simple scp to dom0 from an external box.  after a bit the
watchdog resets and ping works, only to repeat itself when I try to scp
again.
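
For anyone decoding the register dump above, here is a small sketch of the arithmetic involved. It is not the driver's code: the helper names are made up for illustration, the ring size of 256 descriptors is an assumption (the dump does not state it), and the values are read as hexadecimal because of digits such as `2b`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: descriptors queued by the driver but not yet
 * consumed by the NIC, i.e. the distance from head (TDH) to tail (TDT)
 * on a circular ring. ring_size is an assumption, not from the dump. */
static uint32_t tx_ring_pending(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
    assert(ring_size != 0 && tdh < ring_size && tdt < ring_size);
    return (tdt + ring_size - tdh) % ring_size;
}

/* Hypothetical helper: ticks elapsed since the oldest uncleaned buffer
 * was time-stamped. Unsigned subtraction stays correct across wrap. */
static uint64_t tx_stall_ticks(uint64_t jiffies_now, uint64_t time_stamp)
{
    return jiffies_now - time_stamp;
}
```

With the values from this report, `tx_ring_pending(0x2b, 0x31, 256)` is 6 and `tx_stall_ticks(0x10004d7ce, 0x10004d5f2)` is 476, so six descriptors were still waiting on the hardware 476 ticks after the oldest one was queued. How long 476 ticks is depends on the kernel's HZ setting, which is not given here.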

thanks,
-chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Adam Wendt

2006-Feb-08 15:06 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

Ah, finally people experiencing the same bug as me!

It is much, much worse for me: as soon as I ping from a domU the network
goes down. It started with the subarch change.

Adam Wendt
IPCoast, Inc.

On Wed, 8 Feb 2006 12:11, Chris Wright <chrisw@sous-sol.org> sent:
>* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
>> Yep, this is the bug I warned y'all about at the summit, but you asked
>> for the code to be checked in anyway...
>
>Hehe, get what you ask for...
>
>> A bug shared is a bug fixed quicker? :-)
>
>Let's hope ;-)
>
>> For us, this only manifests on x86_64, and arrived with the subarch xen
>> version of 2.6.12. Extensive inspection of the arch->subarch conversion
>> suggests that nothing should have changed, so this is likely a latent
>> bug being triggered by slight timing changes.
>>
>> It sounds like it's rather easier for you to trigger than it was for us
>> -- we had to run xm-test several times to get it to happen. Happy
>> hunting, and good luck :-)
>
>It's trivial for me to trigger.  I'll keep poking at it.
>
>thanks,
>-chris

Ian Pratt

2006-Feb-08 20:01 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

> This is against current x86_64 defconfig build:
> 
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <2b>
>   TDT                  <31>
>   next_to_use          <31>
>   next_to_clean        <2b>
> buffer_info[next_to_clean]
>   time_stamp           <10004d5f2>
>   next_to_watch        <2d>
>   jiffies              <10004d7ce>
>   next_to_watch.status <0>
> 
> ... repeat until eventually ...
> 
> NETDEV WATCHDOG: eth0: transmit timed out
> 
> this is on simple scp to dom0 from external box.  after a bit 
> watchdog resets, and ping works, only to repeat itself when I
> try to scp again

Yep, this is the bug I warned y'all about at the summit, but you asked
for the code to be checked in anyway...

A bug shared is a bug fixed quicker? :-)

For us, this only manifests on x86_64, and arrived with the subarch xen
version of 2.6.12. Extensive inspection of the arch->subarch conversion
suggests that nothing should have changed, so this is likely a latent
bug being triggered by slight timing changes.

It sounds like it's rather easier for you to trigger than it was for us
-- we had to run xm-test several times to get it to happen. Happy
hunting, and good luck :-)

Ian

Chris Wright

2006-Feb-08 20:11 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
> Yep, this is the bug I warned y'all about at the summit, but you asked
> for the code to be checked in anyway...

Hehe, get what you ask for...

> A bug shared is a bug fixed quicker? :-)

Let's hope ;-)

> For us, this only manifests on x86_64, and arrived with the subarch xen
> version of 2.6.12. Extensive inspection of the arch->subarch conversion
> suggests that nothing should have changed, so this is likely a latent
> bug being triggered by slight timing changes.
>
> It sounds like it's rather easier for you to trigger than it was for us
> -- we had to run xm-test several times to get it to happen. Happy
> hunting, and good luck :-)

It's trivial for me to trigger.  I'll keep poking at it.

thanks,
-chris

Ian Pratt

2006-Feb-08 23:36 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

> Ah, finally people experiencing same bug as me!
> 
> It is much much worse for me, as soon as I ping from domU 
> network goes down, started with the subarch change.

For this bug it might actually be helpful to start collecting
information about the hardware it's observed on.

For us, the bug is hard to repro, despite us having tried on several
different machines (2 and 4 way SMP, Opteron and Xeon, tg3 and e1000
NICs).

If the bug is easier to trigger for you, please post a summary of the
hardware and anything unusual about your config (i.e. not default
bridged).

Thanks,
Ian

Chris Wright

2006-Feb-09 01:27 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
> For us, the bug is hard to repro, despite us having tried on several
> different machines (2 and 4 way SMP, Opteron and Xeon, tg3 and e1000
> NICs).

xeon, 4 cpu (2-ht), e1000, 4G

- works fine w/ 32-bit
- dom0 is UP (SMP fails as well)
  - this is dom0 only, no xend, no domUs, no bridging
  - limiting to 2G works fine, sounds like something with swiotlb

Also, while it was working, I blasted with packets, and eventually got:

irq 19: nobody cared (try booting with the "irqpoll" option)

Call Trace: <IRQ> <ffffffff80148508>{__report_bad_irq+56}
       <ffffffff80148721>{note_interrupt+449}
<ffffffff80147dcc>{handle_IRQ_event+76}
       <ffffffff80147ec2>{__do_IRQ+162}
<ffffffff8011077b>{do_IRQ+75}
       <ffffffff802f16b5>{evtchn_do_upcall+117}
<ffffffff8010e5f1>{do_hypervisor_callback+37}
       <ffffffff8011ccc5>{ia32_syscall+13}
<ffffffff8010a22a>{hypercall_page+554}
       <ffffffff8010a22a>{hypercall_page+554}
<ffffffff802f14de>{force_evtchn_callback+14}
       <ffffffff80147db5>{handle_IRQ_event+53}
<ffffffff80147ea8>{__do_IRQ+136}
       <ffffffff8011077b>{do_IRQ+75}
<ffffffff802f16b5>{evtchn_do_upcall+117}
       <ffffffff8010e5f1>{do_hypervisor_callback+37} <EOI>
       <ffffffff8011ccc5>{ia32_syscall+13}
handlers:
[<ffffffff80377b80>] (ata_interrupt+0x0/0x1b0)
[<ffffffff80396570>] (usb_hcd_irq+0x0/0x70)
Disabling IRQ #19
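
The "nobody cared" report comes from the kernel's accounting of interrupts that no handler claims. Below is a rough model of that bookkeeping, not the kernel source: the struct and function names are invented here, and the thresholds (a window of 100000 interrupts, more than 99900 of them unhandled) are assumed from 2.6-era kernels and should be checked against the version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IRQ bookkeeping, modelled loosely on the kernel's. */
struct irq_stats {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* set once the line has been shut off   */
};

/* Assumed thresholds; verify against the kernel version in use. */
enum { IRQ_WINDOW = 100000, IRQ_UNHANDLED_LIMIT = 99900 };

/* Record one interrupt. `handled` is true if any handler sharing the
 * line claimed it. Returns true when this call disables the line. */
static bool note_interrupt_model(struct irq_stats *s, bool handled)
{
    assert(s != NULL);

    if (!handled)
        s->irqs_unhandled++;

    s->irq_count++;
    if (s->irq_count < IRQ_WINDOW)
        return false;

    /* End of window: decide, then start a fresh window. */
    bool stuck = s->irqs_unhandled > IRQ_UNHANDLED_LIMIT;
    s->irq_count = 0;
    s->irqs_unhandled = 0;
    if (stuck)
        s->disabled = true;
    return stuck;
}
```

In this model a line is only shut off when almost every interrupt in a window goes unclaimed by all handlers sharing it, which here are ata_interrupt and usb_hcd_irq.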

thanks,
-chris

Ian Pratt

2006-Feb-09 01:29 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

> xeon, 4 cpu (2-ht), e1000, 4G
> 
> - works fine w/ 32-bit
> - dom0 is UP (SMP fails as well)
>   - this is dom0 only, no xend, no domUs, no bridging
>   - limiting to 2G works fine, sounds like something with swiotlb

That's interesting, but I'd be surprised if it was an swiotlb thing --
it looks so much more like an interrupt problem. e1000 and tg3 shouldn't
be going anywhere near swiotlb anyhow.

Please can you try a PAE kernel just to check you don't have the
problem.

> Also, while it was working, I blasted with packets, and
> eventually got:
>
> irq 19: nobody cared (try booting with the "irqpoll" option)

What devices are on irq 19?

It might be worth trying booting nousb on the kernel command line (or
usb-handoff)

Thanks,
Ian

Chris Wright

2006-Feb-09 01:38 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
> > xeon, 4 cpu (2-ht), e1000, 4G
> >
> > - works fine w/ 32-bit
> > - dom0 is UP (SMP fails as well)
> >   - this is dom0 only, no xend, no domUs, no bridging
> >   - limiting to 2G works fine, sounds like something with swiotlb
>
> That's interesting, but I'd be surprised if it was an swiotlb thing --
> it looks so much more like an interrupt problem. e1000 and tg3 shouldn't
> be going anywhere near swiotlb anyhow.
>
> Please can you try a PAE kernel just to check you don't have the
> problem.

It's 64-bit.

Chris Wright

2006-Feb-09 01:49 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:

whoops, missed this part.

> What devices are on irq 19?
>
> It might be worth trying booting nousb on the kernel command line (or
> usb-handoff)

19:       5748        Phys-irq  libata, uhci_hcd:usb3

with ata, that effectively killed the box.  trying with nousb, but i
wonder if it's not an evtchn problem?

thanks,
-chris

Ian Pratt

2006-Feb-09 01:50 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

> > That's interesting, but I'd be surprised if it was an swiotlb thing --
> > it looks so much more like an interrupt problem. e1000 and tg3
> > shouldn't be going anywhere near swiotlb anyhow.
> >
> > Please can you try a PAE kernel just to check you don't have the
> > problem.
>
> It's 64-bit.

Yep, but I'm wondering whether it's worth trying a PAE kernel as that
might give a datapoint to indicate whether swiotlb might be involved.

My money is still on an interrupt problem (virtual or otherwise),
though.

Ian

Chris Wright

2006-Feb-09 01:59 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
> > > That's interesting, but I'd be surprised if it was an swiotlb thing --
> > > it looks so much more like an interrupt problem. e1000 and tg3
> > > shouldn't be going anywhere near swiotlb anyhow.
> > >
> > > Please can you try a PAE kernel just to check you don't have the
> > > problem.
> >
> > It's 64-bit.
>
> Yep, but I'm wondering whether it's worth trying a PAE kernel as that
> might give a datapoint to indicate whether swiotlb might be involved.

Yeah, sorry, I was confused at first.  I'm building PAE atm.

thanks,
-chris

Ian Pratt

2006-Feb-09 02:29 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

> > What devices are on irq 19?
> >
> > It might be worth trying booting nousb on the kernel command line (or
> > usb-handoff)
>
> 19:       5748        Phys-irq  libata, uhci_hcd:usb3
>
> with ata, that effectively killed the box.  trying with
> nousb, but i wonder if it's not evntchn problem?

Something else to try might be booting with maxcpus=1 on the xen command
line, but if you're running just a uniproc dom0 this really ought not
make any difference.

When the box is in a bad state, it might be worth using the serial debug
keys to get some information about the ioapic and event channels.

I'm glad you've got an easy way of repro'ing this. I've just tried again
on a couple of our machines and it took me ages to trigger.

Thanks,
Ian

Chris Wright

2006-Feb-09 02:29 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Ian Pratt (m+Ian.Pratt@cl.cam.ac.uk) wrote:
> Yep, but I'm wondering whether it's worth trying a PAE kernel as that
> might give a datapoint to indicate whether swiotlb might be involved.

OK, PAE works fine.

thanks,
-chris

Christian Leber

2006-Feb-09 15:02 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On Wed, Feb 08, 2006 at 11:36:06PM -0000, Ian Pratt wrote:
> For this bug it might actually be helpful to start collecting
> information about the hardware it's observed on.
>
> For us, the bug is hard to repro, despite us having tried on several
> different machines (2 and 4 way SMP, Opteron and Xeon, tg3 and e1000
> NICs).

It's not on Xen, but I get something similar with scp on 2.6.15
(and this Tx Unit Hang seems to be a rare problem):

[4294726.019000] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[4294726.019000] TDH <cb>
[4294726.019000] TDT <cb>
[4294726.019000] next_to_use <cb>
[4294726.019000] next_to_clean <df>
[4294726.019000] buffer_info[next_to_clean]
[4294726.019000] dma <1aa25cce>
[4294726.019000] time_stamp <fffc40e7>
[4294726.019000] next_to_watch <df>
[4294726.019000] jiffies <fffc5183>
[4294726.019000] next_to_watch.status <0>

That happens on AthlonXP+ViaKT600 but not on Intel PIII with Intel 815
chipset.
https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/30476


Christian Leber

-- 
  "Omnis enim res, quae dando non deficit, dum habetur et non datur,
   nondum habetur, quomodo habenda est."       (Aurelius Augustinus)
  Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>

Chris Wright

2006-Feb-09 17:24 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Christian Leber (christian@leber.de) wrote:
> That happens on AthlonXP+ViaKT600 but not on Intel PIII with Intel 815
> chipset.
> https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/30476

Does that have >=2.6.15.2 patchset?

thanks,
-chris

Christian Leber

2006-Feb-09 23:29 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On Thu, Feb 09, 2006 at 09:24:57AM -0800, Chris Wright wrote:
> > That happens on AthlonXP+ViaKT600 but not on Intel PIII with Intel 815
> > chipset.
> > https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/30476
>
> Does that have >=2.6.15.2 patchset?

No, but it's >=2.6.15.1 and the 2.6.15.2 changelog doesn't seem to be
related to ethernet drivers.
I also tried 2.6.16-rc2 and it has the same problem.

Christian Leber

-- 
  "Omnis enim res, quae dando non deficit, dum habetur et non datur,
   nondum habetur, quomodo habenda est."       (Aurelius Augustinus)
  Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>

Kamble, Nitin A

2006-Feb-09 23:55 UTC

RE: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

>  - limiting to 2G works fine, sounds like something with swiotlb

I noticed it too, and it behaves exactly the same. I also see this in the
dom0 dmesg:

PCI-DMA: Disabling IOMMU.
WARNING more than 4GB of memory but IOMMU not compiled in.
WARNING 32bit PCI may malfunction.
You might want to enable CONFIG_GART_IOMMU
Memory: 5868412k/6071120k available (3553k kernel code, 202040k reserved, 1376k data, 300k init)

Thanks & Regards,
Nitin
------------------------------------------------------------------------
Open Source Technology Center, Intel Corp

Keir Fraser

2006-Feb-10 11:20 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On 9 Feb 2006, at 23:55, Kamble, Nitin A wrote:
>>  - limiting to 2G works fine, sounds like something with swiotlb
>
> I noticed it too and exactly same. I also notice this in the dom0 
> dmesg.
>
> PCI-DMA: Disabling IOMMU.
> WARNING more than 4GB of memory but IOMMU not compiled in.
> WARNING 32bit PCI may malfunction.
> You might want to enable CONFIG_GART_IOMMU
> Memory: 5868412k/6071120k available (3553k kernel code, 202040k
> reserved, 1376k data, 300k init)

That is harmless. In fact our SWIOTLB probably is enabled (look at the
lines just above the ones you posted). It's because we don't properly
(yet) respect the new plug-n-play dma_ops structures in x86_64.

I've checked in a temporary fix to remove the above misleading lines.

  -- Keir


Muli Ben-Yehuda

2006-Feb-10 11:33 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On Fri, Feb 10, 2006 at 11:20:50AM +0000, Keir Fraser wrote:
> That is harmless. In fact our SWIOTLB probably is enabled (look at the
> lines just above the ones you posted). It's because we don't properly
> (yet) respect the new plug-n-play dma_ops structures in x86_64.
>
> I've checked in a temporary fix to remove the above misleading
> lines.

There was also a harmless bug in the initial dma_ops patch that caused
the wrong printk in some cases. Jon Mason submitted a fix that is in
mainline now. Not sure if this is the case here, but FYI.

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/


Chris Wright

2006-Feb-16 03:07 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

* Kamble, Nitin A (nitin.a.kamble@intel.com) wrote:
> >  - limiting to 2G works fine, sounds like something with swiotlb
>
> I noticed it too and exactly same. I also notice this in the dom0 dmesg.

After spending hours trying to find something -- anything -- wrong with
irq delivery and e1000 hung tx unit, I went back to my original hunch,
which was swiotlb related.  When TSO is enabled, some debugging showed
this:

swiotlb_map_page: returns d586a000
dma_map_page: returns ffffffffd586a000

Indeed.

 a43:   e8 00 00 00 00          callq  a48 <dma_map_page+0xc8>
                        a44: R_X86_64_PC32      swiotlb_map_page+0xfffffffffffffffc
 a48:   48 63 d8                movslq %eax,%rbx

Whoops.  Prototype mismatch.

And had we been paying attention:

/home/chrisw/hg/xen/xen-unstable/linux-2.6.16-rc2-xen0/arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:107: warning: implicit declaration of function ‘swiotlb_map_page’
/home/chrisw/hg/xen/xen-unstable/linux-2.6.16-rc2-xen0/arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c: In function ‘dma_unmap_page’:
/home/chrisw/hg/xen/xen-unstable/linux-2.6.16-rc2-xen0/arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:125: warning: implicit declaration of function ‘swiotlb_unmap_page’
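
To make the failure concrete: with no prototype in scope, a compiler following pre-C99 rules assumes `swiotlb_map_page` returns `int`, so the caller reads only the low 32 bits of the result and sign-extends them, which is what the `movslq` does. The sketch below emulates that with explicit arithmetic, since current compilers reject implicit declarations. The function names are hypothetical and `dma_addr_t` is modelled as a plain `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_model_t;  /* stand-in for a 64-bit dma_addr_t */

/* What the caller effectively did with the callee's 64-bit result when
 * it believed the function returned int: keep the low 32 bits, then
 * sign-extend them to 64 bits (the movslq %eax,%rbx in the listing). */
static dma_addr_model_t as_seen_through_int(dma_addr_model_t real)
{
    uint32_t low = (uint32_t)real;             /* only %eax is read     */
    uint64_t widened = low;
    if (low & 0x80000000u)                     /* bit 31 acts as a sign */
        widened |= 0xffffffff00000000ull;      /* copied into bits 32+  */
    return widened;
}

/* With the prototype visible, the full 64-bit value is used unchanged. */
static dma_addr_model_t as_seen_with_prototype(dma_addr_model_t real)
{
    return real;
}

/* Usage: the address logged in the debug output of this message. */
static void check_logged_value(void)
{
    assert(as_seen_through_int(0xd586a000ull) == 0xffffffffd586a000ull);
    assert(as_seen_with_prototype(0xd586a000ull) == 0xd586a000ull);
}
```

Feeding in the logged 0xd586a000 reproduces the 0xffffffffd586a000 that dma_map_page returned, while an address with bit 31 clear would pass through untouched.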

Here's a quick patch that fixes the issue (not ready to apply to
-unstable, since it's a file that's not in sparse tree).  Nitin, this
should fix your problem as well.  I'll work on a proper patch later this
evening or tomorrow morning.

thanks,
-chris
--

--- linux-2.6.16-rc2/include/asm-x86_64/swiotlb.h	2006-02-15 21:42:24.000000000 -0500
+++ linux-2.6.16-rc2-xen0/include/asm-x86_64/swiotlb.h	2006-02-15 21:19:15.000000000 -0500
@@ -38,6 +38,11 @@
 extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg,
 			 int nents, int direction);
 extern int swiotlb_dma_mapping_error(dma_addr_t dma_addr);
+extern dma_addr_t swiotlb_map_page(struct device *hwdev, struct page *page,
+                                   unsigned long offset, size_t size,
+                                   enum dma_data_direction direction);
+extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dma_address,
+                               size_t size, enum dma_data_direction direction);
 extern void swiotlb_free_coherent (struct device *hwdev, size_t size,
 				   void *vaddr, dma_addr_t dma_handle);
 extern int swiotlb_dma_supported(struct device *hwdev, u64 mask);

Keir Fraser

2006-Feb-16 11:36 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On 16 Feb 2006, at 03:07, Chris Wright wrote:
> Here's a quick patch that fixes the issue (not ready to apply to
> -unstable, since it's a file that's not in sparse tree).  Nitin, this
> should fix your problem as well.  I'll work on a proper patch later
> this evening or tomorrow morning.

Thanks for tracking this one down: it's been our major outstanding bug
for a while now. We checked in a suitable fix to -unstable (change 
pci-dma-xen.c to explicitly include the asm-i386/mach-xen version of 
swiotlb.h).

  -- Keir


Jan Beulich

2006-Feb-16 11:45 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 16.02.06 12:36:29 >>>
>
>On 16 Feb 2006, at 03:07, Chris Wright wrote:
>
>> Here's a quick patch that fixes the issue (not ready to apply to
>> -unstable, since it's a file that's not in sparse tree).  Nitin, this
>> should fix your problem as well.  I'll work on a proper patch later
>> this evening or tomorrow morning.
>
>Thanks for tracking this one down: it's been our major outstanding bug
>for a while now. We checked in a suitable fix to -unstable (change
>pci-dma-xen.c to explicitly include the asm-i386/mach-xen version of
>swiotlb.h).

This doesn't sound like a good thing to do, as that way all but this
one file will include the x86-64 version of it, and you can easily get
things out of sync (if e.g. the x86-64 version changes). I would much
favor the change being done as originally posted; we have a similar
fix in our tree.
Jan

Guillaume Thouvenin

2006-Feb-16 13:10 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On Wed, 15 Feb 2006 19:07:15 -0800
Chris Wright <chrisw@sous-sol.org> wrote:
>
> --- linux-2.6.16-rc2/include/asm-x86_64/swiotlb.h	2006-02-15 21:42:24.000000000 -0500
> +++ linux-2.6.16-rc2-xen0/include/asm-x86_64/swiotlb.h	2006-02-15 21:19:15.000000000 -0500
> @@ -38,6 +38,11 @@
>  extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg,
>  			 int nents, int direction);
>  extern int swiotlb_dma_mapping_error(dma_addr_t dma_addr);
> +extern dma_addr_t swiotlb_map_page(struct device *hwdev, struct page *page,
> +                                   unsigned long offset, size_t size,
> +                                   enum dma_data_direction direction);
> +extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dma_address,
> +                               size_t size, enum dma_data_direction direction);
>  extern void swiotlb_free_coherent (struct device *hwdev, size_t size,
>  				   void *vaddr, dma_addr_t dma_handle);
>  extern int swiotlb_dma_supported(struct device *hwdev, u64 mask);

The patch fixes the tx hang, and it also fixes another problem on my
box. With xen-unstable (changeset 8833), I couldn't open an ssh
connection to domain 0 until I ran the xend daemon (I don't know why
running xend allows the connection). With the patch, I can open an ssh
connection as soon as the ssh daemon is running on domain 0.

Just a remark: enabling PAE doesn't solve the tx hang on my computer,
which is an Intel Xeon (1 CPU) with hyper-threading enabled. I'm using
a Debian distribution.

thanks,
Guillaume

Keir Fraser

2006-Feb-16 13:54 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On 16 Feb 2006, at 11:45, Jan Beulich wrote:
>> Thanks for tracking this one down: it's been our major outstanding bug
>> for a while now. We checked in a suitable fix to -unstable (change
>> pci-dma-xen.c to explicitly include the asm-i386/mach-xen version of
>> swiotlb.h).
>
> This doesn't sound like a good thing to do, as that way all but this
> one file will include the x86-64 version of it,
> and you can easily get things out of sync (if e.g. the x86-64 version
> changes). I would much favor the change being done
> as originally posted; we have a similar same fix in our tree.

In our tree, pci-dma-xen.c is the only file that uses the core swiotlb
functions. Since it's an i386 file linked against our xen-i386 swiotlb,
it seems to make sense for it to include explicitly the i386 swiotlb
header file. The best fix of course is to merge the swiotlbs: maybe by
incrementally modifying the xen-specific one to get it closer to the
generic swiotlb code.

  -- Keir


Keir Fraser

2006-Feb-16 13:55 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On 16 Feb 2006, at 13:10, Guillaume Thouvenin wrote:
> Just a remark, if I enable PAE, it doesn't solve the problem of the tx
> hang on my computer which is an Intel Xeon (1 CPU) with hyper-threading
> enabled. I'm using a debian distribution.

Does this go away if you specify 'mem=2G' as a Xen boot parameter?

   -- Keir


Guillaume Thouvenin

2006-Feb-17 07:26 UTC

Re: [Xen-devel] x86_64 eth0 e1000_clean_tx_irq tx hang

On Thu, 16 Feb 2006 13:55:18 +0000
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> 
> On 16 Feb 2006, at 13:10, Guillaume Thouvenin wrote:
> 
> > Just a remark, if I enable PAE, it doesn't solve the problem of the tx
> > hang on my computer which is an Intel Xeon (1 CPU) with hyper-threading
> > enabled. I'm using a debian distribution.
>
> Does this go away if you specify 'mem=2G' as a Xen boot parameter?

Yes it goes away.
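
That mem=2G hides the hang fits the prototype mismatch found earlier in the thread: an address comes back intact from truncation to `int` and sign-extension only if its upper bits were already a copy of bit 31, which is true of every address below 2 GiB. A small sketch of that condition follows; the helper names are hypothetical, and it assumes a 64-bit bus address squeezed through a 32-bit signed int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 64-bit address is unchanged after being passed through a
 * 32-bit signed int and sign-extended again. That holds exactly when
 * bits 31..63 are all zero or all one; for real bus addresses only the
 * all-zero case (below 2 GiB) matters. */
static bool survives_int_round_trip(uint64_t addr)
{
    uint64_t high = addr >> 31;               /* bits 31..63 (33 bits) */
    return high == 0 || high == 0x1ffffffffull;
}

/* Usage: the last byte below 2 GiB is safe, the first one at 2 GiB
 * is not. */
static void check_two_gib_boundary(void)
{
    assert(survives_int_round_trip(0x7fffffffull));
    assert(!survives_int_round_trip(0x80000000ull));
}
```

With memory capped at 2 GiB, presumably no address handed to the NIC ever has bit 31 set, so the bad conversion never changes anything.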

Guillaume
