thr3ads.net - Xen users - [Xen-users] pci-passthrough in pvops causing offline raid [Nov 2010]


Mark Adams

2010-Nov-11 10:24 UTC

[Xen-users] pci-passthrough in pvops causing offline raid

Hi All,

Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.

In a voip setup, where I have forwarded the onboard NIC interfaces
through to domU using the following grub config:

module  /vmlinuz-2.6.32-5-xen-amd64 placeholder
root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro  quiet xen-pciback.permissive
xen-pciback.hide=(02:00.0)(03:00.0) pci=resource_alignment=02:00.0;03:00.0

I'm having a serious issue where the raid card goes offline after an
indefinite period of time. Sometimes it runs fine for a week, other times
only a day, before I get "offline device" errors. Rebooting the machine
fixes it straight away, and everything is back online.

What in the Xen pciback is causing the raid card to go offline? The
only devices hidden are the 2 onboard NICs.
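
For reference, the list of devices handed to pciback can be read back from the kernel command line. The following is a minimal Python sketch, assuming the `xen-pciback.hide=(BDF)(BDF)` syntax shown in the grub line above; the helper name `hidden_devices` is made up for illustration.

```python
import re

def hidden_devices(cmdline):
    """Return the PCI addresses listed in xen-pciback.hide=(...)(...)."""
    match = re.search(r"xen-pciback\.hide=((?:\([^)]*\))+)", cmdline)
    if match is None:
        return []
    # Each device sits in its own pair of parentheses.
    return re.findall(r"\(([^)]*)\)", match.group(1))

cmdline = ("root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro quiet "
           "xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0) "
           "pci=resource_alignment=02:00.0;03:00.0")
print(hidden_devices(cmdline))  # ['02:00.0', '03:00.0']
```

On a running dom0 the same string could be taken from /proc/cmdline instead of being pasted in.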

I know that this issue is with Xen, as I had this running on a different
server (same xen setup) and it had the same issues, which I initially
thought were to do with the raid card.

Are there known issues in this kernel and xen version with pciback? I'm
going to update to the current package versions this evening (4.0.1-1
and 2.6.32-27); however, I would appreciate it if anyone has any other
insight into this issue, or even just a note to say it is a bug that has
been fixed in current versions!

Thanks,
Mark

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Olivier Hanesse

2010-Nov-11 11:13 UTC

Re: [Xen-users] pci-passthrough in pvops causing offline raid

Hello,

Which kernel module does your raid card use?
If it is "megaraid_sas", please try upgrading the version of that module
(see the other posts on the ML).

Regards


Mark Adams

2010-Nov-11 12:03 UTC

Re: [Xen-devel] Re: [Xen-users] pci-passthrough in pvops causing offline raid

Hi - It's not megaraid, it's an Areca card (arcmsr).

This is definitely something to do with pciback. Does anyone else have any
ideas or views on this? The domU is an HVM debian-squeeze instance.


Konrad Rzeszutek Wilk

2010-Nov-11 16:53 UTC

[Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
> Hi All,
> 
> Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.
> 
> In a voip setup, where I have forwarded the onboard NIC interfaces
> through to domU using the following grub config:
> 
> module  /vmlinuz-2.6.32-5-xen-amd64 placeholder
> root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro  quiet xen-pciback.permissive
> xen-pciback.hide=(02:00.0)(03:00.0) pci=resource_alignment=02:00.0;03:00.0
> 
> I'm having a serious issue where the raid card goes offline after an
> indefinate period of time. Sometimes runs fine for a week, other times 1
> day before I get "offline device" errors. Rebooting the machine fixes it
> straight away, and everything is back online.
> 
> What in the Xen pciback is causing the raid card to go offline? The
> only devices hidden are the 2 onboard NIC's.

You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
run with an IOMMU? When the RAID card goes offline, do the IRQs going to
the device stop? Are the IRQs for the RAID card sent to all of your
CPUs or just a specific one? Are you pinning your guests to specific CPUs?
Does the issue disappear if you don't pass through the NIC interfaces?
If so, have you run that setup for "a week" to make sure?

> 
> I know that this issue is with Xen, as I had this running on a different
> server (same xen setup) and it had the same issues, which I initially
> thought were to do with the raid card.

So you never ran this setup on this kernel (2.6.32-5) without the Xen
hypervisor?

> 
> Is there known issues in this kernel and xen version with pciback? I'm

No. It all works perfectly :-)

> going to update to the current package versions this evening (4.0.1-1
> and 2.6.32-27) however would appreciate if anyone has any other insight
> into this issue, or even just a note to say it is a bug that has been
> fixed in current versions!

Well, there were issues with the LSI cards having a hidden PCI device. But those
are pretty obvious, as you can't even use the card correctly. There is also
a problem with the 3Ware 9506 IDE card, which on my box stops sending IRQs
on the IOAPIC it has been assigned (28) and instead uses another one (17).
I am not sure whether this is just the card using the wrong PCI interrupt pin,
so that it ends up poking the wrong IOAPIC.


Mark Adams

2010-Nov-11 17:38 UTC

[Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 11, 2010 at 11:53:40AM -0500, Konrad Rzeszutek Wilk wrote:
> You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> run with an IOMMU? When the RAID card goes offline, do you see a stop of
> IRQs going to the device? Are the IRQs for the RAID card sent to all of your
> CPUs or just a specific one? Are you pinning your guests to specific CPUs?
> Does the issue disappear if you don't passthrough the NIC interfaces? If so have
> you run this setup for "a week" to make sure?

It is an Areca 1220. I can't see anything when the device goes offline
apart from

    [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
    [77334.005854] sd 0:0:0:0: rejecting I/O to offline device

Unfortunately nothing gets logged because there is nothing left to write
to. I'm not sure how else I can see the IRQs. There is no pinning being
done at all, and the machine ran OK for a few months before the pciback
options were added.
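
The two kernel messages above carry a SCSI address in host:channel:target:lun form, and both share host number 0, which is consistent with both volumes sitting behind the same adapter. A small Python sketch for pulling those addresses out of a captured log follows; the function name `offline_devices` is hypothetical, and the pattern only covers the exact message format quoted here.

```python
import re

OFFLINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd (?P<addr>\d+:\d+:\d+:\d+): "
    r"rejecting I/O to offline device"
)

def offline_devices(log_text):
    """Map each SCSI address to the first timestamp at which it was rejected."""
    first_seen = {}
    for line in log_text.splitlines():
        m = OFFLINE.search(line)
        if m and m.group("addr") not in first_seen:
            first_seen[m.group("addr")] = float(m.group("ts"))
    return first_seen

sample = """\
    [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
    [77334.005854] sd 0:0:0:0: rejecting I/O to offline device
"""
found = offline_devices(sample)
print(found)
# The set of host numbers shows whether one adapter or several are affected.
print({addr.split(":")[0] for addr in found})
```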

Is my kernel module line above correct? Are the xen-pciback.permissive
and resource_alignment options required? Also, I am passing through the
onboard NICs. Is this something that should be avoided, or is it OK to
do?

> So you never ran this setup on this kernel (2.6.32-5) without the Xen
> hypervisor?

No, it has always had the hypervisor, but it was running OK before the
pciback options were added. This week, it has seemed to happen
approximately every 24 hours.

Thanks,
Mark



Richie

2010-Nov-11 17:40 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

Note: I have no idea whether this is related to your issue, or whether my
assessment is completely accurate.

I had an issue that I feel the debian squeeze kernel running in the domU
played a part in. My dom0 is 2.6.34.7 Xenified with Andrew Lyon's patches
and I am running Xen 4.0.2-pre (xen testing). I pass through a PCI tuner
card, but have not considered that this could also contribute.

Sometimes, when I shut down the domU, I started getting libata-style
DRIVE_NOT_READY errors in my dom0 upon halt. Either one drive would
drop from my mdadm raid (which houses my lvm filesystems, including root
for dom0 and domU), or perhaps they would drop and cause a panic. A
reboot fixes everything, though a rebuild would occur. I was not able
to capture those errors the few times it happened, but I have since
changed to a 2.6.31 pvops kernel from Jeremy's stable branch in my
domU and I have yet to reproduce the issue. I did note that it might
take a number of days for the problem to manifest, and so far I've tested
a domU shutdown after 24 and 72 hours using the new kernel with no
issues. My next test is at 7 days.

I wish I had more information myself, but I don't. Regardless of the
accuracy of this claim, I recommend trying other kernels to see if the
problem persists.


Konrad Rzeszutek Wilk

2010-Nov-11 17:58 UTC

Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 11, 2010 at 05:38:50PM +0000, Mark Adams wrote:
> It is an Areca 1220. I can't see anything when the device goes offline
> apart from
> 
>     [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
>     [77334.005854] sd 0:0:0:0: rejecting I/O to offline device

That is it? No other details from the driver? Did you poke at the driver
(modinfo) to see if there are any options to increase its verbosity?

> Unfortunately nothing get's logged because there is nothing to write to
> anymore. I'm not sure how I can see the IRQs otherwise. There is no

cat /proc/interrupts

> pinning being done at all, and the machine was running for a few months
> OK before the pciback was added.

Ok, what about your NICs? Are they on-board? Are they sharing the IRQ
with the card? You should be able to see this by looking at /proc/interrupts.
Which NICs are they? lspci can help you there. As a matter of fact, run
lspci -vvv and send that.

> Is my kernel module line correct above? are the xen-pciback.permissive
> and resource_alignment options required? Also I am passing through the

Not always. resource_alignment is needed only if the BARs (look at the lspci
output) are not page-aligned. If you have no idea what I am talking about,
then the answer is yes.
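
To make the page-alignment condition concrete, here is a hedged Python sketch. It assumes 4 KiB pages and treats a BAR as page-aligned when its start address is a multiple of the page size and its size is a whole number of pages. The addresses are invented for illustration and are not taken from the attached lspci output.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86

def bar_is_page_aligned(start, size, page_size=PAGE_SIZE):
    """True if the BAR starts on a page boundary and spans whole pages."""
    return start % page_size == 0 and size % page_size == 0

# Hypothetical BARs given as (start address, size in bytes).
print(bar_is_page_aligned(0xFBCE0000, 128 * 1024))  # True
print(bar_is_page_aligned(0xFBCDC400, 256))         # False
```

A BAR that fails this check shares its page with whatever else lives there, which is the situation the resource_alignment option is meant to avoid.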

> onboard NIC's - is this something that should be avoided or is it ok to
> do?

It is fine. That is the first thing I test.

> no, its always had the hypervisor - but it was running ok before the
> pciback options were added. This week, it's seemed to happen
> approximately every 24 hours.

When this hang occurs, can you do 'xm debug-key Q',
'xm debug-key i', 'xm debug-key z'?
Then run 'xm dmesg' and provide that to me?

Is your boot disk on the same disk as the RAID?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Mark Adams

2010-Nov-11 18:13 UTC

[Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 11, 2010 at 12:58:09PM -0500, Konrad Rzeszutek Wilk wrote:
> That is it? No other details from the driver? Did you poke at the driver (modinfo)
> to see if there are any options to increase its verbosity.

I can't do anything once it has happened; everything is offline, so I have
no utils...

> cat /proc/interrupts
> 
> Ok, what about your NICs? Are they on-board? Are they sharing the IRQ
> with the card? You should be able to see this by looking at /proc/interrupts.
> Which NICs are they? lspci can you help you there. As of matter of fact, run
> lspci -vvv and send that.

They are the onboard NICs, Intel 82574L. I can see the arcmsr
line, but nothing for the NICs (because they are hidden?)

39:    1126249          0          0          0          0          0          0          0  xen-pirq-ioapic-level  arcmsr

Nothing else is on 1126249

see lspci.txt attached.
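
For anyone reading the /proc/interrupts row above: the first column is the IRQ number and the following columns are per-CPU interrupt counts, so 39 is the IRQ and 1126249 is the count on the first CPU. A minimal Python sketch of that layout follows, assuming eight CPUs (inferred from the eight count columns); the function name is hypothetical.

```python
def parse_interrupt_line(line, n_cpus):
    """Split one /proc/interrupts row into IRQ, per-CPU counts, and description."""
    fields = line.split()
    irq = fields[0].rstrip(":")
    counts = [int(c) for c in fields[1:1 + n_cpus]]
    description = " ".join(fields[1 + n_cpus:])
    return irq, counts, description

row = ("39:    1126249          0          0          0          0          0"
       "          0          0  xen-pirq-ioapic-level  arcmsr")
irq, counts, description = parse_interrupt_line(row, n_cpus=8)
print(irq, counts, description)
```

Taking two snapshots some seconds apart and comparing the counts is one way to tell whether interrupts for a device have stopped arriving.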

> When this hang occurs, can you do 'xm debug-key Q',
> 'xm debug-key i', 'xm debug-key z'.
> Then run 'xm dmesg' and provide that to me?

I can try this, but it probably won't work as the device will not be
readable.

> Is your boot disk on the same disk as the RAID?

There are 2 raids, a Raid1 for the OS (/boot / /var /tmp /usr) and a
raid5 for VMs. They both disappear at the same time, so it appears the
card is disappearing.




Mark Adams

2010-Nov-11 18:47 UTC

Re: [Xen-devel] pci-passthrough in pvops causing offline raid

I've just noticed this at the end of xm dmesg:

(XEN) msi.c:715: MSI is already in use on device 02:00.0
(XEN) msi.c:715: MSI is already in use on device 02:00.0
(XEN) msi.c:715: MSI is already in use on device 02:00.0

Is something else trying to use the device being exported? (The NICs are
02:00.0 and 03:00.0.)
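
A quick way to see which devices these hypervisor messages refer to, and how often, is to count them per PCI address. The sketch below is illustrative Python that matches only the message text quoted above; the function name `msi_conflicts` is an assumption.

```python
import re
from collections import Counter

MSI_IN_USE = re.compile(r"MSI is already in use on device (\S+)")

def msi_conflicts(xen_log):
    """Count 'MSI is already in use' messages per PCI device."""
    return Counter(MSI_IN_USE.findall(xen_log))

log = """\
(XEN) msi.c:715: MSI is already in use on device 02:00.0
(XEN) msi.c:715: MSI is already in use on device 02:00.0
(XEN) msi.c:715: MSI is already in use on device 02:00.0
"""
print(msi_conflicts(log))
```

In this excerpt only 02:00.0 appears; 03:00.0, the other hidden NIC, does not.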


Konrad Rzeszutek Wilk

2010-Nov-11 18:57 UTC

Re: [Xen-devel] pci-passthrough in pvops causing offline raid

> I can't do anything once its happened, everything is offline so I have
> no utils...

An easy way is to use netconsole. You can make all of the kernel log output
go to a different machine on your network.
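
On the receiving machine, anything that can read UDP datagrams will do, since netconsole sends each log message as a UDP packet (6666 is its default target port). Below is a minimal Python sketch of such a receiver with a loopback demonstration; the function names are made up for illustration, and the netconsole setup on the sending side is not shown.

```python
import socket

def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_log_lines(sock, count):
    """Read `count` UDP datagrams from `sock` and return them as text."""
    lines = []
    for _ in range(count):
        data, _sender = sock.recvfrom(65535)
        lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return lines

# Loopback demonstration: send one datagram to ourselves and read it back.
listener = open_listener("127.0.0.1", 0)  # port 0 lets the OS pick a free port
listener.settimeout(5)
port = listener.getsockname()[1]
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"sd 0:0:0:1: rejecting I/O to offline device\n", ("127.0.0.1", port))
print(receive_log_lines(listener, 1))
sender.close()
listener.close()
```

For real use, call `open_listener()` with its defaults on the logging host and append each received line to a file on a disk that is not behind the failing controller.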

> It is the onboard nics, they are Intel 82574L. I can see the arcmsr
> line, but not anything for the NICS (because they are hidden?)

Your lspci tells me it is on 16 and 17. You should see something about
pciback on that line in /proc/interrupts?

> 39:    1126249          0          0          0          0          0          0          0  xen-pirq-ioapic-level  arcmsr
> 
> Nothing else is on 1126249

You mean IRQ 39.

> see lspci.txt attached.

thanks.

> > When this hang occurs, can you do 'xm debug-key Q',
> > 'xm debug-key i', 'xm debug-key z'.
> > Then run 'xm dmesg' and provide that to me?
> 
> I can try this, but It probably won't work as the device is will not be
> readable.

Look on Google for 'Wiki PVOPS'; there is a section on how to connect a
serial console. With the serial console we can send those commands to the
hypervisor even if your box is hung.

http://wiki.xen.org/xenwiki/XenSerialConsole

> > Is your boot disk on the same disk as the RAID?
> 
> There are 2 raids, a Raid1 for the OS (/boot / /var /tmp /usr) and a
> raid5 for VM's - They both dissapear at the same time so it appears the
> card is dissapearing..

I wonder if we have your IRQs confused. Can you provide the full output of
cat /proc/interrupts, as well as the serial console output from boot? Or just
the 'xm dmesg' and 'dmesg' output if you don't have the serial console
hooked up yet.


Konrad Rzeszutek Wilk

2010-Nov-11 19:06 UTC

Re: [Xen-devel] pci-passthrough in pvops causing offline raid

> I've just noticed this at the end of xm dmesg
> 
> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> 
> Something else trying to use the device being exported? (the nics are
> 02:00.0 and 03:00.0)

Hmm, looks like it, but it should not have happened. Can you attach
the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
asked for in the previous e-mail?

Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
for unstable?


Mark Adams

2010-Nov-11 19:22 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

See attached

On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > I've just noticed this at the end of xm dmesg
> > 
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > 
> > Something else trying to use the device being exported? (the nics are
> > 02:00.0 and 03:00.0)
> 
> Hmm, looks like it, but it should not have happened. Can you attach
> the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
> asked for in the previous e-mail?
> 
> Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
> for unstable?
> 

Mark Adams

2010-Nov-11 19:42 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

Apologies - also see the plain dmesg attached. This one is from the updated
machine (4.0.1), which still shows the MSI issues.

On Thu, Nov 11, 2010 at 07:22:56PM +0000, Mark Adams wrote:
> See attached
> 
> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I've just noticed this at the end of xm dmesg
> > > 
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > 
> > > Something else trying to use the device being exported? (the nics are
> > > 02:00.0 and 03:00.0)
> > 
> > Hmm, looks like it, but it should not have happened. Can you attach
> > the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
> > asked for in the previous e-mail?
> > 
> > Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
> > for unstable?
> > 

Mark Adams

2010-Nov-12 17:10 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > I've just noticed this at the end of xm dmesg
> > 
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > 
> > Something else trying to use the device being exported? (the nics are
> > 02:00.0 and 03:00.0)
> 
> Hmm, looks like it, but it should not have happened. Can you attach
> the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
> asked for in the previous e-mail?
> 
> Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
> for unstable?

Any further ideas on this? Is it a Xen bug if the hidden device is being
accessed in dom0? Or is there an overlap somewhere? (Not sure how this
would work.)

Regards,
Mark


Konrad Rzeszutek Wilk

2010-Nov-12 22:22 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I've just noticed this at the end of xm dmesg
> > > 
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > 
> > > Something else trying to use the device being exported? (the nics are
> > > 02:00.0 and 03:00.0)
> > 
> > Hmm, looks like it, but it should not have happened. Can you attach
> > the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
> > asked for in the previous e-mail?
> > 
> > Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
> > for unstable?
> > 
> 
> Any further ideas on this? Is it a xen bug if the hidden device is being
> accessed in dom0? or is there an overlap somewhere? (not sure how this
> would work)..

I was going to look in the source today to get an idea but never got to it...

You might, as I mentioned in earlier emails, try to set up a serial console
or netconsole and log the Linux kernel output when it hangs/fails.

Mark Adams

2010-Nov-14 17:15 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
>> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
>>>> I've just noticed this at the end of xm dmesg
>>>> 
>>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
>>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
>>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
>>>> 
>>>> Something else trying to use the device being exported? (the nics are
>>>> 02:00.0 and 03:00.0)
>>> 
>>> Hmm, looks like it, but it should not have happened. Can you attach
>>> the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
>>> asked for in the previous e-mail?
>>> 
>>> Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
>>> for unstable?
>>> 
>> 
>> Any further ideas on this? Is it a xen bug if the hidden device is being
>> accessed in dom0? or is there an overlap somewhere? (not sure how this
>> would work)..
> 
> I was going to look in the source today to get an idea but never got to it...
> 
> You might, as I mentioned in earlier emails, try to set up a serial console
> or netconsole and log the Linux kernel output when it hangs/fails.
> 

I can't do this unfortunately; the server is in use, so I'm not able to just
let it hang again... The passthrough is not in use now, until I think there is
some possible solution to get rid of the MSI conflict (when it won't hang
anymore!)

Mark Adams

2010-Nov-15 17:11 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> 
> On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>> I've just noticed this at the end of xm dmesg
> >>>> 
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> >>>> 
> >>>> Something else trying to use the device being exported? (the nics are
> >>>> 02:00.0 and 03:00.0)
> >>> 
> >>> Hmm, looks like it, but it should not have happened. Can you attach
> >>> the output of 'xm dmesg' and also do the 'xm debug-keys ..' that I
> >>> asked for in the previous e-mail?
> >>> 
> >>> Jan, the fixes for the MSI you did, they weren't for 4.0.1, right? Just
> >>> for unstable?
> >>> 
> >> 
> >> Any further ideas on this? Is it a xen bug if the hidden device is being
> >> accessed in dom0? or is there an overlap somewhere? (not sure how this
> >> would work)..
> > 
> > I was going to look in the source today to get an idea but never got to it...
> > 
> > You might, as I mentioned in earlier emails, try to set up a serial console
> > or netconsole and log the Linux kernel output when it hangs/fails.
> > 
> 
> can't do this unfortunately, the server
> is in use so not able to just let it hang again... The passthrough is
> not in use now until I think there is some possible solution to get
> rid of the MSI conflict (when it won't hang anymore!)

Is there anything else I can do to help with resolution of this issue? I
see another user is also having a similar problem with (quite possibly)
PCI passthrough causing their RAID array to go offline.

Did my logs show anything useful? You mentioned some fix for MSI earlier;
has this been corrected in a newer version of Xen?

Regards,
Mark

Konrad Rzeszutek Wilk

2010-Nov-15 17:15 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> 
> On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>> I've just noticed this at the end of xm dmesg
> >>>> 
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0

Looking briefly at the code, it means that somebody already enabled MSI
on the device and did not disable it. But I wonder how
you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
or pciback.hide (for older kernels) to "hide" the devices away from the
Linux Dom0 kernel?

> > I was going to look in the source today to get an idea but never got to it...
> > 
> > You might, as I mentioned in earlier emails, try to set up a serial console
> > or netconsole and log the Linux kernel output when it hangs/fails.
> > 
> 
> can't do this unfortunately, the server
> is in use so not able to just let it hang again... The passthrough is not
> in use now until I think there is some possible solution to get rid of the MSI
> conflict (when it won't hang anymore!)

Didn't you say that you had two servers and saw this problem on another
box too?

Without more details on the Xen hypervisor line or the kernel line when
the failure occurs, I sadly can't help you.

Mark Adams

2010-Nov-15 17:23 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Mon, Nov 15, 2010 at 12:15:44PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> > 
> > On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > 
> > > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> > >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > >>>> I've just noticed this at the end of xm dmesg
> > >>>> 
> > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> 
> Looking briefly at the code it means that somebody enabled the MSI
> already on the device and did not disable them. But I wonder how
> you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> or pciback.hide (for older kernels) to "hide" the devices away from the
> Linux Dom0 kernel?

Using xen-pciback.hide, as it's a pvops kernel (Debian squeeze
2.6.32-5-27).

> > > I was going to look in the source today to get an idea but never got to it...
> > > 
> > > You might, as I mentioned in earlier emails, try to set up a serial console
> > > or netconsole and log the Linux kernel output when it hangs/fails.
> > > 
> > 
> > can't do this unfortunately, the server
> > is in use so not able to just let it hang again... The passthrough is
> > not in use now until I think there is some possible solution to get rid of the
> > MSI conflict (when it won't hang anymore!)
> 
> Didn't you say that you had two servers and saw this problem on another
> box too?
> 
> Without more details on the Xen hypervisor line or the kernel line when
> the failure occurs I sadly can't help you.

Yes, this occurs on both servers that I've tried it on. Doesn't the MSI
log above indicate that there is a conflict, which is what ends up
causing the device to go offline? Is there no other way to identify the
conflict?

Regards,
Mark


Konrad Rzeszutek Wilk

2010-Nov-15 17:44 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Mon, Nov 15, 2010 at 05:23:09PM +0000, Mark Adams wrote:
> On Mon, Nov 15, 2010 at 12:15:44PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> > > 
> > > On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > 
> > > > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> > > >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > >>>> I've just noticed this at the end of xm dmesg
> > > >>>> 
> > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > 
> > Looking briefly at the code it means that somebody enabled the MSI
> > already on the device and did not disable them. But I wonder how
> > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > or pciback.hide (for older kernels) to "hide" the devices away from the
> > Linux Dom0 kernel?
> 
> using xen-pciback.hide as it's a pvops kernel (debian squeeze
> 2.6.32-5-27)

OK. Then it might be worth looking into when this happens. I think
there is an argument on the Xen hypervisor line to include the time-stamp, but
I don't remember it :-(

> > Didn't you say that you had two servers and saw this problem on another
> > box too?
> > 
> > Without more details on the Xen hypervisor line or the kernel line when
> > the failure occurs I sadly can't help you.
> 
> Yes this occurs on both servers that I've tried it on. Doesn't the MSI
> log above indicate that there is a conflict - which is what ends up
> causing the device to go offline? Is there no other way to identify the

Could be, but it is unclear - it depends on when the message pops out.

But that does not help with finding out why your RAID controller goes offline.

Mark Adams

2010-Nov-15 17:56 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Mon, Nov 15, 2010 at 12:44:13PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 15, 2010 at 05:23:09PM +0000, Mark Adams wrote:
> > On Mon, Nov 15, 2010 at 12:15:44PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> > > > 
> > > > On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > > 
> > > > > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> > > > >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > >>>> I've just noticed this at the end of xm dmesg
> > > > >>>> 
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > 
> > > Looking briefly at the code it means that somebody enabled the MSI
> > > already on the device and did not disable them. But I wonder how
> > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > Linux Dom0 kernel?
> > 
> > using xen-pciback.hide as it's a pvops kernel (debian squeeze
> > 2.6.32-5-27)
> 
> Ok. Then it might be worth looking in when this happens. I think
> there is an argument on the Xen hypervisor line to include the time-stamp, but
> I don't remember it :-(
> 
> > > Didn't you say that you had two servers and saw this problem on another
> > > box too?
> > > 
> > > Without more details on the Xen hypervisor line or the kernel line when
> > > the failure occurs I sadly can't help you.
> > 
> > Yes this occurs on both servers that I've tried it on. Doesn't the MSI
> > log above indicate that there is a conflict - which is what ends up
> > causing the device to go offline? Is there no other way to identify the
> 
> Could be, but it is unclear - it depends on when the message pops out.

The message appears immediately on boot.

> But that does not help with finding out why your RAID controller goes offline.

Maybe the other user having a similar issue can help with logs if it is
still happening to him. I'll ask on that thread...

Regards,
Mark


Pasi Kärkkäinen

2010-Nov-15 19:26 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Mon, Nov 15, 2010 at 12:44:13PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 15, 2010 at 05:23:09PM +0000, Mark Adams wrote:
> > On Mon, Nov 15, 2010 at 12:15:44PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> > > > 
> > > > On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > > 
> > > > > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> > > > >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > >>>> I've just noticed this at the end of xm dmesg
> > > > >>>> 
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > 
> > > Looking briefly at the code it means that somebody enabled the MSI
> > > already on the device and did not disable them. But I wonder how
> > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > Linux Dom0 kernel?
> > 
> > using xen-pciback.hide as it's a pvops kernel (debian squeeze
> > 2.6.32-5-27)
> 
> Ok. Then it might be worth looking in when this happens. I think
> there is an argument on the Xen hypervisor line to include the time-stamp, but
> I don't remember it :-(
> 

http://wiki.xen.org/xenwiki/XenHypervisorBootOptions

So I think it's "console_timestamps"


-- Pasi


Mark Adams

2010-Nov-16 10:37 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Mon, Nov 15, 2010 at 12:44:13PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 15, 2010 at 05:23:09PM +0000, Mark Adams wrote:
> > On Mon, Nov 15, 2010 at 12:15:44PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Sun, Nov 14, 2010 at 05:15:02PM +0000, Mark Adams wrote:
> > > > 
> > > > On 12 Nov 2010, at 22:22, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > > 
> > > > > On Fri, Nov 12, 2010 at 05:10:58PM +0000, Mark Adams wrote:
> > > > >> On Thu, Nov 11, 2010 at 02:06:58PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > >>>> I've just noticed this at the end of xm dmesg
> > > > >>>> 
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > 
> > > Looking briefly at the code it means that somebody enabled the MSI
> > > already on the device and did not disable them. But I wonder how
> > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > Linux Dom0 kernel?
> > 
> > using xen-pciback.hide as it's a pvops kernel (debian squeeze
> > 2.6.32-5-27)
> 
> Ok. Then it might be worth looking in when this happens. I think
> there is an argument on the Xen hypervisor line to include the time-stamp, but
> I don't remember it :-(
> 
> > > Didn't you say that you had two servers and saw this problem on another
> > > box too?
> > > 
> > > Without more details on the Xen hypervisor line or the kernel line when
> > > the failure occurs I sadly can't help you.
> > 
> > Yes this occurs on both servers that I've tried it on. Doesn't the MSI
> > log above indicate that there is a conflict - which is what ends up
> > causing the device to go offline? Is there no other way to identify the
> 
> Could be, but it is unclear - it depends on when the message pops out.
> 
> But that does not help with finding out why your RAID controller goes offline.

Stephan Austermuhle advises that nothing is logged via remote syslog
when this hang occurs. I'll reply on that thread to see if he can add
the additional logging.

Konrad Rzeszutek Wilk

2010-Nov-16 16:04 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

> > Could be, but it is unclear - it depends on when the message pops out.
> > 
> > But that does not help with finding out why your RAID controller goes offline.
> 
> Stephan Austermuhle advises that nothing is logged via remote syslog
> when this hang occurs. I'll reply on that thread to see if he can add

He can also look at http://wiki.xensource.com/xenwiki/XenSerialConsole

> the additional logging

Pasi, is there a Wiki with this?:

When the hang happens, he needs to do three things:

 1). In the Linux kernel, hit SysRq-L, SysRq-T

 2). Then go in the hypervisor, hit Ctrl-A three times. He should see a
     prompt saying (XEN) ** Serial ...
     and hit '*' - that will collect all of the relevant information.

 3). Send the full serial log from the start of the machine to us (or in this
     case, to you Mark - or you can just CC him on this thread).
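
If no keyboard or serial break is at hand but a dom0 shell still responds, the SysRq requests in step 1 can also be raised by writing the key letters to /proc/sysrq-trigger. The following is a minimal sketch under that assumption (root in dom0, kernel built with magic SysRq support); the helper name send_sysrq is made up here, and the Ctrl-A sequence for the hypervisor in step 2 still needs the serial console:

```python
from pathlib import Path


def send_sysrq(keys: str, trigger: str = "/proc/sysrq-trigger") -> list:
    """Write each SysRq key letter to the trigger file, one write per key.

    'l' asks the kernel for backtraces of all active CPUs and 't' for a
    dump of all tasks; the output lands in the kernel log / console.
    """
    sent = []
    for key in keys:
        Path(trigger).write_text(key)
        sent.append(key)
    return sent


# Example (as root in dom0): send_sysrq("lt")
```

The trigger path is a parameter only so the helper can be tried against an ordinary file first; on a box whose disks have already vanished the shell itself may not be usable, in which case the keyboard or serial route above is the only option.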

Mark Adams

2010-Nov-16 16:47 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

Hi Stephan, please see debugging instructions from Konrad below.

Regards,
Mark

On Tue, Nov 16, 2010 at 11:04:22AM -0500, Konrad Rzeszutek Wilk wrote:
> > > Could be, but it is unclear - it depends on when the message pops out.
> > > 
> > > But that does not help with finding out why your RAID controller goes offline.
> > 
> > Stephan Austermuhle advises that nothing is logged via remote syslog
> > when this hang occurs. I'll reply on that thread to see if he can add
> 
> He can also look at http://wiki.xensource.com/xenwiki/XenSerialConsole
> 
> > the additional logging
> 
> Pasi, is there a Wiki with this?:
> 
> When the hang happens, he needs to do three things:
> 
>  1). In the Linux kernel, hit SysRq-L, SysRq-T
> 
>  2). Then go in the hypervisor, hit Ctrl-A three times. He should see a
>      prompt saying (XEN) ** Serial ...
>      and hit '*' - that will collect all of the relevant information.
> 
>  3). Send the full serial log from the start of the machine to us (or in this
>      case, to you Mark - or you can just CC him on this thread).

Pasi Kärkkäinen

2010-Nov-16 21:19 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Tue, Nov 16, 2010 at 11:04:22AM -0500, Konrad Rzeszutek Wilk wrote:
> > > Could be, but it is unclear - it depends on when the message pops out.
> > > 
> > > But that does not help with finding out why your RAID controller goes offline.
> > 
> > Stephan Austermuhle advises that nothing is logged via remote syslog
> > when this hang occurs. I'll reply on that thread to see if he can add
> 
> He can also look at http://wiki.xensource.com/xenwiki/XenSerialConsole
> 
> > the additional logging
> 
> Pasi, is there a Wiki with this?:
>

I don't think we have this in the wiki.. at least I haven't seen one..

-- Pasi

> When the hang happens, he needs to do three things:
> 
>  1). In the Linux kernel, hit SysRq-L, SysRq-T
> 
>  2). Then go in the hypervisor, hit Ctrl-A three times. He should see a
>      prompt saying (XEN) ** Serial ...
>      and hit '*' - that will collect all of the relevant information.
> 
>  3). Send the full serial log from the start of the machine to us (or in this
>      case, to you Mark - or you can just CC him on this thread).

Stephan Austermühle

2010-Nov-18 08:42 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

Hello Mark!

On 16.11.2010 17:47, Mark Adams wrote:
>>> Stephan Austermuhle advises that nothing is logged via remote syslog
>>> when this hang occurs. I'll reply on that thread to see if he can add
>>
>> He can also look at http://wiki.xensource.com/xenwiki/XenSerialConsole
>>
>>> the additional logging
>>
>> Pasi, is there a Wiki with this?:
>>
>> When the hang happens, he needs to do three things:
>>
>>  1). In the Linux kernel, hit SysRq-L, SysRq-T
>>
>>  2). Then go in the hypervisor, hit Ctrl-A three times. He should see a
>>      prompt saying (XEN) ** Serial ...
>>      and hit '*' - that will collect all of the relevant information.
>>
>>  3). Send the full serial log from the start of the machine to us (or in this
>>      case, to you Mark - or you can just CC him on this thread).

Thanks for your support.

The server is far away from me (some hundred kilometers) with no chance
to connect a serial console. The only thing that I have access to is a
network console (kind of iLO). Is it sufficient to collect additional
debug data?

Best regards,

Stephan




Pasi Kärkkäinen

2010-Nov-18 08:45 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Thu, Nov 18, 2010 at 09:42:32AM +0100, Stephan Austermühle wrote:
> Hello Mark!
> 
> On 16.11.2010 17:47, Mark Adams wrote:
> 
> >>> Stephan Austermuhle advises that nothing is logged via remote syslog
> >>> when this hang occurs. I'll reply on that thread to see if he can add
> >>
> >> He can also look at http://wiki.xensource.com/xenwiki/XenSerialConsole
> >>
> >>> the additional logging
> >>
> >> Pasi, is there a Wiki with this?:
> >>
> >> When the hang happens, he needs to do three things:
> >>
> >>  1). In the Linux kernel, hit SysRq-L, SysRq-T
> >>
> >>  2). Then go in the hypervisor, hit Ctrl-A three times. He should see a
> >>      prompt saying (XEN) ** Serial ...
> >>      and hit '*' - that will collect all of the relevant information.
> >>
> >>  3). Send the full serial log from the start of the machine to us (or in this
> >>      case, to you Mark - or you can just CC him on this thread).
> 
> Thanks for your support.
> 
> The server is far away from me (some hundred kilometers) with no chance
> to connect a serial console. The only thing that I have access to is a
> network console (kind of iLO). Is it sufficient to collect additional
> debug data?
> 

If it's SOL (Serial Over LAN), then yes, that's enough.

See: http://wiki.xensource.com/xenwiki/XenSerialConsole

-- Pasi


Stephan Austermühle

2010-Nov-18 08:48 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

Hi Pasi,

On 18.11.2010 09:45, Pasi Kärkkäinen wrote:
>> The server is far away from me (some hundred kilometers) with no chance
>> to connect a serial console. The only thing that I have access to is a
>> network console (kind of iLO). Is it sufficient to collect additional
>> debug data?
> 
> If it's SOL (Serial Over LAN), then yes, that's enough.
> 
> See: http://wiki.xensource.com/xenwiki/XenSerialConsole

I'll check.

Stephan



Mark Adams

2010-Nov-24 17:59 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

> > > > > >>>> 
> > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > 
> > > > Looking briefly at the code it means that somebody enabled the MSI
> > > > already on the device and did not disable them. But I wonder how
> > > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > > Linux Dom0 kernel?
> > > 
> > > using xen-pciback.hide as it's a pvops kernel (debian squeeze
> > > 2.6.32-5-27)
> > 
> > Ok. Then it might be worth looking in when this happens. I think
> > there is an argument on the Xen hypervisor line to include the time-stamp, but
> > I don't remember it :-(
> > 

I've got a test setup in place now, and am trying to reproduce this.
I've not connected up serial as yet, but can see the following logs in
the qemu-dm log file when I get the "MSI is already in use" errors
above. Note also that this error -always- shows for the first specified
device in the pci= field, and not the 2nd.

pt_msixctrl_reg_write: guest enabling MSI-X, disable MSI-INTx translation
pci_intx: intx=1
pt_msix_update_one: Update msix entry 0 with pirq 4d gvec 59
pt_msix_update_one: Update msix entry 1 with pirq 4c gvec 61
pt_msix_update_one: Update msix entry 2 with pirq 4b gvec 69
pt_msixctrl_reg_write: guest enabling MSI-X, disable MSI-INTx translation
pci_intx: intx=2
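
For comparing these qemu-dm entries against the pirq numbers that the hypervisor debug keys report, the pt_msix_update_one lines can be pulled apart as below. This is a sketch only: it assumes all three numbers are printed in hexadecimal (the pirq values such as 4d clearly are; entry and gvec are assumed to follow suit), and the function name parse_msix_updates is made up for the example:

```python
import re

# Matches lines such as:
# "pt_msix_update_one: Update msix entry 0 with pirq 4d gvec 59"
MSIX_UPDATE = re.compile(
    r"pt_msix_update_one: Update msix entry (?P<entry>[0-9a-fA-F]+) "
    r"with pirq (?P<pirq>[0-9a-fA-F]+) gvec (?P<gvec>[0-9a-fA-F]+)"
)


def parse_msix_updates(log_text: str) -> list:
    """Return one dict per MSI-X update line, values converted from hex."""
    return [
        {name: int(value, 16) for name, value in m.groupdict().items()}
        for m in MSIX_UPDATE.finditer(log_text)
    ]


sample = """\
pt_msixctrl_reg_write: guest enabling MSI-X, disable MSI-INTx translation
pci_intx: intx=1
pt_msix_update_one: Update msix entry 0 with pirq 4d gvec 59
pt_msix_update_one: Update msix entry 1 with pirq 4c gvec 61
pt_msix_update_one: Update msix entry 2 with pirq 4b gvec 69
"""
updates = parse_msix_updates(sample)
for u in updates:
    print(u["entry"], u["pirq"], u["gvec"])
```

Under the hex assumption, the three entries map to pirqs 77, 76 and 75 in decimal.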

I have also seen the following log just once, not sure if it's related:

(XEN) domctl.c:811:d0 XEN_DOMCTL_test_assign_device: 2:0.0 already assigned, or non-existent

Does this help at all with debugging my issues?

Regards,
Mark

Konrad Rzeszutek Wilk

2010-Nov-24 20:28 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Wed, Nov 24, 2010 at 05:59:26PM +0000, Mark Adams wrote:
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > 
> > > > > Looking briefly at the code it means that somebody enabled the MSI
> > > > > already on the device and did not disable them. But I wonder how
> > > > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > > > Linux Dom0 kernel?
> > > > 
> > > > using xen-pciback.hide as its a pvops kernel (debian squeeze
> > > > 2.6.32-5-27)
> > > 
> > > Ok. Then it might be worth looking in when this happens. I think
> > > there is an argument on the Xen hypervisor line to include the time-stamp, but
> > > I don't remember it :-(
> > > 
> 
> I've got a test setup in place now, and am trying to reproduce this.
> I've not connected up serial as yet, but can see the following logs in
> the qemu-dm log file when I get the "MSI is already in use" errors
> above. Note also that this error -always- shows for the first specified
> device in the pci= field, and not the 2nd.
> 
> pt_msixctrl_reg_write: guest enabling MSI-X, disable MSI-INTx translation
> pci_intx: intx=1
> pt_msix_update_one: Update msix entry 0 with pirq 4d gvec 59
> pt_msix_update_one: Update msix entry 1 with pirq 4c gvec 61
> pt_msix_update_one: Update msix entry 2 with pirq 4b gvec 69
> pt_msixctrl_reg_write: guest enabling MSI-X, disable MSI-INTx translation
> pci_intx: intx=2
> 
> I have also seen the following log just once, not sure if it's related:
> 
> (XEN) domctl.c:811:d0 XEN_DOMCTL_test_assign_device: 2:0.0 already assigned, or non-existent
> 
> Does this help at all with debugging my issues?
Not yet. I need the serial log of the Linux kernel and the Xen hypervisor from when your
machine is toast. I mentioned the key sequences in the previous email - look on Google
for how to pass in SysRq if you are using a serial concentrator.

Mark Adams

2010-Nov-26 11:15 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Wed, Nov 24, 2010 at 03:28:43PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 24, 2010 at 05:59:26PM +0000, Mark Adams wrote:
> > > > > > > >>>> 
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > 
> > > > > > Looking briefly at the code it means that somebody enabled the MSI
> > > > > > already on the device and did not disable them. But I wonder how
> > > > > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > > > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > > > > Linux Dom0 kernel?
> > 
> > I've got a test setup in place now, and am trying to reproduce this.
> > I've not connected up serial as yet, but can see the following logs in
> > the qemu-dm log file when I get the "MSI is already in use" errors
> > above. Note also that this error -always- shows for the first specified
> > device in the pci= field, and not the 2nd.
> > 
In my new test setup, I have seen some strange behaviour. One of the HVMs
(with identical config in dom0 and domU) suddenly would not allow the
igb driver to be loaded in domU, even though the device was visible in
lspci. Shutting the machine down, removing the power cord, waiting 5
seconds, then plugging it in again corrected that issue - is this
possibly a motherboard bug? I have also disabled the SR-IOV
functionality in the BIOS in case this is causing any issues.

In addition, to try to correct the MSI issue noted above, I have changed
my pci= line to the following:

pci=[ '08:00.0,msitranslate=0', '08:00.1,msitranslate=0' ]

This has stopped the "already in use on device" log, and the devices
appear to show correctly in the domU. Is it safe to disable
msitranslate? As I understand it, it's for allowing multifunction devices
to be seen as such in domU. Is that correct?
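
As a sketch of how the pci= line above can be built and sanity-checked (xm
domU config files use Python syntax): pci_entry and BDF_PATTERN are
hypothetical names introduced here, not part of Xen, and the per-device
msitranslate option is assumed to be spelled exactly as in this thread.

```python
import re

# Hypothetical helper, not part of Xen: matches "BB:DD.F" with an optional
# "DDDD:" PCI domain prefix, e.g. "08:00.0" or "0000:08:00.0".
BDF_PATTERN = re.compile(
    r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$"
)


def pci_entry(bdf, msitranslate=None):
    """Return one string for the pci= list of a domU config file."""
    if not BDF_PATTERN.match(bdf):
        raise ValueError("not a PCI bus:device.function address: %r" % bdf)
    if msitranslate is None:
        return bdf  # no per-device option appended
    return "%s,msitranslate=%d" % (bdf, 1 if msitranslate else 0)


# The same two functions as in the config line above.
pci = [pci_entry("08:00.0", msitranslate=0),
       pci_entry("08:00.1", msitranslate=0)]
print(pci)
```

The check only catches malformed addresses; it says nothing about whether
disabling msitranslate is safe for a given device.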

I haven't been able to reproduce the dropped raid issue yet, but I am
awaiting delivery of the Red-Fone boxes (ISDN VoIP) which seem to cause
this due to their very high interrupt usage (2000 per second).

In the meantime, I can see the following in the qemu-dm logs now with
msitranslate=0 set. Is it anything to worry about?

pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
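
When the qemu-dm log gets long, it can help to tally the pci_msix_writel
errors per MSI-X entry rather than reading them line by line. The following
is only a sketch: count_msix_errors and the sample text are made up for
illustration, and the pattern assumes the message wording shown in the log
above.

```python
import re
from collections import Counter

# Tolerates the doubled apostrophe ("Can''t") that some archived copies show.
MSIX_ERROR = re.compile(
    r"pci_msix_writel: Error: Can'+t update msix entry (\d+)"
)


def count_msix_errors(log_text):
    """Map MSI-X entry number -> number of rejected table writes."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MSIX_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


# Shortened sample in the same format as the log above.
sample = """\
pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
"""
print(count_msix_errors(sample))
```

Run against the full log above, a tally like this would make it easy to see
that every entry (0 to 4) is rejected the same number of times.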
> 
> Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
> machine is toast. I mentioned in the previous email the key sequences - look on Google
> on how to pass in SysRQ if you are using a serial concentrator.

I will do this when I can get the machine to crash.

Best Regards,
Mark

Mark Adams

2010-Nov-26 15:25 UTC

Re: [Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid

On Fri, Nov 26, 2010 at 11:15:20AM +0000, Mark Adams wrote:
> On Wed, Nov 24, 2010 at 03:28:43PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 24, 2010 at 05:59:26PM +0000, Mark Adams wrote:
> > > > > > > > >>>> 
> > > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > > >>>> (XEN) msi.c:715: MSI is already in use on device 02:00.0
> > > > > > > 
> > > > > > > Looking briefly at the code it means that somebody enabled the MSI
> > > > > > > already on the device and did not disable them. But I wonder how
> > > > > > > you got those in the first place. Did you use xen-pciback.hide (for PVOPS kernels)
> > > > > > > or pciback.hide (for older kernels) to "hide" the devices away from the
> > > > > > > Linux Dom0 kernel?
> > > 
> > > I've got a test setup in place now, and am trying to reproduce this.
> > > I've not connected up serial as yet, but can see the following logs in
> > > the qemu-dm log file when I get the "MSI is already in use" errors
> > > above. Note also that this error -always- shows for the first specified
> > > device in the pci= field, and not the 2nd.
> > > 
> 
> In my new test setup, I have seen some strange behaviour. 1 of the HVM's
> (with identical config in dom0 and domU) suddenly would not allow the
> igb driver to be loaded in domU, even though the device was visible in
> lspci. Shutting the machine down, removing the power cord, waiting 5
> seconds then plugging it in again corrected that issue - Is this
> possibly a motherboard bug? I have also disabled the SR-IOV
> functionality in the BIOS incase this is causing any issues.
> 
> In addition, to try to correct the MSI issue noted above, I have changed
> my pci= line to the following:
> 
> pci=[ '08:00.0,msitranslate=0', '08:00.1,msitranslate=0' ]

Apologies for replying to my own post, but I'm having an issue with this
setup in that I cannot get a link on the 2nd interface that I'm passing
through. I've tried with msitranslate both on and off, with no success.

The device shows correctly as an interface in the domU, but even with
the link up (the link LEDs are lit on the interface) the domU always shows
the eth2 interface as not ready.

[    7.001784] ADDRCONF(NETDEV_UP): eth1: link is not ready
[    7.047903] ADDRCONF(NETDEV_UP): eth2: link is not ready
[   10.108995] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   10.109653] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   16.468102] eth0: no IPv6 routers present
[   20.404092] eth1: no IPv6 routers present

I've tried using the ET dual port card (igb) as well as the onboard
interfaces (e1000e) with exactly the same result.

The eth0 interface is a Xen bridge device, and if I remove this, both
passthrough interfaces work correctly.

Can you not have a bridge and pci-passthrough operational at the same
time? Is there a limit of 2 NICs in an HVM domU? (this doesn't sound right...)
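
To pick out which interfaces never got a link from a longer dmesg, the
"link is not ready" / "link becomes ready" pairs can be matched up. This is
a rough sketch only: interfaces_without_link is a made-up helper, and it
assumes the ADDRCONF message format shown in the kernel log above.

```python
import re

NOT_READY = re.compile(r"ADDRCONF\(NETDEV_UP\): (\w+): link is not ready")
READY = re.compile(r"ADDRCONF\(NETDEV_CHANGE\): (\w+): link becomes ready")


def interfaces_without_link(dmesg_text):
    """Return interfaces that reported 'not ready' and never became ready."""
    waiting = []
    for line in dmesg_text.splitlines():
        match = NOT_READY.search(line)
        if match:
            if match.group(1) not in waiting:
                waiting.append(match.group(1))
            continue
        match = READY.search(line)
        if match and match.group(1) in waiting:
            waiting.remove(match.group(1))
    return waiting


# The kernel log lines quoted above.
dmesg = """\
[    7.001784] ADDRCONF(NETDEV_UP): eth1: link is not ready
[    7.047903] ADDRCONF(NETDEV_UP): eth2: link is not ready
[   10.108995] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   10.109653] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   16.468102] eth0: no IPv6 routers present
[   20.404092] eth1: no IPv6 routers present
"""
print(interfaces_without_link(dmesg))
```

For the log above this reports only eth2, matching what is described in
the message.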

Regards,
Mark

Konrad Rzeszutek Wilk

2010-Nov-29 16:36 UTC

[Xen-devel] HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

> 
> In my new test setup, I have seen some strange behaviour. 1 of the HVM's
> (with identical config in dom0 and domU) suddenly would not allow the
> igb driver to be loaded in domU, even though the device was visible in

Let's create a new thread for this other issue.

> lspci. Shutting the machine down, removing the power cord, waiting 5
> seconds then plugging it in again corrected that issue - Is this
> possibly a motherboard bug? I have also disabled the SR-IOV
> functionality in the BIOS incase this is causing any issues.
> 
> In addition, to try to correct the MSI issue noted above, I have changed
> my pci= line to the following:
> 
> pci=[ '08:00.0,msitranslate=0', '08:00.1,msitranslate=0' ]

With msi_translate=1 turned on, the DomU HVM guests did work, right?

> 
> This has stopped the "already in use on device" log, and the devices
> appear to show correctly in the domU. Is it safe to disable
> msitranslate? as I understand it, its for allowing multifunction devices
> to be seen as such in domU. Is that correct?
> 
> I haven't been able to reproduce the dropped raid issue yet, but I am
> awaiting delivery of the Red-Fone boxes (ISDN VoIP) which seem to cause
> this due to their very high interrupt usage (2000 per second).

OK.

> In the mean time, I can see the following in the qemu-dm logs now with
> the msitranslate=0 enabled. Is it anything to worry about?

Well, the "Error" ones are pretty bad, though I am having a hard time
understanding what they mean. Let's copy some of the QEMU folks on this.

> pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
> pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
> pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
> pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
> pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
> pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
> pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
> pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
> pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
> pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
> pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
> pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> 
> > 
> > Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
> > machine is toast. I mentioned in the previous email the key sequences - look on Google
> > on how to pass in SysRQ if you are using a serial concentrator.
> 
> I will do this when I can get the machine to crash.
> 
> Best Regards,
> Mark
> 

Mark Adams

2010-Dec-08 12:58 UTC

[Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

Hi - Apologies for top-posting, but after a lot of testing, I believe
there must be an issue with IRQs going missing between domU and dom0.
Unfortunately I have no data to prove this!

With msitranslate=0 as detailed below, and pci=nomsi in the guest kernel
grub config, all 3 NICs appear OK in the domU; however, I still had
issues with the red-fone ISDN box. The interrupts were showing correctly
(2000/s) in the domU, but communication to the device via the NIC was
still being interrupted (as shown in the asterisk console). Note that to
get the igb driver to allow this many interrupts, the
InterruptThrottleRate was set to 0. The same config (red-fone box,
asterisk etc.) works fine with a physical server.
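
One way to collect the missing data would be to compare interrupt rates in
dom0 and domU over the same interval. The sketch below works on two
snapshots of /proc/interrupts taken a known number of seconds apart;
parse_interrupts, interrupt_rates and the sample snapshots are all made up
for illustration, and the IRQ numbers and device names are not from the
real machines.

```python
def parse_interrupts(text):
    """Map IRQ label -> (count summed over all CPUs, description)."""
    totals = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the "CPU0 CPU1 ..." header has no colon
        label, rest = line.split(":", 1)
        fields = rest.split()
        count = 0
        index = 0
        # Assumption: every leading all-digit field is a per-CPU counter.
        while index < len(fields) and fields[index].isdigit():
            count += int(fields[index])
            index += 1
        totals[label.strip()] = (count, " ".join(fields[index:]))
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    old = parse_interrupts(before)
    rates = {}
    for label, (count, description) in parse_interrupts(after).items():
        previous = old.get(label, (0, description))[0]
        rates[label] = ((count - previous) / seconds, description)
    return rates


# Illustrative snapshots taken 10 seconds apart (values are invented).
BEFORE = """\
           CPU0       CPU1
 45:       1000          0   PCI-MSI-edge      eth1-rx-0
 46:        500        500   PCI-MSI-edge      eth1-tx-0
"""
AFTER = """\
           CPU0       CPU1
 45:      21000          0   PCI-MSI-edge      eth1-rx-0
 46:        600        900   PCI-MSI-edge      eth1-tx-0
"""
for irq, (rate, name) in interrupt_rates(BEFORE, AFTER, 10.0).items():
    print(irq, name, "%.1f/s" % rate)
```

On a real system the two snapshots would come from reading /proc/interrupts
twice; a rate in dom0 that differs from the one seen in domU for the same
device would be a starting point for the missing-IRQ theory.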

There is also the additional issue that I could not get the passthrough
NICs to show correctly when I also had a bridge set up.

Throughout my testing however, I could not get the machine to crash.

Not sure where to go with this one. For now we are keeping our VoIP
servers physical when ISDN connections are required.

Regards,
Mark

On Mon, Nov 29, 2010 at 11:36:35AM -0500, Konrad Rzeszutek Wilk wrote:
> > 
> > In my new test setup, I have seen some strange behaviour. 1 of the HVM's
> > (with identical config in dom0 and domU) suddenly would not allow the
> > igb driver to be loaded in domU, even though the device was visible in
> 
> Let's create a new thread for this other issue.
> 
> > lspci. Shutting the machine down, removing the power cord, waiting 5
> > seconds then plugging it in again corrected that issue - Is this
> > possibly a motherboard bug? I have also disabled the SR-IOV
> > functionality in the BIOS incase this is causing any issues.
> > 
> > In addition, to try to correct the MSI issue noted above, I have changed
> > my pci= line to the following:
> > 
> > pci=[ '08:00.0,msitranslate=0', '08:00.1,msitranslate=0' ]
> 
> With the msi_translate=1 turned on the DomU HVM guests did work, right?
> 
> > 
> > This has stopped the "already in use on device" log, and the devices
> > appear to show correctly in the domU. Is it safe to disable
> > msitranslate? as I understand it, its for allowing multifunction devices
> > to be seen as such in domU. Is that correct?
> > 
> > I haven't been able to reproduce the dropped raid issue yet, but I am
> > awaiting delivery of the Red-Fone boxes (ISDN VoIP) which seem to cause
> > this due to their very high interrupt usage (2000 per second).
> 
> OK.
> > 
> > In the mean time, I can see the following in the qemu-dm logs now with
> > the msitranslate=0 enabled. Is it anything to worry about?
> 
> Well, the "Error" ones are pretty bad, though I am having a hard time
> understanding what it means. Let's copy some of the QEMU folks on this.
> 
> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
> > pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
> > pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
> > pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
> > pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
> > pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
> > pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
> > pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
> > pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
> > pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > 
> > > 
> > > Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
> > > machine is toast. I mentioned in the previous email the key sequences - look on Google
> > > on how to pass in SysRQ if you are using a serial concentrator.
> > 
> > I will do this when I can get the machine to crash.
> > 
> > Best Regards,
> > Mark
> > 

Sander Eikelenboom

2010-Dec-08 13:37 UTC

Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

Hello Mark,

Just a recap:
     you pass through:
     - 3 physical NICs (igb)
     - 1 PCI ISDN box
     - all using MSI/MSI-X interrupts?

Have you tried using a PV domU instead of an HVM domU?
Have you tried passing through only the ISDN box, and letting the network run
with the Xen backend/frontend to rule out the igb/network stuff?


--
Sander



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Dietmar Hahn

2010-Dec-08 13:48 UTC

Re: [Xen-users] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

Hi,

On 08.12.2010, "Mark Adams <mark@campbell-lange.net>" wrote:
> Hi - Apologies to top post this, but after a lot of testing, I believe
> there must be an issue with IRQs going missing between domU and dom0.
> Unfortunately I have no data to prove this!
> 
> With msitranslate=0 as detailed below, and pci=nomsi in the guest kernel
> grub config, all 3 NICs appear OK in the domU however I still had
> issues with the red-fone ISDN box. The interrupts were showing correctly
> (2000/s) in the domU but communication to the device via the NIC was
> still being interrupted (as shown in the asterisk console). Note that to
> get the igb driver to allow this many interrupts, the
> InterruptThrottleRate was set to 0. The same config (red-fone box,
> asterisk etc) works fine with a physical server.
> 
> There is also the additional issue that I could not get the passthrough
> NICs to show correctly when I also had a bridge setup.
> 
> Throughout my testing however, I could not get the machine to crash.
> 
> Not sure where to go with this one. For now we are keeping our VoIP
> servers physical when ISDN connections are required.

Today I did some tests with xen-unstable and found these problems too.
I tried to pass through 2 PCI cards and got some error messages on the Xen
console and in the qemu logs.
With msitranslate=0 and pci=nomsi I got the sound card working in a Linux domU,
but it doesn't help on Windows.
I attached the logs from the Xen serial console and the qemu logs.
Thanks!

Dietmar.
> 
> Regards,
> Mark
> 
> On Mon, Nov 29, 2010 at 11:36:35AM -0500, Konrad Rzeszutek Wilk wrote:
> > > 
> > > In my new test setup, I have seen some strange behaviour. 1 of
the HVM''s
> > > (with identical config in dom0 and domU) suddenly would not allow
the
> > > igb driver to be loaded in domU, even though the device was
visible in
> > 
> > Let''s create a new thread for this other issue.
> > 
> > > lspci. Shutting the machine down, removing the power cord,
waiting 5
> > > seconds then plugging it in again corrected that issue - Is this
> > > possibly a motherboard bug? I have also disabled the SR-IOV
> > > functionality in the BIOS incase this is causing any issues.
> > > 
> > > In addition, to try to correct the MSI issue noted above, I have
changed
> > > my pci= line to the following:
> > > 
> > > pci=[ ''08:00.0,msitranslate=0'',
''08:00.1,msitranslate=0'' ]
> > 
> > With the msi_translate=1 turned on the DomU HVM guests did work,
right?
> > 
> > > 
> > > This has stopped the "already in use on device" log,
and the devices
> > > appear to show correctly in the domU. Is it safe to disable
> > > msitranslate? as I understand it, its for allowing multifunction
devices
> > > to be seen as such in domU. Is that correct?
> > > 
> > > I haven''t been able to reproduce the dropped raid issue
yet, but I am
> > > awaiting delivery of the Red-Fone boxes (ISDN VoIP) which seem to
cause
> > > this due to their very high interrupt usage (2000 per second).
> > 
> > OK.
> > > 
> > > In the mean time, I can see the following in the qemu-dm logs now
with
> > > the msitranslate=0 enabled. Is it anything to worry about?
> > 
> > Well, the  "Error" ones are pretty bad, thought I am having
a hard time
> > understanding what it means. Lets copy some of the QEMU folks on this.
> > 
> > > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
> > > pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
> > > pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
> > > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
> > > pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
> > > pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
> > > pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
> > > pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
> > > pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
> > > pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
> > > pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
> > > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> > > 
> > > > 
> > > > Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
> > > > machine is toast. I mentioned in the previous email the key sequences - look on Google
> > > > on how to pass in SysRQ if you are using a serial concentrator.
> > > 
> > > I will do this when I can get the machine to crash.
> > > 
> > > Best Regards,
> > > Mark
> > > 
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xensource.com
> > > http://lists.xensource.com/xen-devel
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@lists.xensource.com
> http://lists.xensource.com/xen-users
> 
> -- 
Company details: http://ts.fujitsu.com/imprint.html
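
To judge whether the pci_msix_writel errors quoted above hit every MSI-X
entry or only some of them, it can help to tally them per entry. What follows
is a minimal Python sketch and was not posted in the thread: the function name
summarize and the SAMPLE lines are illustrative assumptions, and the patterns
rely only on the wording of the qemu-dm log lines quoted above.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt in this thread. "Can'+t" also accepts
# the doubled apostrophe that some list archives produce.
UPDATE_RE = re.compile(
    r"pt_msix_update_one: Update msix entry (\d+) with pirq ([0-9a-f]+) gvec ([0-9a-f]+)"
)
ERROR_RE = re.compile(r"pci_msix_writel: Error: Can'+t update msix entry (\d+)")


def summarize(lines):
    """Return (bindings, errors).

    bindings maps an MSI-X entry number to its (pirq, gvec) strings;
    errors counts the rejected table writes per entry.
    """
    bindings = {}
    errors = Counter()
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            bindings[int(match.group(1))] = (match.group(2), match.group(3))
            continue
        match = ERROR_RE.search(line)
        if match:
            errors[int(match.group(1))] += 1
    return bindings, errors


# A shortened, hypothetical sample in the same format as the quoted log.
SAMPLE = """\
pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
""".splitlines()

if __name__ == "__main__":
    bindings, errors = summarize(SAMPLE)
    for entry in sorted(bindings):
        pirq, gvec = bindings[entry]
        print(f"entry {entry}: pirq {pirq} gvec {gvec}, rejected writes: {errors[entry]}")
```

To run it against a real log, pass an open file object to summarize() instead
of SAMPLE; the location of the qemu-dm log depends on the installation.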





Mark Adams

2010-Dec-08 13:48 UTC

Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On Wed, Dec 08, 2010 at 02:37:15PM +0100, Sander Eikelenboom wrote:
> Hello Mark,

Hi

> 
> Just a recap:
>      you pass through:
>      - 3 physical nics/IGB
>      - 1 ISDN pci ISDN box

The Red-Fone box runs on one of the NICs; it's not separate. It converts
ISDN to TDMoE, see http://www.red-fone.com/

>      - all using msi/msi-x interrupts ?

I tried using MSI/MSI-X interrupts, but that caused the RAID card to drop
off (after some use) and gave seemingly even worse performance than
pegging everything back to legacy interrupts.

> 
> Have you tried using a PV domU instead of a HVM domU ?

I initially tried PV but had issues with the igb NICs. There was
another thread somewhere about my issues with that.

> Have you tried passing through only the ISDN box, and let the network run
> with the xen backend/frontend to rule out the IGB/network stuff ?
> 
> 
> --
> Sander
> 
> 
> 
> Wednesday, December 8, 2010, 1:58:55 PM, you wrote:
> 
> > Hi - Apologies to top post this, but after a lot of testing, I believe
> > there must be an issue with IRQs going missing between domU and dom0.
> > Unfortunately I have no data to prove this!
> 
> > With msitranslate=0 as detailed below, and pci=nomsi in the guest kernel
> > grub config, all 3 NICs appear OK in the domU, however I still had
> > issues with the red-fone ISDN box. The interrupts were showing correctly
> > (2000/s) in the domU but communication to the device via the NIC was
> > still being interrupted (as shown in the asterisk console). Note that to
> > get the igb driver to allow this many interrupts, the
> > InterruptThrottleRate was set to 0. The same config (red-fone box,
> > asterisk etc) works fine with a physical server.
> 
> > There is also the additional issue that I could not get the passthrough
> > NICs to show correctly when I also had a bridge setup.
> 
> > Throughout my testing however, I could not get the machine to crash.
> 
> > Not sure where to go with this one. For now we are keeping our VoIP
> > servers physical when ISDN connections are required.
> 
> > Regards,
> > Mark
> 
> > On Mon, Nov 29, 2010 at 11:36:35AM -0500, Konrad Rzeszutek Wilk wrote:
> >> > 
> >> > In my new test setup, I have seen some strange behaviour. 1 of the HVMs
> >> > (with identical config in dom0 and domU) suddenly would not allow the
> >> > igb driver to be loaded in domU, even though the device was visible in
> >> 
> >> Let's create a new thread for this other issue.
> >> 
> >> > lspci. Shutting the machine down, removing the power cord, waiting 5
> >> > seconds then plugging it in again corrected that issue - Is this
> >> > possibly a motherboard bug? I have also disabled the SR-IOV
> >> > functionality in the BIOS in case this is causing any issues.
> >> > 
> >> > In addition, to try to correct the MSI issue noted above, I have changed
> >> > my pci= line to the following:
> >> > 
> >> > pci=[ '08:00.0,msitranslate=0', '08:00.1,msitranslate=0' ]
> >> 
> >> With the msi_translate=1 turned on the DomU HVM guests did work, right?
> >> 
> >> > 
> >> > This has stopped the "already in use on device" log, and the devices
> >> > appear to show correctly in the domU. Is it safe to disable
> >> > msitranslate? As I understand it, it's for allowing multifunction devices
> >> > to be seen as such in domU. Is that correct?
> >> > 
> >> > I haven't been able to reproduce the dropped raid issue yet, but I am
> >> > awaiting delivery of the Red-Fone boxes (ISDN VoIP) which seem to cause
> >> > this due to their very high interrupt usage (2000 per second).
> >> 
> >> OK.
> >> > 
> >> > In the mean time, I can see the following in the qemu-dm logs now with
> >> > the msitranslate=0 enabled. Is it anything to worry about?
> >> 
> >> Well, the "Error" ones are pretty bad, though I am having a hard time
> >> understanding what it means. Let's copy some of the QEMU folks on this.
> >> 
> >> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
> >> > pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
> >> > pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
> >> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
> >> > pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
> >> > pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
> >> > pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
> >> > pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
> >> > pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
> >> > pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
> >> > pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
> >> > 
> >> > > 
> >> > > Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
> >> > > machine is toast. I mentioned in the previous email the key sequences - look on Google
> >> > > on how to pass in SysRQ if you are using a serial concentrator.
> >> > 
> >> > I will do this when I can get the machine to crash.
> >> > 
> >> > Best Regards,
> >> > Mark
> >> > 
> >> > _______________________________________________
> >> > Xen-devel mailing list
> >> > Xen-devel@lists.xensource.com
> >> > http://lists.xensource.com/xen-devel
> 
> 
> 
> 
> 
> -- 
> Best regards,
>  Sander                            mailto:linux@eikelenboom.it
> 
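
The 2000/s figure mentioned above can be checked from inside the domU by
sampling /proc/interrupts twice and dividing the difference by the interval.
Here is a minimal Python sketch, assuming the usual /proc/interrupts layout
(a CPU header row, then one row per IRQ). The two snapshots, the eth1/eth2
names and the function names are made up for illustration and do not come
from the thread.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as ERR: carry fewer counters than there are CPUs,
        # so stop at the first non-numeric field.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    first = parse_interrupts(before)
    second = parse_interrupts(after)
    return {
        irq: (second[irq][0] - first[irq][0]) / seconds
        for irq in first
        if irq in second
    }


# Hypothetical snapshots taken 10 seconds apart.
BEFORE = """\
           CPU0       CPU1
 16:       1000        200   IO-APIC-fasteoi   eth1
 17:         50          0   IO-APIC-fasteoi   eth2
"""
AFTER = """\
           CPU0       CPU1
 16:      19000       2200   IO-APIC-fasteoi   eth1
 17:         60          0   IO-APIC-fasteoi   eth2
"""

if __name__ == "__main__":
    for irq, per_second in sorted(rates(BEFORE, AFTER, 10).items()):
        print(f"IRQ {irq}: {per_second:.1f} interrupts/s")
```

On a live system the two snapshots would come from reading /proc/interrupts,
sleeping for the interval, and reading it again. Comparing the domU figure
with the same measurement on a physical server would give some data for the
"IRQs going missing" suspicion.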

Sander Eikelenboom

2010-Dec-08 14:05 UTC

Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

Wednesday, December 8, 2010, 2:48:48 PM, you wrote:
> On Wed, Dec 08, 2010 at 02:37:15PM +0100, Sander Eikelenboom wrote:
>> Hello Mark,
> Hi
>> 
>> Just a recap:
>>      you pass through:
>>      - 3 physical nics/IGB
>>      - 1 ISDN pci ISDN box
> The redfone box runs on 1 of the nics - its not seperate. It converts
> ISDN to TDMoE see here.. http://www.red-fone.com/

So the problem is probably with the igb NICs.
A search turned up http://forums.virtualbox.org/viewtopic.php?f=7&t=32171 ,
perhaps worth a try?

Have you tried with just one igb NIC, and/or another simple 1 Gb NIC (non-Intel),
to see if it's due to any of the special offload features?

-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it



Mark Adams

2010-Dec-08 15:48 UTC

Re: [Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On Wed, Dec 08, 2010 at 03:05:50PM +0100, Sander Eikelenboom wrote:
> 
> Wednesday, December 8, 2010, 2:48:48 PM, you wrote:
> 
> > On Wed, Dec 08, 2010 at 02:37:15PM +0100, Sander Eikelenboom wrote:
> >> Hello Mark,
> 
> > Hi
> 
> >> 
> >> Just a recap:
> >>      you pass through:
> >>      - 3 physical nics/IGB
> >>      - 1 ISDN pci ISDN box
> 
> > The redfone box runs on 1 of the nics - its not seperate. It converts
> > ISDN to TDMoE see here.. http://www.red-fone.com/
> 
> So the problem is probably with the igb's.
> Searching showed http://forums.virtualbox.org/viewtopic.php?f=7&t=32171 ,
> perhaps worth a try ?

Tried this - doesn't help.

> 
> Have you tried with just 1 IGB, and/or another simple 1gb NIC (non intel)
> to see if it's due to any of the special offload features ?

I haven't got any other NICs to try, unfortunately. Even if it did work
with 1, it would be no use to me as I need 3.



Sander Eikelenboom

2010-Dec-08 16:44 UTC

Re: [Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

Wednesday, December 8, 2010, 4:48:57 PM, you wrote:
> On Wed, Dec 08, 2010 at 03:05:50PM +0100, Sander Eikelenboom wrote:
>> 
>> Wednesday, December 8, 2010, 2:48:48 PM, you wrote:
>> 
>> > On Wed, Dec 08, 2010 at 02:37:15PM +0100, Sander Eikelenboom wrote:
>> >> Hello Mark,
>> 
>> > Hi
>> 
>> >> 
>> >> Just a recap:
>> >>      you pass through:
>> >>      - 3 physical nics/IGB
>> >>      - 1 ISDN pci ISDN box
>> 
>> > The redfone box runs on 1 of the nics - its not seperate. It converts
>> > ISDN to TDMoE see here.. http://www.red-fone.com/
>> 
>> So the problem is probably with the igb's.
>> Searching showed http://forums.virtualbox.org/viewtopic.php?f=7&t=32171 ,
>> perhaps worth a try ?
> Tried this - doesn't help.
>> 
>> Have you tried with just 1 IGB, and/or another simple 1gb NIC (non intel)
>> to see if it's due to any of the special offload features ?
> Haven't got any other NIC's to try unfortunately. Even if it did work
> with 1, it would be no use to me as I need 3.

I understand, but simplifying the setup and trying to isolate the problem
could clarify things.

I also read your previous thread, and I saw that you hid 02:00.0 and 03:00.0
with xen-pciback (e1000e driver) there, but now you seem to be passing through
08:00.0 and 08:00.1 (igb)?
So I assume you have already tried two different NICs.

That said, http://download.intel.com/design/network/specupdt/82574.pdf does
list some errata regarding MSI-X interrupts and timing issues, with workarounds,
on the 82574 (02:00.0 and 03:00.0) NICs.

--
Sander

>> 
>> 
>> >>      - all using msi/msi-x interrupts ?
>> 
>> > I tried using msi/msi-x interrupts, but it caused the raid card to
drop
>> > off (after some use) and provided seemingly even worse performance
than
>> > pegging everything back to legacy.
>> 
>> >> 
>> >> Have you tried using a PV domU instead of a HVM domU ?
>> 
>> > I initially tried PV but had issues with the igb NIC''s.
There was
>> > another thread somewhere about my issues with that.
>> 
>> 
>> >> Have you tried passing through only the ISDN box, and let the
network run with the xen backend/frontend to rule out the IGB/network stuff ?
>> >> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> 
>> >> Wednesday, December 8, 2010, 1:58:55 PM, you wrote:
>> >> 
>> >> > Hi - Apologies to top post this, but after alot of
testing, I believe
>> >> > there must be an issue with IRQ''s going missing
between domU and dom0.
>> >> > Unfortunately I have no data to prove this!
>> >> 
>> >> > With msitranslate=0 as detailed below, and pci=nomsi in
the guest kernel
>> >> > grub config, all 3 NIC''s appear OK in the domU
however I still had
>> >> > issues with the red-fone ISDN box. The interrupts were
showing correctly
>> >> > (2000/s) in the domU but communication to the device via
the NIC was
>> >> > still being interrupted (as shown in the asterisk
console)Note that to
>> >> > get the igb driver to allow this many interrupts, the
>> >> > InterruptThrottleRate was set to 0. The same config
(red-fone box,
>> >> > asterisk etc) works fine with a physical server.
>> >> 
>> >> > There is also the additional issue that I could not get
the passthrough
>> >> > NIC''s to show correctly when I also had a bridge
setup.
>> >> 
>> >> > Throughout my testing however, I could not get the
machine to crash.
>> >> 
>> >> > Not sure where to go with this one. For now we are
keeping our VoIP
>> >> > servers physical when ISDN connections are required.
>> >> 
>> >> > Regards,
>> >> > Mark
>> >> 
>> >> > On Mon, Nov 29, 2010 at 11:36:35AM -0500, Konrad
Rzeszutek Wilk wrote:
>> >> >> > 
>> >> >> > In my new test setup, I have seen some strange
behaviour. 1 of the HVM''s
>> >> >> > (with identical config in dom0 and domU)
suddenly would not allow the
>> >> >> > igb driver to be loaded in domU, even though the
device was visible in
>> >> >> 
>> >> >> Let''s create a new thread for this other
issue.
>> >> >> 
>> >> >> > lspci. Shutting the machine down, removing the
power cord, waiting 5
>> >> >> > seconds then plugging it in again corrected that
issue - Is this
>> >> >> > possibly a motherboard bug? I have also disabled
the SR-IOV
>> >> >> > functionality in the BIOS incase this is causing
any issues.
>> >> >> > 
>> >> >> > In addition, to try to correct the MSI issue
noted above, I have changed
>> >> >> > my pci= line to the following:
>> >> >> > 
>> >> >> > pci=[
''08:00.0,msitranslate=0'',
''08:00.1,msitranslate=0'' ]
>> >> >> 
>> >> >> With the msi_translate=1 turned on the DomU HVM
guests did work, right?
>> >> >> 
>> >> >> > 
>> >> >> > This has stopped the "already in use on
device" log, and the devices
>> >> >> > appear to show correctly in the domU. Is it safe
to disable
>> >> >> > msitranslate? as I understand it, its for
allowing multifunction devices
>> >> >> > to be seen as such in domU. Is that correct?
>> >> >> > 
>> >> >> > I haven''t been able to reproduce the
dropped raid issue yet, but I am
>> >> >> > awaiting delivery of the Red-Fone boxes (ISDN
VoIP) which seem to cause
>> >> >> > this due to their very high interrupt usage
(2000 per second).
>> >> >> 
>> >> >> OK.
>> >> >> > 
>> >> >> > In the mean time, I can see the following in the
qemu-dm logs now with
>> >> >> > the msitranslate=0 enabled. Is it anything to worry about?
>> >> >>
>> >> >> Well, the  "Error" ones are pretty bad, thought I am having a hard time
>> >> >> understanding what it means. Lets copy some of the QEMU folks on this.
>> >> >>
>> >> >> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:05.0][Offset:14h][Length:4]
>> >> >> > pt_ioport_map: e_phys=ffff pio_base=e880 len=32 index=2 first_map=0
>> >> >> > pt_ioport_map: e_phys=c220 pio_base=e880 len=32 index=2 first_map=0
>> >> >> > pt_pci_write_config: Warning: Guest attempt to set address to unused Base Address Register. [00:06.0][Offset:14h][Length:4]
>> >> >> > pt_ioport_map: e_phys=ffff pio_base=ec00 len=32 index=2 first_map=0
>> >> >> > pt_ioport_map: e_phys=c240 pio_base=ec00 len=32 index=2 first_map=0
>> >> >> > pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59
>> >> >> > pt_msix_update_one: Update msix entry 1 with pirq 4e gvec 61
>> >> >> > pt_msix_update_one: Update msix entry 2 with pirq 4d gvec 69
>> >> >> > pt_msix_update_one: Update msix entry 3 with pirq 4c gvec 71
>> >> >> > pt_msix_update_one: Update msix entry 4 with pirq 4b gvec 79
>> >> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 2 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 3 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
>> >> >> > pci_msix_writel: Error: Can't update msix entry 4 since MSI-X is already function.
>> >> >> >
>> >> >> > >
>> >> >> > > Not yet. Need to serial log of the Linux kernel and the Xen hypervisor when your
>> >> >> > > machine is toast. I mentioned in the previous email the key sequences - look on Google
>> >> >> > > on how to pass in SysRQ if you are using a serial concentrator.
>> >> >> >
>> >> >> > I will do this when I can get the machine to crash.
>> >> >> > 
>> >> >> > Best Regards,
>> >> >> > Mark



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Konrad Rzeszutek Wilk

2010-Dec-08 17:01 UTC

[Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

> > Just a recap:
> >      you pass through:
> >      - 3 physical nics/IGB
> >      - 1 ISDN pci ISDN box
> 
> The redfone box runs on 1 of the nics - its not seperate. It converts
> ISDN to TDMoE see here.. http://www.red-fone.com/
> 
> >      - all using msi/msi-x interrupts ?
> 
> I tried using msi/msi-x interrupts, but it caused the raid card to drop
> off (after some use) and provided seemingly even worse performance than
> pegging everything back to legacy.

Were you able to get a serial log and hit all of the different debug options
when the RAID card died?

> > 
> > Have you tried using a PV domU instead of a HVM domU ?
> 
> I initially tried PV but had issues with the igb NIC's. There was
> another thread somewhere about my issues with that.

Hmm, the only other thread I see from you is about the RAID.

> 
> > Have you tried passing through only the ISDN box, and let the network run with the xen backend/frontend to rule out the IGB/network stuff ?

Mr. Sander's idea of isolating things piece by piece is the right way.

There are a bunch of warnings in the QEMU output, some of them quite
troubling.

You seem to have issues with gntdev (as in, not found), but if you are using
OpenSUSE then that would work, I think. When you tested xen-unstable did you
use the OpenSUSE kernel or the PV-OPS one?
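
As a rough aid for reading qemu-dm logs like the one Mark posted, the sketch below tallies how many times each MSI-X entry was rejected by pci_msix_writel. It is only an illustrative Python helper, not part of Xen or QEMU: the function name count_msix_errors and the sample list are assumptions, and the message format is copied from the log lines quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.
# "'+" also tolerates the doubled apostrophes some list archives produce.
MSIX_ERROR = re.compile(r"pci_msix_writel: Error: Can'+t update msix entry (\d+)")


def count_msix_errors(lines):
    """Return a Counter mapping MSI-X entry number to rejected update count."""
    counts = Counter()
    for line in lines:
        match = MSIX_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample lines in the format seen in the quoted qemu-dm log.
sample = [
    "pt_msix_update_one: Update msix entry 0 with pirq 4f gvec 59",
    "pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.",
    "pci_msix_writel: Error: Can't update msix entry 0 since MSI-X is already function.",
    "pci_msix_writel: Error: Can't update msix entry 1 since MSI-X is already function.",
]

for entry, hits in sorted(count_msix_errors(sample).items()):
    print(f"msix entry {entry}: {hits} rejected update(s)")
```

To run it against a real log, pass an open file object instead of the sample list; the log path depends on the installation and is not given in this thread.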

Sander Eikelenboom

2010-Dec-08 17:15 UTC

Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

It seems Jeremy has had some trouble with the Intel 82574L as well.

I don't know whether he has resolved it or has to use workarounds:

http://www.gossamer-threads.com/lists/linux/kernel/1166308

What kernel are you using in your domU? Also the Debian 2.6.32?
Perhaps it's also worth testing a newer Intel driver/kernel?

--
Sander




-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Sander Eikelenboom

2010-Dec-08 17:18 UTC

[Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

And another thread with problems and strange IRQ reports regarding the 82574:
http://www.linuxquestions.org/questions/linux-hardware-18/intel-82574l-gigabit-network-card-issues-and-resolution-831364/

(Please, Mr. Konrad, no "Mr."; that makes me feel really, really old ;-) )



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Konrad Rzeszutek Wilk

2010-Dec-08 17:43 UTC

[Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On Wed, Dec 08, 2010 at 06:18:06PM +0100, Sander Eikelenboom wrote:
> And another thread with problems and strange irq reports with regard to the 82574
> http://www.linuxquestions.org/questions/linux-hardware-18/intel-82574l-gigabit-network-card-issues-and-resolution-831364/

Ugh, that device sure looks to have some faults.

So I think I've confused myself. There was another person who tried a similar
pass-through of a sound card to an HVM guest. While it worked, it did not seem
to work that well and spat out lots of warnings. But those are quite
different from what Mark had.

Mark, right now we are all busy trying to get patches ready for 2.6.38, which
is why we have been slow to respond to you or to try reproducing this on
our machines.

The RAID is troubling, but the neat thing about it is that it hangs your
machine, so if you hit all of those debug options via the serial console,
it can help us narrow down where the problem is. For the other issues you have,
well, there are many possibilities (and it might very well be the same issue
you are hitting with the RAID card), and narrowing it down to the exact cause
(say, it might be what Sander suggested: one of the NICs is just funky or
perhaps needs a firmware update) can help here.

The latest kernel is 2.6.32.26 (I think?) and the latest xen-unstable.hg has
some fixes for MSI ownership and some IRQ migration issues.

> (please Mr. Konrad, no "Mr." that makes me feel really really old ;-) )

Sure thing :-)

Jeremy Fitzhardinge

2010-Dec-08 19:51 UTC

Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On 12/08/2010 09:15 AM, Sander Eikelenboom wrote:
> It seems Jeremy has had some troubles with the Intel 82574L as well.
>
> Don't know if he has got those resolved, or has to use workarounds ?

It's basically a bug in that particular chip, I think.  The workaround is
to disable ASPM for it; I'm not sure if the upstream driver does that
yet, but the workaround used to be to completely disable ASPM
(pcie_aspm=off on the kernel command line, and in the BIOS).

    J
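
To confirm that a workaround like the one above actually reached the running kernel, one option is to look for the parameter on the booted command line. The snippet below is a minimal sketch under assumptions: the helper name kernel_param and the example command line are made up for illustration, and only the pcie_aspm=off parameter itself comes from this thread.

```python
def kernel_param(cmdline, name):
    """Look up `name` on a kernel command line.

    Returns the value for name=value, True for a bare flag,
    or None if the parameter is absent (first match wins).
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None


# Hypothetical command line; on a live Linux system the real one
# can be read from /proc/cmdline.
example = "root=/dev/sda1 ro quiet pcie_aspm=off"
print(kernel_param(example, "pcie_aspm"))
```

A result of "off" only shows that the parameter was passed at boot; it says nothing about the separate BIOS setting mentioned above.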

Mark Adams

2010-Dec-09 10:39 UTC

Re: [Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On Wed, Dec 08, 2010 at 05:44:39PM +0100, Sander Eikelenboom wrote:
> Wednesday, December 8, 2010, 4:48:57 PM, you wrote:
> 
> > On Wed, Dec 08, 2010 at 03:05:50PM +0100, Sander Eikelenboom wrote:
> >> 
> >> Wednesday, December 8, 2010, 2:48:48 PM, you wrote:
> >> 
> >> > On Wed, Dec 08, 2010 at 02:37:15PM +0100, Sander Eikelenboom wrote:
> >> >> Hello Mark,
> >> 
> >> > Hi
> >> 
> >> >> 
> >> >> Just a recap:
> >> >>      you pass through:
> >> >>      - 3 physical nics/IGB
> >> >>      - 1 ISDN pci ISDN box
> >> 
> >> > The redfone box runs on 1 of the nics - its not seperate. It converts
> >> > ISDN to TDMoE see here.. http://www.red-fone.com/
> >> 
> >> So the problem is probably with the igb's.
> >> Searching showed http://forums.virtualbox.org/viewtopic.php?f=7&t=32171 , perhaps worth a try ?
> 
> > Tried this - doesn't help.
> 
> >> 
> >> Have you tried with just 1 IGB, and/or another simple 1gb NIC (non intel) to see if it's due to any of the special offload features ?
> 
> > Haven't got any other NIC's to try unfortunately. Even if it did work
> > with 1, it would be no use to me as I need 3.
> 
> I understand, but simplifying the setup and trying to isolate the problem, could clarify things.
> 
> I also read you previous thread, and i saw you hide the 02:00.0 and 03:00.0 with xen-pciback (e1000e driver) there, but now you seem to be passing through 08:00.0 and 08:00.1 (igb) ?
> So i assume you have already tried 2 different NIC's
> 
> http://download.intel.com/design/network/specupdt/82574.pdf though shows some errata regarding msi-x interrupts and timing issues and workarounds on the 82574 (02:00.0 and 03:00.0) nics.
> 
I was initially using the onboard NICs (e1000e) when I had the crashing
problem. To try to get around this, I disabled all the MSI-based stuff I
could find, which seemed to correct the crashing issue. In order to do
this I needed 3 NICs, because bridging would not work at the same time
as passthrough (it would not show all the devices being passed through?),
hence starting to use the igb-based NIC card that's also in the machine.

Unfortunately the servers I've been testing on need to go into
production now, so I can't test any further (hence moving the VoIP
stuff on to a physical box). Xen works really well for me when I don't use
pci-passthrough!

Regards,
Mark

Mark Adams

2010-Dec-09 10:49 UTC

[Xen-users] Re: [Xen-devel] Re: HVM DomU, msi_translate=0, MSI/MSI-X PCI passthrough fails.

On Wed, Dec 08, 2010 at 12:01:25PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Just a recap:
> > >      you pass through:
> > >      - 3 physical nics/IGB
> > >      - 1 ISDN pci ISDN box
> > 
> > The redfone box runs on 1 of the nics - it's not separate. It converts
> > ISDN to TDMoE see here.. http://www.red-fone.com/
> > 
> > >      - all using msi/msi-x interrupts ?
> > 
> > I tried using msi/msi-x interrupts, but it caused the raid card to drop
> > off (after some use) and provided seemingly even worse performance than
> > pegging everything back to legacy.
> 
> Were you able to get a serial log and hit all of the different debug options
> when the RAID card died?

I didn't get it to crash. I have spent my limited time with the
hardware trying to get the setup to work effectively on the current
Debian Xen packages (hence disabling all the MSI stuff, which seemed to
be the problem).

I understand that this isn't helpful in terms of getting whatever
bugs may be there fixed for the future, however!

> > > 
> > > Have you tried using a PV domU instead of a HVM domU ?
> > 
> > I initially tried PV but had issues with the igb NIC's. There was
> > another thread somewhere about my issues with that.
> 
> Hmm, the only other thread I see from you is about the RAID.
> > 
> > 
> > > Have you tried passing through only the ISDN box, and let the network run with the xen backend/frontend to rule out the IGB/network stuff ?
> 
> Mr. Sander idea of isolating one piece by piece is the right way.
> 
> There is a bunch of warnings in the QEMU output - some of them quite .. troubling.
> 
> You seem to have issues with gntdev (as in, not found), but if you are using OpenSuSE
> then that would work - I think. When you tested xen-unstable did you use the
> OpenSUSE kernel or the PV-OPS one?

I'm running the Debian Xen packages in squeeze (4.0.1-1 and 2.6.32-28);
I haven't stepped outside the packages at all.

Regards,
Mark

