thr3ads.net - Xen devel - [Xen-devel] xen dependant on pcpu 0 ? [Oct 2010]


Sander Eikelenboom

2010-Oct-12 16:28 UTC

[Xen-devel] xen dependant on pcpu 0 ?

Hi Keir,

Does xen and/or the xen console depend on physical cpu 0 ?

I'm still trying to solve the mystery of my machine freezing when doing:

 - video grabbing in a domU with a USB3 PCI Express controller passed through
   (which seems to cause quite a few interrupts)
 - compiling a Linux kernel with "make -j 6"

It's a 6-core AMD Phenom X6.

Without CPU pinning:
I can easily freeze the machine within a minute of starting the compile. At
first the Xen serial console also slows down under the load (slow updates).
Once the machine freezes, I can't do anything with the Xen serial console.

With CPU pinning:
By not using pcpu 0 for any domain at all, and pinning the domain with the
video grabber to its own pcpu (pcpu 5), the machine seems to keep running
after 20 iterations of "make -j6" kernel compilation.
The Xen serial console stays responsive and doesn't slow down during the
kernel compilation. The video grabber has no problem grabbing video.


Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0     3   r--    2169.7 1-4
Domain-0                             0     1     1   -b-    2339.3 1-4
Domain-0                             0     2     2   -b-    2358.9 1-4
Domain-0                             0     3     3   -b-    2298.2 1-4
Domain-0                             0     4     1   -b-    2221.9 1-4
Domain-0                             0     5     4   -b-    2287.7 1-4
backup                               9     0     4   -b-      10.6 1-4
database                             1     0     4   -b-      45.3 1-4
davical                              5     0     3   -b-       8.7 1-4
git                                  8     0     2   -b-       7.9 1-4
mail                                 2     0     4   -b-       8.0 1-4
samba                                3     0     3   -b-      11.1 1-4
security                             7     0     5   r--    1433.2 5
www                                  4     0     1   -b-      10.2 1-4
zabbix                               6     0     3   -b-      21.2 1-4


Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU,
especially related to passthrough/interrupts in the context of pcpu 0?

--
Sander


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2010-Oct-12 16:44 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
> Hi Keir,
> 
> Does xen and/or the xen console depend on physical cpu 0 ?
Usually the console for Dom0, and I think all other domains go
through CPU0. Let me CC Ian here, who has been mucking in this
area and found some bugs (and produced fixes).

Ian, that bug you found with not clearing the event channel: that
wouldn't have an impact here, right?
> 
> I'm still trying to solve the mystery of my machine freezing when doing:
> 
>  - video grabbing in a domU with a USB3 PCI Express controller passed through
>    (which seems to cause quite a few interrupts)
>  - compiling a Linux kernel with "make -j 6"
> 
> It's a 6-core AMD Phenom X6.
> 
> Without CPU pinning:
> I can easily freeze the machine within a minute of starting the compile. At
> first the Xen serial console also slows down under the load (slow updates).
> Once the machine freezes, I can't do anything with the Xen serial console.
> 
> With CPU pinning:
> By not using pcpu 0 for any domain at all, and pinning the domain with the
> video grabber to its own pcpu (pcpu 5), the machine seems to keep running
> after 20 iterations of "make -j6" kernel compilation.
> The Xen serial console stays responsive and doesn't slow down during the
> kernel compilation. The video grabber has no problem grabbing video.
> 
AHA! So finally closer to the mystery.

Can you provide the /proc/interrupts of the Dom0?

I wonder if this is related to the issue I had some time ago and never got
around to looking at. During heavy compilation (on a two-socket Nehalem box
running just Dom0, no guests), the keyboard and USB drivers would stop getting
interrupts. The drivers would then fall back to polling, which is quite slow
albeit serviceable, and at some point interrupts would pick up again.

The weird part was that /proc/interrupts showed absolutely _no_ interrupts on
CPU0 during that time, as if Xen just forgot to update them. Jeremy suggested I
try disabling Xen IRQ balancing (noirqbalance on the Xen command line) in case
that is it, and to my embarrassment I haven't tried that yet.

Did you try that? I think somebody suggested it, but I can't recall whether it
was for this issue.
> 
> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0                             0     0     3   r--    2169.7 1-4
> Domain-0                             0     1     1   -b-    2339.3 1-4
> Domain-0                             0     2     2   -b-    2358.9 1-4
> Domain-0                             0     3     3   -b-    2298.2 1-4
> Domain-0                             0     4     1   -b-    2221.9 1-4
> Domain-0                             0     5     4   -b-    2287.7 1-4
> backup                               9     0     4   -b-      10.6 1-4
> database                             1     0     4   -b-      45.3 1-4
> davical                              5     0     3   -b-       8.7 1-4
> git                                  8     0     2   -b-       7.9 1-4
> mail                                 2     0     4   -b-       8.0 1-4
> samba                                3     0     3   -b-      11.1 1-4
> security                             7     0     5   r--    1433.2 5
> www                                  4     0     1   -b-      10.2 1-4
> zabbix                               6     0     3   -b-      21.2 1-4
> 
> 
> Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU,
> especially related to passthrough/interrupts in the context of pcpu 0?
I don't know, but I do know that the IRQ handling in Xen 4.0 changed
significantly compared to 3.4. I don't remember whether you ever ran this
setup under 3.4?
> 
> --
> Sander

Ian Campbell

2010-Oct-12 17:13 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

On Tue, 2010-10-12 at 17:44 +0100, Konrad Rzeszutek Wilk wrote:
> On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
> > Hi Keir,
> > 
> > Does xen and/or the xen console depend on physical cpu 0 ?
> 
> Usually the console for Dom0, and I think all other domains go
> through CPU0. Let me CC Ian here, who has been mucking in this
> area and found some bugs (and produced fixes).
> 
> Ian, that bug you found with not clearing the event channel: that
> wouldn't have an impact here, right?
I don't think so. That issue was related to evtchn delivery, which is to
VCPUs, not PCPUs. I don't think it was specific to VCPU0 either; it just
happened that the particular evtchn was generally tied to VCPU0 by default.

I don't think the problem would happen for PIRQs anyway, since the
->startup method for that IRQ chip includes an explicit rebind of the
evtchn to a VCPU; it's only dynirqs that have the issue.

Ian.




Sander Eikelenboom

2010-Oct-12 18:50 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

Tuesday, October 12, 2010, 6:44:33 PM, you wrote:
> On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
>> Hi Keir,
>> 
>> Does xen and/or the xen console depend on physical cpu 0 ?
> Usually the console for Dom0, and I think all other domains go
> through CPU0. Let me CC Ian here, who has been mucking in this
> area and found some bugs (and produced fixes).
> Ian, that bug you found with not clearing the event channel: that
> wouldn't have an impact here, right?
>> 
>> I'm still trying to solve the mystery of my machine freezing when doing:
>> 
>>  - video grabbing in a domU with a USB3 PCI Express controller passed through
>>    (which seems to cause quite a few interrupts)
>>  - compiling a Linux kernel with "make -j 6"
>> 
>> It's a 6-core AMD Phenom X6.
>> 
>> Without CPU pinning:
>> I can easily freeze the machine within a minute of starting the compile. At
>> first the Xen serial console also slows down under the load (slow updates).
>> Once the machine freezes, I can't do anything with the Xen serial console.
>> 
>> With CPU pinning:
>> By not using pcpu 0 for any domain at all, and pinning the domain with the
>> video grabber to its own pcpu (pcpu 5), the machine seems to keep running
>> after 20 iterations of "make -j6" kernel compilation.
>> The Xen serial console stays responsive and doesn't slow down during the
>> kernel compilation. The video grabber has no problem grabbing video.
>> 
> AHA! So finally closer to the mystery.
So I thought... but although it survived 20 iterations of kernel compiling, it
still froze while dom0 was relatively idle and the domU was still grabbing
video.
This time it again gave the "RCU detected CPU stalls" message for cpu 0; since
that's dom0, it should be vcpu0 = pcpu1. My Xen serial console was frozen
again, so I can't dump anything.
But:
 - the hypervisor should still have pcpu0 available
 - dom0 has pcpu1-4, although shared with some other mostly idle domains
 - the video-grabbing domU has pcpu5

So the CPU pinning seems to change things a bit, but only in the sense that
the machine survives somewhat longer...

Another thing I'm wondering about is that xentop reports dom0 consuming about
50% CPU, while top inside dom0 shows nowhere near 50% when using the 2.6.31
pvops kernel.
With the latest 2.6.32-pvops there is a problem where events/0 consumes a lot
of CPU in relation to xenconsoled (Jeremy already has a thread running on
that). That's why I tested 2.6.31-pvops this time, which doesn't have that
issue.
> Can you provide the /proc/interrupts of the Dom0?
Just after it has been running for a while, or should I try to capture it
under load / just before the freeze?

> I wonder if this is related to the issue I had some time ago and never got
> around to looking at. During heavy compilation (on a two-socket Nehalem box
> running just Dom0, no guests), the keyboard and USB drivers would stop getting
> interrupts. The drivers would then fall back to polling, which is quite slow
> albeit serviceable, and at some point interrupts would pick up again.
> The weird part was that /proc/interrupts showed absolutely _no_ interrupts on
> CPU0 during that time, as if Xen just forgot to update them. Jeremy suggested
> I try disabling Xen IRQ balancing (noirqbalance on the Xen command line) in
> case that is it, and to my embarrassment I haven't tried that yet.
I did try that before; it didn't seem to make a difference, but I will try it
again just to be sure.
> Did you try that? I think somebody suggested it, but I can't recall whether
> it was for this issue.
>> 
>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>> Domain-0                             0     0     3   r--    2169.7 1-4
>> Domain-0                             0     1     1   -b-    2339.3 1-4
>> Domain-0                             0     2     2   -b-    2358.9 1-4
>> Domain-0                             0     3     3   -b-    2298.2 1-4
>> Domain-0                             0     4     1   -b-    2221.9 1-4
>> Domain-0                             0     5     4   -b-    2287.7 1-4
>> backup                               9     0     4   -b-      10.6 1-4
>> database                             1     0     4   -b-      45.3 1-4
>> davical                              5     0     3   -b-       8.7 1-4
>> git                                  8     0     2   -b-       7.9 1-4
>> mail                                 2     0     4   -b-       8.0 1-4
>> samba                                3     0     3   -b-      11.1 1-4
>> security                             7     0     5   r--    1433.2 5
>> www                                  4     0     1   -b-      10.2 1-4
>> zabbix                               6     0     3   -b-      21.2 1-4
>> 
>> 
>> Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU,
>> especially related to passthrough/interrupts in the context of pcpu 0?
> I don't know, but I do know that the IRQ handling in Xen 4.0 changed
> significantly compared to 3.4. I don't remember whether you ever ran this
> setup under 3.4?
I tried Xen 3.4-testing as well today (in combination with 2.6.31-pvops as
dom0), but that resulted in the video-grabbing domU going berserk: the xhci
driver complains about "spurious interrupts" multiple times a second.
>> 
>> --
>> Sander



Sander Eikelenboom

2010-Oct-12 19:37 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

Hi Konrad,

Here are the /proc/interrupts, without any CPU pinning. With normal interrupts
I saw a pciback entry in /proc/interrupts in dom0, but with MSI-X these seem
to be missing?

/proc/interrupts in dom0 (2.6.31-pvops), running the domU with the video
grabber active, little other activity in dom0:
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
   1:          2          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
   8:          0          0          0          0          0          0  xen-pirq-ioapic-edge  rtc0
   9:          0          0          0          0          0          0  xen-pirq-ioapic-level  acpi
  12:          4          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
  17:          2          0          0          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
  18:         33          0          0          0          0          0  xen-pirq-ioapic-level  ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  25:         18          0          0          0          0          0  xen-pirq-ioapic-level  HDA Intel
 903:         38          0          0          0          0          0  xen-dyn-event     vif9.0
 904:        912          0          0          0          0          0  xen-dyn-event     blkif-backend
 905:         18          0          0          0          0          0  xen-dyn-event     blkif-backend
 906:        409          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 907:        285          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 908:         12          0          0          0          0          0  xen-dyn-event     vif8.0
 909:       4882          0          0          0          0          0  xen-dyn-event     blkif-backend
 910:         19          0          0          0          0          0  xen-dyn-event     blkif-backend
 911:        465          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 912:        426          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 913:        252          0          0          0          0          0  xen-dyn-event     vif7.0
 916:        135          0          0          0          0          0  xen-dyn-event     blkif-backend
 917:       1822          0          0          0          0          0  xen-dyn-event     blkif-backend
 918:         25          0          0          0          0          0  xen-dyn-event     blkif-backend
 919:       1021          0          0          0          0          0  xen-dyn-event     pciback
 920:        947          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 921:        357          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 922:      61440          0          0          0          0          0  xen-dyn-event     vif6.0
 923:       3065          0          0          0          0          0  xen-dyn-event     blkif-backend
 924:         25          0          0          0          0          0  xen-dyn-event     blkif-backend
 925:        236          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 926:        262          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 927:         12          0          0          0          0          0  xen-dyn-event     vif5.0
 928:        932          0          0          0          0          0  xen-dyn-event     blkif-backend
 929:         19          0          0          0          0          0  xen-dyn-event     blkif-backend
 930:        272          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 931:        288          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 932:         59          0          0          0          0          0  xen-dyn-event     vif4.0
 933:       1263          0          0          0          0          0  xen-dyn-event     blkif-backend
 934:         23          0          0          0          0          0  xen-dyn-event     blkif-backend
 935:        282          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 936:        201          0          0          0          0          0  xen-dyn-event     vif3.0
 937:        286          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 938:       1082          0          0          0          0          0  xen-dyn-event     blkif-backend
 939:         19          0          0          0          0          0  xen-dyn-event     blkif-backend
 940:        301          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 941:         18          0          0          0          0          0  xen-dyn-event     vif2.0
 942:        810          0          0          0          0          0  xen-dyn-event     blkif-backend
 943:         19          0          0          0          0          0  xen-dyn-event     blkif-backend
 944:        280          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 945:        281          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 946:      42169          0          0          0          0          0  xen-dyn-event     vif1.0
 947:        279          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 948:      24824          0          0          0          0          0  xen-dyn-event     blkif-backend
 949:         19          0          0          0          0          0  xen-dyn-event     blkif-backend
 950:        285          0          0          0          0          0  xen-dyn-event     evtchn:xenconsoled
 951:        282          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 952:          0          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 953:       5922          0          0          0          0          0  xen-dyn-event     evtchn:xenstored
 954:      22988          0          0          0          0          0  xen-pirq-msi       eth1
 955:      21566          0          0          0          0          0  xen-pirq-msi       eth0
 956:          0          0          0          0          0          0  xen-pirq-msi       ahci
 957:      72163          0          0          0          0          0  xen-pirq-msi       ahci
 968:          0          0          0          0          0          0  xen-dyn-virq      pcpu
 969:      11384          0          0          0          0          0  xen-dyn-event     xenbus
 970:          0          0          0          0          0       1347  xen-dyn-ipi       callfuncsingle5
 971:          0          0          0          0          0          0  xen-dyn-virq      debug5
 972:          0          0          0          0          0        390  xen-dyn-ipi       callfunc5
 973:          0          0          0          0          0      10300  xen-dyn-ipi       resched5
 974:          0          0          0          0          0     137425  xen-dyn-virq      timer5
 975:          0          0          0          0       1504          0  xen-dyn-ipi       callfuncsingle4
 976:          0          0          0          0          0          0  xen-dyn-virq      debug4
 977:          0          0          0          0        394          0  xen-dyn-ipi       callfunc4
 978:          0          0          0          0      20872          0  xen-dyn-ipi       resched4
 979:          0          0          0          0     254028          0  xen-dyn-virq      timer4
 980:          0          0          0       1560          0          0  xen-dyn-ipi       callfuncsingle3
 981:          0          0          0          0          0          0  xen-dyn-virq      debug3
 982:          0          0          0        309          0          0  xen-dyn-ipi       callfunc3
 983:          0          0          0      23055          0          0  xen-dyn-ipi       resched3
 984:          0          0          0     348681          0          0  xen-dyn-virq      timer3
 985:          0          0       1252          0          0          0  xen-dyn-ipi       callfuncsingle2
 986:          0          0          0          0          0          0  xen-dyn-virq      debug2
 987:          0          0        415          0          0          0  xen-dyn-ipi       callfunc2
 988:          0          0      24141          0          0          0  xen-dyn-ipi       resched2
 989:          0          0     415948          0          0          0  xen-dyn-virq      timer2
 990:          0       1847          0          0          0          0  xen-dyn-ipi       callfuncsingle1
 991:          0          0          0          0          0          0  xen-dyn-virq      debug1
 992:          0        404          0          0          0          0  xen-dyn-ipi       callfunc1
 993:          0      22844          0          0          0          0  xen-dyn-ipi       resched1
 994:          0     484202          0          0          0          0  xen-dyn-virq      timer1
 995:       1148          0          0          0          0          0  xen-dyn-ipi       callfuncsingle0
 996:          0          0          0          0          0          0  xen-dyn-virq      debug0
 997:        276          0          0          0          0          0  xen-dyn-ipi       callfunc0
 998:      17435          0          0          0          0          0  xen-dyn-ipi       resched0
 999:    1296208          0          0          0          0          0  xen-dyn-virq      timer0
 NMI:          0          0          0          0          0          0  Non-maskable interrupts
 LOC:          0          0          0          0          0          0  Local timer interrupts
 SPU:          0          0          0          0          0          0  Spurious interrupts
 CNT:          0          0          0          0          0          0  Performance counter interrupts
 PND:          0          0          0          0          0          0  Performance pending work
 RES:      17435      22844      24141      23055      20872      10300  Rescheduling interrupts
 CAL:       1424       2251       1667       1869       1898       1737  Function call interrupts
 TLB:          0          0          0          0          0          0  TLB shootdowns
 TRM:          0          0          0          0          0          0  Thermal event interrupts
 MCE:          0          0          0          0          0          0  Machine check exceptions
 MCP:          5          5          5          5          5          5  Machine check polls
 ERR:          0
 MIS:          0

/proc/interrupts in the video-grabbing domU (2.6.36 pci-front0.7):

            CPU0
 44:          0  xen-pirq-pcifront  ohci_hcd:usb2
 45:          0  xen-pirq-pcifront  ohci_hcd:usb3
 46:          0  xen-pirq-pcifront  ehci_hcd:usb1
 86:          0  xen-pirq-pcifront-msi-x  xhci_hcd
 87:     147452  xen-pirq-pcifront-msi-x  xhci_hcd
244:        461   xen-dyn-event     eth0
245:        151   xen-dyn-event     blkif
246:       2720   xen-dyn-event     blkif
247:         30   xen-dyn-event     blkif
248:        309   xen-dyn-event     hvc_console
249:          2   xen-dyn-event     pcifront
250:        603   xen-dyn-event     xenbus
251:          0  xen-percpu-ipi       callfuncsingle0
252:          0  xen-percpu-virq      debug0
253:          0  xen-percpu-ipi       callfunc0
254:          0  xen-percpu-ipi       resched0
255:     193070  xen-percpu-virq      timer0
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
PND:          0   Performance pending work
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
MCE:          0   Machine check exceptions
MCP:          0   Machine check polls
ERR:          0
MIS:          0









-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it



Sander Eikelenboom

2010-Oct-13 13:36 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

By playing around a bit with printk's and debug settings, I found that a
WARN_ON in the hypervisor is triggered when starting the video-grabbing domU:

mapping kernel into physical memory
about to get started...
(XEN) [2010-10-13 13:30:44] Xen WARN at msi.c:636
(XEN) [2010-10-13 13:30:44] ----[ Xen-4.1-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) [2010-10-13 13:30:44] CPU:    2
(XEN) [2010-10-13 13:30:44] RIP:    e008:[<ffff82c48015d797>] pci_enable_msi+0x48a/0x9d5
(XEN) [2010-10-13 13:30:44] RFLAGS: 0000000000010216   CONTEXT: hypervisor
(XEN) [2010-10-13 13:30:44] rax: 0000000000000004   rbx: 00000000fe5fe000   rcx: 0000000000000001
(XEN) [2010-10-13 13:30:44] rdx: 0000000000000004   rsi: 0000000000000282   rdi: ffff82c48024e940
(XEN) [2010-10-13 13:30:44] rbp: ffff830237e57dc8   rsp: ffff830237e57cf8   r8:  0000000000000009
(XEN) [2010-10-13 13:30:44] r9:  000000000000003a   r10: 0000000000000092   r11: 0000000000000213
(XEN) [2010-10-13 13:30:44] r12: 0000000000000000   r13: ffff830237e57ea8   r14: ffff83020211ed10
(XEN) [2010-10-13 13:30:44] r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) [2010-10-13 13:30:44] cr3: 0000000225f0e000   cr2: ffff880004e93d68
(XEN) [2010-10-13 13:30:44] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [2010-10-13 13:30:44] Xen stack trace from rsp=ffff830237e57cf8:
(XEN) [2010-10-13 13:30:44]    ffff830237e57d38 ffff82c480126b66 ffff830237e57e18 0700000000000010
(XEN) [2010-10-13 13:30:44]    0000000000001000 0000000000000030 00000000fe5ff000 00000000fe5ff000
(XEN) [2010-10-13 13:30:44]    0000009000077d68 ffff83014601ad10 0000000700000246 0000000000000000
(XEN) [2010-10-13 13:30:44]    0000000700000092 0000000000000000 ffff83020211eda8 00000000000fe5ff
(XEN) [2010-10-13 13:30:44]    00000000000fe5ff ffff8301622fde28 0000000000000202 ffff830237e57da8
(XEN) [2010-10-13 13:30:44]    ffff82c480120680 ffff830237e57ea8 00000000ffffffed ffff830146a24000
(XEN) [2010-10-13 13:30:44]    0000000000000057 0000000000000048 ffff830237e57e48 ffff82c48015f16e
(XEN) [2010-10-13 13:30:44]    0000000025dfc910 000000000000015c 0000000000000048 0000000000000120
(XEN) [2010-10-13 13:30:44]    ffff830237e82480 0000000000000282 ffff83020211ed10 ffff830237e57e28
(XEN) [2010-10-13 13:30:44]    ffff82c480120680 ffff88002df4bb30 0000000000000057 ffff830146a24000
(XEN) [2010-10-13 13:30:44]    0000000000000048 ffff830146a24190 ffff830237e57ef8 ffff82c480172806
(XEN) [2010-10-13 13:30:44]    0000000180196b1a ffff830237e5a020 ffff830200000004 ffff830237e57ea8
(XEN) [2010-10-13 13:30:44]    000000000000000b ffffffffffffffff 0000000000000007 0000000000000000
(XEN) [2010-10-13 13:30:44]    00000000fe5fe000 aaaaaaaaaaaaaaaa 0000000000000007 0000000000000048
(XEN) [2010-10-13 13:30:44]    00000000fe5fe000 0000000000000000 0000000000000246 ffff8300c7e88000
(XEN) [2010-10-13 13:30:44]    000000000000000b ffff8800278c4400 0000000000000011 ffff88002ffea700
(XEN) [2010-10-13 13:30:44]    00007cfdc81a80c7 ffff82c480202a82 ffffffff8100942a 0000000000000021
(XEN) [2010-10-13 13:30:44]    ffff88002ffea700 0000000000000011 ffff8800278c4400 000000000000000b
(XEN) [2010-10-13 13:30:44]    ffff88002df4bbd0 00000000000006a1 0000000000000213 ffff88002fc20200
(XEN) [2010-10-13 13:30:44]    ffffffff810df6ea 0000000000000011 0000000000000021 ffffffff8100942a
(XEN) [2010-10-13 13:30:44] Xen call trace:
(XEN) [2010-10-13 13:30:44]    [<ffff82c48015d797>] pci_enable_msi+0x48a/0x9d5
(XEN) [2010-10-13 13:30:44]    [<ffff82c48015f16e>] map_domain_pirq+0x275/0x363
(XEN) [2010-10-13 13:30:44]    [<ffff82c480172806>] do_physdev_op+0x826/0x10b0
(XEN) [2010-10-13 13:30:44]    [<ffff82c480202a82>] syscall_enter+0xf2/0x14c
(XEN) [2010-10-13 13:30:44]
(XEN) [2010-10-13 13:30:44] SEIK bus: 7 slot: 0 func:0 msi->table_base: fe5fe000 read_pci_mem_bar: 4
(XEN) [2010-10-13 13:30:44] SEIK pba_paddr: 4

It's this one:
WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));

I have added some printk's, and read_pci_mem_bar seems to return a
bogus value. The pba_paddr is used later in the function, but I can't
judge whether and when this could have implications.
This also occurs when I remove the pci=resource_alignment option from the kernel line.

lspci on dom0 shows:

07:00.0 USB Controller: NEC Corporation Device 0194 (rev 03) (prog-if 30)
        Subsystem: ASUSTeK Computer Inc. Device 8413
        Flags: bus master, fast devsel, latency 0, IRQ 33
        Memory at fe5fe000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
        Capabilities: [70] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
        Capabilities: [90] MSI-X: Enable+ Mask- TabSize=8
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] #18
        Kernel driver in use: pciback


In the same function, this branch also seems to be taken:
            if ( d )
            {
                /* XXX How to deal with existing mappings? */
            }

That seems a bit odd for a freshly booted system with no domU restarts?




grub menu.lst:

title           xen-4.1-unstable.gz / Debian GNU/Linux, 2.6.32.23-xen-next-2.6.32.x-generaldebug-20101002
root            (hd0,0)
kernel          /xen-4.1-unstable.gz dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console_timestamps console=vga,com1 iommu=off debug lapic=debug apic_verbosity=debug apic=debug noirqbalance
module          /vmlinuz-2.6.32.24-xen-next-2.6.32.x-tracing-20101013 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 libata.noacpi=1 debug loglevel=10 noirqbalance irqbalance=off iommu=soft xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;
module          /initrd.img-2.6.32.24-xen-next-2.6.32.x-tracing-20101013




--

Sander



Tuesday, October 12, 2010, 6:44:33 PM, you wrote:
> On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
>> Hi Keir,
>> 
>> Does xen and/or the xen console depend on physical cpu 0 ?
> Usually the console for Dom0, and I think all other domains go
> through CPU0. Let me CC Ian here, who has been mucking in this
> area and found some bugs (and produced fixes).
> Ian, that bug you found with not clearing the eventchannel - that
> wouldn't have an impact here, right?
>> 
>> I'm still trying to solve the mystery of my machine freezing when doing:
>> 
>>  - videograbbing in a domU with a usb3 pci-express controller passed through (seems to cause quite a few interrupts)
>>  - compiling a linux kernel with "make -j 6"
>> 
>> It's a 6 core AMD phenom x6.
>> 
>> Without cpu pinning:
>> I can freeze the machine easily within a minute after starting the compile, at first xen serial console also slows down under the load (slow updates).
>> When the machine freezes I can't do anything with xen serial console.
>> 
>> With cpu pinning:
>> By not using the pcpu 0 at all for any domain, and pinning the domain with the videograbber to its own pcpu (pcpu 5), it seems the machine keeps running after 20 "make -j6" iterations of kernel compilation.
>> Xen serial console stays responsive and doesn't slow down during the kernel compilation. The videograbber shows no problem grabbing video.
>> 
> AHA! So finally closer to the mystery.
> Can you provide the /proc/interrupts of the Dom0?
> I wonder if this is related to the issue I had some time ago, and never got
> to look at. The problem was that during heavy compilation (this is a 2-socket
> Nehalem box, just running Dom0 - no guests), the keyboard and USB driver would
> stop getting interrupts. So the drivers would start polling, which is quite slow,
> albeit serviceable, and then at some point it would pick up again.
> The weirdness was that the /proc/interrupts showed absolutely _no_ interrupts on CPU0
> during that time - as if Xen just forgot to update them. Jeremy suggested I try to
> disable Xen IRQ balance (noirqbalance on the Xen command line) in case that is it, and to my
> embarrassment I haven't tried that yet.
> Did you try that? I think somebody suggested that but I can't recall whether it
> was for this issue?
>> 
>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>> Domain-0                             0     0     3   r--    2169.7 1-4
>> Domain-0                             0     1     1   -b-    2339.3 1-4
>> Domain-0                             0     2     2   -b-    2358.9 1-4
>> Domain-0                             0     3     3   -b-    2298.2 1-4
>> Domain-0                             0     4     1   -b-    2221.9 1-4
>> Domain-0                             0     5     4   -b-    2287.7 1-4
>> backup                               9     0     4   -b-      10.6 1-4
>> database                             1     0     4   -b-      45.3 1-4
>> davical                              5     0     3   -b-       8.7 1-4
>> git                                  8     0     2   -b-       7.9 1-4
>> mail                                 2     0     4   -b-       8.0 1-4
>> samba                                3     0     3   -b-      11.1 1-4
>> security                             7     0     5   r--    1433.2 5
>> www                                  4     0     1   -b-      10.2 1-4
>> zabbix                               6     0     3   -b-      21.2 1-4
>> 
>> 
>> Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU especially related to passthrough/interrupts in the context of pcpu 0 ?
> I don't know, but I do know that the IRQ handling in Xen 4.0 changed significantly compared
> to 3.4. I don't remember if you ever ran this setup under 3.4?
>> 
>> --
>> Sander


-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Sander Eikelenboom

2010-Oct-13 14:26 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

This code was changed in changeset 22182:68cc3c514a0a, "x86: protect MSI-X table and pending bit array from guest writes".

Besides the bogus address being returned, there is this piece of code:

        if ( !dev->domain || !paging_mode_translate(dev->domain) )
        {
            struct domain *d = dev->domain;

            if ( !d )
                for_each_domain(d)
                    if ( !paging_mode_translate(d) &&
                         (iomem_access_permitted(d, dev->msix_table.first,
                                                 dev->msix_table.last) ||
                          iomem_access_permitted(d, dev->msix_pba.first,
                                                 dev->msix_pba.last)) )
                        break;
            if ( d )
            {
                /* XXX How to deal with existing mappings? */
                printk("SEIK: err what am i doing here ?? d=%d\n", d->domain_id);

            }
        }

On a freshly booted machine, d seems to be 0. Doesn't that mean the ( !d ) code
path will never be followed, since all devices belong to dom0 at first?

--
Sander








-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it



Jan Beulich

2010-Oct-13 14:26 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

>>> On 13.10.10 at 15:36, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> it's this one:
> WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
Yeah, read_pci_mem_bar() uses an inverted mask in two places.
Would you remove the ~ from the two uses of PCI_BASE_ADDRESS_MEM_MASK
in that function and try again?

(Yunhong, you had tested the patch that introduced this, and this
warning would basically trigger unconditionally as it stands. Didn't
you notice that in your logs?)

The main thing however, if I correctly remember the context of this
thread, is that this code was only recently introduced and doesn''t
exist in the 4.0 tree, so your original problem is unlikely caused by it.
> I have added some printk's .. and read_pci_mem_bar seems to return a bogus
> value .. the pba_addr is used later in the function, but i can't oversee if
> and when this could have implications.
> This also occurs when disabling the pci_resource_align on the kernel line.
> In the same function it seems to trigger
>            if ( d )
>             {
>                 /* XXX How to deal with existing mappings? */
>             }
> 
> Which seems to be a bit odd for a freshly booted system with no domU 
> restarts ?
No, the comment refers to potentially existing mappings (which
would need to be actively searched for). It doesn't mean there have
to be any.

Jan



Jan Beulich

2010-Oct-13 14:41 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

>>> On 13.10.10 at 16:26, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Besides ... returning a bogus address in this piece of code:
> 
>        if ( !dev->domain || !paging_mode_translate(dev->domain) )
>         {
>             struct domain *d = dev->domain;
> 
>             if ( !d )
>                 for_each_domain(d)
>                     if ( !paging_mode_translate(d) &&
>                          (iomem_access_permitted(d, dev->msix_table.first,
>                                                  dev->msix_table.last) ||
>                           iomem_access_permitted(d, dev->msix_pba.first,
>                                                  dev->msix_pba.last)) )
>                         break;
>             if ( d )
>             {
>                 /* XXX How to deal with existing mappings? */
>                 printk("SEIK: err what am i doing here ?? d=%d\n", d->domain_id);
> 
>             }
>         }
> 
> On a freshly booted machine, d seems to be 0 ... that would mean the ( !d )
> code path will never be followed since all devices will belong to dom0 at 
> first ?
Not sure what you're trying to say. This code path will not only get
executed on a freshly booted system, but whenever MSI-X gets
first enabled for a device after it having been disabled (perhaps
because of it getting assigned to a guest).

And for the moment Dom0 is still considered an exception (i.e.
may map this space writable), so on initial boot it doesn't matter
whether the device is considered un-owned or owned by Dom0.

Jan



Jiang, Yunhong

2010-Oct-13 15:00 UTC


RE: [Xen-devel] Re: xen dependant on pcpu 0 ?

>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@novell.com]
>Sent: Wednesday, October 13, 2010 10:27 PM
>To: Sander Eikelenboom
>Cc: Ian; Keir Fraser; Jeremy Fitzhardinge; Jiang, Yunhong;
>xen-devel@lists.xensource.com; Konrad Rzeszutek Wilk
>Subject: [Xen-devel] Re: xen dependant on pcpu 0 ?
>
>>>> On 13.10.10 at 15:36, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> it's this one:
>> WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
>
>Yeah, read_pci_mem_bar() uses an inverted mask in two places.
>Would you remove the ~ from the two uses of PCI_BASE_ADDRESS_MEM_MASK
>in that function and try again?
>
>(Yunhong, you had tested the patch that introduced this, and this
>warning would basically trigger unconditionally as it stands. Didn't
>you notice that in your logs?)
A bit amazing to me, but I do remember that I didn't notice such a log.
And it seems that with this bug the patch itself should not work at all, since the
PBA address is not correct; yet I do remember that with the attached test module and
your patch, write_vector() will cause a fault.

--jyh





Jan Beulich

2010-Oct-13 15:31 UTC


RE: [Xen-devel] Re: xen dependant on pcpu 0 ?

>>> On 13.10.10 at 17:00, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>From: Jan Beulich [mailto:JBeulich@novell.com] 
>>(Yunhong, you had tested the patch that introduced this, and this
>>warning would basically trigger unconditionally as it stands. Didn't
>>you notice that in your logs?)
> 
> A bit amazing to me, but I do remember I didn't notice such log.
> And seems with this bug, the patch itself should not work at all, since the
> PBA_addr is not correct, but I do remember with attached test module, and 
> your patch, the write_vector() will cause fault.
Perhaps the MSI-X table and PBA share a page on the device you
tested with? In that case, the code would still have worked as is,
afaict.

Jan



Sander Eikelenboom

2010-Oct-13 15:41 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

Wednesday, October 13, 2010, 5:26:27 PM, you wrote:
>>>> On 13.10.10 at 17:03, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> Err, yes, I'm neither a kernel nor a xen hacker, so I'm just trying not to speak
>> complete gibberish :-)
>> Well, since the device, when seized by pciback on boot, seems to be assigned
>> to dom0 and therefore d=0, the
>>                for_each_domain(d)
>>                    if ( !paging_mode_translate(d) &&
>>                         (iomem_access_permitted(d, dev->msix_table.first,
>>                                                 dev->msix_table.last) ||
>>                          iomem_access_permitted(d, dev->msix_pba.first,
>>                                                 dev->msix_pba.last)) )
>>                        break;
>> 
>> part seems never to be run, because a device always seems to be assigned to
>> a domain.
> That code fragment sits inside a if (!d), i.e. if we can easily tell
> (by just looking at dev->domain) which domain owns the device.
>> So if it seems never to be run ... why is it there?
> You're probably more after the subsequent if (d) with the comment
> somewhat confusing you in its body - again, the function gets
> executed (or is supposed to) when a domain enables MSI-X on the
> device. At that point, dev->domain should be non-NULL (and
> different from dom0), so the body (if there really was one) would
> get executed.
Thanks for your patience, just one more time...
I saw a mistake in my explanation: I didn't mean d=0, but that in my case
(fresh boot, first time the domain with passthrough is started) d is not NULL and
d->domain_id = 0.
So it seems Xen thinks the device is still assigned to dom0 when MSI-X gets
enabled?
But this all does get triggered when the domU to which the device is
passed through is started, and yes, it enables MSI-X (as I can see in lspci or
/proc/interrupts in the domU),
but d->domain_id results in "0" and not in the domain id of the domU.
So if the code in ( !d ) should have been run in this case, it wasn't
(I put a printk there to be sure).

You were right that it didn't fix my freeze problem, although the RCU-detected
CPU stall is now followed by the beginning of a trace; it doesn't provide much
more info, though.
I attached a photo of it.

--

Sander
> Jan



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Jan Beulich

2010-Oct-13 16:08 UTC


[Xen-devel] Re: xen dependant on pcpu 0 ?

>>> On 13.10.10 at 17:41, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> I saw a mistake in my explanation, i didn't mean d=0, but in my case (fresh
> boot, first time domain with passthrough is started) d is not NULL and 
> d->domain_id = 0
> So it seems it thinks it's still assigned to dom0 when the MSI-X gets enabled ?
That would be bad indeed, but would indicate a problem elsewhere.
> But this all does get triggered when the domU is started to which the domain
> is passed through, and yes it enables MSI-X (when i look at lspci or 
> /proc/interrupts in the domU)
> but d->domain_id results in "0" and not in the domain id of domU.
> So if in this case the code in ( !d ) should have been run, it didn't (have
> put a printk there to be sure)
No, generally the !d case shouldn''t get executed, the following
d case, however, would expect the correct domain to be used (if
it had a body implemented).
> You were right that it didn't fix my freeze problem, although the RCU
> detected CPU stall was now followed by the beginning of a trace although it
> doesn't provide much more info.
> I attached a photo of it.
Looks like an access to an unmapped (IO-?)APIC page - that code
path likely hasn't been tested so far in the pv-ops context.

Jan


