Hello,

I'm using Linux 3.8.8 and Xen 4.2.1, and I am trying to use CUDA under Xen. I successfully installed the driver and the CUDA dev package on my non-Xen Linux 3.8.8, but when I boot with Xen and try to use CUDA, I get this error message:

all CUDA-capable devices are busy or unavailable

While searching on Google for a solution, I found threads dating from 2011 that mention this problem, but no solution had been found at the time. Can you please tell me whether there is a solution, or whether the problem is still unsolved?

Thank you,
Best regards,
Sebastien Fremal

UMONS
PhD Student S. Frémal
University of Mons
IT Department
Rue de Houdain, n°9
7000 Mons
+32(0)65/37.40.51
www.umons.ac.be
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On Tue, Apr 30, 2013 at 12:39 PM, Sébastien Frémal <sebastien.fremal@gmail.com> wrote:
> Hello,
>
> I'm using linux 3.8.8 and Xen 4.2.1 and I try to use CUDA in Xen. I
> successfully installed the driver and the cuda dev package on my non-xen
> linux 3.8.8 but when I boot with Xen and that I try to use CUDA, I get this
> error message :
> all CUDA-capable devices are busy or unavailable
>
> While searching on google for a solution, I found threads dating from 2011
> and mentioning this problem but no solutions were found at the time. Can you
> please tell me if there is a solution or if the problem is still unsolved ?

You mean it doesn't work in dom0? This sounds like a potential issue with pvops Linux on Xen.

-George
George Dunlap wrote on Tue 30 Apr 2013 12:57:52 +0100:
> On Tue, Apr 30, 2013 at 12:39 PM, Sébastien Frémal
> <sebastien.fremal@gmail.com> wrote:
> > I'm using linux 3.8.8 and Xen 4.2.1 and I try to use CUDA in Xen. I
> > successfully installed the driver and the cuda dev package on my non-xen
> > linux 3.8.8 but when I boot with Xen and that I try to use CUDA, I get this
> > error message :
> > all CUDA-capable devices are busy or unavailable
> >
> > While searching on google for a solution, I found threads dating from 2011
> > and mentioning this problem but no solutions were found at the time. Can you
> > please tell me if there is a solution or if the problem is still unsolved ?
>
> You mean it doesn't work in dom0?

Yes, while it does work with the same kernel without the hypervisor. I advised him to post the output of lspci -vvv, which might give a clue. I also advised him to try the almost-4.3 version.

Samuel
I ran lspci in dom0 with Xen 4.2.1 and Xen 4.3 (there is no difference between their outputs), and in a Linux kernel without Xen. The outputs are attached. I ran a diff between the output obtained in the Linux kernel without Xen and the output obtained in dom0, and here is the result:

~$ diff lspci.txt lspci_xen_4.3.txt
114c114
< Interrupt: pin A routed to IRQ 93
---
> Interrupt: pin A routed to IRQ 118
213c213
< Interrupt: pin C routed to IRQ 92
---
> Interrupt: pin C routed to IRQ 117
249c249
< Latency: 0, Cache Line Size: 64 bytes
---
> Latency: 0
255c255
< Expansion ROM at f7e00000 [disabled] [size=512K]
---
> [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]
264c264
< Interrupt: pin A routed to IRQ 94
---
> Interrupt: pin A routed to IRQ 119

I will investigate the pvops trail.

Best regards,
Sebastien Fremal
Sébastien Frémal wrote on Tue 30 Apr 2013 16:13:58 +0200:
> < Expansion ROM at f7e00000 [disabled] [size=512K]
> ---
> > [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]

And this is the nvidia 10de card. Probably worth checking. From lspci source: /* Reported by the OS, but not by the device */

> Capabilities: <access denied>

Please run it as root.

Samuel
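To see at a glance which devices carry the [virtual] tag discussed above, a saved lspci -vvv dump can be scanned with a few lines of Python. This is only a minimal sketch: it assumes the usual lspci layout (an unindented header line per device starting with its slot address, followed by indented detail lines), and the function name virtual_resources is hypothetical.

```python
def virtual_resources(lspci_text: str) -> dict[str, list[str]]:
    """Group lines tagged [virtual] under the device they belong to.

    Assumption: each device starts with an unindented header line whose
    first token is the slot address, and its details are indented below.
    """
    found: dict[str, list[str]] = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # New device header: remember its slot address.
            device = line.split()[0]
        elif device and "[virtual]" in line:
            found.setdefault(device, []).append(line.strip())
    return found
```

Running it on both the native and the dom0 dump and comparing the two results shows whether the tag appears only under Xen.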
I ran lspci as root; the results are attached, and the new diff is:

~$ diff lspci.txt lspci_xen_4.3.txt
43c43
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
95c95
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
125c125
< LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
---
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
154c154
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
213c213
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
351c351
< Interrupt: pin A routed to IRQ 93
---
> Interrupt: pin A routed to IRQ 118
357c357
< Address: 00000000fee003b8 Data: 0000
---
> Address: 00000000fee00338 Data: 0000
416c416
< Address: fee00338 Data: 0000
---
> Address: fee002b8 Data: 0000
463c463
< Address: fee00358 Data: 0000
---
> Address: fee002d8 Data: 0000
565c565
< Interrupt: pin C routed to IRQ 92
---
> Interrupt: pin C routed to IRQ 117
573c573
< Address: fee00398 Data: 0000
---
> Address: fee00318 Data: 0000
648c648
< Latency: 0, Cache Line Size: 64 bytes
---
> Latency: 0
654c654
< Expansion ROM at f7e00000 [disabled] [size=512K]
---
> [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]
660,661c660,661
< Capabilities: [78] Express (v1) Endpoint, MSI 00
< DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <4us
---
> Capabilities: [78] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
664c664
< RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
---
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
666,667c666,667
< DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
< LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <1us
---
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <1us
671c671,678
< LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
---
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis+
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
690c697
< Interrupt: pin A routed to IRQ 94
---
> Interrupt: pin A routed to IRQ 119
710c717
< Address: 00000000fee003f8 Data: 0000
---
> Address: 00000000fee00378 Data: 0000
805c812
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
864c871
< ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
---
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-

SF

2013/4/30 Samuel Thibault <samuel.thibault@ens-lyon.org>
> Sébastien Frémal wrote on Tue 30 Apr 2013 16:13:58 +0200:
> > < Expansion ROM at f7e00000 [disabled] [size=512K]
> > ---
> > > [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]
>
> And this is the nvidia 10de card. Probably worth checking. From lspci
> source: /* Reported by the OS, but not by the device */
>
> > Capabilities: <access denied>
>
> Please run it as root.
>
> Samuel
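Many of the differences in this diff are IRQ numbers and MSI addresses. The sketch below masks those fields before diffing, so that only the remaining differences (ACSCtl, link speed, the [virtual] ROM, and so on) are shown. It assumes that IRQ numbers and MSI address/data values are expected to differ between a native boot and a dom0 boot; that is an assumption made here, not something the thread establishes. The names normalize and meaningful_diff are hypothetical.

```python
import difflib
import re

# Assumption: these fields vary between boots and are treated as noise.
NOISE = re.compile(r"routed to IRQ \d+|Address: [0-9a-f]+ +Data: [0-9a-f]+")

def normalize(line: str) -> str:
    """Replace IRQ numbers and MSI address/data values with a placeholder."""
    return NOISE.sub("<varies>", line.rstrip())

def meaningful_diff(native: str, dom0: str) -> list[str]:
    """Return the added/removed lines that remain after masking noisy fields."""
    a = [normalize(l) for l in native.splitlines()]
    b = [normalize(l) for l in dom0.splitlines()]
    diff = list(difflib.unified_diff(a, b, "native", "dom0", lineterm="", n=0))
    # Skip the two file-header lines, then keep only '+' and '-' lines.
    return [d for d in diff[2:] if d.startswith(("+", "-"))]
```

It would be called with the contents of the two attached files, for example meaningful_diff(open("lspci.txt").read(), open("lspci_xen_4.3.txt").read()).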
This is the part we are interested in:

Sébastien Frémal wrote on Tue 30 Apr 2013 16:32:45 +0200:
> 648c648
> < Latency: 0, Cache Line Size: 64 bytes
> ---
> > Latency: 0
> 654c654
> < Expansion ROM at f7e00000 [disabled] [size=512K]
> ---
> > [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]
> 660,661c660,661
> < Capabilities: [78] Express (v1) Endpoint, MSI 00
> < DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <4us
> ---
> > Capabilities: [78] Express (v2) Endpoint, MSI 00

So it seems the whole PCI Express negotiation changes...

> > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
> 664c664
> < RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> ---
> > RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 666,667c666,667
> < DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> < LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <1us
> ---
> > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> > LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <1us

Resulting in a different speed.

> 671c671,678
> < LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> ---
> > LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> > DevCap2: Completion Timeout: Not Supported, TimeoutDis+
> > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

And additional capabilities.

Samuel
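The speed change pointed out above can be extracted mechanically from the two lspci dumps. The following is a rough sketch, assuming the LnkCap and LnkSta lines have the "Speed <n>GT/s, Width x<m>" form quoted in this thread and that the text passed in covers a single device; link_info and link_changes are hypothetical helper names.

```python
import re

# Matches LnkCap/LnkSta lines only; LnkSta2 and LnkCtl2 are deliberately skipped.
LINK = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s, Width x(\d+)")

def link_info(device_text: str) -> dict[str, tuple[float, int]]:
    """Map 'LnkCap'/'LnkSta' to (speed in GT/s, lane width) for one device."""
    info = {}
    for line in device_text.splitlines():
        m = LINK.search(line)
        if m:
            info[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return info

def link_changes(native: str, dom0: str) -> dict[str, tuple]:
    """Return the link fields whose speed or width differ between two dumps."""
    a, b = link_info(native), link_info(dom0)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
```

Each entry in the result pairs the native value with the dom0 value for a field that changed, which makes the 2.5GT/s versus 5GT/s difference easy to spot per device.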
Samuel Thibault wrote on Tue 30 Apr 2013 16:19:52 +0200:
> Sébastien Frémal wrote on Tue 30 Apr 2013 16:13:58 +0200:
> > < Expansion ROM at f7e00000 [disabled] [size=512K]
> > ---
> > > [virtual] Expansion ROM at f7e00000 [disabled] [size=512K]
>
> And this is the nvidia 10de card. Probably worth checking. From lspci
> source: /* Reported by the OS, but not by the device */

So does anybody have any idea why this could get disabled by the device? Some mis-reconfiguration by the hypervisor?

Samuel