Tamas Lengyel
2013-Nov-19 09:32 UTC
Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
Hi everyone,

after following the steps in this discussion (http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html), I was able to turn my GTX 480 into a Quadro 6000. When I pass it through with VT-d to a Debian jessie VM, it shows up fine and CUDA 5.5 seems to work properly, up to a point:

lspci -v:

00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 075f
	Physical Slot: 4
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at ee000000 (32-bit, non-prefetchable) [size=32M]
	Memory at e0000000 (64-bit, prefetchable) [size=128M]
	Memory at e8000000 (64-bit, prefetchable) [size=64M]
	I/O ports at c100 [size=128]
	Expansion ROM at f1000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: nvidia

00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device 075f
	Physical Slot: 5
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Memory at f1080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro 6000"
  CUDA Driver Version / Runtime Version          6.0 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1536 MBytes (1610285056 bytes)
  (15) Multiprocessors, ( 32) CUDA Cores/MP:     480 CUDA Cores
  GPU Clock rate:                                1401 MHz (1.40 GHz)
  Memory Clock rate:                             1848 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro 6000
Result = PASS

Unfortunately, if I then try to run any CUDA app, or even nvidia-smi, I get the following errors:

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal
Result = FAIL

# nvidia-smi
Unable to determine the device handle for GPU 0000:00:04.0: The NVIDIA kernel module detected an issue with GPU interrupts.Consult the "Common Problems" Chapter of the NVIDIA Driver README for details and steps that can be taken to resolve this issue.

If I restart the VM, I can run a single CUDA app again, once. It's still pretty impressive to be able to do that without having to patch Xen or reboot the entire machine =) It doesn't seem to matter which CUDA app I run; here is matrixMul, for example:

matrixMul# ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Quadro 6000" with compute capability 2.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak to avoid this issue? The setup clearly works for the most part.

My domU config:

arch = 'x86_64'
name = "debian-miner"
builder = "hvm"
maxmem = 512
memory = 512
vcpus = 1
maxcpus = 1
boot = "cd"
pae=1
acpi = 1
apic = 1
hap=1
hpet=1
shadow_memory = 32
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vnc=1
vncunused=1
vnclisten="0.0.0.0"
vif = [ 'type=netfront,bridge=xenbr0,mac=00:16:3e:12:c3:fa']
device_model_version="qemu-xen-traditional"
gfx_passthru=0
xen_platform_pci=1
pci = [ '01:00.0', '01:00.1' ]
pci_msitranslate = 1
pci_power_mgmt = 1
pci_permissive = 1
xen_extended_power_mgmt = 1
acpi_s3 = 1
acpi_s4 = 1
disk = [ 'phy:/dev/t0vg/debian-testing,xvda,w'];

And I'm running Xen 4.3.1 with NVIDIA driver 331.20 x86_64 in the domU.

Thanks and cheers!
_______________________________________________ Xen-users mailing list Xen-users@lists.xen.org http://lists.xen.org/xen-users
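Since the failure only appears from the second CUDA run onward, a small harness that runs the same sample several times in a row makes the symptom easy to reproduce and to re-check after each tweak. A minimal Python sketch, assuming the sample prints `Result = PASS` on success, as deviceQuery and matrixMul do in the output above; `run_once` and `run_repeatedly` are hypothetical helper names, not part of the CUDA samples.

```python
import subprocess


def run_once(cmd):
    """Run one CUDA sample and report whether its output contains 'Result = PASS'."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same text that gets checked
        universal_newlines=True,
    )
    return "Result = PASS" in proc.stdout


def run_repeatedly(cmd, times):
    """Run the same command `times` times in a row; one boolean per run, in order."""
    return [run_once(cmd) for _ in range(times)]
```

Run from the deviceQuery sample directory, `run_repeatedly(["./deviceQuery"], 3)` would, on the setup described above, be expected to report a pass for the first run and failures afterwards; a fix should turn every entry into a pass.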
Gordan Bobic
2013-Nov-19 10:32 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
Does the nvidia binary driver provide a reset handle for the device via sysfs? If you echo 1 into it, does it help or does it crash things?
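The suggestion above amounts to writing `1` to the device's `reset` attribute in sysfs. A minimal Python sketch, assuming the standard `/sys/bus/pci/devices/<domain:bus:device.function>/reset` layout; `pci_reset` is a hypothetical helper, and its `sysfs_root` parameter exists only so the logic can be exercised against a scratch directory. As far as I know, on Linux this attribute is created by the PCI core when the kernel finds a usable reset method for the device, and writing to it requires root.

```python
import os


def pci_reset(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Write "1" to <sysfs_root>/<bdf>/reset, the equivalent of `echo 1 > .../reset`.

    Returns False without touching anything if no reset node exists for the
    device, and True after the write succeeds.
    """
    node = os.path.join(sysfs_root, bdf, "reset")
    if not os.path.exists(node):
        return False
    with open(node, "w") as f:
        f.write("1")
    return True
```

Inside the domU, `pci_reset("0000:00:04.0")` would target the passed-through GPU at the address lspci reports there.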
Tamas Lengyel
2013-Nov-19 13:22 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
I don't see a reset node, unfortunately:

ls /sys/module/nvidia/drivers/pci\:nvidia/0000\:00\:04.0
boot_vga                  d3cold_allowed  enable         i2c-3          msi_bus    rescan        resource3     subsystem_device
broken_parity_status      device          firmware_node  irq            msi_irqs   resource      resource3_wc  subsystem_vendor
class                     dma_mask_bits   i2c-0          local_cpulist  numa_node  resource0     resource5     uevent
config                    driver          i2c-1          local_cpus     power      resource1     rom           vendor
consistent_dma_mask_bits  drm             i2c-2          modalias       remove     resource1_wc  subsystem
Gordan Bobic
2013-Nov-19 13:47 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
I can't remember how it's all symlinked, but I normally find it somewhere like:

/sys/devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:02.0/0000:0d:00.0/reset

(The path reflects the PCI bridges along the way. Yes, I have a card behind 3 PCIe bridges on my motherboard (5520->NF200->NF200->GPU), and that's not even the GTX690, which would add at least one more bridge to the path. Madness.)

If the nvidia driver isn't exposing it, you could try unloading the nvidia driver, loading the nouveau driver (make sure mode switching is disabled so the console doesn't bind it into a state where it can't be unloaded), issuing a reset (if that exposes a reset node, which IIRC it does on Fermi+ GPUs), unloading nouveau, and reloading nvidia.ko. Then see if it works after that.

Gordan
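The sysfs path above encodes the bridge chain: each `domain:bus:device.function` component is one hop, and the last one is the GPU itself. A minimal Python sketch for splitting such a path; `pci_chain` and `canonical_path` are hypothetical helper names, and resolving the `/sys/bus/pci/devices/<addr>` symlink with `os.path.realpath` is assumed here as one way to obtain the full path on a live system.

```python
import os
import re

# One PCI address component, e.g. "0000:0d:00.0" (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def canonical_path(bdf):
    """Follow the /sys/bus/pci/devices/<bdf> symlink to the full /sys/devices/... path."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


def pci_chain(sysfs_path):
    """Return (upstream_hops, device) parsed from a sysfs device path.

    Components that are not PCI addresses, such as "sys", "devices",
    "pci0000:00" (the host bridge root) or a trailing attribute like "reset",
    are ignored.
    """
    addrs = [p for p in sysfs_path.split("/") if PCI_ADDR.match(p)]
    if not addrs:
        return [], None
    return addrs[:-1], addrs[-1]
```

Applied to the example path, this yields three upstream hops (`0000:00:03.0`, `0000:0b:00.0`, `0000:0c:02.0`) before the GPU at `0000:0d:00.0`, consistent with the three bridges described.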
Gordan Bobic
2013-Nov-19 13:48 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
Actually, try something simpler first: just unload and reload the nvidia.ko driver and see if that resets the card back into a CUDA-capable state.
Tamas Lengyel
2013-Nov-19 14:25 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
All right, I did load the nouveau module and it doesn't expose reset either. Loading nvidia again had no effect either; still the same problem.

root@debian-testing:~# lsmod | grep nouveau
nouveau              731557  0
mxm_wmi               12515  1 nouveau
wmi                   13243  2 mxm_wmi,nouveau
video                 17792  1 nouveau
ttm                   58566  1 nouveau
drm_kms_helper        31837  1 nouveau
i2c_algo_bit          12841  1 nouveau
drm                  211856  4 ttm,drm_kms_helper,nvidia,nouveau
button                12944  1 nouveau
i2c_core              24353  6 drm,i2c_piix4,drm_kms_helper,i2c_algo_bit,nvidia,nouveau
root@debian-testing:~# ls /sys/devices/pci0000\:00/0000\:00\:04.0/
boot_vga                  d3cold_allowed  enable         modalias   rescan        resource3     subsystem_device
broken_parity_status      device          firmware_node  msi_bus    resource      resource3_wc  subsystem_vendor
class                     dma_mask_bits   irq            numa_node  resource0     resource5     uevent
config                    driver          local_cpulist  power      resource1     rom           vendor
consistent_dma_mask_bits  drm             local_cpus     remove     resource1_wc  subsystem

On Tue, Nov 19, 2013 at 2:48 PM, Gordan Bobic <gordan@bobich.net> wrote:
> Actually - try something simpler first - just unload and reload the
> nvidia.ko driver, see if that resets the card back into a CUDA-capable
> state.
>
> On Tue, 19 Nov 2013 13:47:18 +0000, Gordan Bobic <gordan@bobich.net> wrote:
>> I can't remember how it's all symlinked, but I normally find it
>> somewhere like:
>>
>> /sys/devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:02.0/0000:0d:00.0/reset
>>
>> (the path reflects PCI bridges along the way - yes, I have a card behind
>> 3 PCIe bridges on my motherboard (5520->NF200->NF200->GPU) - and that's
>> not even the GTX690 - that would add at least one more bridge to the
>> path - madness)
>>
>> If the nvidia driver isn't exposing it, you could try unloading the
>> nvidia driver, loading the nouveau driver (make sure mode switching is
>> disabled so it doesn't get bound into a non-loadable state by the
>> console), issuing a reset (if that exposes a reset node, which IIRC it
>> does on Fermi+ GPUs), unloading nouveau, and reloading nvidia.ko. Then
>> see if it works after that.
>>
>> Gordan
>>
>> On Tue, 19 Nov 2013 14:22:48 +0100, Tamas Lengyel
>> <tamas.lengyel@zentific.com> wrote:
>>> I don't see reset unfortunately:
>>>
>>> ls /sys/module/nvidia/drivers/pci:nvidia/0000:00:04.0
>>> boot_vga                  d3cold_allowed  enable         i2c-3          msi_bus    rescan        resource3     subsystem_device
>>> broken_parity_status      device          firmware_node  irq            msi_irqs   resource      resource3_wc  subsystem_vendor
>>> class                     dma_mask_bits   i2c-0          local_cpulist  numa_node  resource0     resource5     uevent
>>> config                    driver          i2c-1          local_cpus     power      resource1     rom           vendor
>>> consistent_dma_mask_bits  drm             i2c-2          modalias       remove     resource1_wc  subsystem
>>>
>>> On Tue, Nov 19, 2013 at 11:32 AM, Gordan Bobic wrote:
>>>> Does the nvidia binary driver provide a reset handle for the device via sysfs?
>>>> If you echo 1 into it, does it help or does it crash things?
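Gordan's first question was whether the device exposes a sysfs `reset` attribute that can be written with 1. A minimal Python sketch of that check follows, assuming a Linux guest with sysfs at the standard `/sys/bus/pci/devices/<BDF>` location (each entry there is a symlink into the `/sys/devices/pci...` tree Gordan describes). The helper names `reset_node`, `listing_has_reset` and `try_function_reset` are hypothetical, not part of any NVIDIA or Xen tool; writing to the node needs root and, as Gordan warns, may crash things rather than help.

```python
from pathlib import Path
from typing import Iterable

# Standard sysfs directory; each <BDF> entry is a symlink into /sys/devices/pci...
SYSFS_PCI = Path("/sys/bus/pci/devices")


def reset_node(bdf: str, root: Path = SYSFS_PCI) -> Path:
    """Return the path where the kernel would expose the device's reset attribute."""
    return root / bdf / "reset"


def listing_has_reset(entries: Iterable[str]) -> bool:
    """True if a directory listing of the device (as printed by ls) contains 'reset'."""
    return "reset" in set(entries)


def try_function_reset(bdf: str, root: Path = SYSFS_PCI) -> bool:
    """Write 1 to the reset attribute if present; return True only if a write happened."""
    node = reset_node(bdf, root)
    if not node.exists():
        return False  # no reset attribute: nothing to trigger
    node.write_text("1\n")  # needs root; may wedge the GPU instead of fixing it
    return True


if __name__ == "__main__":
    bdf = "0000:00:04.0"  # the GPU's address inside the domU in this thread
    # Only reports presence; does not write to the node.
    print(f"{reset_node(bdf)} present: {reset_node(bdf).exists()}")
```

Fed the directory listing Tamas posted, `listing_has_reset` returns False, which matches what he saw with both nvidia and nouveau loaded.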
Tamas Lengyel
2013-Nov-19 14:51 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
Here is something interesting, however! If I do rmmod nvidia in the domU, then remove the PCI devices from dom0 with xl pci-detach, add them back with xl pci-attach, and run modprobe nvidia in the domU, the problem doesn't appear anymore! I can run multiple CUDA apps, nvidia-smi, and everything "just works" after that. Something is fishy around the domU boot?
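Tamas's workaround is a fixed sequence split between the guest and dom0. The Python sketch below only builds that sequence as data (it executes nothing), assuming the domain name and host BDFs from the domU config in the original post (`name = "debian-miner"`, `pci = [ '01:00.0', '01:00.1' ]`). The function name `reattach_plan` is hypothetical, and detaching both functions before re-attaching either is an assumption; the message only says the devices were removed and added back.

```python
from typing import List, Tuple

# (where the command runs, argv); "domU" = inside the guest, "dom0" = on the host
Step = Tuple[str, List[str]]


def reattach_plan(domain: str, host_bdfs: List[str]) -> List[Step]:
    """Sequence for: unload nvidia, hot-unplug and re-plug the GPU, reload nvidia."""
    steps: List[Step] = [("domU", ["rmmod", "nvidia"])]
    # Detach every passed-through function first, then re-attach them all.
    steps += [("dom0", ["xl", "pci-detach", domain, bdf]) for bdf in host_bdfs]
    steps += [("dom0", ["xl", "pci-attach", domain, bdf]) for bdf in host_bdfs]
    steps.append(("domU", ["modprobe", "nvidia"]))
    return steps


if __name__ == "__main__":
    for where, argv in reattach_plan("debian-miner", ["01:00.0", "01:00.1"]):
        print(f"[{where}] {' '.join(argv)}")
```

Printed, this gives six lines: `rmmod nvidia` in the guest, two `xl pci-detach` and two `xl pci-attach` calls in dom0, then `modprobe nvidia` in the guest. Keeping the plan as data leaves the transport (console, ssh) to the caller.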
Gordan Bobic
2013-Nov-19 15:03 UTC
Re: Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success
On Tue, 19 Nov 2013 15:51:22 +0100, Tamas Lengyel <tamas.lengyel@zentific.com> wrote:
> Here is something interesting however! If I do rmmod nvidia in the
> domU and then remove the PCI devices from dom0 with xl pci-detach,
> then add them back with xl pci-attach and run modprobe nvidia in the
> domU, the problem doesn't appear anymore! I can run multiple CUDA
> apps, nvidia-smi and everything "just works" after that.

Awesome find, and very strange indeed.

> Something is fishy around the domU boot?

Possibly. It'll be a few days before I can cross-check on my setup. Then again, my setup is quirky in the extreme due to the semi-dysfunctional PCIe bridging madness going on, so whatever I find may well be inconclusive. I got a complete second setup with the same motherboard just for troubleshooting and experimentation purposes (what can I say, I'm a glutton for punishment), so I can test and try to fix various stuff without having to take down a production machine and risk breaking it.

Gordan