Hi,

I'm trying to get VGA passthrough to work to an XP x64 guest, and I'm seeing "interesting" things happening.

I'm using the kernel and userspace tools from here:
http://xen.crc.id.au/support/guides/install/#
on Scientific Linux 6.

I gave up on trying to get an Nvidia card to work in the guest after reading about the extra patches required to get a non-Quadro card to work. So I switched to using an ATI 6450/7450 card. This works fine - almost.

ATI cards have a secondary audio output device function on them for outputting audio over HDMI outputs. When I pass both the VGA and the HDMI audio devices from the host to the guest, the guest cannot use the VGA card. It always shows up as unusable in the guest (yellow exclamation mark in XP x64).

What is the best way to debug this further?

Gordan
> From: xen-users-bounces@lists.xen.org [mailto:xen-users-bounces@lists.xen.org] On behalf of Gordan Bobic
> Sent: Monday, 15 April 2013 00:47
> To: xen-users@lists.xen.org
> Subject: [Xen-users] ATI VGA Passthrough / Xen 4.2 / Linux 3.8.6
>
> [...]
> ATI cards have a secondary audio output device function on them for outputting audio over HDMI outputs. When I pass both the VGA and
> the HDMI audio devices from the host to the guest, the guest cannot use the VGA card. It always shows up as unusable in the guest (yellow
> exclamation mark in XP x64).

I've got the same behavior with a Quadro FX3800. One possible workaround is to disable and then re-enable the card in your domU, then reboot it.

Check whether the card supports FLR (Function Level Reset) using lspci -vv:
http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F

I think this behavior is linked to the card not supporting this function.

Aurelien

> What is the best way to debug this further?
>
> Gordan

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users
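The FLR check suggested above can be automated. What follows is a minimal sketch, not an official tool: it assumes the text comes from `lspci -vv` run as root in dom0, and it only looks for the `FLReset+` / `FLReset-` flag that lspci prints in the PCI Express DevCap block. The function name `flr_supported` and the sample device names are made up for illustration.

```python
import re

# Device header lines start in column 0 with the bus address, e.g. "01:00.0 VGA ..."
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def flr_supported(lspci_vv_text):
    """Map each PCI address found in `lspci -vv` output to True/False for FLReset+."""
    result = {}
    current = None
    for line in lspci_vv_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = False  # assume no FLR until the flag is seen
            continue
        if current is not None and "FLReset+" in line:
            result[current] = True
    return result


# Hypothetical, shortened sample of lspci -vv output (device names are invented).
SAMPLE = (
    "01:00.0 VGA compatible controller: Example GPU\n"
    "\tCapabilities: [58] Express (v2) Legacy Endpoint, MSI 00\n"
    "\t\tDevCap:\tMaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited\n"
    "\t\t\tExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-\n"
    "03:00.0 Ethernet controller: Example NIC\n"
    "\t\tDevCap:\tMaxPayload 256 bytes, PhantFunc 0\n"
    "\t\t\tExtTag- RBE+ FLReset+\n"
)

print(flr_supported(SAMPLE))
```

In practice the text would come from something like `subprocess.check_output(["lspci", "-vv"])`. A device that reports `FLReset-`, or that has no PCI Express capability block at all, shows up as False here.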
On 2013-04-15 10:32, Aurélien MILLIAT wrote:
>> [...]
>> ATI cards have a secondary audio output device function on them for outputting audio over HDMI outputs. When I pass both the VGA and
>> the HDMI audio devices from the host to the guest, the guest cannot use the VGA card. It always shows up as unusable in the guest (yellow
>> exclamation mark in XP x64).

I had this problem (Windows XP only), and fixed it by setting:

stdvga=1

This means something like: replace the emulated Cirrus Logic card with an emulated stdvga card.

My guess at what happens then: WinXP is too old to understand the new card ;) and has to fall back to the passed-through one, and somehow that makes it work, maybe because it initializes it differently during boot (when starting the AMD graphics driver).

Also, make sure you have already installed the AMD graphics drivers in XP.

And be sure you set gfx_passthru=0 (secondary passthrough, with AMD support) rather than =1 (primary passthrough, without AMD support).

> I've got the same behavior with Quadro FX3800. One possible workaround for that is to deactivate then to activate the card in your domU and reboot it.
> Check if you have FLR function (lspci -vv): http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F
> I think this behavior is linked to the non-support of this function.

The patches AMD sent the Xen devs, plus the passthrough support in the driver, are supposed to handle this lack of FLR.
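For reference, the settings discussed above could sit together in a domain config roughly as follows. This is only a sketch under assumptions: xm config files use Python syntax, the option is spelled `gfx_passthru` in Xen's config format, and the two PCI addresses are hypothetical placeholders for the GPU function and its HDMI audio function. It is not taken from the pastebin config mentioned in this thread.

```python
# Hypothetical fragment of an xm domain config file (xm configs are Python syntax).
builder = 'hvm'

# Emulate a standard VGA card instead of the default Cirrus Logic one.
stdvga = 1

# 0 = keep the emulated card as primary, passed-through GPU as secondary.
gfx_passthru = 0

# Hypothetical bus addresses: replace with the values lspci shows on your host.
# '01:00.0' stands for the GPU function, '01:00.1' for its HDMI audio function.
pci = ['01:00.0', '01:00.1']
```

Listing the devices in `pci = [...]` means they are assigned when the domain is created, rather than hot-attached later.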
On 04/15/2013 01:02 PM, Peter Maloney wrote:
> I had this problem (windows XP only), and fixed it by setting:
>
> stdvga=1

That's another thing I've been meaning to ask - where are the VM configs stored? I am using xend with VMs configured using virt-manager on EL6, and I cannot figure out where it put the configuration. It doesn't appear to be in /etc/xen. So where is it and how do I get to it? I ask because virt-manager doesn't actually show a Video adapter in the configuration after the VM is created (a bug no doubt).

(Yes, I know virt-manager is a pile of crap, and won't even let me attach the PCI devices; I have to do it using "xm pci-attach", after which it sees they are attached. But it's all I have to work with at the moment - OpenXenManager locks up on me as soon as it tries (and fails) to authenticate. Are there any other recommendable Linux tools for managing and configuring Xen VMs?)

Also, with the ATI card passed through as a normal PCI device, once it is picked up by the OS and the driver is installed, the cirrus card no longer works - yellow exclamation mark in device manager, and the only video output is from the ATI card.

While convenient, I'm not sure this is "normal".

> Which means something like: replace the cirrus logic emulated card with
> a stdvga emulated card.
>
> And then I guess what happens is: winxp is too old to understand the new
> card ;) and has to fall back to the passed through one, and somehow that
> makes it work, maybe because it initializes it differently during boot
> up (when starting the AMD graphics driver).
>
> Also, make sure you have already installed the AMD graphics drivers in XP.

Without the ATI drivers (or in safe mode) only the Cirrus card comes up. In normal mode with ATI drivers only the ATI card comes up.

> And be sure you set gfx_passthru=0 (secondary passthrough, with AMD
> support) rather than =1 (primary passthrough, without AMD support)

See above - I need to find the configs first. Can you point me in the right direction? I figured it must be in xenstored or something, but googling about it didn't prove conclusive.

>> I've got the same behavior with Quadro FX3800. One possible workaround for that is to deactivate then to activate the card in your domU and reboot it.
>> Check if you have FLR function (lspci -vv): http://wiki.xen.org/wiki/Xen_PCI_Passthrough#How_can_I_check_if_PCI_device_supports_FLR_.28Function_Level_Reset.29_.3F
>> I think this behavior is linked to the non-support of this function.
> The patches AMD sent the xen devs, plus the passthrough support in the
> driver, are supposed to handle this lack of FLR.

Speaking of which - is there a way to force the device reset? I'm finding that sometimes all I get from the VM is a blank screen, after the monitor flickers briefly. The emulated VGA card shows nothing after the low level startup, and the desktop/login which is expected to come up on the VGA card - doesn't. I haven't yet figured out if there is a specific chain of actions that leads to this, so any pointers on where to look would be welcome.

Gordan
On 2013-04-15 14:24, Gordan Bobic wrote:
> That's another thing I've been meaning to ask - where are the VM
> configs stored? I am using xend with VMs configured using virt-manager
> on EL6, and I cannot figure out where it put the configuration. It
> doesn't appear to be in /etc/xen. So where is it and how do I get to
> it? I ask because virt-manager doesn't actually show a Video adapter
> in the configuration after the VM is created (a bug no doubt).

That possibly depends on your distro. With openSUSE, I created my VMs with virt-manager, then looked in /etc/xen and found a text version and an XML one, so I deleted the XML ones and hand-edited the text ones.

Here is my very old working Windows XP config:

http://pastebin.com/WYawYpRM

You can just copy that wherever you want, use it from the command line, and forget your old config:

  vim /path/to/file
  xm create /path/to/file
  xm list
  xm destroy nameofvm
  xm destroy vmid

And FYI mine was 32-bit. I don't know if that matters.

> (Yes, I know virt-manager is a pile of crap, and won't even let me
> attach the PCI devices, I have to do it using "xm pci-attach", after
> which it sees they are attached - but it's all I have to work with at
> the moment - OpenXenManager locks up on me as soon as it tried (and
> fails) to authenticate. Are there any other recommendable Linux tools
> for managing and configuring Xen VMs?)
>
> Also, with the ATI card passed through as a normal PCI device, once it
> is picked up by the OS and the driver is installed, the cirrus card no
> longer works - yellow exclamation mark in device manager, and the only
> video output is from the ATI card.
>
> While convenient, I'm not sure this is "normal".

My guess is that attaching after boot would work in Windows to some extent, but that with non-plug-and-play OSes (Windows is not truly plug and play, even though they have claimed it was since Windows 98) you would have issues. So to be sure, you should have the devices in your config when you "create" the domain rather than attaching them afterwards.

> See above - I need to find the configs first. Can you point me in the
> right direction? I figured it must be in xenstored or something, but
> googling about it didn't prove conclusive.

Well, if you have a cirrus card, I think that means it is definitely secondary. Primary would remove the cirrus entirely. And yes, let's focus on the config issue.

> Speaking of which - is there a way to force the device reset? I'm
> finding that sometimes all I get from the VM is a blank screen, after
> the monitor flickers briefly. The emulated VGA card shows nothing
> after the low level startup, and the desktop/login which is expected
> to come up on the VGA card - doesn't. I haven't yet figured out if
> there is a specific chain of actions that leads to this, so any
> pointers on where to look would be welcome.

I'm not sure about this. I think we should make sure you have your devices attached on the HVM "create" before we try to guess about this problem.
On 04/15/2013 02:19 PM, Peter Maloney wrote:
> Here is my very old working windows xp config:
>
> http://pastebin.com/WYawYpRM
>
> You can just copy that wherever you want, and use it from the command
> line and forget your old config.
>
> vim /path/to/file
> xm create /path/to/file
> xm list
> xm destroy nameofvm
> xm destroy vmid

I just tried that, and now I'm getting this.

# xm create /etc/xen/edi
Using config file "/etc/xen/edi".
Error: (22, 'Invalid argument')

Interestingly, I'm also now getting that error message when I am using virt-manager to start the same domain. It seems to be related to having PCI devices passed through. If I comment out the pci= line, the domain gets created fine.

Digging a little further, it seems to be specifically related to actually passing the ATI card through. If I remove that and leave the PCI network card passed through, that works fine.

This is where things appear to go wrong in xend.log:

[2013-04-20 18:05:36 10570] ERROR (XendDomainInfo:2933) XendDomainInfo.initDomain: exception occurred
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2920, in _initDomain
    self._createDevices()
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2396, in _createDevices
    self.pci_device_configure_boot()
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 627, in pci_device_configure_boot
    self.pci_device_configure(dev_sxp, first_dev = first)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 970, in pci_device_configure
    devid = self._createDevice('pci', existing_pci_conf)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2327, in _createDevice
    return self.getDeviceController(deviceClass).createDevice(devConfig)
  File "/usr/lib64/python2.6/site-packages/xen/xend/server/DevController.py", line 67, in createDevice
    self.setupDevice(config)
  File "/usr/lib64/python2.6/site-packages/xen/xend/server/pciif.py", line 453, in setupDevice
    self.setupOneDevice(d)
  File "/usr/lib64/python2.6/site-packages/xen/xend/server/pciif.py", line 353, in setupOneDevice
    allow_access = True)
Error: (22, 'Invalid argument')
[2013-04-20 18:05:36 10570] ERROR (XendDomainInfo:488) VM start failed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 474, in start
    XendTask.log_progress(31, 60, self._initDomain)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendTask.py", line 209, in log_progress
    retval = func(*args, **kwds)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2936, in _initDomain
    raise VmError(str(exn))
VmError: (22, 'Invalid argument')
[2013-04-20 18:05:36 10570] DEBUG (XendDomainInfo:3077) XendDomainInfo.destroy: domid=18
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2402) Destroying device model
[2013-04-20 18:05:37 10570] INFO (image:619) edi device model terminated
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2409) Releasing devices
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2415) Removing vbd/768
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2415) Removing vfb/0
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2407) No device model
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2409) Releasing devices
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:2415) Removing vbd/768
[2013-04-20 18:05:37 10570] DEBUG (XendDomainInfo:1276) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2013-04-20 18:05:37 10570] ERROR (XendDomainInfo:108) Domain construction failed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 106, in create
    vm.start()
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 474, in start
    XendTask.log_progress(31, 60, self._initDomain)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendTask.py", line 209, in log_progress
    retval = func(*args, **kwds)
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2936, in _initDomain
    raise VmError(str(exn))
VmError: (22, 'Invalid argument')
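When digging through xend.log for failures like the one above, it can help to pull out just the ERROR entries together with the traceback lines that follow them. The sketch below assumes the line layout seen in this log (`[date time pid] LEVEL (module:line) message`); the function name `errors_with_tracebacks` is made up for illustration, and the sample is a short excerpt of the log quoted in this thread.

```python
import re

# Matches timestamped xend.log entries such as:
# [2013-04-20 18:05:36 10570] ERROR (XendDomainInfo:488) VM start failed
LOG_LINE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<where>[^)]+)\) (?P<msg>.*)$"
)


def errors_with_tracebacks(lines):
    """Return (fields, continuation_lines) for every ERROR entry in the log."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            # A new timestamped entry ends any traceback being collected.
            current = None
            if match.group("level") == "ERROR":
                current = (match.groupdict(), [])
                entries.append(current)
        elif current is not None:
            # Non-timestamped lines belong to the preceding ERROR entry.
            current[1].append(line)
    return entries


SAMPLE = """\
[2013-04-20 18:05:36 10570] ERROR (XendDomainInfo:488) VM start failed
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2936, in _initDomain
    raise VmError(str(exn))
VmError: (22, 'Invalid argument')
[2013-04-20 18:05:36 10570] DEBUG (XendDomainInfo:3077) XendDomainInfo.destroy: domid=18
"""

for fields, tail in errors_with_tracebacks(SAMPLE.splitlines()):
    # The last continuation line is normally the exception itself.
    print(fields["time"], fields["where"], fields["msg"], "->", tail[-1] if tail else "")
```

To run it against a real log, pass an open file object instead of `SAMPLE.splitlines()`; the log's location varies by distro, so no path is assumed here.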
OK, that last error seems to have come from a duff hypervisor upgrade (4.2.1-7 seems to have issues, 4.2.1-6 doesn't).

I got all this working again, but not without issues. In general, the first time I add the physical VGA card to the VM, it works; it initializes correctly and produces output. Reboot, and it will never work again until the next clean install, at least with XP x64.

I tried with Windows 7, and that seems to be behaving better (i.e. the configuration at least survives multiple reboots). But as soon as the power management kicks in and the display goes to sleep, it all goes wrong - Win7 crashes in the VM, and continues crashing on VM reboots, saying that the PCI device reset has failed. I tried to unbind the GPU and load the radeon FB driver, then unbound it again and removed the radeon kernel driver, just to try to reset the card that way. I re-bound it to the stub driver, rebooted the VM, aaaaaand - on the next VM boot attempt the whole _host_ locked up (or at least the dom0 did). That's pretty poor...

Is there a way to force a PCI device reset? I'm going to try to work around this by disabling all power management in the guest OS, but a proper fix would be nice. I also have a sneaking suspicion that it is dodginess surrounding device resets that is making XP64 break. Are there any workarounds that can be applied to this issue?

On 04/20/2013 06:06 PM, Gordan Bobic wrote:
> I just tried that, and now I'm getting this.
>
> # xm create /etc/xen/edi
> Using config file "/etc/xen/edi".
> Error: (22, 'Invalid argument')
>
> [...]
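On the question of forcing a PCI device reset: Linux exposes a per-function `reset` file under sysfs for devices the kernel knows how to reset, and writing `1` to it requests a reset. The following is a minimal sketch under assumptions: it would be run as root in dom0 while the device is not assigned to a running guest, the address `0000:01:00.0` in the usage note is a hypothetical placeholder, and the `sysfs_root` parameter exists only so the function can be exercised against a fake directory tree. Whether such a reset is enough to recover these particular cards is not established anywhere in this thread.

```python
import os


def reset_pci_function(bdf, sysfs_root="/sys"):
    """Request a kernel reset of one PCI function.

    Returns False when the kernel exposes no reset file for the device,
    which usually means it has no usable reset method (for example no FLR).
    """
    reset_path = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "reset")
    if not os.path.exists(reset_path):
        return False
    with open(reset_path, "w") as handle:
        handle.write("1")  # any write of "1" triggers the reset attempt
    return True
```

A call would look like `reset_pci_function("0000:01:00.0")`, using the full domain:bus:device.function address of the card. If the function returns False, the kernel has no reset method for that device, which would be consistent with the missing-FLR theory raised earlier in the thread.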
Joy, more ATI crashes (atikmdag.sys), including:

"An attempt was made to write to read-only memory."

and

PAGE_FAULT_IN_NON_PAGED_AREA

I'm starting to get a feeling that VGA passthrough support is not as stable as I might have hoped. :-/

Does anyone know if Nvidia Quadro VGA passthrough is substantially more stable than ATI support?

Gordan

On 04/21/2013 06:07 PM, Gordan Bobic wrote:
> I tried with Windows 7, and that seems to be behaving better (i.e. the
> configuration at least survives multiple reboots). As soon as the power
> management kicks in and the display goes to sleep, it all goes wrong -
> Win7 crashes in the VM, and continues crashing on VM reboots, saying
> that the PCI device reset has failed.
>
> [...]
[2013-04-20 18:05:37 10570] ERROR (XendDomainInfo:108) Domain >> construction failed >> Traceback (most recent call last): >> File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", >> line 106, in create >> vm.start() >> File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", >> line 474, in start >> XendTask.log_progress(31, 60, self._initDomain) >> File "/usr/lib64/python2.6/site-packages/xen/xend/XendTask.py", line >> 209, in log_progress >> retval = func(*args, **kwds) >> File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", >> line 2936, in _initDomain >> raise VmError(str(exn)) >> VmError: (22, ''Invalid argument'') >> >> >> >> _______________________________________________ >> Xen-users mailing list >> Xen-users@lists.xen.org >> http://lists.xen.org/xen-users >
Hello Gordan,

On Apr 21, 2013, at 1:22 PM, Gordan Bobic <gordan@bobich.net> wrote:
> Joy, more ATI crashes (atikmdag.sys), including:
> "An attempt was made to write to read-only memory."
> and
> PAGE_FAULT_IN_NON_PAGED_AREA
>
> I'm starting to get a feeling that VGA passthrough support is not as stable as I might have hoped. :-/

It generally is, but I've had some fights with cards that aren't super new. I was always able to get my 5850 to work until I installed gplpv, which would give that BSOD. I'm told that's not an issue for anyone these days, but I've only seen that on Windows 7 DomUs.

Also, sometimes, FLR-related issues will give that BSOD. You could try detaching the card, attempting a good boot, rebooting dom0, then reattaching the card and booting the DomU.

Good luck! YMMV, but I hope it helps :)

Cheers,
Andrew Bobulsky

> Does anyone know if Nvidia Quadro VGA passthrough is substantially more stable than ATI support?
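Since FLR keeps coming up in this thread, here is a minimal sketch of the `lspci -vv` check in script form. It assumes the PCIe capability is reported with the usual `FLReset+` / `FLReset-` flag in the DevCap block; the function names are made up for illustration, and it does not look at the separate Advanced Features capability that some conventional PCI devices use.

```python
import re
import subprocess


def supports_flr(lspci_vv_text):
    """Interpret the PCIe DevCap flag in `lspci -vv` output.

    Returns True for FLReset+, False for FLReset-, and None when the
    flag is not printed at all (e.g. lspci was not run as root, or the
    device has no PCIe capability).
    """
    match = re.search(r"FLReset([+-])", lspci_vv_text)
    if match is None:
        return None
    return match.group(1) == "+"


def lspci_verbose(bdf):
    """Run `lspci -vv -s <bdf>` and return its text output.

    Capability details are normally only shown to root.
    """
    return subprocess.check_output(
        ["lspci", "-vv", "-s", bdf], universal_newlines=True
    )
```

On dom0 this would be called as `supports_flr(lspci_verbose("02:00.0"))`, with the address replaced by that of the card being passed through. A `False` result only says that the device does not advertise FLR; it does not prove that FLR is the cause of the BSODs.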
On 04/21/2013 09:43 PM, Andrew Bobulsky wrote:
> Hello Gordan,
>
> On Apr 21, 2013, at 1:22 PM, Gordan Bobic <gordan@bobich.net> wrote:
>
>> Joy, more ATI crashes (atikmdag.sys), including:
>> "An attempt was made to write to read-only memory."
>> and
>> PAGE_FAULT_IN_NON_PAGED_AREA
>>
>> I'm starting to get a feeling that VGA passthrough support is not as stable as I might have hoped. :-/
>
> It generally is, but I've had some fights with cards that aren't super new.

Radeon HD7450 is the most recent generation.

> I was always able to get my 5850 to work until I installed gplpv,
> which would give that BSOD. I'm told that's not an issue for anyone
> these days, but I've only seen that on Windows 7 DomUs.

I tried XP64, too (XP64 is my preferred choice), but that fares even worse. Typically, that only works once, right after a clean install. Add the ATI card, boot up, install the driver, and it all works fine - until I shut down the VM. Thereafter it never works again.

> Also, sometimes, FLR-related issues will give that BSOD. You could
> try detaching the card, attempt a good boot, reboot dom0, reattach and
> boot DomU.

Having to reboot Dom0 completely defeats the purpose of what I'm trying to achieve. If I have to reboot the machine, I'll run Windows on bare metal and go 25%+ faster for it. I specifically wanted to avoid having to reboot the host, and was willing to live with the virtualization performance tax to achieve it.

Gordan
Update:

Having read through some of the mailing list archives, I tried putting the secondary card being passed through into a different slot, in an attempt to make sure it is on a different PCIe bridge from other host devices (e.g. the host's primary GPU).

My subjective feeling is that this seems to have improved the reliability a little bit, but there are still BSODs occurring, mainly mentioning a timeout while waiting for the device reset on the GPU, particularly when attempting to do some testing of the GPU's 3D capabilities (e.g. Crysis benchmark).

Does anyone have any words of wisdom on the subject of troubleshooting VGA passthrough device reset timeouts in the guest?

Gordan
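On the earlier question of forcing a PCI device reset from dom0: Linux exposes a per-function `reset` attribute in sysfs when the kernel knows a reset method for the device. The sketch below is only an illustration of that interface; the helper names and the `sysfs_root` parameter are assumptions added so that it can be tried against a scratch directory, and nothing in this thread confirms that such a reset recovers a wedged GPU.

```python
import os


def normalize_bdf(bdf):
    """Prefix the default PCI domain (0000) when the address lacks one."""
    return bdf if bdf.count(":") == 2 else "0000:" + bdf


def reset_attr_path(bdf, sysfs_root="/sys"):
    """Path of the sysfs reset attribute for one PCI function."""
    return os.path.join(
        sysfs_root, "bus", "pci", "devices", normalize_bdf(bdf), "reset"
    )


def try_reset(bdf, sysfs_root="/sys"):
    """Ask the kernel to reset one PCI function.

    Returns False when the kernel exposes no reset attribute for the
    device. Writing needs root and may raise OSError if the reset fails.
    Only do this while no guest is using the device.
    """
    path = reset_attr_path(bdf, sysfs_root)
    if not os.path.exists(path):
        return False
    with open(path, "w") as attr:
        attr.write("1")
    return True
```

A missing `reset` file is itself useful information: it suggests the kernel has no reset method for that function, which would be consistent with the reset timeouts reported by the guest driver.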
Not sure if I can offer any wisdom, but I'll just note that secondary GPU passthrough has been pretty stable for me with Windows 7 combined with a Radeon HD 6670 and the Catalyst 13.3beta3 driver. But I confess that I cheat a bit: for this setup, I'm using XCP 1.6 as dom0 and am managing guests with XenCenter. I had GPU passthrough working with vanilla Xen 4.x in the past. But XCP 1.6 (which includes Xen 4.1) is just easier. But whatever your setup, you may wish to give the Catalyst 13.3beta3 driver a try. It works great for me.
On 04/22/2013 10:46 AM, Gizmo Chicken wrote:
> Not sure if I can offer any wisdom, but I'll just note that secondary
> GPU passthrough has been pretty stable for me with Windows 7 combined
> with a Radeon HD 6670 and the Catalyst 13.3beta3 driver. But I confess
> that I cheat a bit: For this setup, I'm using XCP 1.6 as dom0 and am
> managing guests with XenCenter. I had GPU passthrough working with
> vanilla Xen 4.x in the past. But XCP 1.6 (which includes Xen 4.1) is
> just easier. But whatever your setup, you may wish to give the Catalyst
> 13.3beta3 driver a try. It works great for me.

It got to the point where the VM won't actually start - it BSODs saying:

"Attempt to reset the display driver and recover from timeout failed."

in atikmpag.sys

The only way I can boot it is either into safe mode, or by removing the ATI card from the VM configuration. In safe mode the 13.3-beta driver fails saying it failed to load the detection driver.

While Windows 7 seems to have lasted a few reboots more, ultimately it's still no more usable than XP considering it won't even boot any more.

Any debugging ideas welcome.

Thus far the only noteworthy thing I've changed (to no obvious positive effect) is putting the VGA passthrough GPU into a different slot so that it is the only thing on that particular PCIe bridge.

Gordan
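Whether the passed-through GPU really is alone behind its PCIe bridge can be checked from dom0 rather than inferred from the slot. A rough sketch, assuming the usual sysfs layout in which `/sys/bus/pci/devices/<address>` resolves to a path that lists every upstream bridge; the function names and the example addresses are hypothetical.

```python
import os
import re

# Full PCI address as used in sysfs, e.g. 0000:02:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def resolve(bdf):
    """Resolve the sysfs symlink for a device given its full address."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


def bridge_chain(resolved_path):
    """Bridges above a device, root-most first, taken from its sysfs path."""
    parts = [p for p in resolved_path.split("/") if PCI_ADDR.match(p)]
    return parts[:-1]  # the last component is the device itself


def shares_parent_bridge(path_a, path_b):
    """True when two devices sit directly behind the same bridge."""
    chain_a, chain_b = bridge_chain(path_a), bridge_chain(path_b)
    return bool(chain_a) and bool(chain_b) and chain_a[-1] == chain_b[-1]
```

For example, `shares_parent_bridge(resolve("0000:02:00.0"), resolve("0000:05:00.0"))` would report whether a GPU and a NIC at those (made-up) addresses hang off the same bridge. The GPU's own HDMI audio function will always share its bridge, so only other devices are of interest.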
Gordan Bobic wrote
> On 04/22/2013 10:46 AM, Gizmo Chicken wrote:
>> Not sure if I can offer any wisdom, but I'll just note that secondary
>> GPU passthrough has been pretty stable for me with Windows 7 combined
>> with a Radeon HD 6670 and the Catalyst 13.3beta3 driver. But I confess
>> that I cheat a bit: For this setup, I'm using XCP 1.6 as dom0 and am
>> managing guests with XenCenter. I had GPU passthrough working with
>> vanilla Xen 4.x in the past. But XCP 1.6 (which includes Xen 4.1) is
>> just easier. But whatever your setup, you may wish to give the Catalyst
>> 13.3beta3 driver a try. It works great for me.
>
> It got to the point where the VM won't actually start - it BSODs saying:
>
> "Attempt to reset the display driver and recover from timeout failed."
>
> in atikmpag.sys
>
> The only way I can boot it is either into safe mode, or by removing the
> ATI card from the VM configuration. In safe mode the 13.3-beta driver
> fails saying it failed to load the detection driver.
>
> While Windows 7 seems to have lasted a few reboots more, ultimately it's
> still no more usable than XP considering it won't even boot any more.
>
> Any debugging ideas welcome.
>
> Thus far the only noteworthy thing I've changed (to no obvious positive
> effect) is putting the VGA passthrough GPU into a different slot so that
> it is the only thing on that particular PCIe bridge.
>
> Gordan

I've signed up to this mailing list simply because your situation mirrors my own. Allow me to elaborate; my setup is as follows:

Processors  = 2x Intel E5440 Xeons (Harpertown)
Motherboard = SuperMicro X7DWA-N
Graphics    = 1x ATI Radeon 5870 1GB (PCIe Slot 1 -- Primary card in BIOS)
              1x ATI Radeon 5570 1GB (PCIe Slot 6 -- Secondary; No POST messages on boot)

I'm running Xen 4.2.1 with patches applied to qemu-dm for ATI primary pass-through. My dom0 is Fedora 18.
My grub commandlines are:

GRUB_DEFAULT
GRUB_CMDLINE_LINUX="xen-pciback.permissive xen-pciback.hide=(02:00.0)(02:00.1)(00:1d.7) pci=resource_alignment=02:00.0;02:00.1"
GRUB_CMDLINE_XEN="iommu=1 dom0_mem=10240M"

The lspci listing for my cards has the 5870 (used by dom0) at 01:00.0 and 01:00.1 (HDMI audio). The same listing has the 5570 (hidden by pciback) at 02:00.0 and 02:00.1.

I've tried Windows 7 in both x86 and x64 as domU and my results are sporadic, but neither yields a display on my second monitor. The bitness seems not to matter, but the results vary when using different driver versions as well as using CCC.

When using the 10.x series Catalyst drivers, the system boots, recognizes the card, and even shows the monitor in "Screen Resolution" settings. But when I attempt to "Extend the Display" to the monitor on my 5570, the system hangs for a moment, then BSODs (not in atikmpag.sys or atikmdag.sys but in a different driver that escapes me at present, Error 8E). Upon reboot, however, the system crashes immediately with a BSOD in atikmpag.sys.

When using the 11.x or above drivers, the system crashes whether CCC is installed or not, again with a BSOD in atikmdag.sys, Error 116.

I have several theories as to why my setup is failing:

1) I'm using a Dual CPU system, and this is somehow causing problems with ownership and separation of PCIe devices on the same bus? I don't know if this is possible.
2) I'm using a system with no integrated graphics adapter, but that relies on 2 discrete graphics cards, both Radeons. In order to use dom0, I need to initiate the primary adapter in the first PCIe slot (5870). I do not have a second machine to test this.
3) Xen cannot pass-through one card in a two card configuration when dom0 is using one of the cards.
4) Xen cannot pass-through the secondary card in a two card configuration. It must be the primary card, and the OS must use the secondary card. Perhaps a hardware limitation of VT-d?
I've been working on a solution for days, and to this point I'm somewhat stumped. I'm going to attempt to change the BIOS settings to use the secondary card in Slot 6 (5570) as the primary card, and pass through the card at address 01:00.0 (5870). Maybe that will yield results?

I hope anyone can help me resolve this situation.
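A small sanity check for the `xen-pciback.hide=` value quoted above, since a mistyped address (for instance a colon where the dot before the function number belongs) would leave a device silently unhidden. This is only a format check under the assumption that addresses are written as `bus:device.function` with an optional four-digit domain; the helper names are made up, and it does not confirm that the devices exist.

```python
import re

# Optional 4-hex-digit domain, then bus:device.function
BDF_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parse_hide(value):
    """Split a pciback hide list such as '(02:00.0)(02:00.1)' into addresses."""
    return re.findall(r"\(([^)]*)\)", value)


def invalid_addresses(addresses):
    """Return the entries that do not look like bus:device.function."""
    return [addr for addr in addresses if not BDF_RE.match(addr)]
```

Running `invalid_addresses(parse_hide("(02:00.0)(02:00.1)(00:1d.7)"))` on the value from the grub line returns an empty list, so the hide list itself is well-formed.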
On 04/27/2013 09:25 PM, Alex Karaoui wrote:
> Gordan Bobic wrote
>> On 04/22/2013 10:46 AM, Gizmo Chicken wrote:
>>> Not sure if I can offer any wisdom, but I'll just note that secondary
>>> GPU passthrough has been pretty stable for me with Windows 7 combined
>>> with a Radeon HD 6670 and the Catalyst 13.3beta3 driver. But I confess
>>> that I cheat a bit: For this setup, I'm using XCP 1.6 as dom0 and am
>>> managing guests with XenCenter. I had GPU passthrough working with
>>> vanilla Xen 4.x in the past. But XCP 1.6 (which includes Xen 4.1) is
>>> just easier. But whatever your setup, you may wish to give the Catalyst
>>> 13.3beta3 driver a try. It works great for me.
>>
>> It got to the point where the VM won't actually start - it BSODs saying:
>>
>> "Attempt to reset the display driver and recover from timeout failed."
>>
>> in atikmpag.sys
>>
>> The only way I can boot it is either into safe mode, or by removing the
>> ATI card from the VM configuration. In safe mode the 13.3-beta driver
>> fails saying it failed to load the detection driver.

[...]

>> Thus far the only noteworthy thing I've changed (to no obvious positive
>> effect) is putting the VGA passthrough GPU into a different slot so that
>> it is the only thing on that particular PCIe bridge.
>
> I've signed up to this mailing list simply because your situation mirrors my
> own. Allow me to elaborate, my setup is as follows:
>
> Processors = 2x Intel E5440 Xeons (Harpertown)
> Motherboard = SuperMicro X7DWA-N
> Graphics
> 1x ATI Radeon 5870 1GB (PCIe Slot 1 -- Primary card in BIOS)
> 1x ATI Radeon 5570 1GB (PCIe Slot 6 -- Secondary; No POST messages on boot)

I don't think it matters what you use for your primary (Dom0) GPU. I have an Nvidia 8800GT and an ATI 6450. Passing through the Nvidia card didn't work at all (yellow exclamation mark in device manager). I have had a little more success with the ATI card passthrough recently.
More details below.

> I'm running Xen 4.2.1 with patches applied to qemu-dm for ATI primary
> pass-through.
> My dom0 is Fedora 18. My grub commandlines are:
>
> GRUB_DEFAULT
> GRUB_CMDLINE_LINUX="xen-pciback.permissive
> xen-pciback.hide=(02:00.0)(02:00.1)(00:1d.7)
> pci=resource_alignment=02:00.0;02:00.1"
> GRUB_CMDLINE_XEN="iommu=1 dom0_mem=10240M"

I'm using kernels and the xen stack from here since I use EL6:
http://xen.crc.id.au/support/guides/

xen-pciback on that is a module rather than built in, so I have the options for it in modprobe.d.

> The lspci listing for my cards has the 5870 (used by dom0) at 01:00.0, and
> 01:00.1 (HDMI audio)
> The same listing has the 5570 (hidden by pciback) at 02:00.0, and 02:00.1
>
> I've tried Windows 7 in both x86 and x64 as domU and my results are sporadic
> but neither yields a display on my second monitor. The bitness seems not to
> matter, but the results vary when using different driver versions as well as
> using CCC.

I have managed to get further than this, but there are still issues. See here:
https://lists.wireless.org.au/pipermail/kernel-xen/2013-April/000213.html

I'm using xen-hypervisor 4.2.1-6, and everything else is 4.2.2-1, with a modified 3.8.8-1 kernel (3.8.8-1.0, pciehp is a module and NR_CPUS boosted to 32).

In this setup the 6450 _almost_ works in the guest OS. I am running a bare driver (13.3beta3), no CCC. I also don't have .NET installed in my Win7 64-bit guest OS. The VM boots and generally works OK as long as there is no full-screen 3D usage. The moment you try to fire up a 3D application in full screen mode, the app crashes. Sometimes the driver manages a device reset. Most of the time it results in a "PAGING_REQUEST_IN_NONPAGED_AREA" BSOD, though. Windowed 3D operations seem to work fine, though.
I can run the GPU-Z PCIe test and the OCCT GPU test (windowed; full screen causes the above-mentioned crashes) without any obvious problems.

> 1) I'm using a Dual CPU system, and this is somehow causing problems with
> ownership and separation of PCIe devices on the same bus? I don't know if
> this is possible.

I'm not entirely convinced about this. Note that I am running a GPU in a slot where it is the only device behind a particular PCIe bridge. I don't actually know whether that makes a difference or not (or whether it only might make a difference on my hardware, since I am stuck with ALL 8 PCIe slots behind the Nvidia NF200 bridge; EVGA in their infinite wisdom decided that somebody might need 8 PCIe slots and crammed them all behind the NF200 routers).

> 3) Xen cannot pass-through one card in a two card configuration when dom0 is
> using one of the cards.

Not sure if this might be dual-ATI related. I have no such issues with Nvidia+ATI cards in my system.

> 4) Xen cannot pass-through the secondary card in a two card configuration.
> It must be the primary card, and the OS must use the secondary card.
> Perhaps a hardware limitation of VT-d?

Not the case - I only ever tried passing the secondary GPU. As mentioned above, this mostly seems to work now, albeit not quite usable yet for any particularly meaningful purpose.

What made the difference for me since my last post is the Xen stack update to the versions mentioned above. The xen-hypervisor issue/regression I mentioned is something I am still investigating. Hoping to gather some logs and data and file a bug report later today.

Gordan
This problem continues to drive me nuts - not by it flat out not working, but by working _intermittently_.

For the past week, I had not managed to get ATI VGA passthrough to boot up once (BSOD every time). I was tweaking some boot parameters, and at one point it not only booted up without BSOD-ing, it actually managed full screen 3D applications, and completed a full GPU benchmark pass of Crysis!

So just to make sure, I did a full shutdown and cold-booted the machine again - BSOD after BSOD after BSOD.

Rebooted it again, and now it works again, including full screen 3D switching.

One thing I have established is that the pci=resource_alignment=<id>;<id> kernel boot parameter makes the machine not boot at all. It looks like it wipes out the NIC by realigning things, and I need the NIC to work because the machine runs on NFS root (and the VM disk is an iSCSI share).

Has anybody got any suggestions on how I might debug this any further? It's really quite annoying having this _almost_ working. It also looks like if it works once, it will continue working upon guest restarts.

Gordan
On 05/05/2013 04:42 PM, Gordan Bobic wrote:
> This problem continues to drive me nuts - not by it flat out not
> working, but by working _intermittently_.
>
> For the past week, I had not managed to get ATI VGA passthrough to boot
> up once (BSOD every time). I was tweaking some boot parameters, and at
> one point it not only booted up without BSOD-ing, it actually managed
> full screen 3D applications, and completed a full GPU benchmark pass of
> Crysis!
>
> So just to make sure, I did a full shutdown and cold-booted the machine
> again - BSOD after BSOD after BSOD.
>
> Rebooted it again, and now it works again, including full screen 3D
> switching.
>
> One thing I have established is that pci=resource_alignment=<id>;<id>
> kernel boot parameter makes the machine not boot at all. It looks like
> it wipes out the NIC by realigning things, and I need the NIC to work
> because the machine runs on NFS root (and the VM disk is an iSCSI share).
>
> Has anybody got any suggestions on how I might debug this any further?
> It's really quite annoying having this _almost_ working. It also looks
> like if it works once, it will continue working upon guest restarts.

Another observation - when it boots up, and I "eject" the ATI card from the guest, all that does is switch the guest display briefly to VNC primary, followed by slightly corrupted output switching back to the ATI card. And it is still responsive after that; I can execute a normal shutdown and don't have to do it blind.

Gordan
OK, I think I have finally managed to get things to what appears to be the average state of unreliability of ATI VGA passthrough (i.e. seems to work _most_ of the time after a fresh reboot, but becomes much more hit and miss after a VM reboot or two).

I'm probably going to regret saying this when I find that over the next week I cannot get it to start up even once, fresh reboot or not, but what seems to have made a difference is disabling irq balancing (I'm on a dual X5650 machine, 2 CPUs, 6 cores / 12 threads each).

So if you are having a similarly difficult time, you might want to try the noirqbalance dom0 kernel boot parameter and disable the irqbalance service.

Disclaimer: This is largely based on a gut feeling after a few hours of testing. I certainly don't think it's definitive, but statistically it seems to help.

I have a Quadro 2000 inbound, so I will test with that when it arrives next week. The optimist in me is very much hoping it will "just work". The realist suspects that would be way too easy. Will know for sure one way or the other in a few days' time.

Gordan
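For anyone trying the irq balancing suggestion, a quick way to confirm that the dom0 kernel actually booted with the flag is to look at `/proc/cmdline`. A minimal sketch; the helper names are made up for illustration.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


def has_kernel_flag(cmdline, flag):
    """True if `flag` appears as a whole word on the kernel command line."""
    return flag in cmdline.split()
```

`has_kernel_flag(read_cmdline(), "noirqbalance")` should return True after the reboot. The irqbalance service is a separate userspace daemon; on EL6 it would normally be turned off with `chkconfig irqbalance off` and `service irqbalance stop`, which this sketch does not check.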
As promised, here is an initial report on how the Quadro 2000 experiment panned out. In short - if anything, stability is a little _worse_, with all the same issues that require a host reboot after a guest crash.

So, as a list, both ATI and Nvidia Quadro suffer the following issues:

1) Random guest graphics corruption (horizontal lines) after an almost-but-not-quite-proper crash.
2) Same BSODs reporting a timeout while attempting to reset a device.
3) Similar but seemingly worse graphics stability. With the ATI card I managed about 5 minutes in Borderlands 2 before it crashed. With the Quadro I seem to be managing about 2 minutes before it tries to reset itself and falls flat on its face.

The one advantage the Quadro has is that once it has crashed, the driver disables the card, so the guest doesn't repeatedly BSOD until you reboot the host. Instead it gives you the login screen on the primary (VNC) display output, so you can still get into the guest easily and do any required maintenance. Futile, but it's an improvement on just BSOD-ing.

The only two things that come to mind as possible causes are:

1) I'm on a dual socket system (dual Xeon X5650)
2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).

Has anyone managed to successfully run VGA passthrough on a dual socket system?

What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.

All in all - rather deflating. I was really hoping that I wouldn't have to be stuck with rebooting between OS-es for another hardware generation.
Gordan
Hello Gordan,

A couple suggestions and questions for you!

On May 8, 2013, at 6:53 PM, Gordan Bobic <gordan@bobich.net> wrote:
> The only two things that come to mind as possible causes are:
> 1) I'm on a dual socket system (dual Xeon X5650)

There's a thread on xen-devel right now... Someone attempting iommu with a dual socket Xeon system is having trouble: devices showing up twice with different addresses. It might not help at all, but it could be worth a peek.

> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).

I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.

> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.

Question: how'd you disable ACS? I think it may be causing me some issues.

Have you tried passing the NF200 bridge itself, along with the devices behind it? If it's possible, it may enable a "cleaner" interaction, as the PCIe bus ahead of it should be better behaved.

> All in all - rather deflating. I was really hoping that I wouldn't have to be stuck with rebooting between OS-es for another hardware generation.

Also, are you doing VGA pass through, or just PCIe pass through of a device that happens to be a video card? I'm guessing you mean the latter, but if not, it's definitely the more compatible option :)

Cheers,
Andrew
On 05/09/2013 05:31 AM, Andrew Bobulsky wrote:
> Hello Gordan,
>
> A couple suggestions and questions for you!

Any suggestions gladly accepted. :)

>> The only two things that come to mind as possible causes are:
>> 1) I'm on a dual socket system (dual Xeon X5650)
>
> There's a thread on xen-devel right now.... Someone attempting iommu with a dual socket Xeon system having trouble... Devices showing up twice with different addresses. It might not help at all, but it could be worth a peek.

Do you have a thread name or archive link? I just signed up to the xen-devel list and had a quick look through the archives but haven't seen an obviously titled thread.

>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>
> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.

I can believe that. What is the solution, though?

The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.

>> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.
>
> Question: how'd you disable ACS? I think it may be causing me some issues.

Put:

(pci-passthrough-strict-check no)
(pci-dev-assign-strict-check no)

in /etc/xen/xend-config.sxp

If it was causing you issues, however, I'd expect you to find errors in logs pointing at it.

> Have you tried passing the NF200 bridge itself, along with the devices behind it? If it's possible, it may enable a "cleaner" interaction as the pcie bus ahead of it should be more well behaved.

I think I tried it, and the error I got back was that you can pass individual PCI devices, but not PCI bridges.

>> All in all - rather deflating. I was really hoping that I wouldn't have to be stuck with rebooting between OS-es for another hardware generation.
>
> Also, are you doing VGA pass through, or just pcie pass through of a device that happens to be a video card? I'm guessing you mean the latter, but if not, it's definitely the more compatible option :)

I am doing the latter (secondary VGA passthrough). The packages / tools stack, when combined with libvirt and virt-manager, causes the VM configuration to end up in xenstore, and I have not yet figured out a clean way to migrate between a text config file and xenstore. I might try gfx_passthrough=1 once I've figured out how to toggle it in the xenstore configuration.

Gordan
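The two xend settings Gordan lists above can also be applied with a script instead of a hand edit. What follows is only a minimal sketch: the two option names and the file path come from the message, while the function name, the sample input, and the "drop any old setting, then append" approach are assumptions made for illustration.

```python
STRICT_CHECK_OPTIONS = (
    "pci-passthrough-strict-check",
    "pci-dev-assign-strict-check",
)


def relax_strict_checks(config_text: str) -> str:
    """Return xend-config.sxp text with both strict-check options set to 'no'."""
    kept = []
    for line in config_text.splitlines():
        # Look past a leading '#' so commented-out settings are matched too.
        stripped = line.strip().lstrip("#").strip()
        if any(stripped.startswith(f"({name} ") for name in STRICT_CHECK_OPTIONS):
            continue  # drop the old setting; it is re-added below
        kept.append(line)
    kept.extend(f"({name} no)" for name in STRICT_CHECK_OPTIONS)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = "(some-other-option yes)\n#(pci-passthrough-strict-check yes)\n"
    print(relax_strict_checks(sample), end="")
```

In practice the input would be the contents of /etc/xen/xend-config.sxp, and xend would presumably need a restart before the change takes effect. Running the function twice gives the same result as running it once, so it is safe to re-apply.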
Thanks for posting the results Gordan, unfortunate that it isn't working as well as we hoped.

>>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>>
>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.
>
> I can believe that. What is the solution, though?
>
> The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.

My solution was to buy another motherboard. I had no luck at all passing the devices behind the NF200, and similar to your situation all but one PCIe slot on that board was behind that bridge.

>>> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.
>>
>> Question: how'd you disable ACS? I think it may be causing me some issues.
>
> Put:
>
> (pci-passthrough-strict-check no)
> (pci-dev-assign-strict-check no)
>
> in /etc/xen/xend-config.sxp
>
> If it was causing you issues, however, I'd expect you to find errors in logs pointing at it.

As I understand it, xend-config.sxp <http://wiki.xen.org/wiki/XEND> is for the xm toolstack and the deprecated xend service.

Perhaps I am confused, or things changed while I wasn't looking, but for me enabling xend breaks the xl toolstack. My understanding is that it was for the xm toolstack only and deprecated with 4.2. Any chance you can share how you configured it to work? Apparently it is required to get libvirt working, which I also did not know was compatible with Xen 4.2.
On Thu, 9 May 2013 14:20:20 +0000, Casey DeLorme <cdelorme@gmail.com> wrote:
> Thanks for posting the results Gordan, unfortunate that it isn't working as well as we hoped.

I haven't given up _quite_ yet.

I discovered yesterday that it _looks like_ one of my PCIe slots is actually duff (two different GPUs both fail to detect properly in it but work fine in other slots).

If it turns out to be a duff slot, there's no telling what else might be duff on the motherboard and how it might affect various things, even though several days of full load stability testing passed.

So some more bare-metal testing seems to be called for - right now I am not prepared to disregard the possibility that maybe I have a hardware issue somewhere that, despite EDAC and ECC on everything, remains undetected and unreported in the logs.

> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>
> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.
>
> I can believe that. What is the solution, though?
>
> The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.
>
> My solution was to buy another motherboard, I had no luck at all passing the devices behind the NF200, and similar to your situation all but one PCIe slot on that board was behind that bridge.

Did you not manage to get it working at all? Or was it just intermittent like in my case? I can typically get about 5 minutes of gaming out of my ATI card before it all goes wrong.

Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD, but opted to go for broke and get a couple of 6-core Xeons and an EVGA SR-2. It turns out, a solution that is 4x more expensive isn't actually better... :(

> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.
>
> Question: how'd you disable ACS? I think it may be causing me some issues.
>
> Put:
>
> (pci-passthrough-strict-check no)
> (pci-dev-assign-strict-check no)
>
> in /etc/xen/xend-config.sxp
>
> If it was causing you issues, however, I'd expect you to find errors in logs pointing at it.
>
> As I understand the xend-config.sxp [1] is for the xm toolstack and deprecated Xend service.

xm toolstack and xend are what I am using. I have read reports of issues with VGA passthrough using the xl stack so I didn't even attempt to use it.

> Perhaps I am confused, or things changed while I wasn't looking, but for me enabling Xend breaks the xl toolstack. My understanding is it was for the xm toolstack only and deprecated with 4.2. Any chance you can share how you configured it to work? Apparently it is required to get libvirt working, which I also did not know was compatible with Xen 4.2.

It is possible I'm the one doing it wrong. I'm on EL6, and using virt-manager (at least for things it is willing to do), and that defaults to the xm stack and xend.

For what it's worth, it works for the most part - apart from VGA passthrough crashing within 5 minutes of gaming.

Gordan
Gordan,

For reference, I went the 'Asus Sabertooth + AMD 8 core' route - ASUS Sabertooth 990FX (rev 1.0) and an AMD 8350. I've got a Radeon HD 6770 passed through as a secondary screen to a Windows 7 DomU (the primary card is currently a Radeon HD 7750; I have also used an Nvidia GTX 550 Ti). It's usually stable (unless I have to reboot it a lot), but I'm currently stuck at Xen 4.2.1 as the IVRS table is exporting bad information (it has an entry for one IO-APIC, with a handle of 0x0). This is using the xl toolset.

Regards,
David
>> Thanks for posting the results Gordan, unfortunate that it isn't working as well as we hoped.
>
> I haven't given up _quite_ yet.
>
> I discovered yesterday that it _looks like_ one of my PCIe slots is actually duff (two different GPUs both fail to detect properly in it but work fine in other slots).
>
> If it turns out to be a duff slot, there's no telling what else might be duff on the motherboard and how it might affect various things, even though several days of full load stability testing passed.
>
> So some more bare-metal testing seems to be called for - right now I am not prepared to disregard the possibility that maybe I have a hardware issue somewhere that despite EDAC and ECC on everything, remains undetected and unreported in the logs.

I hope you manage to resolve it, though I feel the NF200 will be the larger challenge.

>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>>
>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.
>>
>> I can believe that. What is the solution, though?
>>
>> The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.
>>
>> My solution was to buy another motherboard, I had no luck at all passing the devices behind the NF200, and similar to your situation all but one PCIe slot on that board was behind that bridge.
>
> Did you not manage to get it working at all? Or was it just intermittent like in my case? I can typically get about 5 minutes of gaming out of my ATI card before it all goes wrong.
>
> Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD, but opted to go for broke and get a couple of 6-core Xeons and an EVGA SR-2. It turns out, a solution that is 4x more expensive isn't actually better... :(

I was unable to get it working at all. The NF200 simply threw errors that 100% prevented me from passing the device. I think it was missing a number of specific features required for passthrough, and I vaguely remember running lspci -vvv to verify what was missing. Perhaps not all NF200s are created equal?

>> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.
>>
>> Question: how'd you disable ACS? I think it may be causing me some issues.
>>
>> Put:
>>
>> (pci-passthrough-strict-check no)
>> (pci-dev-assign-strict-check no)
>>
>> in /etc/xen/xend-config.sxp
>>
>> If it was causing you issues, however, I'd expect you to find errors in logs pointing at it.
>>
>> As I understand the xend-config.sxp [1] is for the xm toolstack and deprecated Xend service.
>
> xm toolstack and xend are what I am using. I have read reports of issues with VGA passthrough using the xl stack so I didn't even attempt to use it.

The xm toolstack was deprecated in version 4.1. I read that it had not been updated in months due to a lack of maintainers. I did try xm back when I started; the passthrough worked but had the same problems I had when I began testing xl. I have been using xl since then. My logic was simply "why become dependent on a tool that is no longer maintained and may be removed from the next release?"

Does anyone know whether the xm toolstack has been modified since 4.1 to accommodate changes with Xen 4.2? If it has not, it might be worth considering xl.

>> Perhaps I am confused, or things changed while I wasn't looking, but for me enabling Xend breaks the xl toolstack. My understanding is it was for the xm toolstack only and deprecated with 4.2. Any chance you can share how you configured it to work? Apparently it is required to get libvirt working, which I also did not know was compatible with Xen 4.2.
>
> It is possible I'm the one doing it wrong. I'm on EL6, and using virt-manager (at least for things it is willing to do), and that defaults to the xm stack and xend.
>
> For what it's worth, it works for the most part - apart from VGA passthrough crashing within 5 minutes of gaming.

If you are using xm then it makes sense, as libvirt seems to require xm/xend to be loaded in order to function.

There are more upgrade notes <http://wiki.xen.org/wiki/MigrationGuideToXen4.1%2B#Toolstack_upgrade_notes> about xend now, so that is new to me. According to the Xen man pages the xend-config.sxp file doesn't have the flags you added; can you link to resources that mentioned them? I have not seen xl equivalents for your xend configuration, so I guess xm still has some features that xl does not.

~Casey
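Casey's lspci -vvv check can be partly automated when the question is specifically which bridges lack ACS. The sketch below makes several assumptions: verbose lspci output is taken to separate devices with blank lines, to start each device with its bus address, and to list the capability under the name "Access Control Services". The function name and the abridged sample text are hypothetical, and the sample is not real output from any machine in this thread.

```python
import re
from typing import List

# Bus address at the start of a device block, with an optional PCI domain prefix.
DEVICE_HEADER = re.compile(
    r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.*)$"
)


def bridges_without_acs(lspci_output: str) -> List[str]:
    """Return addresses of PCI bridges whose block does not mention ACS."""
    missing = []
    for block in lspci_output.strip().split("\n\n"):
        lines = block.splitlines()
        if not lines:
            continue
        match = DEVICE_HEADER.match(lines[0])
        if match is None:
            continue
        address, description = match.groups()
        if "PCI bridge" in description and "Access Control Services" not in block:
            missing.append(address)
    return missing


# Hypothetical, heavily abridged sample input for illustration only.
SAMPLE = (
    "00:03.0 PCI bridge: Example root port\n"
    "\tCapabilities: [148 v1] Access Control Services\n"
    "\n"
    "03:00.0 PCI bridge: Example switch port\n"
    "\tCapabilities: [40] Power Management version 3\n"
    "\n"
    "04:00.0 VGA compatible controller: Example GPU\n"
)

if __name__ == "__main__":
    print(bridges_without_acs(SAMPLE))
```

lspci generally needs root to read the full capability list, so running it unprivileged could make every bridge look as if it lacked ACS.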
On 05/09/2013 06:35 PM, Casey DeLorme wrote:
> Thanks for posting the results Gordan, unfortunate that it isn't working as well as we hoped.
>
> I haven't given up _quite_ yet.
>
> I discovered yesterday that it _looks like_ one of my PCIe slots is actually duff (two different GPUs both fail to detect properly in it but work fine in other slots).
>
> If it turns out to be a duff slot, there's no telling what else might be duff on the motherboard and how it might affect various things, even though several days of full load stability testing passed.
>
> So some more bare-metal testing seems to be called for - right now I am not prepared to disregard the possibility that maybe I have a hardware issue somewhere that despite EDAC and ECC on everything, remains undetected and unreported in the logs.
>
> I hope you manage to resolve it, though I feel the NF200 will be the larger challenge.

I hope I'll resolve it, too, but right now I am not convinced that the NF200 is actually the cause of my problems. My gut feeling says that if I can get it working for 5 minutes at a time, something less fundamental than the NF200 PCIe routers is the cause of the problems.

> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>
> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words apiece.
>
> I can believe that. What is the solution, though?
>
> The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.
>
> My solution was to buy another motherboard, I had no luck at all passing the devices behind the NF200, and similar to your situation all but one PCIe slot on that board was behind that bridge.
>
> Did you not manage to get it working at all? Or was it just intermittent like in my case? I can typically get about 5 minutes of gaming out of my ATI card before it all goes wrong.
>
> Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD, but opted to go for broke and get a couple of 6-core Xeons and an EVGA SR-2. It turns out, a solution that is 4x more expensive isn't actually better... :(
>
> I was unable to get it working at all. The NF200 simply threw errors that 100% prevented me from passing the device. I think it was missing a number of specific features required for passthrough, and I vaguely remember running lspci -vvv to verify what was missing. Perhaps not all NF200's are created equal?

The only logged issue I had with the NF200s was the lack of ACS, the check for which can be disabled as I mentioned on this thread (at least if you are using the xm stack). After I disabled that, PCI passthrough has been working OK. It's just VGA passthrough BSOD-ing after some minutes that is causing me problems.

> What about with PCIe devices behind NF200 bridges? I know the NF200s don't support PCI ACS, but that is a security feature (which I have disabled enforcement of to get this far), and AFAIK shouldn't actually affect the basic PCI passthrough capability.
>
> Question: how'd you disable ACS? I think it may be causing me some issues.
>
> Put:
>
> (pci-passthrough-strict-check no)
> (pci-dev-assign-strict-check no)
>
> in /etc/xen/xend-config.sxp
>
> If it was causing you issues, however, I'd expect you to find errors in logs pointing at it.
>
> As I understand the xend-config.sxp [1] is for the xm toolstack and deprecated Xend service.
>
> xm toolstack and xend are what I am using. I have read reports of issues with VGA passthrough using the xl stack so I didn't even attempt to use it.
>
> The xm toolstack was deprecated in version 4.1. I read that it had not been updated in months due to a lack of maintainers.

I heard that xl is still feature-incomplete and experimental, and problematic with VGA passthrough.

> I did try xm back when I started, the passthrough worked but had the same problems I had when I began testing xl. I have been using xl since then. My logic was simply "why become dependent on a tool that is no-longer maintained and may be removed from the next release?"

I'm not wedded to any particular tool stack, I'm happy to use whatever works. But libvirt and virt-manager are still using xm, I have seen recent reports of xl being problematic for VGA passthrough, and there is no apparent way to disable the ACS requirements with the xl stack. Together, that rules xl out for me completely at the moment.

> Does anyone know whether the xm toolstack been modified since 4.1 to accommodate changes with Xen 4.2? If it has not, it might be worth considering xl.

Does anyone know how to disable the ACS bridge requirement with the xl stack?

> Perhaps I am confused, or things changed while I wasn't looking, but for me enabling Xend breaks the xl toolstack. My understanding is it was for the xm toolstack only and deprecated with 4.2. Any chance you can share how you configured it to work? Apparently it is required to get libvirt working, which I also did not know was compatible with Xen 4.2.
>
> It is possible I'm the one doing it wrong. I'm on EL6, and using virt-manager (at least for things it is willing to do), and that defaults to the xm stack and xend.
>
> For what it's worth, it works for the most part - apart from VGA passthrough crashing within 5 minutes of gaming.
>
> If you are using xm then it makes sense, as libvirt seems to require xm/xend to be loaded in order to function.
>
> There are more upgrade notes <http://wiki.xen.org/wiki/MigrationGuideToXen4.1%2B#Toolstack_upgrade_notes> about xend now, so that is new to me. According to the Xen Man Pages the xend-config.sxp file doesn't have the flags you added; can you link to resources that mentioned them? I have not seen xl equivalents for your xend configuration, so I guess xm does have some features xl does not still.

This mentions it, among others:
http://wiki.xen.org/wiki/Xen_PCI_Passthrough

Google for
xen pci-passthrough-strict-check pci-dev-assign-strict-check
and you should find some relevant things easily enough.

Gordan
David, Gordan,

On May 9, 2013, at 12:10 PM, David Sutton <kantras@gmail.com> wrote:

> Gordan,
>
> For reference, I went the 'Asus Sabertooth + AMD 8 core' route - ASUS Sabertooth 990FX (rev 1.0) and an AMD 8350. I've got a Radeon HD 6770 passed through as a secondary screen to a Windows 7 DomU (the primary card is currently a Radeon HD 7750, also used a Nvidia GTX 550 Ti). Its usually stable (unless I have to reboot it a lot) but i'm currently stuck at Xen 4.2.1 as the IVRS table is exporting bad information (it has an entry for one IO-APIC, with a handle of 0x0). This is using the xl toolset.
>
> Regards,
> David

I can report a similar experience here. To date, the best luck I've had with passthrough is actually on a 990FX system with an 8 core CPU as well. It ran Xen pretty well, but back when I did that, I always had issues with the xm toolstack throwing errors at me. My current work with the xl stack has me thinking it'd smooth things out.

The box is running ESXi at the moment, but I guess the real point is that with ~12 passed-through devices (VGA, HDMI audio, and one USB controller each) it runs like a champ, and is so rock solid that, while I've had issues, I think they've actually all been related to lack of FLR support on the cards... But that's just speculation. I have to do some weird VM shuffling to piss it off :P

-Andrew
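FLR support, which comes up above as a suspected cause, is usually checked by looking for the FLReset flag on the DevCap line of lspci -vv, as the wiki page linked earlier in the thread describes. A minimal sketch of that check follows. It assumes the text for one device has already been captured, the function name is hypothetical, and it deliberately ignores any other place a device might advertise a function-level reset.

```python
from typing import Optional


def supports_flr(lspci_vv_device_text: str) -> Optional[bool]:
    """Report the FLReset flag from one device's `lspci -vv` text.

    Returns True for 'FLReset+', False for 'FLReset-', and None when the
    flag is not shown at all (for example when lspci ran without root).
    """
    if "FLReset+" in lspci_vv_device_text:
        return True
    if "FLReset-" in lspci_vv_device_text:
        return False
    return None


if __name__ == "__main__":
    # Hypothetical, abridged DevCap line for illustration only.
    print(supports_flr("DevCap: MaxPayload 256 bytes, ExtTag+ RBE+ FLReset-"))
```

A result of None should be read as "unknown", not as "unsupported".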
Hello Gordan, Casey, On May 9, 2013, at 2:05 PM, Gordan Bobic <gordan@bobich.net> wrote:> On 05/09/2013 06:35 PM, Casey DeLorme wrote: >> Thanks for posting the results Gordan, unfortunate that it isn''t >> >> working as well as we hoped. >> >> >> I haven''t given up _quite_ yet. >> >> I discovered yesterday that it _looks liks_ one of my PCIe slots is >> actually duff (two different GPUs both fail to detect properly in it >> but work fine in other slots). >> >> If it turns out to be a duff slot, there''s no telling what else >> might be duff on the motherboard and how it might affect various >> things, even though several days of full load stability testing >> passed. >> >> So some more bare-metal testing seems to be called for - right now I >> am not prepared to disregard the possibility that maybe I have a >> hardware issue somewhere that despite EDAC and ECC on everything, >> remains undetected and unreported in the logs. >> >> >> I hope you manage to resolve it, though I feel the NF200 will be the >> larger challenge. > > I hope I''ll resolve it, too, but right now I am not convinced that the NF200 is actually the cause of my problems. My gut feeling says that if I can get it working for 5 minutes at a time, something less fundamental than the NF200 PCIe routers are the cause of the problems.I don''t know if I''d be so quick to jump to that conclusion.... I''ll explain :) So the reason I asked about ACS enforcement is because I''m currently trying to pass my Radeon 6990 into a VM. I tried this a while back, but only with ESXi. After futzing with it for a day or two, I had to quit because while I had VT-d, and the ESXi install said Passthrough was supported, I ended up in a "this host requires a reboot before this device can be assigned to a vm" loop of some sort. Hours of investigation revealed that the PEX 8647 (or whatever it is, Google knows :P) which is the PCIe switch built in to the board of the 6990 is *supposed* to support ACS... 
but it's seemingly switched off. I'd love to attempt to flash the chip if anyone can provide guidance. Any fellow nerds care to help destroy---I mean fix! Yeah, fix...---a PCIe chip that requires an NDA to get the tools for... I'm down. Maybe I should email AMD or PLX.

Back to the point though! ;)

So what might intrigue you the most here is that while I'm stuck with a VGA device sitting behind this non-ACS compliant switch... My results are almost identical to yours. Passing one of the VGA devices to the DomU, with or without the corresponding HDMI audio doesn't seem to matter, I get this:

" it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart."

Seriously :P

It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty of "an attempt was made to reset the display adapter and failed" blah blah blah. This happens 100% of the time if I try to boot with both devices attached. The first time I boot it up, the driver isn't installed so it'll work until just before auto-login reaches the desktop, but after that I can't boot at all with both VGA devices attached.

I'd love to explore more, but I'm running out of places to look for solutions to my problem that don't involve my credit card and some new hardware. In a fit of delicious irony, my problem is almost identical to yours---if only I'd bought some cheaper stuff it'd probably all work just great :D

The only single GPU cards I have are the Radeon 5850s in the AMD box I have. I'm just a little reticent to tear the thing apart though cause it gets used a lot.
I think my next step is to look for a video card that properly supports FLR, though I'm considering a hard-hack: think of a 12v relay and a PCIe extender cable---if a D3D0 reset actually powers off the slot momentarily but the PSU plugs on the card prevent it from working, then I could rig up a switch that ties those plugs' power state into the slot itself---it's radical, yes, but possibly the most inventive solution I can think of so far. I'm super curious to see if anyone more knowledgeable than myself thinks it would work, because it'd be super cheap to build! As the saying goes though, I'll "cross that bridge when I come to it." :)

>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes,
>> EVGA have decided in their infinite wisdom to put all 7 PCIe slots
>> behind NF200s, none are directly attached to the Intel NB).
>>
>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to
>> utter a few dozen cuss words a piece.
>>
>> I can believe that. What is the solution, though?
>>
>> The thing that drives me really nuts about the issues I'm seeing
>> (which may or may not be specifically related to the NF200) is that it
>> is so intermittent. It works well enough to boot up and work with a
>> gaming type load for a few minutes. Then something happens that causes
>> the VGA card to require a reset, and it all falls apart.
>>
>> My solution was to buy another motherboard, I had no luck at all
>> passing the devices behind the NF200, and similar to your situation
>> all but one PCIe slot on that board was behind that bridge.
>>
>> Did you not manage to get it working at all? Or was it just
>> intermittent like in my case? I can typically get about 5 minutes of
>> gaming out of my ATI card before it all goes wrong.
>>
>> Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD,
>> but opted to go for broke and get a couple of 6-core Xeons and an
>> EVGA SR-2.
>> It turns out, a solution that is 4x more expensive isn't
>> actually better... :(
>>
>> I was unable to get it working at all. The NF200 simply threw errors
>> that 100% prevented me from passing the device. I think it was missing
>> a number of specific features required for passthrough, and I vaguely
>> remember running lspci -vvv to verify what was missing. Perhaps not all
>> NF200's are created equal?
>
> The only logged issue I had with the NF200s was the lack of ACS, which
> can be disabled as I mentioned on this thread (at least if you are using
> the xm stack). After I disabled that PCI passthrough has been working OK.
> It's just VGA passthrough BSOD-ing after some minutes that is causing me
> problems.

In reading up on the wiki, there does indeed seem to be a lot more info regarding the use of xl and PCI Passthrough today than the last time I looked. It seems that these types of configuration options are set on a domain-by-domain basis, or even by device; docs say that things like VPCI vs direct PASS mapping of slot layout(?) is actually configured at the device level, either in your DomU config file (like: pci = ['0:d:0.0,pci-just-forking-work-damn-you']) or via xl (like: xl pci-attach 1 0:d:0.0 pci-just-forking-work-damn-you).

With that in mind, even though I've taken your advice and added the config info to my xend files, it's entirely possible---especially in light of what Casey said---that I'm just Doing It Wrong(TM). It'd likely be beneficial for us both to compare notes in that regard. If either of you would be willing to help, I could probably use some pointers... I've kinda run out of logs to look at with my current knowledge on the subject :P

>> What about with PCIe devices behind NF200 bridges? I know the NF200s
>> don't support PCI ACS, but that is a security feature (which I have
>> disabled enforcement of to get this far), and AFAIK shouldn't actually
>> affect the basic PCI passthrough capability.
>>
>> Question: how'd you disable ACS? I think it may be causing me some
>> issues.
>>
>> Put:
>>
>> (pci-passthrough-strict-check no)
>> (pci-dev-assign-strict-check no)
>>
>> in /etc/xen/xend-config.sxp
>>
>> If it was causing you issues, however, I'd expect you to find errors
>> in logs pointing at it.
>>
>> As I understand the xend-config.sxp [1] is for the xm toolstack and
>> deprecated Xend service.
>>
>> xm toolstack and xend are what I am using. I have read reports of issues
>> with VGA passthrough using the xl stack so I didn't even attempt to
>> use it.
>>
>> The xm toolstack was deprecated in version 4.1. I read that it had not
>> been updated in months due to a lack of maintainers.
>
> I heard that xl is still feature-incomplete and experimental, and
> problematic with VGA passthrough.
>
>> I did try xm back when I started, the passthrough worked but had the
>> same problems I had when I began testing xl. I have been using xl
>> since then. My logic was simply "why become dependent on a tool that
>> is no-longer maintained and may be removed from the next release?"
>
> I'm not wedded to any particular tool stack, I'm happy to use whatever
> works. But since libvirt and virt-manager are still using xm, and since
> I have seen recent reports of xl being problematic for VGA passthrough
> as well as there being no apparent way to disable ACS requirements with
> the xl stack, that rules it out for me completely at the moment.

The xm stack was rather trying for me. It's like it only wanted to throw errors at me when I did PCI stuff. Whereas xl has seemingly been more than happy to do whatever I tell it. Though I admit chances are pretty good I was just running around, haphazardly using the wrong version of python or something. Given our nearly identical results thus far, I'd wager that the toolstack itself isn't really the source of our problems.
If that's true, though, the easy solution is likely out the window :(

>> Does anyone know whether the xm toolstack has been modified since 4.1
>> to accommodate changes with Xen 4.2? If it has not, it might be worth
>> considering xl.
>
> Does anyone know how to disable the ACS bridge requirement with the xl
> stack?

I'll second that question!

>> Perhaps I am confused, or things changed while I wasn't looking, but
>> for me enabling Xend breaks the xl toolstack. My understanding is it
>> was for the xm toolstack only and deprecated with 4.2. Any chance
>> you can share how you configured it to work? Apparently it is
>> required to get libvirt working, which I also did not know was
>> compatible with Xen 4.2.
>>
>> It is possible I'm the one doing it wrong. I'm on EL6, and using
>> virt-manager (at least for things it is willing to do), and that
>> defaults to the xm stack and xend.
>>
>> For what it's worth, it works for the most part - apart from VGA
>> passthrough crashing within 5 minutes of gaming.
>>
>> If you are using xm then it makes sense, as libvirt seems to require
>> xm/xend to be loaded in order to function.
>>
>> There are more upgrade notes
>> <http://wiki.xen.org/wiki/MigrationGuideToXen4.1%2B#Toolstack_upgrade_notes>
>> about xend now, so that is new to me. According to the Xen Man Pages the
>> xend-config.sxp file doesn't have the flags you added; can you link to
>> resources that mentioned them? I have not seen xl equivalents for your
>> xend configuration, so I guess xm does have some features xl does not still.
>
> This mentions it, among others:
> http://wiki.xen.org/wiki/Xen_PCI_Passthrough
>
> Google for
> xen pci-passthrough-strict-check pci-dev-assign-strict-check
>
> and you should find some relevant things easily enough.
>
> Gordan

Best Regards,
Andrew
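[Editor's sketch] Since the thread keeps returning to whether a card supports FLR, here is a minimal sketch of how the `lspci -vv` check could be automated. It assumes the usual lspci layout, in which the PCIe DevCap block carries an `FLReset+` or `FLReset-` flag and the Advanced Features capability carries `FLR+` or `FLR-`. The function name `flr_support` is hypothetical, and the parsing is a best-effort reading of text output rather than an official interface.

```python
import re

# Device header lines start in column 0 with an address such as "0d:00.0".
_HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Register labels such as "DevCap:" or "AFCap:" open a new section.
_LABEL = re.compile(r"\s+([A-Za-z0-9]+):")
_FLAG = re.compile(r"\b(?:FLReset|FLR)([+-])")


def flr_support(lspci_text):
    """Map each PCI address to True, False, or None (no FLR field seen)."""
    result = {}
    current = None
    section = None
    for line in lspci_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            section = None
            result[current] = None
            continue
        if current is None:
            continue
        label = _LABEL.match(line)
        if label:
            section = label.group(1)
        # Only capability registers count; DevCtl has a similarly named bit.
        if section in ("DevCap", "AFCap"):
            flag = _FLAG.search(line)
            if flag:
                result[current] = result[current] or flag.group(1) == "+"
    return result
```

Feed it the output of `lspci -vv` captured as root, since capability details are typically hidden from unprivileged users. A value of None means no FLR field was found for that device.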
On Fri, 10 May 2013 10:12:07 -0400, Andrew Bobulsky <rulerof@gmail.com> wrote:> Hello Gordan, Casey, > >>> Thanks for posting the results Gordan, unfortunate that it isn''t >>> >>> working as well as we hoped. >>> >>> >>> I haven''t given up _quite_ yet. >>> >>> I discovered yesterday that it _looks liks_ one of my PCIe slots >>> is >>> actually duff (two different GPUs both fail to detect properly in >>> it >>> but work fine in other slots). >>> >>> If it turns out to be a duff slot, there''s no telling what else >>> might be duff on the motherboard and how it might affect various >>> things, even though several days of full load stability testing >>> passed. >>> >>> So some more bare-metal testing seems to be called for - right >>> now I >>> am not prepared to disregard the possibility that maybe I have a >>> hardware issue somewhere that despite EDAC and ECC on everything, >>> remains undetected and unreported in the logs. >>> >>> >>> I hope you manage to resolve it, though I feel the NF200 will be >>> the >>> larger challenge. >> >> I hope I''ll resolve it, too, but right now I am not convinced that >> the NF200 is actually the cause of my problems. My gut feeling says >> that if I can get it working for 5 minutes at a time, something less >> fundamental than the NF200 PCIe routers are the cause of the >> problems. > > I don''t know if I''d be so quick to jump to that conclusion.... I''ll > explain :) > > So the reason I asked about ACS enforcement is because I''m currently > trying to pass my Radeon 6990 into a VM. I tried this a while back, > but only with ESXi. After futzing with it for a day or two, I had to > quit because while I had VT-d, and the ESXi install said Passthrough > was supported, I ended up in a "this host requires a reboot before > this device can be assigned to a vm" loop of some sort. 
> Hours of investigation revealed that the PEX 8647 (or whatever it is,
> Google knows :P) which is the PCIe switch built in to the board of the
> 6990 is *supposed* to support ACS... but it's seemingly switched off.

Two points here:

1) Unlike ESXi 4.1+ (from what I can find), Xen (at least with the xm/xend stack) does allow the ACS requirement to be disabled.

2) I actually have it working - for 5 minutes or so at a time. If the problem was the lack of ACS, it wouldn't work at all.

> So what might intrigue you the most here is that while I'm stuck with
> a VGA device sitting behind this non-ACS compliant switch... My
> results are almost identical to yours. Passing one of the VGA devices
> to the DomU, with or without the corresponding HDMI audio doesn't seem
> to matter, I get this:
>
> " it is so intermittent. It works well enough to boot up and work with
> a gaming type load for a few minutes. Then something happens that
> causes the VGA card to require a reset, and it all falls apart."
>
> Seriously :P

And you are convinced this is to do with the availability of ACS?

> It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty
> of "an attempt was made to reset the display adapter and failed" blah
> blah blah.

Yes, all too familiar.

> This happens 100% of the time if I try to boot with both
> devices attached.

Both devices?

Just out of interest:

1) Are you using a multi-socket motherboard?

2) Have you tried disabling IRQ balancing (noirqbalance kernel parameter + disable irqbalance service)?

3) Are you assigning more than 4GB of RAM to the guest? I found a post in the archive last night mentioning that there's an outstanding qemu issue with more than 4GB of RAM given to the guest. I didn't get around to re-trying the VM with 3.5GB yet.

> The first time I boot it up, the driver isn't
> installed so it'll work until just before auto-login reaches the
> desktop, but after that I can't boot at all with both VGA devices
> attached.
> I'd love to explore more, but I'm running out of places to
> look for solutions to my problem that don't involve my credit card and
> some new hardware. In a fit of delicious irony, my problem is almost
> identical to yours---if only I'd bought some cheaper stuff it'd
> probably all work just great :D

Life on the bleeding edge is hard. :(

The thing that really bugs me is that after a fresh reboot with irq balancing disabled, I can get it working for a few minutes _every time_.

After a few minutes, it'll start corrupting the screen output and eventually try to reset itself (sometimes even claim to succeed a few times), eventually fail and BSOD.

> The only single GPU cards I have are the Radeon 5850s in the AMD box I
> have. I'm just a little reticent to tear the thing apart though cause
> it gets used a lot. I think my next step is to look for a video card
> that properly supports FLR,

As far as I can tell, for all the talk of it - there is NO SUCH THING. Somebody on the list posted lspci -vvv from their ATI FirePro card which shows it has no FLR, and I have just got a Quadro 2000, which also lacks FLR.

The only vague mention I have seen of FLR on GPUs is on the Intel GPU on the very latest generation of Core i CPUs (the built in one). And even if that is true it's not all that useful for gaming.

> though I'm considering a hard-hack: think
> of a 12v relay and a PCIe extender cable---if a D3D0 reset actually
> powers off the slot momentarily but the PSU plugs on the card prevent
> it from working, then I could rig up a switch that ties those plugs'
> power state into the slot itself---it's radical, yes, but possibly the
> most inventive solution I can think of so far. I'm super curious to
> see if anyone more knowledgeable than myself thinks it would work,
> because it'd be super cheap to build! As the saying goes though, I'll
> "cross that bridge when I come to it." :)

Interesting.
In theory, I think this _should_ work provided your PCIe bridges support hot-plugging.

To be certain, you'd have to switch both the PCIe slot and (if your card uses it) the external power inputs.

>>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes,
>>> EVGA have decided in their infinite wisdom to put all 7 PCIe slots
>>> behind NF200s, none are directly attached to the Intel NB).
>>>
>>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to
>>> utter a few dozen cuss words a piece.
>>>
>>> I can believe that. What is the solution, though?
>>>
>>> The thing that drives me really nuts about the issues I'm seeing
>>> (which may or may not be specifically related to the NF200) is that
>>> it is so intermittent. It works well enough to boot up and work with
>>> a gaming type load for a few minutes. Then something happens that
>>> causes the VGA card to require a reset, and it all falls apart.
>>>
>>> My solution was to buy another motherboard, I had no luck at all
>>> passing the devices behind the NF200, and similar to your situation
>>> all but one PCIe slot on that board was behind that bridge.
>>>
>>> Did you not manage to get it working at all? Or was it just
>>> intermittent like in my case? I can typically get about 5 minutes of
>>> gaming out of my ATI card before it all goes wrong.
>>>
>>> Ironically, I was thinking about an Asus Sabertooth with an 8-core
>>> AMD, but opted to go for broke and get a couple of 6-core Xeons and
>>> an EVGA SR-2. It turns out, a solution that is 4x more expensive
>>> isn't actually better... :(
>>>
>>> I was unable to get it working at all. The NF200 simply threw errors
>>> that 100% prevented me from passing the device. I think it was
>>> missing a number of specific features required for passthrough, and I
>>> vaguely remember running lspci -vvv to verify what was missing.
>>> Perhaps not all NF200's are created equal?
>>
>> The only logged issue I had with the NF200s was the lack of ACS, which
>> can be disabled as I mentioned on this thread (at least if you are
>> using the xm stack). After I disabled that PCI passthrough has been
>> working OK. It's just VGA passthrough BSOD-ing after some minutes that
>> is causing me problems.
>
> In reading up on the wiki, there does indeed seem to be a lot more
> info regarding the use of xl and PCI Passthrough today than the last
> time I looked. It seems that these types of configuration options are
> set on a domain-by-domain basis, or even by device; docs say that
> things like VPCI vs direct PASS mapping of slot layout(?) is actually
> configured at the device level either in your DomU config file (like:
> pci = ['0:d:0.0, pci-just-forking-work-damn-you]) or via xl (like: xl
> pci-attach 1 0:d:0.0 pci-just-forking-work-damn-you).

Hmm... I honestly don't think the xl way will succeed where xm is unstable, but I might give it a shot.

> With that in mind, even though I've taken your advice and added the
> config info to my xend files, its entirely possible---especially in
> light of what Casey said---that I'm just Doing It Wrong(TM). It'd
> likely be beneficial for us both to compare notes on that regard. If
> either of you would be willing to help, I could probably use some
> pointers... I've kinda run out of logs to look at with my current
> knowledge on the subject :P

Certainly - what notes do you propose we compare?

>>> What about with PCIe devices behind NF200 bridges? I know the NF200s
>>> don't support PCI ACS, but that is a security feature (which I have
>>> disabled enforcement of to get this far), and AFAIK shouldn't
>>> actually affect the basic PCI passthrough capability.
>>>
>>> Question: how'd you disable ACS? I think it may be causing me some
>>> issues.
>>> >>> Put: >>> >>> (pci-passthrough-strict-check no) >>> (pci-dev-assign-strict-check no) >>> >>> in /etc/xen/xend-config.sxp >>> >>> If it was causing you issues, however, I''d expect you to >>> find >>> errors >>> in logs pointing at it. >>> >>> As I understand the xend-config.sxp [1] is for the xm >>> toolstack and >>> deprecated Xend service. >>> >>> >>> xm toolstack and xend are what I am using. I have read reports of >>> issues >>> with VGA passthrough using the xl stack so I didn''t even attempt >>> to >>> use it. >>> >>> >>> The xm toolstack was deprecated in version 4.1. I read that it had >>> not >>> been updated in months due to a lack of maintainers. >> >> I heard that xl is still feature-incomplete and experimental, and >> problematic with VGA passthrough. >> >>> I did try xm back >>> when I started, the passthrough worked but had the same problems I >>> had >>> when I began testing xl. I have been using xl since then. My >>> logic was >>> simply "why become dependent on a tool that is no-longer maintained >>> and >>> may be removed from the next release?" >> >> I''m not wedded to any particular tool stack, I''m happy to use >> whatever >> works. But since libvirt and virt-manager are still using xm, and >> since >> I have seen recent reports of xl being problematic for VGA >> passthrough >> as well as there being no apparent way to disable ACS requirements >> with >> the xl stack, that rules it out for me completely at the moment. > > The xm stack was rather trying for me. It''s like it only wanted to > throw errors at me when I did PCI stuff. Whereas xl has seemingly > been more than happy to do whatever I tell it. Though I admit > chances > are pretty good I was just running around, haphazardly using the > wrong > version of python or something. Given our nearly identical results > thus far, I''d wager that the toolstack itself isn''t really the source > of our problems. 
> If that's true, though, the easy solution is likely
> out the window :(

What distro do you use?

Gordan
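[Editor's sketch] For the IRQ balancing suggestion above, one quick sanity check is whether the running dom0 kernel was actually booted with the `noirqbalance` parameter named in the message. This is a minimal sketch under that assumption; the helper name `has_kernel_flag` is hypothetical, and it only inspects a command-line string, so whether the separate irqbalance service is stopped still has to be checked by hand.

```python
def has_kernel_flag(cmdline, flag="noirqbalance"):
    """Return True if `flag` appears as a whole token on a kernel command line."""
    # Splitting on whitespace avoids matching the flag inside a longer option.
    return flag in cmdline.split()
```

On Linux the command line of the running kernel can be read from /proc/cmdline, so `has_kernel_flag(open("/proc/cmdline").read())` would report on the current boot.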
On Fri, May 10, 2013 at 10:53 AM, Gordan Bobic <gordan@bobich.net> wrote:> On Fri, 10 May 2013 10:12:07 -0400, Andrew Bobulsky <rulerof@gmail.com> > wrote: > >> Hello Gordan, Casey, >> >> >> Thanks for posting the results Gordan, unfortunate that it isn''t >>>> >>>> working as well as we hoped. >>>> >>>> >>>> I haven''t given up _quite_ yet. >>>> >>>> I discovered yesterday that it _looks liks_ one of my PCIe slots is >>>> actually duff (two different GPUs both fail to detect properly in it >>>> but work fine in other slots). >>>> >>>> If it turns out to be a duff slot, there''s no telling what else >>>> might be duff on the motherboard and how it might affect various >>>> things, even though several days of full load stability testing >>>> passed. >>>> >>>> So some more bare-metal testing seems to be called for - right now I >>>> am not prepared to disregard the possibility that maybe I have a >>>> hardware issue somewhere that despite EDAC and ECC on everything, >>>> remains undetected and unreported in the logs. >>>> >>>> >>>> I hope you manage to resolve it, though I feel the NF200 will be the >>>> larger challenge. >>>> >>> >>> I hope I''ll resolve it, too, but right now I am not convinced that >>> the NF200 is actually the cause of my problems. My gut feeling says >>> that if I can get it working for 5 minutes at a time, something less >>> fundamental than the NF200 PCIe routers are the cause of the problems. >>> >> >> I don''t know if I''d be so quick to jump to that conclusion.... I''ll >> explain :) >> >> So the reason I asked about ACS enforcement is because I''m currently >> trying to pass my Radeon 6990 into a VM. I tried this a while back, >> but only with ESXi. After futzing with it for a day or two, I had to >> quit because while I had VT-d, and the ESXi install said Passthrough >> was supported, I ended up in a "this host requires a reboot before >> this device can be assigned to a vm" loop of some sort. 
>> Hours of investigation revealed that the PEX 8647 (or whatever it is,
>> Google knows :P) which is the PCIe switch built in to the board of the
>> 6990 is *supposed* to support ACS... but it's seemingly switched off.
>
> Two points here:
>
> 1) Unlike ESXi 4.1+ (from what I can find), Xen (at least with the
> xm/xend stack) does allow ACS requirement to be disabled.

Hehe. It's nice to have the option to screw things up, eh? :)

> 2) I actually have it working - for 5 minutes or so at a time. If
> the problem was the lack of ACS, it wouldn't work at all.

I just can't help but wonder if it *is* the problem, though. It's the only thing I can pin down that our situations have in common, in that it's the only "non-compatible" portion of the implementation, aside from the nearly identical behavior, of course. Maybe the AMD driver does some stupid stuff that ACS can mitigate? I just wish I knew more :(

>> So what might intrigue you the most here is that while I'm stuck with
>> a VGA device sitting behind this non-ACS compliant switch... My
>> results are almost identical to yours. Passing one of the VGA devices
>> to the DomU, with or without the corresponding HDMI audio doesn't seem
>> to matter, I get this:
>>
>> " it is so intermittent. It works well enough to boot up and work with
>> a gaming type load for a few minutes. Then something happens that
>> causes the VGA card to require a reset, and it all falls apart."
>>
>> Seriously :P
>
> And you are convinced this is to do with the availability of ACS?

Like I said, it's the only thing that I can pinpoint as being a hindrance to compatibility. I guess my request here is whether anyone can help me determine if that's true?

>> It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty
>> of "an attempt was made to reset the display adapter and failed" blah
>> blah blah.
>
> Yes, all too familiar.
>
>> This happens 100% of the time if I try to boot with both
>> devices attached.
>
> Both devices?

Yes---that is to say both of the VGA controllers from the 6990. The relevant portion of my lspci looks like this: http://pastebin.com/raw.php?i=GwekPNAW

Note: devices 09 and 0a are my "primary" 6990's vga controllers. Also, my crossfire bridge is disconnected. I'm working with the other card, devices 0d and 0e. I've included the USB card as well in the list because I'm using it, but it causes me no problems whatsoever. For what it's worth, that USB card works great in ESXi as well... Highpoint enabled ACS on their PEX chips :D

> Just out of interest:
>
> 1) Are you using a multi-socket motherboard?

Nope! It's a Gigabyte GA-EX58-EXTREME. It's LGA1366 with an i7 920 in it. VT-d support is provided through a hacked BIOS image that I found on the web a couple years or so ago.

> 2) Have you tried disabling IRQ balancing
> (noirqbalance kernel parameter + disable irqbalance service)?

No clue what that is. Can you provide any direction? I'd be happy to test.

> 3) Are you assigning > 4GB of RAM to the guest? I found a post
> in the archive last night mentioning that there's an outstanding qemu
> issue with > 4GB of RAM given to the guest. I didn't get around to
> re-trying the VM with 3.5GB yet.

Yes sir. It's got 8 GB + 1 GB for the standard video adapter. Not sure if that's improper, but it boots just fine with a single card, and the 5850 I plugged in for a short while seemed well behaved. Here's a copy of my vm config file: http://pastebin.com/bX0ayA0u

>> The first time I boot it up, the driver isn't
>> installed so it'll work until just before auto-login reaches the
>> desktop, but after that I can't boot at all with both VGA devices
>> attached. I'd love to explore more, but I'm running out of places to
>> look for solutions to my problem that don't involve my credit card and
>> some new hardware.
>> In a fit of delicious irony, my problem is almost
>> identical to yours---if only I'd bought some cheaper stuff it'd
>> probably all work just great :D
>
> Life on the bleeding edge is hard. :(
> The thing that really bugs me is that after a fresh reboot with irq
> balancing disabled, I can get it working for a few minutes _every time_.
>
> After a few minutes, it'll start corrupting the screen output and
> eventually try to reset itself (sometimes even claim to succeed a few
> times), eventually fail and BSOD.

The only corrupted output I've seen is during a BSOD itself---which was once on Server 2012---and again I saw some black lines when I zoomed in with Chrome on a Win7 guest. I'm not entirely convinced that the black lines were a symptom of Xen/Radeon/Whatever versus just being a goofy Chrome bug.

>> The only single GPU cards I have are the Radeon 5850s in the AMD box I
>> have. I'm just a little reticent to tear the thing apart though cause
>> it gets used a lot. I think my next step is to look for a video card
>> that properly supports FLR,
>
> As far as I can tell, for all the talk of it - there is NO SUCH THING.
> Somebody on the list posted lspci -vvv from their ATI FirePro card
> which shows it has no FLR, and I have just got a Quadro 2000, which also
> lacks FLR.
>
> The only vague mention I have seen of FLR on GPUs is on the Intel GPU on
> the very latest generation of Core i CPUs (the built in one). And even
> if that is true it's not all that useful for gaming.

Heh. The crappiest GPU that would ever be in my system is the most compatible? Good grief. :P

>> though I'm considering a hard-hack: think
>> of a 12v relay and a PCIe extender cable---if a D3D0 reset actually
>> powers off the slot momentarily but the PSU plugs on the card prevent
>> it from working, then I could rig up a switch that ties those plugs'
>> power state into the slot itself---it's radical, yes, but possibly the
>> most inventive solution I can think of so far.
>> I'm super curious to
>> see if anyone more knowledgeable than myself thinks it would work,
>> because it'd be super cheap to build! As the saying goes though, I'll
>> "cross that bridge when I come to it." :)
>
> Interesting. In theory, I think this _should_ work provided your PCIe
> bridges support hot-plugging.
>
> To be certain, you'd have to switch both the PCIe slot and (if your card
> uses it) the external power inputs.

That'd be the idea. Assuming it works the way I think it does, I could tap a 12v (I'm pretty sure it's 12v in there) relay into the Vcc and GND pins of the PCIe slot and use the relay's output to switch the Vcc from the plug-in cables off of the PSU. Bears testing with a slightly less expensive card, but I wouldn't be surprised to see it work! It'd require some case modding for sure though, as the extension cable will get in the way of properly seating the card. It could be possible to build a tap that could be "slipped in" to a card's PCIe slot...

Short of proper FLR support, this could actually very cheaply be built into the expansion card itself. I'd suspect that simply adding FLR would be cheaper on the card manufacturers though. :)

>>>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes,
>>>> EVGA have decided in their infinite wisdom to put all 7 PCIe slots
>>>> behind NF200s, none are directly attached to the Intel NB).
>>>>
>>>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to
>>>> utter a few dozen cuss words a piece.
>>>>
>>>> I can believe that. What is the solution, though?
>>>>
>>>> The thing that drives me really nuts about the issues I'm seeing
>>>> (which may or may not be specifically related to the NF200) is that
>>>> it is so intermittent. It works well enough to boot up and work with
>>>> a gaming type load for a few minutes. Then something happens that
>>>> causes the VGA card to require a reset, and it all falls apart.
>>>> >>>> My solution was to buy another motherboard, I had no luck at all >>>> passing the devices behind the NF200, and similar to your >>>> situation >>>> all but one PCIe slot on that board was behind that bridge. >>>> >>>> >>>> Did you not manage to get it working at all? Or was it just >>>> intermittent like in my case? I can typically get about 5 minutes of >>>> gaming out of my ATI card before it all goes wrong. >>>> >>>> Ironically, I was thinking about an Asus Sabertooth with an 8-core >>>> AMD, >>>> but opted to go for broke and get a couple of 6-core Xeons and an >>>> EVGA SR-2. It turns out, a solution that is 4x more expensive isn''t >>>> actually better... :( >>>> >>>> >>>> I was unable to get it working at all. The NF200 simply threw errors >>>> that 100% prevented me from passing the device. I think it was missing >>>> a number of specific features required for passthrough, and I vaguely >>>> remember running lspci -vvv to verify what was missing. Perhaps not all >>>> NF200''s are created equal? >>>> >>> >>> The only logged issue I had with the NF200s was the lack of ACS, which >>> can be disabled as I mentioned on this thread (at least if you are using >>> the xm stack). After I disabled that PCI passthrough has been working OK. >>> It''s just VGA passthrough BSOD-ing after some minutes that is causing me >>> problems. >>> >> >> In reading up on the wiki, there does indeed seem to be a lot more >> info regarding the use of xl and PCI Passthrough today than the last >> time I looked. It seems that these types of configuration options are >> set on a domain-by-domain basis, or even by device; docs say that >> things like VPCI vs direct PASS mapping of slot layout(?) is actually >> configured at the device level either in your DomU config file (like: >> pci = [''0:d:0.0, pci-just-forking-work-damn-**you]) or via xl (like: xl >> pci-attach 1 0:d:0.0 pci-just-forking-work-damn-**you). >> > > Hmm... 
I honestly don''t think the xl way will succeed where xm is unstable, > but I might give it a shot.You''d still likely require all the "hacks" you''re currently using, but they''ll all move to different places I''m guessing... if the toolstack itself doesn''t have any bearing on this (which is my suspicion) then you don''t want to go doing all the extra work for nothing, of course! With that in mind, even though I''ve taken your advice and added the>> config info to my xend files, its entirely possible---especially in >> light of what Casey said---that I''m just Doing It Wrong(TM). It''d >> likely be beneficial for us both to compare notes on that regard. If >> either of you would be willing to help, I could probably use some >> pointers... I''ve kinda run out of logs to look at with my current >> knowledge on the subject :P >> > > Certainly - what notes do you propose we compare?I''m not completely sure. If you can point me to the proper files to verify that my device has the same PCIe-level compatibility issues as yours (verify that ACS isn''t available to the device and so on) then I''d call that a step in the right direction. What about with PCIe devices behind NF200 bridges? I know the>>>> NF200s >>>> don''t support PCI ACS, but that is a security feature (which I >>>> have >>>> disabled enforcement of to get this far), and AFAIK shouldn''t >>>> actually >>>> affect the basic PCI passthrough capability. >>>> >>>> Question: how''d you disable ACS? I think it may be causing me >>>> some >>>> issues. >>>> >>>> Put: >>>> >>>> (pci-passthrough-strict-check no) >>>> (pci-dev-assign-strict-check no) >>>> >>>> in /etc/xen/xend-config.sxp >>>> >>>> If it was causing you issues, however, I''d expect you to find >>>> errors >>>> in logs pointing at it. >>>> >>>> As I understand the xend-config.sxp [1] is for the xm toolstack >>>> and >>>> deprecated Xend service. >>>> >>>> >>>> xm toolstack and xend are what I am using. 
I have read reports of >>>> issues >>>> with VGA passthrough using the xl stack so I didn''t even attempt to >>>> use it. >>>> >>>> >>>> The xm toolstack was deprecated in version 4.1. I read that it had not >>>> been updated in months due to a lack of maintainers. >>>> >>> >>> I heard that xl is still feature-incomplete and experimental, and >>> problematic with VGA passthrough. >>> >>> I did try xm back >>>> when I started, the passthrough worked but had the same problems I had >>>> when I began testing xl. I have been using xl since then. My logic was >>>> simply "why become dependent on a tool that is no-longer maintained and >>>> may be removed from the next release?" >>>> >>> >>> I''m not wedded to any particular tool stack, I''m happy to use whatever >>> works. But since libvirt and virt-manager are still using xm, and since >>> I have seen recent reports of xl being problematic for VGA passthrough >>> as well as there being no apparent way to disable ACS requirements with >>> the xl stack, that rules it out for me completely at the moment. >>> >> >> The xm stack was rather trying for me. It''s like it only wanted to >> throw errors at me when I did PCI stuff. Whereas xl has seemingly >> been more than happy to do whatever I tell it. Though I admit chances >> are pretty good I was just running around, haphazardly using the wrong >> version of python or something. Given our nearly identical results >> thus far, I''d wager that the toolstack itself isn''t really the source >> of our problems. If that''s true, though, the easy solution is likely >> out the window :( >> > > What distro do you use?<snip> Currently running Debian Squeeze 6.0.7 x86_64, with Linux kernel 3.4.44. Cheers, Andrew _______________________________________________ Xen-users mailing list Xen-users@lists.xen.org http://lists.xen.org/xen-users
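On the question above of how to verify whether a device or bridge advertises ACS: a minimal sketch, assuming you have saved the output of `lspci -vvv -s <slot>` (run as root) for the bridge in question to a string. The function name `acs_capability` is made up for illustration. It only looks for the "Access Control Services" capability header and the `ACSCap:` line that lspci prints beneath it; a `None` result can also mean lspci was not run as root and could not read the extended capabilities.

```python
import re


def acs_capability(lspci_block):
    """Return the ACSCap flag string for one device's `lspci -vvv` text.

    Returns None when no "Access Control Services" capability is listed
    (which is also what you get if lspci could not read the extended
    capabilities, e.g. when not run as root).
    """
    if "Access Control Services" not in lspci_block:
        return None
    match = re.search(r"ACSCap:\s*(.+)", lspci_block)
    # Capability header present but no ACSCap line: report an empty string.
    return match.group(1).strip() if match else ""
```

Feeding it the block for each bridge between the root port and the card should show which hop, if any, lacks ACS.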
On 05/10/2013 06:54 PM, Andrew Bobulsky wrote:> Two points here: > > 1) Unlike ESXi 4.1+ (from what I can find), Xen (at least with the > xm/xend stack does allow ACS requirement to be disabled. > > > Hehe. It''s nice to have the option to screw things up, eh? :)Personally, I really dislike too much cleverness from software. While I understand auto-detection is handy for Ubuntu users, I want there to be a way to override things if I need to, without extensive source code modifying. I like there to be a way to tell whatever you are using to quit holding your hand and just do as it''s damn well told.> 2) I actually have it working - for 5 minutes or so at a time. If > the problem was the lack of ACS, it wouldn''t work at all. > > > I just can''t help but wonder if it /is/ the problem, though. It''s the > only thing I can pin down that our situations have in common as far as > its being the only "non-compatible" portion of the implementation, aside > from the nearly identical behavior, of course. Maybe the AMD driver does > some stupid stuff that ACS can mitigate? I just wish I knew more :(Now you got me thinking... I noticed that when the GPU starts to head toward the crash, this appears in the syslog: May 6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000 It certainly makes me wonder. Has anyone else seen this error? The device ID in question is: 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22) which does not bode well... Duff hardware?> So what might intrigue you the most here is that while I''m stuck > with > a VGA device sitting behind this non-ACS compliant switch... My > results are almost identical to yours. Passing one of the VGA > devices > to the DomU, with or without the corresponding HDMI audio > doesn''t seem > to matter, I get this: > > " it is so intermittent. It works well enough to boot up and > work with > a gaming type load for a few minutes. 
> Then something happens that causes the VGA card to require a reset, and it all falls apart."
>
> Seriously :P
>
> And you are convinced this is to do with the availability of ACS?
>
> Like I said, it's the only thing that I can pinpoint as being a hindrance to compatibility. I guess my request here is if anyone can help me determine whether or not that's true?

What motherboard are you using? Has anyone successfully used it for VGA passthrough? I don't think the possibility of both of us having similarly duff hardware has been systematically excluded yet.

> It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty of "an attempt was made to reset the display adapter and failed" blah blah blah.
>
> Yes, all too familiar.
>
> This happens 100% of the time if I try to boot with both devices attached.
>
> Both devices?
>
> Yes---that is to say both of the VGA controllers from the 6990. The relevant portion of my lspci looks like this:
> http://pastebin.com/raw.php?i=GwekPNAW

OK, I get it. I seem to remember reading in the archives that dual VGA passthrough is problematic (my experience over the years shows that multiple GPUs are a false economy of highly questionable benefit).

> Note: devices 09 and 0a are my "primary" 6990's VGA controllers. Also, my crossfire bridge is disconnected. I'm working with the other card, devices 0d and 0e. I've included the USB card as well in the list because I'm using it, but it causes me no problems whatsoever. For what it's worth, that USB card works great in ESXi as well... Highpoint enabled ACS on their PEX chips :D
>
> Just out of interest:
>
> 1) Are you using a multi-socket motherboard?
>
> Nope! It's a Gigabyte GA-EX58-EXTREME. It's LGA1366 with an i7 920 in it.
> VT-d support is provided through a hacked BIOS image that I found on the web a couple years or so ago.

Having to use a hacked BIOS for VT-d support is not a good sign or a good starting point...

> 2) Have you tried disabling IRQ balancing (noirqbalance kernel parameter + disable irqbalance service)?
>
> No clue what that is. Can you provide any direction? I'd be happy to test.

In your boot loader, find the kernel and xen lines and add:

On the xen line:
noirqbalance

On the dom0 kernel line:
noirqbalance

> 3) Are you assigning > 4GB of RAM to the guest? I found a post in the archive last night mentioning that there's an outstanding qemu issue with > 4GB of RAM given to the guest. I didn't get around to re-trying the VM with 3.5GB yet.
>
> Yes sir. It's got 8 GB + 1 GB for the standard video adapter. Not sure if that's improper, but it boots just fine with a single card, and the 5850 I plugged in for a short while seemed well behaved. Here's a copy of my vm config file: http://pastebin.com/bX0ayA0u

I think reducing the guest RAM to 3.5GB is worth a shot, along with only passing a single GPU device.

> The first time I boot it up, the driver isn't installed so it'll work until just before auto-login reaches the desktop, but after that I can't boot at all with both VGA devices attached. I'd love to explore more, but I'm running out of places to look for solutions to my problem that don't involve my credit card and some new hardware. In a fit of delicious irony, my problem is almost identical to yours---if only I'd bought some cheaper stuff it'd probably all work just great :D
>
> Life on the bleeding edge is hard. :(
> The thing that really bugs me is that after a fresh reboot with irq balancing disabled, I can get it working for a few minutes _every time_.
> > After a few minutes, it''ll start corrupting the screen output and > eventually try to reset itself (sometimes even claim to succeed a few > times), eventually fail and BSOD. > > > The only corrupted output I''ve seen is during a BSOD itself---which was > once on Server 2012---and again I saw some black lines when I zoomed in > with Chrome on a Win7 guest. I''m not entirely convinced that the black > lines were a symptom of Xen/Radeon/Whatever versus just being a goofy > Chrome bug.I''m seeing white lines, both with the Radeon 6450 and the Quadro 2000.> The only single GPU cards I have are the Radeon 5850s in the AMD > box I > have. I''m just a little reticent to tear the thing apart though > cause > it gets used a lot. I think my next step is to look for a video > card > that properly supports FLR, > > > As far as I can tell, for all the talk of it - there is NO SUCH THING. > Somebody on the list posted lspci -vvv from their ATI FirePro card > which shows it has no FLR, and I have just got a Quadro 2000, which also > lacks FLR. > > The only vague mention I have seen of FLR on GPUs is on the Intel GPU on > the very latest generation of Core i CPUs (the built in one). And even > if that is true it''s not all that useful for gaming. > > > Heh. The crappiest GPU that would ever be in my system is the most > compatible? Good grief. :PI''m not sure about compatible, but it seems to have a feature that the others don''t - then again, take that with a pinch of salt - I don''t have one, and I tend not to believe such things until somebody shows me the lspci dump that proves it.> though I''m considering a hard-hack: think > of a 12v relay and a PCIe extender cable---if a D3D0 reset actually > powers off the slot momentarily but the PSU plugs on the card > prevent > it from working, then I could rig up a switch that ties those plugs'' > power state into the slot itself---it''s radical, yes, but > possibly the > most inventive solution I can think of so far. 
I''m super curious to > see if anyone more knowledgeable than myself thinks it would work, > because it''d be super cheap to build! As the saying goes > though, I''ll > "cross that bridge when I come to it." :) > > > Interesting. In theory, I think this _should_ work provider your PCIe > bridges support hot-plugging. > > To be certain, you''d have to switch both the PCIe slot and (if your card > uses it) the external power inputs. > > > That''d be the idea. Assuming it works the way I think it does, I could > tap a 12v (I''m pretty sure it''s 12v in there) relay into the Vcc and GND > pins of the PCIe slot and use the relay''s output to switch the Vcc from > the plug-in cables off of the PSU. Bears testing with a slightly less > expensive card, but I wouldn''t be surprised to see it work! It''d > require some case modding for sure though, as the extension cable will > get in the way of properly seating the card. It could be possible to > build a tap that could be "slipped in" to a card''s PCIe slot... Short > of proper FLR support, this could actually very cheaply be built into > the expansion card itself. I''d suspect that simply adding FLR would be > cheaper on the card manufacturers though. :)Just get a case with more slot cutouts on the back than your motherboard has slots. Then feed the ribbon to the bottom so the card sits in the slot on the case that is below your motherboard - no modding required. :)> 2) My motherboard''s PCIe slots are behind > NF200 PCIe bridges > (yes, > EVGA have decided in their infinite wisdom to put > all 7 PCIe slots > behind NF200s, none are directly attached to the > Intel NB). > > I''m so sorry :P. NF200 has probably caused a > lot of xen > tinkerers to > utter a few dozen cuss words a piece. > > I can believe that. What is the solution, though? > > The thing that drives me really nuts about the > issues I''m seeing > (which may or may not be specifically related to > the NF200) is > that it > is so intermittent. 
It works well enough to boot > up and work with a > gaming type load for a few minutes. Then > something happens that > causes > the VGA card to require a reset, and it all falls > apart. > > My solution was to buy another motherboard, I had > no luck at all > passing the devices behind the NF200, and similar > to your situation > all but one PCIe slot on that board was behind > that bridge. > > > Did you not manage to get it working at all? Or was > it just > intermittent like in my case? I can typically get > about 5 minutes of > gaming out of my ATI card before it all goes wrong. > > Ironically, I was thinking about an Asus Sabertooth > with an 8-core AMD, > but opted to go for broke and get a couple of 6-core > Xeons and an > EVGA SR-2. It turns out, a solution that is 4x more > expensive isn''t > actually better... :( > > > I was unable to get it working at all. The NF200 simply > threw errors > that 100% prevented me from passing the device. I think > it was missing > a number of specific features required for passthrough, > and I vaguely > remember running lspci -vvv to verify what was missing. > Perhaps not all > NF200''s are created equal? > > > The only logged issue I had with the NF200s was the lack of > ACS, which > can be disabled as I mentioned on this thread (at least if > you are using > the xm stack). After I disabled that PCI passthrough has > been working OK. > It''s just VGA passthrough BSOD-ing after some minutes that > is causing me > problems. > > > In reading up on the wiki, there does indeed seem to be a lot more > info regarding the use of xl and PCI Passthrough today than the last > time I looked. It seems that these types of configuration > options are > set on a domain-by-domain basis, or even by device; docs say that > things like VPCI vs direct PASS mapping of slot layout(?) 
is > actually > configured at the device level either in your DomU config file > (like: > pci = [''0:d:0.0, pci-just-forking-work-damn-__you]) or via xl > (like: xl > pci-attach 1 0:d:0.0 pci-just-forking-work-damn-__you). > > > Hmm... I honestly don''t think the xl way will succeed where xm is > unstable, > but I might give it a shot. > > > You''d still likely require all the "hacks" you''re currently using, but > they''ll all move to different places I''m guessing... if the toolstack > itself doesn''t have any bearing on this (which is my suspicion) then you > don''t want to go doing all the extra work for nothing, of course!Exactly. And right now what I have read (somebody point me to something that says otherwise), more people seem to have reported success with xm than xl stacks (but that could just be due to the xl stack being much more recent).> With that in mind, even though I''ve taken your advice and added the > config info to my xend files, its entirely possible---especially in > light of what Casey said---that I''m just Doing It Wrong(TM). It''d > likely be beneficial for us both to compare notes on that > regard. If > either of you would be willing to help, I could probably use some > pointers... I''ve kinda run out of logs to look at with my current > knowledge on the subject :P > > > Certainly - what notes do you propose we compare? > > > I''m not completely sure. If you can point me to the proper files to > verify that my device has the same PCIe-level compatibility issues as > yours (verify that ACS isn''t available to the device and so on) then I''d > call that a step in the right direction.Another thing - Do "lspci -vt" - can you put the card in a slot where it doesn''t share a bridge with any other PCIe devices?> What about with PCIe devices behind NF200 > bridges? 
I know the > NF200s > don''t support PCI ACS, but that is a security > feature (which I have > disabled enforcement of to get this far), and > AFAIK shouldn''t > actually > affect the basic PCI passthrough capability. > > Question: how''d you disable ACS? I think it > may be causing me > some > issues. > > Put: > > (pci-passthrough-strict-check no) > (pci-dev-assign-strict-check no) > > in /etc/xen/xend-config.sxp > > If it was causing you issues, however, I''d > expect you to find > errors > in logs pointing at it. > > As I understand the xend-config.sxp [1] is for > the xm toolstack and > deprecated Xend service. > > > xm toolstack and xend are what I am using. I have > read reports of issues > with VGA passthrough using the xl stack so I didn''t > even attempt to > use it. > > > The xm toolstack was deprecated in version 4.1. I read > that it had not > been updated in months due to a lack of maintainers. > > > I heard that xl is still feature-incomplete and > experimental, and problematic with VGA passthrough. > > I did try xm back > when I started, the passthrough worked but had the same > problems I had > when I began testing xl. I have been using xl since > then. My logic was > simply "why become dependent on a tool that is no-longer > maintained and > may be removed from the next release?" > > > I''m not wedded to any particular tool stack, I''m happy to > use whatever > works. But since libvirt and virt-manager are still using > xm, and since > I have seen recent reports of xl being problematic for VGA > passthrough > as well as there being no apparent way to disable ACS > requirements with > the xl stack, that rules it out for me completely at the moment. > > > The xm stack was rather trying for me. It''s like it only wanted to > throw errors at me when I did PCI stuff. Whereas xl has seemingly > been more than happy to do whatever I tell it. 
> Though I admit chances are pretty good I was just running around, haphazardly using the wrong version of python or something. Given our nearly identical results thus far, I'd wager that the toolstack itself isn't really the source of our problems. If that's true, though, the easy solution is likely out the window :(
>
> What distro do you use?
>
> <snip>
>
> Currently running Debian Squeeze 6.0.7 x86_64, with Linux kernel 3.4.44.

OK, that's a useful reference point. I'm on EL6 using 3.8.10 (will be upgrading to 3.8.12 tonight).

Gordan
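For anyone wanting to check their own logs for the AER message quoted earlier in this message, here is a minimal sketch, assuming the kernel logs AER events in the same shape as the single line shown above (the wording may differ between kernel versions). The names `aer_events` and `count_by_port` are invented for illustration, and the log path in the usage note is an assumption about your distro.

```python
import re
from collections import Counter

# Pattern modelled on the one syslog line quoted in this thread:
#   pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000
AER_LINE = re.compile(
    r"pcieport (?P<port>[0-9a-fA-F:.]+): AER: "
    r"(?P<message>.+?) error received: id=(?P<id>[0-9a-fA-F]+)"
)


def aer_events(log_lines):
    """Return (port, message, id) tuples for every AER line found."""
    events = []
    for line in log_lines:
        match = AER_LINE.search(line)
        if match:
            events.append(
                (match.group("port"), match.group("message"), match.group("id"))
            )
    return events


def count_by_port(events):
    """Count AER events per reporting PCIe port."""
    return Counter(port for port, _message, _id in events)
```

Something like `count_by_port(aer_events(open("/var/log/syslog")))` (or `/var/log/messages` on EL6) would show whether the errors always come from the same root port, which is the thing worth comparing between our boxes.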
> > > 2) Have you tried disabling IRQ balancing >> (noirqbalance kernel parameter + disable irqbalance service)? >> >> >> No clue what that is. Can you provide any direction? I''d be happy to >> test. >> > > In your boot loader, find the kernel and xen lines and add: > > On the xen line: > noirqbalance > > On the dom0 kernel line: > noirqbalance > >How would removing noirqbalance help fix the problem? Just curious; as I understand it that tool is used to balance requests like a scheduler of sorts.> > 3) Are you assigning > 4GB of RAM to the guest? I found a post >> in the archive last night mentioning that there''s an outstanding qemu >> issue with > 4GB of RAM given to the guest. I didn''t get around to >> re-trying the VM with 3.5GB yet. >> >> >> Yes sir. It''s got 8 GB + 1 GB for the standard video adapter. Not sure >> if that''s improper, but it boots just find with a single card, and the >> 5850 I plugged in for a short while seemed well behaved. Here''s a copy >> of my vm config file: http://pastebin.com/bX0ayA0u >> > > I think reducing the guest RAM to 3.5GB is worth a shot, along with only > passing a single GPU device. > >If I recall the RAM limit is specific to PV guests or older versions of Xen. I have run Windows with 4, 6, 8, and 16GB of RAM without ever encountering this problem, and this includes tests with the xl toolstack on Xen 4.1.2.> > The only single GPU cards I have are the Radeon 5850s in the AMD >> box I >> have. I''m just a little reticent to tear the thing apart though >> cause >> it gets used a lot. I think my next step is to look for a video >> card >> that properly supports FLR, >> >> >> As far as I can tell, for all the talk of it - there is NO SUCH THING. >> Somebody on the list posted lspci -vvv from their ATI FirePro card >> which shows it has no FLR, and I have just got a Quadro 2000, which >> also >> lacks FLR. 
>> >> The only vague mention I have seen of FLR on GPUs is on the Intel GPU >> on >> the very latest generation of Core i CPUs (the built in one). And even >> if that is true it''s not all that useful for gaming. >> >> >> Heh. The crappiest GPU that would ever be in my system is the most >> compatible? Good grief. :P >> > > I''m not sure about compatible, but it seems to have a feature that the > others don''t - then again, take that with a pinch of salt - I don''t have > one, and I tend not to believe such things until somebody shows me the > lspci dump that proves it. > >Where did you find mention of the newer integrated graphics supporting FLR? I have an IvyBridge 3770 with an HD4000, but when I ran lspci -vv and -vvv I did not see FLReset+, but maybe I did something incorrectly as I also did not see any mention of FLReset anywhere? If the Ivybridge integrated has FLReset I would totally want to test it. It may not be a powerful chip compared to modern discrete cards, and it won''t prove that the lack of FLR is the cause of our AMD/nVidia problems, but it would show the effect the presence of FLR has.> > 2) My motherboard''s PCIe slots are behind >> NF200 PCIe bridges >> (yes, >> EVGA have decided in their infinite wisdom to put >> all 7 PCIe slots >> behind NF200s, none are directly attached to the >> Intel NB). >> >> I''m so sorry :P. NF200 has probably caused a >> lot of xen >> tinkerers to >> utter a few dozen cuss words a piece. >> >> I can believe that. What is the solution, though? >> >> The thing that drives me really nuts about the >> issues I''m seeing >> (which may or may not be specifically related to >> the NF200) is >> that it >> is so intermittent. It works well enough to boot >> up and work with a >> gaming type load for a few minutes. Then >> something happens that >> causes >> the VGA card to require a reset, and it all falls >> apart. 
>> >> My solution was to buy another motherboard, I had >> no luck at all >> passing the devices behind the NF200, and similar >> to your situation >> all but one PCIe slot on that board was behind >> that bridge. >> >> >> Did you not manage to get it working at all? Or was >> it just >> intermittent like in my case? I can typically get >> about 5 minutes of >> gaming out of my ATI card before it all goes wrong. >> >> Ironically, I was thinking about an Asus Sabertooth >> with an 8-core AMD, >> but opted to go for broke and get a couple of 6-core >> Xeons and an >> EVGA SR-2. It turns out, a solution that is 4x more >> expensive isn''t >> actually better... :( >> >> >> I was unable to get it working at all. The NF200 simply >> threw errors >> that 100% prevented me from passing the device. I think >> it was missing >> a number of specific features required for passthrough, >> and I vaguely >> remember running lspci -vvv to verify what was missing. >> Perhaps not all >> NF200''s are created equal? >> >> >> The only logged issue I had with the NF200s was the lack of >> ACS, which >> can be disabled as I mentioned on this thread (at least if >> you are using >> the xm stack). After I disabled that PCI passthrough has >> been working OK. >> It''s just VGA passthrough BSOD-ing after some minutes that >> is causing me >> problems. >> >> >> In reading up on the wiki, there does indeed seem to be a lot more >> info regarding the use of xl and PCI Passthrough today than the >> last >> time I looked. It seems that these types of configuration >> options are >> set on a domain-by-domain basis, or even by device; docs say that >> things like VPCI vs direct PASS mapping of slot layout(?) is >> actually >> configured at the device level either in your DomU config file >> (like: >> pci = [''0:d:0.0, pci-just-forking-work-damn-__**you]) or via xl >> (like: xl >> pci-attach 1 0:d:0.0 pci-just-forking-work-damn-__**you). >> >> >> >> Hmm... 
>> I honestly don't think the xl way will succeed where xm is unstable, but I might give it a shot.
>>
>> You'd still likely require all the "hacks" you're currently using, but they'll all move to different places I'm guessing... if the toolstack itself doesn't have any bearing on this (which is my suspicion) then you don't want to go doing all the extra work for nothing, of course!
>
> Exactly. And right now what I have read (somebody point me to something that says otherwise), more people seem to have reported success with xm than xl stacks (but that could just be due to the xl stack being much more recent).

I would go as far as to say that most of those reports came from people who used the packaged Xen, and until very recently the packaged Xen was 4.0 or 4.1 where xm is still the default toolstack.
On Fri, May 10, 2013 at 3:42 PM, Casey DeLorme <cdelorme@gmail.com> wrote:> >> 2) Have you tried disabling IRQ balancing >>> (noirqbalance kernel parameter + disable irqbalance service)? >>> >>> >>> No clue what that is. Can you provide any direction? I''d be happy to >>> test. >>> >> >> In your boot loader, find the kernel and xen lines and add: >> >> On the xen line: >> noirqbalance >> >> On the dom0 kernel line: >> noirqbalance >> >> > How would removing noirqbalance help fix the problem? Just curious; as I > understand it that tool is used to balance requests like a scheduler of > sorts. > > >> >> 3) Are you assigning > 4GB of RAM to the guest? I found a post >>> in the archive last night mentioning that there''s an outstanding qemu >>> issue with > 4GB of RAM given to the guest. I didn''t get around to >>> re-trying the VM with 3.5GB yet. >>> >>> >>> Yes sir. It''s got 8 GB + 1 GB for the standard video adapter. Not sure >>> if that''s improper, but it boots just find with a single card, and the >>> 5850 I plugged in for a short while seemed well behaved. Here''s a copy >>> of my vm config file: http://pastebin.com/bX0ayA0u >>> >> >> I think reducing the guest RAM to 3.5GB is worth a shot, along with only >> passing a single GPU device. >> >> > If I recall the RAM limit is specific to PV guests or older versions of > Xen. I have run Windows with 4, 6, 8, and 16GB of RAM without ever > encountering this problem, and this includes tests with the xl toolstack on > Xen 4.1.2. > > >> >> The only single GPU cards I have are the Radeon 5850s in the AMD >>> box I >>> have. I''m just a little reticent to tear the thing apart though >>> cause >>> it gets used a lot. I think my next step is to look for a video >>> card >>> that properly supports FLR, >>> >>> >>> As far as I can tell, for all the talk of it - there is NO SUCH >>> THING. 
>>> Somebody on the list posted lspci -vvv from their ATI FirePro card >>> which shows it has no FLR, and I have just got a Quadro 2000, which >>> also >>> lacks FLR. >>> >>> The only vague mention I have seen of FLR on GPUs is on the Intel >>> GPU on >>> the very latest generation of Core i CPUs (the built in one). And >>> even >>> if that is true it''s not all that useful for gaming. >>> >>> >>> Heh. The crappiest GPU that would ever be in my system is the most >>> compatible? Good grief. :P >>> >> >> I''m not sure about compatible, but it seems to have a feature that the >> others don''t - then again, take that with a pinch of salt - I don''t have >> one, and I tend not to believe such things until somebody shows me the >> lspci dump that proves it. >> >> > Where did you find mention of the newer integrated graphics supporting > FLR? I have an IvyBridge 3770 with an HD4000, but when I ran lspci -vv and > -vvv I did not see FLReset+, but maybe I did something incorrectly as I > also did not see any mention of FLReset anywhere? >That one got me too for a minute. You gotta run the lspci -vv[v] as root in order to see that detail. 
Doing a sudo lspci -vv gets me this, note the DevCap field at the end of the list (which isn't full output for the device, just to show of course):

0e:00.0 Display controller: ATI Technologies Inc Device 671d
    Subsystem: ATI Technologies Inc Device 1b2a
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 7
    Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at fbcc0000 (64-bit, non-prefetchable) [size=128K]
    Region 4: I/O ports at be00 [size=256]
    [virtual] Expansion ROM at fbc00000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-

I'm still configuring and responding to Gordan, but figured you could use a quick answer just in case you weren't aware of the root req for reading pci features.

> If the Ivybridge integrated has FLReset I would totally want to test it. It may not be a powerful chip compared to modern discrete cards, and it won't prove that the lack of FLR is the cause of our AMD/nVidia problems, but it would show the effect the presence of FLR has.
>
>>> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>>>
>>> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words a piece.
>>>
>>> I can believe that. What is the solution, though?
>>> >>> The thing that drives me really nuts about the >>> issues I''m seeing >>> (which may or may not be specifically related to >>> the NF200) is >>> that it >>> is so intermittent. It works well enough to boot >>> up and work with a >>> gaming type load for a few minutes. Then >>> something happens that >>> causes >>> the VGA card to require a reset, and it all falls >>> apart. >>> >>> My solution was to buy another motherboard, I had >>> no luck at all >>> passing the devices behind the NF200, and similar >>> to your situation >>> all but one PCIe slot on that board was behind >>> that bridge. >>> >>> >>> Did you not manage to get it working at all? Or was >>> it just >>> intermittent like in my case? I can typically get >>> about 5 minutes of >>> gaming out of my ATI card before it all goes wrong. >>> >>> Ironically, I was thinking about an Asus Sabertooth >>> with an 8-core AMD, >>> but opted to go for broke and get a couple of 6-core >>> Xeons and an >>> EVGA SR-2. It turns out, a solution that is 4x more >>> expensive isn''t >>> actually better... :( >>> >>> >>> I was unable to get it working at all. The NF200 simply >>> threw errors >>> that 100% prevented me from passing the device. I think >>> it was missing >>> a number of specific features required for passthrough, >>> and I vaguely >>> remember running lspci -vvv to verify what was missing. >>> Perhaps not all >>> NF200''s are created equal? >>> >>> >>> The only logged issue I had with the NF200s was the lack of >>> ACS, which >>> can be disabled as I mentioned on this thread (at least if >>> you are using >>> the xm stack). After I disabled that PCI passthrough has >>> been working OK. >>> It''s just VGA passthrough BSOD-ing after some minutes that >>> is causing me >>> problems. >>> >>> >>> In reading up on the wiki, there does indeed seem to be a lot >>> more >>> info regarding the use of xl and PCI Passthrough today than the >>> last >>> time I looked. 
It seems that these types of configuration >>> options are >>> set on a domain-by-domain basis, or even by device; docs say that >>> things like VPCI vs direct PASS mapping of slot layout(?) is >>> actually >>> configured at the device level either in your DomU config file >>> (like: >>> pci = [''0:d:0.0, pci-just-forking-work-damn-__**you]) or via xl >>> (like: xl >>> pci-attach 1 0:d:0.0 pci-just-forking-work-damn-__**you). >>> >>> >>> >>> Hmm... I honestly don''t think the xl way will succeed where xm is >>> unstable, >>> but I might give it a shot. >>> >>> >>> You''d still likely require all the "hacks" you''re currently using, but >>> they''ll all move to different places I''m guessing... if the toolstack >>> itself doesn''t have any bearing on this (which is my suspicion) then you >>> don''t want to go doing all the extra work for nothing, of course! >>> >> >> Exactly. And right now what I have read (somebody point me to something >> that says otherwise), more people seem to have reported success with xm >> than xl stacks (but that could just be due to the xl stack being much more >> recent). >> > > I would go as far as to say that most of those reports came from people > who used the packaged Xen, and until very recently the packaged Xen was 4.0 > or 4.1 where xm is still the default toolstack. >I''d put money on it :P -Andrew> _______________________________________________ > Xen-users mailing list > Xen-users@lists.xen.org > http://lists.xen.org/xen-users >_______________________________________________ Xen-users mailing list Xen-users@lists.xen.org http://lists.xen.org/xen-users
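
The DevCap check discussed in this message can also be scripted. What follows is only a minimal sketch, assuming the usual `lspci -vv` text layout: it reads a saved copy of the output and reports, per device, whether the DevCap section shows FLReset+ or FLReset-. The function name `flr_support` and the trimmed `SAMPLE` text are illustrative and not part of pciutils. As noted above, lspci has to be run as root, otherwise the capability list is hidden and the result for that device stays None.

```python
import re

# First line of a device block in `lspci -vv` output,
# e.g. "0e:00.0 Display controller: ..." (optionally with a 4-digit domain).
DEVICE_LINE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
# A new labelled field such as "DevCtl:" or "Capabilities:" ends the DevCap section.
LABEL = re.compile(r"^\w+:")


def flr_support(lspci_text):
    """Map each device address to True (FLReset+ in DevCap),
    False (FLReset- in DevCap) or None (no DevCap flag visible)."""
    result = {}
    current = None
    in_devcap = False
    for line in lspci_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            current = match.group(1)
            result[current] = None
            in_devcap = False
            continue
        stripped = line.strip()
        if stripped.startswith("DevCap:"):
            in_devcap = True
        elif LABEL.match(stripped):
            in_devcap = False
        # Only the capability field counts; other fields may print FLReset too.
        if current is not None and in_devcap:
            if "FLReset+" in stripped:
                result[current] = True
            elif "FLReset-" in stripped:
                result[current] = False
    return result


# Trimmed from the output quoted in this thread (illustrative input only).
SAMPLE = """\
0e:00.0 Display controller: ATI Technologies Inc Device 671d
\tCapabilities: [58] Express (v2) Legacy Endpoint, MSI 00
\t\tDevCap:\tMaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
\t\t\tExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
"""

if __name__ == "__main__":
    for address, supported in flr_support(SAMPLE).items():
        print(address, supported)
```

Restricting the match to the DevCap section is deliberate: it keeps the capability flag separate from any control-register field that happens to print the same flag name.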
Okay, here we go! On Fri, May 10, 2013 at 2:58 PM, Gordan Bobic <gordan@bobich.net> wrote:> On 05/10/2013 06:54 PM, Andrew Bobulsky wrote: > > Two points here: >> >> 1) Unlike ESXi 4.1+ (from what I can find), Xen (at least with the >> xm/xend stack does allow ACS requirement to be disabled. >> >> >> Hehe. It''s nice to have the option to screw things up, eh? :) >> > > Personally, I really dislike too much cleverness from software. While I > understand auto-detection is handy for Ubuntu users, I want there to be a > way to override things if I need to, without extensive source code > modifying. I like there to be a way to tell whatever you are using to quit > holding your hand and just do as it''s damn well told. >And if I wanna crash this kernel I''ll be damned if it doesn''t come flying to the concrete at nearly four gigahertz! Why I oughtta... ;)> 2) I actually have it working - for 5 minutes or so at a time. If >> the problem was the lack of ACS, it wouldn''t work at all. >> >> >> I just can''t help but wonder if it /is/ the problem, though. It''s the >> >> only thing I can pin down that our situations have in common as far as >> its being the only "non-compatible" portion of the implementation, aside >> from the nearly identical behavior, of course. Maybe the AMD driver does >> some stupid stuff that ACS can mitigate? I just wish I knew more :( >> > > Now you got me thinking... I noticed that when the GPU starts to head > toward the crash, this appears in the syslog: > > May 6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER: Multiple > Uncorrected (Non-Fatal) error received: id=0000 > > It certainly makes me wonder. > > Has anyone else seen this error? > > The device ID in question is: > > 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express > Root Port 3 (rev 22) > > which does not bode well... > > Duff hardware?Hmmm... I''ll poke through my syslog at the next crash. 
I tried:

    cat /var/log/syslog | grep pcieport
    cat /var/log/syslog.1 | grep pcieport
    dmesg | grep pcieport

Nothing came back from any of those. I'll see if I can identify any unique errors myself though!

>> So what might intrigue you the most here is that while I'm stuck with a VGA device sitting behind this non-ACS compliant switch... My results are almost identical to yours. Passing one of the VGA devices to the DomU, with or without the corresponding HDMI audio doesn't seem to matter, I get this:
>>
>> "it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart."
>>
>> Seriously :P
>>
>> And you are convinced this is to do with the availability of ACS?
>>
>> Like I said, it's the only thing that I can pinpoint as being a hindrance to compatibility. I guess my request here is if anyone can help me determine whether or not that's true?
>
> What motherboard are you using? Has anyone successfully used it for VGA passthrough? I don't think the possibility of both of us having similarly duff hardware has been systematically excluded yet.

I think I said it, but I'll link here anyway: http://www.gigabyte.us/products/product-page.aspx?pid=2957#ov

As to whether or not anyone's used it for passthrough before... I've got no clue. Probably not too many people, seeing as how I'm essentially running a custom BIOS :P

>> It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty of "an attempt was made to reset the display adapter and failed" blah blah blah.
>>
>> Yes, all too familiar.
>>
>> This happens 100% of the time if I try to boot with both devices attached.
>>
>> Both devices?
>>
>> Yes---that is to say both of the VGA controllers from the 6990.
The >> relevant portion of my lspci looks like this: >> http://pastebin.com/raw.php?i=**GwekPNAW<http://pastebin.com/raw.php?i=GwekPNAW> >> > > OK, I get it. I seem to remember reading in the archives that dual VGA > passthrough is problematic (my experience over the years shows that > multiple GPUs are a false economy of highly questionably benefit).That''s actually pretty much completely accurate. It drives me particularly up the wall because I hate running things in full screen, and crossfire basically doesn''t work at all without that :P Nonetheless, they once mined bitcoins like a pair of world-class champions ;D Note: devices 09 and 0a are my "primary" 6990''s vga controllers. Also,>> my crossfire bridge is disconnected. I''m working with the other card, >> devices 0d and 0e. I''ve included the USB card as well in the list >> because I''m using it, but it causes me no problems whatsoever. For what >> its worth, that USB card works great in ESXi as well... Highpoint >> enabled ACS on their PEX chips :D >> >> Just out of interest: >> >> 1) Are you using a multi-socket motherboard? >> >> >> Nope! It''s a Gigabyte GA-EX58-EXTREME. It''s LGA1366 with an i7 920 in >> it. VT-d support is provided through a hacked BIOS image that I found >> on the web a couple years or so ago. >> > > Having to use a hacked BIOS for VT-d support is not a good sign or a good > starting point...Technically, you''re right. AFAIK though, this particular generation of i7 chips allows for VT-d to be managed entirely by the chipset/bios. There''s no particular req (however artificial) coming out of the CPUs for this generation that stipulates VT-d can''t be patched in... so I figured, "why not?" I was modding my BIOS anyway and decided to use this one as a base because it had both VT-d and fully updated option ROMs for all my onboard stuff. 
The world of BIOS modding is a *very* neat one; I highly suggest every nerd spend a few days there at some point in his life ;)

To the point though, it seems very well behaved on everything that *isn't* my 6990 :-(

>> 2) Have you tried disabling IRQ balancing (noirqbalance kernel parameter + disable irqbalance service)?
>>
>> No clue what that is. Can you provide any direction? I'd be happy to test.
>
> In your boot loader, find the kernel and xen lines and add:
>
> On the xen line:
> noirqbalance
>
> On the dom0 kernel line:
> noirqbalance

I've gone with this in my /etc/default/grub:

    GRUB_CMDLINE_XEN_DEFAULT="xen-pciback.permissive xen-pciback.passthrough=1 noirqbalance"

Just ran update-grub and I'll reboot and see what happens!

>> 3) Are you assigning > 4GB of RAM to the guest? I found a post in the archive last night mentioning that there's an outstanding qemu issue with > 4GB of RAM given to the guest. I didn't get around to re-trying the VM with 3.5GB yet.
>>
>> Yes sir. It's got 8 GB + 1 GB for the standard video adapter. Not sure if that's improper, but it boots just fine with a single card, and the 5850 I plugged in for a short while seemed well behaved. Here's a copy of my vm config file: http://pastebin.com/bX0ayA0u
>
> I think reducing the guest RAM to 3.5GB is worth a shot, along with only passing a single GPU device.

Done. Will report.

>> The first time I boot it up, the driver isn't installed so it'll work until just before auto-login reaches the desktop, but after that I can't boot at all with both VGA devices attached. I'd love to explore more, but I'm running out of places to look for solutions to my problem that don't involve my credit card and some new hardware. In a fit of delicious irony, my problem is almost identical to yours---if only I'd bought some cheaper stuff it'd probably all work just great :D
>>
>> Life on the bleeding edge is hard.
:( >> The thing that really bugs me is that after a fresh reboot with irq >> balancing disabled, I can get it working for a few minutes _every >> time_. >> >> After a few minutes, it''ll start corrupting the screen output and >> eventually try to reset itself (sometimes even claim to succeed a few >> times), eventually fail and BSOD. >> >> >> The only corrupted output I''ve seen is during a BSOD itself---which was >> once on Server 2012---and again I saw some black lines when I zoomed in >> with Chrome on a Win7 guest. I''m not entirely convinced that the black >> lines were a symptom of Xen/Radeon/Whatever versus just being a goofy >> Chrome bug. >> > > I''m seeing white lines, both with the Radeon 6450 and the Quadro 2000.Yeah. I''m convinced now. They might be a different color, but they''re in chrome (which uses a GPU accelerated 2d canvas) and they seem to precede the crash pretty reliably.> The only single GPU cards I have are the Radeon 5850s in the AMD >> box I >> have. I''m just a little reticent to tear the thing apart though >> cause >> it gets used a lot. I think my next step is to look for a video >> card >> that properly supports FLR, >> >> >> As far as I can tell, for all the talk of it - there is NO SUCH THING. >> Somebody on the list posted lspci -vvv from their ATI FirePro card >> which shows it has no FLR, and I have just got a Quadro 2000, which >> also >> lacks FLR. >> >> The only vague mention I have seen of FLR on GPUs is on the Intel GPU >> on >> the very latest generation of Core i CPUs (the built in one). And even >> if that is true it''s not all that useful for gaming. >> >> >> Heh. The crappiest GPU that would ever be in my system is the most >> compatible? Good grief. :P >> > > I''m not sure about compatible, but it seems to have a feature that the > others don''t - then again, take that with a pinch of salt - I don''t have > one, and I tend not to believe such things until somebody shows me the > lspci dump that proves it. 
> > though I''m considering a hard-hack: think >> of a 12v relay and a PCIe extender cable---if a D3D0 reset >> actually >> powers off the slot momentarily but the PSU plugs on the card >> prevent >> it from working, then I could rig up a switch that ties those >> plugs'' >> power state into the slot itself---it''s radical, yes, but >> possibly the >> most inventive solution I can think of so far. I''m super curious >> to >> see if anyone more knowledgeable than myself thinks it would work, >> because it''d be super cheap to build! As the saying goes >> though, I''ll >> "cross that bridge when I come to it." :) >> >> >> Interesting. In theory, I think this _should_ work provider your PCIe >> bridges support hot-plugging. >> >> To be certain, you''d have to switch both the PCIe slot and (if your >> card >> uses it) the external power inputs. >> >> >> That''d be the idea. Assuming it works the way I think it does, I could >> tap a 12v (I''m pretty sure it''s 12v in there) relay into the Vcc and GND >> pins of the PCIe slot and use the relay''s output to switch the Vcc from >> the plug-in cables off of the PSU. Bears testing with a slightly less >> expensive card, but I wouldn''t be surprised to see it work! It''d >> require some case modding for sure though, as the extension cable will >> get in the way of properly seating the card. It could be possible to >> build a tap that could be "slipped in" to a card''s PCIe slot... Short >> of proper FLR support, this could actually very cheaply be built into >> the expansion card itself. I''d suspect that simply adding FLR would be >> cheaper on the card manufacturers though. :) >> > > Just get a case with more slot cutouts on the back than your motherboard > has slots. Then feed the ribbon to the bottom so the card sits in the slot > on the case that is below your motherboard - no modding required. :) >But... but! I guess that''d require a mini(?) or MicroATX board. 
I''m a full size to XL ATX (or whatever the monster-sized boards are) kind of guy. Guess I just want more slots to pass GPUs to VMs, eh? :) There''s supposed to be some cases out there that allow for mounting of expansion cards on the end of flexible extenders. Haven''t heard about them in a couple years, but either way chances are pretty good that such cases aren''t exactly affordable... they likely target enterprise customers or simply have limited runs... economy of scale and all that. Probably the "slip-in" type of adapter/approach would be best, but I don''t wanna get ahead of myself on a simple idea that may not even work :P> 2) My motherboard''s PCIe slots are behind >> NF200 PCIe bridges >> (yes, >> EVGA have decided in their infinite wisdom to put >> all 7 PCIe slots >> behind NF200s, none are directly attached to the >> Intel NB). >> >> I''m so sorry :P. NF200 has probably caused a >> lot of xen >> tinkerers to >> utter a few dozen cuss words a piece. >> >> I can believe that. What is the solution, though? >> >> The thing that drives me really nuts about the >> issues I''m seeing >> (which may or may not be specifically related to >> the NF200) is >> that it >> is so intermittent. It works well enough to boot >> up and work with a >> gaming type load for a few minutes. Then >> something happens that >> causes >> the VGA card to require a reset, and it all falls >> apart. >> >> My solution was to buy another motherboard, I had >> no luck at all >> passing the devices behind the NF200, and similar >> to your situation >> all but one PCIe slot on that board was behind >> that bridge. >> >> >> Did you not manage to get it working at all? Or was >> it just >> intermittent like in my case? I can typically get >> about 5 minutes of >> gaming out of my ATI card before it all goes wrong. >> >> Ironically, I was thinking about an Asus Sabertooth >> with an 8-core AMD, >> but opted to go for broke and get a couple of 6-core >> Xeons and an >> EVGA SR-2. 
It turns out, a solution that is 4x more >> expensive isn''t >> actually better... :( >> >> >> I was unable to get it working at all. The NF200 simply >> threw errors >> that 100% prevented me from passing the device. I think >> it was missing >> a number of specific features required for passthrough, >> and I vaguely >> remember running lspci -vvv to verify what was missing. >> Perhaps not all >> NF200''s are created equal? >> >> >> The only logged issue I had with the NF200s was the lack of >> ACS, which >> can be disabled as I mentioned on this thread (at least if >> you are using >> the xm stack). After I disabled that PCI passthrough has >> been working OK. >> It''s just VGA passthrough BSOD-ing after some minutes that >> is causing me >> problems. >> >> >> In reading up on the wiki, there does indeed seem to be a lot more >> info regarding the use of xl and PCI Passthrough today than the >> last >> time I looked. It seems that these types of configuration >> options are >> set on a domain-by-domain basis, or even by device; docs say that >> things like VPCI vs direct PASS mapping of slot layout(?) is >> actually >> configured at the device level either in your DomU config file >> (like: >> pci = [''0:d:0.0, pci-just-forking-work-damn-__**you]) or via xl >> (like: xl >> pci-attach 1 0:d:0.0 pci-just-forking-work-damn-__**you). >> >> >> >> Hmm... I honestly don''t think the xl way will succeed where xm is >> unstable, >> but I might give it a shot. >> >> >> You''d still likely require all the "hacks" you''re currently using, but >> they''ll all move to different places I''m guessing... if the toolstack >> itself doesn''t have any bearing on this (which is my suspicion) then you >> don''t want to go doing all the extra work for nothing, of course! >> > > Exactly. 
And right now what I have read (somebody point me to something > that says otherwise), more people seem to have reported success with xm > than xl stacks (but that could just be due to the xl stack being much more > recent). > > > With that in mind, even though I''ve taken your advice and added >> the >> config info to my xend files, its entirely possible---especially >> in >> light of what Casey said---that I''m just Doing It Wrong(TM). It''d >> likely be beneficial for us both to compare notes on that >> regard. If >> either of you would be willing to help, I could probably use some >> pointers... I''ve kinda run out of logs to look at with my current >> knowledge on the subject :P >> >> >> Certainly - what notes do you propose we compare? >> >> >> I''m not completely sure. If you can point me to the proper files to >> verify that my device has the same PCIe-level compatibility issues as >> yours (verify that ACS isn''t available to the device and so on) then I''d >> call that a step in the right direction. >> > > Another thing - Do "lspci -vt" - can you put the card in a slot where it > doesn''t share a bridge with any other PCIe devices?I don''t think so. You should see the built-in bridge... it''s implied slightly up the hierarchy from the two side-by-side 6990 devices, which itself attaches to the root port at the top: http://pastebin.com/raw.php?i=4dGmneYi> > > What about with PCIe devices behind NF200 >> bridges? I know the >> NF200s >> don''t support PCI ACS, but that is a security >> feature (which I have >> disabled enforcement of to get this far), and >> AFAIK shouldn''t >> actually >> affect the basic PCI passthrough capability. >> >> Question: how''d you disable ACS? I think it >> may be causing me >> some >> issues. >> >> Put: >> >> (pci-passthrough-strict-check no) >> (pci-dev-assign-strict-check no) >> >> in /etc/xen/xend-config.sxp >> >> If it was causing you issues, however, I''d >> expect you to find >> errors >> in logs pointing at it. 
>> >> As I understand the xend-config.sxp [1] is for >> the xm toolstack and >> deprecated Xend service. >> >> >> xm toolstack and xend are what I am using. I have >> read reports of issues >> with VGA passthrough using the xl stack so I didn''t >> even attempt to >> use it. >> >> >> The xm toolstack was deprecated in version 4.1. I read >> that it had not >> been updated in months due to a lack of maintainers. >> >> >> I heard that xl is still feature-incomplete and >> experimental, and problematic with VGA passthrough. >> >> I did try xm back >> when I started, the passthrough worked but had the same >> problems I had >> when I began testing xl. I have been using xl since >> then. My logic was >> simply "why become dependent on a tool that is no-longer >> maintained and >> may be removed from the next release?" >> >> >> I''m not wedded to any particular tool stack, I''m happy to >> use whatever >> works. But since libvirt and virt-manager are still using >> xm, and since >> I have seen recent reports of xl being problematic for VGA >> passthrough >> as well as there being no apparent way to disable ACS >> requirements with >> the xl stack, that rules it out for me completely at the >> moment. >> >> >> The xm stack was rather trying for me. It''s like it only wanted >> to >> throw errors at me when I did PCI stuff. Whereas xl has seemingly >> been more than happy to do whatever I tell it. Though I admit >> chances >> are pretty good I was just running around, haphazardly using the >> wrong >> version of python or something. Given our nearly identical >> results >> thus far, I''d wager that the toolstack itself isn''t really the >> source >> of our problems. If that''s true, though, the easy solution is >> likely >> out the window :( >> >> >> What distro do you use? >> >> <snip> >> >> >> Currently running Debian Squeeze 6.0.7 x86_64, with Linux kernel 3.4.44. >> > > OK, that''s a useful reference point. 
> I'm on EL6 using 3.8.10 (will be upgrading to 3.8.12 tonight).
>
> Gordan

Wish me luck!

-Andrew
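
The syslog search described in this message (grepping for pcieport) can be made a little more specific. The sketch below is only an illustration, assuming log lines shaped like the AER message quoted earlier in the thread: it collects the reporting port and the error text so that repeated events can be counted per port. The helper name `find_aer_events` is made up for this example, and the log path is distro-dependent (/var/log/syslog on Debian, /var/log/messages on EL6).

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   ... kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000
AER_LINE = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: (?P<message>.*)$"
)


def find_aer_events(log_lines):
    """Return (port, message) tuples for every AER line found."""
    events = []
    for line in log_lines:
        match = AER_LINE.search(line)
        if match:
            events.append((match.group("port"), match.group("message").strip()))
    return events


# The first line is the one quoted in this thread; the second is filler.
SAMPLE_LOG = [
    "May  6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER: "
    "Multiple Uncorrected (Non-Fatal) error received: id=0000",
    "May  6 16:35:52 normandy kernel: some unrelated message",
]

if __name__ == "__main__":
    events = find_aer_events(SAMPLE_LOG)
    # Count how often each root port reported an error.
    for port, count in Counter(port for port, _ in events).items():
        print(port, count)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`, since iterating over a file yields its lines.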
On 05/10/2013 08:42 PM, Casey DeLorme wrote:> > 2) Have you tried disabling IRQ balancing > (noirqbalance kernel parameter + disable irqbalance service)? > > > No clue what that is. Can you provide any direction? I''d be > happy to > test. > > > In your boot loader, find the kernel and xen lines and add: > > On the xen line: > noirqbalance > > On the dom0 kernel line: > noirqbalance > > > How would removing noirqbalance help fix the problem? Just curious; as > I understand it that tool is used to balance requests like a scheduler > of sorts.I am purely guessing here, but could it be possible that if the VM uses a CPU other than the CPU that handles the interrupts for the hardware it has been passed strange things happen, possibly more so if the CPU in question is not only not the same core, but not even the same socket. It''s possible that disabling rotating the interrupt handling between the cores alleviates an issue with IRQ routing. But take this with a bucket of salt - I am _purely_ guessing here.> 3) Are you assigning > 4GB of RAM to the guest? I found a post > in the archive last night mentioning that there''s an > outstanding qemu > issue with > 4GB of RAM given to the guest. I didn''t get > around to > re-trying the VM with 3.5GB yet. > > > Yes sir. It''s got 8 GB + 1 GB for the standard video adapter. > Not sure > if that''s improper, but it boots just find with a single card, > and the > 5850 I plugged in for a short while seemed well behaved. Here''s > a copy > of my vm config file: http://pastebin.com/bX0ayA0u > > > I think reducing the guest RAM to 3.5GB is worth a shot, along with > only passing a single GPU device. > > > If I recall the RAM limit is specific to PV guests or older versions of > Xen. 
I have run Windows with 4, 6, 8, and 16GB of RAM without ever > encountering this problem, and this includes tests with the xl toolstack > on Xen 4.1.2.Including VGA passthrough on those guests?> The only single GPU cards I have are the Radeon 5850s > in the AMD > box I > have. I''m just a little reticent to tear the thing > apart though > cause > it gets used a lot. I think my next step is to look > for a video > card > that properly supports FLR, > > > As far as I can tell, for all the talk of it - there is NO > SUCH THING. > Somebody on the list posted lspci -vvv from their ATI > FirePro card > which shows it has no FLR, and I have just got a Quadro > 2000, which also > lacks FLR. > > The only vague mention I have seen of FLR on GPUs is on the > Intel GPU on > the very latest generation of Core i CPUs (the built in > one). And even > if that is true it''s not all that useful for gaming. > > > Heh. The crappiest GPU that would ever be in my system is the most > compatible? Good grief. :P > > > I''m not sure about compatible, but it seems to have a feature that > the others don''t - then again, take that with a pinch of salt - I > don''t have one, and I tend not to believe such things until somebody > shows me the lspci dump that proves it. > > > Where did you find mention of the newer integrated graphics supporting > FLR? I have an IvyBridge 3770 with an HD4000, but when I ran lspci -vv > and -vvv I did not see FLReset+, but maybe I did something incorrectly > as I also did not see any mention of FLReset anywhere? If the Ivybridge > integrated has FLReset I would totally want to test it. 
> It may not be a powerful chip compared to modern discrete cards, and it won't prove that the lack of FLR is the cause of our AMD/nVidia problems, but it would show the effect the presence of FLR has.

I came across a post on a forum or a mailing list from someone after googling something like "GPU" "FLreset+" and then trawling through a few hundred pages to find one that actually lists lspci output that is referring to a GPU.

Having said that, I have also found references to people claiming that FirePro and Quadro cards have FLR, which is quite clearly not the case. So let's not assume that it's true just because somebody on the internet said so. :)

> 2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes, EVGA have decided in their infinite wisdom to put all 7 PCIe slots behind NF200s, none are directly attached to the Intel NB).
>
> I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers to utter a few dozen cuss words a piece.
>
> I can believe that. What is the solution, though?
>
> The thing that drives me really nuts about the issues I'm seeing (which may or may not be specifically related to the NF200) is that it is so intermittent. It works well enough to boot up and work with a gaming type load for a few minutes. Then something happens that causes the VGA card to require a reset, and it all falls apart.
>
> My solution was to buy another motherboard, I had no luck at all passing the devices behind the NF200, and similar to your situation all but one PCIe slot on that board was behind that bridge.
>
> Did you not manage to get it working at all? Or was it just intermittent like in my case? I can typically get about 5 minutes of gaming out of my ATI card before it all goes wrong.
>
> Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD, but opted to go for broke and get a couple of 6-core Xeons and an EVGA SR-2.
It turns out, a solution that is > 4x more > expensive isn''t > actually better... :( > > > I was unable to get it working at all. The > NF200 simply > threw errors > that 100% prevented me from passing the device. > I think > it was missing > a number of specific features required for > passthrough, > and I vaguely > remember running lspci -vvv to verify what was > missing. > Perhaps not all > NF200''s are created equal? > > > The only logged issue I had with the NF200s was the > lack of > ACS, which > can be disabled as I mentioned on this thread (at > least if > you are using > the xm stack). After I disabled that PCI > passthrough has > been working OK. > It''s just VGA passthrough BSOD-ing after some > minutes that > is causing me > problems. > > > In reading up on the wiki, there does indeed seem to be > a lot more > info regarding the use of xl and PCI Passthrough today > than the last > time I looked. It seems that these types of configuration > options are > set on a domain-by-domain basis, or even by device; > docs say that > things like VPCI vs direct PASS mapping of slot > layout(?) is > actually > configured at the device level either in your DomU > config file > (like: > pci = [''0:d:0.0, pci-just-forking-work-damn-____you]) > or via xl > (like: xl > pci-attach 1 0:d:0.0 pci-just-forking-work-damn-____you). > > > > Hmm... I honestly don''t think the xl way will succeed where > xm is > unstable, > but I might give it a shot. > > > You''d still likely require all the "hacks" you''re currently > using, but > they''ll all move to different places I''m guessing... if the > toolstack > itself doesn''t have any bearing on this (which is my suspicion) > then you > don''t want to go doing all the extra work for nothing, of course! > > > Exactly. 
> And right now what I have read (somebody point me to something that says otherwise), more people seem to have reported success with xm than xl stacks (but that could just be due to the xl stack being much more recent).
>
> I would go as far as to say that most of those reports came from people who used the packaged Xen, and until very recently the packaged Xen was 4.0 or 4.1 where xm is still the default toolstack.

Which I don't find to be in any way an encouragement to even attempt to do this using the xl tool stack at the moment. :)

Gordan
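
The guess in this message, that interrupt handling moving between cores might matter, can at least be observed from dom0. The following is a rough sketch, assuming the standard /proc/interrupts layout (a header row of CPU names, then one row per IRQ): it lists which CPUs have serviced a given IRQ. The sample text, including the chip and device names, is invented for illustration. Only IRQ 7 comes from the lspci output quoted in the thread. Whether dom0's counters reflect what the hypervisor does for a passed-through device is not something the thread establishes.

```python
def per_cpu_counts(interrupts_text):
    """Parse /proc/interrupts-style text into (cpu_names, {irq_label: [counts]})."""
    lines = interrupts_text.splitlines()
    if not lines:
        return [], {}
    cpus = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        # The first len(cpus) fields are per-CPU counters; stop at the first
        # non-numeric field (some summary rows have fewer counters).
        for field in rest.split()[: len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return cpus, table


def cpus_servicing(interrupts_text, irq):
    """Return the names of CPUs with a non-zero count for the given IRQ."""
    cpus, table = per_cpu_counts(interrupts_text)
    return [cpu for cpu, count in zip(cpus, table.get(str(irq), [])) if count > 0]


# Hypothetical sample; the chip and device names are placeholders.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  7:       5120          0        310          0   example-chip   example-device
 16:          0          0          0         42   example-chip   other-device
"""

if __name__ == "__main__":
    print(cpus_servicing(SAMPLE, 7))
```

Comparing the result before and after adding noirqbalance would show whether the IRQ stays on one CPU, which is all this sketch can tell you.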
On 05/10/2013 09:19 PM, Andrew Bobulsky wrote:> 2) I actually have it working - for 5 minutes or so at a > time. If > the problem was the lack of ACS, it wouldn''t work at all. > > > I just can''t help but wonder if it /is/ the problem, though. > It''s the > > only thing I can pin down that our situations have in common as > far as > its being the only "non-compatible" portion of the > implementation, aside > from the nearly identical behavior, of course. Maybe the AMD > driver does > some stupid stuff that ACS can mitigate? I just wish I knew more :( > > > Now you got me thinking... I noticed that when the GPU starts to > head toward the crash, this appears in the syslog: > > May 6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER: > Multiple Uncorrected (Non-Fatal) error received: id=0000 > > It certainly makes me wonder. > > Has anyone else seen this error? > > The device ID in question is: > > 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI > Express Root Port 3 (rev 22) > > which does not bode well... > > Duff hardware? > > > Hmmm... I''ll poke through my syslog at the next crash. I tried: > > cat /var/log/syslog | grep pcieport > cat /var/log/syslog.1 | grep pcieport > dmesg | grep pcieport > > > Nothing came back from any of those. I''ll see if I can identify any > unique errors myself though!Worth paying attention to. :)> So what might intrigue you the most here is that while > I''m stuck > with > a VGA device sitting behind this non-ACS compliant > switch... My > results are almost identical to yours. Passing one of > the VGA > devices > to the DomU, with or without the corresponding HDMI audio > doesn''t seem > to matter, I get this: > > " it is so intermittent. It works well enough to boot > up and > work with > a gaming type load for a few minutes. Then something > happens that > causes the VGA card to require a reset, and it all > falls apart." > > Seriously :P > > > And you are convinced this is to do with the availability > of ACS? 
>
> Like I said, it's the only thing that I can pinpoint as being a hindrance to compatibility. I guess my request here is if anyone can help me determine whether or not that's true?
>
> What motherboard are you using? Has anyone successfully used it for VGA passthrough? I don't think the possibility of both of us having similarly duff hardware has been systematically excluded yet.
>
> I think I said it, but I'll link here anyway:
> http://www.gigabyte.us/products/product-page.aspx?pid=2957#ov

Indeed, you did. Apologies, it's been a long week. :p

> As to whether or not anyone's used it for passthrough before... I've got no clue. Probably not too many people, seeing as how I'm essentially running a custom BIOS :P

BIOSes are getting so crap (except maybe on Asus boards) these days that I'm amazed anything works at all. You wouldn't believe the amount of BIOS bugginess people are encountering on the SR2, and that's now an EOL product that should by now have had most of its bugs fixed (yeah - right).

> It eventually likes to BSOD, usually on atikmpag.sys I think. Plenty of "an attempt was made to reset the display adapter and failed" blah blah blah.
>
> Yes, all too familiar.
>
> This happens 100% of the time if I try to boot with both devices attached.
>
> Both devices?
>
> Yes---that is to say both of the VGA controllers from the 6990. The relevant portion of my lspci looks like this:
> http://pastebin.com/raw.php?i=GwekPNAW
>
> OK, I get it. I seem to remember reading in the archives that dual VGA passthrough is problematic (my experience over the years shows that multiple GPUs are a false economy of highly questionable benefit).
>
> That's actually pretty much completely accurate.
> It drives me particularly up the wall because I hate running things in full screen, and crossfire basically doesn't work at all without that :P

I like my full screen gaming - but throw something obscure like an IBM T221 into the mix and things start to get rather non-trivial. The T221 is 3840x2400, which is too much for DL-DVI to drive. But it's a 10+ year old monitor design and it actually takes 3xSL-DVI (there's an adapter available that makes it drivable using 2xDL-DVI instead).

Then you have to stitch the screens together (workable with 2xDL-DVI on XP; you need a Quadro or an Eyefinity card for the driver features to do it on Vista and 7). What I found back when my old 4870X2 was bleeding edge was that with dual monitors attached, the 2nd GPU never did anything at all (stayed stone cold, performance unaffected by Crossfire).

Since then I've learned my lesson - buy the biggest single GPU you can afford - it's as good as it's going to get. Everything else is going to be hit-and-miss. Debugging other people's products may be fun when you're 14, but I'm two decades too old to not have something better to do with my time. Nowadays I appreciate things that "just work" - the unfortunate thing I'm finding, however, is that there tend to be no things that "just work" that include all the features that I want - which in turn leads to endless debugging of other people's software to get it to do what I want, because apparently, nobody else has tried it before. :-/

> Note: devices 09 and 0a are my "primary" 6990's vga controllers. Also, my crossfire bridge is disconnected. I'm working with the other card, devices 0d and 0e. I've included the USB card as well in the list because I'm using it, but it causes me no problems whatsoever. For what it's worth, that USB card works great in ESXi as well... Highpoint enabled ACS on their PEX chips :D
>
> Just out of interest:
>
> 1) Are you using a multi-socket motherboard?
>
> Nope!
> It's a Gigabyte GA-EX58-EXTREME. It's LGA1366 with an i7 920 in it. VT-d support is provided through a hacked BIOS image that I found on the web a couple years or so ago.
>
> Having to use a hacked BIOS for VT-d support is not a good sign or a good starting point...
>
> Technically, you're right. AFAIK though, this particular generation of i7 chips allows for VT-d to be managed entirely by the chipset/bios.

That's just it - I don't like things only manageable by binary blobs with no source code. I'd much rather just have a clean interface (e.g. from /sys/) to just write the relevant registers straight to the hardware to enable/disable features. Otherwise you're at the mercy of motherboard manufacturers who have no interest in supporting a product for people who have already bought it (sale's made, why should they care).

> There's no particular req (however artificial) coming out of the CPUs for this generation that stipulates VT-d can't be patched in... so I figured, "why not?" I was modding my BIOS anyway and decided to use this one as a base because it had both VT-d and fully updated option ROMs for all my onboard stuff. The world of BIOS modding is a /very/ neat one; I highly suggest every nerd spend a few days there at some point in his life ;)

Last time I checked, this was mostly limited to people using BIOS editors to unhide features. Have things actually progressed to the point where you can add in a specific assembly payload to initialize things differently?

> To the point though, it seems very well behaved on everything that /isn't/ my 6990 :-(

Didn't you mention you had another ATI GPU in another rig that you could borrow temporarily? It might be worth a shot to see if it's the dual GPUs that are foiling you. Especially since they are inevitably on the same PCIe bridge. A standalone single GPU might just work.

Ironically, my Quadro has been refusing to play ball completely today (it worked passably well yesterday, although not as well as my 6450 card, which today seems to be working well enough to get to the login screen without BSOD-ing). Different slot this time, though, so we'll see how it fares in a bit.

[noirqbalance, limiting guest to 3.5GB of RAM]

[screen corruption, white/black lines]

> Yeah. I'm convinced now. They might be a different color, but they're in chrome (which uses a GPU accelerated 2d canvas) and they seem to precede the crash pretty reliably.

Yes, similar here, although I don't use Chrome - I get them in most things, including on the desktop once it has all started to go wrong.

> though I'm considering a hard-hack: think of a 12v relay and a PCIe extender cable---if a D3D0 reset actually powers off the slot momentarily but the PSU plugs on the card prevent it from working, then I could rig up a switch that ties those plugs' power state into the slot itself---it's radical, yes, but possibly the most inventive solution I can think of so far. I'm super curious to see if anyone more knowledgeable than myself thinks it would work, because it'd be super cheap to build! As the saying goes though, I'll "cross that bridge when I come to it." :)
>
> Interesting. In theory, I think this _should_ work provided your PCIe bridges support hot-plugging.
>
> To be certain, you'd have to switch both the PCIe slot and (if your card uses it) the external power inputs.
>
> That'd be the idea. Assuming it works the way I think it does, I could tap a 12v (I'm pretty sure it's 12v in there) relay into the Vcc and GND pins of the PCIe slot and use the relay's output to switch the Vcc from the plug-in cables off of the PSU. Bears testing with a slightly less expensive card, but I wouldn't be surprised to see it work!
> It'd require some case modding for sure though, as the extension cable will get in the way of properly seating the card. It could be possible to build a tap that could be "slipped in" to a card's PCIe slot... Short of proper FLR support, this could actually very cheaply be built into the expansion card itself. I'd suspect that simply adding FLR would be cheaper on the card manufacturers though. :)
>
> Just get a case with more slot cutouts on the back than your motherboard has slots. Then feed the ribbon to the bottom so the card sits in the slot on the case that is below your motherboard - no modding required. :)
>
> But... but! I guess that'd require a mini(?) or MicroATX board. I'm a full size to XL ATX (or whatever the monster-sized boards are) kind of guy. Guess I just want more slots to pass GPUs to VMs, eh? :)

You don't need a smaller motherboard - you need a bigger case. :)

With your board, you could probably do this with a PC-P80 Armorsuit (one of the few off the shelf cases that will take my SR-2 due to a weird, needlessly oversized form factor - I mean seriously, who needs 7 PCIe x16 slots??).

Hmm... Something just occurred to me - on the SR-2 this could be implemented _TRIVIALLY_! The SR-2 has jumpers to disable/enable each of the PCIe slots. So in theory, all I'd have to do is put together a simple USB-controlled switch that would toggle between connecting pins 1-2 and 2-3, and attach it using a normal 3-pin jumper-type header to the jumper block in question. Or (boringly), just wire it up to a suitable button on the front of the case.

I might just have to try this and see what happens (and hope it doesn't make the magic smoke escape from something).

> There's supposed to be some cases out there that allow for mounting of expansion cards on the end of flexible extenders. Haven't heard about them in a couple years, but either way chances are pretty good that such cases aren't exactly affordable...
> they likely target enterprise customers or simply have limited runs... economy of scale and all that. Probably the "slip-in" type of adapter/approach would be best, but I don't wanna get ahead of myself on a simple idea that may not even work :P

Usually rack-mount cases. But it's amazing what you can achieve with a dremel and a power drill in a few minutes. ;)

> With that in mind, even though I've taken your advice and added the config info to my xend files, it's entirely possible---especially in light of what Casey said---that I'm just Doing It Wrong(TM). It'd likely be beneficial for us both to compare notes on that regard. If either of you would be willing to help, I could probably use some pointers... I've kinda run out of logs to look at with my current knowledge on the subject :P
>
> Certainly - what notes do you propose we compare?
>
> I'm not completely sure. If you can point me to the proper files to verify that my device has the same PCIe-level compatibility issues as yours (verify that ACS isn't available to the device and so on) then I'd call that a step in the right direction.
>
> Another thing - Do "lspci -vt" - can you put the card in a slot where it doesn't share a bridge with any other PCIe devices?
>
> I don't think so. You should see the built-in bridge... it's implied slightly up the hierarchy from the two side-by-side 6990 devices, which itself attaches to the root port at the top:
> http://pastebin.com/raw.php?i=4dGmneYi

But the 2 GPUs are inevitably on the same bridge. I think trying a single GPU would definitely be a good next step in troubleshooting.

> Wish me luck!

To both of us! :)

Gordan
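
For anyone wanting to repeat the "does this device have FLR / ACS" check from the exchange above, here is a minimal sketch rather than a definitive tool. It reads the `lspci -vv` output for a single device on stdin and reports whether the DevCap block advertises FLReset+ and whether an "Access Control Services" capability is listed. The function name pci_caps is made up for illustration, and the parsing assumes the usual pciutils layout (a "DevCap:" block followed by a "DevCtl:" block), which may differ between versions.

```shell
#!/bin/sh
# pci_caps (hypothetical helper): summarise FLR and ACS support from
# `lspci -vv` text for ONE device, read on stdin.
# Assumption: FLR capability shows up as "FLReset+" or "FLReset-" inside
# the DevCap block. A "FLReset-" in the DevCtl block is only the
# "initiate reset" control bit, so it is deliberately ignored.
pci_caps() {
    awk '
        /DevCap:/ { in_devcap = 1 }
        /DevCtl:/ { in_devcap = 0 }
        in_devcap && /FLReset[+]/ { flr = "yes" }
        in_devcap && /FLReset-/ && flr != "yes" { flr = "no" }
        /Access Control Services/ { acs = "yes" }
        END {
            printf "FLR=%s ACS=%s\n", (flr == "" ? "unknown" : flr), (acs == "" ? "no" : acs)
        }
    '
}
```

Typical use would be something like "lspci -vv -s 0d:00.0 | pci_caps", run as root so that the capability list is readable, with the address replaced by that of the device being passed through. "FLR=unknown" just means no FLReset flag was seen in the DevCap block, for example on a bridge.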
An update in bullet points (sadly without a solution).

* 4.1.x vs. 4.2.x

I tried to test the theory that there was something about Xen 4.1 that made VGA passthrough work better than in 4.2. I built 4.1.4 and it made no difference. Same problems, same symptoms, same BSODs.

* IRQ balancing

This partial workaround still seems to hold true for me - without noirqbalance in the dom0 kernel boot parameters, I generally cannot get as far as the login screen of the domU (I estimate it gets there under 10% of the time). With noirqbalance and the irqbalance service disabled, I can get that far every time after a fresh host reboot. This, to me at least, implies some kind of an IRQ routing issue. Has anybody got any suggestions on how to troubleshoot this further and capture any further debug information?

* Screen corruption sometimes preceding a crash

I attached a screenshot of the desktop after this happens to a bug report here:
http://xen.crc.id.au/bugs/view.php?id=10

To me this implies either a memory stomp going on (aperture alignment?) or in-flight data corruption on the PCIe bus that is specific to virtualization (because bare metal works fine). The idea of in-flight corruption is further corroborated by errors like these in the dom0 syslog:

May 12 11:37:04 normandy kernel: pcieport 0000:00:07.0: AER: Uncorrected (Non-Fatal) error received: id=0000
May 12 11:51:28 normandy kernel: pcieport 0000:00:07.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0000

Device 0000:00:07.0 is actually the host IOH PCIe bridge, behind which is the NF200 PCIe router, behind which is the passed through ATI card. See lspci output attached to the bug report here:
http://xen.crc.id.au/bugs/view.php?id=10

* ATI driver

I upgraded from 13.3 beta3 to 13.4 - no obvious difference in reliability.

* Nvidia Quadro

I have a Quadro 2000 which people have reported success with in the past after failing to get ATI cards to work. If anything my results with the Quadro 2000 were actually worse. I have not, however, yet tested for perfect reproducibility with a Quadro card as I have with the ATI card (see next point - I will update on this later when I have had a chance to try it, hopefully today).

* Reproducibility

I can now reliably reproduce the domU GPU crash following a clean reboot. Fire up Steam, fire up Borderlands 2. Hit Play. Wait. 2K animation plays through. Gearbox animation plays through. Nvidia animation plays through. Crash (blank screen, a flicker or two as the driver seemingly tries in vain to reset the GPU, AER errors in dom0 syslog, BSOD as attached to this bug report: http://xen.crc.id.au/bugs/view.php?id=9).

What I am pondering now is ways to capture all the PCIe traffic from the domU and from dom0, then re-trying the same thing with bare metal and looking for a difference (unfortunately this involves analyzing GBs of captured PCIe traffic, and right now I'm not even sure how one might go about capturing this).

Has anybody got any suggestions at this point? Should I be taking this to the xen-devel list instead of xen-users? There has to be a reasonably explainable, logically analyzable issue here, because the behaviour seems pretty consistent - hopefully consistent enough for debugging.

Gordan
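
Since the AER messages and the noirqbalance setting keep coming up as the two things worth checking, here is a rough sketch of how both checks could be scripted so results from different boxes are comparable. The helper names aer_tally and has_noirqbalance are invented for illustration; the log format assumed is the "pcieport <address>: AER: ..." form quoted in this thread, and other kernels may word it differently.

```shell
#!/bin/sh
# aer_tally (hypothetical helper): count AER messages per PCIe port.
# Reads syslog/dmesg text on stdin, prints "<address> <count>" per port.
aer_tally() {
    sed -n 's/.*pcieport \([0-9a-fA-F:.]*\): AER:.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# has_noirqbalance (hypothetical helper): succeeds if the kernel command
# line passed as $1 contains noirqbalance as a separate word.
has_noirqbalance() {
    case " $1 " in
        *" noirqbalance "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In dom0 this might be used as "aer_tally < /var/log/messages" (or whichever file the distro logs kernel messages to) and has_noirqbalance "$(cat /proc/cmdline)". Only ports that actually logged an AER line show up in the tally, so empty output means no such messages were found.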
Top posting :P

Hello Gordan, Casey,

I hope you've had a good weekend. I got back to my project this morning; I decided to shove one of my 5850's into my board to see if I could get it to work...

I've had this Windows DomU running, with GPLPV drivers, for a few hours now. Performance is excellent. I'm using the 5850 passed-through as a PCIe device. One of my 6990s is also plugged in, and it's being used by Dom0. Comically, I've got the better monitor plugged into my Dom0's card because this 5850 lacks mini displayport :D

I also can't get gfx_passthru=1 to work. Nothing happens other than an SDL window claiming to be a Serial console showing up on my Dom0's screen. I even have the 5850 set up as my BIOS's primary video card. Oh well :)

Gordan, I'm going to poke through your other email later and see if I can present some information to help you line up any of your suspicions. Given the way things have gone for me---and I've basically duplicated as much of your and Casey's setups as humanly possible here---I've got to believe the problem here is ACS, or something related to it. I can even reboot this VM and the card just keeps on working.

On another note, should we retire this thread soon? It's getting a bit long and I don't want to discourage any future googlers, nor get too off topic :P

Cheers,
Andrew
On Mon, 13 May 2013 09:49:54 -0400, Andrew Bobulsky <rulerof@gmail.com> wrote:

> Top posting :P
>
> Hello Gordan, Casey,
>
> I hope you've had a good weekend. I got back to my project this morning; I decided to shove one of my 5850's into my board to see if I could get it to work...
>
> I've had this Windows DomU running, with GPLPV drivers, for a few hours now. Performance is excellent. I'm using the 5850 passed-through as a PCIe device. One of my 6990s is also plugged in, and it's being used by Dom0. Comically, I've got the better monitor plugged into my Dom0's card because this 5850 lacks mini displayport :D

So a dual GPU passthrough didn't work for you, but a single GPU secondary passthrough, as is most commonly used, works fine? I'm happy for you. Glad to hear that it's the dualness of the GPU that was foiling your previous attempts.

> I also can't get gfx_passthru=1 to work. Nothing happens other than an SDL window claiming to be a Serial console showing up on my Dom0's screen. I even have the 5850 set up as my BIOS's primary video card. Oh well :)

I _think_ that could be because it is trying to pass through the host's primary GPU as the primary GPU for the domU. Isn't that the way it is supposed to work? You could try setting up your X on the secondary GPU, and pass the primary through with gfx_passthru=1 and see what happens.

> Gordan, I'm going to poke through your other email later and see if I can present some information to help you line up any of your suspicions. Given the way things have gone for me---and I've basically duplicated as much of your and Casey's setups as humanly possible here---I've got to believe the problem here is ACS, or something related to it. I can even reboot this VM and the card just keeps on working.

What bothers me is that ACS is purely a security feature, not a functionality feature.

> On another note, should we retire this thread soon? It's getting a bit long and I don't want to discourage any future googlers, nor get too off topic :P

We could start a new one, I guess? Or perhaps take it to xen-devel, since if it continues it is likely to get low-level and debug-y.

The main thing that bothers me at the moment is that it _looks_ like my 5520 PCIe bridge (as in: 5520 PCIe bridge -> NF200 PCIe router -> VGA) starts reporting uncorrected errors on the PCIe bus when the GPU passthrough starts to go wrong, after which the GPU crashes in the domU and takes the domU down with it.

1) This clearly doesn't happen with bare metal, so there is something happening with the low-level hypervisor interaction that seems to be resulting in corrupt data being sent down the PCIe bus.

2) This doesn't seem to happen in simple 3D applications. For example, I can run OCCT GPU test full screen or furmark in a window for hours without any issues, but as soon as I fire up a game it all goes wrong in very short order.

The Quadro case is horribly intermittent, but the ATI behaves very predictably. It always tends to crash at exactly the same point, which leads me to think there is a very specific, very particular thing the domU tries to do that leads to everything falling apart. If only I could figure out _what_ that particular something is, I might actually stand a chance of doing something about it (hence why I was talking about taking PCIe capture dumps, but I imagine this is going to be akin to looking through several GBs of wireshark logs, i.e. boring, time consuming, labour intensive, and without any a-priori promise that it will yield any useful findings).

Gordan
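
To make the "bridge -> router -> VGA" chain above easy to compare between machines, here is a small sketch that turns a device's resolved sysfs path into that chain. It relies only on the standard sysfs layout, where each PCI function nests under its upstream bridge; the helper name pci_chain and the addresses in the usage note are assumptions for illustration, not taken from anyone's actual box.

```shell
#!/bin/sh
# pci_chain (hypothetical helper): given the resolved sysfs path of a PCI
# device as $1, print the functions from root port down to the device,
# separated by " -> ". Path components that are not full
# domain:bus:device.function addresses (e.g. "pci0000:00") are skipped.
pci_chain() {
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$' |
        awk '{ printf "%s%s", sep, $0; sep = " -> " } END { if (NR) print "" }'
}
```

On a live system the path would come from something like pci_chain "$(readlink -f /sys/bus/pci/devices/0000:07:00.0)", with the address swapped for the passed-through card. Any port in that chain that also shows up in the AER messages is a candidate for where things go wrong.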
> Hello Gordan, Casey,
>
> I hope you've had a good weekend. I got back to my project this morning; I decided to shove one of my 5850's into my board to see if I could get it to work...
>
> I've had this Windows DomU running, with GPLPV drivers, for a few hours now. Performance is excellent. I'm using the 5850 passed-through as a PCIe device. One of my 6990s is also plugged in, and it's being used by Dom0. Comically, I've got the better monitor plugged into my Dom0's card because this 5850 lacks mini displayport :D
>
> I also can't get gfx_passthru=1 to work. Nothing happens other than an SDL window claiming to be a Serial console showing up on my Dom0's screen. I even have the 5850 set up as my BIOS's primary video card. Oh well :)

For primary passthrough (which is what the gfx_passthru=1 flag is supposed to set), you need to apply AMD patches to the Xen source code manually and rebuild. I have not yet tried this myself and simply make do with secondary passthrough.

> Gordan, I'm going to poke through your other email later and see if I can present some information to help you line up any of your suspicions. Given the way things have gone for me---and I've basically duplicated as much of your and Casey's setups as humanly possible here---I've got to believe the problem here is ACS, or something related to it. I can even reboot this VM and the card just keeps on working.

Have you rebooted and then tried a 3D application as well? Rebooting works fine for me with the Windows interface, since that requires very little in the way of graphics power. However, if I do not eject the card after a reboot prior to starting up a 3D application, such as a game, it will run either very slowly or the software crashes. If you are not experiencing that, then you have somehow worked around the degraded passthrough performance.

> On another note, should we retire this thread soon? It's getting a bit long and I don't want to discourage any future googlers, nor get too off topic :P

I think it is a good thing that we have a long email chain. I only wish I had run across a discussion like this when I was setting up my system.

~Casey
Hopefully a penultimate update. In short - I have ATI Secondary Passthrough working and seemingly stable today.

1) I have _no idea_ what, if anything, I configured differently compared to all the previous attempts over the past week. :(

2) Yes, it works fine on a dual socket system with NF200 PCIe bridges. NF200 is NOT a problem.

3) The only thing I can think of that I did differently today is that I eject the card immediately before running a game (testing with Borderlands 2). Eject the card, it immediately gets re-added, then launch the app. Hours of gaming without a single glitch.

4) No spurious AER PCIe errors like the ones manifesting with the Nvidia Quadro card.

If today's success sticks, I may just get a 7970 cometh pay day.

I'm going to try to figure out what, if anything other than resetting the card immediately before starting a 3D app, is different. But I'm most pleased that whatever the problem was, hardware compatibility wasn't it.

Thanks to all of you who helped and provided encouragement that saw to it that I persevere with the effort.

Gordan
Aloha!

Gordan, I've been following this thread, reading your very detailed and thorough analysis. Like you, I too am using an AMD Radeon (I have a 5870 and a 5750). I'm also using a dual socket Xeon (5440 on Supermicro X7DWA-N). I haven't gotten as far as you have though, not even close. I never got video to work, not even once. I had to stop working on getting Xen to work because I'm involved in several development projects at the moment and just didn't have the time, but I hope to get back to it soon.

Kindly, would it be possible for you to answer some of the following (it's a lot, so pick and choose, but I would like it best if I could reproduce your environment):

1) Could you post your grub.cfg, kernel configuration (from /boot), and your xm's domU configuration file?
2) What distro and kernel version are you running?
3) Did you compile the Xen kernel or the dom0 kernel with any patches or are they stock?
4) Did you compile qemu with any patches?
5) What version of Xen are you using? Still using the xm toolchain, yes?
6) Are you using the 'radeon' OSS driver with Kernel Mode Setting or the AMD proprietary driver?
7) Are you using pciback compiled into the kernel or as a module? How are you invoking it?
8) What version of Windows and what version of Catalyst are you using? Are you also using CCC?
9) Is the dom0 running Xorg? Is Xorg running on your primary (BIOS) or secondary card?
10) Which card did you pass through, the primary or secondary?

What I would ultimately like to do is create a LiveCD for anyone attempting this in the future that's configured to some extent to run AMD cards using PCI passthrough.

But having followed your posts, I'm just amazed that you got it working, and now, seemingly flawlessly. I just wish I could understand how.
On 05/15/2013 02:33 AM, Alex Karaoui wrote:
> 1) Could you post your grub.cfg, kernel configuration (from /boot),

My machine is diskless using NFS root and booting via PXE, but my pxe boot config file entry is here:

label xen
  kernel mboot.c32
  append xen.gz noreboot dom0_vcpus_pin --- vmlinuz-3.9.2-1.el6xen.x86_64 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM root=nfs:10.11.12.13:/nfsroot/normandy,rw,proto=tcp,noatime,nolock,nocto,actimeo=300 ip=eth0:dhcp selinux=0 intel_iommu=on elevator=deadline iomem=relaxed noirqbalance --- initramfs-3.9.2-1.el6xen.x86_64.img

(sorry about the line wraps, I'm sure you can figure it out)

You may or may not require noirqbalance - I found that without it the domU would reliably BSOD every time.

> and your xm's domU configuration file?

It is in xenstore, so I cannot easily get to it. You should be able to reverse engineer most of it from the attached xend.log. I created the VM using virt-manager.

> 2) What distro and kernel version are you running?

EL6 (Scientific Linux) with the kernel and xen packages from here:
http://xen.crc.id.au/support/guides/install/

Note: the /boot/xen.gz I use is from the xen-hypervisor-4.2.1-6 package. The rest of the xen stack (using xm) is 4.2.2.
This is because of this bug that is not yet resolved:
http://xen.crc.id.au/bugs/view.php?id=5

Versions of /boot/xen.gz newer than 4.2.1-6 don't work for me, and the problem has been tracked down to XSA-46.

> 3) Did you compile the Xen kernel or the dom0 kernel with any patches or are they stock?

Unmodified packages from the source mentioned above.

> 4) Did you compile qemu with any patches?

All packages are as pulled down from EL6 repositories or from the Xen EL6 kernels site.

> 5) What version of Xen are you using? Still using the xm toolchain, yes?

Yes, xm.

> 6) Are you using 'radeon' OSS driver with Kernel Mode Setting or the AMD proprietary driver?

My dom0 GPU is nvidia, using the Nvidia binary driver (a modified 295.75, for the reasons described here):
http://www.altechnative.net/2013/04/14/wquxga-a-k-a-omgwtf-ibm-t221-3840x2400-204dpi-monitor-part-6-regressing-drivers-and-xen/

I have not yet had a chance to write an LD_PRELOAD library that fakes out RandR geometry the same way the fakexinerama library does to enable sensible use of my T221.

> 7) Are you using pciback compiled into the kernel or as a module? How are you invoking it?

It's a module.

# cat /etc/modprobe.d/xen-pciback.conf
options xen-pciback permissive=1 hide=(00:1a.0)(00:1b.0)(00:1d.2)(02:00.0)(07:00.0)(07:00.1)

02:00.0 is a Marvell NIC (one of the two on-board ones), 07:00.? is the ATI card, and the rest are USB controllers.

The radeon driver is blacklisted.

Before I start the VM, I detach the devices from the host:

===
# cat usr/local/sbin/detach.sh
#!/bin/bash
modprobe xen-pciback
virsh nodedev-detach pci_0000_00_1a_0
virsh nodedev-detach pci_0000_00_1b_0
virsh nodedev-detach pci_0000_00_1d_2
virsh nodedev-detach pci_0000_02_00_0
virsh nodedev-detach pci_0000_07_00_0
virsh nodedev-detach pci_0000_07_00_1
===

If you are using radeon cards for both dom0 and domU you will have to do some additional magic because you can't just blacklist the radeon driver.
(Note: Modified from my nvidia based config when I was trying Nvidia 8800GT for dom0 and Quadro 2000 for domU, both using the same binary driver)

# cat etc/modprobe.d/radeon.conf
install radeon /usr/local/sbin/detach-radeon.sh; insmod /lib/modules/$(/bin/uname -r)/kernel/drivers/gpu/drm/radeon/radeon.ko

===
# cat usr/local/sbin/detach-radeon.sh
#!/bin/bash
modprobe xen-pciback
virsh nodedev-detach pci_0000_07_00_0
virsh nodedev-detach pci_0000_07_00_1
===

In this example, the passed through card is 07:00.?; adjust the IDs accordingly for your setup.

What this does is: when the radeon driver gets probed, it will first invoke the script above, which will detach the passthrough device from the host and bind it to the xen-pciback driver. This will make the radeon driver unable to bind the device, and it will thus only bind to the dom0 primary device, which will leave the secondary free for passthrough. You can omit the duplicated lines in detach.sh since there is no point in detaching the GPU twice (although I don't think there's any harm in doing so).

> 8) What version of Windows and what version of Catalyst are you using? Are you also using CCC?

Windows 7 in this test. Latest 13.4 ATI drivers. No CCC, just the bare driver.

> 9) Is the dom0 running Xorg? Is Xorg running on your primary (BIOS) or secondary card?

Yes, running on the BIOS primary GPU (Nvidia 8800GT).

> 10) Which card did you pass through, the primary or secondary?

Secondary, Radeon 6450.

> What I would ultimately like to do is create a LiveCD for anyone attempting this in the future that's configured to some extent to run AMD cards using PCI passthrough.

I don't think this is possible without at least some manual configuration. Specifically, the device PCI IDs to be detached and passed through will differ on every system.
Thankfully, when swapping cards around, the PCI ID (the bus:device.function address) is based on the slot the card is plugged into, so when replacing GPUs, as long as you put the new one in the same slot, you don't have to change any IDs in your configs.

> But having followed your posts, I'm just amazed that you got it working, and now, seemingly flawlessly. I just wish I could understand how.

Me too - I am more frustrated by the fact that it is working now than I was by the fact that it wasn't working before. The only definitive thing I can narrow it down to at the moment is ejecting the card to reset it just prior to starting a game. Then again, I half expect that tonight when I try it again, it will just BSOD on me all over the place, even though the configuration can't have changed since powering the machine off last night.

Gordan
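
Since the PCI addresses differ on every system, it may help to derive both spellings used in the configs above (the hide=(bus:dev.fn) form for xen-pciback and the pci_0000_bus_dev_fn node names for virsh nodedev-detach) from a single list. What follows is only a minimal sketch, not something from Gordan's setup: the function names hide_option, virsh_node_name and bound_driver are made up for illustration, PCI domain 0000 is assumed as in the configs above, and bound_driver relies on the usual sysfs layout in which /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Xen or libvirt); PCI domain 0000 is assumed.

# Build the value for xen-pciback's hide= option,
# e.g. "07:00.0 07:00.1" -> "(07:00.0)(07:00.1)".
hide_option() (
    out=""
    for bdf in "$@"; do
        out="${out}(${bdf})"
    done
    printf '%s\n' "$out"
)

# Convert bus:dev.fn into the libvirt node device name,
# e.g. "07:00.0" -> "pci_0000_07_00_0".
virsh_node_name() (
    printf 'pci_0000_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
)

# Print the name of the driver currently bound to a device, or "none".
# $1 = sysfs root (normally /sys), $2 = bus:dev.fn
bound_driver() (
    link="$1/bus/pci/devices/0000:$2/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
)
```

With these, the detach script could loop over the addresses, e.g. `for d in 07:00.0 07:00.1; do virsh nodedev-detach "$(virsh_node_name "$d")"; done`, and `bound_driver /sys 07:00.0` can then be used to confirm which driver holds the card (xen-pciback normally shows up in sysfs under the name pciback, but treat that as an assumption to verify on your own system).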
I believe I finally figured out what miraculously made things work: I reduced the domU memory from 8GB to 2GB.

With 8GB it BSODs at startup.

With 2GB it "just works".

Now on to fighting the more minor issues (none of my USB devices work after domU reboot, ejecting them doesn't help) and the issues that everybody else is having (BSOD on subsequent domU start-ups after shutting it down once).

Gordan
Aloha Gordan,

When you say "BSOD on subsequent start-ups," are you saying that on the boot after AMD driver installation your system BSODs? How do you manage to get around that, by simply restarting the host?
On 05/16/2013 01:17 AM, Alex Karaoui wrote:
> Aloha Gordan,
>
> When you say "BSOD on subsequent start-ups," are you saying that on the boot after AMD driver installation your system BSODs? How do you manage to get around that, by simply restarting the host?

What I am referring to is the following:

1) Boot up host.
2) Start up domU - works every time now with <= 2GB of RAM (with more: 3GB sometimes works, 4GB+ hardly ever gets to the login screen).
3) Shut down domU.
4) At this stage, starting up the domU usually works. Maybe half of the time it BSODs or loses the USB devices (they show up with a yellow exclamation mark in device manager - ejecting them doesn't help).
5) Repeat from 3). At this point things never get past a BSOD and the only way to get the domU working again is to reboot the host.

This should now probably become a couple of new threads:
1) VGA passthrough BSOD-ing with a domU with more than 2GB of RAM
2) domU reboots cause USB controllers to become unavailable

Gordan
Hello Gordan, Alex,

On Thu, May 16, 2013 at 2:25 AM, Gordan Bobic <gordan@bobich.net> wrote:
> What I am referring to is the following:
>
> 1) Boot up host.
> 2) Start up domU - works every time now with <= 2GB of RAM (with more: 3GB sometimes works, 4GB+ hardly ever gets to the login screen)

I thought I'd write in because I've seen this before. This behavior....

> 3) Shut down domU
> 4) At this stage, starting up the domU usually works. Maybe half of the time it BSODs or loses the USB devices (they show up with a yellow exclamation mark in device manager - ejecting them doesn't help).
> 5) Repeat from 3) At this point things never get past a BSOD and the only way to get the domU working again is to reboot the host.
>
> This should now probably become a couple of new threads:
> 1) VGA passthrough BSOD-ing with a domU with more than 2GB of RAM

...and more specifically, this behavior---I'm referring to a BSOD that precedes the login screen; before the video driver switches out of the Windows Boot Manager's VGA(?) mode---is eerily similar to a longstanding issue with passing through PCIe devices on ESXi. The workaround is to configure the VM with a custom PCI Hole mapping; I use "pciHole.start = 1100" and "pciHole.end = 2200", though pretty much any "total" (end minus start; 2200 - 1100 = 1100 in this case) exceeding 1024, or perhaps simply exceeding the memory total in MB on the PCIe device itself, just fixes the problem entirely.

I don't know for sure if they're related, but the kicker here is that the workaround is required when *and only when* passing through a PCIe video card to a VM that has more than 2GB of RAM assigned to it! The thread on their forums[1] where I first found this some time ago is ~44 pages long.

> 2) domU reboots cause USB controllers to become unavailable

<snippity-snip/>

I may be getting lost in the email chain, but I think that USB controllers are the one thing I've *never* had a problem with. I haven't ever tried attaching an onboard controller though, because every onboard controller I've come across is a regular PCI device... problems passing those are basically to be expected :(

If you want to throw money at that problem, grab a RocketU 1144A or B. The lspci entries for the device[2] are a passthrough-user's dream come true.

There's also the USB/IP project, but I haven't had a lot of luck with it. I actually think that it's an ideal solution to passing USB devices, as I've used commercial software that does the same, and it's good enough to connect audio, keyboard, and mouse to machines that are actually on physically different hosts with zero noticeable lag or packet loss. I'd love to see someone get the F/OSS version working... personally I'd buy the commercial software if it wasn't priced to gouge the crap out of a corporate wallet :(

-----------

Just my two cents of course! I wish I could be a little more helpful here. I seriously admire your persistence on this issue; I probably would have quit a week ago and just bought something else, and am really happy to see you making progress!

I'm currently ignoring my desktop... I just got a ThinkPad Helix tablet and am focusing on getting Xen to work on it for the time being.

Also, on a side note, I can confirm that my HD 4000 graphics does not support FLR... and appears to be a PCI device :(

Cheers,
Andrew
Apologies, top posting because my mail reader is truncating the previous message content when replying to it.

I was having issues passing through USB devices, so I figured I'd just pass the PCI devices that are the USB controllers associated with the relevant ports. I'll try passing just the USB devices again.

As for the PCI hole - is there a parameter to achieve this on xen? Or a patch for xen/qemu that fixes the problem?

Gordan
On May 16, 2013 2:38 PM, "Andrew Bobulsky" <rulerof@gmail.com> wrote:
> Also, on a side note, I can confirm that my HD 4000 graphics does not support FLR... and appears to be a PCI device :(

Hi,

HD4000 graphics is provided by the CPU and it works with primary VGA passthrough with the xm toolstack, running the Wheezy stock Linux kernel and a 4.1.x hypervisor.

"FLR-" in my personal experience doesn't mean much.

Regards,
Ricardo Jesus.
On Thu, 16 May 2013 14:47:09 +0100, Ricardo Jesus <ricardo.meb.jesus@gmail.com> wrote:
>> Also, on a side note, I can confirm that my HD 4000 graphics does not support FLR... and appears to be a PCI device :(
>
> HD4000 graphics is provided by the CPU and it works with primary VGA passthrough with the xm toolstack, running the Wheezy stock Linux kernel and a 4.1.x hypervisor.

Really? Why wouldn't it work for secondary passthrough? I have one of those in my microserver. It's not really up to the task of gaming (hell, it isn't even up to the task of HD video decoding and scaling it down to 1366x768), but it can take 16GB of RAM, so makes for a reasonably neat, dirt cheap, low-performance VM testing machine.

> "FLR-" in my personal experience doesn't mean much.

You have seen the mythical item that is a GPU with FLR? If you haven't, I'm not sure how you can make a reasonable experience-based comparison.

Gordan
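
For anyone wanting to check the FLR question on their own card: as the wiki page linked at the top of the thread describes, `lspci -vv` reports it, and for PCIe functions the DevCap line carries either FLReset+ or FLReset-. The following is only a rough sketch that reads lspci output on stdin; the function name flr_status is made up for illustration, and it only looks at the PCIe DevCap flag (a conventional PCI function may instead advertise FLR through the Advanced Features capability, which this does not check).

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `lspci -vv -s <bus:dev.fn>` on stdin
# and reports whether the PCIe DevCap line advertises Function Level Reset.
flr_status() {
    if grep -q 'FLReset+'; then
        echo "supported"
    else
        echo "not advertised"
    fi
}
```

Typical use would be `lspci -vv -s 07:00.0 | flr_status`, run as root, since lspci usually hides the capability list from unprivileged users.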
On Thu, May 16, 2013 at 9:31 AM, Andrew Bobulsky <rulerof@gmail.com> wrote:
> Hello Gordan, Alex,
>
> On Thu, May 16, 2013 at 2:25 AM, Gordan Bobic <gordan@bobich.net> wrote:
>> On 05/16/2013 01:17 AM, Alex Karaoui wrote:
>>> Aloha Gordon,
>>>
>>> When you say "BSOD on subsequent start-ups," are you saying that on
>>> the boot after AMD driver installation your system BSODs? How do you
>>> manage to get around that, by simply restarting the host?
>>
>> What I am referring to is the following:
>>
>> 1) Boot up host.
>> 2) Start up domU - works every time now with <= 2GB of RAM (with any
>>    more, 3GB sometimes works; 4GB+ hardly ever gets to the login screen)
>
> I thought I'd write in because I've seen this before. This behavior....
>
>> 3) Shut down domU
>> 4) At this stage, starting up the domU usually works. Maybe half of
>>    the time it BSODs or loses the USB devices (they show up with a
>>    yellow exclamation mark in device manager - ejecting them doesn't
>>    help).
>> 5) Repeat from 3) At this point things never get past a BSOD and the
>>    only way to get the domU working again is to reboot the host.
>>
>> This should now probably become a couple of new threads:
>> 1) VGA passthrough BSOD-ing with a domU with more than 2GB of RAM
>
> ...and more specifically, this behavior---I'm referring to a BSOD that
> precedes the login screen; before the video driver switches out of the
> Windows Boot Manager's VGA(?) mode---is eerily similar to a longstanding
> issue with passing through PCIe devices on ESXi. The workaround is to
> configure the VM with a custom PCI Hole mapping; I use "pciHole.start =
> 1100" and "pciHole.end = 2200", though pretty much any "total" exceeding
> either 1024 (or perhaps simply the memory total in MB on the PCIe device
> itself), where 2200-1100 in this case = 1100, just fixes the problem
> entirely.
>
> I don't know for sure if they're related, but the kicker here is that the
> workaround is required when *and only when* passing through a PCIe video
> card to a VM that has more than 2GB of RAM assigned to it! The thread on
> their forums[1] where I first found this some time ago is ~44 pages long.
>
>> 2) domU reboots cause USB controllers to become unavailable
>
> <snippity-snip/>
>
> I may be getting lost in the email chain, but I think that USB controllers
> are the one thing I've *never* had a problem with. I haven't ever tried
> attaching an onboard controller though, because every onboard controller
> I've come across is a regular PCI device... problems passing those are
> basically to be expected :(
>
> If you want to throw money at that problem, grab a RocketU 1144A or B.
> The lspci entries for the device[2] are a passthrough-user's dream come
> true.
>
> There's also the USB/IP project, but I haven't had a lot of luck with it.
> I actually think that it's an ideal solution to passing USB devices, as
> I've used commercial software that does the same, and it's good enough to
> connect audio, keyboard, and mouse to machines that are actually on
> physically different hosts with zero noticeable lag or packet loss. I'd
> love to see someone get the F/OSS version working... personally I'd buy
> the commercial software if it wasn't priced to gouge the crap out of a
> corporate wallet :(
>
> -----------
>
> Just my two cents of course! I wish I could be a little more helpful
> here. I seriously admire your persistence on this issue; I probably would
> have quit a week ago and just bought something else, and am really happy
> to see you making progress!
>
> I'm currently ignoring my desktop... I just got a ThinkPad Helix tablet
> and am focusing on getting Xen to work on it for the time being.
>
> Also, on a side note, I can confirm that my HD 4000 graphics does not
> support FLR... and appears to be a PCI device :(
>
> Cheers,
> Andrew

Oops! I meant to provide some references, but I started this message and
got summoned by the wife, then forgot about it :P

[1]: http://communities.vmware.com/thread/297072?start=0&tstart=0 - VMware PCIHole discussion
[2]: http://pastebin.com/raw.php?i=GHMj4W8e - lspci output for the RocketU 1144A
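The PCI hole arithmetic in the message above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `pci_hole_settings` is hypothetical, the 1024 threshold and the MB unit are taken from the message as reported rather than verified, and the quoted `key = "value"` form is an assumption about how the two settings would be written in the VM's configuration file.

```python
from typing import List


def pci_hole_settings(start_mb: int, end_mb: int,
                      min_total_mb: int = 1024) -> List[str]:
    """Return the two pciHole settings after checking the hole size.

    Hypothetical helper: the threshold comes from the message above,
    which reports that any "total" (end - start) exceeding 1024 worked.
    """
    total_mb = end_mb - start_mb
    if total_mb <= min_total_mb:
        raise ValueError(
            f"PCI hole of {total_mb} MB does not exceed {min_total_mb} MB"
        )
    # Assumed formatting: quoted values, one setting per line.
    return [f'pciHole.start = "{start_mb}"', f'pciHole.end = "{end_mb}"']


if __name__ == "__main__":
    # Values quoted in the message: 2200 - 1100 = 1100, which is > 1024.
    for line in pci_hole_settings(1100, 2200):
        print(line)
```

With the values from the message the check passes; a narrower range such as 1100 to 2000 (a 900 MB hole) would be rejected by this sketch.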
Hello again, hope this doesn't make it hard to read, I know mobile devices
like to top post :)

On Thu, May 16, 2013 at 9:42 AM, Gordan Bobic <gordan@bobich.net> wrote:
> Apologies, top posting because my mail reader is truncating that previous
> message content when replying to it.
>
> I was having issues passing through USB devices, so I figured I'd just
> pass the PCI devices that are the USB controllers associated with the
> relevant ports. I'll try passing just the USB devices again.

That's a reasonable thing to do, and I have tried it before; I'm just
saying that reliability may be hit or miss depending on a variety of
circumstances :P

> As for the PCI hole - is there a parameter to achieve this on xen? Or a
> patch for xen/qemu that fixes the problem?
>
> Gordan

With the PCI hole thing, when I first started trying to work with IOMMU on
an AMD system---though AFAIK the Intel vs. AMD thing doesn't make a
difference in VMware land for this particular issue---this workaround was
unique to VMware. It wasn't at all necessary for my Xen VMs on the same
physical hardware. I'm just seeing that, for you, this behavior is eerily
similar to a long-standing problem on VMware and may be worth looking at to
see if it's a problem here, too. :)

-Andrew
On Thu, May 16, 2013 at 9:47 AM, Ricardo Jesus <ricardo.meb.jesus@gmail.com> wrote:
> On May 16, 2013 2:38 PM, "Andrew Bobulsky" <rulerof@gmail.com> wrote:
>> Also, on a side note, I can confirm that my HD 4000 graphics does not
>> support FLR... and appears to be a PCI device :(
>
> Hi,
>
> HD4000 graphics is provided by the CPU and it works with primary VGA
> passthrough with the xm toolstack running the Wheezy stock Linux kernel
> and a 4.1.x hypervisor.
>
> FLR- in my personal experience doesn't mean much.
>
> Regards,
> Ricardo Jesus.

Ricardo,

Would you mind pasting or linking the output of an lspci -Q on that system?
I'd love to know the PCI layout you're working with!

Cheers,
Andrew
On Thu, May 16, 2013 at 9:51 AM, Gordan Bobic <gordan@bobich.net> wrote:
> On Thu, 16 May 2013 14:47:09 +0100, Ricardo Jesus
> <ricardo.meb.jesus@gmail.com> wrote:
>>> Also, on a side note, I can confirm that my HD 4000 graphics does
>>> not support FLR... and appears to be a PCI device :(
>>
>> HD4000 graphics is provided by the CPU and it works with primary VGA
>> passthrough with the xm toolstack running the Wheezy stock Linux kernel
>> and a 4.1.x hypervisor.
>
> Really? Why wouldn't it work for secondary passthrough?

I'd suspect that it *does* work for secondary passthrough. I'm under the
impression that working primary passthrough devices are a subset of working
secondary passthrough devices. I'd *love* to know if that's not the case,
though!

-Andrew

> I have one of those in my microserver. It's not really up to the task of
> gaming (hell, it isn't even up to the task of HD video decoding and
> scaling it down to 1366x768), but it can take 16GB of RAM, so makes
> for a reasonably neat, dirt cheap, low-performance VM testing machine.
>
>> FLR- in my personal experience doesn't mean much.
>
> You have seen the mythical item that is a GPU with FLR? If you
> haven't I'm not sure how you can make a reasonable experience
> based comparison.
>
> Gordan
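Since the discussion keeps coming back to whether a device reports FLR, here is a minimal Python sketch of how one might scan `lspci -vv` text for the `FLReset+` flag (PCIe device capabilities) or `FLR+` (Advanced Features capability). The sample text is hand-written for illustration and does not come from anyone's machine in this thread; the function name `flr_support` is hypothetical, and the sketch assumes device lines carry no PCI domain prefix. Capability details usually need root to be visible.

```python
import re

# Hand-written illustrative sample, NOT real output from this thread.
SAMPLE = """\
01:00.0 VGA compatible controller: Example GPU
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
03:00.0 Ethernet controller: Example NIC
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
"""

# A device header starts in column 0 with bus:device.function.
DEVICE_LINE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# FLReset+ (PCIe DevCap) or FLR+ (AF capability) marks FLR support.
FLR_FLAG = re.compile(r"\bFLReset\+|\bFLR\+")


def flr_support(lspci_vv: str) -> dict:
    """Map each device address to True if an FLR flag is set for it."""
    result = {}
    current = None
    for line in lspci_vv.splitlines():
        header = DEVICE_LINE.match(line)
        if header:
            current = header.group(1)
            result[current] = False
        elif current is not None and FLR_FLAG.search(line):
            result[current] = True
    return result


if __name__ == "__main__":
    for address, supported in flr_support(SAMPLE).items():
        print(address, "FLR+" if supported else "FLR-")
```

In practice the text would come from running `lspci -vv` on the host; the parsing here is deliberately loose and only looks for the flag anywhere under a device's entry.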
On Thu, May 16, 2013 at 2:23 PM, Andrew Bobulsky <rulerof@gmail.com> wrote:
> Ricardo,
>
> Would you mind pasting or linking the output of an lspci -Q on that
> system? I'd love to know the PCI layout you're working with!
>
> Cheers,
> Andrew

$ lspci -Q
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 7 Series/C210 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.6 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 (rev c4)
00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
00:1f.0 ISA bridge: Intel Corporation Q77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350 Series]
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
04:00.0 Network controller: Broadcom Corporation BCM43225 802.11b/g/n (rev 01)
05:00.0 Multimedia audio controller: Creative Labs SB0400 Audigy2 Value

I've documented the steps for Xen 4.1.3 Windows 8 HVM domU with Intel
HD4000 VGA Passthrough on Debian Wheezy at
http://linux-bsd-sharing.blogspot.pt/2012/10/howto-xen-413-windows-8-hvm-domu-with.html

Recently I've had my Radeon HD 7850 and HD 5450 running Windows 8 and
Debian Wheezy simultaneously, this time using PCI passthrough, without many
issues.
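Because the thread started with the question of passing a card's VGA and HDMI audio functions together, the listing above can be grouped by slot with a short Python sketch. The device lines are copied from the lspci output in this message; the helper names `functions_by_slot` and `xen_pci_line` are hypothetical, and the generated `pci = [ ... ]` line is meant as an example of a domU configuration entry, not as anyone's confirmed working config.

```python
from collections import defaultdict

# A few lines copied from the lspci listing in this message.
LISTING = """\
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350 Series]
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
"""


def functions_by_slot(listing: str) -> dict:
    """Group bus:device.function addresses by their bus:device slot."""
    slots = defaultdict(list)
    for line in listing.splitlines():
        if not line.strip():
            continue
        address = line.split(" ", 1)[0]      # e.g. "01:00.1"
        slot = address.rsplit(".", 1)[0]     # e.g. "01:00"
        slots[slot].append(address)
    return dict(slots)


def xen_pci_line(addresses: list) -> str:
    """Format addresses as a domU config 'pci' list entry."""
    quoted = ", ".join(f"'{a}'" for a in addresses)
    return f"pci = [ {quoted} ]"


if __name__ == "__main__":
    slots = functions_by_slot(LISTING)
    # Both functions of the HD 7850 (VGA plus HDMI audio) share slot 01:00.
    print(xen_pci_line(slots["01:00"]))
```

Grouping by slot makes it easy to see which functions belong to the same physical card before deciding whether to pass one or both to the guest.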
On Thu, May 16, 2013 at 2:26 PM, Andrew Bobulsky <rulerof@gmail.com> wrote:
> On Thu, May 16, 2013 at 9:51 AM, Gordan Bobic <gordan@bobich.net> wrote:
>> Really? Why wouldn't it work for secondary passthrough?
>
> I'd suspect that it *does* work for secondary passthrough. I'm under the
> impression that working primary passthrough devices are a subset of
> working secondary passthrough devices. I'd *love* to know if that's not
> the case, though!
>
> -Andrew

I've managed HD4000 with VGA passthrough. VGA passthrough implies that
HD4000 was the primary display device. Honestly, I don't recall using PCI
passthrough on the HD4000, because as soon as I got discrete graphics cards
I've been using the HD4000 for dom0.