On Sun, Jun 03, 2012 at 05:31:32PM +1000, aorchis@gmail.com wrote:
> Hi Jeremy and Konrad,

CC-ing xen-devel.

>
> Basically the driver NVIDIA provided is a binary blob and recent
> versions does not work with the PAT layout of XEN so it falls back to
> MTRR to provide write combining (please correct me if I'm wrong).

OK? Which is still OK. Are you using a v3.4 kernel with an up-to-date
NVidia driver? I've had reports that it works OK.

> However there is no MTRR support on XEN so the driver hard crashed my
> machine (I can't ssh into the box anymore).
>
> Moreover, there is problem with the open source driver 'nouveau' for
> NVIDIA card (also has something to do with PAT layout of XEN) which
> causes memory corruption.

Huh? Can you point me to a bugzilla please? There was a corruption
issue where you can pass in 'nopat' on the command line.

>
> I found several patches for XEN which supposedly provide basic MTRR
> support for XEN however there is still no /proc/mtrr. Jeremy, can you
> tell me if you had been able to get /proc/mtrr on XEN dom0?
>
> Thanks for your time.
>
> Damien.
>
>
> On Sun, Jun 3, 2012 at 5:37 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > On 06/02/2012 03:13 AM, aorchis@gmail.com wrote:
> >> Hi Jeremy,
> >>
> >> Is there any way I can get back MTRR support in XEN in 3.0 kernel? To
> >> make a long story short, NVIDIA binary driver rejects PAT in XEN and
> >> it falls back to using MTRR but MTRR in XEN was taken out a long time
> >> ago so now there's no way to get the NVIDIA binary blob running under
> >> a linux XEN dom0. I was about to tear my hair out looking for
> >> solutions high and low.
> >
> > Hi!
> >
> > Firstly, Konrad is probably the person you should send this to these
> > days, since I'm not managing to get much Xen stuff done.
> >
> > Secondly, hm. Unfortunately, the changes we did have to integrate Xen's
> > MTRR machinery with Linux have been solidly rejected by the upstream
> > maintainers several times, so I think its unlikely that they will ever
> > make it into the mainline kernel. And it doesn't seem to have really
> > made a difference because PAT does subsume MTRR for at least all the in
> > kernel users, as far as I know.
> >
> > What do you mean by "[the] NVIDIA binary driver rejects PAT in XEN and
> > it falls back to using MTRR"? Why does the Nvidia driver reject PAT?
> > Perhaps addressing that would be a more profitable way of getting this
> > working. In the past we've talked about changing Xen's PAT mapping to
> > match the kernel's (or make it configurable), but for now we're
> > remapping between the PAT schemes in the pte pvops. If the NVIDIA
> > driver is using that mechanism to set ptes (as it must to get anywhere
> > in a pvops kernel), then it should be fine with the remapping. Or its
> > possible they're having problems with reading a pte back and mapping
> > from Xen->Linux PAT formats, which is a problem some of the in-kernel
> > drivers also had. Konrad, how did that turn out in the end?

Attic. I've turned it off since we had corruption issues (the WC didn't
turn back into WB b/c of page_attr using the pte_flag instead of pte_var).
Peter was talking about some software PAT lookup code but I hadn't
focused on that. There are also some performance numbers to run and collect.

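For readers following the PAT discussion: the mismatch Jeremy describes comes down to which memory type each of the eight PAT slots selects. Below is a minimal sketch that decodes an IA32_PAT MSR value (MSR 0x277) into its slot types. The function name decode_pat is made up for illustration and is not part of any Xen or Linux tool. The type encodings and the power-on default value used in the usage note are the architectural ones from the Intel SDM. The values that Xen and the dom0 kernel actually program are not given in this thread, so they would have to be read from the system being debugged.

```shell
# decode_pat: print the memory type selected by each of the 8 PAT slots.
# Hypothetical helper for illustration only.
# Usage: decode_pat 0x<64-bit IA32_PAT value>
decode_pat() (
    msr=$(( $1 ))                 # accepts hex or decimal
    i=0
    while [ "$i" -lt 8 ]; do
        # Each slot occupies one byte; only the low 3 bits encode the type.
        code=$(( (msr >> (i * 8)) & 7 ))
        case "$code" in
            0) name=UC ;;         # uncacheable
            1) name=WC ;;         # write-combining
            4) name=WT ;;         # write-through
            5) name=WP ;;         # write-protected
            6) name=WB ;;         # write-back
            7) name=UC- ;;        # uncached-minus
            *) name=reserved ;;
        esac
        printf 'PAT%d=%s\n' "$i" "$name"
        i=$(( i + 1 ))
    done
)
```

Running `decode_pat 0x0007040600070406` (the power-on default) prints WB, WT, UC-, UC for slots 0 to 3 and the same again for slots 4 to 7. Decoding the value the dom0 kernel expects next to the value Xen programs would show which slots differ, and therefore which page-table bit combinations need the remapping mentioned above.
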
On Wed, Jun 6, 2012 at 2:17 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Sun, Jun 03, 2012 at 05:31:32PM +1000, aorchis@gmail.com wrote:
>> Hi Jeremy and Konrad,
>
> CC-ing xen-devel.
>
>>
>> Basically the driver NVIDIA provided is a binary blob and recent
>> versions does not work with the PAT layout of XEN so it falls back to
>> MTRR to provide write combining (please correct me if I'm wrong).
>
> OK? Which is still OK. Are you using a v3.4 kernel with an up-to-date
> NVidia driver? I've had reports that it works OK.

I briefly tried kernel 3.4 to see whether the problem is fixed, but it is not.
I used a v3.4 kernel with NVIDIA driver v295.49 and with a beta version
(v302); neither worked.

When the nvidia module is loaded, it prints an error message:
"NVRM: PAT configuration unsupported, falling back to MTRRs."

When I launch Xorg, the screen goes blank, the monitors power down, and my
box hard crashes; I have to hold the power button to turn it off.

This only happens in dom0 under XEN; if I run my dom0 by itself, the
nvidia module loads fine.

>> However there is no MTRR support on XEN so the driver hard crashed my
>> machine (I can't ssh into the box anymore).
>>
>> Moreover, there is problem with the open source driver 'nouveau' for
>> NVIDIA card (also has something to do with PAT layout of XEN) which
>> causes memory corruption.
>
> Huh? Can you point me to a bugzilla please? There was a corruption
> issue where you can pass in 'nopat' on the command line.

Yes, that is the issue I was referring to. I had to pass "nopat" on the
kernel command line (via GRUB) to fix it.

On Thu, Jun 07, 2012 at 08:49:37PM +1000, aorchis@gmail.com wrote:
> On Wed, Jun 6, 2012 at 2:17 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Sun, Jun 03, 2012 at 05:31:32PM +1000, aorchis@gmail.com wrote:
> >> Hi Jeremy and Konrad,
> >
> > CC-ing xen-devel.
> >
> >>
> >> Basically the driver NVIDIA provided is a binary blob and recent
> >> versions does not work with the PAT layout of XEN so it falls back to
> >> MTRR to provide write combining (please correct me if I'm wrong).
> >
> > OK? Which is still OK. Are you using a v3.4 kernel with an up-to-date
> > NVidia driver? I've had reports that it works OK.
>
> I briefly tried kernel 3.4 to see if the problem is fixed but it's not.
> I used v3.4 kernel with NVIDIA driver v295.49 and a beta version
> (v302) but both didn't work.
>
> When the nvidia module is loaded, it prints out an error message:
> "NVRM: PAT configuration unsupported, falling back to MTRRs."
>
> When I launch Xorg, the screen turns blank then the monitors powered down and my
> box hard crashed, I had to hold the power button to turn it off.

OK, let's CC Ben - he might have more up-to-date information. Also,
you might want to set up a serial console to capture the kernel log and
see where it crashes.

>
> This only happens in dom0 under XEN, if I run my dom0 by itself then
> the nvidia module loads fine.
>
> >> However there is no MTRR support on XEN so the driver hard crashed my
> >> machine (I can't ssh into the box anymore).
> >>
> >> Moreover, there is problem with the open source driver 'nouveau' for
> >> NVIDIA card (also has something to do with PAT layout of XEN) which
> >> causes memory corruption.
> >
> > Huh? Can you point me to a bugzilla please? There was a corruption
> > issue where you can pass in 'nopat' on the command line.
>
> Yes, that is the issue I was referring to, I had to pass "nopat" to
> GRUB in order
> to fix it.

OK, so there is a fallback for you.

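The serial console suggestion above could look roughly like the following on a system that boots Xen through GRUB 2. This is only a sketch under assumptions: it assumes GRUB 2 with its Xen helper script (which reads the two variables below from /etc/default/grub), a first serial port (COM1) wired to another machine, and 115200 baud. None of these details come from the thread, and a GRUB legacy setup would need the same options added to the `kernel` and `module` lines by hand instead.

```shell
# /etc/default/grub fragment (assumption: GRUB 2 with Xen support).
# Assumes COM1 at 115200 baud, 8 data bits, no parity, 1 stop bit.

# Options for the Xen hypervisor: enable COM1, send the Xen console to both
# serial and VGA, and log everything from Xen and the guests.
GRUB_CMDLINE_XEN_DEFAULT="com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all"

# Options for the dom0 kernel: hvc0 is the Xen virtual console, which Xen
# forwards to the serial port configured above.
GRUB_CMDLINE_LINUX_DEFAULT="console=hvc0 earlyprintk=xen"
```

After regenerating the GRUB configuration and rebooting, a terminal program on the machine at the other end of the serial cable should show both the Xen and dom0 kernel messages, including whatever is printed just before the hard crash.
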
Hi all,

I can confirm this issue on Gentoo with kernel 3.4.0.
In my case I can't use nouveau, because a driver problem keeps the fan at
100% all the time.
With the nvidia driver the kernel doesn't oops, but I always get a black
screen.
If it helps, I can retrieve dmesg and the kernel log over ssh.

My video card is a GTS 250.

Regards,
geaaru

On Thu Jun 7 2012 05:58:14 PM CEST, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Thu, Jun 07, 2012 at 08:49:37PM +1000, aorchis@gmail.com wrote:
> > When the nvidia module is loaded, it prints out an error message:
> > "NVRM: PAT configuration unsupported, falling back to MTRRs."
> >
> > When I launch Xorg, the screen turns blank then the monitors powered
> > down and my box hard crashed, I had to hold the power button to turn
> > it off.
>
> Ok, lets CC Ben - he might have more up-to-date information. Also
> you might want to setup a serial console to capture the kernel to see
> where it crashes.

On Thu, Jun 7, 2012 at 11:58 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Jun 07, 2012 at 08:49:37PM +1000, aorchis@gmail.com wrote:
>> I briefly tried kernel 3.4 to see if the problem is fixed but it's not.
>> I used v3.4 kernel with NVIDIA driver v295.49 and a beta version
>> (v302) but both didn't work.
>>
>> When the nvidia module is loaded, it prints out an error message:
>> "NVRM: PAT configuration unsupported, falling back to MTRRs."
>>
>> When I launch Xorg, the screen turns blank then the monitors powered down and my
>> box hard crashed, I had to hold the power button to turn it off.
>
> Ok, lets CC Ben - he might have more up-to-date information. Also
> you might want to setup a serial console to capture the kernel to see
> where it crashes.
>

Sorry - we ended up having to abandon the closed source driver: despite
compiling it with IGNORE_XEN_PRESENCE set, I was never able to get it to
work with the 3.2 Linux kernel.

Since we had reasonably good luck with the nouveau drivers, and our
OpenGL use case is pretty basic, nouveau ended up being sufficient for
our needs.

I'd have to do some email archaeology to find the specific failure,
but it sounds very similar to the problem described above.

I apologize that I'm not a lot of help here.

/btg

These were the failures I was seeing:
nvnews.net/vbulletin/showthread.php?t=174219

Going back through old xen-devel emails, the XID failure looks like it has
been happening for about a year, at least.

I have not tried kernels newer than 3.2.

On Thu, Jun 7, 2012 at 7:39 PM, Ben Guthro <ben@guthro.net> wrote:
> Sorry - we ended up needing to abandon using the closed source driver,
> as despite compiling it with IGNORE_XEN_PRESENCE set, I was never able
> to get it to work with the 3.2 linux kernel.
>
> Since we had reasonably good luck with the nouveau drivers, and our
> OpenGL use case is pretty basic - it ended up being sufficient for our
> needs.
>
> I'd have to do some email archeology to see what the specific failure
> was...but it sounds very similar to the problem desribed above.

On Thu, Jun 07, 2012 at 11:01:48PM +0200, geaaru wrote:
> Hi at all,
>
> I confirm this issue on gentoo with kernel 3.4.0.

Can you try cherry-picking these two patches from stable/for-x86-3.3:

4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
f474007a0761d0ecb6b84ceaf4f97f4f1de92038

and reverting 8eaffa67b43e99ae581622c5133e20b0f48bcef1?

The easiest way is to do this:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
cd xen
git cherry-pick 4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
git cherry-pick f474007a0761d0ecb6b84ceaf4f97f4f1de92038
git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1

and then build the kernel, install it, and so on.

[What you are doing is removing the band-aid for the PAT
issue and adding in code that allows PAT to work]

Hi,
thanks for the reply.

I tested this kernel version, but there is still an issue.

# dmesg | grep PAT
[ 11.098566] NVRM: PAT configuration unsupported, falling back to MTRRs.

# uname -a
Linux localhost 3.4.0+ #1 SMP PREEMPT Sat Jun 9 16:37:11 CEST 2012 x86_64 Pentium(R) Dual-Core CPU E5200 @ 2.50GHz GenuineIntel GNU/Linux

In the X log I see:

[ 389.551] (II) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[ 389.551] (II) NVIDIA(0): recover...

and in dmesg:

[ 389.548750] NVRM: Xid (0000:01:00): 6, PE0001
[ 389.555804] NVRM: Xid (0000:01:00): 6, PE0001
[ 392.363472] NVRM: Xid (0000:01:00): 6, PE007e
[ 392.365793] NVRM: Xid (0000:01:00): 6, PE007e
[ 392.368036] NVRM: Xid (0000:01:00): 6, PE007e
[ 392.370266] NVRM: Xid (0000:01:00): 6, PE007e
[ 392.372486] NVRM: Xid (0000:01:00): 6, PE007e
[ 392.374720] NVRM: Xid (0000:01:00): 6, PE007e

/proc/mtrr doesn't exist.

For PAT I have this:

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
uncached-minus @ 0xcfee0000-0xcfee1000
uncached-minus @ 0xcfee3000-0xcfee8000
uncached-minus @ 0xcfee8000-0xcfee9000
write-combining @ 0xd0000000-0xd0001000
uncached-minus @ 0xe0000000-0xe4000000
uncached-minus @ 0xe0008000-0xe0009000
uncached-minus @ 0xe0100000-0xe0101000
write-combining @ 0xe4000000-0xe5000000
uncached-minus @ 0xe6000000-0xe7000000
uncached-minus @ 0xe6060000-0xe6061000
uncached-minus @ 0xe6640000-0xe6641000
uncached-minus @ 0xe6647000-0xe6648000
uncached-minus @ 0xe6647000-0xe6648000
uncached-minus @ 0xe6c02000-0xe6c03000
uncached-minus @ 0xea010000-0xea011000
uncached-minus @ 0xea100000-0xea101000
uncached-minus @ 0xea200000-0xea204000
uncached-minus @ 0xea204000-0xea205000
uncached-minus @ 0xea205000-0xea206000
uncached-minus @ 0xfed1f000-0xfed20000
uncached-minus @ 0xfed1f000-0xfed20000

On Fri, 2012-06-08 at 11:30 -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 07, 2012 at 11:01:48PM +0200, geaaru wrote:
> > Hi at all,
> >
> > I confirm this issue on gentoo with kernel 3.4.0.
>
> Can you try cherry-pick these two patches from stable/for-x86-3.3:
> 4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
>
> f474007a0761d0ecb6b84ceaf4f97f4f1de92038
>
> and revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1.
>
> The easiest way is to do this:
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> cd xen
> git cherry-pick 4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
> git cherry-pick f474007a0761d0ecb6b84ceaf4f97f4f1de92038
> git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>
> and then build the kernel and install it and such.
>
> [What you are doing is removing the band-aid for the PAT
> issue and adding in code that allows PAT to work]
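
A listing like the pat_memtype_list output in this message is easier to compare between a Xen boot and a bare-metal boot once it is summarised. The sketch below is one possible way to do that. The helper names summarize_memtypes and list_wc_ranges are made up for illustration. It assumes only the "type @ start-end" line format visible in the listing, and reading the real file needs root and a mounted debugfs.

```shell
# summarize_memtypes: count PAT memtype entries per cache type.
# Reads the pat_memtype_list format from stdin, or from a file given as $1.
# Hypothetical helper for illustration only.
summarize_memtypes() {
    # Entry lines look like: "write-combining @ 0xe4000000-0xe5000000".
    # The "PAT memtype list:" header has no "@" in field 2 and is skipped.
    awk '$2 == "@" { count[$1]++ } END { for (t in count) print t, count[t] }' "$@" | sort
}

# list_wc_ranges: print only the address ranges mapped write-combining.
list_wc_ranges() {
    awk '$1 == "write-combining" && $2 == "@" { print $3 }' "$@"
}
```

For example, `list_wc_ranges /sys/kernel/debug/x86/pat_memtype_list` on the listing above would print the two write-combining ranges, 0xd0000000-0xd0001000 and 0xe4000000-0xe5000000. Running the same commands on a bare-metal boot of the same kernel and comparing the two outputs with diff would show whether the driver's mappings differ under Xen.
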