Hi Ilia Thanks for taking the time to reply. On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin at alum.mit.edu> wrote:> > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect?All the BAR allocations seem to succeed except for the IO one. AIUI I/O is deprecated, but is it still used on these cards? [ 1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie at 7d500000 ranges: [ 1.060892] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie at 7d500000, using [bus 00-ff] [ 1.060975] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x063fffffff -> 0x00c0000000 [ 1.061061] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000 [ 1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC) [ 1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00 [ 1.110159] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.110187] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff]) [ 1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400 [ 1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot [ 1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000 [ 1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff] [ 1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref] [ 1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref] [ 1.114551] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f] [ 1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref] [ 1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link) [ 1.115022] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none [ 1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300 [ 1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff] [ 1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 1.119120] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60bffffff 64bit pref] [ 1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff] [ 1.119183] pci 0000:01:00.0: BAR 1: assigned [mem 0x600000000-0x607ffffff 64bit pref] [ 1.119235] pci 0000:01:00.0: BAR 3: assigned [mem 0x608000000-0x609ffffff 64bit pref] [ 1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff] [ 1.119317] pci 0000:01:00.0: BAR 6: assigned [mem 0x60d000000-0x60d07ffff pref] [ 1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff] [ 1.119376] pci 0000:01:00.0: BAR 5: no space for [io size 0x0080] [ 1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io size 0x0080] [ 1.119426] pci 0000:00:00.0: PCI bridge to [bus 01] [ 1.119456] pci 0000:00:00.0: bridge window [mem 0x60c000000-0x60d7fffff] [ 1.119484] pci 0000:00:00.0: bridge window [mem 0x600000000-0x60bffffff 64bit pref] [ 1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0 Dave> On Wed, Oct 28, 2020, 9:46 AM Dave Stevenson <dave.stevenson at raspberrypi.com> wrote: >> >> Hi >> >> Seeing as we (Raspberry Pi) have just launched the Compute Module 4 >> with an exposed PCIe x1 lane, people are asking about adding graphics >> cards. >> >> Seeing as you are the people who have the knowledge with regard to >> NVidia and nouveau, what are your immediate thoughts of nouveau >> working on ARM/ARM64? Is there a chance of this working? I'm no PCIe >> expert, although I can call on some expertise :-/ >> >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up >> in the memset of nvkm_instobj_new whilst initialising the BAR >> subdevice [2], having gone through the "No such luck" path in >> nvkm_mmu_ptc_get [3]. >> >> Taking the naive approach of simply removing the memset, I get through >> initialising all the subdevices, but again die in a location I >> currently haven't pinpointed. The last logging messages are: >> [ 1023.407302] nouveau 0000:01:00.0: fifo: one-time init completed in 760us >> [ 1023.407312] nouveau 0000:01:00.0: fifo: init completed in 775us >> [ 1023.407325] nouveau: DRM-master:00000000:80009009: init running... >> [ 1023.407329] nouveau: DRM-master:00000000:80009009: init children... >> [ 1023.407333] nouveau: DRM-master:00000000:80009009: init completed in 4us >> [ 1023.407352] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at >> 00000000000d1000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason >> 02 [PTE] on channel -1 [007fd38000 unknown] >> [ 1023.407354] nouveau: DRM-master:00000000:00000000: ioctl: return 0 >> [ 1023.407385] nouveau: DRM-master:00000000:00000000: ioctl: size 32 >> [ 1023.407392] nouveau: DRM-master:00000000:00000000: ioctl: vers 0 >> type 01 object ffffff80ee8c2170 owner ff >> [ 1023.407415] nouveau: DRM-master:00000000:80009009: ioctl: sclass size 8 >> [ 1023.407419] nouveau: DRM-master:00000000:80009009: ioctl: sclass >> vers 0 count 0 >> [ 1023.407432] nouveau: DRM-master:00000000:00000000: ioctl: return 0 >> [ 1023.407452] nouveau: DRM-master:00000000:00000000: ioctl: size 48 >> [ 1023.407459] nouveau: DRM-master:00000000:00000000: ioctl: vers 0 >> type 01 object ffffff80ee8c2170 owner ff >> [ 1023.407482] nouveau: DRM-master:00000000:80009009: ioctl: sclass size 24 >> [ 1023.407485] nouveau: DRM-master:00000000:80009009: ioctl: sclass >> vers 0 count 2 >> [ 1023.407498] nouveau: DRM-master:00000000:00000000: ioctl: return 0 >> [ 1023.407519] nouveau: DRM-master:00000000:00000000: ioctl: size 48 >> >> Any input very welcome, otherwise I'll continue blundering about >> slightly in the dark. >> >> Thanks >> Dave >> >> [1] https://www.amazon.co.uk/ASUS-GT710-4H-SL-2GD5-GeForce-Multi-Monitor-Productivity/dp/B0897T6PYM/ >> [2] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/base.c#L114 >> [3] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c#L201 >> _______________________________________________ >> Nouveau mailing list >> Nouveau at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau
On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson <dave.stevenson at raspberrypi.com> wrote:> > Hi Ilia > > Thanks for taking the time to reply. > > On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > > > > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect? > > All the BAR allocations seem to succeed except for the IO one. > AIUI I/O is deprecated, but is it still used on these cards?I must admit I was ignorant of the fact that the IO ports were treated as a BAR, but it makes a lot of sense. One thing does stand out as odd:> > [ 1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie at 7d500000 ranges: > [ 1.060892] brcm-pcie fd500000.pcie: No bus range found for > /scb/pcie at 7d500000, using [bus 00-ff] > [ 1.060975] brcm-pcie fd500000.pcie: MEM > 0x0600000000..0x063fffffff -> 0x00c0000000 > [ 1.061061] brcm-pcie fd500000.pcie: IB MEM > 0x0000000000..0x00ffffffff -> 0x0100000000 > [ 1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC) > [ 1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00 > [ 1.110159] pci_bus 0000:00: root bus resource [bus 00-ff] > [ 1.110187] pci_bus 0000:00: root bus resource [mem > 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff]) > [ 1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400 > [ 1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot > [ 1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus > 00-00]), reconfiguring > [ 1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000 > [ 1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff] > [ 1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff > 64bit pref] > [ 1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff > 64bit pref] > [ 1.114551] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f] > [ 1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref] > [ 1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, > limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 > Gb/s with 8.0 GT/s PCIe x8 link) > [ 1.115022] pci 0000:01:00.0: vgaarb: VGA device added: > decodes=io+mem,owns=none,locks=none > [ 1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300 > [ 1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff] > [ 1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 > [ 1.119120] pci 0000:00:00.0: BAR 9: assigned [mem > 0x600000000-0x60bffffff 64bit pref] > [ 1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]This is your brcm-pcie device.> [ 1.119183] pci 0000:01:00.0: BAR 1: assigned [mem > 0x600000000-0x607ffffff 64bit pref] > [ 1.119235] pci 0000:01:00.0: BAR 3: assigned [mem > 0x608000000-0x609ffffff 64bit pref] > [ 1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]And this is the NVIDIA device. Note that these memory windows are identical (or at least overlapping). I must admit almost complete ignorance of PCIe and whether this is OK, but it seems sketchy at first glance. A quick eyeballing of my x86 system suggests that all PCIe devices get non-overlapping windows. OTOH there are messages further up about some sort of remapping going on, so perhaps it's OK? But two things on the same bus still shouldn't have the same addresses allocated, based on my (limited) understanding. In case it's an option, could you "unplug" the NIC (not just not load its driver, but make it not appear at all on the PCI bus)?> [ 1.119317] pci 0000:01:00.0: BAR 6: assigned [mem > 0x60d000000-0x60d07ffff pref]Never heard of a BAR6 on NVIDIA. Probably just my ignorance though.> [ 1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff] > [ 1.119376] pci 0000:01:00.0: BAR 5: no space for [io size 0x0080] > [ 1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io size 0x0080] > [ 1.119426] pci 0000:00:00.0: PCI bridge to [bus 01] > [ 1.119456] pci 0000:00:00.0: bridge window [mem 0x60c000000-0x60d7fffff] > [ 1.119484] pci 0000:00:00.0: bridge window [mem > 0x600000000-0x60bffffff 64bit pref] > [ 1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0Back to your original issue:> > Dave > > > On Wed, Oct 28, 2020, 9:46 AM Dave Stevenson <dave.stevenson at raspberrypi.com> wrote: > >> > >> Hi > >> > >> Seeing as we (Raspberry Pi) have just launched the Compute Module 4 > >> with an exposed PCIe x1 lane, people are asking about adding graphics > >> cards. > >> > >> Seeing as you are the people who have the knowledge with regard to > >> NVidia and nouveau, what are your immediate thoughts of nouveau > >> working on ARM/ARM64? Is there a chance of this working? I'm no PCIe > >> expert, although I can call on some expertise :-/ > >> > >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up > >> in the memset of nvkm_instobj_new whilst initialising the BAR > >> subdevice [2], having gone through the "No such luck" path in > >> nvkm_mmu_ptc_get [3]. > >> > >> Taking the naive approach of simply removing the memset, I get through > >> initialising all the subdevices, but again die in a location I > >> currently haven't pinpointed. The last logging messages are:That's not a winning strategy, I'm afraid. You need to figure out why the memset is blowing up. The simplest explanation is "it's trying to write to an I/O resource but that resource wasn't allocated", hence my question about BARs. But something's not mapped, or mapped in the wrong way, or whatever. If you can't write to it at that point in time, you won't be able to write to it later either. I would focus on that. FWIW this all does "work" on ARM in general -- the Jetson TK1/TX1/T(latest)1 are NVIDIA ARM devices with onboard *platform* devices (not PCI) which work OK. So the core of nouveau works (or at least worked) fine. Just some of the PCIe glue may be off. Cheers, -ilia
Hi Ilia Thanks again for the reply. On Wed, 28 Oct 2020 at 14:59, Ilia Mirkin <imirkin at alum.mit.edu> wrote:> > On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson > <dave.stevenson at raspberrypi.com> wrote: > > > > Hi Ilia > > > > Thanks for taking the time to reply. > > > > On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > > > > > > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect? > > > > All the BAR allocations seem to succeed except for the IO one. > > AIUI I/O is deprecated, but is it still used on these cards? > > I must admit I was ignorant of the fact that the IO ports were treated > as a BAR, but it makes a lot of sense. > > One thing does stand out as odd: > > > > > [ 1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie at 7d500000 ranges: > > [ 1.060892] brcm-pcie fd500000.pcie: No bus range found for > > /scb/pcie at 7d500000, using [bus 00-ff] > > [ 1.060975] brcm-pcie fd500000.pcie: MEM > > 0x0600000000..0x063fffffff -> 0x00c0000000 > > [ 1.061061] brcm-pcie fd500000.pcie: IB MEM > > 0x0000000000..0x00ffffffff -> 0x0100000000 > > [ 1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC) > > [ 1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00 > > [ 1.110159] pci_bus 0000:00: root bus resource [bus 00-ff] > > [ 1.110187] pci_bus 0000:00: root bus resource [mem > > 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff]) > > [ 1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400 > > [ 1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot > > [ 1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus > > 00-00]), reconfiguring > > [ 1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000 > > [ 1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff] > > [ 1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff > > 64bit pref] > > [ 1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff > > 64bit pref] > > [ 1.114551] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f] > > [ 1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref] > > [ 1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, > > limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 > > Gb/s with 8.0 GT/s PCIe x8 link) > > [ 1.115022] pci 0000:01:00.0: vgaarb: VGA device added: > > decodes=io+mem,owns=none,locks=none > > [ 1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300 > > [ 1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff] > > [ 1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 > > [ 1.119120] pci 0000:00:00.0: BAR 9: assigned [mem > > 0x600000000-0x60bffffff 64bit pref] > > [ 1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff] > > This is your brcm-pcie device. > > > [ 1.119183] pci 0000:01:00.0: BAR 1: assigned [mem > > 0x600000000-0x607ffffff 64bit pref] > > [ 1.119235] pci 0000:01:00.0: BAR 3: assigned [mem > > 0x608000000-0x609ffffff 64bit pref] > > [ 1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff] > > And this is the NVIDIA device. Note that these memory windows are > identical (or at least overlapping). I must admit almost complete > ignorance of PCIe and whether this is OK, but it seems sketchy at > first glance. A quick eyeballing of my x86 system suggests that all > PCIe devices get non-overlapping windows. OTOH there are messages > further up about some sort of remapping going on, so perhaps it's OK? > But two things on the same bus still shouldn't have the same addresses > allocated, based on my (limited) understanding.I've raised this with colleagues and it seems that this is normal. The PCI bridge reports the window through which devices can be mapped, and all devices have to fit within that. Pass as to whether that is a quirk of ARM or this particular bridge. I do note that on my x86 systems device 0000:00:00.0 is reported by lspci as a "Host bridge" instead of a "PCI bridge". On an Ubuntu VM I've got running, I do get [ 0.487249] PCI host bridge to bus 0000:00 [ 0.487252] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] [ 0.487254] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] [ 0.487256] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] [ 0.487258] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfdffffff window] [ 0.487260] pci_bus 0000:00: root bus resource [bus 00-ff] and all device allocations are from within those ranges, so I'm not convinced it's that different.> In case it's an option, could you "unplug" the NIC (not just not load > its driver, but make it not appear at all on the PCI bus)?NIC? The network interface is totally separate. Or is this another reuse of a TLA? Unplugging the GPU means the PCI bus reports as being down and I get no output at all from lspci.> > [ 1.119317] pci 0000:01:00.0: BAR 6: assigned [mem > > 0x60d000000-0x60d07ffff pref] > > Never heard of a BAR6 on NVIDIA. Probably just my ignorance though.You're ahead of me in knowledge, so I really don't know. When I get a chance I'll have a look again on an x86 system to see what it reports.> > [ 1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff] > > [ 1.119376] pci 0000:01:00.0: BAR 5: no space for [io size 0x0080] > > [ 1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io size 0x0080] > > [ 1.119426] pci 0000:00:00.0: PCI bridge to [bus 01] > > [ 1.119456] pci 0000:00:00.0: bridge window [mem 0x60c000000-0x60d7fffff] > > [ 1.119484] pci 0000:00:00.0: bridge window [mem > > 0x600000000-0x60bffffff 64bit pref] > > [ 1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0 > > Back to your original issue: > > > > > Dave > > > > > On Wed, Oct 28, 2020, 9:46 AM Dave Stevenson <dave.stevenson at raspberrypi.com> wrote: > > >> > > >> Hi > > >> > > >> Seeing as we (Raspberry Pi) have just launched the Compute Module 4 > > >> with an exposed PCIe x1 lane, people are asking about adding graphics > > >> cards. > > >> > > >> Seeing as you are the people who have the knowledge with regard to > > >> NVidia and nouveau, what are your immediate thoughts of nouveau > > >> working on ARM/ARM64? Is there a chance of this working? I'm no PCIe > > >> expert, although I can call on some expertise :-/ > > >> > > >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up > > >> in the memset of nvkm_instobj_new whilst initialising the BAR > > >> subdevice [2], having gone through the "No such luck" path in > > >> nvkm_mmu_ptc_get [3]. > > >> > > >> Taking the naive approach of simply removing the memset, I get through > > >> initialising all the subdevices, but again die in a location I > > >> currently haven't pinpointed. The last logging messages are: > > That's not a winning strategy, I'm afraid. You need to figure out why > the memset is blowing up. The simplest explanation is "it's trying to > write to an I/O resource but that resource wasn't allocated", hence my > question about BARs. But something's not mapped, or mapped in the > wrong way, or whatever. If you can't write to it at that point in > time, you won't be able to write to it later either. I would focus on > that.I did say it was the naive approach :-) I was trying to gauge how much effort was going to be needed to get this going. Was it going to blow up in 1, 10, or 100 places? It feels like it is only a couple of things that are wrong, so there is hope. Slightly annoyingly something more urgent has come up and I need to shelve my experimentation for now, but thanks for the pointers. At least I have some idea of where to start looking when time allows.> FWIW this all does "work" on ARM in general -- the Jetson > TK1/TX1/T(latest)1 are NVIDIA ARM devices with onboard *platform* > devices (not PCI) which work OK. So the core of nouveau works (or at > least worked) fine. Just some of the PCIe glue may be off.It certainly feels like PCIe glue being off, but I was so far in the dark it was worth the quick email to ask the question, and I'm very grateful for the response. Thanks again, Dave