Masami Hiramatsu (Google)
2025-Mar-19 12:13 UTC
Debugging PCIe Configuration Space Using mmiotrace
Hi,

On Tue, 18 Mar 2025 23:53:34 +0530
Naveen Kumar P <naveenkumar.parna at gmail.com> wrote:

> Hi all,
>
> I am currently debugging an issue on an x86 machine running the latest
> Linux kernel, involving a PCIe device whose memory is mapped via BAR0.
> I am encountering unexpected behavior when reading its PCI
> configuration space using lspci, and I am seeking guidance on whether
> mmiotrace can help diagnose the problem.

AFAIK, mmiotrace traces MMIO operations from the CPU side. It records what
data the driver writes where, and what data is read from where.

>
> Issue Summary:
> Expected Behavior After Boot:
> lspci -xxx -s 01:00.0 correctly displays valid PCI configuration space
> values, including a properly mapped BAR0.
>
> $ sudo lspci -xxx -s 01:00.0 | grep "10:"
> 10: 00 00 40 b0 00 00 00 00 00 00 00 00 00 00 00 00
>
>
> Unexpected Behavior After Uptime:
> After a few days, reading the PCI configuration space (lspci -xxx -s
> 01:00.0) sometimes returns all 0xffs for the entire config space.
> dmesg does not log any relevant errors.
>

Hmm, the problem below looks like a device-side issue (in particular, 0xffff
means the read on the PCI bus failed, IIRC.)

> $ sudo lspci -xxx -s 01:00.0
> 01:00.0 RAM memory: PLDA Device 5555 (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
>
> After Subsequent Reads:
> Re-running lspci -xxx -s 01:00.0 restores non-0xff values, but BAR0
> gets reset to zero.
>
> $ sudo lspci -xxx -s 01:00.0
> 01:00.0 RAM memory: PLDA Device 5555
> 00: 56 15 55 55 00 00 10 00 00 00 00 05 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 10 00 02 00 c2 8f 00 00 10 28 01 00 21 f4 03 00
> 70: 00 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
> 90: 20 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> This suggests that some function or driver is resetting BAR0 during or
> after a failed config space read.
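
To catch the moment this happens over several days of uptime, it may help to
classify each captured dump automatically. The following is only a minimal
Python sketch, assuming the `lspci -xxx` text has already been captured into a
string; the helper names (`parse_lspci_hexdump`, `is_all_ff`, `bar0`) are
hypothetical and are not part of pciutils or the kernel.

```python
# Sketch only: all helper names here are hypothetical illustrations.

def parse_lspci_hexdump(text):
    """Map config-space offset -> byte from `lspci -xxx` lines like '10: 00 00 40 b0 ...'."""
    data = {}
    for line in text.splitlines():
        # Tolerate mail-quoted lines that start with "> ".
        line = line.strip().lstrip(">").strip()
        offset, sep, rest = line.partition(": ")
        if not sep:
            continue
        try:
            base = int(offset, 16)
            values = [int(tok, 16) for tok in rest.split()]
        except ValueError:
            # Not a hex row, e.g. the "01:00.0 RAM memory: ..." header line.
            continue
        for i, value in enumerate(values):
            data[base + i] = value
    return data


def is_all_ff(data):
    """True if at least one byte was parsed and every parsed byte is 0xff."""
    return bool(data) and all(v == 0xFF for v in data.values())


def bar0(data):
    """BAR0 is the little-endian 32-bit register at config offset 0x10."""
    return int.from_bytes(bytes(data[0x10 + i] for i in range(4)), "little")


if __name__ == "__main__":
    healthy = "10: 00 00 40 b0 00 00 00 00 00 00 00 00 00 00 00 00"
    dump = parse_lspci_hexdump(healthy)
    print("all 0xff:", is_all_ff(dump), "BAR0:", hex(bar0(dump)))
```

With the healthy row quoted above, `bar0()` decodes to 0xb0400000, which agrees
with the first resource of the `PCIDEV 0100` entry in the mmiotrace header.
Logging this result with a timestamp at a fixed interval would at least narrow
down when the device stops responding and when BAR0 becomes zero.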
>
>
> mmiotrace Setup & Results:
> I have enabled mmiotrace and verified it is active:
> # cat /sys/kernel/tracing/available_tracers
> hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
>
> # cat current_tracer
> mmiotrace
>
> However, trace_pipe and trace logs remain empty even after reproducing
> the issue:
>
> # cat trace_pipe
> VERSION 20070824
> PCIDEV 0000 80860f00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 iosf_mbi_pci
> PCIDEV 0010 80860f31 61 b0000000 0 a0000008 0 e081 0 c0002 400000 0
> 10000000 0 8 0 20000 i915
> PCIDEV 0098 80860f23 5b e071 e061 e051 e041 e021 b0b17000 0 8 4 8 4 20
> 800 0 ahci
> PCIDEV 00a0 80860f35 5a b0b00004 0 0 0 0 0 0 10000 0 0 0 0 0 0 xhci_hcd
> PCIDEV 00b8 80860f50 17 b0b16000 b0b15000 0 0 0 0 0 1000 1000 0 0 0 0
> 0 sdhci-pci
> PCIDEV 00d0 80860f18 62 b0900000 b0800000 0 0 0 0 0 100000 100000 0 0
> 0 0 0 mei_txe
> PCIDEV 00d8 80860f04 16 b0b10004 0 0 0 0 0 0 4000 0 0 0 0 0 0 snd_hda_intel
> PCIDEV 00e0 80860f48 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00e2 80860f4c 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00e3 80860f4e 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00f8 80860f1c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 lpc_ich
> PCIDEV 00fb 80860f12 12 b0b14000 0 0 0 e001 0 0 20 0 0 0 20 0 0 i801_smbus
> PCIDEV 0100 15565555 b b0400000 0 0 0 0 0 0 400000 0 0 0 0 0 0
> PCIDEV 0300 80861533 13 b0a00000 0 d001 b0a80000 0 0 0 80000 0 20 4000 0 0 0 igb

Note that once you read the `trace_pipe` file, the trace data is consumed and
erased (technically, it is not erased, but you cannot access it anymore.)

>
> cat trace
> # tracer: mmiotrace
> #
> # entries-in-buffer/entries-written: 0/0   #P:1
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |

Thus, after reading `trace_pipe`, the `trace` file is expected to be empty.
If you want to read the data multiple times, always read the `trace` file
instead.

>
>
> Request for Assistance:
> Can mmiotrace help determine the root cause of why reading the PCI
> configuration space results in all 0xffs?

As I said, this looks like a device-side or bus-side issue. mmiotrace may not
help directly, but it can help you explain to the hardware people what the
software is doing.

Thank you,

>
> Is there a way to determine what function or driver is clearing BAR0
> when the values are restored?
>
> If mmiotrace is suitable for this, how can I properly capture the
> relevant trace data to analyze this issue?
>
> Any insights or suggestions would be greatly appreciated. Please let
> me know if you
> need more details.
>
> Best regards,
> Naveen

-- 
Masami Hiramatsu (Google) <mhiramat at kernel.org>