Hi all,

I already posted about this problem on xen-users some time ago (http://markmail.org/message/sbgtyjqh6bzmqx4s), but I couldn't resolve it with the help I got there, so I'm posting here.

I have a problem with enabling IOMMU on Xen 4.2.1. When I enable it in the BIOS and in grub.conf using the iommu=1 option on the kernel line, my machine cannot boot. I get the following error on the serial console:

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at pci_amd_iommu.c:35
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Is it a bug in Xen, or maybe a bug in the BIOS?

I run CentOS 6.3 with dom0 kernel 3.7.1 and Xen from the http://au1.mirror.crc.id.au/repo/el6/x86_64/ repository, but I also tried other 3.x and 2.6.32 kernels and different Xen builds with no luck.

GRUB entry:
title CentOS Xen kernel IOMMU serial console (3.7.1-3.el6xen.x86_64)
root (hd0,0)
kernel /xen.gz dom0_mem=1G,max:1G dom0_max_vcpus=1 dom0_vcpus_pin iommu=verbose loglvl=all guest_loglvl=all iommu=1 com1=38400,8n1 console=com1
module /vmlinuz-3.7.1-3.el6xen.x86_64 ro root=/dev/mapper/vg_titan_raid5-lv_titan_root rd_LVM_LV=vg_titan_raid5/lv_titan_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_titan_raid5/lv_titan_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=hvc0 earlyprintk=xen nomodeset
module /initramfs-3.7.1-3.el6xen.x86_64.img

Hardware info:
Motherboard: ASUS M4A89TD PRO USB3 (AMD 890FX chipset, reported to work with IOMMU on Xen wiki)
CPU: AMD Phenom II X6 1045T

Software info:
OS: CentOS 6.3 64bit
Xen: 4.2.1
BIOS version: 3029 (up to date, also tried with older versions)

Detailed information:
Full serial output: http://pastebin.com/raw.php?i=K1DuhDcj
xl info (when booting with iommu=0): http://pastebin.com/raw.php?i=jU7bEFrN
lspci -vvv: http://pastebin.com/raw.php?i=3wpKPQT9
dmidecode: http://pastebin.com/raw.php?i=7wEcTXzr
kernel config: http://pastebin.com/raw.php?i=zYgGZ84f

Please help.

povder

----- povder@gmail.com wrote:
> GRUB entry:
> title CentOS Xen kernel IOMMU serial console (3.7.1-3.el6xen.x86_64)
> root (hd0,0)
> kernel /xen.gz dom0_mem=1G,max:1G dom0_max_vcpus=1
> dom0_vcpus_pin iommu=verbose loglvl=all guest_loglvl=all iommu=1
> com1=38400,8n1 console=com1

Try adding the "iommu=debug" option; it will print more information, including a dump of the ACPI table that describes the IOMMU.

-boris

2013/2/12 Boris Ostrovsky <boris.ostrovsky@oracle.com>:
> Try adding "iommu=debug" option --- it will print more information including
> dump of the ACPI table that describes IOMMU.

Thanks for the quick reply.
Here is the output with iommu=debug: http://pastebin.com/raw.php?i=1wwLw82c

>>> On 12.02.13 at 07:26, povder <povder@gmail.com> wrote:
> Thanks for a quick reply.
> Here is the output with iommu=debug: http://pastebin.com/raw.php?i=1wwLw82c

There's no boot failure in that log, so please clarify what this was generated from.

Also, sadly, the debug output isn't really helpful, which is why the patch that supposedly fixes your boot problem also adjusts what gets printed here (for the unlikely case that your problem persists with that patch: http://lists.xen.org/archives/html/xen-devel/2013-02/msg00408.html).

Jan

2013/2/12 Jan Beulich <JBeulich@suse.com>:
> There's no boot failure in that log, so please clarify what this was
> generated from.

I don't know if I understood you correctly, but to me this log does contain a boot failure. Maybe I misunderstand the term (I'm not a native English speaker), but when a machine cannot start I call that a "boot failure". This log is the full serial output of my machine's startup, which ends in a failure, so in my opinion it contains a boot failure. Correct me if I'm wrong.

> Also, sadly, the debug output isn't really helpful, which is why
> the patch that supposedly fixes your boot problem also adjusts
> what gets printed here (for the unlikely case that your problem
> persists with that patch:
> http://lists.xen.org/archives/html/xen-devel/2013-02/msg00408.html).

So, to make sure: I should apply the patch from http://lists.xen.org/archives/html/xen-devel/2013-02/msg00408.html and that should resolve my problem? Or is there a planned release of the next Xen version that will already contain this patch? If a release is planned for the near future, I would prefer to wait for it rather than apply the patch myself and compile Xen from sources.

>>> On 12.02.13 at 11:55, povder <povder@gmail.com> wrote:
> This log is a full serial output of start process of my machine
> which ends up with a failure so in my opinion it contains a boot
> failure - correct me if I'm wrong.

Oh, I'm sorry, I didn't look all the way through to the end of the log, as I expected the crash to happen before Dom0 even starts.

> So to ensure myself: I should apply a patch from
> http://lists.xen.org/archives/html/xen-devel/2013-02/msg00408.html
> and that should resolve my problem? Or maybe is there a planned release
> of next Xen version that will already contain this patch? If a release
> is planned to be in near future I would prefer to wait for it rather
> than apply the patch myself and compile Xen from sources.

With the above, the patch is unlikely to address your problem, but will likely provide better debugging output. So please nevertheless try building with that patch included, assuming the problem first started after you built Xen from a recent 4.2-testing tree (as opposed to this being plain 4.2.1, in which case the problem is obviously unrelated to the recent changes I'm thinking of).

Jan

2013/2/12 Jan Beulich <JBeulich@suse.com>:
> With the above, the patch is unlikely to address your problem,
> but will likely provide better debugging output. So please
> nevertheless try building with that patch included, assuming
> the problem first started after you built Xen from a recent
> 4.2-testing tree (as opposed to this being plain 4.2.1, in which
> case the problem is obviously unrelated to the recent changes
> I'm thinking of).

I haven't built Xen myself; I use binaries from the http://au1.mirror.crc.id.au/repo/el6/x86_64/ repository, and I guess the builds in this repository are from plain 4.2.1. xl info (when I boot with iommu disabled) shows:

xen_major : 4
xen_minor : 2
xen_extra : .1

I started using Xen when 4.2.1 was already released, so I have had this problem from the beginning. I can try with 4.2-testing though.

>>> On 12.02.13 at 12:15, povder <povder@gmail.com> wrote:
> I haven't built Xen myself, I use binaries from
> http://au1.mirror.crc.id.au/repo/el6/x86_64/ repository and I guess
> that builds in this repository are from plain 4.2.1.
> xl info (when I boot with iommu disabled) shows:
> xen_major : 4
> xen_minor : 2
> xen_extra : .1
>
> I just started using Xen when 4.2.1 already was released so this
> problem appeared to me from the beginning. I can try with 4.2-testing
> though.

No, there's no point I'm afraid. We really need to analyze the debugging output to first understand what's missing.

Jan

>>> On 12.02.13 at 12:22, "Jan Beulich" <JBeulich@suse.com> wrote:
> No, there's no point I'm afraid. We really need to analyze the
> debugging output to first understand what's missing.

All there is for bus 7 is

(XEN) AMD-Vi: IVHD Device Entry:
(XEN) AMD-Vi: Type 0x2
(XEN) AMD-Vi: Dev_Id 0x700
(XEN) AMD-Vi: Flags 0x0

i.e. a single device at 07:00.0, yet from the register dump at the crash it's fairly clear that we're talking about 07:00.1 here. I'm afraid only a firmware update can help you here (or passing "iommu=off" to Xen); in particular I can't see how we could work around that problem in software.

Jan

On 02/12/2013 06:29 AM, Jan Beulich wrote:
> All there is for bus 7 is
>
> (XEN) AMD-Vi: IVHD Device Entry:
> (XEN) AMD-Vi: Type 0x2
> (XEN) AMD-Vi: Dev_Id 0x700
> (XEN) AMD-Vi: Flags 0x0
>
> i.e. a single device at 07:00.0, yet from the register dump at the
> crash it's fairly clear that we're talking about 07:00.1 here. I'm
> afraid only a firmware update can help you here (or passing
> "iommu=off" to Xen); in particular I can't see how we could work
> around that problem in software.

I don't see any devices on bus 7 in the lspci output (http://pastebin.com/raw.php?i=3wpKPQT9 from the original report).

However, the log shows

pci 0000:07:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
..
(XEN) PCI add device 0000:07:00.0

-boris

2013/2/12 Boris Ostrovsky <boris.ostrovsky@oracle.com>:
> I don't see any devices on bus 7 in lspci output
> (http://pastebin.com/raw.php?i=3wpKPQT9 from original report).
>
> However the log shows
>
> pci 0000:07:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it
> with 'pcie_aspm=force'
> ..
> (XEN) PCI add device 0000:07:00.0

There is a device 00:07.0 in the lspci output from the original report:

00:07.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port G) (prog-if 00 [Normal decode])

>>> On 12.02.13 at 16:50, povder <povder@gmail.com> wrote:
> There is device 00:07.0 in lspci output from original report:
> 00:07.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to
> PCI bridge (PCI express gpp port G) (prog-if 00 [Normal decode])

But we're seeing a device reported as 07:00.0; we don't care about the one at 00:07.0.

You ought to explain where this device comes from, or why your lspci output doesn't show it. Perhaps handing us a native kernel boot log (at maximum log level) might already help...

Jan

2013/2/12 Jan Beulich <JBeulich@suse.com>:
> But we're seeing a device reported as 07:00.0; we don't care
> about the one at 00:07.0.
>
> You ought to explain where this device comes from, or why your
> lspci output doesn't show it. Perhaps handing us a native kernel
> boot log (at maximum log level) might already help...

Sorry, I confused 00:07.0 with 07:00.0. I'll post some more info soon.

2013/2/12 Jan Beulich <JBeulich@suse.com>:
> You ought to explain where this device comes from, or why your
> lspci output doesn't show it. Perhaps handing us a native kernel
> boot log (at maximum log level) might already help...

The original lspci -vvv output I posted was from a time when I had Firewire disabled in the BIOS; I guess I have reset the settings since then, because I was trying different BIOS versions. With Firewire enabled, 06:00.0 is the Firewire controller instead of the SATA controller, 07:00.0 is the SATA controller and 07:00.1 is the IDE interface. So I guess boot always fails on the 07:00.1 (or 06:00.1) IDE interface: JMicron Technology Corp. JMB361 AHCI/IDE (rev 02) (prog-if 85 [Master SecO PriO]).

If I disable Firewire, boot fails with:

(XEN) PCI add device 0000:00:18.4
(XEN) PCI add device 0000:06:00.0
(XEN) Xen BUG at pci_amd_iommu.c:35
(XEN) ----[ Xen-4.2.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48014afd2>] find_iommu_for_device+0x32/0x40
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000601 rbx: 0000000000000601 rcx: ffff83042c980010

If I enable Firewire, boot fails with:

(XEN) PCI add device 0000:00:18.4
(XEN) PCI add device 0000:07:00.0
(XEN) Xen BUG at pci_amd_iommu.c:35
(XEN) ----[ Xen-4.2.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48014afd2>] find_iommu_for_device+0x32/0x40
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000701 rbx: 0000000000000701 rcx: ffff83042c980010

full lspci -vvv with Firewire enabled (with IDE interface as 07:00.1): http://pastebin.com/raw.php?i=V7YqxNYD
boot log with Firewire enabled (posted earlier): http://pastebin.com/raw.php?i=1wwLw82c
full lspci -vvv with Firewire disabled (posted earlier, with IDE interface as 06:00.1): http://pastebin.com/raw.php?i=3wpKPQT9
boot log with Firewire disabled: http://pastebin.com/raw.php?i=LhaN4XeK
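
A note for readers decoding the register dumps: the rax/rbx values 0x601 and 0x701 appear to be the 16-bit device ID (bus/device/function) that find_iommu_for_device() was called with, which is how 06:00.1 and 07:00.1 can be read off the crash output. A minimal sketch of that decoding, assuming the standard PCI layout of 8 bus bits, 5 device bits and 3 function bits; the helper names are illustrative and are not taken from the Xen sources:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI BDF packing: bus in bits 15..8, device in 7..3, function in 2..0.
 * These helper names are hypothetical, not Xen's own macros. */
static inline unsigned bdf_bus(uint16_t bdf)  { return bdf >> 8; }
static inline unsigned bdf_dev(uint16_t bdf)  { return (bdf >> 3) & 0x1f; }
static inline unsigned bdf_func(uint16_t bdf) { return bdf & 0x07; }

/* Write the lspci-style "bb:dd.f" form into buf (8 bytes are enough). */
static void bdf_format(uint16_t bdf, char *buf, size_t len)
{
    snprintf(buf, len, "%02x:%02x.%x",
             bdf_bus(bdf), bdf_dev(bdf), bdf_func(bdf));
}
```

With these helpers, the Dev_Id 0x700 from the IVHD dump formats as 07:00.0, while the 0x701 in rax formats as 07:00.1, i.e. one function higher than anything the table declares for that bus.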

In the BIOS, I disabled the IDE interface that was causing problems, and the system boots fine. Thanks for your help!

I just wonder whether it's a bug in the BIOS or in Xen. If it's an ASUS bug, I would like to report it to them.

----- povder@gmail.com wrote:
> I disabled id BIOS IDE interface that was causing problems and system
> boots fine. Thanks for your help!
>
> I just wonder if it's a bug in BIOS or in Xen. If it's ASUS bug I
> would like to report bug to them.

This looks like a BIOS bug: there is no entry for the IDE interface in the IVRS table (which is used by the IOMMU driver to discover devices).

I am wondering whether such cases (undeclared devices in IVRS) should cause a panic or disabling of the IOMMU. This may be a more generic case of Jan's earlier patch for dealing with a missing IOAPIC. Not sure whether it would be possible to "unwind" the IOMMU at this point though.

(For the record, I asked povder to run with a xen-unstable build that I provided to him, because for some reason I thought this might be a combined mode problem. Obviously this had nothing to do with combined mode.)

-boris

>>> On 12.02.13 at 19:40, povder <povder@gmail.com> wrote:
> I disabled id BIOS IDE interface that was causing problems and system
> boots fine. Thanks for your help!
>
> I just wonder if it's a bug in BIOS or in Xen. If it's ASUS bug I
> would like to report bug to them.

Quite obviously it's a BIOS bug, failing to cover all devices in the IVRS table. Whether we can do any better than crashing in that case is a different question - I wonder how native Linux with IOMMU enabled does in that situation...

Jan

>>> On 13.02.13 at 02:33, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> I am wondering whether such cases (undeclared devices in IVRS) should cause
> a panic or disabling of IOMMU. This may be a more generic case of Jan's earlier
> patch for dealing with missing IOAPIC. Not sure whether it would be possible
> to "unwind" IOMMU at this point though.

I doubt it - this would likely cause further problems down the road. Instead, with us doing a bus scan anyway (as of 4.1), we could detect the problem much earlier (and only bug on devices that we don't find but Dom0 does - in particular, I'm wondering how SR-IOV VFs would get dealt with here).

Furthermore I wonder whether in the single IOMMU case we couldn't deal with this on the same basis as is being done (sort of unintendedly) for the IO-APIC case: The single IOMMU in the system _must_ be the one responsible for any device not reported by the firmware. That would deal with povder's case, and experience tells us that more complex (read: expensive) systems tend to have fewer ACPI table flaws (i.e. wouldn't suffer from not being covered by such a workaround).

Jan

2013/2/13 Jan Beulich <JBeulich@suse.com>:
> I wonder how native Linux with
> IOMMU enabled does in that situation...

I can try it today if you want. What kernel option should I use to enable the IOMMU? "iommu=force"?

>>> On 13.02.13 at 09:28, povder <povder@gmail.com> wrote:
> I can try it today if you want. What kernel option should i use to
> enable iommu? "iommu=force"?

Looks like, unlike Intel's, AMD's IOMMU gets turned on by default, independent of any configuration settings. So providing a native kernel boot log (at maximum log level) ought to suffice (assuming of course you don't have any command line options in place to _disable_ the IOMMU).

Jan

2013/2/13 Jan Beulich <JBeulich@suse.com>:
> I wonder how native Linux with
> IOMMU enabled does in that situation...

Here is the full boot log of the latest CentOS stable kernel: http://pastebin.com/raw.php?i=RnrMFXqf
I had to set amd_iommu=on and amd_iommu_dump (to dump the ACPI table), both undocumented kernel options.

The interesting part, in my opinion:

calling pci_iommu_init+0x0/0x21 @ 1
AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
AMD-Vi: mmio-addr: 00000000f6000000
AMD-Vi: DEV_SELECT_RANGE_START devid: 00:00.0 flags: 00
AMD-Vi: DEV_RANGE_END devid: 00:00.2
AMD-Vi: DEV_SELECT devid: 00:04.0 flags: 00
AMD-Vi: DEV_SELECT devid: 06:00.0 flags: 00
AMD-Vi: DEV_SELECT devid: 00:06.0 flags: 00
AMD-Vi: DEV_SELECT devid: 05:00.0 flags: 00
AMD-Vi: DEV_SELECT devid: 00:07.0 flags: 00
AMD-Vi: DEV_SELECT devid: 04:00.0 flags: 00
AMD-Vi: DEV_SELECT devid: 00:0b.0 flags: 00
AMD-Vi: DEV_SELECT_RANGE_START devid: 03:00.0 flags: 00
AMD-Vi: DEV_RANGE_END devid: 03:00.1
AMD-Vi: DEV_SELECT devid: 00:0d.0 flags: 00
AMD-Vi: DEV_SELECT devid: 02:00.0 flags: 00
AMD-Vi: DEV_SELECT devid: 00:11.0 flags: 00
AMD-Vi: DEV_SELECT_RANGE_START devid: 00:12.0 flags: 00
AMD-Vi: DEV_RANGE_END devid: 00:12.2
AMD-Vi: DEV_SELECT_RANGE_START devid: 00:13.0 flags: 00
AMD-Vi: DEV_RANGE_END devid: 00:13.2
AMD-Vi: DEV_SELECT devid: 00:14.0 flags: d7
AMD-Vi: DEV_SELECT devid: 00:14.1 flags: 00
AMD-Vi: DEV_SELECT devid: 00:14.2 flags: 00
AMD-Vi: DEV_SELECT devid: 00:14.3 flags: 00
AMD-Vi: DEV_SELECT devid: 00:14.4 flags: 00
AMD-Vi: DEV_ALIAS_RANGE devid: 01:00.0 flags: 00 devid_to: 00:14.4
AMD-Vi: DEV_RANGE_END devid: 01:1f.7
AMD-Vi: DEV_SELECT devid: 00:14.5 flags: 00
AMD-Vi: DEV_SELECT_RANGE_START devid: 00:16.0 flags: 00
AMD-Vi: DEV_RANGE_END devid: 00:16.2
alloc irq_desc for 55 on node 0
alloc kstat_irqs on node 0
IOAPIC[1]: Set routing entry (7-31 -> 0x79 -> IRQ 55 Mode:1 Active:1)
pci 0000:00:00.2: PCI INT A -> GSI 55 (level, low) -> IRQ 55
alloc irq_desc for 56 on node 0
alloc kstat_irqs on node 0
pci 0000:00:00.2: irq 56 for MSI/MSI-X
AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
AMD-Vi: Initialized for Passthrough Mode
AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
initcall pci_iommu_init+0x0/0x21 returned 0 after 569797 usecs

I don't see the 06:00.1 device, the one Xen crashes on, anywhere in the IOMMU enabling process.

lspci output: http://pastebin.com/raw.php?i=3wpKPQT9

>>> On 13.02.13 at 19:21, povder <povder@gmail.com> wrote:
> Here is full boot log of latest centos stable kernel:
> http://pastebin.com/raw.php?i=RnrMFXqf
> I had to set amd_iommu=on and amd_iommu_dump (to dump ACPI table) -
> undocumented kernel options.
>
> I don't see 06:00.1 device in IOMMU enabling process on which Xen crashes.

So the problem appears to be that this device has a BDF higher than any known one. Could you therefore try whether the patch below allows the system to come up?

Jan

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -32,8 +32,8 @@ struct amd_iommu *find_iommu_for_device(
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
 
-    BUG_ON ( bdf >= ivrs_bdf_entries );
-    return ivrs_mappings ? ivrs_mappings[bdf].iommu : NULL;
+    return ivrs_mappings && bdf < ivrs_bdf_entries ? ivrs_mappings[bdf].iommu
+                                                   : NULL;
 }
 
 /*

>>> On 13.02.13 at 19:21, povder <povder@gmail.com> wrote:
> I don't see 06:00.1 device in IOMMU enabling process on which Xen crashes.
>
> lspci output: http://pastebin.com/raw.php?i=3wpKPQT9

This is really odd: The "iommu=debug" output you made available shows that while there are further devices that have no associated IOMMU, the bus scan done in the hypervisor didn't even find a device at 06:00.1. Which I see possible only in two ways: Either the device becomes visible on the bus only when the driver for 06:00.0 loads (and is otherwise detectable only by other means, e.g. ACPI), or 06:00.0 doesn't have the multi function device flag properly set. That latter aspect could be checked by looking at the raw (hex) config space dump of 06:00.0.

Boris, one other thought I had in this context: Is it really possible for functions on the same (non-bridge) device to be serviced by different IOMMUs? If not, find_iommu_for_device() could simply look for function 0 if nothing is known about the passed in function.

Jan
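
For anyone wanting to do the config space check Jan describes: in the standard PCI configuration header, the header type byte sits at offset 0x0e, and bit 7 of it marks a multi-function device. A minimal sketch of that test on a raw dump of function 0; the function and macro names are made up for illustration, and the dump is assumed to hold at least the first 64 bytes of config space:

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bit positions from the standard PCI configuration header.
 * The macro and function names here are hypothetical. */
#define CFG_HEADER_TYPE_OFFSET  0x0e
#define CFG_HEADER_TYPE_MULTI   0x80

/* cfg: raw config space bytes of function 0 (at least 64 bytes, assumed).
 * Returns true if the device advertises more than one function. */
static bool cfg_is_multifunction(const uint8_t *cfg)
{
    return (cfg[CFG_HEADER_TYPE_OFFSET] & CFG_HEADER_TYPE_MULTI) != 0;
}
```

A hex dump such as the one lspci -x prints for 06:00.0 could be fed into this; if the bit is clear, a bus scan that honours the flag would never probe 06:00.1.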

On 02/14/2013 06:29 AM, Jan Beulich wrote:
> This is really odd: The "iommu=debug" output you made available
> shows that while there are further devices that have no associated
> IOMMU, the bus scan done in the hypervisor didn't even find a
> device at 06:00.1. Which I see possible only in two ways: Either
> the device becomes visible on the bus only when the driver for
> 06:00.0 loads (and is otherwise detectable only by other means,
> e.g. ACPI), or 06:00.0 doesn't have the multi function device flag
> properly set. That latter aspect could be checked by looking at
> the raw (hex) config space dump of 06:00.0.

If I read this correctly, Linux enables multi-functionness (?):

http://lxr.linux.no/#linux+v3.7.7/drivers/pci/quirks.c#L1494

So you are probably right. The BIOS does not enumerate 06:00.1 in IVRS because it doesn't see it enabled yet.

> Boris, one other thought I had in this context: Is it really possible
> for functions on the same (non-bridge) device to be serviced
> by different IOMMUs?

I can't see how this may be possible: the IOMMU is the PCIe root complex, and any downstream device can only send transactions through its root. (I hope I am using the right terminology.)

> If not, find_iommu_for_device() could simply
> look for function 0 if nothing is known about the passed in function.

Yes, this could work. But with a warning in the log.

-boris
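
The function 0 fallback being discussed could look roughly like the sketch below. This is only a standalone model of the idea, not the patch that was eventually written: the struct and function names are hypothetical stand-ins for Xen's struct amd_iommu, struct ivrs_mappings and find_iommu_for_device(), and the warning goes to stderr rather than through Xen's logging.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for Xen's struct amd_iommu / struct ivrs_mappings. */
struct iommu_model   { int id; };
struct mapping_model { struct iommu_model *iommu; };

/* Look up the IOMMU for a 16-bit BDF. If the firmware table says nothing
 * about this function, fall back to function 0 of the same device and warn. */
static struct iommu_model *lookup_iommu(const struct mapping_model *map,
                                        size_t entries, uint16_t bdf)
{
    uint16_t fn0 = bdf & ~0x7u;   /* same bus and device, function 0 */

    if (map == NULL)
        return NULL;

    /* Normal case: the table covers this exact function. */
    if (bdf < entries && map[bdf].iommu != NULL)
        return map[bdf].iommu;

    /* Fallback: reuse whatever is known about function 0. */
    if (fn0 != bdf && fn0 < entries && map[fn0].iommu != NULL) {
        fprintf(stderr,
                "warning: no table entry for %02x:%02x.%x, using function 0\n",
                (unsigned)(bdf >> 8), (unsigned)((bdf >> 3) & 0x1f),
                (unsigned)(bdf & 0x7));
        return map[fn0].iommu;
    }

    return NULL;
}
```

In povder's case the table ends at 07:00.0 (0x700), so a lookup for 0x701 would skip the first branch, find the entry for 0x700 and return its IOMMU with a warning instead of hitting the BUG_ON.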

>>> On 14.02.13 at 15:55, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> If I read this correctly, Linux enables multi-functionness (?):
>
> http://lxr.linux.no/#linux+v3.7.7/drivers/pci/quirks.c#L1494
>
> So you are probably right. BIOS does not enumerate 06:00.1 in IVRS because
> it doesn't see it enabled yet.

Indeed, and it has been that way since 2.6.18 (i.e. virtually forever; commit 15e0c694367332d7e7114c7c73044bc5fed9ee48).

I've got a patch mostly ready to deal with non-zero functions when we at least know something about function zero.

But I don't think we can easily deal with the single IOMMU case, making that IOMMU cover all devices, as we would still need to figure out the requestor ID for each device. That requires looking at the PCI bus topology iiuc, and while we have the necessary logic for VT-d, it seems not really straightforward (mainly because risky) to make use of this in the AMD-Vi code too.

Jan

On 02/15/2013 03:21 AM, Jan Beulich wrote:
> But I don't think we can easily deal with the single IOMMU
> case, making that IOMMU cover all devices, as we would
> still need to figure out the requestor ID for each device. That
> requires looking at the PCI bus topology iiuc, and while we
> have the necessary logic for VT-d, it seems not really straightforward
> (mainly because risky) to make use of this in the AMD-Vi
> code too.

Scanning PCI for devices would effectively mean that we are ignoring IVHD (the device portion of IVRS). That would be somewhat unfortunate (but maybe unavoidable, given the state of BIOSes).

-boris