Isaku Yamahata
2009-May-28 03:46 UTC
[Xen-devel] [PATCH 0/5] pcie io space multiplexing for bootable pass through HVM domain
This patch series implements PCIe IO space multiplexing and is submitted
for commit.

It is not uncommon for big iron used for server consolidation to have
many (e.g. > 16) PCIe slots. Such a machine will hold many domains, and
the administrator wants them to boot from pass-through devices.
But currently at most 16 HVM domains can boot from pass-through devices.
This patch series addresses that by multiplexing PCI IO space access.

Usage:
Add one of the following options to the dom0 kernel command line

  guestdev=<device path>+iomul
  (append "+iomul")
or
  guestiomuldev=[<segment>:]<bus>:<dev>[,[<segment>:]<bus>:<dev>][,...]
  (In this case, don't forget to add the related options: pciback.hide,
   reassign_resources or guestdev.)

Then dom0 Linux will allocate IO ports that are shared by the specified
devices, and ioemu will automatically recognize the IO-port-shared devices.
Unspecified devices are treated the same as before.
Note: The unit for sharing IO ports is the PCI slot (device), whereas
      the unit of guestdev is the function.
      If you specify a function to guestdev with "+iomul", all the
      functions of the given slot will share IO ports, even if you
      specify only some of the functions.

The patches are composed of
  Linux part: backport: preliminary patch
  Linux part: IO space reassignment code and multiplexing driver
  Linux part: guestdev kernel parameter support
  Linux part: add kernel command line to reserve io/memory space
  xen part: udev script for the driver
  ioemu part: make use of the PCIe io space multiplexing driver

Details:
A PCI expansion ROM BIOS often uses IO port access to boot from its
device. Linux as dom0 assigns IO space exclusively to each downstream PCI
bridge, and the assignment unit of PCI bridge IO space is 4K. So at most
16 PCIe devices can be accessed via IO space within the 64K of IO ports.
In a virtualized environment, this means that at most 16 guest domains
can boot from such pass-through devices.
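The limit of 16 follows directly from the two sizes given in the Details paragraph. As a minimal sketch (the function and constant names are illustrative and do not come from the patches), the arithmetic can be written out as:

```python
# Sizes taken from the Details paragraph above.
IO_SPACE_BYTES = 64 * 1024         # whole IO port space: 0x0000-0xFFFF
BRIDGE_IO_WINDOW_BYTES = 4 * 1024  # assignment unit of a PCI bridge IO window


def max_exclusive_io_windows(io_space=IO_SPACE_BYTES,
                             window=BRIDGE_IO_WINDOW_BYTES):
    """Number of non-overlapping bridge IO windows that fit in IO space."""
    return io_space // window


def io_windows(io_space=IO_SPACE_BYTES, window=BRIDGE_IO_WINDOW_BYTES):
    """(first_port, last_port) of each window when none of them overlap."""
    return [(base, base + window - 1) for base in range(0, io_space, window)]


if __name__ == "__main__":
    print(max_exclusive_io_windows())
    for first, last in io_windows():
        print(f"{first:#06x}-{last:#06x}")
```

This is an upper bound under the assumption that every window is free for a pass-through device. Ports already claimed by legacy devices reduce the number further.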

The solution is to assign the same IO port region to the PCI devices
under the same PCIe switch and to disable the IO bit in their command
registers. When one of the IO-port-shared devices is accessed, the IO bit
of that device is enabled, and then the IO access (IOIO) is issued.

Limitation:
- PCI devices and root complex integrated endpoints aren't supported.
- The IO ports of IO-shared devices can't be accessed from a dom0 Linux
  device driver.
  These shouldn't be big issues, because the PCIe specification
  discourages the use of IO space and recommends that IO space be used
  only by bootable devices with ROM code. OS device drivers should work
  without IO space access.

Test:
I don't have a machine with a complicated PCIe topology or many PCIe
cards.
- PCI hotplug was tested with Linux fakephp.
- Only PCI device (not bridge) hot plug/remove was tested.
- I have tested with only a single multifunction PCIe card.

Changes from take 2:
- rebased
- add kernel command line to reserve memory/io space for unused pci slot

Changes in take 2:
- support PCI hotplug
- guestdev kernel parameter.

thanks

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
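The enable, access, disable sequence described in the cover letter above can be illustrated with a small simulation. This is only a toy model under stated assumptions, not the driver from the patches: `SharedIoDevice` and `IoMultiplexer` are hypothetical names, and the device registers are faked with a dictionary. The one hardware fact it relies on is that bit 0 of the PCI command register is the I/O Space Enable bit.

```python
PCI_COMMAND_IO = 0x1  # bit 0 of the PCI command register: I/O Space Enable


class SharedIoDevice:
    """Toy model of one device whose IO range overlaps with its siblings."""

    def __init__(self, name):
        self.name = name
        self.command = 0   # IO decoding starts disabled
        self.regs = {}     # port offset -> last byte written (fake registers)

    def decodes_io(self):
        return bool(self.command & PCI_COMMAND_IO)


class IoMultiplexer:
    """Serializes IO port access to devices that share one port window."""

    def __init__(self, base, size, devices):
        self.base = base
        self.size = size
        self.devices = list(devices)

    def _offset(self, port):
        if not self.base <= port < self.base + self.size:
            raise ValueError(f"port {port:#x} is outside the shared window")
        return port - self.base

    def _access(self, target, port, op):
        offset = self._offset(port)
        if target not in self.devices:
            raise ValueError("device does not share this window")
        target.command |= PCI_COMMAND_IO        # enable only the target
        try:
            # While the access is issued, exactly one device decodes IO.
            claimers = [d for d in self.devices if d.decodes_io()]
            assert claimers == [target]
            return op(target.regs, offset)
        finally:
            target.command &= ~PCI_COMMAND_IO   # back to "all disabled"

    def outb(self, target, port, value):
        def write(regs, offset):
            regs[offset] = value & 0xFF
        self._access(target, port, write)

    def inb(self, target, port):
        # Unwritten fake registers read back as 0xFF.
        return self._access(target, port, lambda regs, off: regs.get(off, 0xFF))
```

In this model two devices can hold different values at the same port number, because only the selected device decodes the access. A real implementation would also need locking so that two guests cannot interleave their enable and disable steps. The sketch leaves that out.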
Tian, Kevin
2009-May-28 10:27 UTC
RE: [Xen-devel] [PATCH 0/5] pcie io space multiplexing for bootable pass through HVM domain
Curious question. :-)

Is this a very virtualization-specific usage? What about the same box used
in a native environment: how can the admin tell which PCI-e slot has been
allocated I/O ports for boot purposes?

Thanks,
Kevin
Isaku Yamahata
2009-May-28 10:52 UTC
Re: [Xen-devel] [PATCH 0/5] pcie io space multiplexing for bootable pass through HVM domain
On Thu, May 28, 2009 at 06:27:43PM +0800, Tian, Kevin wrote:
> Curious question. :-)
>
> Is this very virtualization specific usage?

Yes, right.

> How about same box used
> in native environment, where how admin can judge which PCI-e slot
> is allocated with I/O ports for bootable purpose?

I'm not sure I understand your point.
Do you mean there should be an easy way for an admin to know which
PCIe slot is specified?
Admins should know what they did: they know what kernel command line
they passed to the kernel, or they can check /proc/cmdline.

--
yamahata
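As a rough illustration of checking the command line, the sketch below pulls the slots named by `guestiomuldev=` out of a command-line string such as the contents of /proc/cmdline. The function name `parse_guestiomuldev` is hypothetical, and treating segment, bus and device as hexadecimal is an assumption borrowed from lspci-style notation; the thread does not state the radix.

```python
import re

# Assumption: [<segment>:]<bus>:<dev> uses hexadecimal fields, as in
# lspci-style addresses. The original mail does not state the radix.
_SLOT_RE = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})$"
)


def parse_guestiomuldev(cmdline):
    """Return the (segment, bus, dev) tuples listed in guestiomuldev=."""
    slots = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "guestiomuldev" or not sep:
            continue
        for item in value.split(","):
            match = _SLOT_RE.match(item)
            if match is None:
                raise ValueError(f"malformed slot {item!r}")
            segment, bus, dev = match.groups()
            # A missing segment is treated as segment 0.
            slots.append((int(segment or "0", 16), int(bus, 16), int(dev, 16)))
    return slots


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(parse_guestiomuldev(f.read()))
```

The `guestdev=<device path>+iomul` form is not handled here, because the format of the device path is not described in this thread.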
Tian, Kevin
2009-May-29 02:27 UTC
RE: [Xen-devel] [PATCH 0/5] pcie io space multiplexing for bootable pass through HVM domain
>From: Isaku Yamahata [mailto:yamahata@valinux.co.jp]
>Sent: May 28, 2009 18:52
>
>> How about same box used
>> in native environment, where how admin can judge which PCI-e slot
>> is allocated with I/O ports for bootable purpose?
>
>I'm not sure I understand your point.
>You mean there should be a way for an admin to know easily
>which pcie slot is specified?
>The admins should know what they did. They know what kernel
>command line they passed to the kernel. or /proc/cmdline.
>

In the native case, as you said, only 16 or fewer (if a legacy ISA card
exists) PCI-e slots can be assigned I/O resources. Say an admin wants to
insert a bootable card. How can the admin know which slot will be granted
I/O resources by the kernel? Does native Linux provide a command-line
option similar to yours to select which slots are I/O capable? Or does
the admin have to use lspci, on the assumption that the kernel will
assign I/O the same way on the next boot?

This is not relevant to your patch. I'm just interested in how such a big
box works natively.

Thanks,
Kevin
Isaku Yamahata
2009-May-29 03:14 UTC
Re: [Xen-devel] [PATCH 0/5] pcie io space multiplexing for bootable pass through HVM domain
On Fri, May 29, 2009 at 10:27:14AM +0800, Tian, Kevin wrote:
> In a native case, as you said, only 16 or fewer (if legacy ISA card
> exists) PCI-e slots can be assigned with I/O resources. Say an
> admin wants to insert a bootable card. How can this guy know
> which slot will be granted with I/O resources by kernel? Does
> native Linux provide similar cmdline as your approach to assign
> which slot to be I/O capable? Or admin has to use lspci with
> assumption that next boot kernel will assign I/O in same way?

Oh, I see. The answers are
- The admin has to use lspci or look at /proc/iomem.
- No.
- Yes.

In general, when the kernel runs natively, the admin doesn't have to
worry about how IO space is assigned to PCIe slots.
It doesn't matter to the OS, because almost all device drivers for PCIe
cards don't access IO space. The IO space is accessed only by the ROM
BIOS on the card.
In order to avoid the issues you described above, the PCIe specification
encourages hardware vendors to provide memory space for OS device drivers
and to use IO space only for booting the OS, i.e. IO space should be used
only by the ROM BIOS, and OS device drivers shouldn't access IO space.
Of course, there might be exceptions.

"Boot" is the key.
In a native environment, "boot" from a PCIe device takes place only once,
before the OS is involved. Once the OS has booted, it doesn't care about
IO space.

In a virtualized environment with pass-through PCIe cards, on the other
hand, "booting" a guest OS takes place many times, from arbitrary PCIe
cards. This is where the need for IO space multiplexing arises.
I can hardly see any other use case.

thanks,
--
yamahata