Hi,

one of the troubles with the way Xen boots paravirtualized kernels is that you get the kernel and the initrd from domain 0, whereas modules that are loaded later are on the domU filesystem.

This is a management headache: you must ensure that the kernel and initrd you configure in dom0 for domU booting are in sync with the kernel (modules) in domU.

Jeremy Katz has thankfully created some infrastructure to allow plugging in bootloaders instead, and contributed pygrub.

I extended the infrastructure a bit and added another bootloader. Unlike pygrub, it does not offer a menu and does not parse the grub menu.lst; it's meant for paravirtualized domains, and thus we accept that the booted kernel is selected differently, e.g. by a symlink if one wants to control it from the domU. The bootloader is called domUloader.

domUloader parses the bootentry (passed via --entry=) and the disk setup (passed via --disks=). It then sets up loop devices as needed, scans for partition tables (the exported disks / loop devs can contain partitions) using kpartx (dm) and sets them up, so the kernel and initrd can be copied to a temporary location in dom0.

The bootentry may contain a dev: prefix describing the partition (from a domU perspective!) where kernel and initrd are located, followed by the kernel filename and (optional) initrd filenames relative to the filesystem on dev:.
The kernel and initrd filenames can also be relative to the domU root filesystem. domUloader then evaluates the /etc/fstab found in the root filesystem (passed via --root=) to locate kernel and initrd.
Afterwards everything is cleaned up. (We use the destructors, so python reference counting makes sure this also happens when exceptions occur.)

Unlike pygrub, it does not use any code of its own to understand filesystems or partitions; the filesystem support comes from the dom0 kernel, whereas kpartx (from multipath-tools) is used for the knowledge of partitions and for setting up device-mapper.

Call domUloader.py --help for more details.

An example config could look like this:

    bootentry = hda2:/vmlinuz-xen,/initrd-xen
    bootloader = /path/to/domUloader.py
    disks = ['phy:VG_Xen/LV_dom5,hda,w', 'file:/var/lib/xen/test,sda,w']
    ...

assuming LV_dom5 has a second partition with a filesystem containing vmlinuz-xen and initrd-xen in its root fs (the /boot partition).

or

    bootentry = /boot/vmlinuz-xen,/boot/initrd-xen
    bootloader = /path/to/domUloader.py
    root = /dev/hda1
    disk = ...

assuming that the root filesystem has an /etc/fstab that points the way to /boot/vmlinuz-xen. (/boot does not need to be a separate FS.)

The following three mails will contain
(1) A patch to xend/XendDomainInfo.py, xend/XendBootloader.py and xm/create.py, making sure that all the needed info is passed to the bootloader and also stored for reuse on rebooting.
(2) A patch to make pygrub accept the new parameters passed by XendBootloader.py (but so far pygrub just ignores them ...)
(3) The domUloader.py script.

Patches are against a working copy (8259) on my laptop; if no one else will, I can create diffs against mercurial tip.

I hope this is useful to someone and can be integrated into the Xen distribution.

Enjoy,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
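The bootentry syntax described above ([dev:]kernel[,initrd]) can be sketched in a few lines of Python. This is only an illustration of the format as described in the mail, not the code from domUloader.py; the names BootEntry and parse_bootentry are hypothetical, and the sketch assumes at most one initrd.

```python
from typing import NamedTuple, Optional


class BootEntry(NamedTuple):
    dev: Optional[str]     # domU-side device name such as "hda2", or None
    kernel: str            # kernel path on that filesystem (or on the domU root)
    initrd: Optional[str]  # initrd path, or None when not given


def parse_bootentry(entry: str) -> BootEntry:
    """Split '[dev:]kernel[,initrd]' into its parts (hypothetical helper)."""
    dev = None
    paths = entry
    # Treat the text before the first ':' as a device only if it is not
    # itself part of a path, i.e. it contains no '/'.
    head, sep, tail = entry.partition(":")
    if sep and "/" not in head:
        dev, paths = head, tail
    kernel, _, initrd = paths.partition(",")
    if not kernel:
        raise ValueError("bootentry has no kernel: %r" % entry)
    return BootEntry(dev or None, kernel, initrd or None)
```

With the two example configs, the first entry yields a device plus paths relative to that partition, while the second yields no device, which is the case where /etc/fstab on the root filesystem has to be consulted.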
Hi Kurt,

Kurt Garloff wrote:
> domUloader parses the bootentry (passed via --entry=) and the disk
> setup (passed via --disks=). It then sets up loop devices as needed,
> scans for partition tables (the exported disks / loop devs can
> contain partitions) using kpartx (dm) and sets them up, so the kernel
> and initrd can be copied to a temporary location in dom0.

Just to clarify, this means that domU filesystems are being mounted in dom0? I know there were some security concerns voiced about this many months ago. I think one of the advantages of using libext2 was that it theoretically allowed the filesystem parsing to be done as a non-privileged user.

Regards,

Anthony Liguori
Hi,

For discussion, attached is a patch to do something similar using the pygrub code. It adds another file syntax to xm create, allowing you to say things like "kernel=guest:(vbda)/boot/vmlinuz", where vbda is the name under which the new domain will see its block device.

Because it's based on pygrub, it only handles ext2fs and reiser for now. It also has no partition-table handling, since that's not working in pygrub either, but it could be added if necessary.

The advantage of this is that the dom0 kernel never needs to read the domU filesystems, so you could run the extraction code with much less to worry about.

The patch is against -unstable of last month, but I can update it if people are interested.

Tim.

--
Tim Deegan (My opinions, not the University's)
Systems Research Group
University of Cambridge Computer Laboratory
Hi Anthony,

On Tue, Jan 17, 2006 at 05:52:14AM -0600, Anthony Liguori wrote:
> Just to clarify, this means that domU filesystems are being mounted in
> dom0?

Correct.

> I knew there was some security concerns voiced about this many
> months ago. I think one of the advantages to using libext2 was that it
> theoritically allowed the filesystem parsing to be done as a
> non-privileged user.

I can see your point.

There are two concerns you could have:

1. When the domU fs gets mounted in dom0, a local user there could get (read-only) access to data that he shouldn't have access to.
   This can be prevented by mounting under a directory that's not readable to anyone but root. I didn't do this in my patch set, but it's certainly a good idea.
   (And you need to trust dom0 root anyway; such is the trust model in a hybrid virtualization model without encrypting everything.)

2. The filesystem in the domU could be prepared such that the kernel trips over a bug in its filesystem code.
   The same can happen if you read the FS with a userspace library, of course, but the effects would be less bad -- at least if you do it with a non-root euid.
   The downside is that you need to use a secondary source for filesystem code, which needs to be maintained and kept in sync, audited, ... And you are limited to the filesystems for which you have userspace libraries.
   In a paranoid scenario, you would not load any data from the domU filesystem in any way :-) But I can see why you would choose pygrub over domUloader in a sensitive environment, where you can't trust the domU admins. Point taken.
   I still think that in many use scenarios, you would be perfectly fine with domUloader.

Did I catch your concerns?
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
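The mitigation for concern 1 (mounting under a directory that only root can enter) might look like the following sketch. It is not part of the posted patch set; the helper names and the "domUloader-" prefix are made up for illustration. Note that a mounted filesystem brings the permissions of its own root directory with it, so the sketch protects a parent directory and creates the actual mountpoint inside it.

```python
import os
import stat
import tempfile


def make_private_mountpoint(prefix: str = "domUloader-") -> str:
    """Return a mountpoint nested in a directory only its owner can enter."""
    # mkdtemp creates the directory accessible to the creating user only;
    # the chmod makes that explicit.
    parent = tempfile.mkdtemp(prefix=prefix)
    os.chmod(parent, stat.S_IRWXU)
    # Mount on a child directory: once a filesystem is mounted, the
    # mountpoint shows the mode of that filesystem's root, so the
    # protection has to come from the parent.
    mountpoint = os.path.join(parent, "mnt")
    os.mkdir(mountpoint, 0o700)
    return mountpoint


def is_private(path: str) -> bool:
    """True if group and others have no permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

When run as root in dom0, other local users cannot traverse the parent directory and therefore cannot reach the mounted domU filesystem, whatever permissions it carries.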
On Tue, 17 Jan 2006, Kurt Garloff wrote:
> 2. The filesystem in the domU could be prepared such that the kernel
>    trips over a bug in its filesystem code.
>    The same can happen if you read the FS with a userspace library
>    of course, but the effects would be less bad -- at least if you
>    would do it with non-root euid.
>    The downside is that need to use a secondary source for filesystem
>    code, which needs to be maintained and kept in sync, audited, ...
>    And you are limited to the filesystems where you have userspace
>    libraries for.
>    In a paranoid scenario, you would not load any data from the domU
>    filesystem in any way :-) But I can see why you would choose
>    pygrub over domUloader in a sensitive environment, where you
>    can't trust the domU admins. Point taken.
>    I still think that in many use scenarios, you would be perfectly
>    fine with domUloader.

Have a special kernel that is used just for this: boot a temporary domU with this special kernel, read the data you need from the filesystem, then shut it down.
Hi Adam,

On Tue, Jan 17, 2006 at 11:28:58AM -0600, Adam Heath wrote:
> On Tue, 17 Jan 2006, Kurt Garloff wrote:
> > In a paranoid scenario, you would not load any data from the domU
> > filesystem in any way :-) But I can see why you would choose
> > pygrub over domUloader in a sensitive environment, where you
> > can't trust the domU admins. Point taken.
> > I still think that in many use scenarios, you would be perfectly
> > fine with domUloader.
>
> Have a special kernel that is used just for this, then boot a temporary domU,
> using this special kernel, read the data you need from the filesystem, then
> shut it down.

Good solution, but quite complex ...
I wonder whether it would be easier to port grub to Xen.

For now, something simple that just works and is secure enough for 90+% of the users does not look so bad to me.

Best,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
Kurt Garloff wrote:
> In a paranoid scenario, you would not load any data from the domU
> filesystem in any way :-) But I can see why you would choose
> pygrub over domUloader in a sensitive environment, where you
> can't trust the domU admins. Point taken.
> I still think that in many use scenarios, you would be perfectly
> fine with domUloader.
>
> Did I catch your concerns?

Yup, just wanted to make sure it was considered :-)

Regards,

Anthony Liguori
On Tue, 2006-01-17 at 05:52 -0600, Anthony Liguori wrote:
> Kurt Garloff wrote:
> > domUloader parses the bootentry (passed via --entry=) and the disk
> > setup (passed via --disks=). It then sets up loop devices as needed,
> > scans for partition tables (the exported disks / loop devs can
> > contain partitions) using kpartx (dm) and sets them up, so the kernel
> > and initrd can be copied to a temporary location in dom0.
>
> Just to clarify, this means that domU filesystems are being mounted in
> dom0? I knew there was some security concerns voiced about this many
> months ago. I think one of the advantages to using libext2 was that it
> theoritically allowed the filesystem parsing to be done as a
> non-privileged user.

The other concern with mounting is that there have been some cases where changes to filesystems have broken reading new filesystems with older kernels. It's a lot easier to get a library that supports more (and less has to be supported, so you're less likely to need to make changes) than to upgrade your dom0 kernel.

Jeremy
Hi Jeremy,

On Wed, Jan 18, 2006 at 01:06:04PM -0500, Jeremy Katz wrote:
> The other concern with mounting is that there have been some cases where
> changes to filesystems have broken reading new filesystems with older
> kernels. It's a lot easier to get the library that supports more (and
> less has to be supported, so you're less likely to need to make changes)
> than to upgrade your kernel for dom0

I tend to disagree.
As the dom0 kernel drives the hardware, and hardware drivers seem to be the more prominent reason for moving to new kernel versions, I would expect the dom0 kernel to be updated more often than the domU kernels.
And actually, filesystem forward compatibility is not that bad.

I'm not saying this can't be an issue, but I suspect it won't be for most people.

Cheers,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
On a side note, one thing we all have to think about is how a boot loader would work with something like a virtual framebuffer.

It may be time to start thinking about writing a first-class domU bootloader: something that just sets up a page table that maps the pfns linearly, plus enough XenBus to read from a virtual disk. We can reuse code from grub for filesystem parsing (or even write it from scratch -- it's not that hard to just read from a filesystem).

We could also use mini-OS as a base.

Regards,

Anthony Liguori
On Wed, Jan 18, 2006 at 01:07:00PM -0500, Jeremy Katz wrote:
> Sounds reasonable enough, although I'll have to look at it a little
> closer when I get back from Austin. FWIW, partition table handling in
> pygrub should work fine (I'm installing to full disk vbds with partition
> tables regularly)

The partition handling is only enough to find the "active" partition, so it doesn't handle extended partitions. That's not a problem for pygrub, but it would need to be done to have the extraction tool handle partitions properly.

Also, it doesn't work if your e2fsprogs are too old to have ext2fs_open2() -- again, not really a bug, but the failure mode is a bit ugly, and the version in the Xen 3 tarball has this problem. Is there some way of telling from inside a python script whether the pygrub library is going to be able to read partitions or not?

Tim.

--
Tim Deegan (My opinions, not the University's)
Systems Research Group
University of Cambridge Computer Laboratory
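To make the distinction between "finding the active partition" and "handling extended partitions" concrete, here is a minimal sketch that decodes the four primary entries of a DOS partition table. It is not pygrub's code; the function name is hypothetical, and it assumes 512-byte sectors. Walking the logical partitions inside an extended partition would need an additional loop over the chained extended boot records, which is not shown.

```python
import struct

SECTOR_SIZE = 512                    # assumed sector size
TABLE_OFFSET = 446                   # first partition entry within the MBR
ENTRY_SIZE = 16
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # DOS, Win95 LBA and Linux extended


def read_partition_table(mbr: bytes):
    """Return a list of dicts for the used primary entries of an MBR."""
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no DOS partition table signature")
    entries = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        raw = mbr[start:start + ENTRY_SIZE]
        # Layout: boot flag, CHS start (skipped), type, CHS end (skipped),
        # LBA start, sector count; multi-byte fields are little-endian.
        flag, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", raw)
        if ptype == 0:
            continue  # unused slot
        entries.append({
            "number": index + 1,
            "active": flag == 0x80,
            "extended": ptype in EXTENDED_TYPES,
            "type": ptype,
            "offset_bytes": lba_start * SECTOR_SIZE,
            "sectors": sectors,
        })
    return entries
```

A caller that only looks for the entry with "active" set gets the behaviour Tim describes; a tool that should locate /boot anywhere on the disk would also have to follow entries flagged "extended".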
On Wed, 2006-01-18 at 22:31 -0600, Anthony Liguori wrote:
> On a side note, one thing we all have to think about is how a boot
> loader would work with something like a virtual framebuffer.

Yeah :/

> It may be time to start thinking about writing a first class domU
> bootloader. Something that just sets up a page table that maps the pfns
> linearly and enough XenBus to read from a virtual disk. We can reuse
> code from grub for filesystem parsing (or even write it from
> scratch--it's not that hard to just read from a filesystem).
>
> We could also use mini-OS as a base.

The problem is: where does something like this end? So we add a basic blkfront. Then someone wants to do some form of netboot. Or boot from iSCSI. Or they use something like GFS or OCFS2, which require significantly more infrastructure than most filesystems. And then there is a world of pain :/

Unfortunately, I am completely convinced that the right thing is to have the kernel for domU inside the domU's filesystem, because anything else is just fundamentally not manageable. So perhaps we do have to just suck it up and go down the path of what's essentially mini-OS as a domU "bios".

Jeremy
Hi Tim, Jeremy,

On Thu, Jan 19, 2006 at 01:06:18PM +0000, Tim Deegan wrote:
> On Wed, Jan 18, 2006 at 01:07:00PM -0500, Jeremy Katz wrote:
> > Sounds reasonable enough, although I'll have to look at it a little
> > closer when I get back from Austin. FWIW, partition table handling in
> > pygrub should work fine (I'm installing to full disk vbds with partition
> > tables regularly)
>
> The partition handling is only enough to find the "active" partition, so
> it doesn't handle extended partitions. That's not a problem for pygrub,
> but would need to be done to have the extraction tool handle partitions
> properly.

pygrub does assume there's one whole-disk export which contains a DOS-style partition table with the /boot partition marked active. If that one is ext2, everything works fine. (The reiser support failed in my testing.)

domUloader is more flexible there. It understands both whole-disk devices and partitions, can handle a set of them, and finds the /boot partition either by having it specified in bootentry or by parsing /etc/fstab on the partition that's been passed to domU with root=.

Maybe we want to import domUloader in pygrub to just get all this, as the handling is nicely abstracted there. You just don't call domUloader.main(argv) then ...

If we do that, the main differences will be that
- pygrub offers an interactive (ncurses) mode
- pygrub uses libraries to get files off the FS, which could be somewhat safer and more easily extensible (but currently is not (yet?) very reliable and limits the FS choice).

And I'd really like to advocate for the small changes I have made to XendDomainInfo, XendBootloader and create, so we pass enough information to the bootloader to handle all this.

> Also, it doesn't work if your e2fsprogs are too old to have
> ext2fs_open2() -- again, not really a bug but the failure mode is a bit
> ugly,

So I'm not the only one who sees pygrub running OOM?

> and the version in the Xen 3 tarball has this problem. Is there
> some way of telling from inside a python script whether the pygrub
> library is going to be able to read partitions or not?

Updated domUloader attached.

Best,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
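The fstab-based lookup mentioned above (root= names the root partition, and its /etc/fstab tells which device holds /boot) can be sketched as follows. This is an illustration of the idea rather than the logic in the attached domUloader; the helper names are hypothetical, and the sketch only understands plain device names in the first fstab column, not LABEL= or UUID= entries.

```python
import posixpath


def parse_fstab(text: str):
    """Return (device, mountpoint) pairs from the contents of an fstab."""
    mounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip entries such as swap whose second field is not a path.
        if len(fields) >= 2 and fields[1].startswith("/"):
            mounts.append((fields[0], fields[1]))
    return mounts


def locate(path: str, mounts):
    """Map an absolute domU path to (device, path relative to that fs)."""
    best = None
    for device, mountpoint in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # The longest matching mountpoint is the filesystem that
            # actually contains the file.
            if best is None or len(mountpoint) > len(best[1]):
                best = (device, mountpoint)
    if best is None:
        raise LookupError("no filesystem in fstab contains %r" % path)
    device, mountpoint = best
    return device, "/" + posixpath.relpath(path, mountpoint)
```

If /boot is a separate filesystem the kernel resolves to that device with a path relative to it; if not, it stays on the root device with its full path, which matches the "does not need to be a separate FS" remark in the original announcement.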
Hi,

On Thu, Jan 19, 2006 at 12:19:53PM -0500, Jeremy Katz wrote:
> Unfortunately, I am completely convinced that the right thing is to have
> the kernel for domU inside the domU's filesystem because anything else
> is just fundamentally not manageable.

I tend to agree. The real trouble starts when the storage needed to boot that domU isn't even visible to the dom0, though -- perhaps because we've got a virtual HBA (say an iSCSI initiator, or a virtual FC HBA) connected to a SAN which is filtering by initiator, so that only the domU can see the LUN's contents.

Bootstrapping that sort of environment is nasty.

> So, perhaps we do have to just
> suck it up and go the path of what's essentially mini-OS as a domU
> "bios"

If the domU pre-boot is going to need enough smarts to run a full iSCSI initiator, then we might be better off just biting the bullet and running a proper kernel there, with either kexec or some manner of domU respawn to boot the correct kernel/initrd once the pre-boot one has downloaded them. Either that, or we basically need to have special cases for /boot to make sure that those files, plus the grub.conf-type kernel args, are registered elsewhere (directory?) for the dom0 to get them from.

--Stephen
Hi,

Rumor has it that on Fri, Jan 20, 2006 at 03:36:33PM -0500 Stephen Tweedie said:
> Hi,
>
> On Thu, Jan 19, 2006 at 12:19:53PM -0500, Jeremy Katz wrote:
>
> > Unfortunately, I am completely convinced that the right thing is to have
> > the kernel for domU inside the domU's filesystem because anything else
> > is just fundamentally not manageable.
>
> I tend to agree. The real trouble starts when the storage needed to
> boot that domU isn't even visible to the dom0, though --- perhaps
> because we've got a virtual HBA (say an iSCSI initiator, or virtual FC
> HBA), connected to a SAN which is filtering by initiator so that only
> the domU can see the LUN's contents.
>
> Bootstrapping that sort of environment is nasty.

Indeed. I agree with the domU filesystem approach as well.

How does one boot off of software iSCSI on a physical machine now? There's clearly no boot ROM. I'm guessing people use PXE. Then it's up to the initrd or initramfs to have the iSCSI smarts.

In that case the domU pre-boot would need to support vbd and v-nics and have a PXE client. Someone would have to serve the PXE requests and handle the images for that, of course. But I think it is better to treat it like a BIOS on a physical machine than as something that boots in ways a physical machine doesn't. The more the VMs look like real machines, the better.

Of course, that would mean support for root on iSCSI in the installer and mkinitrd code.

Anyway, my 2 cents...

Cheers,

Phil

--
Philip R. Auld, Ph.D.    Egenera, Inc.
Software Architect       165 Forest St.
(508) 858-2628           Marlboro, MA 01752
On Thu, Jan 19, 2006 at 01:06:18PM +0000, Tim Deegan wrote:
> The partition handling is only enough to find the "active" partition, so
> it doesn't handle extended partitions. That's not a problem for pygrub,
> but would need to be done to have the extraction tool handle partitions
> properly.

A new version of my patch is attached that understands DOS partition tables (so long as your libext2fs libraries are up to date).

I do prefer Kurt's changes to the xm create "bootloader" syntax. They are cleaner than what I've done with pyfscat. I'd like to merge the pygrub extraction code with his bootloader structure, but I haven't time to do it properly at the moment.

Tim.

--
Tim Deegan (My opinions, not the University's)
Systems Research Group
University of Cambridge Computer Laboratory
Hi,

On Fri, Jan 20, 2006 at 06:08:26PM -0500, Philip R. Auld wrote:
> Of course, that would mean support for root on iSCSI in the installer
> and mkinitrd code.

I would really handle iSCSI in dom0 and export sdX/hdX to domU.
That solves the nasty OOM problem as well and makes your domains know less about the underlying storage, which is good in my vision of virtualization.

Best,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
Rumor has it that on Mon, Jan 23, 2006 at 03:19:50PM +0100 Kurt Garloff said:
> Hi,
>
> On Fri, Jan 20, 2006 at 06:08:26PM -0500, Philip R. Auld wrote:
> > Of course, that would mean support for root on iSCSI in the installer
> > and mkinitrd code.
>
> I would really handle iSCSI in dom0 and export sdX/hdX to domU.
> Solves the nasty OOM problem as well and makes your domains know
> less about the underlaying storage. Which is good in my vision
> of virtualization.

That's how I would do it too, if I were using iSCSI. Actually, that is how I do it with FC, and in fact dom0 doesn't know about it either.

I was responding to Stephen's comments about the requirements for a pre-boot image.

Cheers,

Phil

--
Philip R. Auld, Ph.D.    Egenera, Inc.
Software Architect       165 Forest St.
(508) 858-2628           Marlboro, MA 01752
Edwards, Nigel (Nigel Edwards)
2006-Jan-26 10:17 UTC
RE: [Xen-devel] [PATCH 0/3] domUloader
> I would really handle iSCSI in dom0 and export sdX/hdX to domU.
> Solves the nasty OOM problem as well and makes your domains know
> less about the underlaying storage. Which is good in my vision
> of virtualization.

There are good arguments for isolating storage management in Dom0. However, I have been looking at migration of domains with iSCSI and have found it a pain to fix things up so that the iSCSI disk appears at the same /dev/sdX point in both the source and destination Dom0s.

That is why I have started looking at direct iSCSI attachment in DomU via initrd. Then storage is fixed up automatically via migration of network connections. The disadvantage of this, as Kurt points out, is that it exposes your iSCSI infrastructure to DomU.

Cheers,
Nigel.
> There are good arguments for isolating storage management in Dom0.
> However, I have been looking at migration of domains with iSCSI
> and have found it a pain to fix up so that the iSCSI disk
> appears on the same /dev/sdx point in both source and destination
> Dom0s.

If you wrote a setup script for iSCSI devices (as we have for NBD, etc.), you could do some sort of lookup to identify the correct device node when the domain arrives at a new host. Then the device node wouldn't need to be the same everywhere.

I don't know terribly much about iSCSI, but you'd just need a device-node-independent way of specifying the LUN as the block device in the config file.

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Rumor has it that on Thu, Jan 26, 2006 at 10:17:22AM -0000 Edwards, Nigel (Nigel Edwards) said:
> There are good arguments for isolating storage management in Dom0.
> However, I have been looking at migration of domains with iSCSI
> and have found it a pain to fix up so that the iSCSI disk
> appears on the same /dev/sdx point in both source and destination
> Dom0s.

It should be possible to use udev to assign a specific unique name to each disk based on UID or whatever iSCSI has for WWN support. I think that would be the right fix.

The order-based /dev/sdX naming has been a deficiency in Linux proper forever, but should be going away at this point.

Cheers,
Phil

> That is why I have started looking at direct iSCSI attachment in DomU
> via initrd. Then storage is fixed up automatically via migration of
> network connections. The disadvantage of this, as Kurt points out,
> is that it exposes your iSCSI infrastructure into DomU.
>
> Cheers,
> Nigel.

--
Philip R. Auld, Ph.D.                  Egenera, Inc.
Software Architect                     165 Forest St.
(508) 858-2628                         Marlboro, MA 01752
On Thu, 2006-01-26 at 08:37 -0500, Philip R. Auld wrote:
> It should be possible to use udev to assign a specific unique name
> to each disk based

It already does this to some degree. From a handy Debian box:

$ tree /dev/disk/
/dev/disk/
|-- by-id
|   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator -> ../../sda
|   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part1 -> ../../sda1
|   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part2 -> ../../sda2
|   `-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part3 -> ../../sda3
|-- by-label
|   `-- boot -> ../../sda1
|-- by-path
|   |-- pci-0000:00:1f.1-ide-0:0 -> ../../hda
|   |-- pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sda
|   |-- pci-0000:00:1f.2-scsi-0:0:0:0-part1 -> ../../sda1
|   |-- pci-0000:00:1f.2-scsi-0:0:0:0-part2 -> ../../sda2
|   `-- pci-0000:00:1f.2-scsi-0:0:0:0-part3 -> ../../sda3
`-- by-uuid
    `-- 8312472d-e311-4e0d-837c-6c4eb646a5e3 -> ../../sda1

The logic is in /etc/udev/persistent.rules. It might need updating with some iSCSI knowledge I suppose.

Ian.
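The by-id links in Ian's listing can also be read from a script, which is one way to do the kind of lookup discussed in this thread. The following is only a minimal Python sketch: the function name persistent_names is made up for illustration, and it assumes the host's udev populates /dev/disk/by-id as on Ian's Debian box.

```python
import os


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent name in by_id_dir to the device node it points at.

    persistent_names is a hypothetical helper, not part of Xen or udev.
    """
    names = {}
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        # udev creates these entries as symlinks; skip anything else
        if os.path.islink(link):
            names[entry] = os.path.realpath(link)
    return names
```

On the box shown above, the entry for scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator would be expected to resolve to /dev/sda, but the result depends entirely on the local udev rules.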
Rumor has it that on Thu, Jan 26, 2006 at 02:01:38PM +0000 Ian Campbell said:
> On Thu, 2006-01-26 at 08:37 -0500, Philip R. Auld wrote:
> > It should be possible to use udev to assign a specific unique name
> > to each disk based
>
> It already does this to some degree. From a handy Debian box:
>
> $ tree /dev/disk/
> /dev/disk/
> |-- by-id
> |   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator -> ../../sda
> |   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part1 -> ../../sda1
> |   |-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part2 -> ../../sda2
> |   `-- scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator-part3 -> ../../sda3
> |-- by-label
> |   `-- boot -> ../../sda1
> |-- by-path
> |   |-- pci-0000:00:1f.1-ide-0:0 -> ../../hda
> |   |-- pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sda
> |   |-- pci-0000:00:1f.2-scsi-0:0:0:0-part1 -> ../../sda1
> |   |-- pci-0000:00:1f.2-scsi-0:0:0:0-part2 -> ../../sda2
> |   `-- pci-0000:00:1f.2-scsi-0:0:0:0-part3 -> ../../sda3
> `-- by-uuid
>     `-- 8312472d-e311-4e0d-837c-6c4eb646a5e3 -> ../../sda1
>
> The logic is in /etc/udev/persistent.rules. It might need updating with
> some iSCSI knowledge I suppose.

Right, but this is just showing which UID it mapped to sda. My point was that you can configure it to give a specific name to a specific UID:

scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator -> ../../my_domU_disk

Then you use my_domU_disk in the xen domain configuration.

You can do this manually as well by looking up which sdX name the device got and making a link to it, or a device node with the same major/minor. But I think udev can be configured to do it for you. I don't know the details of how to make it do that, but that's part of what it's for.

Cheers,
Phil

> Ian.

--
Philip R. Auld, Ph.D.                  Egenera, Inc.
Software Architect                     165 Forest St.
(508) 858-2628                         Marlboro, MA 01752
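The manual variant Phil mentions (look up which sdX node the disk got, then make a link to it) could look roughly like the Python sketch below. It is an illustration under assumptions: link_stable_name is a hypothetical helper, the by-id link is assumed to exist already, and where the stable link lives (for example a path ending in my_domU_disk) is up to the administrator. It needs Python 3.3 or later for os.replace.

```python
import os


def link_stable_name(by_id_link, stable_link):
    """Point stable_link at the device node by_id_link currently resolves to.

    Hypothetical helper for illustration; returns the resolved device node.
    """
    target = os.path.realpath(by_id_link)
    tmp = stable_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # rename() acts on the link itself, so an existing stable link is
    # swapped atomically instead of being followed
    os.replace(tmp, stable_link)
    return target
```

Calling it again after the disk shows up under a different sdX name simply re-points the link, which is the behaviour a udev rule would otherwise provide automatically.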
Hi Philip,

On Thu, Jan 26, 2006 at 09:20:40AM -0500, Philip R. Auld wrote:
> Right, but this is just showing which UID it mapped to sda. My point
> was you can configure it to give a specific name to a specific UID:
>
> scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator -> ../../my_domU_disk
>
> Then you use my_domU_disk in the xen domain configuration.
> You can do this manually as well by looking up which sdX name the
> device got and making a link to it or device node with the same
> major/minor. But I think udev can be configured to do it for you.
> I don't know the details of how to make it do that, but that's part
> of what it's for.

Use
  disk = [ 'iscsi:scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator,hda,w' ]
and drop a script into /etc/xen/scripts/block-iscsi that handles it. (I'm assuming now that scsi-0ATA_ST... is a unique identifier here; otherwise choose a different property of your iSCSI target.)

Unless I misunderstood something, that suggestion of Mark's really solves your problem.

Best,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
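To make the shape of such a disk entry concrete, here is a small Python sketch that splits an entry of the form 'kind:target,vdev,mode' into its parts and derives the script path Kurt describes. DiskSpec, parse_disk and script_for are names invented for this illustration, and the real parsing inside xend may differ; only the entry format and the /etc/xen/scripts/block-iscsi path come from the thread.

```python
from collections import namedtuple

# kind is the prefix before the first colon (phy, file, iscsi, ...)
DiskSpec = namedtuple("DiskSpec", "kind target vdev mode")


def parse_disk(entry):
    """Split 'kind:target,vdev,mode' into a DiskSpec (illustrative only)."""
    # split from the right so commas cannot eat into vdev or mode
    backend, vdev, mode = entry.rsplit(",", 2)
    kind, sep, target = backend.partition(":")
    if not sep or not kind or not target:
        raise ValueError("expected 'kind:target,vdev,mode', got %r" % entry)
    return DiskSpec(kind, target, vdev, mode)


def script_for(spec, script_dir="/etc/xen/scripts"):
    """Path of the block script that would handle this kind of disk."""
    return "%s/block-%s" % (script_dir, spec.kind)
```

With the entry from the mail above, parse_disk yields kind 'iscsi' and the scsi-0ATA_ST... identifier as the target, and script_for points at /etc/xen/scripts/block-iscsi; what that script does with the identifier is left open here.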
Oh, and if you need help with the script, people here can probably advise. It's relatively simple - see the NBD scripting. If it has more general applicability, it'd be nice to see a copy on xen-devel if you get the chance ;-)

Cheers,
Mark

On Thursday 26 January 2006 18:28, Kurt Garloff wrote:
> Use
>   disk = [ 'iscsi:scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator,hda,w' ]
> and drop a script into /etc/xen/scripts/block-iscsi
> that handles it. (I'm assuming now that scsi-0ATA_ST... is a unique
> identifier here; otherwise choose a different property of your iSCSI
> target.)
>
> Unless I misunderstood something, that suggestion of Mark really
> solves your problem.
>
> Best,

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Hi Kurt,

Rumor has it that on Thu, Jan 26, 2006 at 07:28:53PM +0100 Kurt Garloff said:
> Use
>   disk = [ 'iscsi:scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator,hda,w' ]
> and drop a script into /etc/xen/scripts/block-iscsi
> that handles it. (I'm assuming now that scsi-0ATA_ST... is a unique
> identifier here; otherwise choose a different property of your iSCSI
> target.)
>
> Unless I misunderstood something, that suggestion of Mark really
> solves your problem.

Thanks. But I don't have a problem (at least not this one anyway ;).

We're both talking about different ways to do the same thing. My point again is that udev can handle the task of creating well-known, consistent device names for uniquely identifiable disks. It makes sense to me to just configure udev properly; then you do not need to do anything xen specific to solve this problem. And it is not iSCSI specific either.

As I said, I don't know for sure how to make udev do this for iSCSI. For FC you can use the scsi_id callout to get the UID and then have a udev rule use it to generate a device name. There may not exist an analog of scsi_id for iSCSI. It's the same as what you are suggesting, except it's Linux-wide and not xen only :)

Cheers,
Phil

--
Philip R. Auld, Ph.D.                  Egenera, Inc.
Software Architect                     165 Forest St.
(508) 858-2628                         Marlboro, MA 01752
Hi Philip,

On Thu, Jan 26, 2006 at 01:57:35PM -0500, Philip R. Auld wrote:
> Thanks. But I don't have a problem (at least not this anyway ;).
>
> We're both talking about different ways to do the same thing.
> My point again is that udev can handle the task of creating
> well-known, consistent device names for uniquely identifiable
> disks. And that it makes sense to me to just configure udev
> properly and you do not need to do anything xen specific to
> solve this problem. And it is not iSCSI specific either.

... and you just use phy:by-id/name as for disk exporting then. Sure, that would work.

The block-iscsi script has the advantage that it's possible to connect to the iSCSI target only when needed and disconnect once you are done. This helps you when migrating VMs from one physical machine to another. I don't know exactly the semantics of multiple parallel connections to an iSCSI target. Probably it should not cause problems, in which case you'd get away with your phy: solution.

> As I said, I don't know for sure how to make udev do
> this for iSCSI. For FC you can use the scsi_id callout to
> get the UID and then have a udev rule to use to generate a
> device name. There may not exist an analog of scsi_id for
> iSCSI. It's the same as what you are suggesting except it's
> Linux-wide and not xen only :)

You're trying to solve the persistent device naming problem. Mark and I are trying to connect and disconnect to iSCSI targets on demand (and have the possibility to solve the persistent device name problem along the way for xen, if needed).

Anyway, we're far off-topic from the original post now.

Best,
--
Kurt Garloff, Head Architect, Director SUSE Labs (act.), Novell Inc.
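Side by side, the two approaches from this exchange would differ only in the disk line of the domain config, which is itself Python syntax. The fragment below is a sketch under assumptions: the by-id name is taken from Ian's listing, an absolute /dev/disk/by-id path is assumed to be acceptable after phy:, and the iscsi: variant only works once a block-iscsi script (which does not exist in this thread) has been written.

```python
# Option 1 (Phil): udev provides a persistent name in dom0 and the
# device is exported like any other physical device. The disk stays
# attached to dom0 whether or not the domain is running.
disk = ['phy:/dev/disk/by-id/scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator,hda,w']

# Option 2 (Kurt, Mark): a hypothetical /etc/xen/scripts/block-iscsi
# connects to the target when the domain starts and disconnects when
# it stops, which is what helps with migration.
# disk = ['iscsi:scsi-0ATA_ST3200826AS_Linux_ATA-SCSI_simulator,hda,w']
```

Only one of the two lines would be active in a given config file.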
Kurt Garloff wrote:
> I extended the infrastructure a bit and added another bootloader.
> Unlike pygrub it does not offer a menu and does not parse the
> grub menu.lst; it's meant for paravirtualized domains and thus
> we accept that the booted kernel is selected differently. By e.g.
> a symlink, if one wants to control it from the domU. The bootloader
> is called domUloader.

Is there a project page for this, or a tarball / RPM out there that includes domUloader and its latest updates?
Hi Matt,

On Wed, Mar 22, 2006 at 01:59:23PM -0500, Matt Ayres wrote:
> Kurt Garloff wrote:
> > I extended the infrastructure a bit and added another bootloader.
> > Unlike pygrub it does not offer a menu and does not parse the
> > grub menu.lst; it's meant for paravirtualized domains and thus
> > we accept that the booted kernel is selected differently. By e.g.
> > a symlink, if one wants to control it from the domU. The bootloader
> > is called domUloader.
>
> Is there a project page for this or tarball / RPM out there that
> includes domUloader and its latest updates?

I was hoping to see it merged right away, so I did not set up a project page. Seems I have to do it :-(

The Novell/SUSE RPMs include the domUloader functionality. You can find the latest version in our current betas or on
http://forge.novell.com/modules/xfmod/project/?xenpreview

Best,
--
Kurt Garloff, Head Architect Linux, Novell Inc.
Kurt Garloff wrote:
> I was hoping to see it merged right away, so I did not set up a project
> page. Seems I have to do it :-(
>
> The Novell/SUSE RPMs include the domUloader functionality.
> You can find latest version in our current betas or on
> http://forge.novell.com/modules/xfmod/project/?xenpreview

Hi Kurt,

Are there any plans to re-diff domUloader against 3.0.2 and post it to this list? I've been having quite a difficult time getting your original patches to apply.

Thanks,
Matt