Luke Kenneth Casson Leighton
2005-Jan-02 16:26 UTC
[Xen-devel] [XEN] using shmfs for swapspace
hi,

am starting to play with Xen - the virtualisation project (http://xen.sf.net). i'll give some background first of all and then the question - at the bottom - will make sense [when posting to lkml i often get questions asked that are answered by the background material i also provide... *sigh*]

each virtual machine requires (typically) its own physical ram (a chunk of the host's real memory) and some virtual memory - swapspace. xen uses 32mb for its shm guest OS inter-communication.

so, in the case i'm setting up, that's 5 virtual machines (only one of which can get away with having only 32mb of ram, the rest require 64mb), and so that's five lots of 256mbyte swap files.

the memory usage is the major concern: i only have 256mb of ram, and you've probably by now added up that the above comes to 320mbytes.

so i started looking at ways to minimise the memory usage: first, reducing each machine to only having 32mb of ram, and secondly, on the host, creating a MASSIVE swap file (1gbyte), making a MASSIVE shmfs/tmpfs partition (1gbyte), and then creating swap files in the tmpfs partition!!!

the reasoning behind doing this is quite straightforward: by placing the swapfiles in a tmpfs, presumably, when one of the guest OSes requires some memory, RAM on the host OS will be used until the amount of RAM requested exceeds the host OS's physical memory, and only then will it go into swap-space. this is presumed to be infinitely better than forcing the swapspace to be always on disk, especially with the guests only being allocated 32mbyte of physical RAM.

here's the problems:

 1) tmpfs doesn't support sparse files

 2) files created in tmpfs can't be used as block devices (???)

 3) as a workaround i have to create a swap partition in a 256mb file
    (dd if=/dev/zero of=/mnt/swapfile bs=1M count=256, then do mkswap on
    it), then copy the ENTIRE file into the tmpfs-mounted partition.
    on every boot-up. per swapfile needed. eeeuw, yuk.

so, my question is a strategic one:

 * in what other ways could the same results be achieved?

in other words, in what other ways can i publish block devices from the master OS (and they must be block devices for Xen guest OSes to be able to see them) that can be used as swap space, and that will be in RAM if possible - bearing in mind that they can be recreated at boot time, i.e. they don't need to be persistent.

ta,

l.

--
http://lkcl.net
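A sketch of the boot-time workaround in point 3 of the message above, written out as a shell fragment. The tmpfs size, mount point, staging path and guest names are illustrative assumptions, not anything prescribed by Xen:

    # mount a large tmpfs to hold the guests' swap images in host RAM
    # (spilling into the host's own 1 GB swap when RAM runs out)
    mount -t tmpfs -o size=1024m tmpfs /mnt/guestswap

    # per guest, per boot: build a fully-allocated 256 MB swap image on
    # disk, format it as swap, then copy the whole thing into the tmpfs
    for guest in guest1 guest2 guest3 guest4 guest5; do
        dd if=/dev/zero of=/var/tmp/$guest-swap bs=1M count=256
        mkswap /var/tmp/$guest-swap
        cp /var/tmp/$guest-swap /mnt/guestswap/$guest-swap
        rm /var/tmp/$guest-swap
    done

Each image in /mnt/guestswap can then be exported to its guest as a block device via Xen's file: disk directive.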
Luke Kenneth Casson Leighton
2005-Jan-03 20:53 UTC
[Xen-devel] Re: [XEN] using shmfs for swapspace
On Mon, Jan 03, 2005 at 01:31:34PM -0500, Joseph Fannin wrote:
> On Sun, Jan 02, 2005 at 04:26:52PM +0000, Luke Kenneth Casson Leighton wrote:
> [...]
> > this is presumed to be infinitely better than forcing the swapspace to
> > be always on disk, especially with the guests only being allocated
> > 32mbyte of physical RAM.
>
> I'd be interested in knowing how a tmpfs that's gone far into swap
> performs compared to a more normal on-disk fs. I don't know if anyone
> has ever looked into it. Is it comparable, or is tmpfs's ability to
> swap more a last-resort escape hatch?
>
> This is the part where I would add something valuable to this
> conversation, if I were going to do that. (But no.)

:)

okay.

some kind person from ibm pointed out that of course if you use a file-based swap file (in xen terminology, disk=['file:/xen/guest1-swapfile,/dev/sda2,rw'], which means "publish guest1-swapfile on the DOM0 VM as the /dev/sda2 hard drive on the guest1 VM"), then you of course end up using the linux filesystem cache on DOM0, which is of course RAM-based.

so this tends to suggest a strategy where you allocate as much memory as you can afford to the DOM0 VM, and as little as you can afford to the guests, and make the guest swap files bigger to compensate.

... and i thought it was going to need some wacky wacko non-sharing shared-memory virtual-memory pseudo-tmpfs block-based filesystem driver. dang.

l.
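The file-backed arrangement described in the message above, as a sketch. It reuses the /xen/guest1-swapfile path and the disk= syntax quoted there; the fstab line inside the guest is an illustrative assumption:

    # on dom0: create and format the backing file (a plain file, so dom0's
    # page cache - i.e. RAM - sits in front of it)
    dd if=/dev/zero of=/xen/guest1-swapfile bs=1M count=256
    mkswap /xen/guest1-swapfile

    # in the guest1 domain config, export it as a block device, as quoted
    # above:
    #   disk = [ 'file:/xen/guest1-swapfile,/dev/sda2,rw' ]
    #
    # inside guest1, /dev/sda2 can then be enabled as swap, e.g. via
    # /etc/fstab:
    #   /dev/sda2  none  swap  sw  0  0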
> so this tends to suggest a strategy where you allocate as
> much memory as you can afford to the DOM0 VM, and as little
> as you can afford to the guests, and make the guest swap
> files bigger to compensate.

This is essentially what the mainframe folks are already doing, and have been doing for some time, because the kernel VM has no external inputs for saying "you are virtualised so be nice": for doing opportunistic page recycling ("I don't need this page, but when I ask for it back please tell me if you trashed the content"), and for hinting to the underlying VM which pages are best blasted out of existence first, and how to communicate that so we don't page them back in by scanning them.
On Mon, 3 Jan 2005, Luke Kenneth Casson Leighton wrote:
> On Mon, Jan 03, 2005 at 01:31:34PM -0500, Joseph Fannin wrote:
> > On Sun, Jan 02, 2005 at 04:26:52PM +0000, Luke Kenneth Casson Leighton wrote:
> > [...]
> > > this is presumed to be infinitely better than forcing the swapspace to
> > > be always on disk, especially with the guests only being allocated
> > > 32mbyte of physical RAM.
> >
> > I'd be interested in knowing how a tmpfs that's gone far into swap
> > performs compared to a more normal on-disk fs. I don't know if anyone
> > has ever looked into it. Is it comparable, or is tmpfs's ability to
> > swap more a last-resort escape hatch?
> >
> > This is the part where I would add something valuable to this
> > conversation, if I were going to do that. (But no.)
>
> :)
>
> okay.
>
> some kind person from ibm pointed out that of course if you use a
> file-based swap file (in xen terminology,
> disk=['file:/xen/guest1-swapfile,/dev/sda2,rw'], which means "publish
> guest1-swapfile on the DOM0 VM as the /dev/sda2 hard drive on the
> guest1 VM"), then you of course end up using the linux filesystem cache
> on DOM0, which is of course RAM-based.
>
> so this tends to suggest a strategy where you allocate as
> much memory as you can afford to the DOM0 VM, and as little
> as you can afford to the guests, and make the guest swap
> files bigger to compensate.

But the guest kernels need real ram to run programs in.

The problem with dom0 doing the caching is that dom0 has no idea about the usage pattern for the swap. It's just a plain file to dom0. Only each guest kernel knows how to combine swap reads/writes correctly.
> for doing opportunistic page recycling ("I don't need this page but when
> I ask for it back please tell me if you trashed the content")

We've talked about doing this but AFAIK nobody has gotten round to it yet because there hasn't been a pressing need (IIRC, it was on the todo list when Xen 1.0 came out).

IMHO, it doesn't look terribly difficult, but it would require (hopefully small) modifications to the architecture-independent code, plus a little bit of support code in Xen.

I'd quite like to look at this one fine day, but I suspect there are more useful things I should do first...

Cheers,
Mark
Luke Kenneth Casson Leighton
2005-Jan-04 09:30 UTC
Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
On Mon, Jan 03, 2005 at 03:07:42PM -0600, Adam Heath wrote:
> > so this tends to suggest a strategy where you allocate as
> > much memory as you can afford to the DOM0 VM, and as little
> > as you can afford to the guests, and make the guest swap
> > files bigger to compensate.
>
> But the guest kernels need real ram to run programs in.
>
> The problem with dom0 doing the caching is that dom0 has no idea about the
> usage pattern for the swap. It's just a plain file to dom0. Only each guest
> kernel knows how to combine swap reads/writes correctly.

... hmm...

then that tends to suggest that this is an issue that should really be dealt with by XEN.

that there needs to be coordination of swap management between the virtual machines.

l.

--
http://lkcl.net
On Tue, 4 Jan 2005, Mark Williamson wrote:
> > for doing opportunistic page recycling ("I don't need this page but when
> > I ask for it back please tell me if you trashed the content")
>
> We've talked about doing this but AFAIK nobody has gotten round to it
> yet because there hasn't been a pressing need (IIRC, it was on the todo
> list when Xen 1.0 came out).
>
> IMHO, it doesn't look terribly difficult but would require (hopefully
> small) modifications to the architecture-independent code, plus a little
> bit of support code in Xen.

The architecture-independent changes are fine, since they're also useful for S390(x), PPC64 and UML...

> I'd quite like to look at this one fine day but I suspect there are more
> useful things I should do first...

I wonder if the same effect could be achieved by just measuring the VM pressure inside the guests and ballooning the guests as required, letting them grow and shrink with their workloads.

That wouldn't need many kernel changes, maybe just a few extra statistics, or maybe all the needed stats already exist. It would also allow more complex policy to be done in userspace, e.g. dealing with Xen guests of different priority...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Tue, 4 Jan 2005, Luke Kenneth Casson Leighton wrote:
> then that tends to suggest that this is an issue that should
> really be dealt with by XEN.

Probably.

> that there needs to be coordination of swap management between the
> virtual machines.

I'd like to see the maximum security separation possible between the unprivileged guests, though...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Tuesday 04 January 2005 04:04, Mark Williamson wrote:
> > for doing opportunistic page recycling ("I don't need this page but when
> > I ask for it back please tell me if you trashed the content")
>
> We've talked about doing this but AFAIK nobody has gotten round to it yet
> because there hasn't been a pressing need (IIRC, it was on the todo list
> when Xen 1.0 came out).
>
> IMHO, it doesn't look terribly difficult but would require (hopefully
> small) modifications to the architecture-independent code, plus a little
> bit of support code in Xen.
>
> I'd quite like to look at this one fine day but I suspect there are more
> useful things I should do first...

There are two other alternatives that are already used on s390 for making multi-level paging a little more pleasant:

- Pseudo faults: When Linux accesses a page that it believes to be present but that is actually swapped out in z/VM, the VM hypervisor raises a special PFAULT exception. Linux can then choose either to ignore this exception and continue, which will force VM to swap the page back in, or to do a task switch and wait for the page to come back. At the point where VM has read the page back from its swap device, it raises another exception, after which Linux wakes up the sleeping process. See arch/s390/mm/fault.c.

- Ballooning: z/VM has an interface (DIAG 10) for the OS to tell it about a page that is currently unused. The kernel uses get_free_page to reserve a number of pages, then calls DIAG 10 to give them to z/VM. The number of pages to give back to the hypervisor is determined by a system-wide workload manager. See arch/s390/mm/cmm.c.

When you want to introduce some interface in Xen, you probably want something more powerful than these, but it probably makes sense to see them as a baseline of what can be done with practically no common-code changes (if you don't do similar stuff already).

Arnd <><
Luke Kenneth Casson Leighton
2005-Jan-06 11:38 UTC
Re: [Xen-devel] Re: [XEN] using shmfs for swapspace
On Tue, Jan 04, 2005 at 09:05:13AM -0500, Rik van Riel wrote:
> On Tue, 4 Jan 2005, Mark Williamson wrote:
>
> > > for doing opportunistic page recycling ("I don't need this page but when
> > > I ask for it back please tell me if you trashed the content")
> >
> > We've talked about doing this but AFAIK nobody has gotten round to it
> > yet because there hasn't been a pressing need (IIRC, it was on the todo
> > list when Xen 1.0 came out).
> >
> > IMHO, it doesn't look terribly difficult but would require (hopefully
> > small) modifications to the architecture-independent code, plus a little
> > bit of support code in Xen.
>
> The architecture-independent changes are fine, since
> they're also useful for S390(x), PPC64 and UML...
>
> > I'd quite like to look at this one fine day but I suspect there are more
> > useful things I should do first...
>
> I wonder if the same effect could be achieved by just
> measuring the VM pressure inside the guests and
> ballooning the guests as required, letting them grow
> and shrink with their workloads.

    mem = 64M-128M
    target = 64M

"if needed, grow me to 128mb, but if not, whittle down to 64".

    mem = 64M-128M
    target = 128M

"if you absolutely have to, steal some of my memory, but don't nick any more than 64M".

i'm probably going to have to "manually" implement something like this.

l.
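A rough sketch of the kind of "manual" policy described in the message above, run from dom0. Both the pressure measurement and the resize command are assumptions: the name and units of the command that changes a running domain's memory have varied between Xen releases (later versions call it xm mem-set), and how the guest reports its free memory back to dom0 is left as a placeholder:

    #!/bin/sh
    # hypothetical dom0-side loop: keep guest1 between a 64 MB floor and a
    # 128 MB ceiling, growing it when it reports being short of memory
    DOMAIN=guest1
    MIN_MB=64
    MAX_MB=128

    while sleep 30; do
        # placeholder: free memory as reported by the guest, in MB
        # (e.g. pushed to dom0 by a tiny daemon inside the guest)
        free_mb=$(cat /var/run/xen-guest-stats/$DOMAIN 2>/dev/null || echo $MAX_MB)

        if [ "$free_mb" -lt 8 ]; then
            xm balloon $DOMAIN $MAX_MB    # grow towards the ceiling
        elif [ "$free_mb" -gt 32 ]; then
            xm balloon $DOMAIN $MIN_MB    # whittle back down to the floor
        fi
    done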
On Wed, 5 Jan 2005, Arnd Bergmann wrote:

> - Pseudo faults:

These are a problem, because they turn what would be a single pageout into a pageout, a pagein, and another pageout, in effect tripling the amount of IO that needs to be done.

> - Ballooning:

Xen already has this. I wonder if it makes sense to consolidate the various balloon approaches into a single driver, and to take the amount of ballooned memory into account when reporting statistics in /proc/meminfo.

> When you want to introduce some interface in Xen, you probably want
> something more powerful than these,

Xen has a nice balloon driver, that can also be controlled from outside the guest domain.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
I could be wrong, but I think the significance was that on the s390, the kernel (periodically) gave pages back to the hypervisor, and requested memory back via the balloon driver only when needed.

I don't know how the balloon driver is implemented here, but in the past I had wondered whether it would be possible for the kernel to try to increase memory via the balloon driver before calling the oom killer.

It seems to me like giving memory to the hypervisor when it wasn't needed could be handled in userspace by monitoring /proc/meminfo, but I think requesting memory would have to be done within the kernel in order to be able to make the attempt when there is no memory free but before the oom killer kicks in.

I was considering trying to implement a daemon like that in userspace, but I don't think it would be reliable, and it would depend a lot on guesswork to try to pull in memory before it was needed.

On Fri, 21 Jan 2005 16:37:09 -0500 (EST), Rik van Riel <riel@redhat.com> wrote:
> On Wed, 5 Jan 2005, Arnd Bergmann wrote:
> > - Ballooning:
>
> Xen already has this. I wonder if it makes sense to
> consolidate the various balloon approaches into a single
> driver, and to take the amount of ballooned memory into
> account when reporting statistics in /proc/meminfo.

--
Puer Misellus Triste
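The userspace half discussed in the message above (giving memory back when it isn't needed) might look something like the sketch below, watching /proc/meminfo from inside the guest. The balloon control file and the target format it accepts are assumptions that differ between Xen versions, and, as the message notes, the other half - pulling memory back in before the oom killer fires - would still need kernel support:

    #!/bin/sh
    # hypothetical guest-side daemon: when the guest has plenty of free
    # memory, lower the balloon target so pages go back to the
    # hypervisor; never shrink below a fixed floor
    FLOOR_MB=32
    SLACK_MB=16

    while sleep 60; do
        free_mb=$(( $(awk '/^MemFree:/ {print $2}' /proc/meminfo) / 1024 ))
        total_mb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))

        if [ "$free_mb" -gt "$SLACK_MB" ] && [ "$total_mb" -gt "$FLOOR_MB" ]; then
            target=$(( total_mb - (free_mb - SLACK_MB) ))
            [ "$target" -lt "$FLOOR_MB" ] && target=$FLOOR_MB
            # assumed interface: write the new target (in MB) to the
            # guest's balloon control file
            echo $target > /proc/xen/balloon
        fi
    done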
> > - Pseudo faults:
>
> These are a problem, because they turn what would be a single
> pageout into a pageout, a pagein, and another pageout, in
> effect tripling the amount of IO that needs to be done.

The Disco VMM tackled this by detecting attempts to double-page using a special virtual swap disk. Perhaps it would be possible to find some cleaner way to avoid wasteful double-paging by adding some more hooks for virtualised architectures...

In any case, for now Xen guests are not swapped onto disk storage at runtime - they retain their physical memory reservation unless they alter it using the balloon driver.

> Xen already has this. I wonder if it makes sense to
> consolidate the various balloon approaches into a single
> driver, and to take the amount of ballooned memory into
> account when reporting statistics in /proc/meminfo.

If multiple platforms want to do this, we could refactor the code so that the core of the balloon driver can be used in multiple archs. We could have an arch_release/request_memory() that the core balloon driver can call into to actually return memory to the VMM.

> > When you want to introduce some interface in Xen, you probably want
> > something more powerful than these,
>
> Xen has a nice balloon driver, that can also be
> controlled from outside the guest domain.

The Xen control interface made this fairly trivial to implement. Again, the balloon driver core could be plumbed into whatever the preferred virtual machine control interface for the platform is (I don't know if / how other platforms tackle this).

Cheers,
Mark