I did some quick benchmarks in case somebody is interested.

A few words about the setup:

DOM1: 64MB RAM, vd:1 (4GB on real /dev/sdb)
DOM2: 64MB RAM, sda1 (4GB mapped from real /dev/sdc5)

sdb is configured with only two VDs (4GB root + 128MB swap), so in both cases the partitions should lie at the beginning of the physical disk. I suspect so at least, because I don't know how the VD code allocates space. sdc also has only two physical partitions with the same setup. Both have identical root filesystems.

The single run is done while the other domain sits idle. The concurrent run is both domains running bonnie++, well, concurrently :)

It seems that phy is significantly faster than vd, and that concurrent I/O load on separate disks doesn't hurt performance.

Detailed log attached.
Cool! Thanks for the info - there haven't been any benchmarks on virtual disks yet, so it's good to have some figures. A couple of comments:

VD space has probably been allocated at the end of the drive, not the start. The VD management layer tries to allocate new space from the end (not the start) of the longest-expired disk first. The idea behind this was to preserve any metadata at the start of the expired disk being scavenged from, in case of a future option for "partial undeletion" of expired disks whose space has been reallocated. In practice, I suspect most filesystems people will use are a little cleverer in terms of where they place their metadata, so this may not actually help much.

I suspect you might be able to increase the performance by using a different extent size. The VD management code allocates extents of "initialised" physical disks. The extent size for a device can be specified when it's initialised. The default is 64 megabytes. A 4 gigabyte virtual disk would make for 64 extents of 64MB. A long list of extents could be slowing down the disk address translation in Xen, resulting in poorer benchmark results. At some stage I might rethink the default extent size.

If you want, you could try a larger extent size by specifying it at initialisation time [you can't change the extent size on an already initialised device :-(]. This might speed things up in the benchmark, although if you don't allocate space in multiples of the extent size, there will be more space wasted in unused "partial extents" than for a smaller extent size.

In the degenerate case that your entire virtual disk fits inside an extent, it should give exactly the same performance as the phy case (since the code will be doing exactly the same thing).

If I get some spare time I might test this out myself - right now I'm debugging a different problem.

Cheers,

Mark
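To make the translation cost concrete, here is a minimal sketch in C of a linked-list virtual-to-physical sector lookup. All names (vd_extent, vd_translate, EXTENT_SECTORS) are invented for illustration and this is not the actual Xen VD code; it only shows the shape of the problem: every request walks the extent list, so a 4GB disk with 64MB extents costs up to 64 pointer hops per translation.

/* Illustrative sketch only -- not the real Xen VD code. */
#include <stddef.h>
#include <stdint.h>

#define EXTENT_SECTORS (64UL * 1024 * 1024 / 512)  /* 64MB extents, 512-byte sectors */

struct vd_extent {
    uint64_t          phys_start;  /* first physical sector of this extent */
    struct vd_extent *next;        /* next extent of the virtual disk      */
};

/* Translate a virtual sector to a physical sector by walking the list:
 * O(number of extents) per request -- up to 64 hops for a 4GB disk. */
static int vd_translate(struct vd_extent *head, uint64_t vsec, uint64_t *psec)
{
    uint64_t idx = vsec / EXTENT_SECTORS;
    uint64_t off = vsec % EXTENT_SECTORS;

    while (head != NULL && idx-- > 0)
        head = head->next;

    if (head == NULL)
        return -1;                 /* sector beyond the end of the VD */

    *psec = head->phys_start + off;
    return 0;
}

In practice the walk is cheap compared with the disk access itself, which is the point Ian makes further down the thread.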
> I suspect you might be able to increase the performance by using a
> different extent size. The VD management code allocates extents of
> "initialised" physical disks. The extent size for a device can be
> specified when it's initialised. The default is 64 megabytes. A 4
> gigabyte virtual disk would make for 64 extents of 64MB. A long list
> of extents could be slowing down the disk address translation in Xen,
> resulting in poorer benchmark results. At some stage I might rethink
> the default extent size.

The correct fix is to get rid of the linear linked lists.

A good way to do this is to use a buddy allocator for allocating VDs: you track power-of-two multiples of 64MB free disc space, and try to create VDs out of the largest possible extents. Within Xen you can then use a truncated radix tree to map virtual extents to real extents --- Linux has code we can use for truncated radix trees.

-- Keir
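A rough sketch in C of the lookup half of this suggestion. The buddy-allocation side is omitted, and every name here (xd_radix_node, xd_extent_lookup, RADIX_BITS) is invented; this is not the Linux radix tree code, just the kind of fixed fan-out walk that would replace the linear list, making translation O(tree height) rather than O(number of extents).

/* Illustrative sketch only -- invented names, not Xen or Linux code. */
#include <stddef.h>
#include <stdint.h>

#define RADIX_BITS 4                          /* 16-way fan-out per level */
#define RADIX_MASK ((1u << RADIX_BITS) - 1)

struct xd_radix_node {
    void *slot[1u << RADIX_BITS];             /* child nodes, or extent bases at the leaves */
};

/* Look up the physical base sector of virtual extent 'vext'.
 * Returns 0 and fills *pbase on success, -1 if the extent is unmapped. */
static int xd_extent_lookup(struct xd_radix_node *root, unsigned int height,
                            uint64_t vext, uint64_t *pbase)
{
    while (height-- > 0) {
        unsigned int idx = (vext >> (height * RADIX_BITS)) & RADIX_MASK;

        if (root == NULL)
            return -1;

        if (height == 0) {                    /* leaf level holds the extent bases */
            uint64_t *leaf = root->slot[idx];
            if (leaf == NULL)
                return -1;
            *pbase = *leaf;
            return 0;
        }
        root = root->slot[idx];               /* descend one level */
    }
    return -1;
}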
> I suspect you might be able to increase the performance by using a
> different extent size. The VD management code allocates extents of
> "initialised" physical disks. The extent size for a device can be
> specified when it's initialised. The default is 64 megabytes. A 4
> gigabyte virtual disk would make for 64 extents of 64MB. A long list
> of extents could be slowing down the disk address translation in Xen,
> resulting in poorer benchmark results. At some stage I might rethink
> the default extent size.
>
> If you want, you could try a larger extent size by specifying it at
> initialisation time [you can't change the extent size on an already
> initialised device :-(]. This might speed things up in the benchmark,
> although if you don't allocate space in multiples of the extent size,
> there will be more space wasted in unused "partial extents" than for
> a smaller extent size.

Any difference in performance must surely be down to the physical position of the data on the disk (and hence the transfer rate): 64MB extents are sufficiently big not to introduce many extra seeks in a transfer, even if they weren't contiguous, which they likely are. The fact that we have to search a linked list of 64 extents for every transfer is pretty naive, but hardly likely to take long enough to produce a noticeable slowdown.

I think the likely problem is that vdtool is populating the disk from the last sector, so you're getting the slow bit of the disk first. The fact that the extents are effectively in the wrong order probably isn't giving Xen's disk scheduler optimal performance either.

Mark: could you initialise the free list on a newly created VD partition 'backwards', so that we allocate from the front and sequentially allocated extents are forward on disk?

Thanks,
Ian
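A minimal sketch, in C with invented names (free_extent, vd_init_free_list), of the change Ian is asking for: build the free list so that extent 0 sits at its head, which makes head-first allocation hand out extents from the front of the disk in forward order.

/* Illustrative sketch only -- not the actual vdtool code. */
#include <stdint.h>
#include <stdlib.h>

struct free_extent {
    uint64_t            start_sector;  /* first sector of this free extent */
    struct free_extent *next;
};

/* Build the free list for a freshly initialised VD partition of
 * 'nr_extents' extents of 'extent_sectors' sectors each. */
static struct free_extent *vd_init_free_list(uint64_t nr_extents,
                                             uint64_t extent_sectors)
{
    struct free_extent *head = NULL;

    /* Push extents in reverse order so extent 0 ends up at the head:
     * popping from the head then allocates from the front of the disk. */
    for (uint64_t i = nr_extents; i-- > 0; ) {
        struct free_extent *e = malloc(sizeof(*e));
        if (e == NULL)
            return head;               /* sketch: give up on allocation failure */
        e->start_sector = i * extent_sectors;
        e->next = head;
        head = e;
    }
    return head;
}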
> The correct fix is to get rid of the linear linked lists.
>
> A good way to do this is to use a buddy allocator for allocating VDs:
> you track power-of-two multiples of 64MB free disc space, and try to
> create VDs out of the largest possible extents. Within Xen you can
> then use a truncated radix tree to map virtual extents to real
> extents --- Linux has code we can use for truncated radix trees.

That's a really cool idea! Thanks, Keir :-)

I'll stick it on my todo list to implement in a future release, time permitting. In the meantime, perhaps Ian's suggestion will give some decent improvements... (see later e-mail).

Cheers,

Mark
OK, thanks Ian. Changes now made to the 1.2 repository and pushed to bkbits. Hopefully this should improve things a bit. Obviously, you'll have to use the slow bits of the disk eventually, but it makes sense not to use them first ;-)

If anyone's interested in trying a comparative benchmark now, they'd be very welcome!

Mark
On Tuesday 17 February 2004 18:51, Mark Williamson wrote:
> OK, thanks Ian. Changes now made to the 1.2 repository and pushed to
> bkbits. Hopefully this should improve things a bit. Obviously, you'll
> have to use the slow bits of the disk eventually, but it makes sense
> not to use them first ;-)

Will this break existing VDs?
Yan-Ching CHU
2004-Feb-18 18:24 UTC
[Xen-devel] Some thoughts about the soft real time scheduler for Xen
Hi Xen guys,

First, a (partial) recap of what Ian said before:

=====================================
We need a compile (or run time) option to completely replace the
current BVT scheduler with a soft real time scheduler that allows
domains to be given guarantees of the form "x microseconds every
y microseconds" (having a constraint that y must be a power of 2
or suchlike would be fine).

If there's CPU time left over after meeting the guarantees of all
the runnable domains, it should be shared out in a proportional
manner between domains that have an 'eligible for best-effort
extra time' flag set.
=====================================

Some questions:

1. According to the "2003 Xenoserver Computing Infrastructure", in a commercial production environment clients are supposed to "buy" computing time from a Xenoserver; customers may not be happy with only soft real time QoS?

2. I am working on a (simple) absolute share scheduler function in Xen, which should provide the bottom line for what a customer buys from a Xenoserver. But I guess a hybrid scheduler combining these two is desirable in the future?

3. For a Xenolinux (domain) to specify meaningful QoS requests, it has to gather information from application processes and pass it on to Xen. In the literature there are several approaches, such as directly modifying the kernel scheduler to be fully preemptible (preserving the original interface), implementing a new extension as a module, or using "dual kernels" by providing a thin layer between the Linux kernel and the interrupt control hardware (real time tasks interact with another [real time] kernel interface). Xen shows properties like some of these, in that it sits below standard Linux like a "dual kernel" and application processes run unmodified. Besides Xen's scheduler, the scheduler in Xenolinux needs to be changed. Any idea how this should be implemented in Xenolinux? Which approach is more appropriate?

Any comments are welcome.

Thanks,

Yan-Ching CHU
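As a rough illustration of what guarantees of the form "x microseconds every y microseconds" imply for admission control, here is a small C sketch. The names (dom_guarantee, admit_domain) are invented and this is not Xen code; it just shows the standard EDF-style test that the summed slice/period fractions of all admitted domains must not exceed the whole CPU.

/* Illustrative sketch only -- invented names, not Xen's scheduler. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dom_guarantee {
    uint64_t slice_us;     /* x: guaranteed CPU time per period      */
    uint64_t period_us;    /* y: period of the guarantee             */
    bool     best_effort;  /* eligible for left-over ('slack') time? */
};

/* Return true if 'new_dom' can be admitted alongside the 'n' existing
 * guarantees without over-committing the CPU. */
static bool admit_domain(const struct dom_guarantee *doms, size_t n,
                         const struct dom_guarantee *new_dom)
{
    double util = (double)new_dom->slice_us / (double)new_dom->period_us;

    for (size_t i = 0; i < n; i++)
        util += (double)doms[i].slice_us / (double)doms[i].period_us;

    return util <= 1.0;    /* EDF bound for independent periodic guarantees */
}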
Steven Hand
2004-Feb-18 18:49 UTC
Re: [Xen-devel] Some thoughts about the soft real time scheduler for Xen
> First, a (partial) recap of what Ian said before:
>
> =====================================
> We need a compile (or run time) option to completely replace the
> current BVT scheduler with a soft real time scheduler that allows
> domains to be given guarantees of the form "x microseconds every
> y microseconds" (having a constraint that y must be a power of 2
> or suchlike would be fine).
>
> If there's CPU time left over after meeting the guarantees of all
> the runnable domains, it should be shared out in a proportional
> manner between domains that have an 'eligible for best-effort
> extra time' flag set.
> =====================================
>
> Some questions:
>
> 1. According to the "2003 Xenoserver Computing Infrastructure", in a
> commercial production environment clients are supposed to "buy"
> computing time from a Xenoserver; customers may not be happy with
> only soft real time QoS?

In the context of XenoServers, the notion is that you pay for e.g. S ms every P ms, where the total demand from all domains clearly needs to be less than e.g. 100%, assuming EDF.

However: if there is K% of the CPU remaining, we would like to be able to choose whether to dole this out to some subset of all domains, or to simply 'waste' it. The subset sharing the 'slack time' may or may not include any paying customers (e.g. it might involve running maintenance tasks for the XenoServer).

In the non-XenoServer case (e.g. the more traditional web hosting world), the use of a work-conserving scheduler seems to make lots of sense. A scheduler which allows both these extremes (hard QoS & best effort) would hence be a very nice thing.

> 2. I am working on a (simple) absolute share scheduler function in
> Xen, which should provide the bottom line for what a customer buys
> from a Xenoserver. But I guess a hybrid scheduler combining these two
> is desirable in the future?

Yup!

> 3. For a Xenolinux (domain) to specify meaningful QoS requests, it
> has to gather information from application processes and pass it on
> to Xen. In the literature there are several approaches, such as
> directly modifying the kernel scheduler to be fully preemptible
> (preserving the original interface), implementing a new extension as
> a module, or using "dual kernels" by providing a thin layer between
> the Linux kernel and the interrupt control hardware (real time tasks
> interact with another [real time] kernel interface). Xen shows
> properties like some of these, in that it sits below standard Linux
> like a "dual kernel" and application processes run unmodified.
> Besides Xen's scheduler, the scheduler in Xenolinux needs to be
> changed. Any idea how this should be implemented in Xenolinux? Which
> approach is more appropriate?

So in principle a XenoLinux (or other guest) scheduler could certainly export some 'real-time' scheduling notions to its hosted processes. However this is not at all a requirement for us; in particular our experience with Nemesis showed that the sorts of QoS requirements people actually have are, in general, rather coarse... e.g. some notion of "an aggregate machine which is about 15% as powerful as the real one". We'd like to support higher-level QoS specs (e.g. "a machine which is capable of scoring 225 on SPECweb99") and have a plan for this. But none of this involves tweaking any of the scheduling in XenoLinux or other guests.

cheers,

S.
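To make the slack-time idea concrete, here is a small C sketch with invented names (dom_share, distribute_slack) of one possible policy: the CPU fraction left over after the guarantees is shared only among domains flagged as best-effort eligible, weighted (purely as an assumption here) by their guaranteed share; if no domain is flagged, the slack is simply left idle.

/* Illustrative sketch only -- invented names and an assumed weighting. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dom_share {
    uint64_t slice_us;      /* guaranteed slice per period               */
    uint64_t period_us;
    bool     best_effort;   /* may receive left-over time                */
    double   extra_frac;    /* out: fraction of slack handed to this dom */
};

/* Split 'slack_frac' (the CPU fraction not covered by guarantees)
 * among best-effort domains, weighted by their guaranteed share. */
static void distribute_slack(struct dom_share *doms, size_t n, double slack_frac)
{
    double weight_sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        doms[i].extra_frac = 0.0;
        if (doms[i].best_effort)
            weight_sum += (double)doms[i].slice_us / (double)doms[i].period_us;
    }

    if (weight_sum == 0.0)
        return;             /* nobody eligible: the slack is simply 'wasted' */

    for (size_t i = 0; i < n; i++)
        if (doms[i].best_effort)
            doms[i].extra_frac = slack_frac *
                ((double)doms[i].slice_us / (double)doms[i].period_us) / weight_sum;
}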
David Hopwood
2004-Feb-18 19:50 UTC
Re: [Xen-devel] Some thoughts about the soft real time scheduler for Xen
Yan-Ching CHU wrote:
> Hi Xen guys,
>
> First, a (partial) recap of what Ian said before:
>
> =====================================
> We need a compile (or run time) option to completely replace the
> current BVT scheduler with a soft real time scheduler that allows
> domains to be given guarantees of the form "x microseconds every
> y microseconds" (having a constraint that y must be a power of 2
> or suchlike would be fine).
>
> If there's CPU time left over after meeting the guarantees of all
> the runnable domains, it should be shared out in a proportional
> manner between domains that have an 'eligible for best-effort
> extra time' flag set.

Can I suggest looking up "hierarchical scheduling" in general, and this paper in particular: <http://citeseer.nj.nec.com/regehr01hls.html>.

David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Yan-Ching CHU
2004-Feb-18 20:16 UTC
Re: [Xen-devel] Some thoughts about the soft real time scheduler for Xen
Thanks Steven, that really clarified things a lot.

Yan-Ching CHU

----- Original Message -----
From: "Steven Hand" <Steven.Hand@cl.cam.ac.uk>
To: "Yan-Ching CHU" <cs0u210a@liverpool.ac.uk>
Cc: "Ian Pratt" <Ian.Pratt@cl.cam.ac.uk>; <xen-devel@lists.sourceforge.net>
Sent: Wednesday, February 18, 2004 6:49 PM
Subject: Re: [Xen-devel] Some thoughts about the soft real time scheduler for Xen