Hi Ian/Stefano,

So, I'm back to using pfn space from maxphysaddr below. Stefano, you
suggested ballooning, but that would be just too slow. There are lot of
pages to be mapped, 4k at a time during guest creation, and I am afraid
ballooning and hypercalls to populate EPT will be pretty slow.

OTOH, there is tons of address space available between max-physaddr and
max pfn in dom0. Stefano, your concern was stuff mapped there
causing problems in future. But we can always look at the e820 for
conflicts. Keeping things fast is important for us.

Please let me know if you still have issues with my approach. I
believe this is what Ian is doing on ARM port.

thanks,
Mukesh
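The "look at the e820 for conflicts" idea in the message above can be sketched as a plain interval-overlap test. This is only an illustration under stated assumptions: the e820 is modelled as a list of (start, size, type) entries in bytes, and `E820Entry` and `pfn_range_conflicts` are hypothetical names, not kernel or Xen APIs.

```python
from typing import List, NamedTuple

PAGE_SHIFT = 12  # 4k pages, as in the thread


class E820Entry(NamedTuple):
    """Hypothetical, simplified view of one e820 map entry."""
    start: int  # first byte address of the region
    size: int   # length of the region in bytes
    type: str   # e.g. "RAM" or "reserved"


def pfn_range_conflicts(e820: List[E820Entry], start_pfn: int,
                        nr_pages: int) -> List[E820Entry]:
    """Return every e820 entry overlapping [start_pfn, start_pfn + nr_pages)."""
    lo = start_pfn << PAGE_SHIFT
    hi = (start_pfn + nr_pages) << PAGE_SHIFT
    # Two half-open intervals overlap when each starts before the other ends.
    return [e for e in e820 if e.start < hi and lo < e.start + e.size]


# Example map: 1 GiB of RAM plus a small legacy reserved hole.
example_e820 = [
    E820Entry(0x0, 0x9FC00, "RAM"),
    E820Entry(0xF0000, 0x10000, "reserved"),
    E820Entry(0x100000, 0x40000000 - 0x100000, "RAM"),
]

# A 10k-page window starting right above the last RAM entry.
print(pfn_range_conflicts(example_e820, 0x40000, 10_000))
```

An empty result means the candidate pfn window does not collide with anything the e820 currently describes. It says nothing about regions added later, which is the concern raised in the reply.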
On Fri, 23 Mar 2012, Mukesh Rathor wrote:
> Hi Ian/Stefano,
>
> So, I'm back to using pfn space from maxphysaddr below. Stefano, you
> suggested ballooning, but that would be just too slow. There are lot of
> pages to be mapped, 4k at a time during guest creation, and I am afraid
> ballooning and hypercalls to populate EPT will be pretty slow.
>
> OTOH, there is tons of address space available between max-physaddr and
> max pfn in dom0. Stefano, your concern was stuff mapped there
> causing problems in future. But we can always look at the e820 for
> conflicts. Keeping things fast is important for us.
>
> Please let me know if you still have issues with my approach. I
> believe this is what Ian is doing on ARM port.

I think that we should explicitly allocate these pages/addresses and
not rely on the fact that they are at a specific location that we deem
safe for now.
So if we explicitly introduce a new region at the end of the e820 that
we mark as reserved and we use it for this, I would be OK with that.
However we need to be careful because editing the e820 has proved to be
challenging in the past.
Also we would need to figure out a way to tell Linux that these
reserved addresses are actually OK to be used. Maybe we need a new
command line or hypercall for that.
On Mon, 2012-03-26 at 11:37 +0100, Stefano Stabellini wrote:
> On Fri, 23 Mar 2012, Mukesh Rathor wrote:
> > Hi Ian/Stefano,
> >
> > So, I'm back to using pfn space from maxphysaddr below. Stefano, you
> > suggested ballooning, but that would be just too slow. There are lot of
> > pages to be mapped, 4k at a time during guest creation, and I am afraid
> > ballooning and hypercalls to populate EPT will be pretty slow.
> >
> > OTOH, there is tons of address space available between max-physaddr and
> > max pfn in dom0. Stefano, your concern was stuff mapped there
> > causing problems in future. But we can always look at the e820 for
> > conflicts. Keeping things fast is important for us.
> >
> > Please let me know if you still have issues with my approach. I
> > believe this is what Ian is doing on ARM port.
>
> I think that we should explicitly allocate these pages/addresses and
> not rely on the fact that they are at a specific location that we deem
> safe for now.

Agreed. In the context of the thing I'm doing on ARM this is entirely a
short term hack until we figure out something better.

Ian.

> So if we explicitly introduce a new region at the end of the e820 that
> we mark as reserved and we use it for this, I would be OK with that.
> However we need to be careful because editing the e820 has proved to be
> challenging in the past.
> Also we would need to figure out a way to tell Linux that these
> reserved addresses are actually OK to be used. Maybe we need a new
> command line or hypercall for that.
On Mon, 26 Mar 2012 11:37:46 +0100
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> I think that we should explicitly allocate these pages/addresses and
> not rely on the fact that they are at a specific location that we deem
> safe for now.
> So if we explicitly introduce a new region at the end of the e820 that
> we mark as reserved and we use it for this, I would be OK with that.
> However we need to be careful because editing the e820 has proved to
> be challenging in the past.
> Also we would need to figure out a way to tell Linux that these
> reserved addresses are actually OK to be used. Maybe we need a new
> command line or hypercall for that.

That sounds like a reasonable approach. Let's do it as part of phase II.
I wanna get some basic code in.

So, to give an update of where I am, good news, I've got guests
finally booting using hybrid dom0. So, that means I am almost there
now!!!! Yeay...

But, the pfn space management for privcmd mapping is still a hack.
Running into many issues. Basically, it is forcing me to write a slab
allocator for the resvd pfn space, which I am trying to avoid. During
guest creation, the xl process maps about 10k foreign pgs, and xenstored 1.

I was thinking of just dividing my pfn space into say 10 chunks, each
with 10k pages, so 10 guest creations can happen simultaneously. But,
then xl is not the only process doing the mapping, I found out. xenstored
also needs to map domU frames. Otherwise, I could just do one chunk
per process. Also, I am breaking mmap semantics somewhat by hooking
via privcmd_mmap, because the unmaps don't follow any order. So my last
unmap frees the entire 10k chunk it's using.

In a nutshell, I am still trying to figure out how to allocate rsvd pfn's
for privcmd without writing a slab allocator. I think using mmap makes
it harder, can't we just use ioctl to get the VA? Then, I could nicely
do something like:

xl:
  - open(privcmd file)
  - ioctl(get rsvd/e820 pfn handle)
  - ioctl(get VA using above handle)   /* alternate to mmap */
  - ioctl(get VA1 using above handle)  /* alternate to mmap */
  ...
  - ioctl(release handle)
  - ioctl(release VA)
  - close file

Is that an option (to change mmap to ioctl)?

Hope that makes sense,

thanks,
Mukesh
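The chunk-per-handle scheme described in the message above (a reserved pfn region split into fixed chunks, one handle per chunk, ranges handed out inside it) can be modelled in a few lines of userspace Python. This is a toy model under assumptions, not privcmd code: `ReservedPfnPool`, `get_handle`, `get_range` and `release_handle` are hypothetical names standing in for the proposed ioctls, and ranges are handed out with a simple bump pointer.

```python
class ReservedPfnPool:
    """Toy model of a reserved pfn region split into equal chunks."""

    def __init__(self, base_pfn, nr_chunks=10, chunk_pages=10_000):
        self.chunk_pages = chunk_pages
        # Each free chunk is identified by its first pfn.
        self.free_chunks = [base_pfn + i * chunk_pages
                            for i in range(nr_chunks)]
        # handle (chunk base pfn) -> next unused pfn inside that chunk
        self.next_free = {}

    def get_handle(self):
        """Stand-in for 'ioctl(get rsvd/e820 pfn handle)'."""
        if not self.free_chunks:
            raise MemoryError("all chunks are in use")
        base = self.free_chunks.pop(0)
        self.next_free[base] = base
        return base

    def get_range(self, handle, nr_pages):
        """Stand-in for 'ioctl(get VA using above handle)'."""
        start = self.next_free[handle]
        if start + nr_pages > handle + self.chunk_pages:
            raise MemoryError("chunk exhausted")
        self.next_free[handle] = start + nr_pages
        return start

    def release_handle(self, handle):
        """Stand-in for 'ioctl(release handle)': recycle the whole chunk."""
        del self.next_free[handle]
        self.free_chunks.append(handle)


pool = ReservedPfnPool(base_pfn=0x100000, nr_chunks=2, chunk_pages=100)
h = pool.get_handle()
first = pool.get_range(h, 10)
second = pool.get_range(h, 5)
print(hex(first), hex(second))
```

The bump pointer never reuses space inside a chunk until the handle is released, which is what lets this avoid a real allocator. It also shows the limitation raised in the thread: a second mapper such as xenstored needs a whole chunk of its own.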
On Sat, 2012-04-14 at 02:47 +0100, Mukesh Rathor wrote:
> On Mon, 26 Mar 2012 11:37:46 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > I think that we should explicitly allocate these pages/addresses and
> > not rely on the fact that they are at a specific location that we deem
> > safe for now.
> > So if we explicitly introduce a new region at the end of the e820 that
> > we mark as reserved and we use it for this, I would be OK with that.
> > However we need to be careful because editing the e820 has proved to
> > be challenging in the past.
> > Also we would need to figure out a way to tell Linux that these
> > reserved addresses are actually OK to be used. Maybe we need a new
> > command line or hypercall for that.
>
> That sounds like reasonable approach. Let's do it as part of phase II.
> I wanna get some basic code in.
>
> So, to give an update of where I am, good news, I've got guests
> finally booting using hybrid dom0. So, that means I am almost there
> now!!!! Yeay...

Awesome news!

> But, the pfn space management for privcmd mapping is still a hack.
> Running into many issues. Basically, it is forcing me to write a slab
> allocator for the resvd pfn space, that I am trying to avoid. During
> guest creation, xl process maps about 10k foreign pgs, and xenstored 1.

10k simultaneously or over the life of a domain build?

> I was thinking of just dividing my pfn space into say 10 chunks, each
> with 10k pages, so 10 guest creations can happen simultaneously. But,
> then xl is not the only process doing the mapping I found out. xenstored
> also needs to map domU frames. Otherwise, I could just do one chunk
> per process. Also, I am breaking mmap semantics somewhat by hooking
> via privcmd_mmap, because the unmaps don't follow any order. So my last
> unmap frees the entire 10k chunk it's using.

Presumably that's mostly just an issue of doing more accounting/tracking
in the privcmd driver (like the gntdev device does) so you can properly
release things at the right time/place?

> In a nutshell, I am still trying to figure how to allocate rsvd pfn's
> for privcmd without writing a slab allocator.

Can't you just use the core get_page function (or
alloc_xenballooned_pages) and move the associated mfn aside temporarily
(or not if using alloc_xenballooned_pages)?

> I think using mmap makes
> it harder, can't we just use ioctl to get the VA? Then, I could nicely
> do something like:
> xl:
> - open(privcmd file)
> - ioctl(get rsvd/e820 pfn handle)
> - ioctl(get VA using above handle) /* alternate to mmap */
> - ioctl(get VA1 using above handle) /* alternate to mmap */
> ...
> - ioctl(release handle)
> - ioctl(release VA)
> - close file
>
> Is that an option (to change mmap to ioctl)?
>
> Hope that makes sense,
>
> thanks,
> Mukesh
>
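The "more accounting/tracking" point in the reply above can be illustrated with a small model in which every mapping is recorded individually, so unmaps may arrive in any order and the chunk is only reported idle once the last one is gone. This is a sketch under assumptions, not what privcmd or gntdev actually do: `ChunkTracker` and its methods are hypothetical names.

```python
class ChunkTracker:
    """Toy per-chunk bookkeeping: remember each live mapping by start pfn."""

    def __init__(self, base_pfn, nr_pages):
        self.base_pfn = base_pfn
        self.end_pfn = base_pfn + nr_pages
        self.next_pfn = base_pfn
        self.live = {}  # start pfn -> number of pages still mapped

    def map(self, nr_pages):
        """Record a new mapping and return its first pfn."""
        start = self.next_pfn
        if start + nr_pages > self.end_pfn:
            raise MemoryError("chunk exhausted")
        self.next_pfn = start + nr_pages
        self.live[start] = nr_pages
        return start

    def unmap(self, start_pfn):
        """Drop one mapping; return True only if the chunk is now idle."""
        del self.live[start_pfn]  # KeyError on an unknown mapping
        return not self.live


tracker = ChunkTracker(base_pfn=1000, nr_pages=100)
a = tracker.map(10)
b = tracker.map(20)
c = tracker.map(5)

# Unmaps arrive out of order; only the final one frees the chunk.
print(tracker.unmap(b), tracker.unmap(c), tracker.unmap(a))
```

With this kind of record the driver can tell "some unmap happened" apart from "the last unmap happened", which is the distinction the original hack was missing.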
On 04/13/2012 09:47 PM, Mukesh Rathor wrote:
> On Mon, 26 Mar 2012 11:37:46 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> I think that we should explicitly allocate these pages/addresses and
>> not rely on the fact that they are at a specific location that we deem
>> safe for now.
>> So if we explicitly introduce a new region at the end of the e820 that
>> we mark as reserved and we use it for this, I would be OK with that.
>> However we need to be careful because editing the e820 has proved to
>> be challenging in the past.
>> Also we would need to figure out a way to tell Linux that these
>> reserved addresses are actually OK to be used. Maybe we need a new
>> command line or hypercall for that.
>
> That sounds like reasonable approach. Let's do it as part of phase II.
> I wanna get some basic code in.
>
> So, to give an update of where I am, good news, I've got guests
> finally booting using hybrid dom0. So, that means I am almost there
> now!!!! Yeay...
>
> But, the pfn space management for privcmd mapping is still a hack.
> Running into many issues. Basically, it is forcing me to write a slab
> allocator for the resvd pfn space, that I am trying to avoid. During
> guest creation, xl process maps about 10k foreign pgs, and xenstored 1.
>
> I was thinking of just dividing my pfn space into say 10 chunks, each
> with 10k pages, so 10 guest creations can happen simultaneously. But,
> then xl is not the only process doing the mapping I found out. xenstored
> also needs to map domU frames.

With Xen 4.2, xenstored should be using the grant table for its shared
page. Similar changes can be made to xenconsoled so that only the domain
build/migrate processes use map_foreign_range. I have a patch to xenconsoled
without the fallback to map_foreign_range sitting around; I was planning to
post it with proper fallback (which I may do soon, looks simple enough).

> Otherwise, I could just do one chunk
> per process. Also, I am breaking mmap semantics somewhat by hooking
> via privcmd_mmap, because the unmaps don't follow any order. So my last
> unmap frees the entire 10k chunk it's using.
>
> In a nutshell, I am still trying to figure how to allocate rsvd pfn's
> for privcmd without writing a slab allocator. I think using mmap makes
> it harder, can't we just use ioctl to get the VA? Then, I could nicely
> do something like:
> xl:
> - open(privcmd file)
> - ioctl(get rsvd/e820 pfn handle)
> - ioctl(get VA using above handle) /* alternate to mmap */
> - ioctl(get VA1 using above handle) /* alternate to mmap */
> ...
> - ioctl(release handle)
> - ioctl(release VA)
> - close file
>
> Is that an option (to change mmap to ioctl)?
>
> Hope that makes sense,
>
> thanks,
> Mukesh
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

--
Daniel De Graaf
National Security Agency
On Mon, 2012-04-16 at 15:39 +0100, Daniel De Graaf wrote:
> On 04/13/2012 09:47 PM, Mukesh Rathor wrote:
> > I was thinking of just dividing my pfn space into say 10 chunks, each
> > with 10k pages, so 10 guest creations can happen simultaneously. But,
> > then xl is not the only process doing the mapping I found out. xenstored
> > also needs to map domU frames.
>
> With Xen 4.2, xenstored should be using the grant table for its shared
> page. Similar changes can be made to xenconsoled so that only the domain
> build/migrate processes use map_foreign_range. I have a patch to xenconsoled
> without the fallback to map_foreign_range sitting around; I was planning to
> post it with proper fallback (which I may do soon, looks simple enough).

That sounds like a good thing to have, although I don't think we'd take
it for 4.2 at this point so you've got some time.

I think the privcmd stuff needs to still assume that domain build is not
the only privileged mapper of pages and do proper tracking of what it
has mapped where. Various debug utilities etc also use this interface,
i.e. xenctx (and gdbsx? I suppose Mukesh would know ;-))

Ian.
On Mon, 16 Apr 2012, Ian Campbell wrote:
> > In a nutshell, I am still trying to figure how to allocate rsvd pfn's
> > for privcmd without writing a slab allocator.
>
> Can't you just use the core get_page function (or
> alloc_xenballooned_pages) and move the associated mfn aside temporarily
> (or not if using alloc_xenballooned_pages)?

I think that is a good suggestion: if we are trying to get in something
that works but might not be the best solution, then using
alloc_xenballooned_pages to get some pages and then changing the p2m is
the best option: it wastes a non-trivial amount of memory in dom0 but at
least it is known to work well and it wouldn't be a "hack".

Give a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
for an example.
On Mon, 2012-04-16 at 17:22 +0100, Stefano Stabellini wrote:
> On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > In a nutshell, I am still trying to figure how to allocate rsvd pfn's
> > > for privcmd without writing a slab allocator.
> >
> > Can't you just use the core get_page function (or
> > alloc_xenballooned_pages) and move the associated mfn aside temporarily
> > (or not if using alloc_xenballooned_pages)?
>
> I think that is a good suggestion: if we are trying to get in something
> that works but might not be the best solution, then using
> alloc_xenballooned_pages to get some pages and then changing the p2m is
> the best option: it wastes a non-trivial amount of memory in dom0 but at
> least it is known to work well and it wouldn't be a "hack".

I don't think it wastes all that much -- even 10k pages is only a few
10s of megabytes for the duration of the build.

Also free_xenballooned_pages does the right thing if
alloc_xenballooned_pages had to explicitly free some pages to satisfy
the allocation, i.e. it will shrink the balloon and re-add those pages
to the allocator, it won't leave them in the balloon or something.

Ian.

>
> Give a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> for an example.
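The "few 10s of megabytes" figure in the message above follows directly from the page count and the 4k page size used throughout the thread. A quick check, with `pages_to_mib` as a hypothetical helper name:

```python
PAGE_SIZE = 4096  # bytes; 4k pages as discussed in the thread


def pages_to_mib(nr_pages: int) -> float:
    """Memory held by nr_pages pages, in MiB."""
    return nr_pages * PAGE_SIZE / (1 << 20)


# Roughly 10k foreign pages are mapped during one domain build.
print(pages_to_mib(10_000))  # 39.0625
```

So one build pins a bit under 40 MiB of dom0 pages for its duration, and ten simultaneous builds would pin about 390 MiB.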
On Mon, 16 Apr 2012 17:22:14 +0100
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > In a nutshell, I am still trying to figure how to allocate rsvd
> > > pfn's for privcmd without writing a slab allocator.
> >
> > Can't you just use the core get_page function (or
> > alloc_xenballooned_pages) and move the associated mfn aside
> > temporarily (or not if using alloc_xenballooned_pages)?
>
> I think that is a good suggestion: if we are trying to get in
> something that works but might not be the best solution, then using
> alloc_xenballooned_pages to get some pages and then changing the p2m
> is the best option: it wastes a non-trivial amount of memory in dom0
> but at least it is known to work well and it wouldn't be a "hack".
>
> Give a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> for an example.

Ok. I changed to using alloc_xenballooned_pages. In future, if we run
into problems, we can look into alternatives. In the past we've had
problems with limits reached ballooning down. We run with a small dom0.

thanks,
Mukesh
On Wed, 2012-04-18 at 02:20 +0100, Mukesh Rathor wrote:
> On Mon, 16 Apr 2012 17:22:14 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>
> > On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > > In a nutshell, I am still trying to figure how to allocate rsvd
> > > > pfn's for privcmd without writing a slab allocator.
> > >
> > > Can't you just use the core get_page function (or
> > > alloc_xenballooned_pages) and move the associated mfn aside
> > > temporarily (or not if using alloc_xenballooned_pages)?
> >
> > I think that is a good suggestion: if we are trying to get in
> > something that works but might not be the best solution, then using
> > alloc_xenballooned_pages to get some pages and then changing the p2m
> > is the best option: it wastes a non-trivial amount of memory in dom0
> > but at least it is known to work well and it wouldn't be a "hack".
> >
> > Give a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> > for an example.
>
>
> Ok. I changed to using alloc_xenballooned_pages. In future, if we run
> into problems, we can look into alternatives. In past we've had
> problems with limits reached ballooning down. We run with small dom0.

You don't really need to increase the size of dom0, just the size of the
balloon.

e.g. if you run dom0_mem=512M,max:1024M then you get a dom0 with 512M of
RAM, but a total PFN space of 1024M, which means you have 512M of
balloon available for alloc_xenballooned_pages.

If you just do dom0_mem=512M then I believe you get 512M of RAM but PFN
space sized for the entire host, which is going to give you more than
enough balloon space on any typical host (but there are obviously
downsides if the host has lots of RAM relative to 512M!)

Ian.
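The dom0_mem example in the message above can be worked through numerically. The sketch below assumes only the simple "SIZE,max:SIZE" form used in that message; `parse_size` and `balloon_pages` are hypothetical helpers, and the real Xen option accepts more forms than this handles.

```python
PAGE_SIZE = 4096  # bytes
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a size such as '512M' into bytes (K/M/G suffix required)."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def balloon_pages(dom0_mem: str) -> int:
    """Pages of balloon space implied by a 'SIZE,max:SIZE' setting.

    Simplified on purpose: only the form quoted in the thread is handled.
    """
    size_part, max_part = dom0_mem.split(",")
    if not max_part.startswith("max:"):
        raise ValueError("expected 'SIZE,max:SIZE'")
    current = parse_size(size_part)
    maximum = parse_size(max_part[len("max:"):])
    return (maximum - current) // PAGE_SIZE


pages = balloon_pages("512M,max:1024M")
print(pages)            # 131072 pages, i.e. 512M of balloon
print(pages // 10_000)  # 13 domain builds of about 10k pages each
```

Under these assumptions the 512M of balloon covers roughly thirteen concurrent 10k-page builds, without giving dom0 any more RAM.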