On Wed, Sep 28, 2011 at 11:39 PM, Larry Bassel <lbassel@codeaurora.org> wrote:
> We need to create a large (~100M) contiguous physical memory region
> which will only be needed occasionally. As this region will
> use up 10-20% of all of the available memory, we do not want
> to pre-reserve it at boot time. Instead, we want to create
> this memory region "on the fly" when asked to by userspace,
> and do it as quickly as possible, and return it to
> system use when not needed.
>
> AFAIK, this sort of operation is currently done using memory
> compaction (as CMA does, for instance).
> Alternatively, this memory region (if it is in a fixed place)
> could be created using "logical memory hotremove" and returned
> to the system using "logical memory hotplug". In either case,
> the contiguous physical memory would be created by migrating
> pages from the "movable zone".
>
> The problem with this approach is that copying up to 25000
> pages may take considerable time (as well as finding destinations
> for all of the pages if free memory is scarce -- this may
> even fail, causing the memory region not to be created).
>
> It was suggested to me that a new zone type, similar
> to the "movable zone" but only allowed to contain pages
> that can be discarded (such as text), could solve this problem,
> so that no copying or finding of destination pages is needed (thus
> considerably reducing latency).

Is this approach similar to the copy-on-write used by most
page-sharing implementations? If so, its cost largely depends on the
number of writes made to the pages.

> The downside I see is that there may not be anywhere near
> 25000 such discardable pages, so most of this zone would go unused, and
> the memory would be "wasted" as in the case where it is pre-reserved.
> Also, this is not currently supported, so new code would
> have to be designed and implemented.
>
> I would appreciate people's comments about:
>
> 1. Does this type of zone make any sense? It
> would have to co-exist with the current movable zone type.

Ideally, couldn't a single reserved zone be created from which all of
the on-the-fly zones are shared, based on CoW?

> 2. How hard would it be to implement this? The new zone type would
> need to be supported and "discardable" pages steered into this zone.

Most VMs support ballooning, CoW, and other forms of sharing, and
these can serve as a basis for memory management projects.

> 3. Are there better ways of allocating a large memory region
> with minimal latency that I haven't mentioned here?

Hmm... there are mechanisms, as you point out yourself, but they all
depend on policies for consolidation, priority, and security of
operations.

> Thanks.
>
> Larry Bassel
>
> --
> Sent by an employee of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> From: Sameer Pramod Niphadkar [mailto:spniphadkar@gmail.com]
> Sent: Thursday, September 29, 2011 12:08 AM
> To: Larry Bassel
> Cc: linux-mm@kvack.org; vgandhi@codeaurora.org; Xen-devel@lists.xensource.com
> Subject: [Xen-devel] Re: RFC -- new zone type
>
> On Wed, Sep 28, 2011 at 11:39 PM, Larry Bassel <lbassel@codeaurora.org> wrote:
> > We need to create a large (~100M) contiguous physical memory region
> > which will only be needed occasionally. As this region will
> > use up 10-20% of all of the available memory, we do not want
> > to pre-reserve it at boot time. Instead, we want to create
> > this memory region "on the fly" when asked to by userspace,
> > and do it as quickly as possible, and return it to
> > system use when not needed.
> >
> > AFAIK, this sort of operation is currently done using memory
> > compaction (as CMA does, for instance).
> > Alternatively, this memory region (if it is in a fixed place)
> > could be created using "logical memory hotremove" and returned
> > to the system using "logical memory hotplug". In either case,
> > the contiguous physical memory would be created by migrating
> > pages from the "movable zone".
> >
> > The problem with this approach is that copying up to 25000
> > pages may take considerable time (as well as finding destinations
> > for all of the pages if free memory is scarce -- this may
> > even fail, causing the memory region not to be created).
> >
> > It was suggested to me that a new zone type, similar
> > to the "movable zone" but only allowed to contain pages
> > that can be discarded (such as text), could solve this problem,
> > so that no copying or finding of destination pages is needed (thus
> > considerably reducing latency).

If I read the above correctly, you are indeed talking about
pre-reserving your ~100MB contiguous chunk of memory, but using
it for "discardable" pages only, then discarding all of those
pages when you need the memory region, then going back to using
the contiguous chunk for discardable pages, and so on.

You may be interested in the concept of "ephemeral pages"
introduced by transcendent memory ("tmem") and the cleancache
patchset, which went upstream in 3.0. If you write a driver
(called a "backend" in tmem language) that accepts pages
from cleancache, you would be able to use your 100MB contiguous
chunk of memory for clean pagecache pages when it is not needed
for your other purposes, easily discard all the pages when
you do need the space, then start using it for clean pagecache
pages again when you don't need it for your purposes anymore
(and repeat this cycle as many times as necessary).

You could perhaps call your driver "cleanzone".

Zcache (also upstream, in drivers/staging) does something like
this already, though you might not want/need to use compression
in your driver. In zcache, space reclaim is driven by the kernel
"shrinker" code that runs when memory is low, but another trigger
could easily be used. Also there is likely a lot of code in
zcache (e.g. tmem.c) that you could leverage.

For more info, see:
http://lwn.net/Articles/454795/
http://oss.oracle.com/projects/tmem

I'd be happy to answer any questions if you are still interested
after you have read the above documentation.

Thanks,
Dan
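To make the "backend" idea concrete, here is a rough, untested sketch
of the skeleton such a cleanzone driver might register. Only struct
cleancache_ops and cleancache_register_ops() come from
include/linux/cleancache.h (as the interface stood around 3.0, where
the invalidation hooks were still named flush_*); everything named
cleanzone_* is a placeholder to be filled in.

/*
 * "cleanzone" skeleton (sketch only).  The cleanzone_* bodies are
 * placeholders; the ops structure and registration call are the real
 * cleancache interface as of 3.0.
 */
#include <linux/module.h>
#include <linux/cleancache.h>

static int cleanzone_init_fs(size_t pagesize)
{
	/* hand out a pool id for this filesystem, or fail */
	return 0;			/* placeholder pool id */
}

static int cleanzone_init_shared_fs(char *uuid, size_t pagesize)
{
	return -1;	/* shared pools (ocfs2) not supported in this sketch */
}

static int cleanzone_get_page(int pool, struct cleancache_filekey key,
			      pgoff_t index, struct page *page)
{
	/*
	 * Look up (pool, key, index) in the reserved region; on a hit,
	 * copy the data into @page and return 0.  A nonzero return is
	 * a miss, and the caller falls back to real I/O.
	 */
	return -1;
}

static void cleanzone_put_page(int pool, struct cleancache_filekey key,
			       pgoff_t index, struct page *page)
{
	/*
	 * Copy @page into the reserved region if space is available.
	 * Puts may always be silently dropped -- cleancache data is
	 * ephemeral by contract.
	 */
}

static void cleanzone_flush_page(int pool, struct cleancache_filekey key,
				 pgoff_t index)
{
	/* forget any stored copy so a stale get can never succeed */
}

static void cleanzone_flush_inode(int pool, struct cleancache_filekey key)
{
}

static void cleanzone_flush_fs(int pool)
{
}

static struct cleancache_ops cleanzone_ops = {
	.init_fs	= cleanzone_init_fs,
	.init_shared_fs	= cleanzone_init_shared_fs,
	.get_page	= cleanzone_get_page,
	.put_page	= cleanzone_put_page,
	.flush_page	= cleanzone_flush_page,
	.flush_inode	= cleanzone_flush_inode,
	.flush_fs	= cleanzone_flush_fs,
};

static int __init cleanzone_init(void)
{
	cleancache_register_ops(&cleanzone_ops);
	return 0;
}
module_init(cleanzone_init);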
On 29 Sep 11 09:38, Dan Magenheimer wrote:
> > From: Sameer Pramod Niphadkar [mailto:spniphadkar@gmail.com]
> > Sent: Thursday, September 29, 2011 12:08 AM
> >
> > On Wed, Sep 28, 2011 at 11:39 PM, Larry Bassel <lbassel@codeaurora.org> wrote:
> > > We need to create a large (~100M) contiguous physical memory region
> > > which will only be needed occasionally.

[snip]

> If I read the above correctly, you are indeed talking about
> pre-reserving your ~100MB contiguous chunk of memory, but using
> it for "discardable" pages only, then discarding all of those
> pages when you need the memory region, then going back to using
> the contiguous chunk for discardable pages, and so on.

Yes, that is exactly what we want to do.

> You may be interested in the concept of "ephemeral pages"
> introduced by transcendent memory ("tmem") and the cleancache
> patchset, which went upstream in 3.0. If you write a driver
> (called a "backend" in tmem language) that accepts pages
> from cleancache, you would be able to use your 100MB contiguous
> chunk of memory for clean pagecache pages when it is not needed
> for your other purposes, easily discard all the pages when
> you do need the space, then start using it for clean pagecache
> pages again when you don't need it for your purposes anymore
> (and repeat this cycle as many times as necessary).
>
> You could perhaps call your driver "cleanzone".
>
> Zcache (also upstream, in drivers/staging) does something like
> this already, though you might not want/need to use compression
> in your driver. In zcache, space reclaim is driven by the kernel
> "shrinker" code that runs when memory is low, but another trigger
> could easily be used. Also there is likely a lot of code in
> zcache (e.g. tmem.c) that you could leverage.
>
> For more info, see:
> http://lwn.net/Articles/454795/
> http://oss.oracle.com/projects/tmem

Thanks very much, I'll look into these.

> I'd be happy to answer any questions if you are still interested
> after you have read the above documentation.
> Thanks,
> Dan

Larry

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On 29 Sep 11 09:38, Dan Magenheimer wrote:

[snip]

> You may be interested in the concept of "ephemeral pages"
> introduced by transcendent memory ("tmem") and the cleancache
> patchset, which went upstream in 3.0. If you write a driver
> (called a "backend" in tmem language) that accepts pages
> from cleancache, you would be able to use your 100MB contiguous
> chunk of memory for clean pagecache pages when it is not needed
> for your other purposes, easily discard all the pages when
> you do need the space, then start using it for clean pagecache
> pages again when you don't need it for your purposes anymore
> (and repeat this cycle as many times as necessary).
>
> You could perhaps call your driver "cleanzone".
>
> Zcache (also upstream, in drivers/staging) does something like
> this already, though you might not want/need to use compression
> in your driver. In zcache, space reclaim is driven by the kernel
> "shrinker" code that runs when memory is low, but another trigger
> could easily be used. Also there is likely a lot of code in
> zcache (e.g. tmem.c) that you could leverage.
>
> For more info, see:
> http://lwn.net/Articles/454795/
> http://oss.oracle.com/projects/tmem
>
> I'd be happy to answer any questions if you are still interested
> after you have read the above documentation.

It appears that ephemeral tmem ("cleancache") is at least
close to meeting our needs. We won't need
virtualization or compression.

I do have some questions (I've read the references
you included in your email to me last week and a few
of the links from the "project transcendent memory" one, but have
not looked at any of the source yet):

1. Is it currently possible to specify the size of tmem
(as for us it must be convertible into a large contiguous physical
block of specified size)? Is it currently possible to specify
the start of tmem? Are there any alignment constraints on
the start or size?

2. How does one "turn on" and "turn off" tmem (the memory
which tmem uses may also be needed for the large contiguous
memory block, or perhaps may be powered off entirely)?
Is it simply that one always answers "no" to both
get and put requests while it is "off"?

3. How portable is the tmem code? This needs to run
on an ARM system.

4. Apparently hooks are needed in the filesystem code --
which filesystems are currently supported for use with
tmem? Is it difficult to add hooks for filesystems
that aren't yet supported?

5. There are no dependencies on memory compaction
or memory hotplug (or sparsemem), correct?

Thank you for suggesting tmem, and thanks in advance for
answering my questions.

> Thanks,
> Dan

Larry

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> > You may be interested in the concept of "ephemeral pages"
> > introduced by transcendent memory ("tmem") and the cleancache
> > patchset, which went upstream in 3.0. If you write a driver
> > (called a "backend" in tmem language) that accepts pages
> > from cleancache, you would be able to use your 100MB contiguous
> > chunk of memory for clean pagecache pages when it is not needed
> > for your other purposes, easily discard all the pages when
> > you do need the space, then start using it for clean pagecache
> > pages again when you don't need it for your purposes anymore
> > (and repeat this cycle as many times as necessary).

[snip]

> It appears that ephemeral tmem ("cleancache") is at least
> close to meeting our needs.

Yes, I thought so also.

> We won't need
> virtualization or compression.

Right. Those just demonstrate different interesting uses of
tmem/cleancache.

> I do have some questions (I've read the references
> you included in your email to me last week and a few
> of the links from the "project transcendent memory" one, but have
> not looked at any of the source yet):
>
> 1. Is it currently possible to specify the size of tmem
> (as for us it must be convertible into a large contiguous physical
> block of specified size)? Is it currently possible to specify
> the start of tmem? Are there any alignment constraints on
> the start or size?

Your "cleanzone" driver would have complete control over this, so
there would be no constraints unless you (or generic kernel code)
choose to enforce them.

> 2. How does one "turn on" and "turn off" tmem (the memory
> which tmem uses may also be needed for the large contiguous
> memory block, or perhaps may be powered off entirely)?
> Is it simply that one always answers "no" to both
> get and put requests while it is "off"?

That's right. However, you must ensure that stale data isn't
get'able after you've turned it off and then on again. I don't
think you'll need to do anything special for that... I think you
will be assuming all of the cleancache data is gone (not preserved).

> 3. How portable is the tmem code? This needs to run
> on an ARM system.

I don't think there is any reason it wouldn't be portable. If you
are running on a system with 32-bit pointers but more than 4GB of
memory (e.g. "highmem"), that might add some complexity, but those
problems have now been solved in zcache, so they should be solvable
for cleanzone also.

> 4. Apparently hooks are needed in the filesystem code --
> which filesystems are currently supported for use with
> tmem? Is it difficult to add hooks for filesystems
> that aren't yet supported?

The hooks are currently in ext3, ext4, btrfs, and ocfs2. If the
filesystem is "well behaved", the support is easy to add.

> 5. There are no dependencies on memory compaction
> or memory hotplug (or sparsemem), correct?

No dependencies.

Dan
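Following Dan's answer to question 2, a small illustration of the
on/off gating: while "off" the backend simply misses on every get and
drops every put, and turning it off also forgets everything stored,
so that nothing stale can be returned after a later re-enable. All
names continue the earlier cleanzone skeleton and are placeholders;
locking is omitted.

#include <linux/cleancache.h>

static bool cleanzone_enabled;

static void cleanzone_drop_all_pages(void)
{
	/* placeholder: forget every page stored in the region */
}

static void cleanzone_set_enabled(bool enable)
{
	if (!enable)
		cleanzone_drop_all_pages();	/* no stale data survives */
	cleanzone_enabled = enable;
}

/* revised get from the earlier skeleton, now honoring the gate */
static int cleanzone_get_page(int pool, struct cleancache_filekey key,
			      pgoff_t index, struct page *page)
{
	if (!cleanzone_enabled)
		return -1;		/* "no": always a miss while off */
	/* ... normal lookup and copy-out would go here ... */
	return -1;
}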
Thanks for your answers to my questions. I have one more:

Will there be any problem if the memory I want to be
transcendent is highmem (i.e. doesn't have any permanent
virtual<->physical mapping)?

Thanks.

Larry

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> From: Larry Bassel [mailto:lbassel@codeaurora.org]
> Sent: Thursday, October 06, 2011 5:04 PM
> To: Dan Magenheimer
> Cc: Larry Bassel; linux-mm@kvack.org; Xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: RFC -- new zone type
>
> Thanks for your answers to my questions. I have one more:
>
> Will there be any problem if the memory I want to be
> transcendent is highmem (i.e. doesn't have any permanent
> virtual<->physical mapping)?

Hi Larry --

I have to admit I am not an expert on highmem matters. Seth Jennings
(cc'ed) fixed highmem for zcache with a patch, so I assume there
shouldn't be a problem for your code.

Dan

P.S. Seth, google for the subject if needed... there is not a single
email thread I can easily point you to.
On 10/07/2011 10:23 AM, Dan Magenheimer wrote:
>> From: Larry Bassel [mailto:lbassel@codeaurora.org]
>> Sent: Thursday, October 06, 2011 5:04 PM
>> To: Dan Magenheimer
>> Cc: Larry Bassel; linux-mm@kvack.org; Xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] Re: RFC -- new zone type
>>
>> Thanks for your answers to my questions. I have one more:
>>
>> Will there be any problem if the memory I want to be
>> transcendent is highmem (i.e. doesn't have any permanent
>> virtual<->physical mapping)?

I guess I need to make the distinction between tmem, the transcendent
memory layer, and zcache, a tmem backend that does the compression
and storage work. Tmem is highmem agnostic. It's just passing the
page information through to the backend, zcache.

Zcache can store data held in highmem pages (after the patch that Dan
referred to), but it can't use highmem pages in its own storage pools.
Both zbud (storage for compressed ephemeral pages) and xvmalloc
(storage for compressed persistent pages) don't set __GFP_HIGHMEM in
their page allocation calls, because they return the virtual address
of the page to zcache. Since highmem pages have no virtual address
except for the short time they are mapped, this prevents highmem
pages from being used by zbud and xvmalloc.

I did write a patch a while back that allows xvmalloc to use highmem
pages in its storage pool. Although, from looking at the history of
this conversation, you'd be writing a different backend for tmem and
not using zcache anyway.

Currently the tmem code is in the zcache driver. However, if there
are going to be other backends designed for it, we may need to move
it into its own module so it can be shared.

--
Seth
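An illustration of the constraint Seth describes (this is not
zcache's actual code, just the underlying idea): a pool page
allocated without __GFP_HIGHMEM always has a permanent kernel virtual
address, so an allocator can hand that address back to its caller; a
highmem page is only addressable inside a short-lived kmap window.

#include <linux/gfp.h>
#include <linux/mm.h>

/* lowmem-only pool page: its virtual address stays valid for the
 * page's whole lifetime, so returning it to the caller is safe */
static void *pool_alloc_mapped_page(void)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return NULL;
	return page_address(page);	/* permanent kernel mapping */
}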
On 07 Oct 11 11:01, Seth Jennings wrote:
> On 10/07/2011 10:23 AM, Dan Magenheimer wrote:
> >> From: Larry Bassel [mailto:lbassel@codeaurora.org]
> >> Sent: Thursday, October 06, 2011 5:04 PM
[snip]
> >> Will there be any problem if the memory I want to be
> >> transcendent is highmem (i.e. doesn't have any permanent
> >> virtual<->physical mapping)?
>
> I guess I need to make the distinction between tmem, the transcendent
> memory layer, and zcache, a tmem backend that does the compression
> and storage work. Tmem is highmem agnostic. It's just passing the
> page information through to the backend, zcache.

I'm sorry if my question was ambiguous -- I want to use the
"cleancache" concept to let us have a large (> 100M) piece of
contiguous physical memory which can either be used as such or
otherwise used as a cleancache for discardable pages. It is this
memory that I am asking about: can it be highmem?

> Zcache can store data held in highmem pages (after the patch that Dan
> referred to), but it can't use highmem pages in its own storage pools.
> Both zbud (storage for compressed ephemeral pages) and xvmalloc
> (storage for compressed persistent pages) don't set __GFP_HIGHMEM in
> their page allocation calls, because they return the virtual address
> of the page to zcache. Since highmem pages have no virtual address
> except for the short time they are mapped, this prevents highmem
> pages from being used by zbud and xvmalloc.

As this area must be very large and contiguous, I can't use kmalloc
or similar allocation APIs -- I imagine I'll carve it out early in
boot with memblock_remove(); luckily this area is of fixed size. If
this memory were in ZONE_HIGHMEM, I'd just have to use kmap to get a
temporary mapping to use when a page is copied to or from "normal"
system memory (or am I missing something here?). Whether this area is
in highmem or not, I imagine I'll need to write an allocator to
allocate/free pages from the "dual-purpose" memory while it is being
used as cleancache (see the sketch following this message).

> I did write a patch a while back that allows xvmalloc to use highmem
> pages in its storage pool. Although, from looking at the history of
> this conversation, you'd be writing a different backend for tmem and
> not using zcache anyway.

We're going to want a backend which is (at least to a
first approximation) a simplification of zcache --
no compression and no frontswap are needed.
Possibly we'll start with zcache and remove things we don't need.

> Currently the tmem code is in the zcache driver. However, if there
> are going to be other backends designed for it, we may need to move
> it into its own module so it can be shared.
>
> --
> Seth

Larry

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
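A sketch of the two pieces Larry describes, under stated assumptions:
the base address and size are made up, error handling is omitted, the
carve-out would be called from platform-specific early init, and the
copy helper assumes the stack-based kmap_atomic() that went in around
2.6.37 (so it works whether or not either page lives in highmem).

#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/highmem.h>
#include <linux/mm.h>

#define CLEANZONE_BASE	0x80000000UL	/* hypothetical fixed base */
#define CLEANZONE_SIZE	(100UL << 20)	/* ~100MB */

/* boot-time carve-out: the region disappears from the kernel's view */
void __init cleanzone_carve_out(void)
{
	memblock_remove(CLEANZONE_BASE, CLEANZONE_SIZE);
}

/* copy one page of data between the carved-out region and pagecache,
 * using temporary atomic mappings (unmapped in LIFO order) */
static void cleanzone_copy_page(struct page *dst, struct page *src)
{
	void *d = kmap_atomic(dst);
	void *s = kmap_atomic(src);

	copy_page(d, s);

	kunmap_atomic(s);
	kunmap_atomic(d);
}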
> From: Larry Bassel [mailto:lbassel@codeaurora.org]
>
> As this area must be very large and contiguous, I can't use kmalloc
> or similar allocation APIs -- I imagine I'll carve it out early in
> boot with memblock_remove(); luckily this area is of fixed size. If
> this memory were in ZONE_HIGHMEM, I'd just have to use kmap to get a
> temporary mapping to use when a page is copied to or from "normal"
> system memory (or am I missing something here?). Whether this area is
> in highmem or not, I imagine I'll need to write an allocator to
> allocate/free pages from the "dual-purpose" memory while it is being
> used as cleancache.

Yep. It would also be very nice if you could allocate the metadata
(tmem data structures) from the same "dual-purpose" memory, since then
all of the data structures can simply be discarded when you need the
memory for the "big-100MB-block" purpose. Zeroing a single pointer
would be enough to "free" all data and metadata. Sadly, I don't think
this will work when the dual-purpose memory is in highmem... in that
case you will need to walk the metadata and free it all up when you
free the cleancache pages.

> > I did write a patch a while back that allows xvmalloc to use highmem
> > pages in its storage pool. Although, from looking at the history of
> > this conversation, you'd be writing a different backend for tmem and
> > not using zcache anyway.
>
> We're going to want a backend which is (at least to a
> first approximation) a simplification of zcache --
> no compression and no frontswap are needed.
> Possibly we'll start with zcache and remove things we don't need.

Agreed, that's your best bet. Let us know how it goes, especially if
you eventually plan for the driver to be submitted upstream.

> > Currently the tmem code is in the zcache driver. However, if there
> > are going to be other backends designed for it, we may need to move
> > it into its own module so it can be shared.

I think the long-term home for tmem.c/tmem.h should be the "lib"
subdirectory of the Linux tree, but it will require another driver or
two to use it before the Linux maintainers will consider that.
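A sketch of Dan's "zero one pointer to free everything" idea: if both
data pages and tmem metadata come from a simple bump allocator over
the carved-out region, reclaiming the whole region is O(1). All names
are placeholders (CLEANZONE_SIZE is the hypothetical constant from
the earlier sketch), locking is omitted, and -- as Dan notes -- this
only works while the region has a permanent kernel mapping, i.e. is
not highmem.

#include <linux/kernel.h>

static char *cleanzone_base;	/* kernel mapping of the carved-out region */
static size_t cleanzone_used;	/* bump pointer: bytes handed out so far */

/* hand out word-aligned chunks for data pages and metadata alike */
static void *cleanzone_alloc(size_t bytes)
{
	char *p;

	bytes = ALIGN(bytes, sizeof(long));
	if (cleanzone_used + bytes > CLEANZONE_SIZE)
		return NULL;	/* pool full: the caller just drops the put */
	p = cleanzone_base + cleanzone_used;
	cleanzone_used += bytes;
	return p;
}

/* reclaim the whole region for the big-block purpose in O(1):
 * all data and metadata are "freed" by resetting one counter */
static void cleanzone_reset(void)
{
	cleanzone_used = 0;
}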