On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> On 04.10.19 01:15, Tyler Sanderson wrote:
> > I was mistaken, the problem with overcommit accounting is not fixed by
> > the change to shrinker interface.
> > This means that large allocations are stopped even if they could succeed
> > by deflating the balloon.
>
> Please note that some people use the balloon for actual memory unplug -
> so initiating to deflate the balloon under any circumstances is
> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> set - however that is barely the case (at least in the setups I know :) ).
>
> So yes, free page reporting is a different thing, because it really is
> used to "hint" and not to "agree to unplug" in any scenario.
>
> --
>
> Thanks,
>

VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
at the spec level either. For example, when will we inflate again?
Current code does this at the next interrupt, which requires
host to somehow know it's time to inflate.

--
MST

On 04.10.19 10:35, Michael S. Tsirkin wrote:
> On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
>> On 04.10.19 01:15, Tyler Sanderson wrote:
>>> I was mistaken, the problem with overcommit accounting is not fixed by
>>> the change to shrinker interface.
>>> This means that large allocations are stopped even if they could succeed
>>> by deflating the balloon.
>>
>> Please note that some people use the balloon for actual memory unplug -
>> so initiating to deflate the balloon under any circumstances is
>> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
>> set - however that is barely the case (at least in the setups I know :) ).
>>
>> So yes, free page reporting is a different thing, because it really is
>> used to "hint" and not to "agree to unplug" in any scenario.
>>
>> --
>>
>> Thanks,
>>
>
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
> at the spec level either. For example, when will we inflate again?
> Current code does this at the next interrupt, which requires
> host to somehow know it's time to inflate.
>

The host has access to memory stats of the guest, so it could come up
with some heuristics - but I do agree that is not well thought through -
one reason why it is barely used :)

--
Thanks,

David / dhildenb

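As a side note on checking whether a given setup has the feature set at all: on a Linux guest, the sysfs `features` file of a virtio device prints one '0' or '1' character per feature bit, starting with bit 0. The following is a minimal Python sketch of reading such a string, assuming bit number 2 for VIRTIO_BALLOON_F_DEFLATE_ON_OOM (the value in the Linux uapi header virtio_balloon.h). The function name `has_feature` and the sample strings are made up for illustration and do not come from a real device.

```python
# Feature bit number as defined in include/uapi/linux/virtio_balloon.h.
VIRTIO_BALLOON_F_DEFLATE_ON_OOM = 2


def has_feature(features: str, bit: int) -> bool:
    """Return True if `bit` is set in a sysfs-style features string.

    `features` is expected in the format of
    /sys/bus/virtio/devices/virtioN/features: one '0' or '1' per
    feature bit, with bit 0 as the first character.
    """
    features = features.strip()
    return bit < len(features) and features[bit] == "1"


# Illustrative strings only, not taken from a real device.
print(has_feature("0110", VIRTIO_BALLOON_F_DEFLATE_ON_OOM))  # True
print(has_feature("0100", VIRTIO_BALLOON_F_DEFLATE_ON_OOM))  # False
```

In practice the string would be read from the balloon device's `features` file; which `virtioN` entry is the balloon depends on the guest and is not assumed here.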
On Fri, Oct 04, 2019 at 12:03:43PM -0700, Tyler Sanderson wrote:
> I think DEFLATE_ON_OOM makes sense conceptually, it's just that the
> implementation doesn't play well with the rest of memory management under
> memory pressure.
> It could probably be fixed with enough effort, but IMO free page hinting gets
> 90% of the benefit without poking the dark corners of memory management and so
> is a net win.
>
> The obvious place where free page hinting falls short (as David pointed out
> above) is that it can't pressure the page cache.
> Would it be possible to add a mechanism that explicitly causes page cache to
> shrink without requiring the system to be under memory pressure?

Which API would you call to shrink it?

> On Fri, Oct 4, 2019 at 1:56 AM David Hildenbrand <david at redhat.com> wrote:
>
> > On 04.10.19 10:35, Michael S. Tsirkin wrote:
> > > On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> > >> On 04.10.19 01:15, Tyler Sanderson wrote:
> > >>> I was mistaken, the problem with overcommit accounting is not fixed by
> > >>> the change to shrinker interface.
> > >>> This means that large allocations are stopped even if they could succeed
> > >>> by deflating the balloon.
> > >>
> > >> Please note that some people use the balloon for actual memory unplug -
> > >> so initiating to deflate the balloon under any circumstances is
> > >> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> > >> set - however that is barely the case (at least in the setups I know :) ).
> > >>
> > >> So yes, free page reporting is a different thing, because it really is
> > >> used to "hint" and not to "agree to unplug" in any scenario.
> > >>
> > >> --
> > >>
> > >> Thanks,
> > >>
> > >
> > > VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
> > > at the spec level either. For example, when will we inflate again?
> > > Current code does this at the next interrupt, which requires
> > > host to somehow know it's time to inflate.
> > >
> >
> > The host has access to memory stats of the guest, so it could come up
> > with some heuristics - but I do agree that is not well thought through -
> > one reason why it is barely used :)
> >
> > --
> >
> > Thanks,
> >
> > David / dhildenb
>

On 04.10.19 21:03, Tyler Sanderson wrote:
> I think DEFLATE_ON_OOM makes sense conceptually, it's just that the
> implementation doesn't play well with the rest of memory management
> under memory pressure.
> It could probably be fixed with enough effort, but IMO free page hinting
> gets 90% of the benefit without poking the dark corners of memory
> management and so is a net win.
>
> The obvious place where free page hinting falls short (as David pointed
> out above) is that it can't pressure the page cache.

One solution is to move the page cache to the hypervisor, e.g., using
emulated NVDIMMs or virtio-pmem.

> Would it be possible to add a mechanism that explicitly causes page
> cache to shrink without requiring the system to be under memory pressure?
>

We do have a sysctl "drop_caches" which calls
iterate_supers(drop_pagecache_sb, NULL) and drop_slab().

doc/Documentation/sysctl/vm.txt:

=============================================================

drop_caches

Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes.  Once dropped, their
memory becomes free.

To free pagecache:
	echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
	echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
	echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects.
To increase the number of objects freed by this operation, the user may run
`sync' prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
number of dirty objects on the system and create more candidates to be
dropped.

This file is not a means to control the growth of the various kernel caches
(inodes, dentries, pagecache, etc...)  These objects are automatically
reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems.  Since it discards cached
objects, it may cost a significant amount of I/O and CPU to recreate the
dropped objects, especially if they were under heavy use.  Because of this,
use outside of a testing or debugging environment is not recommended.

You may see informational messages in your kernel log when this file is
used:

	cat (1234): drop_caches: 3

These are informational only.  They do not mean that anything is wrong
with your system.  To disable them, echo 4 (bit 2) into drop_caches.

=============================================================

Please note the "use outside of a testing or debugging environment is
not recommended". Usually you want a "soft" version of this, e.g., via
the OOM handler (so only drop parts of the cache, not all).

--
Thanks,

David / dhildenb

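For reference, the sequence the documentation describes (run sync, then write a value to drop_caches) can be sketched in Python. This is only an illustration under stated assumptions: the helper name `drop_caches` and its `path` and `sync_first` parameters are made up here, writing to the real sysctl file needs root, and, as the quoted text says, doing so outside of testing or debugging is not recommended.

```python
import os

# Values taken from the vm.txt text quoted above.
DROP_PAGECACHE = 1  # echo 1: free pagecache
DROP_SLAB = 2       # echo 2: free reclaimable slab objects (dentries, inodes)
DROP_ALL = 3        # echo 3: free slab objects and pagecache
QUIET = 4           # echo 4 (bit 2): disable the informational log messages


def drop_caches(mode=DROP_PAGECACHE, path="/proc/sys/vm/drop_caches",
                sync_first=True):
    """Hypothetical helper: write `mode` to the drop_caches sysctl file.

    `path` is a parameter only so the function can be tried against a
    scratch file; the real sysctl file requires root privileges.
    """
    if mode not in (DROP_PAGECACHE, DROP_SLAB, DROP_ALL, QUIET):
        raise ValueError("mode must be one of the documented values 1, 2, 3, 4")
    if sync_first:
        # Flush dirty data first so that more objects are clean and
        # therefore candidates for being dropped.
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
    return mode
```

Note that this is still the all-or-nothing interface: it offers no way to drop only part of the cache, which is the "soft" behaviour asked for above.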
On Sun, Oct 06, 2019 at 10:30:40AM +0200, David Hildenbrand wrote:
> Please note the "use outside of a testing or debugging environment is
> not recommended". Usually you want a "soft" version of this, e.g., via
> the OOM handler (so only drop parts of the cache, not all).

Right. We'll need something softer I guess. By how much, I don't know.

--
MST