On Tue, Dec 12, 2017 at 07:55:55PM +0800, Wei Wang wrote:
> +int xb_preload_and_set_bit(struct xb *xb, unsigned long bit, gfp_t gfp);

I'm struggling to understand when one would use this. The xb_ API requires you to handle your own locking. But specifying GFP flags here implies you can sleep. So ... um ... there's no locking?

> +void xb_clear_bit_range(struct xb *xb, unsigned long start, unsigned long end);

That's xb_zero() which you deleted with the previous patch ... remember, keep things as close as possible to the bitmap API.

On 12/16/2017 02:42 AM, Matthew Wilcox wrote:
> On Tue, Dec 12, 2017 at 07:55:55PM +0800, Wei Wang wrote:
>> +int xb_preload_and_set_bit(struct xb *xb, unsigned long bit, gfp_t gfp);
> I'm struggling to understand when one would use this. The xb_ API
> requires you to handle your own locking. But specifying GFP flags
> here implies you can sleep. So ... um ... there's no locking?

In the regular use cases, people would do xb_preload() before taking the lock, and the xb_set/clear within the lock.

In the virtio-balloon usage, we have a large number of bits to set with the balloon_lock being held (we're not unlocking for each bit), so we used the above wrapper to do preload and set within the balloon_lock, and passed in GFP_NOWAIT to avoid sleeping. Probably we can change to put this wrapper implementation to virtio-balloon, since it would not be useful for the regular cases.

Best,
Wei

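[Editor's sketch] The "preload before taking the lock, set within the lock" pattern described above can be illustrated with a small userspace analogue. This is only a model under stated assumptions: every name here (toy_lock, toy_bitmap, toy_preload, toy_set_bit, toy_set_bit_locked) is hypothetical, the lock is a plain flag rather than a real spinlock, and calloc() stands in for an allocation that may sleep. It is not the xbitmap implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

#define TOY_BITS_PER_CHUNK 64
#define TOY_NR_CHUNKS 16

struct toy_bitmap {
	struct toy_lock lock;
	unsigned long long *chunk[TOY_NR_CHUNKS]; /* allocated lazily */
	unsigned long long *preloaded;            /* spare chunk, filled before locking */
};

/* May "sleep" (modelled by calloc), so it must run before the lock is taken. */
static int toy_preload(struct toy_bitmap *b)
{
	assert(!b->lock.held);
	if (!b->preloaded)
		b->preloaded = calloc(1, sizeof(*b->preloaded));
	return b->preloaded ? 0 : -1;
}

/* Never allocates: consumes the preloaded chunk when a new one is needed. */
static int toy_set_bit(struct toy_bitmap *b, unsigned long bit)
{
	unsigned long idx = bit / TOY_BITS_PER_CHUNK;

	assert(b->lock.held);
	if (idx >= TOY_NR_CHUNKS)
		return -1;
	if (!b->chunk[idx]) {
		if (!b->preloaded)
			return -1; /* caller forgot to preload */
		b->chunk[idx] = b->preloaded;
		b->preloaded = NULL;
	}
	*b->chunk[idx] |= 1ULL << (bit % TOY_BITS_PER_CHUNK);
	return 0;
}

static bool toy_test_bit(const struct toy_bitmap *b, unsigned long bit)
{
	unsigned long idx = bit / TOY_BITS_PER_CHUNK;

	if (idx >= TOY_NR_CHUNKS || !b->chunk[idx])
		return false;
	return (*b->chunk[idx] >> (bit % TOY_BITS_PER_CHUNK)) & 1ULL;
}

/* The regular pattern: preload outside the lock, set the bit inside it. */
static int toy_set_bit_locked(struct toy_bitmap *b, unsigned long bit)
{
	int ret;

	if (toy_preload(b))
		return -1;
	toy_lock_acquire(&b->lock);
	ret = toy_set_bit(b, bit);
	toy_lock_release(&b->lock);
	return ret;
}
```

In this model the only allocation happens in toy_preload(), so nothing inside the locked region can block. The virtio-balloon wrapper discussed above differs in that it preloads while the lock is already held, which is why it passes GFP_NOWAIT.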
On 12/16/2017 07:28 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> On 12/16/2017 02:42 AM, Matthew Wilcox wrote:
>>> On Tue, Dec 12, 2017 at 07:55:55PM +0800, Wei Wang wrote:
>>>> +int xb_preload_and_set_bit(struct xb *xb, unsigned long bit, gfp_t gfp);
>>> I'm struggling to understand when one would use this. The xb_ API
>>> requires you to handle your own locking. But specifying GFP flags
>>> here implies you can sleep. So ... um ... there's no locking?
>> In the regular use cases, people would do xb_preload() before taking the
>> lock, and the xb_set/clear within the lock.
>>
>> In the virtio-balloon usage, we have a large number of bits to set with
>> the balloon_lock being held (we're not unlocking for each bit), so we
>> used the above wrapper to do preload and set within the balloon_lock,
>> and passed in GFP_NOWAIT to avoid sleeping. Probably we can change to
>> put this wrapper implementation to virtio-balloon, since it would not be
>> useful for the regular cases.
> GFP_NOWAIT is chosen in order not to try to OOM-kill something, isn't it?

Yes, I think that's exactly the issue we are discussing here (also discussed in the deadlock patch before). Suppose we use a sleepable flag, GFP_KERNEL. The caller (fill_balloon or leak_balloon) may then sleep with balloon_lock held, and the memory reclaim triggered by GFP_KERNEL may fall into the OOM code path. That path first invokes oom_notify-->leak_balloon to release some balloon memory, which needs to take the balloon_lock that is already held by the sleeping task.

So, using GFP_NOWAIT avoids sleeping to get memory through direct memory reclaim, which could fall into that OOM code path that needs to take the balloon_lock.

> But passing GFP_NOWAIT means that we can handle allocation failure. There is
> no need to use preload approach when we can handle allocation failure.

I think the reason we need xb_preload is that radix tree insertion needs the memory to be preallocated already (it couldn't suffer from memory failure during the process of inserting, probably because handling the failure there isn't easy; Matthew may know the backstory of this).

So, I think we can handle the memory failure with xb_preload, which stops going into the radix tree APIs, but we shouldn't call radix tree APIs without the related memory preallocated.

Best,
Wei

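[Editor's sketch] The deadlock scenario described in this message can be modelled in userspace as follows. This is a hedged illustration, not kernel code: toy_balloon, toy_alloc, toy_oom_notify and toy_fill are hypothetical names, the "lock" is a flag, and a counter records where the real code would block forever instead of actually hanging. TOY_MAY_RECLAIM plays the role of GFP_KERNEL and TOY_NOWAIT the role of GFP_NOWAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation modes standing in for GFP_NOWAIT / GFP_KERNEL. */
enum toy_gfp { TOY_NOWAIT, TOY_MAY_RECLAIM };

struct toy_balloon {
	bool lock_held;        /* models balloon_lock */
	bool memory_exhausted; /* test knob: pretend the system is out of memory */
	int would_deadlock;    /* times the notifier needed a lock that was held */
};

/* Stand-in for the OOM notifier path that deflates the balloon. */
static void toy_oom_notify(struct toy_balloon *b)
{
	if (b->lock_held) {
		/* Real code would wait here for a lock its own caller holds. */
		b->would_deadlock++;
		return;
	}
	b->lock_held = true;
	b->memory_exhausted = false; /* pretend deflating freed some memory */
	b->lock_held = false;
}

/* An allocator that only enters the reclaim/OOM path when allowed to. */
static void *toy_alloc(struct toy_balloon *b, size_t size, enum toy_gfp gfp)
{
	if (b->memory_exhausted) {
		if (gfp == TOY_NOWAIT)
			return NULL; /* fail fast, never touch the notifier */
		toy_oom_notify(b);
		if (b->memory_exhausted)
			return NULL;
	}
	return malloc(size);
}

/* Stand-in for a fill path that allocates while holding the lock. */
static int toy_fill(struct toy_balloon *b, enum toy_gfp gfp)
{
	void *p;

	b->lock_held = true;
	p = toy_alloc(b, 64, gfp);
	b->lock_held = false;
	if (!p)
		return -1; /* caller must cope with failure */
	free(p);
	return 0;
}
```

With TOY_NOWAIT the allocation simply fails under memory pressure and the caller has to handle that; with TOY_MAY_RECLAIM the model reaches the notifier while the lock is held, which is the situation the message above wants to avoid.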
> -----Original Message-----
> From: Tetsuo Handa [mailto:penguin-kernel at I-love.SAKURA.ne.jp]
> Sent: Sunday, December 17, 2017 6:22 PM
> To: Wang, Wei W <wei.w.wang at intel.com>; willy at infradead.org
> Cc: virtio-dev at lists.oasis-open.org; linux-kernel at vger.kernel.org;
> qemu-devel at nongnu.org; virtualization at lists.linux-foundation.org;
> kvm at vger.kernel.org; linux-mm at kvack.org; mst at redhat.com;
> mhocko at kernel.org; akpm at linux-foundation.org; mawilcox at microsoft.com;
> david at redhat.com; cornelia.huck at de.ibm.com;
> mgorman at techsingularity.net; aarcange at redhat.com;
> amit.shah at redhat.com; pbonzini at redhat.com;
> liliang.opensource at gmail.com; yang.zhang.wz at gmail.com;
> quan.xu at aliyun.com; nilal at redhat.com; riel at redhat.com
> Subject: Re: [PATCH v19 3/7] xbitmap: add more operations
>
> Wei Wang wrote:
> > > But passing GFP_NOWAIT means that we can handle allocation failure.
> > > There is no need to use preload approach when we can handle allocation
> > > failure.
> >
> > I think the reason we need xb_preload is because radix tree insertion
> > needs the memory being preallocated already (it couldn't suffer from
> > memory failure during the process of inserting, probably because
> > handling the failure there isn't easy, Matthew may know the backstory
> > of this)
>
> According to https://lwn.net/Articles/175432/ , I think that preloading is
> needed only when failure to insert an item into a radix tree is a significant
> problem.
> That is, when failure to insert an item into a radix tree is not a problem, I
> think that we don't need to use preloading.

It also mentions that the preload attempts to allocate sufficient memory to *guarantee* that the next radix tree insertion cannot fail.

If we check radix_tree_node_alloc(), the comment there says "this assumes that the caller has performed appropriate preallocation".

So, I think we would risk triggering some issue without preload().

> > So, I think we can handle the memory failure with xb_preload, which
> > stops going into the radix tree APIs, but shouldn't call radix tree
> > APIs without the related memory preallocated.
>
> It seems to me that virtio-balloon case has no problem without using
> preloading.

Why is that?

Best,
Wei

On 12/17/2017 11:16 PM, Tetsuo Handa wrote:
> Wang, Wei W wrote:
>>> Wei Wang wrote:
>>>>> But passing GFP_NOWAIT means that we can handle allocation failure.
>>>>> There is no need to use preload approach when we can handle allocation failure.
>>>> I think the reason we need xb_preload is because radix tree insertion
>>>> needs the memory being preallocated already (it couldn't suffer from
>>>> memory failure during the process of inserting, probably because
>>>> handling the failure there isn't easy, Matthew may know the backstory
>>>> of this)
>>> According to https://lwn.net/Articles/175432/ , I think that preloading is
>>> needed only when failure to insert an item into a radix tree is a significant
>>> problem.
>>> That is, when failure to insert an item into a radix tree is not a problem, I
>>> think that we don't need to use preloading.
>> It also mentions that the preload attempts to allocate sufficient memory to *guarantee* that the next radix tree insertion cannot fail.
>>
>> If we check radix_tree_node_alloc(), the comment there says "this assumes that the caller has performed appropriate preallocation".
> If you read what radix_tree_node_alloc() is doing, you will find that
> radix_tree_node_alloc() returns NULL when memory allocation failed.
>
> I think that "this assumes that the caller has performed appropriate preallocation"
> means "The caller has to perform appropriate preallocation if the caller does not
> want radix_tree_node_alloc() to return NULL".

For the radix tree, I agree that we may not need preload. But ida_bitmap, which the xbitmap is based on, is allocated via preload, so I think we cannot bypass preload; otherwise, we get no ida_bitmap to use.

Best,
Wei
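
[Editor's sketch] The distinction drawn in this last reply can be modelled in userspace: an interior node whose allocation may fail and report an error to the caller, versus a leaf bitmap that is only ever obtained from a preloaded slot. All names below (toy_xb, toy_node_alloc, toy_preload, toy_set_bit, fail_node_alloc) are hypothetical, and the structure is an assumption made for illustration; it does not reproduce the actual xbitmap or radix tree code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_leaf { unsigned long bits; };
struct toy_node { struct toy_leaf *leaf; };

struct toy_xb {
	struct toy_node *node;      /* interior node, allocated on demand */
	struct toy_leaf *preloaded; /* leaf bitmap, only filled by toy_preload() */
	int fail_node_alloc;        /* test knob: simulate allocation failure */
};

/* Like the allocator discussed above, this may simply return NULL. */
static struct toy_node *toy_node_alloc(struct toy_xb *xb)
{
	if (xb->fail_node_alloc)
		return NULL;
	return calloc(1, sizeof(struct toy_node));
}

/* The only place a leaf bitmap is ever allocated in this model. */
static int toy_preload(struct toy_xb *xb)
{
	if (!xb->preloaded)
		xb->preloaded = calloc(1, sizeof(*xb->preloaded));
	return xb->preloaded ? 0 : -ENOMEM;
}

static int toy_set_bit(struct toy_xb *xb, unsigned int bit)
{
	if (bit >= 8 * sizeof(unsigned long))
		return -EINVAL;
	if (!xb->node) {
		xb->node = toy_node_alloc(xb);
		if (!xb->node)
			return -ENOMEM; /* reported to the caller, who may retry */
	}
	if (!xb->node->leaf) {
		if (!xb->preloaded)
			return -ENOMEM; /* no preload means no leaf to install */
		xb->node->leaf = xb->preloaded;
		xb->preloaded = NULL;
	}
	xb->node->leaf->bits |= 1UL << bit;
	return 0;
}
```

In this model a failed node allocation is survivable without preload, which matches Tetsuo's reading, while skipping toy_preload() always fails at the leaf step, which matches Wei's point about ida_bitmap.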