thr3ads.net - Linux Virtualization - [PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F

Michael S. Tsirkin

2017-Oct-09 15:20 UTC

[PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

On Sat, Sep 30, 2017 at 12:05:52PM +0800, Wei Wang wrote:
> +static inline void xb_set_page(struct virtio_balloon *vb,
> +			       struct page *page,
> +			       unsigned long *pfn_min,
> +			       unsigned long *pfn_max)
> +{
> +	unsigned long pfn = page_to_pfn(page);
> +
> +	*pfn_min = min(pfn, *pfn_min);
> +	*pfn_max = max(pfn, *pfn_max);
> +	xb_preload(GFP_KERNEL);
> +	xb_set_bit(&vb->page_xb, pfn);
> +	xb_preload_end();
> +}
> +
So, this will allocate memory

...
> @@ -198,9 +327,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
> +	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);
> +	unsigned long pfn_max = 0, pfn_min = ULONG_MAX;
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	/* Traditionally, we can only do one array worth at a time. */
> +	if (!use_sg)
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	/* We can't release more pages than taken */
And is sometimes called on OOM.


I suspect we need to

1. keep around some memory for leak on oom

2. for non oom allocate outside locks
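A user-space sketch of the two suggestions, under stated assumptions: the balloon lock is modeled by a plain `locked` flag, the pfn bookkeeping by a copy into a buffer, and `struct reserve`, `reserve_init()`, `leak_oom()` and `leak_normal()` are hypothetical names, not virtio-balloon code. The OOM path only uses memory reserved at init time, so it caps how much it can leak per call; the normal path allocates before it takes the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RESERVE_SLOTS 256       /* hypothetical size of the OOM reserve */

struct reserve {
	unsigned long *slots;   /* allocated once, at init time */
	size_t cap;
	bool locked;            /* stands in for vb->balloon_lock */
};

/* Suggestion 1: keep some memory around for leak on OOM. */
static bool reserve_init(struct reserve *r)
{
	r->slots = calloc(RESERVE_SLOTS, sizeof(*r->slots));
	r->cap = r->slots ? RESERVE_SLOTS : 0;
	r->locked = false;
	return r->slots != NULL;
}

/* OOM path: never allocates, so it records at most r->cap pfns per call. */
static size_t leak_oom(struct reserve *r, const unsigned long *pfns, size_t num)
{
	size_t n = num < r->cap ? num : r->cap;

	r->locked = true;
	for (size_t i = 0; i < n; i++)
		r->slots[i] = pfns[i];
	r->locked = false;
	return n;
}

/* Suggestion 2: the non-OOM path allocates first, then takes the lock. */
static size_t leak_normal(struct reserve *r, const unsigned long *pfns, size_t num)
{
	unsigned long *buf;

	assert(!r->locked);     /* allocation happens outside the lock */
	buf = malloc(num * sizeof(*buf));
	if (!buf)
		return 0;
	r->locked = true;
	for (size_t i = 0; i < num; i++)
		buf[i] = pfns[i];
	r->locked = false;
	free(buf);
	return num;
}
```

The trade-off is that an OOM request larger than the reserve is only partly satisfied per call, while the normal path is unbounded but may sleep in the allocator.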


-- 
MST

Wei Wang

2017-Oct-10 07:28 UTC

[PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

On 10/09/2017 11:20 PM, Michael S. Tsirkin wrote:
> On Sat, Sep 30, 2017 at 12:05:52PM +0800, Wei Wang wrote:
>> +static inline void xb_set_page(struct virtio_balloon *vb,
>> +			       struct page *page,
>> +			       unsigned long *pfn_min,
>> +			       unsigned long *pfn_max)
>> +{
>> +	unsigned long pfn = page_to_pfn(page);
>> +
>> +	*pfn_min = min(pfn, *pfn_min);
>> +	*pfn_max = max(pfn, *pfn_max);
>> +	xb_preload(GFP_KERNEL);
>> +	xb_set_bit(&vb->page_xb, pfn);
>> +	xb_preload_end();
>> +}
>> +
> So, this will allocate memory
>
> ...
>
>> @@ -198,9 +327,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>>   	struct page *page;
>>   	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>>   	LIST_HEAD(pages);
>> +	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);
>> +	unsigned long pfn_max = 0, pfn_min = ULONG_MAX;
>>   
>> -	/* We can only do one array worth at a time. */
>> -	num = min(num, ARRAY_SIZE(vb->pfns));
>> +	/* Traditionally, we can only do one array worth at a time. */
>> +	if (!use_sg)
>> +		num = min(num, ARRAY_SIZE(vb->pfns));
>>   
>>   	mutex_lock(&vb->balloon_lock);
>>   	/* We can't release more pages than taken */
> And is sometimes called on OOM.
>
>
> I suspect we need to
>
> 1. keep around some memory for leak on oom
>
> 2. for non oom allocate outside locks
>
>
I think we could optimize the existing balloon logic in a way that
removes the big balloon lock:

Inflating and deflating never need to run at the same time. For example,
suppose a 1st request asks to inflate 7GB of RAM, and after 1GB has been
given to the host (6GB left), a 2nd request to deflate 5GB arrives.
Instead of finishing the remaining 6GB of inflation and then deflating
5GB, the driver can take the difference immediately (6GB to inflate -
5GB to deflate = 1GB to inflate), so all it has to do is inflate another
1GB.

The OOM case works the same way: if OOM asks for 1GB while a 5GB
inflation is in progress, the driver deducts 1GB from the amount still
to inflate and ends up inflating 4GB.

With this approach, the inflating and deflating tasks never run at the
same time, so I think the lock can be removed, and with it that deadlock
issue.
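A minimal sketch of the netting idea, assuming a single worker that owns one signed amount of outstanding work; `struct balloon_work` and its helpers are hypothetical, not existing driver code. Positive `pending` means memory still to inflate, negative means memory to deflate; units are GB to match the examples above.

```c
#include <assert.h>

/*
 * Hypothetical model of the proposal: one signed amount of outstanding
 * work. > 0 means still to inflate, < 0 means still to deflate.
 */
struct balloon_work {
	long pending;
};

/* A new inflate request adds to the outstanding amount. */
static void request_inflate(struct balloon_work *w, long amount)
{
	w->pending += amount;
}

/* A deflate (or OOM) request is netted against it immediately. */
static void request_deflate(struct balloon_work *w, long amount)
{
	w->pending -= amount;
}

/* The single worker reports progress on an inflation. */
static void inflate_done(struct balloon_work *w, long amount)
{
	assert(amount <= w->pending);
	w->pending -= amount;
}
```

Because only one worker acts on `pending`, inflation and deflation never run concurrently; in a real driver the updates to `pending` from the request side would still need some synchronization of their own.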

What would you guys think?

Best,
Wei

Wei Wang

2017-Oct-10 12:32 UTC

[PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

On 10/10/2017 07:08 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> On 10/09/2017 11:20 PM, Michael S. Tsirkin wrote:
>>> On Sat, Sep 30, 2017 at 12:05:52PM +0800, Wei Wang wrote:
>>>> +static inline void xb_set_page(struct virtio_balloon *vb,
>>>> +			       struct page *page,
>>>> +			       unsigned long *pfn_min,
>>>> +			       unsigned long *pfn_max)
>>>> +{
>>>> +	unsigned long pfn = page_to_pfn(page);
>>>> +
>>>> +	*pfn_min = min(pfn, *pfn_min);
>>>> +	*pfn_max = max(pfn, *pfn_max);
>>>> +	xb_preload(GFP_KERNEL);
>>>> +	xb_set_bit(&vb->page_xb, pfn);
>>>> +	xb_preload_end();
>>>> +}
>>>> +
>>> So, this will allocate memory
>>>
>>> ...
>>>
>>>> @@ -198,9 +327,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>>>>    	struct page *page;
>>>>    	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>>>>    	LIST_HEAD(pages);
>>>> +	bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);
>>>> +	unsigned long pfn_max = 0, pfn_min = ULONG_MAX;
>>>>    
>>>> -	/* We can only do one array worth at a time. */
>>>> -	num = min(num, ARRAY_SIZE(vb->pfns));
>>>> +	/* Traditionally, we can only do one array worth at a time. */
>>>> +	if (!use_sg)
>>>> +		num = min(num, ARRAY_SIZE(vb->pfns));
>>>>    
>>>>    	mutex_lock(&vb->balloon_lock);
>>>>    	/* We can't release more pages than taken */
>>> And is sometimes called on OOM.
>>>
>>>
>>> I suspect we need to
>>>
>>> 1. keep around some memory for leak on oom
>>>
>>> 2. for non oom allocate outside locks
>>>
>>>
>> I think maybe we can optimize the existing balloon logic, which could
>> remove the big balloon lock:
>>
>> It would not be necessary to have the inflating and deflating run at the
>> same time.
>> For example, 1st request to inflate 7G RAM, when 1GB has been given to
>> the host (so 6G left), the
>> 2nd request to deflate 5G is received. Instead of waiting for the 1st
>> request to inflate 6G and then
>> continuing with the 2nd request to deflate 5G, we can do a diff (6G to
>> inflate - 5G to deflate) immediately,
>> and got 1G to inflate. In this way, all that driver will do is to simply
>> inflate another 1G.
>>
>> Same for the OOM case: when OOM asks for 1G, while inflating 5G is in
>> progress, then the driver can
>> deduct 1G from the amount that needs to inflate, and as a result, it
>> will inflate 4G.
>>
>> In this case, we will never have the inflating and deflating task run at
>> the same time, so I think it is
>> possible to remove the lock, and therefore, we will not have that
>> deadlock issue.
>>
>> What would you guys think?
> What is balloon_lock at virtballoon_migratepage() for?
>
>    e22504296d4f64fb "virtio_balloon: introduce migration primitives to balloon pages"
>    f68b992bbb474641 "virtio_balloon: fix race by fill and leak"
I think that's the part of the existing implementation we would need to
improve if we go in the above direction.

As the commit log says, the lock was introduced to synchronize accesses
to the elements of struct virtio_balloon and to its queue operations.
More precisely, fill_balloon, leak_balloon and virtballoon_migratepage
share vb->pfns[] and vb->num_pfns, and each of them could use its own
local variables instead.

For example, for migratepage:
+	/* One entry per 4KB balloon page; more than one if PAGE_SIZE > 4KB. */
+	__virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];
...
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, newpage);
-	tell_host(vb, vb->inflate_vq);
+	set_page_pfns(vb, pfns, newpage);
+	tell_host(vb, vb->inflate_vq, pfns, VIRTIO_BALLOON_PAGES_PER_PAGE);

For the queue accesses, a small lock around each queue operation should
be enough, and I think it won't cause the issue.
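A user-space sketch of the per-call buffer idea for migratepage. It assumes 64KB guest pages (`PAGE_SHIFT` 16) to show why the local buffer needs `VIRTIO_BALLOON_PAGES_PER_PAGE` entries rather than one: balloon PFNs are always in 4KB units. `set_page_pfns_local()` and `migrate_one()` are hypothetical simplifications (no `cpu_to_virtio32()` conversion, the page passed as its PFN), and the commented-out four-argument `tell_host()` is the signature proposed in this message, not an existing one.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 16                 /* assumption: 64KB guest pages */
#define VIRTIO_BALLOON_PFN_SHIFT 12   /* balloon pfns are in 4KB units */
#define VIRTIO_BALLOON_PAGES_PER_PAGE \
	(1u << (PAGE_SHIFT - VIRTIO_BALLOON_PFN_SHIFT))

/* Simplified stand-in for set_page_pfns(): no endian conversion,
 * and the guest page is given directly as its pfn. */
static void set_page_pfns_local(uint32_t pfns[], unsigned long page_pfn)
{
	uint32_t base = (uint32_t)(page_pfn * VIRTIO_BALLOON_PAGES_PER_PAGE);

	for (unsigned int i = 0; i < VIRTIO_BALLOON_PAGES_PER_PAGE; i++)
		pfns[i] = base + i;
}

/* Each call owns its buffer on the stack, so concurrent callers no
 * longer share vb->pfns[] / vb->num_pfns. Returns the last pfn filled. */
static uint32_t migrate_one(unsigned long newpage_pfn)
{
	uint32_t pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];

	set_page_pfns_local(pfns, newpage_pfn);
	/* tell_host(vb, vb->inflate_vq, pfns, VIRTIO_BALLOON_PAGES_PER_PAGE); */
	return pfns[VIRTIO_BALLOON_PAGES_PER_PAGE - 1];
}
```

With 4KB pages `VIRTIO_BALLOON_PAGES_PER_PAGE` is 1, so a single-entry buffer happens to work there, but it would overflow on larger page sizes.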


> And even if we could remove balloon_lock, you still cannot use
> __GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
> "whether it is safe to wait" flag from
> "[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()" .
Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM at
xb_set_page()?


Best,
Wei

Wei Wang

2017-Oct-11 01:51 UTC

[PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

On 10/10/2017 09:09 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>>> And even if we could remove balloon_lock, you still cannot use
>>> __GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
>>> "whether it is safe to wait" flag from
>>> "[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()" .
>> Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM at
>> xb_set_page()?
> Because of dependency shown below.
>
> leak_balloon()
>    xb_set_page()
>      xb_preload(GFP_KERNEL)
>        kmalloc(GFP_KERNEL)
>          __alloc_pages_may_oom()
>            Takes oom_lock
>            out_of_memory()
>              blocking_notifier_call_chain()
>                leak_balloon()
>                  xb_set_page()
>                    xb_preload(GFP_KERNEL)
>                      kmalloc(GFP_KERNEL)
>                        __alloc_pages_may_oom()
>                          Fails to take oom_lock and loop forever
__alloc_pages_may_oom() uses mutex_trylock(&oom_lock).

I think the second __alloc_pages_may_oom() will not continue since the
first one is in progress.
>
> By the way, is xb_set_page() safe?
> Sleeping in the kernel with preemption disabled is a bug, isn't it?
> __radix_tree_preload() returns 0 with preemption disabled upon success.
> xb_preload() disables preemption if __radix_tree_preload() fails.
> Then, kmalloc() is called with preemption disabled, isn't it?
> But xb_set_page() calls xb_preload(GFP_KERNEL) which might sleep with
> preemption disabled.
Yes, I agree that should not happen, thanks.

I plan to change it like this:

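bool xb_preload(gfp_t gfp)
{
	/* Allocate a spare ida_bitmap while sleeping is still allowed. */
	if (!this_cpu_read(ida_bitmap)) {
		struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);

		if (!bitmap)
			return false;
		/* Install it unless this CPU already has one; else free ours. */
		if (this_cpu_cmpxchg(ida_bitmap, NULL, bitmap))
			kfree(bitmap);
	}

	/* On success, this returns with preemption disabled. */
	if (__radix_tree_preload(gfp, XB_PRELOAD_SIZE) < 0)
		return false;

	return true;
}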
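A user-space model of why the ordering in the proposed xb_preload() matters. `preempt_count` stands in for the preempt_disable() depth, and `may_sleep_alloc()` asserts that a sleeping allocation only happens with preemption enabled, which is the rule Tetsuo points out. All `model_*` names are hypothetical, and the radix-tree preload step is assumed to always succeed. With a bool return, the caller must skip xb_preload_end() when xb_preload() fails, because the failing paths leave preemption enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int preempt_count;        /* models preempt_disable() depth */
static void *percpu_bitmap;      /* models this CPU's spare ida_bitmap */

/* A sleeping allocation: only legal with preemption enabled. */
static void *may_sleep_alloc(size_t size)
{
	assert(preempt_count == 0);
	return malloc(size);
}

/* Model of the proposed xb_preload(): allocate first, then "disable
 * preemption" only on success (radix-tree preload assumed to succeed). */
static bool model_xb_preload(void)
{
	if (!percpu_bitmap) {
		percpu_bitmap = may_sleep_alloc(128);
		if (!percpu_bitmap)
			return false;
	}
	preempt_count++;             /* success: preemption disabled */
	return true;
}

static void model_xb_preload_end(void)
{
	preempt_count--;
}

/* Caller pattern: end the preload section only if it began. */
static int model_xb_set_page(void)
{
	if (!model_xb_preload())
		return -ENOMEM;
	/* ... xb_set_bit() would run here, with preemption disabled ... */
	model_xb_preload_end();
	return 0;
}
```

If the model instead incremented `preempt_count` before calling `may_sleep_alloc()`, the assertion would fire, which is the situation described for the original xb_set_page().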


Best,
Wei

Wei Wang

2017-Oct-11 03:16 UTC

[PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

On 10/11/2017 10:26 AM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> On 10/10/2017 09:09 PM, Tetsuo Handa wrote:
>>> Wei Wang wrote:
>>>>> And even if we could remove balloon_lock, you still cannot use
>>>>> __GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
>>>>> "whether it is safe to wait" flag from
>>>>> "[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()" .
>>>> Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM at
>>>> xb_set_page()?
>>> Because of dependency shown below.
>>>
>>> leak_balloon()
>>>    xb_set_page()
>>>      xb_preload(GFP_KERNEL)
>>>        kmalloc(GFP_KERNEL)
>>>          __alloc_pages_may_oom()
>>>            Takes oom_lock
>>>            out_of_memory()
>>>              blocking_notifier_call_chain()
>>>                leak_balloon()
>>>                  xb_set_page()
>>>                    xb_preload(GFP_KERNEL)
>>>                      kmalloc(GFP_KERNEL)
>>>                        __alloc_pages_may_oom()
>>>                          Fails to take oom_lock and loop forever
>> __alloc_pages_may_oom() uses mutex_trylock(&oom_lock).
> Yes. But this mutex_trylock(&oom_lock) is semantically mutex_lock(&oom_lock)
> because __alloc_pages_slowpath() will continue looping until
> mutex_trylock(&oom_lock) succeeds (or somebody releases memory).
>
>> I think the second __alloc_pages_may_oom() will not continue since the
>> first one is in progress.
> The second __alloc_pages_may_oom() will be called repeatedly because
> __alloc_pages_slowpath() will continue looping (unless somebody releases
> memory).
>
OK, I see, thanks. So the point is that the OOM code path should not
allocate memory, and the old leak_balloon() (without the F_SG feature)
doesn't need xb_preload(). I think one solution would be to let the OOM
path use the old leak_balloon() code path, and add one more parameter to
leak_balloon() to control that:

leak_balloon(struct virtio_balloon *vb, size_t num, bool oom)
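A user-space model of the proposed `leak_balloon(vb, num, oom)` split. `model_leak_balloon()` and the `allocations` counter are hypothetical, and `ARRAY_PFNS_MAX` is assumed to mirror VIRTIO_BALLOON_ARRAY_PFNS_MAX (256). It shows that when `oom` is set, the function takes the old fixed-array path even if F_SG was negotiated, so it never reaches xb_preload(), at the cost of leaking at most one array's worth per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_PFNS_MAX 256   /* assumed to mirror VIRTIO_BALLOON_ARRAY_PFNS_MAX */

static size_t allocations;   /* counts calls that would allocate in the driver */

/* Hypothetical model of leak_balloon(vb, num, oom); returns pages leaked. */
static size_t model_leak_balloon(size_t num, bool use_sg, bool oom)
{
	if (!use_sg || oom) {
		/* Old path: fixed-size vb->pfns[], no xb_preload(). */
		return num < ARRAY_PFNS_MAX ? num : ARRAY_PFNS_MAX;
	}
	/* F_SG path: recording pfns in the xbitmap may allocate. */
	allocations++;
	return num;
}
```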


>>> By the way, is xb_set_page() safe?
>>> Sleeping in the kernel with preemption disabled is a bug, isn't it?
>>> __radix_tree_preload() returns 0 with preemption disabled upon success.
>>> xb_preload() disables preemption if __radix_tree_preload() fails.
>>> Then, kmalloc() is called with preemption disabled, isn't it?
>>> But xb_set_page() calls xb_preload(GFP_KERNEL) which might sleep with
>>> preemption disabled.
>> Yes, I think that should not be expected, thanks.
>>
>> I plan to change it like this:
>>
>> bool xb_preload(gfp_t gfp)
>> {
>>         if (!this_cpu_read(ida_bitmap)) {
>>                 struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);
>>
>>                 if (!bitmap)
>>                         return false;
>>                 bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
>>                 kfree(bitmap);
>>         }
> Excuse me, but you are allocating per-CPU memory when running CPU might
> change at this line? What happens if running CPU has changed at this line?
> Will it work even with new CPU's ida_bitmap == NULL ?
>

Yes, it will be detected in xb_set_bit(): when ida_bitmap == NULL on the
new CPU, xb_set_bit() will return -EAGAIN to the caller, and the caller
should restart from xb_preload().
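A user-space model of the retry protocol described here. Two fake CPUs each have an `ida_bitmap_slot`, the task is forced to migrate once between preload and set-bit, and `model_set_bit()` returns -EAGAIN when the current CPU has no spare bitmap, as described for xb_set_bit(). All names are hypothetical, and the model assumes every set-bit consumes the spare bitmap, whereas the real code only needs it when a new bitmap leaf is created.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_CPUS 2

static void *ida_bitmap_slot[NR_CPUS];  /* models per-CPU ida_bitmap */
static void *installed[8];              /* bitmaps "moved into the tree" */
static int n_installed;
static int cur_cpu;                     /* CPU the task is running on */
static int migrate_once = 1;            /* force one CPU switch */

static int model_preload(void)
{
	if (!ida_bitmap_slot[cur_cpu]) {
		ida_bitmap_slot[cur_cpu] = malloc(128);
		if (!ida_bitmap_slot[cur_cpu])
			return -ENOMEM;
	}
	/* Preemption is enabled here, so the task may move to another CPU. */
	if (migrate_once) {
		migrate_once = 0;
		cur_cpu = (cur_cpu + 1) % NR_CPUS;
	}
	return 0;
}

/* Takes this CPU's spare bitmap; fails if the current CPU has none. */
static int model_set_bit(void)
{
	if (!ida_bitmap_slot[cur_cpu])
		return -EAGAIN;
	assert(n_installed < 8);
	installed[n_installed++] = ida_bitmap_slot[cur_cpu];
	ida_bitmap_slot[cur_cpu] = NULL;
	return 0;
}

/* Caller: restart from preload on -EAGAIN (xb_preload_end() omitted). */
static int model_set_page(int *attempts)
{
	int err;

	*attempts = 0;
	do {
		(*attempts)++;
		err = model_preload();
		if (err)
			return err;
		err = model_set_bit();
	} while (err == -EAGAIN);
	return err;
}
```

Here the first attempt preloads CPU 0, migrates to CPU 1 and gets -EAGAIN; the second attempt preloads CPU 1 and succeeds, while CPU 0's spare bitmap stays cached for later use.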

Best,
Wei
