>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@huawei.com> wrote:
> We created a scenario to reproduce the bug:
> ----------------------------------------------------------------
> // proc1/proc1.2 are 2 threads sharing one page table.
> // proc1 is the parent of proc2.
>
> proc1               proc2        proc1.2
> ...                 ...          // in gup_pte_range()
> ...                 ...          pte = gup_get_pte()
> ...                 ...          page1 = pte_page(pte) // (1)
> do_wp_page(page1)   ...          ...
> ...                 exit_map()   ...
> ...                 ...          get_page(page1) // (2)
> ----------------------------------------------------------------
>
> do_wp_page() and exit_map() cause page1 to be released into the free list
> before get_page() in proc1.2 is called. The longer the delay between
> (1) & (2), the easier the BUG_ON shows.

Contrary to what I responded initially, I don't think this can happen
outside of Xen: do_wp_page() won't reach page_cache_release() while
gup_pte_range() is running for the same mm on another CPU, since it
can't get past ptep_clear_flush() (which waits for the CPU in
get_user_pages_fast() to re-enable interrupts).

> An experimental patch is made to prevent the PTE being modified in the
> middle of gup_pte_range(). The BUG_ON disappears afterward.
>
> However, from the comments embedded in gup.c, it seems deliberate to
> avoid the lock in the fast path. The question is: if so, how to avoid
> the above scenario?

Nick, based on your having done the initial implementation, would
you be able to estimate whether disabling get_user_pages_fast()
altogether for Xen would perform measurably worse than
adding the locks (but continuing to avoid acquiring mm->mmap_sem)
as suggested by Xiaowei? That's of course only if the latter is correct
at all, of which I haven't fully convinced myself yet.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Peter Zijlstra
2011-Jan-27 16:25 UTC
[Xen-devel] Re: One (possible) x86 get_user_pages bug
On Thu, 2011-01-27 at 16:07 +0000, Jan Beulich wrote:
>
> Nick, based on your having done the initial implementation, would
> you be able to estimate whether disabling get_user_pages_fast()
> altogether for Xen would perform measurably worse than
> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
> as suggested by Xiaowei? That's of course only if the latter is correct
> at all, of which I haven't fully convinced myself yet.

Adding the lock will result in deadlocks: __get_user_pages_fast() is
used from NMI context.

Then again, I've got no clue if Xen even has NMIs..
>>> On 27.01.11 at 17:25, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Thu, 2011-01-27 at 16:07 +0000, Jan Beulich wrote:
>>
>> Nick, based on your having done the initial implementation, would
>> you be able to estimate whether disabling get_user_pages_fast()
>> altogether for Xen would perform measurably worse than
>> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
>> as suggested by Xiaowei? That's of course only if the latter is correct
>> at all, of which I haven't fully convinced myself yet.
>
> Adding the lock will result in deadlocks: __get_user_pages_fast() is
> used from NMI context.
>
> Then again, I've got no clue if Xen even has NMIs..

It does, but not in a way that would be usable for perf events afaict,
so that code path should be dead under Xen. (We don't even build the
offending source file in our kernels, but I'm not sure how pv-ops deals
with situations like this - Jeremy?)

Jan
Peter Zijlstra
2011-Jan-27 16:56 UTC
[Xen-devel] Re: One (possible) x86 get_user_pages bug
On Thu, 2011-01-27 at 16:07 +0000, Jan Beulich wrote:
>
> Nick, based on your having done the initial implementation, would
> you be able to estimate whether disabling get_user_pages_fast()
> altogether for Xen would perform measurably worse than
> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
> as suggested by Xiaowei? That's of course only if the latter is correct
> at all, of which I haven't fully convinced myself yet.

Also, I don't think taking the pte_lock is sufficient in your case:
since x86 mmu_gather frees the page-table pages themselves too, your
dereference of the pgd/pud/pmd will be subject to the same
deref-after-free problems.
On Fri, Jan 28, 2011 at 3:07 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@huawei.com> wrote:
>> We created a scenario to reproduce the bug:
>> ----------------------------------------------------------------
>> // proc1/proc1.2 are 2 threads sharing one page table.
>> // proc1 is the parent of proc2.
>>
>> proc1               proc2        proc1.2
>> ...                 ...          // in gup_pte_range()
>> ...                 ...          pte = gup_get_pte()
>> ...                 ...          page1 = pte_page(pte) // (1)
>> do_wp_page(page1)   ...          ...
>> ...                 exit_map()   ...
>> ...                 ...          get_page(page1) // (2)
>> ----------------------------------------------------------------
>>
>> do_wp_page() and exit_map() cause page1 to be released into the free list
>> before get_page() in proc1.2 is called. The longer the delay between
>> (1) & (2), the easier the BUG_ON shows.
>
> Contrary to what I responded initially, I don't think this can happen
> outside of Xen: do_wp_page() won't reach page_cache_release() while
> gup_pte_range() is running for the same mm on another CPU, since it
> can't get past ptep_clear_flush() (which waits for the CPU in
> get_user_pages_fast() to re-enable interrupts).

Yeah, this cannot happen on native.

>> An experimental patch is made to prevent the PTE being modified in the
>> middle of gup_pte_range(). The BUG_ON disappears afterward.
>>
>> However, from the comments embedded in gup.c, it seems deliberate to
>> avoid the lock in the fast path. The question is: if so, how to avoid
>> the above scenario?
>
> Nick, based on your having done the initial implementation, would
> you be able to estimate whether disabling get_user_pages_fast()
> altogether for Xen would perform measurably worse than
> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
> as suggested by Xiaowei? That's of course only if the latter is correct
> at all, of which I haven't fully convinced myself yet.

You must have some way to guarantee the existence of Linux page
tables when you walk them in order to resolve a TLB refill.
x86 does this with IPIs and a hardware fill that is atomic WRT
interrupts, so fast gup can disable interrupts to walk page tables. I
don't think it is fragile; it is absolutely tied to the system ISA (of
course that can change, but as Peter said, other things will have to
change too).

Other architectures use RCU for this, so fast gup uses a lockless-
pagecache-alike protocol for that.

If Xen is not using IPIs for flush, it should use whatever locks or
synchronization its TLB refill is using.

Thanks,
Nick
On 2011-1-28 5:24, Nick Piggin wrote:
> On Fri, Jan 28, 2011 at 3:07 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>>> On 27.01.11 at 14:05, Xiaowei Yang <xiaowei.yang@huawei.com> wrote:
>>> We created a scenario to reproduce the bug:
>>> ----------------------------------------------------------------
>>> // proc1/proc1.2 are 2 threads sharing one page table.
>>> // proc1 is the parent of proc2.
>>>
>>> proc1               proc2        proc1.2
>>> ...                 ...          // in gup_pte_range()
>>> ...                 ...          pte = gup_get_pte()
>>> ...                 ...          page1 = pte_page(pte) // (1)
>>> do_wp_page(page1)   ...          ...
>>> ...                 exit_map()   ...
>>> ...                 ...          get_page(page1) // (2)
>>> ----------------------------------------------------------------
>>>
>>> do_wp_page() and exit_map() cause page1 to be released into the free list
>>> before get_page() in proc1.2 is called. The longer the delay between
>>> (1) & (2), the easier the BUG_ON shows.
>>
>> Contrary to what I responded initially, I don't think this can happen
>> outside of Xen: do_wp_page() won't reach page_cache_release() while
>> gup_pte_range() is running for the same mm on another CPU, since it
>> can't get past ptep_clear_flush() (which waits for the CPU in
>> get_user_pages_fast() to re-enable interrupts).
>
> Yeah, this cannot happen on native.
>
>>> An experimental patch is made to prevent the PTE being modified in the
>>> middle of gup_pte_range(). The BUG_ON disappears afterward.
>>>
>>> However, from the comments embedded in gup.c, it seems deliberate to
>>> avoid the lock in the fast path. The question is: if so, how to avoid
>>> the above scenario?
>>
>> Nick, based on your having done the initial implementation, would
>> you be able to estimate whether disabling get_user_pages_fast()
>> altogether for Xen would perform measurably worse than
>> adding the locks (but continuing to avoid acquiring mm->mmap_sem)
>> as suggested by Xiaowei? That's of course only if the latter is correct
>> at all, of which I haven't fully convinced myself yet.
>
> You must have some way to guarantee the existence of Linux page
> tables when you walk them in order to resolve a TLB refill.
>
> x86 does this with IPIs and a hardware fill that is atomic WRT
> interrupts, so fast gup can disable interrupts to walk page tables. I
> don't think it is fragile; it is absolutely tied to the system ISA (of
> course that can change, but as Peter said, other things will have to
> change too).
>
> Other architectures use RCU for this, so fast gup uses a lockless-
> pagecache-alike protocol for that.
>
> If Xen is not using IPIs for flush, it should use whatever locks or
> synchronization its TLB refill is using.

Thanks everyone! It's very clear now that the problem only occurs on a
Xen PV kernel, which doesn't use IPIs to flush the TLB and so lacks the
implicit sync mechanism.

For now we can disable the fast path as a temporary solution until a
better one comes up -- compared to adding extra locks, the slow path may
not be bad, and actually get_user_pages_fast() is not called that much
in our environment.

However, this issue raises another concern: could there be other places
inside the Xen PV kernel that have the same sync problem?

Thanks,
Xiaowei