Alistair Popple
2021-Jun-10 00:18 UTC
[Nouveau] [PATCH v10 07/10] mm: Device exclusive memory access
On Thursday, 10 June 2021 2:05:06 AM AEST Peter Xu wrote:
> On Wed, Jun 09, 2021 at 07:38:04PM +1000, Alistair Popple wrote:
> > On Wednesday, 9 June 2021 4:33:52 AM AEST Peter Xu wrote:
> > > On Mon, Jun 07, 2021 at 05:58:52PM +1000, Alistair Popple wrote:

[...]

> > For thp this means we could end up passing
> > tail pages to rmap_walk(), however it doesn't actually walk them.
> >
> > Based on the results of previous testing I had done I assumed rmap_walk()
> > filtered out tail pages. It does, and I didn't hit the BUG_ON above, but
> > the filtering was not as deliberate as assumed.
> >
> > I've gone back and looked at what was happening in my earlier tests and
> > the tail pages get filtered because the VMA is not getting locked in
> > page_lock_anon_vma_read() due to failing this check:
> >
> > 	anon_mapping = (unsigned long)READ_ONCE(page->mapping);
> > 	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> > 		goto out;
> >
> > And now I'm not sure it makes sense to read page->mapping of a tail page.
> > So it might be best if we explicitly ignore any tail pages returned from
> > GUP, at least for now (a future series will improve thp support such as
> > adding a pmd version for exclusive entries).
>
> I feel like it's illegal to access page->mapping of tail pages; I looked at
> what happens if we call page_anon_vma() on a tail page:
>
> struct anon_vma *page_anon_vma(struct page *page)
> {
> 	unsigned long mapping;
>
> 	page = compound_head(page);
> 	mapping = (unsigned long)page->mapping;
> 	if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> 		return NULL;
> 	return __page_rmapping(page);
> }
>
> It'll just take the head's mapping instead. It makes sense since the tail
> page shouldn't have a different value against the head page, afaiu.

Right, it makes no sense to look at ->mapping on a tail page because the field
is used for something else. On the 1st tail page it is ->compound_nr and on
the 2nd tail page it is ->deferred_list. See the definitions of compound_nr()
and page_deferred_list() respectively. I suppose on the rest of the pages it
could be anything. I think in practice it is probably ok - iuc bit 0 won't be
set for compound_nr and certainly not for deferred_list->next (a pointer). But
none of that seems intentional, so it would be better to be explicit and not
walk the tail pages.

> It would be great if thp experts could chime in. Before that happens, I
> agree with you that a safer approach is to explicitly not walk a tail page
> for its rmap (and I think the rmap of a tail page will be the same of the
> head anyways.. since they seem to share the anon_vma as quoted).
>
> > > So... for thp mappings, wondering whether we should do normal GUP
> > > (without SPLIT), pass in always normal or head pages into rmap_walk(),
> > > but then unconditionally split_huge_pmd_address() in
> > > page_make_device_exclusive_one()?
> >
> > That could work (although I think GUP will still return tail pages - see
> > follow_trans_huge_pmd() which is called from follow_pmd_mask() in gup).
>
> Agreed.
>
> > The main problem is split_huge_pmd_address() unconditionally calls a mmu
> > notifier so I would need to plumb in passing an owner everywhere which
> > could get messy.
>
> Could I ask why? split_huge_pmd_address() will notify with CLEAR, so I'm a
> bit confused why we need to pass over the owner.

Sure, it is the same reason we need to pass it for the exclusive notifier.
Any invalidation during the make exclusive operation will break the mmu read
side critical section forcing a retry of the operation. The owner field is
what is used to filter out invalidations (such as the exclusive invalidation)
that don't need to be retried.

> I thought plumb it right before your EXCLUSIVE notifier init would work?

I did try this just to double check and it doesn't work due to the
unconditional notifier.

> ---8<---
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a94d9aed9d95..360ce86f3822 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2042,6 +2042,12 @@ static bool page_make_device_exclusive_one(struct page *page,
>  	swp_entry_t entry;
>  	pte_t swp_pte;
>
> +	/*
> +	 * Make sure thps split as device exclusive entries only support pte
> +	 * level for now.
> +	 */
> +	split_huge_pmd_address(vma, address, false, page);
> +
>  	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma,
> 				vma->vm_mm, address, min(vma->vm_end,
> 				address + page_size(page)), args->owner);
> ---8<---
>
> Thanks,
>
> --
> Peter Xu
Alistair Popple
2021-Jun-10 14:21 UTC
[Nouveau] [PATCH v10 07/10] mm: Device exclusive memory access
On Friday, 11 June 2021 4:04:35 AM AEST Peter Xu wrote:
> On Thu, Jun 10, 2021 at 10:18:25AM +1000, Alistair Popple wrote:
> > > > The main problem is split_huge_pmd_address() unconditionally calls a
> > > > mmu notifier so I would need to plumb in passing an owner everywhere
> > > > which could get messy.
> > >
> > > Could I ask why? split_huge_pmd_address() will notify with CLEAR, so
> > > I'm a bit confused why we need to pass over the owner.
> >
> > Sure, it is the same reason we need to pass it for the exclusive
> > notifier. Any invalidation during the make exclusive operation will break
> > the mmu read side critical section forcing a retry of the operation. The
> > owner field is what is used to filter out invalidations (such as the
> > exclusive invalidation) that don't need to be retried.
>
> Do you mean the mmu_interval_read_begin|retry() calls?

Yep.

> Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to
> explicit call split_huge_pmd_address(), afaict. Since both of them use
> __split_huge_pmd() internally which will generate that unwanted CLEAR
> notify.

Agree that gup calls __split_huge_pmd() via split_huge_pmd_address() which
will always CLEAR. However gup only calls split_huge_pmd_address() if it finds
a thp pmd. In follow_pmd_mask() we have:

	if (likely(!pmd_trans_huge(pmdval)))
		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);

So I don't think we have a problem here.

> If that's the case, I think it fails because split_huge_pmd_address() will
> trigger that CLEAR notify unconditionally (even if it's not a thp; not sure
> whether it should be optimized to not notify at all... definitely another
> story), while FOLL_SPLIT_PMD will skip the notify as it calls
> split_huge_pmd() instead, who checks the pmd before calling
> __split_huge_pmd().
>
> Does it also mean that if there's a real THP it won't really work? As then
> FOLL_SPLIT_PMD will start to trigger that CLEAR notify too, I think..
>
> --
> Peter Xu
Peter Xu
2021-Jun-10 18:04 UTC
[Nouveau] [PATCH v10 07/10] mm: Device exclusive memory access
On Thu, Jun 10, 2021 at 10:18:25AM +1000, Alistair Popple wrote:
> > > The main problem is split_huge_pmd_address() unconditionally calls a mmu
> > > notifier so I would need to plumb in passing an owner everywhere which
> > > could get messy.
> >
> > Could I ask why? split_huge_pmd_address() will notify with CLEAR, so I'm
> > a bit confused why we need to pass over the owner.
>
> Sure, it is the same reason we need to pass it for the exclusive notifier.
> Any invalidation during the make exclusive operation will break the mmu read
> side critical section forcing a retry of the operation. The owner field is
> what is used to filter out invalidations (such as the exclusive
> invalidation) that don't need to be retried.

Do you mean the mmu_interval_read_begin|retry() calls?

Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to
explicit call split_huge_pmd_address(), afaict. Since both of them use
__split_huge_pmd() internally which will generate that unwanted CLEAR notify.

If that's the case, I think it fails because split_huge_pmd_address() will
trigger that CLEAR notify unconditionally (even if it's not a thp; not sure
whether it should be optimized to not notify at all... definitely another
story), while FOLL_SPLIT_PMD will skip the notify as it calls
split_huge_pmd() instead, who checks the pmd before calling
__split_huge_pmd().

Does it also mean that if there's a real THP it won't really work? As then
FOLL_SPLIT_PMD will start to trigger that CLEAR notify too, I think..

--
Peter Xu