Ralph Campbell
2020-Jun-25 17:25 UTC
[Nouveau] [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration
Making sure to include linux-mm and Bharata B Rao for IBM's use of
migrate_vma*().

On 6/24/20 11:10 AM, Ralph Campbell wrote:
>
> On 6/24/20 12:23 AM, Christoph Hellwig wrote:
>> On Mon, Jun 22, 2020 at 04:38:53PM -0700, Ralph Campbell wrote:
>>> The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
>>> migrate memory in the given address range to device private memory. The
>>> source pages might already have been migrated to device private memory.
>>> In that case, the source struct page is not checked to see if it is
>>> a device private page, and the GPU's physical address of local memory
>>> is computed incorrectly, leading to data corruption.
>>> Fix this by checking the source struct page and computing the correct
>>> physical address.
>>
>> I'm really worried about all this delicate code to fix the mixed
>> ranges. Can't we make it clear at the migrate_vma_* level if we want
>> to migrate from or to device private memory, and then skip all the work
>> for regions of memory that are already in the right place? This might be
>> a little more work initially, but I think it leads to a much better
>> API.
>>
>
> The current code does encode the direction with src_owner != NULL meaning
> device private to system memory and src_owner == NULL meaning system
> memory to device private memory. This patch would obviously defeat that,
> so perhaps a flag could be added to struct migrate_vma to indicate the
> direction, but I'm unclear how that makes things less delicate.
> Can you expand on what you are worried about?
>
> The issue with invalidations might be better addressed by letting the device
> driver handle device private page TLB invalidations when migrating to
> system memory, and changing migrate_vma_setup() to only invalidate CPU
> TLB entries for normal pages being migrated to device private memory.
> If a page isn't migrating, it seems inefficient to invalidate those TLB
> entries.
>
> Any other suggestions?

After a night's sleep, I think this might work. What do others think?

1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.

2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE event type.

3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.

4) Leave the src_owner check in migrate_vma_collect_pmd() for normal pages,
so if the device driver is migrating normal pages to device private memory,
the driver would set src_owner == NULL and already migrated device private
pages would be skipped. Since the mmu notifier callback did nothing, the
device private entries remain valid in the device's MMU.
migrate_vma_collect_pmd() would still invalidate the CPU page tables for
migrated normal pages.

If the driver is migrating device private pages to system memory, it would
set src_owner != NULL; normal pages would be skipped, but now the device
driver has to invalidate the device MMU mappings in the "alloc and copy"
step before doing the copy. This happens after migrate_vma_setup() returns,
so the list of migrating device pages is known to the driver.

The rest of migrate_vma_pages() and migrate_vma_finalize() remain the same.
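A minimal userspace sketch of steps 1-3, using illustrative names and a
cut-down struct (the real types in include/linux/mmu_notifier.h differ);
the point is only that the driver's invalidation callback can filter on
the proposed event type:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of mmu_notifier_event; MMU_NOTIFY_MIGRATE is the
 * proposed new value (step 1), the others stand in for existing events. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_MIGRATE,	/* proposed: set by migrate_vma_collect() (step 2) */
};

struct mmu_notifier_range {
	enum mmu_notifier_event event;
	unsigned long start;
	unsigned long end;
};

/* Step 3: a nouveau-style invalidate_range_start callback returns early
 * for MMU_NOTIFY_MIGRATE, leaving device-private entries valid in the
 * device MMU. Returns true if a device invalidation was issued. */
static bool svmm_invalidate_range_start(const struct mmu_notifier_range *range)
{
	if (range->event == MMU_NOTIFY_MIGRATE)
		return false;	/* migration: skip device MMU invalidation */
	/* ... otherwise invalidate [start, end) in the device MMU ... */
	return true;
}
```

This keeps step 4 untouched: only the notifier side changes, while the CPU
page-table invalidation in migrate_vma_collect_pmd() still happens for
migrating normal pages.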
Jason Gunthorpe
2020-Jun-25 17:31 UTC
[Nouveau] [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration
On Thu, Jun 25, 2020 at 10:25:38AM -0700, Ralph Campbell wrote:
> Making sure to include linux-mm and Bharata B Rao for IBM's
> use of migrate_vma*().
>
> On 6/24/20 11:10 AM, Ralph Campbell wrote:
> >
> > On 6/24/20 12:23 AM, Christoph Hellwig wrote:
> > > On Mon, Jun 22, 2020 at 04:38:53PM -0700, Ralph Campbell wrote:
> > > > The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
> > > > migrate memory in the given address range to device private memory. The
> > > > source pages might already have been migrated to device private memory.
> > > > In that case, the source struct page is not checked to see if it is
> > > > a device private page, and the GPU's physical address of local memory
> > > > is computed incorrectly, leading to data corruption.
> > > > Fix this by checking the source struct page and computing the correct
> > > > physical address.
> > >
> > > I'm really worried about all this delicate code to fix the mixed
> > > ranges. Can't we make it clear at the migrate_vma_* level if we want
> > > to migrate from or to device private memory, and then skip all the work
> > > for regions of memory that are already in the right place? This might be
> > > a little more work initially, but I think it leads to a much better
> > > API.
> > >
> >
> > The current code does encode the direction with src_owner != NULL meaning
> > device private to system memory and src_owner == NULL meaning system
> > memory to device private memory. This patch would obviously defeat that,
> > so perhaps a flag could be added to struct migrate_vma to indicate the
> > direction, but I'm unclear how that makes things less delicate.
> > Can you expand on what you are worried about?
> >
> > The issue with invalidations might be better addressed by letting the device
> > driver handle device private page TLB invalidations when migrating to
> > system memory, and changing migrate_vma_setup() to only invalidate CPU
> > TLB entries for normal pages being migrated to device private memory.
> > If a page isn't migrating, it seems inefficient to invalidate those TLB
> > entries.
> >
> > Any other suggestions?
>
> After a night's sleep, I think this might work. What do others think?
>
> 1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.
>
> 2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE event type.
>
> 3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
> invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.

Isn't it a bit of an assumption that migrate_vma_collect() is only
used by nouveau itself?

What if some other devices' device_private pages are being migrated?

Jason
Ralph Campbell
2020-Jun-25 17:42 UTC
[Nouveau] [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration
On 6/25/20 10:31 AM, Jason Gunthorpe wrote:
> On Thu, Jun 25, 2020 at 10:25:38AM -0700, Ralph Campbell wrote:
>> Making sure to include linux-mm and Bharata B Rao for IBM's
>> use of migrate_vma*().
>>
>> On 6/24/20 11:10 AM, Ralph Campbell wrote:
>>>
>>> On 6/24/20 12:23 AM, Christoph Hellwig wrote:
>>>> On Mon, Jun 22, 2020 at 04:38:53PM -0700, Ralph Campbell wrote:
>>>>> The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
>>>>> migrate memory in the given address range to device private memory. The
>>>>> source pages might already have been migrated to device private memory.
>>>>> In that case, the source struct page is not checked to see if it is
>>>>> a device private page, and the GPU's physical address of local memory
>>>>> is computed incorrectly, leading to data corruption.
>>>>> Fix this by checking the source struct page and computing the correct
>>>>> physical address.
>>>>
>>>> I'm really worried about all this delicate code to fix the mixed
>>>> ranges. Can't we make it clear at the migrate_vma_* level if we want
>>>> to migrate from or to device private memory, and then skip all the work
>>>> for regions of memory that are already in the right place? This might be
>>>> a little more work initially, but I think it leads to a much better
>>>> API.
>>>>
>>>
>>> The current code does encode the direction with src_owner != NULL meaning
>>> device private to system memory and src_owner == NULL meaning system
>>> memory to device private memory. This patch would obviously defeat that,
>>> so perhaps a flag could be added to struct migrate_vma to indicate the
>>> direction, but I'm unclear how that makes things less delicate.
>>> Can you expand on what you are worried about?
>>>
>>> The issue with invalidations might be better addressed by letting the device
>>> driver handle device private page TLB invalidations when migrating to
>>> system memory, and changing migrate_vma_setup() to only invalidate CPU
>>> TLB entries for normal pages being migrated to device private memory.
>>> If a page isn't migrating, it seems inefficient to invalidate those TLB
>>> entries.
>>>
>>> Any other suggestions?
>>
>> After a night's sleep, I think this might work. What do others think?
>>
>> 1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.
>>
>> 2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE event type.
>>
>> 3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
>> invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.
>
> Isn't it a bit of an assumption that migrate_vma_collect() is only
> used by nouveau itself?
>
> What if some other devices' device_private pages are being migrated?
>
> Jason

Good point. The driver needs a way of knowing that the callback is due to
its own call to migrate_vma_setup() and not some other migration
invalidation. How about adding a void pointer to struct mmu_notifier_range
which migrate_vma_collect() can set to src_owner? If the event is
MMU_NOTIFY_MIGRATE and src_owner matches the void pointer, then the
callback is the one the driver initiated.
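A sketch of that refinement, again with hypothetical field and function
names: migrate_vma_collect() would stash the caller's src_owner in the
range, and each driver's callback would skip invalidation only for
migrations it started itself, which answers the multi-device concern:

```c
#include <assert.h>
#include <stdbool.h>

enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP,
	MMU_NOTIFY_MIGRATE,	/* proposed migration event type */
};

struct mmu_notifier_range {
	enum mmu_notifier_event event;
	void *owner;	/* hypothetical field: set to src_owner by migrate_vma_collect() */
};

/* A driver may skip device MMU invalidation only when the range describes
 * a migration that this driver itself initiated, i.e. the stashed owner
 * matches the src_owner it passed to migrate_vma_setup(). Another
 * device's migration still invalidates as before. */
static bool range_is_own_migration(const struct mmu_notifier_range *range,
				   void *my_owner)
{
	return range->event == MMU_NOTIFY_MIGRATE && range->owner == my_owner;
}
```

The owner pointer only needs to be a unique cookie per driver instance, so
comparing it against the src_owner the driver already passes in costs one
pointer compare in the callback.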