Jason Gunthorpe
2020-Mar-16 20:09 UTC
[Nouveau] [PATCH 2/2] mm: remove device private page support from hmm_range_fault
On Mon, Mar 16, 2020 at 07:49:35PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 11:42:19AM -0700, Ralph Campbell wrote:
> >
> > On 3/16/20 10:52 AM, Christoph Hellwig wrote:
> >> No driver has actually properly wired up and supported this feature.
> >> There is various code related to it in nouveau, but as far as I can tell
> >> it never actually got turned on, and the only changes since the initial
> >> commit are global cleanups.
> >
> > This is not actually true. OpenCL 2.x does support SVM with nouveau and
> > device private memory via clEnqueueSVMMigrateMem().
> > Also, Ben Skeggs has accepted a set of patches to map GPU memory after being
> > migrated and this change would conflict with that.
>
> Can you explain to me how we actually invoke this code?
>
> For that we'd need HMM_PFN_DEVICE_PRIVATE NVIF_VMM_PFNMAP_V0_VRAM
> set in ->pfns before calling hmm_range_fault, which isn't happening.

Oh, I got tripped up on this too.

The logic is backwards from what you'd think. If you *set*
HMM_PFN_DEVICE_PRIVATE then this triggers:

hmm_pte_need_fault():

        if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
                /* Do we fault on device memory ? */
                if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
                        *write_fault = pfns & range->flags[HMM_PFN_WRITE];
                        *fault = true;
                }
                return;
        }

I.e. if the CPU page is a DEVICE_PRIVATE page and the caller sets
HMM_PFN_DEVICE_PRIVATE in the input flags (pfns), then it always faults
it and never sets HMM_PFN_DEVICE_PRIVATE in the output flags.

So setting 0 enables device_private support, and nouveau is OK.

AMDGPU is broken because it can't handle device private pages and can't
set the flag to suppress them.

I was going to try and sort this out as part of getting rid of
range->flags.

Jason
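To make the inverted flag handling concrete outside the kernel, here is a small stand-alone C model of just the device-private branch quoted above. The flag bits and the need_fault() helper are invented for illustration and are not the kernel's definitions; only the control flow mirrors hmm_pte_need_fault().

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative flag bits, not the kernel's actual values. */
        #define PFN_VALID          (1ull << 0)
        #define PFN_WRITE          (1ull << 1)
        #define PFN_DEVICE_PRIVATE (1ull << 2)

        /*
         * Model of the quoted hmm_pte_need_fault() branch: cpu_flags describes
         * the page found in the CPU page table, input_flags is what the caller
         * put in range->pfns[] before calling hmm_range_fault().
         */
        static void need_fault(uint64_t cpu_flags, uint64_t input_flags,
                               bool *fault, bool *write_fault)
        {
                *fault = *write_fault = false;

                if (cpu_flags & PFN_DEVICE_PRIVATE) {
                        /* Fault on device memory only if the caller asked for it. */
                        if (input_flags & PFN_DEVICE_PRIVATE) {
                                *write_fault = input_flags & PFN_WRITE;
                                *fault = true;
                        }
                        return;
                }
                /* ... normal valid/write handling elided ... */
        }

        int main(void)
        {
                bool fault, write_fault;
                uint64_t device_private_page = PFN_VALID | PFN_DEVICE_PRIVATE;

                /* Input flags 0: the device private entry is returned as-is. */
                need_fault(device_private_page, 0, &fault, &write_fault);
                printf("input=0              -> fault=%d\n", fault);

                /* Input bit set: the page is faulted back to system RAM. */
                need_fault(device_private_page, PFN_DEVICE_PRIVATE | PFN_WRITE,
                           &fault, &write_fault);
                printf("input=DEVICE_PRIVATE -> fault=%d write=%d\n",
                       fault, write_fault);
                return 0;
        }

Run, the first call reports no fault, so the device private entry flows through to the caller; that is the behaviour nouveau relies on when it leaves the bit clear in the input pfns.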
Ralph Campbell
2020-Mar-16 20:24 UTC
[Nouveau] [PATCH 2/2] mm: remove device private page support from hmm_range_fault
On 3/16/20 1:09 PM, Jason Gunthorpe wrote:
> On Mon, Mar 16, 2020 at 07:49:35PM +0100, Christoph Hellwig wrote:
>> On Mon, Mar 16, 2020 at 11:42:19AM -0700, Ralph Campbell wrote:
>>>
>>> On 3/16/20 10:52 AM, Christoph Hellwig wrote:
>>>> No driver has actually properly wired up and supported this feature.
>>>> There is various code related to it in nouveau, but as far as I can tell
>>>> it never actually got turned on, and the only changes since the initial
>>>> commit are global cleanups.
>>>
>>> This is not actually true. OpenCL 2.x does support SVM with nouveau and
>>> device private memory via clEnqueueSVMMigrateMem().
>>> Also, Ben Skeggs has accepted a set of patches to map GPU memory after being
>>> migrated and this change would conflict with that.
>>
>> Can you explain to me how we actually invoke this code?
>>
>> For that we'd need HMM_PFN_DEVICE_PRIVATE NVIF_VMM_PFNMAP_V0_VRAM
>> set in ->pfns before calling hmm_range_fault, which isn't happening.
>
> Oh, I got tripped up on this too.
>
> The logic is backwards from what you'd think. If you *set*
> HMM_PFN_DEVICE_PRIVATE then this triggers:
>
> hmm_pte_need_fault():
>
>         if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
>                 /* Do we fault on device memory ? */
>                 if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
>                         *write_fault = pfns & range->flags[HMM_PFN_WRITE];
>                         *fault = true;
>                 }
>                 return;
>         }
>
> I.e. if the CPU page is a DEVICE_PRIVATE page and the caller sets
> HMM_PFN_DEVICE_PRIVATE in the input flags (pfns), then it always faults
> it and never sets HMM_PFN_DEVICE_PRIVATE in the output flags.
>
> So setting 0 enables device_private support, and nouveau is OK.
>
> AMDGPU is broken because it can't handle device private pages and can't
> set the flag to suppress them.
>
> I was going to try and sort this out as part of getting rid of
> range->flags.
>
> Jason
>

The reason for it being backwards is that "normally" a device doesn't want
the device private page to be faulted back to system memory, it wants to
get the device private struct page so it can update its page table to point
to the memory already in the device.

Also, a device like Nvidia's GPUs may have an alternate path for copying
one GPU's memory to another (nvlink) without going through system memory,
so getting a device private struct page/PFN from hmm_range_fault() that
isn't "owned" by the faulting GPU is useful.

I agree that the current code isn't well tested or thought out for multiple
devices (rdma, NVMe drives, GPUs, etc.), but it also ties in with
peer-to-peer access via PCIe.
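A rough sketch of how a driver might act on that: after hmm_range_fault() fills in the range, it can recognize its own device private pages and map the VRAM directly instead of a system page. This is not code from the nouveau patches mentioned above; my_gpu, the pagemap-ops comparison target, and the mapping helpers are hypothetical, and it assumes the v5.6-era API where results come back encoded in range->pfns[] and are decoded with hmm_device_entry_to_page().

        #include <linux/hmm.h>
        #include <linux/memremap.h>
        #include <linux/mm.h>

        struct my_gpu;                          /* hypothetical driver state */

        /* Hypothetical helpers: the driver's own pagemap ops and mapping paths. */
        extern const struct dev_pagemap_ops my_gpu_pagemap_ops;
        u64 my_gpu_private_addr(struct my_gpu *gpu, struct page *page);
        void my_gpu_map_vram(struct my_gpu *gpu, u64 va, u64 vram_addr);
        void my_gpu_map_sysmem(struct my_gpu *gpu, u64 va, struct page *page);

        /*
         * Walk the output of hmm_range_fault(): device private pages that
         * belong to this driver are mapped in place, everything else is
         * treated as ordinary system memory.
         */
        static void my_gpu_map_range(struct my_gpu *gpu, struct hmm_range *range)
        {
                unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
                unsigned long i;

                for (i = 0; i < npages; i++) {
                        u64 va = range->start + (i << PAGE_SHIFT);
                        struct page *page =
                                hmm_device_entry_to_page(range, range->pfns[i]);

                        if (!page)
                                continue;       /* hole, error or non-present entry */

                        if (is_device_private_page(page) &&
                            page->pgmap->ops == &my_gpu_pagemap_ops)
                                /* Already in our VRAM: no fault back to system RAM. */
                                my_gpu_map_vram(gpu, va,
                                                my_gpu_private_addr(gpu, page));
                        else
                                my_gpu_map_sysmem(gpu, va, page);
                }
        }

Comparing page->pgmap->ops against the driver's own dev_pagemap_ops is one way a driver can tell its own device private pages apart from another device's.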
Jason Gunthorpe
2020-Mar-17 11:56 UTC
[Nouveau] [PATCH 2/2] mm: remove device private page support from hmm_range_fault
On Mon, Mar 16, 2020 at 01:24:09PM -0700, Ralph Campbell wrote:
> The reason for it being backwards is that "normally" a device doesn't want
> the device private page to be faulted back to system memory, it wants to
> get the device private struct page so it can update its page table to point
> to the memory already in the device.

The "backwards" part is that you set the flag on input and never get it on
output, while you clear the flag on input and maybe get it on output.
Compare with valid or write, which don't work that way.

> Also, a device like Nvidia's GPUs may have an alternate path for copying
> one GPU's memory to another (nvlink) without going through system memory,
> so getting a device private struct page/PFN from hmm_range_fault() that
> isn't "owned" by the faulting GPU is useful.
>
> I agree that the current code isn't well tested or thought out for multiple
> devices (rdma, NVMe drives, GPUs, etc.), but it also ties in with
> peer-to-peer access via PCIe.

I think the series here is a big improvement. The GPU driver can set owners
that match the hidden cluster interconnect.

Jason
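For reference, a hedged sketch of what "setting owners that match the hidden cluster interconnect" could look like, using the owner fields as they eventually landed upstream (dev_pagemap.owner and hmm_range.dev_private_owner, with the later one-argument hmm_range_fault()); the nvlink cluster token and the my_gpu structure are invented, and the pagemap resource setup is elided.

        #include <linux/err.h>
        #include <linux/hmm.h>
        #include <linux/memremap.h>

        struct my_gpu {
                struct device *dev;
                struct dev_pagemap pagemap;
                void *nvlink_cluster;   /* shared token for all GPUs on one nvlink */
        };

        static int my_gpu_register_vram(struct my_gpu *gpu)
        {
                /*
                 * Every GPU in the same nvlink cluster registers its device
                 * private memory with the same owner token...
                 * (pagemap range/resource setup elided for brevity)
                 */
                gpu->pagemap.type = MEMORY_DEVICE_PRIVATE;
                gpu->pagemap.owner = gpu->nvlink_cluster;
                return PTR_ERR_OR_ZERO(devm_memremap_pages(gpu->dev,
                                                           &gpu->pagemap));
        }

        static long my_gpu_fault_range(struct my_gpu *gpu, struct hmm_range *range)
        {
                /*
                 * ...so hmm_range_fault() hands back device private entries
                 * for memory reachable over the interconnect, while device
                 * private pages owned by anyone else are faulted back to
                 * system memory.
                 */
                range->dev_private_owner = gpu->nvlink_cluster;
                return hmm_range_fault(range);  /* older kernels also take a flags argument */
        }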