+Rom who seems to be presenting about mainlining android sync at linux plumbers

On Wed, Oct 01, 2014 at 05:58:52PM +0200, Maarten Lankhorst wrote:
> You could neuter implicit fences by always attaching the fences as
> shared when explicit syncing is used. This would work correctly with
> eviction, and wouldn't cause any unneeded syncing. :)

Yes, that will probably work! So, just to reiterate that I understood you and
Daniel correctly:

- de-stage sync_fence and its user space API (the tedious part)
- add dma-buf ioctls for extracting and attaching explicit fences
- Nouveau specific: allow flagging gem buffers for explicit sync
- do not check pre-fences from explicitly synced buffers at submit
- continue to attach a shared fence after submit so that pinning and
  unmapping continue to work

Then working sets and getting rid of locking all buffers individually
can be dealt with later as an optimization.

On Wed, Oct 01, 2014 at 07:27:21PM +0200, Daniel Vetter wrote:
> On Wed, Oct 01, 2014 at 06:14:16PM +0300, Lauri Peltonen wrote:
> > Implicit fences attached to individual buffers are one way for residency
> > management. Do you think a working set based model could work in the DRM
> > framework? For example, something like this:
> >
> > - Allow user space to create "working set objects" and associate buffers
> >   with them. If the user space doesn't want to manage working sets
> >   explicitly, it could also use an implicit default working set that
> >   contains all buffers that are mapped to the channel vm (on Android we
> >   could always use the default working set since we don't need to manage
> >   residency). The working sets are initially marked as dirty.
> > - User space tells which working sets are referenced by each work
> >   submission. Kernel locks these working sets, pins all buffers in dirty
> >   working sets, and resets the dirty bits. After kicking off work, kernel
> >   stores the fence to the _working sets_, and then releases the locks (if
> >   an implicit default working set is used, then this would be roughly
> >   equivalent to storing a fence to channel vm that tells "this is the
> >   last hw operation that might have touched buffers in this address
> >   space").
> > - If swapping doesn't happen, then we just need to check the working set
> >   dirty bits at each submit.
> > - When a buffer is swapped out, all working sets that refer to it need
> >   to be marked as dirty.
> > - When a buffer is swapped out or unmapped, we need to wait for the
> >   fences from all working sets that refer to the buffer.
> >
> > Initially one might think of working sets as a mere optimization - we
> > now need to process a few working sets at every submit instead of many
> > individual buffers. However, it makes a huge difference because of
> > fences: fences that are attached to buffers are used for implicitly
> > synchronizing work across different channels and engines. They are in
> > the performance critical path, and we want to carefully manage them
> > (that's the idea of explicit synchronization). The working set fences,
> > on the other hand, would only be used to guarantee that we don't swap
> > out or unmap something that the GPU might be accessing. We never need to
> > wait for those fences (except when swapping or unmapping), so we can be
> > conservative without hurting performance.
>
> Yeah, within the driver (i.e. for private objects which are never exported
> to dma_buf) we can recently do stuff like this. And your above idea is
> roughly one of the things we're tossing around for i915.
>
> But the cool stuff with drm is that cmd submission is driver-specific, so
> you can just go wild with nouveau. Of course you have to convince the
> nouveau guys (and also have open-source users for the new interface).
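[For readers following along: the working-set bookkeeping proposed above can
be sketched as a toy model. This is purely illustrative Python, not kernel
code - every name in it is hypothetical, but it captures the dirty-bit and
fence rules from the list: pin only dirty sets at submit, store one fence per
set instead of per buffer, and wait on those fences only when evicting.]

```python
# Toy model of the proposed working-set residency scheme.
# All names are illustrative; nothing here maps to a real kernel API.

pinned = set()

class Fence:
    def __init__(self):
        self.waited = False
    def wait(self):
        self.waited = True

class WorkingSet:
    def __init__(self, buffers):
        self.buffers = set(buffers)
        self.dirty = True      # working sets start out dirty
        self.fence = None      # last submit that may touch these buffers

def submit(working_sets, fence):
    """Pin buffers of dirty sets, clear dirty bits, store the fence."""
    for ws in working_sets:
        if ws.dirty:
            pinned.update(ws.buffers)  # pin only when the set was dirty
            ws.dirty = False
        ws.fence = fence               # no per-buffer fence bookkeeping

def swap_out(buffer, all_sets):
    """Evicting a buffer: the only point where we wait on these fences."""
    for ws in all_sets:
        if buffer in ws.buffers:
            if ws.fence is not None:
                ws.fence.wait()
            ws.dirty = True            # next submit must re-pin

ws = WorkingSet(["bufA", "bufB"])
f1, f2 = Fence(), Fence()
submit([ws], f1)          # first submit pins bufA and bufB
submit([ws], f2)          # fast path: dirty bit clear, just store f2
swap_out("bufA", [ws])    # eviction waits on f2 and re-dirties the set
print(ws.dirty, f2.waited, f1.waited)   # -> True True False
```

Note how the second submit touches no per-buffer state at all - that is the
claimed performance win over per-buffer implicit fences.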
> For shared buffers I think we should stick with the implicit fences for a
> while simply because I'm not sure whether it's really worth the fuss. And
> reworking all the drivers and dma-buf for some working sets is a lot of
> fuss ;-) Like Maarten said you can mostly short-circuit the implicit
> fencing by only attaching shared fences.

Yes, I'll try to do that.

> In case you're curious: The idea is to have a 1:1 association between
> ppgtt address spaces and what you call the working set above, to implement
> the buffer svm model in ocl2. Mostly because we expect that applications
> won't get the more fine-grained buffer list right anyway. And this kind of
> gang-scheduling of working set sizes should be more efficient for the
> usual case where everything fits.

If I understood correctly, this would be exactly the same as what I called
the "default working set" above. On Android we don't care much about finer
grained working sets either, because of UMA and no swapping.

> > > Imo de-staging the android syncpt stuff needs to happen first, before
> > > drivers can use it. Since non-staging stuff really shouldn't depend
> > > upon code from staging.
> >
> > Fully agree. I thought the best way towards that would be to show some
> > driver code that _would_ use it. :)
>
> Oh, there's the usual chicken&egg where we need a full-blown prototype
> before we can start merging. Interface work on upstream is super-hard, but
> given the ridiculous backwards compat guarantees Linus expects us to keep
> up totally justified. Mistakes are really expensive. So I'm happy to see
> you charge ahead here.

Given that I'm not used to working with upstream, don't expect too much from
my "charging ahead". :) I'm still secretly hoping that the Android guys at
Google would jump in to help, now that we seem to agree that we could
de-stage sync_fence.

> > > I'm all for adding explicit syncing. Our plans are roughly.
> > >
> > > - Add both an in and an out fence to execbuf to sync with other
> > >   rendering and give userspace a fence back. Needs two different flags
> > >   probably.
> > >
> > > - Maybe add an ioctl to dma-bufs to get at the current implicit fences
> > >   attached to them (both an exclusive and non-exclusive version). This
> > >   should help with making explicit and implicit sync work together
> > >   nicely.
> > >
> > > - Add fence support to kms. Probably only worth it together with the
> > >   new atomic stuff. Again we need an in fence to wait for (one for
> > >   each buffer) and an out fence. The latter can easily be implemented
> > >   by extending struct drm_event, which means not a single driver code
> > >   line needs to be changed for this.
> > >
> > > - For de-staging android syncpts we need to de-clutter the internal
> > >   interfaces and also review all the ioctls exposed. Like you say it
> > >   should be just the userspace interface for struct drm_fence. Also,
> > >   it needs testcases and preferably manpages.
> >
> > This all sounds very similar to what we'd like to do! Maybe we can move
> > forward with these parts, and continue to attach fences at submit until
> > we have a satisfactory solution for the pinning problem?
>
> Yeah, that's our plan for i915 too. First add explicit fences, then figure
> out whether we need to be better at neutering the implicit fences, in case
> and only where it really gets in the way.

> > I'd like to understand what are the concrete steps to de-stage struct
> > sync_fence, since that's the first thing that needs to be done. For
> > example, what do you mean by "de-cluttering the internal interfaces"?
> > Just that we'd move the sync_fence parts from
> > drivers/staging/android/sync.c to, say, drivers/dma-buf/sync-fence.c?
> > Would we still leave a copy of the existing full driver in
> > staging/android?
>
> Yeah I guess that would be an approach. Personally I think we should also
> have basic ioctl testcases for all the ioctls exposed by syncpt fds. And
> reviewing the kerneldoc for the driver-internal interfaces (which includes
> removing everything that's now made obsolete by struct fence). Bonus
> points for documenting the ioctls. We could throw the test binary into
> libdrm maybe, there's a bunch of others like it already there.
>
> I'm not sure whether/how much google has already for this.

The simplest way to add tests is if we allow user space to create and
trigger fences. We don't want to enable that from the kernel by default,
because that opens possibilities for deadlocks (e.g. a process could
deadlock a compositor by passing a fence that it never triggers). The
Android sync driver solves this by having a separate CONFIG_SW_SYNC_USER
that can be used for testing.

Here's a simple test by Google:
https://android.googlesource.com/platform/system/core/+/master/libsync/sync_test.c

And this is their userspace wrapper lib:
https://android.googlesource.com/platform/system/core/+/master/libsync/sync.c
https://android.googlesource.com/platform/system/core/+/master/libsync/include/sync/sync.h

Would that go to libdrm now...?

> Aside: Will you be at XDC or linux plumbers? Either would be a perfect
> place to discuss plans and ideas - I'll attend both.

I wasn't going to, but let's see. The former is pretty soon and the latter
is sold out. At least Andy Ritger from Nvidia is coming to XDC for sure,
and he's been involved in our internal discussions around these topics. So
I suggest you have a chat with him at least! :)

Thanks,
Lauri
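[Editor's aside: the core behaviour that Google's sync_test exercises is
merging two sync_fence fds into one fence that only signals once both inputs
have. The real interface is the libsync C API linked above; the semantics
can be modeled in a few lines of illustrative Python with hypothetical
names:]

```python
# Toy model of sync_fence merge semantics (stand-in for the libsync C API).

class SyncPoint:
    def __init__(self):
        self.signaled = False

class SyncFence:
    def __init__(self, points):
        self.points = list(points)   # underlying sync points

    def signaled(self):
        return all(p.signaled for p in self.points)

    def merge(self, other):
        # A merged fence signals only when every point of both inputs has.
        return SyncFence(self.points + other.points)

a, b = SyncPoint(), SyncPoint()
merged = SyncFence([a]).merge(SyncFence([b]))

a.signaled = True
print(merged.signaled())   # False: b's point is still pending
b.signaled = True
print(merged.signaled())   # True: both inputs have signaled
```

A CONFIG_SW_SYNC_USER-style test driver is exactly what lets userspace
create and trigger SyncPoint-like objects directly, which is why it is
useful for testing and dangerous in production.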
On Thu, Oct 02, 2014 at 05:59:51PM +0300, Lauri Peltonen wrote:
> +Rom who seems to be presenting about mainlining android sync at linux
> plumbers

Also add Greg KH as fyi that we're working on de-staging one of the android
subsystems.

> On Wed, Oct 01, 2014 at 05:58:52PM +0200, Maarten Lankhorst wrote:
> > You could neuter implicit fences by always attaching the fences as
> > shared when explicit syncing is used. This would work correctly with
> > eviction, and wouldn't cause any unneeded syncing. :)
>
> Yes, that will probably work! So, just to reiterate that I understood you
> and Daniel correctly:
>
> - de-stage sync_fence and its user space API (the tedious part)
> - add dma-buf ioctls for extracting and attaching explicit fences
> - Nouveau specific: allow flagging gem buffers for explicit sync
> - do not check pre-fences from explicitly synced buffers at submit
> - continue to attach a shared fence after submit so that pinning and
>   unmapping continue to work

- Having explicit in/out fences for the pushbuf ioctl is missing in this
step, I guess?

I also think we need some kind of demonstration vehicle using nouveau to
satisfy Dave Airlie's open-source userspace requirements for new
interfaces. Might be good to chat with him to make sure we have that
covered (and how much needs to be there really).

> Then working sets and getting rid of locking all buffers individually
> can be dealt with later as an optimization.

Yeah, sounds like a good plan.

> > In case you're curious: The idea is to have a 1:1 association between
> > ppgtt address spaces and what you call the working set above, to
> > implement the buffer svm model in ocl2. [snip]
>
> If I understood correctly, this would be exactly the same as what I called
> the "default working set" above. On Android we don't care much about finer
> grained working sets either, because of UMA and no swapping.

Yeah, that's pretty much the idea. Well with ocl the SVM buffer working
set sizes are attached to an ocl context, but I don't think there'll be
more than one of those around really (except maybe when different
libraries all use ocl, but don't work together on the same data).

> Given that I'm not used to working with upstream, don't expect too much
> from my "charging ahead". :) I'm still secretly hoping that the Android
> guys at Google would jump in to help, now that we seem to agree that we
> could de-stage sync_fence.

Forget that, imo the android guys are 100% absorbed with their own stuff,
at least on the gfx side. Occasionally they pipe up with "btw this is what
we're doing now".

Also, the problem is that to actually push android stuff out of staging
you need a use-case in upstream, which means an open-source gpu driver.
There's not a lot of companies who have both that and ship android, and
definitely not the nexus/android lead platforms.

Display side would be easier since there's a bunch of kms drivers now
upstream. But given that google decided to go ahead with their own adf
instead of drm-kms that's also a non-starter.

Heck, I have a hard time dragging our own android folks here at intel into
upstream work ...

> The simplest way to add tests is if we allow user space to create and
> trigger fences. We don't want to enable that from the kernel by default,
> because that opens possibilities for deadlocks (e.g. a process could
> deadlock a compositor by passing a fence that it never triggers). The
> Android sync driver solves this by having a separate CONFIG_SW_SYNC_USER
> that can be used for testing.
>
> Here's a simple test by Google:
> https://android.googlesource.com/platform/system/core/+/master/libsync/sync_test.c

Hm, usually we expose such test interfaces through debugfs - that way
production systems won't ever ship with it (since there's too many exploits
in there, especially with secure boot). But since you need it for
validation tests (at least for the i915 suite) it should always be there
when you need it.

Exposing this as a configurable driver in dev is imo a no-go. But we
should be able to easily convert this into a few debugfs files, so not too
much fuss hopefully.

> And this is their userspace wrapper lib:
> https://android.googlesource.com/platform/system/core/+/master/libsync/sync.c
> https://android.googlesource.com/platform/system/core/+/master/libsync/include/sync/sync.h
>
> Would that go to libdrm now...?

Imo it would make sense. At least we kinda want to use fences outside of
Android, and a separate library project feels like overkill.

> > Aside: Will you be at XDC or linux plumbers? Either would be a perfect
> > place to discuss plans and ideas - I'll attend both.
>
> I wasn't going to, but let's see. The former is pretty soon and the
> latter is sold out. At least Andy Ritger from Nvidia is coming to XDC for
> sure, and he's been involved in our internal discussions around these
> topics. So I suggest you have a chat with him at least! :)

I'll definitely have a chat (and some beers) with Andy, been a while I've
last seen him ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
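[Editor's aside: the short-circuit trick both mails agree on - skip the
implicit pre-fence wait for explicitly synced buffers at submit, but still
attach the new submit fence as a *shared* fence so eviction and unmap keep
working - can be sketched as a toy model. Illustrative Python with
hypothetical names, not driver code:]

```python
# Toy model of explicit sync coexisting with implicit fences.

class Buffer:
    def __init__(self, explicit_sync=False):
        self.explicit_sync = explicit_sync
        self.exclusive = None    # implicit write fence from earlier work
        self.shared = []         # fences that eviction/unmap must wait for

def submit(buffers, new_fence):
    """Return the pre-fences this submit had to wait on."""
    waited = []
    for buf in buffers:
        if not buf.explicit_sync and buf.exclusive is not None:
            waited.append(buf.exclusive)   # implicit sync path only
        buf.shared.append(new_fence)       # always, so eviction stays safe
    return waited

old_fence, new_fence = object(), object()
implicit = Buffer()
explicit = Buffer(explicit_sync=True)
implicit.exclusive = old_fence
explicit.exclusive = old_fence

waited = submit([implicit, explicit], new_fence)
print(len(waited))                    # 1: only the implicit buffer waited
print(new_fence in explicit.shared)   # True: shared fence still attached
```

The explicitly synced buffer never stalls the submit, yet the kernel still
knows the last work that may touch it - which is exactly why pinning and
unmapping continue to work in the plan above.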
Riley (CCed) and I will be at Plumbers in a couple weeks. There is a session
on sync planned in the Android track, and of course we'll be available to
chat.

On Thu, Oct 2, 2014 at 1:44 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> [full quote of Daniel's mail above snipped]