thr3ads.net - Nouveau - [Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list. [Feb 2015]

If this information is useful, please help other people find it:
Share via:

Patrick Baggett

2015-Feb-25 15:28 UTC

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst <
maarten.lankhorst at ubuntu.com> wrote:
> Op 25-02-15 om 16:04 schreef Patrick Baggett:
> > On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst <
> > maarten.lankhorst at ubuntu.com> wrote:
> >
> >> Op 25-02-15 om 15:11 schreef Emil Velikov:
> >>> On 24 February 2015 at 09:01, Maarten Lankhorst
> >>> <maarten.lankhorst at ubuntu.com> wrote:
> >>>> Only add wrapped bo's and bo's that have been
exported through flink
> or
> >> dma-buf.
> >>>> This avoids a lock in the common case, and decreases
traversal needed
> >> for importing
> >>>> a dma-buf or flink.
> >>>>
> >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at
ubuntu.com>
> >>>> ---
> >>>>  nouveau/nouveau.c | 47
> +++++++++++++++++++++++------------------------
> >>>>  1 file changed, 23 insertions(+), 24 deletions(-)
> >>>>
> >>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c
> >>>> index 1c723b9..d411523 100644
> >>>> --- a/nouveau/nouveau.c
> >>>> +++ b/nouveau/nouveau.c
> >>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo *bo)
> >>>>         struct nouveau_bo_priv *nvbo = nouveau_bo(bo);
> >>>>         struct drm_gem_close req = { bo->handle };
> >>>>
> >>>> -       pthread_mutex_lock(&nvdev->lock);
> >>>> -       if (nvbo->name) {
> >>>> +       if (nvbo->head.next) {
> >>>> +               pthread_mutex_lock(&nvdev->lock);
> >>>>                 if (atomic_read(&nvbo->refcnt) ==
0) {
> >>>>                         DRMLISTDEL(&nvbo->head);
> >>>>                         /*
> >>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo *bo)
> >>>>                 }
> >>>>                 pthread_mutex_unlock(&nvdev->lock);
> >>>>         } else {
> >>>> -               DRMLISTDEL(&nvbo->head);
> >>>> -               pthread_mutex_unlock(&nvdev->lock);
> >>>>                 drmIoctl(bo->device->fd,
DRM_IOCTL_GEM_CLOSE, &req);
> >>>>         }
> >>>>         if (bo->map)
> >>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct nouveau_device
*dev,
> uint32_t
> >> flags, uint32_t align,
> >>>>                uint64_t size, union nouveau_bo_config
*config,
> >>>>                struct nouveau_bo **pbo)
> >>>>  {
> >>>> -       struct nouveau_device_priv *nvdev =
nouveau_device(dev);
> >>>>         struct nouveau_bo_priv *nvbo = calloc(1,
sizeof(*nvbo));
> >>>>         struct nouveau_bo *bo = &nvbo->base;
> >>>>         int ret;
> >>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct nouveau_device
*dev,
> >> uint32_t flags, uint32_t align,
> >>>>                 return ret;
> >>>>         }
> >>>>
> >>>> -       pthread_mutex_lock(&nvdev->lock);
> >>>> -       DRMLISTADD(&nvbo->head,
&nvdev->bo_list);
> >>>> -       pthread_mutex_unlock(&nvdev->lock);
> >>>> -
> >>>>         *pbo = bo;
> >>>>         return 0;
> >>>>  }
> >>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct
nouveau_device
> *dev,
> >> uint32_t handle,
> >>>>         return -ENOMEM;
> >>>>  }
> >>>>
> >>>> +static void
> >>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo)
> >>>> +{
> >>>> +       if (!nvbo->head.next) {
> >>>> +               struct nouveau_device_priv *nvdev >
>> nouveau_device(nvbo->base.device);
> >>>> +               pthread_mutex_lock(&nvdev->lock);
> >>>> +               if (!nvbo->head.next)
> > I guess the bo_make_global call is not particularly sensitive, so
> >> removing's fine with me.
> >>
> > I would be worried about the duplicated check. It seems like a
"smart"
> > compiler could cache the value of "nvbo->head.next"
(unless marked as
> > volatile), rendering the second if() useless. If the field is marked
> > volatile, then of course, this does not apply.
> >
> In that case I would be worried about a compiler that ignores the acquire
> semantics of pthread_mutex_lock..
>
This isn't about the acquire semantics here, because you're reading it
on
both sides of the lock. The compiler need not reorder the reads at all,
merely just save the value of the first read. Unless the compiler knows
that the value can change after the lock is acquired, it can simply cache
the result. Consider the nearly identical transformation:

const int condition = (!nvbo->head.next);
if(condition){
   pthread_mutex_lock(...);
    if(condition){ //redundant, already can assert it is true
        ...

I just wouldn't risk it. The compiler probably makes this transformation
already, but even if it doesn't, it *can*, and that's a bad thing IMO.


Patrick
>
> ~Maarten
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.freedesktop.org/archives/nouveau/attachments/20150225/d3cd53d0/attachment-0001.html>

Maarten Lankhorst

2015-Feb-25 15:43 UTC

head link

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

Op 25-02-15 om 16:28 schreef Patrick Baggett:> On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst <
> maarten.lankhorst at ubuntu.com> wrote:
>
>> Op 25-02-15 om 16:04 schreef Patrick Baggett:
>>> On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst <
>>> maarten.lankhorst at ubuntu.com> wrote:
>>>
>>>> Op 25-02-15 om 15:11 schreef Emil Velikov:
>>>>> On 24 February 2015 at 09:01, Maarten Lankhorst
>>>>> <maarten.lankhorst at ubuntu.com> wrote:
>>>>>> Only add wrapped bo's and bo's that have been
exported through flink
>> or
>>>> dma-buf.
>>>>>> This avoids a lock in the common case, and decreases
traversal needed
>>>> for importing
>>>>>> a dma-buf or flink.
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst
at ubuntu.com>
>>>>>> ---
>>>>>>  nouveau/nouveau.c | 47
>> +++++++++++++++++++++++------------------------
>>>>>>  1 file changed, 23 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c
>>>>>> index 1c723b9..d411523 100644
>>>>>> --- a/nouveau/nouveau.c
>>>>>> +++ b/nouveau/nouveau.c
>>>>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo
*bo)
>>>>>>         struct nouveau_bo_priv *nvbo = nouveau_bo(bo);
>>>>>>         struct drm_gem_close req = { bo->handle };
>>>>>>
>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>> -       if (nvbo->name) {
>>>>>> +       if (nvbo->head.next) {
>>>>>> +              
pthread_mutex_lock(&nvdev->lock);
>>>>>>                 if (atomic_read(&nvbo->refcnt)
== 0) {
>>>>>>                         DRMLISTDEL(&nvbo->head);
>>>>>>                         /*
>>>>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo
*bo)
>>>>>>                 }
>>>>>>                
pthread_mutex_unlock(&nvdev->lock);
>>>>>>         } else {
>>>>>> -               DRMLISTDEL(&nvbo->head);
>>>>>> -              
pthread_mutex_unlock(&nvdev->lock);
>>>>>>                 drmIoctl(bo->device->fd,
DRM_IOCTL_GEM_CLOSE, &req);
>>>>>>         }
>>>>>>         if (bo->map)
>>>>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct
nouveau_device *dev,
>> uint32_t
>>>> flags, uint32_t align,
>>>>>>                uint64_t size, union nouveau_bo_config
*config,
>>>>>>                struct nouveau_bo **pbo)
>>>>>>  {
>>>>>> -       struct nouveau_device_priv *nvdev =
nouveau_device(dev);
>>>>>>         struct nouveau_bo_priv *nvbo = calloc(1,
sizeof(*nvbo));
>>>>>>         struct nouveau_bo *bo = &nvbo->base;
>>>>>>         int ret;
>>>>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct
nouveau_device *dev,
>>>> uint32_t flags, uint32_t align,
>>>>>>                 return ret;
>>>>>>         }
>>>>>>
>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>> -       DRMLISTADD(&nvbo->head,
&nvdev->bo_list);
>>>>>> -       pthread_mutex_unlock(&nvdev->lock);
>>>>>> -
>>>>>>         *pbo = bo;
>>>>>>         return 0;
>>>>>>  }
>>>>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct
nouveau_device
>> *dev,
>>>> uint32_t handle,
>>>>>>         return -ENOMEM;
>>>>>>  }
>>>>>>
>>>>>> +static void
>>>>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo)
>>>>>> +{
>>>>>> +       if (!nvbo->head.next) {
>>>>>> +               struct nouveau_device_priv *nvdev
>>>> nouveau_device(nvbo->base.device);
>>>>>> +              
pthread_mutex_lock(&nvdev->lock);
>>>>>> +               if (!nvbo->head.next)
>>> I guess the bo_make_global call is not particularly sensitive, so
>>>> removing's fine with me.
>>>>
>>> I would be worried about the duplicated check. It seems like a
"smart"
>>> compiler could cache the value of "nvbo->head.next"
(unless marked as
>>> volatile), rendering the second if() useless. If the field is
marked
>>> volatile, then of course, this does not apply.
>>>
>> In that case I would be worried about a compiler that ignores the
acquire
>> semantics of pthread_mutex_lock..
>>
> This isn't about the acquire semantics here, because you're reading
it on
> both sides of the lock. The compiler need not reorder the reads at all,
> merely just save the value of the first read. Unless the compiler knows
> that the value can change after the lock is acquired, it can simply cache
> the result. Consider the nearly identical transformation:
>
> const int condition = (!nvbo->head.next);
> if(condition){
>    pthread_mutex_lock(...);
>     if(condition){ //redundant, already can assert it is true
>         ...
>So you're saying a compiler can optimize:

- statement with memory access
- read memory barrier
- statement with memory access

To drop the second statement? I would worry about your definition of memory
barrier then..

~Maarten

Emil Velikov

2015-Feb-25 16:14 UTC

head link

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

On 25 February 2015 at 15:43, Maarten Lankhorst
<maarten.lankhorst at ubuntu.com> wrote:> Op 25-02-15 om 16:28 schreef Patrick Baggett:
>> On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst <
>> maarten.lankhorst at ubuntu.com> wrote:
>>
>>> Op 25-02-15 om 16:04 schreef Patrick Baggett:
>>>> On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst <
>>>> maarten.lankhorst at ubuntu.com> wrote:
>>>>
>>>>> Op 25-02-15 om 15:11 schreef Emil Velikov:
>>>>>> On 24 February 2015 at 09:01, Maarten Lankhorst
>>>>>> <maarten.lankhorst at ubuntu.com> wrote:
>>>>>>> Only add wrapped bo's and bo's that have
been exported through flink
>>> or
>>>>> dma-buf.
>>>>>>> This avoids a lock in the common case, and
decreases traversal needed
>>>>> for importing
>>>>>>> a dma-buf or flink.
>>>>>>>
>>>>>>> Signed-off-by: Maarten Lankhorst
<maarten.lankhorst at ubuntu.com>
>>>>>>> ---
>>>>>>>  nouveau/nouveau.c | 47
>>> +++++++++++++++++++++++------------------------
>>>>>>>  1 file changed, 23 insertions(+), 24 deletions(-)
>>>>>>>
>>>>>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c
>>>>>>> index 1c723b9..d411523 100644
>>>>>>> --- a/nouveau/nouveau.c
>>>>>>> +++ b/nouveau/nouveau.c
>>>>>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct
nouveau_bo *bo)
>>>>>>>         struct nouveau_bo_priv *nvbo =
nouveau_bo(bo);
>>>>>>>         struct drm_gem_close req = { bo->handle
};
>>>>>>>
>>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>>> -       if (nvbo->name) {
>>>>>>> +       if (nvbo->head.next) {
>>>>>>> +              
pthread_mutex_lock(&nvdev->lock);
>>>>>>>                 if
(atomic_read(&nvbo->refcnt) == 0) {
>>>>>>>                        
DRMLISTDEL(&nvbo->head);
>>>>>>>                         /*
>>>>>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct
nouveau_bo *bo)
>>>>>>>                 }
>>>>>>>                
pthread_mutex_unlock(&nvdev->lock);
>>>>>>>         } else {
>>>>>>> -               DRMLISTDEL(&nvbo->head);
>>>>>>> -              
pthread_mutex_unlock(&nvdev->lock);
>>>>>>>                 drmIoctl(bo->device->fd,
DRM_IOCTL_GEM_CLOSE, &req);
>>>>>>>         }
>>>>>>>         if (bo->map)
>>>>>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct
nouveau_device *dev,
>>> uint32_t
>>>>> flags, uint32_t align,
>>>>>>>                uint64_t size, union
nouveau_bo_config *config,
>>>>>>>                struct nouveau_bo **pbo)
>>>>>>>  {
>>>>>>> -       struct nouveau_device_priv *nvdev =
nouveau_device(dev);
>>>>>>>         struct nouveau_bo_priv *nvbo = calloc(1,
sizeof(*nvbo));
>>>>>>>         struct nouveau_bo *bo = &nvbo->base;
>>>>>>>         int ret;
>>>>>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct
nouveau_device *dev,
>>>>> uint32_t flags, uint32_t align,
>>>>>>>                 return ret;
>>>>>>>         }
>>>>>>>
>>>>>>> -       pthread_mutex_lock(&nvdev->lock);
>>>>>>> -       DRMLISTADD(&nvbo->head,
&nvdev->bo_list);
>>>>>>> -       pthread_mutex_unlock(&nvdev->lock);
>>>>>>> -
>>>>>>>         *pbo = bo;
>>>>>>>         return 0;
>>>>>>>  }
>>>>>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct
nouveau_device
>>> *dev,
>>>>> uint32_t handle,
>>>>>>>         return -ENOMEM;
>>>>>>>  }
>>>>>>>
>>>>>>> +static void
>>>>>>> +nouveau_bo_make_global(struct nouveau_bo_priv
*nvbo)
>>>>>>> +{
>>>>>>> +       if (!nvbo->head.next) {
>>>>>>> +               struct nouveau_device_priv *nvdev
>>>>> nouveau_device(nvbo->base.device);
>>>>>>> +              
pthread_mutex_lock(&nvdev->lock);
>>>>>>> +               if (!nvbo->head.next)
>>>> I guess the bo_make_global call is not particularly sensitive,
so
>>>>> removing's fine with me.
>>>>>
>>>> I would be worried about the duplicated check. It seems like a
"smart"
>>>> compiler could cache the value of
"nvbo->head.next" (unless marked as
>>>> volatile), rendering the second if() useless. If the field is
marked
>>>> volatile, then of course, this does not apply.
>>>>
>>> In that case I would be worried about a compiler that ignores the
acquire
>>> semantics of pthread_mutex_lock..
>>>
>> This isn't about the acquire semantics here, because you're
reading it on
>> both sides of the lock. The compiler need not reorder the reads at all,
>> merely just save the value of the first read. Unless the compiler knows
>> that the value can change after the lock is acquired, it can simply
cache
>> the result. Consider the nearly identical transformation:
>>
>> const int condition = (!nvbo->head.next);
>> if(condition){
>>    pthread_mutex_lock(...);
>>     if(condition){ //redundant, already can assert it is true
>>         ...
>>
> So you're saying a compiler can optimize:
>
> - statement with memory access
> - read memory barrier
> - statement with memory access
>
> To drop the second statement? I would worry about your definition of memory
barrier then..
>Patrick thanks for elaborating my concern in a more complete manner.

Maarten, I would assume that the overhead of dropping the first if
statement will be negligible wrt the time spent debugging when the
compiler does the silly thing. Afaict even with a single if, your
patch does reduce the time spend in libdrm_nouveau quite a bit.

That said, it's up-to you and others to draw the final call.

Cheers,
Emil

Patrick Baggett

2015-Feb-25 16:28 UTC

head link

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

So you're saying a compiler can optimize:
>
> - statement with memory access
> - read memory barrier
> - statement with memory access
>
> To drop the second statement? I would worry about your definition of
> memory barrier then..
>This is tricky, but I think you're mixing up the general case with the
actual code you have. In general, you are pointing to this example:

- Read memory location 1 (asm level)
- RMB
- Read memory location 2 (asm level)

In this case, you are precisely correct, the compiler will not reorder the
reads, and the CPU won't either.

What you actually have is this:
- Access memory location 1 (C code)
- RMB
- Acesss memory location 1 (C code)

I say "access" here because looking at the C code, not assembly. You
have
this:

read nvbo->head.next
RMB
read nvbo->head.next

This only works if you can guarantee that the compiler will actually
generate a MOV instruction for the second read. If the variable isn't
marked as "volatile", the compiler has no requirement to. Barriers
control
ordering and when a variable can be loaded, but you've already loaded it.
You need to reload the variable, but you haven't shown the compiler a
reason why it should. "Volatile" is a reason why.

I tested this code on gcc:

if(globalFlag) {
    math...
    if(globalFlag) {
        more math
    }
}

It only tests once. If you replace "math" with printf(), it retests
the
value of globalFlag. Printf(), which has absolutely no barrier semantics at
all. In other words, GCC conservatively will assume any function call will
require a reload because it could modify anything. The code will work as-is.

Obviously, pthread_mutex_lock() does not modify the value of
`nvbo->head.next`. Once GCC knows that (and maybe other compilers already
do), perhaps with LTO, it doesn't have to generate a second load. If you
ever switch to an inlined lock or different compiler that understands that,
your code will break. I suggest you simply do the right thing the first
time and cast the pointer to volatile since you're treating the value as if
it is volatile (i.e. changing outside of the scope of the compiler's
knowledge).

Patrick
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.freedesktop.org/archives/nouveau/attachments/20150225/83c32418/attachment.html>

Seemingly Similar Threads

Search for more possibly parallel threads

Nouveau - Feb 2015 - [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.

Seemingly Similar Threads