Patrick Baggett
2015-Feb-25 15:28 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst < maarten.lankhorst at ubuntu.com> wrote:> Op 25-02-15 om 16:04 schreef Patrick Baggett: > > On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst < > > maarten.lankhorst at ubuntu.com> wrote: > > > >> Op 25-02-15 om 15:11 schreef Emil Velikov: > >>> On 24 February 2015 at 09:01, Maarten Lankhorst > >>> <maarten.lankhorst at ubuntu.com> wrote: > >>>> Only add wrapped bo's and bo's that have been exported through flink > or > >> dma-buf. > >>>> This avoids a lock in the common case, and decreases traversal needed > >> for importing > >>>> a dma-buf or flink. > >>>> > >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at ubuntu.com> > >>>> --- > >>>> nouveau/nouveau.c | 47 > +++++++++++++++++++++++------------------------ > >>>> 1 file changed, 23 insertions(+), 24 deletions(-) > >>>> > >>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c > >>>> index 1c723b9..d411523 100644 > >>>> --- a/nouveau/nouveau.c > >>>> +++ b/nouveau/nouveau.c > >>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo *bo) > >>>> struct nouveau_bo_priv *nvbo = nouveau_bo(bo); > >>>> struct drm_gem_close req = { bo->handle }; > >>>> > >>>> - pthread_mutex_lock(&nvdev->lock); > >>>> - if (nvbo->name) { > >>>> + if (nvbo->head.next) { > >>>> + pthread_mutex_lock(&nvdev->lock); > >>>> if (atomic_read(&nvbo->refcnt) == 0) { > >>>> DRMLISTDEL(&nvbo->head); > >>>> /* > >>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo *bo) > >>>> } > >>>> pthread_mutex_unlock(&nvdev->lock); > >>>> } else { > >>>> - DRMLISTDEL(&nvbo->head); > >>>> - pthread_mutex_unlock(&nvdev->lock); > >>>> drmIoctl(bo->device->fd, DRM_IOCTL_GEM_CLOSE, &req); > >>>> } > >>>> if (bo->map) > >>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct nouveau_device *dev, > uint32_t > >> flags, uint32_t align, > >>>> uint64_t size, union nouveau_bo_config *config, > >>>> struct nouveau_bo **pbo) > >>>> { > >>>> - struct nouveau_device_priv *nvdev = nouveau_device(dev); > >>>> struct nouveau_bo_priv *nvbo = calloc(1, sizeof(*nvbo)); > >>>> struct nouveau_bo *bo = &nvbo->base; > >>>> int ret; > >>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct nouveau_device *dev, > >> uint32_t flags, uint32_t align, > >>>> return ret; > >>>> } > >>>> > >>>> - pthread_mutex_lock(&nvdev->lock); > >>>> - DRMLISTADD(&nvbo->head, &nvdev->bo_list); > >>>> - pthread_mutex_unlock(&nvdev->lock); > >>>> - > >>>> *pbo = bo; > >>>> return 0; > >>>> } > >>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct nouveau_device > *dev, > >> uint32_t handle, > >>>> return -ENOMEM; > >>>> } > >>>> > >>>> +static void > >>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo) > >>>> +{ > >>>> + if (!nvbo->head.next) { > >>>> + struct nouveau_device_priv *nvdev > >> nouveau_device(nvbo->base.device); > >>>> + pthread_mutex_lock(&nvdev->lock); > >>>> + if (!nvbo->head.next) > > I guess the bo_make_global call is not particularly sensitive, so > >> removing's fine with me. > >> > > I would be worried about the duplicated check. It seems like a "smart" > > compiler could cache the value of "nvbo->head.next" (unless marked as > > volatile), rendering the second if() useless. If the field is marked > > volatile, then of course, this does not apply. > > > In that case I would be worried about a compiler that ignores the acquire > semantics of pthread_mutex_lock.. >This isn't about the acquire semantics here, because you're reading it on both sides of the lock. The compiler need not reorder the reads at all, merely just save the value of the first read. Unless the compiler knows that the value can change after the lock is acquired, it can simply cache the result. Consider the nearly identical transformation: const int condition = (!nvbo->head.next); if(condition){ pthread_mutex_lock(...); if(condition){ //redundant, already can assert it is true ... I just wouldn't risk it. The compiler probably makes this transformation already, but even if it doesn't, it *can*, and that's a bad thing IMO. Patrick> > ~Maarten > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20150225/d3cd53d0/attachment-0001.html>
Maarten Lankhorst
2015-Feb-25 15:43 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
Op 25-02-15 om 16:28 schreef Patrick Baggett:> On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst < > maarten.lankhorst at ubuntu.com> wrote: > >> Op 25-02-15 om 16:04 schreef Patrick Baggett: >>> On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst < >>> maarten.lankhorst at ubuntu.com> wrote: >>> >>>> Op 25-02-15 om 15:11 schreef Emil Velikov: >>>>> On 24 February 2015 at 09:01, Maarten Lankhorst >>>>> <maarten.lankhorst at ubuntu.com> wrote: >>>>>> Only add wrapped bo's and bo's that have been exported through flink >> or >>>> dma-buf. >>>>>> This avoids a lock in the common case, and decreases traversal needed >>>> for importing >>>>>> a dma-buf or flink. >>>>>> >>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at ubuntu.com> >>>>>> --- >>>>>> nouveau/nouveau.c | 47 >> +++++++++++++++++++++++------------------------ >>>>>> 1 file changed, 23 insertions(+), 24 deletions(-) >>>>>> >>>>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c >>>>>> index 1c723b9..d411523 100644 >>>>>> --- a/nouveau/nouveau.c >>>>>> +++ b/nouveau/nouveau.c >>>>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo *bo) >>>>>> struct nouveau_bo_priv *nvbo = nouveau_bo(bo); >>>>>> struct drm_gem_close req = { bo->handle }; >>>>>> >>>>>> - pthread_mutex_lock(&nvdev->lock); >>>>>> - if (nvbo->name) { >>>>>> + if (nvbo->head.next) { >>>>>> + pthread_mutex_lock(&nvdev->lock); >>>>>> if (atomic_read(&nvbo->refcnt) == 0) { >>>>>> DRMLISTDEL(&nvbo->head); >>>>>> /* >>>>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo *bo) >>>>>> } >>>>>> pthread_mutex_unlock(&nvdev->lock); >>>>>> } else { >>>>>> - DRMLISTDEL(&nvbo->head); >>>>>> - pthread_mutex_unlock(&nvdev->lock); >>>>>> drmIoctl(bo->device->fd, DRM_IOCTL_GEM_CLOSE, &req); >>>>>> } >>>>>> if (bo->map) >>>>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct nouveau_device *dev, >> uint32_t >>>> flags, uint32_t align, >>>>>> uint64_t size, union nouveau_bo_config *config, >>>>>> struct nouveau_bo **pbo) >>>>>> { >>>>>> - struct nouveau_device_priv *nvdev = nouveau_device(dev); >>>>>> struct nouveau_bo_priv *nvbo = calloc(1, sizeof(*nvbo)); >>>>>> struct nouveau_bo *bo = &nvbo->base; >>>>>> int ret; >>>>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct nouveau_device *dev, >>>> uint32_t flags, uint32_t align, >>>>>> return ret; >>>>>> } >>>>>> >>>>>> - pthread_mutex_lock(&nvdev->lock); >>>>>> - DRMLISTADD(&nvbo->head, &nvdev->bo_list); >>>>>> - pthread_mutex_unlock(&nvdev->lock); >>>>>> - >>>>>> *pbo = bo; >>>>>> return 0; >>>>>> } >>>>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct nouveau_device >> *dev, >>>> uint32_t handle, >>>>>> return -ENOMEM; >>>>>> } >>>>>> >>>>>> +static void >>>>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo) >>>>>> +{ >>>>>> + if (!nvbo->head.next) { >>>>>> + struct nouveau_device_priv *nvdev >>>> nouveau_device(nvbo->base.device); >>>>>> + pthread_mutex_lock(&nvdev->lock); >>>>>> + if (!nvbo->head.next) >>> I guess the bo_make_global call is not particularly sensitive, so >>>> removing's fine with me. >>>> >>> I would be worried about the duplicated check. It seems like a "smart" >>> compiler could cache the value of "nvbo->head.next" (unless marked as >>> volatile), rendering the second if() useless. If the field is marked >>> volatile, then of course, this does not apply. >>> >> In that case I would be worried about a compiler that ignores the acquire >> semantics of pthread_mutex_lock.. >> > This isn't about the acquire semantics here, because you're reading it on > both sides of the lock. The compiler need not reorder the reads at all, > merely just save the value of the first read. Unless the compiler knows > that the value can change after the lock is acquired, it can simply cache > the result. Consider the nearly identical transformation: > > const int condition = (!nvbo->head.next); > if(condition){ > pthread_mutex_lock(...); > if(condition){ //redundant, already can assert it is true > ... >So you're saying a compiler can optimize: - statement with memory access - read memory barrier - statement with memory access To drop the second statement? I would worry about your definition of memory barrier then.. ~Maarten
Emil Velikov
2015-Feb-25 16:14 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
On 25 February 2015 at 15:43, Maarten Lankhorst <maarten.lankhorst at ubuntu.com> wrote:> Op 25-02-15 om 16:28 schreef Patrick Baggett: >> On Wed, Feb 25, 2015 at 9:07 AM, Maarten Lankhorst < >> maarten.lankhorst at ubuntu.com> wrote: >> >>> Op 25-02-15 om 16:04 schreef Patrick Baggett: >>>> On Wed, Feb 25, 2015 at 8:59 AM, Maarten Lankhorst < >>>> maarten.lankhorst at ubuntu.com> wrote: >>>> >>>>> Op 25-02-15 om 15:11 schreef Emil Velikov: >>>>>> On 24 February 2015 at 09:01, Maarten Lankhorst >>>>>> <maarten.lankhorst at ubuntu.com> wrote: >>>>>>> Only add wrapped bo's and bo's that have been exported through flink >>> or >>>>> dma-buf. >>>>>>> This avoids a lock in the common case, and decreases traversal needed >>>>> for importing >>>>>>> a dma-buf or flink. >>>>>>> >>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at ubuntu.com> >>>>>>> --- >>>>>>> nouveau/nouveau.c | 47 >>> +++++++++++++++++++++++------------------------ >>>>>>> 1 file changed, 23 insertions(+), 24 deletions(-) >>>>>>> >>>>>>> diff --git a/nouveau/nouveau.c b/nouveau/nouveau.c >>>>>>> index 1c723b9..d411523 100644 >>>>>>> --- a/nouveau/nouveau.c >>>>>>> +++ b/nouveau/nouveau.c >>>>>>> @@ -349,8 +349,8 @@ nouveau_bo_del(struct nouveau_bo *bo) >>>>>>> struct nouveau_bo_priv *nvbo = nouveau_bo(bo); >>>>>>> struct drm_gem_close req = { bo->handle }; >>>>>>> >>>>>>> - pthread_mutex_lock(&nvdev->lock); >>>>>>> - if (nvbo->name) { >>>>>>> + if (nvbo->head.next) { >>>>>>> + pthread_mutex_lock(&nvdev->lock); >>>>>>> if (atomic_read(&nvbo->refcnt) == 0) { >>>>>>> DRMLISTDEL(&nvbo->head); >>>>>>> /* >>>>>>> @@ -365,8 +365,6 @@ nouveau_bo_del(struct nouveau_bo *bo) >>>>>>> } >>>>>>> pthread_mutex_unlock(&nvdev->lock); >>>>>>> } else { >>>>>>> - DRMLISTDEL(&nvbo->head); >>>>>>> - pthread_mutex_unlock(&nvdev->lock); >>>>>>> drmIoctl(bo->device->fd, DRM_IOCTL_GEM_CLOSE, &req); >>>>>>> } >>>>>>> if (bo->map) >>>>>>> @@ -379,7 +377,6 @@ nouveau_bo_new(struct nouveau_device *dev, >>> uint32_t >>>>> flags, uint32_t align, >>>>>>> uint64_t size, union nouveau_bo_config *config, >>>>>>> struct nouveau_bo **pbo) >>>>>>> { >>>>>>> - struct nouveau_device_priv *nvdev = nouveau_device(dev); >>>>>>> struct nouveau_bo_priv *nvbo = calloc(1, sizeof(*nvbo)); >>>>>>> struct nouveau_bo *bo = &nvbo->base; >>>>>>> int ret; >>>>>>> @@ -397,10 +394,6 @@ nouveau_bo_new(struct nouveau_device *dev, >>>>> uint32_t flags, uint32_t align, >>>>>>> return ret; >>>>>>> } >>>>>>> >>>>>>> - pthread_mutex_lock(&nvdev->lock); >>>>>>> - DRMLISTADD(&nvbo->head, &nvdev->bo_list); >>>>>>> - pthread_mutex_unlock(&nvdev->lock); >>>>>>> - >>>>>>> *pbo = bo; >>>>>>> return 0; >>>>>>> } >>>>>>> @@ -457,6 +450,18 @@ nouveau_bo_wrap_locked(struct nouveau_device >>> *dev, >>>>> uint32_t handle, >>>>>>> return -ENOMEM; >>>>>>> } >>>>>>> >>>>>>> +static void >>>>>>> +nouveau_bo_make_global(struct nouveau_bo_priv *nvbo) >>>>>>> +{ >>>>>>> + if (!nvbo->head.next) { >>>>>>> + struct nouveau_device_priv *nvdev >>>>> nouveau_device(nvbo->base.device); >>>>>>> + pthread_mutex_lock(&nvdev->lock); >>>>>>> + if (!nvbo->head.next) >>>> I guess the bo_make_global call is not particularly sensitive, so >>>>> removing's fine with me. >>>>> >>>> I would be worried about the duplicated check. It seems like a "smart" >>>> compiler could cache the value of "nvbo->head.next" (unless marked as >>>> volatile), rendering the second if() useless. If the field is marked >>>> volatile, then of course, this does not apply. >>>> >>> In that case I would be worried about a compiler that ignores the acquire >>> semantics of pthread_mutex_lock.. >>> >> This isn't about the acquire semantics here, because you're reading it on >> both sides of the lock. The compiler need not reorder the reads at all, >> merely just save the value of the first read. Unless the compiler knows >> that the value can change after the lock is acquired, it can simply cache >> the result. Consider the nearly identical transformation: >> >> const int condition = (!nvbo->head.next); >> if(condition){ >> pthread_mutex_lock(...); >> if(condition){ //redundant, already can assert it is true >> ... >> > So you're saying a compiler can optimize: > > - statement with memory access > - read memory barrier > - statement with memory access > > To drop the second statement? I would worry about your definition of memory barrier then.. >Patrick thanks for elaborating my concern in a more complete manner. Maarten, I would assume that the overhead of dropping the first if statement will be negligible wrt the time spent debugging when the compiler does the silly thing. Afaict even with a single if, your patch does reduce the time spend in libdrm_nouveau quite a bit. That said, it's up-to you and others to draw the final call. Cheers, Emil
Patrick Baggett
2015-Feb-25 16:28 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
So you're saying a compiler can optimize:> > - statement with memory access > - read memory barrier > - statement with memory access > > To drop the second statement? I would worry about your definition of > memory barrier then.. >This is tricky, but I think you're mixing up the general case with the actual code you have. In general, you are pointing to this example: - Read memory location 1 (asm level) - RMB - Read memory location 2 (asm level) In this case, you are precisely correct, the compiler will not reorder the reads, and the CPU won't either. What you actually have is this: - Access memory location 1 (C code) - RMB - Acesss memory location 1 (C code) I say "access" here because looking at the C code, not assembly. You have this: read nvbo->head.next RMB read nvbo->head.next This only works if you can guarantee that the compiler will actually generate a MOV instruction for the second read. If the variable isn't marked as "volatile", the compiler has no requirement to. Barriers control ordering and when a variable can be loaded, but you've already loaded it. You need to reload the variable, but you haven't shown the compiler a reason why it should. "Volatile" is a reason why. I tested this code on gcc: if(globalFlag) { math... if(globalFlag) { more math } } It only tests once. If you replace "math" with printf(), it retests the value of globalFlag. Printf(), which has absolutely no barrier semantics at all. In other words, GCC conservatively will assume any function call will require a reload because it could modify anything. The code will work as-is. Obviously, pthread_mutex_lock() does not modify the value of `nvbo->head.next`. Once GCC knows that (and maybe other compilers already do), perhaps with LTO, it doesn't have to generate a second load. If you ever switch to an inlined lock or different compiler that understands that, your code will break. I suggest you simply do the right thing the first time and cast the pointer to volatile since you're treating the value as if it is volatile (i.e. changing outside of the scope of the compiler's knowledge). Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20150225/83c32418/attachment.html>
Seemingly Similar Threads
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.