Maarten Lankhorst
2015-Feb-25 17:13 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
Hey,

On 25-02-15 18:05, Ilia Mirkin wrote:
> On Wed, Feb 25, 2015 at 11:59 AM, Patrick Baggett
> <baggett.patrick at gmail.com> wrote:
>>> If code like
>>>
>>>     x = *a;
>>>     pthread_mutex_lock or unlock or __memory_barrier()
>>>     y = *a;
>>>
>>> doesn't cause a to get loaded twice, then the compiler's in serious
>>> trouble. Basically functions like pthread_mutex_lock imply that all
>>> memory is changed to the compiler, and thus need to be reloaded.
>>>
>> Well, I've said before and I might be alone, but I disagree with you. The
>> compiler is under no requirement to reload (*a) because a lock was changed.
>> It does, but it doesn't have to. It's fine if you guys don't want to change
>> it. It may never be a problem with gcc.
>>
>> This is the definition of pthread_mutex_lock() in glibc. There aren't any
>> magic hints that this invalidates memory:
>>
>>     extern int pthread_mutex_lock (pthread_mutex_t *__mutex)
>>          __THROWNL __nonnull ((1));
>>
>> __THROWNL is __attribute__((nothrow)).
>
> Hm, this is actually a little worrying. Maarten, thoughts? I would
> have assumed there'd be a __attribute__((some_magic_thing)) in there.

In general things don't get optimized across function calls, except in
case of inlinable functions.

And for compiler attributes it's the opposite: __attribute__((const)) and
__attribute__((pure)) can be used to indicate some kind of safety to optimize
across functions.

https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

~Maarten
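
The point about attributes can be illustrated with a minimal sketch. It
assumes GCC (or a compatible compiler), and every name in it (shared_value,
read_shared, square, set_shared and the helpers) is hypothetical and not taken
from the patch under discussion. Without an attribute, a call to a function
the compiler cannot see into is treated as possibly changing memory; pure and
const are promises that relax that assumption.

```c
#include <assert.h>

static int shared_value;   /* hypothetical global state */

/* pure: no side effects; the result may depend on memory it reads. */
__attribute__((pure)) int read_shared(void)
{
    return shared_value;
}

/* const: the result depends only on the arguments, never on memory. */
__attribute__((const)) int square(int v)
{
    return v * v;
}

/* No attribute: an ordinary function that modifies memory. */
void set_shared(int v)
{
    shared_value = v;
}

/* Two calls to a pure function with no store in between:
 * the compiler is allowed to reuse the first result. */
int sum_two_reads(void)
{
    int a = read_shared();
    int b = read_shared();
    return a + b;
}

/* A store sits between the two calls, so the second call
 * has to observe the new value. */
int delta_across_store(int v)
{
    int before = read_shared();
    set_shared(v);
    int after = read_shared();
    return after - before;
}

/* The attributes change what the optimizer may do, not the results. */
void attributes_self_check(void)
{
    set_shared(3);
    assert(sum_two_reads() == 6);
    assert(delta_across_store(10) == 7);
    assert(square(4) == 16);
}
```

Because the definitions here are visible in the same file, the compiler can
inline them regardless of attributes; the attributes matter most on
declarations of functions defined elsewhere, which is the case for
pthread_mutex_lock().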
Patrick Baggett
2015-Feb-25 17:26 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
> In general things don't get optimized across function calls, except in
> case of inlinable functions.
>
> And for compiler attributes it's the opposite,__attribute__((const)) and
> __attribute((pure)) can be used to indicate some kind of safety to optimize
> across functions.
>
> https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

This is true, but LTO increases the compiler's ability to make these sorts
of optimizations across function calls and even C source file boundaries
without you needing to explicitly mark functions as such.
Maarten Lankhorst
2015-Feb-25 17:55 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
On 25-02-15 18:26, Patrick Baggett wrote:
>> In general things don't get optimized across function calls, except in
>> case of inlinable functions.
>>
>> And for compiler attributes it's the opposite,__attribute__((const)) and
>> __attribute((pure)) can be used to indicate some kind of safety to optimize
>> across functions.
>>
>> https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
>
> This is true, but LTO increases the compiler's ability to make these sorts
> of optimizations across function calls and even C source file boundaries
> without you needing to explicitly mark functions as such.

Even if pthread_mutex_lock was completely inlined there would still be an
asm volatile("" ::: "memory") in there acting as a complete memory barrier
to the compiler.

Create a file called test.c, abusing the fact that gcc can't handle pointers
well so it won't get reduced to a constant return value:

    int x;
    int *px = &x;

    int main() {
        if (*px == 1)
            return 1;

        asm volatile("" ::: "memory");

        if (*px == 1)
            return 1;

        return -1;
    }

Now compile with gcc test.c -O3 -fwhole-program, and run objdump -d a.out:

  400400:  83 3d 49 0c 20 00 01    cmpl   $0x1,0x200c49(%rip)   # 601050 <x>
  400407:  74 09                   je     400412 <main+0x12>
  400409:  83 3d 40 0c 20 00 01    cmpl   $0x1,0x200c40(%rip)   # 601050 <x>
  400410:  75 06                   jne    400418 <main+0x18>
  400412:  b8 01 00 00 00          mov    $0x1,%eax
  400417:  c3                      retq
  400418:  83 c8 ff                or     $0xffffffff,%eax
  40041b:  c3                      retq

Hey, my second check didn't get compiled away... magic.

And to show that a random function call does the same, replace the barrier
with random():

0000000000400440 <main>:
  400440:  83 3d 09 0c 20 00 01    cmpl   $0x1,0x200c09(%rip)   # 601050 <x>
  400447:  74 1b                   je     400464 <main+0x24>
  400449:  50                      push   %rax
  40044a:  31 c0                   xor    %eax,%eax
  40044c:  e8 df ff ff ff          callq  400430 <random@plt>
  400451:  83 3d f8 0b 20 00 01    cmpl   $0x1,0x200bf8(%rip)   # 601050 <x>
  400458:  b8 01 00 00 00          mov    $0x1,%eax
  40045d:  75 0b                   jne    40046a <main+0x2a>
  40045f:  48 83 c4 08             add    $0x8,%rsp
  400463:  c3                      retq
  400464:  b8 01 00 00 00          mov    $0x1,%eax
  400469:  c3                      retq
  40046a:  83 c8 ff                or     $0xffffffff,%eax
  40046d:  eb f0                   jmp    40045f <main+0x1f>

And just to be thorough, showing what happens without function call or
barrier:

0000000000400400 <main>:
  400400:  8b 05 4a 0c 20 00       mov    0x200c4a(%rip),%eax   # 601050 <x>
  400406:  ba ff ff ff ff          mov    $0xffffffff,%edx
  40040b:  83 f8 01                cmp    $0x1,%eax
  40040e:  0f 45 c2                cmovne %edx,%eax
  400411:  c3                      retq

~Maarten
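
For anyone wanting to repeat the experiment, the three variants can be kept
side by side in one file. This is only a sketch under a few assumptions: the
function names and the compiler_barrier() macro are made up for illustration,
rand() stands in for the random() call used in the message, and the generated
code will differ between compiler versions and flags.

```c
#include <assert.h>
#include <stdlib.h>

int x;
int *px = &x;   /* x is read through a pointer, as in the message */

/* Compiler-only barrier: emits no instructions, but GCC has to assume
 * that any memory may have changed across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Variant 1: barrier between the two checks. */
int check_with_barrier(void)
{
    if (*px == 1)
        return 1;
    compiler_barrier();
    if (*px == 1)       /* expected to be a second load of x */
        return 1;
    return -1;
}

/* Variant 2: a call to an external function between the two checks. */
int check_with_call(void)
{
    if (*px == 1)
        return 1;
    (void)rand();       /* the compiler cannot see what this modifies */
    if (*px == 1)       /* expected to be a second load of x */
        return 1;
    return -1;
}

/* Variant 3: nothing in between; the two checks may be merged. */
int check_plain(void)
{
    if (*px == 1)
        return 1;
    if (*px == 1)
        return 1;
    return -1;
}

/* All three variants return the same values in a single-threaded run;
 * only the number of loads in the generated code differs. */
void self_check(void)
{
    x = 0;
    assert(check_with_barrier() == -1);
    assert(check_with_call() == -1);
    assert(check_plain() == -1);
    x = 1;
    assert(check_with_barrier() == 1);
    assert(check_with_call() == 1);
    assert(check_plain() == 1);
}
```

Compiling this with -O3 and inspecting the output of objdump -d should show
whether x is loaded once or twice in each function.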