Patrick Baggett
2015-Feb-25 16:28 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
So you're saying a compiler can optimize:> > - statement with memory access > - read memory barrier > - statement with memory access > > To drop the second statement? I would worry about your definition of > memory barrier then.. >This is tricky, but I think you're mixing up the general case with the actual code you have. In general, you are pointing to this example: - Read memory location 1 (asm level) - RMB - Read memory location 2 (asm level) In this case, you are precisely correct, the compiler will not reorder the reads, and the CPU won't either. What you actually have is this: - Access memory location 1 (C code) - RMB - Acesss memory location 1 (C code) I say "access" here because looking at the C code, not assembly. You have this: read nvbo->head.next RMB read nvbo->head.next This only works if you can guarantee that the compiler will actually generate a MOV instruction for the second read. If the variable isn't marked as "volatile", the compiler has no requirement to. Barriers control ordering and when a variable can be loaded, but you've already loaded it. You need to reload the variable, but you haven't shown the compiler a reason why it should. "Volatile" is a reason why. I tested this code on gcc: if(globalFlag) { math... if(globalFlag) { more math } } It only tests once. If you replace "math" with printf(), it retests the value of globalFlag. Printf(), which has absolutely no barrier semantics at all. In other words, GCC conservatively will assume any function call will require a reload because it could modify anything. The code will work as-is. Obviously, pthread_mutex_lock() does not modify the value of `nvbo->head.next`. Once GCC knows that (and maybe other compilers already do), perhaps with LTO, it doesn't have to generate a second load. If you ever switch to an inlined lock or different compiler that understands that, your code will break. I suggest you simply do the right thing the first time and cast the pointer to volatile since you're treating the value as if it is volatile (i.e. changing outside of the scope of the compiler's knowledge). Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20150225/83c32418/attachment.html>
Ilia Mirkin
2015-Feb-25 16:35 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
pthread_mutex_lock had *better* imply a compiler barrier across which code can't be moved... which is very different from the printf case where it might have done it due to register pressure or who knows what. If code like x = *a; pthread_mutex_lock or unlock or __memory_barrier() y = *a; doesn't cause a to get loaded twice, then the compiler's in serious trouble. Basically functions like pthread_mutex_lock imply that all memory is changed to the compiler, and thus need to be reloaded. -ilia On Wed, Feb 25, 2015 at 11:28 AM, Patrick Baggett <baggett.patrick at gmail.com> wrote:> So you're saying a compiler can optimize: >> >> >> - statement with memory access >> - read memory barrier >> - statement with memory access >> >> To drop the second statement? I would worry about your definition of >> memory barrier then.. > > This is tricky, but I think you're mixing up the general case with the > actual code you have. In general, you are pointing to this example: > > - Read memory location 1 (asm level) > - RMB > - Read memory location 2 (asm level) > > In this case, you are precisely correct, the compiler will not reorder the > reads, and the CPU won't either. > > What you actually have is this: > - Access memory location 1 (C code) > - RMB > - Acesss memory location 1 (C code) > > I say "access" here because looking at the C code, not assembly. You have > this: > > read nvbo->head.next > RMB > read nvbo->head.next > > This only works if you can guarantee that the compiler will actually > generate a MOV instruction for the second read. If the variable isn't marked > as "volatile", the compiler has no requirement to. Barriers control ordering > and when a variable can be loaded, but you've already loaded it. You need to > reload the variable, but you haven't shown the compiler a reason why it > should. "Volatile" is a reason why. > > I tested this code on gcc: > > if(globalFlag) { > math... > if(globalFlag) { > more math > } > } > > It only tests once. If you replace "math" with printf(), it retests the > value of globalFlag. Printf(), which has absolutely no barrier semantics at > all. In other words, GCC conservatively will assume any function call will > require a reload because it could modify anything. The code will work as-is. > > Obviously, pthread_mutex_lock() does not modify the value of > `nvbo->head.next`. Once GCC knows that (and maybe other compilers already > do), perhaps with LTO, it doesn't have to generate a second load. If you > ever switch to an inlined lock or different compiler that understands that, > your code will break. I suggest you simply do the right thing the first time > and cast the pointer to volatile since you're treating the value as if it is > volatile (i.e. changing outside of the scope of the compiler's knowledge). > > Patrick > > > > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/nouveau >
Patrick Baggett
2015-Feb-25 16:59 UTC
[Nouveau] [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
On Wed, Feb 25, 2015 at 10:35 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:> pthread_mutex_lock had *better* imply a compiler barrier across which > code can't be moved... which is very different from the printf case > where it might have done it due to register pressure or who knows > what. >In the dummy function, register pressure was certainly not an issue, but point taken. Still, a compiler barrier prevents reordering like: x = *a; y = *a; pthread_mutex_lock() This changes the external visibility ordering and is certainly NOT ok. However, I contend that it doesn't stop: const int loaded = *a; x = loaded; pthread_mutex_lock(); y = loaded; Because this doesn't change the external visibility behavior, it just changes whether a value is reloaded. It doesn't break the ordering of a memory barrier at all.> If code like > > x = *a; > pthread_mutex_lock or unlock or __memory_barrier() > y = *a; > > doesn't cause a to get loaded twice, then the compiler's in serious > trouble. Basically functions like pthread_mutex_lock imply that all > memory is changed to the compiler, and thus need to be reloaded. > > Well, I've said before and I might be alone, but I disagree with you. Thecompiler is under no requirement to reload (*a) because a lock was changed. It does, but it doesn't have to. It's fine if you guys don't want to change it. It may never be a problem with gcc. This is the definition of pthread_mutex_lock() in glibc. There aren't any magic hints that this invalidates memory: extern int pthread_mutex_lock (pthread_mutex_t *__mutex) __THROWNL __nonnull ((1)); THOWNL is attribute((nothrow)). -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/nouveau/attachments/20150225/1d46e029/attachment.html>
Possibly Parallel Threads
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.
- [PATCH 2/2] nouveau: Do not add most bo's to the global bo list.