On Jul 23, 2009, at 1:59 AM, Mark Shannon wrote:> Hi Dan, > > What you are proposing is a major change in the semantics of llvm. > > You are taking certain uses of an instruction that have well defined > behaviour and undefining them. > > Have you made any estimate of how many peoples' code this may or may > not > break? > > I think this is a *very* bad idea.Hi Mark, I estimate this will break very little. I apologize for being unclear; In practical terms, this proposal isn't a big change. It's a conceptual framework for describing assumptions that the optimizer is already making in many cases. It will help us resolve some cases where the optimizer's assumptions are contradictory. And, it sets the stage for a new flavor of getelementptr which will be more permissive than the current getelementptr.> > Let me make some more detailed comments: > > Dan Gohman wrote: >> Hello, >> >> I'm working on refining the definition of getelementptr (GEP) to >> clarify its semantics and to allow front-ends to provide additional >> information to optimization passes. >> >> To help support this, I'd like to propose the following rule: >> >> Any memory access must be done though a pointer value associated >> with with address range of the memory access, otherwise the behavior >> is undefined. >> >> "associated with" is defined as follows: >> >> - A pointer value formed from a getelementptr instruction is >> associated with the addresses associated with the first operand of >> the getelementptr. >> - An addresses of a global variable is associated with the address >> range of the variable's storage. >> - The result value of an allocation instruction is associated with >> the address range of the allocated storage. >> - A null pointer is associated with no addresses. >> - A pointer value formed by a ptrtoint is associated with all >> address >> ranges of all pointer values that contribute (directly or >> indirectly) to the computation of the pointer's value. >> - An integer value other than zero may be associated with address >> ranges allocated through mechanisms other than those provided by >> LLVM; such ranges shall not overlap with any ranges of address >> allocated by mechanisms provided by LLVM. > I notice no mention of bitcasts on pointers here.This was an oversight. Bitcasts are simple: - The result value of a bitcast is associated with all addresses associated with the operand of the bitcast.>> For example, in an instruction like this: >> >> %p = getelementptr [4 x i8]* @A, i32 0, i32 %huge >> >> if %huge is beyond the size of @A, %p could potentially point into >> some object other than @A. This rule says that it's not valid to use >> %p to access that other object. %p would only be usable for memory >> accesses when it points within @A. >> >> C and C-like front-ends already obey this rule, since in C pointer >> arithmetic results must remain within an allocated object (or one >> past the end), which is more strict. > Lots of C code passes pointers around which might point to the > middle of > an array, say for searching. That means that negative indices could > still remain within an allocated object.This proposal supports negative GEP indices. In fact, negative GEP indices are one of the corner cases that the optimizer today handles incorrectly, though only in obscure ways that usually don't break real code. This proposal will actually let LLVM support them in a more robust manner.> >> >> Front-ends that do crazy things to pointers may need to use >> ptrtoint+arithmetic+inttoptr instead of getelementptr. If the >> target-independent properties of getelementptr are needed, the >> "getelementptr null" trick may be useful. > > Who says pointer arithmetic is crazy? > I create llvm code from an IR that has 2 types of pointers, > heap-pointers and non-heap pointers, and is safe for GC, but prevents > alias analysis. > So, I don't use any alias analysis stuff when doing optimisation, no > big > deal. > Suddenly, my correct and working code will become "undefined" :(I don't see anything problematic in your description here. Alias analysis may not be especially profitable in these circumstances, but I wouldn't guess that it would be unsafe. Dan
Hi Dan, Thanks for reply, it seems that Dan Gohman wrote:> On Jul 23, 2009, at 1:59 AM, Mark Shannon wrote: > >> Hi Dan, >> >> What you are proposing is a major change in the semantics of llvm. >> >> You are taking certain uses of an instruction that have well defined >> behaviour and undefining them. >> >> Have you made any estimate of how many peoples' code this may or may >> not >> break? >> >> I think this is a *very* bad idea. > > Hi Mark, > > I estimate this will break very little. I apologize for being unclear;No apology needed> In practical terms, this proposal isn't a big change. It's a conceptual > framework for describing assumptions that the optimizer is already > making in many cases. It will help us resolve some cases where the > optimizer's assumptions are contradictory. And, it sets the stage for aI'm curious what these assumptions are, but its not important.> new flavor of getelementptr which will be more permissive than the > current getelementptr. > >> Let me make some more detailed comments: >> >> Dan Gohman wrote: >>> Hello, >>> >>> I'm working on refining the definition of getelementptr (GEP) to >>> clarify its semantics and to allow front-ends to provide additional >>> information to optimization passes. >>> >>> To help support this, I'd like to propose the following rule: >>> >>> Any memory access must be done though a pointer value associated >>> with with address range of the memory access, otherwise the behavior >>> is undefined. >>> >>> "associated with" is defined as follows: >>> >>> - A pointer value formed from a getelementptr instruction is >>> associated with the addresses associated with the first operand of >>> the getelementptr. >>> - An addresses of a global variable is associated with the address >>> range of the variable's storage. >>> - The result value of an allocation instruction is associated with >>> the address range of the allocated storage. >>> - A null pointer is associated with no addresses. >>> - A pointer value formed by a ptrtoint is associated with all >>> address >>> ranges of all pointer values that contribute (directly or >>> indirectly) to the computation of the pointer's value. >>> - An integer value other than zero may be associated with address >>> ranges allocated through mechanisms other than those provided by >>> LLVM; such ranges shall not overlap with any ranges of address >>> allocated by mechanisms provided by LLVM. >> I notice no mention of bitcasts on pointers here. > > This was an oversight. Bitcasts are simple: > > - The result value of a bitcast is associated with all addresses > associated with the operand of the bitcast.This makes a big difference :)> >>> For example, in an instruction like this: >>> >>> %p = getelementptr [4 x i8]* @A, i32 0, i32 %huge >>> >>> if %huge is beyond the size of @A, %p could potentially point into >>> some object other than @A. This rule says that it's not valid to use >>> %p to access that other object. %p would only be usable for memory >>> accesses when it points within @A. >>> >>> C and C-like front-ends already obey this rule, since in C pointer >>> arithmetic results must remain within an allocated object (or one >>> past the end), which is more strict. >> Lots of C code passes pointers around which might point to the >> middle of >> an array, say for searching. That means that negative indices could >> still remain within an allocated object. > > This proposal supports negative GEP indices. In fact, negative GEP > indices are one of the corner cases that the optimizer today handles > incorrectly, though only in obscure ways that usually don't break real > code. This proposal will actually let LLVM support them in a more > robust manner.It doesn't appear to have broken mine. My compiler could potentially use negative indices in GEPs.> >>> Front-ends that do crazy things to pointers may need to use >>> ptrtoint+arithmetic+inttoptr instead of getelementptr. If the >>> target-independent properties of getelementptr are needed, the >>> "getelementptr null" trick may be useful. >> Who says pointer arithmetic is crazy? >> I create llvm code from an IR that has 2 types of pointers, >> heap-pointers and non-heap pointers, and is safe for GC, but prevents >> alias analysis. >> So, I don't use any alias analysis stuff when doing optimisation, no >> big >> deal. >> Suddenly, my correct and working code will become "undefined" :( > > I don't see anything problematic in your description here. Alias > analysis > may not be especially profitable in these circumstances, but I wouldn't > guess that it would be unsafe.Any tests I can do to find out if its safe? One last question. If I were to turn off all optimisations, would that ensure that the final machine code semantics with the new GEP would be unchanged from the present GEP? In other words, will any of this affect to the code-generator? Cheers, Mark.
On Jul 23, 2009, at 10:55 AM, Mark Shannon wrote:> >> In practical terms, this proposal isn't a big change. It's a >> conceptual >> >> framework for describing assumptions that the optimizer is already >> >> making in many cases. It will help us resolve some cases where the >> >> optimizer's assumptions are contradictory. And, it sets the stage >> for a >> > I'm curious what these assumptions are, but its not important.My first email aimed to make them explicit. Here's another example: @g = global i32 %a = alloca i32 %b = malloc i32 %c = inttoptr 777770 to i32* %aa = getelementptr i32* %a, i64 %i %bb = getelementptr i32* %b, i64 %j %cc = getelementptr i32* %c, i64 %k %gg = getelementptr i32* @g, i64 %l store i32 0, i32* %aa store i32 1, i32* %bb store i32 2, i32* %cc store i32 3, i32* %gg Here, %i, %j, %k, and %l are totally arbitrary values that the optimizer knows nothing about. The optimizer will assume that all four stores are to different memory locations. In theory, some truly crazy code could figure out what values to use for %i, %j, %k, and/or %l to cause the getelementptrs to overindex recklessly through the address space and break this assumption. This the kind of thing that the optimizer assumes doesn't happen.>> >> >> I don't see anything problematic in your description here. Alias >> >> analysis >> >> may not be especially profitable in these circumstances, but I >> wouldn't >> >> guess that it would be unsafe. >> > Any tests I can do to find out if its safe?I don't know of any. You can run GVN and LICM and so on to see if they break anything, but of course that wouldn't prove that there aren't any problems.> One last question. > If I were to turn off all optimisations, would that ensure that the > final machine code semantics with the new GEP would be unchanged from > the present GEP? > In other words, will any of this affect to the code-generator?It probably will at some point, to use alias-analysis info for scheduling and other things. After all, CodeGen is just a highly specialized optimizer ;-). Dan