In the language I am working on, there are cases where the call flow graph can affect the type of a variable. A simple example is a "nullable" type (a pointer which is allowed to be NULL), which can be converted into a non-nullable type via an if-test. So for example if x is nullable, and I test x != NULL, then inside the conditional block the type of x is no longer nullable. Nullable types behave slightly differently (and produce less efficient code) than non-nullable types. For example, a downcast to a nullable type is a dynamic cast (because if the cast fails, the result can be NULL), whereas a downcast to a non-nullable type throws an exception if the cast fails. This kind of analysis is trivial given LLVM's architecture. The problem I have, however, is that all of the high-level type analysis occurs before LLVM ever gets into the picture. LLVM doesn't know anything about the front-end type system. What I am wondering is, will I have to re-invent the same sort of CFG analysis that LLVM does in my frontend, or is there some shortcut? I guess this is really a general compiler-design question rather than one specific to LLVM, but I thought I would ask anyway. -- Talin
On Fri, May 23, 2008 at 11:53 PM, Talin <viridia at gmail.com> wrote:> In the language I am working on, there are cases where the call flow > graph can affect the type of a variable. A simple example is a > "nullable" type (a pointer which is allowed to be NULL), which can be > converted into a non-nullable type via an if-test. So for example if x > is nullable, and I test x != NULL, then inside the conditional block the > type of x is no longer nullable. Nullable types behave slightly > differently (and produce less efficient code) than non-nullable types. > For example, a downcast to a nullable type is a dynamic cast (because if > the cast fails, the result can be NULL), whereas a downcast to a > non-nullable type throws an exception if the cast fails.I'm not completely sure what you're trying to do here, but it seems like you want some special statement that essentially introduces a new variable with the non-nullable type for a child block. Something like ifnonnull P = getPersonForName("Talin") { code that references P }, for a language with C-like syntax. It doesn't make any sense to do this sort of thing at the LLVM level; I doubt you really want optimizers or flow-sensitive analysis messing with the semantics of programs in your language. -Eli
On May 24, 2008, at 02:53, Talin wrote:> In the language I am working on, there are cases where the call flow > graph can affect the type of a variable. A simple example is a > "nullable" type (a pointer which is allowed to be NULL), which can > be converted into a non-nullable type via an if-test. So for example > if x is nullable, and I test x != NULL, then inside the conditional > block the type of x is no longer nullable. Nullable types behave > slightly differently (and produce less efficient code) than non- > nullable types. For example, a downcast to a nullable type is a > dynamic cast (because if the cast fails, the result can be NULL), > whereas a downcast to a non-nullable type throws an exception if the > cast fails. > > This kind of analysis is trivial given LLVM's architecture. The > problem I have, however, is that all of the high-level type analysis > occurs before LLVM ever gets into the picture. LLVM doesn't know > anything about the front-end type system. > > What I am wondering is, will I have to re-invent the same sort of > CFG analysis that LLVM does in my frontend, or is there some > shortcut? I guess this is really a general compiler-design question > rather than one specific to LLVM, but I thought I would ask anyway.I have to agree with earlier poster, I'm not sure altering the type of values is tractable; this seems better as a diagnostic analysis. A way to leverage the type system here might be with 'match' statement like ml's. Consider: type maybe 'a = None | Some 'a let maybe_add x y match x, y with | Some x, Some y -> x + y | Some n, None | None, Some n -> n | None, None -> None Here, the expressions like 'Some x' and 'None' are patterns, which match different variants of the 'maybe int' type. 'Some x' and 'Some y' additionally bind matched subexpressions to names. For the first pattern, I chose to shadow the nullable parameters with the non- nullable matched subexpressions. More to your point, however, the analysis you're proposing seems strongly something that should be done in your language's IR prior to LLVM lowering. Several of LLVM's graph algorithms are templates, so you could possibly specialize a traits type for your compiler's parse tree/AST/IR and potentially leverage some of LLVM's code. — Gordon
Am Samstag, den 24.05.2008, 13:18 -0400 schrieb Gordon Henriksen:> A way to leverage the type system here might be with 'match' > statement > like ml's. Consider: > > type maybe 'a = None | Some 'a > > let maybe_add x y > match x, y with > | Some x, Some y -> x + y > | Some n, None | None, Some n -> n > | None, None -> None > > Here, the expressions like 'Some x' and 'None' are patterns, which > match different variants of the 'maybe int' type. 'Some x' and 'Some > y' additionally bind matched subexpressions to names. For the first > pattern, I chose to shadow the nullable parameters with the non- > nullable matched subexpressions.At the LLVM level, this is already what Talin wants to do. The only difference is that Talin's code would need to rename the variables inside the 'then' block in the frontend: if x != NULL then x1 = cast-to-non-nullable-type (x) ... continue with x1 and know it can't be NULL endif OCaml's match is a shorthand for the comparison-plus-cast combination. IIRC there's an OCaml compiler that uses LLVM; Talin should be able to reuse the technique employed there. HTH Jo
On May 23, 2008, at 11:53 PM, Talin wrote:> In the language I am working on, there are cases where the call flow > graph can affect the type of a variable.:-) This reminds me of people that want to use CFG and the optimizer to make: int i; int() { if (i) return 1; else return 0; } not warn/error that flow falls off the end. Extend that out and you have to do arbitrarily hard things and the document the limit of what was done, which is messy and wrong, as the next person will come along and say, but I want this one obvious little case to work as well. If you want to go down that path, you can, but we're gonna wave you off. In the end, I suspect you have to replicate all the code in the front end as it is part of the spec, not an `optimization'.
Mike Stump wrote:-> On May 23, 2008, at 11:53 PM, Talin wrote: > > In the language I am working on, there are cases where the call flow > > graph can affect the type of a variable. > > :-) > > This reminds me of people that want to use CFG and the optimizer to > make: > > int i; int() { if (i) return 1; else return 0; } > > not warn/error that flow falls off the end. Extend that out and you > have to do arbitrarily hard things and the document the limit of what > was done, which is messy and wrong, as the next person will come along > and say, but I want this one obvious little case to work as well.You can get 98% of all cases people care about with no analysis and just a couple of boolean variables in the front end, including the case above. Neil.
On May 27, 2008, at 1:57 PM, Mike Stump wrote:> On May 23, 2008, at 11:53 PM, Talin wrote: >> In the language I am working on, there are cases where the call flow >> graph can affect the type of a variable. > > :-) > > This reminds me of people that want to use CFG and the optimizer to > make: > > int i; int() { if (i) return 1; else return 0; } > > not warn/error that flow falls off the end. Extend that out and you > have to do arbitrarily hard things and the document the limit of what > was done, which is messy and wrong, as the next person will come along > and say, but I want this one obvious little case to work as well.I'm not sure what your point is here Mike. That is clearly an important thing to avoid warning for. You can argue about whether the optimizer or front-end should be generating the warning, but any compiler that emits a false positive for that would be pretty useless in practice. -Chris
On May 23, 2008, at 11:53 PM, Talin wrote:> In the language I am working on, there are cases where the call flow > graph can affect the type of a variable.Ok.> A simple example is a > "nullable" type (a pointer which is allowed to be NULL), which can be > converted into a non-nullable type via an if-test. So for example if x > is nullable, and I test x != NULL, then inside the conditional block > the > type of x is no longer nullable.Makes sense.> Nullable types behave slightly > differently (and produce less efficient code) than non-nullable types. > For example, a downcast to a nullable type is a dynamic cast > (because if > the cast fails, the result can be NULL), whereas a downcast to a > non-nullable type throws an exception if the cast fails.Sure.> This kind of analysis is trivial given LLVM's architecture. The > problem > I have, however, is that all of the high-level type analysis occurs > before LLVM ever gets into the picture. LLVM doesn't know anything > about > the front-end type system.Why not codegen this logically as a pair of <bool, T> where each is an LLVM Value*? This would allow the optimizer to propagate around and eliminate the bool.> What I am wondering is, will I have to re-invent the same sort of CFG > analysis that LLVM does in my frontend, or is there some shortcut? I > guess this is really a general compiler-design question rather than > one > specific to LLVM, but I thought I would ask anyway.This issue is fairly straight-forward I think. More general problems are things like auto-unboxing of types and doing type inference of source level languages. This can also be done pretty easily in the LLVM IR, if you have questions about that I can explain. -Chris
Thanks (everyone) for the suggestions, they were helpful. What I decided to do was to essentially "wrap" LLVM's BasicBlock structure with my own block class that contains additional high-level information. Up to this point I was generating LLVM basic blocks on the fly as I walked the statement tree; Now I build my own BB graph, and then use that to generate the BB graph for LLVM. This allows me to do a couple of things. For one thing, its a lot easier to traverse the BB graph than the statement tree, and its more amenable to manipulation (especially when taking break/continue and other jumps into account.) For one thing, implementing Python-style "yield" statement (which is a transformation of the graph) will be easier with this data structure. In the case of the nullable types, I don't need to store full use/def chains, since all I want to know is "given variable X, is there any chance that X could be null at this point?" If there are multiple definitions of X reaching a particular use of X, then the value could be null if any of the reaching definitions could be null. (In the face of incomplete knowledge, assume the worst.) So in other words, the "nullable" bits are merely or'd together. I don't need to deep dive into subroutines or expressions since "nullable" is propagated upward by the type system and frozen at call boundaries. Yes, there will be some false positives, but a few extra tests for pointer validity isn't going to hurt all that much. So essentially I just have to store at the beginning of each BB the set of variables which could be null, and do the typical intra-block analysis. The overall motivation for this, BTW, is my attempt to prove a personal theory, which is that most static type systems are guarding against the wrong things. Type mismatch errors don't keep programmers up at night; null pointers, deadlocks and race conditions do. So my language tries to check for null pointer errors at compile time using static types. -- Talin Chris Lattner wrote:> On May 23, 2008, at 11:53 PM, Talin wrote: > >> In the language I am working on, there are cases where the call flow >> graph can affect the type of a variable. >> > > Ok. > > >> A simple example is a >> "nullable" type (a pointer which is allowed to be NULL), which can be >> converted into a non-nullable type via an if-test. So for example if x >> is nullable, and I test x != NULL, then inside the conditional block >> the >> type of x is no longer nullable. >> > > Makes sense. > > >> Nullable types behave slightly >> differently (and produce less efficient code) than non-nullable types. >> For example, a downcast to a nullable type is a dynamic cast >> (because if >> the cast fails, the result can be NULL), whereas a downcast to a >> non-nullable type throws an exception if the cast fails. >> > > Sure. > > >> This kind of analysis is trivial given LLVM's architecture. The >> problem >> I have, however, is that all of the high-level type analysis occurs >> before LLVM ever gets into the picture. LLVM doesn't know anything >> about >> the front-end type system. >> > > Why not codegen this logically as a pair of <bool, T> where each is an > LLVM Value*? This would allow the optimizer to propagate around and > eliminate the bool. > > >> What I am wondering is, will I have to re-invent the same sort of CFG >> analysis that LLVM does in my frontend, or is there some shortcut? I >> guess this is really a general compiler-design question rather than >> one >> specific to LLVM, but I thought I would ask anyway. >> > > This issue is fairly straight-forward I think. More general problems > are things like auto-unboxing of types and doing type inference of > source level languages. This can also be done pretty easily in the > LLVM IR, if you have questions about that I can explain. > > -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu > lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >