thr3ads.net - llvm dev - [LLVMdev] Advice on CFG pre-analysis [May 2008]

If this information is useful, please help other people find it:
Share via:

Talin

2008-May-24 06:53 UTC

[LLVMdev] Advice on CFG pre-analysis

In the language I am working on, there are cases where the call flow 
graph can affect the type of a variable. A simple example is a 
"nullable" type (a pointer which is allowed to be NULL), which can be 
converted into a non-nullable type via an if-test. So for example if x 
is nullable, and I test x != NULL, then inside the conditional block the 
type of x is no longer nullable. Nullable types behave slightly 
differently (and produce less efficient code) than non-nullable types. 
For example, a downcast to a nullable type is a dynamic cast (because if 
the cast fails, the result can be NULL), whereas a downcast to a 
non-nullable type throws an exception if the cast fails.

This kind of analysis is trivial given LLVM's architecture. The problem 
I have, however, is that all of the high-level type analysis occurs 
before LLVM ever gets into the picture. LLVM doesn't know anything about 
the front-end type system.

What I am wondering is, will I have to re-invent the same sort of CFG 
analysis that LLVM does in my frontend, or is there some shortcut? I 
guess this is really a general compiler-design question rather than one 
specific to LLVM, but I thought I would ask anyway.

-- Talin

Eli Friedman

2008-May-24 16:12 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

On Fri, May 23, 2008 at 11:53 PM, Talin <viridia at gmail.com>
wrote:> In the language I am working on, there are cases where the call flow
> graph can affect the type of a variable. A simple example is a
> "nullable" type (a pointer which is allowed to be NULL), which
can be
> converted into a non-nullable type via an if-test. So for example if x
> is nullable, and I test x != NULL, then inside the conditional block the
> type of x is no longer nullable. Nullable types behave slightly
> differently (and produce less efficient code) than non-nullable types.
> For example, a downcast to a nullable type is a dynamic cast (because if
> the cast fails, the result can be NULL), whereas a downcast to a
> non-nullable type throws an exception if the cast fails.
I'm not completely sure what you're trying to do here, but it seems
like you want some special statement that essentially introduces a new
variable with the non-nullable type for a child block. Something like
ifnonnull P = getPersonForName("Talin") { code that references P },
for a language with C-like syntax.

It doesn't make any sense to do this sort of thing at the LLVM level;
I doubt you really want  optimizers or flow-sensitive analysis messing
with the semantics of programs in your language.

-Eli

Gordon Henriksen

2008-May-24 17:18 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

On May 24, 2008, at 02:53, Talin wrote:
> In the language I am working on, there are cases where the call flow  
> graph can affect the type of a variable. A simple example is a  
> "nullable" type (a pointer which is allowed to be NULL), which
can
> be converted into a non-nullable type via an if-test. So for example  
> if x is nullable, and I test x != NULL, then inside the conditional  
> block the type of x is no longer nullable. Nullable types behave  
> slightly differently (and produce less efficient code) than non- 
> nullable types. For example, a downcast to a nullable type is a  
> dynamic cast (because if the cast fails, the result can be NULL),  
> whereas a downcast to a non-nullable type throws an exception if the  
> cast fails.
>
> This kind of analysis is trivial given LLVM's architecture. The  
> problem I have, however, is that all of the high-level type analysis  
> occurs before LLVM ever gets into the picture. LLVM doesn't know  
> anything about the front-end type system.
>
> What I am wondering is, will I have to re-invent the same sort of  
> CFG analysis that LLVM does in my frontend, or is there some  
> shortcut? I guess this is really a general compiler-design question  
> rather than one specific to LLVM, but I thought I would ask anyway.

I have to agree with earlier poster, I'm not sure altering the type of  
values is tractable; this seems better as a diagnostic analysis.

A way to leverage the type system here might be with 'match' statement  
like ml's. Consider:

type maybe 'a = None | Some 'a

let maybe_add x y    match x, y with
   | Some x, Some y -> x + y
   | Some n, None | None, Some n -> n
   | None, None -> None

Here, the expressions like 'Some x' and 'None' are patterns,
which
match different variants of the 'maybe int' type. 'Some x' and
'Some
y' additionally bind matched subexpressions to names. For the first  
pattern, I chose to shadow the nullable parameters with the non- 
nullable matched subexpressions.

More to your point, however, the analysis you're proposing seems  
strongly something that should be done in your language's IR prior to  
LLVM lowering. Several of LLVM's graph algorithms are templates, so  
you could possibly specialize a traits type for your compiler's parse  
tree/AST/IR and potentially leverage some of LLVM's code.

— Gordon

Joachim Durchholz

2008-May-24 18:23 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

Am Samstag, den 24.05.2008, 13:18 -0400 schrieb Gordon
Henriksen:> A way to leverage the type system here might be with 'match'
> statement  
> like ml's. Consider:
> 
> type maybe 'a = None | Some 'a
> 
> let maybe_add x y >    match x, y with
>    | Some x, Some y -> x + y
>    | Some n, None | None, Some n -> n
>    | None, None -> None
> 
> Here, the expressions like 'Some x' and 'None' are
patterns, which
> match different variants of the 'maybe int' type. 'Some x'
and 'Some
> y' additionally bind matched subexpressions to names. For the first  
> pattern, I chose to shadow the nullable parameters with the non- 
> nullable matched subexpressions.
At the LLVM level, this is already what Talin wants to do.
The only difference is that Talin's code would need to rename the
variables inside the 'then' block in the frontend:

if x != NULL then
  x1 = cast-to-non-nullable-type (x)
  ... continue with x1 and know it can't be NULL
endif

OCaml's match is a shorthand for the comparison-plus-cast combination.

IIRC there's an OCaml compiler that uses LLVM; Talin should be able to
reuse the technique employed there.

HTH
Jo

Mike Stump

2008-May-27 20:57 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

On May 23, 2008, at 11:53 PM, Talin wrote:> In the language I am working on, there are cases where the call flow
> graph can affect the type of a variable.
:-)

This reminds me of people that want to use CFG and the optimizer to  
make:

int i; int() { if (i) return 1; else return 0; }

not warn/error that flow falls off the end.  Extend that out and you  
have to do arbitrarily hard things and the document the limit of what  
was done, which is messy and wrong, as the next person will come along  
and say, but I want this one obvious little case to work as well.

If you want to go down that path, you can, but we're gonna wave you  
off.  In the end, I suspect you have to replicate all the code in the  
front end as it is part of the spec, not an `optimization'.

Neil Booth

2008-May-27 23:05 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

Mike Stump wrote:-
> On May 23, 2008, at 11:53 PM, Talin wrote:
> > In the language I am working on, there are cases where the call flow
> > graph can affect the type of a variable.
> 
> :-)
> 
> This reminds me of people that want to use CFG and the optimizer to  
> make:
> 
> int i; int() { if (i) return 1; else return 0; }
> 
> not warn/error that flow falls off the end.  Extend that out and you  
> have to do arbitrarily hard things and the document the limit of what  
> was done, which is messy and wrong, as the next person will come along  
> and say, but I want this one obvious little case to work as well.
You can get 98% of all cases people care about with no analysis
and just a couple of boolean variables in the front end, including
the case above.

Neil.

Chris Lattner

2008-May-28 03:41 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

On May 27, 2008, at 1:57 PM, Mike Stump wrote:
> On May 23, 2008, at 11:53 PM, Talin wrote:
>> In the language I am working on, there are cases where the call flow
>> graph can affect the type of a variable.
>
> :-)
>
> This reminds me of people that want to use CFG and the optimizer to
> make:
>
> int i; int() { if (i) return 1; else return 0; }
>
> not warn/error that flow falls off the end.  Extend that out and you
> have to do arbitrarily hard things and the document the limit of what
> was done, which is messy and wrong, as the next person will come along
> and say, but I want this one obvious little case to work as well.
I'm not sure what your point is here Mike.  That is clearly an  
important thing to avoid warning for.  You can argue about whether the  
optimizer or front-end should be generating the warning, but any  
compiler that emits a false positive for that would be pretty useless  
in practice.

-Chris

Chris Lattner

2008-May-28 03:44 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

On May 23, 2008, at 11:53 PM, Talin wrote:> In the language I am working on, there are cases where the call flow
> graph can affect the type of a variable.
Ok.
> A simple example is a
> "nullable" type (a pointer which is allowed to be NULL), which
can be
> converted into a non-nullable type via an if-test. So for example if x
> is nullable, and I test x != NULL, then inside the conditional block  
> the
> type of x is no longer nullable.
Makes sense.
> Nullable types behave slightly
> differently (and produce less efficient code) than non-nullable types.
> For example, a downcast to a nullable type is a dynamic cast  
> (because if
> the cast fails, the result can be NULL), whereas a downcast to a
> non-nullable type throws an exception if the cast fails.
Sure.
> This kind of analysis is trivial given LLVM's architecture. The  
> problem
> I have, however, is that all of the high-level type analysis occurs
> before LLVM ever gets into the picture. LLVM doesn't know anything  
> about
> the front-end type system.
Why not codegen this logically as a pair of <bool, T> where each is an  
LLVM Value*?  This would allow the optimizer to propagate around and  
eliminate the bool.
> What I am wondering is, will I have to re-invent the same sort of CFG
> analysis that LLVM does in my frontend, or is there some shortcut? I
> guess this is really a general compiler-design question rather than  
> one
> specific to LLVM, but I thought I would ask anyway.
This issue is fairly straight-forward I think.  More general problems  
are things like auto-unboxing of types and doing type inference of  
source level languages.  This can also be done pretty easily in the  
LLVM IR, if you have questions about that I can explain.

-Chris

Talin

2008-May-28 04:19 UTC

head link

[LLVMdev] Advice on CFG pre-analysis

Thanks (everyone) for the suggestions, they were helpful.

What I decided to do was to essentially "wrap" LLVM's BasicBlock 
structure with my own block class that contains additional high-level 
information. Up to this point I was generating LLVM basic blocks on the 
fly as I walked the statement tree; Now I build my own BB graph, and 
then use that to generate the BB graph for LLVM.

This allows me to do a couple of things. For one thing, its a lot easier 
to traverse the BB graph than the statement tree, and its more amenable 
to manipulation (especially when taking break/continue and other jumps 
into account.) For one thing, implementing Python-style "yield" 
statement (which is a transformation of the graph) will be easier with 
this data structure.

In the case of the nullable types, I don't need to store full use/def 
chains, since all I want to know is "given variable X, is there any 
chance that X could be null at this point?" If there are multiple 
definitions of X reaching a particular use of X, then the value could be 
null if any of the reaching definitions could be null. (In the face of 
incomplete knowledge, assume the worst.) So in other words, the 
"nullable" bits are merely or'd together. I don't need to deep
dive into
subroutines or expressions since "nullable" is propagated upward by
the
type system and frozen at call boundaries. Yes, there will be some false 
positives, but a few extra tests for pointer validity isn't going to 
hurt all that much.

So essentially I just have to store at the beginning of each BB the set 
of variables which could be null, and do the typical intra-block analysis.

The overall motivation for this, BTW, is my attempt to prove a personal 
theory, which is that most static type systems are guarding against the 
wrong things. Type mismatch errors don't keep programmers up at night; 
null pointers, deadlocks and race conditions do. So my language tries to 
check for null pointer errors at compile time using static types.

-- Talin

Chris Lattner wrote:> On May 23, 2008, at 11:53 PM, Talin wrote:
>   
>> In the language I am working on, there are cases where the call flow
>> graph can affect the type of a variable.
>>     
>
> Ok.
>
>   
>> A simple example is a
>> "nullable" type (a pointer which is allowed to be NULL),
which can be
>> converted into a non-nullable type via an if-test. So for example if x
>> is nullable, and I test x != NULL, then inside the conditional block  
>> the
>> type of x is no longer nullable.
>>     
>
> Makes sense.
>
>   
>> Nullable types behave slightly
>> differently (and produce less efficient code) than non-nullable types.
>> For example, a downcast to a nullable type is a dynamic cast  
>> (because if
>> the cast fails, the result can be NULL), whereas a downcast to a
>> non-nullable type throws an exception if the cast fails.
>>     
>
> Sure.
>
>   
>> This kind of analysis is trivial given LLVM's architecture. The  
>> problem
>> I have, however, is that all of the high-level type analysis occurs
>> before LLVM ever gets into the picture. LLVM doesn't know anything
>> about
>> the front-end type system.
>>     
>
> Why not codegen this logically as a pair of <bool, T> where each is
an
> LLVM Value*?  This would allow the optimizer to propagate around and  
> eliminate the bool.
>
>   
>> What I am wondering is, will I have to re-invent the same sort of CFG
>> analysis that LLVM does in my frontend, or is there some shortcut? I
>> guess this is really a general compiler-design question rather than  
>> one
>> specific to LLVM, but I thought I would ask anyway.
>>     
>
> This issue is fairly straight-forward I think.  More general problems  
> are things like auto-unboxing of types and doing type inference of  
> source level languages.  This can also be done pretty easily in the  
> LLVM IR, if you have questions about that I can explain.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - May 2008 - [LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

[LLVMdev] Advice on CFG pre-analysis

Maybe Matching Threads