thr3ads.net - llvm dev - [LLVMdev] [RFC] CodeGen Context [Oct 2013]

Bill Wendling

2013-Oct-12 08:55 UTC

[LLVMdev] [RFC] CodeGen Context

Hi all,

This is my proposal for how to solve the problem we have with function
attributes that affect code generation changing between functions. (This is
mostly a problem for LTO.)

Please take a look at this proposal, and let me know if you have any questions
or comments.

Cheers!

-bw


                           CodeGen Context
                           ===============

The back-end's objects are currently generated once with a set of options handed
to it by the front-end. These options are not expected to change throughout the
lifetime of the back-end. With the advent of extended function attributes, this
is no longer a correct assumption. During LTO for instance, a function's
attributes may change how the back-end should generate code for that function.
For example, in this code `@foo' won't disable frame pointer generation, but
`@bar' will disable it:

 define void @foo() "no-frame-pointer-elim"="true"  { ret void }
 define void @bar() "no-frame-pointer-elim"="false" { ret void }

Of course, this is a very simple example. Other options affect the construction
of the back-end objects themselves (e.g., `use-soft-float').

--------------------------------------------------------------------------------

Before we get further, here are a few definitions used in this document:

Back-end Objects ::

Objects that affect code generation --- e.g., TargetInstrInfo,
TargetFrameLowering, DataLayout, etc.

CGContext ::

A central repository for back-end objects. The back-end objects may change, so
they should not be "cached" by individual passes. This is analogous to the
current TargetMachine object. The term "CGContext" is used because it
separates the current implementation from the "ideal" implementation.

Important Options ::

Those options which affect back-end object construction.

--------------------------------------------------------------------------------

So, the back-end has to be prepared for "important options" to change. The ideal
solution would be for the back-end to query the CGContext any time it needs
information on how to generate code.  Unfortunately, this isn't currently
feasible, because of how back-end objects are constructed, though it is
something worth striving for. As such, there are four goals we want to achieve:

1. As many options as possible should be queried via the back-end directly
  rather than relying upon objects holding onto these options,

2. Those which affect how objects are generated require those objects to be
  regenerated when the important options change,

3. There is no more dependence upon IR-level code. I.e., the back-end would
  still function if the IR code were deleted, and

4. The back-end is not prevented from being parallelized.

Some things to note:

* Recreating the back-end for each changing set of important options is
 expensive. A simple test showed that there is a measurable slowdown in the
 worst-case scenario where the back-end is recreated for every function.

* Object creation in the back-end has a high order of coupling. I.e., one
 object creates another object, which uses the original object, and may
 create other objects dependent upon previous objects, etc.

* Most functions should have the same set of important options, thus reducing
 the need to regenerate the back-end objects for each function.

* Some objects are created on demand, and may change during code generation.

This is a simple model of how command line options and function attributes will
be passed through the compiler from the front-end to the middle-end and finally
the back-end:

The front-end generates the functions with appropriate function attributes taken
from command line options. Because the front-end may be dealing with IR files
and the command line options that are currently used may be different from those
the function was generated with, the front-end will create an "OptionContext"
object. Options specified by function attributes may be overridden by options
specified in the OptionContext. These are used as IR options by the middle
end. A suitable API will be set up to make this transparent to the middle end
*waves hands wildly*.
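
As a rough illustration of the override rule described above, here is a minimal
C++ sketch. It assumes that both the function attributes and the OptionContext
can be viewed as string key/value pairs; `OptionMap' and `resolveOptions' are
hypothetical names invented for this example and are not part of LLVM or of the
proposal itself.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation: option name -> option value.
using OptionMap = std::map<std::string, std::string>;

// Start from the attributes recorded on the function, then let every option
// present in the OptionContext replace the recorded value. Options that the
// OptionContext does not mention keep the value from the function attributes.
OptionMap resolveOptions(const OptionMap &functionAttrs,
                         const OptionMap &optionContext) {
  OptionMap effective = functionAttrs;
  for (const auto &entry : optionContext)
    effective[entry.first] = entry.second;
  return effective;
}
```

Under this assumed rule, an option named in the OptionContext always wins, and
anything else is taken from the function unchanged.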

The function attributes and options context are used to generate the CGContext.
All IR passes that need to know about target data, as well as the
code-generation passes, will query the CGContext for all information needed to
construct the back-end. When important options change (based on a new
function's attributes), the context can transparently reconstruct the objects
that are affected. To minimize time spent recreating the back-end objects, they
can be cached.
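
The caching idea can be sketched as follows. This is only an illustration under
stated assumptions: `ImportantOptions', `BackendObjects', and the `getObjects'
method are hypothetical stand-ins, and the real set of important options and
back-end objects would be much larger.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical: the options that affect back-end object construction.
struct ImportantOptions {
  bool useSoftFloat = false;
  std::string targetFeatures;

  // Ordering so the options can serve as a cache key.
  bool operator<(const ImportantOptions &other) const {
    return std::tie(useSoftFloat, targetFeatures) <
           std::tie(other.useSoftFloat, other.targetFeatures);
  }
};

// Hypothetical bundle standing in for TargetInstrInfo, TargetFrameLowering, etc.
struct BackendObjects {
  explicit BackendObjects(const ImportantOptions &opts) : options(opts) {}
  ImportantOptions options;
};

class CGContext {
public:
  // Returns the back-end objects for the given options, constructing them
  // only the first time this particular set of options is seen.
  const BackendObjects &getObjects(const ImportantOptions &opts) {
    auto it = cache.find(opts);
    if (it == cache.end()) {
      ++constructions;
      std::unique_ptr<BackendObjects> objs(new BackendObjects(opts));
      it = cache.emplace(opts, std::move(objs)).first;
    }
    return *it->second;
  }

  unsigned constructionCount() const { return constructions; }

private:
  std::map<ImportantOptions, std::unique_ptr<BackendObjects>> cache;
  unsigned constructions = 0;
};
```

In this sketch, functions that share the same important options share one set
of back-end objects, so the expensive construction happens once per distinct
option set rather than once per function.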

Have some ASCII art:

              ,---------------.
          ::  | OptionContext | --.
          |   `---------------'   |   ,------------.
Front End |                       |-->| IR Options |   :: Middle End
          |  ,----------------.   |   `------------'
          :: | Function Attrs |---+-.
             `----------------'     |    ,-----------.
                                    `--> | CGContext | :: Back End
                                         `-----------'

The CGContext will transparently recreate any objects it needs to. This means
that back-end code won't be able to cache any of the objects the CGContext
creates (this has already been addressed).

The CGContext can be reached through the MachineFunction object:

 CGContext &context = MF->getContext();
 const TargetFrameLowering *TFL = context.getFrameLowering();

 if (TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp) {
   // ...
 }

Currently, the best place to process the function attributes is towards the
beginning of the `SelectionDAGISel::runOnMachineFunction()' method. This has one
side-effect --- the CGContext may not be available to IR passes which use
it. This will need to be addressed on a case-by-case basis. One option is to
have the pass manager populate the CGContext at the point in the pipeline where
we begin lowering.

Bob Wilson

2013-Oct-13 21:35 UTC

[LLVMdev] [RFC] CodeGen Context

On Oct 12, 2013, at 1:55 AM, Bill Wendling <isanbard at gmail.com> wrote:
> Hi all,
> 
> This is my proposal for how to solve the problem we have with function
> attributes that affect code generation changing between functions. (This is
> mostly a problem for LTO.)
> 
> Please take a look at this proposal, and let me know if you have any
> questions or comments.
> 
> Cheers!
> 
> -bw

Thanks, Bill.  I do have some comments, but first I want to apologize for not
sending them earlier.  Bill had asked for feedback before sending this out to a
wider audience but I didn’t get to it until now.  Sorry Bill!

[ background description removed ]
> 
> CGContext ::
> 
> A central repository for back-end objects. The back-end objects may change, so
> they should not be "cached" by individual passes. This is analogous to the
> current TargetMachine object. The term "CGContext" is used because it
> separates the current implementation from the "ideal" implementation.

I’m pretty strongly opposed to introducing this new CGContext thing.  We already
have TargetMachine to collect all the target-specific information.  With
function attributes, the target info may depend on which function is being
compiled, so we should just fix the TargetMachine APIs where necessary to let
you specify the function.  You describe the proposed CGContext as being
analogous to TargetMachine, so let’s just keep TargetMachine and make it do what
we need.
> 
> Important Options ::
> 
> Those options which affect back-end object construction.
> 
> --------------------------------------------------------------------------------
> 
> So, the back-end has to be prepared for "important options" to change. The ideal
> solution would be for the back-end to query the CGContext any time it needs
> information on how to generate code.  Unfortunately, this isn't currently
> feasible, because of how back-end objects are constructed, though it is
> something worth striving for. As such, there are four goals we want to achieve:
> 
> 1. As many options as possible should be queried via the back-end directly
>  rather than relying upon objects holding onto these options,

I’m not quite sure what you mean by “queried via the back-end directly”.  Here’s
what I propose:  When constructing a target-specific instance of
MachineFunctionInfo, any “simple” function attributes that the target backend
may need to query should be read from the IR (per goal 3 below) and recorded in
the MachineFunctionInfo object.  Target-independent function attributes can be
handled in the MachineFunctionInfo base class.  The information will then be
retrieved as needed from the MachineFunctionInfo object.
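
To make the suggestion concrete, here is a minimal sketch of recording
attributes at construction time. All of the types are simplified, hypothetical
stand-ins (`IRFunction', `FunctionInfoBase', `MyTargetFunctionInfo') rather than
LLVM's real classes, and "my-target-flag" is an invented attribute name used
only for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an IR function carrying string attributes.
struct IRFunction {
  std::map<std::string, std::string> attrs;
};

// Hypothetical base class: copies target-independent attributes out of the IR
// once, at construction, so later queries never touch the IR again.
class FunctionInfoBase {
public:
  explicit FunctionInfoBase(const IRFunction &fn)
      : noFramePointerElim(readBool(fn, "no-frame-pointer-elim")) {}
  virtual ~FunctionInfoBase() = default;

  bool disablesFramePointerElim() const { return noFramePointerElim; }

protected:
  // A missing attribute is treated as "false" in this sketch.
  static bool readBool(const IRFunction &fn, const std::string &name) {
    auto it = fn.attrs.find(name);
    return it != fn.attrs.end() && it->second == "true";
  }

private:
  bool noFramePointerElim;
};

// Hypothetical target-specific subclass recording one extra attribute.
class MyTargetFunctionInfo : public FunctionInfoBase {
public:
  explicit MyTargetFunctionInfo(const IRFunction &fn)
      : FunctionInfoBase(fn), targetFlag(readBool(fn, "my-target-flag")) {}

  bool hasTargetFlag() const { return targetFlag; }

private:
  bool targetFlag;
};
```

Because the values are copied when the info object is built, the object stays
usable even after the IR function has been destroyed, which is what goal 3 asks
for.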
> 
> 2. Those which affect how objects are generated require those objects to be
>  regenerated when the important options change,

This whole notion of options “changing” is just wrong.  It is inherently
tied to a sequential compilation process where we handle one function at a time.
See goal 4 below.  We can’t have a single CGContext that just returns
information about “the current function”.  The right API should take the
function as an argument.
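
A minimal sketch of an API shaped this way is shown below. The names
(`IRFunction', `FrameLowering', `TargetMachineSketch') are hypothetical and much
simpler than the real TargetMachine; the point is only that the query takes the
function as a parameter instead of consulting any "current function" state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an IR function carrying string attributes.
struct IRFunction {
  std::map<std::string, std::string> attrs;
};

// Hypothetical, heavily reduced frame-lowering description.
struct FrameLowering {
  bool keepFramePointer;
};

class TargetMachineSketch {
public:
  // The answer depends only on the function passed in. The method is const
  // and the object holds no per-function state.
  FrameLowering getFrameLowering(const IRFunction &fn) const {
    auto it = fn.attrs.find("no-frame-pointer-elim");
    bool keep = it != fn.attrs.end() && it->second == "true";
    return FrameLowering{keep};
  }
};
```

Since nothing is stored between calls in this sketch, queries for different
functions can be made in any order without one affecting another.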
> 
> 3. There is no more dependence upon IR-level code. I.e., the back-end would
>  still function if the IR code were deleted, and
> 
> 4. Not prevent the back-end from being parallelized.
> 
> Some things to note:
> 
> * Recreating the back-end for each changing set of important options is
> expensive. A simple test showed that there is a measurable slowdown in the
> worst-case scenario where the back-end is recreated for every function.
> 
> * Object creation in the back-end has a high order of coupling. I.e., one
> object creates another object, which uses the original object, and may
> create other objects dependent upon previous objects, etc.
> 
> * Most functions should have the same set of important options, thus reducing
> the need to regenerate the back-end objects for each function.
> 
> * Some objects are created on demand, and may change during code generation.
> 
> This is a simple model of how command line options and function attributes will
> be passed through the compiler from the front-end to the middle-end and finally
> the back-end:
> 
> The front-end generates the functions with appropriate function attributes taken
> from command line options. Because the front-end may be dealing with IR files
> and the command line options that are currently used may be different from those
> the function was generated with, the front-end will create an "OptionContext"
> object. Options specified by function attributes may be overridden by options
> specified in the OptionContext. These are used as IR options by the middle
> end. A suitable API will be set up to make this transparent to the middle end
> *waves hands wildly*.

I’m not sure I understand.  If I do this:

$ clang -mavx -flto -c file1.c
$ clang -mno-avx file1.o file2.c

are you saying that -mno-avx should win and that file1.c will be compiled
without AVX support?

Maybe I missed some earlier discussion, but that seems really wrong to me.  We
need the front-end settings to be consistent with the code-gen options. 
Whatever options are specified when running the front-end should be preserved
without overrides in the bit code.
> 
> The function attributes and options context are used to generate the CGContext.
> All IR passes, that need to know about target data, and code-generation passes
> will query the CGContext for all information needed to construct the back-end.
> When important options change (based on a new function's attributes), the
> context can transparently reconstruct the objects that are affected. To minimize
> time spent recreating the back-end objects, they can be cached.
> 
> Have some ASCII art:
> 
>               ,---------------.
>           ::  | OptionContext | --.
>           |   `---------------'   |   ,------------.
> Front End |                       |-->| IR Options |   :: Middle End
>           |  ,----------------.   |   `------------'
>           :: | Function Attrs |---+-.
>              `----------------'     |    ,-----------.
>                                     `--> | CGContext | :: Back End
>                                          `-----------'
> 
> The CGContext will transparently recreate any objects it needs to. This means
> that back-end code won't be able to cache any of the objects the CGContext
> creates (this has already been addressed).
> 
> The CGContext can be reached through the MachineFunction object:
> 
> CGContext &context = MF->getContext();
> const TargetFrameLowering *TFL = context.getFrameLowering();
> 
> if (TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp) {
>   // ...
> }
> 
> Currently, the best place to process the function attributes is towards the
> beginning of the `SelectionDAGISel::runOnMachineFunction()' method. This has one
> side-effect --- the CGContext may not be available to IR passes which use
> it. This will need to be addressed on a case-by-case basis. One option is to
> have the pass manager populate the CGContext at the point in the pipeline where
> we begin lowering.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Bill Wendling

2013-Oct-14 03:23 UTC

[LLVMdev] [RFC] CodeGen Context

On Oct 13, 2013, at 2:35 PM, Bob Wilson <bob.wilson at apple.com> wrote:
> On Oct 12, 2013, at 1:55 AM, Bill Wendling <isanbard at gmail.com> wrote:
> 
>> CGContext ::
>> 
>> A central repository for back-end objects. The back-end objects may change, so
>> they should not be "cached" by individual passes. This is analogous to the
>> current TargetMachine object. The term "CGContext" is used because it
>> separates the current implementation from the "ideal" implementation.
> 
> I’m pretty strongly opposed to introducing this new CGContext thing.  We
> already have TargetMachine to collect all the target-specific information.  With
> function attributes, the target info may depend on which function is being
> compiled, so we should just fix the TargetMachine APIs where necessary to let
> you specify the function.  You describe the proposed CGContext as being
> analogous to TargetMachine, so let’s just keep TargetMachine and make it do what
> we need.

I was using CGContext to keep this proposal separate from the current
implementation. However, while there are some similarities with TargetMachine,
there are two differences:

1) A CGContext essentially contains a cache of "TargetMachine"s --- one for each
change in attributes that affect object generation.

2) A TargetMachine needs to be derived for a specific machine (see
LLVMTargetMachine). This is not how a context should work, in my opinion.

That said, I'm not that concerned what the final object is called. :-) If we
keep the TargetMachine name, then it will definitely change in fundamental ways.

>> Important Options ::
>> 
>> Those options which affect back-end object construction.
>> 
>> --------------------------------------------------------------------------------
>> 
>> So, the back-end has to be prepared for "important options" to change. The ideal
>> solution would be for the back-end to query the CGContext any time it needs
>> information on how to generate code.  Unfortunately, this isn't currently
>> feasible, because of how back-end objects are constructed, though it is
>> something worth striving for. As such, there are four goals we want to achieve:
>> 
>> 1. As many options as possible should be queried via the back-end directly
>> rather than relying upon objects holding onto these options,
> 
> I’m not quite sure what you mean by “queried via the back-end directly”.
> Here’s what I propose:  When constructing a target-specific instance of
> MachineFunctionInfo, any “simple” function attributes that the target backend
> may need to query should be read from the IR (per goal 3 below) and recorded in
> the MachineFunctionInfo object.  Target-independent function attributes can be
> handled in the MachineFunctionInfo base class.  The information will then be
> retrieved as needed from the MachineFunctionInfo object.

That's pretty much what I had in mind.

>> 2. Those which affect how objects are generated require those objects to be
>> regenerated when the important options change,
> 
> This whole notion of options “changing” is just wrong.  It is inherently tied
> to a sequential compilation process where we handle one function at a time.
> See goal 4 below.  We can’t have a single CGContext that just returns
> information about “the current function”.  The right API should take the
> function as an argument.

I don't understand what you're saying here. The whole reason for this
proposal is because the options are changing between functions. I didn't
mention that it would return information only about the current function, though
that will be the non-parallel case. If you just want an API that takes the
function as an argument, that's fine with me. :)

>> 3. There is no more dependence upon IR-level code. I.e., the back-end would
>> still function if the IR code were deleted, and
>> 
>> 4. Not prevent the back-end from being parallelized.
>> 
>> Some things to note:
>> 
>> * Recreating the back-end for each changing set of important options is
>> expensive. A simple test showed that there is a measurable slowdown in the
>> worst-case scenario where the back-end is recreated for every function.
>> 
>> * Object creation in the back-end has a high order of coupling. I.e., one
>> object creates another object, which uses the original object, and may
>> create other objects dependent upon previous objects, etc.
>> 
>> * Most functions should have the same set of important options, thus reducing
>> the need to regenerate the back-end objects for each function.
>> 
>> * Some objects are created on demand, and may change during code generation.
>> 
>> This is a simple model of how command line options and function attributes will
>> be passed through the compiler from the front-end to the middle-end and finally
>> the back-end:
>> 
>> The front-end generates the functions with appropriate function attributes taken
>> from command line options. Because the front-end may be dealing with IR files
>> and the command line options that are currently used may be different from those
>> the function was generated with, the front-end will create an "OptionContext"
>> object. Options specified by function attributes may be overridden by options
>> specified in the OptionContext. These are used as IR options by the middle
>> end. A suitable API will be set up to make this transparent to the middle end
>> *waves hands wildly*.
> 
> I’m not sure I understand.  If I do this:
> 
> $ clang -mavx -flto -c file1.c
> $ clang -mno-avx file1.o file2.c
> 
> that the -mno-avx should win and that file1.c will be compiled without AVX
> support?
> 
> Maybe I missed some earlier discussion, but that seems really wrong to me.
> We need the front-end settings to be consistent with the code-gen options.
> Whatever options are specified when running the front-end should be preserved
> without overrides in the bit code.

No, file1.c should keep the `-mavx' flag. If you do a quick experiment,
compiling file1.o with a different flag (try `-fstack-protector' and
`-fstack-protector-all') won't affect the attributes it was originally
compiled with.

$ cat f.c
void bar(char *);

void foo() {
  char b[37];
  bar(b);
}

$ clang -S -o - -emit-llvm f.c
; ModuleID = 'f.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

define void @foo() nounwind ssp uwtable {
  %b = alloca [37 x i8], align 16
  %1 = getelementptr inbounds [37 x i8]* %b, i32 0, i32 0
  call void @bar(i8* %1)
  ret void
}

declare void @bar(i8*)
$ clang -S -o f.ll -emit-llvm f.c && clang -S -emit-llvm f.ll -o - -fstack-protector-all
; ModuleID = 'f.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

define void @foo() nounwind ssp uwtable {
  %b = alloca [37 x i8], align 16
  %1 = getelementptr inbounds [37 x i8]* %b, i32 0, i32 0
  call void @bar(i8* %1)
  ret void
}

declare void @bar(i8*)

-bw

Reed Kotler

2013-Oct-14 19:51 UTC

[LLVMdev] [RFC] CodeGen Context

Hi Bill,

I will try and go through all this in detail this week.

I wanted to point out that for mips16, I already have something like 
this working which uses your attribute work. It's used heavily by the 
mips16 port so I'm confident that there are no fundamental issues with 
it. I am able to switch between Mips32 and Mips16 on a per function 
basis, which are really different backends using different TD files, 
IselLowering, etc. classes. Everything between mips16 and mips32 is 
subclassed. You can see most of the details in MipsTargetMachine.cpp. 
The whole thing required surprisingly little code and I think it could 
be simplified even further.

It was very clean with only some small minuses:
1) It would have been a tad cleaner if I could have dynamically inserted
passes; something which Chandler's new pass manager will allow. I have to
insert some extra function passes which are optionally called and with
Chandler's new scheme, they could be optionally inserted.
2) I need to make the TargetTransformInfoPass into a function pass. I 
have not done this yet but did prototype it. There were some issues and 
I did not have the time to look into it. Right now I just disable this 
for Mips16.

You will need to solve problem #2 with your scheme too.

It took me several tries to come up with a scheme that really allowed 
what you want to do.

Reed

