thr3ads.net - llvm dev - [llvm-dev] IPRA, interprocedural register allocation, question [Jul 2016]


Lawrence, Peter via llvm-dev

2016-Jul-12 19:20 UTC

[llvm-dev] IPRA, interprocedural register allocation, question

Mehdi,
I am looking for an understanding of 1) IPRA in general, and 2) IPRA in LLVM.
Whether I want to use LTO or not is a separate issue.

1) I currently believe it is a true statement that:
    If all external functions are known not to call back into the
    “whole-program” being compiled, then IPRA is free to do anything at all
    to the functions being compiled: it is not limited to “upgrade”
    calling-convention changes, but may also make “downgrade”
    calling-convention changes.

Do you think my current belief #1 is correct ?


2) It seems that LLVM currently limits itself to “upgrade” calling-convention
changes. The reason is that not all call sites are then required to change, so
calls through function pointers can keep using the default calling convention
if, for example, there is insufficient analysis to know for sure which
functions can be called from that site.

Is my understanding #2 of IPRA in LLVM correct ?


--Peter.


“whole-program” here is a misnomer since there are external functions, but I
don’t have a better term for this.

“upgrades” means some scratch regs are converted to save regs
    (the callee either doesn’t touch them at all, or saves and restores them)
“downgrades” means some save regs are converted to scratch regs
    (the callee no longer saves and restores those registers, and does
    clobber them)






From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Monday, July 11, 2016 8:41 PM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com>
Cc: vivek pandya <vivekvpandya at gmail.com>; llvm-dev <llvm-dev at
lists.llvm.org>; llvm-dev-request at lists.llvm.org; Hal Finkel <hfinkel
at anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question



Sent from my iPhone

On Jul 11, 2016, at 7:48 PM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:
Mehdi,
            I’m compiling embedded applications which are small enough to do
whole-program-compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.

Ok, so LTO case basically.




So for me the compilation unit is always the entire program (and includes
main()), except for some hand-coded assembly-language support functions that
are “external” to the compilation unit and, in my case, never call back into
the compilation unit, i.e. they are always “leaf” functions from the point of
view of the compilation unit’s call graph.

Hence I would like a clang function attribute that says this function is
“leaf”, so that IPRA can know that none of the functions it is compiling is
ever called from outside this compilation unit.

I believe the usual way (and the best way from the compiler's point of view) to
address your particular scenario is to have a proper export list and use LTO.
For instance, if you never call into the program from one of your hand-coded
assembly routines, LTO should be able to turn every global function/variable
into a local one.




And I apologize to everyone for confusingly using the term “compilation unit”
when I meant “whole program”.


Yes, I am aware that if you change a function’s calling convention by
converting some scratch regs into save regs (for example because they aren’t
even touched), then it is safe to call it with either the default calling
convention or the optimized calling convention. This is the safe thing to do,
and is why I will only use the “preserve_most” and “preserve_all” optimized
calling conventions, as those will have been implemented by a back-end writer
who is aware of all these complications (as opposed to the “registermask=”
calling convention, which is much less safe).

I do, however, feel that IPRA in the whole-program case should not be
restricted to scratch-becoming-save changes only. I don’t have any data to
support the notion, but it begs to be investigated, unless someone can somehow
prove that it can’t help performance.

Beside an attribute on declarations, what do you suggest exactly?


--
Mehdi





From: mehdi.amini at apple.com<mailto:mehdi.amini at apple.com>
[mailto:mehdi.amini at apple.com]
Sent: Monday, July 11, 2016 7:06 PM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com<mailto:c_plawre at
qca.qualcomm.com>>
Cc: vivek pandya <vivekvpandya at gmail.com<mailto:vivekvpandya at
gmail.com>>; llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>; Hal Finkel
<hfinkel at anl.gov<mailto:hfinkel at anl.gov>>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question


On Jul 11, 2016, at 6:45 PM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:

Vivek,
          Here’s the way I see it, let me know if you agree or disagree,

You cannot optimize a function’s calling convention (register-usage) unless
you can see and change every caller,

That’s true only if you want to “downgrade” the guarantees, i.e. if you want to
reduce the callee-saved registers.
You can freely provide more information to limit the amount of caller-saved
registers to a partial list of call-sites, which is in practice changing the
“local” calling convention while keeping it compatible with the public one.




and you only know this for non-static functions
if you know that all calls to external functions cannot call back into the
current
compilation unit.

I’m not sure why you consider calls to external functions and call back? If you
don’t see main() (the common case) you don’t need a call to an external function
to have a possible call to an externally visible function in the current module.




#1 gives you the info necessary to change the call-site to the external function

So you don’t need #2 to do RA around the call-site to the external function;
instead you need #2 before you can change any non-static function’s calling
convention within the current compilation unit, assuming you have this
information for all external functions.

If I understand the case you have in mind, it is only when you see the main()
function in the current module and you’re trying to prove that an externally
visible function could not be called from outside the module basically?

It seems to me that this is a bit orthogonal to IPRA: multiple optimizations
(IPRA included) work best when functions are deduced local, non-recursive, are
not tail called (for IPRA in particular), and don’t have their address taken.
The “infer-func-attr” and “globalopt” passes try to do their best to make this
happen, especially during LTO.

The attribute case that Vivek is adding seems more murky though.

—
Mehdi





To be more concrete, let foo() be a non-static function in the current
compilation unit. Any calls to foo() from external functions will have to use
the “default” calling convention, so foo’s calling convention cannot be
changed. We have to know that none of the external functions can call back
into the compilation unit (they are “leaf” functions relative to the
compilation unit) before we can change foo()’s calling convention.


Also, the issue of escaping-pointer-to-function is made clear by the example
of the atexit() and exit() library functions, i.e. even static functions can
end up being called by external functions. So exit() can never be declared
“leaf”, and to get the benefit of IPRA it needs to be within the compilation
unit, either by whole-program compilation or by LTO, if it is used.


--Peter.






From: vivek pandya [mailto:vivekvpandya at gmail.com]
Sent: Friday, July 08, 2016 9:26 PM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com<mailto:c_plawre at
qca.qualcomm.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>; Hal Finkel
<hfinkel at anl.gov<mailto:hfinkel at anl.gov>>; Tim Amini Golling
<mehdi.amini at apple.com<mailto:mehdi.amini at apple.com>>
Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation, question



On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:
Vivek,
           IIUC it seems that we need two pieces of information to do IPRA,
1. what registers the callee clobbers
2. what the callee does to the call-graph
Yes I think this is enough, but in your case we don't require #2

And it is #2 that we are missing when we define an external function,
Even when we declare it with a preserves or a regmask attribute,

Because I think that once the attribute has an effect at the IR/MI level, we
can just parse it, populate the register usage information vector for the
declared function, and then propagate the reg mask to each call site
encountered. But I am not sure whether it will be easy to get a new attribute
working, or whether we may need to hack clang for that too.

I would also like to have thoughts from my mentors (Mehdi Amini and Hal Finkel)
about this.
So what I / we need is another attribute that says this is a leaf function,
At least in my case all I’m really concerned with are leaf functions

I am starting with a simple function declaration which has a custom attribute.

-Vivek

Thoughts ?


--Peter Lawrence.



From: vivek pandya [mailto:vivekvpandya at gmail.com<mailto:vivekvpandya at
gmail.com>]
Sent: Friday, July 08, 2016 10:24 AM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com<mailto:c_plawre at
qca.qualcomm.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>
Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation, question



On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at
gmail.com<mailto:vivekvpandya at gmail.com>> wrote:


On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:
Vivek,
I am looking into these function attributes in the clang docs
                preserve_most
                preserve_all
They are not available in the 3.6.2 that I am currently using, but I hope they
exist in 3.8

These should provide enough info to solve my problem,
at the MC level calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info,

Yes, I believe that preserve_most or preserve_all should help you even without
IPRA. But note that IPRA can help even further. For example, on X86 the
preserve_most cc does not preserve R11 (this can be verified from
X86CallingConv.td and X86RegisterInfo.cpp); however, IPRA calculates the
regmask based on actual register usage, so if a procedure with the
preserve_most cc does not use R11, and no call site inside the function body
clobbers it either, then IPRA will mark R11 as preserved. The RegMask that IPRA
produces is also a superset of the RegMask implied by the calling convention.

I believe that __attribute__ ((registermask = ....)) can provide more
flexibility compared to the preserve_all or preserve_most CC in some cases, so
I believe that we should try it out.

-Vivek

This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers),
at least that’s how I read the MC dumps, every call instruction seems to have
every caller-save register flagged as “imp-def”, IE implicitly-defined by the
instruction,
and hopefully what is considered a caller-save register at a call-site is
defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up
analysis.

Yes that is expected help from IPRA.

Which leads me to this question: when compiling an entire whole program at one
time, so there is no linking and no LTO, will there ever be IPRA that works
within LLC for this scenario, and is this an objective of your project, or are
you focusing only on LTO ?
The current IPRA infrastructure works at compile time, so its scope of
optimization is restricted to a compilation unit. IPRA can only construct
correct register usage information if the procedure's code is generated by the
same compiler instance, which means we can't optimize library calls or
procedures defined in another module. This is because we can't keep register
usage information across two different compiler instances.

Now if we consider LTO, it eliminates the above limitation by making one large
IR module from smaller modules before generating code, and thus we can have
register usage information (at least) for procedures which were previously
defined in another module, because with LTO everything is in one module. That
also clarifies that IPRA does not do anything at link time.

Now coming to LLC, it can use IPRA and optimize for functions defined in the
current module. So yes, while compiling a whole program (a single huge .bc
file) IPRA can be used with LLC. Also note that if the software is written in
separate files per module (which is very common) and you still want to maximize
the benefits of IPRA, then you can use the llvm-link tool to combine several
.bc files into one huge .bc file and use that with LLC to get maximum benefit.

I know this is not the typical “linux” scenario (dynamic linking of not only
standard libraries,
but also sometimes even application libraries, and lots of static linking
because of program
size), but it is a typical “embedded” scenario, which is where I am currently.

I don't understand this use case, but we can make further improvements in
IPRA. For example, if you have several libraries which have already been
compiled and codegen'ed, but you are able to provide register usage
information for the functions of those libraries, then we can think about an
approach where we store register usage information in a file (which will
obviously increase compile time) and use that information across different
compiler instances, so that register usage information is available without
having the actual code while compiling.

Other thoughts or comments ?

I am looking for ideas that can improve the current IPRA. So if you think of
anything relevant, please let me know, and we can discuss and implement the
feasible ideas.

Thanks,
Vivek

--Peter Lawrence.


From: vivek pandya [mailto:vivekvpandya at gmail.com<mailto:vivekvpandya at
gmail.com>]
Sent: Wednesday, July 06, 2016 2:09 PM
To: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>; Lawrence, Peter
<c_plawre at qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>>
Subject: Re:[llvm-dev] IPRA, interprocedural register allocation, question

Hello Peter,

Thanks for pointing out this interesting case.
Vivek,
I have an application where many of the leaf functions are hand-coded
assembly language, because they use special IO instructions that only the
assembler knows about. These functions typically don't use any registers
besides the incoming argument registers, i.e. they don't need to use any
additional callee-save or caller-save registers.

If the inline asm template specifies its clobber list properly, then IPRA is
able to use that information and propagates the correct register mask (which
also means that omitting the clobber list while IPRA is enabled may break the
executable). For example, in the following code:
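int gcd( int a, int b ) {
    int result ;
    /* Compute Greatest Common Divisor using Euclid's Algorithm */
    /* Note: idivl implicitly reads and writes EDX:EAX, which this snippet
       neither initializes nor lists as clobbered, so treat it as an
       illustration of the clobber list rather than as a working GCD. */
    __asm__ __volatile__ ( "movl %1, %%r15d;"
                          "movl %2, %%ecx;"
                          "CONTD: cmpl $0, %%ecx;"
                          "je DONE;"
                          "xorl %%r13d, %%r13d;"
                          "idivl %%ecx;"
                          "movl %%ecx, %%r15d;"
                          "movl %%r13d, %%ecx;"
                          "jmp CONTD;"
                          "DONE: movl %%r15d, %0;"
                          : "=g" (result)            /* output */
                          : "g" (a), "g" (b)         /* inputs */
                          : "ecx", "r13", "r15"      /* clobber list */
    );

    return result ;
}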
IPRA calculates and propagates the correct regmask, in which it marks CH, CL,
ECX .. as clobbered. R13 and R15 are not marked as clobbered because they are
callee-saved and the LLVM code generator inserts spill/restore code for them.

Is there any way in your IPRA interprocedural register allocation project that
the user can supply this information for external functions ?
By “external”, do you mean a function defined in a module other than the one
in which it is used? In that case, since IPRA can operate on only one module at
a time, register usage propagation is not possible. But there is a workaround
for this problem: you can use IPRA with link time optimization enabled. The way
LLVM LTO works, it creates one big IR module out of the source files and then
optimizes and codegens it, so in that case IPRA can have actual register usage
info (if the function will be compiled in the current module).

If you want to experiment with IPRA, please apply the patch at
http://reviews.llvm.org/D21395 before you begin.

-Vivek

Perhaps using some form of __attribute__ ?
Maybe __attribute__ ((registermask = ....))  ?


--Peter Lawrence.


Mehdi Amini via llvm-dev

2016-Jul-12 19:30 UTC


[llvm-dev] IPRA, interprocedural register allocation, question

> On Jul 12, 2016, at 12:20 PM, Lawrence, Peter <c_plawre at
qca.qualcomm.com> wrote:
> 
> Mehdi,
> I am looking for an understanding of 1) IPRA in general, and 2) IPRA in LLVM.
> Whether I want to use LTO or not is a separate issue.
>
> 1) I currently believe it is a true statement that:
>     If all external functions are known not to call back into the
>     “whole-program” being compiled, then IPRA is free to do anything at all
>     to the functions being compiled: it is not limited to “upgrade”
>     calling-convention changes, but may also make “downgrade”
>     calling-convention changes.
>  
> Do you think my current belief #1 is correct ?
Yes, with some extra assumptions (you don’t use dlsym, for instance, and you
won’t link to another file with a global initializer that can call any of
these).

I expressed this earlier (including the other issues I mentioned just before)
as “we can turn the linkage of every function into local” (or private, or
static, whichever denomination you prefer).

> 2) It seems that LLVM currently limits itself to “upgrade” calling-convention
> changes. The reason is that not all call sites are then required to change,
> so calls through function pointers can keep using the default calling
> convention if, for example, there is insufficient analysis to know for sure
> which functions can be called from that site.
>  
> Is my understanding #2 of IPRA in LLVM correct ?

I don’t believe this is correct: currently IPRA limits itself to this only for
functions that can be called from another module.
It will freely change the calling convention, including downgrades, when it
knows that it can see all call sites (plus extra conditions, like no recursion
being involved, I think).

>  
> “whole-program” here is a misnomer since there are external functions, but I
> don’t have a better term for this.
I believe you can talk about the “main module”, i.e. the module that defines
the entry point for the program.
Note that LLVM can’t make assumptions about the lack of dlsym() or of global
initializers in other modules, for example, so the linkage type of a function
is what tells us whether a call back is possible.


— 
Mehdi


>  
> “upgrades” means some scratch regs are converted to save
> (the callee either doesn’t touch them at all, or does do save/restore)
> “downgrades” means some save regs are converted to scratch
>                 (the callee no longer does save/restore to some registers,
and does clobber them)
>  
>  
>  
>  
>  
>  
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com] 
> Sent: Monday, July 11, 2016 8:41 PM
> To: Lawrence, Peter <c_plawre at qca.qualcomm.com>
> Cc: vivek pandya <vivekvpandya at gmail.com>; llvm-dev <llvm-dev
at lists.llvm.org>; llvm-dev-request at lists.llvm.org; Hal Finkel
<hfinkel at anl.gov>
> Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
>  
> 
> 
> Sent from my iPhone
> 
> On Jul 11, 2016, at 7:48 PM, Lawrence, Peter <c_plawre at
qca.qualcomm.com <mailto:c_plawre at qca.qualcomm.com>> wrote:
> 
> Mehdi,
>             I’m compiling embedded applications which are small enough to
do
> whole-program-compilation. There’s no advantage in breaking them up into
> separate compilation pieces and linking them, even though in source form
> they are composed of a couple of separate source files.
>  
> Ok, so LTO case basically.
>  
> 
> 
>  
> So for me the compilation unit is always the entire program (and includes
main())
> Except for some hand-coded-assembly-language support functions that are
“external”
> to the compilation unit and in my case never call back into the compilation
unit,
> IE they are always “leaf” functions from the point of view of the
compilation unit’s call-graph.
>  
> Hence I would like a clang function attribute that says this function is
“leaf”
> So that IPRA can know that none of the functions it is compiling is ever
called
> From outside this compilation unit.
>  
> I believe the usual (and best way from the compiler point of view) way to
address your particular scenario is to have a proper export list and use LTO.
> For instance if you never call into the program from one of your hand-coded
assembly routines, LTO should be able to turn every global functions/variables
into local ones.
>  
>  
>  
>  
> And I apologize to everyone for confusingly using the term “compilation
unit”
> When I meant “whole program”.
>  
>  
> Yes I am aware of the fact that if you change a function’s calling
convention
> By converting some scratch regs into save regs (for example because they
aren’t even touched)
> Then you are safe to call it from either the default calling convention or
the
> Optimized calling convention.   This is the safe thing to do, and is why I
will
> Only use “preserves_most” and “preserves_all” optimized calling
conventions,
> As those will have been implemented by a back-end writer who is aware of
> All these compilations (as opposed to the “registermask=” calling
convention
> Which is much less safe)
>  
> I do however feel that IPRA in the whole-program case should not be
restricted to
> Only scratch-becoming-save changes, I don’t have any data to support the
notion,
> But it begs to be investigated, unless someone can somehow prove that it
can’t help
> Performance.
>  
> Beside an attribute on declarations, what do you suggest exactly?
>  
>  
> -- 
> Mehdi
> 
> 
>  
>  
>  
> From: mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>
[mailto:mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>]
> Sent: Monday, July 11, 2016 7:06 PM
> To: Lawrence, Peter <c_plawre at qca.qualcomm.com <mailto:c_plawre at
qca.qualcomm.com>>
> Cc: vivek pandya <vivekvpandya at gmail.com <mailto:vivekvpandya at
gmail.com>>; llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev
at lists.llvm.org>>; llvm-dev-request at lists.llvm.org
<mailto:llvm-dev-request at lists.llvm.org>; Hal Finkel <hfinkel at
anl.gov <mailto:hfinkel at anl.gov>>
> Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
>  
>  
> On Jul 11, 2016, at 6:45 PM, Lawrence, Peter <c_plawre at
qca.qualcomm.com <mailto:c_plawre at qca.qualcomm.com>> wrote:
>  
> Vivek,
>           Here’s the way I see it, let me know if you agree or disagree,
>  
> You cannot optimize a function’s calling convention (register-usage) unless
> You can see and change every caller,
>  
> That’s true only if you want to “downgrade” the guarantees, i.e. if you
want to reduce the callee-saved registers.
> You can freely provide more information to limit the amount of caller-saved
registers to a partial list of call-sites, which is in practice changing the
“local" calling convention while keeping it compatible with the public one.
>  
> 
> 
> 
> and you only know this for non-static functions
> if you know that all calls to external functions cannot call back into the
current
> compilation unit.
>  
> I’m not sure why you consider calls to external functions and call back? If
you don’t see main() (the common case) you don’t need a call to an external
function to have a possible call to an externally visible function in the
current module.
> 
> 
> 
>  
> #1 gives you the info necessary to change the call-site to the external
function
>  
> So you don’t need #2 to do RA around the call-site to the external
function, instead
> You need #2 before you can change any non-static function’s calling
convention
> within the current compilation unit, assuming you have this information for
all
> external functions.
>  
> If I understand the case you have in mind, it is only when you see the
main() function in the current module and you’re trying to prove that an
externally visible function could not be called from outside the module
basically?
>  
> It seems to me that this is a bit orthogonal to IPRA: multiple
optimizations (IPRA included) work best when functions are deduced local,
non-recursive, are not tail called (for IPRA in particular), and don’t have
their address taken.
> The “infer-func-attr” and “globalopt” passes try to do their best to make
this happen, especially during LTO.
>  
> The attribute case that Vivek is adding seems more murky though.
>  
> — 
> Mehdi
>  
> 
> 
> 
>
> To be more concrete, let foo() be a non-static function in the current
compilation
> Unit,  any calls to foo() from external functions will have to use the
“default”
> Calling convention, so foo’s calling convention cannot be changed.  We have
to
> Know that none of the external functions can call-back to the compilation
unit
> (they are “leaf” functions relative to the compilation unit) before we can
change
> Foo()’s calling convention.
>  
>  
> Also, the issue of escaping-pointer-to-function is made clear by the
example
> Of the atexit() and exit() library functions,  IE even static functions can
end up
> Being called by external functions.  So exit() can never be declared
“leaf”, and
> To get the benefit of IPRA it needs to be within the compilation unit,
either
> By whole-program compilation or by LTO, if it is used.
>  
>  
> --Peter.
>  
>  
>  
>  
>  
>  
> From: vivek pandya [mailto:vivekvpandya at gmail.com
<mailto:vivekvpandya at gmail.com>]
> Sent: Friday, July 08, 2016 9:26 PM
> To: Lawrence, Peter <c_plawre at qca.qualcomm.com <mailto:c_plawre at
qca.qualcomm.com>>
> Cc: llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at lists.llvm.org
<mailto:llvm-dev-request at lists.llvm.org>; Hal Finkel <hfinkel at
anl.gov <mailto:hfinkel at anl.gov>>; Tim Amini Golling <mehdi.amini
at apple.com <mailto:mehdi.amini at apple.com>>
> Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation,
question
>  
>  
>  
> On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com <mailto:c_plawre at qca.qualcomm.com>> wrote:
> Vivek,
>            IIUC it seems that we need two pieces of information to do IPRA,
> 1. what registers the callee clobbers
> 2. what the callee does to the call-graph
> Yes I think this is enough, but in your case we don't require #2 
>  
> And it is #2 that we are missing when we define an external function,
> Even when we declare it with a preserves or a regmask attribute,
>  
> Because I think  once we have effect of attribute at IR/MI level then we
can just parse it and populate register usage information vector for declared
function and then we can propagate reg mask on each call site encountered.
> But I am not user will it be easy to get new attribute working or we may
need to hack clang for that too.
>  
> I would also like to have thoughts from my mentors (Mehdi Amini and Hal
Finkel) about this.
> So what I / we need is another attribute that says this is a leaf function,
> At least in my case all I’m really concerned with are leaf functions
>  
> I am stating with a simple function  declaration which have a custom
attribute.
>  
> -Vivek
>  
> Thoughts ?
>  
>  
> --Peter Lawrence.
>  
>  
>  
> From: vivek pandya [mailto:vivekvpandya at gmail.com
<mailto:vivekvpandya at gmail.com>]
> Sent: Friday, July 08, 2016 10:24 AM
> To: Lawrence, Peter <c_plawre at qca.qualcomm.com <mailto:c_plawre at
qca.qualcomm.com>>
> Cc: llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at lists.llvm.org
<mailto:llvm-dev-request at lists.llvm.org>
> Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation,
question
>  
>  
>  
> On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at gmail.com
<mailto:vivekvpandya at gmail.com>> wrote:
>  
>  
> On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com <mailto:c_plawre at qca.qualcomm.com>> wrote:
> Vivek,
>              I am looking into these function attributes in the clang docs
>                 Preserve_most
>                 Preserve_all
> They are not available in the 3.6.2 that I am currently using, but I hope
they exist in 3.8
>  
> These should provide enough info to solve my problem,
> at the MC level calls to functions with these attributes
> with be code-gen’ed  through different “calling conventions”,
> and CALL instructions to them should have different register USE and DEF
info,
>  
> Yes I believe that preserve_most or preserve_all should help you even with
out IPRA. But just to note IPRA can even help further for example on X86
preserve_most cc will not preserve R11 (this can be verified from
X86CallingConv.td and X86RegisterInfo.cpp) how ever IPAR calculates regmask
based on the actual register usage and if procedure with preserve_most cc does
not use R11 and none callsite inside of function body then IPRA will mark R11 as
preserved. Also IPRA produces RegMask which is super set of RegMask due to
calling convention.
>  
> I believe that __attribute__ ((registermask = ....))  can provide more
flexibility compare to preserve_all or preserve_most CC in some case. So believe
that we should try it out.
>  
> -Vivek
>  
> This CALL instruction register USE and DEF info should already be useful
> to the intra-procedural register allocator (allowing values live across
these
> calls to be in what are otherwise caller-save registers),
> at least that’s how I read the MC dumps, every call instruction seems to
have
> every caller-save register flagged as “imp-def”, IE implicitly-defined by
the instruction,
> and hopefully what is considered a caller-save register at a call-site is
defined by the callee.
> And this should be the information that IPRA takes advantage of in its
bottom-up analysis.
>  
> Yes that is expected help from IPRA. 
>  
> Which leads me to this question, when compiling an entire whole program at one time,
> so there is no linking and no LTO, will there ever be IPRA that works within LLC for this scenario,
> and is this an objective of your project, or are you focusing only on LTO ?
>  
> The current IPRA infrastructure works at compile time, so its scope of
> optimization is restricted to a compilation unit. IPRA can only construct
> correct register usage information if the procedure's code is generated by the
> same compiler instance, which means we can't optimize library calls or
> procedures defined in another module. This is because we can't keep register
> usage information across two different compiler instances.
>  
> Now if we consider LTO, it eliminates the above limitation by building one large
> IR module from the smaller modules before generating code, so we can have register
> usage information (at least) for procedures that were previously defined in another
> module, because with LTO everything is in one module. That also
> clarifies that IPRA does not do anything at link time.
>  
> Now coming to LLC, it can use IPRA and optimize for functions defined in the
> current module. So yes, while compiling a whole program (a single huge .bc file)
> IPRA can be used with LLC. Also note that if the software is written as
> separate files per module (which is very common) and you still want to maximize
> the benefits of IPRA, you can use the llvm-link tool to combine several .bc files
> into one huge .bc file and use that with LLC to get the maximum benefit.
>  
> I know this is not the typical “linux” scenario (dynamic linking of not only standard libraries,
> but also sometimes even application libraries, and lots of static linking because of program
> size), but it is a typical “embedded” scenario, which is where I am currently.
>  
> I don't understand this use case, but we can have further improvements in
> IPRA. For example, if you have several libraries which have already been compiled
> and code-generated, but you are able to provide register usage information for
> the functions of those libraries, then we can think about an approach where we
> store register usage information in a file (which will obviously increase
> compile time) and use that information across different compiler instances, so
> that we can provide register usage information without having the actual code
> while compiling.
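
To make the "register usage information in a file" idea concrete, here is a minimal C++ sketch of one possible shape for it. Nothing like this exists in the thread or, as far as this sketch assumes, in LLVM at the time: the `ClobberSummary` type, the one-line-per-function text format, and all function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical summary: function name -> bitmask of registers it may clobber.
using ClobberSummary = std::map<std::string, std::uint32_t>;

// Write one "name mask" pair per line (format invented for this sketch).
std::string serialize(const ClobberSummary& summary) {
  std::ostringstream out;
  for (const auto& entry : summary) {
    // Names containing spaces would break the whitespace-separated format.
    assert(entry.first.find(' ') == std::string::npos);
    out << entry.first << ' ' << entry.second << '\n';
  }
  return out.str();
}

// Read the summary back, e.g. in a later compiler instance.
ClobberSummary deserialize(const std::string& text) {
  ClobberSummary summary;
  std::istringstream in(text);
  std::string name;
  std::uint32_t mask = 0;
  while (in >> name >> mask) summary[name] = mask;
  return summary;
}

// A callee missing from the summary must fall back to the conservative
// calling-convention clobber set supplied by the caller of this helper.
std::uint32_t clobbersAtCall(const ClobberSummary& summary,
                             const std::string& callee,
                             std::uint32_t defaultClobbers) {
  auto it = summary.find(callee);
  return it == summary.end() ? defaultClobbers : it->second;
}
```

The conservative fallback in `clobbersAtCall` is the important design point: a stale or incomplete summary should only cost performance, never correctness.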
>  
> Other thoughts or comments ?
>  
> I am looking for ideas that can improve the current IPRA. So if you think of
> anything relevant, please let me know; we can discuss it and implement the
> feasible ideas.
>  
> Thanks,
> Vivek  
>  
> --Peter Lawrence.
>  
>  
> From: vivek pandya [mailto:vivekvpandya at gmail.com]
> Sent: Wednesday, July 06, 2016 2:09 PM
> To: llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org; Lawrence, Peter <c_plawre at qca.qualcomm.com>
> Subject: Re:[llvm-dev] IPRA, interprocedural register allocation, question
>  
> Hello Peter,
>  
> Thanks for pointing out this interesting case.
> Vivek,
>           I have an application where many of the leaf functions are
> Hand-coded assembly language,  because they use special IO instructions
> That only the assembler knows about.  These functions typically don't
> Use any registers besides the incoming argument registers, IE they don't
> Need to use any additional callee-save nor caller-save registers.
> 
> If the inline asm template specifies its clobber list properly, then IPRA is
> able to use that information and propagates the correct register mask (which
> also means that omitting the clobber list while IPRA is enabled may break the
> executable).
> For example, in the following code:
> int gcd( int a, int b ) {
>     int result ;
>     /* Compute Greatest Common Divisor using Euclid's Algorithm */
>     __asm__ __volatile__ ( "movl %1, %%r15d;"
>                           "movl %2, %%ecx;"
>                           "CONTD: cmpl $0, %%ecx;"
>                           "je DONE;"
>                           "xorl %%r13d, %%r13d;"
>                           "idivl %%ecx;"
>                           "movl %%ecx, %%r15d;"
>                           "movl %%r13d, %%ecx;"
>                           "jmp CONTD;"
>                           "DONE: movl %%r15d, %0;"
>                           : "=g" (result) : "g" (a), "g" (b)
>                           : "ecx", "r13", "r15"
>     );
>  
>     return result ;
> }
> IPRA calculates and propagates the correct regmask, in which it marks CH, CL,
> ECX, etc. as clobbered. R13 and R15 are not marked clobbered because they are
> callee-saved, and the LLVM code generator inserts spill/restore code for them.
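
The rule described above (asm clobbers that are callee-saved stay "preserved" because the prologue/epilogue saves and restores them) can be sketched in a few lines of C++. This is an illustration only, not LLVM code: `visibleClobbers` is a hypothetical helper, registers are plain strings, and sub-/super-register aliasing (the reason CH and CL appear next to ECX in the real mask) is deliberately ignored.

```cpp
#include <set>
#include <string>

using RegNames = std::set<std::string>;

// Registers a caller must assume are destroyed by the call: everything the
// inline asm clobbers, minus the callee-saved registers, since those are
// spilled in the prologue and restored in the epilogue of the function.
RegNames visibleClobbers(const RegNames& asmClobbers,
                         const RegNames& calleeSaved) {
  RegNames visible;
  for (const std::string& reg : asmClobbers) {
    if (calleeSaved.count(reg) == 0) visible.insert(reg);
  }
  return visible;
}
```

Called with the clobber list from the gcd example (`ecx`, `r13`, `r15`) and the x86-64 System V callee-saved set (`rbx`, `rbp`, `r12` through `r15`), only `ecx` remains visible to callers.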
>  
> Is there any way in your IPRA interprocedural register allocation project that
> The user can supply this information for external functions ?
>  
> By “external”, do you mean a function defined in a module other than the one
> in which it is used? In that case, since IPRA can operate on only one module at a
> time, register usage propagation is not possible. But there is a workaround for
> this problem. You can use IPRA with link time optimization enabled: the way
> LLVM LTO works, it creates one big IR module out of the source files and then
> optimizes and code-generates it, so in that case IPRA can have the actual register
> usage info (if the function is compiled in the current module).
>  
> In case you want to experiment with IPRA, please apply this patch before you
> begin: http://reviews.llvm.org/D21395
>  
> -Vivek
>  
> Perhaps using some form of __attribute__ ?
> Maybe __attribute__ ((registermask = ....))  ?
> 
> 
> --Peter Lawrence.

Lawrence, Peter via llvm-dev

2016-Jul-12 19:55 UTC

[llvm-dev] IPRA, interprocedural register allocation, question

Mehdi,
            In my mind at least, “whole program” means no dynamic libraries, so the only
external functions are simple runtime support, do you have a suggested term for that ?

--Peter.



From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Tuesday, July 12, 2016 12:31 PM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com>
Cc: vivek pandya <vivekvpandya at gmail.com>; llvm-dev <llvm-dev at
lists.llvm.org>; llvm-dev-request at lists.llvm.org; Hal Finkel <hfinkel
at anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question


On Jul 12, 2016, at 12:20 PM, Lawrence, Peter <c_plawre at qca.qualcomm.com> wrote:

Mehdi,
             I am looking for an understanding of   1) IPRA in general,   2)
IPRA in LLVM.
Whether I want to use LTO or not is a separate issue.

1)  I currently believe it is a true statement that:
                If all external functions are known to not call back into the “whole-program”
                being compiled, then IPRA is free to do anything at all to the functions being
                compiled, not limited to only “upgrades” calling convention changes, but
                also allowing “downgrades” calling convention changes as well.

Do you think my current belief #1 is correct ?

Yes, with some extra assumptions (you don’t use dlsym for instance, and you
won’t link to another file with a global initializer that can call any of
these).

I expressed this earlier (which includes the other issues I mentioned just
before) as “we can turn the linkage of every function into local” (or private,
or static, whatever denomination you prefer).



2) it seems that LLVM currently limits itself to “upgrades” calling convention changes,
The reason being so that not all call sites are required to be changed,
therefore calls through function pointers can use the default calling convention
if for example there is insufficient analysis to know for sure what functions can be
called from that site.

Is my understanding #2 of IPRA in LLVM correct ?


I don’t believe this is correct: currently IPRA limits itself to this only for
functions that can be called from another module.
It will freely change the calling convention, including downgrades, when it knows
that it can see all call sites (plus extra conditions, like no recursion being
involved, I think).
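
The conditions stated in this reply can be written down as a small predicate. The C++ below is only a sketch of the rule as described here, not of what LLVM actually implements: `FuncInfo`, `Freedom`, and `allowedChange` are hypothetical names, and treating an address-taken function as "not all call sites visible" is an assumption drawn from the function-pointer concern in the question.

```cpp
// Hypothetical per-function facts an interprocedural pass might collect.
struct FuncInfo {
  bool hasLocalLinkage;  // cannot be called from another module
  bool addressTaken;     // may be reached through a function pointer (assumed to matter)
  bool isRecursive;      // part of a call-graph cycle
};

enum class Freedom {
  UpgradeOnly,  // may only preserve more registers than the default convention
  AnyChange     // may also stop preserving registers ("downgrade")
};

// Downgrades are only safe when every call site can be rewritten to match,
// which requires seeing all of them; recursion is excluded per the reply above.
Freedom allowedChange(const FuncInfo& f) {
  const bool seesAllCallSites = f.hasLocalLinkage && !f.addressTaken;
  if (seesAllCallSites && !f.isRecursive) return Freedom::AnyChange;
  return Freedom::UpgradeOnly;
}
```

An upgrade is always safe because a caller that assumes the default convention simply saves more than it needs to.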



“whole-program” here is a misnomer since there are external functions, but I
don’t have a better term for this.

I believe you can talk about the “main module”, i.e. the module that defines the
entry point for the program.
Note that LLVM can’t make assumptions about the absence of dlsym() or of global
initializers in other modules, for example, so the linkage type of a function is
what tells us whether it can be called back or not.


—
Mehdi




“upgrades” means some scratch regs are converted to save
                (the callee either doesn’t touch them at all, or does do save/restore)
“downgrades” means some save regs are converted to scratch
                (the callee no longer does save/restore to some registers, and does clobber them)
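
The two definitions above amount to a set comparison between the default preserved-register mask and the mask a function ends up with. A minimal C++ sketch follows; the 16-register width, the `Change` enum, and `classify` are hypothetical and exist only for this illustration.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kNumRegs = 16;  // hypothetical register count
using RegSet = std::bitset<kNumRegs>;

enum class Change { None, Upgrade, Downgrade, Mixed };

// Compare the registers preserved under the default convention with the
// registers preserved after the change.
Change classify(const RegSet& defaultPreserved, const RegSet& newPreserved) {
  const bool gained = (newPreserved & ~defaultPreserved).any();  // scratch -> save
  const bool lost = (defaultPreserved & ~newPreserved).any();    // save -> scratch
  if (gained && lost) return Change::Mixed;
  if (gained) return Change::Upgrade;
  if (lost) return Change::Downgrade;
  return Change::None;
}
```

A pure upgrade yields a superset of the default mask, which is why callers that still assume the default convention remain correct.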







Lawrence, Peter via llvm-dev

2016-Jul-13 23:24 UTC

[llvm-dev] IPRA, interprocedural register allocation, question

Mehdi,
               I am perusing the 3.8 trunk sources, and don’t find evidence where I
would expect it for LLVM “downgrading” a function’s calling convention.

PrologEpilogInserter() {                                        // “CodeGen/”
     ...
     TFI->determineCalleeSaves() {                              // “Target/XYZ/”
           TargetFrameLowering::determineCalleeSaves() {        // “CodeGen/”
                return <<< some object derived from “*CallingConv.td” >>>;   // “build/lib/Target/XYZ/”
           }
           ...
           SavedRegs.set(Reg);  // to “add” a reg, EG for ‘hasFP’, ETC
           ...
     }
}

The SavedRegs set always starts out with a predefined calling-convention value
that typically comes from “*CallingConv.td”, hence is not function-specific.

The only time SavedRegs.reset() is ever called (which is rare to begin with)
is for target-specific, calling-convention-specific reasons, never
function-specific ones.

Perhaps I’m looking in the wrong place ?

But I think while we both agree that in principle LLVM could “downgrade” a function,
given that it can provably see every call-site to it, it does not seem like this is actually
happening, unless I’m missing something ???


(even if true I’m not claiming we’re missing an important case, I don’t have any
logical arguments either way and don’t have any evidence either way.  I’m just
trying to understand what LLVM actually does or does not do).


--Peter Lawrence.




