Lawrence, Peter via llvm-dev
2016-Jul-09 02:45 UTC
[llvm-dev] IPRA, interprocedural register allocation, question
Vivek,
IIUC, it seems that we need two pieces of information to do IPRA:
1. what registers the callee clobbers
2. what the callee does to the call-graph
And it is #2 that we are missing when we define an external function,
even when we declare it with a preserves or a regmask attribute.
So what I / we need is another attribute that says this is a leaf function.
At least in my case, all I’m really concerned with are leaf functions.
Thoughts ?
--Peter Lawrence.
From: vivek pandya [mailto:vivekvpandya at gmail.com]
Sent: Friday, July 08, 2016 10:24 AM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org
Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation, question
On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_plawre at qca.qualcomm.com> wrote:
Vivek,
I am looking into these function attributes in the clang docs:
preserve_most
preserve_all
They are not available in the 3.6.2 that I am currently using, but I hope they
exist in 3.8.
These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.
Yes, I believe that preserve_most or preserve_all should help you even without
IPRA. But note that IPRA can help even further. For example, on X86 the
preserve_most cc does not preserve R11 (this can be verified from
X86CallingConv.td and X86RegisterInfo.cpp). However, IPRA calculates the regmask
based on the actual register usage, so if a procedure with the preserve_most cc
does not use R11, and no call site inside the function body clobbers it, then
IPRA will mark R11 as preserved. Also, the RegMask that IPRA produces is a
superset of the RegMask implied by the calling convention.
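The superset relationship can be sketched with plain bitmasks. This is only an illustration under stated assumptions: the register numbering, the example calling-convention mask and all function names are made up for this note and are not LLVM's API. The one convention borrowed from LLVM is that a set bit in a register mask means "preserved across the call".

```c
#include <stdint.h>

/* Hypothetical numbering for a handful of x86-64 registers. */
enum { REG_RAX, REG_RCX, REG_RDX, REG_R11, REG_R13, REG_R15, REG_COUNT };

#define BIT(r) (UINT32_C(1) << (r))

/* An invented calling-convention mask (set bit = preserved).
   It is NOT the real preserve_most set; R11 is left out on purpose. */
uint32_t example_cc_mask(void) {
    return BIT(REG_RCX) | BIT(REG_RDX) | BIT(REG_R13) | BIT(REG_R15);
}

/* Everything the calling convention preserves stays preserved (the callee
   saves and restores it), plus every register that neither the callee nor
   anything it calls actually writes. */
uint32_t ipra_regmask(uint32_t cc_mask, uint32_t actually_clobbered) {
    uint32_t all = BIT(REG_COUNT) - 1;
    return cc_mask | (all & ~actually_clobbered);
}

/* True when every register preserved in b is also preserved in a. */
int is_superset(uint32_t a, uint32_t b) { return (a & b) == b; }

int is_preserved(uint32_t mask, int reg) { return (mask & BIT(reg)) != 0; }
```

With a callee that only writes RAX, the computed mask keeps R11 preserved even though the example calling convention alone would not promise that.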
I believe that __attribute__ ((registermask = ....)) can provide more
flexibility compared to the preserve_all or preserve_most CC in some cases, so I
believe that we should try it out.
-Vivek
This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers),
at least that’s how I read the MC dumps, every call instruction seems to have
every caller-save register flagged as “imp-def”, IE implicitly-defined by the
instruction,
and hopefully what is considered a caller-save register at a call-site is
defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up
analysis.
Yes, that is the expected help from IPRA.
Which leads me to this question, when compiling an entire whole program at one
time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC
for this scenario,
and is this an objective of your project, or are you focusing only on LTO ?
The current IPRA infrastructure works at compile time, so its scope of
optimization is restricted to a compilation unit. IPRA can only construct
correct register usage information if the procedure's code is generated by the
same compiler instance, which means we can't optimize library calls or
procedures defined in another module. This is because we can't keep register
usage information across two different compiler instances.
Now if we consider LTO, it eliminates the above limitation by making one large
IR module from smaller modules before generating code, and thus we can have
register usage information (at least) for procedures which were previously
defined in another module, because now with LTO everything is in one module. So
that also clarifies that IPRA does not do anything at link time.
Now coming to LLC: it can use IPRA and optimize for functions defined in the
current module. So yes, while compiling a whole program (a single huge .bc file)
IPRA can be used with LLC. Also note that if the software is written in separate
files per module (which is very common) and you still want to maximize the
benefits of IPRA, then you can use the llvm-link tool to combine several .bc
files into one huge .bc file and use that with LLC to get maximum benefits.
I know this is not the typical “linux” scenario (dynamic linking of not only
standard libraries,
but also sometimes even application libraries, and lots of static linking
because of program
size), but it is a typical “embedded” scenario, which is where I am currently.
I don't understand this use case, but we can improve IPRA further. For example,
if you have several libraries which have already been compiled and
code-generated, but you are able to provide register usage information for the
functions of those libraries, then we can think about an approach where we
store register usage information in a file (which will obviously increase
compile time) and use that information across different compiler instances, so
that we can provide register usage information without having the actual code
while compiling.
Other thoughts or comments ?
I am looking for ideas that can improve the current IPRA. So if you think of
anything relevant, please let me know; we can discuss and implement feasible
ideas.
Thanks,
Vivek
--Peter Lawrence.
From: vivek pandya [mailto:vivekvpandya at gmail.com]
Sent: Wednesday, July 06, 2016 2:09 PM
To: llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org; Lawrence, Peter <c_plawre at qca.qualcomm.com>
Subject: Re:[llvm-dev] IPRA, interprocedural register allocation, question
Hello Peter,
Thanks for pointing out this interesting case.
Vivek,
I have an application where many of the leaf functions are
hand-coded assembly language, because they use special IO instructions
that only the assembler knows about. These functions typically don't
use any registers besides the incoming argument registers, i.e. they don't
need to use any additional callee-save or caller-save registers.
If the inline asm template specifies its clobber list properly, then IPRA is
able to use that information and propagates the correct register mask (which
also means that omitting the clobber list while IPRA is enabled may break the
executable).
For example, in the following code:
int gcd( int a, int b ) {
    int result ;
    /* Compute Greatest Common Divisor using Euclid's Algorithm */
    __asm__ __volatile__ ( "movl %1, %%r15d;"
                           "movl %2, %%ecx;"
                           "CONTD: cmpl $0, %%ecx;"
                           "je DONE;"
                           "xorl %%r13d, %%r13d;"
                           "idivl %%ecx;"
                           "movl %%ecx, %%r15d;"
                           "movl %%r13d, %%ecx;"
                           "jmp CONTD;"
                           "DONE: movl %%r15d, %0;"
                           : "=g" (result)
                           : "g" (a), "g" (b)
                           : "ecx", "r13", "r15"
    );
    return result ;
}
IPRA calculates and propagates the correct regmask, in which it marks CH, CL,
ECX .. as clobbered. R13 and R15 are not marked clobbered because they are
callee-saved and the LLVM code generator inserts spill/restore code for them.
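The outcome described for gcd() can be sketched as a small set computation: expand the asm clobber list to the registers that overlap each clobbered register, then drop the callee-saved registers, since the function saves and restores them itself. The register table is a tiny hypothetical subset and the alias rule is deliberately conservative; none of the names below come from LLVM.

```c
#include <stdint.h>

/* Hypothetical register numbering, just enough for the gcd() example. */
enum { R_CL, R_CH, R_CX, R_ECX, R_RCX, R_R13, R_R15, R_RAX, R_COUNT };

#define BIT(r) (UINT32_C(1) << (r))

/* Every register in this table that shares storage with RCX. */
#define RCX_FAMILY (BIT(R_CL) | BIT(R_CH) | BIT(R_CX) | BIT(R_ECX) | BIT(R_RCX))

/* Conservative alias expansion: touching any member of the RCX family
   marks the whole family as clobbered. */
static uint32_t expand_aliases(uint32_t regs) {
    if (regs & RCX_FAMILY)
        regs |= RCX_FAMILY;
    return regs;
}

/* Registers a caller must assume are overwritten by the call.
   Callee-saved registers are removed because the callee's prologue and
   epilogue restore them before returning. */
uint32_t visible_clobbers(uint32_t asm_clobbers, uint32_t callee_saved) {
    return expand_aliases(asm_clobbers) & ~callee_saved;
}
```

Feeding in the clobber list "ecx", "r13", "r15" with R13 and R15 as callee-saved leaves only the RCX family marked as clobbered, which matches the behaviour described above.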
Is there any way in your IPRA interprocedural register allocation project that
the user can supply this information for external functions ?
By "external", do you mean a function defined in a module other than the one
where it is used? In that case, since IPRA can operate on only one module at a
time, register usage propagation is not possible. But there is a workaround for
this problem: you can use IPRA with link time optimization enabled. The way LLVM
LTO works, it creates one big IR module out of the source files and then
optimizes and code-generates it, so in that case IPRA can have actual register
usage info (if the function is compiled in the current module).
In case you want to experiment with IPRA, please apply the patch at
http://reviews.llvm.org/D21395 before you begin.
-Vivek
Perhaps using some form of __attribute__ ?
Maybe __attribute__ ((registermask = ....)) ?
--Peter Lawrence.
vivek pandya via llvm-dev
2016-Jul-09 04:26 UTC
[llvm-dev] IPRA, interprocedural register allocation, question
On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_plawre at qca.qualcomm.com> wrote:
> Vivek,
> IIUC it seems that we need two pieces of information to do IPRA,
> 1. what registers the callee clobbers
> 2. what the callee does to the call-graph
Yes, I think this is enough, but in your case we don't require #2.
> And it is #2 that we are missing when we define an external function,
> Even when we declare it with a preserves or a regmask attribute,
Because I think once we have the effect of the attribute at IR/MI level, we can
just parse it and populate the register usage information vector for the
declared function, and then we can propagate the reg mask to each call site
encountered. But I am not sure whether it will be easy to get a new attribute
working or whether we may need to hack clang for that too.
I would also like to have thoughts from my mentors (Mehdi Amini and Hal Finkel)
about this.
> So what I / we need is another attribute that says this is a leaf function,
> At least in my case all I’m really concerned with are leaf functions
I am starting with a simple function declaration which has a custom attribute.
-Vivek
> Thoughts ?
> --Peter Lawrence.
vivek pandya via llvm-dev
2016-Jul-11 18:27 UTC
[llvm-dev] IPRA, interprocedural register allocation, question
Dear Peter, Hal and Mehdi,
I hacked around in clang so that I can attach a string attribute to a
function declaration.
So I think that instead of adding a new regmask attribute, it would be better to
use the existing annotate attribute. For example, we can use it as follows:
extern void foo() __attribute__((annotate("REGMASK:R11,R8"))); // here R11, R8 are clobbered regs
This will add the string REGMASK:R11,R8 to the llvm.metadata section, and it
will be tied to function foo via llvm.global.annotations. (This currently works
with function definitions only; work is needed to make it work with
function declarations.) The llvm.metadata should be accessed at IR level
and then it can be parsed to create a regmask out of it.
The parsing will need access to the Module object, and I hope that when parsing
for all such functions, reconnecting the global annotation for a function with
its string value will be simple.
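The string-parsing step could look roughly like the sketch below. It only covers turning the annotation text into a bitmask; how the string is recovered from llvm.global.annotations is not shown. The register table, the bit layout and the function names are assumptions made for illustration, and note that a set bit here means "clobbered", which is the opposite sense of an LLVM register mask.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register table; a real pass would query the target's
   register info instead of a fixed list. */
static const char *const reg_names[] = { "R8", "R9", "R10", "R11" };
enum { NUM_REGS = sizeof reg_names / sizeof reg_names[0] };

/* Returns the table index of the name spelled by name[0..len), or -1. */
static int reg_index(const char *name, size_t len) {
    for (int i = 0; i < NUM_REGS; i++)
        if (strlen(reg_names[i]) == len && strncmp(reg_names[i], name, len) == 0)
            return i;
    return -1;
}

/* Parses text such as "REGMASK:R11,R8" into a bitmask of clobbered
   registers (bit i set = reg_names[i] is clobbered).
   Returns 0 on success, -1 if the prefix is missing or a name is unknown. */
int parse_regmask_annotation(const char *text, uint32_t *clobbered) {
    static const char prefix[] = "REGMASK:";
    const size_t prefix_len = sizeof prefix - 1;
    if (strncmp(text, prefix, prefix_len) != 0)
        return -1;

    uint32_t mask = 0;
    const char *p = text + prefix_len;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");      /* length of the next name */
        int idx = reg_index(p, len);
        if (idx < 0)
            return -1;
        mask |= UINT32_C(1) << idx;
        p += len;
        if (*p == ',')
            p++;                           /* skip the separator */
    }
    *clobbered = mask;
    return 0;
}
```

Rejecting unknown register names outright, rather than ignoring them, seems the safer choice here: a misspelled clobber that is silently dropped would make the caller believe a register is preserved when it is not.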
Another approach would be to add a new regmask attribute. During codegen to IR
this attribute would be lowered to a corresponding string attribute in LLVM IR
(which should also be added), and then a pass would iterate through all
functions which have such an attribute and populate the register usage
container.
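The "iterate and populate" step of that second approach might be modelled as below. This is a stand-alone toy, not an LLVM pass: the structs stand in for function declarations and for the register usage container, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a function declaration as the pass would see it. */
struct func_decl {
    const char *name;
    const char *regmask_attr;   /* attribute text, or NULL when absent */
};

/* Stand-in for one entry of the register usage container. */
struct reg_usage {
    const char *name;
    const char *clobbers;
};

/* Walks every declaration once and records those carrying the attribute.
   Returns the number of entries written (never more than cap). */
size_t collect_reg_usage(const struct func_decl *decls, size_t n,
                         struct reg_usage *out, size_t cap) {
    size_t used = 0;
    for (size_t i = 0; i < n && used < cap; i++) {
        if (decls[i].regmask_attr == NULL)
            continue;                       /* no attribute: nothing to record */
        out[used].name = decls[i].name;
        out[used].clobbers = decls[i].regmask_attr;
        used++;
    }
    return used;
}

/* What a call site would consult; NULL means "no information, fall back
   to the calling convention". */
const char *lookup_clobbers(const struct reg_usage *table, size_t n,
                            const char *callee) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, callee) == 0)
            return table[i].clobbers;
    return NULL;
}
```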
But any idea to simplify this is welcome. Please share your views.
I have cc'ed the clang mailing list so that clang developers can correct me if I
have made any mistake in the context of clang.
Sincerely,
Vivek
Lawrence, Peter via llvm-dev
2016-Jul-12 01:45 UTC
[llvm-dev] IPRA, interprocedural register allocation, question
Vivek,
Here’s the way I see it, let me know if you agree or disagree.
You cannot optimize a function’s calling convention (register usage) unless
you can see and change every caller, and for non-static functions you only know
this if you know that all calls to external functions cannot call back into the
current compilation unit.
#1 gives you the info necessary to change the call-site to the external
function, so you don’t need #2 to do RA around the call-site to the external
function. Instead, you need #2 before you can change any non-static function’s
calling convention within the current compilation unit, assuming you have this
information for all external functions.
To be more concrete, let foo() be a non-static function in the current
compilation unit. Any calls to foo() from external functions will have to use
the “default” calling convention, so foo()’s calling convention cannot be
changed. We have to know that none of the external functions can call back into
the compilation unit (they are “leaf” functions relative to the compilation
unit) before we can change foo()’s calling convention.
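The condition can be written down as a small predicate. This is a sketch of the reasoning only, with invented types and names. The address_taken field is an extra, conservative assumption of this sketch: a function whose pointer escapes is never considered eligible, because its indirect callers cannot be enumerated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Facts about a function defined in the current compilation unit. */
struct function_info {
    bool is_static;       /* internal linkage: not callable by name from outside */
    bool address_taken;   /* a pointer to the function escapes somewhere */
};

/* Facts about an external function the unit calls. */
struct external_info {
    bool is_leaf;         /* promised never to call back into this unit */
};

static bool all_externals_are_leaf(const struct external_info *ext, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!ext[i].is_leaf)
            return false;
    return true;
}

/* True when every caller of f is visible, so its calling convention
   (register usage) may be changed. */
bool can_change_calling_convention(const struct function_info *f,
                                   const struct external_info *ext, size_t n) {
    if (f->address_taken)
        return false;     /* conservative: indirect callers are unknown */
    if (f->is_static)
        return true;      /* only direct callers inside this unit exist */
    /* Non-static: safe only if no external function can call back. */
    return all_externals_are_leaf(ext, n);
}
```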
Also, the issue of an escaping pointer-to-function is made clear by the example
of the atexit() and exit() library functions, i.e. even static functions can end
up being called by external functions. So exit() can never be declared “leaf”,
and to get the benefit of IPRA it needs to be within the compilation unit, either
by whole-program compilation or by LTO, if it is used.
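A minimal illustration of the escaping pointer, assuming nothing beyond the standard C library: cleanup() is static, yet once its address is handed to atexit() the C library will call it during exit(). The helper external_invoke() is a hypothetical stand-in for any other external routine that calls a pointer it is given.

```c
#include <stdlib.h>

static int cleanup_calls = 0;

/* Static, so it cannot be called by name from another compilation unit. */
static void cleanup(void) { cleanup_calls++; }

/* Hypothetical stand-in for an external routine that calls whatever
   function pointer it is handed. */
void external_invoke(void (*callback)(void)) { callback(); }

/* Lets the address of cleanup() escape twice: once to external_invoke(),
   which calls it immediately, and once to atexit(), which makes the C
   library call it again during exit(). Returns atexit()'s result
   (0 on success). */
int register_cleanup(void) {
    external_invoke(cleanup);
    return atexit(cleanup);
}

int cleanup_call_count(void) { return cleanup_calls; }
```

In both cases the caller lives outside the compilation unit and uses the default calling convention, so a compiler that had changed cleanup()'s register usage based only on visible direct calls would break it.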
--Peter.
From: vivek pandya [mailto:vivekvpandya at gmail.com]
Sent: Friday, July 08, 2016 9:26 PM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at
lists.llvm.org; Hal Finkel <hfinkel at anl.gov>; Tim Amini Golling
<mehdi.amini at apple.com>
Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation, question
On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:
Vivek,
IIUC it seems that we need two pieces of information to do IPRA,
1. what registers the callee clobbers
2. what the callee does to the call-graph
Yes I think this is enough, but in your case we don't require #2
And it is #2 that we are missing when we define an external function,
Even when we declare it with a preserves or a regmask attribute,
Because I think once we have effect of attribute at IR/MI level then we can
just parse it and populate register usage information vector for declared
function and then we can propagate reg mask on each call site encountered.
But I am not user will it be easy to get new attribute working or we may need to
hack clang for that too.
I would also like to have thoughts from my mentors (Mehdi Amini and Hal Finkel)
about this.
So what I / we need is another attribute that says this is a leaf function,
At least in my case all I’m really concerned with are leaf functions
I am stating with a simple function declaration which have a custom attribute.
-Vivek
Thoughts ?
--Peter Lawrence.
From: vivek pandya [mailto:vivekvpandya at gmail.com<mailto:vivekvpandya at
gmail.com>]
Sent: Friday, July 08, 2016 10:24 AM
To: Lawrence, Peter <c_plawre at qca.qualcomm.com<mailto:c_plawre at
qca.qualcomm.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>
Subject: Re: Re:[llvm-dev] IPRA, interprocedural register allocation, question
On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at
gmail.com<mailto:vivekvpandya at gmail.com>> wrote:
On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_plawre at
qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>> wrote:
Vivek,
I am looking into these function attributes in the clang docs
Preserve_most
Preserve_all
They are not available in the 3.6.2 that I am currently using, but I hope they
exist in 3.8
These should provide enough info to solve my problem,
at the MC level calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info,
Yes, I believe that preserve_most or preserve_all should help you even without
IPRA. But just to note, IPRA can help even further. For example, on X86 the
preserve_most cc will not preserve R11 (this can be verified from
X86CallingConv.td and X86RegisterInfo.cpp); however, IPRA calculates the regmask
based on actual register usage, so if a procedure with the preserve_most cc does
not use R11, and no call site inside the function body clobbers it, then IPRA will
mark R11 as preserved. Also, the RegMask IPRA produces preserves a superset of the
registers preserved by the RegMask due to the calling convention.
I believe that __attribute__ ((registermask = ....)) can provide more
flexibility compared to the preserve_all or preserve_most CC in some cases. So I
believe that we should try it out.
-Vivek
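As a minimal sketch of the calling-convention route, assuming a Clang version and
target that support `preserve_most` (the macro below falls back to the default
convention elsewhere). `log_event`, `hot_path`, and `event_count` are hypothetical
names chosen for illustration:

```c
/* Use preserve_most only where the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_most)
#    define PRESERVE_MOST __attribute__((preserve_most))
#  endif
#endif
#ifndef PRESERVE_MOST
#  define PRESERVE_MOST
#endif

static int event_count;

/* Rarely taken helper. With preserve_most the callee is responsible for
   saving most registers, so the caller can keep more values in registers
   across the call. */
PRESERVE_MOST void log_event(int code) { event_count += code; }

int hot_path(int a, int b) {
    int sum = a + b;          /* live across the call below */
    if (sum < 0)
        log_event(sum);
    return sum * 2;
}
```

The attribute changes only who saves which registers; the computed results are
the same with or without it.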
This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers),
at least that’s how I read the MC dumps, every call instruction seems to have
every caller-save register flagged as “imp-def”, IE implicitly-defined by the
instruction,
and hopefully what is considered a caller-save register at a call-site is
defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up
analysis.
Yes that is expected help from IPRA.
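The bottom-up idea can be sketched with a toy model. This is not LLVM's
implementation: the `Func` struct, the 16-bit register bitmask, and
`DEFAULT_CLOBBERS` are all assumptions made for illustration, the call graph is
assumed to be acyclic, and callee-saved spill/restore is ignored. Each function's
clobber set is its own clobbers plus those of its callees; a callee whose body is
not visible falls back to the calling convention's caller-saved set.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CALLEES 4
/* Assumed: bits 0..7 model the caller-saved registers of the default CC. */
#define DEFAULT_CLOBBERS ((uint16_t)0x00FF)

typedef struct Func {
    int has_body;                      /* 0 = external, body not visible */
    uint16_t own_clobbers;             /* registers written by this body */
    struct Func *callees[MAX_CALLEES]; /* direct callees */
    size_t num_callees;
    uint16_t clobber_mask;             /* computed result */
    int done;                          /* nonzero once clobber_mask is valid */
} Func;

/* Visit callees first, then combine their masks with the caller's own. */
uint16_t compute_clobbers(Func *f) {
    if (f->done)
        return f->clobber_mask;
    if (!f->has_body) {
        /* Nothing is known about the body: assume the convention's worst case. */
        f->clobber_mask = DEFAULT_CLOBBERS;
    } else {
        uint16_t mask = f->own_clobbers;
        for (size_t i = 0; i < f->num_callees; ++i)
            mask |= compute_clobbers(f->callees[i]);
        f->clobber_mask = mask;
    }
    f->done = 1;
    return f->clobber_mask;
}
```

In this model a single external callee pulls every caller-saved register back
into the caller's clobber set, which is the limitation discussed in this thread.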
Which leads me to this question, when compiling an entire whole program at one
time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC
for this scenario,
and is this an objective of your project, or are you focusing only on LTO ?
The current IPRA infrastructure works at compile time, so its scope of
optimization is restricted to a compilation unit. IPRA can only construct
correct register usage information if the procedure's code is generated by the
same compiler instance, which means we can't optimize library calls or
procedures defined in another module. This is because we can't keep register
usage information across two different compiler instances.
Now if we consider LTO, it eliminates the above limitation by making one large IR
module from smaller modules before generating code, and thus we can have register
usage information (at least) for procedures which were previously defined in another
module, because with LTO everything is in one module. That also
clarifies that IPRA does not do anything at link time.
Now coming to LLC, it can use IPRA and optimize for functions defined in the current
module. So yes, while compiling a whole program (a single huge .bc file) IPRA can
be used with LLC. Also note that if the software is written as separate files
per module (which is very common) and you still want to maximize the benefits of
IPRA, you can use the llvm-link tool to combine several .bc files into one
huge .bc file and use that with LLC to get the maximum benefit.
I know this is not the typical “linux” scenario (dynamic linking of not only
standard libraries,
but also sometimes even application libraries, and lots of static linking
because of program
size), but it is a typical “embedded” scenario, which is where I am currently.
I don't understand this use case, but we can improve IPRA further.
For example, if you have several libraries which have already been compiled and
code-generated, but you are able to provide register usage information for the
functions of those libraries, then we can think about an approach where we
store register usage information in a file (which will obviously increase
compile time) and use that information across different compiler instances, so
that we can provide register usage information without having the actual code while
compiling.
Other thoughts or comments ?
I am looking for ideas that can improve the current IPRA. So if you think of
anything relevant, please let me know; we can discuss and implement feasible
ideas.
Thanks,
Vivek
--Peter Lawrence.
From: vivek pandya [mailto:vivekvpandya at gmail.com<mailto:vivekvpandya at
gmail.com>]
Sent: Wednesday, July 06, 2016 2:09 PM
To: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>; llvm-dev-request at
lists.llvm.org<mailto:llvm-dev-request at lists.llvm.org>; Lawrence, Peter
<c_plawre at qca.qualcomm.com<mailto:c_plawre at qca.qualcomm.com>>
Subject: Re:[llvm-dev] IPRA, interprocedural register allocation, question
Hello Peter,
Thanks for pointing out this interesting case.
Vivek,
I have an application where many of the leaf functions are
Hand-coded assembly language, because they use special IO instructions
That only the assembler knows about. These functions typically don't
Use any registers besides the incoming argument registers, IE they don't
Need to use any additional callee-save nor caller-save registers.
If the inline asm template has specified its clobber list properly, then IPRA is able
to use that information and it propagates the correct register mask (which also
means that omitting the clobber list while IPRA is enabled may break the executable).
For example, in the following code:
int gcd( int a, int b ) {
    int result ;
    /* Compute Greatest Common Divisor using Euclid's Algorithm */
    __asm__ __volatile__ ( "movl %1, %%r15d;"
                           "movl %2, %%ecx;"
                           "CONTD: cmpl $0, %%ecx;"
                           "je DONE;"
                           "xorl %%r13d, %%r13d;"
                           "idivl %%ecx;"
                           "movl %%ecx, %%r15d;"
                           "movl %%r13d, %%ecx;"
                           "jmp CONTD;"
                           "DONE: movl %%r15d, %0;"
                           : "=g" (result)
                           : "g" (a), "g" (b)
                           : "ecx", "r13", "r15"
    );
    return result ;
}
IPRA calculates and propagates the correct regmask, in which it marks CH, CL, ECX ..
as clobbered; R13 and R15 are not marked clobbered because they are callee-saved and
the LLVM code generator inserts spill/restore code for them.
Is there any way in your IPRA interprocedural register allocation project that
The user can supply this information for external functions ?
By external, do you mean a function defined in a module other than the one where it is
used? In that case, as IPRA can operate on only one module at a time, register
usage propagation is not possible. But there is a workaround for this problem.
You can use IPRA with link time optimization enabled: the way LLVM LTO
works, it creates one big IR module out of the source files and then optimizes and
codegens it, so in that case IPRA can have actual register usage info (if the function
is compiled in the current module).
In case you want to experiment with IPRA, please apply the patch at
http://reviews.llvm.org/D21395 before you begin.
-Vivek
Perhaps using some form of __attribute__ ?
Maybe __attribute__ ((registermask = ....)) ?
--Peter Lawrence.