thr3ads.net - llvm dev - [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type [Jan 2016]

If this information is useful, please help other people find it:
Share via:

Quentin Colombet via llvm-dev

2016-Jan-15 00:41 UTC

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi,

As part of the effort to bring up GlobalISel, I would like your feedbacks on the
best way to map LLVM IR values into MachineInstr values (virtual registers), in
particular when aggregate types get involved.

I am looking for a long term solution.
Short term is to replicate SDAG solution.


** Context **

The first step of GlobalISel is to translate the LLVM IR into MachineInstr
representation. During this process we need to be able to tell where are the
different LLVM IR values with respect to their machine representation, such that
we can feed the right element to the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to  one SDValue.
But this is not the whole story!


** Problem With Aggregate Types **

* Facts *

- Aggregate types cannot be expressed with a single EVT (Extended Value Type)
- SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be break apart to be represented
by a SDValue.

* Implications *

- Instructions that may accept aggregate type, e.g., load, store, and select,
must be split to handle the different component that compose a Value. (Look for
the use of ComputeValueVTs in the SelecionDAGBuilder and the related loops this
implies).
- Use of aggregate type generates a bunch of MERGE_VALUE nodes, which sole
purpose is to glue all the components that make the aggregate type together.

Therefore, in practice, values with aggregate types are mapped to several
SDValue hidden in MERGE_VALUE nodes.

* Summary *

Values with aggregate type map to a list of SDValue and, consequently, the
handling of the instructions is uneven between the ones that need to support
aggregate type and the ones that do not.


** Possible Solutions **

* A. Replicate SDAG Solution *

Not much to add from the title. One can see that in the sketch of patch for the
IRTranslator (from the original GlobalISel email
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html), we have a
map from a Value to a list of virtual registers.

Pros:
- We know it worked for SDAG.
- We know backends would generate reasonable code from that.

Cons:
- We need to pay (compile time, memory consumption) for the extra logic for
every non-aggregate type.
- Translation is not homogeneous between, say, add and store.
- Premature splitting that need to be fixed later(?), e.g., merge consecutive
stores.


* B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the size of the
register matters for the instructions that handle aggregate type, so we can
still use EVT for virtual registers.

Pros:
- Translation is fast.
- Code is easy to understand.
- No premature splitting for instance for load instructions.

Cons:
- Useless bits (padding) are carried around and may be hard to eliminate.
- Expose weird types to legalization more often.
- Backends may not cope very well with that kind of code.


* C. Ideas *

Any other ideas?


** Feedback Needed **

I want to have your feedbacks on how we should model aggregate types during the
IR translation with respect to the mapping to their machine representation.

This is not a road blocker as, at first, I will replicate SDAG solution, but I
want we have a conversation on what is the best long term solution.

I will file a PR so that we remember we would need to come back to this part of
the implementation of the prototype when we agreed on what the solution should
be and look at productizing the selector.


** Example **

Consider the following LLVM IR input:
%struct.bar = type { i16, i16 }

define void @foo(i16 signext %a, i16 signext %b, %struct.bar* nocapture %addr)
#0 {
entry:
  %tmp = insertvalue %struct.bar undef, i16 %a, 0
  %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
  store %struct.bar %tmp2, %struct.bar* %addr, align 4
  ret void
}

Which is roughly:
struct bar {
  short a;
  short b;
};

void foo(short a, short b, struct bar *addr) {
  struct bar tmp;
  tmp.a = a;
  tmp.b = b;
  *addr = tmp;
}

* Solution A: Replicate SDAG *

Note: (#) is the size of the virtual register.

- Translation:

arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp5(16), tmp6(16) = merge_value a(16), b(16)
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 4

- Legalization:
arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
tmp5(16) = extract_subreg a(32), 0
tmp6(16) = extract_subreg b(32), 0
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 4


* Solution B: One to one mapping Value to Vreg *

- Translation:

arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp(32) = concat a(16), b(16)
store tmp(32), addr

- Legalization:
arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(32) = and arg1(32), 0xFFFF
b(32) = and arg2(32), 0xFFFF
tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers are
32bits we are really only interested in the 16bit low part of both a and b.
store tmp(32), addr ; <— legal

The problem now is how do we legalize that thing with generic code?
I.e., we’ll end up with either load/store sequences (maybe storing i16 is not
even legal) or SHIFT and OR and that may be painful to optimize later.

Thanks,
-Quentin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160114/7264119c/attachment.html>

Hal Finkel via llvm-dev

2016-Jan-15 00:53 UTC

head link

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

----- Original Message -----> From: "Quentin Colombet via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 6:41:57 PM
> Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
MachineInstr translation for aggregate type
> 
> 
> Hi,
> 
> 
> As part of the effort to bring up GlobalISel, I would like your
> feedbacks on the best way to map LLVM IR values into MachineInstr
> values (virtual registers), in particular when aggregate types get
> involved.
> 
> 
> I am looking for a long term solution.
> Short term is to replicate SDAG solution.
> 
> 
> 
> 
> ** Context **
> 
> 
> The first step of GlobalISel is to translate the LLVM IR into
> MachineInstr representation. During this process we need to be able
> to tell where are the different LLVM IR values with respect to their
> machine representation, such that we can feed the right element to
> the related machine instructions.
> 
> 
> Right now, in SelectionDAG, we have a mapping from one Value* to one
> SDValue. But this is not the whole story!
> 
> 
> 
> 
> ** Problem With Aggregate Types **
> 
> 
> * Facts *
> 
> 
> - Aggregate types cannot be expressed with a single EVT (Extended
> Value Type)
> - SDValues are typed with a single EVT.
> 
> 
> Therefore, values with aggregate types need to be break apart to be
> represented by a SDValue.
> 
> 
> * Implications *
> 
> 
> - Instructions that may accept aggregate type, e.g., load, store, and
> select, must be split to handle the different component that compose
> a Value. (Look for the use of ComputeValueVTs in the
> SelecionDAGBuilder and the related loops this implies).
> - Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
> sole purpose is to glue all the components that make the aggregate
> type together.
> 
> 
> Therefore, in practice, values with aggregate types are mapped to
> several SDValue hidden in MERGE_VALUE nodes.
> 
> 
> * Summary *
> 
> 
> Values with aggregate type map to a list of SDValue and,
> consequently, the handling of the instructions is uneven between the
> ones that need to support aggregate type and the ones that do not.
> 
> 
> 
> 
> ** Possible Solutions **
> 
> 
> * A. Replicate SDAG Solution *
> 
> 
> Not much to add from the title. One can see that in the sketch of
> patch for the IRTranslator (from the original GlobalISel email
> http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
> ), we have a map from a Value to a list of virtual registers.
> 
> 
> Pros:
> - We know it worked for SDAG.
> - We know backends would generate reasonable code from that.
> 
> 
> Cons:
> - We need to pay (compile time, memory consumption) for the extra
> logic for every non-aggregate type.
> - Translation is not homogeneous between, say, add and store.
> - Premature splitting that need to be fixed later(?), e.g., merge
> consecutive stores.
> 
> 
> 
> 
> * B. Map Aggregate Type to One (Big) Virtual Register *
> 
> 
> Have one big virtual register for the whole aggregate type. Only the
> size of the register matters for the instructions that handle
> aggregate type, so we can still use EVT for virtual registers.
> 
> 
> Pros:
> - Translation is fast.
> - Code is easy to understand.
> - No premature splitting for instance for load instructions.
> 
> 
> Cons:
> - Useless bits (padding) are carried around and may be hard to
> eliminate.
We could associate some metadata with the register somehow to represent the
padding bits (we already do something like this for memcpys).
> - Expose weird types to legalization more often.
> - Backends may not cope very well with that kind of code.
> 
Given that every backend is going to need major work to truly make this
transition, I'd advise not to worry too much about your last item.

Seems like this second option is better than replicating the SDAG solution, so I
recommend we pursue that.

 -Hal
> 
> * C. Ideas *
> 
> 
> Any other ideas?
> 
> 
> 
> 
> ** Feedback Needed **
> 
> 
> I want to have your feedbacks on how we should model aggregate types
> during the IR translation with respect to the mapping to their
> machine representation.
> 
> 
> This is not a road blocker as, at first, I will replicate SDAG
> solution, but I want we have a conversation on what is the best long
> term solution.
> 
> 
> I will file a PR so that we remember we would need to come back to
> this part of the implementation of the prototype when we agreed on
> what the solution should be and look at productizing the selector.
> 
> 
> 
> 
> ** Example **
> 
> 
> Consider the following LLVM IR input:
> 
> %struct.bar = type { i16, i16 }
> 
> 
> define void @foo(i16 signext %a, i16 signext %b, %struct.bar*
> nocapture %addr) #0 {
> entry:
> %tmp = insertvalue %struct.bar undef, i16 %a, 0
> %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
> store %struct.bar %tmp2, %struct.bar* %addr, align 4
> ret void
> }
> 
> 
> Which is roughly:
> 
> struct bar {
> short a;
> short b;
> };
> 
> 
> void foo(short a, short b, struct bar *addr) {
> struct bar tmp;
> tmp.a = a;
> tmp.b = b;
> *addr = tmp;
> }
> 
> 
> * Solution A: Replicate SDAG *
> 
> 
> Note: (#) is the size of the virtual register.
> 
> 
> - Translation:
> 
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp5(16), tmp6(16) = merge_value a(16), b(16)
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> tmp5(16) = extract_subreg a(32), 0
> 
> tmp6(16) = extract_subreg b(32), 0
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> 
> 
> * Solution B: One to one mapping Value to Vreg *
> 
> 
> - Translation:
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp(32) = concat a(16), b(16)
> store tmp(32), addr
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(32) = and arg1(32), 0xFFFF
> b(32) = and arg2(32), 0xFFFF
> tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers
> are 32bits we are really only interested in the 16bit low part of
> both a and b.
> store tmp(32), addr ; <— legal
> 
> 
> The problem now is how do we legalize that thing with generic code?
> I.e., we’ll end up with either load/store sequences (maybe storing
> i16 is not even legal) or SHIFT and OR and that may be painful to
> optimize later.
> 
> 
> Thanks,
> -Quentin
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Quentin Colombet via llvm-dev

2016-Jan-15 19:37 UTC

head link

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi Hal,

Thanks for your quick reply.
> On Jan 14, 2016, at 4:53 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message -----
>> From: "Quentin Colombet via llvm-dev" <llvm-dev at
lists.llvm.org>
>> To: "llvm-dev" <llvm-dev at lists.llvm.org>
>> Sent: Thursday, January 14, 2016 6:41:57 PM
>> Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
MachineInstr translation for aggregate type
>> 
>> 
>> Hi,
>> 
>> 
>> As part of the effort to bring up GlobalISel, I would like your
>> feedbacks on the best way to map LLVM IR values into MachineInstr
>> values (virtual registers), in particular when aggregate types get
>> involved.
>> 
>> 
>> I am looking for a long term solution.
>> Short term is to replicate SDAG solution.
>> 
>> 
>> 
>> 
>> ** Context **
>> 
>> 
>> The first step of GlobalISel is to translate the LLVM IR into
>> MachineInstr representation. During this process we need to be able
>> to tell where are the different LLVM IR values with respect to their
>> machine representation, such that we can feed the right element to
>> the related machine instructions.
>> 
>> 
>> Right now, in SelectionDAG, we have a mapping from one Value* to one
>> SDValue. But this is not the whole story!
>> 
>> 
>> 
>> 
>> ** Problem With Aggregate Types **
>> 
>> 
>> * Facts *
>> 
>> 
>> - Aggregate types cannot be expressed with a single EVT (Extended
>> Value Type)
>> - SDValues are typed with a single EVT.
>> 
>> 
>> Therefore, values with aggregate types need to be break apart to be
>> represented by a SDValue.
>> 
>> 
>> * Implications *
>> 
>> 
>> - Instructions that may accept aggregate type, e.g., load, store, and
>> select, must be split to handle the different component that compose
>> a Value. (Look for the use of ComputeValueVTs in the
>> SelecionDAGBuilder and the related loops this implies).
>> - Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
>> sole purpose is to glue all the components that make the aggregate
>> type together.
>> 
>> 
>> Therefore, in practice, values with aggregate types are mapped to
>> several SDValue hidden in MERGE_VALUE nodes.
>> 
>> 
>> * Summary *
>> 
>> 
>> Values with aggregate type map to a list of SDValue and,
>> consequently, the handling of the instructions is uneven between the
>> ones that need to support aggregate type and the ones that do not.
>> 
>> 
>> 
>> 
>> ** Possible Solutions **
>> 
>> 
>> * A. Replicate SDAG Solution *
>> 
>> 
>> Not much to add from the title. One can see that in the sketch of
>> patch for the IRTranslator (from the original GlobalISel email
>> http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
>> ), we have a map from a Value to a list of virtual registers.
>> 
>> 
>> Pros:
>> - We know it worked for SDAG.
>> - We know backends would generate reasonable code from that.
>> 
>> 
>> Cons:
>> - We need to pay (compile time, memory consumption) for the extra
>> logic for every non-aggregate type.
>> - Translation is not homogeneous between, say, add and store.
>> - Premature splitting that need to be fixed later(?), e.g., merge
>> consecutive stores.
>> 
>> 
>> 
>> 
>> * B. Map Aggregate Type to One (Big) Virtual Register *
>> 
>> 
>> Have one big virtual register for the whole aggregate type. Only the
>> size of the register matters for the instructions that handle
>> aggregate type, so we can still use EVT for virtual registers.
>> 
>> 
>> Pros:
>> - Translation is fast.
>> - Code is easy to understand.
>> - No premature splitting for instance for load instructions.
>> 
>> 
>> Cons:
>> - Useless bits (padding) are carried around and may be hard to
>> eliminate.
> 
> We could associate some metadata with the register somehow to represent the
padding bits (we already do something like this for memcpy).
The problem I have with metadata is that they can be lost.
That being said, having a mechanism to tell the optimizer what are in a register
would be useful. Right now, to get this kind of information we use the
computeKnownBits analysis and I wonder how could we generalize that so that the
information is cheaper to get and pass along the way.
> 
>> - Expose weird types to legalization more often.
>> - Backends may not cope very well with that kind of code.
>> 
> 
> Given that every backend is going to need major work to truly make this
transition, I'd advise not to worry too much about your last item.
> 
> Seems like this second option is better than replicating the SDAG solution,
so I recommend we pursue that.
My guts tell me that indeed the second option is better, but the lack of actual
data makes me nervous.
Anyway, we don’t have to support aggregates to lay the basic pipeline. And when
the basic pipeline is set, we can make actual measurements!

I’ve filed https://llvm.org/bugs/show_bug.cgi?id=26161 to keep track of that
issue.

Thanks,
-Quentin

> 
> -Hal
> 
>> 
>> * C. Ideas *
>> 
>> 
>> Any other ideas?
>> 
>> 
>> 
>> 
>> ** Feedback Needed **
>> 
>> 
>> I want to have your feedbacks on how we should model aggregate types
>> during the IR translation with respect to the mapping to their
>> machine representation.
>> 
>> 
>> This is not a road blocker as, at first, I will replicate SDAG
>> solution, but I want we have a conversation on what is the best long
>> term solution.
>> 
>> 
>> I will file a PR so that we remember we would need to come back to
>> this part of the implementation of the prototype when we agreed on
>> what the solution should be and look at productizing the selector.
>> 
>> 
>> 
>> 
>> ** Example **
>> 
>> 
>> Consider the following LLVM IR input:
>> 
>> %struct.bar = type { i16, i16 }
>> 
>> 
>> define void @foo(i16 signext %a, i16 signext %b, %struct.bar*
>> nocapture %addr) #0 {
>> entry:
>> %tmp = insertvalue %struct.bar undef, i16 %a, 0
>> %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
>> store %struct.bar %tmp2, %struct.bar* %addr, align 4
>> ret void
>> }
>> 
>> 
>> Which is roughly:
>> 
>> struct bar {
>> short a;
>> short b;
>> };
>> 
>> 
>> void foo(short a, short b, struct bar *addr) {
>> struct bar tmp;
>> tmp.a = a;
>> tmp.b = b;
>> *addr = tmp;
>> }
>> 
>> 
>> * Solution A: Replicate SDAG *
>> 
>> 
>> Note: (#) is the size of the virtual register.
>> 
>> 
>> - Translation:
>> 
>> 
>> 
>> 
>> arg1(32) = copy R0
>> arg2(32) = copy R1
>> addr(32) = copy R2
>> a(16) = truncate arg1(32)
>> b(16) = truncate arg2(32)
>> tmp5(16), tmp6(16) = merge_value a(16), b(16)
>> store tmp5(16), addr(32), 0
>> store tmp6(16), addr(32), 4
>> 
>> 
>> - Legalization:
>> 
>> 
>> 
>> arg1(32) = copy R0
>> arg2(32) = copy R1
>> addr(32) = copy R2
>> tmp5(16) = extract_subreg a(32), 0
>> 
>> tmp6(16) = extract_subreg b(32), 0
>> store tmp5(16), addr(32), 0
>> store tmp6(16), addr(32), 4
>> 
>> 
>> 
>> 
>> * Solution B: One to one mapping Value to Vreg *
>> 
>> 
>> - Translation:
>> 
>> 
>> arg1(32) = copy R0
>> arg2(32) = copy R1
>> addr(32) = copy R2
>> a(16) = truncate arg1(32)
>> b(16) = truncate arg2(32)
>> tmp(32) = concat a(16), b(16)
>> store tmp(32), addr
>> 
>> 
>> - Legalization:
>> 
>> 
>> 
>> arg1(32) = copy R0
>> arg2(32) = copy R1
>> addr(32) = copy R2
>> a(32) = and arg1(32), 0xFFFF
>> b(32) = and arg2(32), 0xFFFF
>> tmp(32) = concat a(32/16), b(32/16) ; <— although the input
registers
>> are 32bits we are really only interested in the 16bit low part of
>> both a and b.
>> store tmp(32), addr ; <— legal
>> 
>> 
>> The problem now is how do we legalize that thing with generic code?
>> I.e., we’ll end up with either load/store sequences (maybe storing
>> i16 is not even legal) or SHIFT and OR and that may be painful to
>> optimize later.
>> 
>> 
>> Thanks,
>> -Quentin
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/4ed9e799/attachment.html>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Jan 2016 - [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Apparently Analagous Threads