thr3ads.net - llvm dev - [LLVMdev] VLIW Ports [Oct 2011]

Evan Cheng

2011-Oct-26 19:07 UTC

[LLVMdev] VLIW Ports

On Oct 25, 2011, at 1:59 AM, Stripf, Timo wrote:
> Hi all,
> 
>> Ok, so in your proposal a bundle is just a special MachineInstr? That
>> sounds good. How are the MachineInstrs embedded inside a bundle? How are
>> the cumulative operands, implicit register defs and uses represented?
> 
> I attached the packing and unpacking passes I used within my backend. In my
> solution multiple MachineInstrs are packed into one variadic "PACK"
> MachineInstr. The opcode and operands of each original instruction are
> encoded as operands of the PACK instruction: the opcode is added as an
> immediate, followed by the operands of the original instruction, and each
> instruction is terminated by an "EndOfOp" operand. The implicit defs/uses
> are also added to the PACK instruction but are not used for unpacking;
> unpacking reconstructs them from the TargetDescriptionInfo.
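
The operand encoding described above can be illustrated with a minimal sketch. It is not taken from the attached TS1VLIWPacking.cpp or TS1VLIWUnpacking.cpp, and `ToyOperand`, `ToyInstr`, `pack` and `unpack` are hypothetical stand-ins rather than LLVM types. It assumes only what the paragraph states: each instruction contributes its opcode as an immediate, then its operands, then an "EndOfOp" terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT LLVM classes.
struct ToyOperand {
  enum Kind { Reg, Imm, EndOfOp } kind;
  int64_t value;  // register number or immediate; unused for EndOfOp
};

struct ToyInstr {
  unsigned opcode;
  std::vector<ToyOperand> operands;
};

// Flatten a bundle into the operand list of one variadic PACK instruction:
// [opcode as Imm, operands..., EndOfOp], repeated once per instruction.
std::vector<ToyOperand> pack(const std::vector<ToyInstr> &bundle) {
  std::vector<ToyOperand> packed;
  for (const ToyInstr &mi : bundle) {
    packed.push_back({ToyOperand::Imm, static_cast<int64_t>(mi.opcode)});
    for (const ToyOperand &op : mi.operands) {
      assert(op.kind != ToyOperand::EndOfOp && "terminator is reserved");
      packed.push_back(op);
    }
    packed.push_back({ToyOperand::EndOfOp, 0});
  }
  return packed;
}

// Rebuild the original instructions from a PACK operand list.
std::vector<ToyInstr> unpack(const std::vector<ToyOperand> &packed) {
  std::vector<ToyInstr> bundle;
  std::size_t i = 0;
  while (i < packed.size()) {
    assert(packed[i].kind == ToyOperand::Imm && "expected opcode immediate");
    ToyInstr mi{static_cast<unsigned>(packed[i].value), {}};
    ++i;
    while (i < packed.size() && packed[i].kind != ToyOperand::EndOfOp)
      mi.operands.push_back(packed[i++]);
    assert(i < packed.size() && "missing EndOfOp terminator");
    ++i;  // skip the EndOfOp terminator
    bundle.push_back(mi);
  }
  return bundle;
}
```

Because the opcode is identified by its position (the first operand after a terminator), an instruction whose own operands are immediates still round-trips. Implicit defs/uses are left out here, matching the statement that unpacking reconstructs them from the instruction description.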
> 
> I took a look at Evan's packing/unpacking solution and I think it is
> more elegant to use a derived class of MachineInstr for storing multiple
> instructions in one.

Here are my thoughts on instruction bundles.

First, let's talk about the prerequisite for adding a codegen-level IR
extension. A MachineInstr bundle should be generic enough to support the
following: 1) VLIW bundles (where there are no intra-dependencies between
instructions in a bundle), and 2) bundles for other targets where there may be
intra-dependencies between instructions in a bundle. #2 is very important for
the extension to be accepted into LLVM mainline today, since there are no proper
VLIW targets.

Now let's look at the options.

1. Extend MachineInstr to represent a bundle. This can be achieved either with a
derived class or by adding a pointer in MachineInstr that points to the next
instruction in the bundle.
2. Add a bit to MachineInstr that indicates it is part of a bundle / sequence.

The advantage of #1 is that it requires minimal changes to the register
allocator and many other codegen passes. However, that's only true for VLIW
targets with no intra-bundle dependencies. For other targets, or for
optimizations that model a sequence of instructions, this is not true. The
register allocator and scheduler need to know the cumulative properties of a
bundle. For example, the register allocator needs to know what the input
operands and the outputs are. The scheduler needs to know the cumulative
latency of the bundle. Other passes that examine individual instruction
properties (e.g. is it a load / store, control flow) will need to know the
combined properties of the individual instructions in a bundle.
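
As a rough illustration of what "cumulative properties" could mean, here is a minimal sketch using toy types. `ToyInstr`, `BundleSummary` and `summarize` are hypothetical names, not LLVM API. It assumes the members of a bundle execute in order, so a register defined by an earlier member and used by a later one is an internal value rather than a bundle input, and it assumes latencies simply add up; a truly parallel VLIW bundle would more likely take the maximum.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction; NOT llvm::MachineInstr.
struct ToyInstr {
  std::vector<unsigned> defs;  // registers written
  std::vector<unsigned> uses;  // registers read
  unsigned latency;
  bool mayLoad;
  bool mayStore;
};

// What a register allocator or scheduler would see for the whole bundle.
struct BundleSummary {
  std::set<unsigned> inputs;   // registers read from outside the bundle
  std::set<unsigned> outputs;  // registers written by any member
  unsigned latency = 0;
  bool mayLoad = false;
  bool mayStore = false;
};

BundleSummary summarize(const std::vector<ToyInstr> &bundle) {
  assert(!bundle.empty() && "a bundle needs at least one instruction");
  BundleSummary s;
  for (const ToyInstr &mi : bundle) {
    for (unsigned r : mi.uses)
      if (!s.outputs.count(r))  // a use fed by an earlier member is internal
        s.inputs.insert(r);
    for (unsigned r : mi.defs)
      s.outputs.insert(r);
    s.latency += mi.latency;    // assumption: members run back to back
    s.mayLoad = s.mayLoad || mi.mayLoad;
    s.mayStore = s.mayStore || mi.mayStore;
  }
  return s;
}
```

The sketch reports every def as an output, even one that is consumed inside the bundle, which is conservative. Deciding which internal values are dead outside the bundle would need liveness information that this toy model does not have.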

Of course, this is a solvable problem. The pass that combines instructions into
bundles can construct the bundle MachineInstr properly so it presents the right
information. The downside is that this will add memory overhead, and that needs
to be carefully studied.

The advantage of #2 is the low overhead. Adding a bit won't add much, if any,
memory overhead. Packing and unpacking are both very easy. This is especially
good for the register allocator, which can still model register liveness even
when there are intra-bundle dependencies. The downside of #2 is also obvious:
every pass that operates on MachineInstr will have to be aware of bundles. This
is the only real downside that I can think of, but it's a big one.
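
To make option #2 concrete, here is a minimal sketch with toy types. The `bundledWithPrev` flag and the `bundles` and `unpackAll` helpers are hypothetical names chosen for illustration; nothing here is an existing LLVM interface. It shows why packing and unpacking are cheap (they only flip a bit) and also what a bundle-aware pass has to do: walk the instruction list in bundle-sized steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; NOT llvm::MachineInstr.
struct ToyInstr {
  unsigned opcode;
  bool bundledWithPrev;  // set if this instruction continues the previous bundle
};

// Half-open index range [begin, end) covering one bundle.
struct Range {
  std::size_t begin;
  std::size_t end;
};

// Group a flat instruction list into bundles by looking only at the bit.
std::vector<Range> bundles(const std::vector<ToyInstr> &block) {
  assert((block.empty() || !block[0].bundledWithPrev) &&
         "the first instruction cannot continue a bundle");
  std::vector<Range> out;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (i == 0 || !block[i].bundledWithPrev)
      out.push_back({i, i + 1});  // start a new bundle
    else
      out.back().end = i + 1;     // extend the current bundle
  }
  return out;
}

// Unpacking is just clearing the bit on every instruction.
void unpackAll(std::vector<ToyInstr> &block) {
  for (ToyInstr &mi : block)
    mi.bundledWithPrev = false;
}
```

The individual instructions stay visible in the list, so register liveness can still be tracked per instruction, but any pass that moves or deletes instructions has to respect the ranges returned by something like `bundles`.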

Evan

> 
> Best regards,
> Timo Stripf
> 
> -----Original Message-----
> From: Evan Cheng [mailto:evan.cheng at apple.com]
> Sent: Tuesday, October 25, 2011 01:55
> To: Carlos Sánchez de La Lama
> Cc: Stripf, Timo; LLVM Dev
> Subject: Re: [LLVMdev] VLIW Ports
> 
> 
> On Oct 24, 2011, at 2:38 PM, Carlos Sánchez de La Lama wrote:
> 
>> Hi Evan (and all),
>> 
>>> I think any implementation that makes a "bundle" a different entity
>>> from MachineInstr is going to be difficult to use. All of the current
>>> backend passes will have to be taught to know about bundles.
>> 
>> The approach in the patch I sent (and I believe Timo's code works
>> similarly, according to his explanations) is precisely to make "bundles"
>> no different from MachineInstrs. They are MIs (a class derived from it),
>> so all other passes work transparently with them. For example, in my code
>> the register allocator does not know it is allocating regs for a bundle; it
>> sees it just as an MI using a lot of registers. Of course, normal (scalar)
>> passes cannot "inspect" inside bundles, and won't be able, for example, to
>> put spilling code into bundles or anything like that.
>> 
>> But the good point is that bundles (which are MIs) and regular MIs can
>> coexist inside a MachineBasicBlock, and bundles can easily be "broken
>> back" into regular MIs when needed for some pass.
> 
> Ok, so in your proposal a bundle is just a special MachineInstr? That
> sounds good. How are the MachineInstrs embedded inside a bundle? How are
> the cumulative operands, implicit register defs and uses represented?
> 
>> 
>>> I think what we need is a concept of a sequence of fixed machine
>>> instructions: something that represents a number of MachineInstrs that are
>>> scheduled as a unit, something that is never broken up by MI passes such as
>>> branch folding. This is something that current targets can use to, for
>>> example, pre-schedule instructions. This can be useful for macro-fusion
>>> optimizations. It can also be used for VLIW targets.
>> 
>> There might be something I am missing, but I do not see the advantage
>> here. Moreover, if you use sequences you need to find a way to tell the
>> passes how long a sequence is. On the other hand, if you use a class derived
>> from MI, the passes already know (from their POV they are just dealing with
>> MIs). Of course, you have to be careful about how you build the bundles so
>> they have the right properties matching those of the inner MIs, and that is
>> where the pack/unpack methods come in.
> 
> A "sequence" would not actually be a sequence of MachineInstrs. I'm merely
> proposing that you use a generic concept that is not tied to VLIW. In a
> VLIW bundle, there are no inter-dependencies between the instructions.
> However, I'm looking for a more generic concept that may represent a
> sequence of instructions which may or may not have dependencies between
> them. The key is to introduce a concept that can be used by an existing
> target today.
>
> Sounds like what you are proposing is not very far from what I've described.
> Do you have patches ready for review?
> 
> Evan
> 
>> 
>> BR
>> 
>> Carlos
>> 
>>> On Oct 21, 2011, at 4:52 PM, Stripf, Timo wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have worked for the last 2 years on an LLVM back-end that supports
>>>> clustered and non-clustered VLIW architectures. I also wrote a paper
>>>> about it that is currently under review and will hopefully be
>>>> accepted. Here is a short summary of how I realized VLIW support in an
>>>> LLVM back-end. I also used packing and unpacking of VLIW bundles. My
>>>> implementation does not require any modification of the LLVM core.
>>>>
>>>> To support VLIW I added two representations for VLIW instructions: a
>>>> packed and an unpacked representation. In the unpacked representation,
>>>> VLIW bundles are separated by a NEXT instruction, as was done in the
>>>> IA-64 back-end. The packed representation packs all instructions of one
>>>> bundle into a single PACK instruction; I used this representation
>>>> especially for register allocation.
>>>> 
>>>> I used the following pass order for the clustered VLIW back-end:
>>>>
>>>> DAG->DAG Pattern Instruction Selection
>>>> ...
>>>> Clustering (not required for unicluster VLIW architectures)
>>>> Scheduling
>>>> Packing
>>>> ...
>>>> Register Allocation
>>>> ...
>>>> Prolog/Epilog Insertion & Frame Finalization
>>>> Unpacking
>>>> Reclustering
>>>> ...
>>>> Rescheduling (Splitting, Packing, Scheduling, Unpacking)
>>>> Assembly Printer
>>>> 
>>>> 
>>>> In principle, it is possible to use the LLVM scheduler to generate
>>>> parallel code by providing a custom hazard recognizer that checks for true
>>>> data dependencies on the current bundle. The scheduler is also able
>>>> to output NEXT operations by using NoopHazard and emitting a NEXT
>>>> instruction instead of a NOP. However, the scheduler used within
>>>> "DAG->DAG Pattern Instruction Selection" uses the glue mechanism, and
>>>> that could be problematic since no NEXT instructions are issued between
>>>> glued instructions.
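
The idea in this paragraph can be sketched with a toy model. This is a simplified illustration and not the back-end's code: `ToyInstr`, `ToyHazardRecognizer` and `insertNext` are hypothetical names, and the class does not implement LLVM's hazard recognizer interface. It assumes an already chosen instruction order and a fixed issue width, and it closes the current bundle with a NEXT marker whenever the next instruction reads a register written inside that bundle or the bundle is full.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction; NOT llvm::MachineInstr.
struct ToyInstr {
  bool isNext;                 // stands in for the NEXT pseudo-instruction
  std::vector<unsigned> defs;  // registers written
  std::vector<unsigned> uses;  // registers read
};

// Tracks the bundle currently being filled.
class ToyHazardRecognizer {
  std::set<unsigned> bundleDefs;  // registers written in the current bundle
  unsigned issued = 0;            // instructions already in the current bundle
  unsigned width;                 // assumed fixed issue width

public:
  explicit ToyHazardRecognizer(unsigned w) : width(w) {
    assert(w > 0 && "issue width must be positive");
  }

  // True if mi cannot join the current bundle.
  bool hasHazard(const ToyInstr &mi) const {
    if (issued == width)
      return true;  // bundle is full
    for (unsigned r : mi.uses)
      if (bundleDefs.count(r))
        return true;  // true data dependency inside the bundle
    return false;
  }

  void emit(const ToyInstr &mi) {
    ++issued;
    bundleDefs.insert(mi.defs.begin(), mi.defs.end());
  }

  void advance() {  // start a fresh bundle
    issued = 0;
    bundleDefs.clear();
  }
};

// Insert a NEXT marker wherever the recognizer reports a hazard.
std::vector<ToyInstr> insertNext(const std::vector<ToyInstr> &order,
                                 unsigned width) {
  ToyHazardRecognizer hr(width);
  std::vector<ToyInstr> out;
  for (const ToyInstr &mi : order) {
    if (hr.hasHazard(mi)) {
      out.push_back(ToyInstr{true, {}, {}});  // NEXT instead of a NOP
      hr.advance();
    }
    out.push_back(mi);
    hr.emit(mi);
  }
  return out;
}
```

Only true (read-after-write) dependencies are checked, as in the description; a real target would probably also have to consider output and anti-dependencies as well as per-slot resource limits.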
>>>> 
>>>> Within my back-end I added a parallelizing scheduling pass after
>>>> "DAG->DAG Pattern Instruction Selection" by reusing the LLVM Post-RA
>>>> scheduler together with a custom hazard recognizer, as explained. With
>>>> some small modifications (special PHI instruction handling and a
>>>> small performance issue due to the high virtual register numbers), the
>>>> Post-RA scheduler also works very well before register allocation.
>>>> 
>>>> Before register allocation, the Packing pass converts the unpacked
>>>> representation output by the scheduler into the packed representation,
>>>> so the register allocator sees each VLIW bundle as one instruction. After
>>>> "Prolog/Epilog Insertion & Frame Finalization", the Unpacking pass
>>>> converts the PACK instructions back to the unpacked representation.
>>>> Instructions that were added during Register Allocation and
>>>> Prolog/Epilog Insertion are recognized at this point and each goes into
>>>> a bundle of its own, since they are not parallelized.
>>>> 
>>>> At the end (just before assembly output) I added several passes for
>>>> rescheduling. First, the Splitting pass tries to split each VLIW bundle
>>>> into single instructions (if possible). The Packing pass then packs all
>>>> bundles with more than one instruction into a single PACK instruction,
>>>> which the scheduler recognizes as a single scheduling unit. Scheduling is
>>>> nearly the same as before RA. Unpacking then re-establishes the unpacked
>>>> representation.
>>>>
>>>> If anyone is interested in more information, please send me an email.
>>>> I'm also interested in increasing support for VLIW architectures
>>>> within LLVM.
>>>> 
>>>> Kind regards,
>>>> Timo Stripf
>>>> 
>>>> -----Original Message-----
>>>> From: llvmdev-bounces at cs.uiuc.edu
>>>> [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Carlos Sánchez
>>>> de La Lama
>>>> Sent: Thursday, October 6, 2011 13:14
>>>> To: LLVM Dev
>>>> Subject: Re: [LLVMdev] VLIW Ports
>>>> 
>>>> Hi all,
>>>> 
>>>> here is the current (unfinished) version of the VLIW support I
>>>> mentioned. It is a patch against svn rev 141176. It includes the
>>>> MachineInstrBundle class and the small required changes in a couple
>>>> of other LLVM files.
>>>>
>>>> It also includes a modification to the Mips target to simulate a
>>>> 2-wide VLIW MIPS. The scheduler is really silly; I did not want to
>>>> implement a scheduler, just the bundle class, and the test scheduler is
>>>> provided only as an example.
>>>>
>>>> The main thing still missing is to finish the "pack" and "unpack"
>>>> methods in the bundle class. Right now they manage operands, both
>>>> implicit and explicit, but they should also manage memory references and
>>>> update the MIB flags according to the sub-MI flags.
>>>> 
>>>> For any question I would be glad to help.
>>>> 
>>>> BR
>>>> 
>>>> Carlos
>>>> 
>>>> On Tue, 2011-09-20 at 16:02 +0200, Carlos Sánchez de La Lama wrote:
>>>>> Hi,
>>>>> 
>>>>>> Has anyone attempted the port of LLVM to a VLIW architecture? Is
>>>>>> there any publication about it?
>>>>> 
>>>>> I have developed a derivation of the MachineInstr class, called
>>>>> MachineInstrBundle, which is essentially a VLIW-style machine
>>>>> instruction that can store any MI in each "slot". After the
>>>>> scheduling phase has grouped MIs into bundles, it has to call the
>>>>> MIB->pack() method, which takes the operands from the MIs in the
>>>>> "slots" and transfers them to the superinstruction. From this point
>>>>> on, the bundle is a normal MachineInstr which can be processed by
>>>>> other LLVM passes (such as register allocation).
>>>>> 
>>>>> The idea was to make a framework on top of which VLIW/ILP
>>>>> scheduling could be studied using LLVM. It is not completely
>>>>> finished, but it is more or less usable and works with a trivial
>>>>> scheduler on a synthetic MIPS-VLIW architecture. Code emission does
>>>>> not work (yet), though, so bundles have to be unpacked prior to
>>>>> emission.
>>>>>
>>>>> I was waiting to finish it before sending a patch to the list, but if
>>>>> you are interested I can send you a patch against svn of my current
>>>>> code.
>>>>> 
>>>>> BR
>>>>> 
>>>>> Carlos
>>>> 
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> 
>> 
> 
> <TS1VLIWPacking.cpp><TS1VLIWUnpacking.cpp>

Sergei Larin

2011-Oct-26 20:01 UTC

[LLVMdev] VLIW Ports

Evan, 

  What would change if tomorrow we got a VLIW target/back end with
certain properties - let's say no intra-packed deps - would it sway your
opinion in either direction? Would it be a natural prerogative of such a
hypothetical contributor/submitter to implement it a certain way?

Thanks. 

Sergei Larin

Evan Cheng

2011-Oct-31 16:36 UTC

[LLVMdev] VLIW Ports

The key is that there should be a *single* mechanism to represent instruction
bundles. That means it has to be able to model intra-bundle dependencies. It
doesn't mean the support has to be in the codegen on day one; that can be added
when a target needs it. But the representation must have buy-in from the code
owners who are responsible for the components that are affected, e.g. the
register allocator.

Evan

On Oct 26, 2011, at 1:01 PM, Sergei Larin wrote:
> Evan, 
> 
>  What would change if tomorrow we got a VLIW target/back end with some
> certain properties - let's say no intra-packed deps - would it sway
your
> opinion in either direction? Would it be a natural prerogative to implement
> it certain way for such hypothetical contributor/submitter? 
> 
> Thanks. 
> 
> Sergei Larin
> 
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu] On
> Behalf Of Evan Cheng
> Sent: Wednesday, October 26, 2011 2:08 PM
> To: Stripf, Timo
> Cc: LLVM Dev
> Subject: Re: [LLVMdev] VLIW Ports
> 
> 
> On Oct 25, 2011, at 1:59 AM, Stripf, Timo wrote:
> 
>> Hi all,
>> 
>>> Ok, so in your proposal a bundle is just a special MachineInstr?
That
> sounds good. How are the MachineInstr's embedded inside a bundle? How
are
> the cumulative operands, implicit register defs and uses represented?
>> 
>> I attached the packing and unpacking pass I used within my backend. In
my
> solution multiple MachineInstruction are packed into one variadic
"PACK"
> MachineInstruction. The opcode and operands of the original instruction are
> encoded as operands of the PACK instruction. The opcode is added as
> immediate following by the operands of the original instructions. Within
the
> operands one instruction is terminated by an "EndOfOp" operand.
The implicit
> defs/uses are also added to the PACK instruction but not used for
unpacking.
> Unpacking reconstructs them from the TargetDescriptionInfo. 
>> 
>> I took a look at the packing/unpacking solution of Evan and I think it
is
> more elegant to use a derived class of MachineInstr for storing multiple
> instructions into one.
> 
> Here are my thoughts on instruction bundle.
> 
> First, let's talk about the prerequisite for adding a codegen level IR
> extension. A MachineInstr bundle should be generic enough to support the
> followings 1) VLIW bundles (where there are no intra-dependencies between
> instructions in a bundle), 2) bundles for other targets where there may be
> intra-dependencies between instructions in a bundle. #2 is very important
> for the extension to be accepted into LLVM mainline today since there are
no
> proper VLIW targets.
> 
> Now let's look at the options.
> 
> 1. Extend MachineInstr to represent a bundle. This can be achieved either a
> derived class or add a pointer in MachineInstr that points to the next
> instruction in the bundle.
> 2. Add a bit to MachineInstr that indicates it is part of a bundle /
> sequence.
> 
> The advantage of #1 is this requires minimum change to register allocator
> and many other codegen passes. However, that's only true for VLIW
targets
> with no intra-bundle dependencies. For other targets or for use of
> optimizations which model a sequence of instructions, this is not true. The
> register allocator and scheduler need to know the cumulative properties of
a
> bundle. For example, the register allocator needs to know what are the
input
> operands, what are the outputs. The scheduler needs to know the cumulative
> latency of the bundle. Other passes that examine individual instruction
> properties (e.g. is it a load / store, control flow) will need to know the
> combined properties of individual instructions in a bundle.
> 
> Of course, this is a solvable problem. The pass that combine instructions
> into bundles can construct the bundle MachineInstr properly so it presents
> the right information. The down size is this will add memory overhead and
it
> needs to be carefully studied.
> 
> The advantage of #2 is the low overhead. Adding a bit won't add much if
any
> memory overhead. Packing / unpacking are both very easy. This is especially
> good for register allocator, which can still model register liveness even
> when there are intra-bundle dependencies. The downsize of #2 is also
> obvious. Every pass that operates on MachineInstr will have to be aware of
> bundles. This is the only real downsize that I can think of, but it's a
big
> one.
> 
> Evan
> 
> 
>> 
>> Best regards,
>> Timo Stripf
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: Evan Cheng [mailto:evan.cheng at apple.com] 
>> Gesendet: Dienstag, 25. Oktober 2011 01:55
>> An: Carlos Sánchez de La Lama
>> Cc: Stripf, Timo; LLVM Dev
>> Betreff: Re: [LLVMdev] VLIW Ports
>> 
>> 
>> On Oct 24, 2011, at 2:38 PM, Carlos Sánchez de La Lama wrote:
>> 
>>> Hi Evan (and all),
>>> 
>>>> I think any implementation that makes a "bundle" a
different entity from
> MachineInstr is going to be difficult to use. All of the current backend
> passes will have to taught to know about bundles. 
>>> 
>>> The approach in the patch I sent (and I believe Timo's code
works
> similar, according to his explanations) is precisely to make
"bundles" no
> different from MachineInstructions. They are MIs (a class derived from it),
> so all other passes work transparently with them. For example, in my code
> register allocator does not know it is allocating regs for a bundle, it
sees
> it just as a MI using a lot of registers. Of course, normal (scalar) passes
> can not "inspect" inside bundles, and wont be able for example to
put
> spilling code into bundles or anything like that.
>>> 
>>> But the good point is that bundles (which are MIs) and regular MIs
can
> coexist inside a MachineBasicBlock, and bundles can easily be "broken
back"
> to regular MIs when needed for some pass.
>> 
>> Ok, so in your proposal a bundle is just a special MachineInstr? That
>> sounds good. How are the MachineInstr's embedded inside a bundle? How are
>> the cumulative operands, implicit register defs and uses represented?
>> 
>>> 
>>>> I think what we need is a concept of a sequence of fixed machine
>>>> instructions. Something that represents a number of MachineInstr's that are
>>>> scheduled as a unit, something that is never broken up by MI passes such as
>>>> branch folding. This is something that current targets can use to, for
>>>> example, pre-schedule instructions. This can be useful for macro-fusing
>>>> optimization. It can also be used for VLIW targets.
>>> 
>>> There might be something I am missing, but I do not see the advantage
>>> here. Even more, if you use sequences you need to find a way to tell the
>>> passes how long a sequence is. On the other hand, if you use a class derived
>>> from MI, the passes know already (from their POV they are just dealing with
>>> MIs). You have of course to be careful about how you build the bundles so
>>> they have the right properties matching those of the inner MIs, and that is
>>> where the pack/unpack methods come in.
>> 
>> A "sequence" would not actually be a sequence of MachineInstr's. I'm
>> merely proposing that you use a generic concept that is not tied to VLIW. In
>> the VLIW bundle, there are no inter-dependencies between the instructions.
>> However, I'm looking for a more generic concept that may represent a
>> sequence of instructions which may or may not have dependencies between
>> them. The key is to introduce a concept that can be used by an existing
>> target today.
>> 
>> Sounds like what you are proposing is not very far from what I've described.
>> Do you have patches ready for review?
>> 
>> Evan
>> 
>>> 
>>> BR
>>> 
>>> Carlos
>>> 
>>>> On Oct 21, 2011, at 4:52 PM, Stripf, Timo wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have worked for the last 2 years on an LLVM back-end that supports
>>>>> clustered and non-clustered VLIW architectures. I also wrote a paper about
>>>>> it that is currently within the review process and is hopefully going to be
>>>>> accepted. Here is a small summary of how I realized VLIW support with an
>>>>> LLVM back-end. I also used packing and unpacking of VLIW bundles. My
>>>>> implementation does not require any modification of the LLVM core.
>>>>> 
>>>>> To support VLIW I added two representations for VLIW instructions:
>>>>> packed and unpacked. Within the unpacked representation, VLIW bundles are
>>>>> separated by a NEXT instruction, as was done within the IA-64 back-end. The
>>>>> packed representation packs all instructions of one bundle into a single
>>>>> PACK instruction, and I used this representation especially for the
>>>>> register allocation.
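
[Editor's sketch] The two representations described above can be modelled with strings standing in for instructions. This is a minimal sketch, not the attached TS1VLIWPacking.cpp: `pack`, `unpack`, `Bundle` and `Next` are hypothetical names, and it assumes that a NEXT marker closes every bundle, including the last one in the stream.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One packed bundle: the instructions that issue together.
using Bundle = std::vector<std::string>;

// Marker that closes a bundle in the unpacked stream (assumption: every
// bundle, including the last one, is followed by NEXT).
const std::string Next = "NEXT";

// Unpacked -> packed: split the flat stream at each NEXT.
std::vector<Bundle> pack(const std::vector<std::string> &Unpacked) {
  std::vector<Bundle> Packed;
  Bundle Current;
  for (const std::string &Op : Unpacked) {
    if (Op == Next) {
      Packed.push_back(Current);
      Current.clear();
    } else {
      Current.push_back(Op);
    }
  }
  assert(Current.empty() && "stream must end with NEXT");
  return Packed;
}

// Packed -> unpacked: flatten each bundle and re-emit its NEXT.
std::vector<std::string> unpack(const std::vector<Bundle> &Packed) {
  std::vector<std::string> Unpacked;
  for (const Bundle &B : Packed) {
    Unpacked.insert(Unpacked.end(), B.begin(), B.end());
    Unpacked.push_back(Next);
  }
  return Unpacked;
}
```

Under that assumption the two functions are inverses, so a pass pipeline can switch between representations as often as it needs to.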
>>>>> 
>>>>> I used the following pass order for the clustered VLIW back-end:
>>>>> 
>>>>> DAG->DAG Pattern Instruction Selection
>>>>> ...
>>>>> Clustering (not required for unicluster VLIW architectures)
>>>>> Scheduling
>>>>> Packing
>>>>> ...
>>>>> Register Allocation
>>>>> ...
>>>>> Prolog/Epilog Insertion & Frame Finalization
>>>>> Unpacking
>>>>> Reclustering
>>>>> ...
>>>>> Rescheduling (Splitting, Packing, Scheduling, Unpacking)
>>>>> Assembly Printer
>>>>> 
>>>>> 
>>>>> In principle, it is possible to use the LLVM scheduler to generate
>>>>> parallel code by providing a custom hazard recognizer that checks true data
>>>>> dependencies of the current bundle. The scheduler also has the capability to
>>>>> output NEXT operations by using NoopHazard and outputting a NEXT instruction
>>>>> instead of a NOP. However, the scheduler that is used within "DAG->DAG
>>>>> Pattern Instruction Selection" uses the glue mechanism, and that could be
>>>>> problematic since no NEXT instructions are issued between glued
>>>>> instructions.
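
[Editor's sketch] The hazard-recognizer idea above can be shown with a toy in-order bundler. It is a sketch under loud assumptions: `BundleHazard`, `hasHazard`, `emit`, `advance` and `bundleInOrder` are hypothetical names and do not reproduce LLVM's ScheduleHazardRecognizer interface. An instruction is refused for the current bundle if the bundle is full or if it reads a register that the bundle already writes (a true, read-after-write dependency); a refusal corresponds to issuing NEXT and opening a new bundle.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineInstr; registers are plain integers.
struct Instr {
  std::string Name;
  std::set<unsigned> Defs;
  std::set<unsigned> Uses;
};

// Toy hazard recognizer for one bundle of at most Width instructions.
class BundleHazard {
  std::size_t Width;
  std::size_t Filled = 0;
  std::set<unsigned> DefsInBundle;

public:
  explicit BundleHazard(std::size_t W) : Width(W) { assert(W > 0); }

  // True if I cannot join the current bundle: no free slot, or I reads
  // a register written earlier in the same bundle (true dependency).
  bool hasHazard(const Instr &I) const {
    if (Filled == Width)
      return true;
    for (unsigned R : I.Uses)
      if (DefsInBundle.count(R))
        return true;
    return false;
  }

  // Record that I was placed in the current bundle.
  void emit(const Instr &I) {
    DefsInBundle.insert(I.Defs.begin(), I.Defs.end());
    ++Filled;
  }

  // Corresponds to issuing NEXT: start an empty bundle.
  void advance() {
    DefsInBundle.clear();
    Filled = 0;
  }
};

// Greedy in-order bundling driven by the recognizer above.
std::vector<std::vector<std::string>>
bundleInOrder(const std::vector<Instr> &Seq, std::size_t Width) {
  BundleHazard HR(Width);
  std::vector<std::vector<std::string>> Bundles;
  for (const Instr &I : Seq) {
    if (Bundles.empty() || HR.hasHazard(I)) {
      HR.advance();
      Bundles.emplace_back();
    }
    HR.emit(I);
    Bundles.back().push_back(I.Name);
  }
  return Bundles;
}
```

A real list scheduler would pick among several ready instructions rather than walk them in order; the sketch only isolates the hazard check itself.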
>>>>> 
>>>>> Within my back-end I added a parallelizing scheduling pass after "DAG->DAG
>>>>> Pattern Instruction Selection" by reusing the LLVM Post-RA scheduler
>>>>> together with a custom hazard recognizer as explained. With some small
>>>>> modifications (special PHI instruction handling and a fix for a small
>>>>> performance issue due to the high virtual register numbers), the Post-RA
>>>>> scheduler also works very well before register allocation.
>>>>> 
>>>>> Before register allocation the Packing pass converts the unpacked
>>>>> representation output by the scheduler into the packed representation, so
>>>>> the register allocator sees each VLIW bundle as one instruction. After
>>>>> "Prolog/Epilog Insertion & Frame Finalization" the Unpack pass converts the
>>>>> PACK instructions back to the unpacked representation. Thereby, instructions
>>>>> that were added within Register Allocation and Prolog/Epilog Insertion
>>>>> are recognized and get put into one bundle, since they are not parallelized.
>>>>> 
>>>>> At the end (just before assembly output) I added several passes for
>>>>> doing a rescheduling. First, the Splitting pass tries to split a VLIW bundle
>>>>> into single instructions (if possible). The Packing pass packs all bundles
>>>>> with more than one instruction into a single PACK instruction. The scheduler
>>>>> will recognize the PACK instruction as a single scheduling unit. Scheduling
>>>>> is nearly the same as before RA. Unpacking then re-establishes the unpacked
>>>>> representation.
>>>>> 
>>>>> If anyone is interested in more information please send me an email.
>>>>> I'm also interested in increasing support for VLIW architectures within
>>>>> LLVM.
>>>>> 
>>>>> Kind regards,
>>>>> Timo Stripf
>>>>> 
>>>>> -----Original Message-----
>>>>> From: llvmdev-bounces at cs.uiuc.edu 
>>>>> [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Carlos Sánchez
>>>>> de La Lama
>>>>> Sent: Thursday, 6 October 2011 13:14
>>>>> To: LLVM Dev
>>>>> Subject: Re: [LLVMdev] VLIW Ports
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> here is the current (unfinished) version of the VLIW support I
>>>>> mentioned. It is a patch over svn rev 141176. It includes the
>>>>> MachineInstrBundle class, and small required changes in a couple of outside
>>>>> LLVM files.
>>>>> 
>>>>> It also includes a modification to the Mips target to simulate a 2-wide
>>>>> VLIW MIPS. The scheduler is really silly; I did not want to implement a
>>>>> scheduler, just the bundle class, and the test scheduler is just provided as
>>>>> an example.
>>>>> 
>>>>> The main thing still missing is to finish the "pack" and "unpack" methods
>>>>> in the bundle class. Right now it manages operands, both implicit and
>>>>> explicit, but it should also manage memory references, and update MIB flags
>>>>> according to sub-MI flags.
>>>>> 
>>>>> For any question I would be glad to help.
>>>>> 
>>>>> BR
>>>>> 
>>>>> Carlos
>>>>> 
>>>>> On Tue, 2011-09-20 at 16:02 +0200, Carlos Sánchez de La Lama wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> Has anyone attempted the port of LLVM to a VLIW architecture?  Is
>>>>>>> there any publication about it?
>>>>>> 
>>>>>> I have developed a derivation of the MachineInstr class, called
>>>>>> MachineInstrBundle, which is essentially a VLIW-style machine
>>>>>> instruction which can store any MI in each "slot". After the
>>>>>> scheduling phase has grouped MIs in bundles, it has to call the
>>>>>> MIB->pack() method, which takes operands from the MIs in the "slots"
>>>>>> and transfers them to the superinstruction. From this point on the
>>>>>> bundle is a normal machine instruction which can be processed by
>>>>>> other LLVM passes (such as register allocation).
>>>>>> 
>>>>>> The idea was to make a framework on top of which VLIW/ILP
>>>>>> scheduling could be studied using LLVM. It is not completely
>>>>>> finished, but it is more or less usable and works with a trivial
>>>>>> scheduler in a synthetic MIPS-VLIW architecture. Code emission does
>>>>>> not work (yet), though, so bundles have to be unpacked prior to
>>>>>> emission.
>>>>>> 
>>>>>> I was waiting to finish it to send a patch to the list, but if you
>>>>>> are interested I can send you a patch over svn of my current code.
>>>>>> 
>>>>>> BR
>>>>>> 
>>>>>> Carlos
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>> 
>>> 
>> 
>> <TS1VLIWPacking.cpp><TS1VLIWUnpacking.cpp>
> 
