thr3ads.net - llvm dev - [LLVMdev] [Discussion] Portable SIMD programming using LLVM? [Apr 2011]

If this information is useful, please help other people find it:
Share via:

Hua Huang

2011-Mar-28 21:20 UTC

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

Hello all,

I am currently working on building a portable SIMD programming model(SIMD
within a register) for various platforms, like x86, ARM and so on. I know
that LLVM supports SIMD programming on many architectures, so I am thinking
of using LLVM to help build this model but I not quite sure whether LLVM can
help.

The following describes the details of the model and also my current
approach for this model.

*Background*
Almost modern processor families support SIMD instruction sets but the
instruction set designs for each platform have different combinations of
operations. The portable SIMD here is to make an uniform system of SIMD
operations at all power-of-2 field widths.
For example, for simd_add on SSE2, I want to have all the following
operations supported, simd<2^x>::add(a, b) for
2<=2^x<=register_size(register_size equals 128 in SSE2). However, SSE2
only
supports simd_add on field widths of 8, 16, 32, or 64. Hence, simulation for
simd_add on field widths of 2, 4 and 128 is needed. *At mean time, I want
the simulation to be as good as possible*.

*An iteration Approach*
My current approach of getting the best simulation is a strategy-based
iteration algorithm. For example, simd<4>::add can be implemented based on
simd<8>::add in the following manner.

===strategy_add_doubling==1. hiMask = simd<8>::himask() //this is a
constant with high 4 bits of each
8-bits field are 1111 and low 4 bits of each 8-bits field are 0000.
2. loMask = simd<8>::lowmask() //this is a constant with high 4 bits of
each
8-bits field are 0000 and low 4 bits of each 8-bits field are 1111.
3. c = simd<8>::add(simd_and(loMask, a), simd_and(loMask, b))
4. d = simd<8>::add(simd_and(hiMask, a), simd_and(hiMask, b))
5. return simd_or(d, simd_and(loMask, c)) // get the answer for
simd<4>::add(a, b)

Also, it is not hard to see that simd<fw>::add can be implemented based on
simd<fw/2>::add. The key is getting the correct carry information and
constructing that carry mask.
===strategy_add_halving==1. c = simd<fw/2>::add(a, b)
2. get the carry mask d //the mask should look like
...00010000...00000000...00010000...
3. return simd<fw/2>::add(d, c)

As you can see, for a certain simd operation, there are many ways/strategies
to simulate it, like simd<fw>::add, it can be simulated based on
simd<2*fw>::add and also based on simd<fw/2>::add.

Then, we can construct as many strategies as possible for each simd
operation we wanna support. Based on all the strategies we have, an
iteration algorithm can be performed to get the best simulation for each
simd operation. The criteria here for evaulation is the number of
instructions used in a simulation, so the best simulation will have the
least number of instructions.

*Can I use LLVM to improve the above approach or even get a new approach?*

Since LLVM supports various architectures and captures the system resources
very well, I am wondering whether LLVM could be used to help the project. I
went through the LLVM documentation, but I am still stucking here on how to
incorporate LLVM into my approach. Below are some of my thoughts.

1. The strategy-based approach doesn't capture the target resources very
well in terms of CPU cycles for operations, instruction selection, etc.
There might be a case that two different simulations for a certain operation
have the same number of instructions, the strategy-based approach can't
handle this case very well(It just randomly picks one).
2. The strategies are written by human, so they are not carefully tuned to
achieve best performance. There are quite a lot of room for improving the
performance.
3. Can LLVM be incorporated into the strategy-based approach so that LLVM
provides additional information for getting the best simulation?
4. Instead of using the strategy-based approach, can we utilize LLVM to get
a new approach to achieve the goal?

I know that LLVM has auto-vectorization and can generate SIMD instructions
for scalar codes. But I don't know the details how LLVM captures the target
resources and generates good instructions especially the SIMD instructions
on different platforms.

Any suggestions or comments are appreciated.

I look forward to hearing from you.

Thanks a lot,
Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110328/a2f9f8eb/attachment.html>

Duncan Sands

2011-Apr-01 09:30 UTC

head link

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

Hi Kevin,
> *Background*
> Almost modern processor families support SIMD instruction sets but the
> instruction set designs for each platform have different combinations of
> operations. The portable SIMD here is to make an uniform system of SIMD
> operations at all power-of-2 field widths.
> For example, for simd_add on SSE2, I want to have all the following
operations
> supported, simd<2^x>::add(a, b) for
2<=2^x<=register_size(register_size equals
> 128 in SSE2). However, SSE2 only supports simd_add on field widths of 8,
16, 32,
> or 64. Hence, simulation for simd_add on field widths of 2, 4 and 128 is
needed.
> *At mean time, I want the simulation to be as good as possible*.
LLVM already does this: if you use a vector type that is not natively supported
then operations on it are simulated using either vector operations in another
vector type or scalar operations if no suitable vector type exists.

Ciao, Duncan.

Kevin Huang

2011-Apr-01 20:35 UTC

head link

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

Right. LLVM supports the transformation from unsupported vector type to
supported vector type or even scalar types.
According to the LLVM's documentation, LLVM has two ways of doing the
transformation, one is promoting the small vector type to larger vector
type, the other is breaking up the large vector type into smaller ones.
But I am wondering, how good the transformation will be in LLVM's
implementation.

In my approach, I use strategies to capture all possible transformation. The
transformation are not only just promoting and breaking up, but also other
forms of transformation, like the transformation between vector types with
same size.
My goal is to make sure the simulation to be as good as possible in terms of
the number of instructions, CPU latency, etc. If LLVM has information about
the latency information of supported architectures, maybe I can use them to
optimize my approach.

Does LLVM have the latency information(such as CPU cycles) of SIMD
instructions for each supported architecture?

Thanks.

On Fri, Apr 1, 2011 at 2:30 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Kevin,
>
> > *Background*
> > Almost modern processor families support SIMD instruction sets but the
> > instruction set designs for each platform have different combinations
of
> > operations. The portable SIMD here is to make an uniform system of
SIMD
> > operations at all power-of-2 field widths.
> > For example, for simd_add on SSE2, I want to have all the following
> operations
> > supported, simd<2^x>::add(a, b) for
2<=2^x<=register_size(register_size
> equals
> > 128 in SSE2). However, SSE2 only supports simd_add on field widths of
8,
> 16, 32,
> > or 64. Hence, simulation for simd_add on field widths of 2, 4 and 128
is
> needed.
> > *At mean time, I want the simulation to be as good as possible*.
>
> LLVM already does this: if you use a vector type that is not natively
> supported
> then operations on it are simulated using either vector operations in
> another
> vector type or scalar operations if no suitable vector type exists.
>
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110401/c8351864/attachment.html>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Apr 2011 - [LLVMdev] [Discussion] Portable SIMD programming using LLVM?

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

Maybe Matching Threads