thr3ads.net - llvm dev - [llvm-dev] Best way to integrate special-purpose hardware with LLVM [Sep 2021]

If this information is useful, please help other people find it:
Share via:
Kai Plociennik via llvm-dev
2021-Sep-27 14:03 UTC
[llvm-dev] Best way to integrate special-purpose hardware with LLVM

Dear all,

in a research project, we are developing a special-purpose
floating-point accelerator, together with C++/OpenMP offloading support
with LLVM.

Since our hardware has a limited, special-purpose instruction set, it is
often a challenge for us to decide on how to integrate with LLVM's
general-purpose concepts such as IR and instruction selection. Hence, I
would appreciate a lot if we could get some feedback w.r.t. our current
approach. Let me sketch an example.

Our accelerator has no instruction for loading/writing floating point 
values from or to memory using a given address. Instead: 1) The data 
layout of a structured volume of data in memory can be 
hardware-configured, and then load and store instructions have a special 
form with offsets around the "center point" in the data volume which
is
currently processed. 2) We can write values to hardware registers and 
use them in computations.

Now, let's assume we have C++ code like the following:

float array[5] {1, 2, 3, 4, 5}.

#pragma omp target
{
   ...
   float a = array[0];
   ...
}

In the IR we get, we find fragments like this:

%arrayidx = getelementptr inbounds [5 x float], ..., i32 0
%1 = load float, float* %arrayidx, align 4

If we leave it like this, in the created selection DAG there will be a 
lot of instructions we do not conceptually have on our hardware, and 
detecting the necessary changes at instruction selection stage is 
complex and hard.

Hence, we chose method 2) (use register) from above in this case and do 
the following:

1) Define a target-specific intrinsic "array_access" which captures
information about the accessed array and the offset.

2) Replace the two instructions (getelementptr, load) from above by a
single "array_access" intrinsic in a special IR pass. This is a kind
of
"meta-instruction" which is here for conveying necessary information
to
instruction selection phase.

3) Lower the array_access intrinsic to a pseudo-instruction we also 
define for our hardware, to get from IR to selection DAG.

4) Replace the pseudo instruction by a hardware register in instruction 
selection. This creates valid machine code. (At runtime, we configure 
the register correctly before executing the machine code.)


I would be interested in feedback on several things here:

-Is there an "intended way" of integrating such special-purpose
hardware
with LLVM?

-Is it a good idea to introduce an intrinsic for the described purpose, 
or does this violate some architectural ideas of intrinsics or LLVM in 
general?

-We do not want to lose any relevant information on the IR or selection 
DAG. For example, in the original form, general LLVM functionality will 
be aware of what getelementptr and load means, so data dependencies will 
not be broken. In our "meta-instruction" (intrinsic) form, there might
be a risk of either LLVM passes creating invalid code since they do not 
know what our meta-instructions mean, or losing the ability to employ 
optimization functionality present in LLVM due to unknown semantics of 
our meta-instructions. What would be the right way to handle this?

-How to cope in general with the fact that a target architecture has no 
instructions for certain concepts such as integer computation, address 
calculations, "standard" memory accesses, and so on. Are there known 
projects which had to tackle this?

-When modifying IR or selection DAG as above, how to get the necessary 
"metadata" from IR to instruction selection or to runtime? (In this
case
the data on the accessed array and offset or register)

To summarize: What would be the best "architectural approach" to 
integrate support for such a kind of special-purpose hardware in LLVM? 
Should we e.g. include "pseudo instructions" in
OurHWInstructionInfo.td
and "pseudo registers" in OurHWRegisterInfo.td to be able to integrate
with the general-purpose LLVM concepts and in the end treat 
instructions/registers in a special way? Or can we avoid having to do so 
earlier?

Any feedback on this would be greatly appreciated.

Sincerely,

Kai Plociennik

-- 
Dr. Kai Plociennik
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Competence Center High Performance Computing
Fraunhofer-Platz 1
67663 Kaiserslautern
Tel: +49 (0)631 31600 4081
mail: kai.plociennik at itwm.fraunhofer.de
www.itwm.fraunhofer.de
llvm dev - Sep 2021 - Best way to integrate special-purpose hardware with LLVM

[llvm-dev] Best way to integrate special-purpose hardware with LLVM