Alex Susu via llvm-dev
2017-May-21 08:22 UTC
[llvm-dev] Handling native i16 types in clang and opt
Hello.
My target architecture natively supports 16-bit integers (i16).
Whenever I write C programs using only short types, clang compiles the program to LLVM IR and converts the i16 data to i32 to perform arithmetic operations, then truncates the results to i16. The InstructionCombining (INSTCOMBINE or IC) pass then removes these conversions back and forth from i16, except for the (s)div LLVM IR operation.
Is there a way to avoid these conversions made by clang back and forth between i16 and i32 if my source program uses only short types?
Otherwise, how can I make the IC pass handle sdiv the way it handles add (sub) and mul? (That is, if the input operands are i16, the add/mul operation will eventually be i16, with any unnecessary conversion back and forth from i32 removed.)
Thank you,
Alex
Craig Topper via llvm-dev
2017-May-21 08:40 UTC
[llvm-dev] Handling native i16 types in clang and opt
Do you have a simple test case you can send? I'm having trouble replicating this on x86-64 with the simplest possible test.
unsigned short foo(unsigned short a, unsigned short b) {
  return a + b;
}
This gives IR with no mention of i32. Maybe there's something misconfigured for your target, or I need a more complex test case.
~Craig
Friedman, Eli via llvm-dev
2017-May-22 17:27 UTC
[llvm-dev] Handling native i16 types in clang and opt
On 5/21/2017 1:22 AM, Alex Susu via llvm-dev wrote:
> Hello.
> My target architecture supports natively 16 bit integers (i16).
>
> Whenever I write in C programs using only short types, clang
> compiles the program to LLVM and converts the i16 data to i32 to
> perform arithmetic operations and then truncates the results to i16.
> Then, the InstructionCombining (INSTCOMBINE or IC) pass removes these
> conversions back and forth from i16, except for the (s)div LLVM IR
> operation.
sdiv in particular is special: it has undefined behavior on overflow. "sdiv i32 -32768, -1" produces "i32 32768", but "sdiv i16 -32768, -1" is undefined.
-Eli
Nemanja Ivanovic via llvm-dev
2017-May-29 13:30 UTC
[llvm-dev] Handling native i16 types in clang and opt
Just a shot in the dark here... could it be that Clang (or whatever is adding those trunc/ext's into the IR) is considering your calling conventions in <TargetName>CallingConv.td? We certainly get the same behaviour on PPC, but I wonder if that's due to lines like this in PPCCallingConv.td:
`CCIfType<[i8], CCPromoteToType<i64>>`
I don't know enough about this code to really know whether the above has any relation to reality, but it seems related.
Alex Susu via llvm-dev
2018-Jul-25 15:12 UTC
[llvm-dev] Handling native i16 types in clang and opt
Hello.
I am coming back to this older thread.
I'd also like to thank Peter Lawrence for the insightful answer (see his email below, if interested). I would like to add that the C11 standard, Section 6.3.1.1, covers integer promotions, which explains why the C language requires short arithmetic to be promoted to the size of int. See also https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules .
To answer Craig Topper: indeed, I have a simple and very interesting case where these promotions happen, the Floyd-Warshall algorithm, shown in the program below (also try the example at https://www.geeksforgeeks.org/integer-promotions-in-c/). In all cases, pass -O0 to clang to emit unoptimized LLVM IR.
#define SIZE 128
short path[SIZE][SIZE];
void FloydWarshall() {
  int i, j, k;
  for (k = 0; k < SIZE; k++) {
    for (i = 0; i < SIZE; ++i) {
      short pik = path[i][k];
      for (j = 0; j < SIZE; j++) {
        path[i][j] = path[i][j] < pik + path[k][j] ?
                     path[i][j] : pik + path[k][j];
      }
    }
  }
}
The innermost loop's body is translated to the following unoptimized LLVM IR code (see the lines with the comment "IMPORTANT"):
for.body8:                                        ; preds = %for.cond6
  %6 = load i32, i32* %j, align 4
  %idxprom9 = sext i32 %6 to i64
  %7 = load i32, i32* %i, align 4
  %idxprom10 = sext i32 %7 to i64
  %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
  %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
  %8 = load i16, i16* %arrayidx12
  %conv = sext i16 %8 to i32 ; IMPORTANT
  %9 = load i16, i16* %pik
  %conv13 = sext i16 %9 to i32 ; IMPORTANT
  %10 = load i32, i32* %j, align 4
  %idxprom14 = sext i32 %10 to i64
  %11 = load i32, i32* %k, align 4
  %idxprom15 = sext i32 %11 to i64
  %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
  %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
  %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
  %conv18 = sext i16 %12 to i32
  %add = add nsw i32 %conv13, %conv18
  %add = add nsw i16 %9, %12 ; IMPORTANT
  %cmp19 = icmp slt i32 %conv, %add
  %cmp19 = icmp slt i16 %8, %add ; IMPORTANT
  br i1 %cmp19, label %cond.true, label %cond.false
Best regards,
Alex
On 5/31/2017 11:04 PM, Peter Lawrence via llvm-dev wrote:
> Alex,
> The C language requires “short” arithmetic to be promoted to the size
> of “int”, hence the conversions to “int” and later the optimizations back
> to “short”, but only when the optimizer can prove that the result will be
> the same.
>
> If your machine has only 16-bit registers and arithmetic then you should
> change clang. There won’t be any conversions in the IR (but there are
> a variety of problems with LLVM’s optimizations that you will run into!).
>
> If your machine has both 16-bit and 32-bit registers and arithmetic, then
> you probably must leave clang alone. I am inclined to read your email
> as implying this is the case for you.
>
> Do you really need signed div and rem? Usually people don’t need the
> quirky results of signed div and rem (in fact, more often than not they
> need results consistent with two’s-complement shifts and masks).
>
> If unsigned is OK then IC should (?) transform unsigned 32-bit div
> and rem of unsigned short into 16-bit unsigned div and rem. (Can someone
> verify / confirm that I’m thinking correctly here?)
>
> The only thing I can think of off the top of my head for getting 16-bit
> sdiv and srem instructions emitted on a 32-bit machine is inline asm.
>
> BTW, IIRC sdiv and srem also inhibit vectorization to 16-bit SIMD
> instructions for the same reason (similarly, shifts become undef for
> different shift amounts in 16-bit). I wonder what work-arounds folks use
> in this context; perhaps someone else on this list can chime in?
>
>
> -Peter Lawrence.