Alex Susu via llvm-dev
2017-May-21 08:22 UTC
[llvm-dev] Handling native i16 types in clang and opt
Hello. My target architecture natively supports 16-bit integers (i16). Whenever I write C programs that use only short types, clang compiles them to LLVM IR in which the i16 values are extended to i32, the arithmetic is performed in i32, and the results are truncated back to i16. The InstructionCombining (INSTCOMBINE or IC) pass then removes these back-and-forth conversions, except around the (s)div LLVM IR operation. Is there a way to stop clang from emitting these i16-to-i32 conversions when my source program uses only short types? Otherwise, how can I make the IC pass handle sdiv the way it handles add (sub) and mul? (That is, if the input operands are i16, the add/mul operation eventually becomes i16, with the unnecessary conversions to and from i32 removed.) Thank you, Alex
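For reference, the i32 conversions described above follow from the C integer promotions (C11 §6.3.1.1): operands of type short are converted to int before any arithmetic, so the front end has to emit the extension, and the truncation only appears where the int result is stored back into a short. A minimal sketch, assuming a 16-bit short and a 32-bit int as on most current targets; the function names are hypothetical:

```c
/* Hypothetical helpers illustrating C integer promotions; assumes a
   16-bit short and a 32-bit int, as on most current targets. */

/* Both operands are promoted to int, so the sum is computed in int
   (32 bits here) and does not wrap. */
int add_promoted(short a, short b) {
    return a + b;          /* the type of (a + b) is int, not short */
}

/* Converting the int result back to short is where the truncation
   comes from. For out-of-range values this conversion is
   implementation-defined; clang and GCC wrap modulo 2^16. */
short add_truncated(short a, short b) {
    return (short)(a + b);
}
```

With a = b = 30000, `add_promoted` returns 60000, while `add_truncated` yields -5536 under the wrapping conversion clang and GCC use. At -O0, clang typically lowers the second function as a sext of each operand to i32, an i32 add, and a trunc back to i16.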
Craig Topper via llvm-dev
2017-May-21 08:40 UTC
Do you have a simple test case you can send? I'm having trouble replicating this on x86-64 with the simplest possible test: `unsigned short foo(unsigned short a, unsigned short b) { return a + b; }` This gives IR with no mention of i32. Maybe something is misconfigured for your target, or I need a more complex test case. ~Craig
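One way to see why the optimizer may shrink an `a + b` like the one above back to i16: for addition (and likewise subtraction and multiplication), the low 16 bits of the result depend only on the low 16 bits of the operands, so truncating a 32-bit result gives the same value as wrapping 16-bit arithmetic. A minimal C sketch of that identity, trunc(x op y) == trunc(trunc(x) op trunc(y)), checked over a sampled grid of 32-bit inputs; the function names and the sampling stride are illustrative assumptions, not part of any LLVM API:

```c
#include <stdint.h>

/* Low 16 bits of a 32-bit value, i.e. LLVM's "trunc i32 to i16". */
static uint16_t lo16(uint32_t v) { return (uint16_t)v; }

/* Returns 1 if trunc(x op y) == trunc(trunc(x) op trunc(y)) holds for
   add and mul on the given inputs: the high halves of the operands never
   influence the low 16 result bits. All arithmetic is done in uint32_t,
   where wraparound is well defined. */
int narrowing_holds(uint32_t x, uint32_t y) {
    uint32_t xl = lo16(x), yl = lo16(y);
    int add_ok = lo16(x + y) == lo16(xl + yl);
    int mul_ok = lo16(x * y) == lo16(xl * yl);
    return add_ok && mul_ok;
}

/* Hypothetical driver: checks the identity over a grid of 32-bit operands
   sampled with the given stride and returns the number of counterexamples. */
unsigned long count_counterexamples(uint32_t step) {
    unsigned long bad = 0;
    for (uint64_t x = 0; x <= 0xFFFFFFFFu; x += step)
        for (uint64_t y = 0; y <= 0xFFFFFFFFu; y += step)
            if (!narrowing_holds((uint32_t)x, (uint32_t)y))
                bad++;
    return bad;
}
```

Because 2^16 divides 2^32, the count is always 0; the grid only exercises operands whose high halves are non-zero.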
Friedman, Eli via llvm-dev
2017-May-22 17:27 UTC
sdiv in particular is special: it has undefined behavior on overflow. "sdiv i32 -32768, -1" produces "i32 32768", but "sdiv i16 -32768, -1" is undefined. -Eli
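The overflow case above can be made concrete. Unlike add and mul, whose low 16 bits do not depend on the operands' high bits, the i32 quotient of two sign-extended i16 values fits back into i16 for every operand pair except -32768 / -1, where it is 32768. Because that single case is undefined for `sdiv i16`, narrowing the division is presumably only valid when the optimizer can rule it out. A minimal C sketch (the function name is hypothetical) that counts out-of-range quotients over every 16-bit dividend and a range of divisors:

```c
#include <stdint.h>

/* Counts operand pairs (a, b), with a ranging over every int16_t value
   and b over [-max_div, max_div] excluding 0, whose quotient computed in
   32 bits (what clang emits after promoting both shorts to int) does not
   fit back into int16_t. Such a pair is exactly one where a native
   "sdiv i16" would overflow, which LLVM IR treats as undefined behavior. */
unsigned count_i16_sdiv_overflows(int32_t max_div) {
    unsigned overflows = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a++) {
        for (int32_t b = -max_div; b <= max_div; b++) {
            if (b == 0)
                continue;          /* division by zero is UB at any width */
            int32_t q = a / b;     /* C truncates toward zero, like sdiv */
            if (q < INT16_MIN || q > INT16_MAX)
                overflows++;
        }
    }
    return overflows;
}
```

Since |a / b| <= |a| whenever b != 0, and |a / b| <= 16384 once |b| >= 2, the count is 1 for any `max_div >= 1`: the only offending pair is (-32768, -1).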
Nemanja Ivanovic via llvm-dev
2017-May-29 13:30 UTC
Just a shot in the dark here... could it be that Clang (or whatever is adding those trunc/ext instructions to the IR) is taking into account the calling conventions in your <TargetName>CallingConv.td? We certainly get the same behaviour on PPC, and I wonder if that's due to lines like this in PPCCallingConv.td: `CCIfType<[i8], CCPromoteToType<i64>>`. I don't know enough about this code to really know whether the above has any relation to reality, but it seems related.
Alex Susu via llvm-dev
2018-Jul-25 15:12 UTC
Hello. I am returning to this older thread. I'd also like to thank Peter Lawrence for his insightful answer (his email is quoted below, if interested). I would add that Section 6.3.1.1 of the C11 standard defines the integer promotions, which is why C requires short arithmetic to be promoted to int. See also https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules .

To answer Craig Topper: I do have a simple and quite interesting case where these promotions happen, the Floyd-Warshall algorithm below (also try the example at https://www.geeksforgeeks.org/integer-promotions-in-c/). In all cases, pass -O0 to clang so that it emits unoptimized LLVM IR.

    #define SIZE 128
    short path[SIZE][SIZE];

    void FloydWarshall() {
      int i, j, k;
      for (k = 0; k < SIZE; k++) {
        for (i = 0; i < SIZE; ++i) {
          short pik = path[i][k];
          for (j = 0; j < SIZE; j++) {
            path[i][j] = path[i][j] < pik + path[k][j] ? path[i][j] : pik + path[k][j];
          }
        }
      }
    }

The body of the innermost loop is translated to the following unoptimized LLVM IR; see the lines with the comment "IMPORTANT":

    for.body8:                                        ; preds = %for.cond6
      %6 = load i32, i32* %j, align 4
      %idxprom9 = sext i32 %6 to i64
      %7 = load i32, i32* %i, align 4
      %idxprom10 = sext i32 %7 to i64
      %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
      %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
      %8 = load i16, i16* %arrayidx12
      %conv = sext i16 %8 to i32                      ; IMPORTANT
      %9 = load i16, i16* %pik
      %conv13 = sext i16 %9 to i32                    ; IMPORTANT
      %10 = load i32, i32* %j, align 4
      %idxprom14 = sext i32 %10 to i64
      %11 = load i32, i32* %k, align 4
      %idxprom15 = sext i32 %11 to i64
      %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
      %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
      %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
      %conv18 = sext i16 %12 to i32
      %add = add nsw i32 %conv13, %conv18
      %add = add nsw i16 %9, %12                      ; IMPORTANT
      %cmp19 = icmp slt i32 %conv, %add
      %cmp19 = icmp slt i16 %8, %add                  ; IMPORTANT
      br i1 %cmp19, label %cond.true, label %cond.false

Best regards,
Alex

On 5/31/2017 11:04 PM, Peter Lawrence via llvm-dev wrote:
> Alex,
> The C language requires “short” arithmetic to be promoted to the size of “int”, hence the conversions to “int” and later the optimizations back to “short”, but only when the optimizer can prove that the result will be the same.
>
> If your machine has only 16-bit registers and arithmetic, then you should change clang. There won’t be any conversions in the IR (but there are a variety of problems with LLVM’s optimizations that you will run into!).
>
> If your machine has both 16-bit and 32-bit registers and arithmetic, then you probably must leave clang alone. I am inclined to read your email as implying this is the case for you.
>
> Do you really need signed div and rem? Usually people don’t need the quirky results of signed div and rem (in fact, more often than not they need results consistent with two’s-complement shifts and masks).
>
> If unsigned is OK, then IC should (?) transform 32-bit unsigned div and rem of unsigned short into 16-bit unsigned div and rem. (Can someone verify / confirm that I’m thinking correctly here?)
>
> The only thing I can think of off the top of my head for getting 16-bit sdiv and srem instructions emitted on a 32-bit machine is inline asm?
>
> BTW, IIRC sdiv and srem also inhibit vectorization to 16-bit SIMD instructions for the same reason (similarly, shifts become undef in 16-bit for shift amounts that are valid in 32-bit). I wonder what workarounds folks use in this context; perhaps someone else on this list can chime in?
>
> -Peter Lawrence.
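The unsigned case raised in Peter's message can be checked directly: for zero-extended 16-bit operands, the 32-bit unsigned quotient never exceeds the dividend and the remainder is always smaller than the divisor, so both fit in 16 bits and truncating them loses nothing. A minimal C sketch (hypothetical function name, sampled divisors to keep the loop short); it only shows that narrowing `udiv`/`urem` to i16 would be value-preserving, not whether IC performs it in a given LLVM version:

```c
#include <stdint.h>

/* Returns 1 if, for every 16-bit dividend and every divisor sampled with
   the given stride, the 32-bit unsigned quotient and remainder of the
   zero-extended operands both fit in 16 bits, i.e. "udiv i32"/"urem i32"
   of zext'd i16 values can be truncated to i16 without losing anything. */
int udiv_urem_fit_in_16_bits(uint32_t divisor_stride) {
    for (uint32_t a = 0; a <= 0xFFFF; a++) {
        for (uint32_t b = 1; b <= 0xFFFF; b += divisor_stride) {
            uint32_t q = a / b;   /* q <= a, since b >= 1 */
            uint32_t r = a % b;   /* r < b */
            if (q > 0xFFFF || r > 0xFFFF)
                return 0;
        }
    }
    return 1;
}
```

There is no unsigned counterpart to the signed -32768 / -1 case, which is why unsigned division is the easier one to narrow.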