Hi Mats,

It's a private backend. I will try to describe what I am dealing with.

    struct S {
      unsigned int a : 8;
      unsigned int b : 8;
      unsigned int c : 8;
      unsigned int d : 8;

      unsigned int e;
    };

We want to read S->b, for example. The size of struct S is 64 bits, and it
seems LLVM treats it as an i64. Below is the IR corresponding to S->b, IIRC.

    %0 = load i64, i64* %ptr, align 4
    %1 = lshr i64 %0, 8
    %2 = and i64 %1, 255

Our target doesn't support load i64, so we have the following code in
XXXISelLowering.cpp:

    setOperationAction(ISD::LOAD, MVT::i64, Custom);

This transforms load i64 into load v2i32 during type legalization. During op
legalization, the load v2i32 is found to be unaligned (4 vs. 8), so stack
load/store instructions are generated. This is one problem.

Besides that, our target has bitset/bitextract instructions, and we want to
use them for bitfield access, too, but we don't know how to do that.

Thanks.

Regards,
chenwj

2017-06-15 0:10 GMT+08:00 mats petersson <mats at planetcatfish.com>:
> Would probably help if you explained which backend you are working on
> (assuming it's a publicly available one). An example, with source that can
> be compiled by "anyone", along with the generated "bad code" and what you
> expect to see as "good code" would also help a lot.
>
> From the things I've seen, it's not noticeably worse (or better) than
> other compilers. But it's not an area that I've spent a LOT of time on, and
> the combination of generic LLVM operations and the target implementation
> will determine the outcome - there are lots of clever tricks one can do at
> the machine-code level, that LLVM can't "know" in generic ways, since it's
> dependent on specific instructions. Most of my experience comes from x86
> and ARM, both of which are fairly well established architectures with a
> good amount of people supporting the code-gen part. If you are using a
> different target, there may be missing target optimisations that the
> compiler could do.
>
> I probably can't really help, just trying to help you make the question as
> clear as possible, so that those who may be able to help have enough
> information to work on.
>
> --
> Mats
>
> On 14 June 2017 at 13:57, 陳韋任 via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
>
>> Hi All,
>>
>> Is there a known issue that LLVM is bad at codegen for some language
>> structure, say C bitfields?
>> Our custom backend generates inefficient code for bitfield access, so I
>> am wondering where I should look first.
>>
>> Thanks.
>>
>> Regards,
>> chenwj
>>
>> --
>> Wei-Ren Chen (陳韋任)
>> Homepage: https://people.cs.nctu.edu.tw/~chenwj
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj
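To make the shift-and-mask sequence in the message above concrete, here is a minimal C++ sketch of what the three IR instructions compute. It assumes the layout the IR implies, with `a` in the lowest 8 bits of the 64-bit word; `packS` and `readB` are hypothetical helper names used only for illustration, not part of LLVM or of the backend under discussion.

```cpp
#include <cstdint>

// Hypothetical helper: build the 64-bit image of struct S, assuming `a`
// sits in bits 0..7, `b` in bits 8..15, `c` in bits 16..23, `d` in bits
// 24..31 and `e` in bits 32..63 (the layout the IR in the message implies).
constexpr std::uint64_t packS(std::uint8_t a, std::uint8_t b, std::uint8_t c,
                              std::uint8_t d, std::uint32_t e) {
  return static_cast<std::uint64_t>(a) |
         (static_cast<std::uint64_t>(b) << 8) |
         (static_cast<std::uint64_t>(c) << 16) |
         (static_cast<std::uint64_t>(d) << 24) |
         (static_cast<std::uint64_t>(e) << 32);
}

// Mirrors the IR: take the loaded 64-bit word, lshr by 8, and with 255.
constexpr std::uint64_t readB(std::uint64_t word) {
  return (word >> 8) & 255u;
}
```

The point of the sketch is only that the field read is a pure function of the wide word, which is why the frontend can express it as one wide load followed by integer arithmetic.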
I understand the problem. Can't offer any useful help - most likely, you
need to add some code to help the instruction selection or some such... but
it's not an area that I'm familiar with...

--
Mats

On 15 June 2017 at 12:06, 陳韋任 <chenwj.cs97g at g2.nctu.edu.tw> wrote:
> Hi Mats,
>
> It's private backend. I will try describing what I am dealing with.
> [...]
I may be out to lunch here but this sounds like something that SROA
converts into an i64 load. I wonder if disabling it produces IR that is
easier for your target to handle. Of course, this isn't to say that simply
disabling SROA is a viable solution, but it may give you some ideas as to
where to go in terms of looking for a solution.

You also may be able to combine such patterns in the SDAG (before
legalization) into loads that your target can handle.

This is all kind of speculative but hopefully it sheds some light on what
might be going on.

On Thu, Jun 15, 2017 at 1:45 PM mats petersson via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I understand the problem. Can't offer any useful help - most likely, you
> need to add some code to help the instruction selection or some such... but
> it's not an area that I'm familiar with...
>
> --
> Mats
> [...]
On 6/15/2017 4:06 AM, 陳韋任 via llvm-dev wrote:
> We want to read S->b for example. The size of struct S is 64 bits, and
> seems LLVM treats it as i64.
> Below is the IR corresponding to S->b, IIRC.
>
>     %0 = load i64, i64* %ptr, align 4
>     %1 = lshr i64 %0, 8
>     %2 = and i64 %1, 255

This looks fine.

> Our target doesn't support load i64, so we have following code
> in XXXISelLowering.cpp
>
>     setOperationAction(ISD::LOAD, MVT::i64, Custom);
>
> Transform load i64 to load v2i32 during type legalization.

If misaligned load v2i32 isn't legal, don't generate it. If it is legal,
you might need to mess with your implementation of
allowsMisalignedMemoryAccesses.

> Besides of that, our target has bitset/bitextract instructions, we
> want to use them on bitfield access, too. But don't know how to do that.

This is generally implemented by pattern-matching the shift and mask
operations. ARM has instructions like this if you're looking for
inspiration; look for UBFX, SBFX and BFI.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
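The pattern-matching idea in the reply above can be illustrated outside of LLVM. The following is a minimal sketch, not the ARM backend's actual code: it recognises `(x >> shift) & mask` as an unsigned bitfield extract when the mask is a run of low ones. `BitfieldExtract`, `matchShiftMask` and `extractBits` are hypothetical names introduced here, and the match is deliberately conservative.

```cpp
#include <cstdint>

// Hypothetical description of an unsigned bitfield extract: take `width`
// bits starting at bit `lsb` (similar in spirit to an instruction like
// ARM's UBFX).
struct BitfieldExtract {
  unsigned lsb;
  unsigned width;
};

// Recognise (x >> shift) & mask on a 32-bit value as a bitfield extract.
// Matches only when mask has the form 2^width - 1 and the whole field fits
// inside the 32-bit value; anything else is conservatively rejected.
bool matchShiftMask(unsigned shift, std::uint32_t mask, BitfieldExtract &out) {
  if (shift >= 32 || mask == 0)
    return false;
  if ((mask & (mask + 1)) != 0)  // mask is not a run of low ones
    return false;
  unsigned width = 0;
  for (std::uint32_t m = mask; m != 0; m >>= 1)
    ++width;
  if (shift + width > 32)
    return false;
  out.lsb = shift;
  out.width = width;
  return true;
}

// Reference semantics of the extract, useful for checking a match.
std::uint32_t extractBits(std::uint32_t x, BitfieldExtract f) {
  std::uint32_t mask = f.width == 32 ? ~0u : ((1u << f.width) - 1u);
  return (x >> f.lsb) & mask;
}
```

In a real backend the same check would run on DAG nodes (a shift feeding an and with a constant) rather than on plain integers; the integer version is only meant to show which shift and mask combinations qualify.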
> > Our target doesn't support load i64, so we have following code
> > in XXXISelLowering.cpp
> >
> >     setOperationAction(ISD::LOAD, MVT::i64, Custom);
> >
> > Transform load i64 to load v2i32 during type legalization.
>
> If misaligned load v2i32 isn't legal, don't generate it. If it is legal,
> you might need to mess with your implementation of
> allowsMisalignedMemoryAccesses.

Will check that. Just a little more explanation about the misaligned part.
We declare i64 as 8-byte aligned in the DataLayout, and in
"%0 = load i64, i64* %ptr, align 4" the alignment is 4. In the op
legalization stage, it goes through

    SelectionDAGLegalize::LegalizeLoadOps
      -> TargetLowering::expandUnalignedLoad

We don't expect a load i64 to be only 4-byte aligned, so how do I know
beforehand that I will generate a misaligned load v2i32?

Another question: what do we usually do to handle load i64 if it is not
natively supported? Is it correct to transform load i64 into load v2i32?
An existing backend example would be great.

> > Besides of that, our target has bitset/bitextract instructions, we want
> > to use them on bitfield access, too. But don't know how to do that.
>
> This is generally implemented by pattern-matching the shift and mask
> operations. ARM has instructions like this if you're looking for
> inspiration; look for UBFX, SBFX and BFI.

Thanks. Having an example is good. :-)

Regards,
chenwj
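The thread does not answer the question about handling an unsupported load i64. One possibility, offered here only as a sketch under stated assumptions, is to treat the value as two 32-bit halves rather than as a v2i32 vector: with "align 4" on the original load, word loads at byte offsets 0 and 4 are each naturally aligned. The code below models that on plain integers, assuming a little-endian layout; `Halves`, `loadSplit` and `extractField` are hypothetical names, and whether this fits the private backend is not something the source confirms.

```cpp
#include <cstdint>

// Two 32-bit halves standing in for one 64-bit value. Little-endian order
// is assumed: the low half lives at the lower address.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Stand-in for "load i64, align 4" on a target with only 32-bit loads:
// two word loads at byte offsets 0 and 4, each 4-byte aligned.
Halves loadSplit(const std::uint32_t *p) {
  return Halves{p[0], p[1]};
}

// Extract `width` bits starting at bit `lsb` of the 64-bit value.
// Precondition: 0 < width < 32 and the field does not cross the boundary
// between the two halves.
std::uint32_t extractField(Halves v, unsigned lsb, unsigned width) {
  std::uint32_t word = lsb < 32 ? v.lo : v.hi;
  unsigned shift = lsb % 32;
  return (word >> shift) & ((1u << width) - 1u);
}
```

With this split, reading S->b only needs the low word, so the high word load is dead and can be dropped by later cleanup.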
Forgot to reply to all

Hi Eli

>> We want to read S->b for example. The size of struct S is 64 bits, and
>> seems LLVM treats it as i64.
>> Below is the IR corresponding to S->b, IIRC.
>>
>>     %0 = load i64, i64* %ptr, align 4
>>     %1 = lshr i64 %0, 8
>>     %2 = and i64 %1, 255
>
> This looks fine.

Why can't we expect InstCombine to simplify this to an 8-bit load,
assuming each of %0 and %1 has only one use?

Thanks
Ehsan
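The narrowing asked about above can be stated precisely with a small sketch. Assuming a little-endian memory layout, an 8-bit field that starts on a byte boundary can be read with a single byte load at offset shift/8 instead of a wide load plus shift and mask. `loadWide`, `readWide` and `readNarrow` are hypothetical helpers for illustration; this shows only that the two forms agree, not what InstCombine does.

```cpp
#include <cstdint>

// Assemble a 64-bit little-endian value from 8 bytes, standing in for the
// wide "load i64".
std::uint64_t loadWide(const std::uint8_t *p) {
  std::uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | p[i];
  return v;
}

// Wide form: load 64 bits, shift right, mask with 255.
std::uint8_t readWide(const std::uint8_t *p, unsigned shift) {
  return static_cast<std::uint8_t>((loadWide(p) >> shift) & 255u);
}

// Narrowed form: a single byte load at offset shift / 8.
// Precondition: shift is a multiple of 8 and shift < 64.
std::uint8_t readNarrow(const std::uint8_t *p, unsigned shift) {
  return p[shift / 8];
}
```

The equivalence only holds for byte-aligned, byte-sized fields; for other widths the shift and mask cannot be folded into the load alone.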
Here is the complete IR I am dealing with, if that helps discussion.

-------------------------------------------------------------
%struct.A = type { %struct.Z*, %struct.Z*, %struct.Z*, %struct.Z*, %struct.Z* }
%struct.Z = type { %union.X, [180 x %union.Y] }
%union.X = type { %struct.anon }
%struct.anon = type { i64 }
%union.Y = type { %struct.anon.0 }
%struct.anon.0 = type { i32 }
%struct.D = type { i64 }

; Function Attrs: norecurse nounwind
define void @func(%struct.A* noalias nocapture readonly %a, %struct.D* noalias nocapture readonly %d) local_unnamed_addr #0 {
entry:
  %a2 = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 1
  %0 = load %struct.Z*, %struct.Z** %a2, align 4, !tbaa !1
  %1 = getelementptr inbounds %struct.D, %struct.D* %d, i32 0, i32 0
  %bf.load = load i64, i64* %1, align 4
  %bf.lshr = lshr i64 %bf.load, 8
  %2 = trunc i64 %bf.lshr to i32
  %bf.cast = and i32 %2, 255
  %3 = getelementptr inbounds %struct.Z, %struct.Z* %0, i32 0, i32 1, i32 %bf.cast, i32 0, i32 0
  %bf.load1 = load i32, i32* %3, align 4
  %bf.clear2 = and i32 %bf.load1, 65535
  store i32 %bf.clear2, i32* %3, align 4
  ret void
}
-------------------------------------------------------------

Regards,
chenwj

2017-06-16 14:13 GMT+08:00 Ehsan Amiri via llvm-dev <llvm-dev at lists.llvm.org>:
> Why can't we expect InstCombine to simplify this to an 8 bit load,
> assuming each of %0 and %1 has only one use ?
>
> Thanks
> Ehsan
On 6/15/2017 11:13 PM, Ehsan Amiri wrote:
>>> We want to read S->b for example. The size of struct S is 64
>>> bits, and seems LLVM treats it as i64.
>>> Below is the IR corresponding to S->b, IIRC.
>>>
>>>     %0 = load i64, i64* %ptr, align 4
>>>     %1 = lshr i64 %0, 8
>>>     %2 = and i64 %1, 255
>>
>> This looks fine.
>
> Why can't we expect InstCombine to simplify this to an 8 bit load,
> assuming each of %0 and %1 has only one use ?

We don't aggressively narrow loads and stores in IR because it tends to
block other optimizations. See https://reviews.llvm.org/D30416.

-Eli
For this specific case, modifying the source code (or the frontend) to use
plain struct members instead of bitfields seems to be an easy way, since
all the bitfields are 8 bits wide. But you cannot?
For general cases, you may want to enhance the backend to emit
bitextract/bitset.

-----
Hiroshi Inoue <inouehrs at jp.ibm.com>
IBM Research - Tokyo

"llvm-dev" <llvm-dev-bounces at lists.llvm.org> wrote on 2017/06/15 20:06:49:

> From: 陳韋任 via llvm-dev <llvm-dev at lists.llvm.org>
> To: mats petersson <mats at planetcatfish.com>
> Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Date: 2017/06/15 20:07
> Subject: Re: [llvm-dev] About CodeGen quality
> Sent by: "llvm-dev" <llvm-dev-bounces at lists.llvm.org>
>
> Hi Mats,
>
> It's private backend. I will try describing what I am dealing with.
> [...]
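As an illustration of the source-level workaround suggested above, here is a hedged sketch of a byte-member variant of the struct. `SBytes` and `readBMember` are hypothetical names, and the size and offset checks reflect common ABIs with the little-endian layout assumed in the thread rather than anything guaranteed by the C or C++ standards.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical byte-member variant of struct S: each former 8-bit bitfield
// becomes an ordinary one-byte member.
struct SBytes {
  std::uint8_t a;
  std::uint8_t b;
  std::uint8_t c;
  std::uint8_t d;
  std::uint32_t e;
};

// On common ABIs the struct keeps its 64-bit footprint and `b` becomes
// directly addressable at byte offset 1.
static_assert(sizeof(SBytes) == 8, "expected the same 64-bit footprint");
static_assert(offsetof(SBytes, b) == 1, "expected b at byte offset 1");

// Reading b is now a plain one-byte member access, with no shift or mask.
std::uint8_t readBMember(const SBytes &s) { return s.b; }
```

This only works when every field is a whole number of bytes, which is why it cannot replace backend support for the general case.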
2017-06-17 3:19 GMT+08:00 Hiroshi 7 Inoue <INOUEHRS at jp.ibm.com>:
> For this specific case, modifying source code (or the frontend) to use a
> struct instead of bitfield seems to be an easy way since all sizes of
> bitfields are 8 bits. But you cannot?
> For general cases, you may want to enhance backend to emit
> bitextract/bitset.

The test case shown here was modified from the original one; the bitfields
are not always 8 bits wide. And sure, emitting bitextract/bitset will
improve the code quality.

Regards,
chenwj
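Since the real bitfields are not always 8 bits wide, the write side needs the general form as well. The sketch below models a bitfield insert of arbitrary width on a 32-bit word, which is the and/or sequence a backend would try to match to an insert instruction such as ARM's BFI. `insertBits` is a hypothetical name, and treating the target's "bitset" instruction as this kind of insert is an assumption; the thread does not describe its semantics.

```cpp
#include <cstdint>

// Hypothetical model of a bitfield insert: replace `width` bits of `dst`
// starting at bit `lsb` with the low `width` bits of `src`.
// Precondition: width > 0 and lsb + width <= 32.
std::uint32_t insertBits(std::uint32_t dst, std::uint32_t src, unsigned lsb,
                         unsigned width) {
  std::uint32_t low = width == 32 ? ~0u : ((1u << width) - 1u);
  std::uint32_t mask = low << lsb;
  // Clear the destination field, then merge in the shifted source bits.
  return (dst & ~mask) | ((src << lsb) & mask);
}
```

Inserting zero into a field is the same as clearing it, which is one way to view a masking and followed by a store of the whole word.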