Jon Chesterfield via llvm-dev
2017-Sep-15 19:16 UTC
[llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
Hi JinGu,

The initial selection DAG looks reasonable to me. Are you seeing a "cannot select" error related to the extending load, or does the generated assembly fail to implement the semantics you expect?

Jon

On Fri, Sep 15, 2017 at 8:00 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Today's Topics:
>
>    1. What should a truncating store do? (Jon Chesterfield via llvm-dev)
>    2. Re: Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
>       (jingu at codeplay.com via llvm-dev)
>    3. DIVA - Debug Information Visual Analyser (Phil Camp via llvm-dev)
>    4. Re: Changes to 'ADJCALLSTACK*' and 'callseq_*' between LLVM v4.0 and v5.0
>       (Serge Pavlov via llvm-dev)
>    5. Re: RFC: Trace-based layout. (Kyle Butt via llvm-dev)
>    6. Re: Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
>       (Demikhovsky, Elena via llvm-dev)
>    7. Re: What should a truncating store do? (Friedman, Eli via llvm-dev)
>    8. Re: What should a truncating store do? (Jon Chesterfield via llvm-dev)
>    9. Re: What should a truncating store do? (Friedman, Eli via llvm-dev)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 15 Sep 2017 13:49:48 +0100
> From: Jon Chesterfield via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] What should a truncating store do?
> Message-ID: <CAOUYtQCN4KYLtmwmVjnCajsSfVKwSETAPZ1zaoYK9w=v3c26Tg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> For example, truncating store of an i32 to i6. My assumption was that this
> should write the low six bits of the i32 to somewhere in memory.
>
> Should the top 24 bits of a corresponding 32 bit region of memory be
> unchanged, zero, undefined?
>
> Should the two bits that would round the i6 up to a byte be preserved,
> zero, undefined?
>
> I can't write six bits directly so am trying to determine what set of
> bitwise ops to apply between a load and subsequent store to emulate the
> truncating store.
>
> Thanks!
>
> Jon
>
> ------------------------------
>
> Message: 2
> Date: Fri, 15 Sep 2017 15:45:05 +0100
> From: "jingu at codeplay.com via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>,
>     elena.demikhovsky at intel.com, daniel_l_sanders at apple.com
> Subject: Re: [llvm-dev] Question about
>     'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
> Message-ID: <5fdb722e-2682-ee03-871b-0f00ed1b5909 at codeplay.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Can someone give the comment about it please?
>
> Thanks,
>
> JinGu Kang
>
>
> On 14/09/17 12:05, jingu at codeplay.com wrote:
> > Hi All,
> >
> > I have a question about splitting 'EXTRACT_VECTOR_ELT' with 'v2i1'.
> > I have a llvm IR code snippet as following:
> >
> > llvm IR code snippet:
> >
> > for.body:                ; preds = %entry, %for.cond
> >   %i.022 = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
> >   %0 = icmp ne <2 x i32> %vecinit1, <i32 0, i32 -23>
> >   %1 = extractelement <2 x i1> %0, i32 %i.022
> >   %vecext4 = extractelement <2 x i32> %vecinit1, i32 %i.022
> >   %vecext5 = extractelement <2 x i32> <i32 0, i32 -23>, i32 %i.022
> >   %cmp6 = icmp ne i32 %vecext4, %vecext5
> >   %cmp7 = xor i1 %1, %cmp6
> >
> > ...
> >
> > and the SelectionDAG before TypeLegalizer is like this.
> >
> > t0: ch = EntryToken
> > t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
> > t3: ch = ValueType:i32
> > t5: i32,ch = CopyFromReg t2:1, Register:i32 %vreg1
> > t7: i32 = AssertZext t5, ValueType:ch:i1
> > t8: v2i32 = BUILD_VECTOR t2, t7
> > t11: v2i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<-23>
> > t15: i32,ch = CopyFromReg t0, Register:i32 %vreg2
> > t22: i32 = add t15, Constant:i32<1>
> > t24: ch = CopyToReg t0, Register:i32 %vreg3, t22
> > t27: ch = CopyToReg t0, Register:i32 %vreg8, Constant:i32<-1>
> > t31: ch = TokenFactor t24, t27
> > t13: v2i1 = setcc t8, t11, setne:ch
> > t16: i1 = extract_vector_elt t13, t15
> > t17: i32 = extract_vector_elt t8, t15
> > t18: i32 = extract_vector_elt t11, t15
> > t19: i1 = setcc t17, t18, setne:ch
> > t20: i1 = xor t16, t19
> >
> > ...
> >
> > I have not added any vector register class so 'DAGTypeLegalizer' tries
> > to split the "t16: i1 = extract_vector_elt t13, t15" because t13's
> > result type is 'v2i1'. If the size of vector element is less than
> > 8bit, 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT()' function
> > extends the elements to 8bit and stores them on stack. Finally, the
> > function generates 'ExtLoad' to load specific element. But if the
> > element's size is less than 8bit, I think it could be wrong. It looks
> > it needs just 'Load' or "Load and Truncate" to match the result type
> > of 'EXTRACT_VECTOR_ELT'.
> > How do you think about it? If I missed something, please let me know.
> >
> > Thanks,
> >
> > JinGu Kang
>
> ------------------------------
>
> Message: 3
> Date: Fri, 15 Sep 2017 16:38:48 +0100
> From: Phil Camp via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] DIVA - Debug Information Visual Analyser
> Message-ID: <5b25cc76-bbd9-515c-b984-34a03dd1cd2a at flametop.co.uk>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> DIVA, the Debug Information Visual Analyser, was presented at the 2017
> European LLVM Developers Meeting
> (https://www.youtube.com/watch?v=SwtpXaCk2bE).
>
> The DIVA binaries have been available since March, I am pleased to
> announce that the source code is now available on GitHub.
> https://github.com/SNSystems/DIVA
>
> DIVA is a command line tool that processes DWARF debug information
> contained within ELF files and prints the semantics of that debug
> information. The DIVA output is designed to be understandable by
> software programmers without any low-level compiler or DWARF knowledge;
> as such, it can be used to report debug information bugs to the compiler
> provider. DIVA's output can also be used as the input to DWARF tests, to
> compare the debug information generated from multiple compilers, from
> different versions of the same compiler, from different compiler
> switches and from the use of different DWARF specifications (i.e. DWARF
> 3, 4 and 5). DIVA will be used on the LLVM project to test and validate
> the output of clang to help improve the quality of the debug experience.
>
> Phil Camp
>
> SN Systems
>
> ------------------------------
>
> Message: 4
> Date: Fri, 15 Sep 2017 23:39:44 +0700
> From: Serge Pavlov via llvm-dev <llvm-dev at lists.llvm.org>
> To: "Martin J. O'Riordan" <MartinO at theheart.ie>
> Cc: LLVM Developers <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Changes to 'ADJCALLSTACK*' and 'callseq_*'
>     between LLVM v4.0 and v5.0
> Message-ID: <CACOhrX4VSKtYBubv9q5kFd=btSWe5k6eEQSOYEo8c4uB2O27Rw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Martin,
>
> Pseudo CALLSEQ_START was changed in r302527, commit message contains
> details on the changes.
> However CALLSEQ_END was not modified. If your made changes to
> ADJCALLSTACKUP to add additional argument, that may result in error.
>
> Thanks,
> --Serge
>
> 2017-09-15 19:09 GMT+07:00 Martin J. O'Riordan via llvm-dev <
> llvm-dev at lists.llvm.org>:
>
> > Hi LLVM-Devs,
> >
> > I have managed to complete updating our sources from LLVM v4.0 to v5.0, but
> > I am getting selection errors for 'callseq_end'. I am aware that the
> > 'ADJCALLSTACKUP' and 'ADJCALLSTACKDOWN' patterns have changed, and have
> > added an additional argument to the TD descriptions for these.
> >
> > There are interactions with 'ISD::CALL' and 'ISD::RET_FLAG', but so far as I
> > can tell I have revised these in the same way as the in-tree targets have
> > adjusted their sources.
> >
> > The error I am seeing is:
> >
> > fatal error: error in backend: Cannot select: 0x15c9bbe00: ch,glue callseq_end 0x15c9bbd98, TargetConstant:i32<0>, TargetGlobalAddress:i32<void (i8*, i32, i8*, i8*)* @__assert_func> 0, 0x15c9bbd98:1
> >   0x15c9bb920: i32 = TargetConstant<0>
> >   0x15c9bb8b8: i32 = TargetGlobalAddress<void (i8*, i32, i8*, i8*)* @__assert_func> 0
> >   0x15c9bbd98: ch,glue = MYISD::CALL 0x15c9bbcc8, TargetGlobalAddress:i32<void (i8*, i32, i8*, i8*)* @__assert_func> 0, Register:i32 %I18, Register:i32 %I17, Register:i32 %I16, Register:i32 %I15, RegisterMask:Untyped, 0x15c9bbcc8:1
> >     0x15c9bb8b8: i32 = TargetGlobalAddress<void (i8*, i32, i8*, i8*)* @__assert_func> 0
> >     0x15c9bb9f0: i32 = Register %I18
> >     0x15c9bbac0: i32 = Register %I17
> >     0x15c9bbb90: i32 = Register %I16
> >     0x15c9bbc60: i32 = Register %I15
> >     0x15c9bbd30: Untyped = RegisterMask
> >     0x15c9bbcc8: ch,glue = CopyToReg 0x15c9bbbf8, Register:i32 %I15, 0x15c9bb718, 0x15c9bbbf8:1
> >       0x15c9bbc60: i32 = Register %I15
> >       0x15c9bb718: i32,ch,glue = CopyFromReg 0x15c9bb648:1, Register:i32 %vreg2, 0x15c9bb648:1
> >         0x15c9bb6b0: i32 = Register %vreg2
> >       0x15c9bbbf8: ch,glue = CopyToReg 0x15c9bbb28, Register:i32 %I16, Constant:i32<0>, 0x15c9bbb28:1
> >         0x15c9bbb90: i32 = Register %I16
> >         0x15c9bb850: i32 = Constant<0>
> >         0x15c9bbb28: ch,glue = CopyToReg 0x15c9bba58, Register:i32 %I17, 0x15c9bb648, 0x15c9bba58:1
> >           0x15c9bbac0: i32 = Register %I17
> >           0x15c9bb648: i32,ch,glue = CopyFromReg 0x15c9bb578:1, Register:i32 %vreg1, 0x15c9bb578:1
> >             0x15c9bb5e0: i32 = Register %vreg1
> >           0x15c9bba58: ch,glue = CopyToReg 0x15c9bb988, Register:i32 %I18, 0x15c9bb578
> >             0x15c9bb9f0: i32 = Register %I18
> >             0x15c9bb578: i32,ch,glue = CopyFromReg 0x15c967b38, Register:i32 %vreg0
> >               0x15c9bb510: i32 = Register %vreg0
> >
> > My TD for this has:
> >
> > def SDT_MYCallSeqStart : SDCallSeqStart<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
> > def SDT_MYCallSeqEnd   : SDCallSeqStart<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
> > def MYCallseqStart : SDNode<"ISD::CALLSEQ_START", SDT_MYCallSeqStart,
> >                             [SDNPHasChain, SDNPOutGlue]>;
> > def MYCallseqEnd   : SDNode<"ISD::CALLSEQ_END", SDT_MYCallSeqEnd,
> >                             [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
> >
> > def SDT_MYCall : SDTypeProfile<0, 1, [SDTCisVT<0, i32>]>;
> > def SDT_MYRet  : SDTypeProfile<0, 0, []>;
> > def MYcall : SDNode<"MYISD::CALL", SDT_MYCall,
> >                     [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue, SDNPVariadic]>;
> > def MYret  : SDNode<"MYISD::RET_FLAG", SDTNone,
> >                     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
> >
> > let hasCtrlDep = 1, hasSideEffects = 1 in {
> >   def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
> >                                 [(MYCallseqStart timm:$amt1, timm:$amt2)]>;
> >   def ADJCALLSTACKUP   : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
> >                                 [(MYCallseqEnd timm:$amt1, timm:$amt2)]>;
> > }
> >
> > def: Pat<(MYret), (JMP_Ret (i32 LR))>;
> >
> > The function that is failing does warn - "warning: function declared
> > 'noreturn' should not return [-Winvalid-noreturn]", and it does seem to
> > return. In fact it invokes a custom builtin which does not actually return.
> > In the past I have just ignored this warning.
> >
> > Any hints that might help me to make the necessary adaptations to fix this?
> >
> > Thanks in advance,
> >
> > MartinO
> >
> > PS: I won't be able to reply until Monday as I will be away for the weekend
>
> ------------------------------
>
> Message: 5
> Date: Fri, 15 Sep 2017 10:00:11 -0700
> From: Kyle Butt via llvm-dev <llvm-dev at lists.llvm.org>
> To: Sean Silva <chisophugis at gmail.com>
> Cc: LLVM Developers <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] RFC: Trace-based layout.
> Message-ID: <CABeP02Ar0toCzHnax2EdGyGu8Bukq6PGEeoTy0CmSi0Dg8yneQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> It is essentially block layout algorithm 2 here, with limited non-greedy
> lookahead. (The triangle detection)
> https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=p16-pettis.pdf
>
> On Thu, Sep 14, 2017 at 7:24 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> > Is this an existing published algorithm? Do you have a link to a paper?
> >
> > -- Sean Silva
> >
> > On Thu, Sep 14, 2017 at 6:53 PM, Kyle Butt via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >
> >> I plan on rewriting the block placement algorithm to proceed by traces.
> >>
> >> A trace is a chain of blocks where each block in the chain may fall
> >> through to the successor in the chain.
> >>
> >> The overall algorithm would be to first produce traces for a function,
> >> and then order those traces to try and get cache locality.
> >>
> >> Currently block placement uses a greedy single step approach to layout. It
> >> produces chains working from inner to outer loops. Unlike a trace, a
> >> chain may contain non-fallthrough edges. This causes problems with loop
> >> layout. The main problems with loop layout are: loop rotation and cold
> >> blocks in a loop.
> >>
> >> Overview of proposed solution:
> >>
> >> Phase 1:
> >> Greedily produce a set of traces through the function. A trace is a list
> >> of blocks with each block in the list falling through (possibly
> >> conditionally) to the next block in the list.
> >> Loop rotation will occur naturally in this phase via the triangle
> >> replacement algorithm below. Handling single trace loops requires a
> >> tweak, see the detailed design.
> >>
> >> Phase 2:
> >> After producing what we believe are the best traces, they need to be
> >> ordered. They will be ordered topologically, except that traces that are
> >> cold enough (As measured by their warmest block) will be floated later,
> >> This may push them out of a loop or to the end of the function.
> >>
> >> Detailed Design
> >>
> >> Note whenever an edge is used as a number, I am referring to the edge
> >> frequency.
> >>
> >> Phase 1: Producing traces
> >> Traces are produced according to the following algorithm:
> >> * Sort the edges according to weight, stable-sorting them according the
> >>   incoming block and edge ordering.
> >> * Place each block in a trace of length 1.
> >> * For each edge in order:
> >>   * If the source is at the end of a trace, and the target is at the
> >>     beginning of a trace, glue those 2 traces into 1 longer trace.
> >>   * If an edge has a target or source in the middle of another trace,
> >>     consider tail duplication. The benefit calculation is the same as the
> >>     existing code.
> >>   * If an edge has a source or target in the middle, check them to see
> >>     if they can be replaced as a triangle. (Triangle replacement described
> >>     below)
> >>     * Compare the benefit of choosing the edge, along with any triangles
> >>       found, with the cost of breaking the existing edges.
> >>     * If it is a net benefit, perform the switch.
> >> * Triangle checking:
> >>   Consider a trace in 2 parts: A1->A2, and the current edge under
> >>   consideration is A1->B (the case for C->A2 is mirror, and both may need
> >>   to be done)
> >>   * First find the best alternative C->B
> >>   * Check for an alternative for A2: D->A2
> >>   * Find D's best Alternative: D->E
> >>   * Compare the frequencies: A1->A2 + C->B + D->E vs A1->B + D->A2
> >>   * If the 2nd sum is bigger, do the switch.
> >> * Loop Rotation Tweak:
> >>   If A contains a backedge A2->A1, then when considering A1->B or
> >>   C->A2, we can include that backedge in the gain:
> >>   A1->A2 + C->D + E->B vs A1->B + C->A2 + A2->A
> >>
> >> Phase 2: Order traces.
> >> First we compute the frequency of a trace by finding the max frequency of
> >> any of its blocks.
> >> Then we attempt to place the traces topologically. When a trace cannot be
> >> placed topologically, we prefer warmer traces first.
> >>
> >> Questions and comments welcome.
>
> ------------------------------
>
> Message: 6
> Date: Fri, 15 Sep 2017 17:42:23 +0000
> From: "Demikhovsky, Elena via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "jingu at codeplay.com" <jingu at codeplay.com>,
>     "daniel_l_sanders at apple.com" <daniel_l_sanders at apple.com>
> Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Question about
>     'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
> Message-ID: <A0DC88CEB3010344830D52D66533DA8E5EE2F88D at hasmsx108.ger.corp.intel.com>
> Content-Type: text/plain; charset="utf-8"
>
> > extends the elements to 8bit and stores them on stack.
> Store is responsible for zero-extend. This is the policy...
>
> - Elena
>
> ------------------------------
>
> Message: 7
> Date: Fri, 15 Sep 2017 10:55:14 -0700
> From: "Friedman, Eli via llvm-dev" <llvm-dev at lists.llvm.org>
> To: Jon Chesterfield <jonathanchesterfield at gmail.com>, llvm-dev
>     <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] What should a truncating store do?
> Message-ID: <a0b1d63b-d177-beff-899e-420e8f2c0798 at codeaurora.org>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> On 9/15/2017 5:49 AM, Jon Chesterfield via llvm-dev wrote:
> > For example, truncating store of an i32 to i6. My assumption was that
> > this should write the low six bits of the i32 to somewhere in memory.
> >
> > Should the top 24 bits of a corresponding 32 bit region of memory be
> > unchanged, zero, undefined?
>
> Unchanged.
>
> > Should the two bits that would round the i6 up to a byte be preserved,
> > zero, undefined?
>
> Zero.
> Legalization will normally handle this for you, though, by
> transforming it to an i8 store.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>
> ------------------------------
>
> Message: 8
> Date: Fri, 15 Sep 2017 19:30:20 +0100
> From: Jon Chesterfield via llvm-dev <llvm-dev at lists.llvm.org>
> To: "Friedman, Eli" <efriedma at codeaurora.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] What should a truncating store do?
> Message-ID: <CAOUYtQBoArROmMx1Ke0jFxpsQ2ztFqtNxgbLzWVvycs0Ls72eA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Interesting, thank you. I expected both answers to be "unchanged" so was
> surprised by the zero extend in the legaliser.
>
> The motivation here is that it's faster for us to load N bytes, apply
> whatever masks are necessary to reproduce the truncating store then store
> all N bytes. This is only a good plan if there's no change to the semantics
> :)
>
> Are scalar integer types zero extended to the next multiple of 8 or to the
> next power of 2 greater than 7? For example, i17 => i24 or i17 => i32?
>
> I think this means truncating stores of vector types will introduce zero
> bits at the end of each element instead grouping all the zeros at the end.
> For example, <i6 63, i6 63> writes to sixteen bits as 0b0011111100111111,
> not as 0b0000111111111111?
>
> Thanks!
>
> Jon
>
> ------------------------------
>
> Message: 9
> Date: Fri, 15 Sep 2017 11:41:14 -0700
> From: "Friedman, Eli via llvm-dev" <llvm-dev at lists.llvm.org>
> To: Jon Chesterfield <jonathanchesterfield at gmail.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] What should a truncating store do?
> Message-ID: <8a9c81d9-9c89-9956-c269-d3057a71b451 at codeaurora.org>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> On 9/15/2017 11:30 AM, Jon Chesterfield wrote:
> > Interesting, thank you. I expected both answers to be "unchanged" so
> > was surprised by the zero extend in the legaliser.
> >
> > The motivation here is that it's faster for us to load N bytes, apply
> > whatever masks are necessary to reproduce the truncating store then
> > store all N bytes. This is only a good plan if there's no change to
> > the semantics :)
>
> See http://llvm.org/docs/LangRef.html#store-instruction . In general,
> you have to be careful to avoid data races, but that might not apply to
> your target.
>
> > Are scalar integer types zero extended to the next multiple of 8 or to
> > the next power of 2 greater than 7? For example, i17 => i24 or i17 => i32?
>
> Multiple of 8.
>
> > I think this means truncating stores of vector types will introduce
> > zero bits at the end of each element instead grouping all the zeros at
> > the end.
> > For example, <i6 63, i6 63> writes to sixteen bits as
> > 0b0011111100111111, not as 0b0000111111111111?
>
> Vector types are tightly packed, so <8 x i1> is 1 byte, not 8 bytes.
>
> -Eli
>
> ------------------------------
>
> End of llvm-dev Digest, Vol 159, Issue 57
> *****************************************
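
For reference, here is a minimal Python sketch of the scalar truncating-store behaviour as Eli describes it in messages 7 and 9 of the quoted digest: the value is zero-extended to the next multiple of 8 bits, and memory outside those bytes is left unchanged. It is not LLVM code. The helper names `store_size_bytes` and `truncating_store` are hypothetical, little-endian byte order is an assumption made only for this illustration, and vector packing is not modelled.

```python
def store_size_bytes(bits: int) -> int:
    """Bytes written for an iN scalar: N rounded up to a multiple of 8."""
    return (bits + 7) // 8


def truncating_store(memory: bytearray, offset: int, value: int, bits: int) -> None:
    """Emulate a truncating store of the low `bits` bits of `value`.

    The value is masked to `bits` bits, zero-extended to a whole number of
    bytes, and written at `offset`. Bytes outside that range are untouched.
    Little-endian byte order is an assumption of this sketch.
    """
    size = store_size_bytes(bits)
    masked = value & ((1 << bits) - 1)
    memory[offset:offset + size] = masked.to_bytes(size, "little")


# Truncating store of an i32 (all ones) to i6 into a 4-byte region of 0xFF.
mem = bytearray([0xFF] * 4)
truncating_store(mem, 0, 0xFFFFFFFF, 6)
# Byte 0 now holds 0b00111111 (the two padding bits are zero);
# bytes 1..3 still hold 0xFF.
```

Under the same rounding rule, `store_size_bytes(17)` gives 3, which corresponds to the i17 => i24 case asked about in message 8.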