Friedman, Eli via llvm-dev
2017-Jul-07 02:11 UTC
[llvm-dev] Error in v64i32 type in x86 backend
Have you read http://llvm.org/docs/WritingAnLLVMBackend.html and http://llvm.org/docs/CodeGenerator.html ? http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector describes how to define a store instruction.

-Eli

On 7/6/2017 6:51 PM, hameeza ahmed via llvm-dev wrote:
> Please correct me, I am stuck at this point.
>
> On Jul 6, 2017 5:18 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
>
> Hello,
> I am experimenting with increasing the register/vector width in the
> x86 backend to 64 elements of 32 bits instead of 16.
> For example, I have a loop with 65 iterations. Even if my IR
> generates a v64i32 and 1 scalar, the backend still breaks the v64i32
> into 4 v16i32. I want it to retain v64i32. Likewise, if there are
> 128 elements in the loop, it should break them into 2 v64i32
> instructions.
>
> To do this I made the necessary changes in X86ISelLowering.cpp and
> rebuilt LLVM. When I then use -view-dag-combine2-dags, I get the
> required output in the graph, but the following error on the console:
>
> LLVM ERROR: Cannot select: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x30c5438>)> t9, t7, t12, undef:i64
>   t7: v64i32 = add t6, t4
>     t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x30c5438>)(dereferenceable)> t0, t14, undef:i64
>       t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
>         t13: i64 = TargetGlobalAddress<[65 x i32]* @c> 0
>       t3: i64 = undef
>     t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x30c5438>)(dereferenceable)> t0, t16, undef:i64
>       t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
>         t15: i64 = TargetGlobalAddress<[65 x i32]* @b> 0
>       t3: i64 = undef
>   t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
>     t11: i64 = TargetGlobalAddress<[65 x i32]* @a> 0
>   t3: i64 = undef
> In function: foo
>
> The DAG after legalization is also attached here.
>
> The source is a vector sum of 65 elements.
>
> Kindly correct me.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
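The splitting arithmetic described in the quoted question can be sketched as follows. This is only a simplified model, not LLVM code: the function names `split_loop` and `legal_parts` are hypothetical, and the 512-bit limit is an assumption taken from the reported behaviour (one v64i32 becoming 4 v16i32). The real type legalizer splits an illegal vector in halves repeatedly, which gives the same count for power-of-two sizes.

```python
def split_loop(trip_count, vector_width):
    """Return (full_vector_iterations, scalar_remainder) for a loop.

    Models how a vectorizer covers trip_count iterations with vectors
    of vector_width elements plus a scalar tail.
    """
    return divmod(trip_count, vector_width)


def legal_parts(num_elements, elem_bits, max_legal_bits):
    """Return how many legal-width pieces a wide vector is split into.

    max_legal_bits is the widest vector the target accepts as legal
    (an assumption supplied by the caller, not queried from LLVM).
    """
    total_bits = num_elements * elem_bits
    # Ceiling division: a partially filled piece still needs a register.
    return -(-total_bits // max_legal_bits)


# 65 iterations with 64-wide vectors: one v64i32 plus one scalar.
print(split_loop(65, 64))
# 128 iterations: two v64i32 operations, no scalar tail.
print(split_loop(128, 64))
# v64i32 is 2048 bits; with 512-bit registers it becomes 4 pieces.
print(legal_parts(64, 32, 512))
# If 2048-bit vectors were legal, v64i32 would stay whole.
print(legal_parts(64, 32, 2048))
```

Under this model, keeping v64i32 intact requires the target to treat a 2048-bit type as legal, which is what the register class change is meant to achieve.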
hameeza ahmed via llvm-dev
2017-Jul-07 03:00 UTC
[llvm-dev] Error in v64i32 type in x86 backend
Thank you. I have seen these links, but they don't cover the problem I described. I am working through the backend step by step.

I haven't yet worked on the instruction selection phase or its files. Before that, I am trying to get legalization to accept vectors with more than 16 elements, i.e. 64 x i32. So far I have mainly worked with two files: registerinfo.td, to define the register class used during legalization, and, most importantly, X86ISelLowering.cpp.

Is there any relation between this and instruction selection? Since instruction selection comes after combine and legalize, I haven't worked on it yet.

Please correct me, I am stuck here.

Thank you again
hameeza ahmed via llvm-dev
2017-Jul-07 03:21 UTC
[llvm-dev] Error in v64i32 type in x86 backend
I also ran the following command:

llc -debug filer-knl_o3.ll

Its output is attached here. Looking at the output, can we say that legalization runs fine and that the error comes from instruction selection / pattern matching, which is not yet implemented? Do I need to worry about it and fix it at this stage, or should I move on to implementing instruction selection / pattern matching?

Please guide me.

Thank you
-------------- next part --------------
hameeza at ubuntu:$ llc -debug filer-knl_o3.ll
Args:llc -debug filer-knl_o3.ll
Features:+64bit,+sse2,+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt
CPU:knl
Subtarget features: SSELevel 9, 3DNowLevel 1, 64bit 1
********** Begin Constant Hoisting **********
********** Function: foo
********** End Constant Hoisting **********
*** Interleaved Access Pass: foo
CGP: Found local addrmode: [GV:@b]
CGP: Found local addrmode: [GV:@c]
CGP: Found local addrmode: [GV:@a]
CGP: Found local addrmode: [GV:@b + 256]
CGP: Found local addrmode: [GV:@c + 256]
CGP: Found local addrmode: [GV:@a + 256]
[SafeStack] Function: foo
[SafeStack] safestack is not requested for this function
---- Branch Probability Info : foo ----
Computing probabilities for scalar.ph
Computing probabilities for vector.body
Computing probabilities for min.iters.checked

=== foo
Initial selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

Combining: t0: ch = EntryToken

Optimized lowered selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

Legally typed node: t0: ch = EntryToken
Legally typed node: t65535: ch = handlenode t0

Type-legalized selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

Legalizing: t0: ch = EntryToken

Legalized selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

Legalizing: t0: ch = EntryToken
Combining: t0: ch = EntryToken

Optimized legalized selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

===== Instruction selection begins: BB#0 'min.iters.checked'
Selecting: t0: ch = EntryToken
===== Instruction selection ends:

Selected selection DAG: BB#0 'foo:min.iters.checked'
SelectionDAG has 1 nodes:
  t0: ch = EntryToken

********** List Scheduling BB#0 'min.iters.checked' **********
*** Final schedule ***
Total amount of phi nodes to update: 0

Initial selection DAG: BB#1 'foo:vector.body'
SelectionDAG has 11 nodes:
  t0: ch = EntryToken
  t2: i64 = Constant<0>
  t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
  t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
  t9: ch = TokenFactor t4:1, t6:1
  t7: v64i32 = add t6, t4
  t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64

Combining: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64
Combining: t9: ch = TokenFactor t4:1, t6:1
Combining: t8: i64 = GlobalAddress<[65 x i32]* @a> 0
Combining: t7: v64i32 = add t6, t4
Combining: t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
Combining: t5: i64 = GlobalAddress<[65 x i32]* @c> 0
Combining: t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
Combining: t3: i64 = undef
Combining: t1: i64 = GlobalAddress<[65 x i32]* @b> 0
Combining: t0: ch = EntryToken

Optimized lowered selection DAG: BB#1 'foo:vector.body'
SelectionDAG has 10 nodes:
  t0: ch = EntryToken
  t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
  t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
  t9: ch = TokenFactor t4:1, t6:1
  t7: v64i32 = add t6, t4
  t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64

Legally typed node: t8: i64 = GlobalAddress<[65 x i32]* @a> 0
Legally typed node: t5: i64 = GlobalAddress<[65 x i32]* @c> 0
Legally typed node: t3: i64 = undef
Legally typed node: t1: i64 = GlobalAddress<[65 x i32]* @b> 0
Legally typed node: t0: ch = EntryToken
Legally typed node: t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
Legally typed node: t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
Legally typed node: t7: v64i32 = add t6, t4
Legally typed node: t9: ch = TokenFactor t4:1, t6:1
Legally typed node: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64
Legally typed node: t65535: ch = handlenode t10

Type-legalized selection DAG: BB#1 'foo:vector.body'
SelectionDAG has 10 nodes:
  t0: ch = EntryToken
  t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
  t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
  t9: ch = TokenFactor t4:1, t6:1
  t7: v64i32 = add t6, t4
  t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64

Legalizing: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, GlobalAddress:i64<[65 x i32]* @a> 0, undef:i64
Legalizing: t7: v64i32 = add t6, t4
Legalizing: t9: ch = TokenFactor t4:1, t6:1
Legalizing: t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @c> 0, undef:i64
Legalizing: t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, GlobalAddress:i64<[65 x i32]* @b> 0, undef:i64
Legalizing: t8: i64 = GlobalAddress<[65 x i32]* @a> 0
 ... replacing: t8: i64 = GlobalAddress<[65 x i32]* @a> 0
     with: t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
Legalizing: t5: i64 = GlobalAddress<[65 x i32]* @c> 0
 ... replacing: t5: i64 = GlobalAddress<[65 x i32]* @c> 0
     with: t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
Legalizing: t3: i64 = undef
Legalizing: t1: i64 = GlobalAddress<[65 x i32]* @b> 0
 ... replacing: t1: i64 = GlobalAddress<[65 x i32]* @b> 0
     with: t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
Legalizing: t0: ch = EntryToken
Legalizing: t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
Legalizing: t15: i64 = TargetGlobalAddress<[65 x i32]* @b> 0
Legalizing: t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
Legalizing: t13: i64 = TargetGlobalAddress<[65 x i32]* @c> 0
Legalizing: t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
Legalizing: t11: i64 = TargetGlobalAddress<[65 x i32]* @a> 0

Legalized selection DAG: BB#1 'foo:vector.body'
SelectionDAG has 13 nodes:
  t0: ch = EntryToken
  t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
  t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t16, undef:i64
  t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
  t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t14, undef:i64
  t9: ch = TokenFactor t4:1, t6:1
  t7: v64i32 = add t6, t4
  t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
  t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64

Legalizing: t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
Combining: t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
Legalizing: t15: i64 = TargetGlobalAddress<[65 x i32]* @b> 0
Combining: t15: i64 = TargetGlobalAddress<[65 x i32]* @b> 0
Legalizing: t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
Combining: t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
Legalizing: t13: i64 = TargetGlobalAddress<[65 x i32]* @c> 0
Combining: t13: i64 = TargetGlobalAddress<[65 x i32]* @c> 0
Legalizing: t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
Combining: t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
Legalizing: t11: i64 = TargetGlobalAddress<[65 x i32]* @a> 0
Combining: t11: i64 = TargetGlobalAddress<[65 x i32]* @a> 0
Legalizing: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64
Combining: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64
Legalizing: t7: v64i32 = add t6, t4
Combining: t7: v64i32 = add t6, t4
Legalizing: t9: ch = TokenFactor t4:1, t6:1
Combining: t9: ch = TokenFactor t4:1, t6:1
Legalizing: t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t14, undef:i64
Combining: t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t14, undef:i64
Legalizing: t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t16, undef:i64
Combining: t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t16, undef:i64
Legalizing: t3: i64 = undef
Combining: t3: i64 = undef
Legalizing: t0: ch = EntryToken
Combining: t0: ch = EntryToken

Optimized legalized selection DAG: BB#1 'foo:vector.body'
SelectionDAG has 13 nodes:
  t0: ch = EntryToken
  t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
  t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t16, undef:i64
  t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
  t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t14, undef:i64
  t9: ch = TokenFactor t4:1, t6:1
  t7: v64i32 = add t6, t4
  t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
  t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64

===== Instruction selection begins: BB#1 'vector.body'
Selecting: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64
ISEL: Starting pattern match on root node: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64
  Skipped scope entry (due to false predicate) at index 14, continuing at 81
  Skipped scope entry (due to false predicate) at index 82, continuing at 149
  Skipped scope entry (due to false predicate) at index 150, continuing at 217
  Skipped scope entry (due to false predicate) at index 218, continuing at 267
  Skipped scope entry (due to false predicate) at index 268, continuing at 317
  Skipped scope entry (due to false predicate) at index 318, continuing at 367
  Skipped scope entry (due to false predicate) at index 368, continuing at 394
  Skipped scope entry (due to false predicate) at index 395, continuing at 421
  Skipped scope entry (due to false predicate) at index 422, continuing at 471
  Skipped scope entry (due to false predicate) at index 472, continuing at 521
  Skipped scope entry (due to false predicate) at index 522, continuing at 571
  Skipped scope entry (due to false predicate) at index 572, continuing at 639
  Skipped scope entry (due to false predicate) at index 640, continuing at 707
  Skipped scope entry (due to false predicate) at index 708, continuing at 775
  Skipped scope entry (due to false predicate) at index 776, continuing at 804
  Skipped scope entry (due to false predicate) at index 805, continuing at 833
  Skipped scope entry (due to false predicate) at index 834, continuing at 862
  Skipped scope entry (due to false predicate) at index 863, continuing at 891
  Skipped scope entry (due to false predicate) at index 892, continuing at 920
  Skipped scope entry (due to false predicate) at index 921, continuing at 949
  Skipped scope entry (due to false predicate) at index 950, continuing at 987
  Skipped scope entry (due to false predicate) at index 988, continuing at 1025
  Match failed at index 12
  Continuing at 1026
  OpcodeSwitch from 1029 to 5725
  Match failed at index 5743
  Continuing at 5772
  Match failed at index 5776
  Continuing at 5805
  Match failed at index 5809
  Continuing at 5838
  Match failed at index 5842
  Continuing at 5911
  Match failed at index 5915
  Continuing at 5953
  Match failed at index 5957
  Continuing at 5995
  Match failed at index 5999
  Continuing at 6037
  Match failed at index 6041
  Continuing at 6084
  Match failed at index 6088
  Continuing at 6131
  Skipped scope entry (due to false predicate) at index 6138, continuing at 6181
  Skipped scope entry (due to false predicate) at index 6182, continuing at 6228
  Skipped scope entry (due to false predicate) at index 6235, continuing at 6384
  Match failed at index 6388
  Continuing at 6419
  Match failed at index 6423
  Continuing at 6454
  Match failed at index 6458
  Continuing at 6489
  Continuing at 6490
  Continuing at 6491
  Continuing at 6492
  Match failed at index 6514
  Continuing at 6545
  Match failed at index 6562
  Continuing at 6593
  Match failed at index 6610
  Continuing at 6641
  Continuing at 6642
  Match failed at index 6658
  Continuing at 6772
  Match failed at index 6788
  Continuing at 6902
  Continuing at 13636
  Match failed at index 13640
  Continuing at 14940
  Match failed at index 14943
  Continuing at 15415
  Match failed at index 15417
  Continuing at 15570
  Match failed at index 15571
  Continuing at 15598
  Match failed at index 15599
  Continuing at 15716
  Match failed at index 15719
  Continuing at 15837
  Match failed at index 15840
  Continuing at 16174
  Skipped scope entry (due to false predicate) at index 16179, continuing at 16261
  Skipped scope entry (due to false predicate) at index 16262, continuing at 16370
  Skipped scope entry (due to false predicate) at index 16371, continuing at 16440
  Skipped scope entry (due to false predicate) at index 16441, continuing at 16463
  Skipped scope entry (due to false predicate) at index 16464, continuing at 16486
  Skipped scope entry (due to false predicate) at index 16487, continuing at 16509
  Skipped scope entry (due to false predicate) at index 16510, continuing at 16532
  Skipped scope entry (due to false predicate) at index 16533, continuing at 16656
  Skipped scope entry (due to false predicate) at index 16657, continuing at 16780
  Skipped scope entry (due to false predicate) at index 16781, continuing at 16866
  Skipped scope entry (due to false predicate) at index 16867, continuing at 16952
  Skipped scope entry (due to false predicate) at index 16954, continuing at 17145
  Skipped scope entry (due to false predicate) at index 17147, continuing at 17318
  Skipped scope entry (due to false predicate) at index 17320, continuing at 17473
  Skipped scope entry (due to false predicate) at index 17475, continuing at 17608
  Skipped scope entry (due to false predicate) at index 17610, continuing at 17777
  Skipped scope entry (due to false predicate) at index 17779, continuing at 17920
  Skipped scope entry (due to false predicate) at index 17922, continuing at 18050
  Skipped scope entry (due to false predicate) at index 18051, continuing at 18154
  Skipped scope entry (due to false predicate) at index 18155, continuing at 18229
  Skipped scope entry (due to false predicate) at index 18230, continuing at 18254
  Skipped scope entry (due to false predicate) at index 18255, continuing at 18279
  Skipped scope entry (due to false predicate) at index 18280, continuing at 18304
  Skipped scope entry (due to false predicate) at index 18305, continuing at 18352
  Skipped scope entry (due to false predicate) at index 18353, continuing at 18400
  Skipped scope entry (due to false predicate) at index 18401, continuing at 18496
  Skipped scope entry (due to false predicate) at index 18497, continuing at 18612
  Skipped scope entry (due to false predicate) at index 18613, continuing at 18637
  Skipped scope entry (due to false predicate) at index 18638, continuing at 18687
  Skipped scope entry (due to false predicate) at index 18688, continuing at 18712
  Skipped scope entry (due to false predicate) at index 18713, continuing at 18746
  Skipped scope entry (due to false predicate) at index 18747, continuing at 18832
  Skipped scope entry (due to false predicate) at index 18833, continuing at 18918
  Skipped scope entry (due to false predicate) at index 18919, continuing at 19004
  Match failed at index 16177
  Continuing at 19005
LLVM ERROR: Cannot select: t10: ch = store<ST256[bitcast ([65 x i32]* @a to <64 x i32>*)](align=16)(tbaa=<0x4452448>)> t9, t7, t12, undef:i64
  t7: v64i32 = add t6, t4
    t6: v64i32,ch = load<LD256[bitcast ([65 x i32]* @c to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t14, undef:i64
      t14: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @c> 0
        t13: i64 = TargetGlobalAddress<[65 x i32]* @c> 0
      t3: i64 = undef
    t4: v64i32,ch = load<LD256[bitcast ([65 x i32]* @b to <64 x i32>*)](align=16)(tbaa=<0x4452448>)(dereferenceable)> t0, t16, undef:i64
      t16: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @b> 0
        t15: i64 = TargetGlobalAddress<[65 x i32]* @b> 0
      t3: i64 = undef
  t12: i64 = X86ISD::Wrapper TargetGlobalAddress:i64<[65 x i32]* @a> 0
    t11: i64 = TargetGlobalAddress<[65 x i32]* @a> 0
  t3: i64 = undef
In function: foo
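A log like the one above can be triaged mechanically. The following is a minimal sketch, assuming only the text layout visible in this log (the phase banners and the "LLVM ERROR: Cannot select" line); the helper names `last_phase_before_error`, `failing_node`, and `vector_types` are hypothetical and not part of LLVM. It reports the last phase banner printed before the error, the node the selector gave up on, and the vector types that appear.

```python
import re

# Phase banners as they appear in the `llc -debug` output above.
PHASE_BANNERS = [
    "Initial selection DAG",
    "Optimized lowered selection DAG",
    "Type-legalized selection DAG",
    "Legalized selection DAG",
    "Optimized legalized selection DAG",
    "Instruction selection begins",
]

# Matches e.g. "LLVM ERROR: Cannot select: t10: ch = store<...".
CANNOT_SELECT = re.compile(
    r"LLVM ERROR: Cannot select: (?P<node>t\d+): (?P<types>[\w,]+) = (?P<opcode>[\w:]+)"
)


def last_phase_before_error(log_text):
    """Return the banner that occurs latest before the error, or None."""
    err = log_text.find("LLVM ERROR: Cannot select")
    head = log_text if err < 0 else log_text[:err]
    best, best_pos = None, -1
    for banner in PHASE_BANNERS:
        pos = head.rfind(banner)
        if pos > best_pos:
            best, best_pos = banner, pos
    return best


def failing_node(log_text):
    """Return (node id, result types, opcode) of the unselectable node."""
    match = CANNOT_SELECT.search(log_text)
    if match is None:
        return None
    return match.group("node"), match.group("types"), match.group("opcode")


def vector_types(log_text):
    """Return the sorted set of vector value types such as v64i32."""
    return sorted(set(re.findall(r"\bv\d+[if]\d+\b", log_text)))


SAMPLE = (
    "Legalized selection DAG: BB#1 'foo:vector.body'\n"
    "===== Instruction selection begins: BB#1 'vector.body'\n"
    "LLVM ERROR: Cannot select: t10: ch = store<ST256[...]> t9, t7, t12, undef:i64\n"
    "  t7: v64i32 = add t6, t4\n"
)
print(last_phase_before_error(SAMPLE), failing_node(SAMPLE), vector_types(SAMPLE))
```

If the last banner before the error is the instruction selection one, as it is in this log, the DAG passed through the legalization phases with v64i32 intact and the failure is in pattern matching for the store.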