Displaying 2 results from an estimated 2 matches for "sse2conversiontbl".
2016 Apr 11
X86 TRUNCATE cost for AVX & AVX2 mode
...finds cost as 30 for this operation. A cost of 30 for this operation looks very high.
I am wondering why such a high cost is kept for this; any pointers to help understand it would be appreciated.
In a few cases this prevents better vectorization opportunities.
Other observations:
Cost for TRUNCATE v16i32 to v16i8 in SSE2ConversionTbl is 7.
Cost for TRUNCATE v8i32 to v8i8 is 2 in AVX2 and 4 in AVX mode.
Thanks,
Ashutosh
2016 Apr 12
X86 TRUNCATE cost for AVX & AVX2 mode
...
Thanks Elena.
Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE4.1.
Looking at the code, it appears that TRUNCATE v16i32 to v16i8 in SSE4.1 is very expensive
vs SSE2. I feel this number should be the same as, or close to, the cost given for the same
operation in SSE2ConversionTbl.
The patch below from Cong Hou reduces the cost for the same operation in SSE2 mode.
http://reviews.llvm.org/rL256194
It looks like, as part of the same patch, we should reduce the cost for TRUNCATE v16i32 to v16i8 in SSE4.1 as well.
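To illustrate the concern, here is a minimal sketch of how a feature-ordered table lookup can return the larger number. It assumes the SSE4.1 table is consulted before SSE2ConversionTbl and that the first matching entry wins. The names ConvEntry, SSE41Tbl, SSE2Tbl, lookup and castCost are hypothetical stand-ins, not LLVM's actual API, and the fallback cost of 1 is only a placeholder. The two table costs are the ones quoted in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the cost-table machinery.
enum class Op { Truncate };
enum class VT { v16i8, v16i32 };

struct ConvEntry {
  Op op;
  VT dst;
  VT src;
  unsigned cost;
};

// Costs quoted in this thread for TRUNCATE v16i32 to v16i8.
static const ConvEntry SSE41Tbl[] = {
    {Op::Truncate, VT::v16i8, VT::v16i32, 30},
};
static const ConvEntry SSE2Tbl[] = {
    {Op::Truncate, VT::v16i8, VT::v16i32, 7},
};

// Linear search; returns the matching entry or nullptr if there is none.
template <std::size_t N>
const ConvEntry *lookup(const ConvEntry (&tbl)[N], Op op, VT dst, VT src) {
  for (const ConvEntry &e : tbl)
    if (e.op == op && e.dst == dst && e.src == src)
      return &e;
  return nullptr;
}

// Assumed lookup order: the most specific feature table first, so an
// SSE4.1 entry hides a cheaper SSE2 entry for the same conversion.
unsigned castCost(bool hasSSE41, bool hasSSE2, Op op, VT dst, VT src) {
  if (hasSSE41)
    if (const ConvEntry *e = lookup(SSE41Tbl, op, dst, src))
      return e->cost;
  if (hasSSE2)
    if (const ConvEntry *e = lookup(SSE2Tbl, op, dst, src))
      return e->cost;
  return 1; // placeholder default, not a real cost
}
```

With both features enabled, castCost(true, true, Op::Truncate, VT::v16i8, VT::v16i32) yields 30, while the SSE2-only call yields 7. Under this assumed ordering, lowering the SSE2 entry alone does not help targets that also have SSE4.1.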
Regards,
Ashutosh
From: Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com]...