> *NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result
> since clang will emit shifts and on intel shifts are mod the register
> size [...snip...]

I remember finding the same thing (although I haven't tried it on a recent
clang version) and what I wondered was whether there was mileage in having
an explicit intrinsic for rotation (like there is for bit counting, as in
__builtin_clz and __builtin_ctz and so on). Microsoft's compiler has an
explicit intrinsic in the form of _rotl8 and _rotl16 (IIRC -- this is from
memory!). It would be nice to have a __builtin_rotl family in clang, in
my opinion, but it would need back-end support from llvm. I would expect
it to find use in code relating to hashing and cryptology. I know that
the compiler *should* optimise

uint32_t ror(uint32_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

and such macros / inline functions are common, but an intrinsic specifies
intent better and could provide for better code optimisation.

Would this be worthwhile pursuing?

Cheers
Andy
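For reference, the ror() quoted above shifts left by the full 32 bits when rot_bits is 0, and shifting a 32-bit value by its own width is undefined behaviour in C. A minimal sketch of a variant that masks both shift counts is given here. The names rotr32 and rotl32 are hypothetical helpers chosen for illustration, not existing clang builtins, and whether a given compiler turns them into a single rotate instruction is an assumption that would need checking against the generated code.

```c
#include <stdint.h>

/* Hypothetical helper (not a clang builtin): rotate right by count bits.
 * Both shift counts are reduced modulo 32, so no shift ever reaches the
 * operand width, including when count is 0 or a multiple of 32. */
static inline uint32_t rotr32(uint32_t value, unsigned int count)
{
    count &= 31u;
    return (value >> count) | (value << ((32u - count) & 31u));
}

/* Hypothetical helper (not a clang builtin): rotate left by count bits,
 * using the same masking as rotr32. */
static inline uint32_t rotl32(uint32_t value, unsigned int count)
{
    count &= 31u;
    return (value << count) | (value >> ((32u - count) & 31u));
}
```

With count equal to 0 both shifts are by zero bits, so the input comes back unchanged. Counts of 32 or more wrap around rather than invoking undefined behaviour.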
After some reflection, I believe the problem was actually with shld/shrd
and a similar bit of C code.

On Jul 29, 2012, at 1:02 PM, Andy Gibbs <andyg1001 at hotmail.co.uk> wrote:

>> *NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result
>> since clang will emit shifts and on intel shifts are mod the register
>> size [...snip...]
>
> I remember finding the same thing (although I haven't tried it on a recent
> clang version) and what I wondered was whether there was mileage in having
> an explicit intrinsic for rotation (like there is for bit counting, as in
> __builtin_clz and __builtin_ctz and so on). Microsoft's compiler has an
> explicit intrinsic in the form of _rotl8 and _rotl16 (IIRC -- this is from
> memory!). It would be nice to have a __builtin_rotl family in clang, in
> my opinion, but it would need back-end support from llvm. I would expect
> it to find use in code relating to hashing and cryptology. I know that
> the compiler *should* optimise
>
> uint32_t ror(uint32_t input, size_t rot_bits) {
>   return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
> }
>
> and such macros / inline functions are common, but an intrinsic specifies
> intent better and could provide for better code optimisation.
>
> Would this be worthwhile pursuing?
>
> Cheers
> Andy
Hey Andy,

I proposed a similar patch to LLVM (left circular shift) around 10/2011.
Parts of my patch did make it into trunk about a year after, but others
did not.

At that time, my solution was to add a binary operator to the IRBuilder,
since LCS fits in nicely with the other shift operators. But, it is quite
cumbersome to merge :*(. I would be happy to resend the original patch
if you'd like.

-Cameron

On Sun, Jul 29, 2012 at 4:02 PM, Andy Gibbs <andyg1001 at hotmail.co.uk> wrote:
...
> It would be nice to have a __builtin_rotl family in clang, in
> my opinion, but it would need back-end support from llvm.
On Monday, July 30, 2012 12:16 AM, Cameron McInally wrote:

> Hey Andy,
>
> I proposed a similar patch to LLVM (left circular shift) around 10/2011.
> Parts of my patch did make it into trunk about a year after, but others
> did not.
>
> At that time, my solution was to add a binary operator to the IRBuilder,
> since LCS fits in nicely with the other shift operators. But, it is quite
> cumbersome to merge :*(. I would be happy to resend the original patch
> if you'd like.

Yes, I would be interested. Thank you!

I don't know whether, if it was rejected before, I'll have any better luck
this time, but I can try...

Andy