Hello,

Our software uses 4 x float vectors a lot, and I pass these to LLVM as packed types - but when I do the JIT compile it seems that the LowerPacked pass is never run so the code generation fails. I noticed that most other passes have a header file with a public createXXXPass() function so they can be added to the PassManager, but LowerPacked doesn't have this... What should I do?

m.

PS. Chris, thanks for the feedback on the memory cleanup patch - I'm a bit busy getting LLVM integrated in our app now, but I will incorporate your suggestions and submit a proper patch soon...
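For readers unfamiliar with the createXXXPass() convention mentioned above, here is a minimal sketch of the idea: the concrete pass class stays private to its source file, and a public factory function is the only thing a client needs in order to add the pass to a manager. Pass, LowerPackedSketch, createLowerPackedSketchPass, and PassManagerSketch are hypothetical stand-ins invented for illustration. They are not LLVM's actual classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class; a stand-in, not LLVM's real Pass.
struct Pass {
    virtual ~Pass() = default;
    virtual std::string name() const = 0;
};

namespace {
// The concrete pass is hidden in an anonymous namespace, the way a pass
// implementation can stay private to its own .cpp file.
struct LowerPackedSketch : Pass {
    std::string name() const override { return "lowerpacked"; }
};
} // namespace

// The public factory. In this pattern it is the only declaration a header
// would need to expose for the pass to be usable by clients.
std::unique_ptr<Pass> createLowerPackedSketchPass() {
    return std::make_unique<LowerPackedSketch>();
}

// Hypothetical manager that owns passes in the order they were added.
class PassManagerSketch {
public:
    void add(std::unique_ptr<Pass> p) {
        assert(p && "cannot add a null pass");
        passes_.push_back(std::move(p));
    }

    // Names of the registered passes, in execution order.
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto &p : passes_) out.push_back(p->name());
        return out;
    }

private:
    std::vector<std::unique_ptr<Pass>> passes_;
};
```

Without the factory, a client has no way to name the hidden class, which matches the situation described in the message: the pass exists but cannot be added from outside.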
On Wed, 17 Nov 2004, Morten Ofstad wrote:
> Our software uses 4 x float vectors a lot, and I pass these to LLVM as
> packed types - but when I do the JIT compile it seems that the
> LowerPacked pass is never run so the code generation fails. I noticed
> that most other passes have a header file with a public createXXXPass()
> function so they can be added to the PassManager, but LowerPacked
> doesn't have this... What should I do?

I just added it. There was no reason to not expose it, we just never got to that point.

Note that packed support in LLVM is not complete yet. In particular, here are some of the big missing pieces:

1. No code generators can generate vector instructions yet (SSE or altivec, for example). This should be fairly easy to add though.
2. The lowerpacked pass, which currently converts packed ops into their scalar counterparts, has a few limitations:
   A. It does not handle packed arguments to functions
   B. It always lowers all of the way to scalar ops, even if the target supports SOME packed types. For example, it would be nice for it to eventually lower <16 x float> into 4 <4 x float>'s if the target supports them.
   C. It has never been thoroughly tested, primarily because we don't have a producer of packed operations yet. I believe it should work reasonably well though.
3. LLVM is missing support for a bunch of important vector operations. In particular, we need at least 'extract element' and 'build vector out of scalars' operations. Given these, we can implement packed arguments to functions without a problem. There are probably many others we eventually want.

For your work, it might be most expedient to just ignore the lower packed pass and add SSE support to the X86 backend: that will get you up and running quickly and get you the performance you are obviously after. If backwards compatibility with old hardware is an issue, revisiting the lower packed pass would make sense.

Let me know what you think.
In the very short term, the hook exposed to create the lower packed pass can be plunked into the X86TargetMachine and get intra function packed types working for you.

> PS. Chris, thanks for the feedback on the memory cleanup patch - I'm a
> bit busy getting LLVM integrated in our app now, but I will incorporate
> your suggestions and submit a proper patch soon...

No problem.

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/
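The two lowering strategies discussed in the reply above (full scalarization, and the wished-for partial lowering of <16 x float> into <4 x float> pieces) can be sketched in plain C++. This is only a conceptual illustration of what the transformations compute, not the LowerPacked pass itself; Float4, addScalarized, and splitIntoFloat4 are hypothetical names chosen for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a <4 x float> packed value (assumption for illustration).
using Float4 = std::array<float, 4>;

// Full scalarization: one packed add becomes four independent scalar adds,
// one per lane. This mirrors lowering a packed op to scalar counterparts.
Float4 addScalarized(const Float4 &a, const Float4 &b) {
    Float4 r;
    r[0] = a[0] + b[0];
    r[1] = a[1] + b[1];
    r[2] = a[2] + b[2];
    r[3] = a[3] + b[3];
    return r;
}

// Partial lowering: split a wide vector into <4 x float> chunks so that a
// target with native 4-wide support could still use vector instructions.
std::vector<Float4> splitIntoFloat4(const std::vector<float> &wide) {
    assert(wide.size() % 4 == 0 && "width must be a multiple of 4");
    std::vector<Float4> parts;
    for (std::size_t i = 0; i < wide.size(); i += 4) {
        Float4 chunk = {wide[i], wide[i + 1], wide[i + 2], wide[i + 3]};
        parts.push_back(chunk);
    }
    return parts;
}
```

A 16-element input to splitIntoFloat4 yields four chunks, each of which could then be handled either natively or by addScalarized, depending on what the target supports.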
Chris Lattner wrote:
> Note that packed support in LLVM is not complete yet. In
> particular, here are some of the big missing pieces:
>
> 1. No code generators can generate vector instructions yet (SSE or
>    altivec, for example). This should be fairly easy to add though.
> 2. The lowerpacked pass, which currently converts packed ops into their
>    scalar counterparts, has a few limitations:
>    A. It does not handle packed arguments to functions
>    B. It always lowers all of the way to scalar ops, even if the target
>       supports SOME packed types. For example, it would be nice for it
>       to eventually lower <16 x float> into 4 <4 x float>'s if the
>       target supports them.
>    C. It has never been thoroughly tested, primarily because we don't
>       have a producer of packed operations yet. I believe it should
>       work reasonably well though.

It works reasonably well, quite impressive really considering it's not been tested ;-) B is not much of a problem for my use, but A is a bit annoying even though I mostly pass pointers to packed types anyway. Can you elaborate a bit on what the problem is with this? I have calls going back into our code by adding mappings to the JIT, but I'm not sure if I can get it to call functions with R32x4 (<4 x float>) args without making a wrapper that takes a pointer.

> For your work, it might be most expedient to just ignore the lower packed
> pass and add SSE support to the X86 backend: that will get you up and
> running quickly and get you the performance you are obviously after. If
> backwards compatibility with old hardware is an issue, revisiting the
> lower packed pass would make sense.

Is it easy to add intrinsics to do things like dot product of packed types using SSE instructions? That's probably all I need...

> Let me know what you think.
> In the very short term, the hook exposed to create the lower packed pass
> can be plunked into the X86TargetMachine and get intra function packed
> types working for you.

The patch you did was missing the actual implementation of createLowerPackedPass, so I'm including my own differences. I guess you don't want to apply the changes to X86TargetMachine as I'm the only one actually generating packed types, but I include it for completeness.

m.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lowerpacked.patch.txt
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20041119/5fab9aff/attachment.txt>
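The pointer-wrapper workaround and the SSE dot product raised in this message can be sketched on the host side as follows. This is an illustration under stated assumptions, not code from the attached patch: the layout of R32x4 as four contiguous floats is assumed, and dotByValue and dotViaPointers are hypothetical names. The SSE path uses compiler intrinsics from <xmmintrin.h> (C++-level intrinsics, not LLVM intrinsics) and falls back to scalar code when SSE is unavailable.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Assumed layout for the R32x4 type named in the message: four floats.
struct R32x4 {
    float v[4];
};

// Host function taking packed values by value (hypothetical example).
float dotByValue(R32x4 a, R32x4 b) {
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a.v);              // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b.v);
    __m128 prod = _mm_mul_ps(va, vb);           // p0 p1 p2 p3
    __m128 hi = _mm_movehl_ps(prod, prod);      // p2 p3 p2 p3
    __m128 sum2 = _mm_add_ps(prod, hi);         // lane0 = p0+p2, lane1 = p1+p3
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 total = _mm_add_ss(sum2, lane1);     // lane0 = full horizontal sum
    return _mm_cvtss_f32(total);
#else
    // Scalar fallback for targets without SSE.
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] +
           a.v[2] * b.v[2] + a.v[3] * b.v[3];
#endif
}

// Wrapper that takes pointers, so a caller that cannot pass packed values
// as arguments only has to pass two addresses.
float dotViaPointers(const R32x4 *a, const R32x4 *b) {
    assert(a && b && "operands must not be null");
    return dotByValue(*a, *b);
}
```

The wrapper costs one extra call and two loads, which is the trade-off the message describes: it sidesteps the missing support for packed function arguments at the price of an indirection.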