On Sun, Jun 7, 2015 at 2:34 AM, Eric Christopher <echristo at gmail.com> wrote:
> On Sat, Jun 6, 2015 at 12:31 PM C Bergström <cbergstrom at pathscale.com> wrote:
>> On Sun, Jun 7, 2015 at 2:22 AM, Eric Christopher <echristo at gmail.com> wrote:
>> > On Sat, Jun 6, 2015 at 5:02 AM C Bergström <cbergstrom at pathscale.com> wrote:
>> >> On Sat, Jun 6, 2015 at 6:24 PM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > Thank you a lot for the feedback. I believe that the heterogeneous
>> >> > engine should be strongly connected with parallelization and
>> >> > vectorization efforts. Most of the accelerators are parallel
>> >> > architectures where having efficient parallelization and
>> >> > vectorization can be critical for performance.
>> >> >
>> >> > I am interested in these efforts and I hope that my code can help
>> >> > you manage the offloading operations. Your LLVM instruction set
>> >> > extensions may require some changes in the analysis code, but I
>> >> > think it is going to be straightforward.
>> >> >
>> >> > I am planning to push my code to Phabricator in the next few days.
>> >>
>> >> If you're doing the extraction at the loop and LLVM IR level, why
>> >> would you need to modify the IR? Wouldn't the target-level lowering
>> >> happen later?
>> >>
>> >> How are you actually determining when to offload? Is this tied to
>> >> directives, or are you using heuristics plus some set of restrictions?
>> >>
>> >> Lastly, are you handling 2 targets in the same module, or do you end
>> >> up emitting 2 modules and dealing with recombining things later?
>> >
>> > It's not currently possible to do this using the current structure
>> > without some significant and, honestly, icky patches.
>>
>> What's not possible? I agree some of our local patches and design may
>> not make it upstream as-is, but we are offloading to 2+ targets using
>> LLVM IR *today*.
>
> I'm not sure how much clearer I can be. It's not possible, in the same
> module, to handle multiple targets at the same time.
>
>> IMHO, you must (re)solve the problem of handling multiple targets
>> concurrently. That means 2 targets in a single Module or 2 Modules
>> basically glued one after the other.
>
> Patches welcome.

While I appreciate your taste in music, canned (troll) replies are
typically a waste of time.
Eric Christopher
2015-Jun-06 19:52 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
On Sat, Jun 6, 2015 at 12:43 PM C Bergström <cbergstrom at pathscale.com> wrote:
> >> IMHO, you must (re)solve the problem of handling multiple targets
> >> concurrently. That means 2 targets in a single Module or 2 Modules
> >> basically glued one after the other.
> >
> > Patches welcome.
>
> While I appreciate your taste in music, canned (troll) replies are
> typically a waste of time.

This is uncalled for and unacceptable. I've done an immense amount of work so that we can support different subtargets in the same module and get better LTO and target features. If you have a feature above and beyond what I've been able to do (and you say you do), then a request for patches is more than acceptable as a response. I've yet to see any work from you, only a lot of talk about what other people should do.

-eric
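The per-function subtarget support Eric mentions can be seen from the C side. This is a minimal sketch assuming x86 with GCC or Clang; the functions `sum`, `sum_generic`, and `sum_avx2` are hypothetical. `__attribute__((target("avx2")))` compiles one function for AVX2 while the rest of the translation unit keeps its default target (Clang carries this into the IR as a per-function `target-features` attribute), and `__builtin_cpu_supports` picks a version at run time. It only covers CPU subtargets of a single architecture, not the CPU-plus-GPU case being debated here.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline version, compiled for the translation unit's default target. */
static long sum_generic(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, but only this function is compiled with AVX2 enabled.
   target() is a GCC/Clang extension (Clang also defines __GNUC__). */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#define HAVE_AVX2_VARIANT 1
#endif

/* Hypothetical entry point: picks a subtarget-specific body at run time. */
long sum(const int *v, size_t n) {
    assert(v != NULL || n == 0);
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();                 /* safe to call more than once */
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_generic(v, n);             /* non-x86 or no AVX2 */
}
```

Both paths return the same result (for example, 36 for the values 1 through 8); the AVX2 copy only differs in which instructions the backend is allowed to use.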
On Sun, Jun 7, 2015 at 2:52 AM, Eric Christopher <echristo at gmail.com> wrote:
> This is uncalled for and unacceptable. I've done an immense amount of work
> so that we can support different subtargets in the same module and get
> better LTO and target features. If you have a feature above and beyond what
> I've been able to do (and you say you do), then a request for patches is
> more than acceptable as a response. I've yet to see any work from you, only
> a lot of talk about what other people should do.

Umm... don't get your feathers ruffled. You provided *zero* content, and I was just saying it wasn't impossible. Popping back all huffy is just funny.

Anyway, to bring this conversation back to something technical instead of just stupid comments: I'd agree that flipping targets back and forth (intermixed) in the same Module *is* probably a substantial amount of work. If the optimization passes worked at the PU (program unit) level, i.e. per function, it wouldn't be. Why can't you append one Module after another and switch?

As you point out, whole-program analysis/optimization will face a similar problem; same question as above.
---------------------
Currently (I don't know about DSPs from TI/Qualcomm), most people in the industry are using custom runtimes to parse the GPU code and load/execute it. It would be great if the linker/loader actually had better built-in support for this. I don't know the exact capabilities of the GNU/Sun linker/loader, but I'm thinking of something along the lines of mangling the function name to also include target details, so the compiler would emit multiple mangled versions of foo() and the linker/loader could pick the most optimized one. Something like this:

    nvc0_foo
    avx2_foo
    avx512_foo

(Also, I'd agree that the above would be quite hard.)
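The naming scheme above can be sketched as a loader-side lookup table. This is a minimal sketch under stated assumptions: the capability bits, `struct variant`, `foo_variants`, `pick_foo()`, and the priority order (GPU first, then the widest vector ISA, then a generic fallback) are all hypothetical, the variant bodies are identical stand-ins, and no existing GNU or Sun linker/loader feature is being modeled. (GCC's `ifunc` and `target_clones` attributes do something similar for CPU-only variants.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits a loader might know about the machine. */
enum {
    CAP_AVX2   = 1u << 0,
    CAP_AVX512 = 1u << 1,
    CAP_NVC0   = 1u << 2   /* an nvc0 GPU target is available */
};

typedef int (*foo_fn)(int);

/* Stand-ins for per-target builds of foo(); a real toolchain would emit
   these from one source, each compiled for its own target. */
static int foo_generic(int x) { return x + 1; }
static int avx2_foo(int x)    { return x + 1; }
static int avx512_foo(int x)  { return x + 1; }
static int nvc0_foo(int x)    { return x + 1; }

struct variant {
    const char *symbol;   /* target-mangled symbol name */
    unsigned required;    /* capability bits the variant needs */
    foo_fn fn;
};

/* Ordered from most to least preferred; the last entry needs nothing. */
static const struct variant foo_variants[] = {
    { "nvc0_foo",   CAP_NVC0,   nvc0_foo },
    { "avx512_foo", CAP_AVX512, avx512_foo },
    { "avx2_foo",   CAP_AVX2,   avx2_foo },
    { "foo",        0,          foo_generic },
};

/* Returns the first (most preferred) variant the machine can run. */
static const struct variant *pick_foo(unsigned caps) {
    size_t n = sizeof foo_variants / sizeof foo_variants[0];
    assert(foo_variants[n - 1].required == 0);
    for (size_t i = 0; i < n; i++)
        if ((foo_variants[i].required & caps) == foo_variants[i].required)
            return &foo_variants[i];
    return NULL; /* not reached: the generic entry always matches */
}
```

With `CAP_AVX2 | CAP_AVX512` set, `pick_foo` returns the `avx512_foo` entry; with no bits set it falls back to plain `foo`. Expressing each requirement as a bitmask subset test means supporting another target is just one more table row.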