Displaying 5 results from an estimated 5 matches for "polymage".
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
...---------------------------------------------------------------------
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
--
Founder and Director, PolyMage Labs
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello,
On the appended reasonably simple test case that has an fmul/fadd
sequence on <8 x float> vector types, I don't see the x86-64 code
generator (with cpu set to haswell or later types) turning it into an
AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner
2020 Jan 30
2
Writing loop transformations on the right representation is more productive
Am Mo., 27. Jan. 2020 um 22:06 Uhr schrieb Uday Kumar Reddy Bondhugula <
uday at polymagelabs.com>:
> Hi Michael,
>
> Although the approach to use a higher order in-memory abstraction like the
> loop tree will make it easier than what you have today, if you used MLIR
> for this representation, you already get a round trippable textual format
> that is *very close*...
2020 Feb 03
5
Writing loop transformations on the right representation is more productive
Am Do., 30. Jan. 2020 um 04:40 Uhr schrieb Uday Kumar Reddy Bondhugula
<uday at polymagelabs.com>:
> There are multiple ways regions in MLIR can be viewed, but the more relevant point here is you do have a loop tree structure native in the IR with MLIR. Regions in MLIR didn't evolve from modeling inlined calls - the affine.for/affine.if were originally the only two operations...
2020 Jan 03
10
Writing loop transformations on the right representation is more productive
In the 2018 LLVM DevMtg [1], I presented some shortcomings of how LLVM
optimizes loops. In summary, the biggest issues are (a) the complexity
of writing a new loop optimization pass (including needing to deal
with a variety of low-level issues, a significant amount of required
boilerplate, the difficulty of analysis preservation, etc.), (b)
independent optimization heuristics and a fixed pass