On 08/11/14 07:32 PM, Das, Dibyendu wrote:
> Storing llvm-ir in the fat binary may have the same performance issues mentioned below. The fat binary discussed in the proposal has provision for storing the isa/llvm-ir. My point is instead of llvm-ir it shd be something like spir.

Ok - so let's see some data.

#1 Benchmarks showing at least SPIR dgemm/sgemm performance
#2 Some logical explanation why all the extra work for SPIR when LLVM IR is native

Basically, besides an opinion or because it's "shiny", some solid technical reason.

I hate to repeat myself, but again.. why on earth would a solution which is closed source be preferred over llvm ir...

I believe I should just extend the section regarding the target code. It can be literally anything that the target RTL can support. For Xeon PHI it will be a Linux executable - best for performance, code size, portability, etc.
For any OpenCL-compatible system this can be SPIR, as you wish. For a proprietary DSP it can be something else. But it is your (or a vendor's) responsibility to provide an RTL that is capable of translating this to the target.

Sergos

On Mon, Aug 11, 2014 at 4:38 PM, "C. Bergström" <cbergstrom at pathscale.com> wrote:
> On 08/11/14 07:32 PM, Das, Dibyendu wrote:
>>
>> Storing llvm-ir in the fat binary may have the same performance issues
>> mentioned below. The fat binary discussed in the proposal has provision for
>> storing the isa/llvm-ir. My point is instead of llvm-ir it shd be something
>> like spir.
>
> Ok - so lets see some data.
>
> #1 Benchmarks showing at least SPIR dgemm/sgemm performance
> #2 Some logical explanation why all the extra work for SPIR when LLVM IR is
> native
>
> Basically besides an opinion or because it's "shiny" some solid technical
> reason.
>
> I hate to repeat myself, but again.. why on earth would a solution which is
> closed source be preferred over llvm ir...

On 08/13/14 02:05 PM, Sergey Ostanevich wrote:
> I believe I should just extend the section regarding the target code.
> It can be literally anything that target RTL could support. For Xeon
> PHI it will be a linux executable - best for performance, code size,
> portability, etc.
> For any OpenCL-compatible system this can be SPIR, as you wish. For a
> proprietary DSP it can be something else. But it is your (or a vendor)
> responsibility to provide a RTL that will be capable to translate this
> to the target.

I need to talk this over internally, but for Xeon PHI we may be willing to contribute some (a lot?) of code to the open source. Specifically - we have a tiny and scalable "OS" we wrote in order to replace the onboard linux which is uploaded to Xeon PHI. The benefits appear to be

1) Less overhead (both on init times as well as resident on the card)
2) Less impact in terms of scheduling and large linux kernel problems oncard
3) Easier to hack if you want to research and test something
4) Exposes an interface which makes it appear more similar in design to the GPU. (This mostly impacts runtime design, but also may make it easier to support the target/data OMP4 clauses)

-----------

I mention this since the target code isn't the whole story. For GPU (AMD/NVIDIA) both accept more or less raw kernels and the hw can execute that, but Xeon PHI is an outlier. On there you talk with the firmware, but still require an OS. (Someone from Intel feel free to correct me)

Technically Xeon PHI can take not only an elf object, but also a directly compressed file.

@Intel - I realize that some of your onboard tools are required in order to activate the fast DMA copies. We handled that in the driver
https://github.com/pathscale/intel_xeon_phi_kernel_driver

If anyone cares about this stuff please speak up so we can work together. If nobody cares then we'll just keep it in-house and for our compiler alone.