thr3ads.net - llvm dev - [LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend [May 2012]

If this information is useful, please help other people find it:
Share via:

Justin Holewinski

2012-May-01 14:30 UTC

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

> -----Original Message-----
> From: Dan Bailey [mailto:dan at dneg.com]
> Sent: Sunday, April 29, 2012 8:46 AM
> To: Justin Holewinski
> Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover;
> llvmdev at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
> 
> Justin,
> 
> Firstly, this is great! It seems to be so much further forward in terms of
> features compared to the PTX backend and I'm pleased to see all the
support
> for vector instructions, atomics, debugging information, etc - much of
which
> was still to be added to the other backend.
> 
> I have a few comments to make, but bear in mind my knowledge comes
> solely from working on the PTX backend, so I don't know much about how
> this compares to other backends and the feasibility of reusing components
> designed to be shared across backends, plus my knowledge of tablegen is
> also relatively limited.
> 
> This is in no particular order:
> 
> * Is there any reason why the 32-bit arch name is nvptx and the 64-bit arch
> name is nvptx64, especially as the default for the NVCC compiler is to pass
> the -m64 flag? Also, all the internal naming (NVPTXTargetMachine for
> example) use 32 / 64 suffixes.
As far as I know, there is no fundamental reason for this.  If it bugs you too
much, I'm sure we could change it. :)
> 
> * The register naming seems a little arbitrary as well, using FL prefixes
for 64-
> bit float and da prefixes for 64-bit float arguments for example.
Really, any choice is going to be arbitrary.
> 
> * Something I picked up in the NVVM IR spec - it seems to only be possible
to
> use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0,
the
> spec (and the PTX backend) support using bar.sync {0..15}. The old PTX
> intrinsic also supports a non-zero integer operand.
The NVVM intrinsic is there to implement CUDA's __syncthreads().  The old
intrinsic is still exposed.  I'll see about adding a proper NVVM intrinsic.
> 
> * I guess this raises the question of whether or not it's actually
worthwhile
> retaining compatibility with the old backend. I converted my Jet compiler
to
> use NVVM intrinsics and still found the address spaces were out of line, so
it
> was largely useless until I'd done the full conversion anyway. If the
plan is to
> deprecate the old backend (as I fully support), I'd be tempted to say
maybe
> just leave all these old intrinsics out too?
This is something I've debated back and forth.  The original thought was to
maintain as much backwards compatibility as possible with the old back-end.  I
don't think it hurts anything to leave the old intrinsics in there.
> 
> * I know this isn't an easy problem as I remember discussing this at
length
> with you at the time Justin, but supporting i8 for ld, st, cvt seems to
introduce
> a load of blocks of string concat code in tablegen. Is there not a better
way of
> tackling this, it looks like a fairly ugly solution to me?
Oh there are quite a few ugly TableGen solutions in there. :)  I'm trying to
find ways to clean some of this stuff up, but it works so it's hard to
justify devoting time to it.
> 
> * There are a few places in the code where there are blocks of commented
> out code that I presume are waiting for upstreaming of other additions.
> Some of these are documented, but many of them aren't. Maybe these
> should either be removed for addition later or some explanation added as to
> why they're not being used.
Good point.  I'll see about adding more documentation to these.
> 
> * There seems to be a lot of features that are lacking tests too. I was
trying to
> find the equivalent flags to use for switching on and off FMA (a feature we
> use quite a bit to compare results with x86) and couldn't find any
tests
> demonstrating this. Considering the amount of code in the backend
> committed to handling mad / fma, there really should be a bunch of tests to
> go along with this. In fact non of the nvptx-* flags seem to have any
> accompanying test cases, nor did atomics, nvvm intrinsics or many of the
> other instructions. I imagine NVidia has a fairly full featured CUDA
testing
> framework, but if this is going to be made open-source, I'd rather as
many of
> the instructions could have tests too. Having said that, maybe some of the
> test cases for the PTX backend could be migrated across?
I'm working on converting the tests now.
> 
> * Finally I spotted a number of typos in the comments in the code that
could
> be cleaned up.
> 
> There's a lot of code here, so I might have missed some things. On the
whole
> though, most of this is fairly minor. It looks well designed and I'm
looking
> forward to this making it in. I'm especially excited to see the new
pass for
> vectorization and other passes to do code transformations. I managed to run
> our fluid solver through this backend and it generated PTX that did the
right
> thing. I haven't had time to do any profiling comparing the old backend
yet.
Can I put you down as a technical reviewer of the back-end? :)

As the main consumer of the open-source back-end, your seal of approval would go
a long way!
> 
> Dan
> 
> Justin Holewinski wrote:
> 
> 	Thanks for the feedback!
> 
> 
> 
> 	The attached patch addresses the style issues that have been found.
> 
> 
> 
> 	From: Jim Grosbach [mailto:grosbach at apple.com]
> 	Sent: Wednesday, April 25, 2012 2:22 PM
> 	To: Justin Holewinski
> 	Cc: llvm-commits at cs.uiuc.edu; llvmdev at cs.uiuc.edu; Vinod Grover
> 	Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
> 
> 
> 
> 	Hi Justin,
> 
> 
> 
> 	Cool stuff, to be sure. Excited to see this.
> 
> 
> 
> 	As a pre-cursor to more involved technical feedback, I suggest going
> through and fixing up the coding style and formatting issues. Just glancing
> through, I see lots of things like function names starting with capital
letters,
> compound statements with the opening brace on the line following an
> if/for/while/etc., single-statements after an 'if' enclosed in a
compound
> statement, container.end() being evaluated every iterations of loops, etc,
> etc..
> 
> 
> 
> 	-Jim
> 
> 
> 
> 	On Apr 24, 2012, at 11:50 AM, Justin Holewinski
> <jholewinski at nvidia.com> wrote:
> 
> 
> 
> 
> 
> 	Hi LLVMers,
> 
> 
> 
> 	We at NVIDIA would like to contribute back to the LLVM open-source
> community by up-streaming the NVPTX back-end for LLVM.  This back-end is
> based on the sources used by NVIDIA, and currently provides significantly
> more functionality than the current PTX back-end.  Some functionality is
> currently disabled due to dependencies on LLVM core changes that we are
> also in the process of up-streaming, but the back-end is very usable in its
> current state and would benefit all current and future users of the LLVM
PTX
> back-end.
> 
> 
> 
> 	The goal is to phase out the existing PTX back-end, while maintaining
> compatibility with it.  To that end, the NVPTX back-end maintains its own
set
> of intrinsics but also supports the old PTX back-end intrinsics to ensure
> compatibility with out-of-tree users.
> 
> 
> 
> 	We would like to get your feedback on the attached patch to make
> sure it is up to LLVM commit quality.  We would like to commit this as soon
as
> the community is satisfied with it.
> 
> 
> 
> 	Also, as the current maintainer of the PTX back-end, I give my
> consent to deprecate it. J
> 
> 
> 
> 
> 
> 	Thanks,
> 
> 
> 
> 	Justin Holewinski
> 
> 
> 
> 
> ________________________________
> 
> 
> 	This email message is for the sole use of the intended recipient(s)
> and may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
recipient,
> please contact the sender by reply email and destroy all copies of the
original
> message.
> 
> 
> ________________________________
> 
> 
> 
> 
> 	<nvptx-backend-
> public.patch>_______________________________________________
> 	llvm-commits mailing list
> 	llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> 	http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> 
> 
> 
> 
> ________________________________
> 
> 
> 	_______________________________________________
> 	llvm-commits mailing list
> 	llvm-commits at cs.uiuc.edu
> 	http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>

Dan Bailey

2012-May-02 16:17 UTC

head link

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Justin Holewinski wrote:
<blockquote
 cite="mid:448D0C8E64E8084981082A23DEDAEC70A28C88FD52@HQMAIL02.nvidia.com"
 type="cite">
  <blockquote type="cite">
    <pre wrap="">-----Original Message-----
From: Dan Bailey [<a class="moz-txt-link-freetext"
href="mailto:dan@dneg.com">mailto:dan@dneg.com</a>]
Sent: Sunday, April 29, 2012 8:46 AM
To: Justin Holewinski
Cc: Jim Grosbach; <a class="moz-txt-link-abbreviated"
href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>;
Vinod Grover;
<a class="moz-txt-link-abbreviated"
href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a>
Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend

Justin,

Firstly, this is great! It seems to be so much further forward in terms of
features compared to the PTX backend and I'm pleased to see all the support
for vector instructions, atomics, debugging information, etc - much of which
was still to be added to the other backend.

I have a few comments to make, but bear in mind my knowledge comes
solely from working on the PTX backend, so I don't know much about how
this compares to other backends and the feasibility of reusing components
designed to be shared across backends, plus my knowledge of tablegen is
also relatively limited.

This is in no particular order:

* Is there any reason why the 32-bit arch name is nvptx and the 64-bit arch
name is nvptx64, especially as the default for the NVCC compiler is to pass
the -m64 flag? Also, all the internal naming (NVPTXTargetMachine for
example) use 32 / 64 suffixes.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
As far as I know, there is no fundamental reason for this.  If it bugs you too
much, I'm sure we could change it. :)

  </pre>
  <blockquote type="cite">
    <pre wrap="">* The register naming seems a little arbitrary
as well, using FL prefixes for 64-
bit float and da prefixes for 64-bit float arguments for example.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Really, any choice is going to be arbitrary.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* Something I picked up in the NVVM IR spec -
it seems to only be possible to
use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0, the
spec (and the PTX backend) support using bar.sync {0..15}. The old PTX
intrinsic also supports a non-zero integer operand.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
The NVVM intrinsic is there to implement CUDA's __syncthreads().  The old
intrinsic is still exposed.  I'll see about adding a proper NVVM intrinsic.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* I guess this raises the question of whether
or not it's actually worthwhile
retaining compatibility with the old backend. I converted my Jet compiler to
use NVVM intrinsics and still found the address spaces were out of line, so it
was largely useless until I'd done the full conversion anyway. If the plan
is to
deprecate the old backend (as I fully support), I'd be tempted to say maybe
just leave all these old intrinsics out too?
    </pre>
  </blockquote>
  <pre wrap=""><!---->
This is something I've debated back and forth.  The original thought was to
maintain as much backwards compatibility as possible with the old back-end.  I
don't think it hurts anything to leave the old intrinsics in there.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* I know this isn't an easy problem as I
remember discussing this at length
with you at the time Justin, but supporting i8 for ld, st, cvt seems to
introduce
a load of blocks of string concat code in tablegen. Is there not a better way of
tackling this, it looks like a fairly ugly solution to me?
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Oh there are quite a few ugly TableGen solutions in there. :)  I'm trying to
find ways to clean some of this stuff up, but it works so it's hard to
justify devoting time to it.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* There are a few places in the code where
there are blocks of commented
out code that I presume are waiting for upstreaming of other additions.
Some of these are documented, but many of them aren't. Maybe these
should either be removed for addition later or some explanation added as to
why they're not being used.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Good point.  I'll see about adding more documentation to these.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* There seems to be a lot of features that are
lacking tests too. I was trying to
find the equivalent flags to use for switching on and off FMA (a feature we
use quite a bit to compare results with x86) and couldn't find any tests
demonstrating this. Considering the amount of code in the backend
committed to handling mad / fma, there really should be a bunch of tests to
go along with this. In fact non of the nvptx-* flags seem to have any
accompanying test cases, nor did atomics, nvvm intrinsics or many of the
other instructions. I imagine NVidia has a fairly full featured CUDA testing
framework, but if this is going to be made open-source, I'd rather as many
of
the instructions could have tests too. Having said that, maybe some of the
test cases for the PTX backend could be migrated across?
    </pre>
  </blockquote>
  <pre wrap=""><!---->
I'm working on converting the tests now.

  </pre>
  <blockquote type="cite">
    <pre wrap="">* Finally I spotted a number of typos in the
comments in the code that could
be cleaned up.

There's a lot of code here, so I might have missed some things. On the whole
though, most of this is fairly minor. It looks well designed and I'm looking
forward to this making it in. I'm especially excited to see the new pass for
vectorization and other passes to do code transformations. I managed to run
our fluid solver through this backend and it generated PTX that did the right
thing. I haven't had time to do any profiling comparing the old backend yet.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Can I put you down as a technical reviewer of the back-end? :)

As the main consumer of the open-source back-end, your seal of approval would go
a long way</pre>
</blockquote>
All sounds good to me. Sure, put me down. :)<br>
<br>
Thanks,<br>
Dan<br>
<br>
<br>
</body>
</html>

Justin Holewinski

2012-May-03 15:35 UTC

head link

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

All,

One of the primary users of the existing PTX back-end provided a technical
review for us and has confirmed that the back-end works as expected.  Since this
is replacing the existing PTX back-end and does not affect any other targets or
parts of LLVM (other than adding the triple and intrinsics), and we have
confirmation that it works as expected for users, can we commit this?  If not,
what other action is required?

> -----Original Message-----
> From: Dan Bailey [mailto:dan at dneg.com]
> Sent: Wednesday, May 02, 2012 9:17 AM
> To: Justin Holewinski
> Cc: llvm-commits at cs.uiuc.edu; Vinod Grover; llvmdev at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
> 
> Justin Holewinski wrote:
> 
> 		-----Original Message-----
> 		From: Dan Bailey [mailto:dan at dneg.com]
> 		Sent: Sunday, April 29, 2012 8:46 AM
> 		To: Justin Holewinski
> 		Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover;
> 		llvmdev at cs.uiuc.edu
> 		Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
> 
> 		Justin,
> 
> 		Firstly, this is great! It seems to be so much further forward in
> terms of
> 		features compared to the PTX backend and I'm pleased to
> see all the support
> 		for vector instructions, atomics, debugging information, etc -
> much of which
> 		was still to be added to the other backend.
> 
> 		I have a few comments to make, but bear in mind my
> knowledge comes
> 		solely from working on the PTX backend, so I don't know
> much about how
> 		this compares to other backends and the feasibility of reusing
> components
> 		designed to be shared across backends, plus my knowledge
> of tablegen is
> 		also relatively limited.
> 
> 		This is in no particular order:
> 
> 		* Is there any reason why the 32-bit arch name is nvptx and
> the 64-bit arch
> 		name is nvptx64, especially as the default for the NVCC
> compiler is to pass
> 		the -m64 flag? Also, all the internal naming
> (NVPTXTargetMachine for
> 		example) use 32 / 64 suffixes.
> 
> 
> 
> 	As far as I know, there is no fundamental reason for this.  If it bugs
> you too much, I'm sure we could change it. :)
> 
> 
> 
> 		* The register naming seems a little arbitrary as well, using FL
> prefixes for 64-
> 		bit float and da prefixes for 64-bit float arguments for
> example.
> 
> 
> 
> 	Really, any choice is going to be arbitrary.
> 
> 
> 
> 		* Something I picked up in the NVVM IR spec - it seems to
> only be possible to
> 		use the bar.sync 0 instruction. Unless this is being removed
> for PTX 3.0, the
> 		spec (and the PTX backend) support using bar.sync {0..15}.
> The old PTX
> 		intrinsic also supports a non-zero integer operand.
> 
> 
> 
> 	The NVVM intrinsic is there to implement CUDA's __syncthreads().
> The old intrinsic is still exposed.  I'll see about adding a proper
NVVM intrinsic.
> 
> 
> 
> 		* I guess this raises the question of whether or not it's
> actually worthwhile
> 		retaining compatibility with the old backend. I converted my
> Jet compiler to
> 		use NVVM intrinsics and still found the address spaces were
> out of line, so it
> 		was largely useless until I'd done the full conversion anyway.
> If the plan is to
> 		deprecate the old backend (as I fully support), I'd be
> tempted to say maybe
> 		just leave all these old intrinsics out too?
> 
> 
> 
> 	This is something I've debated back and forth.  The original thought
> was to maintain as much backwards compatibility as possible with the old
> back-end.  I don't think it hurts anything to leave the old intrinsics
in there.
> 
> 
> 
> 		* I know this isn't an easy problem as I remember discussing
> this at length
> 		with you at the time Justin, but supporting i8 for ld, st, cvt
> seems to introduce
> 		a load of blocks of string concat code in tablegen. Is there not
> a better way of
> 		tackling this, it looks like a fairly ugly solution to me?
> 
> 
> 
> 	Oh there are quite a few ugly TableGen solutions in there. :)  I'm
> trying to find ways to clean some of this stuff up, but it works so
it's hard to
> justify devoting time to it.
> 
> 
> 
> 		* There are a few places in the code where there are blocks
> of commented
> 		out code that I presume are waiting for upstreaming of other
> additions.
> 		Some of these are documented, but many of them aren't.
> Maybe these
> 		should either be removed for addition later or some
> explanation added as to
> 		why they're not being used.
> 
> 
> 
> 	Good point.  I'll see about adding more documentation to these.
> 
> 
> 
> 		* There seems to be a lot of features that are lacking tests
> too. I was trying to
> 		find the equivalent flags to use for switching on and off FMA
> (a feature we
> 		use quite a bit to compare results with x86) and couldn't find
> any tests
> 		demonstrating this. Considering the amount of code in the
> backend
> 		committed to handling mad / fma, there really should be a
> bunch of tests to
> 		go along with this. In fact non of the nvptx-* flags seem to
> have any
> 		accompanying test cases, nor did atomics, nvvm intrinsics or
> many of the
> 		other instructions. I imagine NVidia has a fairly full featured
> CUDA testing
> 		framework, but if this is going to be made open-source, I'd
> rather as many of
> 		the instructions could have tests too. Having said that, maybe
> some of the
> 		test cases for the PTX backend could be migrated across?
> 
> 
> 
> 	I'm working on converting the tests now.
> 
> 
> 
> 		* Finally I spotted a number of typos in the comments in the
> code that could
> 		be cleaned up.
> 
> 		There's a lot of code here, so I might have missed some
> things. On the whole
> 		though, most of this is fairly minor. It looks well designed and
> I'm looking
> 		forward to this making it in. I'm especially excited to see the
> new pass for
> 		vectorization and other passes to do code transformations. I
> managed to run
> 		our fluid solver through this backend and it generated PTX
> that did the right
> 		thing. I haven't had time to do any profiling comparing the old
> backend yet.
> 
> 
> 
> 	Can I put you down as a technical reviewer of the back-end? :)
> 
> 	As the main consumer of the open-source back-end, your seal of
> approval would go a long way
> 
> All sounds good to me. Sure, put me down. :)
> 
> Thanks,
> Dan
> 
> 
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may
contain
confidential information.  Any unauthorized review, use, disclosure or
distribution
is prohibited.  If you are not the intended recipient, please contact the sender
by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - May 2012 - [LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

Possibly Parallel Threads