thr3ads.net - llvm dev - [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related) [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Robert Quill

2011-Sep-08 10:24 UTC

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Hi Peter,

This sounds like I really good idea. One thing that did occur to me
though from an OpenCL point of view is that ULP accuracy requirements
can differ for embedded and full profile so that may need to be handled
somehow.

Thanks,
Rob 

On Wed, 2011-09-07 at 21:55 +0100, Peter Collingbourne
wrote:> Hi,
> 
> This is my proposal to add floating point accuracy support to LLVM.
> The intention is that the frontend may provide metadata to signal to
> the backend that it may select a less accurate (i.e. more efficient)
> instruction to perform a given operation.  This is primarily a
> requirement of OpenCL, which specifies that certain floating point
> operations may be computed inaccurately.
> 
> Comments appreciated.
> 
> ------------------------------------------------------------------------
> 
> Specification
> -------------
> 
> The metadata attribute is named "fpaccuracy", and contains a
single
> integer parameter which specifies the maximum relative error of the
> operation to which it is attached, in ULPs.  For the definition of
> ULPs we follow the definition given in the OpenCL 1.1 specification
> (section 7.4):
> 
> 	If x is a real number that lies between two finite
> 	consecutive floating-point numbers a and b, without being
> 	equal to one of them, then ulp(x) = |b - a|, otherwise
> 	ulp(x) is the distance between the two non-equal finite
> 	floating-point numbers nearest x. Moreover, ulp(NaN) is
> 	NaN.
> 
> Implementation
> --------------
> 
> As a start, I am attaching a Clang patch which adds this metadata
> to single precision floating point division instructions in OpenCL
> (which are accurate to 2.5ulp, per OpenCL 1.1 s7.4).  I would imagine
> that in the future we may want to allow the user to control the level
> of accuracy (one idea is that we can provide a __builtin_fpaccuracy
> function, which could be used like this):
> 
> __builtin_fpaccuracy(x/y, 2) // compute x/y with a maximum error of 2ulp
> 
> The backend is not my area of expertise, so I won't venture to say
> how difficult it would be to recognise and act on the attribute there.
> 
> Alternatives
> ------------
> 
> I am proposing metadata here, since it may be removed without
> negatively affecting accuracy.
> 
> We could add fpaccuracy as a core LLVM attribute (a la nsw and nuw).
> But this would be relatively intrusive (only a few clients care about
> accuracy) and it would need to be added to a variety of unrelated
> instruction types (most floating point operations, as well as
> intrinsic calls).
> 
> We could also introduce a set of intrinsics for inaccurate FP
> operations.  The main disadvantage is that we would need to add an
> intrinsic for each FP operation, which could add up to a lot of work.
> Furthermore, the new intrinsics would not necessarily be recognised
> by the existing optimisers.
> 
> ------------------------------------------------------------------------
> 
> Thanks,
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Peter Collingbourne

2011-Sep-08 17:27 UTC

head link

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah
wrote:> Peter,
>  Is there a way to make this flag globally available? Metadata can be
fairly expensive to handle at each node when in many cases it is a global flag
and not a per operation flag.
There are two main reasons why I think we shouldn't go for global
flags:

1) It becomes difficult if not impossible to correctly link together
   modules with different accuracy requirements, especially if LTO
   is done on the combined module.

2) Some LLVM optimisations will create operations with a accuracy
   requirement different from the language specified accuracy.
   For example, consider the following OpenCL kernel:

-----
#pragma OPENCL EXTENSION cl_khr_fp64: enable

__kernel void dpdiv(__global float *result, float x, float y) {
  *result = (double) x / (double) y;
}
-----

When compiled to LLVM with optimisations turned off, the function
looks like this:

-----
define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable {
entry:
  %result.addr = alloca float*, align 8
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  store float* %result, float** %result.addr, align 8
  store float %x, float* %x.addr, align 4
  store float %y, float* %y.addr, align 4
  %tmp = load float* %x.addr, align 4
  %conv = fpext float %tmp to double
  %tmp1 = load float* %y.addr, align 4
  %conv2 = fpext float %tmp1 to double
  %div = fdiv double %conv, %conv2
  %conv3 = fptrunc double %div to float
  %tmp4 = load float** %result.addr, align 8
  store float %conv3, float* %tmp4
  ret void
}
-----

With optimisations turned on:

-----
define void @dpdiv(float* nocapture %result, float %x, float %y) nounwind
uwtable {
entry:
  %conv3 = fdiv float %x, %y
  store float %conv3, float* %result, align 4, !tbaa !1
  ret void
}
-----

The main optimisation applied here is near the top of
InstCombiner::visitFPTrunc,
which simplifies fptrunc(fdiv (fpextend x), (fpextend y)) to fdiv(x, y).
Because double precision floating point divides are accurate in OpenCL,
the single precision divide in the optimised code must also be
accurate, unlike a "direct" single precision divide.

I would imagine that creating a pinned metadata name for fpaccuracy, as
we currently do for dbg, tbaa and prof, would go some way towards addressing
the efficiency problem.

Thanks,
-- 
Peter

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Sep 2011 - [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)

Possibly Parallel Threads