Robert Quill
2011-Sep-08 10:24 UTC
[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
Hi Peter, This sounds like I really good idea. One thing that did occur to me though from an OpenCL point of view is that ULP accuracy requirements can differ for embedded and full profile so that may need to be handled somehow. Thanks, Rob On Wed, 2011-09-07 at 21:55 +0100, Peter Collingbourne wrote:> Hi, > > This is my proposal to add floating point accuracy support to LLVM. > The intention is that the frontend may provide metadata to signal to > the backend that it may select a less accurate (i.e. more efficient) > instruction to perform a given operation. This is primarily a > requirement of OpenCL, which specifies that certain floating point > operations may be computed inaccurately. > > Comments appreciated. > > ------------------------------------------------------------------------ > > Specification > ------------- > > The metadata attribute is named "fpaccuracy", and contains a single > integer parameter which specifies the maximum relative error of the > operation to which it is attached, in ULPs. For the definition of > ULPs we follow the definition given in the OpenCL 1.1 specification > (section 7.4): > > If x is a real number that lies between two finite > consecutive floating-point numbers a and b, without being > equal to one of them, then ulp(x) = |b - a|, otherwise > ulp(x) is the distance between the two non-equal finite > floating-point numbers nearest x. Moreover, ulp(NaN) is > NaN. > > Implementation > -------------- > > As a start, I am attaching a Clang patch which adds this metadata > to single precision floating point division instructions in OpenCL > (which are accurate to 2.5ulp, per OpenCL 1.1 s7.4). I would imagine > that in the future we may want to allow the user to control the level > of accuracy (one idea is that we can provide a __builtin_fpaccuracy > function, which could be used like this): > > __builtin_fpaccuracy(x/y, 2) // compute x/y with a maximum error of 2ulp > > The backend is not my area of expertise, so I won't venture to say > how difficult it would be to recognise and act on the attribute there. > > Alternatives > ------------ > > I am proposing metadata here, since it may be removed without > negatively affecting accuracy. > > We could add fpaccuracy as a core LLVM attribute (a la nsw and nuw). > But this would be relatively intrusive (only a few clients care about > accuracy) and it would need to be added to a variety of unrelated > instruction types (most floating point operations, as well as > intrinsic calls). > > We could also introduce a set of intrinsics for inaccurate FP > operations. The main disadvantage is that we would need to add an > intrinsic for each FP operation, which could add up to a lot of work. > Furthermore, the new intrinsics would not necessarily be recognised > by the existing optimisers. > > ------------------------------------------------------------------------ > > Thanks, > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Peter Collingbourne
2011-Sep-08 17:27 UTC
[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote:> Peter, > Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag.There are two main reasons why I think we shouldn't go for global flags: 1) It becomes difficult if not impossible to correctly link together modules with different accuracy requirements, especially if LTO is done on the combined module. 2) Some LLVM optimisations will create operations with a accuracy requirement different from the language specified accuracy. For example, consider the following OpenCL kernel: ----- #pragma OPENCL EXTENSION cl_khr_fp64: enable __kernel void dpdiv(__global float *result, float x, float y) { *result = (double) x / (double) y; } ----- When compiled to LLVM with optimisations turned off, the function looks like this: ----- define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable { entry: %result.addr = alloca float*, align 8 %x.addr = alloca float, align 4 %y.addr = alloca float, align 4 store float* %result, float** %result.addr, align 8 store float %x, float* %x.addr, align 4 store float %y, float* %y.addr, align 4 %tmp = load float* %x.addr, align 4 %conv = fpext float %tmp to double %tmp1 = load float* %y.addr, align 4 %conv2 = fpext float %tmp1 to double %div = fdiv double %conv, %conv2 %conv3 = fptrunc double %div to float %tmp4 = load float** %result.addr, align 8 store float %conv3, float* %tmp4 ret void } ----- With optimisations turned on: ----- define void @dpdiv(float* nocapture %result, float %x, float %y) nounwind uwtable { entry: %conv3 = fdiv float %x, %y store float %conv3, float* %result, align 4, !tbaa !1 ret void } ----- The main optimisation applied here is near the top of InstCombiner::visitFPTrunc, which simplifies fptrunc(fdiv (fpextend x), (fpextend y)) to fdiv(x, y). Because double precision floating point divides are accurate in OpenCL, the single precision divide in the optimised code must also be accurate, unlike a "direct" single precision divide. I would imagine that creating a pinned metadata name for fpaccuracy, as we currently do for dbg, tbaa and prof, would go some way towards addressing the efficiency problem. Thanks, -- Peter
Apparently Analagous Threads
- [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
- [LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
- InstCombine wrongful (?) optimization on BinOp with SameOperands
- [LLVMdev] What's the Alias Analysis does clang use ?
- [LLVMdev] What's the Alias Analysis does clang use ?