thr3ads.net - search: "double4"

Displaying 3 results from an estimated 3 matches for "double4".

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

2013 Feb 09

...c. In its current state the library has rough edges, e.g. the precision of many math functions is not yet ideal, and exceptional cases (nan, inf) are probably not yet all handled correctly. I would be happy if vecmathlib could be used in LLVM. For example, assuming that there is a data type "double4" containing a vector of 4 double precision values, vecmathlib provides a function double4 pow(double4, double4) that implements pow(). In the general case, i.e. if no system-specific machine instructions are available, this would use Taylor expansions to calculate pow(x,y)=exp(y*log(x)). I wo...

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

2013 Feb 07

Hi Justin, gentlemen, I'm afraid I have to escalate this issue at this point. Since it was discussed for the first time last summer, it was sufficient for us for a while to have lowering of math calls into intrinsics disabled at DragonEgg level, and link them against CUDA math functions at LLVM IR level. Now I can say: this is not sufficient any longer, and we need NVPTX backend to deal with

unsorted - suggestion for performance improvement and ALTREP support for POSIXct

2019 Jan 05

I believe the performance of isUnsorted() in sort.c could be improved by calling REAL() once (outside of the for loop), rather than calling it twice inside the loop. As an aside, it is implemented in the faster way in doSort() (sort.c line 401). The example below shows the performance improvement, for a vector of doubles, from moving REAL() outside the for loop. # example as implemented in
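The change suggested in the snippet, fetching the data pointer once before the loop instead of twice per iteration, might look roughly like the C sketch below. It does not use R's headers: `numvec` and `vec_data()` are hypothetical stand-ins for an R numeric vector and for REAL(), and the non-strict comparison is an assumption rather than a copy of the logic in sort.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for an R numeric vector; not R's SEXP. */
typedef struct { double *data; size_t len; } numvec;

/* Hypothetical accessor playing the role of REAL(). The sketch assumes the
   real accessor has a non-trivial cost per call. */
static double *vec_data(const numvec *x) { return x->data; }

/* Accessor called twice in every iteration. */
static int is_unsorted_slow(const numvec *x) {
    for (size_t i = 0; i + 1 < x->len; ++i) {
        if (vec_data(x)[i] > vec_data(x)[i + 1]) return 1;
    }
    return 0;
}

/* Accessor called once, before the loop; the loop indexes the pointer. */
static int is_unsorted_fast(const numvec *x) {
    const double *p = vec_data(x);
    for (size_t i = 0; i + 1 < x->len; ++i) {
        if (p[i] > p[i + 1]) return 1;
    }
    return 0;
}
```

Both functions return the same result for any input. With this trivial accessor a compiler will likely inline the call, so any speed difference would only appear when the accessor does real work.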
