similar to: [LLVMdev] Compiling integer mod

Displaying 20 results from an estimated 900 matches similar to: "[LLVMdev] Compiling integer mod"

2019 Mar 04
2
Where's the optimiser gone (part 11): use the proper instruction for sign extension
Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):

long lsign(long x) { return (x > 0) - (x < 0); }
long long llsign(long long x) { return (x > 0) - (x < 0); }

While the code generated for the "long" version of this function is quite OK, the code for the "long long" version misses an obvious optimisation:

lsign: # @lsign
mov
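For reference, the branchless sign idiom quoted above can be exercised as a minimal standalone program; nothing below goes beyond the two functions shown in the excerpt, and the driver is only illustrative.

#include <stdio.h>

/* Branchless sign: returns -1, 0 or 1 depending on the sign of x. */
long lsign(long x)            { return (x > 0) - (x < 0); }
long long llsign(long long x) { return (x > 0) - (x < 0); }

int main(void)
{
    printf("%ld %ld %ld\n", lsign(-5L), lsign(0L), lsign(42L));           /* -1 0 1 */
    printf("%lld %lld %lld\n", llsign(-5LL), llsign(0LL), llsign(42LL));  /* -1 0 1 */
    return 0;
}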
2011 Dec 14
2
[LLVMdev] Failure to optimize ? operator
I don't understand your point. Which version is better does NOT depend on what inputs are passed to the function. The compiled code for f1 (as per llvm) will never take longer to execute than f2:

for x > 0  => T(f1) < T(f2)
for x <= 0 => T(f1) = T(f2)

where T() is the time to execute the given function. So always T(f1) <= T(f2). I would call this a missed
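The f1/f2 definitions themselves are cut off in this excerpt; a hypothetical pair in the spirit of the thread (an if/else form versus a ?: form of the same computation) would look roughly like the sketch below, with "expensive" standing in for the costly computation of y.

#include <stdio.h>

/* Hypothetical reconstruction -- the thread's actual f1/f2 are not shown above. */
static long expensive(long x) { return x * x + 1; }   /* stand-in for computing y */

static long f1(long x)                 /* if/else form: y computed only when needed */
{
    if (x > 0)
        return 0;
    long y = expensive(x);
    return y;
}

static long f2(long x)                 /* ?: form: y is computed unconditionally */
{
    long y = expensive(x);
    return x > 0 ? 0 : y;
}

int main(void)
{
    printf("%ld %ld\n", f1(3), f2(-3));   /* 0 10 */
    return 0;
}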
2009 Jan 07
3
[LLVMdev] LLVM optmization
The following C test program was compiled using LLVM with the -O3 option and MSVC with /O2. The MSVC binary is about 600 times faster than the one compiled with LLVM. We can see that the for loop is optimized far more effectively by MSVC than by LLVM. Is there a way to get an optimization result from LLVM like that of MSVC? Manoel Teixeira

#include
2014 Jan 11
3
[LLVMdev] Possible error in docs.
http://llvm.org/docs/CodeGenerator.html#machine-code-description-classes

Section starting: Fixed (preassigned) registers

It talks about converting:

define i32 @test(i32 %X, i32 %Y) {
  %Z = udiv i32 %X, %Y
  ret i32 %Z
}

into:

;; X is in EAX, Y is in ECX
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
ret

BUT, where does the "sar" come from? Kind Regards James
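One possible reading, not taken from the thread itself: x86 idiv divides the 64-bit value EDX:EAX, so a 32-bit dividend has to be sign-extended into EDX first, and "mov %EAX, %EDX" followed by "sar %EDX, 31" builds exactly that high half. A minimal C sketch of the value the sar computes:

#include <stdint.h>
#include <stdio.h>

/* The high half that idiv expects in EDX: all ones if the dividend is
 * negative, all zeros otherwise (what "mov %EAX, %EDX; sar %EDX, 31" builds). */
static int32_t high_half(int32_t x)
{
    return x < 0 ? -1 : 0;
}

int main(void)
{
    printf("%d %d\n", high_half(-7), high_half(7));   /* -1 0 */
    return 0;
}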
2009 Jan 06
2
[LLVMdev] LLVM Optmizer
The following C code:

#include <stdio.h>
#include <stdlib.h>

int TESTE2( int parami , int paraml ,double paramd )
{
    int varx=0,vary;
    int nI =0;
    //varx= parami;
    if( parami > 0 )
    {
        varx = parami;
        vary = varx + 1;
    }
    else
    {
        varx = vary + 1;
        vary = paraml;
    }
    varx = varx + parami + paraml;
    for( nI = 1 ; nI <= paraml; nI++)
    {
        varx =
2011 Dec 14
0
[LLVMdev] Failure to optimize ? operator
On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote:
> The following seemingly identical functions, get compiled to quite
> different machine code.  The first is correctly optimized (the
> computation of var y is nicely moved into the else branch of the "if"
> statement), which the second one is not (the full computation of var y
> is
2011 Dec 13
4
[LLVMdev] Failure to optimize ? operator
The following seemingly identical functions get compiled to quite different machine code. The first is correctly optimized (the computation of var y is nicely moved into the else branch of the "if" statement), while the second one is not (the full computation of var y is always done). The output was produced using the demo page on llvm's web site (optimization level LTO). Can
2011 Dec 06
0
[LLVMdev] Implement implicit TLS on Windows - need advice
On Sun, Dec 4, 2011 at 9:18 AM, Kai <kai at redstar.de> wrote:
> Hi!
>
> LLVM currently does not implement the implicit TLS model on Windows. This
> model is easy:
>
> - a thread local variable ends up in the .tls section
> - to access a thread local variable, you have to do
>   (1) load pointer to thread local storage from TEB
>       On x86_64, this is gs:0x58, on
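For concreteness, a sketch of the access pattern the quote describes, assuming MSVC intrinsics on x86_64, where (as the quote says) the TLS array pointer is read from the TEB at gs:0x58; the _tls_index symbol used to pick this module's slot is an assumption beyond the truncated quote.

#include <intrin.h>

extern unsigned int _tls_index;   /* assumed: this module's slot in the per-thread TLS array */

/* Step (1) from the quote: load the pointer to thread local storage from the TEB,
 * then index it with the module's TLS slot to reach this module's .tls block. */
static void *tls_block_base(void)
{
    void **tls_array = (void **)__readgsqword(0x58);   /* TEB.ThreadLocalStoragePointer */
    return tls_array[_tls_index];
}

An individual thread-local variable would then sit at a fixed offset inside that block.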
2018 Mar 01
0
[parallel] fixes load balancing of parLapplyLB
Dear Tomas, Thanks for your commitment to fix this issue and also to add the chunk size as an argument. If you want our input, let us know ;) Best Regards

On 02/26/2018 04:01 PM, Tomas Kalibera wrote:
> Dear Christian and Henrik,
>
> thank you for spotting the problem and suggestions for a fix. We'll probably add a chunk.size argument to parLapplyLB and parLapply to follow OpenMP
2009 Sep 25
2
[LLVMdev] MinGW/MSVC++ uses different ABI for sret
Let's go directly to the example:

struct S { double dummy1; double dummy2; };
S bar();
S foo() { return bar(); }

This is the result of g++ -c -S -O2 (focus on the final `ret'):

__Z3foov:
LFB0:
    pushl %ebp
LCFI0:
    movl %esp, %ebp
LCFI1:
    pushl %ebx
LCFI2:
    subl $20, %esp
LCFI3:
    movl 8(%ebp), %ebx
    movl %ebx, (%esp)
    call __Z3barv
    pushl %eax
    movl %ebx, %eax
    movl -4(%ebp), %ebx
2016 Nov 24
1
[parallel-package] feature request: set default cluster type via environment variable
Dear all, I'm working as an administrator of a High-Performance Computing (HPC) Cluster which runs on Linux. A lot of people are using R on this Linux cluster and, of course, the *parallel* package to speed up their computations. It has been our collective experience that using |makeForkCluster| yields an overall better experience /on Linux/ than |makePSOCKcluster|, for whatever definition
2013 Oct 04
2
Again about encoding speed of different compiles
I downloaded the current version of the FLAC sources and compiled it with:

* GCC 4.8.1 (MSYS from http://xhmikosr.1f0.de/tools/)
* Intel C++ Composer XE 2013 update 5
* MSVS 2010 SP1
* MSVS 2012 update 3

(SSSE3 and SSE4.1 code was disabled for all compilers.) A stereo 24-bit WAV file was encoded with the -8 preset. Encoding time, in seconds:

GCC 32-bit:  209
ICC 32-bit:  130
VS10 32-bit: 116
VS12 32-bit: 114
2012 Jun 18
2
[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?
Hi, does anyone know where a backend-specific optimization can be added to replace an instruction with code containing control flow? I'm interested in adding an optimization for the DIV instruction (x86-atom) which replaces the IDIV/DIV with code containing control flow to select between the intended IDIV/DIV and an 8-bit DIV with movzx, as described in the Intel Atom Optimization Guide. My
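A sketch, in C rather than assembly, of the kind of guarded fast path the Atom guide describes; the exact width test and the claim that the narrow branch maps onto an 8-bit DIV fed by MOVZX are assumptions for illustration, not the guide's literal sequence.

#include <stdint.h>
#include <stdio.h>

/* Select between the full divide and a cheap narrow divide at run time. */
static uint32_t div_guarded(uint32_t x, uint32_t y)
{
    if (((x | y) & ~0xFFu) == 0)                        /* both operands fit in 8 bits */
        return (uint32_t)((uint8_t)x / (uint8_t)y);     /* intended 8-bit divide path */
    return x / y;                                       /* fall back to the full 32-bit divide */
}

int main(void)
{
    printf("%u %u\n", div_guarded(200u, 7u), div_guarded(100000u, 7u));   /* 28 14285 */
    return 0;
}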
2018 Feb 20
0
[parallel] fixes load balancing of parLapplyLB
Dear Henrik, The rationale is just that it lies between these two extremes and that it is really simple to calculate, without making any assumptions, knowing that it won't be perfect. The extremes A and B you mention are special cases based on assumptions. Case A is based on the assumption that the function has a long or varying runtime; then you are likely to get the best load
2018 Feb 19
0
[parallel] fixes load balancing of parLapplyLB
Dear R-Devel List, I have installed R 3.4.3 with the patch applied on our cluster and ran a *real-world* job of one of our users to confirm that the patch works to my satisfaction. Here are the results. The original was a series of jobs, all essentially doing the same stuff using bootstrapped data, so for the original there is more data and I show the arithmetic mean with standard deviation. The
2006 Jun 26
0
[klibc 24/43] i386 support for klibc
The parts of klibc specific to the i386 architecture.

Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---
commit bd0599e5290ca1a16bb7a68f7c362d395c612eb3
tree 8f33afdd02a14c22e7a3984da2bad13184e3f729
parent 84f6a72f42cf41e32daa59871a0b5424572093e4
author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:21 -0700
committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun
2018 Feb 12
2
[parallel] fixes load balancing of parLapplyLB
Dear R-Devel List, **TL;DR:** The function **parLapplyLB** of the parallel package has [reportedly][1] (see also attached RRD output) not been doing its job, i.e. not actually balancing the load. My colleague Dirk Sarpe and I found the cause of the problem and we also have a patch to fix it (attached). A similar fix has also been provided [here][2]. [1]:
2018 Feb 19
2
[parallel] fixes load balancing of parLapplyLB
Hi, I'm trying to understand the rationale for your proposed amount of splitting and, more precisely, why that one is THE one. If I put labels on the example numbers in one of your previous posts:

nbrOfElements <- 97
nbrOfWorkers <- 5

With these, there are two extremes in how you can split up the processing into chunks such that all workers are utilized: (A) Each worker, called
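The (A)/(B) labels are cut off above, so ignoring the labels, the arithmetic behind the two extremes for 97 elements and 5 workers can be sketched as follows (an illustration only, not code from the parallel package):

#include <stdio.h>

int main(void)
{
    const int elements = 97, workers = 5;

    /* One extreme: chunks of size 1 -- as many chunks as elements, the best
     * possible load balancing but the most per-chunk dispatch overhead. */
    printf("chunk size 1    : %d chunks\n", elements);

    /* The other extreme: one chunk per worker -- ceil(97/5) = 20 elements per
     * chunk, the least overhead but no rebalancing once chunks are handed out. */
    int per_worker = (elements + workers - 1) / workers;
    printf("chunk per worker: %d chunks of up to %d elements\n", workers, per_worker);
    return 0;
}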
2018 Feb 26
2
[parallel] fixes load balancing of parLapplyLB
Dear Christian and Henrik, thank you for spotting the problem and for your suggestions for a fix. We'll probably add a chunk.size argument to parLapplyLB and parLapply to follow OpenMP terminology, which has already been an inspiration for the present code (parLapply already implements static scheduling via the internal function staticClusterApply, yet with a fixed chunk size; parLapplyLB already
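Since the reply explicitly borrows OpenMP terminology, the intent of a chunk.size argument can be illustrated with OpenMP's scheduling clauses; the mapping of static scheduling to parLapply and dynamic scheduling to parLapplyLB is a reading of the excerpt, not documentation of the package.

#include <omp.h>
#include <stdio.h>

static void work(int i) { printf("item %d on thread %d\n", i, omp_get_thread_num()); }

int main(void)
{
    /* Static scheduling with an explicit chunk size: chunks are assigned to
     * threads up front (the parLapply side of the analogy). */
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < 97; i++) work(i);

    /* Dynamic scheduling with an explicit chunk size: chunks are handed out as
     * threads become free (the load-balancing parLapplyLB side). */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < 97; i++) work(i);
    return 0;
}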
2013 Apr 05
0
[LLVMdev] Integer divide by zero
On 4/5/2013 1:23 PM, Cameron McInally wrote:
> Hey guys,
>
> I'm learning that LLVM does not preserve faults during constant
> folding. I realize that this is an architecture dependent problem, but
> I'm not sure if it's safe to constant fold away a fault on x86-64.
>
> A little testcase:
>
> #include <stdio.h>
>
> int foo(int j, int d) {
>
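The testcase itself is truncated above; a hypothetical one in the same spirit shows what the thread is worried about, namely that constant folding can make the x86-64 divide fault disappear.

#include <stdio.h>

/* Hypothetical stand-in for the truncated testcase. Division by zero is undefined
 * behaviour in C; on x86-64 an executed DIV/IDIV with a zero divisor raises #DE
 * (seen as SIGFPE), but if the optimizer constant-folds the division away the
 * faulting instruction is never emitted -- the behaviour discussed in the thread. */
int foo(int j, int d)
{
    return j / d;
}

int main(void)
{
    printf("%d\n", foo(5, 0));   /* traps if the divide is executed; may not if folded */
    return 0;
}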