Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] Loop unswitching creates dead code"
2007 Nov 20
9
Testing Models without fixtures
Hi,
I would like to test a sorting method that is in the user model; it's
a class method called search.
What I would like to do is create 2 users and load the test database
with just those 2 users, so that I can call
User.search("john") and it would return those two users.
Not sure how to clear the test database and populate it just with
these 2 users for that specific
2015 May 06
2
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
For
void test0(unsigned short a, unsigned short * in, unsigned short * out) {
  for (unsigned short w = 1; w < a - 1; w++) // this will never overflow
    out[w] = in[w+7] * 2;
}
I think it will be sufficient to add a couple of new cases to
ScalarEvolution::HowManyLessThans --
zext(A) ult zext(B) == A ult B
sext(A) slt sext(B) == A slt B
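As a sanity check on the two identities above, here is a minimal C sketch that tests them exhaustively for 8-bit values extended to 32 bits. The helper names (zext8, sext8, ult8, slt8, ult32, slt32) are made up for this illustration and are not ScalarEvolution APIs. Signedness is modelled on raw bit patterns so that C's integer promotions do not hide the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero-extend an 8-bit pattern to 32 bits. */
static uint32_t zext8(uint8_t v) { return (uint32_t)v; }

/* Sign-extend an 8-bit pattern to 32 bits by copying bit 7 upward. */
static uint32_t sext8(uint8_t v) {
    return (v & 0x80u) ? ((uint32_t)v | 0xFFFFFF00u) : (uint32_t)v;
}

/* Unsigned less-than on 8-bit and 32-bit patterns. */
static bool ult8(uint8_t a, uint8_t b) { return a < b; }
static bool ult32(uint32_t a, uint32_t b) { return a < b; }

/* Signed less-than: flip the sign bit, then compare as unsigned. */
static bool slt8(uint8_t a, uint8_t b) {
    return (uint8_t)(a ^ 0x80u) < (uint8_t)(b ^ 0x80u);
}
static bool slt32(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* Returns true if both identities hold for every pair of 8-bit values. */
bool ext_compare_identities_hold(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            if (ult32(zext8(x), zext8(y)) != ult8(x, y)) return false;
            if (slt32(sext8(x), sext8(y)) != slt8(x, y)) return false;
        }
    }
    return true;
}
```

The pairing of extension and predicate matters: zext does not preserve slt, since the i8 pattern 0x80 is -128 before the extension but 128 after it.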
Currently it bails out if it sees a non-add
2018 May 11
0
Query on unswitching + vectorization
On 5/10/2018 10:44 PM, Gopalasubramanian, Ganesh via llvm-dev wrote:
>
> Hi,
>
> I am going through analysis on unswitching + vectorization.
>
> For the below test, LLVM unswitches successfully but fails to
> vectorize the loop after unswitching.
>
> LLVM bails out saying “Found an outside user”, which is apparently the
> value of ‘tmp’.
>
> int i, w, x[1000],
2018 May 11
2
Query on unswitching + vectorization
Hi,
I am going through analysis on unswitching + vectorization.
For the below test, LLVM unswitches successfully but fails to vectorize the loop after unswitching.
LLVM bails out saying "Found an outside user", which is apparently the value of 'tmp'.
int i, w, x[1000], y[1000],tmp;
void fn()
{
for (i = 0; i < 1000; i++) {
if (w==1) {
y[i] = 1; tmp = i*2;
}
2018 May 14
1
Query on unswitching + vectorization
* Looks like some sort of pass ordering issue; it will vectorize if indvars runs sometime between loop unswitch and the vectorizer.
That insight is helpful. I scheduled induction variable canonicalization (indvars) before loop vectorization and got the loop vectorized.
The indvars pass is heavily dependent on SCEV. If there is a scalar like tmp which is of real type, we may not be able to get the
2015 Apr 29
2
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi,
This is somewhat similar to the previous thread regarding missed vectorization
opportunities (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html),
but maybe different enough to require a new thread.
I'm seeing some missed vectorization opportunities in the loop vectorizer because SCEV
is not able to fold sext/zext expressions into recurrence expressions (AddRecExpr).
This
2012 Nov 26
2
[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests
I am investigating changing BoundsChecking to use address-based rather
than size- & offset-based tests.
To explain, here is a short code sample cribbed from one of the tests:
%mem = tail call i8* @calloc(i64 1, i64 %elements)
%memobj = bitcast i8* %mem to i64*
%ptr = getelementptr inbounds i64* %memobj, i64 %index
%4 = load i64* %ptr, align 8
Currently, the IR for bounds checking
2005 Dec 16
6
rake remote_exec on Windows
I am using the shovel deploy.rb from http://nubyonrails.com/pages/shovel
I have SwitchTower-ized my app, copied the shovel deploy.rb file and put
my settings in it.
But when I run "rake remote_exec ACTION=setup_lighty" from the local app
root it has no effect. Shouldn't it prompt for a password or at least
throw an error? I just get returned to the DOS prompt.
If I do "rake
2013 Jun 28
2
[LLVMdev] Possible instruction combine bug with pointer icmp?
If I give instcombine the following IR:
define i1 @f([1 x i8]* %a, [1 x i8]* %b) {
%c = getelementptr [1 x i8]* %a, i32 0, i32 0
%d = getelementptr [1 x i8]* %b, i32 0, i32 0
%cmp = icmp ult i8* %c, %d
ret i1 %cmp
}
It optimizes it into:
define i1 @f([1 x i8]* %a, [1 x i8]* %b) {
%cmp = icmp slt [1 x i8]* %a, %b
ret i1 %cmp
}
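To see why the predicate change matters, the sketch below models the two comparisons on 64-bit address bit patterns. The 64-bit width and the helper names are assumptions for illustration; the IR above does not state a pointer size. The two predicates agree unless exactly one operand has its top bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned less-than on address bits (models icmp ult). */
static bool addr_ult(uint64_t a, uint64_t b) { return a < b; }

/* Signed less-than on address bits (models icmp slt):
   flip the sign bit, then compare as unsigned. */
static bool addr_slt(uint64_t a, uint64_t b) {
    const uint64_t sign = UINT64_C(1) << 63;
    return (a ^ sign) < (b ^ sign);
}

/* True when the two predicates give different answers for (a, b). */
bool predicates_disagree(uint64_t a, uint64_t b) {
    return addr_ult(a, b) != addr_slt(a, b);
}
```

For example, predicates_disagree(0x7FFFFFFFFFFFFFFF, 0x8000000000000000) is true, while two addresses on the same side of the sign boundary compare the same either way.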
Is this a bug, or are there some semantics of icmp
2012 Nov 26
0
[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests
Hi Kevin,
Thanks for your interest and for your deep analysis.
Unfortunately, your approach doesn't catch all bugs and is vulnerable to an
attack.
Consider the following case:
...................... | ----- obj --- | |
end ^ ptr ^ ^ end-of-memory
The scenario is as follows:
- an object is allocated in the last page of the address space
- obj is byte
2013 Apr 23
2
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
Hi,
I am investigating a performance degradation between llvm-3.1 and llvm-3.2
(Note: current top-of-tree shows a similar degradation)
One issue I see is the following:
- 'loop invariant code motion' seems to be depending on the result of the 'reassociate expression' pass:
In the samples below I observe the following behavior:
Both start with the same expression:
%add = add
2015 May 06
1
Intel NUC haswell-ULT
I have one of those new little NUCs and installed CentOS 7.1 on it.
lspci shows
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated
Graphics Controller (rev 09)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller
(rev 09)
00:14.0 USB controller: Intel Corporation 8 Series
2013 Apr 25
2
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
It's an interesting problem.
The best stuff I've seen published is by Cooper, Eckhart, & Kennedy, in
PACT '08.
Cooper gives a nice intro in one of his lectures:
http://www.cs.rice.edu/~keith/512/2012/Lectures/26ReassocII-1up.pdf
I can't tell, quickly, what's going on in Reassociate;
as usual, the documentation resolutely avoids giving any credit for the
ideas.
Why is that?
2018 Jan 17
3
always allow canonicalizing to 8- and 16-bit ops?
Example:
define i8 @narrow_add(i8 %x, i8 %y) {
%x32 = zext i8 %x to i32
%y32 = zext i8 %y to i32
%add = add nsw i32 %x32, %y32
%tr = trunc i32 %add to i8
ret i8 %tr
}
With no data-layout or with an x86 target where 8-bit integer is in the
data-layout, we reduce to:
$ ./opt -instcombine narrowadd.ll -S
define i8 @narrow_add(i8 %x, i8 %y) {
%add = add i8 %x, %y
ret i8 %add
}
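The reduction above relies on the low 8 bits of a sum depending only on the low 8 bits of its operands. A minimal C sketch of that claim follows; it compares the widen-add-truncate form against a hand-written 8-bit ripple-carry adder for all input pairs. The function names are hypothetical, and the bitwise adder is only there so that C's integer promotions do not make the comparison trivial.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the original IR: zext to 32 bits, add, trunc to 8 bits. */
static uint8_t add_wide_then_trunc(uint8_t x, uint8_t y) {
    uint32_t x32 = x, y32 = y;
    return (uint8_t)(x32 + y32);
}

/* An explicit 8-bit adder built from full adders, one bit at a time. */
static uint8_t add8_ripple(uint8_t x, uint8_t y) {
    uint8_t sum = 0, carry = 0;
    for (int bit = 0; bit < 8; bit++) {
        uint8_t a = (x >> bit) & 1u, b = (y >> bit) & 1u;
        sum |= (uint8_t)((a ^ b ^ carry) << bit);
        carry = (uint8_t)((a & b) | (carry & (a ^ b)));
    }
    return sum; /* the carry out of bit 7 is dropped, as in add i8 */
}

/* Returns true if both forms agree on every pair of 8-bit inputs. */
bool narrowing_is_safe(void) {
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (add_wide_then_trunc((uint8_t)x, (uint8_t)y) !=
                add8_ripple((uint8_t)x, (uint8_t)y))
                return false;
    return true;
}
```

This only speaks to value equivalence; whether the narrow form is profitable on a given target is the separate question the thread is about.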
But on
2013 Apr 23
0
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
As far as I can understand from the code, Reassociate tries to achieve
this result by its "ranking" mechanism.
If it does not, it is not hard to achieve this result: just restructure
the expression in a way such that
the earlier definition of the sub-expression is permuted earlier in the
resulting expr.
e.g.
outer-loop1
x=
outer-loop2
y =
2018 Jan 22
2
always allow canonicalizing to 8- and 16-bit ops?
Thanks for the perf testing. I assume that DAG legalization is equipped to
handle these cases fairly well, or someone would've complained by now...
FWIW (and at least some of this can be blamed on me), instcombine already
does the narrowing transforms without checking shouldChangeType() for
binops like and/or/xor/udiv. The justification was that narrower ops are
always better for
2013 Apr 25
0
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
On Apr 25, 2013, at 10:51 AM, Preston Briggs <preston.briggs at gmail.com> wrote:
> It's an interesting problem.
> The best stuff I've seen published is by Cooper, Eckhart, & Kennedy, in PACT '08.
> Cooper gives a nice intro in one of his lectures: http://www.cs.rice.edu/~keith/512/2012/Lectures/26ReassocII-1up.pdf
> I can't tell, quickly, what's going on
2016 May 24
1
BitcodeReader non explicit error
Hi,
I'm working on OpenCL and I'm using clang as compiler (based on clang 3.7.0).
I have an issue: I'm generating a bitcode file (that I can print before the generation). But when I try to read it again with clang, I get this error:
"error: Invalid record"
How can I find out where it comes from?
Thank you,
Romaric
Here is what is printed before the
2019 Dec 18
2
Missing code depending on a #ifdef within the .ll file
Hi David,
My question is: why are both the #ifdef and #else branches missing? I think at
least one of the two should be present... In fact there is a case where the
width could be greater than
PNG_USER_WIDTH_MAX but not greater than PNG_UINT_31_MAX. That's why I was
expecting at least one of the two...
Thanks
Alberto
On Wed, Dec 18, 2019, 22:12 David Blaikie <dblaikie at gmail.com>
2018 Jan 22
0
always allow canonicalizing to 8- and 16-bit ops?
Hello
Thanks for looking into this.
I can't be very confident what the knock-on result of a change like that would be,
especially on architectures that are not Arm. What I can do, though, is run some
benchmarks and look at the results.
Using this patch:
--- a/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -150,6 +150,9 @@