Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Weird volatile propagation ?"
2013 Jan 20
0
[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
As a results of my investigations, the thread is also added to cfe-dev.
The context : while porting my company code from the LLVM/Clang releases
3.1 to 3.2, I stumbled on a code size and performance regression. The
testcase is :
$ cat test.c
#include <stdint.h>
struct R {
uint16_t a;
uint16_t b;
};
volatile struct R * const addr = (volatile struct R *) 416;
void test(uint16_t a)
{
2013 Jan 20
2
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this
seems like an easy and simple deficiency in the IR intrinsic for memcpy.
See below.
On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison <
arnaud.allarddegrandmaison at parrot.com> wrote:
> define void @test(i16 zeroext %a) nounwind uwtable {
> %r.sroa.0 = alloca i16, align 2
> %r.sroa.1 = alloca i16,
2013 Jan 28
4
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Hi All,
In the language reference manual, the access behavior of the memcpy,
memmove and memset intrinsics is not well defined with respect to the
volatile flag. The LRM even states that "it is unwise to depend on it".
This forces optimization passes to be conservatively correct and prevent
optimizations.
A very simple example of this is :
$ cat test.c
#include <stdint.h>
2013 Jan 29
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
I can't think of a better way to do this, so I think it's ok.
I also submitted a complementary patch on llvm-commits clarifying volatile semantics.
-Andy
On Jan 28, 2013, at 8:54 AM, Arnaud A. de Grandmaison <arnaud.allarddegrandmaison at parrot.com> wrote:
> Hi All,
>
> In the language reference manual, the access behavior of the memcpy,
> memmove and memset
2013 Jan 21
0
[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)
On 01/20/2013 10:56 PM, Chandler Carruth wrote:
> I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this
> seems like an easy and simple deficiency in the IR intrinsic for
> memcpy. See below.
>
> On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison
> <arnaud.allarddegrandmaison at parrot.com
> <mailto:arnaud.allarddegrandmaison at
2013 Jan 31
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Thanks Andy and Chandler,
After specifying the volatile access behaviour, the second step was to
autoupgrade the memmove/memcpy intrinsics, and implement
(is|set)Volatile in terms of (is|set)(Src|Dest)Volatile, with no
functional change.
0001-Specify-the-access-behaviour-of-the-memcpy-memmove-a.patch is the
one you already reviewed, unaltered.
2013 Feb 03
0
[LLVMdev] Specify the volatile access behaviour of the memcpy, memmove and memset intrinsics
Same patches as before, but 0002-memcpy has been updated to put the
(is|set)SrcVolatile methods to where they logically belong :
MemTransferInst. This makes (is|set)Volatile methods look a bit ugly to
keep compatibility with existing behaviour, but they will hopefully
disappear when all users have moved to the new interface --- in the next
series of patches.
I plan to give a try to phabricator
2013 Dec 16
2
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
At 2013-12-15 22:43:34,"Caldarale, Charles R" <Chuck.Caldarale at unisys.com> wrote:
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Haishan
>> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
>
>> My clang version is 3.3 and debug build.
>
>> //test.c
>> int a[6] = {1, 2, 3, 4, 5,
2013 Dec 21
0
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
The flag -enable-aa-sched-mi should do what you want you want in the MachineScheduler pass.
If you want to do it in the selection DAG, there is a subtarget hook that might do it:
TargetSubtargetInfo::useAA()
LLVM won’t generate the schedule you want anyway for Intel core processors, but the alias analysis can be useful in general.
-Andy
On Dec 16, 2013, at 6:03 AM, Haishan <hndxvon at
2013 Dec 15
3
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
Hi,
I compile a case (test.c) to get object machine file (test.o) using clang as follows:
"clang -target arm -integrated-as -c test.c -o test.o"
My clang version is 3.3 and debug build.
//test.c
int a[6] = {1, 2, 3, 4, 5, 6}
int main() {
a[0] = a[5];
a[1] = a[4];
a[2] = a[5];
}
//end test.c
Then test.dump is generated by using the objdump tool.
//test.dump
ldr r1, [r0, #20]
2017 Mar 09
3
[RFC] bitfield access shrinking
On 03/09/2017 12:28 PM, Krzysztof Parzyszek via llvm-dev wrote:
> We could add intrinsics to extract/insert a bitfield, which would
> simplify a lot of that bitwise logic.
But then you need to teach a bunch of places about how to simply them,
fold using bitwise logic and other things that reduce demanded bits into
them, etc. This seems like a difficult tradeoff.
-Hal
>
>
2013 Nov 24
1
[LLVMdev] wrong code generation for memcpy function in SROA optimization pass
SROA optimization pass did some optimizations and transforms for memcpy function,such as ld/st operations.When someone has written down code like size>sizeof(dest) in memcpy(*dest,*src,size),
there was much likely a wrong code generation.for example,considered as such testcase:
int main()
{
char ch;
short sh = 0x1234;
memcpy(&ch,&sh,2);
printf("ch=0x%02x\n",ch);
}
At
2016 Dec 30
3
Avoiding during my pass the optimization (copy propagation) of my LLVM IR code (at generation)
Hello.
I'm writing an LLVM pass that is working on LLVM IR.
To my surprise the following LLVM pass code generates optimized code - it does copy
propagation on it.
Value *vecShuffleOnePtr = Builder.CreateGEP(ptr_B, vecShuffleOne, "VectorGep");
...
packed_gather_params.push_back(vecShuffleOnePtr);
CallInst *callGather =
2016 Jul 04
2
Optimization issues (Alias Analysis?)
Hey,
I am currently working on a VM which is based on LLVM and I would like to
use its optimizer, but it somehow it can't detect something very simple (I
guess.)
This is the LLVM IR:
target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"
%struct.regs = type { i32, i32, i32 }
define void @Test(%struct.regs* noalias
2020 Aug 14
6
Intel AMX programming model discussion.
Hi,
Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Capability of Intel AMX implementation is enumerated by palettes. Two palettes are supported: palette 0 represents the initialized state and
2020 Aug 14
3
Intel AMX programming model discussion.
[Yuanke] AMX register is special. It needs to be configured before use and the config instruction is expensive. To avoid unnecessary tile configure, we collect the tile shape information as much as possible and combine them into one ldtilecfg instruction. The ldtilecfg instruction should dominate any AMX instruction that access tile register. On the other side, the ldtilecfg should post-dominated
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of 3 arrays a unique TBAA node. But it does
not seem to help: alias analysis still considers arrays as may-alias, which
most likely prevents the desired optimization. Below is the sample code
with TBAA metadata inserted. Could you please suggest what might be wrong
with it?
Many thanks,
- D.
marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
2013 Dec 15
0
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Haishan
> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
> My clang version is 3.3 and debug build.
> //test.c
> int a[6] = {1, 2, 3, 4, 5, 6}
> int main() {
> a[0] = a[5];
> a[1] = a[4];
> a[2] = a[5];
> }
> //end test.c
> Then test.dump is
2020 Aug 18
2
Intel AMX programming model discussion.
The AMX registers are complicated. The single configuration register (which is mostly used implicitly, similar to MXCSR for floating point) controls the shape of all the tile registers, and if you change the tile configuration every single tile register is cleared. In practice, if we have to change the the configuration while any of the tile registers are live, performance is going to be terrible.
2020 Jul 02
3
Redundant ptrtoint/inttoptr instructions
Hi all,
We noticed a lot of unnecessary ptrtoint instructions that stand in way of
some of our optimizations; the code pattern looks like this:
bb1:
%int1 = ptrtoint %struct.s* %ptr1 to i64
bb2:
%int2 = ptrtoint %struct.s* %ptr2 to i64
%bb3:
%phi.node = phi i64 [ %int1, %bb1 ], [%int2, %bb2 ]
%ptr = inttoptr i64 %phi.node to %struct.s*
In short, the pattern above arises due to:
1.