Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Optimization feasibility"
2007 Dec 25
0
[LLVMdev] Optimization feasibility
Hi Jo,
On 2007-12-24, at 14:43, Joachim Durchholz wrote:
> I'm in a very preliminary phase of a language project which requires
> some specific optimizations to be reasonably efficient.
>
> LLVM already looks very good; I'd just like to know whether I can
> push these optimizations through LLVM to the JIT phase (which, as
> far as I understand the docs, is a
2007 Dec 25
3
[LLVMdev] Optimization feasibility
On 25 Dec 2007, at 03:29, Gordon Henriksen wrote:
> Hi Jo,
>
> On 2007-12-24, at 14:43, Joachim Durchholz wrote:
>
>> I'm in a very preliminary phase of a language project which requires
>> some specific optimizations to be reasonably efficient.
>>
>> LLVM already looks very good; I'd just like to know whether I can
>> push these optimizations
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hello,
While investigating one of the existing tests
(test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
interesting code. The IR is very straightforward:
define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
ret i32 %a3
}
define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
%tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32
2013 Feb 15
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
>> While investigating one of the existing tests
>> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
>> interesting code. The IR is very straightforward:
>>
>> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32
>> %a4) {
>> entry:
>> ret i32 %a3
>> }
>>
>> define fastcc i32 @tailcaller(i32
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hey Eli,
On Thu, Feb 14, 2013 at 5:45 PM, Eli Bendersky <eliben at google.com> wrote:
> Hello,
>
> While investigating one of the existing tests
> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
> interesting code. The IR is very straightforward:
>
> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32
> %a4) {
> entry:
>
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
When you enable -tailcallopt you get support for tail calls between functions with arbitrary stack space requirements. That means the calling convention has to change slightly. E.g the callee is responsible for removing it's arguments of the stack. The caller cannot transitively know the tail callee's tailcallee's requirement. Also care must be taken to make sure the stack stays
2013 Feb 15
1
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hi Arnold,
Thanks for the insights. My comments below:
On Thu, Feb 14, 2013 at 5:30 PM, Arnold Schwaighofer
<aschwaighofer at apple.com> wrote:
> When you enable -tailcallopt you get support for tail calls between functions with arbitrary stack space requirements. That means the calling convention has to change slightly. E.g the callee is responsible for removing it's arguments of
2008 Jan 02
0
[LLVMdev] Optimization feasibility
On Dec 25, 2007, at 9:07 AM, Arnold Schwaighofer wrote:
> On 25 Dec 2007, at 03:29, Gordon Henriksen wrote:
>
>> Hi Jo,
>>
>> On 2007-12-24, at 14:43, Joachim Durchholz wrote:
>>
>>> I'm in a very preliminary phase of a language project which requires
>>> some specific optimizations to be reasonably efficient.
>>>
>>> LLVM
2017 Jun 24
1
musttail & alwaysinline interaction
Consider this program:
@globalSideEffect = global i32 0
define void @tobeinlined() #0 {
entry:
store i32 3, i32* @globalSideEffect, align 4
musttail call fastcc void @tailcallee(i32 3)
ret void
}
define fastcc void @tailcallee(i32 %i) {
entry:
call void @tobeinlined()
ret void
}
attributes #0 = { alwaysinline }
Clearly, if this is processed with opt -alwaysinline, it will lead
2007 Aug 11
1
[LLVMdev] Tail call optimization deeds
Okay so i implemented an(other :) initial version for X86-32 backend,
this time based on TOT:
It is not very generic at the moment. Only functions with
callingconv::fastcc and the tail call attribute will be optimized.
Maybe the next step should be to integrate the code into the other
calling convention lowering. Here is what i have at the moment:
If callingconv::fastcc is used the
2007 Aug 09
4
[LLVMdev] Tail call optimization thoughts
Hello, Arnold.
Only quick comments, I'll try to make a full review a little bit later.
> 0.)a fast calling convention (maybe use the current
> CallingConv::Fast, or create a CallingConv::TailCall)
> 1.) lowering of formal arguments
> like for example x86_LowerCCCArguments in stdcall mode
> we need to make sure that later mentioned CALL_CLOBBERED_REG is
>
2007 Aug 13
0
[LLVMdev] Tail call optimization deeds
Hi Arnold and Anton,
Sorry I have been ignoring your emails on this topic. It's an
important task and I really need sometime to think about it (and talk
to Chris about it!) But this has been an especially hectic week. I am
also going to vacation soon so I am not sure when I would get around
to it.
If Chris has time, I am sure he has lots to say on this topic. :-)
Otherwise, please
2007 Mar 06
6
[LLVMdev] alloca & store generation
I am writing a transformation that needs to add a call to a function F()
at the beginning of main() with the addresses of argc and argv as
parameters to F(). However, the bytecode file I'm transforming has not
allocated space on the stack for argc and argv. So, I developed my
transformation to change main() from:
-----
int main(int %argc, sbyte** %argv){
entry:
...
// some use of
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
/* Loop over allpass sections */
for( i = 0; i < order; i++ ) {
/* Output of allpass section */
tmp2_QS = silk_SMLAWB(
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be
configured to 12, 14, 16, 20 and 24 according to complexity parameter.
It's hard to get a universal function to handle all these orders
efficiently. Any suggestions?
Thanks,
Linfeng
On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions. Will get back to you once we have some updates.
Linfeng
On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2007 Aug 09
1
[LLVMdev] Tail call optimization thoughts
Implementing tail call opt could look like the following:
0.)a fast calling convention (maybe use the current
CallingConv::Fast, or create a CallingConv::TailCall)
1.) lowering of formal arguments
like for example x86_LowerCCCArguments in stdcall mode
we need to make sure that later mentioned CALL_CLOBBERED_REG is
not used (remove it from available
registers in callingconvention for
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical as the
last patch). We have done the same internal testing as usual.
Also, attached 2 failed temporary versions which try to reduce code size
(just for code review reference purpose).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone,
I think I have found an gvn / alias analysis related bug, but before
opening an issue on the tracker I wanted to see if I am missing something.
I have the following testcase:
define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) {
> entry:
> ; Just some temporary storage
> %tmp.0 = alloca i32
> %tmp.1 = alloca i32
> %tmp.i =
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thank Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Comparing to master, this optimization patch speeds up fixed-point SILK
encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8%
Complexity 8: 5.5% Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,