thr3ads.net - search: "_mm_cvtsi32

[LLVMdev] Optimized code analysis problems

2009 Jan 31

2

...roblem I am encountering is that I need to do some matching based on the analysis, and I would like to extract the names of the functions as is rather than as LLVM intrinsics with an llvm.* representation. Essentially I would like to extract the control flow graph representation with function names (e.g. _mm_cvtsi32_si128) instead of the functions being replaced by 'llvm.*'. Is there any way to extract these names directly as function calls? I have included the sample code below. I get the function call names as llvm.x86 something instead of getting function names (e.g. _mm_cvtsi32_si128) #include <pmmin...

[LLVMdev] Optimized code analysis problems

2009 Jan 31

0

On Fri, Jan 30, 2009 at 7:10 PM, Nipun Arora <nipun2512 at gmail.com> wrote:
> Essentially I would like to extract the control flow graph representation with function names (eg. _mm_cvtsi32_si128) instead of the functions being replaced by 'llvm.*'
> Is there anyway to extract these names directly as function calls?
The names disappear in an unrecoverable way once the first inlining pass runs to take care of always_inline. You might be able to hack the code to sneak in bef...

[LLVMdev] Optimized code analysis problems

2009 Jan 31

2

...pun
On Fri, Jan 30, 2009 at 10:39 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Jan 30, 2009 at 7:10 PM, Nipun Arora <nipun2512 at gmail.com> wrote:
> > Essentially I would like to extract the control flow graph representation with function names (eg. _mm_cvtsi32_si128) instead of the functions being replaced by 'llvm.*'
> > Is there anyway to extract these names directly as function calls?
>
> The names disappear in an unrecoverable way once the first inlining pass runs to take care of always_inline. You might be able to...

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

1

...rt enough to eliminate the extra MOVD instruction.
+ For _mm_cvtepi16_epi32, it does the right thing, though does *not* optimize out
+ the extra MOVQ if it's specified explicitly */
+
 # if defined(__clang__) || !defined(__OPTIMIZE__)
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
-
-# define OP_CVTEPI16_EPI32_M64(x) \
- (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
 # else
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(*(__m128i *)(x)))
+#endif
+# if !defined(__OPTIMIZE__)
+# define OP_CVTEPI16_EPI32_M64(x) \
+ (_mm_cvtepi16_epi32(_mm_l...

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12

1

...rt enough to eliminate the extra MOVD instruction.
+ For _mm_cvtepi16_epi32, it does the right thing, though does *not* optimize out
+ the extra MOVQ if it's specified explicitly */
+
 # if defined(__clang__) || !defined(__OPTIMIZE__)
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
-
-# define OP_CVTEPI16_EPI32_M64(x) \
- (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
 # else
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(*(__m128i *)(x)))
+#endif
+# if !defined(__OPTIMIZE__)
+# define OP_CVTEPI16_EPI32_M64(x) \
+ (_mm_cvtepi16_epi32(_mm_l...

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 02

13

The attached patch cleans up Opus's x86 intrinsics configury. It:
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in

[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10

2015 Mar 18

5

Hi All, Since I continue to base my work on top of Jonathan's patch, and my previous Ne10 fft/ifft/mdct_forward/backward patches, I thought it would be better to just post all new patches as a patch series. Please let me know if anyone disagrees with this approach. You can see wip branch of all latest patches at https://git.linaro.org/people/viswanath.puttagunta/opus.git Branch:

[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series

2015 Mar 31

6

Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that does intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2]. For reference, you can use the Ne10 pre-built libraries at [3]. Note that I am working with Phil at ARM to get my patch at [4]

[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]

2015 May 08

8

Hi All, As per Timothy's suggestion, disabling mdct_forward for fixed point. This only affects the patch "armv7,armv8: Extend fixed fft NE10 optimizations to mdct". The rest of the patches are the same as in [1]. For reference, the latest wip code for opus is at [2]. Still working with the NE10 team at ARM to get corner cases of mdct_forward. Will update with another patch when the issue in NE10 gets fixed. Regards, Vish [1]:

[RFC V3 0/8] Ne10 fft fixed and previous

2015 May 15

11

Hi All,
Changes from RFC v2 [1]
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled using fixed fft for mdct_forward which was disabled in RFCv2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build fixes on iOS and some copy-paste errors
Rest

[RFC PATCH v1 0/8] Ne10 fft fixed and previous

2015 Apr 28

10

Hello Timothy / Jean-Marc / opus-dev, This patch series is a follow-up to the work I posted at [1]. In addition to what was posted at [1], this patch series mainly integrates the fixed-point FFT implementations in the NE10 library into opus. You can view my opus wip code at [2]. Note that while I found some issues both with the NE10 library (fixed fft) and with the Linaro toolchain (armv8 intrinsics), the work
