Displaying 20 results from an estimated 37 matches for "alus".

2012 Oct 26
0
[LLVMdev] Data sharing between two ALUs and avoiding illegal copies
Hi, I'm working on support for the latest generation of AMD GPUs (Southern Islands) in the R600 backend, and I need some advice on how to handle interactions between two different ALUs. The processors on Southern Islands GPUs are grouped into compute units, which contain 1 Scalar ALU (sALU) and 64 Vector ALUs (vALU). The sALU is mainly responsible for flow control (implemented using predicates) and loading data from read-only memory. The vALU does most of the data processing a...
2017 Nov 01
5
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...16-31 registers. Motivation: -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider register size. See section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017. -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only 256-bits wide. 512-bit instructions using these ALUs must use both ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017. Implementation Plan: -Add prefer-avx256 and pref...
2006 Apr 21
2
Major internal changes, TI DSP build change
> The C5x and C6x output diverges in build 10143, which has log message "lpc > floor converted to fixed-point." Also, the measured SNR changed from 11.05 > in builds 9854-10141 to 9.22 and 9.24 in 10143. Actually, build 10143 introduced another bug, which was the reason for the 1.1.11.1 release. > There are just four lines in modes.c which declare the constant, and one
2006 Apr 22
2
Major internal changes, TI DSP build change
...s no way to judge the real quality. SNR, especially on a single sample, can be very misleading. Yet, could you just check that the DSP results match what you get on a PC? > >Does the C55 have a 32x16 multiplier or do you mean it handles my > >emulation of it well? > > It has two ALUs with 17x17 bit MACs, and it has an instruction that does > this: > ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem)))) > > I never quite understood this, so I went off and looked at the manuals. It > can multiply the low half in one cycle, then shift and add it to the hig...
2017 Nov 03
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...to see documented! > > There's also the issues discussed here < > http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) > related to warm-up time for the 256-bit execution pipeline, which is > another issue with using wide-vector ops. > > > -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture >> are only 256-bits wide. 512-bit instructions using these ALUs must use both >> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization >> Reference Manual published October 2017. >> > > >...
2015 Oct 19
4
Is there a way to determine what CPU resource is used by which instruction?
I'm trying to figure out if there is a way to determine what processor resource is used by which instruction during scheduling. This is purely for debugging purposes. Since I'm somewhat new to LLVM it is a bit difficult for me to figure this out. Initial idea was to insert comments in the generated assembly which would tell me what resource is used. MachineInstr has a uint8_t
2017 Nov 07
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...es discussed here <http://www.agner.org/ > >> optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time > >> for the 256-bit execution pipeline, which is another issue with using > >> wide-vector ops. > >> > >> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture > >>> are only 256-bits wide. 512-bit instructions using these ALUs must use both > >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization > >>> Reference Manual published October 2017...
2003 Sep 09
1
Checksum question
....ogg" I ripped the checksum function from libvorbis, and only modified it in that I hard-coded what I think is the comment header (as far as I can tell from the docs). However, it seems to generate an invalid checksum (according to ogginfo). Here is a URL to the two files, please help :) http://alus.mine.nu/ogg --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed....
2006 Apr 22
0
Major internal changes, TI DSP build change
...way to judge the real quality. >> The MIPs are not a problem for me, and the C55 does very well on 32x16 >> multiplies, so I have not played with PRECISION16 since last year. > >Does the C55 have a 32x16 multiplier or do you mean it handles my >emulation of it well? It has two ALUs with 17x17 bit MACs, and it has an instruction that does this: ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem)))) I never quite understood this, so I went off and looked at the manuals. It can multiply the low half in one cycle, then shift and add it to the high half in a second cycle...
2006 Apr 22
0
Major internal changes, TI DSP build change
...else wants to run the audio through the encoder and decoder at 8kbps, complexity 1. I might be able to get a coworker to do this, but not any time soon. >> >Does the C55 have a 32x16 multiplier or do you mean it handles my >> >emulation of it well? >> >> It has two ALUs with 17x17 bit MACs, and it has an instruction that does >> this: >> ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem)))) >> >> I never quite understood this, so I went off and looked at the manuals. >> It >> can multiply the low half in one cycle, then s...
2004 Oct 20
2
[LLVMdev] Re: LLVM Compiler Infrastructure Tutorial
...tion. However, when we are doing high-level synthesis (also called behavioral/architectural synthesis), the target architecture is also changing. That is, we need to do architecture exploration and the IR transformation simultaneously. For example, after a particular pass, we may need 4 ALUs to execute the program, under a certain latency constraint; then, after another optimization pass, we may end up with only 3 ALUs. The instruction-to-ALU binding will be different after this pass. The same thing could happen for register allocation and binding, and other synthesis passes. Also,...
2017 Nov 09
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...> >> optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time >> > >> for the 256-bit execution pipeline, which is another issue with using >> > >> wide-vector ops. >> > >> >> > >> >> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server >> microarchitecture >> > >>> are only 256-bits wide. 512-bit instructions using these ALUs must >> use both >> > >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures >> Optimization >> >...
2017 Nov 11
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...arm-up >>>> time >>>> > >> for the 256-bit execution pipeline, which is another issue with >>>> using >>>> > >> wide-vector ops. >>>> > >> >>>> > >> >>>> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server >>>> microarchitecture >>>> > >>> are only 256-bits wide. 512-bit instructions using these ALUs >>>> must use both >>>> > >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures >...
2017 Nov 12
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...>> > >> for the 256-bit execution pipeline, which is another issue with >>>>>> using >>>>>> > >> wide-vector ops. >>>>>> > >> >>>>>> > >> >>>>>> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server >>>>>> microarchitecture >>>>>> > >>> are only 256-bits wide. 512-bit instructions using these ALUs >>>>>> must use both >>>>>> > >>> ports. See section 2.1 of Inte...
2004 Oct 21
0
[LLVMdev] Re: LLVM Compiler Infrastructure Tutorial
...doing high-level synthesis (also called > behavioral/architectural synthesis), > the target architecture is also changing. That is, we need to do > architecture exploration > and the IR transformation simultaneously. For example, after a > particular pass, we may need 4 > ALUs to execute the program, under a certain latency constraint; > then, after another optimization pass, > we may end up with only 3 ALUs. The instruction-to-ALU binding will > be different after this pass. > The same thing could happen for register allocation and binding, and > othe...
2019 Oct 01
2
Adding support for vscale
On Tue, Oct 1, 2019 at 11:08 AM Graham Hunter <Graham.Hunter at arm.com> wrote: > Hi Luke, hi graham, thanks for responding in such an informative fashion. > > On 1 Oct 2019, at 09:21, Luke Kenneth Casson Leighton via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > typedef vec4 float[4]; // SEW=32,LMUL=4 probably > > static vec4 globalvec[1024]; // vscale ==
2017 Nov 13
3
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...for the 256-bit execution pipeline, which is another issue with >>>>>>> using >>>>>>> > >> wide-vector ops. >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server >>>>>>> microarchitecture >>>>>>> > >>> are only 256-bits wide. 512-bit instructions using these ALUs >>>>>>> must use both >>>>>>> > >>> ports. See sec...
2004 Oct 20
0
[LLVMdev] Re: LLVM Compiler Infrastructure Tutorial
Yiping, Could you describe in a little more detail what your goals are? I agree with Reid and Misha that modifying the instruction definition is usually not advisable but to suggest alternatives, we would need to know more. Also, for some projects it could make sense to change the instruction set. --Vikram http://www.cs.uiuc.edu/~vadve http://llvm.cs.uiuc.edu/ On Oct 20, 2004, at 2:41 PM,
2013 Jun 24
2
[LLVMdev] Register Class assignment for integer and pointer types
...all standard integer operations). Since tablegen matches patterns based on types, it is impossible for us to select from both instruction sets, and it sounds like you will have the same problem if you treat pointers as integer types. To make matters worse, you can only copy registers between the ALUs in one direction. SALU to VALU is legal, but VALU to SALU isn't. Currently we do post processing on the DAG after instruction selection to help avoid these illegal copies. I've considered also adding post processing to help select the optimal instruction type (VALU or SALU), but I think s...
2017 Nov 13
2
RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
...ith >> using >> > >> wide-vector ops. >> > >> >> > >> >> > >> -The vector ALUs on ports 0 and >> 1 of the Skylake Server microarchitecture >> > >>> are only 256-bits wide. 512-bit >> instructions using these ALUs must >>...