search for: superscalar

Displaying 20 results from an estimated 33 matches for "superscalar".

2016 Apr 20
2
How to get started with instruction scheduling? Advice needed.
...at 12:29 PM, Sergei Larin <slarin at codeaurora.org> wrote: > Target does make a difference. VLIW needs more hand-holding. For what you > are describing it should be fairly simple. > > > > Best strategy – see what other targets do. ARM might be a good start for > generic superscalar. Hexagon for VLIW style scheduling. > > > > Depending on what you decide, you might need different target hooks. > > > > Sergei > > > > --- > > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation > >...
2005 Jun 21
9
[OT] Memory Models and Multi/Virtual-Cores -- WAS: 4.0 -> 4.1 update failing
...But "Long Mode" was designed so its PAE52 model could run both 32-bit (and PAE36) as well as new 48-bit programs. We'll revisit that in a bit. Now, let's talk about Intel/AMD design lineage. - Intel IA-32 Complete Design Lineage IA-32 Gen 1 (1986): i386, including i486 - Non-superscalar: ALU + optional FPU (std. in 486DX), TLB added in i486 IA-32 Gen 2 (1992): i586, Pentium/MMX (defunct, redesigned in i686) - Superscalar 2+1 ALU+FPU (pipelined) IA-32 Gen 3 (1994): i686, Pentium Pro, II, III, 4 (partial refit) - Superscalar: 2+2 ALU+FPU (pipelined), FPU 1 complex or 2 ADD - P3...
2016 Apr 26
3
How to get started with instruction scheduling? Advice needed.
...2:29 PM, Sergei Larin <slarin at codeaurora.org> wrote: Target does make a difference. VLIW needs more hand-holding. For what you are describing it should be fairly simple. Best strategy – see what other targets do. ARM might be a good start for generic superscalar. Hexagon for VLIW style scheduling. Depending on what you decide, you might need different target hooks. Sergei --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org...
2005 Jun 20
0
Re: i486 and i686 are the majority ISAs for x86 -- WAS: CentOS 4.0 -> 4.1 update failing
...t should _never_ be used on anything else -- and that includes the Pentium Pro onward, which are i686 ISA. The i686 ISA and designs fix a massive number of serious design errors made both in the Pentium ALU itself and in considerations in its i586 ISA. To be fair to Intel, it was their first superscalar design. But at the same time, NexGen was able to pull off an even better superscalar ALU, and much of Intel's "design assistance" came indirectly from briefly joining Digital on the Alpha chip. i486 ISA Compatible Architectures - AMD: 486, 586, K5 (both original and Nx586+FPU), SC...
2019 May 03
3
Llvm-mca library.
...a llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I read that out-of-order cores are supported. How about in-order cores? > Would it be easy/difficult to add support for that? > > Cheers, > Sjoerd. > > I don't think that it would be difficult to support in-order superscalar cores. However, it would require a different llvm-mca pipeline of stages. That is because some stages (and simulated hardware components) work under the assumption that the processor is out-of-order (example: the dispatch stage and the retire stage). That being said, it would be a bit more complica...
2012 Aug 17
1
[LLVMdev] Portable OpenCL (pocl) v0.6 released
...ptimizations. At the core of pocl is a set of LLVM passes used to statically parallelize multiple work-items with the kernel compiler, even in the presence of work-group barriers. This enables parallelization of the fine-grained static concurrency in the work groups in multiple ways (SIMD, VLIW, superscalar,...). The code base is modularized to allow easy adding of new "device drivers" in the host-device layer. A generic multithreaded "target driver" is included. It allows running OpenCL applications on a host that supports the pthread library with multithreading at the work gro...
2013 Nov 19
0
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...s is only a real problem on strongly ordered architectures (e.g. x86), but given the relative cost of a cache shootdown and everything else in this test case (with the exception of the thread creation), I wouldn't be surprised if it ended up slowing things down. Especially given that a modern superscalar CPU will speculatively execute the load ANYWAY if it can do so from cache, and if it can't then the performance improvement from doing it before the branch will likely be negligible. For single-core, in-order, single-issue architectures, or multicore, weakly ordered, in-order, single-issue arc...
2005 Jun 20
8
CentOS 4.0 -> 4.1 update failing
I've updated CentOS 4.0 to 4.1 on several machines (some desktops, some servers). However, on my laptop the update is failing with the following error just after the headers are downloaded: --> Running transaction check --> Processing Dependency: glibc-common = 2.3.4-2 for package: glibc --> Finished Dependency Resolution Error: Missing Dependency: glibc-common = 2.3.4-2 is needed by package
2013 Nov 20
3
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...blem on strongly ordered architectures > (e.g. x86), but given the relative cost of a cache shootdown and everything > else in this test case (with the exception of the thread creation), I > wouldn't be surprised if it ended up slowing things down. Especially given > that a modern superscalar CPU will speculatively execute the load ANYWAY if > it can do so from cache, and if it can't then the performance improvement > from doing it before the branch will likely be negligible. > > For single-core, in-order, single-issue architectures, or multicore, > weakly ordered, in...
2013 Nov 21
0
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...m on > strongly ordered architectures (e.g. x86), but given the relative > cost of a cache shootdown and everything else in this test case > (with the exception of the thread creation), I wouldn't be surprised > if it ended up slowing things down. Especially given that a modern > superscalar CPU will speculatively execute the load ANYWAY if it can > do so from cache, and if it can't then the performance improvement > from doing it before the branch will likely be negligible. > > For single-core, in-order, single-issue architectures, or multicore, > weakly ordered, i...
2009 Aug 03
0
[LLVMdev] LLVM performance tuning for target machines
...mental MIPS backend and using it as a model. One thing I am currently not sure about, however, is instruction scheduling. Does LLVM have a pass that copes with instruction dependencies and will reorder instructions to minimize latencies (and, given a model of the CPU, try to find a good ordering for superscalar CPUs)? Is there an example of how this sort of thing is done? Thanks in advance.
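The question above is about latency-driven reordering for superscalar CPUs, and the core technique behind such a pass is list scheduling over a dependence DAG. Below is a minimal, self-contained C++ sketch of that idea, offered only as an illustration: it is not LLVM code, and the instruction names, latencies, and the two-wide issue width are made-up assumptions.

#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Toy latency-driven list scheduler; illustrative only, not LLVM code.
struct Instr {
    std::string text;          // assembly-like text, purely for printing
    int latency;               // cycles until this instruction's result is ready
    std::vector<int> deps;     // indices of instructions this one depends on
};

// Greedy list scheduler for an in-order machine that can issue `issueWidth`
// independent instructions per cycle. Returns (cycle, instruction index) pairs.
std::vector<std::pair<int, int>> listSchedule(const std::vector<Instr>& prog,
                                              int issueWidth) {
    const int n = static_cast<int>(prog.size());
    std::vector<int> remainingDeps(n, 0);
    std::vector<std::vector<int>> users(n);
    for (int i = 0; i < n; ++i) {
        remainingDeps[i] = static_cast<int>(prog[i].deps.size());
        for (int d : prog[i].deps) users[d].push_back(i);
    }

    std::vector<int> readyAt(n, 0);   // earliest cycle each instruction may issue
    std::vector<bool> issued(n, false);
    std::vector<std::pair<int, int>> schedule;
    int done = 0;
    for (int cycle = 0; done < n; ++cycle) {
        int slots = issueWidth;
        for (int i = 0; i < n && slots > 0; ++i) {
            if (issued[i] || remainingDeps[i] != 0 || readyAt[i] > cycle)
                continue;
            issued[i] = true;
            ++done;
            --slots;
            schedule.push_back({cycle, i});
            // Successors become eligible once all producers have issued, but
            // may not start before this result's latency has elapsed.
            for (int u : users[i]) {
                --remainingDeps[u];
                if (readyAt[u] < cycle + prog[i].latency)
                    readyAt[u] = cycle + prog[i].latency;
            }
        }
    }
    return schedule;
}

int main() {
    // A load/add/store chain plus independent work that can hide the load latency.
    std::vector<Instr> prog = {
        {"load  r1, [a]",    3, {}},
        {"add   r2, r1, #1", 1, {0}},
        {"store [b], r2",    1, {1}},
        {"mul   r3, r4, r5", 4, {}},
    };
    for (const auto& [cycle, idx] : listSchedule(prog, /*issueWidth=*/2))
        std::printf("cycle %2d: %s\n", cycle, prog[idx].text.c_str());
    return 0;
}

A production scheduler replaces the naive "first ready instruction in program order" choice with heuristics such as critical-path height and register pressure, and takes its latencies from per-target machine descriptions, but the dependence-DAG-plus-ready-list structure is the same.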
2013 Jan 09
0
[LLVMdev] Portable OpenCL (pocl) v0.7 released
...ptimizations. At the core of pocl is the kernel compiler that consists of a set of LLVM passes used to statically generate multi-work-item work-group functions from kernels, even in the presence of work-group barriers. These functions are suitable for parallelization in multiple ways (SIMD, VLIW, superscalar,...). This release adds support for LLVM 3.2, generates the work-group functions using simple (parallel) loop structures, and includes fixes to make pocl work on ppc32, ppc64 and armv7. Initial Cell SPU support has also been added (very experimental!) to this release as an example of a heterogeneous pocl...
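The "simple (parallel) loop structures" mentioned above amount to turning a per-work-item kernel into a single work-group function that loops over the local IDs. The sketch below is only a conceptual illustration of that transformation, written as plain C++ rather than anything pocl actually generates; the kernel, the names, and the one-dimensional range are assumptions.

#include <cstddef>
#include <cstdio>

// Conceptual sketch only; not code generated by pocl.
// Per-work-item body, corresponding to an OpenCL kernel such as:
//   __kernel void vec_add(__global const float* a, __global const float* b,
//                         __global float* c)
//   { size_t gid = get_global_id(0); c[gid] = a[gid] + b[gid]; }
static inline void vec_add_work_item(const float* a, const float* b, float* c,
                                     std::size_t gid) {
    c[gid] = a[gid] + b[gid];
}

// Work-group function: one call executes every work-item of a group as a plain
// loop, which the compiler is then free to map onto SIMD lanes, VLIW slots, or
// superscalar issue. (Work-group barriers require splitting this loop at each
// barrier point, which is the harder part the kernel compiler handles.)
void vec_add_work_group(const float* a, const float* b, float* c,
                        std::size_t group_id, std::size_t local_size) {
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        std::size_t gid = group_id * local_size + lid;  // global work-item ID
        vec_add_work_item(a, b, c, gid);
    }
}

int main() {
    float a[8], b[8], c[8];
    for (std::size_t i = 0; i < 8; ++i) { a[i] = float(i); b[i] = 2.0f * float(i); }
    // Two work groups of four work-items each cover the eight-element range.
    vec_add_work_group(a, b, c, 0, 4);
    vec_add_work_group(a, b, c, 1, 4);
    for (std::size_t i = 0; i < 8; ++i) std::printf("%g ", c[i]);
    std::printf("\n");
    return 0;
}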
2013 Aug 12
0
[LLVMdev] Portable Computing Language (pocl) v0.8 released
...ons. At the core of pocl is the kernel compiler that consists of a set of LLVM passes used to statically transform kernels into work-group functions with multiple work-items, even in the presence of work-group barriers. These functions are suitable for parallelization in multiple ways (SIMD, VLIW, superscalar,...). This release adds support for LLVM/Clang 3.3, employs inner loop parallelization in the kernel compiler, uses Vecmathlib for inlineable efficient math library implementations, contains plenty of bug fixes, and provides several new OpenCL API implementations. We consider pocl ready for wider...
2003 Nov 03
1
fairly OT: profiling
The following is from Eric Raymond's new book on Unix programming. You'll get more insight from using profilers if you think of them less as ways to collect individual performance numbers, and more as ways to learn how performance varies as a function of interesting parameters ... Try fitting those numbers to a model, using open-source software like R or a good-quality
2018 Jan 11
0
How to get started with instruction scheduling? Advice needed.
...ar * Writing Great Machine Schedulers[4] by Javed Absar and Florian Hahn Hi Alex, please guide me in implementing a machine scheduling model for at least one core (e.g. Rocket, PULP)[5]. Rocket - RV64G - "in-order", single-issue application core; BOOM - RV64G - "out-of-order", superscalar application core[6]. So what about PULP? Is it in-order or out-of-order? Hi LLVM developers, you are welcome to review our work on porting GlobalISel to RISCV[7] and give us some suggestions, thanks a lot! [1] https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools [2] https://l...
2005 Jul 30
2
Big thanks for supporting i586 type machines.
While I know that, technically, the only i586 machines are the Pentium and Pentium MMX, it is still nice that I can use some headless AMD K6/2 machines I have lying around for CentOS 4. Many thanks for the effort expended to get that working. -- Lamar Owen Director of Information Technology Pisgah Astronomical Research Institute 1 PARI Drive Rosman, NC 28772 (828)862-5554 www.pari.edu
2010 Jun 04
0
[LLVMdev] Speculative phi elimination at the top of a loop?
Hi, On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander <pekka.nikander at nomadiclab.com> wrote: > Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp? Instead, a better alternative is to write another pass similar to LoopUnrollPass.cpp (say, LoopPeelPass.cpp) and add a new option -loop-peel. The new pass could use llvm::UnrollLoop()
2005 Jun 18
2
SiL311x SataRaid (sata_sil)
Hi, On my x86_64 system I have a SiL311x controller that can do RAID. If I configure my 2 identical disks in a RAID1 setup, I would expect to see only 1 block device on Linux. Still, I see 2 block devices. Is this intentional, and if so, isn't that dangerous? (i.e. writing to both disks at the same time) Anyone with an insight, please explain :) -- dag wieers, dag at wieers.com,
2009 Nov 11
0
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
On Nov 11, 2009, at 3:27 AM, Rodolph Perfetta wrote: > > If you know about the alignment, maybe use structured load/store > (vst1.64/vld1.64 {dn-dm}). You may also want to work on whole cache > lines > (64 bytes on A8). You can find more in this discussion: > http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc >
2013 Sep 24
0
[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)
..., but given that we haven't demonstrated the value of simple heuristics, I don't want to pursue anything more complicated. I think better solutions will have to transcend list scheduling. I do like the idea of constraining the DAG prior to scheduling [Touati, "Register Saturation in Superscalar and VLIW Codes", CC 2001], because that entirely separates the problem from list scheduler heuristics. However, I won't be able to justify adding more complexity, beyond list scheduling heuristics, to the LLVM codebase to solve this problem. Work in this area would need to be done as side...