Displaying 20 results from an estimated 33 matches for "superscalar".
2016 Apr 20
2
How to get started with instruction scheduling? Advice needed.
...at 12:29 PM, Sergei Larin <slarin at codeaurora.org>
wrote:
> Target does make a difference. VLIW needs more hand-holding. For what you
> are describing it should be fairly simple.
>
>
>
> Best strategy – see what other targets do. ARM might be a good start for
> generic superscalar. Hexagon for VLIW style scheduling.
>
>
>
> Depending on what you decide, you might need different target hooks.
>
>
>
> Sergei
>
>
>
> ---
>
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
>...
2005 Jun 21
9
[OT] Memory Models and Multi/Virtual-Cores -- WAS: 4.0 -> 4.1 update failing
...But "Long Mode" was designed
so its PAE52 model could run both 32-bit (and PAE36) as well as new
48-bit programs.
We'll revisit that in a bit. Now, let's talk about Intel/AMD design
lineage.
- Intel IA-32 Complete Design Lineage
IA-32 Gen 1 (1986): i386, including i486
- Non-superscalar: ALU + optional FPU (std. in 486DX), TLB added in i486
IA-32 Gen 2 (1992): i586, Pentium/MMX (defunct, redesigned in i686)
- Superscalar 2+1 ALU+FPU (pipelined)
IA-32 Gen 3 (1994): i686, Pentium Pro, II, III, 4 (partial refit)
- Superscalar: 2+2 ALU+FPU (pipelined), FPU 1 complex or 2 ADD
- P3...
2016 Apr 26
3
How to get started with instruction scheduling? Advice needed.
...2:29 PM, Sergei Larin <slarin at codeaurora.org> wrote:
Target does make a difference. VLIW needs more hand-holding. For what you are describing it should be fairly simple.
Best strategy – see what other targets do. ARM might be a good start for generic superscalar. Hexagon for VLIW style scheduling.
Depending on what you decide, you might need different target hooks.
Sergei
---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org]...
2005 Jun 20
0
Re: i486 and i686 are the majority ISAs for x86 -- WAS: CentOS 4.0 -> 4.1 update failing
...t should
_never_ be used on anything else -- and that includes the Pentium
Pro on-ward, which are i686 ISA.
The i686 ISA and designs fix a massive amount of serious design
errors made in both the Pentium ALU itself as well as considerations
in its i586 ISA. To be fair to Intel, it was their first superscalar
design. But at the same time, NexGen was able to pull off an even
better superscalar ALU, and much of Intel's "design assistance"
came indirectly from joining Digital on the Alpha chip briefly.
i486 ISA Compatible Architectures
- AMD: 486, 586, K5 (both original and Nx586+FPU), SC...
2019 May 03
3
Llvm-mca library.
...a llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> I read that out-of-order cores are supported. How about in-order cores?
> Would it be easy/difficult to add support for that?
>
>
Cheers,
> Sjoerd.
>
>
I don't think that it would be difficult to support in-order superscalar
cores.
However, it would require a different llvm-mca pipeline of stages. That is
because some stages (and simulated hardware components) work under the
assumption that the processor is out-of-order (example: the dispatch stage
and the retire stage).
That being said, it would be a bit more complica...
2012 Aug 17
1
[LLVMdev] Portable OpenCL (pocl) v0.6 released
...ptimizations. At the core of pocl is a set of LLVM passes
used to statically parallelize multiple work-items with the kernel
compiler, even in the presence of work-group barriers. This enables
parallelization of the fine-grained static concurrency in the work
groups in multiple ways (SIMD, VLIW, superscalar,...).
The code base is modularized to allow easy adding of new "device drivers"
in the host-device layer. A generic multithreaded "target driver" is
included. It allows running OpenCL applications on a host that supports
the pthread library with multithreading at the work gro...
2013 Nov 19
0
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...s is only a real problem on strongly ordered architectures (e.g. x86), but given the relative cost of a cache shootdown and everything else in this test case (with the exception of the thread creation), I wouldn't be surprised if it ended up slowing things down. Especially given that a modern superscalar CPU will speculatively execute the load ANYWAY if it can do so from cache, and if it can't then the performance improvement from doing it before the branch will likely be negligible.
For single-core, in-order, single-issue architectures, or multicore, weakly ordered, in-order, single-issue arc...
2005 Jun 20
8
CentOS 4.0 -> 4.1 update failing
I've updated CentOS 4.0 to 4.1 on several machines (some desktops, some
servers). However on my laptop, update is failing with following error
just after headers are downloaded:
--> Running transaction check
--> Processing Dependency: glibc-common = 2.3.4-2 for package: glibc
--> Finished Dependency Resolution
Error: Missing Dependency: glibc-common = 2.3.4-2 is needed by package
2013 Nov 20
3
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...blem on strongly ordered architectures
> (e.g. x86), but given the relative cost of a cache shootdown and everything
> else in this test case (with the exception of the thread creation), I
> wouldn't be surprised if it ended up slowing things down. Especially given
> that a modern superscalar CPU will speculatively execute the load ANYWAY if
> it can do so from cache, and if it can't then the performance improvement
> from doing it before the branch will likely be negligible.
>
> For single-core, in-order, single-issue architectures, or multicore,
> weakly ordered, in...
2013 Nov 21
0
[LLVMdev] Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)
...m on
> strongly ordered architectures (e.g. x86), but given the relative
> cost of a cache shootdown and everything else in this test case
> (with the exception of the thread creation), I wouldn't be surprised
> if it ended up slowing things down. Especially given that a modern
> superscalar CPU will speculatively execute the load ANYWAY if it can
> do so from cache, and if it can't then the performance improvement
> from doing it before the branch will likely be negligible.
>
> For single-core, in-order, single-issue architectures, or multicore,
> weakly ordered, i...
2009 Aug 03
0
[LLVMdev] LLVM performance tuning for target machines
...mental MIPS backend and using it as a model. One thing I am currently however not sure about is instruction scheduling. Does LLVM have a pass which copes with instruction dependencies and will reorder instructions to minimize latencies (and, given a model of the CPU, try to find a good ordering for superscalar CPUs)? Is there an example of how this sort of thing is done?
Thanks in advance.
2013 Jan 09
0
[LLVMdev] Portable OpenCL (pocl) v0.7 released
...ptimizations.
At the core of pocl is the kernel compiler that consists of a
set of LLVM passes used to statically generate multiple work-item
work group functions of kernels, even in the presence of work-group
barriers. These functions are suitable for parallelization in multiple
ways (SIMD, VLIW, superscalar,...).
This release adds support for LLVM 3.2, generating the work group
functions using simple (parallel) loop structures, fixes to make the
pocl work on ppc32, ppc64 and armv7. Initial Cell SPU support has also
been added (very experimental!) to this release as an example of
a heterogeneous pocl...
2013 Aug 12
0
[LLVMdev] Portable Computing Language (pocl) v0.8 released
...ons.
At the core of pocl is the kernel compiler that consists of a set of LLVM passes used to statically transform kernels into work-group functions with multiple work-items, even in the presence of work-group barriers. These functions are suitable for parallelization in multiple ways (SIMD, VLIW, superscalar,...).
This release adds support for LLVM/Clang 3.3, employs inner loop parallelization in the kernel compiler, uses Vecmathlib for inlineable efficient math library implementations, contains plenty of bug fixes, and provides several new OpenCL API implementations.
We consider pocl ready for wider...
2003 Nov 03
1
fairly OT: profiling
The following is from Eric Raymond's new book on Unix programming.
You'll get more insight from using profilers if you think of them less
as ways to collect individual performance numbers, and more as ways to
learn how performance varies as a function of interesting parameters
... Try fitting those numbers to a model, using open-source software
like R or a good-quality
2018 Jan 11
0
How to get started with instruction scheduling? Advice needed.
...ar
* Writing Great Machine Schedulers[4] by Javed Absar and Florian Hahn
Hi Alex,
Please guide me in implementing a machine scheduling model for at least one
core (e.g. Rocket, PULP)[5].
Rocket - RV64G - "in-order", single-issue application core; BOOM - RV64G
- "out-of-order", superscalar application core[6]
So what about PULP? Is it in-order or out-of-order?
Hi LLVM developers,
You are welcome to review our work on porting GlobalISel to RISCV[7] and give
us some suggestions. Thanks a lot!
[1]
https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
[2] https://l...
2005 Jul 30
2
Big thanks for supporting i586 type machines.
While I know that, technically, the only i586 machines are the Pentium and
Pentium MMX, it is still nice that I can use some headless AMD K6/2 machines
I have lying around for CentOS 4. Many thanks for the effort expended to get
that working.
--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu
2010 Jun 04
0
[LLVMdev] Speculative phi elimination at the top of a loop?
Hi,
On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander
<pekka.nikander at nomadiclab.com> wrote:
> Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp?
Instead, the better alternative is to write another pass similar to
LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option
-loop-peel. The new pass could use llvm::UnrollLoop()
2005 Jun 18
2
SiL311x SataRaid (sata_sil)
Hi,
On my x86_64 system I have a SiL311x controller that can do RAID. If I
configure my 2 identical disks in a RAID1 setup, I would expect to see
only 1 block device on Linux. Still I see 2 block devices.
Is this intentional, and if so, isn't that dangerous ? (i.e. writing to
both disks at the same time)
Anyone with an insight, please explain :)
-- dag wieers, dag at wieers.com,
2009 Nov 11
0
[LLVMdev] speed up memcpy intrinsic using ARM Neon registers
On Nov 11, 2009, at 3:27 AM, Rodolph Perfetta wrote:
>
> If you know about the alignment, maybe use structured load/store
> (vst1.64/vld1.64 {dn-dm}). You may also want to work on whole cache
> lines
> (64 bytes on A8). You can find more in this discussion:
> http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc
>
2013 Sep 24
0
[LLVMdev] MI Scheduler Update (was Experimental Evaluation of the Schedulers in LLVM 3.3)
..., but given that we haven't demonstrated the value of simple heuristics, I don't want to pursue anything more complicated. I think better solutions will have to transcend list scheduling. I do like the idea of constraining the DAG prior to scheduling [Touati, "Register Saturation in Superscalar and VLIW Codes", CC 2001], because that entirely separates the problem from list scheduler heuristics. However, I won't be able to justify adding more complexity, beyond list scheduling heuristics, to the LLVM codebase to solve this problem. Work in this area would need to be done as side...