Displaying 20 results from an estimated 300 matches similar to: "[llvm-mca] Resource consumption of ProcResGroups"
2020 May 10
2
[llvm-mca] Resource consumption of ProcResGroups
Hi Alex,
On Sun, May 10, 2020 at 4:00 PM Alex Renda <renda at csail.mit.edu> wrote:
> Thanks, that’s very helpful!
>
>
>
> Also, sorry for the miscue on that bug with the 2/4 cycles — I realize now
> that that’s an artifact of a change that I made to not crash when resource
> groups overlap without all atomic subunits being specified:
>
> `echo 'fxrstor
2020 May 10
2
[llvm-mca] Resource consumption of ProcResGroups
> On May 9, 2020, at 5:12 PM, Andrea Di Biagio via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> The llvm scheduling model is quite simple and doesn't allow mca to accurately simulate the execution of individual uOPs. That limitation is sort-of acceptable if you consider how the scheduling model framework was originally designed with a different goal in mind (i.e. machine
2018 Feb 08
0
[VLIW Scheduler] Itineraries vs. per operand scheduling
We have two different dimensions for each instruction: slot
assignments, and operand timings. These two are unrelated to each other,
and also each (or both) can change for any given instruction from one
architecture version to the next.
The main concern for us was which of these mechanisms contains all the
information that we need. We cannot express all the scheduling details
by hand, and
2018 Feb 08
2
[VLIW Scheduler] Itineraries vs. per operand scheduling
Hi Krzysztof,
2018-02-08 13:32 GMT+08:00 Andrew Trick via llvm-dev <
llvm-dev at lists.llvm.org>:
>
>
> On Feb 4, 2018, at 9:15 AM, Yatsina, Marina via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> What is the best way to model a scheduler for a VLIW in-order architecture?
> I’ve looked at the Hexagon and R600 architectures and they are using
2019 Jun 07
2
[llvm-mca] What's the difference between Rthroughput and "total cycles" in llvm-mca
Hi Andrea,
So does this definition make sense for basic blocks with more than one
instruction? E.g. how should one interpret a basic block with an RThroughput
of 2.3?
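For reference, a rough way to read such a number: the llvm-mca documentation defines Block RThroughput as the reciprocal of the maximum number of block iterations that can execute per cycle, absent loop-carried dependencies, limited by the dispatch rate and by hardware resource availability. The sketch below only illustrates that definition; the function names and the per-block figures (uop count, dispatch width, resource cycles, unit counts) are made up for the example and are not llvm-mca's actual code or output.

```python
def block_rthroughput(num_uops, dispatch_width, resource_cycles, resource_units):
    """Estimate cycles per iteration as the tightest of two bounds.

    Hypothetical helper: all inputs are per-iteration figures that a
    scheduling model would supply; none come from a real llvm-mca run.
    """
    # Bound 1: the front end can dispatch at most dispatch_width uops per cycle.
    dispatch_bound = num_uops / dispatch_width
    # Bound 2: each resource needs cycles / units cycles per iteration.
    resource_bound = max(
        (cycles / resource_units[name] for name, cycles in resource_cycles.items()),
        default=0.0,
    )
    return max(dispatch_bound, resource_bound)


def steady_state_cycles(iterations, rthroughput):
    """Approximate cycle count for many iterations with no loop-carried deps."""
    return iterations * rthroughput


# Made-up block: 7 uops, 4-wide dispatch, 5 cycles on a 2-unit ALU group,
# 2 cycles on a single load unit.
rt = block_rthroughput(7, 4, {"ALU": 5, "LD": 2}, {"ALU": 2, "LD": 1})
print(rt, steady_state_cycles(100, rt))
```

Read this way, a fractional value such as 2.3 is an average: over many iterations the block retires at roughly one iteration every 2.3 cycles, even though no single iteration takes a fractional number of cycles.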
On Fri, Jun 7, 2019 at 7:39 AM Andrea Di Biagio <andrea.dibiagio at gmail.com>
wrote:
> Hi Tom,
>
> Field 'Total Cycles' from the summary view simply reports the elapsed
> number of cycles for the entire
2020 Jan 16
2
[llvm-exegesis] [RFC] Renaming Uops- classes
Since the -mode=inverse_throughput option was added to llvm-exegesis, the names of classes such as UopsSnippetGenerator and UopsBenchmarkRunner, which this mode shares with uops, have become less descriptive.
Inverse_throughput doesn't use the uops counters, so for example, the instruction layout shared between these two modes is really connected to parallelism, not uops. It's
2018 Mar 15
5
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
[You can find an easier to read and more complete version of this RFC here
<https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>
.]
Knowing instruction scheduling properties (latency, uops) is the basis for
all scheduling work done by LLVM.
Unfortunately, vendors usually release only partial (and sometimes
incorrect) information. Updating the
2019 Dec 17
2
[llvm-exegesis] Uops mode isn't working
Hello,
I've been testing llvm-exegesis on X86. Latency and inverse_throughput modes work fine but when I run uops I get an error:
event not found - cannot create event uops_dispatched_port:port_0
LLVM ERROR: invalid perf event 'uops_dispatched_port:port_0'
I'm running this on an i7-4790K. Am I missing something on my computer, or is this not yet fully implemented?
This also
2018 Mar 15
3
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On Thu, Mar 15, 2018 at 4:41 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote:
>
> [You can find an easier to read and more complete version of this RFC here
> <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>
> .]
>
> Knowing
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
On 03/15/2018 10:04 AM, Guillaume Chatelet via llvm-dev wrote:
> [You can find an easier to read and more complete version of this RFC
> here
> <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit?ts=5aaa84ee#>.]
>
> Knowing instruction scheduling properties (latency, uops) is the basis
> for all scheduling work done by LLVM.
>
>
>
2018 Mar 15
0
[RFC] llvm-exegesis: Automatic Measurement of Instruction Latency/Uops
Sounds like a very useful tool. Thank you for contributing.
Taking a step back and looking at the big picture, combining this with
the recently contributed llvm-mca dramatically improves our scheduling
and performance analysis story. Being able to take a snippet of code on
a particular machine, measure latency/throughput/ports for each
instruction (this tool), and then analyze the entire
2005 Nov 21
5
Error: Error creating domain: (22, 'Invalid argument')
Hi there
I get the following error message when I try to "xm create <domid>"
Error: Error creating domain: (22, 'Invalid argument')
I have included everything I can think of
Thanks
The DomU config is
kernel = "/boot/vmlinuz-2.6.12-xenU"
ramdisk = "/boot/initrd-2.6.12.6-xenU.img"
memory = 128
name = "xen01"
nics=1
disk = [
2012 Nov 07
1
[LLVMdev] AVX broadcast Vs. vector constant pool load
Hey guys,
I'm currently investigating broadcasts from the constant pool on Sandy
Bridge. I see this comment in llvm/lib/Target/X86/X86ISelLowering.cpp:
// Handle the broadcasting a single constant scalar from the constant pool
// into a vector. On Sandybridge it is still better to load a constant vector
// from the constant pool and not to broadcast it from a scalar.
Would anyone
2018 Jan 04
2
FYI, we've posted a component of Spectre mitigation on llvm-commits
On Thu, Jan 4, 2018 at 12:31 PM Stephen Checkoway via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> > On Jan 4, 2018, at 04:23, Chandler Carruth via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Sending a note here as this seems likely to be of relatively broad
> interest.
>
> It looks like this is producing code of the following form.
2005 Nov 19
6
correct procedure for reporting kernel problems?
I've had domUs hiccup a few times with latest unstable, and I guess this can
be expected. Is there a procedure in place to report these problems?
For example:
Unable to handle kernel paging request at virtual address c12b46c0
printing eip: cpanel cpanel.mydomain
c02555f7
*pde = ma 7e9fc067 pa 00002067
*pte = ma 00000000 pa 55555000
Oops: 0002 [#1]
PREEMPT SMP
Modules
2006 May 31
1
How to enable VMX?
Hello,
I'm trying to use the VT technology on my box but when I start Xen VMX is
disabled by Feature Control MSR as shown in the following message:
Xen version 3.0.2-3 (guill@frec.bull.fr) (gcc version 3.3.5 (Debian 1:3.3.5-13)) Wed May 31 16:07:00 CEST 2006
Latest ChangeSet: Tue May 30 18:14:05 2006 +0100 9697:18e8e613deb9
...
(XEN) Initializing CPU#0
(XEN) Detected 3391.682 MHz
2018 Jan 04
0
FYI, we've posted a component of Spectre mitigation on llvm-commits
On Jan 4, 2018, at 11:52, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Jan 4, 2018 at 12:31 PM Stephen Checkoway via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> As I understand it, the busy loop is to cause the speculative execution to be trapped in the loop. Was something like ud2 considered? I presume that would stop the speculative
2019 Dec 24
2
Get llvm-mca results inside opt?
Hi,
I am trying to generate performance models for specific pieces of code like an omp.outlined function. Let's say I have the following code:
start_collect_parallel_for_data(-1.0,-1.0,-1.0, size, "tag for this region");
#pragma omp parallel for
for(auto i = 0; i < size; ++i){
// … do work
}
stop_collecting_parallel_for_data();
The omp region will get outlined into a new function and what I
2012 Aug 10
18
[PATCH v2 0/5] ARM hypercall ABI: 64 bit ready
Hi all,
this patch series makes the necessary changes to make sure that the
current ARM hypercall ABI can be used as-is on 64 bit ARM platforms:
- it defines xen_ulong_t as uint64_t on ARM;
- it introduces a new macro to handle guest pointers, called
XEN_GUEST_HANDLE_PARAM (that has size 4 bytes on aarch and is going to
have size 8 bytes on aarch64);
- it replaces all the occurrences of
2013 Jun 06
0
[LLVMdev] Enabling the vectorizer for -Os
Hi,
Thanks for the feedback. I think that we agree that vectorization on -Os can benefit many programs. Regarding -O2 vs -O3, maybe we should set a higher cost threshold for -O2 to increase the likelihood of improving the performance? We have very few regressions on -O3 as is, and with better cost models I believe that we can bring them close to zero, so I am not sure if it can help that much.