thr3ads.net - llvm dev - [llvm-dev] Per-write cycle count with ReadAdvance


Garfee Guan via llvm-dev

2018-Nov-15 06:52 UTC

[llvm-dev] Per-write cycle count with ReadAdvance - Do I really need that?

Hi list,
I happened to read the thread below (written three years ago). I think I may
need this ReadAdvance feature for my architecture.

It concerns the scheduling information that describes reads of my
architecture's vector registers. Because of forwarding/bypass, the latencies
differ. Here is an example:

def : WriteRes<WriteVector,    [MyArchVALU]>  { let Latency = 6; }
...
def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
...

Here I defined 3 different Writes with the same latency. The forwarding is
described below.

def : ReadAdvance<MyReadVector, 5, [WriteVector]>;
def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector_3cycles]>;
def : ReadAdvance<MyReadVector, 1, [MyWriteMulVector_5cycles]>;
...
def : ReadAdvance<MyReadStoreVector, 0, [WriteVector]>;
def : ReadAdvance<MyReadStoreVector, 0, [MyWriteAddVector_3cycles]>;
def : ReadAdvance<MyReadStoreVector, 0, [MyWriteMulVector_5cycles]>;
...

Basically, my intention is to model the following. For any non-store
instruction that reads a vector register, forwarding reduces the latency of a
vector write to 1 cycle normally, 3 cycles for my ADD, and 5 cycles for my MUL.
A store instruction that takes a vector register as its source cannot use
forwarding, so its latency stays at 6.
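The following check confirms the arithmetic in this example. The effective latency a reader sees is the write's Latency minus the ReadAdvance cycles that apply to that (read, write) pair. A pair with no ReadAdvance advances by 0. The sketch below is plain Python, not LLVM code. It uses the record names from the SchedWriteRes definitions above. The clamp at zero is an assumption added for safety.

```python
# Hypothetical model of the latency arithmetic in the example (not LLVM code).
WRITE_LATENCY = {
    "WriteVector": 6,
    "MyWriteAddVector": 6,
    "MyWriteMulVector": 6,
}

# (read, write) -> ReadAdvance cycles; a missing pair means no advance (0).
READ_ADVANCE = {
    ("MyReadVector", "WriteVector"): 5,
    ("MyReadVector", "MyWriteAddVector"): 3,
    ("MyReadVector", "MyWriteMulVector"): 1,
}


def effective_latency(read: str, write: str) -> int:
    """Write latency minus the read's advance, assumed to be clamped at zero."""
    advance = READ_ADVANCE.get((read, write), 0)
    return max(WRITE_LATENCY[write] - advance, 0)


if __name__ == "__main__":
    for read in ("MyReadVector", "MyReadStoreVector"):
        row = [effective_latency(read, w) for w in WRITE_LATENCY]
        print(read, row)
```

Running it prints `MyReadVector [1, 3, 5]` and then `MyReadStoreVector [6, 6, 6]`. This matches the intended 1/3/5-cycle forwarding and the unforwarded 6-cycle store read.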

Unfortunately, TableGen cannot compile the code above. I am not sure whether I
really need a per-write cycle count with ReadAdvance, or whether there is an
existing method that meets my requirement. In any case, the latencies here
seem to be determined by both

a) 3 kinds of Write, and
b) 2 kinds of Read.

Therefore I suspect this cannot be modeled with the current TableGen
implementation.
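Whether this needs more than one ReadAdvance per read can be checked mechanically. Pierre-Andre's 2015 message describes the constraint: each read gets a single ReadAdvance record, and its one cycle count applies to every listed write. Under that assumption, the hypothetical Python helper below asks whether each row of the desired latency table fits that shape. Every write latency is fixed at 6, as in the example.

```python
from typing import List, Optional, Tuple

# Example numbers from the message: every write has Latency = 6.
WRITE_LATENCY = {"WriteVector": 6, "MyWriteAddVector": 6, "MyWriteMulVector": 6}

# Desired effective latency for each (read, write) pair.
DESIRED = {
    "MyReadVector": {"WriteVector": 1, "MyWriteAddVector": 3, "MyWriteMulVector": 5},
    "MyReadStoreVector": {"WriteVector": 6, "MyWriteAddVector": 6, "MyWriteMulVector": 6},
}


def single_read_advance(read: str) -> Optional[Tuple[int, List[str]]]:
    """Return (cycles, writes) for one ReadAdvance-style record, or None.

    Assumption: one record per read, one cycle count shared by all listed
    writes, and writes that are not listed get no advance at all.
    """
    # Advance each write would need: Latency - desired effective latency.
    needed = {w: WRITE_LATENCY[w] - lat for w, lat in DESIRED[read].items()}
    distinct = {a for a in needed.values() if a != 0}
    if len(distinct) > 1:
        return None  # a different cycle count per write would be required
    if not distinct:
        return (0, [])  # the default (no advance) already gives the latency
    cycles = distinct.pop()
    return (cycles, [w for w, a in needed.items() if a == cycles])


for read in DESIRED:
    print(read, single_read_advance(read))
```

With these numbers the store read needs no ReadAdvance at all. The non-store read needs advances of 5, 3 and 1 for the three writes, and no single shared cycle count can express that.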

Can you comment and help?

--
Garfee Guan,
LLVM Compiler Backend Engineer
Enflame Technology Co.
Website: http://www.enflame-tech.com/

--------------------------------------------------------------------
[llvm-dev] Per-write cycle count with ReadAdvance
Pierre-Andre Saulais via llvm-dev (llvm-dev at lists.llvm.org)
Mon Nov 30 04:22:49 PST 2015


------------------------------

Hi all,

I am working on a backend that uses the ProcResource scheduling model
and one limitation I found is that while it is possible to specify
multiple SchedWrites in a ReadAdvance record, each write uses the same
cycle count. I tried writing multiple ReadAdvance records for the same
SchedRead, but tablegen does not seem to allow that.

It would be useful to have a per-write cycle count to model different
pipeline bypasses, where the cycle count depends on the (read, write)
pair and not just on the read.

Two possible solutions are: 1) changing the 'Cycles' field in
(Proc)ReadAdvance to be a list of int and 2) changing tablegen to allow
multiple (Proc)ReadAdvance records with the same read resource.

The former solution doesn't seem ideal as it requires repeating the
cycle count many times for targets that use long SchedWriteRes lists:

-def : ReadAdvance<ReadIM, 1, [WriteImm,WriteI,
+def : ReadAdvance<ReadIM, [1, 1, 1, 1, 1, 1, 1, 1, 1], [WriteImm, WriteI,
                                WriteISReg, WriteIEReg,WriteIS,
                                WriteID32,WriteID64,
                                WriteIM32,WriteIM64]>;

The latter is a bit more verbose when per-write cycle count is used, but
requires no change to existing targets. It is also easier to visually
match cycle counts to write types:

def : ReadAdvance<ReadFoo, 2, [WriteType1]>;
def : ReadAdvance<ReadFoo, 4, [WriteType2]>;
def : ReadAdvance<ReadFoo, 3, [WriteType3]>;
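Both proposals describe the same underlying data: a table from (read, write) pairs to advance cycles. The hypothetical Python sketch below expands each form into that table; it is not TableGen's implementation. The list form needs its two lists to stay the same length. The multi-record form instead needs a check for duplicate entries for the same (read, write) pair. Rejecting duplicates outright is a design choice made here for simplicity.

```python
from typing import Dict, List, Tuple

# (read, write) -> advance cycles.
Table = Dict[Tuple[str, str], int]


def from_cycle_list(read: str, cycles: List[int], writes: List[str]) -> Table:
    """Proposal 1 (hypothetical): one record with a per-write list of cycles."""
    if len(cycles) != len(writes):
        raise ValueError(f"{len(cycles)} cycle counts for {len(writes)} writes")
    return {(read, w): c for w, c in zip(writes, cycles)}


def from_records(records: List[Tuple[str, int, List[str]]]) -> Table:
    """Proposal 2 (hypothetical): several records that may share one read."""
    table: Table = {}
    for read, cycles, writes in records:
        for w in writes:
            if (read, w) in table:
                raise ValueError(f"duplicate ReadAdvance for ({read}, {w})")
            table[(read, w)] = cycles
    return table


list_form = from_cycle_list(
    "ReadFoo", [2, 4, 3], ["WriteType1", "WriteType2", "WriteType3"]
)
record_form = from_records([
    ("ReadFoo", 2, ["WriteType1"]),
    ("ReadFoo", 4, ["WriteType2"]),
    ("ReadFoo", 3, ["WriteType3"]),
])
print(list_form == record_form)  # True
```

For the ReadFoo example, both forms yield `{('ReadFoo', 'WriteType1'): 2, ('ReadFoo', 'WriteType2'): 4, ('ReadFoo', 'WriteType3'): 3}`.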

I have a patch for the second solution. Would that benefit any in-tree
target?

Thanks,
Pierre-Andre

-- 
Pierre-Andre Saulais
Principal Software Engineer, Compilers
Codeplay Software Ltd
Level C, Argyle House
3 Lady Lawson St,
Edinburgh EH3 9DR
Tel: 0131 466 0503
Fax: 0131 557 6600
Website: http://www.codeplay.com
Twitter: https://twitter.com/codeplaysoft

This email and any attachments may contain confidential and /or
privileged information and is for use by the addressee only. If you
are not the intended recipient, please notify Codeplay Software Ltd
immediately and delete the message from your computer. You may not
copy or forward it, or use or disclose its contents to any other
person. Any views or other information in this message which do not
relate to our business are not authorized by Codeplay software Ltd,
nor does this message form part of any contract unless so stated.
As internet communications are capable of data corruption Codeplay
Software Ltd does not accept any responsibility for any changes made
to this message after it was sent. Please note that Codeplay Software
Ltd does not accept any liability or responsibility for viruses and it
is your responsibility to scan any attachments.
Company registered in England and Wales, number: 04567874
Registered office: 81 Linkfield Street, Redhill RH1 6BY

-------------- next part --------------
A non-text attachment was scrubbed...
Name: multiple_readadvance.patch
Type: text/x-patch
Size: 6336 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151130/08d3acbf/attachment.bin>

Andrew Trick via llvm-dev

2018-Nov-16 00:00 UTC

[llvm-dev] Per-write cycle count with ReadAdvance - Do I really need that?

> On Nov 14, 2018, at 10:52 PM, Garfee Guan via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Therefore I doubt if it can not be modeled with current tblgen implement.
I’m not sure if the TableGen bug mentioned in Pierre-Andre's 2015 message was ever fixed.

It looks to me like this should work, but I haven’t tried it:

def : WriteRes<WriteVector,    [MyArchVALU]>  { let Latency = 6; }
def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }  

// Forward from a vector op (normal, add, mul) to a non-store.
def : ReadAdvance<MyReadVector, 5, [WriteVector]>;  
def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector]>;
def : ReadAdvance<MyReadVector, 1, [MyWriteMulVector]>;

Additionally, you could do this, but I don’t think it would have any effect
at all:

// Forward from a vector op (normal, add, mul) to a store.
def : ReadAdvance<MyReadStoreVector, 0,
                  [WriteVector, MyWriteAddVector, MyWriteMulVector]>;
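The store-side record would have no effect because a (read, write) pair without a matching ReadAdvance advances by 0 cycles. Listing the writes explicitly with 0 therefore leaves every latency unchanged. Below is a minimal Python model of that rule; it is an assumption about the semantics, not LLVM source.

```python
# Hypothetical model: write latencies from the example, all equal to 6.
WRITE_LATENCY = {"WriteVector": 6, "MyWriteAddVector": 6, "MyWriteMulVector": 6}


def latency(read: str, write: str, advances: dict) -> int:
    """Write latency minus the (read, write) advance; a missing entry means 0."""
    return max(WRITE_LATENCY[write] - advances.get((read, write), 0), 0)


# Without any store-side record versus with an explicit 0-cycle record.
no_record: dict = {}
zero_record = {("MyReadStoreVector", w): 0 for w in WRITE_LATENCY}

for write in WRITE_LATENCY:
    before = latency("MyReadStoreVector", write, no_record)
    after = latency("MyReadStoreVector", write, zero_record)
    print(write, before, after)  # both values are 6 for every write
```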

-Andy

Garfee Guan via llvm-dev

2018-Nov-17 02:31 UTC

[llvm-dev] Per-write cycle count with ReadAdvance - Do I really need that?

Thanks, Andrew. I have tried this with a recent TableGen, and multiple
ReadAdvance records with different latencies for the same read still do not
work. Maybe I should improve TableGen myself if Pierre-Andre no longer has
the change.

However, I am a little curious about the situation I ran into. Hardware
forwarding may fail for different reasons, so different register reads may
have different latencies that depend on both the register reader and the
writer. I am new to TableGen, so I wonder whether any other target already
has a way to describe that.

