thr3ads.net - Nouveau - [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone [Feb 2016]

If this information is useful, please help other people find it:
Share via:

Hans de Goede

2016-Feb-22 11:26 UTC

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Hi,

On 19-02-16 20:43, Ilia Mirkin wrote:> On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede <hdegoede at
redhat.com> wrote:
>> Hi,
>>
>> On 18-02-16 17:39, Ilia Mirkin wrote:
>>>
>>> On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede <hdegoede at
redhat.com>
>>> wrote:
>>>>
>>>> But this does not seem to be hooked up yet for nouveau.
>>>
>>>
>>> Samuel has patches. See
>>>
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3
>>
>>
>> Cool, I will take a look at those.
>>
>>>> So some questions:
>>>> -The commit by Samual says:
>>>>    This introduces TGSI_FILE_MEMORY for shared, global and
local memory.
>>>> Only
>>>> shared memory is currently supported.
>>>>
>>>>    The commit introduces MEMORY[x] and MEMORY[x],SHARED so in
reality it
>>>> also
>>>> introduces
>>>>    a second option next to shared, so what are we going to use
plain
>>>> MEMORY[x]
>>>> for?
>>>>    I suggest using it for global memory but we need to be in
agreement on
>>>> this.
>>>
>>>
>>> That sounds fine to me. However what I had in mind was switching
the
>>> SHARED field into a 2-bit field and making it
>>>
>>> 1 = SHARED
>>> 2 = GLOBAL
>>> 3 = LOCAL
>>>
>>> (since for OpenCL you also need to be able to address local or
private
>>> memory). I sorta wanted Samuel to do it, but since I had no idea
where
>>> you were at, or if you were even still working on this, I figured
it
>>> should be fixed up by the first person who needed it.
>>
>>
>> Sounds good, only the naming is somewhat unfortunate since opencl uses
>> different
>> naming. I.e. it has no "shared"
>
> Sad. Well, "shared" is what OpenGL compute shaders use, which is
why I
> proposed it.
>
>>
>> OpenCL has:
>> -global:  accessible by all worker-groups as well as by the CPU
>> -const:   read-only global
>> -local:   shared by worker-items in the same worker-group, not shared
>> between worker-groups
>> -private: accessible only to a single worker-item
>>
>> So how do these map to the TGSI:
>>
>>> 1 = SHARED
>>> 2 = GLOBAL
>>> 3 = LOCAL
>
> OpenCL global = TGSI global
> OpenCL const = TGSI global
> OpenCL local = TGSI shared
> OpenCL private = TGSI local
>
> Not sure what the distinction is between OpenCL const and global is.
> If the const stuff is actually just user-supplied uniforms (and
> doesn't need to be in a particular place in memory), then those should
> go into TGSI CONST somehow.
AFAIK OpenCL const is really read-only global, so the data is filled
in by the CPU, then passed to the opencl-kernel running on the GPU
where all worker-items have access to it. I think that TGSI CONST might
indeed be usable for this, but it is probably easiest to treat
it as GLOBAL for now.
>>>> -What about kernel input parameters, so far these have been
using
>>>> RES[32764]
>>>>    I must admit that I do not understand where the file_index
of 32764
>>>> comes
>>>>    from (or where any of the file indexes come from for
>>>> src/gallium/tests/trivial/compute.c ?
>>>>    I have the feeling that these are not used at all, and
everything
>>>> simply
>>>> goes
>>>>    to a flat (virtual) memory space, with the params at address
0, correct
>>>> ?
>>>
>>>
>>> It was never particularly well-specified, which was one of the
reasons
>>> it went away. It also didn't map nicely onto the OpenGL model.
There
>>> is a remaining question of how to do addressing in memory...
there's
>>> 40 bits of address space. Should these implicitly be U64
>>> (dual-component in TGSI) addresses that are passed around? Not sure
>>> what the OpenCL position on all this is.
>>
>>
>> So far I've been using U32 for addresses as that is what
Francisco's
>> original
>> code was using. And this also is what things like the tgsi LOAD
instruction
>> take. If you're doing a LOAD on a 1d buffer then you will use
TEMP[#].x to
>> specify the index, and the way how this currently works with OpenCL is
that
>> clCreateBuffer() will return a cl_mem type which then gets passed into
>> the kernel as input parameter and gets treated as a pointer by the
compiler,
>> so e.g. global mem gets treated as a single address space even if there
>> are multiple global buffers and TEMP[#].x contains the value passed in
>> via cl_mem as start offset for the buffer + the index into the buffer.
>>
>> So this means that currently we are limited to U32 since TEMP[#].x is
>> only 32 bits wide. Internally 40 bits addresses can and should probably
>> be used so that at least the different memory spaces each have the full
>> 32 bits available.
>>
>> Note that we could fix this by adding some sort of LOAD64 opcode, which
>> uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not
sure
>> how this would work for 3d buffers though. I foresee the llvm backend
>> eventually getting a 64 bit mode where it will use 64 bits for all
>> pointers and use something like a LOAD64 opcode to indicate that
>> the indexes (which it effectively uses as addresses / pointers)
>> are 64 bit wide.
>
> Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
> src/dest, so you don't need to worry about 3d or anything like that. I
> believe this is a good solution to the problem.
Right for MEMORY this will work fine. I've no clue yet how images will
work with OpenCL though, hopefully we can avoid the one flat address
space thing there, but I simply don't know yet.
>>> Also it would be highly preferable to avoid using GLOBAL at all in
the
>>> first place, and just use BUFFER[] or IMAGE[] or whatever. However
>>> after having a look at how OpenCL works, I don't think
that's
>>> possible. But perhaps I missed something?
>>
>>
>> OpenCL is very C-like and as such uses a flat address space per memory
>> space, getting around that is going to be very tricky, if possible at
>> all.
>
> That was my assessment as well.
>
>>
>>> Perhaps I didn't make it clear enough though -- sorry. The
reality is
>>> that TGSI is a living thing, and things change every so often. I
don't
>>> think people would be comfortable with locking down TGSI in such a
>>> way.
>>
>>
>> Hmm, I do think that if we end up using a llvm-ir -> tgsi step in
some
>> cases that we do need to lock TGSI down. Taken the RES thing for
example,
>> it would have been pretty trivial actually to not nuke it, and instead
just
>> add the image support next to it. Note I'm fine with making changes
while
>> we are figuring this out, but once I start pushing the llvm bits to
llvm
>> upstream (which is still months away) we are going to need some sort of
>> TGSI stability.
>
> I can't unilaterally make such promises. This would have to be
> discussed and approved by the other gallium/tgsi stakeholders.
Understood.

So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

    BGNSUB\n"
        UMUL TEMP[0], SV[0], IMM[0]
        LOAD TEMP[1].xy, RINPUT, TEMP[0]
        LOAD TEMP[0].x, RGLOBAL, TEMP[1].yyyy
        UADD TEMP[1].x, TEMP[0], -TEMP[1]
        STORE RGLOBAL.x, TEMP[1].yyyy, TEMP[1]
        RET
    ENDSUB


Where by RINPUT and RGLOBAL get replaces by processing the
code with cpp and the following defines:

#define RGLOBAL        RES[32767]
#define RLOCAL         RES[32766]
#define RPRIVATE       RES[32765]
#define RINPUT         RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].yyyy
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1].yyyy, TEMP[1]
RET
ENDSUB

This assumes, that as discussed declaring memory without a , SHARED or other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as intended
to you?

2) This only solves the accessing of the global memory, it does not solve
getting to the kernel input kernel parameters, how would I deal with those ?

Currently the kernel input parameters are uploaded by
src/gallium/drivers/nouveau/nv50/nv50_compute.c: nv50_compute_upload_input()

Are we going to change how these parameters get uploaded ? And if not how do we
get to these parameters from TGSI now that we no longer have RES[#] to do so ?

I think we should also keep in mind that what works for nouveau may not work
for other GPU-s, so we should either introduce a MEMORY[x], INPUT or
a new DCL INPUT[0] for this, so that the code for other GPU-s can have its
own handling for this.

Regards,

Hans

Samuel Pitoiset

2016-Feb-22 12:41 UTC

head link

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:> Hi,
>
> On 19-02-16 20:43, Ilia Mirkin wrote:
>> On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede <hdegoede at
redhat.com>
>> wrote:
>>> Hi,
>>>
>>> On 18-02-16 17:39, Ilia Mirkin wrote:
>>>>
>>>> On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede <hdegoede at
redhat.com>
>>>> wrote:
>>>>>
>>>>> But this does not seem to be hooked up yet for nouveau.
>>>>
>>>>
>>>> Samuel has patches. See
>>>>
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3
>>>
>>>
>>> Cool, I will take a look at those.
>>>
>>>>> So some questions:
>>>>> -The commit by Samual says:
>>>>>    This introduces TGSI_FILE_MEMORY for shared, global and
local
>>>>> memory.
>>>>> Only
>>>>> shared memory is currently supported.
>>>>>
>>>>>    The commit introduces MEMORY[x] and MEMORY[x],SHARED so
in
>>>>> reality it
>>>>> also
>>>>> introduces
>>>>>    a second option next to shared, so what are we going to
use plain
>>>>> MEMORY[x]
>>>>> for?
>>>>>    I suggest using it for global memory but we need to be
in
>>>>> agreement on
>>>>> this.
>>>>
>>>>
>>>> That sounds fine to me. However what I had in mind was
switching the
>>>> SHARED field into a 2-bit field and making it
>>>>
>>>> 1 = SHARED
>>>> 2 = GLOBAL
>>>> 3 = LOCAL
>>>>
>>>> (since for OpenCL you also need to be able to address local or
private
>>>> memory). I sorta wanted Samuel to do it, but since I had no
idea where
>>>> you were at, or if you were even still working on this, I
figured it
>>>> should be fixed up by the first person who needed it.
>>>
>>>
>>> Sounds good, only the naming is somewhat unfortunate since opencl
uses
>>> different
>>> naming. I.e. it has no "shared"
>>
>> Sad. Well, "shared" is what OpenGL compute shaders use, which
is why I
>> proposed it.
>>
>>>
>>> OpenCL has:
>>> -global:  accessible by all worker-groups as well as by the CPU
>>> -const:   read-only global
>>> -local:   shared by worker-items in the same worker-group, not
shared
>>> between worker-groups
>>> -private: accessible only to a single worker-item
>>>
>>> So how do these map to the TGSI:
>>>
>>>> 1 = SHARED
>>>> 2 = GLOBAL
>>>> 3 = LOCAL
>>
>> OpenCL global = TGSI global
>> OpenCL const = TGSI global
>> OpenCL local = TGSI shared
>> OpenCL private = TGSI local
>>
>> Not sure what the distinction is between OpenCL const and global is.
>> If the const stuff is actually just user-supplied uniforms (and
>> doesn't need to be in a particular place in memory), then those
should
>> go into TGSI CONST somehow.
>
> AFAIK OpenCL const is really read-only global, so the data is filled
> in by the CPU, then passed to the opencl-kernel running on the GPU
> where all worker-items have access to it. I think that TGSI CONST might
> indeed be usable for this, but it is probably easiest to treat
> it as GLOBAL for now.
>
>>>>> -What about kernel input parameters, so far these have been
using
>>>>> RES[32764]
>>>>>    I must admit that I do not understand where the
file_index of 32764
>>>>> comes
>>>>>    from (or where any of the file indexes come from for
>>>>> src/gallium/tests/trivial/compute.c ?
>>>>>    I have the feeling that these are not used at all, and
everything
>>>>> simply
>>>>> goes
>>>>>    to a flat (virtual) memory space, with the params at
address 0,
>>>>> correct
>>>>> ?
>>>>
>>>>
>>>> It was never particularly well-specified, which was one of the
reasons
>>>> it went away. It also didn't map nicely onto the OpenGL
model. There
>>>> is a remaining question of how to do addressing in memory...
there's
>>>> 40 bits of address space. Should these implicitly be U64
>>>> (dual-component in TGSI) addresses that are passed around? Not
sure
>>>> what the OpenCL position on all this is.
>>>
>>>
>>> So far I've been using U32 for addresses as that is what
Francisco's
>>> original
>>> code was using. And this also is what things like the tgsi LOAD
>>> instruction
>>> take. If you're doing a LOAD on a 1d buffer then you will use
>>> TEMP[#].x to
>>> specify the index, and the way how this currently works with OpenCL
>>> is that
>>> clCreateBuffer() will return a cl_mem type which then gets passed
into
>>> the kernel as input parameter and gets treated as a pointer by the
>>> compiler,
>>> so e.g. global mem gets treated as a single address space even if
there
>>> are multiple global buffers and TEMP[#].x contains the value passed
in
>>> via cl_mem as start offset for the buffer + the index into the
buffer.
>>>
>>> So this means that currently we are limited to U32 since TEMP[#].x
is
>>> only 32 bits wide. Internally 40 bits addresses can and should
probably
>>> be used so that at least the different memory spaces each have the
full
>>> 32 bits available.
>>>
>>> Note that we could fix this by adding some sort of LOAD64 opcode,
which
>>> uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not
sure
>>> how this would work for 3d buffers though. I foresee the llvm
backend
>>> eventually getting a 64 bit mode where it will use 64 bits for all
>>> pointers and use something like a LOAD64 opcode to indicate that
>>> the indexes (which it effectively uses as addresses / pointers)
>>> are 64 bit wide.
>>
>> Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
>> src/dest, so you don't need to worry about 3d or anything like
that. I
>> believe this is a good solution to the problem.
>
> Right for MEMORY this will work fine. I've no clue yet how images will
> work with OpenCL though, hopefully we can avoid the one flat address
> space thing there, but I simply don't know yet.
>
>>>> Also it would be highly preferable to avoid using GLOBAL at all
in the
>>>> first place, and just use BUFFER[] or IMAGE[] or whatever.
However
>>>> after having a look at how OpenCL works, I don't think
that's
>>>> possible. But perhaps I missed something?
>>>
>>>
>>> OpenCL is very C-like and as such uses a flat address space per
memory
>>> space, getting around that is going to be very tricky, if possible
at
>>> all.
>>
>> That was my assessment as well.
>>
>>>
>>>> Perhaps I didn't make it clear enough though -- sorry. The
reality is
>>>> that TGSI is a living thing, and things change every so often.
I don't
>>>> think people would be comfortable with locking down TGSI in
such a
>>>> way.
>>>
>>>
>>> Hmm, I do think that if we end up using a llvm-ir -> tgsi step
in some
>>> cases that we do need to lock TGSI down. Taken the RES thing for
>>> example,
>>> it would have been pretty trivial actually to not nuke it, and
>>> instead just
>>> add the image support next to it. Note I'm fine with making
changes
>>> while
>>> we are figuring this out, but once I start pushing the llvm bits to
llvm
>>> upstream (which is still months away) we are going to need some
sort of
>>> TGSI stability.
>>
>> I can't unilaterally make such promises. This would have to be
>> discussed and approved by the other gallium/tgsi stakeholders.
>
> Understood.
>
> So back to the problem of getting OpenCL(ish) code to work again with
> the recent mesa changes. For starters I would like to get:
>
> src/gallium/tests/trivial/compute.c and then the test with mask 8,
> test_input_global() to work again, when that is working I should be
> able to adjust my llvm work (and if necessary clover) to start to
> work again.
>
> Currently the test_input_global() test uses the following bit of
> TGSI code:
>
> COMP
> DCL SV[0], THREAD_ID[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> IMM UINT32 { 8, 0, 0, 0 }
>
>     BGNSUB\n"
>         UMUL TEMP[0], SV[0], IMM[0]
>         LOAD TEMP[1].xy, RINPUT, TEMP[0]
>         LOAD TEMP[0].x, RGLOBAL, TEMP[1].yyyy
>         UADD TEMP[1].x, TEMP[0], -TEMP[1]
>         STORE RGLOBAL.x, TEMP[1].yyyy, TEMP[1]
>         RET
>     ENDSUB
>
>
> Where by RINPUT and RGLOBAL get replaces by processing the
> code with cpp and the following defines:
>
> #define RGLOBAL        RES[32767]
> #define RLOCAL         RES[32766]
> #define RPRIVATE       RES[32765]
> #define RINPUT         RES[32764]
>
> If I understand how memory is supposed to work, then I would need to
> change the TGSI as follows:
>
> COMP
> DCL SV[0], THREAD_ID[0]
> DCL MEMORY[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> IMM UINT32 { 8, 0, 0, 0 }
>
> BGNSUB\n"
> UMUL TEMP[0], SV[0], IMM[0]
> LOAD TEMP[1].xy, RINPUT, TEMP[0]
> LOAD TEMP[0].x, MEMORY[0], TEMP[1].yyyy
> UADD TEMP[1].x, TEMP[0], -TEMP[1]
> STORE MEMORY[0].x, TEMP[1].yyyy, TEMP[1]
> RET
> ENDSUB
Nope, this won't work because RINPUT is RES[32764]. And you have to 
remove all occurrences to RES because it's not longer supported. In my 
opinion, using BUFFER[0] in a first time should work. Currently, only 
SHARED with MEMORY is supported.
>
> This assumes, that as discussed declaring memory without a , SHARED or
> other
> flag means the memory is global.
>
> So 2 questions:
>
> 1) Do the above changes for using the new MEMORY keyword look as intended
> to you?
>
> 2) This only solves the accessing of the global memory, it does not solve
> getting to the kernel input kernel parameters, how would I deal with
> those ?
The input kernel parameters are directly passed through a call to 
pipe_context::launch_grid. You just have to fill the 
pipe_grid_info::input array with your parameters and they will be 
uploaded by nvXX_compute_upload_input().

I will have a look at the test_input_global().

Thanks!
>
> Currently the kernel input parameters are uploaded by
> src/gallium/drivers/nouveau/nv50/nv50_compute.c:
> nv50_compute_upload_input()
>
> Are we going to change how these parameters get uploaded ? And if not
> how do we
> get to these parameters from TGSI now that we no longer have RES[#] to
> do so ?
>
> I think we should also keep in mind that what works for nouveau may not
> work
> for other GPU-s, so we should either introduce a MEMORY[x], INPUT or
> a new DCL INPUT[0] for this, so that the code for other GPU-s can have its
> own handling for this.
>
> Regards,
>
> Hans
-- 
-Samuel

Hans de Goede

2016-Feb-22 12:46 UTC

head link

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:> Hi there,
>
> On 02/22/2016 12:26 PM, Hans de Goede wrote:
<snip>
>> So back to the problem of getting OpenCL(ish) code to work again with
>> the recent mesa changes. For starters I would like to get:
>>
>> src/gallium/tests/trivial/compute.c and then the test with mask 8,
>> test_input_global() to work again, when that is working I should be
>> able to adjust my llvm work (and if necessary clover) to start to
>> work again.
>>
>> Currently the test_input_global() test uses the following bit of
>> TGSI code:
>>
>> COMP
>> DCL SV[0], THREAD_ID[0]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> IMM UINT32 { 8, 0, 0, 0 }
>>
>>     BGNSUB\n"
>>         UMUL TEMP[0], SV[0], IMM[0]
>>         LOAD TEMP[1].xy, RINPUT, TEMP[0]
>>         LOAD TEMP[0].x, RGLOBAL, TEMP[1].yyyy
>>         UADD TEMP[1].x, TEMP[0], -TEMP[1]
>>         STORE RGLOBAL.x, TEMP[1].yyyy, TEMP[1]
>>         RET
>>     ENDSUB
>>
>>
>> Where by RINPUT and RGLOBAL get replaces by processing the
>> code with cpp and the following defines:
>>
>> #define RGLOBAL        RES[32767]
>> #define RLOCAL         RES[32766]
>> #define RPRIVATE       RES[32765]
>> #define RINPUT         RES[32764]
>>
>> If I understand how memory is supposed to work, then I would need to
>> change the TGSI as follows:
>>
>> COMP
>> DCL SV[0], THREAD_ID[0]
>> DCL MEMORY[0]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> IMM UINT32 { 8, 0, 0, 0 }
>>
>> BGNSUB\n"
>> UMUL TEMP[0], SV[0], IMM[0]
>> LOAD TEMP[1].xy, RINPUT, TEMP[0]
>> LOAD TEMP[0].x, MEMORY[0], TEMP[1].yyyy
>> UADD TEMP[1].x, TEMP[0], -TEMP[1]
>> STORE MEMORY[0].x, TEMP[1].yyyy, TEMP[1]
>> RET
>> ENDSUB
>
> Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences to RES because it's not longer supported. In my
opinion, using BUFFER[0] in a first time should work. Currently, only SHARED
with MEMORY is supported.
Right, as I say below "This only solves the accessing of the global memory,
it does not solve
getting to the kernel input kernel parameters"
>> This assumes, that as discussed declaring memory without a , SHARED or
>> other
>> flag means the memory is global.
>>
>> So 2 questions:
>>
>> 1) Do the above changes for using the new MEMORY keyword look as
intended
>> to you?
>>
>> 2) This only solves the accessing of the global memory, it does not
solve
>> getting to the kernel input kernel parameters, how would I deal with
>> those ?
>
> The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array
with your parameters and they will be uploaded by nvXX_compute_upload_input().
Right, the uploading side I understand, the question is how to get to them from
the compute kernel's tgsi code ?

If I understand you correctly you are suggesting to use BUFFER[0] for this,
that is fine from a nouveau point-of-view, but might be a bit nouveau
centric way of looking at things, I think a better approach would be
a separate input register-file for this, as that will be more flexible
when people try to do opencl via clang->llvm->tgsi on other GPUs.
> I will have a look at the test_input_global().
Thanks!

Regards,

Hans

Maybe Matching Threads

Search for more seemingly similar threads

Nouveau - Feb 2016 - Dealing with opencl kernel parameters in nouveau now that RES support is gone

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Maybe Matching Threads