Daniel Vetter
2021-May-20 14:11 UTC
[Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
> [AMD Official Use Only]
>
> Parsing over 550 processes for fdinfo is taking between 40-100ms single
> threaded on a 2 GHz Skylake (IBRS) within a VM, using simple string
> comparisons and dirent parsing. And that is pretty much the worst case
> scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser generator on top for real parsing speed.

So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focus more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel

> David
> ________________________________
> From: Daniel Vetter <daniel at ffwll.ch>
> Sent: Wednesday, May 19, 2021 11:23 AM
> To: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Daniel Stone <daniel at fooishbar.org>; jhubbard at nvidia.com; nouveau at lists.freedesktop.org; Intel Graphics Development <Intel-gfx at lists.freedesktop.org>; Maling list - DRI developers <dri-devel at lists.freedesktop.org>; Simon Ser <contact at emersion.fr>; Koenig, Christian <Christian.Koenig at amd.com>; aritger at nvidia.com; Nieto, David M <David.Nieto at amd.com>
> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
>
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
> >
> > On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> > >
> > > On 18/05/2021 10:16, Daniel Stone wrote:
> > >> Hi,
> > >>
> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> > >> <tvrtko.ursulin at linux.intel.com> wrote:
> > >>> I was just wondering if stat(2) and a chrdev major check would be a
> > >>> solid criterion to more efficiently (compared to parsing the text
> > >>> content) detect drm files while walking procfs.
> > >>
> > >> Maybe I'm missing something, but is the per-PID walk actually a
> > >> measurable performance issue rather than just a bit unpleasant?
> > >
> > > Per pid and per each open fd.
> > >
> > > As said in the other thread, what bothers me a bit in this scheme is that
> > > the cost of obtaining GPU usage scales based on non-GPU criteria.
> > >
> > > For the use case of a top-like tool which shows all processes this is a
> > > smaller additional cost, but for a gpu-top like tool it is somewhat
> > > higher.
> >
> > To further expand, not only would the cost scale per pid multiplied by per
> > open fd, but to detect which of the fds are DRM I see these three options:
> >
> > 1) Open and parse fdinfo.
> > 2) Name based matching, ie. /dev/dri/... something.
> > 3) Stat the symlink target and check for the DRM major.
>
> stat with symlink following should be plenty fast.
>
> > All sound quite sub-optimal to me.
> >
> > Name based matching is probably the least evil on system resource usage
> > (keeping the dentry cache too hot? too many syscalls?), even though
> > fundamentally I don't think it is the right approach.
> >
> > What happens with dup(2) is another question.
>
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
>
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
>
> > Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
>
> When we know we have a problem to solve we can take a look at solutions.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
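[Editor's note: option 3 above (stat the symlink target and check for the DRM major) can be sketched roughly as below. The DRM character-device major is 226 on Linux per Documentation/admin-guide/devices.txt; the helper names are mine, not from the patch series.]

```python
import os
import stat

DRM_MAJOR = 226  # DRM char-device major on Linux (devices.txt)

def is_drm_rdev(st_rdev: int) -> bool:
    """True if the device number belongs to the DRM major."""
    return os.major(st_rdev) == DRM_MAJOR

def drm_fds(pid: int):
    """Yield fd numbers of a process that refer to DRM character devices.

    stat() on /proc/<pid>/fd/<n> follows the symlink, so we get the
    device number of the target without opening or parsing anything.
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        entries = os.listdir(fd_dir)
    except (PermissionError, FileNotFoundError):
        return  # not our process, or it exited mid-walk
    for fd in entries:
        try:
            st = os.stat(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd closed while we were walking
        if stat.S_ISCHR(st.st_mode) and is_drm_rdev(st.st_rdev):
            yield int(fd)
```

[Note that this sketch does not answer the dup(2) question raised in the thread: duplicated fds stat identically, so a tool that wants to count each client once would still need to deduplicate, e.g. on st_ino or on whatever per-client id the fdinfo text ends up exposing.]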
Christian König
2021-May-20 14:12 UTC
[Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
On 20.05.21 at 16:11, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
>> [AMD Official Use Only]
>>
>> Parsing over 550 processes for fdinfo is taking between 40-100ms single
>> threaded on a 2 GHz Skylake (IBRS) within a VM, using simple string
>> comparisons and dirent parsing. And that is pretty much the worst case
>> scenario with some more optimized implementations.
>
> I think this is plenty ok, and if it's not you could probably make this
> massively faster with io_uring for all the fs operations and whack a
> parser generator on top for real parsing speed.

Well, if it becomes a problem, fixing the debugfs "clients" file and
moving it to sysfs shouldn't be much of a problem later on.

Christian.

> So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
> approach at all, and focus more on trying to reasonably (but not too
> much, this is still drm render stuff after all) standardize how it works
> and how we'll extend it all. I think there's tons of good suggestions in
> this thread on this topic already.
>
> /me out
> -Daniel
>
> [snip]
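[Editor's note: the "open and parse fdinfo" option the thread keeps returning to can be sketched as below. The drm-* key layout (drm-driver, drm-engine-<class> with a nanosecond busy counter) follows the convention this series was converging on, but the exact keys vary per driver, and the sample text here is illustrative, not taken from the patches.]

```python
def parse_drm_fdinfo(text: str) -> dict:
    """Return {key: value} for every 'drm-*' line in an fdinfo dump.

    drm-engine-<class> values are parsed to int nanoseconds; everything
    else (driver name, client id, ...) is kept as a stripped string.
    """
    info = {}
    for line in text.splitlines():
        if not line.startswith("drm-"):
            continue  # skip generic fdinfo fields like pos:/flags:
        key, _, value = line.partition(":")
        value = value.strip()
        if key.startswith("drm-engine-") and value.endswith(" ns"):
            info[key] = int(value[:-3])  # accumulated busy time, ns
        else:
            info[key] = value
    return info

# Illustrative fdinfo contents for one DRM client fd:
sample = """\
pos:    0
flags:  02100002
drm-driver:        i915
drm-client-id:     7
drm-engine-render: 123456789 ns
drm-engine-copy:   42 ns
"""
print(parse_drm_fdinfo(sample)["drm-engine-render"])  # 123456789
```

[A gpu-top style tool would call this on two snapshots of the same fd's fdinfo and divide the delta of each drm-engine-* counter by the wall-clock interval to get per-engine utilisation.]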