Daniel Vetter
2019-Apr-15 19:17 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann <tzimmermann at suse.de> wrote:
>
> Hi
>
> Am 15.04.19 um 17:54 schrieb Daniel Vetter:
> > On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> Am 09.04.19 um 09:12 schrieb kraxel at redhat.com:
> >>> Hi,
> >>>
> >>>> If not for TTM, what would be the alternative? One VMA manager per
> >>>> memory region per device?
> >>>
> >>> Depends pretty much on the device.
> >>>
> >>> The cirrus is a display device with only 4 MB of vram. You can't fit
> >>> much in there. A single 1024x768 @ 24bpp framebuffer needs more than 50%
> >>> of the video memory already. Which is why the cirrus driver (before the
> >>> rewrite) had to migrate buffers from/to vram on every page flip [1]. Which
> >>> is one [2] of the reasons why cirrus (after the rewrite) doesn't ttm-manage
> >>> the vram any more. Gem objects are managed with the shmem helpers instead
> >>> and the active framebuffer is blitted to vram.
> >>>
> >>> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> >>> configured to have up to 256 MB. Plenty of room even for multiple 4k
> >>> framebuffers if needed. So for the bochs driver all the ttm bo
> >>> migration logic is not needed; it could just store everything in vram.
> >>>
> >>> A set of drm_gem_vram_* helpers would do the job for bochs.
> >>
> >> Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
> >> is probably a better name for the data structures.
> >
> > +1 on the drm_gem_vram_* naming convention - we want to describe what it's
> > for, not how it's implemented.
>
> OK, great.
>
> >>> I'd expect the same applies to the vbox driver.
> >>>
> >>> Dunno about the other drm drivers and the fbdev drivers you plan to
> >>> migrate over.
> >>
> >> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
> >> for a server. It's similar with mgag200 HW. The old fbdev-supported
> >> devices are all somewhere in the range between cirrus and bochs. Some
> >> drivers would probably benefit from the cirrus approach, some could use
> >> VRAM directly.
> >
> > I think for dumb scanout with vram all we need is:
> > - pin framebuffers, which potentially moves the underlying bo into vram
> > - unpin framebuffers (which is just accounting, we don't want to move the
> >   bo on every flip!)
> > - if a pin doesn't find enough space, move one of the unpinned bo still
> >   resident in vram out
>
> For dumb buffers, I'd expect userspace to have a working set of only a
> front and back buffer (plus maybe a third one). This working set has to
> reside in VRAM for performance reasons; non-WS BOs from other userspace
> programs don't have to be.
>
> So we could simplify even more: if there's not enough free space in
> vram, remove all unpinned BOs. This would avoid the need to implement
> an LRU algorithm or another eviction strategy. Userspace with a WS
> larger than the absolute VRAM would see degraded performance but
> otherwise still work.

You still need a list of unpinned bo, and the lru scan algorithm is
just a few lines of code more than unpinning everything. Plus it'd be
a neat example of the drm_mm scan logic. Given that some folks might
think that not having lru eviction is a problem and go type their own,
I'd just add it. But up to you. Plus with ttm you get it no matter what.
-Daniel

> Best regards
> Thomas
>
> > - no pipelining, no support for dma engines (it's all cpu copies anyway)
> > - a simple drm_mm should be good enough to manage the vram, no need for
> >   the ttm style abstraction over how memory is managed
> > - also just bake in the lru eviction list and algorithm
> > - probably good to have built-in support for redirecting the mmap between
> >   shmem and iomem.
> > - anything that needs pipelining or copy engines would be out of scope for
> >   these helpers
> >
> > I think for starting points we can go with a copy-pasted version of the
> > various ttm implementations we already have, and then simplify from there
> > as needed. Or just start fresh if that's too complicated, due to the issue
> > Christian highlighted.
> > -Daniel
> >
> >> Best regards
> >> Thomas
> >>
> >>> cheers,
> >>> Gerd
> >>>
> >>> [1] Note: The page-flip migration logic is present in some of the other
> >>> drivers too, not sure whether they actually need that due to being low
> >>> on vram too or whether they just copied the old cirrus code ...
> >>>
> >>> [2] The other reason is that this allows converting formats at blit time,
> >>> which helps to deal with some cirrus hardware limitations.
> >>
> >> --
> >> Thomas Zimmermann
> >> Graphics Driver Developer
> >> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> >> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> >> HRB 21284 (AG Nürnberg)
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
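The pin/unpin/evict scheme Daniel describes really is only a handful of lines. The following is a simplified, single-threaded userspace C sketch, not kernel code: all names (`vram_pool`, `vram_bo`, `vram_pin`, ...) are hypothetical, and a pseudo-timestamp stands in for a proper in-kernel LRU list. It shows why the LRU variant costs barely more than "evict everything unpinned": the eviction loop just picks the least-recently-unpinned resident BO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace sketch of the pin/unpin/evict scheme; these are
 * not the real DRM helpers. */

#define MAX_BOS 8

struct vram_bo {
	size_t size;
	bool resident;            /* currently placed in vram */
	bool pinned;              /* pinned for scanout, must not be evicted */
	unsigned long last_unpin; /* pseudo-timestamp for LRU ordering */
};

struct vram_pool {
	size_t total, used;
	unsigned long clock;
	struct vram_bo *bos[MAX_BOS];
	int nr_bos;
};

/* Evict the least-recently-unpinned resident BO; false if none is evictable. */
static bool evict_one_lru(struct vram_pool *p)
{
	struct vram_bo *victim = NULL;

	for (int i = 0; i < p->nr_bos; i++) {
		struct vram_bo *bo = p->bos[i];

		if (!bo->resident || bo->pinned)
			continue;
		if (!victim || bo->last_unpin < victim->last_unpin)
			victim = bo;
	}
	if (!victim)
		return false;
	victim->resident = false;
	p->used -= victim->size;
	return true;
}

/* Pin: make resident, evicting unpinned BOs as needed, then mark pinned. */
static bool vram_pin(struct vram_pool *p, struct vram_bo *bo)
{
	if (!bo->resident) {
		while (p->total - p->used < bo->size)
			if (!evict_one_lru(p))
				return false; /* not enough evictable space */
		bo->resident = true;
		p->used += bo->size;
	}
	bo->pinned = true;
	return true;
}

/* Unpin is pure accounting: the BO stays resident until someone needs room. */
static void vram_unpin(struct vram_pool *p, struct vram_bo *bo)
{
	bo->pinned = false;
	bo->last_unpin = ++p->clock;
}
```

Swapping the `last_unpin` comparison for "evict every unpinned BO" gives Thomas's simpler variant; the difference is literally the victim-selection loop.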
Koenig, Christian
2019-Apr-16 10:05 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
Am 15.04.19 um 21:17 schrieb Daniel Vetter:
> On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann <tzimmermann at suse.de> wrote:
>> Hi
>>
>> Am 15.04.19 um 17:54 schrieb Daniel Vetter:
>>> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> Am 09.04.19 um 09:12 schrieb kraxel at redhat.com:
>>>> [SNIP]
>>>>> I'd expect the same applies to the vbox driver.
>>>>>
>>>>> Dunno about the other drm drivers and the fbdev drivers you plan to
>>>>> migrate over.
>>>> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
>>>> for a server. It's similar with mgag200 HW. The old fbdev-supported
>>>> devices are all somewhere in the range between cirrus and bochs. Some
>>>> drivers would probably benefit from the cirrus approach, some could use
>>>> VRAM directly.
>>> I think for dumb scanout with vram all we need is:
>>> - pin framebuffers, which potentially moves the underlying bo into vram
>>> - unpin framebuffers (which is just accounting, we don't want to move the
>>>   bo on every flip!)
>>> - if a pin doesn't find enough space, move one of the unpinned bo still
>>>   resident in vram out
>> For dumb buffers, I'd expect userspace to have a working set of only a
>> front and back buffer (plus maybe a third one). This working set has to
>> reside in VRAM for performance reasons; non-WS BOs from other userspace
>> programs don't have to be.
>>
>> So we could simplify even more: if there's not enough free space in
>> vram, remove all unpinned BOs. This would avoid the need to implement
>> an LRU algorithm or another eviction strategy. Userspace with a WS
>> larger than the absolute VRAM would see degraded performance but
>> otherwise still work.
> You still need a list of unpinned bo, and the lru scan algorithm is
> just a few lines of code more than unpinning everything. Plus it'd be
> a neat example of the drm_mm scan logic. Given that some folks might
> think that not having lru eviction is a problem and go type their own,
> I'd just add it. But up to you. Plus with ttm you get it no matter what.

Well, how about making a drm_lru component which just does the following
(and nothing else, please :):

1. Keep a list of objects and a spinlock protecting the list.

2. Offer helpers for adding/deleting/moving stuff from the list.

3. Offer functionality to do the necessary dance of picking the first
entry where we can trylock its reservation object.

4. Offer bulk move functionality similar to what TTM does at the moment
(can be implemented later on).

Regards,
Christian.
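Christian's four points map to very little code. Below is a single-threaded userspace C sketch with made-up names, not a proposed kernel API: a plain `bool` stands in for both the list spinlock and the per-object reservation (dma_resv) trylock, and a sentinel-node doubly-linked list stands in for `list_head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded sketch of the proposed drm_lru component.
 * In the kernel the list would be protected by a spinlock and "trylock"
 * would be a trylock on the object's reservation object. */

struct lru_entry {
	struct lru_entry *prev, *next;
	bool resv_locked; /* stand-in for the object's reservation lock */
};

struct lru_list {
	struct lru_entry head; /* sentinel node */
};

/* 1. the list itself */
static void lru_init(struct lru_list *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

/* 2. helpers for adding/deleting/moving entries */
static void lru_add_tail(struct lru_list *lru, struct lru_entry *e)
{
	e->prev = lru->head.prev;
	e->next = &lru->head;
	e->prev->next = e;
	lru->head.prev = e;
}

static void lru_del(struct lru_entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = e;
}

/* moving to the tail is the "most recently used" bump */
static void lru_touch(struct lru_list *lru, struct lru_entry *e)
{
	lru_del(e);
	lru_add_tail(lru, e);
}

/* 3. pick the first entry whose reservation we can trylock; returned with
 * the (fake) reservation held, or NULL if everything is busy. */
static struct lru_entry *lru_pick_locked(struct lru_list *lru)
{
	struct lru_entry *e;

	for (e = lru->head.next; e != &lru->head; e = e->next) {
		if (!e->resv_locked) {
			e->resv_locked = true;
			return e;
		}
	}
	return NULL;
}
```

Point 4 (bulk moves) would splice a whole sublist to the tail in one list operation, which is why it can be layered on later without changing this core.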
Daniel Vetter
2019-Apr-16 11:03 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
On Tue, Apr 16, 2019 at 12:05 PM Koenig, Christian <Christian.Koenig at amd.com> wrote:
>
> Am 15.04.19 um 21:17 schrieb Daniel Vetter:
> > On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >> Hi
> >>
> >> Am 15.04.19 um 17:54 schrieb Daniel Vetter:
> >>> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
> >>>> Hi
> >>>>
> >>>> Am 09.04.19 um 09:12 schrieb kraxel at redhat.com:
> >>>> [SNIP]
> >>>>> I'd expect the same applies to the vbox driver.
> >>>>>
> >>>>> Dunno about the other drm drivers and the fbdev drivers you plan to
> >>>>> migrate over.
> >>>> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
> >>>> for a server. It's similar with mgag200 HW. The old fbdev-supported
> >>>> devices are all somewhere in the range between cirrus and bochs. Some
> >>>> drivers would probably benefit from the cirrus approach, some could use
> >>>> VRAM directly.
> >>> I think for dumb scanout with vram all we need is:
> >>> - pin framebuffers, which potentially moves the underlying bo into vram
> >>> - unpin framebuffers (which is just accounting, we don't want to move the
> >>>   bo on every flip!)
> >>> - if a pin doesn't find enough space, move one of the unpinned bo still
> >>>   resident in vram out
> >> For dumb buffers, I'd expect userspace to have a working set of only a
> >> front and back buffer (plus maybe a third one). This working set has to
> >> reside in VRAM for performance reasons; non-WS BOs from other userspace
> >> programs don't have to be.
> >>
> >> So we could simplify even more: if there's not enough free space in
> >> vram, remove all unpinned BOs. This would avoid the need to implement
> >> an LRU algorithm or another eviction strategy. Userspace with a WS
> >> larger than the absolute VRAM would see degraded performance but
> >> otherwise still work.
> > You still need a list of unpinned bo, and the lru scan algorithm is
> > just a few lines of code more than unpinning everything. Plus it'd be
> > a neat example of the drm_mm scan logic. Given that some folks might
> > think that not having lru eviction is a problem and go type their own,
> > I'd just add it. But up to you. Plus with ttm you get it no matter what.
>
> Well, how about making a drm_lru component which just does the following
> (and nothing else, please :):
>
> 1. Keep a list of objects and a spinlock protecting the list.
>
> 2. Offer helpers for adding/deleting/moving stuff from the list.
>
> 3. Offer functionality to do the necessary dance of picking the first
> entry where we can trylock its reservation object.
>
> 4. Offer bulk move functionality similar to what TTM does at the moment
> (can be implemented later on).

At a basic level, this is a list_head of drm_gem_object. Not sure that's
all that useful (outside of the fairly simplistic vram helpers we're
discussing here). Reason for that is that there's a lot of trickery in
selecting which is the best object to pick in any given case (e.g. do you
want to use drm_mm scanning, or is there a slab of objects you prefer to
throw out because that avoids ...). Given that, I'm not sure implementing
the entire scanning/drm_lru logic is beneficial.

The magic trylock+kref_get_unless_zero otoh could be worth implementing
as a helper, together with a note about how to build your own custom lru
algorithm. Same for some bulk/nonblocking movement helpers maybe. Both
not really needed for the dumb scanout vram helpers we're discussing here.
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
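The trylock+kref_get_unless_zero dance Daniel singles out can be illustrated in isolation. This is a hypothetical single-threaded userspace sketch, not a kernel patch: a `bool` flag stands in for the reservation lock and a plain counter stands in for `struct kref`. The point it demonstrates is the back-off: after trylocking an object found on an LRU list, you may only keep it if its refcount has not already dropped to zero, otherwise it is mid-destruction and you must unlock and move on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the flag models the reservation (dma_resv)
 * trylock, the counter models struct kref. */

struct lru_obj {
	unsigned int refcount; /* stand-in for struct kref */
	bool resv_locked;      /* stand-in for the reservation lock */
};

/* Models kref_get_unless_zero(): take a reference only if the object is
 * still alive, i.e. its refcount has not already dropped to zero. */
static bool get_unless_zero(struct lru_obj *obj)
{
	if (obj->refcount == 0)
		return false;
	obj->refcount++;
	return true;
}

/* The combined dance: trylock the reservation, then try to take a
 * reference; unlock and report failure if the object is already dying. */
static bool lru_trylock_and_ref(struct lru_obj *obj)
{
	if (obj->resv_locked)
		return false; /* lock is contended: skip this object */
	obj->resv_locked = true;
	if (!get_unless_zero(obj)) {
		obj->resv_locked = false; /* mid-destruction: back off */
		return false;
	}
	return true;
}
```

An eviction loop would call this on each LRU candidate in order and stop at the first success, which is exactly the "pick the first entry we can trylock" step from Christian's list, extended with lifetime safety.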