Daniel Vetter
2019-Apr-15 19:17 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann <tzimmermann at suse.de> wrote:
>
> Hi
>
> On 15.04.19 at 17:54, Daniel Vetter wrote:
> > On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> On 09.04.19 at 09:12, kraxel at redhat.com wrote:
> >>> Hi,
> >>>
> >>>> If not for TTM, what would be the alternative? One VMA manager per
> >>>> memory region per device?
> >>>
> >>> Depends pretty much on the device.
> >>>
> >>> The cirrus is a display device with only 4 MB of vram. You can't fit
> >>> much in there. A single 1024x768 @ 24bpp framebuffer needs more than 50%
> >>> of the video memory already. Which is why the cirrus driver (before the
> >>> rewrite) had to migrate buffers from/to vram on every page flip[1]. Which
> >>> is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
> >>> vram any more. gem objects are managed with the shmem helpers instead
> >>> and the active framebuffer is blitted to vram.
> >>>
> >>> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> >>> configured to have up to 256 MB. Plenty of room even for multiple 4k
> >>> framebuffers if needed. So for the bochs driver all the ttm bo
> >>> migration logic is not needed, it could just store everything in vram.
> >>>
> >>> A set of drm_gem_vram_* helpers would do the job for bochs.
> >>
> >> Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
> >> is probably a better name for the data structures.
> >
> > +1 on drm_gem_vram_* naming convention - we want to describe what it's
> > for, not how it's implemented.
>
> OK, great.
>
> >>> I'd expect the same applies to the vbox driver.
> >>>
> >>> Dunno about the other drm drivers and the fbdev drivers you plan to
> >>> migrate over.
> >>
> >> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
> >> for a server. It's similar with mgag200 HW. The old fbdev-supported
> >> devices are all somewhere in the range between cirrus and bochs. Some
> >> drivers would probably benefit from the cirrus approach, some could use
> >> VRAM directly.
> >
> > I think for dumb scanout with vram all we need is:
> > - pin framebuffers, which potentially moves the underlying bo into vram
> > - unpin framebuffers (which is just accounting, we don't want to move the
> > bo on every flip!)
> > - if a pin doesn't find enough space, move one of the unpinned bo still
> > resident in vram out
>
> For dumb buffers, I'd expect userspace to have a working set of only a
> front and back buffer (plus maybe a third one). This working set has to
> reside in VRAM for performance reasons; non-WS BOs from other userspace
> programs don't have to be.
>
> So we could simplify even more: if there's not enough free space in
> vram, remove all unpinned BOs. This would avoid the need to implement
> an LRU algorithm or another eviction strategy. Userspace with a WS
> larger than the absolute VRAM would see degraded performance but
> otherwise still work.

You still need a list of unpinned bo, and the lru scan algorithm is
just a few lines of code more than unpinning everything. Plus it'd be
a neat example of the drm_mm scan logic. Given that some folks might
think that not having lru eviction is a problem and end up typing their
own, I'd just add it. But up to you. Plus with ttm you get it no matter
what.
-Daniel

> Best regards
> Thomas
>
> > - no pipelining, no support for dma engines (it's all cpu copies anyway)
> > - a simple drm_mm should be good enough to manage the vram, no need for
> > the ttm style abstraction over how memory is managed
> > - also just bake in the lru eviction list and algorithm
> > - probably good to have built-in support for redirecting the mmap between
> > shmem and iomem.
> > - anything that needs pipelining or copy engines would be out of scope for
> > these helpers
> >
> > I think for starting points we can go with a copypasted version of the
> > various ttm implementations we already have, and then simplify from there
> > as needed. Or just start fresh if that's too complicated, due to the issue
> > Christian highlighted.
> > -Daniel
> >
> >> Best regards
> >> Thomas
> >>
> >>>
> >>> cheers,
> >>> Gerd
> >>>
> >>> [1] Note: The page-flip migration logic is present in some of the other
> >>> drivers too, not sure whether they actually need that due to being low
> >>> on vram too or whether they just copied the old cirrus code ...
> >>>
> >>> [2] The other reason is that this allows converting formats at blit time,
> >>> which helps to deal with some cirrus hardware limitations.
> >>
> >> --
> >> Thomas Zimmermann
> >> Graphics Driver Developer
> >> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> >> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> >> HRB 21284 (AG Nürnberg)

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
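The pin / accounting-only unpin / evict-on-failed-pin checklist in this message can be illustrated with a small userspace C sketch. It is not the drm_gem_vram_* helpers and not how TTM does it: every type and function name is hypothetical, there is no locking, and VRAM is modeled as a plain byte budget that ignores fragmentation, whereas the proposal would use drm_mm (and its scan logic) to find a contiguous hole. Unpinning only appends the BO to an LRU list; a pin that does not fit evicts the least recently unpinned BOs until it does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, loosely modeled on the kernel's list_head. */
struct list { struct list *prev, *next; };

static void list_init(struct list *l) { l->prev = l->next = l; }
static bool list_empty(const struct list *l) { return l->next == l; }

static void list_del_init(struct list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list *e, struct list *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Hypothetical types: illustrative only, not the drm_gem_vram_* API. */
struct vram_bo {
	size_t size;
	int pin_count;
	bool in_vram;
	struct list lru;   /* on vram_mm.lru only while resident and unpinned */
};

struct vram_mm {
	size_t free;       /* VRAM bytes not used by resident BOs (fragmentation ignored) */
	struct list lru;   /* resident, unpinned BOs; least recently unpinned first */
};

static void vram_mm_init(struct vram_mm *mm, size_t vram_size)
{
	mm->free = vram_size;
	list_init(&mm->lru);
}

static void vram_bo_init(struct vram_bo *bo, size_t size)
{
	bo->size = size;
	bo->pin_count = 0;
	bo->in_vram = false;
	list_init(&bo->lru);
}

/* Evict the least recently unpinned BO; a real driver would copy it out with the CPU. */
static void vram_evict_lru(struct vram_mm *mm)
{
	struct vram_bo *victim = (struct vram_bo *)
		((char *)mm->lru.next - offsetof(struct vram_bo, lru));

	list_del_init(&victim->lru);
	victim->in_vram = false;
	mm->free += victim->size;
}

static int vram_bo_pin(struct vram_mm *mm, struct vram_bo *bo)
{
	if (bo->pin_count++ > 0)
		return 0;                 /* already pinned, so already resident */
	if (bo->in_vram) {
		list_del_init(&bo->lru);  /* resident but unpinned: just leave the LRU */
		return 0;
	}
	while (mm->free < bo->size && !list_empty(&mm->lru))
		vram_evict_lru(mm);
	if (mm->free < bo->size) {
		bo->pin_count--;
		return -ENOSPC;           /* everything still in VRAM is pinned */
	}
	mm->free -= bo->size;         /* stands in for moving the BO into VRAM */
	bo->in_vram = true;
	return 0;
}

/* Unpinning is only accounting: the BO stays in VRAM until its space is needed. */
static void vram_bo_unpin(struct vram_mm *mm, struct vram_bo *bo)
{
	assert(bo->pin_count > 0);
	if (--bo->pin_count == 0)
		list_add_tail(&bo->lru, &mm->lru);
}
```

With a 4 MiB VRAM as on cirrus, one 1024x768 @ 24bpp framebuffer takes 1024 x 768 x 3 = 2,359,296 bytes, about 56% of the VRAM, so pinning a second one fails with -ENOSPC while the first is pinned, and succeeds once the first is unpinned, by evicting it. Thomas's evict-everything variant would only change the while loop to drain the whole LRU once a pin does not fit, which is why the LRU version costs just a few more lines.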
Koenig, Christian
2019-Apr-16 10:05 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
On 15.04.19 at 21:17, Daniel Vetter wrote:
> On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann <tzimmermann at suse.de> wrote:
>> [SNIP]
>> So we could simplify even more: if there's not enough free space in
>> vram, remove all unpinned BOs. This would avoid the need to implement
>> an LRU algorithm or another eviction strategy. Userspace with a WS
>> larger than the absolute VRAM would see degraded performance but
>> otherwise still work.
> You still need a list of unpinned bo, and the lru scan algorithm is
> just a few lines of code more than unpinning everything. Plus it'd be
> a neat example of the drm_mm scan logic. Given that some folks might
> think that not having lru eviction is a problem and end up typing
> their own, I'd just add it. But up to you. Plus with ttm you get it no
> matter what.

Well, how about making a drm_lru component which just does the following
(and nothing else, please :):

1. Keep a list of objects and a spinlock protecting the list.

2. Offer helpers for adding/deleting/moving stuff from the list.

3. Offer a functionality to do the necessary dance of picking the first
entry where we can trylock its reservation object.

4. Offer bulk move functionality similar to what TTM does at the moment
(can be implemented later on).

Regards,
Christian.
Daniel Vetter
2019-Apr-16 11:03 UTC
[PATCH 00/15] Share TTM code among framebuffer drivers
On Tue, Apr 16, 2019 at 12:05 PM Koenig, Christian <Christian.Koenig at amd.com> wrote:
>
> On 15.04.19 at 21:17, Daniel Vetter wrote:
> > [SNIP]
>
> Well, how about making a drm_lru component which just does the following
> (and nothing else, please :):
>
> 1. Keep a list of objects and a spinlock protecting the list.
>
> 2. Offer helpers for adding/deleting/moving stuff from the list.
>
> 3. Offer a functionality to do the necessary dance of picking the first
> entry where we can trylock its reservation object.
>
> 4. Offer bulk move functionality similar to what TTM does at the moment
> (can be implemented later on).

At a basic level, this is a list_head of drm_gem_object. Not sure that's
all that useful (outside of the fairly simplistic vram helpers we're
discussing here). The reason is that there's a lot of trickery in
selecting the best object to pick in any given case (e.g. do you want to
use drm_mm scanning, or is there a slab of objects you prefer to throw out
because that avoids ...). Given that, I'm not sure implementing the entire
scanning/drm_lru logic is beneficial.

The magic trylock+kref_get_unless_zero otoh could be worth implementing as
a helper, together with a note about how to build your own custom lru
algorithm. Same for some bulk/nonblocking movement helpers maybe. Neither
is really needed for the dumb scanout vram helpers we're discussing here.
-Daniel
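The "trylock+kref_get_unless_zero" dance mentioned here can be sketched in userspace C as below. All names are hypothetical; an atomic_int emulates struct kref and an atomic_bool emulates the reservation lock. The sketch assumes that an object whose last reference was dropped stays linked (and its memory valid) until its release path removes it under the list lock, so a scan may still see it with a refcount of zero and must skip it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: 'refcount' plays struct kref, 'locked' the reservation lock. */
struct node { struct node *prev, *next; };

struct gem_obj {
	struct node lru;
	atomic_int refcount;
	atomic_bool locked;
};

static void gem_obj_init(struct gem_obj *o, int refs)
{
	o->lru.prev = o->lru.next = &o->lru;
	atomic_init(&o->refcount, refs);
	atomic_init(&o->locked, false);
}

static void lru_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Same contract as kref_get_unless_zero(): take a reference only if one is still held. */
static bool obj_get_unless_zero(struct gem_obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current count and we retry */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static bool obj_trylock(struct gem_obj *o)
{
	return !atomic_exchange_explicit(&o->locked, true, memory_order_acquire);
}

static void obj_unlock(struct gem_obj *o)
{
	assert(atomic_load(&o->locked));
	atomic_store_explicit(&o->locked, false, memory_order_release);
}

/*
 * Hypothetical helper: return the first object that is both lockable and
 * still alive, locked and with an extra reference. The caller is assumed to
 * hold the LRU list lock (not modeled) and must later unlock and put it.
 */
static struct gem_obj *lru_grab_first(struct node *head)
{
	for (struct node *n = head->next; n != head; n = n->next) {
		struct gem_obj *o = (struct gem_obj *)((char *)n - offsetof(struct gem_obj, lru));

		if (!obj_trylock(o))
			continue;               /* busy: someone else holds the lock */
		if (!obj_get_unless_zero(o)) {
			obj_unlock(o);          /* dying: last reference already gone */
			continue;
		}
		return o;
	}
	return NULL;
}
```

Taking the lock before the reference is a design choice of this sketch: a failed attempt then only needs an unlock, so no reference is ever dropped (and no release path ever run) while the list lock is held.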