In case anyone's curious about 30bpp framebuffer support, here's the
current status:

Kernel:

Ben and I have switched the code to using a 256-based LUT for Kepler+,
and I've also written a patch to cause the addfb ioctl to use the
proper format. You can pick this up at:

https://github.com/skeggsb/linux/commits/linux-4.16 (note the branch!)
https://patchwork.freedesktop.org/patch/202322/

With these two, you should be able to use "X -depth 30" again on any
G80+ GPU to bring up a screen (as you could in kernel 4.9 and earlier).
However this still has some deficiencies, some of which I've addressed:

xf86-video-nouveau:

DRI3 was broken, and Xv was broken. Patches available at:

https://github.com/imirkin/xf86-video-nouveau/commits/master

mesa:

The NVIDIA hardware (pre-Kepler) can only do XBGR scanout. Further, the
nouveau KMS doesn't add XRGB scanout for Kepler+ (although it could).
Mesa was only enabled for XRGB, so I've piped XBGR through all the same
places:

https://github.com/imirkin/mesa/commits/30bpp

libdrm:

For testing, I added a modetest gradient pattern split horizontally.
The top half is 10bpc, the bottom half is 8bpc. This is useful for
seeing whether you're really getting 10bpc, or if things are getting
truncated along the way. Definitely hacky, but... I wasn't intending on
upstreaming it anyway:

https://github.com/imirkin/drm/commit/9b8776f58448b5745675c3a7f5eb2735e3989441

-------------------------------------

Results with the patches (tested on a GK208B and a "deep color" TV over
HDMI):

- modetest with a 10bpc gradient shows up smoother than an 8bpc
  gradient. However it's still dithered to 8bpc, not "real" 10bpc.
- things generally work in X -- dri2 and dri3, xv, and obviously
  regular X rendering / acceleration
- lots of X software can't handle 30bpp modes (mplayer hates it for xv
  and x11 rendering, aterm bails on shading the root pixmap, probably
  others)

I'm also told that with DP, it should actually send the higher-bpc data
over the wire. With HDMI, we're still stuck at 24bpp for now (although
the hardware can do 36bpp as well). This is why my gradient result
above was still dithered.

Things to do - mostly nouveau-specific, but probably some general infra
needed too:

- Figure out how to properly expose the 1024-sized LUT
- Add fp16 scanout
- Stop relying on the max bpc of the monitor/connector and make
  decisions based on the "effective" bpc (e.g. based on the
  currently-set fb format, take hdmi/dp into account, etc.). This will
  also affect the max clock somehow. Perhaps there should be a way to
  force a connector to a certain bpc.
- Add higher-bpc HDMI support
- Add 10bpc dithering (only makes sense if >= 10bpc output is
  *actually* enabled first)
- Investigate YUV HDMI modes (esp. since they can enable 4K@60 on HDMI
  1.4 hardware)
- Test out Wayland compositors
- Teach xf86-video-modesetting about addfb2, or that nouveau's ordering
  is different.

I don't necessarily plan on working further on this, so if there are
interested parties, they should definitely try to pick it up. I'll try
to upstream all my changes though.

Cheers,

  -ilia
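[The split-gradient test described above boils down to something like
the following -- a minimal sketch, not the linked modetest patch,
assuming an already-mapped XRGB2101010 framebuffer; the function name
and helper arguments are illustrative.]

```c
/*
 * Sketch: fill an XRGB2101010 framebuffer with a horizontal split --
 * a 10bpc gray gradient on the top half and the same gradient
 * truncated to 8bpc on the bottom half. If the full pipeline really
 * carries 10 bits, the top half shows visibly finer steps.
 */
#include <stdint.h>

static void fill_split_gradient(uint32_t *fb, unsigned width,
                                unsigned height, unsigned stride_px)
{
    for (unsigned y = 0; y < height; y++) {
        for (unsigned x = 0; x < width; x++) {
            /* 10-bit gray value, 0..1023, spread across the width */
            uint32_t v10 = (uint32_t)x * 1023 / (width - 1);

            if (y >= height / 2) {
                /* bottom half: truncate to 8 bits, then replicate
                 * back up to 10 so both halves use one format */
                uint32_t v8 = v10 >> 2;
                v10 = (v8 << 2) | (v8 >> 6);
            }

            /* XRGB2101010: X[31:30] R[29:20] G[19:10] B[9:0] */
            fb[y * stride_px + x] = (v10 << 20) | (v10 << 10) | v10;
        }
    }
}
```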
On Sun, Feb 04, 2018 at 06:50:45PM -0500, Ilia Mirkin wrote:
> In case anyone's curious about 30bpp framebuffer support, here's the
> current status:
>
> [...]
>
> Things to do - mostly nouveau-specific, but probably some general
> infra needed too:
>
> - Figure out how to properly expose the 1024-sized LUT

We have the properties in the kernel. Not sure if x11 could expose it
to clients somehow, or would we just have to interpolate the missing
bits in the ddx?

> - Add fp16 scanout

i915 could do this as well. There was a patch to just add the fourcc,
on account of gvt needing it for some Windows thing. IIRC I asked them
to actually implement it in i915 proper, but no patch ever surfaced.

> - Stop relying on the max bpc of the monitor/connector and make
>   decisions based on the "effective" bpc (e.g. based on the
>   currently-set fb format, take hdmi/dp into account, etc.). This
>   will also affect the max clock somehow. Perhaps there should be a
>   way to force a connector to a certain bpc.

We used to look at the fb depth for the primary plane when picking the
output bpc, but that doesn't really work when you have multiple planes,
and you generally don't want to have to do a modeset to flip to an fb
with another format. So in the end we just chose to go for the max bpc
possible. There are some potential issues with deep color though
(crappy HDMI cables, dongles, etc.), so I suggested a property to allow
the user to limit it below a certain value. The problem is that, IIRC,
the patch we got just added it to i915, whereas we really want to put
it into the drm core so that everyone will implement the same thing.

> - Add higher-bpc HDMI support

There's a bunch of interesting stuff in i915 to figure out the
sink/dongle clock limits etc. If someone else is going to implement
HDMI deep color, we should perhaps look into lifting some of that stuff
into some common place.

> - Investigate YUV HDMI modes (esp. since they can enable 4K@60 on
>   HDMI 1.4 hardware)

We have 4:2:0 in i915, and we're pretty close to having YCbCr 4:4:4
too. The 4:4:4 thing would need some new properties though, so that the
user can actually enable it. What we do with 4:2:0 is enable it
automagically when the display can't do RGB 4:4:4 for the given mode.
But there's currently no way for the user to say that they prefer YCbCr
4:2:0 over RGB 4:4:4.

-- 
Ville Syrjälä
Intel OTC
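[The "properties in the kernel" here are the KMS color-management
properties (GAMMA_LUT and friends, merged well before this thread). A
minimal userspace sketch of pushing a 1024-entry LUT through them with
libdrm -- property lookup is simplified and error handling elided; this
is an illustration, not code from the thread.]

```c
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_mode.h>   /* struct drm_color_lut */

/* Look up a property id by name on a KMS object (a CRTC here). */
static uint32_t find_prop_id(int fd, uint32_t crtc_id, const char *name)
{
    drmModeObjectProperties *props =
        drmModeObjectGetProperties(fd, crtc_id, DRM_MODE_OBJECT_CRTC);
    uint32_t id = 0;

    for (uint32_t i = 0; props && i < props->count_props; i++) {
        drmModePropertyRes *p = drmModeGetProperty(fd, props->props[i]);
        if (p && !strcmp(p->name, name))
            id = p->prop_id;
        drmModeFreeProperty(p);
    }
    drmModeFreeObjectProperties(props);
    return id;
}

/* Upload a 1024-entry identity LUT via the GAMMA_LUT blob property. */
static int set_gamma_1024(int fd, uint32_t crtc_id)
{
    struct drm_color_lut lut[1024];
    uint32_t blob_id;

    for (int i = 0; i < 1024; i++) {
        /* entries are 16-bit; spread 0..1023 over 0..0xffff */
        uint16_t v = (uint16_t)((i * 0xffff) / 1023);
        lut[i].red = lut[i].green = lut[i].blue = v;
        lut[i].reserved = 0;
    }

    if (drmModeCreatePropertyBlob(fd, lut, sizeof(lut), &blob_id))
        return -1;

    return drmModeObjectSetProperty(fd, crtc_id, DRM_MODE_OBJECT_CRTC,
                                    find_prop_id(fd, crtc_id, "GAMMA_LUT"),
                                    blob_id);
}
```

[The connector-level limit suggested above would look much the same
from userspace -- drmModeObjectSetProperty() on the connector with a
property along the lines of "max bpc" -- but that name is an assumption
here; in the thread it is still only a proposal.]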
On Wed, Feb 07, 2018 at 06:28:42PM +0200, Ville Syrjälä wrote:
> On Sun, Feb 04, 2018 at 06:50:45PM -0500, Ilia Mirkin wrote:
> > [...]
> > - Figure out how to properly expose the 1024-sized LUT
>
> We have the properties in the kernel. Not sure if x11 could expose it
> to clients somehow, or would we just have to interpolate the missing
> bits in the ddx?

Oh, and I think we're going to have to come up with a fancier uapi for
this stuff, because in the future the input points may not be evenly
spaced (for HDR stuff). Also, the hardware may provide various
different modes for the gamma LUTs, with different tradeoffs. So we may
even want to somehow try to enumerate the different modes and let
userspace pick the one that best suits its needs.

-- 
Ville Syrjälä
Intel OTC
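[Purely hypothetical sketch of the kind of uapi being gestured at here
-- nothing below exists in the kernel as of this thread, and every name
is invented. It only makes the two ideas concrete: LUT entries that
carry their own (non-evenly-spaced) input point, and hardware
advertising a set of LUT modes for userspace to choose from.]

```c
#include <stdint.h>

/* HYPOTHETICAL: a LUT entry with an explicit input point, allowing
 * dense sampling near black and sparse sampling in the highlights. */
struct color_lut_entry {
    uint32_t input;              /* input point, e.g. Q16.16 fixed point */
    uint16_t red, green, blue;
};

/* HYPOTHETICAL: one gamma-LUT mode the hardware can run in. */
struct color_lut_mode {
    uint32_t num_entries;        /* entry count this mode supports */
    uint32_t flags;
#define LUT_MODE_EVENLY_SPACED   (1u << 0)  /* 'input' implied by index */
#define LUT_MODE_INTERPOLATED    (1u << 1)  /* hw interpolates between
                                               entries */
};

/*
 * The kernel would expose an array of color_lut_mode descriptors per
 * CRTC; userspace enumerates them, picks the one that best fits its
 * needs (precision vs. entry count), and uploads a blob of
 * color_lut_entry records for that mode.
 */
```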
On 02/05/2018 12:50 AM, Ilia Mirkin wrote:
> [...]
>
> mesa:
>
> The NVIDIA hardware (pre-Kepler) can only do XBGR scanout. Further,
> the nouveau KMS doesn't add XRGB scanout for Kepler+ (although it
> could). Mesa was only enabled for XRGB, so I've piped XBGR through
> all the same places:
>
> https://github.com/imirkin/mesa/commits/30bpp

Wrt. mesa, those patches are now in master, and I think we have a bit
of a problem under X11+GLX:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/dri/dri_screen.c#n108

dri_fill_in_modes() defines MESA_FORMAT_R10G10B10A2_UNORM and
MESA_FORMAT_R10G10B10X2_UNORM at the top, in between the BGRX/A
formats, ignoring the instructions that:

   /* The 32-bit RGBA format must not precede the 32-bit BGRA format.
    * Likewise for RGBX and BGRX. Otherwise, the GLX client and the GLX
    * server may disagree on which format the GLXFBConfig represents,
    * resulting in swapped color channels.
    */

RGBA/X formats should only be exposed if

   dri_loader_get_cap(screen, DRI_LOADER_CAP_RGBA_ORDERING)

and that is only the case for the Android loader.

The GLX code doesn't use the red/green/blueChannelMasks for proper
matching of formats, and the server doesn't even transmit those masks
to the client in the case of GLX. So whatever 10-bit format comes first
will win when building the assignment to GLXFBConfigs.

I looked at the code and how it behaves. In practice, Intel gfx works
because it's a classic DRI driver with its own method of building the
DRIconfigs, and it only exposes the BGR101010 formats, so there's no
danger of mixups. AMD's gallium drivers expose both BGR- and
RGB-ordered 10-bit formats, but due to the ordering, the matching ends
up only assigning the desired BGR formats that are good for AMD
hardware, discarding the RGB formats. nouveau works because it only
exposes the desired RGB format for the hardware. But with other gallium
drivers for some SoCs, or future gallium drivers, it is not so clear
that the right thing will happen. E.g., freedreno seems to support both
BGR and RGB 10-bit formats as PIPE_BIND_DISPLAY_TARGET afaics, so I
don't know whether the right thing would happen by luck.

Afaics EGL does the right thing wrt. channelmask matching of EGLConfigs
to DRIconfigs, so we could probably implement

   dri_loader_get_cap(screen, DRI_LOADER_CAP_RGBA_ORDERING) == TRUE

for the EGL loaders.

But for GLX it is not so easy or quick. I looked at whether I could
make the server's GLX send proper channelmask attributes and Mesa parse
them, but there aren't any GLX tags defined for channel masks, and all
the other tags come from official GLX extension headers. I'm not sure
what the proper procedure for defining new tags is? Do we have to
define a new GLX extension for that, get it into the Khronos registry,
and then back into the server/mesa code-base?

The current patches in mesa for XBGR also lack enablement pieces for
EGL, Wayland and X11 compositing, but that's a different problem.

-mario
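[The fix Mario is describing amounts to matching configs by channel
masks instead of by list position. A sketch of that comparison with
illustrative stand-in types -- the real code would pull these masks out
of the DRIconfig attributes and the server-side visuals, and the wire
protocol for transmitting them is exactly the missing GLX piece.]

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-in for the channel-mask fields of a client-side
 * GLXFBConfig or a driver-side DRIconfig. */
struct fb_channels {
    uint32_t red_mask, green_mask, blue_mask;
};

/*
 * Match by channel masks, the way EGL matches EGLConfigs to
 * DRIconfigs. With this, BGR-ordered x2r10g10b10 (red mask
 * 0x3ff00000) and RGB-ordered x2b10g10r10 (red mask 0x000003ff)
 * can never be confused, regardless of the order in which the
 * driver advertised them.
 */
static bool channels_match(const struct fb_channels *glx,
                           const struct fb_channels *dri)
{
    return glx->red_mask   == dri->red_mask &&
           glx->green_mask == dri->green_mask &&
           glx->blue_mask  == dri->blue_mask;
}
```

[As the mail notes, the missing piece isn't the comparison itself but
the protocol: GLX defines no tags for sending these masks from server
to client.]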
On Mon, Mar 5, 2018 at 2:25 AM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> [...]
>
> E.g., freedreno seems to support both BGR and RGB 10-bit formats as
> PIPE_BIND_DISPLAY_TARGET afaics, so I don't know whether the right
> thing would happen by luck.

FWIW freedreno does not presently support 10bpc scanout.

> But for GLX it is not so easy or quick. I looked at whether I could
> make the server's GLX send proper channelmask attributes and Mesa
> parse them, but there aren't any GLX tags defined for channel masks,
> and all the other tags come from official GLX extension headers. I'm
> not sure what the proper procedure for defining new tags is? Do we
> have to define a new GLX extension for that, get it into the Khronos
> registry, and then back into the server/mesa code-base?

Can all of this be solved by a healthy dose of "don't do that"? i.e.
make sure that the DDX only ever exposes one of these at a time? And
also make the mesa driver only expose one as a DISPLAY_TARGET?

> The current patches in mesa for XBGR also lack enablement pieces for
> EGL, Wayland and X11 compositing, but that's a different problem.

EGL/drm and EGL/wayland should be enabled (look at Daniel Stone's
patches from a short while back, also upstream now). kmscube (with some
patches that are upstream now) and weston both run OK for me. I think
EGL/x11 is iffy though - haven't played with it.

  -ilia