Daniel Vetter
2019-Oct-15 14:19 UTC
[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod
On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:> Builds upon the existing NVIDIA 16Bx2 block linear > format modifiers by adding more "fields" to the > existing parameterized > DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier > macro that allow fully defining a unique-across- > all-NVIDIA-hardware bit layout using a minimal > set of fields and values. The new modifier macro > DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is > effectively backwards compatible with the existing > macro, introducing a superset of the previously > definable format modifiers. > > Backwards compatibility has two quirks. First, > the zero value for the "kind" field, which is > implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK > macro, must be special cased in drivers and > assumed to map to the pre-Turing generic kind of > 0xfe, since a kind of "zero" is reserved for > linear buffer layouts on all GPUs. > > Second, it is assumed backwards compatibility > is only needed when running on Tegra GPUs, and > specifically Tegra GPUs prior to Xavier. This > is based on two assertions: > > -Tegra GPUs prior to Xavier used a slightly > different raw bit layout than desktop GPUs, > making it impossible to directly share block > linear buffers between the two. > > -Support for the existing block linear modifiers > was incomplete, making them useful only for > exporting buffers created by nouveau and > importing them to Tegra DRM as framebuffers for > scan out. There was no support for adding > framebuffers using format modifiers in nouveau, > nor importing dma-buf/PRIME GEM objects into > nouveau userspace drivers with modifiers in Mesa. > > Hence it is assumed the prior modifiers were not > intended for use on desktop GPUs, and as a > corrolary, were not intended to support sharing > block linear buffers across two different NVIDIA > GPUs. > > Signed-off-by: James Jones <jajones at nvidia.com> > --- > include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++--- > 1 file changed, 100 insertions(+), 8 deletions(-) > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index 3feeaa3f987a..cc9853d42a24 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -497,7 +497,99 @@ extern "C" { > #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) > > /* > - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later > + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, > + * and Tegra GPUs starting with Tegra K1. > + * > + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies > + * based on the architecture generation. GOBs themselves are then arranged in > + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power > + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents > + * a block depth or height of "4"). > + * > + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format > + * in full detail. > + * > + * Macro > + * Bits Param Description > + * ---- ----- ----------------------------------------------------------------- > + * > + * 3:0 h log2(height) of each block, in GOBs. Placed here for > + * compatibility with the existing > + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > + * > + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for > + * compatibility with the existing > + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > + * > + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block > + * size). Must be zero. > + * > + * Note there is no log2(width) parameter. Some portions of the > + * hardware support a block width of two gobs, but it is impractical > + * to use due to lack of support elsewhere, and has no known > + * benefits. > + * > + * 11:9 - Reserved (To support 2D-array textures with variable array stride > + * in blocks, specified via log2(tile width in blocks)). Must be > + * zero. > + * > + * 19:12 k Page Kind. This value directly maps to a field in the page > + * tables of all GPUs >= NV50. It affects the exact layout of bits > + * in memory and can be derived from the tuple > + * > + * (format, GPU model, compression type, samples per pixel) > + * > + * Where compression type is defined below. If GPU model were > + * implied by the format modifier, format, or memory buffer, page > + * kind would not need to be included in the modifier itself, but > + * since the modifier should define the layout of the associated > + * memory buffer independent from any device or other context, it > + * must be included here. > + * > + * To grandfather in prior block linear format modifiers to this > + * layout, the page kind "0", which corresponds to "pitch/linear" > + * and hence is unusable with block-linear layouts, is remapped > + * within drivers to the value 0xfe, which corresponds to the > + * "generic" kind used for simple single-sample color formats on > + * pre-Turing GPUs.Hm, maybe a tiny static inline function which canonizalizes modifiers? Something like static inline u64 drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool is_pre_turing) { } Would then give you a nice place to stick this backward compat note and make it really clear what should be done. I think establishing this as a pattern would also be nice, since I'm sure we'll have a pile more of these cases where modifiers turn out to assume a few too many things about the platform they're used on (we have a similar case on the intel side too). Just a drive-by idea, feel free to ignore. Cheers, Daniel> + * > + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed > + * starting with Fermi GPUs. Additionally, the mapping between page > + * kind and bit layout has changed at various points. > + * > + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping > + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping > + * 2 = Gob Height 8, Turing+ Page Kind mapping > + * 3 = Reserved for future use. > + * > + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further > + * bit remapping step that occurs at an even lower level than the > + * page kind and block linear swizzles. This causes the layout of > + * surfaces mapped in those SOC's GPUs to be incompatible with the > + * equivalent mapping on other GPUs in the same system. > + * > + * 0 = Tegra K1 - Tegra Parker/TX2 Layout. > + * 1 = Desktop GPU and Tegra Xavier+ Layout > + * > + * 24:23 c Lossless Framebuffer Compression type. > + * > + * 0 = none > + * 1 = ROP/3D, actual compression implied by the Page Kind field > + * 2 = CDE horizontal > + * 3 = CDE vertical > + * > + * 55:25 - Reserved for future use. Must be zero. > + */ > +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \ > + fourcc_mod_code(NVIDIA, (0x10 | \ > + ((h) & 0xf) | \ > + (((k) & 0xff) << 12) | \ > + (((g) & 0x3) << 20) | \ > + (((s) & 0x1) << 22) | \ > + (((c) & 0x3) << 23))) > + > +/* > + * 16Bx2 Block Linear layout, used by Tegra K1 and later > * > * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked > * vertically by a power of 2 (1 to 32 GOBs) to form a block. > @@ -518,20 +610,20 @@ extern "C" { > * in full detail. > */ > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \ > - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf)) > + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v)) > > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \ > - fourcc_mod_code(NVIDIA, 0x10) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0) > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \ > - fourcc_mod_code(NVIDIA, 0x11) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1) > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \ > - fourcc_mod_code(NVIDIA, 0x12) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2) > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \ > - fourcc_mod_code(NVIDIA, 0x13) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3) > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \ > - fourcc_mod_code(NVIDIA, 0x14) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4) > #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \ > - fourcc_mod_code(NVIDIA, 0x15) > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5) > > /* > * Some Broadcom modifiers take parameters, for example the number of > -- > 2.17.1 >-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
James Jones
2019-Oct-15 15:07 UTC
[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod
On 10/15/19 7:19 AM, Daniel Vetter wrote:> On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote: >> Builds upon the existing NVIDIA 16Bx2 block linear >> format modifiers by adding more "fields" to the >> existing parameterized >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier >> macro that allow fully defining a unique-across- >> all-NVIDIA-hardware bit layout using a minimal >> set of fields and values. The new modifier macro >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is >> effectively backwards compatible with the existing >> macro, introducing a superset of the previously >> definable format modifiers. >> >> Backwards compatibility has two quirks. First, >> the zero value for the "kind" field, which is >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK >> macro, must be special cased in drivers and >> assumed to map to the pre-Turing generic kind of >> 0xfe, since a kind of "zero" is reserved for >> linear buffer layouts on all GPUs. >> >> Second, it is assumed backwards compatibility >> is only needed when running on Tegra GPUs, and >> specifically Tegra GPUs prior to Xavier. This >> is based on two assertions: >> >> -Tegra GPUs prior to Xavier used a slightly >> different raw bit layout than desktop GPUs, >> making it impossible to directly share block >> linear buffers between the two. >> >> -Support for the existing block linear modifiers >> was incomplete, making them useful only for >> exporting buffers created by nouveau and >> importing them to Tegra DRM as framebuffers for >> scan out. There was no support for adding >> framebuffers using format modifiers in nouveau, >> nor importing dma-buf/PRIME GEM objects into >> nouveau userspace drivers with modifiers in Mesa. >> >> Hence it is assumed the prior modifiers were not >> intended for use on desktop GPUs, and as a >> corrolary, were not intended to support sharing >> block linear buffers across two different NVIDIA >> GPUs. >> >> Signed-off-by: James Jones <jajones at nvidia.com> >> --- >> include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++--- >> 1 file changed, 100 insertions(+), 8 deletions(-) >> >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h >> index 3feeaa3f987a..cc9853d42a24 100644 >> --- a/include/uapi/drm/drm_fourcc.h >> +++ b/include/uapi/drm/drm_fourcc.h >> @@ -497,7 +497,99 @@ extern "C" { >> #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) >> >> /* >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later >> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, >> + * and Tegra GPUs starting with Tegra K1. >> + * >> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies >> + * based on the architecture generation. GOBs themselves are then arranged in >> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power >> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents >> + * a block depth or height of "4"). >> + * >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format >> + * in full detail. >> + * >> + * Macro >> + * Bits Param Description >> + * ---- ----- ----------------------------------------------------------------- >> + * >> + * 3:0 h log2(height) of each block, in GOBs. Placed here for >> + * compatibility with the existing >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. >> + * >> + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for >> + * compatibility with the existing >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. >> + * >> + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block >> + * size). Must be zero. >> + * >> + * Note there is no log2(width) parameter. Some portions of the >> + * hardware support a block width of two gobs, but it is impractical >> + * to use due to lack of support elsewhere, and has no known >> + * benefits. >> + * >> + * 11:9 - Reserved (To support 2D-array textures with variable array stride >> + * in blocks, specified via log2(tile width in blocks)). Must be >> + * zero. >> + * >> + * 19:12 k Page Kind. This value directly maps to a field in the page >> + * tables of all GPUs >= NV50. It affects the exact layout of bits >> + * in memory and can be derived from the tuple >> + * >> + * (format, GPU model, compression type, samples per pixel) >> + * >> + * Where compression type is defined below. If GPU model were >> + * implied by the format modifier, format, or memory buffer, page >> + * kind would not need to be included in the modifier itself, but >> + * since the modifier should define the layout of the associated >> + * memory buffer independent from any device or other context, it >> + * must be included here. >> + * >> + * To grandfather in prior block linear format modifiers to this >> + * layout, the page kind "0", which corresponds to "pitch/linear" >> + * and hence is unusable with block-linear layouts, is remapped >> + * within drivers to the value 0xfe, which corresponds to the >> + * "generic" kind used for simple single-sample color formats on >> + * pre-Turing GPUs. > > Hm, maybe a tiny static inline function which canonizalizes modifiers? > Something like > > static inline u64 > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool > is_pre_turing) > { > } > > Would then give you a nice place to stick this backward compat note and > make it really clear what should be done. I think establishing this as a > pattern would also be nice, since I'm sure we'll have a pile more of these > cases where modifiers turn out to assume a few too many things about the > platform they're used on (we have a similar case on the intel side too).To make sure I'm clear, it would behave like this? fixed_mod = canonicalize(old_style_valid_mod, true); assert(fixed_mod == old_style_valid_mod | (0xfe << 12)); fixed_mod = canonicalize(new_style_valid_mod, [false,true]); assert(fixed_mod == new_style_valid_mod); I'm unclear what it should do in this case though: fixed_mod = canonicalize(old_style_valid_mod, false); Since there's no code out there using the old style modifiers with Turing+ yet, and I don't want to try to support such usage. Maybe just drop the "is_pre_turing" parameter and always canonicalize by mapping 0 -> 0xfe as the comment above states, and not touching other values? Or should any invalid modifier, including this case, return DRM_FORMAT_MOD_INVALID? This latter idea seems risky because it would cause software compiled against old drm_fourcc.h to potentially reject format modifiers from newer kernels or libraries with an expanded representation, but maybe that's what we want if only driver components are supposed to call this function. Thanks, -James> Just a drive-by idea, feel free to ignore. > > Cheers, Daniel > >> + * >> + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed >> + * starting with Fermi GPUs. Additionally, the mapping between page >> + * kind and bit layout has changed at various points. >> + * >> + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping >> + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping >> + * 2 = Gob Height 8, Turing+ Page Kind mapping >> + * 3 = Reserved for future use. >> + * >> + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further >> + * bit remapping step that occurs at an even lower level than the >> + * page kind and block linear swizzles. This causes the layout of >> + * surfaces mapped in those SOC's GPUs to be incompatible with the >> + * equivalent mapping on other GPUs in the same system. >> + * >> + * 0 = Tegra K1 - Tegra Parker/TX2 Layout. >> + * 1 = Desktop GPU and Tegra Xavier+ Layout >> + * >> + * 24:23 c Lossless Framebuffer Compression type. >> + * >> + * 0 = none >> + * 1 = ROP/3D, actual compression implied by the Page Kind field >> + * 2 = CDE horizontal >> + * 3 = CDE vertical >> + * >> + * 55:25 - Reserved for future use. Must be zero. >> + */ >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \ >> + fourcc_mod_code(NVIDIA, (0x10 | \ >> + ((h) & 0xf) | \ >> + (((k) & 0xff) << 12) | \ >> + (((g) & 0x3) << 20) | \ >> + (((s) & 0x1) << 22) | \ >> + (((c) & 0x3) << 23))) >> + >> +/* >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later >> * >> * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked >> * vertically by a power of 2 (1 to 32 GOBs) to form a block. >> @@ -518,20 +610,20 @@ extern "C" { >> * in full detail. >> */ >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \ >> - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf)) >> + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v)) >> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \ >> - fourcc_mod_code(NVIDIA, 0x10) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0) >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \ >> - fourcc_mod_code(NVIDIA, 0x11) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1) >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \ >> - fourcc_mod_code(NVIDIA, 0x12) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2) >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \ >> - fourcc_mod_code(NVIDIA, 0x13) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3) >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \ >> - fourcc_mod_code(NVIDIA, 0x14) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4) >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \ >> - fourcc_mod_code(NVIDIA, 0x15) >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5) >> >> /* >> * Some Broadcom modifiers take parameters, for example the number of >> -- >> 2.17.1 >> >
Daniel Vetter
2019-Oct-15 15:42 UTC
[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod
On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones at nvidia.com> wrote:> > On 10/15/19 7:19 AM, Daniel Vetter wrote: > > On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote: > >> Builds upon the existing NVIDIA 16Bx2 block linear > >> format modifiers by adding more "fields" to the > >> existing parameterized > >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier > >> macro that allow fully defining a unique-across- > >> all-NVIDIA-hardware bit layout using a minimal > >> set of fields and values. The new modifier macro > >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is > >> effectively backwards compatible with the existing > >> macro, introducing a superset of the previously > >> definable format modifiers. > >> > >> Backwards compatibility has two quirks. First, > >> the zero value for the "kind" field, which is > >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK > >> macro, must be special cased in drivers and > >> assumed to map to the pre-Turing generic kind of > >> 0xfe, since a kind of "zero" is reserved for > >> linear buffer layouts on all GPUs. > >> > >> Second, it is assumed backwards compatibility > >> is only needed when running on Tegra GPUs, and > >> specifically Tegra GPUs prior to Xavier. This > >> is based on two assertions: > >> > >> -Tegra GPUs prior to Xavier used a slightly > >> different raw bit layout than desktop GPUs, > >> making it impossible to directly share block > >> linear buffers between the two. > >> > >> -Support for the existing block linear modifiers > >> was incomplete, making them useful only for > >> exporting buffers created by nouveau and > >> importing them to Tegra DRM as framebuffers for > >> scan out. There was no support for adding > >> framebuffers using format modifiers in nouveau, > >> nor importing dma-buf/PRIME GEM objects into > >> nouveau userspace drivers with modifiers in Mesa. > >> > >> Hence it is assumed the prior modifiers were not > >> intended for use on desktop GPUs, and as a > >> corrolary, were not intended to support sharing > >> block linear buffers across two different NVIDIA > >> GPUs. > >> > >> Signed-off-by: James Jones <jajones at nvidia.com> > >> --- > >> include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++--- > >> 1 file changed, 100 insertions(+), 8 deletions(-) > >> > >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > >> index 3feeaa3f987a..cc9853d42a24 100644 > >> --- a/include/uapi/drm/drm_fourcc.h > >> +++ b/include/uapi/drm/drm_fourcc.h > >> @@ -497,7 +497,99 @@ extern "C" { > >> #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) > >> > >> /* > >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later > >> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, > >> + * and Tegra GPUs starting with Tegra K1. > >> + * > >> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies > >> + * based on the architecture generation. GOBs themselves are then arranged in > >> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power > >> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents > >> + * a block depth or height of "4"). > >> + * > >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format > >> + * in full detail. > >> + * > >> + * Macro > >> + * Bits Param Description > >> + * ---- ----- ----------------------------------------------------------------- > >> + * > >> + * 3:0 h log2(height) of each block, in GOBs. Placed here for > >> + * compatibility with the existing > >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > >> + * > >> + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for > >> + * compatibility with the existing > >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > >> + * > >> + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block > >> + * size). Must be zero. > >> + * > >> + * Note there is no log2(width) parameter. Some portions of the > >> + * hardware support a block width of two gobs, but it is impractical > >> + * to use due to lack of support elsewhere, and has no known > >> + * benefits. > >> + * > >> + * 11:9 - Reserved (To support 2D-array textures with variable array stride > >> + * in blocks, specified via log2(tile width in blocks)). Must be > >> + * zero. > >> + * > >> + * 19:12 k Page Kind. This value directly maps to a field in the page > >> + * tables of all GPUs >= NV50. It affects the exact layout of bits > >> + * in memory and can be derived from the tuple > >> + * > >> + * (format, GPU model, compression type, samples per pixel) > >> + * > >> + * Where compression type is defined below. If GPU model were > >> + * implied by the format modifier, format, or memory buffer, page > >> + * kind would not need to be included in the modifier itself, but > >> + * since the modifier should define the layout of the associated > >> + * memory buffer independent from any device or other context, it > >> + * must be included here. > >> + * > >> + * To grandfather in prior block linear format modifiers to this > >> + * layout, the page kind "0", which corresponds to "pitch/linear" > >> + * and hence is unusable with block-linear layouts, is remapped > >> + * within drivers to the value 0xfe, which corresponds to the > >> + * "generic" kind used for simple single-sample color formats on > >> + * pre-Turing GPUs. > > > > Hm, maybe a tiny static inline function which canonizalizes modifiers? > > Something like > > > > static inline u64 > > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool > > is_pre_turing) > > { > > } > > > > Would then give you a nice place to stick this backward compat note and > > make it really clear what should be done. I think establishing this as a > > pattern would also be nice, since I'm sure we'll have a pile more of these > > cases where modifiers turn out to assume a few too many things about the > > platform they're used on (we have a similar case on the intel side too). > > To make sure I'm clear, it would behave like this? > > fixed_mod = canonicalize(old_style_valid_mod, true); > assert(fixed_mod == old_style_valid_mod | (0xfe << 12)); > fixed_mod = canonicalize(new_style_valid_mod, [false,true]); > assert(fixed_mod == new_style_valid_mod); > > I'm unclear what it should do in this case though: > > fixed_mod = canonicalize(old_style_valid_mod, false); > > Since there's no code out there using the old style modifiers with > Turing+ yet, and I don't want to try to support such usage. Maybe just > drop the "is_pre_turing" parameter and always canonicalize by mapping 0 > -> 0xfe as the comment above states, and not touching other values? Or > should any invalid modifier, including this case, return > DRM_FORMAT_MOD_INVALID? This latter idea seems risky because it would > cause software compiled against old drm_fourcc.h to potentially reject > format modifiers from newer kernels or libraries with an expanded > representation, but maybe that's what we want if only driver components > are supposed to call this function.That's where I'm showing that I have no clue about nvidia buffer formats I guess :-) If 0 never makes sense, then yeah I guess you could just unconditionally canonicalize. Maybe you want to reject the old/legacy style on turing+ plus, but that's a tradeoff up to you guys - as long as it's consistent across all involved drivers. Not rejecting it would essentially make 0 an alias for 0xfe everywhere, post and pre-turing. More reasons for having a shared canonicalize function which enforces the same behaviour (whichever you pick) everywhere I think. And yeah the canonicalize function would only be called by drivers, not by clients. -Daniel> Thanks, > -James > > > Just a drive-by idea, feel free to ignore. > > > > Cheers, Daniel > > > >> + * > >> + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed > >> + * starting with Fermi GPUs. Additionally, the mapping between page > >> + * kind and bit layout has changed at various points. > >> + * > >> + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping > >> + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping > >> + * 2 = Gob Height 8, Turing+ Page Kind mapping > >> + * 3 = Reserved for future use. > >> + * > >> + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further > >> + * bit remapping step that occurs at an even lower level than the > >> + * page kind and block linear swizzles. This causes the layout of > >> + * surfaces mapped in those SOC's GPUs to be incompatible with the > >> + * equivalent mapping on other GPUs in the same system. > >> + * > >> + * 0 = Tegra K1 - Tegra Parker/TX2 Layout. > >> + * 1 = Desktop GPU and Tegra Xavier+ Layout > >> + * > >> + * 24:23 c Lossless Framebuffer Compression type. > >> + * > >> + * 0 = none > >> + * 1 = ROP/3D, actual compression implied by the Page Kind field > >> + * 2 = CDE horizontal > >> + * 3 = CDE vertical > >> + * > >> + * 55:25 - Reserved for future use. Must be zero. > >> + */ > >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \ > >> + fourcc_mod_code(NVIDIA, (0x10 | \ > >> + ((h) & 0xf) | \ > >> + (((k) & 0xff) << 12) | \ > >> + (((g) & 0x3) << 20) | \ > >> + (((s) & 0x1) << 22) | \ > >> + (((c) & 0x3) << 23))) > >> + > >> +/* > >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later > >> * > >> * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked > >> * vertically by a power of 2 (1 to 32 GOBs) to form a block. > >> @@ -518,20 +610,20 @@ extern "C" { > >> * in full detail. > >> */ > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \ > >> - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf)) > >> + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v)) > >> > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x10) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x11) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x12) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x13) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x14) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x15) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5) > >> > >> /* > >> * Some Broadcom modifiers take parameters, for example the number of > >> -- > >> 2.17.1 > >> > >-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Apparently Analagous Threads
- [PATCH] drm: Generalized NV Block Linear DRM format mod
- [PATCH] drm: Generalized NV Block Linear DRM format mod
- [PATCH] drm: Generalized NV Block Linear DRM format mod
- [PATCH v2] drm: Generalized NV Block Linear DRM format mod
- [PATCH v3] drm: Generalized NV Block Linear DRM format mod