thr3ads.net - Nouveau - [Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod [Oct 2019]

If this information is useful, please help other people find it:
Share via:

Daniel Vetter

2019-Oct-15 14:19 UTC

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones
wrote:> Builds upon the existing NVIDIA 16Bx2 block linear
> format modifiers by adding more "fields" to the
> existing parameterized
> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> macro that allow fully defining a unique-across-
> all-NVIDIA-hardware bit layout using a minimal
> set of fields and values.  The new modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> effectively backwards compatible with the existing
> macro, introducing a superset of the previously
> definable format modifiers.
> 
> Backwards compatibility has two quirks.  First,
> the zero value for the "kind" field, which is
> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> macro, must be special cased in drivers and
> assumed to map to the pre-Turing generic kind of
> 0xfe, since a kind of "zero" is reserved for
> linear buffer layouts on all GPUs.
> 
> Second, it is assumed backwards compatibility
> is only needed when running on Tegra GPUs, and
> specifically Tegra GPUs prior to Xavier.  This
> is based on two assertions:
> 
> -Tegra GPUs prior to Xavier used a slightly
>  different raw bit layout than desktop GPUs,
>  making it impossible to directly share block
>  linear buffers between the two.
> 
> -Support for the existing block linear modifiers
>  was incomplete, making them useful only for
>  exporting buffers created by nouveau and
>  importing them to Tegra DRM as framebuffers for
>  scan out.  There was no support for adding
>  framebuffers using format modifiers in nouveau,
>  nor importing dma-buf/PRIME GEM objects into
>  nouveau userspace drivers with modifiers in Mesa.
> 
> Hence it is assumed the prior modifiers were not
> intended for use on desktop GPUs, and as a
> corrolary, were not intended to support sharing
> block linear buffers across two different NVIDIA
> GPUs.
> 
> Signed-off-by: James Jones <jajones at nvidia.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>  1 file changed, 100 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..cc9853d42a24 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -497,7 +497,99 @@ extern "C" {
>  #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>  
>  /*
> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> + * Generalized Block Linear layout, used by desktop GPUs starting with
NV50/G80,
> + * and Tegra GPUs starting with Tegra K1.
> + *
> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout
varies
> + * based on the architecture generation.  GOBs themselves are then
arranged in
> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a
power
> + * of two, and hence expressible as their log2 equivalent (E.g.,
"2" represents
> + * a block depth or height of "4").
> + *
> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM
describes this format
> + * in full detail.
> + *
> + *       Macro
> + * Bits  Param Description
> + * ----  -----
-----------------------------------------------------------------
> + *
> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth)
block
> + *             size).  Must be zero.
> + *
> + *             Note there is no log2(width) parameter.  Some portions of
the
> + *             hardware support a block width of two gobs, but it is
impractical
> + *             to use due to lack of support elsewhere, and has no known
> + *             benefits.
> + *
> + * 11:9  -     Reserved (To support 2D-array textures with variable array
stride
> + *             in blocks, specified via log2(tile width in blocks)).  Must
be
> + *             zero.
> + *
> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
> + *             tables of all GPUs >= NV50.  It affects the exact layout
of bits
> + *             in memory and can be derived from the tuple
> + *
> + *               (format, GPU model, compression type, samples per pixel)
> + *
> + *             Where compression type is defined below.  If GPU model were
> + *             implied by the format modifier, format, or memory buffer,
page
> + *             kind would not need to be included in the modifier itself,
but
> + *             since the modifier should define the layout of the
associated
> + *             memory buffer independent from any device or other context,
it
> + *             must be included here.
> + *
> + *             To grandfather in prior block linear format modifiers to
this
> + *             layout, the page kind "0", which corresponds to
"pitch/linear"
> + *             and hence is unusable with block-linear layouts, is
remapped
> + *             within drivers to the value 0xfe, which corresponds to the
> + *             "generic" kind used for simple single-sample
color formats on
> + *             pre-Turing GPUs.
Hm, maybe a tiny static inline function which canonizalizes modifiers?
Something like

static inline u64
drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
is_pre_turing)
{
}

Would then give you a nice place to stick this backward compat note and
make it really clear what should be done. I think establishing this as a
pattern would also be nice, since I'm sure we'll have a pile more of
these
cases where modifiers turn out to assume a few too many things about the
platform they're used on (we have a similar case on the intel side too).

Just a drive-by idea, feel free to ignore.

Cheers, Daniel
> + *
> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB
changed
> + *             starting with Fermi GPUs.  Additionally, the mapping
between page
> + *             kind and bit layout has changed at various points.
> + *
> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind
mapping
> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> + *               3 = Reserved for future use.
> + *
> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a
further
> + *             bit remapping step that occurs at an even lower level than
the
> + *             page kind and block linear swizzles.  This causes the
layout of
> + *             surfaces mapped in those SOC's GPUs to be incompatible
with the
> + *             equivalent mapping on other GPUs in the same system.
> + *
> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> + *
> + * 24:23 c     Lossless Framebuffer Compression type.
> + *
> + *               0 = none
> + *               1 = ROP/3D, actual compression implied by the Page Kind
field
> + *               2 = CDE horizontal
> + *               3 = CDE vertical
> + *
> + * 55:25 -     Reserved for future use.  Must be zero.
> + */
> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> +	fourcc_mod_code(NVIDIA, (0x10 | \
> +				 ((h) & 0xf) | \
> +				 (((k) & 0xff) << 12) | \
> +				 (((g) & 0x3) << 20) | \
> +				 (((s) & 0x1) << 22) | \
> +				 (((c) & 0x3) << 23)))
> +
> +/*
> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>   *
>   * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then
stacked
>   * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> @@ -518,20 +610,20 @@ extern "C" {
>   * in full detail.
>   */
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> -	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> +	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>  
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> -	fourcc_mod_code(NVIDIA, 0x10)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x11)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> -	fourcc_mod_code(NVIDIA, 0x12)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> -	fourcc_mod_code(NVIDIA, 0x13)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> -	fourcc_mod_code(NVIDIA, 0x14)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x15)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>  
>  /*
>   * Some Broadcom modifiers take parameters, for example the number of
> -- 
> 2.17.1
> 
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

James Jones

2019-Oct-15 15:07 UTC

head link

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

On 10/15/19 7:19 AM, Daniel Vetter wrote:> On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
>> Builds upon the existing NVIDIA 16Bx2 block linear
>> format modifiers by adding more "fields" to the
>> existing parameterized
>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
>> macro that allow fully defining a unique-across-
>> all-NVIDIA-hardware bit layout using a minimal
>> set of fields and values.  The new modifier macro
>> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
>> effectively backwards compatible with the existing
>> macro, introducing a superset of the previously
>> definable format modifiers.
>>
>> Backwards compatibility has two quirks.  First,
>> the zero value for the "kind" field, which is
>> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
>> macro, must be special cased in drivers and
>> assumed to map to the pre-Turing generic kind of
>> 0xfe, since a kind of "zero" is reserved for
>> linear buffer layouts on all GPUs.
>>
>> Second, it is assumed backwards compatibility
>> is only needed when running on Tegra GPUs, and
>> specifically Tegra GPUs prior to Xavier.  This
>> is based on two assertions:
>>
>> -Tegra GPUs prior to Xavier used a slightly
>>   different raw bit layout than desktop GPUs,
>>   making it impossible to directly share block
>>   linear buffers between the two.
>>
>> -Support for the existing block linear modifiers
>>   was incomplete, making them useful only for
>>   exporting buffers created by nouveau and
>>   importing them to Tegra DRM as framebuffers for
>>   scan out.  There was no support for adding
>>   framebuffers using format modifiers in nouveau,
>>   nor importing dma-buf/PRIME GEM objects into
>>   nouveau userspace drivers with modifiers in Mesa.
>>
>> Hence it is assumed the prior modifiers were not
>> intended for use on desktop GPUs, and as a
>> corrolary, were not intended to support sharing
>> block linear buffers across two different NVIDIA
>> GPUs.
>>
>> Signed-off-by: James Jones <jajones at nvidia.com>
>> ---
>>   include/uapi/drm/drm_fourcc.h | 108
+++++++++++++++++++++++++++++++---
>>   1 file changed, 100 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/uapi/drm/drm_fourcc.h
b/include/uapi/drm/drm_fourcc.h
>> index 3feeaa3f987a..cc9853d42a24 100644
>> --- a/include/uapi/drm/drm_fourcc.h
>> +++ b/include/uapi/drm/drm_fourcc.h
>> @@ -497,7 +497,99 @@ extern "C" {
>>   #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>>   
>>   /*
>> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and
later
>> + * Generalized Block Linear layout, used by desktop GPUs starting with
NV50/G80,
>> + * and Tegra GPUs starting with Tegra K1.
>> + *
>> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout
varies
>> + * based on the architecture generation.  GOBs themselves are then
arranged in
>> + * 3D blocks, with the block dimensions (in terms of GOBs) always
being a power
>> + * of two, and hence expressible as their log2 equivalent (E.g.,
"2" represents
>> + * a block depth or height of "4").
>> + *
>> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM
describes this format
>> + * in full detail.
>> + *
>> + *       Macro
>> + * Bits  Param Description
>> + * ----  -----
-----------------------------------------------------------------
>> + *
>> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
>> + *             compatibility with the existing
>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>> + *
>> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary
for
>> + *             compatibility with the existing
>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>> + *
>> + *  8:5  -     Reserved (To support 3D-surfaces with variable
log2(depth) block
>> + *             size).  Must be zero.
>> + *
>> + *             Note there is no log2(width) parameter.  Some portions
of the
>> + *             hardware support a block width of two gobs, but it is
impractical
>> + *             to use due to lack of support elsewhere, and has no
known
>> + *             benefits.
>> + *
>> + * 11:9  -     Reserved (To support 2D-array textures with variable
array stride
>> + *             in blocks, specified via log2(tile width in blocks)). 
Must be
>> + *             zero.
>> + *
>> + * 19:12 k     Page Kind.  This value directly maps to a field in the
page
>> + *             tables of all GPUs >= NV50.  It affects the exact
layout of bits
>> + *             in memory and can be derived from the tuple
>> + *
>> + *               (format, GPU model, compression type, samples per
pixel)
>> + *
>> + *             Where compression type is defined below.  If GPU model
were
>> + *             implied by the format modifier, format, or memory
buffer, page
>> + *             kind would not need to be included in the modifier
itself, but
>> + *             since the modifier should define the layout of the
associated
>> + *             memory buffer independent from any device or other
context, it
>> + *             must be included here.
>> + *
>> + *             To grandfather in prior block linear format modifiers
to this
>> + *             layout, the page kind "0", which corresponds
to "pitch/linear"
>> + *             and hence is unusable with block-linear layouts, is
remapped
>> + *             within drivers to the value 0xfe, which corresponds to
the
>> + *             "generic" kind used for simple single-sample
color formats on
>> + *             pre-Turing GPUs.
> 
> Hm, maybe a tiny static inline function which canonizalizes modifiers?
> Something like
> 
> static inline u64
> drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
> is_pre_turing)
> {
> }
> 
> Would then give you a nice place to stick this backward compat note and
> make it really clear what should be done. I think establishing this as a
> pattern would also be nice, since I'm sure we'll have a pile more
of these
> cases where modifiers turn out to assume a few too many things about the
> platform they're used on (we have a similar case on the intel side
too).
To make sure I'm clear, it would behave like this?

   fixed_mod = canonicalize(old_style_valid_mod, true);
   assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
   fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
   assert(fixed_mod == new_style_valid_mod);

I'm unclear what it should do in this case though:

   fixed_mod = canonicalize(old_style_valid_mod, false);

Since there's no code out there using the old style modifiers with 
Turing+ yet, and I don't want to try to support such usage.  Maybe just 
drop the "is_pre_turing" parameter and always canonicalize by mapping
0
-> 0xfe as the comment above states, and not touching other values?  Or 
should any invalid modifier, including this case, return 
DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would 
cause software compiled against old drm_fourcc.h to potentially reject 
format modifiers from newer kernels or libraries with an expanded 
representation, but maybe that's what we want if only driver components 
are supposed to call this function.

Thanks,
-James
> Just a drive-by idea, feel free to ignore.
> 
> Cheers, Daniel
> 
>> + *
>> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a
GOB changed
>> + *             starting with Fermi GPUs.  Additionally, the mapping
between page
>> + *             kind and bit layout has changed at various points.
>> + *
>> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind
mapping
>> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
>> + *               2 = Gob Height 8, Turing+ Page Kind mapping
>> + *               3 = Reserved for future use.
>> + *
>> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is
a further
>> + *             bit remapping step that occurs at an even lower level
than the
>> + *             page kind and block linear swizzles.  This causes the
layout of
>> + *             surfaces mapped in those SOC's GPUs to be
incompatible with the
>> + *             equivalent mapping on other GPUs in the same system.
>> + *
>> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
>> + *               1 = Desktop GPU and Tegra Xavier+ Layout
>> + *
>> + * 24:23 c     Lossless Framebuffer Compression type.
>> + *
>> + *               0 = none
>> + *               1 = ROP/3D, actual compression implied by the Page
Kind field
>> + *               2 = CDE horizontal
>> + *               3 = CDE vertical
>> + *
>> + * 55:25 -     Reserved for future use.  Must be zero.
>> + */
>> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>> +	fourcc_mod_code(NVIDIA, (0x10 | \
>> +				 ((h) & 0xf) | \
>> +				 (((k) & 0xff) << 12) | \
>> +				 (((g) & 0x3) << 20) | \
>> +				 (((s) & 0x1) << 22) | \
>> +				 (((c) & 0x3) << 23)))
>> +
>> +/*
>> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>>    *
>>    * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then
stacked
>>    * vertically by a power of 2 (1 to 32 GOBs) to form a block.
>> @@ -518,20 +610,20 @@ extern "C" {
>>    * in full detail.
>>    */
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
>> -	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
>> +	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>>   
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x10)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x11)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x12)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x13)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x14)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x15)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>>   
>>   /*
>>    * Some Broadcom modifiers take parameters, for example the number of
>> -- 
>> 2.17.1
>>
>

Daniel Vetter

2019-Oct-15 15:42 UTC

head link

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones at nvidia.com>
wrote:>
> On 10/15/19 7:19 AM, Daniel Vetter wrote:
> > On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> >> Builds upon the existing NVIDIA 16Bx2 block linear
> >> format modifiers by adding more "fields" to the
> >> existing parameterized
> >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> >> macro that allow fully defining a unique-across-
> >> all-NVIDIA-hardware bit layout using a minimal
> >> set of fields and values.  The new modifier macro
> >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> >> effectively backwards compatible with the existing
> >> macro, introducing a superset of the previously
> >> definable format modifiers.
> >>
> >> Backwards compatibility has two quirks.  First,
> >> the zero value for the "kind" field, which is
> >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> >> macro, must be special cased in drivers and
> >> assumed to map to the pre-Turing generic kind of
> >> 0xfe, since a kind of "zero" is reserved for
> >> linear buffer layouts on all GPUs.
> >>
> >> Second, it is assumed backwards compatibility
> >> is only needed when running on Tegra GPUs, and
> >> specifically Tegra GPUs prior to Xavier.  This
> >> is based on two assertions:
> >>
> >> -Tegra GPUs prior to Xavier used a slightly
> >>   different raw bit layout than desktop GPUs,
> >>   making it impossible to directly share block
> >>   linear buffers between the two.
> >>
> >> -Support for the existing block linear modifiers
> >>   was incomplete, making them useful only for
> >>   exporting buffers created by nouveau and
> >>   importing them to Tegra DRM as framebuffers for
> >>   scan out.  There was no support for adding
> >>   framebuffers using format modifiers in nouveau,
> >>   nor importing dma-buf/PRIME GEM objects into
> >>   nouveau userspace drivers with modifiers in Mesa.
> >>
> >> Hence it is assumed the prior modifiers were not
> >> intended for use on desktop GPUs, and as a
> >> corrolary, were not intended to support sharing
> >> block linear buffers across two different NVIDIA
> >> GPUs.
> >>
> >> Signed-off-by: James Jones <jajones at nvidia.com>
> >> ---
> >>   include/uapi/drm/drm_fourcc.h | 108
+++++++++++++++++++++++++++++++---
> >>   1 file changed, 100 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/uapi/drm/drm_fourcc.h
b/include/uapi/drm/drm_fourcc.h
> >> index 3feeaa3f987a..cc9853d42a24 100644
> >> --- a/include/uapi/drm/drm_fourcc.h
> >> +++ b/include/uapi/drm/drm_fourcc.h
> >> @@ -497,7 +497,99 @@ extern "C" {
> >>   #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED
fourcc_mod_code(NVIDIA, 1)
> >>
> >>   /*
> >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1
and later
> >> + * Generalized Block Linear layout, used by desktop GPUs starting
with NV50/G80,
> >> + * and Tegra GPUs starting with Tegra K1.
> >> + *
> >> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and
layout varies
> >> + * based on the architecture generation.  GOBs themselves are
then arranged in
> >> + * 3D blocks, with the block dimensions (in terms of GOBs) always
being a power
> >> + * of two, and hence expressible as their log2 equivalent (E.g.,
"2" represents
> >> + * a block depth or height of "4").
> >> + *
> >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1
TRM describes this format
> >> + * in full detail.
> >> + *
> >> + *       Macro
> >> + * Bits  Param Description
> >> + * ----  -----
-----------------------------------------------------------------
> >> + *
> >> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here
for
> >> + *             compatibility with the existing
> >> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based
modifiers.
> >> + *
> >> + *  4:4  -     Must be 1, to indicate block-linear layout. 
Necessary for
> >> + *             compatibility with the existing
> >> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based
modifiers.
> >> + *
> >> + *  8:5  -     Reserved (To support 3D-surfaces with variable
log2(depth) block
> >> + *             size).  Must be zero.
> >> + *
> >> + *             Note there is no log2(width) parameter.  Some
portions of the
> >> + *             hardware support a block width of two gobs, but it
is impractical
> >> + *             to use due to lack of support elsewhere, and has
no known
> >> + *             benefits.
> >> + *
> >> + * 11:9  -     Reserved (To support 2D-array textures with
variable array stride
> >> + *             in blocks, specified via log2(tile width in
blocks)).  Must be
> >> + *             zero.
> >> + *
> >> + * 19:12 k     Page Kind.  This value directly maps to a field in
the page
> >> + *             tables of all GPUs >= NV50.  It affects the
exact layout of bits
> >> + *             in memory and can be derived from the tuple
> >> + *
> >> + *               (format, GPU model, compression type, samples
per pixel)
> >> + *
> >> + *             Where compression type is defined below.  If GPU
model were
> >> + *             implied by the format modifier, format, or memory
buffer, page
> >> + *             kind would not need to be included in the modifier
itself, but
> >> + *             since the modifier should define the layout of the
associated
> >> + *             memory buffer independent from any device or other
context, it
> >> + *             must be included here.
> >> + *
> >> + *             To grandfather in prior block linear format
modifiers to this
> >> + *             layout, the page kind "0", which
corresponds to "pitch/linear"
> >> + *             and hence is unusable with block-linear layouts,
is remapped
> >> + *             within drivers to the value 0xfe, which
corresponds to the
> >> + *             "generic" kind used for simple
single-sample color formats on
> >> + *             pre-Turing GPUs.
> >
> > Hm, maybe a tiny static inline function which canonizalizes modifiers?
> > Something like
> >
> > static inline u64
> > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
> > is_pre_turing)
> > {
> > }
> >
> > Would then give you a nice place to stick this backward compat note
and
> > make it really clear what should be done. I think establishing this as
a
> > pattern would also be nice, since I'm sure we'll have a pile
more of these
> > cases where modifiers turn out to assume a few too many things about
the
> > platform they're used on (we have a similar case on the intel side
too).
>
> To make sure I'm clear, it would behave like this?
>
>    fixed_mod = canonicalize(old_style_valid_mod, true);
>    assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
>    fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
>    assert(fixed_mod == new_style_valid_mod);
>
> I'm unclear what it should do in this case though:
>
>    fixed_mod = canonicalize(old_style_valid_mod, false);
>
> Since there's no code out there using the old style modifiers with
> Turing+ yet, and I don't want to try to support such usage.  Maybe just
> drop the "is_pre_turing" parameter and always canonicalize by
mapping 0
> -> 0xfe as the comment above states, and not touching other values?  Or
> should any invalid modifier, including this case, return
> DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would
> cause software compiled against old drm_fourcc.h to potentially reject
> format modifiers from newer kernels or libraries with an expanded
> representation, but maybe that's what we want if only driver components
> are supposed to call this function.
That's where I'm showing that I have no clue about nvidia buffer
formats I guess :-)

If 0 never makes sense, then yeah I guess you could just
unconditionally canonicalize. Maybe you want to reject the old/legacy
style on turing+ plus, but that's a tradeoff up to you guys - as long
as it's consistent across all involved drivers. Not rejecting it would
essentially make 0 an alias for 0xfe everywhere, post and pre-turing.
More reasons for having a shared canonicalize function which enforces
the same behaviour (whichever you pick) everywhere I think.

And yeah the canonicalize function would only be called by drivers,
not by clients.
-Daniel

> Thanks,
> -James
>
> > Just a drive-by idea, feel free to ignore.
> >
> > Cheers, Daniel
> >
> >> + *
> >> + * 21:20 g     GOB Height and Page Kind Generation.  The height
of a GOB changed
> >> + *             starting with Fermi GPUs.  Additionally, the
mapping between page
> >> + *             kind and bit layout has changed at various points.
> >> + *
> >> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page
Kind mapping
> >> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> >> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> >> + *               3 = Reserved for future use.
> >> + *
> >> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier,
there is a further
> >> + *             bit remapping step that occurs at an even lower
level than the
> >> + *             page kind and block linear swizzles.  This causes
the layout of
> >> + *             surfaces mapped in those SOC's GPUs to be
incompatible with the
> >> + *             equivalent mapping on other GPUs in the same
system.
> >> + *
> >> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> >> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> >> + *
> >> + * 24:23 c     Lossless Framebuffer Compression type.
> >> + *
> >> + *               0 = none
> >> + *               1 = ROP/3D, actual compression implied by the
Page Kind field
> >> + *               2 = CDE horizontal
> >> + *               3 = CDE vertical
> >> + *
> >> + * 55:25 -     Reserved for future use.  Must be zero.
> >> + */
> >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> >> +    fourcc_mod_code(NVIDIA, (0x10 | \
> >> +                             ((h) & 0xf) | \
> >> +                             (((k) & 0xff) << 12) | \
> >> +                             (((g) & 0x3) << 20) | \
> >> +                             (((s) & 0x1) << 22) | \
> >> +                             (((c) & 0x3) << 23)))
> >> +
> >> +/*
> >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
> >>    *
> >>    * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are
then stacked
> >>    * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> >> @@ -518,20 +610,20 @@ extern "C" {
> >>    * in full detail.
> >>    */
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> >> -    fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> >> +    DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
> >>
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x10)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x11)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x12)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x13)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x14)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x15)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
> >>
> >>   /*
> >>    * Some Broadcom modifiers take parameters, for example the
number of
> >> --
> >> 2.17.1
> >>
> >


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Reasonably Related Threads

Search for more reasonably related threads

Nouveau - Oct 2019 - [PATCH] drm: Generalized NV Block Linear DRM format mod

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

[Nouveau] [PATCH] drm: Generalized NV Block Linear DRM format mod

Reasonably Related Threads