Tomasz Figa
2020-Aug-19 12:49 UTC
[Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy at arm.com> wrote:
>
> Hi Tomasz,
>
> On 2020-08-19 12:16, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch at lst.de> wrote:
> >>
> >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
> >
> >> and causes
> >> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >> unimplemented except on PARISC and some MIPS configs, and about to be
> >> removed.
> >
> > It is implemented by the generic DMA mapping layer [1], which is used
> > by a number of architectures including ARM64 and supposed to be used
> > by new architectures going forward.
>
> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>
> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> all on arm64?
>

With the default config it doesn't, but with
CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
the pgprot value as is, without enforcing coherence attributes.

> Also, I posit that videobuf2 is not actually relying on
> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>
> "By using this API, you are guaranteeing to the platform
> that you have all the correct and necessary sync points for this memory
> in the driver should it choose to return non-consistent memory."
>
> $ git grep dma_cache_sync drivers/media
> $

AFAIK dma_cache_sync() isn't the only way to perform the cache
synchronization. The earlier patch series that I reviewed relied on
dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
vb2-dc since forever [1]). However, it looks like with the final code
the sgtable isn't acquired and the synchronization isn't happening, so
you have a point.

FWIW, I asked back in time what the plan is for non-coherent
allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
dma_sync_*() was supposed to be the right thing to go with. [2] The
same thread also explains why dma_alloc_pages() isn't suitable for the
users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.

I think we could make a deal here. We could revert the parts using
DMA_ATTR_NON_CONSISTENT, keeping the UAPI intact but rendering it a
no-op, since it's just a hint after all. Then you would propose a
proper replacement for dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT)
that is functionally equivalent and works on ARM64, which we could
then use to enable the functionality expected by this UAPI. Does that
sound like something that could work as a way forward here?

By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
series related to the subsystem-facing DMA API changes, since
videobuf2 is one of the biggest users of it.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L98
[2] https://patchwork.kernel.org/comment/23312203/

Best regards,
Tomasz

>
> Robin.
>
> > [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341
> >
> > When removing features from generic kernel code, I'd suggest first
> > providing viable alternatives for its users, rather than killing the
> > users altogether.
> >
> > Given the above, I'm afraid I have to NAK this.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Signed-off-by: Christoph Hellwig <hch at lst.de>
> >> ---
> >> .../userspace-api/media/v4l/buffer.rst | 17 ---------
> >> .../media/v4l/vidioc-reqbufs.rst | 1 -
> >> .../media/common/videobuf2/videobuf2-core.c | 36 +------------------
> >> .../common/videobuf2/videobuf2-dma-contig.c | 19 ----------
> >> .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +-
> >> .../media/common/videobuf2/videobuf2-v4l2.c | 12 -------
> >> include/media/videobuf2-core.h | 3 +-
> >> include/uapi/linux/videodev2.h | 2 --
> >> 8 files changed, 3 insertions(+), 90 deletions(-)
> >>
> >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
> >> index 57e752aaf414a7..2044ed13cd9d7d 100644
> >> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> >> @@ -701,23 +701,6 @@ Memory Consistency Flags
> >> :stub-columns: 0
> >> :widths: 3 1 4
> >>
> >> - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> >> -
> >> - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> >> - - 0x00000001
> >> - - A buffer is allocated either in consistent (it will be automatically
> >> - coherent between the CPU and the bus) or non-consistent memory. The
> >> - latter can provide performance gains, for instance the CPU cache
> >> - sync/flush operations can be avoided if the buffer is accessed by the
> >> - corresponding device only and the CPU does not read/write to/from that
> >> - buffer. However, this requires extra care from the driver -- it must
> >> - guarantee memory consistency by issuing a cache flush/sync when
> >> - consistency is needed. If this flag is set V4L2 will attempt to
> >> - allocate the buffer in non-consistent memory. The flag takes effect
> >> - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
> >> - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> >> - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
> >> -
> >> .. c:type:: v4l2_memory
> >>
> >> enum v4l2_memory
> >> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> index 75d894d9c36c42..3180c111d368ee 100644
> >> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
> >> - This capability is set by the driver to indicate that the queue supports
> >> cache and memory management hints. However, it's only valid when the
> >> queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
> >> - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
> >> :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
> >> :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> >> index f544d3393e9d6b..66a41cef33c1b1 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> >> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
> >> }
> >> EXPORT_SYMBOL(vb2_verify_memory_type);
> >>
> >> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
> >> -{
> >> - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
> >> -
> >> - if (!vb2_queue_allows_cache_hints(q))
> >> - return;
> >> - if (!consistent_mem)
> >> - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
> >> -}
> >> -
> >> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
> >> -{
> >> - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
> >> -
> >> - if (consistent_mem != queue_is_consistent) {
> >> - dprintk(q, 1, "memory consistency model mismatch\n");
> >> - return false;
> >> - }
> >> - return true;
> >> -}
> >> -
> >> int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >> unsigned int flags, unsigned int *count)
> >> {
> >> unsigned int num_buffers, allocated_buffers, num_planes = 0;
> >> unsigned plane_sizes[VB2_MAX_PLANES] = { };
> >> - bool consistent_mem = true;
> >> unsigned int i;
> >> int ret;
> >>
> >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> >> - consistent_mem = false;
> >> -
> >> if (q->streaming) {
> >> dprintk(q, 1, "streaming active\n");
> >> return -EBUSY;
> >> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >> }
> >>
> >> if (*count == 0 || q->num_buffers != 0 ||
> >> - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
> >> - !verify_consistency_attr(q, consistent_mem)) {
> >> + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
> >> /*
> >> * We already have buffers allocated, so first check if they
> >> * are not in use and can be freed.
> >> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >> num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
> >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
> >> q->memory = memory;
> >> - set_queue_consistency(q, consistent_mem);
> >>
> >> /*
> >> * Ask the driver how many buffers and planes per buffer it requires.
> >> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
> >> {
> >> unsigned int num_planes = 0, num_buffers, allocated_buffers;
> >> unsigned plane_sizes[VB2_MAX_PLANES] = { };
> >> - bool consistent_mem = true;
> >> int ret;
> >>
> >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> >> - consistent_mem = false;
> >> -
> >> if (q->num_buffers == VB2_MAX_FRAME) {
> >> dprintk(q, 1, "maximum number of buffers already allocated\n");
> >> return -ENOBUFS;
> >> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
> >> }
> >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
> >> q->memory = memory;
> >> - set_queue_consistency(q, consistent_mem);
> >> q->waiting_for_buffers = !q->is_output;
> >> } else {
> >> if (q->memory != memory) {
> >> dprintk(q, 1, "memory model mismatch\n");
> >> return -EINVAL;
> >> }
> >> - if (!verify_consistency_attr(q, consistent_mem))
> >> - return -EINVAL;
> >> }
> >>
> >> num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> index ec3446cc45b8da..7b1b86ec942d7d 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> @@ -42,11 +42,6 @@ struct vb2_dc_buf {
> >> struct dma_buf_attachment *db_attach;
> >> };
> >>
> >> -static inline bool vb2_dc_buffer_consistent(unsigned long attr)
> >> -{
> >> - return !(attr & DMA_ATTR_NON_CONSISTENT);
> >> -}
> >> -
> >> /*********************************************/
> >> /* scatterlist table functions */
> >> /*********************************************/
> >> @@ -341,13 +336,6 @@ static int
> >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> >> enum dma_data_direction direction)
> >> {
> >> - struct vb2_dc_buf *buf = dbuf->priv;
> >> - struct sg_table *sgt = buf->dma_sgt;
> >> -
> >> - if (vb2_dc_buffer_consistent(buf->attrs))
> >> - return 0;
> >> -
> >> - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >> return 0;
> >> }
> >>
> >> @@ -355,13 +343,6 @@ static int
> >> vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
> >> enum dma_data_direction direction)
> >> {
> >> - struct vb2_dc_buf *buf = dbuf->priv;
> >> - struct sg_table *sgt = buf->dma_sgt;
> >> -
> >> - if (vb2_dc_buffer_consistent(buf->attrs))
> >> - return 0;
> >> -
> >> - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >> return 0;
> >> }
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
> >> /*
> >> * NOTE: dma-sg allocates memory using the page allocator directly, so
> >> * there is no memory consistency guarantee, hence dma-sg ignores DMA
> >> - * attributes passed from the upper layer. That means that
> >> - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
> >> + * attributes passed from the upper layer.
> >> */
> >> buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
> >> GFP_KERNEL | __GFP_ZERO);
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> index 30caad27281e1a..de83ad48783821 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
> >> #endif
> >> }
> >>
> >> -static void clear_consistency_attr(struct vb2_queue *q,
> >> - int memory,
> >> - unsigned int *flags)
> >> -{
> >> - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
> >> - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
> >> -}
> >> -
> >> int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
> >> {
> >> int ret = vb2_verify_memory_type(q, req->memory, req->type);
> >>
> >> fill_buf_caps(q, &req->capabilities);
> >> - clear_consistency_attr(q, req->memory, &req->flags);
> >> return ret ? ret : vb2_core_reqbufs(q, req->memory,
> >> req->flags, &req->count);
> >> }
> >> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
> >> unsigned i;
> >>
> >> fill_buf_caps(q, &create->capabilities);
> >> - clear_consistency_attr(q, create->memory, &create->flags);
> >> create->index = q->num_buffers;
> >> if (create->count == 0)
> >> return ret != -EBUSY ? ret : 0;
> >> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
> >> int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
> >>
> >> fill_buf_caps(vdev->queue, &p->capabilities);
> >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags);
> >> if (res)
> >> return res;
> >> if (vb2_queue_is_busy(vdev, file))
> >> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
> >>
> >> p->index = vdev->queue->num_buffers;
> >> fill_buf_caps(vdev->queue, &p->capabilities);
> >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags);
> >> /*
> >> * If count == 0, then just check if memory and type are valid.
> >> * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
> >> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> >> index 52ef92049073e3..4c7f25b07e9375 100644
> >> --- a/include/media/videobuf2-core.h
> >> +++ b/include/media/videobuf2-core.h
> >> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
> >> * vb2_core_reqbufs() - Initiate streaming.
> >> * @q: pointer to &struct vb2_queue with videobuf2 queue.
> >> * @memory: memory type, as defined by &enum vb2_memory.
> >> - * @flags: auxiliary queue/buffer management flags. Currently, the only
> >> - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
> >> + * @flags: auxiliary queue/buffer management flags.
> >> * @count: requested buffer count.
> >> *
> >> * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
> >> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> >> index c7b70ff53bc1dd..5c00f63d9c1b58 100644
> >> --- a/include/uapi/linux/videodev2.h
> >> +++ b/include/uapi/linux/videodev2.h
> >> @@ -191,8 +191,6 @@ enum v4l2_memory {
> >> V4L2_MEMORY_DMABUF = 4,
> >> };
> >>
> >> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0)
> >> -
> >> /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
> >> enum v4l2_colorspace {
> >> /*
> >> --
> >> 2.28.0
> >>
> >> _______________________________________________
> >> iommu mailing list
> >> iommu at lists.linux-foundation.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
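The code removed by this patch boils down to a small piece of flag plumbing: the UAPI hint is dropped unless the queue allows cache hints and uses MMAP; otherwise it becomes DMA_ATTR_NON_CONSISTENT in the queue's dma_attrs, and only such buffers received explicit CPU syncs in the dma-contig begin/end_cpu_access hooks. The userspace C sketch below models that logic outside the kernel so the behaviour can be checked in isolation. It is a hedged illustration, not the vb2 code itself: struct model_queue, the model_* helpers and MODEL_DMA_ATTR_NON_CONSISTENT (whose bit value is arbitrary) are hypothetical stand-ins, and the test in model_allows_cache_hints() is an assumption about what vb2_queue_allows_cache_hints() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* UAPI flag value, as in the videodev2.h line the patch removes. */
#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1u << 0)
/* Hypothetical stand-in for DMA_ATTR_NON_CONSISTENT; only its distinctness matters. */
#define MODEL_DMA_ATTR_NON_CONSISTENT (1ul << 3)

/* Memory types; the values follow enum v4l2_memory. */
enum model_memory {
	MODEL_MEMORY_MMAP = 1,
	MODEL_MEMORY_USERPTR = 2,
	MODEL_MEMORY_DMABUF = 4,
};

/* Hypothetical, trimmed-down stand-in for struct vb2_queue. */
struct model_queue {
	bool allow_cache_hints;  /* driver opted in to cache hints */
	int memory;              /* memory type chosen at REQBUFS time */
	unsigned long dma_attrs; /* attributes handed to the allocator */
};

/* Assumed equivalent of vb2_queue_allows_cache_hints(). */
static bool model_allows_cache_hints(const struct model_queue *q)
{
	return q->allow_cache_hints && q->memory == MODEL_MEMORY_MMAP;
}

/* Like clear_consistency_attr(): drop the hint unless hints are allowed for MMAP. */
static unsigned int model_clear_consistency_attr(const struct model_queue *q,
						 int memory, unsigned int flags)
{
	if (!q->allow_cache_hints || memory != MODEL_MEMORY_MMAP)
		flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
	return flags;
}

/* Like set_queue_consistency(): turn the UAPI hint into a DMA attribute. */
static void model_set_queue_consistency(struct model_queue *q, bool consistent_mem)
{
	q->dma_attrs &= ~MODEL_DMA_ATTR_NON_CONSISTENT;
	if (!model_allows_cache_hints(q))
		return;
	if (!consistent_mem)
		q->dma_attrs |= MODEL_DMA_ATTR_NON_CONSISTENT;
}

/* Like verify_consistency_attr(): later requests must match the queue's model. */
static bool model_verify_consistency_attr(const struct model_queue *q, bool consistent_mem)
{
	bool queue_is_consistent = !(q->dma_attrs & MODEL_DMA_ATTR_NON_CONSISTENT);

	return consistent_mem == queue_is_consistent;
}

/* Like the removed cpu_access checks: only non-consistent buffers get synced. */
static bool model_needs_cpu_sync(unsigned long attrs)
{
	return (attrs & MODEL_DMA_ATTR_NON_CONSISTENT) != 0;
}

/* First REQBUFS on an empty queue; returns true if the buffers end up consistent. */
static bool model_first_reqbufs(struct model_queue *q, int memory, unsigned int flags)
{
	assert(q != NULL);
	flags = model_clear_consistency_attr(q, memory, flags);
	q->memory = memory;
	model_set_queue_consistency(q, !(flags & V4L2_FLAG_MEMORY_NON_CONSISTENT));
	return !(q->dma_attrs & MODEL_DMA_ATTR_NON_CONSISTENT);
}
```

In this model, a first request carrying the flag on an MMAP queue that allows hints produces a non-consistent queue that needs explicit syncs, and a later request without the flag fails the consistency check; that mismatch corresponds to the -EINVAL path in vb2_core_create_bufs() which the patch deletes.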
Christoph Hellwig
2020-Aug-19 13:57 UTC
On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> With the default config it doesn't, but with
> CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> the pgprot value as is, without enforcing coherence attributes.

Which isn't selected on arm64, and that is for a good reason.

> AFAIK dma_cache_sync() isn't the only way to perform the cache
> synchronization.

Yes, it is the only documented way to do it. And if you read the whole
series instead of screaming you'd see that it provides a proper way
to deal with non-coherent memory which will also work with arm64.

> By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> series related to the subsystem-facing DMA API changes, since
> videobuf2 is one of the biggest users of it.

The cc list is too long - I cc lists and key maintainers. As a reviewer
you should watch your subsystem's lists closely.
Robin Murphy
2020-Aug-19 14:07 UTC
On 2020-08-19 13:49, Tomasz Figa wrote:
> On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy at arm.com> wrote:
>>
>> Hi Tomasz,
>>
>> On 2020-08-19 12:16, Tomasz Figa wrote:
>>> Hi Christoph,
>>>
>>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch at lst.de> wrote:
>>>>
>>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
>>>
>>> Could you explain what makes you think it's unused? It's a feature of
>>> the UAPI generally supported by the videobuf2 framework and relied on
>>> by Chromium OS to get any kind of reasonable performance when
>>> accessing V4L2 buffers in the userspace.
>>>
>>>> and causes
>>>> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
>>>> unimplemented except on PARISC and some MIPS configs, and about to be
>>>> removed.
>>>
>>> It is implemented by the generic DMA mapping layer [1], which is used
>>> by a number of architectures including ARM64 and supposed to be used
>>> by new architectures going forward.
>>
>> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
>> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>>
>> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
>> all on arm64?
>>
>
> With the default config it doesn't, but with
> CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> the pgprot value as is, without enforcing coherence attributes.

How active are the PA-RISC and MIPS ports of Chromium OS?

Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that
doesn't provide dma_cache_sync() is wrong, since at worst it may break
other drivers. If downstream is wildly misusing an API then so be it,
but it's hardly a strong basis for an upstream argument.

>> Also, I posit that videobuf2 is not actually relying on
>> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>>
>> "By using this API, you are guaranteeing to the platform
>> that you have all the correct and necessary sync points for this memory
>> in the driver should it choose to return non-consistent memory."
>>
>> $ git grep dma_cache_sync drivers/media
>> $
>
> AFAIK dma_cache_sync() isn't the only way to perform the cache
> synchronization. The earlier patch series that I reviewed relied on
> dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> vb2-dc since forever [1]). However, it looks like with the final code
> the sgtable isn't acquired and the synchronization isn't happening, so
> you have a point.

Using the streaming sync calls on coherent allocations has also always
been wrong per the API, regardless of the bodies of code that have
happened to get away with it for so long.

> FWIW, I asked back in time what the plan is for non-coherent
> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> dma_sync_*() was supposed to be the right thing to go with. [2] The
> same thread also explains why dma_alloc_pages() isn't suitable for the
> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.

AFAICS even back then Christoph was implying getting rid of
NON_CONSISTENT and *replacing* it with something streaming-API-based -
i.e. this series - not encouraging mixing the existing APIs. It doesn't
seem impossible to implement a remapping version of this new
dma_alloc_pages() for IOMMU-backed ops if it's really warranted
(although at that point it seems like "non-coherent" vb2-dc starts to
have significant conceptual overlap with vb2-sg).

Robin.
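The arm64 disagreement hinges on one branch in dma_pgprot(). As Tomasz describes it, a non-coherent device keeps the caller's (cacheable) page protection only when DMA_ATTR_NON_CONSISTENT is passed and CONFIG_DMA_NONCOHERENT_CACHE_SYNC is enabled; otherwise the architecture's coherent protection is enforced. The userspace C sketch below models just that decision, based on the description in this thread rather than on a copy of kernel/dma/mapping.c. enum model_prot, model_dma_pgprot() and the attribute bit value are hypothetical, and any other attributes the real function may handle are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in attribute bit; the real DMA_ATTR_NON_CONSISTENT value is irrelevant here. */
#define MODEL_DMA_ATTR_NON_CONSISTENT (1ul << 3)

/* Outcome of the page protection choice for the mapping. */
enum model_prot {
	MODEL_PROT_KEEP,        /* caller's prot returned unchanged (cacheable) */
	MODEL_PROT_DMACOHERENT, /* architecture's coherent protection enforced */
};

/*
 * Simplified model of the behaviour described in the thread: coherent devices
 * keep the caller's prot, and so do non-coherent devices that ask for
 * NON_CONSISTENT, but only when CONFIG_DMA_NONCOHERENT_CACHE_SYNC is enabled.
 */
static enum model_prot model_dma_pgprot(bool dev_is_coherent,
					bool cfg_noncoherent_cache_sync,
					unsigned long attrs)
{
	if (dev_is_coherent)
		return MODEL_PROT_KEEP;
	if (cfg_noncoherent_cache_sync && (attrs & MODEL_DMA_ATTR_NON_CONSISTENT))
		return MODEL_PROT_KEEP;
	return MODEL_PROT_DMACOHERENT;
}

/* Example checks mirroring the arm64 discussion. */
static void model_dma_pgprot_examples(void)
{
	/* Option not selected (as on arm64): the attribute changes nothing. */
	assert(model_dma_pgprot(false, false, MODEL_DMA_ATTR_NON_CONSISTENT) ==
	       MODEL_PROT_DMACOHERENT);
	/* Option forced on: the cacheable prot is kept. */
	assert(model_dma_pgprot(false, true, MODEL_DMA_ATTR_NON_CONSISTENT) ==
	       MODEL_PROT_KEEP);
	/* Coherent devices are unaffected either way. */
	assert(model_dma_pgprot(true, false, 0) == MODEL_PROT_KEEP);
}
```

Under this model the attribute only changes the mapping when the option is forced on, which is exactly the configuration Robin objects to for architectures that do not provide dma_cache_sync().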
Tomasz Figa
2020-Aug-19 14:11 UTC
On Wed, Aug 19, 2020 at 3:57 PM Christoph Hellwig <hch at lst.de> wrote:
>
> On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> Which isn't selected on arm64, and that is for a good reason.
>
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization.
>
> Yes, it is the only documented way to do it. And if you read the whole
> series instead of screaming you'd see that it provides a proper way
> to deal with non-coherent memory which will also work with arm64.
>

I'm sorry if I have offended you in any way, but I would also appreciate
it if a less aggressive tone were directed towards me as well. I have
valid reasons to object to this patch, as stated in my previous emails.
The fact that the original feature has problems is of course another
story and, as I mentioned too, I'm willing to look into fixing them.
I'm of course happy to review the rest of the series, and even happier
to help migrate this code to whatever is added there, as long as the
functionality is preserved.

> > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > series related to the subsystem-facing DMA API changes, since
> > videobuf2 is one of the biggest users of it.
>
> The cc list is too long - I cc lists and key maintainers. As a reviewer
> you should watch your subsystem's lists closely.

Well, I guess we can disagree on this, because there is no clear policy.
I'm listed in the MAINTAINERS file for the subsystem and I believe the
purpose of the file is to list the people to CC on relevant patches.
We're all overloaded with work, and having to look through the huge
volume of mailing lists like linux-media doesn't help, so I'd still
appreciate being added to CC.

Best regards,
Tomasz
Tomasz Figa
2020-Aug-19 14:22 UTC
On Wed, Aug 19, 2020 at 4:07 PM Robin Murphy <robin.murphy at arm.com> wrote:
>
> On 2020-08-19 13:49, Tomasz Figa wrote:
> > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy at arm.com> wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On 2020-08-19 12:16, Tomasz Figa wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch at lst.de> wrote:
> >>>>
> >>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >>>
> >>> Could you explain what makes you think it's unused? It's a feature of
> >>> the UAPI generally supported by the videobuf2 framework and relied on
> >>> by Chromium OS to get any kind of reasonable performance when
> >>> accessing V4L2 buffers in the userspace.
> >>>
> >>>> and causes
> >>>> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >>>> unimplemented except on PARISC and some MIPS configs, and about to be
> >>>> removed.
> >>>
> >>> It is implemented by the generic DMA mapping layer [1], which is used
> >>> by a number of architectures including ARM64 and supposed to be used
> >>> by new architectures going forward.
> >>
> >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> >> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
> >>
> >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> >> all on arm64?
> >>
> >
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> How active are the PA-RISC and MIPS ports of Chromium OS?

Not active. We enable CONFIG_DMA_NONCOHERENT_CACHE_SYNC for ARM64, given
the directions received back in April when discussing the noncoherent
memory functionality on the mailing list (in the thread I pointed out in
my previous message), and given that there was no clarification on why
it is disabled for ARM64 upstream, despite several attempts to get some.

>
> Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that
> doesn't provide dma_cache_sync() is wrong, since at worst it may break
> other drivers. If downstream is wildly misusing an API then so be it,
> but it's hardly a strong basis for an upstream argument.

I guess it means that we're wildly misusing the API, but it still does
work. Could you explain how it could break other drivers?

>
> >> Also, I posit that videobuf2 is not actually relying on
> >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
> >>
> >> "By using this API, you are guaranteeing to the platform
> >> that you have all the correct and necessary sync points for this memory
> >> in the driver should it choose to return non-consistent memory."
> >>
> >> $ git grep dma_cache_sync drivers/media
> >> $
> >
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization. The earlier patch series that I reviewed relied on
> > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> > vb2-dc since forever [1]). However, it looks like with the final code
> > the sgtable isn't acquired and the synchronization isn't happening, so
> > you have a point.
>
> Using the streaming sync calls on coherent allocations has also always
> been wrong per the API, regardless of the bodies of code that have
> happened to get away with it for so long.
>
> > FWIW, I asked back in time what the plan is for non-coherent
> > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > same thread also explains why dma_alloc_pages() isn't suitable for the
> > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of
> NON_CONSISTENT and *replacing* it with something streaming-API-based -

That's not how I read his reply from the thread I pointed to, but that
might of course be my misunderstanding.

> i.e. this series - not encouraging mixing the existing APIs. It doesn't
> seem impossible to implement a remapping version of this new
> dma_alloc_pages() for IOMMU-backed ops if it's really warranted
> (although at that point it seems like "non-coherent" vb2-dc starts to
> have significant conceptual overlap with vb2-sg).

No, there is no overlap between vb2-dc and vb2-sg. They differ on
another level - the former is to be used by devices without
scatter-gather or internal mapping capabilities and gives the driver a
single DMA address for the whole buffer, regardless of whether it's
IOVA-contiguous (for devices behind an IOMMU) or physically contiguous
(for the others), while the latter gives the driver an sgtable, which
of course may be DMA-contiguous internally, but doesn't have to be and
usually isn't. This model makes it possible to hide SoC implementation
details from individual drivers, since those drivers are very often
reused on many SoCs that differ in IOMMU availability, DMA addressing
restrictions and so on.

Best regards,
Tomasz
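The vb2-dc/vb2-sg distinction above comes down to what the driver receives: vb2-dc hands over one DMA address, so the mapped segments must form a single DMA-contiguous run, whereas vb2-sg passes the whole table along. The userspace C sketch below models the contiguity test a dma-contig style allocator needs, walking (address, length) pairs and summing how far they run back to back. It is similar in spirit to the contiguity check videobuf2-dma-contig.c applies to user and imported buffers, but it is only a model: struct model_seg and the model_* functions are hypothetical, and a plain array stands in for a mapped struct sg_table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped scatterlist entry. */
struct model_seg {
	uint64_t dma_addr; /* device-visible address (IOVA or physical) */
	uint64_t len;      /* length in bytes */
};

/* Bytes covered by the leading run of back-to-back segments. */
static uint64_t model_contiguous_size(const struct model_seg *segs, size_t nents)
{
	uint64_t expected, size = 0;

	assert(segs != NULL || nents == 0);
	if (nents == 0)
		return 0;

	expected = segs[0].dma_addr;
	for (size_t i = 0; i < nents; i++) {
		if (segs[i].dma_addr != expected)
			break; /* gap: one DMA address can no longer describe the rest */
		expected += segs[i].len;
		size += segs[i].len;
	}
	return size;
}

/* A dma-contig style consumer needs the whole buffer in one run; an sg consumer does not. */
static bool model_fits_single_dma_address(const struct model_seg *segs, size_t nents,
					  uint64_t buf_size)
{
	return model_contiguous_size(segs, nents) >= buf_size;
}
```

With an IOMMU, physically scattered pages can still map to one IOVA run and pass this test; without one, the same buffer would have to be physically contiguous, which is the SoC detail vb2-dc hides from the driver.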
Christoph Hellwig
2020-Aug-20 05:02 UTC
On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
>> FWIW, I asked back in time what the plan is for non-coherent
>> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
>> dma_sync_*() was supposed to be the right thing to go with. [2] The
>> same thread also explains why dma_alloc_pages() isn't suitable for the
>> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT
> and *replacing* it with something streaming-API-based - i.e. this series -
> not encouraging mixing the existing APIs. It doesn't seem impossible to
> implement a remapping version of this new dma_alloc_pages() for
> IOMMU-backed ops if it's really warranted (although at that point it seems
> like "non-coherent" vb2-dc starts to have significant conceptual overlap
> with vb2-sg).

You can always vmap the returned pages from dma_alloc_pages, but it will
make cache invalidation hell - you'll need to use
invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
handle virtually indexed caches.

Or by remapping, do you mean using the iommu to de-scatter/gather?
You can trivially implement that yourself for the iommu case:

{
	merge_boundary = dma_get_merge_boundary(dev);
	if (!merge_boundary || merge_boundary > chunk_size - 1) {
		/* can't coalesce */
		return -EINVAL;
	}

	nents = DIV_ROUND_UP(total_size, chunk_size);
	sg = sgl_alloc();
	for_each_sgl() {
		sg->page = __alloc_pages(get_order(chunk_size));
		sg->len = chunk_size;
	}
	dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
	// you are guaranteed to get a single dma_addr out
}

Of course this still uses the scatterlist structure with its annoying
mix of input and output parameters, so I'd rather not expose it as an
official API at the DMA layer.
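The preconditions in the sketch above can be checked in isolation. The userspace C model below reproduces its coalescing test (a zero merge boundary means the device cannot merge segments; otherwise each chunk must span at least one merge unit), the DIV_ROUND_UP segment count, and the get_order() rounding used to size each chunk. It assumes 4 KiB pages and power-of-two chunk sizes and merge units, in which case a chunk at least one merge unit long is also a whole number of merge units. MODEL_PAGE_SIZE and the model_* names are hypothetical, and no real DMA or IOMMU call is made.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096ul /* assumed page size */
#define MODEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Models get_order(): smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Same test as the quoted sketch: merge_boundary is a mask such as the one
 * dma_get_merge_boundary() returns (0 when the device cannot merge segments).
 */
static bool model_can_coalesce(unsigned long merge_boundary, unsigned long chunk_size)
{
	assert(chunk_size > 0);
	return merge_boundary != 0 && merge_boundary <= chunk_size - 1;
}

/* Number of chunk_size allocations needed to back total_size bytes. */
static unsigned long model_nents(unsigned long total_size, unsigned long chunk_size)
{
	assert(chunk_size > 0);
	return MODEL_DIV_ROUND_UP(total_size, chunk_size);
}
```

For example, with a 4 KiB IOMMU granule (mask 4095) and 64 KiB chunks, a 1,000,000-byte buffer needs 16 order-4 allocations, which, per the sketch above, the IOMMU should then map as a single IOVA range.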