Tom Murphy
2020-Aug-24 00:04 UTC
[PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
Hi Logan/All, I have added a check for the sg_dma_len == 0 : """ } __sgt_iter(struct scatterlist *sgl, bool dma) { struct sgt_iter s = { .sgp = sgl }; + if (sgl && sg_dma_len(sgl) == 0) + s.sgp = NULL; if (s.sgp) { ..... """ at location [1]. but it doens't fix the problem. You're right though, this change does need to be made, this code doesn't handle pages of sg_dma_len(sg) == 0 correctly So my guess is that we have more bugs in other parts of the i915 driver (or there is a problem with my "sg_dma_len == 0" fix above). I have been trying to spot where else the code might be buggy but I haven't had any luck so far. I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this if you're interested in attending. I'm hoping I can chat about it with a few people and find how can reproduce and fix this issues. I don't have any more time I can give to this unfortunately and it would be a shame for the work to go to waste. [0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28 [1] https://linuxplumbersconf.org/event/7/contributions/846/ On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang at deltatee.com> wrote:> > > > On 2020-05-29 3:11 p.m., Marek Szyprowski wrote: > > Patches are pending: > > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski at samsung.com/T/ > > Cool, nice! Though, I still don't think that fixes the issue in > i915_scatterlist.h given it still ignores sg_dma_len() and strictly > relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this > is the bug that got in Tom's way. > > >> However, as Robin pointed out, there are other ugly tricks like stopping > >> iterating through the SGL when sg_dma_len() is zero. For example, the > >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does > >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to > >> have complained by now seeing AMD has already switched to IOMMU-DMA. > > > > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was > > somewhere documented. > > Well whatever you want to call it, it is ugly to have some drivers doing > one thing with the returned value and others assuming there's an extra > zero at the end. It just causes confusion for people reading/copying the > code. It would be better if they are all consistent. However, I concede > stopping at zero should not be broken, presently. > > Logan
Alex Deucher
2020-Aug-26 18:26 UTC
[PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
On Mon, Aug 24, 2020 at 2:56 AM Tom Murphy <murphyt7 at tcd.ie> wrote:> > Hi Logan/All, > > I have added a check for the sg_dma_len == 0 : > """ > } __sgt_iter(struct scatterlist *sgl, bool dma) { > struct sgt_iter s = { .sgp = sgl }; > > + if (sgl && sg_dma_len(sgl) == 0) > + s.sgp = NULL; > > if (s.sgp) { > ..... > """ > at location [1]. > but it doens't fix the problem. > > You're right though, this change does need to be made, this code > doesn't handle pages of sg_dma_len(sg) == 0 correctly > So my guess is that we have more bugs in other parts of the i915 > driver (or there is a problem with my "sg_dma_len == 0" fix above). > I have been trying to spot where else the code might be buggy but I > haven't had any luck so far. > > I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this > if you're interested in attending. > I'm hoping I can chat about it with a few people and find how can > reproduce and fix this issues. I don't have any more time I can give > to this unfortunately and it would be a shame for the work to go to > waste. > > [0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28 > [1] https://linuxplumbersconf.org/event/7/contributions/846/ > > On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang at deltatee.com> wrote: > > > > > > > > On 2020-05-29 3:11 p.m., Marek Szyprowski wrote: > > > Patches are pending: > > > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski at samsung.com/T/ > > > > Cool, nice! Though, I still don't think that fixes the issue in > > i915_scatterlist.h given it still ignores sg_dma_len() and strictly > > relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this > > is the bug that got in Tom's way. > > > > >> However, as Robin pointed out, there are other ugly tricks like stopping > > >> iterating through the SGL when sg_dma_len() is zero. For example, the > > >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does > > >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to > > >> have complained by now seeing AMD has already switched to IOMMU-DMA.We ran into the same issue with amdgpu and radeon when the AMD IOMMU driver was converted and had to fix it as well. The relevant fixes were: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42e67b479eab6d26459b80b4867298232b0435e7 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0199172f933342d8b1011aae2054a695c25726f4 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47f7826c520ecd92ffbffe59ecaa2fe61e42ec70 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0f83d164fb8f3a2b7bc379a6c1e27d1123a9eab Alex> > > > > > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was > > > somewhere documented. > > > > Well whatever you want to call it, it is ugly to have some drivers doing > > one thing with the returned value and others assuming there's an extra > > zero at the end. It just causes confusion for people reading/copying the > > code. It would be better if they are all consistent. However, I concede > > stopping at zero should not be broken, presently. > > > > Logan > _______________________________________________ > dri-devel mailing list > dri-devel at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel