Christian König
2017-Dec-19 13:45 UTC
[Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On 19.12.2017 at 11:39, Michel Dänzer wrote:
> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>>>> Greetings,
>>>>
>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>>>> I only see this when beating up NFS. There was a kworker wakeup
>>>> latency issue, but with a bandaid applied to fix that up, I can still
>>>> trigger this.
>>>
>>> Hi,
>>>
>>> i have seen this one as well with my system, but i could not find an
>>> easy way to trigger it for bisecting purpose. If you can trigger it
>>> conveniently, a bisect would be nice!
>>
>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>> backup, creating memory pressure. I happen to have just finished
>> bisecting, the result is:
>>
>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>> commit 648bc3574716400acc06f99915815f80d9563783
>> Author: Christian König <christian.koenig at amd.com>
>> Date: Thu Jul 6 09:59:43 2017 +0200
>>
>>     drm/ttm: add transparent huge page support for DMA allocations v2
>>
>>     Try to allocate huge pages when it makes sense.
>>
>>     v2: fix comment and use ifdef
>>
> BTW, I haven't noticed any bad effects other than the dmesg splats, so
> maybe it's just noise about transient failures for which there is a
> proper fallback in place.

Yeah, I think that is exactly what happens here.

We try to allocate a huge page, but fail and so fall back to using multiple 4k pages instead.

Going to send out a patch to suppress the warning.

Thanks for bisecting this,
Christian.
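[Editor's illustration, not part of the original message.] The behaviour described above (try one huge page, fall back to many 4k pages, and keep the expected failure quiet) can be modelled with a small toy in Python. This is only a sketch under stated assumptions: `ToyAllocator`, `alloc_2mib`, and the `warn` flag are hypothetical names invented for this example. They do not correspond to the TTM, swiotlb, or DMA API code, and the sketch is not the patch discussed in this thread.

```python
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # 2 MiB, i.e. size=2097152 from the subject line
SMALL_PAGE_SIZE = 4 * 1024         # 4 KiB


class ToyAllocator:
    """Hypothetical allocator: small pages always succeed, huge pages can run out."""

    def __init__(self, huge_pages_available):
        self.huge_pages_available = huge_pages_available
        self.log = []  # stands in for dmesg

    def alloc(self, size, warn=True):
        if size == HUGE_PAGE_SIZE:
            if self.huge_pages_available > 0:
                self.huge_pages_available -= 1
                return bytearray(size)
            # Failure path: only log when the caller asked for a warning.
            if warn:
                self.log.append(f"coherent allocation failed size={size}")
            return None
        return bytearray(size)


def alloc_2mib(allocator, warn_on_huge_failure=False):
    """Return a list of buffers covering 2 MiB, preferring one huge page."""
    huge = allocator.alloc(HUGE_PAGE_SIZE, warn=warn_on_huge_failure)
    if huge is not None:
        return [huge]
    # Fallback: the caller can cope, so the failed attempt above was not an error.
    count = HUGE_PAGE_SIZE // SMALL_PAGE_SIZE
    return [allocator.alloc(SMALL_PAGE_SIZE) for _ in range(count)]
```

In this toy, the fallback yields 512 small buffers once the huge pages run out. Nothing is logged unless the caller explicitly asks to be warned, which is the effect a warning-suppression change would aim for.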
Ilia Mirkin
2017-Dec-31 18:27 UTC
[Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On Tue, Dec 19, 2017 at 8:45 AM, Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> On 19.12.2017 at 11:39, Michel Dänzer wrote:
>>
>> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>>>
>>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>>>>
>>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>>>>> I only see this when beating up NFS. There was a kworker wakeup
>>>>> latency issue, but with a bandaid applied to fix that up, I can still
>>>>> trigger this.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> i have seen this one as well with my system, but i could not find an
>>>> easy way to trigger it for bisecting purpose. If you can trigger it
>>>> conveniently, a bisect would be nice!
>>>
>>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>>> backup, creating memory pressure. I happen to have just finished
>>> bisecting, the result is:
>>>
>>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>>> commit 648bc3574716400acc06f99915815f80d9563783
>>> Author: Christian König <christian.koenig at amd.com>
>>> Date: Thu Jul 6 09:59:43 2017 +0200
>>>
>>>     drm/ttm: add transparent huge page support for DMA allocations v2
>>>
>>>     Try to allocate huge pages when it makes sense.
>>>
>>>     v2: fix comment and use ifdef
>>>
>> BTW, I haven't noticed any bad effects other than the dmesg splats, so
>> maybe it's just noise about transient failures for which there is a
>> proper fallback in place.
>
> Yeah, I think that is exactly what happens here.
>
> We try to allocate a huge page, but fail and so fall back to using multiple
> 4k pages instead.
>
> Going to send out a patch to suppress the warning.

Hi Christian,

Did you ever send out such a patch? I didn't see one on the list, but perhaps I missed it. One definitely hasn't made it upstream yet. (I just hit the issue myself with Linus's tree from last night.)

Thanks,

-ilia
Mike Galbraith
2017-Dec-31 20:53 UTC
[Nouveau] nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On Sun, 2017-12-31 at 13:27 -0500, Ilia Mirkin wrote:
> On Tue, Dec 19, 2017 at 8:45 AM, Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
> > On 19.12.2017 at 11:39, Michel Dänzer wrote:
> >>
> >> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
> >>>
> >>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
> >>>>
> >>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
> >>>>>
> >>>>> Greetings,
> >>>>>
> >>>>> Kernel bound workloads seem to trigger the below for whatever reason.
> >>>>> I only see this when beating up NFS. There was a kworker wakeup
> >>>>> latency issue, but with a bandaid applied to fix that up, I can still
> >>>>> trigger this.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> i have seen this one as well with my system, but i could not find an
> >>>> easy way to trigger it for bisecting purpose. If you can trigger it
> >>>> conveniently, a bisect would be nice!
> >>>
> >>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
> >>> backup, creating memory pressure. I happen to have just finished
> >>> bisecting, the result is:
> >>>
> >>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
> >>> commit 648bc3574716400acc06f99915815f80d9563783
> >>> Author: Christian König <christian.koenig at amd.com>
> >>> Date: Thu Jul 6 09:59:43 2017 +0200
> >>>
> >>>     drm/ttm: add transparent huge page support for DMA allocations v2
> >>>
> >>>     Try to allocate huge pages when it makes sense.
> >>>
> >>>     v2: fix comment and use ifdef
> >>>
> >> BTW, I haven't noticed any bad effects other than the dmesg splats, so
> >> maybe it's just noise about transient failures for which there is a
> >> proper fallback in place.
> >
> > Yeah, I think that is exactly what happens here.
> >
> > We try to allocate a huge page, but fail and so fall back to using multiple
> > 4k pages instead.
> >
> > Going to send out a patch to suppress the warning.
>
> Hi Christian,
>
> Did you ever send out such a patch? I didn't see one on the list, but
> perhaps I missed it. One definitely hasn't made it upstream yet. (I
> just hit the issue myself with Linus's tree from last night.)

Actually, that wants a bit more methinks, because while the stack dump goes away, you still get spammed, it just comes in smaller chunks.

-Mike