On Thu, Feb 02, 2017 at 04:40:49PM +0000, Will Deacon wrote:
> On Thu, Feb 02, 2017 at 06:30:28PM +0200, Michael S. Tsirkin wrote:
> > I am inclined to say, for 4.10 let's revert
> > c7070619f3408d9a0dffbed9149e6f00479cf43b since what it fixes is not a
> > regression in 4.10.
>
> No complaints there, as long as we can keep working to fix this for 4.11
> and onwards. You'll also need to cc stable on the revert.
>
> > So I think we can defer the fix to 4.11.
> > I think we still want f7f6634d23830ff74335734fbdb28ea109c1f349
> > for hosts with virtio 1 support.
> >
> > All this will hopefully push hosts to just implement virtio 1.
> > For mmio the changes are very small: several new registers,
> > that's all. You want this for proper 64 bit dma mask anyway.
>
> As I've said, virtio 1 will have exactly the same issue unless we start
> requiring firmware to advertise dma-coherent/_CCA for virtio-mmio
> devices correctly.
>
> Will

OK I read up on _CCA in ACPI spec. It says:
The _CCA object returns whether or not a bus-master device supports
hardware managed cache coherency. Expected values are 0 to indicate it
is not supported, and 1 to indicate that it is supported.

So if host is cache coherent, and guest thinks it isn't, we incur
unnecessary overhead by wasting coherent memory.
I get that but you said it actually breaks - why does it?

--
MST
On Thu, Feb 09, 2017 at 08:17:16PM +0200, Michael S. Tsirkin wrote:
> On Thu, Feb 02, 2017 at 04:40:49PM +0000, Will Deacon wrote:
> > On Thu, Feb 02, 2017 at 06:30:28PM +0200, Michael S. Tsirkin wrote:
> > > I am inclined to say, for 4.10 let's revert
> > > c7070619f3408d9a0dffbed9149e6f00479cf43b since what it fixes is not a
> > > regression in 4.10.
> >
> > No complaints there, as long as we can keep working to fix this for 4.11
> > and onwards. You'll also need to cc stable on the revert.
> >
> > > So I think we can defer the fix to 4.11.
> > > I think we still want f7f6634d23830ff74335734fbdb28ea109c1f349
> > > for hosts with virtio 1 support.
> > >
> > > All this will hopefully push hosts to just implement virtio 1.
> > > For mmio the changes are very small: several new registers,
> > > that's all. You want this for proper 64 bit dma mask anyway.
> >
> > As I've said, virtio 1 will have exactly the same issue unless we start
> > requiring firmware to advertise dma-coherent/_CCA for virtio-mmio
> > devices correctly.
> >
>
> OK I read up on _CCA in ACPI spec. It says:
> The _CCA object returns whether or not a bus-master device supports
> hardware managed cache coherency. Expected values are 0 to indicate it
> is not supported, and 1 to indicate that it is supported.
>
> So if host is cache coherent, and guest thinks it isn't, we incur
> unnecessary overhead by wasting coherent memory.
> I get that but you said it actually breaks - why does it?

It breaks because QEMU doesn't set _CCA for virtio-mmio devices, and that
only becomes a problem when we use the DMA API, because that results in the
guest taking out a non-cacheable mapping. On ARM (and other archs such as
Power), having a mismatch between a cacheable and a non-cacheable mapping
can result in a loss of coherency between the two (for example, if the
non-cacheable guest accesses bypass the cache, but the cacheable host
accesses allocate in the cache).

Will
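The mismatch described above (non-cacheable guest accesses bypassing the
cache while cacheable host accesses allocate in it) can be pictured with a
toy model. This is only a sketch under simplifying assumptions: a single
memory location, a single write-back cache line, and no snooping of
non-cacheable accesses. It is not kernel or QEMU code, and ToySystem, demo
and the method names are hypothetical, chosen for illustration.

```python
class ToySystem:
    """Toy model: one RAM location plus one write-back cache line.

    Assumption: non-cacheable accesses go straight to RAM and never
    look at (or invalidate) the cached copy.
    """

    def __init__(self, initial=0):
        self.memory = initial  # value held in RAM
        self.cache = None      # cached copy, None if the line is not allocated
        self.dirty = False     # True if the cached copy is newer than RAM

    def cacheable_write(self, value):
        # Host-style access: allocate in the cache, RAM is not updated yet.
        self.cache = value
        self.dirty = True

    def cacheable_read(self):
        # Host-style access: hit the cache if the line is present,
        # otherwise allocate it from RAM.
        if self.cache is None:
            self.cache = self.memory
        return self.cache

    def noncacheable_write(self, value):
        # Guest-style access: bypass the cache and update RAM directly.
        self.memory = value

    def noncacheable_read(self):
        # Guest-style access: bypass the cache and read RAM directly.
        return self.memory

    def clean(self):
        # Explicit cache maintenance: write a dirty line back to RAM.
        if self.dirty:
            self.memory = self.cache
            self.dirty = False


def demo():
    system = ToySystem(initial=0)
    system.cacheable_write(42)              # host updates shared data
    guest_sees = system.noncacheable_read() # guest reads stale RAM
    system.noncacheable_write(7)            # guest updates RAM directly
    host_sees = system.cacheable_read()     # host still hits its cached copy
    return guest_sees, host_sees
```

In this model demo() returns (0, 42): the guest misses the host's write
and the host misses the guest's write, which is the loss of coherency in
question. When both sides use the same attributes (or the line is cleaned
before the other side reads), the two views agree again.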
On Thu, Feb 09, 2017 at 06:31:18PM +0000, Will Deacon wrote:
> On Thu, Feb 09, 2017 at 08:17:16PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Feb 02, 2017 at 04:40:49PM +0000, Will Deacon wrote:
> > > On Thu, Feb 02, 2017 at 06:30:28PM +0200, Michael S. Tsirkin wrote:
> > > > I am inclined to say, for 4.10 let's revert
> > > > c7070619f3408d9a0dffbed9149e6f00479cf43b since what it fixes is not a
> > > > regression in 4.10.
> > >
> > > No complaints there, as long as we can keep working to fix this for 4.11
> > > and onwards. You'll also need to cc stable on the revert.
> > >
> > > > So I think we can defer the fix to 4.11.
> > > > I think we still want f7f6634d23830ff74335734fbdb28ea109c1f349
> > > > for hosts with virtio 1 support.
> > > >
> > > > All this will hopefully push hosts to just implement virtio 1.
> > > > For mmio the changes are very small: several new registers,
> > > > that's all. You want this for proper 64 bit dma mask anyway.
> > >
> > > As I've said, virtio 1 will have exactly the same issue unless we start
> > > requiring firmware to advertise dma-coherent/_CCA for virtio-mmio
> > > devices correctly.
> > >
> >
> > OK I read up on _CCA in ACPI spec. It says:
> > The _CCA object returns whether or not a bus-master device supports
> > hardware managed cache coherency. Expected values are 0 to indicate it
> > is not supported, and 1 to indicate that it is supported.
> >
> > So if host is cache coherent, and guest thinks it isn't, we incur
> > unnecessary overhead by wasting coherent memory.
> > I get that but you said it actually breaks - why does it?
>
> It breaks because QEMU doesn't set _CCA for virtio-mmio devices, and that
> only becomes a problem when we use the DMA API, because that results in the
> guest taking out a non-cacheable mapping. On ARM (and other archs such as
> Power), having a mismatch between a cacheable and a non-cacheable mapping
> can result in a loss of coherency between the two (for example, if the
> non-cacheable guest accesses bypass the cache, but the cacheable host
> accesses allocate in the cache).
>
> Will

I see. And I guess using a cacheable mapping is significantly faster.
I would say we want to typically use cacheable for virtio then,
whether we bypass the IOMMU or not.

I guess this is why we always set _CCA/DT correctly, right?

--
MST
On Thu, Feb 09, 2017 at 06:31:18PM +0000, Will Deacon wrote:
> On ARM (and other archs such as
> Power), having a mismatch between a cacheable and a non-cacheable mapping
> can result in a loss of coherency between the two (for example, if the
> non-cacheable guest accesses bypass the cache, but the cacheable host
> accesses allocate in the cache).

I guess it's an optimization to avoid cache snoops for non-cacheable
accesses?

--
MST