Michael S. Tsirkin
2018-Dec-25 16:25 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>
> On 2018/12/25 1:41, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > On 2018/12/14 9:20, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 10:31, Michael S. Tsirkin wrote:
> > > > > > > Just to make sure I understand this. It looks to me we should:
> > > > > > >
> > > > > > > - allow passing GIOVA->GPA through UAPI
> > > > > > >
> > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB
> > > > > > > for performance
> > > > > > >
> > > > > > > Is this what you suggest?
> > > > > > >
> > > > > > > Thanks
> > > > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > > > GIOVA->GPA in the IOTLB.
> > > > > >
> > > > > > This has advantages for security, since a single table then needs
> > > > > > to be validated to ensure the guest does not corrupt QEMU memory.
> > > > > >
> > > > > I wonder how much we can gain through this. Currently, the qemu IOMMU gives
> > > > > the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
> > > > > then passes GIOVA->HVA to vhost. It looks no different to me.
> > > > >
> > > > > Thanks
> > > > The difference is in security, not in performance. Getting a bad HVA
> > > > corrupts QEMU memory and it might be guest controlled. Very risky.
> > > How can this be controlled by the guest? The HVA was generated from qemu ram
> > > blocks, which are totally under the control of the qemu memory core instead
> > > of the guest.
> > >
> > > Thanks
> > It is ultimately under guest influence, as the guest supplies IOVA->GPA
> > translations. qemu translates GPA->HVA and gives the translated result
> > to the kernel. If it's not buggy and the kernel isn't buggy, it's all
> > fine.
>
> If qemu provides a buggy GPA->HVA, we can't work around this. And I don't get
> the point of why we even want to try. Buggy qemu code can crash itself in
> many ways.
>
> > But that's the approach that was proven not to work in the 20th century.
> > In the 21st century we are trying a defence-in-depth approach.
> >
> > My point is that a single code path that is responsible for
> > the HVA translations is better than two.
> >
> So the difference is whether or not to use the memory table information.
>
> Current:
>
> 1) SET_MEM_TABLE: GPA->HVA
>
> 2) Qemu GIOVA->GPA
>
> 3) Qemu GPA->HVA
>
> 4) IOTLB_UPDATE: GIOVA->HVA
>
> If I understand correctly, you want to drop step 3, considering it might be
> buggy, which is just 19 lines of code in qemu (vhost_memory_region_lookup()).
> This will end up with:
>
> 1) Doing the GPA->HVA translation in the IOTLB_UPDATE path (I believe we won't
> want to do it during device IOTLB lookup).
>
> 2) Extra bits to enable this capability.
>
> So this looks like it needs more code in the kernel than what qemu did in
> userspace. Is this really worthwhile?
>
> Thanks

So there are several points I would like to make

1. At the moment, without an iommu it is possible to
change GPA->HVA mappings and everything keeps working,
because a change in memory tables flushes the rings.
However, I don't see the iotlb cache being invalidated
on that path - did I miss it? If it is not there, it's
a related minor bug.

2. qemu already has a GPA. Discarding it and re-calculating it
when logging is on just seems wrong.
However, if you would like to *also* keep the HVA in the iotlb
to avoid doing extra translations, that sounds like a
reasonable optimization.

3. It also means that the hva->gpa translation only runs
when logging is enabled. That is a rarely exercised
path, so any bugs there will not be caught.

So I really would like us, long term, to move away from
hva->gpa translations and keep them for legacy userspace only,
but I don't really mind how we do it.

How about:
- a new flag to pass an iotlb with *both* a gpa and hva
- for legacy userspace, calculate the gpa on iotlb update,
  so the device then uses a shared code path

What do you think?

-- MST
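To make the shape of that proposal concrete, here is a minimal C sketch of an IOTLB update message that carries both addresses. The existing vhost IOTLB message already has iova, size, uaddr, perm and type fields; the gpa field, the flags byte and the "_both" name below are hypothetical, shown only to illustrate the idea of passing a GPA alongside the HVA under a new capability flag (this is a sketch, not actual UAPI):

#include <linux/types.h>

/*
 * Hypothetical sketch only -- not actual vhost UAPI.  The real IOTLB
 * message carries an iova, a size, a uaddr (HVA), a perm and a type;
 * the gpa and flags fields here illustrate the proposed extension of
 * passing *both* a GPA (for dirty logging) and an HVA (for access).
 */
struct vhost_iotlb_msg_both {
	__u64 iova;	/* guest I/O virtual address (GIOVA) */
	__u64 size;	/* length of the mapping */
	__u64 uaddr;	/* HVA: what the kernel actually dereferences */
	__u64 gpa;	/* GPA: what dirty page logging should record */
	__u8  perm;	/* VHOST_ACCESS_RO / _WO / _RW */
	__u8  type;	/* VHOST_IOTLB_UPDATE, VHOST_IOTLB_INVALIDATE, ... */
	__u8  flags;	/* hypothetical "gpa field is valid" bit */
};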
Jason Wang
2018-Dec-26 05:43 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On 2018/12/26 12:25, Michael S. Tsirkin wrote:
> [...]
> So there are several points I would like to make
>
> 1. At the moment, without an iommu it is possible to
> change GPA->HVA mappings and everything keeps working,
> because a change in memory tables flushes the rings.

Interesting, I didn't know this before. But when can this happen?

> However, I don't see the iotlb cache being invalidated
> on that path - did I miss it? If it is not there, it's
> a related minor bug.

It might have a bug. But a question is to consider the case without an IOMMU:
we only update the mem table (SET_MEM_TABLE), but not the vring address. This
looks like a bug as well?

> 2. qemu already has a GPA. Discarding it and re-calculating it
> when logging is on just seems wrong.
> However, if you would like to *also* keep the HVA in the iotlb
> to avoid doing extra translations, that sounds like a
> reasonable optimization.

Yes, traversing the GPA->HVA mapping seems unnecessary.

> 3. It also means that the hva->gpa translation only runs
> when logging is enabled. That is a rarely exercised
> path, so any bugs there will not be caught.

I wonder whether some kind of unit test may help here.

> So I really would like us, long term, to move away from
> hva->gpa translations and keep them for legacy userspace only,
> but I don't really mind how we do it.
>
> How about:
> - a new flag to pass an iotlb with *both* a gpa and hva
> - for legacy userspace, calculate the gpa on iotlb update,
>   so the device then uses a shared code path
>
> What do you think?

I don't object to this idea, so I can try it; I just want to figure out why it
is a must.

Thanks
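For readers following along, the "rarely exercised" path being debated is the reverse HVA->GPA lookup that dirty logging needs when the IOTLB only stores GIOVA->HVA. The following is a self-contained sketch of that walk over the SET_MEM_TABLE regions; it is not the vhost implementation, and the type and function names are invented for illustration:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical mirror of one SET_MEM_TABLE region: GPA <-> HVA, same length. */
struct mem_region {
	uint64_t guest_phys_addr;   /* GPA of the region start */
	uint64_t userspace_addr;    /* HVA of the region start */
	uint64_t size;              /* region length in bytes */
};

/*
 * Reverse-translate an HVA to a GPA by scanning the memory table.
 * This only runs when dirty logging is enabled, which is why bugs
 * here are unlikely to be caught in normal operation.
 */
static bool hva_to_gpa(const struct mem_region *regions, size_t nregions,
		       uint64_t hva, uint64_t *gpa)
{
	for (size_t i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (hva >= r->userspace_addr &&
		    hva < r->userspace_addr + r->size) {
			*gpa = r->guest_phys_addr + (hva - r->userspace_addr);
			return true;
		}
	}
	return false;   /* HVA not covered by any region: nothing to log */
}

A real implementation can in principle also have to deal with one HVA range being covered by more than one GPA range (memory aliasing), which is part of what makes this path easy to get wrong and hard to exercise.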
Michael S. Tsirkin
2018-Dec-26 13:46 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
> On 2018/12/26 12:25, Michael S. Tsirkin wrote:
> [...]
> > 1. At the moment, without an iommu it is possible to
> > change GPA->HVA mappings and everything keeps working,
> > because a change in memory tables flushes the rings.
>
> Interesting, I didn't know this before. But when can this happen?

It doesn't happen with existing qemu. But it seems like a valid thing to do
to remap memory at a different address.

> > However, I don't see the iotlb cache being invalidated
> > on that path - did I miss it? If it is not there, it's
> > a related minor bug.
>
> It might have a bug. But a question is to consider the case without an IOMMU:
> we only update the mem table (SET_MEM_TABLE), but not the vring address. This
> looks like a bug as well?

I think that without an iommu it can only work without races if the backend is
stopped or if the vring isn't in guest memory (with ring aliasing).

> [...]
>
> I don't object to this idea, so I can try it; I just want to figure out why it
> is a must.
>
> Thanks

Not a must, but I think it's a good interface extension.

-- MST
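As a closing illustration of the "shared code path" idea discussed above, here is a hedged sketch of what computing the GPA at IOTLB update time might look like for legacy userspace, reusing the hypothetical mem_region and hva_to_gpa() helpers from the earlier sketch; the names and structure are invented and are not the vhost code:

/*
 * Hedged sketch of the "shared code path" idea from the thread, reusing
 * the hypothetical hva_to_gpa() helper above.  Not the vhost implementation.
 */
struct iotlb_entry {
	uint64_t iova;     /* GIOVA */
	uint64_t size;
	uint64_t uaddr;    /* HVA, used for data access */
	uint64_t gpa;      /* GPA, used for dirty logging */
	bool     has_gpa;
};

/*
 * New userspace passes the GPA explicitly; legacy userspace passes only
 * the HVA, and the GPA is derived once here, at IOTLB update time, so
 * that the logging path downstream is identical for both cases.
 */
static int iotlb_update(struct iotlb_entry *e,
			const struct mem_region *regions, size_t nregions,
			uint64_t iova, uint64_t size, uint64_t uaddr,
			const uint64_t *gpa_from_userspace)
{
	e->iova  = iova;
	e->size  = size;
	e->uaddr = uaddr;

	if (gpa_from_userspace) {
		/* new interface: GPA supplied alongside the HVA */
		e->gpa = *gpa_from_userspace;
		e->has_gpa = true;
	} else {
		/* legacy userspace: derive the GPA once, up front */
		e->has_gpa = hva_to_gpa(regions, nregions, uaddr, &e->gpa);
	}
	return e->has_gpa ? 0 : -1;
}

The point of this shape is that, whichever way the GPA arrives, everything downstream of the update (including dirty page logging) sees the same entry and runs through a single, regularly exercised code path.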