Michael S. Tsirkin
2018-Dec-25 16:25 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>
> On 2018/12/25 1:41, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > On 2018/12/14 9:20, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 10:31, Michael S. Tsirkin wrote:
> > > > > > > Just to make sure I understand this. It looks to me we should:
> > > > > > >
> > > > > > > - allow passing GIOVA->GPA through UAPI
> > > > > > >
> > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB
> > > > > > > for performance
> > > > > > >
> > > > > > > Is this what you suggest?
> > > > > > >
> > > > > > > Thanks
> > > > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > > > GIOVA->GPA in the IOTLB.
> > > > > >
> > > > > > This has advantages for security, since a single table then needs
> > > > > > to be validated to ensure the guest does not corrupt QEMU memory.
> > > > > >
> > > > > I wonder how much we can gain through this. Currently, the qemu IOMMU gives
> > > > > the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
> > > > > then passes GIOVA->HVA to vhost. It looks no different to me.
> > > > >
> > > > > Thanks
> > > > The difference is in security, not in performance. Getting a bad HVA
> > > > corrupts QEMU memory and it might be guest controlled. Very risky.
> > > How can this be controlled by the guest? The HVA was generated from qemu ram
> > > blocks, which are totally under the control of the qemu memory core instead
> > > of the guest.
> > >
> > > Thanks
> > It is ultimately under guest influence, as the guest supplies IOVA->GPA
> > translations. qemu translates GPA->HVA and gives the translated result
> > to the kernel. If it's not buggy and the kernel isn't buggy, it's all
> > fine.
>
> If qemu provides a buggy GPA->HVA, we can't work around this. And I don't get
> the point of why we even want to try. Buggy qemu code can crash itself in
> many ways.
>
> > But that's the approach that was proven not to work in the 20th century.
> > In the 21st century we are trying a defence-in-depth approach.
> >
> > My point is that a single code path that is responsible for
> > the HVA translations is better than two.
> >
> So the difference is whether or not to use the memory table information.
>
> Current:
>
> 1) SET_MEM_TABLE: GPA->HVA
>
> 2) Qemu GIOVA->GPA
>
> 3) Qemu GPA->HVA
>
> 4) IOTLB_UPDATE: GIOVA->HVA
>
> If I understand correctly, you want to drop step 3, considering it might be
> buggy, which is just 19 lines of code in qemu (vhost_memory_region_lookup()).
> This will end up with:
>
> 1) Doing the GPA->HVA translation in the IOTLB_UPDATE path (I believe we won't
> want to do it during device IOTLB lookup).
>
> 2) Extra bits to enable this capability.
>
> So this looks like it needs more code in the kernel than what qemu did in
> userspace. Is this really worthwhile?
>
> Thanks

So there are several points I would like to make

1. At the moment, without an iommu it is possible to
change GPA->HVA mappings and everything keeps working,
because a change in memory tables flushes the rings.
However, I don't see the iotlb cache being invalidated
on that path - did I miss it? If it is not there, it's
a related minor bug.

2. qemu already has a GPA. Discarding it and re-calculating it
when logging is on just seems wrong.
However, if you would like to *also* keep the HVA in the iotlb
to avoid doing extra translations, that sounds like a
reasonable optimization.

3. It also means that the hva->gpa translation only runs
when logging is enabled. That is a rarely exercised
path, so any bugs there will not be caught.

So I really would like us, long term, to move away from
hva->gpa translations and keep them for legacy userspace only,
but I don't really mind how we do it.

How about:
- a new flag to pass an iotlb with *both* a gpa and hva
- for legacy userspace, calculate the gpa on iotlb update,
  so the device then uses a shared code path

What do you think?

-- MST
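To make the shape of that proposal concrete, here is a minimal C sketch of an IOTLB update message that carries both addresses. The existing vhost IOTLB message already has iova, size, uaddr, perm and type fields; the gpa field, the flags byte and the "_both" name below are hypothetical, shown only to illustrate the idea of passing a GPA alongside the HVA under a new capability flag (this is a sketch, not actual UAPI):

#include <linux/types.h>

/*
 * Hypothetical sketch only -- not actual vhost UAPI.  The real IOTLB
 * message carries an iova, a size, a uaddr (HVA), a perm and a type;
 * the gpa and flags fields here illustrate the proposed extension of
 * passing *both* a GPA (for dirty logging) and an HVA (for access).
 */
struct vhost_iotlb_msg_both {
	__u64 iova;	/* guest I/O virtual address (GIOVA) */
	__u64 size;	/* length of the mapping */
	__u64 uaddr;	/* HVA: what the kernel actually dereferences */
	__u64 gpa;	/* GPA: what dirty page logging should record */
	__u8  perm;	/* VHOST_ACCESS_RO / _WO / _RW */
	__u8  type;	/* VHOST_IOTLB_UPDATE, VHOST_IOTLB_INVALIDATE, ... */
	__u8  flags;	/* hypothetical "gpa field is valid" bit */
};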
Jason Wang
2018-Dec-26 05:43 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On 2018/12/26 12:25, Michael S. Tsirkin wrote:
> [...]
> So there are several points I would like to make
>
> 1. At the moment, without an iommu it is possible to
> change GPA->HVA mappings and everything keeps working,
> because a change in memory tables flushes the rings.

Interesting, I didn't know this before. But when can this happen?

> However, I don't see the iotlb cache being invalidated
> on that path - did I miss it? If it is not there, it's
> a related minor bug.

It might have a bug. But a question is to consider the case without an IOMMU:
we only update the mem table (SET_MEM_TABLE), but not the vring address. This
looks like a bug as well?

> 2. qemu already has a GPA. Discarding it and re-calculating it
> when logging is on just seems wrong.
> However, if you would like to *also* keep the HVA in the iotlb
> to avoid doing extra translations, that sounds like a
> reasonable optimization.

Yes, traversing the GPA->HVA mapping seems unnecessary.

> 3. It also means that the hva->gpa translation only runs
> when logging is enabled. That is a rarely exercised
> path, so any bugs there will not be caught.

I wonder whether some kind of unit test may help here.

> So I really would like us, long term, to move away from
> hva->gpa translations and keep them for legacy userspace only,
> but I don't really mind how we do it.
>
> How about:
> - a new flag to pass an iotlb with *both* a gpa and hva
> - for legacy userspace, calculate the gpa on iotlb update,
>   so the device then uses a shared code path
>
> What do you think?

I don't object to this idea, so I can try it; I just want to figure out why it
is a must.

Thanks
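For readers following along, the "rarely exercised" path being debated is the reverse HVA->GPA lookup that dirty logging needs when the IOTLB only stores GIOVA->HVA. The following is a self-contained sketch of that walk over the SET_MEM_TABLE regions; it is not the vhost implementation, and the type and function names are invented for illustration:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical mirror of one SET_MEM_TABLE region: GPA <-> HVA, same length. */
struct mem_region {
	uint64_t guest_phys_addr;   /* GPA of the region start */
	uint64_t userspace_addr;    /* HVA of the region start */
	uint64_t size;              /* region length in bytes */
};

/*
 * Reverse-translate an HVA to a GPA by scanning the memory table.
 * This only runs when dirty logging is enabled, which is why bugs
 * here are unlikely to be caught in normal operation.
 */
static bool hva_to_gpa(const struct mem_region *regions, size_t nregions,
		       uint64_t hva, uint64_t *gpa)
{
	for (size_t i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (hva >= r->userspace_addr &&
		    hva < r->userspace_addr + r->size) {
			*gpa = r->guest_phys_addr + (hva - r->userspace_addr);
			return true;
		}
	}
	return false;   /* HVA not covered by any region: nothing to log */
}

A real implementation can in principle also have to deal with one HVA range being covered by more than one GPA range (memory aliasing), which is part of what makes this path easy to get wrong and hard to exercise.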
Michael S. Tsirkin
2018-Dec-26 13:46 UTC
[PATCH net V2 4/4] vhost: log dirty page correctly
On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
> On 2018/12/26 12:25, Michael S. Tsirkin wrote:
> [...]
> > 1. At the moment, without an iommu it is possible to
> > change GPA->HVA mappings and everything keeps working,
> > because a change in memory tables flushes the rings.
>
> Interesting, I didn't know this before. But when can this happen?

It doesn't happen with existing qemu. But it seems like a valid thing to do
to remap memory at a different address.

> > However, I don't see the iotlb cache being invalidated
> > on that path - did I miss it? If it is not there, it's
> > a related minor bug.
>
> It might have a bug. But a question is to consider the case without an IOMMU:
> we only update the mem table (SET_MEM_TABLE), but not the vring address. This
> looks like a bug as well?

I think that without an iommu it can only work without races if the backend is
stopped or if the vring isn't in guest memory (with ring aliasing).

> [...]
>
> I don't object to this idea, so I can try it; I just want to figure out why it
> is a must.
>
> Thanks

Not a must, but I think it's a good interface extension.

-- MST
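As a closing illustration of the "shared code path" idea discussed above, here is a hedged sketch of what computing the GPA at IOTLB update time might look like for legacy userspace, reusing the hypothetical mem_region and hva_to_gpa() helpers from the earlier sketch; the names and structure are invented and are not the vhost code:

/*
 * Hedged sketch of the "shared code path" idea from the thread, reusing
 * the hypothetical hva_to_gpa() helper above.  Not the vhost implementation.
 */
struct iotlb_entry {
	uint64_t iova;     /* GIOVA */
	uint64_t size;
	uint64_t uaddr;    /* HVA, used for data access */
	uint64_t gpa;      /* GPA, used for dirty logging */
	bool     has_gpa;
};

/*
 * New userspace passes the GPA explicitly; legacy userspace passes only
 * the HVA, and the GPA is derived once here, at IOTLB update time, so
 * that the logging path downstream is identical for both cases.
 */
static int iotlb_update(struct iotlb_entry *e,
			const struct mem_region *regions, size_t nregions,
			uint64_t iova, uint64_t size, uint64_t uaddr,
			const uint64_t *gpa_from_userspace)
{
	e->iova  = iova;
	e->size  = size;
	e->uaddr = uaddr;

	if (gpa_from_userspace) {
		/* new interface: GPA supplied alongside the HVA */
		e->gpa = *gpa_from_userspace;
		e->has_gpa = true;
	} else {
		/* legacy userspace: derive the GPA once, up front */
		e->has_gpa = hva_to_gpa(regions, nregions, uaddr, &e->gpa);
	}
	return e->has_gpa ? 0 : -1;
}

The point of this shape is that, whichever way the GPA arrives, everything downstream of the update (including dirty page logging) sees the same entry and runs through a single, regularly exercised code path.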