thr3ads.net - Linux Virtualization - [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted [Mar 2019]

Thiago Jung Bauermann

2019-Feb-04 18:14 UTC

[RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

Hello Michael,

Michael S. Tsirkin <mst at redhat.com> writes:
> On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
>>
>> Fixing address of powerpc mailing list.
>>
>> Thiago Jung Bauermann <bauerman at linux.ibm.com> writes:
>>
>> > Hello,
>> >
>> > With Christoph's rework of the DMA API that recently landed, the patch
>> > below is the only change needed in virtio to make it work in a POWER
>> > secure guest under the ultravisor.
>> >
>> > The other change we need (making sure the device's dma_map_ops is NULL
>> > so that the dma-direct/swiotlb code is used) can be made in
>> > powerpc-specific code.
>> >
>> > Of course, I also have patches (soon to be posted as RFC) which hook up
>> > <linux/mem_encrypt.h> to the powerpc secure guest support code.
>> >
>> > What do you think?
>> >
>> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
>> > From: Thiago Jung Bauermann <bauerman at linux.ibm.com>
>> > Date: Thu, 24 Jan 2019 22:08:02 -0200
>> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
>> >
>> > The host can't access the guest memory when it's encrypted, so using
>> > regular memory pages for the ring isn't an option. Go through the DMA API.
>> >
>> > Signed-off-by: Thiago Jung Bauermann <bauerman at linux.ibm.com>
>
> Well I think this will come back to bite us (witness xen which is now
> reworking precisely this path - but at least they aren't to blame, xen
> came before ACCESS_PLATFORM).
>
> I also still think the right thing would have been to set
> ACCESS_PLATFORM for all systems where device can't access all memory.

I understand. The problem with that approach for us is that because we
don't know which guests will become secure guests and which will remain
regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.

And the problem with that is that for QEMU on POWER, having
ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
device (which makes sense considering that the name of the flag was
IOMMU_PLATFORM). And we need that for regular guests to avoid
performance degradation.

So while ACCESS_PLATFORM solves our problems for secure guests, we can't
turn it on by default because we can't affect legacy systems. Doing so
would penalize existing systems that can access all memory. They would
all have to unnecessarily go through address translations, and take a
performance hit.

The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
in advance - right when the VM is instantiated - that it will not have
access to all guest memory. Unfortunately that assumption is subtly
broken on our secure-platform. The hypervisor/QEMU realizes that the
platform is going secure only *after the VM is instantiated*. It's the
kernel running in the VM that determines that it wants to switch the
platform to secure-mode.

Another way of looking at this issue which also explains our reluctance
is that the only difference between a secure guest and a regular guest
(at least regarding virtio) is that the former uses swiotlb while the
latter doesn't. And from the device's point of view they're
indistinguishable. It can't tell one guest that is using swiotlb from
one that isn't. And that implies that secure guest vs regular guest
isn't a virtio interface issue, it's "guest internal affairs". So
there's no reason to reflect that in the feature flags.

That said, we still would like to arrive at a proper design for this
rather than add yet another hack if we can avoid it. So here's another
proposal: considering that the dma-direct code (in kernel/dma/direct.c)
automatically uses swiotlb when necessary (thanks to Christoph's recent
DMA work), would it be ok to replace virtio's own direct-memory code
that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
code cleanup (replace open-coded stuff with calls to existing
infrastructure).

> But I also think I don't have the energy to argue about power secure
> guest anymore.  So be it for power secure guest since the involved
> engineers disagree with me.  Hey I've been wrong in the past ;).

Yeah, it's been a difficult discussion. Thanks for still engaging!
I honestly thought that this patch was a good solution (if the guest has
encrypted memory it means that the DMA API needs to be used), but I can
see where you are coming from. As I said, we'd like to arrive at a good
solution if possible.

> But the name "sev_active" makes me scared because at least AMD guys who
> were doing the sensible thing and setting ACCESS_PLATFORM

My understanding is, AMD guest-platform knows in advance that their
guest will run in secure mode and hence sets the flag at the time of VM
instantiation. Unfortunately we don't have that luxury on our platforms.

> (unless I'm
> wrong? I remember distinctly that's so) will likely be affected too.
> We don't want that.
>
> So let's find a way to make sure it's just power secure guest for now
> pls.

Yes, my understanding is that they turn ACCESS_PLATFORM on. And because
of that, IIUC this patch wouldn't affect them because in their platform
vring_use_dma_api() returns true earlier in the
"if !virtio_has_iommu_quirk(vdev)" condition.

> I also think we should add a dma_api near features under virtio_device
> such that these hacks can move off data path.

Sorry, I don't understand this.

> By the way could you please respond about virtio-iommu and
> why there's no support for ACCESS_PLATFORM on power?

There is support for ACCESS_PLATFORM on POWER. We don't enable it
because it causes a performance hit.

> I have Cc'd you on these discussions.

I'm having a look at the spec and the patches, but to be honest I'm not
the best powerpc guy for this. I'll see if I can get others to have a
look.

> Thanks!

Thanks as well!

--
Thiago Jung Bauermann
IBM Linux Technology Center

Michael S. Tsirkin

2019-Feb-04 20:23 UTC

[RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

On Mon, Feb 04, 2019 at 04:14:20PM -0200, Thiago Jung Bauermann wrote:
>
> Hello Michael,
> 
> Michael S. Tsirkin <mst at redhat.com> writes:
> 
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> >>
> >> Fixing address of powerpc mailing list.
> >>
> >> Thiago Jung Bauermann <bauerman at linux.ibm.com> writes:
> >>
> >> > Hello,
> >> >
> >> > With Christoph's rework of the DMA API that recently landed, the patch
> >> > below is the only change needed in virtio to make it work in a POWER
> >> > secure guest under the ultravisor.
> >> >
> >> > The other change we need (making sure the device's dma_map_ops is NULL
> >> > so that the dma-direct/swiotlb code is used) can be made in
> >> > powerpc-specific code.
> >> >
> >> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >> > <linux/mem_encrypt.h> to the powerpc secure guest support code.
> >> >
> >> > What do you think?
> >> >
> >> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> >> > From: Thiago Jung Bauermann <bauerman at linux.ibm.com>
> >> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> >> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> >> >
> >> > The host can't access the guest memory when it's encrypted, so using
> >> > regular memory pages for the ring isn't an option. Go through the DMA API.
> >> >
> >> > Signed-off-by: Thiago Jung Bauermann <bauerman at linux.ibm.com>
> >
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> >
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> 
> I understand. The problem with that approach for us is that because we
> don't know which guests will become secure guests and which will remain
> regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.
> 
> And the problem with that is that for QEMU on POWER, having
> ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
> device (which makes sense considering that the name of the flag was
> IOMMU_PLATFORM). And we need that for regular guests to avoid
> performance degradation.

You don't really, ACCESS_PLATFORM means just that, platform decides.

> So while ACCESS_PLATFORM solves our problems for secure guests, we can't
> turn it on by default because we can't affect legacy systems. Doing so
> would penalize existing systems that can access all memory. They would
> all have to unnecessarily go through address translations, and take a
> performance hit.

So as step one, you just give hypervisor admin an option to run legacy
systems faster by blocking secure mode. I don't see why that is
so terrible.

But as step two, assuming you use above step one to make legacy
guests go fast - maybe there is a point in detecting
such a hypervisor and doing something smarter with it.
By all means let's have a discussion around this, but that is no longer
"to make it work" as the commit log says; it's more a performance
optimization.

> The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
> in advance - right when the VM is instantiated - that it will not have
> access to all guest memory.

Not quite. It just means that hypervisor can live with not having
access to all memory. If platform wants to give it access
to all memory that is quite all right.

> Unfortunately that assumption is subtly
> broken on our secure-platform. The hypervisor/QEMU realizes that the
> platform is going secure only *after the VM is instantiated*. It's the
> kernel running in the VM that determines that it wants to switch the
> platform to secure-mode.

ACCESS_PLATFORM is there so guests can detect legacy hypervisors
which always assumed it's another CPU.

> Another way of looking at this issue which also explains our reluctance
> is that the only difference between a secure guest and a regular guest
> (at least regarding virtio) is that the former uses swiotlb while the
> latter doesn't.

But swiotlb is just one implementation. It's a guest internal thing. The
issue is that memory isn't host accessible.  Yes linux does not use that
info too much right now but it already begins to seep out of the
abstraction.  For example as you are doing data copies you should maybe
calculate the packet checksum just as well.  Not something DMA API will
let you know right now, but that's because any bounce buffer users so
far weren't terribly fast anyway - it was all for 16 bit hardware and
such.

> And from the device's point of view they're
> indistinguishable. It can't tell one guest that is using swiotlb from
> one that isn't. And that implies that secure guest vs regular guest
> isn't a virtio interface issue, it's "guest internal affairs". So
> there's no reason to reflect that in the feature flags.

So don't. The way not to reflect that in the feature flags is
to set ACCESS_PLATFORM.  Then you say *I don't care let platform decide*.


Without ACCESS_PLATFORM
virtio has a very specific opinion about the security of the
device, and that opinion is that device is part of the guest
supervisor security domain.



> That said, we still would like to arrive at a proper design for this
> rather than add yet another hack if we can avoid it. So here's another
> proposal: considering that the dma-direct code (in kernel/dma/direct.c)
> automatically uses swiotlb when necessary (thanks to Christoph's recent
> DMA work), would it be ok to replace virtio's own direct-memory code
> that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
> way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
> code cleanup (replace open-coded stuff with calls to existing
> infrastructure).

Let's say I have some doubts that there's an API that
matches what virtio with its bag of legacy compatibility exactly.

But taking a step back you seem to keep looking at it at the code level.
And I think that's not necessarily right. If ACCESS_PLATFORM isn't what you
are looking for then maybe you need another feature bit.
But you/we need to figure out what it means first.



> > But I also think I don't have the energy to argue about power secure
> > guest anymore.  So be it for power secure guest since the involved
> > engineers disagree with me.  Hey I've been wrong in the past ;).
> 
> Yeah, it's been a difficult discussion. Thanks for still engaging!
> I honestly thought that this patch was a good solution (if the guest has
> encrypted memory it means that the DMA API needs to be used), but I can
> see where you are coming from. As I said, we'd like to arrive at a good
> solution if possible.
> 
> > But the name "sev_active" makes me scared because at least AMD guys who
> > were doing the sensible thing and setting ACCESS_PLATFORM
>
> My understanding is, AMD guest-platform knows in advance that their
> guest will run in secure mode and hence sets the flag at the time of VM
> instantiation. Unfortunately we don't have that luxury on our platforms.

Well you do have that luxury. It looks like there are existing
guests that already acknowledge ACCESS_PLATFORM and you are not happy
with how that path is slow. So you are trying to optimize for
them by clearing ACCESS_PLATFORM and then you have lost ability
to invoke DMA API.

For example if there was another flag just like ACCESS_PLATFORM
just not yet used by anyone, you would be all fine using that right?

Is there any justification to doing that beyond someone putting
out slow code in the past?

> > (unless I'm
> > wrong? I remember distinctly that's so) will likely be affected too.
> > We don't want that.
> >
> > So let's find a way to make sure it's just power secure guest for now
> > pls.
>
> Yes, my understanding is that they turn ACCESS_PLATFORM on. And because
> of that, IIUC this patch wouldn't affect them because in their platform
> vring_use_dma_api() returns true earlier in the
> "if !virtio_has_iommu_quirk(vdev)" condition.

Let's just say I don't think we should assume how the specific hypervisor
behaves. It seems to follow the spec and so should Linux.
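For reference, the check order under discussion can be modeled in userspace roughly as follows. This is a simplified sketch, not the kernel source: `struct vdev_model`, its fields, and both functions are hypothetical stand-ins for the state that vring_use_dma_api() consults, and the exact form of the memory-encryption check in the RFC patch is assumed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state vring_use_dma_api() looks at. */
struct vdev_model {
	bool access_platform;   /* ACCESS_PLATFORM was negotiated */
	bool xen_domain;        /* running as a Xen guest */
	bool mem_encrypted;     /* guest memory is encrypted (assumed new check) */
};

/* Models virtio_has_iommu_quirk(): true when ACCESS_PLATFORM is NOT set. */
static bool model_has_iommu_quirk(const struct vdev_model *v)
{
	return !v->access_platform;
}

static bool model_use_dma_api(const struct vdev_model *v)
{
	/* ACCESS_PLATFORM negotiated: always go through the DMA API. */
	if (!model_has_iommu_quirk(v))
		return true;

	/* Special case for Xen guests. */
	if (v->xen_domain)
		return true;

	/* Check proposed by the RFC; only reached without ACCESS_PLATFORM. */
	if (v->mem_encrypted)
		return true;

	return false;
}
```

With `access_platform` set, the first branch returns true and the later checks are never reached, which is the behavior described above for guests that negotiate ACCESS_PLATFORM.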

> > I also think we should add a dma_api near features under virtio_device
> > such that these hacks can move off data path.
>
> Sorry, I don't understand this.

I mean we can set a flag within struct virtio_device instead
of poking at features checking xen etc etc.
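A rough userspace sketch of that idea: the decision is computed once after feature negotiation and cached, so the data path only reads a boolean. The `dma_api` field and all names below are hypothetical; only the bit number 33 for VIRTIO_F_ACCESS_PLATFORM comes from the virtio spec.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_ACCESS_PLATFORM 33  /* bit number of VIRTIO_F_ACCESS_PLATFORM */

/* Hypothetical device model with a cached DMA API decision. */
struct vdev_cached {
	uint64_t features;  /* negotiated feature bits */
	bool dma_api;       /* hypothetical flag, set once at setup */
};

/* Slow path: run once after feature negotiation. */
static void cached_finalize(struct vdev_cached *v, bool xen_guest)
{
	bool access_platform = (v->features >> MODEL_F_ACCESS_PLATFORM) & 1;

	v->dma_api = access_platform || xen_guest;
}

/* Data path: no feature or platform probing, just the cached flag. */
static inline bool cached_use_dma_api(const struct vdev_cached *v)
{
	return v->dma_api;
}
```

Any platform-specific condition would be folded into `cached_finalize()` in this model, leaving the per-request path unchanged.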

> > By the way could you please respond about virtio-iommu and
> > why there's no support for ACCESS_PLATFORM on power?
>
> There is support for ACCESS_PLATFORM on POWER. We don't enable it
> because it causes a performance hit.

For legacy guests.

> > I have Cc'd you on these discussions.
>
> I'm having a look at the spec and the patches, but to be honest I'm not
> the best powerpc guy for this. I'll see if I can get others to have a
> look.
> 
> > Thanks!
> 
> Thanks as well!
> 
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center

Thiago Jung Bauermann

2019-Mar-20 16:13 UTC

[RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

Hello Michael,

Sorry for the delay in responding. We had some internal discussions on
this.

Michael S. Tsirkin <mst at redhat.com> writes:
> On Mon, Feb 04, 2019 at 04:14:20PM -0200, Thiago Jung Bauermann wrote:
>>
>> Hello Michael,
>>
>> Michael S. Tsirkin <mst at redhat.com> writes:
>>
>> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
>> So while ACCESS_PLATFORM solves our problems for secure guests, we can't
>> turn it on by default because we can't affect legacy systems. Doing so
>> would penalize existing systems that can access all memory. They would
>> all have to unnecessarily go through address translations, and take a
>> performance hit.
>
> So as step one, you just give hypervisor admin an option to run legacy
> systems faster by blocking secure mode. I don't see why that is
> so terrible.

There are a few reasons why:

1. It's bad user experience to require people to fiddle with knobs for
obscure reasons if it's possible to design things such that they Just
Work.

2. "User" in this case can be a human directly calling QEMU, but could
also be libvirt or one of its users, or some other framework. This means
having to adjust and/or educate an open-ended number of people and
software. It's best avoided if possible.

3. The hypervisor admin and the admin of the guest system don't
necessarily belong to the same organization (e.g., cloud provider and
cloud customer), so there may be some friction when they need to
coordinate to get this right.

4. A feature of our design is that the guest may or may not decide to
"go secure" at boot time, so it's best not to depend on flags that may
or may not have been set at the time QEMU was started.

>> The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
>> in advance - right when the VM is instantiated - that it will not have
>> access to all guest memory.
>
> Not quite. It just means that hypervisor can live with not having
> access to all memory. If platform wants to give it access
> to all memory that is quite all right.

Except that on powerpc it also means "there's an IOMMU present" and
there's no way to say "bypass IOMMU translation". :-/

>> Another way of looking at this issue which also explains our reluctance
>> is that the only difference between a secure guest and a regular guest
>> (at least regarding virtio) is that the former uses swiotlb while the
>> latter doesn't.
>
> But swiotlb is just one implementation. It's a guest internal thing. The
> issue is that memory isn't host accessible.

From what I understand of the ACCESS_PLATFORM definition, the host will
only ever try to access memory addresses that are supplied to it by the
guest, so all of the secure guest memory that the host cares about is
accessible:

    If this feature bit is set to 0, then the device has same access to
    memory addresses supplied to it as the driver has. In particular,
    the device will always use physical addresses matching addresses
    used by the driver (typically meaning physical addresses used by the
    CPU) and not translated further, and can access any address supplied
    to it by the driver. When clear, this overrides any
    platform-specific description of whether device access is limited or
    translated in any way, e.g. whether an IOMMU may be present.

All of the above is true for POWER guests, whether they are secure
guests or not.

Or are you saying that a virtio device may want to access memory
addresses that weren't supplied to it by the driver?

>> And from the device's point of view they're
>> indistinguishable. It can't tell one guest that is using swiotlb from
>> one that isn't. And that implies that secure guest vs regular guest
>> isn't a virtio interface issue, it's "guest internal affairs". So
>> there's no reason to reflect that in the feature flags.
>
> So don't. The way not to reflect that in the feature flags is
> to set ACCESS_PLATFORM.  Then you say *I don't care let platform decide*.
>
>
> Without ACCESS_PLATFORM
> virtio has a very specific opinion about the security of the
> device, and that opinion is that device is part of the guest
> supervisor security domain.

Sorry for being a bit dense, but I'm not sure what "the device is part of
the guest supervisor security domain" means. In powerpc-speak,
"supervisor" is the operating system so perhaps that explains my
confusion. Are you saying that without ACCESS_PLATFORM, the guest
considers the host to be part of the guest operating system's security
domain? If so, does that have any other implication besides "the host
can access any address supplied to it by the driver"? If that is the
case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
include that information because it's not part of the current
definition.

>> That said, we still would like to arrive at a proper design for this
>> rather than add yet another hack if we can avoid it. So here's another
>> proposal: considering that the dma-direct code (in kernel/dma/direct.c)
>> automatically uses swiotlb when necessary (thanks to Christoph's recent
>> DMA work), would it be ok to replace virtio's own direct-memory code
>> that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
>> way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
>> code cleanup (replace open-coded stuff with calls to existing
>> infrastructure).
>
> Let's say I have some doubts that there's an API that
> matches what virtio with its bag of legacy compatibility exactly.

Ok.

>> > But the name "sev_active" makes me scared because at least AMD guys who
>> > were doing the sensible thing and setting ACCESS_PLATFORM
>>
>> My understanding is, AMD guest-platform knows in advance that their
>> guest will run in secure mode and hence sets the flag at the time of VM
>> instantiation. Unfortunately we don't have that luxury on our platforms.
>
> Well you do have that luxury. It looks like that there are existing
> guests that already acknowledge ACCESS_PLATFORM and you are not happy
> with how that path is slow. So you are trying to optimize for
> them by clearing ACCESS_PLATFORM and then you have lost ability
> to invoke DMA API.
>
> For example if there was another flag just like ACCESS_PLATFORM
> just not yet used by anyone, you would be all fine using that right?

Yes, a new flag sounds like a great idea. What about the definition
below?

VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
    VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
    exception that the IOMMU is explicitly defined to be off or bypassed
    when accessing memory addresses supplied to the device by the
    driver. This flag should be set by the guest if offered, but to
    allow for backward compatibility, device implementations allow it
    to be left unset by the guest. It is an error to set both this flag
    and VIRTIO_F_ACCESS_PLATFORM.
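The mutual-exclusion rule in this definition could be checked roughly as below. This is only a sketch: the bit chosen for the proposed flag is a placeholder for illustration (no bit has been assigned), the function names are hypothetical, and the DMA API and IOMMU helpers reflect one reading of the definition rather than agreed semantics.

```c
#include <stdbool.h>
#include <stdint.h>

#define F_ACCESS_PLATFORM           (1ULL << 33)  /* existing VIRTIO_F_ACCESS_PLATFORM */
#define F_ACCESS_PLATFORM_NO_IOMMU  (1ULL << 63)  /* placeholder bit, not assigned anywhere */

/* Setting both flags is an error under the proposed definition. */
static bool features_valid(uint64_t features)
{
	const uint64_t both = F_ACCESS_PLATFORM | F_ACCESS_PLATFORM_NO_IOMMU;

	return (features & both) != both;
}

/* Assumed reading: either flag signals restricted memory access. */
static bool needs_dma_api(uint64_t features)
{
	return (features & (F_ACCESS_PLATFORM | F_ACCESS_PLATFORM_NO_IOMMU)) != 0;
}

/* Only the original flag may bring IOMMU translation into play. */
static bool iommu_may_translate(uint64_t features)
{
	return (features & F_ACCESS_PLATFORM) != 0;
}
```

In this model a secure guest that negotiates only the new flag would use the DMA API (and hence swiotlb) while IOMMU translation stays off.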

> Is there any justification to doing that beyond someone putting
> out slow code in the past?

The definition of the ACCESS_PLATFORM flag is generic and captures the
notion of memory access restrictions for the device. Unfortunately, on
powerpc pSeries guests it also implies that the IOMMU is turned on even
though pSeries guests have never used IOMMU for virtio devices. Combined
with the lack of a way to turn off or bypass the IOMMU for virtio
devices, this means that existing guests in the field are compelled to
use the IOMMU even though that never was the case before, and said
guests have no mechanism to turn it off.

Therefore, we need a new flag to signal the memory access restriction
present in secure guests which doesn't also imply turning on the IOMMU.

--
Thiago Jung Bauermann
IBM Linux Technology Center
