The current Xen design is that guest domains have read-only access to the memory mapped for them. The documentation says it is not safe for that memory to be writable. Why?

Is it so that a trap is raised whenever a write is made to it? That would be the ideal answer :-).

And since it is not "safe", what checks does the Xen hypervisor perform against these "dangers"? In other words, can someone enumerate the potential dangers? As a newbie in Xen, I cannot think of any. My logic is that once pages have been assigned to a domain as its own, the domain should be allowed to do whatever it wants with them, and so writes should not trigger any privilege trap (or VM exit, in the HVM case).

In the traditional Linux model, once memory is mapped for a user process (non-root users included), it can be mapped writable. So why the discrepancy in the case of Xen?

I think the Xen hypervisor would gain a lot of performance if this read-only restriction were removed.

Please share your thoughts. Apologies for the questions from a newbie.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Sorry, I think I may have found the answer. The real reason is that the real physical page table (PTE) mapping should be hidden from the guest. Since all of the guest's page tables are fake, a mechanism is needed to tell the hypervisor to remap them to the real physical mapping. The mechanism currently used is to mark the tables read-only, so that any attempted write triggers an exception.

Is my analysis correct? Thanks.
Xen does not have this general read-only restriction. It does force page tables to be read-only, otherwise a guest could grant itself access to arbitrary memory that it does not own.

 -- Keir
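The risk Keir describes can be illustrated with a toy model. This is only a sketch under stated assumptions: the two-flag PTE layout, the `FRAME_OWNER` table and every function name below are made up for illustration and are not Xen code. The point it tries to show is that the MMU looks only at the bits in an entry, so something else has to check who owns the frame the entry names.

```python
PTE_PRESENT = 0x1
PTE_WRITABLE = 0x2
FRAME_SHIFT = 12

# Toy ownership table: machine frame number -> owning domain id (made-up values).
FRAME_OWNER = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}


def make_pte(mfn: int, flags: int) -> int:
    """Pack a machine frame number and flag bits into one page-table entry."""
    return (mfn << FRAME_SHIFT) | flags


def mmu_allows_write(pte: int) -> bool:
    """What the hardware checks: only the flag bits, never who owns the frame."""
    return bool(pte & PTE_PRESENT) and bool(pte & PTE_WRITABLE)


def hypervisor_allows(domid: int, pte: int) -> bool:
    """What a hypervisor has to check: a present entry may only name the domain's own frames."""
    if not pte & PTE_PRESENT:
        return True  # a non-present entry maps nothing, so it is harmless
    return FRAME_OWNER.get(pte >> FRAME_SHIFT) == domid


# Domain 1 forges an entry that points at frame 5, which belongs to domain 2.
forged = make_pte(5, PTE_PRESENT | PTE_WRITABLE)
print(mmu_allows_write(forged))      # True: the MMU would honour the entry
print(hypervisor_allows(1, forged))  # False: the ownership check rejects it
```

If the guest could write such entries into a live page table without any check, nothing would stop the forged mapping from taking effect. Keeping page tables read-only is what forces every update through a check like `hypervisor_allows`.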
On 9/12/07, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> Xen does not have this general read-only restriction. It does force page
> tables to be read-only, otherwise a guest could grant itself access to
> arbitrary memory that it does not own.

Thank you for the answer. To begin with, we do not know in advance which memory is page table memory and which is not. For example, during dom0/domU initialisation the guest OS queries the BIOS e820 mechanism for the available physical memory, and the guest OS (paravirtualised or HVM) then sets aside parts of that physical memory for building its page tables. Once the page tables are completely built, CR3 is loaded, which starts hardware MMU operation. So is the entire physical memory marked read-only before CR3 is loaded, and afterwards only the memory not involved in page table mapping has the read-only marking removed?

That does not seem right, as the guest OS can also change CR3 at any time afterwards.
pradeep singh rautela
2007-Sep-13 04:40 UTC
Re: [Xen-devel] Readonly memory for guest domain
On 9/13/07, Peter Teoh <htmldeveloper@gmail.com> wrote:
[...]
> Does not seem right, as guest OS can change the CR3 anytime subsequently as
> well.

AFAIK, any write to CR3 is trapped to Xen itself. So yes, a guest can change CR3 at any time, but Xen always gets to see what it is writing there. Anything beyond the memory assigned to the domain is illegal, and Xen knows the limits of each domain.

Please correct me if I am wrong somewhere.

Thanks

--
pradeep singh rautela

"question = ( to ) ? be : ! be;"
      -- Wm. Shakespeare
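A minimal sketch of the check described in this reply, assuming the hypervisor gets control whenever a guest asks for a new CR3 value. The `Vcpu` class, the `FRAME_OWNER` table and `handle_cr3_write` are hypothetical names invented for the example. A real hypervisor would also have to validate the contents of the page tables reachable from the new base, which this sketch leaves out.

```python
# Toy ownership table: machine frame number -> owning domain id (made-up values).
FRAME_OWNER = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}


class Vcpu:
    """Just enough virtual CPU state for the example."""

    def __init__(self, domid: int, cr3_mfn: int) -> None:
        self.domid = domid
        self.cr3_mfn = cr3_mfn  # frame currently used as the top-level page table


def handle_cr3_write(vcpu: Vcpu, new_mfn: int) -> bool:
    """Hypothetical handler run by the hypervisor when a guest requests a new CR3.

    The new base is accepted only if the frame belongs to the requesting
    domain; otherwise the previous value stays in place.
    """
    if FRAME_OWNER.get(new_mfn) != vcpu.domid:
        return False
    vcpu.cr3_mfn = new_mfn
    return True


vcpu = Vcpu(domid=1, cr3_mfn=2)
print(handle_cr3_write(vcpu, 3))  # True: frame 3 belongs to domain 1
print(handle_cr3_write(vcpu, 5))  # False: frame 5 belongs to domain 2
print(vcpu.cr3_mfn)               # 3: the refused request changed nothing
```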
On 13/9/07 02:59, "Peter Teoh" <htmldeveloper@gmail.com> wrote:
> So therefore, before the CR3 is loaded the entire physical memory is
> marked as readonly, and after the CR3 is loaded, only those memory not
> involved in pagetable mapping are unmarked readonly?
>
> Does not seem right, as guest OS can change the CR3 anytime subsequently
> as well.

Yeah, it doesn't really work as you describe!

 K.
Thank you for the answer, but I am still totally confused. Apologies for that.

On 9/13/07, pradeep singh rautela <rautelap@gmail.com> wrote:
> Any writes to CR3 'll be trapped to the Xen itself AFAIK. So, yes any
> guest can change the CR3 anytime but there is always Xen to see what
> it is writing in the CR3. Anything beyond the memory assigned to
> domain is illegal, xen knows the limits of the domains.

This part I fully understand. But the guest OS, knowing that it owns the entire memory range, will partition that memory however it likes, whether as page table memory or anything else. So the contents of memory can be anything: there is no concept of an "invalid frame number" as far as the guest OS is concerned, and memory stays exactly as the guest OS wrote it, i.e. the hypervisor cannot change its contents.

But the hypervisor implements a shadow memory (apologies if I am wrong, I am just describing this based on the material I have read so far), and its construction, done in the hypervisor, is triggered as soon as the guest loads CR3. The purpose of the shadow memory is to rewrite all of the guest's page table entries to their real (physical) values, so that the MMU can use them for page table mapping. This rewriting is done in the hypervisor, based on the memory assigned to the guest, so the resulting values are ALWAYS valid. It is needed because the hypervisor cannot change the contents of the guest page tables. The guest should always be able to write ANYTHING it wants to its own memory, and the hypervisor will always generate VALID mapping values to put into the shadow memory.

So throughout this entire chain of reasoning, there is no way for the guest to corrupt the shadow table in the hypervisor. The only reason I can think of for making the guest's page tables read-only is so that a write triggers the corresponding page table update in the hypervisor's shadow memory. It has nothing to do with valid/invalid frame numbers, or with "unsafe" values. Does that sound logical?

Please correct me if I am wrong.
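The shadow scheme described in this message can be sketched as a per-entry translation. This is a toy model under assumptions of my own: the `P2M` table contents, the two-flag PTE layout and the `shadow_pte` function are invented for illustration and do not reflect Xen's actual shadow code. The guest's entry is only read, never changed; the hypervisor builds a separate entry with the machine frame number filled in.

```python
PTE_PRESENT = 0x1
PTE_WRITABLE = 0x2
FLAGS_MASK = 0xFFF
FRAME_SHIFT = 12

# Per-domain translation table: guest frame number -> machine frame number.
# The values are arbitrary; only the hypervisor would know them.
P2M = {0: 40, 1: 17, 2: 93, 3: 8}


def shadow_pte(guest_pte: int, target_is_guest_pagetable: bool = False) -> int:
    """Build the shadow entry for one guest entry without modifying the guest's copy.

    Returns 0 (not present) when the guest entry is not present or names a
    frame the guest does not have, so the hardware faults instead of
    following a bogus mapping.
    """
    if not guest_pte & PTE_PRESENT:
        return 0
    mfn = P2M.get(guest_pte >> FRAME_SHIFT)
    if mfn is None:
        return 0
    flags = guest_pte & FLAGS_MASK
    if target_is_guest_pagetable:
        # Map guest page-table pages read-only so that writes to them trap
        # and can be propagated into the shadow.
        flags &= ~PTE_WRITABLE
    return (mfn << FRAME_SHIFT) | flags


guest_entry = (2 << FRAME_SHIFT) | PTE_PRESENT | PTE_WRITABLE
print(hex(shadow_pte(guest_entry)))        # 0x5d003
print(hex(shadow_pte(guest_entry, True)))  # 0x5d001
```

Whatever the guest writes, the shadow entry can only ever name a frame taken from `P2M`, which matches the "always valid" reasoning in the message. The read-only flag in this model serves to catch updates, not to block bad values.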
On Thu, 2007-09-13 at 14:36 +0800, Peter Teoh wrote:
[...]
> So throughout the entire chain of reasoning, there is no way for the
> guest to corrupt the shadow table in the hypervisor. The only reason
> I can think of, that pagetable in guest must be made readonly, is so
> that it will trigger the corresponding pagetable update in the shadow
> memory in the hypervisor. Nothing to do with valid/invalid frames
> numbers here, or "unsafe" values either. Does it sound logical?
>
> Please correct me if I am wrong.

You need to make it clear whether you are talking about paravirtualised (PV) or fully-virtualised (HVM) mode guests; they are very different in this regard.

What you say is roughly true for HVM guests, but not for PV guests, where there is no shadow mode.

In the HVM case the shadowing code ensures that guest page-table pages are marked read-only in the shadow page tables (the ones actually loaded into cr3) in order to trap and propagate updates.

For PV guests, the guest is required to perform the pseudo-physical to machine address translation itself. The hypervisor enforces the invariant that the guest cannot have a writable mapping to a page table page, using the algorithm described in the Xen paper[0], section 3.3.3. On startup the initial page tables are marked read-only, and the guest has to make other pages read-only if it wishes to use them as page tables.

[0] http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf

Ian.
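The invariant Ian mentions can be sketched with a type and a use count per machine frame, which is the general idea the cited paper describes. The sketch below is a simplified model and not Xen's implementation: it keeps only two of the frame types, and `Frame`, `get_type` and `put_type` are hypothetical names. A frame can be in use as a page table or be mapped writable, never both at once, and it can change type only once its count has dropped to zero.

```python
from enum import Enum


class FrameType(Enum):
    NONE = "none"
    PAGE_TABLE = "page_table"
    WRITABLE = "writable"


class Frame:
    """One machine frame with its current type and number of typed uses."""

    def __init__(self) -> None:
        self.type = FrameType.NONE
        self.type_count = 0


def get_type(frame: Frame, wanted: FrameType) -> bool:
    """Take one typed reference; fails if the frame is in use with another type."""
    if frame.type_count == 0:
        frame.type = wanted
    elif frame.type is not wanted:
        return False
    frame.type_count += 1
    return True


def put_type(frame: Frame) -> None:
    """Drop one typed reference; the frame becomes untyped when none remain."""
    assert frame.type_count > 0
    frame.type_count -= 1
    if frame.type_count == 0:
        frame.type = FrameType.NONE


frame = Frame()
print(get_type(frame, FrameType.WRITABLE))    # True: the guest maps it writable
print(get_type(frame, FrameType.PAGE_TABLE))  # False: still mapped writable
put_type(frame)                               # the guest drops the writable mapping
print(get_type(frame, FrameType.PAGE_TABLE))  # True: now usable as a page table
```

In this model the guest "makes a page read-only" by dropping every writable mapping of it, after which the request to use it as a page table succeeds.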
Thank you Ian, Pradeep, and Keir for all the answers. Just a few more questions to confirm my understanding:

On 9/13/07, Ian Campbell <Ian.Campbell@xensource.com> wrote:
> > > [...] the guest OS will query the e820 bios mechanism for physical
> > > memory availability, and the guest OS (paravirt or HVM) will then
> > > assign different parts of the physical memory for pagetable
> > > construction.

I guess this part is wrong, i.e. a PV guest does not have the luxury of the entire range of contiguous physical memory. Since the page table actually in use is stored in guest memory, then to minimise copying, what the guest sees in the page table is also the real value used by the MMU. Correct?

> You need to make it clear whether you are talking about paravirtualised
> (PV) or fully-virtualised (HVM) mode guests, they are very different in
> this regard.

Apologies for this deep probing again. I don't quite understand why it has to be PV or HVM. As the "load cr3" instruction is a privileged instruction, running it at ring 1 (PV) triggers an exception, which could be used to update the hypervisor's shadow table, if one is implemented, regardless of whether HVM support (SVM or VMX) is available. The same goes for enforcing read-only guest page tables: no HVM features are needed, because the guest still runs at ring 1 and is subject to ring 0 host control. Please enlighten me :-).

Perhaps some other operation after this makes a shadow table implementation for PV infeasible? From the paper[0] quoted below, is it due to the high overheads of a shadow table implementation in the PV scenario?

> What you say is roughly true for HVM guests but not PV guests where
> there is no shadow mode.

This is new to me; thank you for the info.

> In the HVM case the shadowing code ensures that guest page-table pages
> are marked read-only in the shadowed page tables (the ones actually
> loaded into cr3) in order to trap and propagate updates.
>
> For PV guests the guest is required to perform the pseudo-physical to
> machine address translation itself. The hypervisor enforces the
> invariant that the guest cannot have a writable mapping to a page table
> page using the algorithm described in the Xen paper[0], section 3.3.3.
> On startup the initial pagetables are marked readonly and the guest has
> to make other pages read-only if it wishes to use them as page tables.
>
> [0] http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf
>
> Ian.
On Thu, 2007-09-13 at 22:46 +0800, Peter Teoh wrote:
> I guessed this part is wrong - ie, PV will not have the luxury of
> having the entire range of contiguous physical memory. Since the
> actual pagetable to be used will be stored in guest memory, to
> minimize copying, what the guest see in the pagetable, will also be
> the real value to be used in MMU operation. Correct?

Yes. In PV mode the exact cr3 value the guest wants to load is loaded into the MMU; there is no shadow mode at all for PV guests and hence no second set of page tables.

> Apologies for this deep probing again. I don't quite understand why
> it has to be PV or HVM.

Because page tables are handled in two different and separate ways depending on whether the guest is a PV or an HVM guest. The key difference is that PV uses "direct page tables" and HVM uses "shadow page tables". Your failure to understand this seems to underlie most of your confusion.

> As the "load cr3" instruction is a privileged insn, running it at
> ring1 (PV) will trigger a exception condition,

True, although for PV guests we actually use an explicit "load cr3" hypercall instead. In theory that could have been done by emulating the mov to cr3 instruction, but it isn't.

> which can be used to update the hypervisor shadow table, if it is
> implemented, irregardless of HVM (which is SVM or VMX) available or
> not. Similarly for guest readonly pagetable enforcement - no HVM
> features is needed here, because it is still running at ring1, and
> subject to ring0's host control. Please enlighten :-).

Read-only enforcement of guest page tables is used with PV guests, not HVM guests, so I don't get what you are trying to say here; of course no HVM features are needed for it.

If a PV guest were able to set up writable mappings to its own page tables, it could use them to update the currently active page tables without trapping to the hypervisor, and therefore map memory it is not allowed to. No hardware feature stops ring 1 from writing to a page (even a page table page) if a writable mapping exists (the U/S bit only stops ring 3 from writing to a page), hence the hypervisor has to enforce additional constraints on the mappings of page table pages for PV guests.

> Perhaps some other operations subsequent to this make the shadow table
> implementation for PV infeasible?

There is nothing in principle stopping you from implementing a shadow mode for PV (in fact there used to be one, but it went away with shadow2). However, direct paging has lower overheads than shadow paging, doesn't need an extra copy of the page tables (i.e. lower memory requirements), and so on.

Ian.
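To make the direct-paging argument concrete, here is a rough sketch of what a validated page-table update could look like. It is a toy model, not the real hypercall interface: `Domain`, `request_pte_update` and the two-flag PTE layout are assumptions made for the example. The guest's page tables hold real machine frame numbers, so the hypervisor does not translate anything; it only refuses updates that would map a foreign frame or create a writable mapping of one of the guest's own page-table frames.

```python
PTE_PRESENT = 0x1
PTE_WRITABLE = 0x2
FRAME_SHIFT = 12


class Domain:
    """Toy PV domain: the frames it owns and which of them are page tables."""

    def __init__(self, domid, frames, pagetable_frames):
        self.domid = domid
        self.frames = set(frames)
        self.pagetable_frames = set(pagetable_frames)
        # Contents of each page-table frame: index -> entry.
        self.tables = {mfn: {} for mfn in self.pagetable_frames}


def request_pte_update(dom, table_mfn, index, new_pte):
    """Hypothetical validated update: apply new_pte only if it keeps the invariant."""
    if table_mfn not in dom.pagetable_frames:
        return False  # the target frame is not one of the guest's page tables
    if new_pte & PTE_PRESENT:
        target = new_pte >> FRAME_SHIFT
        if target not in dom.frames:
            return False  # would map memory the domain does not own
        if target in dom.pagetable_frames and new_pte & PTE_WRITABLE:
            return False  # would give the guest a writable view of a page table
    dom.tables[table_mfn][index] = new_pte
    return True


dom = Domain(domid=1, frames={2, 3, 4}, pagetable_frames={2})
rw = PTE_PRESENT | PTE_WRITABLE
print(request_pte_update(dom, 2, 0, (3 << FRAME_SHIFT) | rw))           # True
print(request_pte_update(dom, 2, 1, (2 << FRAME_SHIFT) | rw))           # False
print(request_pte_update(dom, 2, 1, (2 << FRAME_SHIFT) | PTE_PRESENT))  # True
print(request_pte_update(dom, 2, 2, (5 << FRAME_SHIFT) | rw))           # False
```

Because the accepted entries are used by the MMU as they are, there is no second copy of the page tables to maintain, which is the lower overhead Ian refers to.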