Hi,

I was going through the memory management in Xen and am a little confused about shadow page tables for HVM guests and how they work. A few of my questions:

1. A shadow page table is a copy of the guest page table that holds actual machine frame numbers (MFNs) in place of the guest's physical frame numbers (PFNs). Does the shadow page table write-protect every PTE it contains, or are only the pages containing the "guest page table" write-protected (to check whether the guest has modified any entry)?

2. I know that live migration is supported on Xen, but I am not sure whether it works for HVM guests. In this case, too, Xen would need to keep a shadow page table that detects which pages have been modified since they were last migrated. For this, the shadow page table would have to mark all PTEs as write-protected. Does Xen implement live migration of HVM guests this way? If so, is this the same shadow table as the one used in (1), or are there multiple copies of the shadow tables for different purposes?

Thanks.
Priya

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi,

At 20:34 +0100 on 20 Apr (1240259658), priya sehgal wrote:
> I was going through the memory management in Xen and little confused
> about shadow page tables for HVM guest and how they work.

Why are you interested? Do you have a project in mind?

> Few of my questions are :
> 1. A shadow page table, is a copy of the guest page table, with actual machine frame numbers (MFN), as against the physical page number(PFN) in case of guests.
> Do the shadow page tables write protect each of the PTE it contains, or only the pages containing the "guest page table" are write protected (to check if the guest has modified any entry)?

They write-protect all pagetables and some other things, but not all of memory; that wouldn't be very useful.

> 2. I know that live migration is supported on xen, but not sure if it works for HVM guests. In this case also, xen will need to keep a shadow page table, which should detect which pages have been modified since the last time the pages were migrated. For this, shadow page table should mark all the PTEs as write protected. Does xen implement live migration of HVM guests in this way ?
> If yes, then is this shadow table same as the one used in (1), or there are multiple copies of the shadow tables for different purposes?

There's only one set of shadows, since the MMU can only use one at a time.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
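To make the answer to question 1 concrete, here is a minimal sketch (not Xen's actual propagation code) of deriving one shadow l1 entry from a guest l1 entry. It assumes a toy guest with 8 frames, a flat p2m[] array mapping PFN to MFN, and a guest_pt[] flag array marking frames known to hold guest pagetables; toy_propagate_l1e, p2m and guest_pt are hypothetical names. Only mappings of pagetable frames lose write access, not all of memory. The toy ignores NX and other high PTE bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits used by this sketch. */
#define PAGE_SHIFT     12
#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define PTE_FLAGS_MASK 0xFFFULL

/* Hypothetical toy guest: p2m[] maps guest PFN -> MFN, guest_pt[] marks
 * PFNs the hypervisor has seen used as guest pagetables. */
#define NR_FRAMES 8
static const uint64_t p2m[NR_FRAMES] = { 100, 101, 205, 333, 42, 43, 44, 45 };
static const bool guest_pt[NR_FRAMES] = { false, false, true, false,
                                          false, false, false, false };

/* Derive a shadow l1e from a guest l1e: swap the PFN for its MFN, keep the
 * guest's low flags, and clear _PAGE_RW when the target frame is a guest
 * pagetable, so guest writes to it fault and the shadow can be resynced. */
static uint64_t toy_propagate_l1e(uint64_t gl1e)
{
    if (!(gl1e & _PAGE_PRESENT))
        return 0;                       /* not-present stays not-present */

    uint64_t pfn = gl1e >> PAGE_SHIFT;
    assert(pfn < NR_FRAMES);            /* toy guest has no other frames */

    uint64_t sl1e = (p2m[pfn] << PAGE_SHIFT) | (gl1e & PTE_FLAGS_MASK);
    if (guest_pt[pfn])
        sl1e &= ~_PAGE_RW;              /* write-protect pagetable frames */
    return sl1e;
}
```

For example, a writable guest mapping of PFN 1 becomes a writable shadow mapping of MFN 101, while a writable guest mapping of PFN 2 (a pagetable frame in this toy) becomes a read-only mapping of MFN 205.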
Thanks Tim.

> Why are you interested? Do you have a project in
> mind?

We have a course project in which we have to improve the performance of live migration for HVM guests. It seems that, to support live migration, all the page table entries in the shadow page table are marked as write-protected, so as to know which pages are dirtied and need to be sent to the other machine. Since this causes many page faults, and hence performance degradation, we want to reduce these page faults. In our course project, we are supposed to form groups of pages, and if any page in a group takes a page fault (due to write protection), we mark all the pages in the group as RW. This way we can reduce the number of page faults. This is based on the heuristic that if a page is written, its immediate neighbors will also be written.

My concern was that, since we will be touching the same shadow page table and marking its PTEs as RO or RW during different epochs, we should not break anything that already exists.

Priya.
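The grouping proposal above can be sketched as a toy model, assuming one shadow l1 entry per guest page indexed by PFN, a fixed GROUP_SIZE of 8, and a per-group dirty byte. start_epoch, group_write_fault, shadow_l1 and dirty_group are hypothetical names, not Xen functions, and the sketch deliberately ignores guest accessed/dirty-bit handling and TLB flushes.

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001ULL
#define _PAGE_RW      0x002ULL

/* Hypothetical toy model: one shadow l1e per guest page (indexed by PFN)
 * and one flag per group of GROUP_SIZE consecutive pages. */
#define NR_PAGES   64
#define GROUP_SIZE 8
#define NR_GROUPS  (NR_PAGES / GROUP_SIZE)

static uint64_t shadow_l1[NR_PAGES];
static uint8_t  dirty_group[NR_GROUPS];   /* 1 = group was unlocked this epoch */

/* Start of a log-dirty epoch: every mapping loses write access. */
static void start_epoch(void)
{
    for (int i = 0; i < NR_PAGES; i++)
        shadow_l1[i] &= ~_PAGE_RW;
    for (int g = 0; g < NR_GROUPS; g++)
        dirty_group[g] = 0;
}

/* Write fault on a read-only mapping: flag the page's group, then grant
 * write access to every present page in it so its neighbors won't fault.
 * Returns how many entries were made writable. */
static int group_write_fault(int pfn)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    int g = pfn / GROUP_SIZE;
    int granted = 0;

    dirty_group[g] = 1;
    for (int i = g * GROUP_SIZE; i < (g + 1) * GROUP_SIZE; i++) {
        if ((shadow_l1[i] & _PAGE_PRESENT) && !(shadow_l1[i] & _PAGE_RW)) {
            shadow_l1[i] |= _PAGE_RW;
            granted++;
        }
    }
    return granted;
}
```

With every page present, a fault on page 10 unlocks pages 8 to 15 in one go, and later writes to that group no longer fault during the epoch.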
Hi,

At 12:37 +0100 on 21 Apr (1240317466), priya sehgal wrote:
> We have a course project, in which we have to improve the performance
> of live migration for HVM guests. It seems that to support live
> migration, all the page table entries in the shadow page table are
> marked as write protected, so as to know which pages are dirtied and
> to be sent to the other machine. Since, there will be many page faults
> leading to performance degradation, we want to reduce these page
> faults. In our course project, we are supposed to form groups of pages
> and if any page in the group hits the page fault (due to
> write-protection), we mark all the pages in the group as RW. This way
> we can reduce the page faults.

I see. Yes, in the shadow pagetables that should be easy enough: the shadow_set_l1e() function is where the read-only mapping will be replaced by a read-write one, which should be a hint that you might want to reconsider its neighbours.

> This is based on some hueristic that if a page is written, its
> immediate neighbors will also be written.

I'd be interested to hear how it trades off against increased bandwidth use.

Cheers,

Tim.
On Tue, Apr 21, 2009 at 1:04 AM, priya sehgal <priyagps@yahoo.co.in> wrote:
>
> Hi,
> I was going through the memory management in Xen and little confused
> about shadow page tables for HVM guest and how they work.
> Few of my questions are :
> 1. A shadow page table, is a copy of the guest page table, with actual machine frame numbers (MFN), as against the physical page number(PFN) in case of guests.

Yes, that's true.

> Do the shadow page tables write protect each of the PTE it contains, or only the pages containing the "guest page table" are write protected (to check if the guest has modified any entry)?

It is the gPT (guest page table) that is write-protected; this is one of the mechanisms used to detect gPT modifications and keep the sPT (shadow page table) and gPT in sync. Other mechanisms include syncing based on the processor's page-fault behaviour: the hypervisor has to sync the accessed and dirty bits of a PTE every time the processor changes them in the sPT. As far as write protection is concerned, it applies to all the PTEs in the gPT.

> 2. I know that live migration is supported on xen, but not sure if it works for HVM guests. In this case also, xen will need to keep a shadow page table, which should detect which pages have been modified since the last time the pages were migrated. For this, shadow page table should mark all the PTEs as write protected. Does xen implement live migration of HVM guests in this way ?
> If yes, then is this shadow table same as the one used in (1), or there are multiple copies of the shadow tables for different purposes?

Yes, it is the hypervisor's job to keep the sPT and gPT in sync during migration.
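One common way to keep the guest's accessed/dirty bits correct, sketched here under stated assumptions (x86 bit positions; shadow_flags and handle_fault are hypothetical names, not Xen's code), is to make a shadow entry writable only once the guest's own PTE is already dirty. The first write then always traps, and the hypervisor sets the dirty bit in the guest PTE before granting write access in the shadow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_DIRTY    0x040ULL

/* Shadow flags derived from guest flags: writable in the shadow only if
 * the guest PTE is writable AND already dirty, so a first write traps. */
static uint64_t shadow_flags(uint64_t gflags)
{
    uint64_t sflags = gflags & (_PAGE_PRESENT | _PAGE_RW |
                                _PAGE_ACCESSED | _PAGE_DIRTY);
    if (!(gflags & _PAGE_DIRTY))
        sflags &= ~_PAGE_RW;
    return sflags;
}

/* Hypothetical fault handler: update the guest PTE's A/D bits as the
 * processor would have done, then recompute the shadow entry's flags.
 * A write to a read-only guest PTE is a real guest fault and is left
 * untouched here. */
static uint64_t handle_fault(uint64_t *gpte, bool is_write)
{
    assert(*gpte & _PAGE_PRESENT);      /* sketch: present mappings only */
    *gpte |= _PAGE_ACCESSED;
    if (is_write && (*gpte & _PAGE_RW))
        *gpte |= _PAGE_DIRTY;
    return shadow_flags(*gpte);
}
```

A read fault therefore yields a read-only shadow entry with the guest's accessed bit set; a subsequent write fault sets the guest's dirty bit and only then makes the shadow entry writable.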
Hi,

On Tue, Apr 21, 2009 at 1:37 PM, priya sehgal <priyagps@yahoo.co.in> wrote:
>
> Thanks Tim.
>> Why are you interested? Do you have a project in
>> mind?
>
> We have a course project, in which we have to improve the performance of live migration for HVM guests. It seems that to support live migration, all the page table entries in the shadow page table are marked as write protected, so as to know which pages are dirtied and to be sent to the other machine. Since, there will be many page faults leading to performance degradation, we want to reduce these page faults. In our course project, we are supposed to form groups of pages and if any page in the group hits the page fault (due to write-protection), we mark all the pages in the group as RW. This way we can reduce the page faults.
>

Have you actually measured this? I think that the major cause of page faults and VM slowdown is not the page faults on write access, but the fact that we blow the shadow pagetables away every time we clean the dirty bitmap. This requires a long operation to remove all reference counts from top to bottom, and the shadow pagetables then have to be reconstructed on the next memory accesses.

Thanks,
Gianluca

--
It was a type of people I did not know, I found them very strange and
they did not inspire confidence at all. Later I learned that I had been
introduced to electronic engineers.
E. W. Dijkstra
> Have you actually measured this? I think that the major cause of page
> faults and VM slowdown is -- rather than page faults on write access
> -- the fact that we blow the shadow pagetables away everytime we clean
> the dirty bitmap, and this requires a long operation to remove from
> top to bottom all reference counts and reconstructing later the shadow
> pagetables on the next memory accesses.

Yep, ideally we'd just walk all the shadow leaf PTEs to remove write access. I'd guess that this would be a much more worthwhile optimization than trying to "pre-fetch" write faults.

Ian
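Ian's alternative can be sketched on a toy two-level shadow, assuming each present l2 slot holds a direct pointer to an l1 table (real shadows use MFNs, have more levels, superpage shadows and per-vCPU top levels). Rather than discarding everything as shadow_blow_tables() does, the walk revokes write access in place, so the next write costs one fixup fault instead of a rebuild. toy_l1, toy_l2 and revoke_write_access are hypothetical names; a real implementation would also need a TLB flush afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001ULL
#define _PAGE_RW      0x002ULL
#define ENTRIES       512

/* Hypothetical two-level toy shadow: NULL l2 slots have no l1 table. */
struct toy_l1 { uint64_t e[ENTRIES]; };
struct toy_l2 { struct toy_l1 *l1[ENTRIES]; };

/* Walk every leaf entry and clear _PAGE_RW on present, writable ones.
 * The shadow structure itself is kept intact.
 * Returns the number of entries that lost write access. */
static size_t revoke_write_access(struct toy_l2 *l2)
{
    assert(l2 != NULL);
    size_t revoked = 0;
    for (size_t i = 0; i < ENTRIES; i++) {
        struct toy_l1 *l1 = l2->l1[i];
        if (l1 == NULL)
            continue;
        for (size_t j = 0; j < ENTRIES; j++) {
            uint64_t *e = &l1->e[j];
            if ((*e & _PAGE_PRESENT) && (*e & _PAGE_RW)) {
                *e &= ~_PAGE_RW;
                revoked++;
            }
        }
    }
    return revoked;
}
```

The cost is proportional to the number of shadow l1 tables in use, and no reference counts are torn down, which is the expensive part Gianluca describes.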
> Have you actually measured this? I think that the major
> cause of page faults and VM slowdown is -- rather than page
> faults on write access -- the fact that we blow the shadow
> pagetables away everytime we clean the dirty bitmap, and this
> requires a long operation to remove from top to bottom all
> reference counts and reconstructing later the shadow
> pagetables on the next memory accesses.

We have not measured this, but we will benchmark it after making the changes. Since the number of page faults will be reduced by a factor of "n", where "n" is the size of the page group, it should help speed up the VM. If n is large enough, say 1000 contiguous pages, and the workload dirties consecutive pages, it should improve performance. For very small values of "n" it might not help that much.

Thanks,
Priya
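The factor-of-n claim can be checked with a small counting sketch, assuming a trace of written page numbers within one epoch and that each group faults only on its first write. count_faults is a hypothetical helper, not part of Xen. Contiguous writes show the n-fold reduction; scattered writes show none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 4096

/* Count write-protection faults in one log-dirty epoch for a trace of
 * written page numbers (each < MAX_PAGES). With group size n, the first
 * write to any page of a group unlocks the whole group, so only distinct
 * groups fault; n == 1 is ordinary per-page tracking. */
static size_t count_faults(const unsigned *trace, size_t len, unsigned n)
{
    assert(n >= 1);
    bool unlocked[MAX_PAGES] = { false };   /* indexed by group number */
    size_t faults = 0;

    for (size_t i = 0; i < len; i++) {
        assert(trace[i] < MAX_PAGES);
        unsigned g = trace[i] / n;
        if (!unlocked[g]) {
            unlocked[g] = true;
            faults++;
        }
    }
    return faults;
}
```

For the trace 0, 1, ..., 7 the per-page count is 8 and the count with n = 4 is 2; for the trace 0, 1000, 2000, 3000 the count stays at 4 even with n = 1000, because every write lands in a different group.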
On Wed, Apr 22, 2009 at 12:37 AM, priya sehgal <priyagps@yahoo.co.in> wrote:
> We have not measured this, but we will benchmark it after making the changes. Since the number of page faults will reduce by a factor of "n",
> where "n" is the size of the page group, it should help speed the VM. If n is large enough, say 1000
> contiguous pages and the workload is such that it dirties consecutive pages, it should help in improving performance. For very small values of "n" it might not help that much.

There are various problems I can see with this approach:

- A fixup fault (a page fault after which we add the writable mapping to an L1) is not so expensive in this context. The big slowdown is that we run most of the time on empty shadow pagetables (because the shadow pagetables are blown away so often). So even if you speed up this minor case, you won't get very far.

- As Tim suggested, this will make the bandwidth required for live migration much bigger (you're talking about increasing the granularity of memory to be sent from 1 to 1000 pages). So you should take into account that, yes, bigger log-dirty chunks will decrease the number of page faults, but they will increase the required network bandwidth, which is a very important parameter for live migration.

- Also, in a minor way, the fact that pages close to a just-dirtied page are likely to be dirtied soon does not imply that the neighbouring pages have already been dirtied by the guest by the time libxc sends the big chunk of pages over the network. This might unpredictably cause the same big chunk of memory to be sent over the network multiple times during the live migration, further increasing the bandwidth in an uncontrollable way.

So, unless you're interested in this particular feature for its own sake, and you just want to check whether it is worthwhile or not (i.e. you are OK if this method doesn't work), I'd suggest you trace both log-dirty fixup faults (when we mark a page dirty during a page fault) and the moments when a particular page is sent over the network, and analyze the flow to see whether this makes sense.

Also, the feature you're thinking about seems orthogonal to the paging technique used, i.e. HAP or shadow, so if you have an EPT or NPT box available you might want to try with HAP first. HAP does all the log-dirty tracking at the P2M level, which will make your life much easier.

Hope this is useful,
Gianluca
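The tracing suggestion above could feed an analysis along these lines. This is a minimal sketch that assumes a hypothetical trace format of one record per event (a log-dirty fixup fault or a page sent by the migration sender); neither the format nor summarise() exists in Xen or libxc. It counts faults, total sends, and sends that repeat a page already transmitted, which is where the extra bandwidth of a coarser grouping would show up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace format: one record per instrumented event. */
enum ev_kind { EV_DIRTY_FAULT, EV_PAGE_SENT };
struct ev { enum ev_kind kind; unsigned pfn; };

#define MAX_PFN 1024

struct summary { size_t faults, sends, resends; };

/* Summarise a trace: log-dirty faults taken, pages put on the wire, and
 * sends that repeat a page already sent earlier in the migration. */
static struct summary summarise(const struct ev *t, size_t len)
{
    unsigned char sent_before[MAX_PFN] = { 0 };
    struct summary s = { 0, 0, 0 };

    for (size_t i = 0; i < len; i++) {
        assert(t[i].pfn < MAX_PFN);
        if (t[i].kind == EV_DIRTY_FAULT) {
            s.faults++;
        } else {
            s.sends++;
            if (sent_before[t[i].pfn])
                s.resends++;
            sent_before[t[i].pfn] = 1;
        }
    }
    return s;
}
```

Running the same workload with and without grouping and comparing faults against sends and resends gives a direct view of the trade-off Tim asked about.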
Hi Gianluca,

> There are various problems I can see with this approach:
>
> - As Tim suggested, this will make the bandwidth required
> to do live migration much bigger (you're talking about
> increasing the granularity of memory to be sent from 1 to
> 1000 pages). So you should take into account that yes,
> making bigger logdirty chunks will decrease the pagefaults,
> but will increase the required network bandwidth, which is
> a very important parameter for live migration.

I think there is one thing missing from what I explained. When a page in a group is written for the first time, all its neighbors are marked as RW, but their dirty bits might be off. We maintain a dirty group bitmap, which just records the page groups that are dirty. At the end of an epoch, when we are about to ship the pages to the destination machine, we first check which groups are dirty, then scan *each* page's PTE in those groups to determine whether it is DIRTY. Only if a page is DIRTY is it sent over the network. So we do not send unwanted pages, only the ones marked DIRTY in the RW-enabled groups.

I hope this is clearer now. Please let me know your comments.

Also, you mentioned that the main performance bottleneck is the blowing away of the shadow page tables. Why is this required? Couldn't we just clean up and reset only the page table entries that were migrated in the last epoch? Could you please elaborate on this?

Thanks and Regards,
Priya.
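The end-of-epoch scan Priya describes can be sketched with the same kind of toy state: one shadow l1 entry per page, one flag per group, and the processor-set dirty bit in each shadow entry. collect_dirty_pages, shadow_l1 and dirty_group are hypothetical names. The sketch is single-threaded and ignores TLB flushes and races with running vCPUs, which a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001ULL
#define _PAGE_RW      0x002ULL
#define _PAGE_DIRTY   0x040ULL

#define NR_PAGES   64
#define GROUP_SIZE 8
#define NR_GROUPS  (NR_PAGES / GROUP_SIZE)

/* Hypothetical toy state: one shadow l1e per page, one flag per group. */
static uint64_t shadow_l1[NR_PAGES];
static uint8_t  dirty_group[NR_GROUPS];

/* End of epoch: visit only groups flagged dirty and, within each, pick
 * just the pages whose PTE dirty bit is set. Every visited page is
 * re-armed (dirty bit and write access cleared) for the next epoch.
 * Stores the PFNs to send in out[] and returns how many there are. */
static size_t collect_dirty_pages(unsigned *out, size_t cap)
{
    size_t n = 0;
    for (unsigned g = 0; g < NR_GROUPS; g++) {
        if (!dirty_group[g])
            continue;
        dirty_group[g] = 0;
        for (unsigned p = g * GROUP_SIZE; p < (g + 1) * GROUP_SIZE; p++) {
            if (shadow_l1[p] & _PAGE_DIRTY) {
                assert(n < cap);
                out[n++] = p;
            }
            shadow_l1[p] &= ~(_PAGE_DIRTY | _PAGE_RW);
        }
    }
    return n;
}
```

Only the dirty pages of unlocked groups are transmitted, so the network cost stays at page granularity; the extra work is the per-page scan of each dirty group.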
On Apr 22, 2009, at 3:02 PM, priya sehgal wrote:
> I think there is one thing missing in what I explained.
> If a page in a group is written for the first time, all its neighbors
> are marked as RW. But, their dirty bit might be off. Now, we maintain
> a dirty group bitmap, which just stores the page groups that are
> dirty.
> During the end of an epoch, when we are about to ship the pages
> to the destination machine, we first check the groups which are
> dirty. Scan through *each* page's PTE corresponding to that group to
> determine if it is DIRTY. Only if the page is DIRTY is it sent over the network.
> So, we do not send unwanted pages, only the ones marked DIRTY in the
> RW-enabled groups.

OK, that seems fine, but please note that at the moment shadows always set the DIRTY bit (unless we're tracking a VRAM address), so you might have to leave the dirty bit clear when propagating the shadow, which is perhaps a bit too intrusive. But yes, that would work.

> Also, you mentioned that the main performance bottleneck is due to
> blowing up of the shadow page tables. Why is this required? Can't
> the page table entries that were migrated in the last epoch be
> cleaned up only and reset. Could you please elaborate on this ?

Yes, we could do that, as Ian suggested. Nobody has put any effort into making the live-migration shadow code fast, so it was never done. It might be good to do it if you're interested in this case; it would be much cleaner than your suggested approach, and perhaps far more interesting. This comment, found in shadow_clean_dirty_bitmap(), might explain a lot:

    /* Need to revoke write access to the domain's pages again.
     * In future, we'll have a less heavy-handed approach to this,
     * but for now, we just unshadow everything except Xen. */
    shadow_blow_tables(d);

Thanks,
Gianluca
> Ok that seems, but please note that at the moment shadows
> always set the DIRTY bit (unless we're tracking a VRAM
> address), so you might have to set the dirty bit clean when
> propagating the shadow, which is a bit too intrusive
> perhaps. But yes, that would work.

I see one problem we might introduce by making the SPTEs of all the pages in the group RW without marking their corresponding GPTEs as DIRTY: if the guest writes to a page (marked RW in its SPTE) and dirties it, the hypervisor has no way of knowing about it, so it cannot mark the GPTE as DIRTY. This might lead to inconsistency between the shadow and guest page tables. Also, the OS running inside the guest will not see a correct picture of the dirty pages, and so will not flush them.

Please let me know whether I am right about the problem our proposed implementation would create.

Thanks.
Priya
On Fri, Apr 24, 2009 at 6:33 AM, priya sehgal <priyagps@yahoo.co.in> wrote:
>
>> Ok that seems, but please note that at the moment shadows
>> always set the DIRTY bit (unless we're tracking a VRAM
>> address), so you might have to set the dirty bit clean when
>> propagating the shadow, which is a bit too intrusive
>> perhaps. But yes, that would work.
>>
>
> I see one problem, which we might introduce by making the SPTE entries of
> all the pages in the group RW, but not marking their corresponding GPTE entry as DIRTY -- if the Guest writes to a page (marked as RW in SPTE) and makes it dirty, there is no way Hypervisor will know about it, and so it cannot mark GPTE's entry as DIRTY. This might lead to inconsistency between the shadow and guest page tables. Also, the OS running inside the guest will not see correct picture about the dirty pages, thereby not flushing the dirty pages.

Yes, that's a common problem when you "prefetch" write accesses in shadows. You can just set as dirty (in the log_dirty sense) the pages whose guest PTEs are dirty. Of course, you must be very careful when an entry comes from a splintered L1 in the shadow (that is, the guest set the PSE bit), because in that case you have to check the dirty bit in the guest L2 (at the guest PDE level).

Anyway, I think it's fairly clear that I was trying to win you over to the idea of getting rid of shadow_blow_tables() at every log-dirty cleanup, since the feature you have in mind is both intrusive and very specific, and perhaps not worth it. Of course, you might prove my unproven assumption wrong. :)

Thanks,
Gianluca
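The PSE caveat can be sketched as a small helper, under one possible reading of the suggestion: only treat a neighboring page as safe to make writable (and log as dirty) if the guest already considers its mapping dirty. guest_mapping_dirty is a hypothetical name, and the bit positions are the architectural x86 ones (PS is bit 7 in a PDE). For a superpage the guest has no l1e at all, even though the shadow splinters it into an l1 table, so the dirty bit must come from the guest l2e.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001ULL
#define _PAGE_DIRTY   0x040ULL
#define _PAGE_PSE     0x080ULL   /* PS bit: the guest l2e maps a superpage */

/* Is this guest mapping dirty from the guest's point of view?
 * 4K mapping: the dirty bit lives in the guest l1e.
 * Superpage (PSE set in the guest l2e): the shadow splinters it into an
 * l1 table, but there is no guest l1e, so the guest l2e's dirty bit is
 * the one to consult and gl1e is ignored. */
static bool guest_mapping_dirty(uint64_t gl2e, uint64_t gl1e)
{
    assert(gl2e & _PAGE_PRESENT);
    if (gl2e & _PAGE_PSE)
        return (gl2e & _PAGE_DIRTY) != 0;
    return (gl1e & _PAGE_PRESENT) && (gl1e & _PAGE_DIRTY);
}
```

A group-prefetch path built on this would skip neighbors whose guest mapping is still clean, which avoids the guest-visible inconsistency Priya raised at the cost of fewer prefetched write grants.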