Hi, Robert,

I found another confusing code snippet:
in void xi_invl_mfn(struct domain *d, unsigned long mfn)
        if (ext && pfn < ext->large_page_aligned_size)

According to the code, it should be
        if (ext && (pfn >> SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)

If I made any mistake, please point it out.
_______________________________________________________
Best Regards,
hanzhu

Robert S. Phillips wrote:
> Hi Han,
> Yes, it is a bug, but of limited impact. The effect is that we will fail
> to keep the 'accessed' flag up-to-date, but only for L3 guest page table
> entries in 64-bit guests. Still, I'm very pleased that you spotted it.
> -- rsp
>
> Robert S. Phillips wrote:
>> Hi Han,
>> I think you're right. I'll get back to you soon after we review the
>> ramifications.
>> Thanks!
>> -- rsp
>> p.s. Out of curiosity ... where do you work?
>>
>> han wrote:
>>> Hi Robert,
>>> Thanks a lot for your thorough explanation.
>>> However, in dt_init_action I notice this code snippet:
>>> + int _32pae_l3 = c_nr_levels <= 3 && c_level == 3;
>>> It seems c_nr_levels is always <= 3, isn't it?
>>> I'm afraid the code means that c_nr_levels == 3.
>>>
>>> I suppose c_nr_levels should be 3 if the guest is 32-bit (PAE or
>>> not), and 4 if the guest is 64-bit. That is why I asked you about
>>> the code for uncombine_nr_levels.
>>>
>>> _______________________________________________________
>>> Best Regards,
>>> hanzhu
>>>
>>> Robert S. Phillips wrote:
>>>> Hello Han,
>>>>
>>>> The puzzling function, uncombine_nr_levels(), should be reviewed
>>>> together with the previous two functions, combine_levels() and
>>>> uncombine_level(), shown below. The problem I'm attempting to solve
>>>> is to minimize the size of the decision table. It is currently 4096
>>>> entries.
>>>>
>>>> The decision function depends on "level" (that is, the level of the
>>>> current shadow page table) and, to a lesser extent, on "nr_levels"
>>>> (that is, the number of guest paging levels). It really only cares
>>>> whether nr_levels is 3 or not, which is why uncombine_nr_levels()
>>>> returns 3 or not 3.
>>>>
>>>> Normally the two values can each have three values (2, 3, or 4),
>>>> making 9 combinations. I squeeze them down to 4 combinations, which
>>>> cuts the decision table size by more than half.
>>>>
>>>> The function combine_levels() is used to encode nr_levels and
>>>> level into a number from 0 to 3.
>>>>
>>>> The functions uncombine_nr_levels() and uncombine_level() do the
>>>> inverse, sort of:
>>>> uncombine_level() returns the level (1, 2, 3, or 4),
>>>> and uncombine_nr_levels() returns 3 if the number of guest paging
>>>> levels was 3, 0 otherwise. This is all dt_init_action() needs to
>>>> initialize the decision table. It only tests whether c_nr_levels is 3.
>>>>
>>>> I hope this explanation helps.
>>>>
>>>> -- rsp
>>>>
>>>> /* Combine nr_levels and level into 4 combinations by
>>>>  * taking advantage of the fact that the rules for PTE levels
>>>>  * 3 and 4 are identical,
>>>>  * except that the PDPT (ie nr_levels == 3, level == 3)
>>>>  * is slightly irregular. */
>>>> static inline int combine_levels(int c_nr_levels, int c_level)
>>>> {
>>>>     switch (c_level) {
>>>>     case 1: return 0;
>>>>     case 2: return 1;
>>>>     case 3: return c_nr_levels == 3
>>>>                 ? 2  /* PDPT */
>>>>                 : 3; /* treat like level 4 */
>>>>     default: /* the compiler needs this */
>>>>     case 4: return 3;
>>>>     }
>>>> }
>>>>
>>>> static inline int uncombine_level(int combo)
>>>> {
>>>>     return combo + 1;
>>>> }
>>>>
>>>> static inline int uncombine_nr_levels(int combo)
>>>> {
>>>>     return combo == 2 ? 3 : 0;
>>>> }
>>>>
>>>> Ben Thomas wrote:
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism
>>>>> Date: Thu, 29 Jun 2006 20:25:13 +0800
>>>>> From: han <vanbas.han@gmail.com>
>>>>> To: Ben Thomas <bthomas@virtualiron.com>
>>>>> References: <44A055FA.1010405@virtualiron.com>
>>>>>
>>>>> Hi, Ben,
>>>>> I'm learning about your patch for the rewritten shadow pagetable code.
>>>>> However, I feel confused about part of your code:
>>>>> +static inline int uncombine_nr_levels(int combo)
>>>>> +{
>>>>> +    return combo == 2 ? 3 : 0;
>>>>> +}
>>>>> Shouldn't it be like the snippet below:
>>>>> +static inline int uncombine_nr_levels(int combo)
>>>>> +{
>>>>> +    return combo <= 2 ? 3 : 4;
>>>>> +}
>>>>>
>>>>> _______________________________________________________
>>>>> Best Regards,
>>>>> hanzhu
>>>>>
>>>>> Ben Thomas wrote:
>>>>>> A post last week contained the design document for a shadow page
>>>>>> table mechanism. This post contains more information, and a pointer
>>>>>> to the code.
>>>>>>
>>>>>> As a recap, the "XI Shadow Mechanism" is a design for shadow
>>>>>> page table code for fully virtualized HVM domains running on a
>>>>>> 64-bit Xen hypervisor.
>>>>>>
>>>>>> This work was undertaken to address a number of goals. These are
>>>>>> enumerated in the document posted last week and include:
>>>>>>
>>>>>> - ability to run fully virtualized 32, 32PAE and 64 bit guest
>>>>>>   domains concurrently on a 64-bit hypervisor
>>>>>>
>>>>>> - start support of live migration of fully virtualized domains
>>>>>>
>>>>>> - provide good performance and robustness
>>>>>>
>>>>>> - limit size and scope of change to Xen hypervisor
>>>>>>
>>>>>> This design has been coded, and tested on a variety of operating
>>>>>> systems. I've included this weekend's test report.
>>>>>>
>>>>>> Of particular interest from the following table are the Average Time
>>>>>> comparisons (smaller is better):
>>>>>>
>>>>>>   pfault    10.56 vs  2.43 (XI)
>>>>>>   sync_all  39.72 vs 33.51 (XI)
>>>>>>   invl_pg   22.75 vs  4.00 (XI)
>>>>>>
>>>>>> The patch is too large for the mailing list, and may be obtained from
>>>>>>
>>>>>> http://s3.amazonaws.com/VIDownloads/xi-20060626.patch
>>>>>>
>>>>>> The patch applies against changeset 10352 (and possibly others).
>>>>>>
>>>>>> We look forward to any comments and suggestions.
>>>>>>
>>>>>> Thanks,
>>>>>> -b
>>>>>>
>>>>>>
>>>>>> XI Shadow Test Report
>>>>>>
>>>>>> This report shows a comparison of test runs done on xen-unstable
>>>>>> changeset 10352 with and without the XI patch applied. It also
>>>>>> compares pfault, sync_all, and invl_pg performance.
>>>>>>
>>>>>> With the XI patch, 32-bit SMP (PAE) guests now boot and pass the
>>>>>> regression tests, closing Bug #661. With the XI patch, 64-bit SMP
>>>>>> guests are much more stable and are no longer experiencing SEGVs
>>>>>> and GPFs, closing Bug #678.
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>> | XEN      | Guest Kernel (SMP kernels booted with 2 CPUs)             |
>>>>>> | Changeset|-----------------------------------------------------------|
>>>>>> |          | 32bit Non-XI | 32bit XI     | 64bit Non-XI | 64bit XI     |
>>>>>> |          |--------------|--------------|--------------|--------------|
>>>>>> |          | Boot | Test  | Boot | Test  | Boot | Test  | Boot | Test  |
>>>>>> |----------|------|-------|------|-------|------|-------|------|-------|
>>>>>> | 10352    | Fail |       | Pass | 852/1 | Pass | 852/0 | Pass | 852/0 |
>>>>>> |          | (1)  |       |      | (2)   |      |(3,4,5)|      | (3,4) |
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>> 1. BUG 661: 32bit SMP guest crashes on boot:
>>>>>>    "(XEN) __hvm_bug at vmx.c:2289"
>>>>>> 2. BUG 666: 32bit UP guest fails ltp gettimeofday02:
>>>>>>    "Time is going backwards"
>>>>>> 3. BUG 663: 64bit SMP guest reports these messages while running:
>>>>>>    "(XEN) spurious IRQ irq got=-1"
>>>>>> 4. BUG 662: 64bit SMP guest reports this message in ltp pth_str01/2/3:
>>>>>>    "(XEN) <vlapic_accept_irq>level trig mode repeatedly for vector 252"
>>>>>> 5. BUG 678: 64bit 2-CPU SMP guest failed several ltp tests with
>>>>>>    SEGVs or GPFs:
>>>>>>    "sshd[23654] general protection rip:2a9555d426 rsp:7fbfffd000 error 0"
>>>>>>
>>>>>> The XI patch can be enabled or disabled at build time and includes
>>>>>> some performance statistics that can be examined whether or not it's
>>>>>> enabled. This allows us to compare the performance of XI vs the
>>>>>> existing shadow code. You access the statistics from the XEN console
>>>>>> (^a^a^a). Typing 'Y' clears the counters and 'y' displays the
>>>>>> counters since the last clear.
>>>>>> Here is a comparison of XI vs Non-XI shadow code done on xen-unstable
>>>>>> changeset 10352. The guest is a RHEL4U2 64-bit 2-CPU SMP 256MB guest
>>>>>> running a usex -b24 load for 5 minutes. Shadow statistics were
>>>>>> cleared at the start of the run.
>>>>>>
>>>>>> (XEN) Non-XI shadow performance statistics for the last 303 seconds
>>>>>> (XEN) Op type     <2usec    <3usec    <4usec    <8usec   <16usec   <32usec   <64usec  >=64usec  Total ops  Avg usec  In Op  Locked  Total
>>>>>> (XEN) pfault     6750685   1128922    624988   3249437   3833201   2863229   1224908    315992   19991362     10.56      0       0      0
>>>>>> (XEN)              33.7%      5.6%      3.1%     16.2%     19.1%     14.3%      6.1%      1.5%     211175
>>>>>> (XEN) sync_all      8176       380       333    159011    167619    305470    814598    298411    1753998     39.72      0       0      0
>>>>>> (XEN)               0.4%      0.0%      0.0%      9.0%      9.5%     17.4%     46.4%     17.0%      69670
>>>>>> (XEN) invl_pg        219        17       114     23333     40087     21904     24622      4639     114935     22.75      0       0      0
>>>>>> (XEN)               0.1%      0.0%      0.0%     20.3%     34.8%     19.0%     21.4%      4.0%       2615
>>>>>>
>>>>>> (XEN) XI shadow performance statistics for the last 302 seconds
>>>>>> (XEN) Op type     <2usec    <3usec    <4usec    <8usec   <16usec   <32usec   <64usec  >=64usec  Total ops  Avg usec  In Op  Locked  Total
>>>>>> (XEN) pfault    23078414   4694139    485499    613419    396088   1427265     11976      8228   30715028      2.43      0       0      0
>>>>>> (XEN)              75.1%     15.2%      1.5%      1.9%      1.2%      4.6%      0.0%      0.0%      74690
>>>>>> (XEN) sync_all      1734        97       169    305433    484991    174300    150199    130405    1247328     33.51      0       0      0
>>>>>> (XEN)               0.1%      0.0%      0.0%     24.4%     38.8%     13.9%     12.0%     10.4%      41810
>>>>>> (XEN) invl_pg      10169    302977    125270     44927      5385     17603       776      1656     508763      4.00      0       0      0
>>>>>> (XEN)               1.9%     59.5%     24.6%      8.8%      1.0%      3.4%      0.1%      0.3%       2037

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Robert Phillips
2006-Jun-30 12:24 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi Han,

You are right again. I have not seen this cause a problem, but we'll
certainly fix it.
-- rsp

On 6/29/06, zhu <vanbas.han@gmail.com> wrote:
>
> Hi, Robert,
> I found another confusing code snippet:
> in void xi_invl_mfn(struct domain *d, unsigned long mfn)
>         if (ext && pfn < ext->large_page_aligned_size)
>
> According to the code, it should be
>         if (ext && (pfn >> SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
>
> If I made any mistake, please point it out.
> _______________________________________________________
> Best Regards,
> hanzhu
>

--
--------------------------------------------------------------------
Robert S. Phillips                      Virtual Iron Software
rphillips@virtualiron.com               Tower 1, Floor 2
978-849-1220                            900 Chelmsford Street
                                        Lowell, MA 01851
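The xi_invl_mfn fix that han proposes shifts the pfn before the bounds comparison. A minimal sketch of the corrected check follows; SPT_ENTRIES_ORDER == 9 (a 512-entry shadow table covering 2 MB large pages) is chosen here for illustration, and the sketch assumes, as the corrected test implies, that large_page_aligned_size counts large pages — neither detail is confirmed by the thread:

```c
#include <stdbool.h>

/* Illustrative value: a 64-bit shadow page table holds 512 entries, so a
 * large (2 MB) page spans 2^9 small (4 KB) frames. The patch defines its
 * own constant. */
#define SPT_ENTRIES_ORDER 9

/* Corrected bounds check from the thread: a pfn is in range if the large
 * page containing it lies below large_page_aligned_size (in large pages). */
static bool pfn_in_range(unsigned long pfn,
                         unsigned long large_page_aligned_size)
{
    return (pfn >> SPT_ENTRIES_ORDER) < large_page_aligned_size;
}
```

With 256 MB of guest RAM (65536 small frames, i.e. 128 large pages), the last valid pfn 65535 passes the shifted test, while the original unshifted comparison `pfn < 128` would wrongly reject every pfn from 128 upward, which matches han's observation that the code as posted could not be right.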
Robert Phillips
2006-Jun-30 13:18 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Yes, that's true. If the caller supplied a bogus PFN (one larger than
ram size), we'd crash.

BTW, we are avoiding use of grant table transfers because they fragment
PSE pages.
-- rsp

On 6/30/06, zhu <vanbas.han@gmail.com> wrote:
>
> I'm afraid it could trigger certain problems only when you enable the l2
> PSE flag and use the grant table to transfer pages between domains. I'm
> not very sure about it. :-)
>
> _______________________________________________________
> Best Regards,
> hanzhu
>
> Robert Phillips wrote:
> > Hi Han,
> > You are right again. I have not seen this cause a problem but we'll
> > certainly fix it.
> > -- rsp
> >
> > On 6/29/06, zhu <vanbas.han@gmail.com> wrote:
> >>
> >> Hi, Robert,
> >> I found another confusing code snippet:
> >> in void xi_invl_mfn(struct domain *d, unsigned long mfn)
> >>         if (ext && pfn < ext->large_page_aligned_size)
> >>
> >> According to the code, it should be
> >>         if (ext && (pfn >> SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
> >>
> >> If I made any mistake, please point it out.
> >> _______________________________________________________
> >> Best Regards,
> >> hanzhu
> >>
> >
>

--
--------------------------------------------------------------------
Robert S. Phillips                      Virtual Iron Software
rphillips@virtualiron.com               Tower 1, Floor 2
978-849-1220                            900 Chelmsford Street
                                        Lowell, MA 01851
Steve Ofsthun
2006-Jun-30 13:33 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Robert Phillips wrote:
> Yes, that's true. If the caller supplied a bogus PFN (one larger than
> ram size), we'd crash.
> BTW, we are avoiding use of grant table transfers because they fragment
> PSE pages.
> -- rsp

We also avoid grant table transfers due to incompatibilities with QEMU's
one-time map of all of guest memory. If you use grant table transfers
(via netfront/netback), the QEMU memory map quickly becomes out of date.
Any QEMU I/O (net or disk) will then result in stale guest data being
used.

> On 6/30/06, zhu <vanbas.han@gmail.com> wrote:
>
>     I'm afraid it could trigger certain problems only when you enable
>     the l2 PSE flag and use the grant table to transfer pages between
>     domains. I'm not very sure about it. :-)
>
>     _______________________________________________________
>     Best Regards,
>     hanzhu
>
>     Robert Phillips wrote:
>     > Hi Han,
>     > You are right again. I have not seen this cause a problem but
>     > we'll certainly fix it.
>     > -- rsp
>     >
>     > On 6/29/06, zhu <vanbas.han@gmail.com> wrote:
>     >>
>     >> Hi, Robert,
>     >> I found another confusing code snippet:
>     >> in void xi_invl_mfn(struct domain *d, unsigned long mfn)
>     >>         if (ext && pfn < ext->large_page_aligned_size)
>     >>
>     >> According to the code, it should be
>     >>         if (ext && (pfn >> SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
>     >>
>     >> If I made any mistake, please point it out.
>     >> _______________________________________________________
>     >> Best Regards,
>     >> hanzhu
>
> --
> --------------------------------------------------------------------
> Robert S. Phillips                      Virtual Iron Software
> rphillips@virtualiron.com               Tower 1, Floor 2
> 978-849-1220                            900 Chelmsford Street
>                                         Lowell, MA 01851

--
Steve Ofsthun - Virtual Iron Software, Inc.
zhu
2006-Jun-30 13:47 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi, Robert,

There is another minor problem in dt_init_action:

+ if (_32pae_l3 || c_curr_accessed)
+     action |= a_present;
+ /* else make present when accessed */
+ if (c_curr_rw)
+     action |= a_rw;

According to your comments, the following snippet may be more correct:

+ if (_32pae_l3 || c_curr_accessed)
+     action |= a_present;
+ /* else make present when accessed */
+ if (c_curr_rw && !_32pae_l3)
+     action |= a_rw;

The doc said we should keep R/W zero for a pae_l3_pte in legacy mode.
If a_rw is returned, resync_spte will set _PAGE_RW for the corresponding
spte.

_______________________________________________________
Best Regards,
hanzhu

Robert Phillips wrote:
> Yes, that's true. If the caller supplied a bogus PFN (one larger than
> ram size), we'd crash.
> BTW, we are avoiding use of grant table transfers because they fragment
> PSE pages.
> -- rsp
>
> On 6/30/06, zhu <vanbas.han@gmail.com> wrote:
>>
>> I'm afraid it could trigger certain problems only when you enable the l2
>> PSE flag and use the grant table to transfer pages between domains. I'm
>> not very sure about it. :-)
>>
>> _______________________________________________________
>> Best Regards,
>> hanzhu
>>
>> Robert Phillips wrote:
>> > Hi Han,
>> > You are right again. I have not seen this cause a problem but we'll
>> > certainly fix it.
>> > -- rsp
>> >
>> > On 6/29/06, zhu <vanbas.han@gmail.com> wrote:
>> >>
>> >> Hi, Robert,
>> >> I found another confusing code snippet:
>> >> in void xi_invl_mfn(struct domain *d, unsigned long mfn)
>> >>         if (ext && pfn < ext->large_page_aligned_size)
>> >>
>> >> According to the code, it should be
>> >>         if (ext && (pfn >> SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
>> >>
>> >> If I made any mistake, please point it out.
>> >> _______________________________________________________
>> >> Best Regards,
>> >> hanzhu
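han's proposed correction to dt_init_action can be captured as a freestanding sketch. The a_present and a_rw bit values below are invented for illustration (the real ones live in the XI patch); the control flow mirrors the corrected snippet, keeping R/W clear for a 32-bit PAE L3 (PDPT) entry, since legacy-mode PAE treats the R/W bit of a PDPTE as reserved:

```c
/* Hypothetical action bits for illustration; the real values are
 * defined in the XI patch, not reproduced in the thread. */
#define a_present (1 << 0)
#define a_rw      (1 << 1)

/* Sketch of the corrected fragment of dt_init_action(): a 32-bit PAE L3
 * (PDPT) entry is made present immediately but never marked writable. */
static int init_action(int _32pae_l3, int c_curr_accessed, int c_curr_rw)
{
    int action = 0;

    if (_32pae_l3 || c_curr_accessed)
        action |= a_present;
    /* else make present when accessed */
    if (c_curr_rw && !_32pae_l3)
        action |= a_rw;

    return action;
}
```

Under this sketch, a writable PDPT entry (`init_action(1, 0, 1)`) yields only a_present, so resync_spte would no longer set _PAGE_RW on the shadow PDPTE, which is the behavior han argues the documentation requires.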
zhu
2006-Jul-01 09:22 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi,

After taking some time to dig into your patch for the XI shadow page
table, I have to say it's really a good design and implementation IMHO,
especially the parts about the clear hierarchy for each smfn, the
decision table, and how 32-bit non-PAE guests are supported in a rather
elegant way. However, I have several questions to discuss with you. :-)

1) It seems XI shadow pgt reserves all of the possible resources at an
early stage for the HVM domain (the first time the asi is created). It
could be quite proper to reserve the smfns and sptis. However, do we
really need to reserve one snapshot page for each smfn up front and
retain it until the HVM domain is destroyed? I guess a large number of
gpts will not be modified frequently after they are fully set up. IMHO,
it would be better to manage these snapshot pages dynamically. Of
course, this would change the basic logic of the code; e.g. you would
have to sync the shadow pgt when invoking spti_make_shadow instead of
leaving it out of sync, and you could not set up the entire low-level
shadow pgt when invoking resync_spte, since that could cost a lot of
time.

2) The GP back link plays a very important role in XI shadow pgt.
However, it also causes high memory pressure for the domain (2 pages
for each smfn). For normal guest pages, as opposed to GPT pages, I
guess its usage is limited: only xi_invl_mfn, divide_large_page, and
dirty logging refer to the back link for these normal guest pages.
Would it be reasonable to implement the back link only for the GPT
pages? Of course, this would increase the complexity of the code a
little.

3) Can you show us statistics comparing the current shadow pgt and XI
pgt for some critical operations, such as shadow_resync_all,
gva_to_gpa, shadow_fault, and so on? I'm really curious about them.

I have to say I'm not very familiar with the current shadow pgt
implementation, so I could have missed some important considerations in
posting these questions. Please point them out.

Thanks for sharing your idea and code with us. :-)

_______________________________________________________
Best Regards,
hanzhu
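For readers weighing question 2, a toy model of a back link may help. Everything below — the names, the layout, the fixed-size table — is invented for illustration; the thread does not show the patch's actual structures:

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative back-link record: for a guest mfn, remember each shadow
 * page (smfn) and entry index that references it, so an operation like
 * xi_invl_mfn or divide_large_page can find every shadow entry to
 * update without scanning all shadow tables. */
struct backlink {
    unsigned long smfn;    /* shadow page frame holding the reference */
    unsigned int  index;   /* entry index within that shadow page */
    struct backlink *next;
};

#define MAX_MFNS 64        /* toy guest size, for illustration only */
static struct backlink *links[MAX_MFNS];

/* Record that entry idx of shadow page smfn references guest frame gmfn. */
static void link_add(unsigned long gmfn, unsigned long smfn, unsigned int idx)
{
    struct backlink *bl = malloc(sizeof(*bl));
    bl->smfn = smfn;
    bl->index = idx;
    bl->next = links[gmfn];
    links[gmfn] = bl;
}

/* Count how many shadow entries would need updating for this frame. */
static int link_count(unsigned long gmfn)
{
    int n = 0;
    for (struct backlink *bl = links[gmfn]; bl; bl = bl->next)
        n++;
    return n;
}
```

han's proposal would amount to calling link_add only for frames known to hold guest page tables, with the three operations named above falling back to a slower scan for ordinary data pages — trading some code complexity for the two bookkeeping pages per smfn.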