Hi, Robert,
I found another confusing code snippet:
in void xi_invl_mfn(struct domain *d, unsigned long mfn)
if (ext && pfn < ext->large_page_aligned_size)
According to the code, it should be
if (ext && (pfn>>SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
If I made any mistake, please point it out.
_______________________________________________________
Best Regards,
hanzhu
Robert S. Phillips wrote:
> Hi Han,
> Yes it is a bug but of limited impact. The effect is that we will fail
> to keep the 'accessed' flag up-to-date but only for L3 guest page table
> entries in 64-bit guests. Still I'm very pleased that you spotted it.
> -- rsp
>
> Robert S. Phillips wrote:
>> Hi Han,
>> I think you're right. I'll get back to you soon after we review the
>> ramifications.
>> Thanks!
>> -- rsp
>> p.s. Out of curiosity ... where do you work?
>>
>> han wrote:
>>> Hi Robert,
>>> Thanks a lot for your thorough explanation.
>>> However, in dt_init_action I notice this code snippet:
>>> + int _32pae_l3 = c_nr_levels <= 3 && c_level == 3;
>>> Seems c_nr_levels is always <= 3, isn't it?
>>> I'm afraid the code means that c_nr_levels == 3.
>>>
>>> I suppose c_nr_levels should be 3 if the guest is 32-bit (PAE or
>>> not), and 4 if the guest is 64-bit. That is why I asked you about
>>> the code for uncombine_nr_levels.
>>>
>>>
>>> _______________________________________________________
>>> Best Regards,
>>> hanzhu
>>>
>>>
>>> Robert S. Phillips wrote:
>>>> Hello Han,
>>>>
>>>> The puzzling function, uncombine_nr_levels(), should be reviewed
>>>> together with the previous two functions, combine_levels() and
>>>> uncombine_level(), shown below. The problem I'm attempting to solve
>>>> is to minimize the size of the decision table. It is currently 4096
>>>> entries.
>>>>
>>>> The decision function depends on "level" (that is, the level of the
>>>> current shadow page table) and, to a lesser extent, on "nr_levels"
>>>> (that is, the number of guest paging levels). It really only cares
>>>> if nr_levels is 3 or not, which is why uncombine_nr_levels() returns
>>>> 3 or not 3.
>>>>
>>>> Normally each of the two values can have three values (2, 3, or 4),
>>>> making 9 combinations.
>>>> I squeeze them down to 4 combinations, which cuts the decision table
>>>> size by more than half.
>>>>
>>>> The function combine_levels() is used to encode nr_levels and
>>>> level into a number from 0 to 3.
>>>>
>>>> The functions uncombine_nr_levels() and uncombine_level() do the
>>>> inverse, sort of:
>>>> uncombine_level() returns the number of levels (1, 2, 3, or 4),
>>>> and uncombine_nr_levels() returns 3 if the number of guest paging
>>>> levels was 3, 0 otherwise. This is all dt_init_action() needs to
>>>> initialize the decision table. It only tests if c_nr_levels is 3.
>>>>
>>>> I hope this explanation helps.
>>>>
>>>> -- rsp
>>>>
>>>> /* Combine nr_levels and level into 4 combinations by
>>>> * taking advantage of the fact that the rules for PTE levels
>>>> * 3 and 4 are identical,
>>>> * except that the PDPT (ie nr_levels == 3, level == 3)
>>>> * is slightly irregular. */
>>>> static inline int combine_levels(int c_nr_levels, int c_level)
>>>> {
>>>>     switch (c_level) {
>>>>     case 1: return 0;
>>>>     case 2: return 1;
>>>>     case 3: return c_nr_levels == 3
>>>>                 ? 2  /* PDPT */
>>>>                 : 3; /* treat like level 4 */
>>>>     default: /* the compiler needs this */
>>>>     case 4: return 3;
>>>>     }
>>>> }
>>>>
>>>> static inline int uncombine_level(int combo)
>>>> {
>>>>     return combo + 1;
>>>> }
>>>>
>>>> static inline int uncombine_nr_levels(int combo)
>>>> {
>>>>     return combo == 2 ? 3 : 0;
>>>> }
>>>>
>>>> Ben Thomas wrote:
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table
>>>>> Mechanism
>>>>> Date: Thu, 29 Jun 2006 20:25:13 +0800
>>>>> From: han <vanbas.han@gmail.com>
>>>>> To: Ben Thomas <bthomas@virtualiron.com>
>>>>> References: <44A055FA.1010405@virtualiron.com>
>>>>>
>>>>> Hi, Ben
>>>>> I'm learning about your patch for the rewritten shadow pagetable
>>>>> code.
>>>>> However, I feel confused about some part of your code:
>>>>> +static inline int uncombine_nr_levels(int combo)
>>>>> +{
>>>>> + return combo == 2 ? 3 : 0;
>>>>> +}
>>>>> shouldn't it be like the snippet below:
>>>>> +static inline int uncombine_nr_levels(int combo)
>>>>> +{
>>>>> + return combo <= 2 ? 3 : 4;
>>>>> +}
>>>>>
>>>>> _______________________________________________________
>>>>> Best Regards,
>>>>> hanzhu
>>>>>
>>>>>
>>>>> Ben Thomas 写道:
>>>>>> A post last week contained the design document for a shadow page
>>>>>> table mechanism. This post contains more information, and a
>>>>>> pointer to the code.
>>>>>>
>>>>>> As a recap, the "XI Shadow Mechanism" is a design for shadow page
>>>>>> table code for fully virtualized HVM domains running on a 64-bit
>>>>>> Xen hypervisor.
>>>>>>
>>>>>> This work was undertaken to address a number of goals. These are
>>>>>> enumerated in the document posted last week and include:
>>>>>>
>>>>>> - ability to run fully virtualized 32, 32PAE and 64 bit guest
>>>>>> domains concurrently on a 64-bit hypervisor
>>>>>>
>>>>>> - start support of live migration of fully virtualized domains
>>>>>>
>>>>>> - provide good performance and robustness
>>>>>>
>>>>>> - limit size and scope of change to Xen hypervisor
>>>>>>
>>>>>> This design has been coded, and tested on a variety of operating
>>>>>> systems. I've included this weekend's test report.
>>>>>>
>>>>>> Of particular interest from the following table are the Average
>>>>>> Time comparisons (smaller is better):
>>>>>>
>>>>>> pfault 10.56 vs 2.43 (XI)
>>>>>> sync_all 39.72 vs 33.51 (XI)
>>>>>> invl_pg 22.75 vs 4.00 (XI)
>>>>>>
>>>>>> The patch is too large for the mailing list, and may be obtained
>>>>>> from
>>>>>>
>>>>>> http://s3.amazonaws.com/VIDownloads/xi-20060626.patch
>>>>>>
>>>>>> The patch applies against changeset 10352 (and possibly others)
>>>>>>
>>>>>> We look forward to any comments and suggestions.
>>>>>>
>>>>>> Thanks,
>>>>>> -b
>>>>>>
>>>>>>
>>>>>>
>>>>>> XI Shadow Test Report
>>>>>>
>>>>>> This report shows a comparison of test runs done on xen-unstable
>>>>>> changeset 10352 with and without the XI patch applied. It also
>>>>>> compares pfault, sync_all, and invl_pg performance.
>>>>>>
>>>>>> With the XI patch 32bit SMP (PAE) guests now boot and pass the
>>>>>> regression tests, closing Bug #661. With the XI patch 64bit SMP
>>>>>> guests are much more stable and are no longer experiencing SEGVs
>>>>>> and GPFs, closing Bug #678.
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> | XEN      | Guest Kernel SMP kernels booted with 2 CPUs              |
>>>>>> | Changeset|-----------------------------------------------------------|
>>>>>> |          | 32bit Non-XI | 32bit XI     | 64bit Non-XI | 64bit XI     |
>>>>>> |          |--------------|--------------|--------------|--------------|
>>>>>> |          | Boot | Test  | Boot | Test  | Boot | Test  | Boot | Test  |
>>>>>> |----------|------|-------|------|-------|------|-------|------|-------|
>>>>>> | 10352    | Fail |       | Pass | 852/1 | Pass | 852/0 | Pass | 852/0 |
>>>>>> |          | (1)  |       |      | (2)   |      |(3,4,5)|      | (3,4) |
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> 1. BUG 661: 32bit SMP guest crashes on boot:
>>>>>> "(XEN) __hvm_bug at vmx.c:2289"
>>>>>> 2. BUG 666: 32bit UP guest fail ltp gettimeofday02:
>>>>>> "Time is going backwards"
>>>>>> 3. BUG 663: 64bit SMP guest reports these messages while running:
>>>>>>    "(XEN) spurious IRQ irq got=-1"
>>>>>> 4. BUG 662: 64bit SMP guest reports this message in ltp
>>>>>>    pth_str01/2/3:
>>>>>>    "(XEN) <vlapic_accept_irq>level trig mode repeatedly for
>>>>>>    vector 252"
>>>>>> 5. BUG 678: 64bit 2CPU SMP guest failed several ltp tests with
>>>>>>    SEGVs or GPFs:
>>>>>>    "sshd[23654] general protection rip:2a9555d426 rsp:7fbfffd000
>>>>>>    error 0"
>>>>>>
>>>>>> The XI patch can be enabled or disabled at build time and
>>>>>> includes some performance statistics that can be examined whether
>>>>>> or not it's enabled. This allows us to compare the performance of
>>>>>> XI vs the existing shadow code.
>>>>>> You access the statistics from the XEN console (^a^a^a). Typing a
>>>>>> 'Y' clears the counters and a 'y' displays the counters since the
>>>>>> last clear.
>>>>>> Here is a comparison of XI vs Non-XI shadow code done on
>>>>>> xen-unstable changeset 10352. The guest is a RHEL4U2-64bit 2 CPU
>>>>>> SMP 256MB guest running a usex -b24 load for 5 minutes. Shadow
>>>>>> statistics were cleared at start of run.
>>>>>>
>>>>>> (XEN) Non-XI shadow performance statistics for the last 303 seconds
>>>>>> (XEN) Op type          # of Operations                                                Total    Avg   Tick Profile
>>>>>> (XEN)           <2usec  <3usec  <4usec  <8usec <16usec <32usec <64usec >=64usec    ops/msec   usec  In Op Locked Total
>>>>>> (XEN) pfault   6750685 1128922  624988 3249437 3833201 2863229 1224908   315992    19991362  10.56      0      0     0
>>>>>> (XEN)            33.7%    5.6%    3.1%   16.2%   19.1%   14.3%    6.1%     1.5%      211175
>>>>>> (XEN) sync_all    8176     380     333  159011  167619  305470  814598   298411     1753998  39.72      0      0     0
>>>>>> (XEN)             0.4%    0.0%    0.0%    9.0%    9.5%   17.4%   46.4%    17.0%       69670
>>>>>> (XEN) invl_pg      219      17     114   23333   40087   21904   24622     4639      114935  22.75      0      0     0
>>>>>> (XEN)             0.1%    0.0%    0.0%   20.3%   34.8%   19.0%   21.4%     4.0%        2615
>>>>>>
>>>>>> (XEN) XI shadow performance statistics for the last 302 seconds
>>>>>> (XEN) Op type          # of Operations                                                Total    Avg   Tick Profile
>>>>>> (XEN)           <2usec  <3usec  <4usec  <8usec <16usec <32usec <64usec >=64usec    ops/msec   usec  In Op Locked Total
>>>>>> (XEN) pfault  23078414 4694139  485499  613419  396088 1427265   11976     8228    30715028   2.43      0      0     0
>>>>>> (XEN)            75.1%   15.2%    1.5%    1.9%    1.2%    4.6%    0.0%     0.0%       74690
>>>>>> (XEN) sync_all    1734      97     169  305433  484991  174300  150199   130405     1247328  33.51      0      0     0
>>>>>> (XEN)             0.1%    0.0%    0.0%   24.4%   38.8%   13.9%   12.0%    10.4%       41810
>>>>>> (XEN) invl_pg    10169  302977  125270   44927    5385   17603     776     1656      508763   4.00      0      0     0
>>>>>> (XEN)             1.9%   59.5%   24.6%    8.8%    1.0%    3.4%    0.1%     0.3%        2037
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Robert Phillips
2006-Jun-30 12:24 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi Han,
You are right again. I have not seen this cause a problem but we'll
certainly fix it.
-- rsp

On 6/29/06, zhu <vanbas.han@gmail.com> wrote:
>
> Hi, Robert,
> I found out another confusing code snippet:
> in void xi_invl_mfn(struct domain *d, unsigned long mfn)
> if (ext && pfn < ext->large_page_aligned_size)
>
> According to the code, it should be
> if (ext && (pfn>>SPT_ENTRIES_ORDER) < ext->large_page_aligned_size)
>
> If I made any mistake, please point it out.
> _______________________________________________________
> Best Regards,
> hanzhu

--
--------------------------------------------------------------------
Robert S. Phillips           Virtual Iron Software
rphillips@virtualiron.com    Tower 1, Floor 2
978-849-1220                 900 Chelmsford Street
                             Lowell, MA 01851
Robert Phillips
2006-Jun-30 13:18 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Yes, that's true. If the caller supplied a bogus PFN (one larger than
ram size), we'd crash.
BTW, we are avoiding use of grant table transfers because they fragment
PSE pages.
-- rsp

On 6/30/06, zhu <vanbas.han@gmail.com> wrote:
>
> I'm afraid it could trigger certain problems only when you open the l2
> PSE flag and use the grant table to transfer pages between domains. I'm
> not very sure about it. :-)
>
> _______________________________________________________
> Best Regards,
> hanzhu
>
> Robert Phillips wrote:
> > Hi Han,
> > You are right again. I have not seen this cause a problem but we'll
> > certainly fix it.
> > -- rsp

--
--------------------------------------------------------------------
Robert S. Phillips           Virtual Iron Software
rphillips@virtualiron.com    Tower 1, Floor 2
978-849-1220                 900 Chelmsford Street
                             Lowell, MA 01851
Steve Ofsthun
2006-Jun-30 13:33 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Robert Phillips wrote:
> Yes, that's true. If the caller supplied a bogus PFN (one larger than
> ram size), we'd crash.
> BTW, we are avoiding use of grant table transfers because they
> fragment PSE pages.
> -- rsp

We also avoid grant table transfers due to incompatibilities with QEMU's
one-time map of all of guest memory. If you use grant table transfers
(via netfront/netback), the QEMU memory map quickly becomes out of date.
Any QEMU I/O (net or disk) will result in stale guest data being used.

--
Steve Ofsthun - Virtual Iron Software, Inc.
zhu
2006-Jun-30 13:47 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi, Robert,
There is another minor problem, in dt_init_action:

+ if (_32pae_l3 || c_curr_accessed)
+     action |= a_present;
+ /* else make present when accessed */
+ if (c_curr_rw)
+     action |= a_rw;

According to your comments, the following snippet may be more correct:

+ if (_32pae_l3 || c_curr_accessed)
+     action |= a_present;
+ /* else make present when accessed */
+ if (c_curr_rw && !_32pae_l3)
+     action |= a_rw;

The doc said we should keep R/W zero for the pae_l3_pte in legacy mode.
If a_rw is returned, resync_spte will set _PAGE_RW for the corresponding
spte.

_______________________________________________________
Best Regards,
hanzhu

Robert Phillips wrote:
> Yes, that's true. If the caller supplied a bogus PFN (one larger than
> ram size), we'd crash.
> BTW, we are avoiding use of grant table transfers because they
> fragment PSE pages.
> -- rsp
zhu
2006-Jul-01 09:22 UTC
Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table Mechanism]
Hi,
After taking some time to dig into your patch about the XI shadow page
table, I have to say it's really a good design and implementation IMHO,
especially the parts about the clear hierarchy for each smfn, the
decision table, and how to support 32nopae in a rather elegant way.
However, I have several questions to discuss with you. :-)

1) It seems the XI shadow pgt reserves all of the possible resources at
an early stage for the HVM domain (the first time the asi is created).
It could be quite proper to reserve the smfns and sptis. However, do we
really need to reserve one snapshot page for each smfn at first and
retain it until the HVM domain is destroyed? I guess a large number of
gpts will not be modified frequently after they are fully set up. IMHO,
it would be better to manage these snapshot pages dynamically. Of
course, this will change the basic logic of the code, e.g. you have to
sync the shadow pgt when invoking spti_make_shadow instead of leaving it
out of sync, and you can't set up the entire low-level shadow pgt when
invoking resync_spte since it could cost a lot of time.

2) The GP back link plays a very important role in the XI shadow pgt.
However, it will also cause high memory pressure for the domain (2 pages
for each smfn). For normal guest pages, as opposed to GPT pages, I guess
its usage is limited. Only when invoking xi_invld_mfn,
divide_large_page, or dirty logging will we refer to the back link for
these normal guest pages. Is it reasonable to implement the back link
only for the GPT pages? Of course, this will increase the complexity of
the code a little.

3) Can you show us the statistics comparing the current shadow pgt and
the XI pgt for some critical operations, such as shadow_resync_all,
gva_to_gpa, shadow_fault and so on? I'm really curious about it.

I have to say I'm not very familiar with the current shadow pgt
implementation, so I could miss some important considerations when I
post these questions. Please point them out.
Thanks for sharing your idea and code with us. :-)

_______________________________________________________
Best Regards,
hanzhu