Hi Keir,

I noticed that changeset 15289 introduced locking to the platform timers, and you mentioned that it is only there for correctness. Are there potential issues that are fixed by this patch? If not, I wonder why we need those locks. I think it should be the OS's responsibility to serialise these accesses, not the hypervisor's. Am I right? I don't know whether all of those locks are necessary, but at least the lock for vhpet, and especially the lock on the read path, is not required.

best regards
yang
On 17/04/2012 04:26, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

> Hi Keir
>
> I noticed that changeset 15289 introduced locking to the platform timers, and
> you mentioned that it is only there for correctness. Are there potential
> issues that are fixed by this patch? If not, I wonder why we need those locks?

Yes, issues were fixed by the patch. That's why I bothered to implement it. However, I think the observed issues were with protecting the mechanisms in vpt.c, and the other locking may, at least partially, be overly cautious.

> I think it should be the OS's responsibility to serialise these accesses,
> not the hypervisor's. Am I right?

It depends. Where an access is an apparently-atomic memory-mapped access, but is implemented as a sequence of operations in the hypervisor, the hypervisor might need to maintain atomicity through locking.

> I don't know whether all of those locks are necessary, but at least the lock
> for vhpet, and especially the lock on the read path, is not required.

This is definitely not true; for example, locking is required around calls to create_periodic_time(), to serialise them. So in general the locking I added, even in vhpet.c, is required. If you have a specific hot path you are looking to optimise, and especially if you have numbers to back that up, then we can consider specific localised optimisations to avoid locking where we can reason it is not needed.

 -- Keir

> best regards
> yang
> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
>
> > Hi Keir
> >
> > I noticed that changeset 15289 introduced locking to the platform
> > timers, and you mentioned that it is only there for correctness. Are
> > there potential issues that are fixed by this patch? If not, I wonder why
> > we need those locks?
>
> Yes, issues were fixed by the patch. That's why I bothered to implement it.
> However, I think the observed issues were with protecting the mechanisms in
> vpt.c, and the other locking may, at least partially, be overly cautious.
>
> > I think it should be the OS's responsibility to serialise these
> > accesses, not the hypervisor's. Am I right?
>
> It depends. Where an access is an apparently-atomic memory-mapped access,
> but is implemented as a sequence of operations in the hypervisor, the
> hypervisor might need to maintain atomicity through locking.

But if there is already a lock inside the guest for those accesses, and there is no other code path (like a timer callback function) in the hypervisor that accesses the shared data, then we don't need to use a lock in the hypervisor.

> > I don't know whether all of those locks are necessary, but at least the
> > lock for vhpet, and especially the lock on the read path, is not required.
>
> This is definitely not true; for example, locking is required around calls to
> create_periodic_time(), to serialise them. So in general the locking I added,
> even in vhpet.c, is required. If you have a specific hot path you are looking
> to optimise, and especially if you have numbers to back that up, then we can
> consider specific localised optimisations to avoid locking where we can reason
> it is not needed.

As I mentioned, if the guest can ensure that only one CPU accesses the HPET at a time, then the access itself is already serialised. Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due to the big lock taken when reading the HPET. The xentrace data also shows lots of vmexits, mainly from PAUSE instructions on the other CPUs. So if the guest already uses a lock to protect HPET access, why should the hypervisor do the same thing again?

yang
On 18/04/2012 01:52, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>> It depends. Where an access is an apparently-atomic memory-mapped access,
>> but is implemented as a sequence of operations in the hypervisor, the
>> hypervisor might need to maintain atomicity through locking.
>
> But if there is already a lock inside the guest for those accesses, and there
> is no other code path (like a timer callback function) in the hypervisor that
> accesses the shared data, then we don't need to use a lock in the hypervisor.

If there is a memory-mapped register access which is atomic on bare metal, the guest may not bother with locking. We have to maintain that apparent atomicity ourselves by implementing serialisation in the hypervisor.

>>> I don't know whether all of those locks are necessary, but at least the
>>> lock for vhpet, and especially the lock on the read path, is not required.
>>
>> This is definitely not true; for example, locking is required around calls to
>> create_periodic_time(), to serialise them. So in general the locking I added,
>> even in vhpet.c, is required. If you have a specific hot path you are looking
>> to optimise, and especially if you have numbers to back that up, then we can
>> consider specific localised optimisations to avoid locking where we can
>> reason it is not needed.
>
> As I mentioned, if the guest can ensure that only one CPU accesses the HPET
> at a time, then the access itself is already serialised.
> Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due to the
> big lock taken when reading the HPET. The xentrace data also shows lots of
> vmexits, mainly from PAUSE instructions on the other CPUs. So if the guest
> already uses a lock to protect HPET access, why should the hypervisor do the
> same thing again?

If the HPET accesses are atomic on bare metal, we have to maintain that, even if some guests have extra locking themselves. Also, in some cases Xen needs locking to correctly maintain its own internal state regardless of what an (untrusted) guest might do. So we cannot just get rid of the vhpet lock everywhere. It's definitely not correct to do so. Optimising the hpet read path, however, sounds okay. I agree the lock may not be needed on that specific path.

 -- Keir
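To make the atomicity point concrete, a minimal, illustrative sketch of why one guest register write can need a lock once it is emulated in software. The device, field names and the derived-state computation are all hypothetical, not the actual vhpet code:

    /* Illustrative device state: one guest register write must update two
     * fields that have to stay consistent with each other. */
    struct vdev {
        spinlock_t lock;
        uint64_t   config;
        uint64_t   derived;   /* recomputed whenever config changes */
    };

    static void vdev_config_write(struct vdev *v, uint64_t val)
    {
        /* One atomic register write on bare metal becomes several statements
         * here, so without the lock another vcpu's concurrent emulated access
         * could see config and derived out of sync. */
        spin_lock(&v->lock);
        v->config  = val;
        v->derived = val & ~1ULL;   /* stand-in for the real derived state */
        spin_unlock(&v->lock);
    }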
> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Wednesday, April 18, 2012 3:13 PM
> To: Zhang, Yang Z; xen-devel@lists.xensource.com
> Subject: Re: lock in vhpet
>
> On 18/04/2012 01:52, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>
> >> It depends. Where an access is an apparently-atomic memory-mapped
> >> access, but is implemented as a sequence of operations in the
> >> hypervisor, the hypervisor might need to maintain atomicity through locking.
> >
> > But if there is already a lock inside the guest for those accesses, and
> > there is no other code path (like a timer callback function) in the
> > hypervisor that accesses the shared data, then we don't need to use a lock
> > in the hypervisor.
>
> If there is a memory-mapped register access which is atomic on bare metal,
> the guest may not bother with locking. We have to maintain that apparent
> atomicity ourselves by implementing serialisation in the hypervisor.
>
> >>> I don't know whether all of those locks are necessary, but at least the
> >>> lock for vhpet, and especially the lock on the read path, is not required.
> >>
> >> This is definitely not true; for example, locking is required around
> >> calls to create_periodic_time(), to serialise them. So in general the
> >> locking I added, even in vhpet.c, is required. If you have a specific
> >> hot path you are looking to optimise, and especially if you have
> >> numbers to back that up, then we can consider specific localised
> >> optimisations to avoid locking where we can reason it is not needed.
> >
> > As I mentioned, if the guest can ensure that only one CPU accesses the HPET
> > at a time, then the access itself is already serialised.
> > Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due
> > to the big lock taken when reading the HPET. The xentrace data also shows
> > lots of vmexits, mainly from PAUSE instructions on the other CPUs. So if
> > the guest already uses a lock to protect HPET access, why should the
> > hypervisor do the same thing again?
>
> If the HPET accesses are atomic on bare metal, we have to maintain that, even
> if some guests have extra locking themselves. Also, in some cases Xen needs
> locking to correctly maintain its own internal state regardless of what an
> (untrusted) guest might do. So we cannot just get rid of the vhpet lock
> everywhere. It's definitely not correct to do so. Optimising the hpet read
> path, however, sounds okay. I agree the lock may not be needed on that
> specific path.

You are right. For this case, since the main access to the HPET is to read the main counter, I think an rwlock is a better choice.

yang
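For illustration, the overall shape of the rwlock approach suggested here, as a minimal sketch. This is not Keir's attached patch; the struct layout and helper bodies are assumptions made for the example, using Xen's read_lock()/write_lock() primitives:

    struct vhpet {
        rwlock_t lock;                 /* previously a plain spinlock */
        uint64_t mc64;                 /* main counter (layout assumed) */
        uint64_t regs[8];              /* other registers (layout assumed) */
    };

    static uint64_t vhpet_read_counter(struct vhpet *h)
    {
        uint64_t val;

        read_lock(&h->lock);           /* many vcpus may read concurrently */
        val = h->mc64;                 /* stand-in for the real counter read */
        read_unlock(&h->lock);

        return val;
    }

    static void vhpet_write_reg(struct vhpet *h, unsigned int reg, uint64_t val)
    {
        write_lock(&h->lock);          /* exclusive: serialises against readers */
        h->regs[reg] = val;
        /* ... recompute timers, call create_periodic_time(), etc. ... */
        write_unlock(&h->lock);
    }

With this shape, reads of the main counter only contend with register writes, not with each other, which is the common case while the guest is booting.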
On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>> If the HPET accesses are atomic on bare metal, we have to maintain that, even
>> if some guests have extra locking themselves. Also, in some cases Xen needs
>> locking to correctly maintain its own internal state regardless of what an
>> (untrusted) guest might do. So we cannot just get rid of the vhpet lock
>> everywhere. It's definitely not correct to do so. Optimising the hpet read
>> path, however, sounds okay. I agree the lock may not be needed on that
>> specific path.
>
> You are right. For this case, since the main access to the HPET is to read
> the main counter, I think an rwlock is a better choice.

I'll see if I can make a patch...

 -- Keir
On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:

> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>
>>> If the HPET accesses are atomic on bare metal, we have to maintain that,
>>> even if some guests have extra locking themselves. Also, in some cases Xen
>>> needs locking to correctly maintain its own internal state regardless of
>>> what an (untrusted) guest might do. So we cannot just get rid of the vhpet
>>> lock everywhere. It's definitely not correct to do so. Optimising the hpet
>>> read path, however, sounds okay. I agree the lock may not be needed on that
>>> specific path.
>>
>> You are right. For this case, since the main access to the HPET is to read
>> the main counter, I think an rwlock is a better choice.
>
> I'll see if I can make a patch...

Please try the attached patch (build tested only).

 -- Keir
On 18/04/2012 10:14, "Keir Fraser" <keir@xen.org> wrote:

> On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:
>
>> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>
>>>> If the HPET accesses are atomic on bare metal, we have to maintain that,
>>>> even if some guests have extra locking themselves. Also, in some cases Xen
>>>> needs locking to correctly maintain its own internal state regardless of
>>>> what an (untrusted) guest might do. So we cannot just get rid of the vhpet
>>>> lock everywhere. It's definitely not correct to do so. Optimising the hpet
>>>> read path, however, sounds okay. I agree the lock may not be needed on
>>>> that specific path.
>>>
>>> You are right. For this case, since the main access to the HPET is to read
>>> the main counter, I think an rwlock is a better choice.
>>
>> I'll see if I can make a patch...
>
> Please try the attached patch (build tested only).

Actually try this updated one. :-)

 -- Keir
There is no problem with this patch; it works well. But it cannot fix the Win8 issue. It seems there are some other issues with the HPET. I will look into it. Thanks for your quick patch.

best regards
yang

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
> Sent: Wednesday, April 18, 2012 5:31 PM
> To: Zhang, Yang Z; xen-devel@lists.xensource.com
> Subject: Re: lock in vhpet
>
> On 18/04/2012 10:14, "Keir Fraser" <keir@xen.org> wrote:
>
> > On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:
> >
> >> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> >>
> >>>> If the HPET accesses are atomic on bare metal, we have to maintain
> >>>> that, even if some guests have extra locking themselves. Also, in
> >>>> some cases Xen needs locking to correctly maintain its own internal
> >>>> state regardless of what an (untrusted) guest might do. So we cannot
> >>>> just get rid of the vhpet lock everywhere. It's definitely not correct
> >>>> to do so. Optimising the hpet read path, however, sounds okay. I agree
> >>>> the lock may not be needed on that specific path.
> >>>
> >>> You are right. For this case, since the main access to the HPET is to
> >>> read the main counter, I think an rwlock is a better choice.
> >>
> >> I'll see if I can make a patch...
> >
> > Please try the attached patch (build tested only).
>
> Actually try this updated one. :-)
>
> -- Keir
At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
> There is no problem with this patch; it works well. But it cannot fix the
> Win8 issue. It seems there are some other issues with the HPET. I will look
> into it. Thanks for your quick patch.

The lock in hvm_get_guest_time() will still be serializing the hpet reads. But since it needs to update a shared variable, that will need to haul cachelines around anyway.

Tim.
On 19/04/2012 09:27, "Tim Deegan" <tim@xen.org> wrote:

> At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
>> There is no problem with this patch; it works well. But it cannot fix the
>> Win8 issue. It seems there are some other issues with the HPET. I will look
>> into it. Thanks for your quick patch.
>
> The lock in hvm_get_guest_time() will still be serializing the hpet reads.
> But since it needs to update a shared variable, that will need to haul
> cachelines around anyway.

Yes, that's true. You could try the attached hacky patch out of interest, to see what that lock is costing you in your scenario. But if we want consistent monotonically-increasing guest time, I suspect we can't get rid of the lock, so that's going to limit our scalability unavoidably. Shame.

 -- Keir

> Tim.
The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling data over 10 seconds:

(XEN) p2m_lock 1 lock:
(XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)

This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs spent a total of 30 seconds blocked during the 10 seconds of profiling, which means about 18% of CPU cycles are spent waiting for the p2m lock, and that is for an idle guest. The impact is more serious when running some workload inside the guest.

I noticed that this change was added by cs 24770; before it, we did not take the p2m lock in __get_gfn_type_access. So is this lock really necessary?

best regards
yang

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
> Sent: Thursday, April 19, 2012 4:47 PM
> To: Tim Deegan; Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> On 19/04/2012 09:27, "Tim Deegan" <tim@xen.org> wrote:
>
> > At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
> >> There is no problem with this patch; it works well. But it cannot fix the
> >> Win8 issue. It seems there are some other issues with the HPET. I will
> >> look into it. Thanks for your quick patch.
> >
> > The lock in hvm_get_guest_time() will still be serializing the hpet reads.
> > But since it needs to update a shared variable, that will need to haul
> > cachelines around anyway.
>
> Yes, that's true. You could try the attached hacky patch out of interest, to
> see what that lock is costing you in your scenario. But if we want consistent
> monotonically-increasing guest time, I suspect we can't get rid of the lock,
> so that's going to limit our scalability unavoidably. Shame.
>
> -- Keir
>
> > Tim.
>>> On 23.04.12 at 09:36, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling
> data over 10 seconds:
>
> (XEN) p2m_lock 1 lock:
> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>
> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
> spent a total of 30 seconds blocked during the 10 seconds of profiling, which
> means about 18% of CPU cycles are spent waiting for the p2m lock, and that is
> for an idle guest. The impact is more serious when running some workload
> inside the guest.
> I noticed that this change was added by cs 24770; before it, we did not take
> the p2m lock in __get_gfn_type_access. So is this lock really necessary?

Or shouldn't such a lock, frequently taken on a read path, be an rwlock instead?

Jan
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, April 23, 2012 3:43 PM
> To: Zhang, Yang Z; Keir Fraser; Tim Deegan
> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> >>> On 23.04.12 at 09:36, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > The p2m lock in __get_gfn_type_access() is the culprit. Here is the
> > profiling data over 10 seconds:
> >
> > (XEN) p2m_lock 1 lock:
> > (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
> >
> > This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
> > VCPUs spent a total of 30 seconds blocked during the 10 seconds of
> > profiling, which means about 18% of CPU cycles are spent waiting for the
> > p2m lock, and that is for an idle guest. The impact is more serious when
> > running some workload inside the guest.
> > I noticed that this change was added by cs 24770; before it, we did not
> > take the p2m lock in __get_gfn_type_access. So is this lock really
> > necessary?
>
> Or shouldn't such a lock, frequently taken on a read path, be an rwlock
> instead?

Right. Using an rwlock would make more sense.

best regards
yang
On 23/04/2012 09:15, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
>>> VCPUs spent a total of 30 seconds blocked during the 10 seconds of
>>> profiling, which means about 18% of CPU cycles are spent waiting for the
>>> p2m lock, and that is for an idle guest. The impact is more serious when
>>> running some workload inside the guest.
>>> I noticed that this change was added by cs 24770; before it, we did not
>>> take the p2m lock in __get_gfn_type_access. So is this lock really
>>> necessary?
>>
>> Or shouldn't such a lock, frequently taken on a read path, be an rwlock
>> instead?
>>
> Right. Using an rwlock would make more sense.

Interested to see if it would improve performance. My guess would be no.

 -- Keir
At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
> The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling
> data over 10 seconds:
>
> (XEN) p2m_lock 1 lock:
> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>
> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
> spent a total of 30 seconds blocked during the 10 seconds of profiling, which
> means about 18% of CPU cycles are spent waiting for the p2m lock, and that is
> for an idle guest. The impact is more serious when running some workload
> inside the guest. I noticed that this change was added by cs 24770; before
> it, we did not take the p2m lock in __get_gfn_type_access. So is this lock
> really necessary?

Ugh; that certainly is a regression. We used to be lock-free on p2m lookups and losing that will be bad for perf in lots of ways. IIRC the original aim was to use fine-grained per-page locks for this -- there should be no need to hold a per-domain lock during a normal read. Andres, what happened to that code?

Making it an rwlock would be tricky as this interface doesn't differentiate readers from writers. For the common case (no sharing/paging/mem-access) it ought to be a win since there is hardly any writing. But it would be better to make this particular lock/unlock go away.

Tim.
> At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
>> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
>> profiling data over 10 seconds:
>>
>> (XEN) p2m_lock 1 lock:
>> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>>
>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
>> spent a total of 30 seconds blocked during the 10 seconds of profiling,
>> which means about 18% of CPU cycles are spent waiting for the p2m lock, and
>> that is for an idle guest. The impact is more serious when running some
>> workload inside the guest. I noticed that this change was added by cs 24770;
>> before it, we did not take the p2m lock in __get_gfn_type_access. So is this
>> lock really necessary?
>
> Ugh; that certainly is a regression. We used to be lock-free on p2m
> lookups and losing that will be bad for perf in lots of ways. IIRC the
> original aim was to use fine-grained per-page locks for this -- there
> should be no need to hold a per-domain lock during a normal read.
> Andres, what happened to that code?

The fine-grained p2m locking code is stashed somewhere and untested. Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus are hitting the same gfn for the hpet mmio address.

Thanks for the report, Yang; it would be great to understand at which call sites the p2m lock is contended, so we can try to alleviate contention at the right place. Looking closely at the code, it would seem the only get_gfn in hvmemul_do_io is called on ram_gpa == 0 (?!). Shouldn't ram_gpa underlie the target mmio address for emulation?

Notwithstanding the above, we've optimized p2m access in hvmemul_do_io to have as brief a critical section as possible: get_gfn, get_page, put_gfn, work, put_page later. So contention might arise from (bogusly) doing get_gfn(0) on every vcpu (when it should instead be get_gfn(hpet_gfn)). The only way to alleviate that contention is to use get_gfn_query_unlocked, and that will be unsafe against paging or sharing. I am very skeptical this is causing the slowdown you see, since you're not using paging or sharing. The critical section protected by the p2m lock is literally one cmpxchg.

The other source of contention might come from hvmemul_rep_movs, which holds the p2m lock for the duration of the mmio operation. I can optimize that one using the get_gfn/get_page/put_gfn pattern mentioned above.

The third source of contention might be the __hvm_copy to/from the target address of the hpet value read/written. That one can be slightly optimized by doing the memcpy outside of the scope of the p2m lock.

> Making it an rwlock would be tricky as this interface doesn't
> differentiate readers from writers. For the common case (no
> sharing/paging/mem-access) it ought to be a win since there is hardly
> any writing. But it would be better to make this particular lock/unlock
> go away.

We had a discussion about rwlocks way back when. It's impossible (or nearly so) to tell when an access might upgrade to write privileges. Deadlock has a high likelihood.

Andres
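For reference, the brief-critical-section pattern described above looks roughly like this sketch, modelled on the shape of the hvmemul_do_io code; error handling is abbreviated and the surrounding variables (d, gfn) are assumed to be in scope:

    struct page_info *page;
    p2m_type_t t;
    mfn_t mfn = get_gfn_unshare(d, gfn, &t);    /* takes the gfn/p2m lock */

    if ( !mfn_valid(mfn_x(mfn)) )
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }
    page = mfn_to_page(mfn_x(mfn));
    if ( !get_page(page, d) )                   /* pin the page with a refcount */
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }
    put_gfn(d, gfn);                            /* drop the p2m lock early */

    /* ... the long-running part of the emulation runs here with only
     * the page reference held, not the p2m lock ... */

    put_page(page);                             /* release the reference */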
> At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
>> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
>> profiling data over 10 seconds:
>>
>> (XEN) p2m_lock 1 lock:
>> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>>
>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
>> spent a total of 30 seconds blocked during the 10 seconds of profiling,
>> which means about 18% of CPU cycles are spent waiting for the p2m lock, and
>> that is for an idle guest. The impact is more serious when running some
>> workload inside the guest. I noticed that this change was added by cs 24770;
>> before it, we did not take the p2m lock in __get_gfn_type_access. So is this
>> lock really necessary?
>
> Ugh; that certainly is a regression. We used to be lock-free on p2m
> lookups and losing that will be bad for perf in lots of ways. IIRC the
> original aim was to use fine-grained per-page locks for this -- there
> should be no need to hold a per-domain lock during a normal read.
> Andres, what happened to that code?

Sigh, scratch a lot of nonsense I just spewed about the "hpet gfn". There is no actual page backing that one, so no concerns there.

It is still the case that ram_gpa is zero in many cases going into hvmemul_do_io, and there really is no point in doing a get_page(get_gfn(0)). How about the following (there's more email after this patch):

# HG changeset patch
# Parent ccc64793187f7d9c926343a1cd4ae822a4364bd1
x86/emulation: No need to get_gfn on zero ram_gpa.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r ccc64793187f xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,33 +60,37 @@ static int hvmemul_do_io(
     ioreq_t *p = get_ioreq(curr);
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
-    mfn_t ram_mfn;
+    mfn_t ram_mfn = _mfn(INVALID_MFN);
     int rc;
 
-    /* Check for paged out page */
-    ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
-    if ( p2m_is_paging(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-
-    /* Maintain a ref on the mfn to ensure liveness. Put the gfn
-     * to avoid potential deadlock wrt event channel lock, later. */
-    if ( mfn_valid(mfn_x(ram_mfn)) )
-        if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
-             curr->domain) )
+    /* Many callers pass a stub zero ram_gpa address. */
+    if ( ram_gfn != 0 )
+    {
+        /* Check for paged out page */
+        ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
+        if ( p2m_is_paging(p2mt) )
         {
-            put_gfn(curr->domain, ram_gfn);
+            put_gfn(curr->domain, ram_gfn);
+            p2m_mem_paging_populate(curr->domain, ram_gfn);
             return X86EMUL_RETRY;
         }
-    put_gfn(curr->domain, ram_gfn);
+        if ( p2m_is_shared(p2mt) )
+        {
+            put_gfn(curr->domain, ram_gfn);
+            return X86EMUL_RETRY;
+        }
+
+        /* Maintain a ref on the mfn to ensure liveness. Put the gfn
+         * to avoid potential deadlock wrt event channel lock, later. */
+        if ( mfn_valid(mfn_x(ram_mfn)) )
+            if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
+                 curr->domain) )
+            {
+                put_gfn(curr->domain, ram_gfn);
+                return X86EMUL_RETRY;
+            }
+        put_gfn(curr->domain, ram_gfn);
+    }
 
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes

If contention is coming in from the emul_rep_movs path, then that might be taken care of with the following:

# HG changeset patch
# Parent 18b694840961cb6e3934628b23902a7979f00bac
x86/emulate: Relieve contention of p2m lock in emulation of rep movs.
get_two_gfns is used to query the source and target physical addresses of the emulated rep movs. It is not necessary to hold onto the two gfn's for the duration of the emulation: further calls down the line will do the appropriate unsharing or paging in, and unwind correctly on failure.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 18b694840961 xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -714,25 +714,23 @@ static int hvmemul_rep_movs(
     if ( rc != X86EMUL_OKAY )
         return rc;
 
+    /* The query on the gfns is to establish the need for mmio. In the two mmio
+     * cases, a proper get_gfn for the "other" gfn will be enacted, with paging
+     * in or unsharing if necessary. In the memmove case, the gfn might change
+     * given the bytes mov'ed, and, again, proper get_gfn's will be enacted in
+     * __hvm_copy. */
     get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL,
                  current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL,
                  P2M_ALLOC, &tg);
-
+    put_two_gfns(&tg);
+
     if ( !p2m_is_ram(sp2mt) && !p2m_is_grant(sp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     if ( !p2m_is_ram(dp2mt) && !p2m_is_grant(dp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -747,10 +745,7 @@ static int hvmemul_rep_movs(
      * can be emulated by a source-to-buffer-to-destination block copy.
      */
     if ( ((dgpa + bytes_per_rep) > sgpa) && (dgpa < (sgpa + bytes)) )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /* Adjust destination address for reverse copy. */
     if ( df )
@@ -759,10 +754,7 @@ static int hvmemul_rep_movs(
     /* Allocate temporary buffer. Fall back to slow emulation if this fails. */
     buf = xmalloc_bytes(bytes);
     if ( buf == NULL )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /*
      * We do a modicum of checking here, just for paranoia's sake and to
@@ -773,7 +765,6 @@ static int hvmemul_rep_movs(
 
     rc = hvm_copy_to_guest_phys(dgpa, buf, bytes);
     xfree(buf);
-    put_two_gfns(&tg);
 
     if ( rc == HVMCOPY_gfn_paged_out )
         return X86EMUL_RETRY;

Let me know if any of this helps.

Thanks,
Andres

> Making it an rwlock would be tricky as this interface doesn't
> differentiate readers from writers. For the common case (no
> sharing/paging/mem-access) it ought to be a win since there is hardly
> any writing. But it would be better to make this particular lock/unlock
> go away.
>
> Tim.
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Tuesday, April 24, 2012 1:19 AM
>
> Let me know if any of this helps

No, it does not work.

best regards
yang
At 08:26 -0700 on 23 Apr (1335169568), Andres Lagar-Cavilla wrote:
> > At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
> >> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
> >> profiling data over 10 seconds:
> >>
> >> (XEN) p2m_lock 1 lock:
> >> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
> >>
> >> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
> >> VCPUs spent a total of 30 seconds blocked during the 10 seconds of
> >> profiling, which means about 18% of CPU cycles are spent waiting for the
> >> p2m lock, and that is for an idle guest. The impact is more serious when
> >> running some workload inside the guest. I noticed that this change was
> >> added by cs 24770; before it, we did not take the p2m lock in
> >> __get_gfn_type_access. So is this lock really necessary?
> >
> > Ugh; that certainly is a regression. We used to be lock-free on p2m
> > lookups and losing that will be bad for perf in lots of ways. IIRC the
> > original aim was to use fine-grained per-page locks for this -- there
> > should be no need to hold a per-domain lock during a normal read.
> > Andres, what happened to that code?
>
> The fine-grained p2m locking code is stashed somewhere and untested.
> Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus
> are hitting the same gfn for the hpet mmio address.

We'll have to do _something_ for 4.2 if it's introducing an 18% CPU overhead in an otherwise idle VM.

In any case I think this means I probably shouldn't take the patch that turns on this locking for shadow VMs. They do a lot more p2m lookups.

> The other source of contention might come from hvmemul_rep_movs, which
> holds the p2m lock for the duration of the mmio operation. I can optimize
> that one using the get_gfn/get_page/put_gfn pattern mentioned above.

But wouldn't that be unsafe? What if the p2m changes during the operation? Or, conversely, could you replace all uses of the lock in p2m lookups with get_page() on the result and still get what you need?

Tim.
At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
> > -----Original Message-----
> > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> > Sent: Tuesday, April 24, 2012 1:19 AM
> >
> > Let me know if any of this helps
> No, it does not work.

Do you mean that it doesn't help with the CPU overhead, or that it's broken in some other way?

Tim.
> At 08:26 -0700 on 23 Apr (1335169568), Andres Lagar-Cavilla wrote:
>> The fine-grained p2m locking code is stashed somewhere and untested.
>> Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus
>> are hitting the same gfn for the hpet mmio address.
>
> We'll have to do _something_ for 4.2 if it's introducing an 18% CPU
> overhead in an otherwise idle VM.

An hpet mmio read or write hits get_gfn twice: once for gfn zero at the beginning of hvmemul_do_io, the other during hvm copy. The patch I sent to Yang yesterday removes the bogus first get_gfn. The second one has to perform a locked query.

Yang, is there any possibility to get more information here? The ability to identify the call site that contends for the p2m lock would be crucial. Even something as crude as sampling vcpu stack traces by hitting 'd' repeatedly on the serial line, and seeing what sticks out frequently, would help.

> In any case I think this means I probably shouldn't take the patch that
> turns on this locking for shadow VMs. They do a lot more p2m lookups.
>
>> The other source of contention might come from hvmemul_rep_movs, which
>> holds the p2m lock for the duration of the mmio operation. I can optimize
>> that one using the get_gfn/get_page/put_gfn pattern mentioned above.
>
> But wouldn't that be unsafe? What if the p2m changes during the
> operation? Or, conversely, could you replace all uses of the lock in
> p2m lookups with get_page() on the result and still get what you need?

I've been thinking of wrapping the pattern into a handy p2m accessor. I see this pattern repeating itself for hvm_copy, tss/segment entry, page table walking, nested hvm, etc., in which the consumer wants to map the result of the translation. By the time you finish with a get_gfn_unshare, most possible p2m transformations will have been taken care of (PoD alloc, page-in, unshare). By taking a reference to the underlying page, paging out is prevented, and then the vcpu can safely let go of the p2m lock. Conceivably guest_physmap_remove_page and guest_physmap_add_entry can still come in and change the p2m entry.

Andres

> Tim.
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Tuesday, April 24, 2012 5:17 PM
> To: Zhang, Yang Z
> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com; Keir Fraser
> Subject: Re: [Xen-devel] lock in vhpet
>
> At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
> > > -----Original Message-----
> > > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> > > Sent: Tuesday, April 24, 2012 1:19 AM
> > >
> > > Let me know if any of this helps
> > No, it does not work.
>
> Do you mean that it doesn't help with the CPU overhead, or that it's broken
> in some other way?

It cannot help with the CPU overhead.

best regards
yang
>> -----Original Message-----
>> From: Tim Deegan [mailto:tim@xen.org]
>> Sent: Tuesday, April 24, 2012 5:17 PM
>> To: Zhang, Yang Z
>> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: Re: [Xen-devel] lock in vhpet
>>
>> At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
>> > > Let me know if any of this helps
>> > No, it does not work.
>>
>> Do you mean that it doesn't help with the CPU overhead, or that it's
>> broken in some other way?
>>
> It cannot help with the CPU overhead.

Yang, is there any further information you can provide? A rough idea of where vcpus are spending time spinning for the p2m lock would be tremendously useful.

Thanks
Andres
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 9:40 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> >> Do you mean that it doesn't help with the CPU overhead, or that it's
> >> broken in some other way?
> >>
> > It cannot help with the CPU overhead.
>
> Yang, is there any further information you can provide? A rough idea of
> where vcpus are spending time spinning for the p2m lock would be
> tremendously useful.

I am doing further investigation and hope to get more useful information.

But the first cs that introduced this issue is 24770. When Win8 boots with the HPET enabled, it uses the HPET as its time source, and there are lots of HPET accesses and EPT violations. In the EPT violation handler, it calls get_gfn_type_access to get the mfn. cs 24770 introduced the gfn_lock for p2m lookups, and then the issue appeared. After I removed the gfn_lock, the issue went away. But in the latest Xen, even if I remove this lock, it still shows high CPU utilization.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 9:40 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> Yang, is there any further information you can provide? A rough idea of
>> where vcpus are spending time spinning for the p2m lock would be
>> tremendously useful.
>>
> I am doing further investigation and hope to get more useful information.

Thanks, looking forward to that.

> But the first cs that introduced this issue is 24770. When Win8 boots with
> the HPET enabled, it uses the HPET as its time source, and there are lots of
> HPET accesses and EPT violations. In the EPT violation handler, it calls
> get_gfn_type_access to get the mfn. cs 24770 introduced the gfn_lock for p2m
> lookups, and then the issue appeared. After I removed the gfn_lock, the
> issue went away. But in the latest Xen, even if I remove this lock, it still
> shows high CPU utilization.

It would seem then that even the briefest lock-protected critical section would cause this? In the mmio case, the p2m lock taken in the hap fault handler is held during the actual lookup, and for a couple of branch instructions afterwards.

In the latest Xen, with the lock removed from get_gfn, on which lock is the time spent?

Thanks,
Andres

> yang
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 10:31 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> It would seem then that even the briefest lock-protected critical section
> would cause this? In the mmio case, the p2m lock taken in the hap fault
> handler is held during the actual lookup, and for a couple of branch
> instructions afterwards.
>
> In the latest Xen, with the lock removed from get_gfn, on which lock is the
> time spent?

Still the p2m_lock.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 10:31 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> In the latest Xen, with the lock removed from get_gfn, on which lock is the
>> time spent?
>
> Still the p2m_lock.

How are you removing the lock from get_gfn?

The p2m lock is taken on a few specific code paths outside of get_gfn (changing the type of an entry, adding a new p2m entry, setup and teardown), and I'm surprised any of those code paths is being used by the hpet mmio handler.

Andres

> yang
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 10:42 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> How are you removing the lock from get_gfn?
>
> The p2m lock is taken on a few specific code paths outside of get_gfn
> (changing the type of an entry, adding a new p2m entry, setup and teardown),
> and I'm surprised any of those code paths is being used by the hpet mmio
> handler.

Sorry, what I said may not have been accurate. In the latest Xen, I used a workaround to skip calling get_gfn_type_access in hvm_hap_nested_page_fault(), so the p2m_lock is still there. Now I have found that the contention on the p2m_lock is coming from __hvm_copy.
In the mmio handler, several code paths call it (hvm_fetch_from_guest_virt_nofault(), hvm_copy_from_guest_virt()). When there are lots of mmio accesses, the contention is very obvious.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 10:42 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> How are you removing the lock from get_gfn?
>>
>> The p2m lock is taken on a few specific code paths outside of get_gfn
>> (changing the type of an entry, adding a new p2m entry, setup and teardown),
>> and I'm surprised any of those code paths is being used by the hpet mmio
>> handler.
>
> Sorry, what I said may not have been accurate. In the latest Xen, I used a
> workaround to skip calling get_gfn_type_access in
> hvm_hap_nested_page_fault(), so the p2m_lock is still there. Now I have
> found that the contention on the p2m_lock is coming from __hvm_copy.
> In the mmio handler, several code paths call it
> (hvm_fetch_from_guest_virt_nofault(), hvm_copy_from_guest_virt()). When
> there are lots of mmio accesses, the contention is very obvious.

Thanks. Can you please try this:
http://lists.xen.org/archives/html/xen-devel/2012-04/msg01861.html
in conjunction with the patch below?

Andres

diff -r 7a7443e80b99 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2383,6 +2383,8 @@ static enum hvm_copy_result __hvm_copy(
 
     while ( todo > 0 )
     {
+        struct page_info *pg;
+
         count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
 
         if ( flags & HVMCOPY_virt )
@@ -2427,7 +2429,11 @@ static enum hvm_copy_result __hvm_copy(
             put_gfn(curr->domain, gfn);
             return HVMCOPY_bad_gfn_to_mfn;
         }
+
         ASSERT(mfn_valid(mfn));
+        pg = mfn_to_page(mfn);
+        ASSERT(get_page(pg, curr->domain));
+        put_gfn(curr->domain, gfn);
 
         p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK);
 
@@ -2457,7 +2463,7 @@ static enum hvm_copy_result __hvm_copy(
         addr += count;
         buf += count;
         todo -= count;
-        put_gfn(curr->domain, gfn);
+        put_page(pg);
     }
 
     return HVMCOPY_okay;
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 11:34 AM
>
> Thanks. Can you please try this:
> http://lists.xen.org/archives/html/xen-devel/2012-04/msg01861.html
> in conjunction with the patch below?

No, it doesn't work. On the contrary, the contention is even fiercer than before. Here is the p2m_lock contention over 10 seconds with 16 VCPUs:

lock:  560583(00000000:83362735), block:  321453(00000009:364CA49B)

yang
>>> On 25.04.12 at 05:34, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2383,6 +2383,8 @@ static enum hvm_copy_result __hvm_copy(
>  
>      while ( todo > 0 )
>      {
> +        struct page_info *pg;
> +
>          count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
>  
>          if ( flags & HVMCOPY_virt )
> @@ -2427,7 +2429,11 @@ static enum hvm_copy_result __hvm_copy(
>              put_gfn(curr->domain, gfn);
>              return HVMCOPY_bad_gfn_to_mfn;
>          }
> +
>          ASSERT(mfn_valid(mfn));
> +        pg = mfn_to_page(mfn);
> +        ASSERT(get_page(pg, curr->domain));

You really shouldn't ever put expressions with (side) effects inside an ASSERT(), not even for debugging patches that you want others to apply (you're of course free to shoot yourself in the foot ;-) ), as it just won't work in non-debug builds.

Jan

> +        put_gfn(curr->domain, gfn);
>  
>          p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK);
>  
> @@ -2457,7 +2463,7 @@ static enum hvm_copy_result __hvm_copy(
>          addr += count;
>          buf += count;
>          todo -= count;
> -        put_gfn(curr->domain, gfn);
> +        put_page(pg);
>      }
>  
>      return HVMCOPY_okay;
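A small sketch of the distinction Jan is pointing out: ASSERT() compiles to nothing in a non-debug (NDEBUG) build, so any call placed inside it simply never runs. The error handling shown for the safer form is just one plausible choice for this code path, not necessarily what the final patch should do:

    /* Problematic: ASSERT() expands to nothing when NDEBUG is set, so in a
     * release build get_page() is never called and no reference is taken. */
    ASSERT(get_page(pg, curr->domain));

    /* Safer: always perform the call, then act on the result. */
    if ( !get_page(pg, curr->domain) )
    {
        put_gfn(curr->domain, gfn);
        return HVMCOPY_bad_gfn_to_mfn;
    }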
At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote:
> > > But actually, the first cs introduced this issue is 24770. When win8
> > > booting and if hpet is enabled, it will use hpet as the time source
> > > and there have lots of hpet access and EPT violation. In EPT violation
> > > handler, it call get_gfn_type_access to get the mfn. The cs 24770
> > > introduces the gfn_lock for p2m lookups, and then the issue happens.
> > > After I removed the gfn_lock, the issue goes. But in latest xen, even
> > > I remove this lock, it still shows high cpu utilization.
> >
> > It would seem then that even the briefest lock-protected critical section would
> > cause this? In the mmio case, the p2m lock taken in the hap fault handler is
> > held during the actual lookup, and for a couple of branch instructions
> > afterwards.
> >
> > In latest Xen, with lock removed for get_gfn, on which lock is time spent?
> Still the p2m_lock.

Can you please try the attached patch? I think you'll need this one plus the ones that take the locks out of the hpet code.

This patch makes the p2m lock into an rwlock and adjusts a number of the paths that don't update the p2m so they only take the read lock. It's a bit rough, but I can boot a 16-way win7 guest with it.

N.B. Since rwlocks don't show up in the existing lock profiling, please don't try to use the lock-profiling numbers to see if it's helping!

Andres, this is basically the big-hammer version of your "take a pagecount" changes, plus the change you made to hvmemul_rep_movs(). If this works I intend to follow it up with a patch to make some of the read-modify-write paths avoid taking the lock (by using a compare-exchange operation so they only take the lock on a write). If that succeeds I might drop put_gfn() altogether.

But first it will need a lot of tidying up. Noticeably missing:
 - SVM code equivalents to the vmx.c changes
 - grant-table operations still use the lock, because frankly I
   could not follow the current code, and it's quite late in the evening.
I also have a long list of uglinesses in the mm code that I found while writing this lot.

Keir, I have no objection to later replacing this with something better than an rwlock. :) Or with making a NUMA-friendly rwlock implementation, since I really expect this to be heavily read-mostly when paging/sharing/pod are not enabled.

Cheers,

Tim.
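For anyone who wants the shape of the change without reading the whole patch: lookups share a read lock while p2m updates take the write lock exclusively, so the hot EPT-violation/MMIO lookup paths no longer serialise against each other. A minimal userspace sketch of that pattern follows, using POSIX rwlocks and an invented gfn->mfn table; the names are illustrative and are not Xen's p2m API.

#include <pthread.h>

#define NR_SLOTS 1024

static pthread_rwlock_t p2m_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long table[NR_SLOTS];             /* toy gfn -> mfn mapping */

/* Hot path (e.g. a lookup during an EPT violation): readers run concurrently. */
unsigned long lookup_entry(unsigned long gfn)
{
    unsigned long mfn;
    pthread_rwlock_rdlock(&p2m_rwlock);
    mfn = table[gfn % NR_SLOTS];
    pthread_rwlock_unlock(&p2m_rwlock);
    return mfn;
}

/* Cold path (an actual p2m update): the writer excludes everyone. */
void update_entry(unsigned long gfn, unsigned long mfn)
{
    pthread_rwlock_wrlock(&p2m_rwlock);
    table[gfn % NR_SLOTS] = mfn;
    pthread_rwlock_unlock(&p2m_rwlock);
}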
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Friday, April 27, 2012 5:26 AM
> To: Zhang, Yang Z
> Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote:
> > > > But actually, the first cs introduced this issue is 24770. When
> > > > win8 booting and if hpet is enabled, it will use hpet as the time
> > > > source and there have lots of hpet access and EPT violation. In
> > > > EPT violation handler, it call get_gfn_type_access to get the mfn.
> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then the issue happens.
> > > > After I removed the gfn_lock, the issue goes. But in latest xen,
> > > > even I remove this lock, it still shows high cpu utilization.
> > >
> > > It would seem then that even the briefest lock-protected critical
> > > section would cause this? In the mmio case, the p2m lock taken in
> > > the hap fault handler is held during the actual lookup, and for a
> > > couple of branch instructions afterwards.
> > >
> > > In latest Xen, with lock removed for get_gfn, on which lock is time spent?
> > Still the p2m_lock.
>
> Can you please try the attached patch? I think you'll need this one plus the
> ones that take the locks out of the hpet code.
>
> This patch makes the p2m lock into an rwlock and adjusts a number of the
> paths that don't update the p2m so they only take the read lock. It's a bit
> rough but I can boot 16-way win7 guest with it.

This is really great work! Now the win7 guest boots very fast and I never saw the BSOD again. But the changes in your patch are quite large, so I think we need more sanity testing to avoid any regressions. After you finish all the work, I'd like to do a full round of testing. :)

> N.B. Since rwlocks don't show up the the existing lock profiling, please don't try
> to use the lock-profiling numbers to see if it's helping!
>
> Andres, this is basically the big-hammer version of your "take a pagecount"
> changes, plus the change you made to hvmemul_rep_movs().
> If this works I intend to follow it up with a patch to make some of the
> read-modify-write paths avoid taking the lock (by using a compare-exchange
> operation so they only take the lock on a write). If that succeeds I might drop
> put_gfn() altogether.
>
> But first it will need a lot of tidying up. Noticeably missing:
>  - SVM code equivalents to the vmx.c changes
>  - grant-table operations still use the lock, because frankly I
>    could not follow the current code, and it's quite late in the evening.
> I also have a long list of uglinesses in the mm code that I found while writing
> this lot.
>
> Keir, I have no objection to later replacing this with something better than an
> rwlock. :) Or with making a NUMA-friendly rwlock implementation, since I
> really expect this to be heavily read-mostly when paging/sharing/pod are not
> enabled.
>
> Cheers,
>
> Tim.

best regards
yang
>> -----Original Message----- >> From: Tim Deegan [mailto:tim@xen.org] >> Sent: Friday, April 27, 2012 5:26 AM >> To: Zhang, Yang Z >> Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com >> Subject: Re: [Xen-devel] lock in vhpet >> >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: >> > > > But actually, the first cs introduced this issue is 24770. When >> > > > win8 booting and if hpet is enabled, it will use hpet as the time >> > > > source and there have lots of hpet access and EPT violation. In >> > > > EPT violation handler, it call get_gfn_type_access to get the mfn. >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then the >> issue >> happens. >> > > > After I removed the gfn_lock, the issue goes. But in latest xen, >> > > > even I remove this lock, it still shows high cpu utilization. >> > > >> > > It would seem then that even the briefest lock-protected critical >> > > section would cause this? In the mmio case, the p2m lock taken in >> > > the hap fault handler is held during the actual lookup, and for a >> > > couple of branch instructions afterwards. >> > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is time >> spent? >> > Still the p2m_lock. >> >> Can you please try the attached patch? I think you''ll need this one >> plus the >> ones that take the locks out of the hpet code. >> >> This patch makes the p2m lock into an rwlock and adjusts a number of the >> paths that don''t update the p2m so they only take the read lock. It''s a >> bit >> rough but I can boot 16-way win7 guest with it.That is great news. Tim, thanks for the amazing work. I''m poring over the patch presently, and will shoot at it with everything I''ve got testing-wise. I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as the changeset affects those subsystems. Andres> > This really a great work! Now, the win7 guest is booting very fast and > never saw the BSOD again. > But the changes are so large in your patch. I think we need to do more > sanity testing to avoid any regressions. After you finish all the work, > I''d like to do a whole testing.:) > >> N.B. Since rwlocks don''t show up the the existing lock profiling, please >> don''t try >> to use the lock-profiling numbers to see if it''s helping! >> >> Andres, this is basically the big-hammer version of your "take a >> pagecount" >> changes, plus the change you made to hvmemul_rep_movs(). >> If this works I intend to follow it up with a patch to make some of the >> read-modify-write paths avoid taking the lock (by using a >> compare-exchange >> operation so they only take the lock on a write). If that succeeds I >> might drop >> put_gfn() altogether. >> >> But first it will need a lot of tidying up. Noticeably missing: >> - SVM code equivalents to the vmx.c changes >> - grant-table operations still use the lock, because frankly I >> could not follow the current code, and it''s quite late in the >> evening. >> I also have a long list of uglinesses in the mm code that I found while >> writing >> this lot. >> >> Keir, I have no objection to later replacing this with something better >> than an >> rwlock. :) Or with making a NUMA-friendly rwlock implementation, since >> I >> really expect this to be heavily read-mostly when paging/sharing/pod are >> not >> enabled. >> >> Cheers, >> >> Tim. > > best regards > yang >
> -----Original Message----- > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > Sent: Friday, April 27, 2012 8:52 AM > To: Zhang, Yang Z > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; olaf@aepfle.de; > George.Dunlap@eu.citrix.com > Subject: RE: [Xen-devel] lock in vhpet > > >> -----Original Message----- > >> From: Tim Deegan [mailto:tim@xen.org] > >> Sent: Friday, April 27, 2012 5:26 AM > >> To: Zhang, Yang Z > >> Cc: andres@lagarcavilla.org; Keir Fraser; > >> xen-devel@lists.xensource.com > >> Subject: Re: [Xen-devel] lock in vhpet > >> > >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > >> > > > But actually, the first cs introduced this issue is 24770. When > >> > > > win8 booting and if hpet is enabled, it will use hpet as the > >> > > > time source and there have lots of hpet access and EPT > >> > > > violation. In EPT violation handler, it call get_gfn_type_access to get > the mfn. > >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then > >> > > > the > >> issue > >> happens. > >> > > > After I removed the gfn_lock, the issue goes. But in latest > >> > > > xen, even I remove this lock, it still shows high cpu utilization. > >> > > > >> > > It would seem then that even the briefest lock-protected critical > >> > > section would cause this? In the mmio case, the p2m lock taken in > >> > > the hap fault handler is held during the actual lookup, and for a > >> > > couple of branch instructions afterwards. > >> > > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is > >> > > time > >> spent? > >> > Still the p2m_lock. > >> > >> Can you please try the attached patch? I think you''ll need this one > >> plus the ones that take the locks out of the hpet code. > >> > >> This patch makes the p2m lock into an rwlock and adjusts a number of > >> the paths that don''t update the p2m so they only take the read lock. > >> It''s a bit rough but I can boot 16-way win7 guest with it. > > That is great news. > > Tim, thanks for the amazing work. I''m poring over the patch presently, and will > shoot at it with everything I''ve got testing-wise. > > I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as the > changeset affects those subsystems.But win8 guest shows BSOD with 32 VCPUs. :( The reason of BSOD is : SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (ACPI.sys) best regards yang
> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: >> > > But actually, the first cs introduced this issue is 24770. When win8 >> > > booting and if hpet is enabled, it will use hpet as the time source >> > > and there have lots of hpet access and EPT violation. In EPT >> violation >> > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 >> > > introduces the gfn_lock for p2m lookups, and then the issue happens. >> > > After I removed the gfn_lock, the issue goes. But in latest xen, >> even >> > > I remove this lock, it still shows high cpu utilization. >> > >> > It would seem then that even the briefest lock-protected critical >> section would >> > cause this? In the mmio case, the p2m lock taken in the hap fault >> handler is >> > held during the actual lookup, and for a couple of branch instructions >> > afterwards. >> > >> > In latest Xen, with lock removed for get_gfn, on which lock is time >> spent? >> Still the p2m_lock. > > Can you please try the attached patch? I think you''ll need this one > plus the ones that take the locks out of the hpet code.Right off the bat I''m getting a multitude of (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 And a hung dom0 during initramfs. I''m a little baffled as to why, but it''s there (32 bit dom0, XenServer6).> > This patch makes the p2m lock into an rwlock and adjusts a number of the > paths that don''t update the p2m so they only take the read lock. It''s a > bit rough but I can boot 16-way win7 guest with it. > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > don''t try to use the lock-profiling numbers to see if it''s helping! > > Andres, this is basically the big-hammer version of your "take a > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > If this works I intend to follow it up with a patch to make some of the > read-modify-write paths avoid taking the lock (by using a > compare-exchange operation so they only take the lock on a write). If > that succeeds I might drop put_gfn() altogether.You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn. But I guess those could lock the p2m up-front if they become the only consumers of put_gfn left.> > But first it will need a lot of tidying up. Noticeably missing: > - SVM code equivalents to the vmx.c changesload_pdptrs doesn''t exist on svm. There are a couple of debugging get_gfn_query that can be made unlocked. And svm_vmcb_restore needs similar treatment to what you did in vmx.c. The question is who will be able to test it...> - grant-table operations still use the lock, because frankly I > could not follow the current code, and it''s quite late in the evening.It''s pretty complex with serious nesting, and ifdef''s for arm and 32 bit. gfn_to_mfn_private callers will suffer from altering the current meaning, as put_gfn resolves to the right thing for the ifdef''ed arch. The other user is grant_transfer which also relies on the page *not* having an extra ref in steal_page. So it''s a prime candidate to be left alone.> I also have a long list of uglinesses in the mm code that I foundUh, ugly stuff, how could that have happened? I have a few preliminary observations on the patch. Pasting relevant bits here, since the body of the patch seems to have been lost by the email thread: @@ -977,23 +976,25 @@ int arch_set_info_guest( ... 
+ + if (!paging_mode_refcounts(d) + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) replace with && !get_page_type() ) @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( gfn = addr >> PAGE_SHIFT; } - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and same logic when checking p2m_is_shared). Not truly related to your patch bit since we''re at it. Same, further down - if ( !p2m_is_ram(p2mt) ) + if ( !page ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); Last two lines are redundant @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE case HVMOP_modified_memory: a lot of error checking has been removed. At the very least: if ( page ) { ... } else { rc = -EINVAL; goto param_fail3; } arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing checking for target gfns of mmu updates of l2/3/4 entries. It seems that this wouldn''t work anyways, that''s why you killed it? +++ b/xen/arch/x86/mm/hap/guest_walk.c @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA ... + if ( !top_page ) { pfec[0] &= ~PFEC_page_present; - __put_gfn(p2m, top_gfn); + put_page(top_page); top_page is NULL here, remove put_page get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since locking is already done by get_gfn_type_access/__put_gfn (hope those observations made sense without inlining them in the actual patch) Andres
> -----Original Message----- > From: Zhang, Yang Z > Sent: Friday, April 27, 2012 9:25 AM > To: andres@lagarcavilla.org > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; olaf@aepfle.de; > George.Dunlap@eu.citrix.com > Subject: RE: [Xen-devel] lock in vhpet > > > > -----Original Message----- > > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > > Sent: Friday, April 27, 2012 8:52 AM > > To: Zhang, Yang Z > > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; > > olaf@aepfle.de; George.Dunlap@eu.citrix.com > > Subject: RE: [Xen-devel] lock in vhpet > > > > >> -----Original Message----- > > >> From: Tim Deegan [mailto:tim@xen.org] > > >> Sent: Friday, April 27, 2012 5:26 AM > > >> To: Zhang, Yang Z > > >> Cc: andres@lagarcavilla.org; Keir Fraser; > > >> xen-devel@lists.xensource.com > > >> Subject: Re: [Xen-devel] lock in vhpet > > >> > > >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > >> > > > But actually, the first cs introduced this issue is 24770. > > >> > > > When > > >> > > > win8 booting and if hpet is enabled, it will use hpet as the > > >> > > > time source and there have lots of hpet access and EPT > > >> > > > violation. In EPT violation handler, it call > > >> > > > get_gfn_type_access to get > > the mfn. > > >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and > > >> > > > then the > > >> issue > > >> happens. > > >> > > > After I removed the gfn_lock, the issue goes. But in latest > > >> > > > xen, even I remove this lock, it still shows high cpu utilization. > > >> > > > > >> > > It would seem then that even the briefest lock-protected > > >> > > critical section would cause this? In the mmio case, the p2m > > >> > > lock taken in the hap fault handler is held during the actual > > >> > > lookup, and for a couple of branch instructions afterwards. > > >> > > > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is > > >> > > time > > >> spent? > > >> > Still the p2m_lock. > > >> > > >> Can you please try the attached patch? I think you''ll need this > > >> one plus the ones that take the locks out of the hpet code. > > >> > > >> This patch makes the p2m lock into an rwlock and adjusts a number > > >> of the paths that don''t update the p2m so they only take the read lock. > > >> It''s a bit rough but I can boot 16-way win7 guest with it. > > > > That is great news. > > > > Tim, thanks for the amazing work. I''m poring over the patch presently, > > and will shoot at it with everything I''ve got testing-wise. > > > > I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as > > the changeset affects those subsystems. > > But win8 guest shows BSOD with 32 VCPUs. :( The reason of BSOD is : > SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (ACPI.sys) >Um....., I find this issue is related to xl not hypervisor. Will send a patch to fix it later. yang
At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:> > Can you please try the attached patch? I think you''ll need this one > > plus the ones that take the locks out of the hpet code. > > Right off the bat I''m getting a multitude of > (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 > And a hung dom0 during initramfs. I''m a little baffled as to why, but it''s > there (32 bit dom0, XenServer6).Curses, I knew there''d be one somewhere. I''ve been replacing get_page_and_type_from_pagenr()s (which return 0 for success) with old-school get_page_type()s (which return 1 for success) and not always getting the right number of inversions. That''s a horrible horrible beartrap of an API, BTW, which had me cursing at the screen, but I had enough on my plate yesterday without touching _that_ code too!> > Andres, this is basically the big-hammer version of your "take a > > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > > If this works I intend to follow it up with a patch to make some of the > > read-modify-write paths avoid taking the lock (by using a > > compare-exchange operation so they only take the lock on a write). If > > that succeeds I might drop put_gfn() altogether. > > You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. > There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn. > But I guess those could lock the p2m up-front if they become the only > consumers of put_gfn left.Well, that''s more or less what happens now. I was thinking of replacing any remaining (implicit) lock ; read ; think a bit ; maybe write ; unlock code with the fast-path-friendlier: read ; think ; maybe-cmpxchg (and on failure undo or retry which avoids taking the write lock altogether if there''s no work to do. But maybe there aren''t many of those left now. Obviously any path which will always write should just take the write-lock first.> > - grant-table operations still use the lock, because frankly I > > could not follow the current code, and it''s quite late in the evening. > > It''s pretty complex with serious nesting, and ifdef''s for arm and 32 bit. > gfn_to_mfn_private callers will suffer from altering the current meaning, > as put_gfn resolves to the right thing for the ifdef''ed arch. The other > user is grant_transfer which also relies on the page *not* having an extra > ref in steal_page. So it''s a prime candidate to be left alone.Sadly, I think it''s not. The PV backends will be doing lots of grant ops, which shouldn''t get serialized against all other P2M lookups.> > I also have a long list of uglinesses in the mm code that I found > > Uh, ugly stuff, how could that have happened?I can''t imagine. :) Certainly nothing to do with me thinking "I''ll clean that up when I get some time."> I have a few preliminary observations on the patch. Pasting relevant bits > here, since the body of the patch seems to have been lost by the email > thread: > > @@ -977,23 +976,25 @@ int arch_set_info_guest( > ... > + > + if (!paging_mode_refcounts(d) > + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) > replace with && !get_page_type() )Yep.> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( > gfn = addr >> PAGE_SHIFT; > } > > - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); > + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); > replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and > same logic when checking p2m_is_shared). 
Not truly related to your patch > bit since we''re at it.OK, but not in this patch.> Same, further down > - if ( !p2m_is_ram(p2mt) ) > + if ( !page ) > { > - put_gfn(curr->domain, gfn); > + if ( page ) > + put_page(page); > Last two lines are redundantYep.> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE > case HVMOP_modified_memory: a lot of error checking has been removed.Yes, but it was bogus - there''s a race between the actual modification and the call, during which anything might have happened. The best we can do is throw log-dirty bits at everything, and the caller can''t do anything with the error anyway. When I come to tidy up I''ll just add a new mark_gfn_dirty function and skip the pointless gfn->mfn->gfn translation on this path.> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing checking > for target gfns of mmu updates of l2/3/4 entries. It seems that this > wouldn''t work anyways, that''s why you killed it?Yeah - since only L1es can point at foreign mappings it was all just noise, and even if there had been real p2m lookups on those paths there was no equivalent to the translate-in-place that happens in mod_l1_entry so it would have been broken in a much worse way.> +++ b/xen/arch/x86/mm/hap/guest_walk.c > @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA > ... > + if ( !top_page ) > { > pfec[0] &= ~PFEC_page_present; > - __put_gfn(p2m, top_gfn); > + put_page(top_page); > top_page is NULL here, remove put_pageYep.> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since > locking is already done by get_gfn_type_access/__put_gfnYeah, but I was writing that with half an eye on killing that lock. :) I''ll drop them for now.> (hope those observations made sense without inlining them in the actual > patch)Yes, absolutely - thanks for the review! If we can get this to work well enough I''ll tidy it up into a sensible series next week. In the meantime, an updated verison of the monster patch is attached. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
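To make the "read ; think ; maybe-cmpxchg" plan above concrete, here is a small C11 sketch of the retry loop. The entry layout and the flag are invented for illustration; real p2m entries and their accessors look different. The point is that when there is nothing to change, the fast path does no write and takes no lock at all.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define ENTRY_FLAG 0x1UL   /* hypothetical bit we want to set in an entry */

static bool set_flag_fast(_Atomic unsigned long *entry)
{
    unsigned long old = atomic_load(entry);              /* read */

    for ( ; ; )
    {
        if ( old & ENTRY_FLAG )                          /* think: nothing to do */
            return false;

        unsigned long newval = old | ENTRY_FLAG;         /* maybe-cmpxchg */
        if ( atomic_compare_exchange_weak(entry, &old, newval) )
            return true;
        /* Lost a race (or a spurious failure): 'old' now holds the fresh
         * value from the failed exchange, so just go round again. */
    }
}

int main(void)
{
    _Atomic unsigned long e = 0x10UL;
    printf("first call wrote:  %d\n", set_flag_fast(&e)); /* 1: flag was clear */
    printf("second call wrote: %d\n", set_flag_fast(&e)); /* 0: fast path, no write */
    return 0;
}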
> At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote: >> > Can you please try the attached patch? I think you''ll need this one >> > plus the ones that take the locks out of the hpet code. >> >> Right off the bat I''m getting a multitude of >> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 >> And a hung dom0 during initramfs. I''m a little baffled as to why, but >> it''s >> there (32 bit dom0, XenServer6). > > Curses, I knew there''d be one somewhere. I''ve been replacing > get_page_and_type_from_pagenr()s (which return 0 for success) with > old-school get_page_type()s (which return 1 for success) and not always > getting the right number of inversions. That''s a horrible horrible > beartrap of an API, BTW, which had me cursing at the screen, but I had > enough on my plate yesterday without touching _that_ code too!Found more bugs. Some were predating this (xsm not calling put_gfn after get_gfn_untyped, etc). Some were new (get_gfn_untyped not arch independent). I will be sending you shortly a small patch series on top of your mosnter patch purely for RFC. Feel free to merge all, pick out bug fixes, etc. Andres> >> > Andres, this is basically the big-hammer version of your "take a >> > pagecount" changes, plus the change you made to hvmemul_rep_movs(). >> > If this works I intend to follow it up with a patch to make some of >> the >> > read-modify-write paths avoid taking the lock (by using a >> > compare-exchange operation so they only take the lock on a write). If >> > that succeeds I might drop put_gfn() altogether. >> >> You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. >> There are code paths that do get_gfn_query -> p2m_change_type -> >> put_gfn. >> But I guess those could lock the p2m up-front if they become the only >> consumers of put_gfn left. > > Well, that''s more or less what happens now. I was thinking of replacing > any remaining > > (implicit) lock ; read ; think a bit ; maybe write ; unlock > > code with the fast-path-friendlier: > > read ; think ; maybe-cmpxchg (and on failure undo or retry > > which avoids taking the write lock altogether if there''s no work to do. > But maybe there aren''t many of those left now. Obviously any path > which will always write should just take the write-lock first. > >> > - grant-table operations still use the lock, because frankly I >> > could not follow the current code, and it''s quite late in the >> evening. >> >> It''s pretty complex with serious nesting, and ifdef''s for arm and 32 >> bit. >> gfn_to_mfn_private callers will suffer from altering the current >> meaning, >> as put_gfn resolves to the right thing for the ifdef''ed arch. The other >> user is grant_transfer which also relies on the page *not* having an >> extra >> ref in steal_page. So it''s a prime candidate to be left alone. > > Sadly, I think it''s not. The PV backends will be doing lots of grant > ops, which shouldn''t get serialized against all other P2M lookups. > >> > I also have a long list of uglinesses in the mm code that I found >> >> Uh, ugly stuff, how could that have happened? > > I can''t imagine. :) Certainly nothing to do with me thinking "I''ll > clean that up when I get some time." > >> I have a few preliminary observations on the patch. Pasting relevant >> bits >> here, since the body of the patch seems to have been lost by the email >> thread: >> >> @@ -977,23 +976,25 @@ int arch_set_info_guest( >> ... 
>> + >> + if (!paging_mode_refcounts(d) >> + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) >> replace with && !get_page_type() ) > > Yep. > >> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( >> gfn = addr >> PAGE_SHIFT; >> } >> >> - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); >> + page = get_page_from_gfn(curr->domain, gfn, &p2mt, >> P2M_UNSHARE); >> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and >> same logic when checking p2m_is_shared). Not truly related to your patch >> bit since we''re at it. > > OK, but not in this patch. > >> Same, further down >> - if ( !p2m_is_ram(p2mt) ) >> + if ( !page ) >> { >> - put_gfn(curr->domain, gfn); >> + if ( page ) >> + put_page(page); >> Last two lines are redundant > > Yep. > >> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE >> case HVMOP_modified_memory: a lot of error checking has been >> removed. > > Yes, but it was bogus - there''s a race between the actual modification > and the call, during which anything might have happened. The best we > can do is throw log-dirty bits at everything, and the caller can''t do > anything with the error anyway. > > When I come to tidy up I''ll just add a new mark_gfn_dirty function > and skip the pointless gfn->mfn->gfn translation on this path. > >> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing >> checking >> for target gfns of mmu updates of l2/3/4 entries. It seems that this >> wouldn''t work anyways, that''s why you killed it? > > Yeah - since only L1es can point at foreign mappings it was all just > noise, and even if there had been real p2m lookups on those paths there > was no equivalent to the translate-in-place that happens in > mod_l1_entry so it would have been broken in a much worse way. > >> +++ b/xen/arch/x86/mm/hap/guest_walk.c >> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA >> ... >> + if ( !top_page ) >> { >> pfec[0] &= ~PFEC_page_present; >> - __put_gfn(p2m, top_gfn); >> + put_page(top_page); >> top_page is NULL here, remove put_page > > Yep. > >> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since >> locking is already done by get_gfn_type_access/__put_gfn > > Yeah, but I was writing that with half an eye on killing that lock. :) > I''ll drop them for now. > >> (hope those observations made sense without inlining them in the actual >> patch) > > Yes, absolutely - thanks for the review! > > If we can get this to work well enough I''ll tidy it up into a sensible > series next week. In the meantime, an updated verison of the > monster patch is attached. > > Cheers, > > Tim. >
> At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:
>> > Can you please try the attached patch? I think you'll need this one
>> > plus the ones that take the locks out of the hpet code.
>>
>> Right off the bat I'm getting a multitude of
>> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7
>> And a hung dom0 during initramfs. I'm a little baffled as to why, but
>> it's there (32 bit dom0, XenServer6).
>
> Curses, I knew there'd be one somewhere. I've been replacing
> get_page_and_type_from_pagenr()s (which return 0 for success) with
> old-school get_page_type()s (which return 1 for success) and not always
> getting the right number of inversions. That's a horrible horrible
> beartrap of an API, BTW, which had me cursing at the screen, but I had
> enough on my plate yesterday without touching _that_ code too!
>

I am now quite pleased with the testing results on our end. I have a four-patch series to top up your monster patch, which I'll submit shortly. I encourage everyone interested to test this (obviously a lot of code is touched). Including AMD, as I've expanded the code to touch svm too.

>> > Andres, this is basically the big-hammer version of your "take a
>> > pagecount" changes, plus the change you made to hvmemul_rep_movs().
>> > If this works I intend to follow it up with a patch to make some of the
>> > read-modify-write paths avoid taking the lock (by using a
>> > compare-exchange operation so they only take the lock on a write). If
>> > that succeeds I might drop put_gfn() altogether.
>>
>> You mean cmpxchg the whole p2m entry? I don't think I parse the plan.
>> There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn.
>> But I guess those could lock the p2m up-front if they become the only
>> consumers of put_gfn left.
>
> Well, that's more or less what happens now. I was thinking of replacing
> any remaining
>
>   (implicit) lock ; read ; think a bit ; maybe write ; unlock
>
> code with the fast-path-friendlier:
>
>   read ; think ; maybe-cmpxchg (and on failure undo or retry)
>
> which avoids taking the write lock altogether if there's no work to do.
> But maybe there aren't many of those left now. Obviously any path
> which will always write should just take the write-lock first.

After my four patches there aren't really any paths like the above left (exhaustion disclaimer). I believe one or two iterative paths (like HVMOP_set_mem_type) could be optimized to take the p2m lock up front, instead of many get_gfn's. The nice thing about get_gfn/put_gfn is that it will allow for a seamless (har har) transition to a fine-grained p2m. But then maybe we won't ever need that with a p2m rwlock.

>> > - grant-table operations still use the lock, because frankly I
>> > could not follow the current code, and it's quite late in the evening.
>>
>> It's pretty complex with serious nesting, and ifdef's for arm and 32 bit.
>> gfn_to_mfn_private callers will suffer from altering the current meaning,
>> as put_gfn resolves to the right thing for the ifdef'ed arch. The other
>> user is grant_transfer which also relies on the page *not* having an extra
>> ref in steal_page. So it's a prime candidate to be left alone.
>
> Sadly, I think it's not. The PV backends will be doing lots of grant
> ops, which shouldn't get serialized against all other P2M lookups.

Those are addressed in my patch series now, which should cause waves of panic.
Andres>> > I also have a long list of uglinesses in the mm code that I found >> >> Uh, ugly stuff, how could that have happened? > > I can''t imagine. :) Certainly nothing to do with me thinking "I''ll > clean that up when I get some time." > >> I have a few preliminary observations on the patch. Pasting relevant >> bits >> here, since the body of the patch seems to have been lost by the email >> thread: >> >> @@ -977,23 +976,25 @@ int arch_set_info_guest( >> ... >> + >> + if (!paging_mode_refcounts(d) >> + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) >> replace with && !get_page_type() ) > > Yep. > >> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( >> gfn = addr >> PAGE_SHIFT; >> } >> >> - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); >> + page = get_page_from_gfn(curr->domain, gfn, &p2mt, >> P2M_UNSHARE); >> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and >> same logic when checking p2m_is_shared). Not truly related to your patch >> bit since we''re at it. > > OK, but not in this patch. > >> Same, further down >> - if ( !p2m_is_ram(p2mt) ) >> + if ( !page ) >> { >> - put_gfn(curr->domain, gfn); >> + if ( page ) >> + put_page(page); >> Last two lines are redundant > > Yep. > >> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE >> case HVMOP_modified_memory: a lot of error checking has been >> removed. > > Yes, but it was bogus - there''s a race between the actual modification > and the call, during which anything might have happened. The best we > can do is throw log-dirty bits at everything, and the caller can''t do > anything with the error anyway. > > When I come to tidy up I''ll just add a new mark_gfn_dirty function > and skip the pointless gfn->mfn->gfn translation on this path. > >> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing >> checking >> for target gfns of mmu updates of l2/3/4 entries. It seems that this >> wouldn''t work anyways, that''s why you killed it? > > Yeah - since only L1es can point at foreign mappings it was all just > noise, and even if there had been real p2m lookups on those paths there > was no equivalent to the translate-in-place that happens in > mod_l1_entry so it would have been broken in a much worse way. > >> +++ b/xen/arch/x86/mm/hap/guest_walk.c >> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA >> ... >> + if ( !top_page ) >> { >> pfec[0] &= ~PFEC_page_present; >> - __put_gfn(p2m, top_gfn); >> + put_page(top_page); >> top_page is NULL here, remove put_page > > Yep. > >> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since >> locking is already done by get_gfn_type_access/__put_gfn > > Yeah, but I was writing that with half an eye on killing that lock. :) > I''ll drop them for now. > >> (hope those observations made sense without inlining them in the actual >> patch) > > Yes, absolutely - thanks for the review! > > If we can get this to work well enough I''ll tidy it up into a sensible > series next week. In the meantime, an updated verison of the > monster patch is attached. > > Cheers, > > Tim. >
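Andres's remark about iterative paths such as HVMOP_set_mem_type amounts to amortising the lock: take it once around the whole loop instead of a get_gfn/put_gfn (lock/unlock) pair per gfn. A toy sketch of that idea, with a plain mutex and an invented type table standing in for the real p2m structures:

#include <pthread.h>

#define NR_ENTRIES 1024

static pthread_mutex_t p2m_mutex = PTHREAD_MUTEX_INITIALIZER;
static int entry_type[NR_ENTRIES];                /* toy per-gfn type table */

/* Batched variant: one lock acquisition covers the whole range. */
void set_type_batch(unsigned long first_gfn, unsigned long nr, int new_type)
{
    pthread_mutex_lock(&p2m_mutex);               /* once, up front */
    for ( unsigned long i = 0; i < nr && first_gfn + i < NR_ENTRIES; i++ )
        entry_type[first_gfn + i] = new_type;     /* no per-gfn lock traffic */
    pthread_mutex_unlock(&p2m_mutex);
}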
Hi tim, Did the attached patch apply to upstream xen? I tried the latest xen and still saw the high cpu utilization. best regards yang> -----Original Message----- > From: Tim Deegan [mailto:tim@xen.org] > Sent: Friday, April 27, 2012 5:26 AM > To: Zhang, Yang Z > Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] lock in vhpet > > At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > > > But actually, the first cs introduced this issue is 24770. When win8 > > > > booting and if hpet is enabled, it will use hpet as the time source > > > > and there have lots of hpet access and EPT violation. In EPT violation > > > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 > > > > introduces the gfn_lock for p2m lookups, and then the issue happens. > > > > After I removed the gfn_lock, the issue goes. But in latest xen, even > > > > I remove this lock, it still shows high cpu utilization. > > > > > > It would seem then that even the briefest lock-protected critical section > would > > > cause this? In the mmio case, the p2m lock taken in the hap fault handler is > > > held during the actual lookup, and for a couple of branch instructions > > > afterwards. > > > > > > In latest Xen, with lock removed for get_gfn, on which lock is time spent? > > Still the p2m_lock. > > Can you please try the attached patch? I think you''ll need this one > plus the ones that take the locks out of the hpet code. > > This patch makes the p2m lock into an rwlock and adjusts a number of the > paths that don''t update the p2m so they only take the read lock. It''s a > bit rough but I can boot 16-way win7 guest with it. > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > don''t try to use the lock-profiling numbers to see if it''s helping! > > Andres, this is basically the big-hammer version of your "take a > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > If this works I intend to follow it up with a patch to make some of the > read-modify-write paths avoid taking the lock (by using a > compare-exchange operation so they only take the lock on a write). If > that succeeds I might drop put_gfn() altogether. > > But first it will need a lot of tidying up. Noticeably missing: > - SVM code equivalents to the vmx.c changes > - grant-table operations still use the lock, because frankly I > could not follow the current code, and it''s quite late in the evening. > I also have a long list of uglinesses in the mm code that I found while > writing this lot. > > Keir, I have no objection to later replacing this with something better > than an rwlock. :) Or with making a NUMA-friendly rwlock > implementation, since I really expect this to be heavily read-mostly > when paging/sharing/pod are not enabled. > > Cheers, > > Tim.
Hi, At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:> Hi tim, > > Did the attached patch apply to upstream xen? I tried the latest xen > and still saw the high cpu utilization.The patch is not yet applied. It''s been cleaned up into a patch series that I posted last Thursday, and will probably be applied later this week. Cheers, Tim.> best regards > yang > > > -----Original Message----- > > From: Tim Deegan [mailto:tim@xen.org] > > Sent: Friday, April 27, 2012 5:26 AM > > To: Zhang, Yang Z > > Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com > > Subject: Re: [Xen-devel] lock in vhpet > > > > At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > > > > But actually, the first cs introduced this issue is 24770. When win8 > > > > > booting and if hpet is enabled, it will use hpet as the time source > > > > > and there have lots of hpet access and EPT violation. In EPT violation > > > > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 > > > > > introduces the gfn_lock for p2m lookups, and then the issue happens. > > > > > After I removed the gfn_lock, the issue goes. But in latest xen, even > > > > > I remove this lock, it still shows high cpu utilization. > > > > > > > > It would seem then that even the briefest lock-protected critical section > > would > > > > cause this? In the mmio case, the p2m lock taken in the hap fault handler is > > > > held during the actual lookup, and for a couple of branch instructions > > > > afterwards. > > > > > > > > In latest Xen, with lock removed for get_gfn, on which lock is time spent? > > > Still the p2m_lock. > > > > Can you please try the attached patch? I think you''ll need this one > > plus the ones that take the locks out of the hpet code. > > > > This patch makes the p2m lock into an rwlock and adjusts a number of the > > paths that don''t update the p2m so they only take the read lock. It''s a > > bit rough but I can boot 16-way win7 guest with it. > > > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > > don''t try to use the lock-profiling numbers to see if it''s helping! > > > > Andres, this is basically the big-hammer version of your "take a > > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > > If this works I intend to follow it up with a patch to make some of the > > read-modify-write paths avoid taking the lock (by using a > > compare-exchange operation so they only take the lock on a write). If > > that succeeds I might drop put_gfn() altogether. > > > > But first it will need a lot of tidying up. Noticeably missing: > > - SVM code equivalents to the vmx.c changes > > - grant-table operations still use the lock, because frankly I > > could not follow the current code, and it''s quite late in the evening. > > I also have a long list of uglinesses in the mm code that I found while > > writing this lot. > > > > Keir, I have no objection to later replacing this with something better > > than an rwlock. :) Or with making a NUMA-friendly rwlock > > implementation, since I really expect this to be heavily read-mostly > > when paging/sharing/pod are not enabled. > > > > Cheers, > > > > Tim. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Hi,

At 13:36 +0100 on 16 May (1337175361), Tim Deegan wrote:
> At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:
> > Did the attached patch apply to upstream xen? I tried the latest xen
> > and still saw the high cpu utilization.
>
> The patch is not yet applied. It's been cleaned up into a patch series
> that I posted last Thursday, and will probably be applied later this
> week.

It's now been applied to the staging tree, as csets 25350--25360.

Cheers,

Tim.
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, May 17, 2012 6:58 PM
> To: Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com; Keir Fraser; andres@lagarcavilla.org
> Subject: Re: [Xen-devel] lock in vhpet
>
> Hi,
>
> At 13:36 +0100 on 16 May (1337175361), Tim Deegan wrote:
> > At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:
> > > Did the attached patch apply to upstream xen? I tried the latest xen
> > > and still saw the high cpu utilization.
> >
> > The patch is not yet applied. It's been cleaned up into a patch series
> > that I posted last Thursday, and will probably be applied later this
> > week.
>
> It's now been applied to the staging tree, as csets 25350--25360.

It's great work! After one week of testing, we didn't find any regression with those patches.

best regards
yang