Hi Keir,

I noticed that changeset 15289 introduced locking to the platform timers, and you mentioned that it is only there for correctness. Are there potential issues that are fixed by this patch? If not, I wonder why we need those locks. I think it should be the OS's responsibility to serialise these accesses, not the hypervisor's. Am I right? I don't know whether all of those locks are necessary, but at least the lock for vhpet, and especially the lock on the read path, is not required.

best regards
yang
On 17/04/2012 04:26, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

> Hi Keir
>
> I noticed that changeset 15289 introduced locking to the platform timers, and
> you mentioned that it is only there for correctness. Are there potential
> issues that are fixed by this patch? If not, I wonder why we need those locks?

Yes, issues were fixed by the patch. That's why I bothered to implement it. However, I think the observed issues were with protecting the mechanisms in vpt.c, and the other locking may, at least partially, be overly cautious.

> I think it should be the OS's responsibility to serialise these accesses,
> not the hypervisor's. Am I right?

It depends. Where an access is an apparently-atomic memory-mapped access, but is implemented as a sequence of operations in the hypervisor, the hypervisor might need to maintain atomicity through locking.

> I don't know whether all of those locks are necessary, but at least the lock
> for vhpet, and especially the lock on the read path, is not required.

This is definitely not true; for example, locking is required around calls to create_periodic_time(), to serialise them. So in general the locking I added, even in vhpet.c, is required. If you have a specific hot path you are looking to optimise, and especially if you have numbers to back that up, then we can consider specific localised optimisations to avoid locking where we can reason it is not needed.

 -- Keir

> best regards
> yang
> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
>
> > Hi Keir
> >
> > I noticed that changeset 15289 introduced locking to the platform
> > timers, and you mentioned that it is only there for correctness. Are
> > there potential issues that are fixed by this patch? If not, I wonder why
> > we need those locks?
>
> Yes, issues were fixed by the patch. That's why I bothered to implement it.
> However, I think the observed issues were with protecting the mechanisms in
> vpt.c, and the other locking may, at least partially, be overly cautious.
>
> > I think it should be the OS's responsibility to serialise these
> > accesses, not the hypervisor's. Am I right?
>
> It depends. Where an access is an apparently-atomic memory-mapped access,
> but is implemented as a sequence of operations in the hypervisor, the
> hypervisor might need to maintain atomicity through locking.

But if there is already a lock inside the guest for those accesses, and there is no other code path (like a timer callback function) in the hypervisor that accesses the shared data, then we don't need to use a lock in the hypervisor.

> > I don't know whether all of those locks are necessary, but at least the
> > lock for vhpet, and especially the lock on the read path, is not required.
>
> This is definitely not true; for example, locking is required around calls to
> create_periodic_time(), to serialise them. So in general the locking I added,
> even in vhpet.c, is required. If you have a specific hot path you are looking
> to optimise, and especially if you have numbers to back that up, then we can
> consider specific localised optimisations to avoid locking where we can reason
> it is not needed.

As I mentioned, if the guest can ensure that only one CPU accesses the HPET at a time, then the access itself is already serialised. Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due to the big lock taken when reading the HPET. The xentrace data also shows lots of vmexits, mainly from PAUSE instructions on the other CPUs. So if the guest already uses a lock to protect HPET access, why should the hypervisor do the same thing again?

yang
On 18/04/2012 01:52, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>> It depends. Where an access is an apparently-atomic memory-mapped access,
>> but is implemented as a sequence of operations in the hypervisor, the
>> hypervisor might need to maintain atomicity through locking.
>
> But if there is already a lock inside the guest for those accesses, and there
> is no other code path (like a timer callback function) in the hypervisor that
> accesses the shared data, then we don't need to use a lock in the hypervisor.

If there is a memory-mapped register access which is atomic on bare metal, the guest may not bother with locking. We have to maintain that apparent atomicity ourselves by implementing serialisation in the hypervisor.

>>> I don't know whether all of those locks are necessary, but at least the
>>> lock for vhpet, and especially the lock on the read path, is not required.
>>
>> This is definitely not true; for example, locking is required around calls to
>> create_periodic_time(), to serialise them. So in general the locking I added,
>> even in vhpet.c, is required. If you have a specific hot path you are looking
>> to optimise, and especially if you have numbers to back that up, then we can
>> consider specific localised optimisations to avoid locking where we can
>> reason it is not needed.
>
> As I mentioned, if the guest can ensure that only one CPU accesses the HPET
> at a time, then the access itself is already serialised.
> Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due to the
> big lock taken when reading the HPET. The xentrace data also shows lots of
> vmexits, mainly from PAUSE instructions on the other CPUs. So if the guest
> already uses a lock to protect HPET access, why should the hypervisor do the
> same thing again?

If the HPET accesses are atomic on bare metal, we have to maintain that, even if some guests have extra locking themselves. Also, in some cases Xen needs locking to correctly maintain its own internal state regardless of what an (untrusted) guest might do. So we cannot just get rid of the vhpet lock everywhere. It's definitely not correct to do so. Optimising the hpet read path, however, sounds okay. I agree the lock may not be needed on that specific path.

 -- Keir
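To make the atomicity point concrete, a minimal, illustrative sketch of why one guest register write can need a lock once it is emulated in software. The device, field names and the derived-state computation are all hypothetical, not the actual vhpet code:

    /* Illustrative device state: one guest register write must update two
     * fields that have to stay consistent with each other. */
    struct vdev {
        spinlock_t lock;
        uint64_t   config;
        uint64_t   derived;   /* recomputed whenever config changes */
    };

    static void vdev_config_write(struct vdev *v, uint64_t val)
    {
        /* One atomic register write on bare metal becomes several statements
         * here, so without the lock another vcpu's concurrent emulated access
         * could see config and derived out of sync. */
        spin_lock(&v->lock);
        v->config  = val;
        v->derived = val & ~1ULL;   /* stand-in for the real derived state */
        spin_unlock(&v->lock);
    }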
> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Wednesday, April 18, 2012 3:13 PM
> To: Zhang, Yang Z; xen-devel@lists.xensource.com
> Subject: Re: lock in vhpet
>
> On 18/04/2012 01:52, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>
> >> It depends. Where an access is an apparently-atomic memory-mapped
> >> access, but is implemented as a sequence of operations in the
> >> hypervisor, the hypervisor might need to maintain atomicity through locking.
> >
> > But if there is already a lock inside the guest for those accesses, and
> > there is no other code path (like a timer callback function) in the
> > hypervisor that accesses the shared data, then we don't need to use a lock
> > in the hypervisor.
>
> If there is a memory-mapped register access which is atomic on bare metal,
> the guest may not bother with locking. We have to maintain that apparent
> atomicity ourselves by implementing serialisation in the hypervisor.
>
> >>> I don't know whether all of those locks are necessary, but at least the
> >>> lock for vhpet, and especially the lock on the read path, is not required.
> >>
> >> This is definitely not true; for example, locking is required around
> >> calls to create_periodic_time(), to serialise them. So in general the
> >> locking I added, even in vhpet.c, is required. If you have a specific
> >> hot path you are looking to optimise, and especially if you have
> >> numbers to back that up, then we can consider specific localised
> >> optimisations to avoid locking where we can reason it is not needed.
> >
> > As I mentioned, if the guest can ensure that only one CPU accesses the HPET
> > at a time, then the access itself is already serialised.
> > Yes: Win8 boots very slowly with more than 16 VCPUs, and this is due
> > to the big lock taken when reading the HPET. The xentrace data also shows
> > lots of vmexits, mainly from PAUSE instructions on the other CPUs. So if
> > the guest already uses a lock to protect HPET access, why should the
> > hypervisor do the same thing again?
>
> If the HPET accesses are atomic on bare metal, we have to maintain that, even
> if some guests have extra locking themselves. Also, in some cases Xen needs
> locking to correctly maintain its own internal state regardless of what an
> (untrusted) guest might do. So we cannot just get rid of the vhpet lock
> everywhere. It's definitely not correct to do so. Optimising the hpet read
> path, however, sounds okay. I agree the lock may not be needed on that
> specific path.

You are right. For this case, since the main access to the HPET is to read the main counter, I think an rwlock is a better choice.

yang
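For illustration, the overall shape of the rwlock approach suggested here, as a minimal sketch. This is not Keir's attached patch; the struct layout and helper bodies are assumptions made for the example, using Xen's read_lock()/write_lock() primitives:

    struct vhpet {
        rwlock_t lock;                 /* previously a plain spinlock */
        uint64_t mc64;                 /* main counter (layout assumed) */
        uint64_t regs[8];              /* other registers (layout assumed) */
    };

    static uint64_t vhpet_read_counter(struct vhpet *h)
    {
        uint64_t val;

        read_lock(&h->lock);           /* many vcpus may read concurrently */
        val = h->mc64;                 /* stand-in for the real counter read */
        read_unlock(&h->lock);

        return val;
    }

    static void vhpet_write_reg(struct vhpet *h, unsigned int reg, uint64_t val)
    {
        write_lock(&h->lock);          /* exclusive: serialises against readers */
        h->regs[reg] = val;
        /* ... recompute timers, call create_periodic_time(), etc. ... */
        write_unlock(&h->lock);
    }

With this shape, reads of the main counter only contend with register writes, not with each other, which is the common case while the guest is booting.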
On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>> If the HPET accesses are atomic on bare metal, we have to maintain that, even
>> if some guests have extra locking themselves. Also, in some cases Xen needs
>> locking to correctly maintain its own internal state regardless of what an
>> (untrusted) guest might do. So we cannot just get rid of the vhpet lock
>> everywhere. It's definitely not correct to do so. Optimising the hpet read
>> path, however, sounds okay. I agree the lock may not be needed on that
>> specific path.
>
> You are right. For this case, since the main access to the HPET is to read
> the main counter, I think an rwlock is a better choice.

I'll see if I can make a patch...

 -- Keir
On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:

> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>
>>> If the HPET accesses are atomic on bare metal, we have to maintain that,
>>> even if some guests have extra locking themselves. Also, in some cases Xen
>>> needs locking to correctly maintain its own internal state regardless of
>>> what an (untrusted) guest might do. So we cannot just get rid of the vhpet
>>> lock everywhere. It's definitely not correct to do so. Optimising the hpet
>>> read path, however, sounds okay. I agree the lock may not be needed on that
>>> specific path.
>>
>> You are right. For this case, since the main access to the HPET is to read
>> the main counter, I think an rwlock is a better choice.
>
> I'll see if I can make a patch...

Please try the attached patch (build tested only).

 -- Keir
On 18/04/2012 10:14, "Keir Fraser" <keir@xen.org> wrote:

> On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:
>
>> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>
>>>> If the HPET accesses are atomic on bare metal, we have to maintain that,
>>>> even if some guests have extra locking themselves. Also, in some cases Xen
>>>> needs locking to correctly maintain its own internal state regardless of
>>>> what an (untrusted) guest might do. So we cannot just get rid of the vhpet
>>>> lock everywhere. It's definitely not correct to do so. Optimising the hpet
>>>> read path, however, sounds okay. I agree the lock may not be needed on
>>>> that specific path.
>>>
>>> You are right. For this case, since the main access to the HPET is to read
>>> the main counter, I think an rwlock is a better choice.
>>
>> I'll see if I can make a patch...
>
> Please try the attached patch (build tested only).

Actually try this updated one. :-)

 -- Keir
There is no problem with this patch; it works well. But it cannot fix the Win8 issue. It seems there are some other issues with the HPET. I will look into it. Thanks for your quick patch.

best regards
yang

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
> Sent: Wednesday, April 18, 2012 5:31 PM
> To: Zhang, Yang Z; xen-devel@lists.xensource.com
> Subject: Re: lock in vhpet
>
> On 18/04/2012 10:14, "Keir Fraser" <keir@xen.org> wrote:
>
> > On 18/04/2012 09:29, "Keir Fraser" <keir@xen.org> wrote:
> >
> >> On 18/04/2012 08:55, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> >>
> >>>> If the HPET accesses are atomic on bare metal, we have to maintain
> >>>> that, even if some guests have extra locking themselves. Also, in
> >>>> some cases Xen needs locking to correctly maintain its own internal
> >>>> state regardless of what an (untrusted) guest might do. So we cannot
> >>>> just get rid of the vhpet lock everywhere. It's definitely not correct
> >>>> to do so. Optimising the hpet read path, however, sounds okay. I agree
> >>>> the lock may not be needed on that specific path.
> >>>
> >>> You are right. For this case, since the main access to the HPET is to
> >>> read the main counter, I think an rwlock is a better choice.
> >>
> >> I'll see if I can make a patch...
> >
> > Please try the attached patch (build tested only).
>
> Actually try this updated one. :-)
>
> -- Keir
At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
> There is no problem with this patch; it works well. But it cannot fix the
> Win8 issue. It seems there are some other issues with the HPET. I will look
> into it. Thanks for your quick patch.

The lock in hvm_get_guest_time() will still be serializing the hpet reads. But since it needs to update a shared variable, that will need to haul cachelines around anyway.

Tim.
On 19/04/2012 09:27, "Tim Deegan" <tim@xen.org> wrote:

> At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
>> There is no problem with this patch; it works well. But it cannot fix the
>> Win8 issue. It seems there are some other issues with the HPET. I will look
>> into it. Thanks for your quick patch.
>
> The lock in hvm_get_guest_time() will still be serializing the hpet reads.
> But since it needs to update a shared variable, that will need to haul
> cachelines around anyway.

Yes, that's true. You could try the attached hacky patch out of interest, to see what that lock is costing you in your scenario. But if we want consistent monotonically-increasing guest time, I suspect we can't get rid of the lock, so that's going to limit our scalability unavoidably. Shame.

 -- Keir

> Tim.
The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling data over 10 seconds:

(XEN) p2m_lock 1 lock:
(XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)

This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs spent a total of 30 seconds blocked during the 10 seconds of profiling, which means about 18% of CPU cycles are spent waiting for the p2m lock, and that is for an idle guest. The impact is more serious when running some workload inside the guest.

I noticed that this change was added by cs 24770; before it, we did not take the p2m lock in __get_gfn_type_access. So is this lock really necessary?

best regards
yang

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com] On Behalf Of Keir Fraser
> Sent: Thursday, April 19, 2012 4:47 PM
> To: Tim Deegan; Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> On 19/04/2012 09:27, "Tim Deegan" <tim@xen.org> wrote:
>
> > At 05:19 +0000 on 19 Apr (1334812779), Zhang, Yang Z wrote:
> >> There is no problem with this patch; it works well. But it cannot fix the
> >> Win8 issue. It seems there are some other issues with the HPET. I will
> >> look into it. Thanks for your quick patch.
> >
> > The lock in hvm_get_guest_time() will still be serializing the hpet reads.
> > But since it needs to update a shared variable, that will need to haul
> > cachelines around anyway.
>
> Yes, that's true. You could try the attached hacky patch out of interest, to
> see what that lock is costing you in your scenario. But if we want consistent
> monotonically-increasing guest time, I suspect we can't get rid of the lock,
> so that's going to limit our scalability unavoidably. Shame.
>
> -- Keir
>
> > Tim.
>>> On 23.04.12 at 09:36, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling
> data over 10 seconds:
>
> (XEN) p2m_lock 1 lock:
> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>
> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
> spent a total of 30 seconds blocked during the 10 seconds of profiling, which
> means about 18% of CPU cycles are spent waiting for the p2m lock, and that is
> for an idle guest. The impact is more serious when running some workload
> inside the guest.
> I noticed that this change was added by cs 24770; before it, we did not take
> the p2m lock in __get_gfn_type_access. So is this lock really necessary?

Or shouldn't such a lock, frequently taken on a read path, be an rwlock instead?

Jan
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, April 23, 2012 3:43 PM
> To: Zhang, Yang Z; Keir Fraser; Tim Deegan
> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> >>> On 23.04.12 at 09:36, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > The p2m lock in __get_gfn_type_access() is the culprit. Here is the
> > profiling data over 10 seconds:
> >
> > (XEN) p2m_lock 1 lock:
> > (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
> >
> > This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
> > VCPUs spent a total of 30 seconds blocked during the 10 seconds of
> > profiling, which means about 18% of CPU cycles are spent waiting for the
> > p2m lock, and that is for an idle guest. The impact is more serious when
> > running some workload inside the guest.
> > I noticed that this change was added by cs 24770; before it, we did not
> > take the p2m lock in __get_gfn_type_access. So is this lock really
> > necessary?
>
> Or shouldn't such a lock, frequently taken on a read path, be an rwlock
> instead?

Right. Using an rwlock would make more sense.

best regards
yang
On 23/04/2012 09:15, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:

>>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
>>> VCPUs spent a total of 30 seconds blocked during the 10 seconds of
>>> profiling, which means about 18% of CPU cycles are spent waiting for the
>>> p2m lock, and that is for an idle guest. The impact is more serious when
>>> running some workload inside the guest.
>>> I noticed that this change was added by cs 24770; before it, we did not
>>> take the p2m lock in __get_gfn_type_access. So is this lock really
>>> necessary?
>>
>> Or shouldn't such a lock, frequently taken on a read path, be an rwlock
>> instead?
>>
> Right. Using an rwlock would make more sense.

Interested to see if it would improve performance. My guess would be no.

 -- Keir
At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
> The p2m lock in __get_gfn_type_access() is the culprit. Here is the profiling
> data over 10 seconds:
>
> (XEN) p2m_lock 1 lock:
> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>
> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
> spent a total of 30 seconds blocked during the 10 seconds of profiling, which
> means about 18% of CPU cycles are spent waiting for the p2m lock, and that is
> for an idle guest. The impact is more serious when running some workload
> inside the guest. I noticed that this change was added by cs 24770; before
> it, we did not take the p2m lock in __get_gfn_type_access. So is this lock
> really necessary?

Ugh; that certainly is a regression. We used to be lock-free on p2m lookups and losing that will be bad for perf in lots of ways. IIRC the original aim was to use fine-grained per-page locks for this -- there should be no need to hold a per-domain lock during a normal read. Andres, what happened to that code?

Making it an rwlock would be tricky as this interface doesn't differentiate readers from writers. For the common case (no sharing/paging/mem-access) it ought to be a win since there is hardly any writing. But it would be better to make this particular lock/unlock go away.

Tim.
> At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
>> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
>> profiling data over 10 seconds:
>>
>> (XEN) p2m_lock 1 lock:
>> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>>
>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
>> spent a total of 30 seconds blocked during the 10 seconds of profiling,
>> which means about 18% of CPU cycles are spent waiting for the p2m lock, and
>> that is for an idle guest. The impact is more serious when running some
>> workload inside the guest. I noticed that this change was added by cs 24770;
>> before it, we did not take the p2m lock in __get_gfn_type_access. So is this
>> lock really necessary?
>
> Ugh; that certainly is a regression. We used to be lock-free on p2m
> lookups and losing that will be bad for perf in lots of ways. IIRC the
> original aim was to use fine-grained per-page locks for this -- there
> should be no need to hold a per-domain lock during a normal read.
> Andres, what happened to that code?

The fine-grained p2m locking code is stashed somewhere and untested. Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus are hitting the same gfn for the hpet mmio address.

Thanks for the report, Yang; it would be great to understand at which call sites the p2m lock is contended, so we can try to alleviate contention at the right place. Looking closely at the code, it would seem the only get_gfn in hvmemul_do_io is called on ram_gpa == 0 (?!). Shouldn't ram_gpa underlie the target mmio address for emulation?

Notwithstanding the above, we've optimized p2m access in hvmemul_do_io to have as brief a critical section as possible: get_gfn, get_page, put_gfn, work, put_page later. So contention might arise from (bogusly) doing get_gfn(0) on every vcpu (when it should instead be get_gfn(hpet_gfn)). The only way to alleviate that contention is to use get_gfn_query_unlocked, and that will be unsafe against paging or sharing. I am very skeptical this is causing the slowdown you see, since you're not using paging or sharing. The critical section protected by the p2m lock is literally one cmpxchg.

The other source of contention might come from hvmemul_rep_movs, which holds the p2m lock for the duration of the mmio operation. I can optimize that one using the get_gfn/get_page/put_gfn pattern mentioned above.

The third source of contention might be the __hvm_copy to/from the target address of the hpet value read/written. That one can be slightly optimized by doing the memcpy outside of the scope of the p2m lock.

> Making it an rwlock would be tricky as this interface doesn't
> differentiate readers from writers. For the common case (no
> sharing/paging/mem-access) it ought to be a win since there is hardly
> any writing. But it would be better to make this particular lock/unlock
> go away.

We had a discussion about rwlocks way back when. It's impossible (or nearly so) to tell when an access might upgrade to write privileges. Deadlock has a high likelihood.

Andres
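For reference, the brief-critical-section pattern described above looks roughly like this sketch, modelled on the shape of the hvmemul_do_io code; error handling is abbreviated and the surrounding variables (d, gfn) are assumed to be in scope:

    struct page_info *page;
    p2m_type_t t;
    mfn_t mfn = get_gfn_unshare(d, gfn, &t);    /* takes the gfn/p2m lock */

    if ( !mfn_valid(mfn_x(mfn)) )
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }
    page = mfn_to_page(mfn_x(mfn));
    if ( !get_page(page, d) )                   /* pin the page with a refcount */
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }
    put_gfn(d, gfn);                            /* drop the p2m lock early */

    /* ... the long-running part of the emulation runs here with only
     * the page reference held, not the p2m lock ... */

    put_page(page);                             /* release the reference */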
> At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
>> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
>> profiling data over 10 seconds:
>>
>> (XEN) p2m_lock 1 lock:
>> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
>>
>> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16 VCPUs
>> spent a total of 30 seconds blocked during the 10 seconds of profiling,
>> which means about 18% of CPU cycles are spent waiting for the p2m lock, and
>> that is for an idle guest. The impact is more serious when running some
>> workload inside the guest. I noticed that this change was added by cs 24770;
>> before it, we did not take the p2m lock in __get_gfn_type_access. So is this
>> lock really necessary?
>
> Ugh; that certainly is a regression. We used to be lock-free on p2m
> lookups and losing that will be bad for perf in lots of ways. IIRC the
> original aim was to use fine-grained per-page locks for this -- there
> should be no need to hold a per-domain lock during a normal read.
> Andres, what happened to that code?

Sigh, scratch a lot of nonsense I just spewed about the "hpet gfn". There is no actual page backing that one, so no concerns there.

It is still the case that ram_gpa is zero in many cases going into hvmemul_do_io, and there really is no point in doing a get_page(get_gfn(0)). How about the following (there's more email after this patch):

# HG changeset patch
# Parent ccc64793187f7d9c926343a1cd4ae822a4364bd1
x86/emulation: No need to get_gfn on zero ram_gpa.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r ccc64793187f xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,33 +60,37 @@ static int hvmemul_do_io(
     ioreq_t *p = get_ioreq(curr);
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
-    mfn_t ram_mfn;
+    mfn_t ram_mfn = _mfn(INVALID_MFN);
     int rc;
 
-    /* Check for paged out page */
-    ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
-    if ( p2m_is_paging(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-
-    /* Maintain a ref on the mfn to ensure liveness. Put the gfn
-     * to avoid potential deadlock wrt event channel lock, later. */
-    if ( mfn_valid(mfn_x(ram_mfn)) )
-        if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
-             curr->domain) )
+    /* Many callers pass a stub zero ram_gpa address. */
+    if ( ram_gfn != 0 )
+    {
+        /* Check for paged out page */
+        ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
+        if ( p2m_is_paging(p2mt) )
         {
-            put_gfn(curr->domain, ram_gfn);
+            put_gfn(curr->domain, ram_gfn);
+            p2m_mem_paging_populate(curr->domain, ram_gfn);
             return X86EMUL_RETRY;
         }
-    put_gfn(curr->domain, ram_gfn);
+        if ( p2m_is_shared(p2mt) )
+        {
+            put_gfn(curr->domain, ram_gfn);
+            return X86EMUL_RETRY;
+        }
+
+        /* Maintain a ref on the mfn to ensure liveness. Put the gfn
+         * to avoid potential deadlock wrt event channel lock, later. */
+        if ( mfn_valid(mfn_x(ram_mfn)) )
+            if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
+                 curr->domain) )
+            {
+                put_gfn(curr->domain, ram_gfn);
+                return X86EMUL_RETRY;
+            }
+        put_gfn(curr->domain, ram_gfn);
+    }
 
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes

If contention is coming in from the emul_rep_movs path, then that might be taken care of with the following:

# HG changeset patch
# Parent 18b694840961cb6e3934628b23902a7979f00bac
x86/emulate: Relieve contention of p2m lock in emulation of rep movs.
get_two_gfns is used to query the source and target physical addresses of the emulated rep movs. It is not necessary to hold onto the two gfn's for the duration of the emulation: further calls down the line will do the appropriate unsharing or paging in, and unwind correctly on failure.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 18b694840961 xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -714,25 +714,23 @@ static int hvmemul_rep_movs(
     if ( rc != X86EMUL_OKAY )
         return rc;
 
+    /* The query on the gfns is to establish the need for mmio. In the two mmio
+     * cases, a proper get_gfn for the "other" gfn will be enacted, with paging
+     * in or unsharing if necessary. In the memmove case, the gfn might change
+     * given the bytes mov'ed, and, again, proper get_gfn's will be enacted in
+     * __hvm_copy. */
     get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL,
                  current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL,
                  P2M_ALLOC, &tg);
-
+    put_two_gfns(&tg);
+
     if ( !p2m_is_ram(sp2mt) && !p2m_is_grant(sp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     if ( !p2m_is_ram(dp2mt) && !p2m_is_grant(dp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -747,10 +745,7 @@ static int hvmemul_rep_movs(
      * can be emulated by a source-to-buffer-to-destination block copy.
      */
     if ( ((dgpa + bytes_per_rep) > sgpa) && (dgpa < (sgpa + bytes)) )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /* Adjust destination address for reverse copy. */
     if ( df )
@@ -759,10 +754,7 @@ static int hvmemul_rep_movs(
     /* Allocate temporary buffer. Fall back to slow emulation if this fails. */
     buf = xmalloc_bytes(bytes);
     if ( buf == NULL )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /*
      * We do a modicum of checking here, just for paranoia's sake and to
@@ -773,7 +765,6 @@ static int hvmemul_rep_movs(
 
     rc = hvm_copy_to_guest_phys(dgpa, buf, bytes);
     xfree(buf);
-    put_two_gfns(&tg);
 
     if ( rc == HVMCOPY_gfn_paged_out )
         return X86EMUL_RETRY;

Let me know if any of this helps.

Thanks,
Andres

> Making it an rwlock would be tricky as this interface doesn't
> differentiate readers from writers. For the common case (no
> sharing/paging/mem-access) it ought to be a win since there is hardly
> any writing. But it would be better to make this particular lock/unlock
> go away.
>
> Tim.
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Tuesday, April 24, 2012 1:19 AM
>
> Let me know if any of this helps

No, it does not work.

best regards
yang
At 08:26 -0700 on 23 Apr (1335169568), Andres Lagar-Cavilla wrote:
> > At 07:36 +0000 on 23 Apr (1335166577), Zhang, Yang Z wrote:
> >> The p2m lock in __get_gfn_type_access() is the culprit. Here is the
> >> profiling data over 10 seconds:
> >>
> >> (XEN) p2m_lock 1 lock:
> >> (XEN)   lock:  190733(00000000:14CE5726), block:  67159(00000007:6AAA53F3)
> >>
> >> This data was collected while a Win8 guest (16 VCPUs) was idle. The 16
> >> VCPUs spent a total of 30 seconds blocked during the 10 seconds of
> >> profiling, which means about 18% of CPU cycles are spent waiting for the
> >> p2m lock, and that is for an idle guest. The impact is more serious when
> >> running some workload inside the guest. I noticed that this change was
> >> added by cs 24770; before it, we did not take the p2m lock in
> >> __get_gfn_type_access. So is this lock really necessary?
> >
> > Ugh; that certainly is a regression. We used to be lock-free on p2m
> > lookups and losing that will be bad for perf in lots of ways. IIRC the
> > original aim was to use fine-grained per-page locks for this -- there
> > should be no need to hold a per-domain lock during a normal read.
> > Andres, what happened to that code?
>
> The fine-grained p2m locking code is stashed somewhere and untested.
> Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus
> are hitting the same gfn for the hpet mmio address.

We'll have to do _something_ for 4.2 if it's introducing an 18% CPU overhead in an otherwise idle VM.

In any case I think this means I probably shouldn't take the patch that turns on this locking for shadow VMs. They do a lot more p2m lookups.

> The other source of contention might come from hvmemul_rep_movs, which
> holds the p2m lock for the duration of the mmio operation. I can optimize
> that one using the get_gfn/get_page/put_gfn pattern mentioned above.

But wouldn't that be unsafe? What if the p2m changes during the operation? Or, conversely, could you replace all uses of the lock in p2m lookups with get_page() on the result and still get what you need?

Tim.
At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
> > -----Original Message-----
> > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> > Sent: Tuesday, April 24, 2012 1:19 AM
> >
> > Let me know if any of this helps
> No, it does not work.

Do you mean that it doesn't help with the CPU overhead, or that it's broken in some other way?

Tim.
> At 08:26 -0700 on 23 Apr (1335169568), Andres Lagar-Cavilla wrote:
>> The fine-grained p2m locking code is stashed somewhere and untested.
>> Obviously not meant for 4.2. I don't think it'll be useful here: all vcpus
>> are hitting the same gfn for the hpet mmio address.
>
> We'll have to do _something_ for 4.2 if it's introducing an 18% CPU
> overhead in an otherwise idle VM.

An hpet mmio read or write hits get_gfn twice: once for gfn zero at the beginning of hvmemul_do_io, the other during hvm copy. The patch I sent to Yang yesterday removes the bogus first get_gfn. The second one has to perform a locked query.

Yang, is there any possibility to get more information here? The ability to identify the call site that contends for the p2m lock would be crucial. Even something as crude as sampling vcpu stack traces by hitting 'd' repeatedly on the serial line, and seeing what sticks out frequently, would help.

> In any case I think this means I probably shouldn't take the patch that
> turns on this locking for shadow VMs. They do a lot more p2m lookups.
>
>> The other source of contention might come from hvmemul_rep_movs, which
>> holds the p2m lock for the duration of the mmio operation. I can optimize
>> that one using the get_gfn/get_page/put_gfn pattern mentioned above.
>
> But wouldn't that be unsafe? What if the p2m changes during the
> operation? Or, conversely, could you replace all uses of the lock in
> p2m lookups with get_page() on the result and still get what you need?

I've been thinking of wrapping the pattern into a handy p2m accessor. I see this pattern repeating itself for hvm_copy, tss/segment entry, page table walking, nested hvm, etc., in which the consumer wants to map the result of the translation. By the time you finish with a get_gfn_unshare, most possible p2m transformations will have been taken care of (PoD alloc, page-in, unshare). By taking a reference to the underlying page, paging out is prevented, and then the vcpu can safely let go of the p2m lock. Conceivably guest_physmap_remove_page and guest_physmap_add_entry can still come in and change the p2m entry.

Andres

> Tim.
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Tuesday, April 24, 2012 5:17 PM
> To: Zhang, Yang Z
> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com; Keir Fraser
> Subject: Re: [Xen-devel] lock in vhpet
>
> At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
> > > -----Original Message-----
> > > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> > > Sent: Tuesday, April 24, 2012 1:19 AM
> > >
> > > Let me know if any of this helps
> > No, it does not work.
>
> Do you mean that it doesn't help with the CPU overhead, or that it's broken
> in some other way?

It cannot help with the CPU overhead.

best regards
yang
>> -----Original Message-----
>> From: Tim Deegan [mailto:tim@xen.org]
>> Sent: Tuesday, April 24, 2012 5:17 PM
>> To: Zhang, Yang Z
>> Cc: andres@lagarcavilla.org; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: Re: [Xen-devel] lock in vhpet
>>
>> At 08:58 +0000 on 24 Apr (1335257909), Zhang, Yang Z wrote:
>> > > Let me know if any of this helps
>> > No, it does not work.
>>
>> Do you mean that it doesn't help with the CPU overhead, or that it's
>> broken in some other way?
>>
> It cannot help with the CPU overhead.

Yang, is there any further information you can provide? A rough idea of where vcpus are spending time spinning for the p2m lock would be tremendously useful.

Thanks
Andres
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 9:40 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> >> Do you mean that it doesn't help with the CPU overhead, or that it's
> >> broken in some other way?
> >>
> > It cannot help with the CPU overhead.
>
> Yang, is there any further information you can provide? A rough idea of
> where vcpus are spending time spinning for the p2m lock would be
> tremendously useful.

I am doing further investigation and hope to get more useful information.

But the first cs that introduced this issue is 24770. When Win8 boots with the HPET enabled, it uses the HPET as its time source, and there are lots of HPET accesses and EPT violations. In the EPT violation handler, it calls get_gfn_type_access to get the mfn. cs 24770 introduced the gfn_lock for p2m lookups, and then the issue appeared. After I removed the gfn_lock, the issue went away. But in the latest Xen, even if I remove this lock, it still shows high CPU utilization.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 9:40 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> Yang, is there any further information you can provide? A rough idea of
>> where vcpus are spending time spinning for the p2m lock would be
>> tremendously useful.
>>
> I am doing further investigation and hope to get more useful information.

Thanks, looking forward to that.

> But the first cs that introduced this issue is 24770. When Win8 boots with
> the HPET enabled, it uses the HPET as its time source, and there are lots of
> HPET accesses and EPT violations. In the EPT violation handler, it calls
> get_gfn_type_access to get the mfn. cs 24770 introduced the gfn_lock for p2m
> lookups, and then the issue appeared. After I removed the gfn_lock, the
> issue went away. But in the latest Xen, even if I remove this lock, it still
> shows high CPU utilization.

It would seem then that even the briefest lock-protected critical section would cause this? In the mmio case, the p2m lock taken in the hap fault handler is held during the actual lookup, and for a couple of branch instructions afterwards.

In the latest Xen, with the lock removed from get_gfn, on which lock is the time spent?

Thanks,
Andres

> yang
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 10:31 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> It would seem then that even the briefest lock-protected critical section
> would cause this? In the mmio case, the p2m lock taken in the hap fault
> handler is held during the actual lookup, and for a couple of branch
> instructions afterwards.
>
> In the latest Xen, with the lock removed from get_gfn, on which lock is the
> time spent?

Still the p2m_lock.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 10:31 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> In the latest Xen, with the lock removed from get_gfn, on which lock is the
>> time spent?
>
> Still the p2m_lock.

How are you removing the lock from get_gfn?

The p2m lock is taken on a few specific code paths outside of get_gfn (changing the type of an entry, adding a new p2m entry, setup and teardown), and I'm surprised any of those code paths is being used by the hpet mmio handler.

Andres

> yang
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 10:42 AM
> To: Zhang, Yang Z
> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
> Subject: RE: [Xen-devel] lock in vhpet
>
> How are you removing the lock from get_gfn?
>
> The p2m lock is taken on a few specific code paths outside of get_gfn
> (changing the type of an entry, adding a new p2m entry, setup and teardown),
> and I'm surprised any of those code paths is being used by the hpet mmio
> handler.

Sorry, what I said may not have been accurate. In the latest Xen, I used a workaround to skip calling get_gfn_type_access in hvm_hap_nested_page_fault(), so the p2m_lock is still there. Now I have found that the contention on the p2m_lock is coming from __hvm_copy.
In the mmio handler, several code paths call it (hvm_fetch_from_guest_virt_nofault(), hvm_copy_from_guest_virt()). When there are lots of mmio accesses, the contention is very obvious.

yang
>> -----Original Message-----
>> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
>> Sent: Wednesday, April 25, 2012 10:42 AM
>> To: Zhang, Yang Z
>> Cc: Tim Deegan; xen-devel@lists.xensource.com; Keir Fraser
>> Subject: RE: [Xen-devel] lock in vhpet
>>
>> How are you removing the lock from get_gfn?
>>
>> The p2m lock is taken on a few specific code paths outside of get_gfn
>> (changing the type of an entry, adding a new p2m entry, setup and teardown),
>> and I'm surprised any of those code paths is being used by the hpet mmio
>> handler.
>
> Sorry, what I said may not have been accurate. In the latest Xen, I used a
> workaround to skip calling get_gfn_type_access in
> hvm_hap_nested_page_fault(), so the p2m_lock is still there. Now I have
> found that the contention on the p2m_lock is coming from __hvm_copy.
> In the mmio handler, several code paths call it
> (hvm_fetch_from_guest_virt_nofault(), hvm_copy_from_guest_virt()). When
> there are lots of mmio accesses, the contention is very obvious.

Thanks. Can you please try this:
http://lists.xen.org/archives/html/xen-devel/2012-04/msg01861.html
in conjunction with the patch below?

Andres

diff -r 7a7443e80b99 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2383,6 +2383,8 @@ static enum hvm_copy_result __hvm_copy(
 
     while ( todo > 0 )
     {
+        struct page_info *pg;
+
         count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
 
         if ( flags & HVMCOPY_virt )
@@ -2427,7 +2429,11 @@ static enum hvm_copy_result __hvm_copy(
             put_gfn(curr->domain, gfn);
             return HVMCOPY_bad_gfn_to_mfn;
         }
+
         ASSERT(mfn_valid(mfn));
+        pg = mfn_to_page(mfn);
+        ASSERT(get_page(pg, curr->domain));
+        put_gfn(curr->domain, gfn);
 
         p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK);
 
@@ -2457,7 +2463,7 @@ static enum hvm_copy_result __hvm_copy(
         addr += count;
         buf += count;
         todo -= count;
-        put_gfn(curr->domain, gfn);
+        put_page(pg);
     }
 
     return HVMCOPY_okay;
> -----Original Message-----
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> Sent: Wednesday, April 25, 2012 11:34 AM
>
> Thanks. Can you please try this:
> http://lists.xen.org/archives/html/xen-devel/2012-04/msg01861.html
> in conjunction with the patch below?

No, it doesn't work. On the contrary, the contention is even fiercer than before. Here is the p2m_lock contention over 10 seconds with 16 VCPUs:

lock:  560583(00000000:83362735), block:  321453(00000009:364CA49B)

yang
>>> On 25.04.12 at 05:34, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2383,6 +2383,8 @@ static enum hvm_copy_result __hvm_copy(
>  
>      while ( todo > 0 )
>      {
> +        struct page_info *pg;
> +
>          count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
>  
>          if ( flags & HVMCOPY_virt )
> @@ -2427,7 +2429,11 @@ static enum hvm_copy_result __hvm_copy(
>              put_gfn(curr->domain, gfn);
>              return HVMCOPY_bad_gfn_to_mfn;
>          }
> +
>          ASSERT(mfn_valid(mfn));
> +        pg = mfn_to_page(mfn);
> +        ASSERT(get_page(pg, curr->domain));

You really shouldn't ever put expressions with (side) effects inside an ASSERT(), not even for debugging patches that you want others to apply (you're of course free to shoot yourself in the foot ;-) ), as it just won't work in non-debug builds.

Jan

> +        put_gfn(curr->domain, gfn);
>  
>          p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK);
>  
> @@ -2457,7 +2463,7 @@ static enum hvm_copy_result __hvm_copy(
>          addr += count;
>          buf += count;
>          todo -= count;
> -        put_gfn(curr->domain, gfn);
> +        put_page(pg);
>      }
>  
>      return HVMCOPY_okay;
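A small sketch of the distinction Jan is pointing out: ASSERT() compiles to nothing in a non-debug (NDEBUG) build, so any call placed inside it simply never runs. The error handling shown for the safer form is just one plausible choice for this code path, not necessarily what the final patch should do:

    /* Problematic: ASSERT() expands to nothing when NDEBUG is set, so in a
     * release build get_page() is never called and no reference is taken. */
    ASSERT(get_page(pg, curr->domain));

    /* Safer: always perform the call, then act on the result. */
    if ( !get_page(pg, curr->domain) )
    {
        put_gfn(curr->domain, gfn);
        return HVMCOPY_bad_gfn_to_mfn;
    }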
At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote:
> > > But actually, the first cs introduced this issue is 24770. When win8
> > > booting and if hpet is enabled, it will use hpet as the time source
> > > and there have lots of hpet access and EPT violation. In EPT violation
> > > handler, it call get_gfn_type_access to get the mfn. The cs 24770
> > > introduces the gfn_lock for p2m lookups, and then the issue happens.
> > > After I removed the gfn_lock, the issue goes. But in latest xen, even
> > > I remove this lock, it still shows high cpu utilization.
> >
> > It would seem then that even the briefest lock-protected critical section would
> > cause this? In the mmio case, the p2m lock taken in the hap fault handler is
> > held during the actual lookup, and for a couple of branch instructions
> > afterwards.
> >
> > In latest Xen, with lock removed for get_gfn, on which lock is time spent?
> Still the p2m_lock.

Can you please try the attached patch? I think you'll need this one plus the ones that take the locks out of the hpet code.

This patch makes the p2m lock into an rwlock and adjusts a number of the paths that don't update the p2m so they only take the read lock. It's a bit rough, but I can boot a 16-way win7 guest with it.

N.B. Since rwlocks don't show up in the existing lock profiling, please don't try to use the lock-profiling numbers to see if it's helping!

Andres, this is basically the big-hammer version of your "take a pagecount" changes, plus the change you made to hvmemul_rep_movs(). If this works I intend to follow it up with a patch to make some of the read-modify-write paths avoid taking the lock (by using a compare-exchange operation so they only take the lock on a write). If that succeeds I might drop put_gfn() altogether.

But first it will need a lot of tidying up. Noticeably missing:
 - SVM code equivalents to the vmx.c changes
 - grant-table operations still use the lock, because frankly I
   could not follow the current code, and it's quite late in the evening.
I also have a long list of uglinesses in the mm code that I found while writing this lot.

Keir, I have no objection to later replacing this with something better than an rwlock. :) Or with making a NUMA-friendly rwlock implementation, since I really expect this to be heavily read-mostly when paging/sharing/pod are not enabled.

Cheers,

Tim.
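For anyone who wants the shape of the change without reading the whole patch: lookups share a read lock while p2m updates take the write lock exclusively, so the hot EPT-violation/MMIO lookup paths no longer serialise against each other. A minimal userspace sketch of that pattern follows, using POSIX rwlocks and an invented gfn->mfn table; the names are illustrative and are not Xen's p2m API.

#include <pthread.h>

#define NR_SLOTS 1024

static pthread_rwlock_t p2m_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long table[NR_SLOTS];             /* toy gfn -> mfn mapping */

/* Hot path (e.g. a lookup during an EPT violation): readers run concurrently. */
unsigned long lookup_entry(unsigned long gfn)
{
    unsigned long mfn;
    pthread_rwlock_rdlock(&p2m_rwlock);
    mfn = table[gfn % NR_SLOTS];
    pthread_rwlock_unlock(&p2m_rwlock);
    return mfn;
}

/* Cold path (an actual p2m update): the writer excludes everyone. */
void update_entry(unsigned long gfn, unsigned long mfn)
{
    pthread_rwlock_wrlock(&p2m_rwlock);
    table[gfn % NR_SLOTS] = mfn;
    pthread_rwlock_unlock(&p2m_rwlock);
}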
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Friday, April 27, 2012 5:26 AM
> To: Zhang, Yang Z
> Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] lock in vhpet
>
> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote:
> > > > But actually, the first cs introduced this issue is 24770. When
> > > > win8 booting and if hpet is enabled, it will use hpet as the time
> > > > source and there have lots of hpet access and EPT violation. In
> > > > EPT violation handler, it call get_gfn_type_access to get the mfn.
> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then the issue happens.
> > > > After I removed the gfn_lock, the issue goes. But in latest xen,
> > > > even I remove this lock, it still shows high cpu utilization.
> > >
> > > It would seem then that even the briefest lock-protected critical
> > > section would cause this? In the mmio case, the p2m lock taken in
> > > the hap fault handler is held during the actual lookup, and for a
> > > couple of branch instructions afterwards.
> > >
> > > In latest Xen, with lock removed for get_gfn, on which lock is time spent?
> > Still the p2m_lock.
>
> Can you please try the attached patch? I think you'll need this one plus the
> ones that take the locks out of the hpet code.
>
> This patch makes the p2m lock into an rwlock and adjusts a number of the
> paths that don't update the p2m so they only take the read lock. It's a bit
> rough but I can boot 16-way win7 guest with it.

This is really great work! Now the win7 guest boots very fast and I never saw the BSOD again. But the changes in your patch are quite large, so I think we need more sanity testing to avoid any regressions. After you finish all the work, I'd like to do a full round of testing. :)

> N.B. Since rwlocks don't show up the the existing lock profiling, please don't try
> to use the lock-profiling numbers to see if it's helping!
>
> Andres, this is basically the big-hammer version of your "take a pagecount"
> changes, plus the change you made to hvmemul_rep_movs().
> If this works I intend to follow it up with a patch to make some of the
> read-modify-write paths avoid taking the lock (by using a compare-exchange
> operation so they only take the lock on a write). If that succeeds I might drop
> put_gfn() altogether.
>
> But first it will need a lot of tidying up. Noticeably missing:
>  - SVM code equivalents to the vmx.c changes
>  - grant-table operations still use the lock, because frankly I
>    could not follow the current code, and it's quite late in the evening.
> I also have a long list of uglinesses in the mm code that I found while writing
> this lot.
>
> Keir, I have no objection to later replacing this with something better than an
> rwlock. :) Or with making a NUMA-friendly rwlock implementation, since I
> really expect this to be heavily read-mostly when paging/sharing/pod are not
> enabled.
>
> Cheers,
>
> Tim.

best regards
yang
>> -----Original Message----- >> From: Tim Deegan [mailto:tim@xen.org] >> Sent: Friday, April 27, 2012 5:26 AM >> To: Zhang, Yang Z >> Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com >> Subject: Re: [Xen-devel] lock in vhpet >> >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: >> > > > But actually, the first cs introduced this issue is 24770. When >> > > > win8 booting and if hpet is enabled, it will use hpet as the time >> > > > source and there have lots of hpet access and EPT violation. In >> > > > EPT violation handler, it call get_gfn_type_access to get the mfn. >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then the >> issue >> happens. >> > > > After I removed the gfn_lock, the issue goes. But in latest xen, >> > > > even I remove this lock, it still shows high cpu utilization. >> > > >> > > It would seem then that even the briefest lock-protected critical >> > > section would cause this? In the mmio case, the p2m lock taken in >> > > the hap fault handler is held during the actual lookup, and for a >> > > couple of branch instructions afterwards. >> > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is time >> spent? >> > Still the p2m_lock. >> >> Can you please try the attached patch? I think you''ll need this one >> plus the >> ones that take the locks out of the hpet code. >> >> This patch makes the p2m lock into an rwlock and adjusts a number of the >> paths that don''t update the p2m so they only take the read lock. It''s a >> bit >> rough but I can boot 16-way win7 guest with it.That is great news. Tim, thanks for the amazing work. I''m poring over the patch presently, and will shoot at it with everything I''ve got testing-wise. I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as the changeset affects those subsystems. Andres> > This really a great work! Now, the win7 guest is booting very fast and > never saw the BSOD again. > But the changes are so large in your patch. I think we need to do more > sanity testing to avoid any regressions. After you finish all the work, > I''d like to do a whole testing.:) > >> N.B. Since rwlocks don''t show up the the existing lock profiling, please >> don''t try >> to use the lock-profiling numbers to see if it''s helping! >> >> Andres, this is basically the big-hammer version of your "take a >> pagecount" >> changes, plus the change you made to hvmemul_rep_movs(). >> If this works I intend to follow it up with a patch to make some of the >> read-modify-write paths avoid taking the lock (by using a >> compare-exchange >> operation so they only take the lock on a write). If that succeeds I >> might drop >> put_gfn() altogether. >> >> But first it will need a lot of tidying up. Noticeably missing: >> - SVM code equivalents to the vmx.c changes >> - grant-table operations still use the lock, because frankly I >> could not follow the current code, and it''s quite late in the >> evening. >> I also have a long list of uglinesses in the mm code that I found while >> writing >> this lot. >> >> Keir, I have no objection to later replacing this with something better >> than an >> rwlock. :) Or with making a NUMA-friendly rwlock implementation, since >> I >> really expect this to be heavily read-mostly when paging/sharing/pod are >> not >> enabled. >> >> Cheers, >> >> Tim. > > best regards > yang >
> -----Original Message----- > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > Sent: Friday, April 27, 2012 8:52 AM > To: Zhang, Yang Z > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; olaf@aepfle.de; > George.Dunlap@eu.citrix.com > Subject: RE: [Xen-devel] lock in vhpet > > >> -----Original Message----- > >> From: Tim Deegan [mailto:tim@xen.org] > >> Sent: Friday, April 27, 2012 5:26 AM > >> To: Zhang, Yang Z > >> Cc: andres@lagarcavilla.org; Keir Fraser; > >> xen-devel@lists.xensource.com > >> Subject: Re: [Xen-devel] lock in vhpet > >> > >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > >> > > > But actually, the first cs introduced this issue is 24770. When > >> > > > win8 booting and if hpet is enabled, it will use hpet as the > >> > > > time source and there have lots of hpet access and EPT > >> > > > violation. In EPT violation handler, it call get_gfn_type_access to get > the mfn. > >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and then > >> > > > the > >> issue > >> happens. > >> > > > After I removed the gfn_lock, the issue goes. But in latest > >> > > > xen, even I remove this lock, it still shows high cpu utilization. > >> > > > >> > > It would seem then that even the briefest lock-protected critical > >> > > section would cause this? In the mmio case, the p2m lock taken in > >> > > the hap fault handler is held during the actual lookup, and for a > >> > > couple of branch instructions afterwards. > >> > > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is > >> > > time > >> spent? > >> > Still the p2m_lock. > >> > >> Can you please try the attached patch? I think you''ll need this one > >> plus the ones that take the locks out of the hpet code. > >> > >> This patch makes the p2m lock into an rwlock and adjusts a number of > >> the paths that don''t update the p2m so they only take the read lock. > >> It''s a bit rough but I can boot 16-way win7 guest with it. > > That is great news. > > Tim, thanks for the amazing work. I''m poring over the patch presently, and will > shoot at it with everything I''ve got testing-wise. > > I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as the > changeset affects those subsystems.But win8 guest shows BSOD with 32 VCPUs. :( The reason of BSOD is : SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (ACPI.sys) best regards yang
> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: >> > > But actually, the first cs introduced this issue is 24770. When win8 >> > > booting and if hpet is enabled, it will use hpet as the time source >> > > and there have lots of hpet access and EPT violation. In EPT >> violation >> > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 >> > > introduces the gfn_lock for p2m lookups, and then the issue happens. >> > > After I removed the gfn_lock, the issue goes. But in latest xen, >> even >> > > I remove this lock, it still shows high cpu utilization. >> > >> > It would seem then that even the briefest lock-protected critical >> section would >> > cause this? In the mmio case, the p2m lock taken in the hap fault >> handler is >> > held during the actual lookup, and for a couple of branch instructions >> > afterwards. >> > >> > In latest Xen, with lock removed for get_gfn, on which lock is time >> spent? >> Still the p2m_lock. > > Can you please try the attached patch? I think you''ll need this one > plus the ones that take the locks out of the hpet code.Right off the bat I''m getting a multitude of (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 And a hung dom0 during initramfs. I''m a little baffled as to why, but it''s there (32 bit dom0, XenServer6).> > This patch makes the p2m lock into an rwlock and adjusts a number of the > paths that don''t update the p2m so they only take the read lock. It''s a > bit rough but I can boot 16-way win7 guest with it. > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > don''t try to use the lock-profiling numbers to see if it''s helping! > > Andres, this is basically the big-hammer version of your "take a > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > If this works I intend to follow it up with a patch to make some of the > read-modify-write paths avoid taking the lock (by using a > compare-exchange operation so they only take the lock on a write). If > that succeeds I might drop put_gfn() altogether.You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn. But I guess those could lock the p2m up-front if they become the only consumers of put_gfn left.> > But first it will need a lot of tidying up. Noticeably missing: > - SVM code equivalents to the vmx.c changesload_pdptrs doesn''t exist on svm. There are a couple of debugging get_gfn_query that can be made unlocked. And svm_vmcb_restore needs similar treatment to what you did in vmx.c. The question is who will be able to test it...> - grant-table operations still use the lock, because frankly I > could not follow the current code, and it''s quite late in the evening.It''s pretty complex with serious nesting, and ifdef''s for arm and 32 bit. gfn_to_mfn_private callers will suffer from altering the current meaning, as put_gfn resolves to the right thing for the ifdef''ed arch. The other user is grant_transfer which also relies on the page *not* having an extra ref in steal_page. So it''s a prime candidate to be left alone.> I also have a long list of uglinesses in the mm code that I foundUh, ugly stuff, how could that have happened? I have a few preliminary observations on the patch. Pasting relevant bits here, since the body of the patch seems to have been lost by the email thread: @@ -977,23 +976,25 @@ int arch_set_info_guest( ... 
+ + if (!paging_mode_refcounts(d) + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) replace with && !get_page_type() ) @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( gfn = addr >> PAGE_SHIFT; } - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and same logic when checking p2m_is_shared). Not truly related to your patch bit since we''re at it. Same, further down - if ( !p2m_is_ram(p2mt) ) + if ( !page ) { - put_gfn(curr->domain, gfn); + if ( page ) + put_page(page); Last two lines are redundant @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE case HVMOP_modified_memory: a lot of error checking has been removed. At the very least: if ( page ) { ... } else { rc = -EINVAL; goto param_fail3; } arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing checking for target gfns of mmu updates of l2/3/4 entries. It seems that this wouldn''t work anyways, that''s why you killed it? +++ b/xen/arch/x86/mm/hap/guest_walk.c @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA ... + if ( !top_page ) { pfec[0] &= ~PFEC_page_present; - __put_gfn(p2m, top_gfn); + put_page(top_page); top_page is NULL here, remove put_page get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since locking is already done by get_gfn_type_access/__put_gfn (hope those observations made sense without inlining them in the actual patch) Andres
> -----Original Message----- > From: Zhang, Yang Z > Sent: Friday, April 27, 2012 9:25 AM > To: andres@lagarcavilla.org > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; olaf@aepfle.de; > George.Dunlap@eu.citrix.com > Subject: RE: [Xen-devel] lock in vhpet > > > > -----Original Message----- > > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > > Sent: Friday, April 27, 2012 8:52 AM > > To: Zhang, Yang Z > > Cc: Tim Deegan; Keir Fraser; xen-devel@lists.xensource.com; > > olaf@aepfle.de; George.Dunlap@eu.citrix.com > > Subject: RE: [Xen-devel] lock in vhpet > > > > >> -----Original Message----- > > >> From: Tim Deegan [mailto:tim@xen.org] > > >> Sent: Friday, April 27, 2012 5:26 AM > > >> To: Zhang, Yang Z > > >> Cc: andres@lagarcavilla.org; Keir Fraser; > > >> xen-devel@lists.xensource.com > > >> Subject: Re: [Xen-devel] lock in vhpet > > >> > > >> At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > >> > > > But actually, the first cs introduced this issue is 24770. > > >> > > > When > > >> > > > win8 booting and if hpet is enabled, it will use hpet as the > > >> > > > time source and there have lots of hpet access and EPT > > >> > > > violation. In EPT violation handler, it call > > >> > > > get_gfn_type_access to get > > the mfn. > > >> > > > The cs 24770 introduces the gfn_lock for p2m lookups, and > > >> > > > then the > > >> issue > > >> happens. > > >> > > > After I removed the gfn_lock, the issue goes. But in latest > > >> > > > xen, even I remove this lock, it still shows high cpu utilization. > > >> > > > > >> > > It would seem then that even the briefest lock-protected > > >> > > critical section would cause this? In the mmio case, the p2m > > >> > > lock taken in the hap fault handler is held during the actual > > >> > > lookup, and for a couple of branch instructions afterwards. > > >> > > > > >> > > In latest Xen, with lock removed for get_gfn, on which lock is > > >> > > time > > >> spent? > > >> > Still the p2m_lock. > > >> > > >> Can you please try the attached patch? I think you''ll need this > > >> one plus the ones that take the locks out of the hpet code. > > >> > > >> This patch makes the p2m lock into an rwlock and adjusts a number > > >> of the paths that don''t update the p2m so they only take the read lock. > > >> It''s a bit rough but I can boot 16-way win7 guest with it. > > > > That is great news. > > > > Tim, thanks for the amazing work. I''m poring over the patch presently, > > and will shoot at it with everything I''ve got testing-wise. > > > > I''m taking the liberty of pulling in Olaf (paging) and George (PoD) as > > the changeset affects those subsystems. > > But win8 guest shows BSOD with 32 VCPUs. :( The reason of BSOD is : > SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (ACPI.sys) >Um....., I find this issue is related to xl not hypervisor. Will send a patch to fix it later. yang
At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:> > Can you please try the attached patch? I think you''ll need this one > > plus the ones that take the locks out of the hpet code. > > Right off the bat I''m getting a multitude of > (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 > And a hung dom0 during initramfs. I''m a little baffled as to why, but it''s > there (32 bit dom0, XenServer6).Curses, I knew there''d be one somewhere. I''ve been replacing get_page_and_type_from_pagenr()s (which return 0 for success) with old-school get_page_type()s (which return 1 for success) and not always getting the right number of inversions. That''s a horrible horrible beartrap of an API, BTW, which had me cursing at the screen, but I had enough on my plate yesterday without touching _that_ code too!> > Andres, this is basically the big-hammer version of your "take a > > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > > If this works I intend to follow it up with a patch to make some of the > > read-modify-write paths avoid taking the lock (by using a > > compare-exchange operation so they only take the lock on a write). If > > that succeeds I might drop put_gfn() altogether. > > You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. > There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn. > But I guess those could lock the p2m up-front if they become the only > consumers of put_gfn left.Well, that''s more or less what happens now. I was thinking of replacing any remaining (implicit) lock ; read ; think a bit ; maybe write ; unlock code with the fast-path-friendlier: read ; think ; maybe-cmpxchg (and on failure undo or retry which avoids taking the write lock altogether if there''s no work to do. But maybe there aren''t many of those left now. Obviously any path which will always write should just take the write-lock first.> > - grant-table operations still use the lock, because frankly I > > could not follow the current code, and it''s quite late in the evening. > > It''s pretty complex with serious nesting, and ifdef''s for arm and 32 bit. > gfn_to_mfn_private callers will suffer from altering the current meaning, > as put_gfn resolves to the right thing for the ifdef''ed arch. The other > user is grant_transfer which also relies on the page *not* having an extra > ref in steal_page. So it''s a prime candidate to be left alone.Sadly, I think it''s not. The PV backends will be doing lots of grant ops, which shouldn''t get serialized against all other P2M lookups.> > I also have a long list of uglinesses in the mm code that I found > > Uh, ugly stuff, how could that have happened?I can''t imagine. :) Certainly nothing to do with me thinking "I''ll clean that up when I get some time."> I have a few preliminary observations on the patch. Pasting relevant bits > here, since the body of the patch seems to have been lost by the email > thread: > > @@ -977,23 +976,25 @@ int arch_set_info_guest( > ... > + > + if (!paging_mode_refcounts(d) > + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) > replace with && !get_page_type() )Yep.> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( > gfn = addr >> PAGE_SHIFT; > } > > - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); > + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); > replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and > same logic when checking p2m_is_shared). 
Not truly related to your patch > bit since we''re at it.OK, but not in this patch.> Same, further down > - if ( !p2m_is_ram(p2mt) ) > + if ( !page ) > { > - put_gfn(curr->domain, gfn); > + if ( page ) > + put_page(page); > Last two lines are redundantYep.> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE > case HVMOP_modified_memory: a lot of error checking has been removed.Yes, but it was bogus - there''s a race between the actual modification and the call, during which anything might have happened. The best we can do is throw log-dirty bits at everything, and the caller can''t do anything with the error anyway. When I come to tidy up I''ll just add a new mark_gfn_dirty function and skip the pointless gfn->mfn->gfn translation on this path.> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing checking > for target gfns of mmu updates of l2/3/4 entries. It seems that this > wouldn''t work anyways, that''s why you killed it?Yeah - since only L1es can point at foreign mappings it was all just noise, and even if there had been real p2m lookups on those paths there was no equivalent to the translate-in-place that happens in mod_l1_entry so it would have been broken in a much worse way.> +++ b/xen/arch/x86/mm/hap/guest_walk.c > @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA > ... > + if ( !top_page ) > { > pfec[0] &= ~PFEC_page_present; > - __put_gfn(p2m, top_gfn); > + put_page(top_page); > top_page is NULL here, remove put_pageYep.> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since > locking is already done by get_gfn_type_access/__put_gfnYeah, but I was writing that with half an eye on killing that lock. :) I''ll drop them for now.> (hope those observations made sense without inlining them in the actual > patch)Yes, absolutely - thanks for the review! If we can get this to work well enough I''ll tidy it up into a sensible series next week. In the meantime, an updated verison of the monster patch is attached. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
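To make the "read ; think ; maybe-cmpxchg" plan above concrete, here is a small C11 sketch of the retry loop. The entry layout and the flag are invented for illustration; real p2m entries and their accessors look different. The point is that when there is nothing to change, the fast path does no write and takes no lock at all.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define ENTRY_FLAG 0x1UL   /* hypothetical bit we want to set in an entry */

static bool set_flag_fast(_Atomic unsigned long *entry)
{
    unsigned long old = atomic_load(entry);              /* read */

    for ( ; ; )
    {
        if ( old & ENTRY_FLAG )                          /* think: nothing to do */
            return false;

        unsigned long newval = old | ENTRY_FLAG;         /* maybe-cmpxchg */
        if ( atomic_compare_exchange_weak(entry, &old, newval) )
            return true;
        /* Lost a race (or a spurious failure): 'old' now holds the fresh
         * value from the failed exchange, so just go round again. */
    }
}

int main(void)
{
    _Atomic unsigned long e = 0x10UL;
    printf("first call wrote:  %d\n", set_flag_fast(&e)); /* 1: flag was clear */
    printf("second call wrote: %d\n", set_flag_fast(&e)); /* 0: fast path, no write */
    return 0;
}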
> At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote: >> > Can you please try the attached patch? I think you''ll need this one >> > plus the ones that take the locks out of the hpet code. >> >> Right off the bat I''m getting a multitude of >> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7 >> And a hung dom0 during initramfs. I''m a little baffled as to why, but >> it''s >> there (32 bit dom0, XenServer6). > > Curses, I knew there''d be one somewhere. I''ve been replacing > get_page_and_type_from_pagenr()s (which return 0 for success) with > old-school get_page_type()s (which return 1 for success) and not always > getting the right number of inversions. That''s a horrible horrible > beartrap of an API, BTW, which had me cursing at the screen, but I had > enough on my plate yesterday without touching _that_ code too!Found more bugs. Some were predating this (xsm not calling put_gfn after get_gfn_untyped, etc). Some were new (get_gfn_untyped not arch independent). I will be sending you shortly a small patch series on top of your mosnter patch purely for RFC. Feel free to merge all, pick out bug fixes, etc. Andres> >> > Andres, this is basically the big-hammer version of your "take a >> > pagecount" changes, plus the change you made to hvmemul_rep_movs(). >> > If this works I intend to follow it up with a patch to make some of >> the >> > read-modify-write paths avoid taking the lock (by using a >> > compare-exchange operation so they only take the lock on a write). If >> > that succeeds I might drop put_gfn() altogether. >> >> You mean cmpxchg the whole p2m entry? I don''t think I parse the plan. >> There are code paths that do get_gfn_query -> p2m_change_type -> >> put_gfn. >> But I guess those could lock the p2m up-front if they become the only >> consumers of put_gfn left. > > Well, that''s more or less what happens now. I was thinking of replacing > any remaining > > (implicit) lock ; read ; think a bit ; maybe write ; unlock > > code with the fast-path-friendlier: > > read ; think ; maybe-cmpxchg (and on failure undo or retry > > which avoids taking the write lock altogether if there''s no work to do. > But maybe there aren''t many of those left now. Obviously any path > which will always write should just take the write-lock first. > >> > - grant-table operations still use the lock, because frankly I >> > could not follow the current code, and it''s quite late in the >> evening. >> >> It''s pretty complex with serious nesting, and ifdef''s for arm and 32 >> bit. >> gfn_to_mfn_private callers will suffer from altering the current >> meaning, >> as put_gfn resolves to the right thing for the ifdef''ed arch. The other >> user is grant_transfer which also relies on the page *not* having an >> extra >> ref in steal_page. So it''s a prime candidate to be left alone. > > Sadly, I think it''s not. The PV backends will be doing lots of grant > ops, which shouldn''t get serialized against all other P2M lookups. > >> > I also have a long list of uglinesses in the mm code that I found >> >> Uh, ugly stuff, how could that have happened? > > I can''t imagine. :) Certainly nothing to do with me thinking "I''ll > clean that up when I get some time." > >> I have a few preliminary observations on the patch. Pasting relevant >> bits >> here, since the body of the patch seems to have been lost by the email >> thread: >> >> @@ -977,23 +976,25 @@ int arch_set_info_guest( >> ... 
>> + >> + if (!paging_mode_refcounts(d) >> + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) >> replace with && !get_page_type() ) > > Yep. > >> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( >> gfn = addr >> PAGE_SHIFT; >> } >> >> - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); >> + page = get_page_from_gfn(curr->domain, gfn, &p2mt, >> P2M_UNSHARE); >> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and >> same logic when checking p2m_is_shared). Not truly related to your patch >> bit since we''re at it. > > OK, but not in this patch. > >> Same, further down >> - if ( !p2m_is_ram(p2mt) ) >> + if ( !page ) >> { >> - put_gfn(curr->domain, gfn); >> + if ( page ) >> + put_page(page); >> Last two lines are redundant > > Yep. > >> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE >> case HVMOP_modified_memory: a lot of error checking has been >> removed. > > Yes, but it was bogus - there''s a race between the actual modification > and the call, during which anything might have happened. The best we > can do is throw log-dirty bits at everything, and the caller can''t do > anything with the error anyway. > > When I come to tidy up I''ll just add a new mark_gfn_dirty function > and skip the pointless gfn->mfn->gfn translation on this path. > >> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing >> checking >> for target gfns of mmu updates of l2/3/4 entries. It seems that this >> wouldn''t work anyways, that''s why you killed it? > > Yeah - since only L1es can point at foreign mappings it was all just > noise, and even if there had been real p2m lookups on those paths there > was no equivalent to the translate-in-place that happens in > mod_l1_entry so it would have been broken in a much worse way. > >> +++ b/xen/arch/x86/mm/hap/guest_walk.c >> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA >> ... >> + if ( !top_page ) >> { >> pfec[0] &= ~PFEC_page_present; >> - __put_gfn(p2m, top_gfn); >> + put_page(top_page); >> top_page is NULL here, remove put_page > > Yep. > >> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since >> locking is already done by get_gfn_type_access/__put_gfn > > Yeah, but I was writing that with half an eye on killing that lock. :) > I''ll drop them for now. > >> (hope those observations made sense without inlining them in the actual >> patch) > > Yes, absolutely - thanks for the review! > > If we can get this to work well enough I''ll tidy it up into a sensible > series next week. In the meantime, an updated verison of the > monster patch is attached. > > Cheers, > > Tim. >
> At 20:02 -0700 on 26 Apr (1335470547), Andres Lagar-Cavilla wrote:
>> > Can you please try the attached patch? I think you'll need this one
>> > plus the ones that take the locks out of the hpet code.
>>
>> Right off the bat I'm getting a multitude of
>> (XEN) mm.c:3294:d0 Error while clearing mfn 100cbb7
>> And a hung dom0 during initramfs. I'm a little baffled as to why, but
>> it's there (32 bit dom0, XenServer6).
>
> Curses, I knew there'd be one somewhere. I've been replacing
> get_page_and_type_from_pagenr()s (which return 0 for success) with
> old-school get_page_type()s (which return 1 for success) and not always
> getting the right number of inversions. That's a horrible horrible
> beartrap of an API, BTW, which had me cursing at the screen, but I had
> enough on my plate yesterday without touching _that_ code too!
>

I am now quite pleased with the testing results on our end. I have a four-patch series to top up your monster patch, which I'll submit shortly. I encourage everyone interested to test this (obviously a lot of code is touched). Including AMD, as I've expanded the code to touch svm too.

>> > Andres, this is basically the big-hammer version of your "take a
>> > pagecount" changes, plus the change you made to hvmemul_rep_movs().
>> > If this works I intend to follow it up with a patch to make some of the
>> > read-modify-write paths avoid taking the lock (by using a
>> > compare-exchange operation so they only take the lock on a write). If
>> > that succeeds I might drop put_gfn() altogether.
>>
>> You mean cmpxchg the whole p2m entry? I don't think I parse the plan.
>> There are code paths that do get_gfn_query -> p2m_change_type -> put_gfn.
>> But I guess those could lock the p2m up-front if they become the only
>> consumers of put_gfn left.
>
> Well, that's more or less what happens now. I was thinking of replacing
> any remaining
>
>   (implicit) lock ; read ; think a bit ; maybe write ; unlock
>
> code with the fast-path-friendlier:
>
>   read ; think ; maybe-cmpxchg (and on failure undo or retry)
>
> which avoids taking the write lock altogether if there's no work to do.
> But maybe there aren't many of those left now. Obviously any path
> which will always write should just take the write-lock first.

After my four patches there aren't really any paths like the above left (exhaustion disclaimer). I believe one or two iterative paths (like HVMOP_set_mem_type) could be optimized to take the p2m lock up front, instead of many get_gfn's. The nice thing about get_gfn/put_gfn is that it will allow for a seamless (har har) transition to a fine-grained p2m. But then maybe we won't ever need that with a p2m rwlock.

>> > - grant-table operations still use the lock, because frankly I
>> > could not follow the current code, and it's quite late in the evening.
>>
>> It's pretty complex with serious nesting, and ifdef's for arm and 32 bit.
>> gfn_to_mfn_private callers will suffer from altering the current meaning,
>> as put_gfn resolves to the right thing for the ifdef'ed arch. The other
>> user is grant_transfer which also relies on the page *not* having an extra
>> ref in steal_page. So it's a prime candidate to be left alone.
>
> Sadly, I think it's not. The PV backends will be doing lots of grant
> ops, which shouldn't get serialized against all other P2M lookups.

Those are addressed in my patch series now, which should cause waves of panic.
Andres>> > I also have a long list of uglinesses in the mm code that I found >> >> Uh, ugly stuff, how could that have happened? > > I can''t imagine. :) Certainly nothing to do with me thinking "I''ll > clean that up when I get some time." > >> I have a few preliminary observations on the patch. Pasting relevant >> bits >> here, since the body of the patch seems to have been lost by the email >> thread: >> >> @@ -977,23 +976,25 @@ int arch_set_info_guest( >> ... >> + >> + if (!paging_mode_refcounts(d) >> + && !get_page_and_type(cr3_page, d, PGT_l3_page_table) ) >> replace with && !get_page_type() ) > > Yep. > >> @@ -2404,32 +2373,33 @@ static enum hvm_copy_result __hvm_copy( >> gfn = addr >> PAGE_SHIFT; >> } >> >> - mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); >> + page = get_page_from_gfn(curr->domain, gfn, &p2mt, >> P2M_UNSHARE); >> replace with (flags & HVMCOPY_to_guest) ? P2M_UNSHARE : P2M_ALLOC (and >> same logic when checking p2m_is_shared). Not truly related to your patch >> bit since we''re at it. > > OK, but not in this patch. > >> Same, further down >> - if ( !p2m_is_ram(p2mt) ) >> + if ( !page ) >> { >> - put_gfn(curr->domain, gfn); >> + if ( page ) >> + put_page(page); >> Last two lines are redundant > > Yep. > >> @@ -4019,35 +3993,16 @@ long do_hvm_op(unsigned long op, XEN_GUE >> case HVMOP_modified_memory: a lot of error checking has been >> removed. > > Yes, but it was bogus - there''s a race between the actual modification > and the call, during which anything might have happened. The best we > can do is throw log-dirty bits at everything, and the caller can''t do > anything with the error anyway. > > When I come to tidy up I''ll just add a new mark_gfn_dirty function > and skip the pointless gfn->mfn->gfn translation on this path. > >> arch/x86/mm.c:do_mmu_update -> you blew up all the paging/sharing >> checking >> for target gfns of mmu updates of l2/3/4 entries. It seems that this >> wouldn''t work anyways, that''s why you killed it? > > Yeah - since only L1es can point at foreign mappings it was all just > noise, and even if there had been real p2m lookups on those paths there > was no equivalent to the translate-in-place that happens in > mod_l1_entry so it would have been broken in a much worse way. > >> +++ b/xen/arch/x86/mm/hap/guest_walk.c >> @@ -54,34 +54,37 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA >> ... >> + if ( !top_page ) >> { >> pfec[0] &= ~PFEC_page_present; >> - __put_gfn(p2m, top_gfn); >> + put_page(top_page); >> top_page is NULL here, remove put_page > > Yep. > >> get_page_from_gfn_p2m, slow path: no need for p2m_lock/unlock since >> locking is already done by get_gfn_type_access/__put_gfn > > Yeah, but I was writing that with half an eye on killing that lock. :) > I''ll drop them for now. > >> (hope those observations made sense without inlining them in the actual >> patch) > > Yes, absolutely - thanks for the review! > > If we can get this to work well enough I''ll tidy it up into a sensible > series next week. In the meantime, an updated verison of the > monster patch is attached. > > Cheers, > > Tim. >
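Andres's remark about iterative paths such as HVMOP_set_mem_type amounts to amortising the lock: take it once around the whole loop instead of a get_gfn/put_gfn (lock/unlock) pair per gfn. A toy sketch of that idea, with a plain mutex and an invented type table standing in for the real p2m structures:

#include <pthread.h>

#define NR_ENTRIES 1024

static pthread_mutex_t p2m_mutex = PTHREAD_MUTEX_INITIALIZER;
static int entry_type[NR_ENTRIES];                /* toy per-gfn type table */

/* Batched variant: one lock acquisition covers the whole range. */
void set_type_batch(unsigned long first_gfn, unsigned long nr, int new_type)
{
    pthread_mutex_lock(&p2m_mutex);               /* once, up front */
    for ( unsigned long i = 0; i < nr && first_gfn + i < NR_ENTRIES; i++ )
        entry_type[first_gfn + i] = new_type;     /* no per-gfn lock traffic */
    pthread_mutex_unlock(&p2m_mutex);
}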
Hi tim, Did the attached patch apply to upstream xen? I tried the latest xen and still saw the high cpu utilization. best regards yang> -----Original Message----- > From: Tim Deegan [mailto:tim@xen.org] > Sent: Friday, April 27, 2012 5:26 AM > To: Zhang, Yang Z > Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] lock in vhpet > > At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > > > But actually, the first cs introduced this issue is 24770. When win8 > > > > booting and if hpet is enabled, it will use hpet as the time source > > > > and there have lots of hpet access and EPT violation. In EPT violation > > > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 > > > > introduces the gfn_lock for p2m lookups, and then the issue happens. > > > > After I removed the gfn_lock, the issue goes. But in latest xen, even > > > > I remove this lock, it still shows high cpu utilization. > > > > > > It would seem then that even the briefest lock-protected critical section > would > > > cause this? In the mmio case, the p2m lock taken in the hap fault handler is > > > held during the actual lookup, and for a couple of branch instructions > > > afterwards. > > > > > > In latest Xen, with lock removed for get_gfn, on which lock is time spent? > > Still the p2m_lock. > > Can you please try the attached patch? I think you''ll need this one > plus the ones that take the locks out of the hpet code. > > This patch makes the p2m lock into an rwlock and adjusts a number of the > paths that don''t update the p2m so they only take the read lock. It''s a > bit rough but I can boot 16-way win7 guest with it. > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > don''t try to use the lock-profiling numbers to see if it''s helping! > > Andres, this is basically the big-hammer version of your "take a > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > If this works I intend to follow it up with a patch to make some of the > read-modify-write paths avoid taking the lock (by using a > compare-exchange operation so they only take the lock on a write). If > that succeeds I might drop put_gfn() altogether. > > But first it will need a lot of tidying up. Noticeably missing: > - SVM code equivalents to the vmx.c changes > - grant-table operations still use the lock, because frankly I > could not follow the current code, and it''s quite late in the evening. > I also have a long list of uglinesses in the mm code that I found while > writing this lot. > > Keir, I have no objection to later replacing this with something better > than an rwlock. :) Or with making a NUMA-friendly rwlock > implementation, since I really expect this to be heavily read-mostly > when paging/sharing/pod are not enabled. > > Cheers, > > Tim.
Hi, At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:> Hi tim, > > Did the attached patch apply to upstream xen? I tried the latest xen > and still saw the high cpu utilization.The patch is not yet applied. It''s been cleaned up into a patch series that I posted last Thursday, and will probably be applied later this week. Cheers, Tim.> best regards > yang > > > -----Original Message----- > > From: Tim Deegan [mailto:tim@xen.org] > > Sent: Friday, April 27, 2012 5:26 AM > > To: Zhang, Yang Z > > Cc: andres@lagarcavilla.org; Keir Fraser; xen-devel@lists.xensource.com > > Subject: Re: [Xen-devel] lock in vhpet > > > > At 02:36 +0000 on 25 Apr (1335321409), Zhang, Yang Z wrote: > > > > > But actually, the first cs introduced this issue is 24770. When win8 > > > > > booting and if hpet is enabled, it will use hpet as the time source > > > > > and there have lots of hpet access and EPT violation. In EPT violation > > > > > handler, it call get_gfn_type_access to get the mfn. The cs 24770 > > > > > introduces the gfn_lock for p2m lookups, and then the issue happens. > > > > > After I removed the gfn_lock, the issue goes. But in latest xen, even > > > > > I remove this lock, it still shows high cpu utilization. > > > > > > > > It would seem then that even the briefest lock-protected critical section > > would > > > > cause this? In the mmio case, the p2m lock taken in the hap fault handler is > > > > held during the actual lookup, and for a couple of branch instructions > > > > afterwards. > > > > > > > > In latest Xen, with lock removed for get_gfn, on which lock is time spent? > > > Still the p2m_lock. > > > > Can you please try the attached patch? I think you''ll need this one > > plus the ones that take the locks out of the hpet code. > > > > This patch makes the p2m lock into an rwlock and adjusts a number of the > > paths that don''t update the p2m so they only take the read lock. It''s a > > bit rough but I can boot 16-way win7 guest with it. > > > > N.B. Since rwlocks don''t show up the the existing lock profiling, please > > don''t try to use the lock-profiling numbers to see if it''s helping! > > > > Andres, this is basically the big-hammer version of your "take a > > pagecount" changes, plus the change you made to hvmemul_rep_movs(). > > If this works I intend to follow it up with a patch to make some of the > > read-modify-write paths avoid taking the lock (by using a > > compare-exchange operation so they only take the lock on a write). If > > that succeeds I might drop put_gfn() altogether. > > > > But first it will need a lot of tidying up. Noticeably missing: > > - SVM code equivalents to the vmx.c changes > > - grant-table operations still use the lock, because frankly I > > could not follow the current code, and it''s quite late in the evening. > > I also have a long list of uglinesses in the mm code that I found while > > writing this lot. > > > > Keir, I have no objection to later replacing this with something better > > than an rwlock. :) Or with making a NUMA-friendly rwlock > > implementation, since I really expect this to be heavily read-mostly > > when paging/sharing/pod are not enabled. > > > > Cheers, > > > > Tim. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Hi,

At 13:36 +0100 on 16 May (1337175361), Tim Deegan wrote:
> At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:
> > Did the attached patch apply to upstream xen? I tried the latest xen
> > and still saw the high cpu utilization.
>
> The patch is not yet applied. It's been cleaned up into a patch series
> that I posted last Thursday, and will probably be applied later this
> week.

It's now been applied to the staging tree, as csets 25350--25360.

Cheers,

Tim.
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, May 17, 2012 6:58 PM
> To: Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com; Keir Fraser; andres@lagarcavilla.org
> Subject: Re: [Xen-devel] lock in vhpet
>
> Hi,
>
> At 13:36 +0100 on 16 May (1337175361), Tim Deegan wrote:
> > At 11:36 +0000 on 16 May (1337168174), Zhang, Yang Z wrote:
> > > Did the attached patch apply to upstream xen? I tried the latest xen
> > > and still saw the high cpu utilization.
> >
> > The patch is not yet applied. It's been cleaned up into a patch series
> > that I posted last Thursday, and will probably be applied later this
> > week.
>
> It's now been applied to the staging tree, as csets 25350--25360.

It's great work! After one week of testing, we didn't find any regression with those patches.

best regards
yang