Santos, Jose Renato G

2006-Dec-06 01:35 UTC

[Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

This is a set of patches to improve the performance of find_domain_by_id().
find_domain_by_id() shows up high in profiles for network I/O intensive
workloads.
Most of the cost of this function comes from 3 main functions (of
approximately equal cost): 1) read_lock(), 2) read_unlock() and
3) get_domain().
These patches replace the lock used for accessing domain_list and
domain_hash with a lock-free RCU scheme. Experiments confirm that the
cost of find_domain_by_id() is in fact reduced by 2/3.
The patches apply cleanly to changeset 12732.
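
A minimal sketch of what a lock-free read side for the domain hash could look like, assuming C11 atomics in place of Xen's own primitives. The names `domain_insert`, `domain_lookup`, `hash_id` and the bucket count are illustrative assumptions and are not taken from the patches; deferred freeing of removed entries (the grace-period half of RCU) is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DOMAIN_HASH_SIZE 256   /* assumed bucket count, for illustration only */

struct domain {
    int domain_id;
    _Atomic(struct domain *) next;   /* next entry in the same hash bucket */
};

static _Atomic(struct domain *) domain_hash[DOMAIN_HASH_SIZE];

static unsigned int hash_id(int id)
{
    return (unsigned int)id % DOMAIN_HASH_SIZE;
}

/* Writer side: link a fully initialised domain at the head of its bucket.
 * Writers are assumed to be serialised by an update lock that is not shown. */
static void domain_insert(struct domain *d)
{
    unsigned int b = hash_id(d->domain_id);
    struct domain *head =
        atomic_load_explicit(&domain_hash[b], memory_order_relaxed);

    atomic_store_explicit(&d->next, head, memory_order_relaxed);
    /* Release store: readers that see 'd' also see its initialised fields. */
    atomic_store_explicit(&domain_hash[b], d, memory_order_release);
}

/* Reader side: no lock is taken and no shared cache line is written. */
static struct domain *domain_lookup(int id)
{
    struct domain *d =
        atomic_load_explicit(&domain_hash[hash_id(id)], memory_order_acquire);

    while (d != NULL && d->domain_id != id)
        d = atomic_load_explicit(&d->next, memory_order_acquire);
    return d;
}
```

The point of the sketch is that the read path performs only loads, so the read_lock()/read_unlock() pair and its cache-line traffic disappear; keeping a removed domain alive until all readers are done is the part the RCU callback machinery would have to provide.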

Renato

Patches:
  1/2 - Import linux RCU code into Xen
  2/2 - replace domlist_lock operations by RCU operations

Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2006-Dec-06 08:43 UTC


Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

On 6/12/06 1:35 am, "Santos, Jose Renato G"
<joserenato.santos@hp.com>
wrote:
> 
> This is a set of patches to improve performance of find_domain_by_id().
> find_domain_by_id shows up high in profiles for network I/O intensive
> workloads.
> Most of the cost for this function comes from 3 main functions (of
> aproximate equal costs): 1)read_lock(), 2)read_unlock() and
> 3)get_domain().
> These patches replace the lock used for accessing domain_list and
> domain_hash with a lock free RCU scheme. Experiments confirm that the
> cost of find_domain_by_id() is in fact reduced by 2/3.
> The patches apply cleanly to changeset 12732.
Do you have numbers for performance improvement on a macro benchmark?

 -- Keir



Emmanuel Ackaouy

2006-Dec-06 10:41 UTC


Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

I also spotted find_domain_by_id() showing up rather high
in network intensive workloads. The CPU overhead of our
network I/O path is pretty large, so it's worth trying to
address, and if I remember correctly, that one was oddly high
on the list of low-hanging fruit.

find_domain_by_id() is called from __gnttab_map_grant_ref(),
which is typically called N times on an array of grant ops
from gnttab_map_grant_ref(). Perhaps we could find a way
to optimize the common case here and only look up and hold
the domain once per op array instead of once per op in the
multi-op?
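
One way to read this suggestion, as a hedged sketch: keep the last domain that was looked up and reuse it while consecutive ops name the same domain. Everything below is a stand-in written for illustration. The `find_domain_by_id` and `put_domain` bodies are toy stubs, and `struct grant_op`, `map_grant_refs`, `OP_OK` and `OP_BAD_DOMAIN` are hypothetical names, not Xen's.

```c
#include <stddef.h>

#define OP_OK          0
#define OP_BAD_DOMAIN (-1)   /* hypothetical status value for this sketch */

struct domain   { int domain_id; int refcnt; };
struct grant_op { int dom; int status; };

/* Toy stand-ins for the real lookup and release functions. */
static struct domain domains[2] = { { 1, 0 }, { 2, 0 } };
static int lookups;          /* counts how often the lookup path runs */

static struct domain *find_domain_by_id(int id)
{
    lookups++;
    for (size_t i = 0; i < 2; i++) {
        if (domains[i].domain_id == id) {
            domains[i].refcnt++;        /* the lookup takes a reference */
            return &domains[i];
        }
    }
    return NULL;
}

static void put_domain(struct domain *d)
{
    d->refcnt--;
}

/* Process a batch, looking the domain up again only when it changes. */
static void map_grant_refs(struct grant_op *ops, size_t count)
{
    struct domain *rd = NULL;

    for (size_t i = 0; i < count; i++) {
        if (rd == NULL || rd->domain_id != ops[i].dom) {
            if (rd != NULL)
                put_domain(rd);
            rd = find_domain_by_id(ops[i].dom);
        }
        if (rd == NULL) {
            ops[i].status = OP_BAD_DOMAIN;
            continue;
        }
        ops[i].status = OP_OK;          /* per-op mapping work would go here */
    }
    if (rd != NULL)
        put_domain(rd);
}
```

In the common case where every op in the array targets the same domain, the lookup and the reference count update each happen once per batch rather than once per op.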

We could also clean up some code while we are there:

    if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
    {
     vvvvvvvvvvvvvvvvvvvvvvvv
        if ( rd != NULL )
            put_domain(rd);
     ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
        DPRINTK("Could not find domain %d\n", op->dom);
        op->status = GNTST_bad_domain;
        return;
    }
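
Inside that branch rd is already known to be NULL, so the inner check can never be true and there is nothing to release. A sketch of the cleaned-up failure path, using hypothetical stand-ins (`lookup_domain`, `struct map_op`, `BAD_DOMAIN_STATUS`, `resolve_remote_domain`) rather than the real Xen definitions:

```c
#include <stddef.h>
#include <stdio.h>

#define BAD_DOMAIN_STATUS (-1)   /* hypothetical stand-in for GNTST_bad_domain */

struct domain { int domain_id; };
struct map_op { int dom; int status; };

/* Toy lookup: only domain 1 exists in this sketch. */
static struct domain known = { 1 };

static struct domain *lookup_domain(int id)
{
    return id == known.domain_id ? &known : NULL;
}

/* Returns the remote domain, or NULL after recording the error in the op. */
static struct domain *resolve_remote_domain(struct map_op *op)
{
    struct domain *rd = lookup_domain(op->dom);

    if (rd == NULL) {
        /* The lookup failed, so no reference was taken and none is dropped. */
        fprintf(stderr, "Could not find domain %d\n", op->dom);
        op->status = BAD_DOMAIN_STATUS;
    }
    return rd;
}
```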

It's a bit puzzling to me that grabbing the lock adds such an
overhead. Is this purely lock operation overhead, or is there
contention on the lock's cache line (we could find this out by
profiling for data cache line misses)?

Cheers,
Emmanuel.


Apparao, Padmashree K

2006-Dec-06 11:52 UTC


RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

Keir,

Earlier this summer I did some experiments with Iperf and found
that find_domain_by_id was consuming 7% of CPU while do_grant_table_op was
consuming 10.5%. As a quick way to find the upper bound of the
performance gain, we removed the hash table lookups in
find_domain_by_id and saw a 12% improvement in perf receive
performance.
The results of these findings were published in our paper at VTDC.
In that paper we also discuss how inter-VM communication through grant
tables was consuming a considerable amount of time and what could
possibly be done to optimize it.

I will send the paper in a separate note to you and Renato.

Thanks
- Padma


Keir Fraser

2006-Dec-06 12:12 UTC


Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

Okay, sounds good. I don't want the whole Linux RCU mechanism in Xen
particularly. Linux may well need the complexity because it leans on RCU
quite heavily these days, but Xen will do so a whole lot less I'm sure
and something much simpler should suffice. I'll look into this -- the
patches you sent provide a good starting point, thanks!

 -- Keir


Santos, Jose Renato G

2006-Dec-07 03:49 UTC


RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

Keir,

I have already removed the following from the Linux RCU code:
1) "bh" versions of the RCU functions, not needed in Xen since
   softirqs cannot preempt the hypervisor
2) synchronization calls that block waiting for readers
   to complete (we just need the asynchronous callback)
3) support for hotplug CPUs (Xen does not support it).

Looking at the code again, the following could also be removed:
1) logic and functions associated with limiting the RCU batch size.
   This is probably not needed since we do not expect a lot of RCU
   activity
2) functions which are not being used:
    2.1) rcu_batch_after
    2.2) rcu_needs_cpu
    2.3) rcu_batches_completed

Can't think of anything else. If you want, I can remove these and submit
a revised patch.
It seems that there is not much more we could remove. Not sure if you
have something else in mind...

Regards

Renato

Santos, Jose Renato G

2006-Dec-07 04:07 UTC


RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xensource.com] 
> Sent: Wednesday, December 06, 2006 2:42 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@lists.xensource.com; Turner, Yoshio; Jose 
> Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in 
> find_domain_by_id() [0/2]
> 
> I also spotted find_domain_by_id() showing up rather high in 
> network intensive workloads. The CPU overhead of our network 
> I/O path is pretty large so it's worth trying to address and 
> if I remember, that one was oddly rather high on the list of 
> low hanging fruits.
> 
> Find_domain_by_id() is called from __gnttab_map_grant_ref() 
> which is typically called N times on an array of grant ops 
> from gnttab_map_grant_ref(). Perhaps we could find a way to 
> optimize the common case here and only lookup and hold the 
> domain once per OP array instead of once per op in the multi op?
> 
> We could also cleanup some code while there:
> 
>     if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
>     {
>      vvvvvvvvvvvvvvvvvvvvvvvv
>         if ( rd != NULL )
>             put_domain(rd);
>      ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
>         DPRINTK("Could not find domain %d\n", op->dom);
>         op->status = GNTST_bad_domain;
>         return;
>     }
> 
> It's a bit puzzling to me that grabbing the lock adds such an 
> overhead. Is this purely a lock operation overhead or is 
> there contention on the lock cache line (could find this out 
> by profiling for data cache line misses)?
> 
  Yes, this is due to cache contention on the lock.
  There is also cache contention on the domain refcnt used by
  get_domain().
  I just implemented a per-CPU version of the reference count
  that avoids cache contention, and the cost of
  find_domain_by_id() is reduced even further.

  Currently, find_domain_by_id() consumes approximately 3.05% of
  the total CPU cycles for a TCP TX micro benchmark. With the RCU
  scheme this is reduced to 1.16%, and with a per-CPU reference
  count mechanism it is reduced to 0.31%.
  I can submit a patch for the per-CPU reference count after
  I clean up the code a bit.
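
The per-CPU patch itself is not shown in this thread, so the following is only a sketch of the general technique under stated assumptions: one counter slot per CPU, each padded to its own cache line, with the true reference count being the sum over all slots. `NR_CPUS`, `CACHE_LINE`, `struct domain_ref` and the `ref_*` helpers are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define NR_CPUS    4    /* assumed CPU count for the sketch */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One counter per CPU, each aligned to its own cache line so that a
 * get or put on one CPU does not invalidate another CPU's line. */
struct percpu_count {
    _Alignas(CACHE_LINE) long value;
};

struct domain_ref {
    struct percpu_count count[NR_CPUS];
};

/* Each CPU is assumed to touch only its own slot, so plain (non-atomic)
 * updates are used here. */
static void ref_get(struct domain_ref *r, unsigned int cpu)
{
    r->count[cpu].value++;
}

static void ref_put(struct domain_ref *r, unsigned int cpu)
{
    r->count[cpu].value--;
}

/* The real count is the sum of all slots. A single slot may go negative
 * when a reference is taken on one CPU and dropped on another. */
static long ref_total(const struct domain_ref *r)
{
    long total = 0;

    for (size_t i = 0; i < NR_CPUS; i++)
        total += r->count[i].value;
    return total;
}
```

The trade-off is that the fast path becomes a local increment, while checking whether the count has reached zero requires summing every slot, which is only reliable once no further updates can arrive.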

  Regards

  Renato

Keir Fraser

2006-Dec-07 12:01 UTC


Re: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

On 7/12/06 03:49, "Santos, Jose Renato G"
<joserenato.santos@hp.com> wrote:
> Can't think of anything else. If you want I can remove these and
> submit a revised patch.
> It seems that there is not much more we could remove. Not sure if you
> have something else in mind...

I suppose the patch isn't actually as big as I first imagined. I'll review
it and probably apply it pretty much as is. It's just that RCU has always
seemed rather over complicated to me (lots of different queues, for example)
for what should be a rather simple concept to implement. This is possibly
just ignorance on my part. :-)

As for per-cpu refcounts, I suspect a better scheme would be
find_domain_by_id_noref(). Idea being that often we take a reference for a
short period of time (in particular, scope of one function) and with delayed
destruction we can now safely use a found domain pointer with no refcnt
increment.
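
A toy, single-threaded sketch of that idea: the lookup returns a bare pointer, and destruction of an unlinked domain is deferred until no reader can still hold it. Only the name find_domain_by_id_noref() comes from this thread; the body and every other name (`read_section_enter`, `read_section_exit`, `domain_unlink`, `reclaim`, the `freed` flag) are assumptions made for illustration. The shared `active_readers` counter merely models the grace period; a real RCU implementation would detect quiescent states instead, since a shared counter would bring back the contention being removed.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 4

struct domain { int domain_id; bool freed; };

static struct domain *table[TABLE_SIZE];  /* toy domain table */
static struct domain *pending_free;       /* one deferred destruction */
static int active_readers;                /* toy grace-period tracking */

static void read_section_enter(void) { active_readers++; }
static void read_section_exit(void)  { active_readers--; }

/* Lookup without taking a reference. The pointer is only valid inside
 * the caller's read-side section. */
static struct domain *find_domain_by_id_noref(int id)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i] != NULL && table[i]->domain_id == id)
            return table[i];
    return NULL;
}

/* Remove the domain from the table but postpone its destruction. */
static void domain_unlink(struct domain *d)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i] == d)
            table[i] = NULL;
    pending_free = d;
}

/* Destroy the unlinked domain once no reader can still be using it. */
static bool reclaim(void)
{
    if (pending_free == NULL || active_readers > 0)
        return false;
    pending_free->freed = true;   /* real code would release the memory */
    pending_free = NULL;
    return true;
}
```

A caller whose use of the domain fits inside one function would enter the read section, look the domain up, use it, and leave, with no refcnt increment or decrement on the fast path.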

 -- Keir



Santos, Jose Renato G

2006-Dec-08 04:17 UTC


RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com] 
> Sent: Thursday, December 07, 2006 4:02 AM
> To: Santos, Jose Renato G; Keir Fraser; Apparao, Padmashree 
> K; xen-devel@lists.xensource.com
> Cc: Turner, Yoshio; Jose Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in 
> find_domain_by_id() [0/2]
> 
> On 7/12/06 03:49, "Santos, Jose Renato G" 
> <joserenato.santos@hp.com> wrote:
> 
> > Can't think of anything else. If you want I can remove these
> > and submit a revised patch.
> > It seems that there is not much more we could remove. Not
> > sure if you have something else in mind...
>
> I suppose the patch isn't actually as big as I first
> imagined. I'll review it and probably apply it pretty much as
> is. It's just that RCU has always seemed rather over
> complicated to me (lots of different queues, for example) for
> what should be a rather simple concept to implement. This is
> possibly just ignorance on my part. :-)
>

Yes, the multiple queues seem somewhat complicated, but
I think we need them to be able to handle multiple callbacks.
Maybe we can get rid of one of them ("done")...

> As for per-cpu refcounts, I suspect a better scheme would be 
> find_domain_by_id_noref(). Idea being that often we take a 
> reference for a short period of time (in particular, scope of 
> one function) and with delayed destruction we can now safely 
> use a found domain pointer with no refcnt increment.
> 
Good point. This is indeed a much better scheme. I will 
work out a patch for find_domain_by_id_noref().

Thanks

Renato
>  -- Keir
> 
> 
