thr3ads.net - Lustre discuss - [Lustre-discuss] copy_user_generic

Roger Spellman

2008-Jul-07 21:42 UTC

[Lustre-discuss] copy_user_generic_c ?

I'm running an OST (not a client on an OST) with SW RAID 5.  My
interconnect is IB, and I'm using OFED 1.3.1.  I added OPROFILE to my
kernel, to see if I could find a bottleneck.  The biggest CPU user, at
25%, was copy_user_generic_c.


Grepping through the linux, ofed, and lustre code, I cannot find where
this is being called.  Can anyone suggest where this is being called,
and why?

 

-Roger

 

 

Roger Spellman

Staff Engineer

Terascala, Inc.

508-588-1501

www.terascala.com <http://www.terascala.com/> 

 


Andreas Dilger

2008-Jul-07 23:05 UTC

On Jul 07, 2008  17:42 -0400, Roger Spellman wrote:
> I'm running an OST (not a client on an OST) with SW RAID 5.  My
> interconnect is IB, and I'm using OFED 1.3.1.  I added OPROFILE to my
> kernel, to see if I could find a bottleneck.  The biggest CPU user, at
> 25%, was copy_user_generic_c.
>
> Grepping through the linux, ofed, and lustre code, I cannot find where
> this is being called.  Can anyone suggest where this is being called,
> and why?

This is a well-known problem - this kernel function copies data
from userspace into the kernel buffers on a write, and vice versa on
a read.  The way to avoid this is to use O_DIRECT, but then the data
is not cached on the client, which means you will not be able to do
cached writes (i.e. write-behind) and will wait for IO completion on
each write (i.e. sync writes).

If you are doing enough IO to hit a bottleneck with copy_{to,from}_user()
then you can probably also issue IOs large enough to make the sync IO
performance hit of O_DIRECT negligible.
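
For illustration only, a client-side write that bypasses the page cache
might look like the sketch below.  It is not from the thread: the helper
names (round_up, direct_write) and the 4096-byte alignment are
assumptions.  O_DIRECT generally requires the buffer address, the length
and the file offset to be suitably aligned, which is why a page-aligned
anonymous mmap is used as the buffer here.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement depends on filesystem and device


def round_up(nbytes: int, align: int = ALIGN) -> int:
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align


def direct_write(path: str, data: bytes) -> int:
    """Write data to path with O_DIRECT, zero-padded to a multiple of ALIGN.

    Returns the number of bytes written, padding included.
    """
    size = round_up(max(len(data), 1))
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned and zero-filled
    buf[: len(data)] = data
    # os.O_DIRECT is Linux-specific; not every filesystem accepts it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

Without the page cache there is no write-behind to hide the latency of
each such write, which is the trade-off described above; issuing large
writes is what amortises that cost.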

We are looking at how to spread the load of copy_{to,from}_user() over
more CPUs, but that is not likely to make it into a Lustre release for
some time yet.  Completely avoiding the copy while allowing a cache on
the client would require major VFS/VM surgery (e.g. ensuring the buffers
are aligned and marking the pages read-only, forcing the client to fault
them if it changes the buffer again).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Roger Spellman

2008-Jul-08 00:36 UTC

Andreas,
Thanks for this information.
But, I'm seeing this problem on an OST, not on a client.  Why would an
OST be doing copy_to/from_user()?  On a write, the IB card should be directly
placing the data.  So, shouldn't the data already be in kernel space?
Thanks.
-Roger
>This is a well-known problem - this kernel function is copying data
>from userspace to the kernel buffers on a write, and vice versa on
>a read.

Andreas Dilger

2008-Jul-08 05:31 UTC

On Jul 07, 2008  20:36 -0400, Roger Spellman wrote:
> Thanks for this information.
> But, I'm seeing this problem on an OST, not on a client.  Why would
> an OST be doing copy_to/from_user()?  On a write, the IB card should
> be directly placing the data.  So, shouldn't the data already be in
> kernel space?

Yes, by all means it shouldn't need a copy on the OST - that is what
RDMA is for.  You definitely are not running Samba exports on the OST
node?  I can't imagine what else would be doing this on an OST.

Your oprofile output should be able to show the callchain for the
busiest callpaths.  Alternately, if this is active 25% of the time it
may be enough to do "echo p > /proc/sysrq-trigger" 16 times and see
what the resulting stacks are.  In theory 4 of them should have
copy_{to,from}_user() at the top of the stack.
> >This is a well-known problem - this kernel function is copying data
> >from userspace to the kernel buffers on a write, and vice versa on
> >a read.  
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Roger Spellman

2008-Jul-08 13:46 UTC

> Yes, by all means it shouldn't need a copy on the OST - that is what
> RDMA is for.

Agreed!

> You definitely are not running Samba exports on the OST
> node?

Certainly not.

> I can't imagine what else would be doing this on an OST.
> Your oprofile output should be able to show the callchain for the
> busiest callpaths.  Alternately, if this is active 25% of the time it
> may be enough to do "echo p > /proc/sysrq-trigger" 16 times and see
> what the resulting stacks are.  In theory 4 of them should have
> copy_{to,from}_user() at the top of the stack.

Andreas, why would 4 threads have copy_(to,from)_user at the top of the
stack?  Are certain threads supposed to be doing that on an OST?

Thanks,

Roger

Kalpak Shah

2008-Jul-08 14:27 UTC

On Tue, 2008-07-08 at 09:46 -0400, Roger Spellman wrote:
> > Yes, by all means it shouldn't need a copy on the OST - that is what
> > RDMA is for.
>
> Agreed!
>
> > You definitely are not running Samba exports on the OST
> > node?
>
> Certainly not.
>
> > I can't imagine what else would be doing this on an OST.
> > Your oprofile output should be able to show the callchain for the
> > busiest callpaths.  Alternately, if this is active 25% of the time it
> > may be enough to do "echo p > /proc/sysrq-trigger" 16 times and see
> > what the resulting stacks are.  In theory 4 of them should have
> > copy_{to,from}_user() at the top of the stack.
>
> Andreas, why would 4 threads have copy_(to,from)_user at the top of the
> stack?  Are certain threads supposed to be doing that on an OST?

Since copy_user_generic_c accounts for about 25% of the CPU samples on
the OST, triggering a stack trace 16 times should show
copy_{to,from}_user() in around 4 of the traces (16 x 0.25 = 4).  It is
about 4 samples, not 4 threads.

I don't think copy_{to,from}_user() is expected to be called on the OST
with that frequency (if at all), so having the stack trace will help us
determine where it is being called from.
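
The sampling arithmetic can be checked with a short sketch.  It assumes
each stack trace is an independent sample that lands in the function
with probability 0.25; the function names are made up for this example.

```python
from math import comb


def expected_hits(samples: int, fraction: float) -> float:
    """Expected number of samples that land in the function of interest."""
    return samples * fraction


def prob_at_least(k: int, samples: int, fraction: float) -> float:
    """Binomial probability that at least k of the samples land in it."""
    return sum(
        comb(samples, i) * fraction**i * (1 - fraction) ** (samples - i)
        for i in range(k, samples + 1)
    )


print(expected_hits(16, 0.25))               # 4.0
print(round(prob_at_least(1, 16, 0.25), 3))  # 0.99
```

So under that assumption, 16 traces would almost certainly catch the
function at least once if it really consumed a quarter of the CPU time.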

Thanks,
Kalpak
> 
> Thanks,
> 
> Roger

Roger Spellman

2008-Jul-08 14:46 UTC

Kalpak,
Thank you for the clarification.
-Roger
> Since the CPU on the OST is active 25% of the time, triggering a stack
> trace 16 times, should give us the stack trace for the
> copy_{to,from}_user() functions around 4 times.
> 
> I don't think copy_{to,from}_user() is expected to be called on the OST
> with that frequency(if any) so having the stack trace will help us
> determine from where it is being called.
> 
> Thanks,
> Kalpak

Roger Spellman

2008-Jul-09 21:09 UTC

It turns out that there was a problem in how I was using oprofile.  I
was doing opcontrol --start and opcontrol --stop.  But, I forgot to do
an opcontrol --reset in between.  So, in addition to recording my OST
results, oprofile was picking up some old data, which is probably where
the copy_(to,from)_user() came from.

Lesson learned:  Always do opcontrol --reset before opcontrol --start.
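
One way to make the reset hard to forget is to script the whole run.
The sketch below is not from the thread: the wrapper functions and the
workload argument are hypothetical, and it assumes opcontrol is already
set up and run as root.  Only the opcontrol/opreport options shown are
used.

```python
import subprocess


def profile_commands(workload: list[str]) -> list[list[str]]:
    """Return the commands for one clean profiling run, in order."""
    return [
        ["opcontrol", "--reset"],  # discard samples left over from earlier runs
        ["opcontrol", "--start"],
        workload,                  # hypothetical: whatever covers the measurement window
        ["opcontrol", "--dump"],   # flush collected samples so opreport can read them
        ["opcontrol", "--stop"],
        ["opreport", "--symbols"],
    ]


def run_profile(workload: list[str]) -> None:
    """Run the sequence, stopping at the first command that fails."""
    for cmd in profile_commands(workload):
        subprocess.run(cmd, check=True)
```

On an OST the load comes from remote clients, so the workload could be
as simple as run_profile(["sleep", "60"]) while the clients do IO.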

Thanks to everyone who helped me out.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com

> I don't think copy_{to,from}_user() is expected to be called on the OST
> with that frequency(if any) so having the stack trace will help us
> determine from where it is being called.
> 
> Thanks,
> Kalpak
