Gary Mills
2007-Oct-18 13:04 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007

Does anyone on this mailing list have an idea what went wrong with ZFS
and Cyrus IMAP? Here's an excerpt that explains the problem:

    About a week before classes actually start is when all the kids
    start moving back into town and mailing all their buds. We saw
    process numbers go from 500-ish to as high as 5,000. Load would
    climb radically after passing 2,000 processes and systems became
    slow to respond.

Here's a suggestion on the cause:

    The root problem seems to be an interaction between Solaris'
    concept of global memory consistency and the fact that Cyrus
    spawns many processes that all memory map (mmap) the same file.
    Whenever any process updates any part of a memory-mapped file,
    Solaris freezes all of the processes that have that file mmaped,
    updates their memory tables, and then re-schedules the processes
    to run. When we have problems we see the load average go extremely
    high and no useful work gets done by Cyrus.

I'm concerned because I'm also using Cyrus IMAP with ZFS. So far, it's
been extremely well behaved. Snapshots are one of the best parts of
this system.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
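One way to see the shared mappings the excerpt describes is pmap(1). A
minimal sketch, assuming the Cyrus databases live under the usual
configdirectory (the mailboxes.db filename is an assumption; adjust it
to your install):

    # Count mapping lines for the shared Cyrus database across all
    # imapd processes. A total close to the imapd process count means
    # nearly every imapd has the same file mmapped, which is the
    # situation the shared-mmap theory above depends on.
    for pid in `pgrep imapd`; do
        pmap $pid | grep mailboxes.db
    done | wc -l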
Bill Sommerfeld
2007-Oct-18 14:16 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> Here's a suggestion on the cause:
>
>     The root problem seems to be an interaction between Solaris'
>     concept of global memory consistency and the fact that Cyrus
>     spawns many processes that all memory map (mmap) the same file.
>     Whenever any process updates any part of a memory-mapped file,
>     Solaris freezes all of the processes that have that file mmaped,
>     updates their memory tables, and then re-schedules the processes
>     to run. When we have problems we see the load average go
>     extremely high and no useful work gets done by Cyrus.

that sounds like a somewhat mangled description of the cross-calls done
to invalidate the TLB on other processors when a page is unmapped.
(it certainly doesn't happen on *every* update to a mapped file).

from grepping the source code it looks like cyrus is both multithreaded
and a heavy user of munmap.

                                        - Bill
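Bill's munmap observation is easy to check on a running server with
DTrace. A minimal sketch (run as root; Ctrl-C prints the summary):

    # Count munmap(2) calls by process name; a high count for imapd or
    # lmtpd fits the TLB-shootdown cross-call explanation.
    dtrace -n 'syscall::munmap:entry { @[execname] = count(); }'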
On 10/18/07, Bill Sommerfeld <sommerfeld at sun.com> wrote:
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).

I've seen systems running Veritas Cluster & Oracle Cluster Ready
Services idle at about 10% sys due to the huge number of monitoring
scripts that kept firing. This was on a 12 - 16 CPU 25k domain. A
quite similar configuration on T2000s had negligible overhead.

Lesson learned: cross-calls (and thread migrations, and ...) are much
cheaper on systems with lower latency between CPUs.

--
Mike Gerdts
http://mgerdts.blogspot.com/
On Thu, Oct 18, 2007 at 10:16:52AM -0400, Bill Sommerfeld wrote:
> On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> > Here's a suggestion on the cause:
> > [...]
>
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).
>
> from grepping the source code it looks like cyrus is both multithreaded
> and a heavy user of munmap.

Here's a process summary of our Cyrus back-end which uses ZFS for its
mail store:

  10:08am  up 38 day(s), 12:11,  1 user,  load average: 2.36, 2.02, 1.89

  %CPU   NUM COMM
   5.2  1788 imapd
   0.7    11 pop3d
   0.6    43 lmtpd
   0.2     2 /usr/local/cyrus/bin/master
   0.1     1 ps
   0.1     1 idled
   0.1     1 fsflush
   0.1     1 /usr/sbin/syslogd
   0.1     1 /opt/local/mysql/libexec/mysqld

The imapd, pop3d, and lmtpd processes are single-threaded. There's one
for each client connection. `master' on the other hand is supposed to
be multi-threaded, but `prstat' shows only one thread now. There are
two because one is the mupdate master and the other is the back-end
master.

What's the command to show cross calls?

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
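For reference, a per-command summary like the one above can be put
together from ps(1) and awk; a minimal sketch, with the output format
only approximated:

    # Sum %CPU and count processes per command name.
    ps -eo pcpu,comm | awk '
        NR > 1 { cpu[$2] += $1; num[$2]++ }
        END    { for (c in cpu) printf "%5.1f %5d %s\n", cpu[c], num[c], c }
    ' | sort -rn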
On Thu, 18 Oct 2007, Mike Gerdts wrote:
> On 10/18/07, Bill Sommerfeld <sommerfeld at sun.com> wrote:
>> that sounds like a somewhat mangled description of the cross-calls done
>> to invalidate the TLB on other processors when a page is unmapped.
>> (it certainly doesn't happen on *every* update to a mapped file).
>
> I've seen systems running Veritas Cluster & Oracle Cluster Ready
> Services idle at about 10% sys due to the huge number of monitoring
> scripts that kept firing. This was on a 12 - 16 CPU 25k domain.

Monitoring scripts and mmap users ... URGH :(

That runs into procfs' notorious keenness on locking the address
spaces of inspected processes. Even as little as an "ls -l
/proc/<PID>/" acquires address space locks on that process, and I can
see how/why this leads to CPU spikes when you have an application that
heavily uses mmap()/munmap().

One could say: if you want this workload to perform well, trust it to
perform well and restrain the urge to watch it all the time ...

> A quite similar configuration on T2000s had negligible overhead.
> Lesson learned: cross-calls (and thread migrations, and ...) are much
> cheaper on systems with lower latency between CPUs.

And quantum theory tells us: if you hadn't looked, that cat might
still be living happily ever after ... /proc isn't for free.

FrankH.
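If you want to test Frank's theory rather than just trust it,
lockstat(1M) can show which kernel locks are hottest while the
monitoring scripts run. A sketch, not a recipe:

    # Report the top 10 lock-contention events over a 10-second
    # window; heavy contention on address-space locks while /proc
    # watchers are active would support the explanation above.
    lockstat -D 10 sleep 10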
On 10/18/07, Gary Mills <mills at cc.umanitoba.ca> wrote:
> What's the command to show cross calls?

mpstat will show it on a system basis. xcallsbypid.d from the
DTraceToolkit (ask google) will tell you which PID is responsible.

--
Mike Gerdts
http://mgerdts.blogspot.com/
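If the DTraceToolkit isn't handy, a one-liner in the same spirit as
xcallsbypid.d (a sketch, not the toolkit script itself):

    # Count cross-calls by PID and process name; Ctrl-C to print.
    dtrace -n 'sysinfo:::xcalls { @[pid, execname] = count(); }'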
> > What's the command to show cross calls?

mpstat(1M) example o/p:

$ mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   16   0    0   416  316  485   16    0    0    0   618    7   3   0  90
  0    6   0    0   425  324  488    2    0    0    0   579    4   2   0  94
michael schuster
2007-Oct-18 15:37 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007
Gary Mills wrote:
> What's the command to show cross calls?

mpstat

--
Michael Schuster
Sun Microsystems, Inc.
recursion, n: see 'recursion'
On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> On 10/18/07, Gary Mills <mills at cc.umanitoba.ca> wrote:
> > What's the command to show cross calls?
>
> mpstat will show it on a system basis.

Thanks. This is on our T2000 Cyrus IMAP server with ZFS. It's the
second listing from `mpstat 5'. How do I recognize when there are too
many cross calls?

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   29   0 2591   779  139 1037    8   78  214    1   785    2  10   0  87
  1   12   0 1225   175    0  250    1   63   95    0   257    2   2   0  96
  2    2   0  427    83    0  128    0   35   59    0   124    0   1   0  99
  3    3   0  432    66    0  121    0   32   44    0   154    1   1   0  98
  4   13   0 1231   159    0  338    0   20   83    0   281    1   3   0  96
  5   11   0  991   164    2  337    1   22   69    0   299    1   2   0  97
  6    3   0  512   101    0  205    0   18   51    0   185    0   4   0  96
  7    8   0  598   119    0  246    0   20   68    0   211    1   1   0  97
  8    9   0  947   119    0  245    1   20   52    0   264    1   1   0  98
  9    2   0  258   110    0  229    0   19   58    0    77    0   4   0  95
 10   13   0 2083   153    0  311    0   20   71    0   151    0   4   0  96
 11   16   0 1434   119    0  262    1   21   53    0   385    1   2   0  97
 12   20   0 1093   145    0  308    1   21   75    0   381    1   2   0  97
 13   12   0 2163   154    0  318    2   23   68    0   211    1   2   0  97
 14   22   0 2955   126    0  269    1   22   49    0   666    2   3   0  95
 15    4   0  313   132    0  273    1   20   55    0   127    0   1   0  99
 16    0   0  256   287    0  582    0   20   60    0    53    0   2   0  98
 17    0   0  138   145    0  295    0   20   54    0   116    0   1   0  98
 18    1   0  806   283    0  574    1   19   62    0   150    0   3   0  96
 19    7   0 2290  2347 2181  347    2   23  105    0   169    1   7   0  92
 20    1   0  671   605  496  226    0   18   61    0   140    0   2   0  98
 21    7   0 1205   128   26  203    0   16   51    0   152    1   7   0  93
 22   15   0 1045   107    0  218    1   18   56    0   243    0   2   0  98
 23   27   0 2887   141    0  306    2   19   53    0   594    3   2   0  95
 24   55   3  991   128    0  272    1   22   56    0   388    1   2   0  97
 25   21   0 1857   124    0  264    1   19   59    0   461    1   2   0  97
 26   16   0  835   172    0  358    0   19   93    0   267    1   2   0  97
 27   10   0 1132   183    0  383    1   20   66    0   289    1   3   0  96
 28   14   0 2761   103    0  225    1   20   53    0   379    1   3   0  95
 29    5   0  618    99    0  212    0   19   56    0   197    0   3   0  97
 30   16   0  538    87    0  178    1   18   47    0   185    1   1   0  98
 31    0   0  661  1104    0 2319    3   24  206    0    78    0   8   0  91

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Xcalls are sometimes a signature of some problem, but in themselves
they should be cheap. Below, the sys time is rather small, so I'm
inclined to think they are not a problem here, pending further
analysis. All your CPUs appear to be making progress, each retiring a
fair number of syscalls.

So this machine is underperforming, which means it's being given some
work to do which is not completing in the expected time. I might have
missed the initial post, but what data is there to support that?

CPU 19 is probably your network interrupt. Creating a fence around it,
so that no other work becomes pinned behind it, sometimes helps:
psrset -c 19.

-r

Gary Mills writes:
> On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> > mpstat will show it on a system basis.
>
> Thanks. This is on our T2000 Cyrus IMAP server with ZFS. It's the
> second listing from `mpstat 5'. How do I recognize when there are too
> many cross calls?
>
> [mpstat output trimmed]
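For reference, a sketch of the fence Roch describes, assuming CPU 19
really is taking the network interrupt on your box (check the intr and
ithr columns in mpstat first):

    # Confirm which CPU carries the interrupt load (high intr/ithr).
    mpstat 5 2

    # Create a processor set containing only CPU 19; psrset prints the
    # new set id. Unbound threads are no longer scheduled on CPU 19,
    # so application work can't get pinned behind the interrupt handler.
    psrset -c 19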