Gary Mills
2007-Oct-18 13:04 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007

Does anyone on this mailing list have an idea what went wrong with ZFS
and Cyrus IMAP? Here's an excerpt that explains the problem:

    About a week before classes actually start is when all the kids
    start moving back into town and mailing all their buds. We saw
    process numbers go from 500-ish to as high as 5,000. Load would
    climb radically after passing 2,000 processes and systems became
    slow to respond.

Here's a suggestion on the cause:

    The root problem seems to be an interaction between Solaris'
    concept of global memory consistency and the fact that Cyrus
    spawns many processes that all memory map (mmap) the same file.
    Whenever any process updates any part of a memory-mapped file,
    Solaris freezes all of the processes that have that file mmaped,
    updates their memory tables, and then re-schedules the processes
    to run. When we have problems we see the load average go extremely
    high and no useful work gets done by Cyrus.

I'm concerned because I'm also using Cyrus IMAP with ZFS. So far, it's
been extremely well behaved. Snapshots are one of the best parts of
this system.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
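One way to see the shared mappings the excerpt describes is pmap(1). A
minimal sketch, assuming the Cyrus databases live under the usual
configdirectory (the mailboxes.db filename is an assumption; adjust it
to your install):

    # Count mapping lines for the shared Cyrus database across all
    # imapd processes. A total close to the imapd process count means
    # nearly every imapd has the same file mmapped, which is the
    # situation the shared-mmap theory above depends on.
    for pid in `pgrep imapd`; do
        pmap $pid | grep mailboxes.db
    done | wc -l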
Bill Sommerfeld
2007-Oct-18 14:16 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007
On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> Here's a suggestion on the cause:
>
>     The root problem seems to be an interaction between Solaris'
>     concept of global memory consistency and the fact that Cyrus
>     spawns many processes that all memory map (mmap) the same file.
>     Whenever any process updates any part of a memory-mapped file,
>     Solaris freezes all of the processes that have that file mmaped,
>     updates their memory tables, and then re-schedules the processes
>     to run. When we have problems we see the load average go
>     extremely high and no useful work gets done by Cyrus.

that sounds like a somewhat mangled description of the cross-calls done
to invalidate the TLB on other processors when a page is unmapped.
(it certainly doesn't happen on *every* update to a mapped file).

from grepping the source code it looks like cyrus is both multithreaded
and a heavy user of munmap.

                                        - Bill
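Bill's munmap observation is easy to check on a running server with
DTrace. A minimal sketch (run as root; Ctrl-C prints the summary):

    # Count munmap(2) calls by process name; a high count for imapd or
    # lmtpd fits the TLB-shootdown cross-call explanation.
    dtrace -n 'syscall::munmap:entry { @[execname] = count(); }'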
On 10/18/07, Bill Sommerfeld <sommerfeld at sun.com> wrote:
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).

I've seen systems running Veritas Cluster & Oracle Cluster Ready
Services idle at about 10% sys due to the huge number of monitoring
scripts that kept firing. This was on a 12 - 16 CPU 25k domain. A
quite similar configuration on T2000s had negligible overhead.

Lesson learned: cross-calls (and thread migrations, and ...) are much
cheaper on systems with lower latency between CPUs.

--
Mike Gerdts
http://mgerdts.blogspot.com/
On Thu, Oct 18, 2007 at 10:16:52AM -0400, Bill Sommerfeld wrote:
> On Thu, 2007-10-18 at 08:04 -0500, Gary Mills wrote:
> > Here's a suggestion on the cause:
> > [...]
>
> that sounds like a somewhat mangled description of the cross-calls done
> to invalidate the TLB on other processors when a page is unmapped.
> (it certainly doesn't happen on *every* update to a mapped file).
>
> from grepping the source code it looks like cyrus is both multithreaded
> and a heavy user of munmap.

Here's a process summary of our Cyrus back-end which uses ZFS for its
mail store:

  10:08am  up 38 day(s), 12:11,  1 user,  load average: 2.36, 2.02, 1.89

  %CPU   NUM COMM
   5.2  1788 imapd
   0.7    11 pop3d
   0.6    43 lmtpd
   0.2     2 /usr/local/cyrus/bin/master
   0.1     1 ps
   0.1     1 idled
   0.1     1 fsflush
   0.1     1 /usr/sbin/syslogd
   0.1     1 /opt/local/mysql/libexec/mysqld

The imapd, pop3d, and lmtpd processes are single-threaded. There's one
for each client connection. `master' on the other hand is supposed to
be multi-threaded, but `prstat' shows only one thread now. There are
two because one is the mupdate master and the other is the back-end
master.

What's the command to show cross calls?

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
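For reference, a per-command summary like the one above can be put
together from ps(1) and awk; a minimal sketch, with the output format
only approximated:

    # Sum %CPU and count processes per command name.
    ps -eo pcpu,comm | awk '
        NR > 1 { cpu[$2] += $1; num[$2]++ }
        END    { for (c in cpu) printf "%5.1f %5d %s\n", cpu[c], num[c], c }
    ' | sort -rn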
On Thu, 18 Oct 2007, Mike Gerdts wrote:
> On 10/18/07, Bill Sommerfeld <sommerfeld at sun.com> wrote:
>> that sounds like a somewhat mangled description of the cross-calls done
>> to invalidate the TLB on other processors when a page is unmapped.
>> (it certainly doesn't happen on *every* update to a mapped file).
>
> I've seen systems running Veritas Cluster & Oracle Cluster Ready
> Services idle at about 10% sys due to the huge number of monitoring
> scripts that kept firing. This was on a 12 - 16 CPU 25k domain.

Monitoring scripts and mmap users ... URGH :(

That runs into procfs' notorious keenness on locking the address
spaces of inspected processes. Even as little as an "ls -l
/proc/<PID>/" acquires address space locks on that process, and I can
see how/why this leads to CPU spikes when you have an application that
heavily uses mmap()/munmap().

One could say: if you want this workload to perform well, trust it to
perform well and restrain the urge to watch it all the time ...

> A quite similar configuration on T2000s had negligible overhead.
> Lesson learned: cross-calls (and thread migrations, and ...) are much
> cheaper on systems with lower latency between CPUs.

And quantum theory tells us: if you hadn't looked, that cat might
still be living happily ever after ... /proc isn't for free.

FrankH.
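If you want to test Frank's theory rather than just trust it,
lockstat(1M) can show which kernel locks are hottest while the
monitoring scripts run. A sketch, not a recipe:

    # Report the top 10 lock-contention events over a 10-second
    # window; heavy contention on address-space locks while /proc
    # watchers are active would support the explanation above.
    lockstat -D 10 sleep 10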
On 10/18/07, Gary Mills <mills at cc.umanitoba.ca> wrote:
> What's the command to show cross calls?

mpstat will show it on a system basis. xcallsbypid.d from the
DTraceToolkit (ask google) will tell you which PID is responsible.

--
Mike Gerdts
http://mgerdts.blogspot.com/
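If the DTraceToolkit isn't handy, a one-liner in the same spirit as
xcallsbypid.d (a sketch, not the toolkit script itself):

    # Count cross-calls by PID and process name; Ctrl-C to print.
    dtrace -n 'sysinfo:::xcalls { @[pid, execname] = count(); }'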
> > What's the command to show cross calls?

mpstat(1M) example o/p:

$ mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   16   0    0   416  316  485   16    0    0    0   618    7   3   0  90
  0    6   0    0   425  324  488    2    0    0    0   579    4   2   0  94
michael schuster
2007-Oct-18 15:37 UTC
[zfs-discuss] UC Davis Cyrus Incident September 2007
Gary Mills wrote:
> What's the command to show cross calls?

mpstat

--
Michael Schuster
Sun Microsystems, Inc.
recursion, n: see 'recursion'
On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> On 10/18/07, Gary Mills <mills at cc.umanitoba.ca> wrote:
> > What's the command to show cross calls?
>
> mpstat will show it on a system basis.

Thanks. This is on our T2000 Cyrus IMAP server with ZFS. It's the
second listing from `mpstat 5'. How do I recognize when there are too
many cross calls?

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   29   0 2591   779  139 1037    8   78  214    1   785    2  10   0  87
  1   12   0 1225   175    0  250    1   63   95    0   257    2   2   0  96
  2    2   0  427    83    0  128    0   35   59    0   124    0   1   0  99
  3    3   0  432    66    0  121    0   32   44    0   154    1   1   0  98
  4   13   0 1231   159    0  338    0   20   83    0   281    1   3   0  96
  5   11   0  991   164    2  337    1   22   69    0   299    1   2   0  97
  6    3   0  512   101    0  205    0   18   51    0   185    0   4   0  96
  7    8   0  598   119    0  246    0   20   68    0   211    1   1   0  97
  8    9   0  947   119    0  245    1   20   52    0   264    1   1   0  98
  9    2   0  258   110    0  229    0   19   58    0    77    0   4   0  95
 10   13   0 2083   153    0  311    0   20   71    0   151    0   4   0  96
 11   16   0 1434   119    0  262    1   21   53    0   385    1   2   0  97
 12   20   0 1093   145    0  308    1   21   75    0   381    1   2   0  97
 13   12   0 2163   154    0  318    2   23   68    0   211    1   2   0  97
 14   22   0 2955   126    0  269    1   22   49    0   666    2   3   0  95
 15    4   0  313   132    0  273    1   20   55    0   127    0   1   0  99
 16    0   0  256   287    0  582    0   20   60    0    53    0   2   0  98
 17    0   0  138   145    0  295    0   20   54    0   116    0   1   0  98
 18    1   0  806   283    0  574    1   19   62    0   150    0   3   0  96
 19    7   0 2290  2347 2181  347    2   23  105    0   169    1   7   0  92
 20    1   0  671   605  496  226    0   18   61    0   140    0   2   0  98
 21    7   0 1205   128   26  203    0   16   51    0   152    1   7   0  93
 22   15   0 1045   107    0  218    1   18   56    0   243    0   2   0  98
 23   27   0 2887   141    0  306    2   19   53    0   594    3   2   0  95
 24   55   3  991   128    0  272    1   22   56    0   388    1   2   0  97
 25   21   0 1857   124    0  264    1   19   59    0   461    1   2   0  97
 26   16   0  835   172    0  358    0   19   93    0   267    1   2   0  97
 27   10   0 1132   183    0  383    1   20   66    0   289    1   3   0  96
 28   14   0 2761   103    0  225    1   20   53    0   379    1   3   0  95
 29    5   0  618    99    0  212    0   19   56    0   197    0   3   0  97
 30   16   0  538    87    0  178    1   18   47    0   185    1   1   0  98
 31    0   0  661  1104    0 2319    3   24  206    0    78    0   8   0  91

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Xcalls are sometimes a signature of some problem, but in themselves
they should be cheap. Below, the sys time is rather small, so I'm
inclined to think they are not a problem here, pending further
analysis. All your CPUs appear to be making progress, each retiring a
fair number of syscalls.

So this machine is underperforming, which means it's being given some
work to do which is not completing in the expected time. I might have
missed the initial post, but what data is there to support that?

CPU 19 is probably your network interrupt. Creating a fence around it,
so that no other work becomes pinned behind it, sometimes helps:
psrset -c 19.

-r

Gary Mills writes:
> On Thu, Oct 18, 2007 at 10:32:58AM -0500, Mike Gerdts wrote:
> > mpstat will show it on a system basis.
>
> Thanks. This is on our T2000 Cyrus IMAP server with ZFS. It's the
> second listing from `mpstat 5'. How do I recognize when there are too
> many cross calls?
>
> [mpstat output trimmed]
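For reference, a sketch of the fence Roch describes, assuming CPU 19
really is taking the network interrupt on your box (check the intr and
ithr columns in mpstat first):

    # Confirm which CPU carries the interrupt load (high intr/ithr).
    mpstat 5 2

    # Create a processor set containing only CPU 19; psrset prints the
    # new set id. Unbound threads are no longer scheduled on CPU 19,
    # so application work can't get pinned behind the interrupt handler.
    psrset -c 19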