We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
load, we've had incidents where IMAP operations became very slow.  The
general symptoms are that the number of imapd, pop3d, and lmtpd
processes increases, the CPU load average increases, but the ZFS I/O
bandwidth decreases.  At the same time, ZFS filesystem operations
become very slow.  A rewrite of a small file can take two minutes.

We've added memory; this was an improvement, but the incidents
continued.  The next step is to disable ZFS prefetch and test this
under load.  If that doesn't help either, we're down to ZFS bugs.

Our incidents seem similar to the ones at UC Davis:

    http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf

These were attributed to bug 6535160, but this one is fixed on our
server with patch 127127-11.  Bug 6535172, "zil_sync causing long
hold times on zl_lock", doesn't have a patch yet:

    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172

Could this bug cause our problem?  How do I confirm that it does?
Is there a workaround?

Cyrus IMAP uses several moderate-sized databases that are
memory-mapped by all processes.  I can move these from ZFS to UFS if
this is likely to help.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
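For reference, disabling ZFS prefetch on Solaris 10 is usually done with
the zfs_prefetch_disable tunable; a minimal sketch, assuming that tunable
is present at this patch level:

    # Persistent across reboots: add to /etc/system, then reboot
    set zfs:zfs_prefetch_disable = 1

    # Or flip it on the running kernel with mdb (takes effect
    # immediately, but does not survive a reboot)
    echo "zfs_prefetch_disable/W 1" | mdb -kw

Either form can be reverted by setting the value back to 0.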
Gary Mills wrote:
> We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
> load, we've had incidents where IMAP operations became very slow.  The
> general symptoms are that the number of imapd, pop3d, and lmtpd
> processes increases, the CPU load average increases, but the ZFS I/O
> bandwidth decreases.  At the same time, ZFS filesystem operations
> become very slow.  A rewrite of a small file can take two minutes.

Bandwidth is likely not the issue.  What does the latency to disk look like?
 -- richard
On Sun, Apr 12, 2009 at 10:49:49AM -0700, Richard Elling wrote:
> Gary Mills wrote:
> > We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> > about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
> > load, we've had incidents where IMAP operations became very slow.
>
> Bandwidth is likely not the issue.  What does the latency to disk look like?

Yes, I have statistics!  This set was taken during an incident on
Thursday.  The load average was 12.  There were about 5700 Cyrus
processes running.  Here are the relevant portions of `iostat -xn 5 4':

                 extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 23.8   20.7 1195.0  677.8  0.0  1.0    0.0   22.2   0  37 c4t60A98000433469764E4A2D456A644A74d0
 29.0   23.5 1438.3  626.8  0.0  1.3    0.0   25.4   0  44 c4t60A98000433469764E4A2D456A696579d0
 22.8   26.6 1356.7  822.1  0.0  1.3    0.0   26.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
 26.4   27.3 1516.0  850.7  0.0  1.4    0.0   26.5   0  38 c4t60A98000433469764E4A476D2F6B385Ad0
                 extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 39.7   27.0 1395.8  285.5  0.0  1.1    0.0   16.3   0  51 c4t60A98000433469764E4A2D456A644A74d0
 52.5   29.8 1890.8  175.1  0.0  1.8    0.0   22.3   0  63 c4t60A98000433469764E4A2D456A696579d0
 30.0   33.3 1940.2  432.8  0.0  1.2    0.0   19.4   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
 39.9   42.5 2062.1  616.7  0.0  1.9    0.0   22.9   0  50 c4t60A98000433469764E4A476D2F6B385Ad0
                 extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 43.8   47.6 1691.5  504.8  0.0  1.6    0.0   17.3   0  59 c4t60A98000433469764E4A2D456A644A74d0
 55.4   62.4 2027.8  517.0  0.0  2.2    0.0   18.5   0  72 c4t60A98000433469764E4A2D456A696579d0
 18.6   76.8  682.3  843.5  0.0  1.1    0.0   12.0   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
 30.2  115.8  873.6  905.8  0.0  2.2    0.0   15.1   0  52 c4t60A98000433469764E4A476D2F6B385Ad0
                 extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 49.8   21.8 2438.7  400.3  0.0  1.7    0.0   24.0   0  62 c4t60A98000433469764E4A2D456A644A74d0
 53.2   34.0 2741.3  218.0  0.0  2.1    0.0   24.4   0  63 c4t60A98000433469764E4A2D456A696579d0
 14.0   26.8  506.2  482.1  0.0  0.7    0.0   18.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
 23.4   38.8  484.5  582.3  0.0  1.1    0.0   18.2   0  42 c4t60A98000433469764E4A476D2F6B385Ad0

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
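If it helps to watch that latency continuously during an incident, a rough
sketch along these lines condenses iostat down to the service-time and busy
columns per LUN (it assumes the 11-column `iostat -xn' layout shown above;
adjust the field numbers if your output differs):

    # Print device, average service time, and %busy every 5 seconds
    iostat -xn 5 | awk '/^ *[0-9].*c4t/ { printf "%-45s asvc_t=%6s ms  %%b=%3s\n", $11, $8, $10 }'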
On Sun, Apr 12, 2009 at 12:23:03PM -0700, Richard Elling wrote:
> These disks are pretty slow.  JBOD?  They are not 100% busy, which
> means that either the cached data is providing enough response to the
> apps, or the apps are not capable of producing enough load -- which
> means the bottleneck may be elsewhere.

They are four 500-gig Iscsi LUNs exported from a Netapp filer, with
Solaris multipathing.  Yes, the I/O is normally mostly writes, with
reads being satisfied from various caches.

> You can use fsstat to get a better idea of what sort of I/O the applications
> are seeing from the file system.  That might be revealing.

Thanks for the suggestion.  There are so many `*stat' commands that I
forget about some of them.  I've run a baseline with `fsstat', but the
server is mostly idle now.  I'll have to wait for another incident!
What option to `fsstat' do you recommend?  Here's a sample of the
default output:

$ fsstat zfs 5 5
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.56M 1.53M 3.83M 1.07G 1.53M  2.47G 4.09M 56.4M 1.83T 61.1M  306G zfs
   13     1    16 1.40K     5  11.6K     0     5 38.5K   125  127K zfs
   18     0    18 3.61K     6  21.1K     0     6 16.7K    97  244K zfs
   26     4    25 1.73K    10  6.76K     0    18  178K   142  817K zfs
   12     3    13 3.90K     5  9.00K     0     5 32.8K   108  287K zfs
    7     2     7 1.98K     3  5.87K     0     7 67.5K   108 2.34M zfs

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
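On the fsstat question, a couple of invocations that may be worth having
ready for the next incident (a sketch; I'm assuming the -i flag behaves as
documented on current Solaris 10, and the mount point below is hypothetical):

    # Per-interval I/O operation counts for all ZFS filesystems
    fsstat -i zfs 5

    # Default display for every mounted filesystem type, for comparison
    fsstat -F 5

    # Statistics for a single mounted filesystem -- /var/spool/imap is a
    # placeholder; substitute the actual mailbox filesystem
    fsstat /var/spool/imap 5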
On Sun, Apr 12, 2009 at 05:01:57PM -0400, Ellis, Mike wrote:
> Is the netapp iscsi-lun forcing a full sync as a part of zfs's
> 5-second sync/flush type of thing?  (Not needed since the netapp
> guarantees the write once it acks it)

I've asked that of our Netapp guy, but so far I haven't heard from
him.  Is there a way to determine this from the Iscsi initiator side?
I do have a test mail server that I can play with.

> That could make a big difference...
> (Perhaps disabling the write-flush in zfs will make a big difference
> here, especially on a write-heavy system)

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
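If it comes to trying the write-flush idea on the test server, the usual
knob on Solaris 10 is zfs_nocacheflush.  A sketch, assuming the kernel on
that machine is recent enough to have the tunable; note it is global to
all pools on the host, and is only safe where the storage (here, the
filer's NVRAM) guarantees acknowledged writes:

    # Persistent: add to /etc/system and reboot
    set zfs:zfs_nocacheflush = 1

    # Or toggle it on the running kernel for a load test
    echo "zfs_nocacheflush/W 1" | mdb -kw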
Gary,

How full is the pool ?

-- Sanjeev

On Sun, Apr 12, 2009 at 08:39:03AM -0500, Gary Mills wrote:
> We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
> load, we've had incidents where IMAP operations became very slow.

-- 
----------------
Sanjeev Bagewadi
Solaris RPE 
Bangalore, India
On Mon, Apr 13, 2009 at 09:08:09AM +0530, Sanjeev wrote:
>
> How full is the pool ?

Only 50%, but it started with two 500-gig LUNs initially.  We added
two more when it got up to 300 gigabytes.

# zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
space  1.99T  1.02T   992G  51%  ONLINE  -
# zpool status
  pool: space
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        space                                    ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A644A74d0  ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A696579d0  ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F6B385Ad0  ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F664E4Fd0  ONLINE       0     0     0

errors: No known data errors

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
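Since the pool was grown from two LUNs to four, it may also be worth
checking whether the two original LUNs are carrying a disproportionate
share of the writes; the per-vdev view from the pool side is a one-liner
(a sketch, showing roughly the same per-LUN picture as the earlier
iostat output):

    # Per-vdev bandwidth and operation counts, sampled every 5 seconds
    zpool iostat -v space 5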
Gary,

Thanks!  I was suspecting bug 6596237, but with the current usage that
does not seem likely.

In any case, can you collect the output of:

/usr/sbin/lockstat -HcwP -n 50000 -D 20 -s 40 sleep 5

This would tell us if there are any lock contentions.  And if the
system is suffering from 6596237, we would see metaslab-related
routines at the top.

Thanks and regards,
Sanjeev

On Mon, Apr 13, 2009 at 07:13:03AM -0500, Gary Mills wrote:
> On Mon, Apr 13, 2009 at 09:08:09AM +0530, Sanjeev wrote:
> >
> > How full is the pool ?
>
> Only 50%, but it started with two 500-gig LUNs initially.  We added
> two more when it got up to 300 gigabytes.

-- 
----------------
Sanjeev Bagewadi
Solaris RPE 
Bangalore, India
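Since the slowdowns come and go, it may help to leave that lockstat
capture running in a loop so the samples are already on disk when the
next incident hits.  A rough sketch (the output directory and one-minute
cadence are arbitrary choices):

    # Collect a 5-second lock-contention profile once a minute, each
    # file timestamped, so incident-time samples can be picked out later
    while true; do
        /usr/sbin/lockstat -HcwP -n 50000 -D 20 -s 40 sleep 5 \
            > /var/tmp/lockstat.`date '+%Y%m%d%H%M%S'` 2>&1
        sleep 55
    done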