Jürgen Keil
2007-May-29 17:49 UTC
[zfs-discuss] Deterioration with zfs performance and recent zfs bits?
Has anyone else noticed a significant zfs performance deterioration when running recent opensolaris bits?

My 32-bit / 768 MB Toshiba Tecra S1 notebook was able to do a full opensolaris release build in ~ 4 hours 45 minutes (gcc shadow compilation disabled; using an lzjb compressed zpool / zfs on a single notebook hdd p-ata drive).

After upgrading to 2007-05-25 opensolaris release bits (compiled from source), the same release build now needs ~ 6 hours; that's ~ 25% slower.

I think a change that might be responsible for this is the fix for 6542676 "ARC needs to track meta-data memory overhead" (that is, less caching with the fix for 6542676).

Has anyone noticed a similar zfs performance deterioration?

This message posted from opensolaris.org
Jürgen Keil
2007-Jun-01 14:52 UTC
[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?
I wrote:

> Has anyone else noticed a significant zfs performance
> deterioration when running recent opensolaris bits?
>
> My 32-bit / 768 MB Toshiba Tecra S1 notebook was able
> to do a full opensolaris release build in ~ 4 hours 45
> minutes (gcc shadow compilation disabled; using an lzjb
> compressed zpool / zfs on a single notebook hdd p-ata drive).
>
> After upgrading to 2007-05-25 opensolaris release
> bits (compiled from source), the same release build now
> needs ~ 6 hours; that's ~ 25% slower.

It might be Bug ID 6469558 "ZFS prefetch needs to be more aware of memory pressure":

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6469558

Instead of compiling opensolaris for 4-6 hours, I've now used the following find / grep test on the on-2007-05-30 sources. Each of the 10 iterations prints the single grep match (usr/src/lib/pam_modules/authtok_check/authtok_check.c: * user entering FooBar1234 with PASSLENGTH=6, MINDIGIT=4, while) followed by its /bin/time output; only the timings are listed below.

1st test, using Nevada build 60:

    % cd /files/onnv-2007-05-30
    % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +

    real 4:22.5  user 3.3  sys 5.8
    real 4:28.4  user 3.3  sys 4.8
    real 4:18.0  user 3.3  sys 4.7
    real 4:17.3  user 3.3  sys 4.8
    real 4:15.0  user 3.3  sys 4.7
    real 4:12.0  user 3.3  sys 4.7
    real 4:21.9  user 3.3  sys 4.7
    real 4:18.7  user 3.3  sys 4.7
    real 4:19.5  user 3.3  sys 4.7
    real 4:17.2  user 3.3  sys 4.7

Same test, but running onnv-2007-05-30 release bits (compiled from source). This is at least 25% slower than snv_60 (note: zfs_prefetch_disable = 0, the default value):

    % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +

    real 8:04.3  user 7.3  sys 13.2
    real 6:34.4  user 7.3  sys 11.2
    real 6:33.8  user 7.3  sys 11.1
    real 5:35.6  user 7.3  sys 10.6
    real 5:39.8  user 7.3  sys 10.6
    real 5:37.8  user 7.3  sys 11.1
    real 5:53.5  user 7.3  sys 11.0
    real 5:45.2  user 7.3  sys 11.1
    real 5:44.8  user 7.3  sys 11.0
    real 5:49.1  user 7.3  sys 11.0

Then I patched zfs_prefetch_disable/W1, and now the find & grep test runs much faster on onnv-2007-05-30 bits (note: zfs_prefetch_disable = 1):

    % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +

    real 4:01.3  user 7.2  sys 9.9
    real 4:24.5  user 7.2  sys 9.0
    real 4:31.1  user 7.3  sys 9.7
    real 4:47.0  user 7.2  sys 9.6
    real 4:34.4  user 7.2  sys 9.5
    real 4:38.3  user 7.3  sys 9.3
    real 4:35.9  user 7.2  sys 9.5
    real 4:30.2  user 7.2  sys 9.5
    real 4:30.8  user 7.2  sys 9.3
    real 4:33.0  user 7.2  sys 9.3

Patching zfs_prefetch_disable = 1 has helped opensolaris nightly compile times, too. With zfs_prefetch_disable/W1 the nightly build time is back at 4h 45m (instead of 6h+).
Rob Logan
2007-Jun-01 15:27 UTC
[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?
> Patching zfs_prefetch_disable = 1 has helped

It's my belief this mainly aids scanning metadata. My testing with rsync and yours with find (and as seen with "du &" ; "zpool iostat -v 1") bears this out. Mainly tracked in bug 6437054 "vdev_cache: wise up or die":

http://www.opensolaris.org/jive/thread.jspa?messageID=42212

So to link your code it might help, but if one ran a clean down the tree, it would hurt compile times.

Rob
Jürgen Keil
2007-Jun-04 17:09 UTC
[zfs-discuss] Re: Re: Deterioration with zfs performance and recent zfs bits?
> > Patching zfs_prefetch_disable = 1 has helped
>
> It's my belief this mainly aids scanning metadata. my
> testing with rsync and yours with find (and seen with
> du & ; zpool iostat -v 1 ) bears this out..
> mainly tracked in bug 6437054 vdev_cache: wise up or die
> http://www.opensolaris.org/jive/thread.jspa?messageID=42212
>
> so to link your code, it might help, but if one ran
> a clean down the tree, it would hurt compile times.

I think the slowdown that I'm observing is due to the changes that have been made for 6542676 "ARC needs to track meta-data memory overhead".

There is now a limit of 1/4 of the arc size ("arc_meta_limit") for zfs meta-data. On a 32-bit x86 platform with > 512MB physical memory, the arc size is limited to 3/4 of the size of the kernel heap arena, which is 3/4 * ~650MB => ~500MB. 1/4 of that 500MB is ~125MB for zfs meta-data.

When more than 1/4 of the arc is used for meta-data, meta-data allocations steal space from the arc mru/mfu lists. When more than 1/4 of the arc is used for meta-data and arc_reclaim_needed() returns TRUE, entries from the dnlc cache are purged and arc data is evicted.

Apparently, before 6542676 it was possible to use a lot more meta-data than is possible now with 6542676:

    void
    arc_init(void)
    {
        ...
        /* limit meta-data to 1/4 of the arc capacity */
        arc_meta_limit = arc_c_max / 4;
        ...
    }

    static int
    arc_evict_needed(arc_buf_contents_t type)
    {
        if (type == ARC_BUFC_METADATA && arc_meta_used >= arc_meta_limit)
            return (1);
        ...
    }

    static void
    arc_get_data_buf(arc_buf_t *buf)
    {
        /*
         * We have not yet reached cache maximum size,
         * just allocate a new buffer.
         */
        if (!arc_evict_needed(type)) {
            ...
            goto out;
        }

        /*
         * If we are prefetching from the mfu ghost list, this buffer
         * will end up on the mru list; so steal space from there.
         */
        ...
        if ((buf->b_data = arc_evict(state, size, TRUE, type)) == NULL) {
        ...
    }

    static void
    arc_kmem_reap_now(arc_reclaim_strategy_t strat)
    {
        ...
        if (arc_meta_used >= arc_meta_limit) {
            /*
             * We are exceeding our meta-data cache limit.
             * Purge some DNLC entries to release holds on meta-data.
             */
            dnlc_reduce_cache((void *)(uintptr_t)arc_reduce_dnlc_percent);
        }
        ...
    }

The Tecra S1 (32-bit Solaris x86) has:

    > arc_meta_limit::print
    0x7380000                       <<<<<<<<<<<<<<<<<<<
    > arc_meta_limit::print -d
    0t121110528
    > ::arc
    {
        anon = -735455552
        mru = -735455488
        mru_ghost = -735455424
        mfu = -735455360
        mfu_ghost = -735455296
        size = 0x131dae70
        p = 0xb10983e
        c = 0x1330105e
        c_min = 0x4000000
        c_max = 0x1ce00000          <<<<<<<<<<<<<<<<<<<<
        hits = 0x2e405
        misses = 0x9092
        deleted = 0x5f
        recycle_miss = 0x45bf
        mutex_miss = 0
        evict_skip = 0x6e4b0
        hash_elements = 0x54dd
        hash_elements_max = 0x54de
        hash_collisions = 0x398e
        hash_chains = 0x1887
        hash_chain_max = 0x7
        no_grow = 0
    }
    > 0x1ce00000%4=X
                    7380000

Patching arc_meta_limit to 1/2 of the arc size improves find performance.

Another problem: in dbuf.c, dbuf_read_impl(), the arc_meta_used accounting appears to be broken; the amount of meta-data used ("arc_meta_used") is inflated:

    db->db.db_data = zio_buf_alloc(DN_MAX_BONUSLEN);
    arc_space_consume(512);

Why 512? Apparently, we zio_buf_alloc DN_MAX_BONUSLEN = 0x140 bytes but consume 0x200 bytes of meta-data? (When these buffers are freed, only DN_MAX_BONUSLEN = 0x140 bytes are returned to arc meta-data.)

I'm currently using the following changes, which seem to restore the zfs performance to what it was before 6542676 - more or less:

    diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/arc.c
    --- a/usr/src/uts/common/fs/zfs/arc.c   Fri Jun 01 08:24:48 2007 -0700
    +++ b/usr/src/uts/common/fs/zfs/arc.c   Sat Jun 02 22:09:33 2007 +0200
    @@ -2781,10 +2781,10 @@ arc_init(void)
         arc_c = arc_c_max;
         arc_p = (arc_c >> 1);

    -    /* limit meta-data to 1/4 of the arc capacity */
    -    arc_meta_limit = arc_c_max / 4;
    -    if (arc_c_min < arc_meta_limit / 2 && zfs_arc_min == 0)
    -        arc_c_min = arc_meta_limit / 2;
    +    /* limit meta-data to 1/2 of the arc capacity */
    +    arc_meta_limit = arc_c_max / 2;
    +    if (arc_c_min < arc_meta_limit / 4 && zfs_arc_min == 0)
    +        arc_c_min = arc_meta_limit / 4;

         /* if kmem_flags are set, lets try to use less memory */
         if (kmem_debugging())
    diff -r bec4e9eb1f01 usr/src/uts/common/fs/zfs/dbuf.c
    --- a/usr/src/uts/common/fs/zfs/dbuf.c  Fri Jun 01 08:24:48 2007 -0700
    +++ b/usr/src/uts/common/fs/zfs/dbuf.c  Sat Jun 02 22:09:52 2007 +0200
    @@ -470,7 +470,7 @@ dbuf_read_impl(dmu_buf_impl_t *db, zio_t
         if (db->db_blkid == DB_BONUS_BLKID) {
             ASSERT3U(db->db_dnode->dn_bonuslen, ==, db->db.db_size);
             db->db.db_data = zio_buf_alloc(DN_MAX_BONUSLEN);
    -        arc_space_consume(512);
    +        arc_space_consume(DN_MAX_BONUSLEN);
             if (db->db.db_size < DN_MAX_BONUSLEN)
                 bzero(db->db.db_data, DN_MAX_BONUSLEN);
             bcopy(DN_BONUS(db->db_dnode->dn_phys), db->db.db_data,
Jürgen Keil
2007-Jun-04 18:04 UTC
[zfs-discuss] Re: Deterioration with zfs performance and recent zfs bits?
I wrote:

> Instead of compiling opensolaris for 4-6 hours, I've now used
> the following find / grep test using on-2007-05-30 sources:
>
> 1st test using Nevada build 60:
>
> % cd /files/onnv-2007-05-30
> % repeat 10 /bin/time find usr/src/ -name "*.[hc]" -exec grep FooBar {} +

This find + grep command basically:

- does a recursive scan looking for *.h and *.c files
- at the end of the recursive directory scan, invokes one grep command with ~20000 filename args.

Simplifying the test a bit more: snv_60 is able to cache all the meta-data for a compiled onnv source tree on a 32-bit x86 machine with 768 MB of physical memory. Each iteration prints /bin/time's timing and the wc output (19355 19355 772864, identical every run); both are shown on one line below:

    % cd /files/wos_b67
    % repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc"

    real 2:11.7  user 0.2  sys 3.2    19355 19355 772864
    real    2.4  user 0.1  sys 1.4    19355 19355 772864
    real    2.2  user 0.1  sys 1.5    19355 19355 772864
    real    2.0  user 0.1  sys 1.4    19355 19355 772864
    real 1:21.8  user 0.2  sys 1.7    19355 19355 772864   <<< seems that some meta-data was freed here...
    real 1:21.0  user 0.2  sys 1.7    19355 19355 772864
    real   45.9  user 0.1  sys 1.6    19355 19355 772864
    real    3.2  user 0.1  sys 1.3    19355 19355 772864
    real    1.9  user 0.1  sys 1.3    19355 19355 772864
    real    2.8  user 0.1  sys 1.3    19355 19355 772864

(and the next 10 finds all completed in ~2 seconds per find)

build 67 is unable to cache the meta-data, for the same find command on the same zfs:

    % cd /files/wos_b67
    % repeat 10 sh -c "/bin/time find usr/src/ -name '*.[hc]' -print|wc"

    real 3:20.7  user 0.5  sys 7.5    19355 19355 772864
    real 3:07.0  user 0.5  sys 5.5    19355 19355 772864
    real 2:44.6  user 0.5  sys 4.7    19355 19355 772864
    real 2:06.1  user 0.4  sys 3.9    19355 19355 772864
    real 1:16.1  user 0.4  sys 3.5    19355 19355 772864
    real   33.0  user 0.4  sys 2.7    19355 19355 772864
    real   40.8  user 0.4  sys 3.0    19355 19355 772864
    real   18.8  user 0.3  sys 2.6    19355 19355 772864
    real 2:32.2  user 0.4  sys 4.2    19355 19355 772864
    real 2:05.4  user 0.4  sys 3.9    19355 19355 772864
Robert Milkowski
2007-Jun-05 08:00 UTC
[zfs-discuss] Re: Re: Deterioration with zfs performance and recent zfs bits?
Hello Jürgen,

Monday, June 4, 2007, 7:09:59 PM, you wrote:

>> > Patching zfs_prefetch_disable = 1 has helped
>> It's my belief this mainly aids scanning metadata. my
>> testing with rsync and yours with find (and seen with
>> du & ; zpool iostat -v 1 ) bears this out..
>> mainly tracked in bug 6437054 vdev_cache: wise up or die
>> http://www.opensolaris.org/jive/thread.jspa?messageID=42212
>>
>> so to link your code, it might help, but if one ran
>> a clean down the tree, it would hurt compile times.

JK> I think the slowdown that I'm observing is due to the changes
JK> that have been made for 6542676 "ARC needs to track meta-data
JK> memory overhead".

JK> There is now a limit of 1/4 of arc size ("arc_meta_limit")
JK> for zfs meta-data.

Not good - I have some systems with TBs of meta-data mostly. I guess there's some tunable...

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Jürgen Keil
2007-Jun-05 09:59 UTC
[zfs-discuss] Re: Re: Re: Deterioration with zfs performance and recent zfs bits?
> Hello Jürgen,
>
> Monday, June 4, 2007, 7:09:59 PM, you wrote:
>
> >> > Patching zfs_prefetch_disable = 1 has helped
> >> It's my belief this mainly aids scanning metadata. my
> >> testing with rsync and yours with find (and seen with
> >> du & ; zpool iostat -v 1 ) bears this out..
> >> mainly tracked in bug 6437054 vdev_cache: wise up or die
> >> http://www.opensolaris.org/jive/thread.jspa?messageID=42212
> >>
> >> so to link your code, it might help, but if one ran
> >> a clean down the tree, it would hurt compile times.
>
> JK> I think the slowdown that I'm observing is due to the changes
> JK> that have been made for 6542676 "ARC needs to track meta-data
> JK> memory overhead".
>
> JK> There is now a limit of 1/4 of arc size ("arc_meta_limit")
> JK> for zfs meta-data.
>
> Not good - I have some systems with TBs of meta-data mostly.
> I guess there's some tunable...

AFAICT, you can patch the kernel global variable "arc_meta_limit" at run time, using mdb -wk (the variable should be visible in build 66 or newer). But you can't tune it via an /etc/system "set" command.