Matthew Anderson
2011-Apr-27 07:09 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Hi All,

I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.

Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers. ZIL is currently disabled. There is currently one failed disk that is being replaced shortly.

Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol 134? I attempted to remove Sol 11 and reinstall 134, but it keeps freezing during install, which is probably another issue entirely...

iostat output is below. When running zpool iostat -v 2, that level of write ops and throughput stays very constant.

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MirrorPool  12.2T  4.11T    153  4.63K  6.06M  33.6M
  mirror    1.04T   325G     11    416   400K  2.80M
    c7t0d0      -      -      5    114   163K  2.80M
    c7t1d0      -      -      6    114   237K  2.80M
  mirror    1.04T   324G     10    374   426K  2.79M
    c7t2d0      -      -      5    108   190K  2.79M
    c7t3d0      -      -      5    107   236K  2.79M
  mirror    1.04T   324G     15    425   537K  3.15M
    c7t4d0      -      -      7    115   290K  3.15M
    c7t5d0      -      -      8    116   247K  3.15M
  mirror    1.04T   325G     13    412   572K  3.00M
    c7t6d0      -      -      7    115   313K  3.00M
    c7t7d0      -      -      6    116   259K  3.00M
  mirror    1.04T   324G     13    381   580K  2.85M
    c7t8d0      -      -      7    111   362K  2.85M
    c7t9d0      -      -      5    111   219K  2.85M
  mirror    1.04T   325G     15    408   654K  3.10M
    c7t10d0     -      -      7    122   336K  3.10M
    c7t11d0     -      -      7    123   318K  3.10M
  mirror    1.04T   325G     14    461   681K  3.22M
    c7t12d0     -      -      8    130   403K  3.22M
    c7t13d0     -      -      6    132   278K  3.22M
  mirror     749G   643G      1    279   140K  1.07M
    c4t14d0     -      -      0      0      0      0
    c7t15d0     -      -      1     83   140K  1.07M
  mirror    1.05T   319G     18    333   672K  2.74M
    c7t16d0     -      -     11     96   406K  2.74M
    c7t17d0     -      -      7     96   266K  2.74M
  mirror    1.04T   323G     13    353   540K  2.85M
    c7t18d0     -      -      7     98   279K  2.85M
    c7t19d0     -      -      6    100   261K  2.85M
  mirror    1.04T   324G     12    459   543K  2.99M
    c7t20d0     -      -      7    118   285K  2.99M
    c7t21d0     -      -      4    119   258K  2.99M
  mirror    1.04T   324G     11    431   465K  3.04M
    c7t22d0     -      -      5    116   195K  3.04M
    c7t23d0     -      -      6    117   272K  3.04M
  c8t2d0        0  29.5G      0      0      0      0
cache           -      -      -      -      -      -
  c8t3d0    59.4G  3.88M    113     64  6.51M  7.31M
  c8t1d0    59.5G    48K     95     69  5.69M  8.08M

Thanks
-Matt
Andrew Gabriel
2011-Apr-27 07:41 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Matthew Anderson wrote:
> Hi All,
>
> I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.
>
> Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers. ZIL is currently disabled.

How/Why?

> There is currently one failed disk that is being replaced shortly.
>
> Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol 134?

What does "zfs get sync" report?

--
Andrew Gabriel
Matthew Anderson
2011-Apr-27 07:43 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Andrew Gabriel wrote:
> What does "zfs get sync" report?

NAME                   PROPERTY  VALUE     SOURCE
MirrorPool             sync      disabled  local
MirrorPool/CCIT        sync      disabled  local
MirrorPool/EX01        sync      disabled  inherited from MirrorPool
MirrorPool/EX02        sync      disabled  inherited from MirrorPool
MirrorPool/FileStore1  sync      disabled  inherited from MirrorPool

Sync was disabled on the main pool and then left to inherit to everything else. The reason for disabling it in the first place was to fix bad NFS write performance (even with the ZIL on an X25-E SSD it was under 1MB/s). I've also tried setting logbias to throughput and to latency, but both perform at around the same level.

Thanks
-Matt
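P.S. For completeness, the properties above were set roughly like this (shown at the pool level only, which is an assumption on the logbias side; it may also have been tried per dataset):

# zfs set sync=disabled MirrorPool
# zfs set logbias=throughput MirrorPool
# zfs set logbias=latency MirrorPool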
Markus Kovero
2011-Apr-27 11:00 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> Sync was disabled on the main pool and then left to inherit to everything else. The
> reason for disabling it in the first place was to fix bad NFS write performance (even with
> the ZIL on an X25-E SSD it was under 1MB/s).
> I've also tried setting logbias to throughput and to latency, but both perform
> at around the same level.
>
> Thanks
> -Matt

I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.

If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):

# echo metaslab_debug/W1 | mdb -kw

And calculating the amount of RAM needed:

/usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out

awk '/segments/ {s+=$2} END {printf("sum=%d\n",s)}' /tmp/zdb-mm.out

93373117   sum of segments
16         VDEVs
116        metaslabs per VDEV
1856       metaslabs in total

93373117 / 1856 = 50308   average number of segments per metaslab

50308 * 1856 * 64 = 5975785472

5975785472 / 1024 / 1024 / 1024 = 5.56

= 5.56 GB

Yours
Markus Kovero
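(A one-liner that produces the same estimate directly, under the same assumptions as the calculation above - per-metaslab segment counts appear on "segments" lines in the zdb -mm output, and roughly 64 bytes of kernel memory are needed per segment:)

/usr/sbin/amd64/zdb -mm <poolname> | \
    awk '/segments/ {s+=$2} END {printf("segments=%d  approx. RAM needed=%.2f GB\n", s, s*64/1024/1024/1024)}'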
Tomas Ögren
2011-Apr-27 11:14 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 27 April, 2011 - Matthew Anderson sent me these 3,2K bytes:

> I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.
>
> Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers.
>
> [rest of the message and most of the zpool iostat output trimmed]
>
>   c8t2d0        0  29.5G      0      0      0      0

Btw, this disk seems alone, unmirrored and a bit small..?

> cache           -      -      -      -      -      -
>   c8t3d0    59.4G  3.88M    113     64  6.51M  7.31M
>   c8t1d0    59.5G    48K     95     69  5.69M  8.08M
>
> Thanks
> -Matt

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
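(If c8t2d0 really is a lone top-level data vdev, and not a leftover log device, it could be given a mirror after the fact with something like the following; the second device name is a placeholder, not an actual device from the pool:)

# zpool attach MirrorPool c8t2d0 <new-device>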
zfs user
2011-Apr-27 23:13 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 4/27/11 4:00 AM, Markus Kovero wrote:
> [...]
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> And calculating the amount of RAM needed:
>
> /usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out

metaslab     65   offset  41000000000   spacemap    258   free
Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab

Is this something I should worry about?

uname -a
SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris

> [rest of the RAM calculation trimmed]
Markus Kovero
2011-Apr-28 09:51 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo,
> spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab
>
> Is this something I should worry about?
>
> uname -a
> SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris

I thought we were talking about Solaris 11 Express, not OI? Anyway, I have no idea how OpenIndiana should or shouldn't behave here.

Yours
Markus Kovero
Stephan Budach
2011-Apr-28 12:39 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> And calculating the amount of RAM needed:
>
> /usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out
>
> awk '/segments/ {s+=$2} END {printf("sum=%d\n",s)}' /tmp/zdb-mm.out
>
> [rest of the calculation trimmed]
>
> Yours
> Markus Kovero

Out of curiosity, I just tried the command on one of my zpools, which has a number of ZFS volumes, and was presented with the following:

root at solaris11c:/obelixData/99999_Testkunde/01_Etat01# /usr/sbin/amd64/zdb -mm obelixData > /tmp/zdb-mm.out
WARNING: can't open objset for obelixData/15035_RWE
zdb: can't open 'obelixData': I/O error

So one of the volumes seems to have a problem I wasn't aware of - but that volume seems to behave just normally and doesn't show any erratic behaviour.

Can anybody shed some light on what might be wrong with that particular volume?

Thanks,
budy
Stephan Budach
2011-Apr-28 13:04 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 28.04.11 11:51, Markus Kovero wrote:
>> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo,
>> spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab
>>
>> Is this something I should worry about?
>>
>> uname -a
>> SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris
>
> I thought we were talking about Solaris 11 Express, not OI?
> Anyway, I have no idea how OpenIndiana should or shouldn't behave here.
>
> Yours
> Markus Kovero

Maybe this doesn't have anything to do with Sol11Expr vs. oi. I get the same error on one of my Sol11Expr hosts when I run it against a 50 TB zpool:

root at solaris11a:~# uname -a
SunOS solaris11a 5.11 snv_151a i86pc i386 i86pc

root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab

Cheers,
budy
Victor Latushkin
2011-Apr-28 13:16 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On Apr 28, 2011, at 5:04 PM, Stephan Budach wrote:

> Maybe this doesn't have anything to do with Sol11Expr vs. oi. I get the same error on one of my Sol11Expr hosts when I run it against a 50 TB zpool:
>
> root at solaris11a:~# uname -a
> SunOS solaris11a 5.11 snv_151a i86pc i386 i86pc
>
> root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab

zdb is not intended to be run against changing pools/datasets.
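(One way to get a consistent zdb run - a general suggestion on my part, not something stated in this thread - is to point zdb at a pool that isn't changing underneath it, for example an exported pool via zdb -e; that is rarely practical on a busy production pool, though:)

# zpool export backupPool_01
# zdb -e -mm backupPool_01 > /tmp/zdb-mm.out
# zpool import backupPool_01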
Stephan Budach
2011-Apr-28 13:21 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 28.04.11 15:16, Victor Latushkin wrote:
>> root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
>> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab
>
> zdb is not intended to be run against changing pools/datasets.

Well, that explains it then (and probably also the I/O error I got from my other zpool). Thanks.

Cheers,
budy
Jamie Krier
2011-Jun-28 18:13 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Markus Kovero <Markus.Kovero at nebula.fi> writes:

>> Sync was disabled on the main pool and then left to inherit to everything else. The
>> reason for disabling it in the first place was to fix bad NFS write performance (even with
>> the ZIL on an X25-E SSD it was under 1MB/s).
>> I've also tried setting logbias to throughput and to latency, but both perform
>> at around the same level.
>
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> [RAM calculation trimmed]
>
> Yours
> Markus Kovero

We are running Solaris 11 Express. We hit this same problem when we crossed 80% usage on our mirror pool. Write performance dropped to 3-8 MB/sec on 48 mirror sets!

echo metaslab_debug/W1 | mdb -kw

resolves the problem instantly. What are the ramifications of running this command?

Thanks
- Jamie
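(A note on persistence, based on the usual Solaris tunable mechanism rather than anything confirmed in this thread: the mdb -kw write only patches the running kernel, so the setting is lost at reboot. The conventional way to make such a tunable stick across reboots was an /etc/system entry along these lines:)

* keep metaslab space maps cached in memory (persistent form of the mdb -kw tweak)
set zfs:metaslab_debug = 1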