Matthew Anderson
2011-Apr-27 07:09 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Hi All,

I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.

Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers. ZIL is currently disabled. There is currently one failed disk that is being replaced shortly.

Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol 134? I attempted to remove Sol 11 and reinstall 134, but it keeps freezing during install, which is probably another issue entirely...

iostat output is below. When running zpool iostat -v 2, that level of write ops and throughput stays very constant.

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MirrorPool  12.2T  4.11T    153  4.63K  6.06M  33.6M
  mirror    1.04T   325G     11    416   400K  2.80M
    c7t0d0      -      -      5    114   163K  2.80M
    c7t1d0      -      -      6    114   237K  2.80M
  mirror    1.04T   324G     10    374   426K  2.79M
    c7t2d0      -      -      5    108   190K  2.79M
    c7t3d0      -      -      5    107   236K  2.79M
  mirror    1.04T   324G     15    425   537K  3.15M
    c7t4d0      -      -      7    115   290K  3.15M
    c7t5d0      -      -      8    116   247K  3.15M
  mirror    1.04T   325G     13    412   572K  3.00M
    c7t6d0      -      -      7    115   313K  3.00M
    c7t7d0      -      -      6    116   259K  3.00M
  mirror    1.04T   324G     13    381   580K  2.85M
    c7t8d0      -      -      7    111   362K  2.85M
    c7t9d0      -      -      5    111   219K  2.85M
  mirror    1.04T   325G     15    408   654K  3.10M
    c7t10d0     -      -      7    122   336K  3.10M
    c7t11d0     -      -      7    123   318K  3.10M
  mirror    1.04T   325G     14    461   681K  3.22M
    c7t12d0     -      -      8    130   403K  3.22M
    c7t13d0     -      -      6    132   278K  3.22M
  mirror     749G   643G      1    279   140K  1.07M
    c4t14d0     -      -      0      0      0      0
    c7t15d0     -      -      1     83   140K  1.07M
  mirror    1.05T   319G     18    333   672K  2.74M
    c7t16d0     -      -     11     96   406K  2.74M
    c7t17d0     -      -      7     96   266K  2.74M
  mirror    1.04T   323G     13    353   540K  2.85M
    c7t18d0     -      -      7     98   279K  2.85M
    c7t19d0     -      -      6    100   261K  2.85M
  mirror    1.04T   324G     12    459   543K  2.99M
    c7t20d0     -      -      7    118   285K  2.99M
    c7t21d0     -      -      4    119   258K  2.99M
  mirror    1.04T   324G     11    431   465K  3.04M
    c7t22d0     -      -      5    116   195K  3.04M
    c7t23d0     -      -      6    117   272K  3.04M
  c8t2d0        0  29.5G      0      0      0      0
cache           -      -      -      -      -      -
  c8t3d0    59.4G  3.88M    113     64  6.51M  7.31M
  c8t1d0    59.5G    48K     95     69  5.69M  8.08M

Thanks
-Matt
Andrew Gabriel
2011-Apr-27 07:41 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Matthew Anderson wrote:
> Hi All,
>
> I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.
>
> Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers. ZIL is currently disabled.

How/Why?

> There is currently one failed disk that is being replaced shortly.
>
> Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol 134?

What does "zfs get sync" report?

--
Andrew Gabriel
Matthew Anderson
2011-Apr-27 07:43 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Andrew Gabriel wrote:
> What does "zfs get sync" report?

NAME                   PROPERTY  VALUE     SOURCE
MirrorPool             sync      disabled  local
MirrorPool/CCIT        sync      disabled  local
MirrorPool/EX01        sync      disabled  inherited from MirrorPool
MirrorPool/EX02        sync      disabled  inherited from MirrorPool
MirrorPool/FileStore1  sync      disabled  inherited from MirrorPool

Sync was disabled on the main pool and then left to inherit to everything else. The reason for disabling it in the first place was to fix bad NFS write performance (even with the ZIL on an X25-E SSD it was under 1MB/s). I've also tried setting logbias to throughput and to latency, but both perform at around the same level.

Thanks
-Matt
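P.S. For completeness, the properties above were set roughly like this (shown at the pool level only, which is an assumption on the logbias side; it may also have been tried per dataset):

# zfs set sync=disabled MirrorPool
# zfs set logbias=throughput MirrorPool
# zfs set logbias=latency MirrorPool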
Markus Kovero
2011-Apr-27 11:00 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> Sync was disabled on the main pool and then left to inherit to everything else. The
> reason for disabling it in the first place was to fix bad NFS write performance (even with
> the ZIL on an X25-E SSD it was under 1MB/s).
> I've also tried setting logbias to throughput and to latency, but both perform
> at around the same level.
>
> Thanks
> -Matt

I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.

If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):

# echo metaslab_debug/W1 | mdb -kw

And calculating the amount of RAM needed:

/usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out

awk '/segments/ {s+=$2} END {printf("sum=%d\n",s)}' /tmp/zdb-mm.out

93373117   sum of segments
16         VDEVs
116        metaslabs per VDEV
1856       metaslabs in total

93373117 / 1856 = 50308   average number of segments per metaslab

50308 * 1856 * 64 = 5975785472

5975785472 / 1024 / 1024 / 1024 = 5.56

= 5.56 GB

Yours
Markus Kovero
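(A one-liner that produces the same estimate directly, under the same assumptions as the calculation above - per-metaslab segment counts appear on "segments" lines in the zdb -mm output, and roughly 64 bytes of kernel memory are needed per segment:)

/usr/sbin/amd64/zdb -mm <poolname> | \
    awk '/segments/ {s+=$2} END {printf("segments=%d  approx. RAM needed=%.2f GB\n", s, s*64/1024/1024/1024)}'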
Tomas Ögren
2011-Apr-27 11:14 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 27 April, 2011 - Matthew Anderson sent me these 3,2K bytes:

> I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.
>
> Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers.
>
> [rest of the message and most of the zpool iostat output trimmed]
>
>   c8t2d0        0  29.5G      0      0      0      0

Btw, this disk seems alone, unmirrored and a bit small..?

> cache           -      -      -      -      -      -
>   c8t3d0    59.4G  3.88M    113     64  6.51M  7.31M
>   c8t1d0    59.5G    48K     95     69  5.69M  8.08M
>
> Thanks
> -Matt

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
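(If c8t2d0 really is a lone top-level data vdev, and not a leftover log device, it could be given a mirror after the fact with something like the following; the second device name is a placeholder, not an actual device from the pool:)

# zpool attach MirrorPool c8t2d0 <new-device>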
zfs user
2011-Apr-27 23:13 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 4/27/11 4:00 AM, Markus Kovero wrote:
> [...]
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> And calculating the amount of RAM needed:
>
> /usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out

metaslab     65   offset  41000000000   spacemap    258   free
Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab

Is this something I should worry about?

uname -a
SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris

> [rest of the RAM calculation trimmed]
Markus Kovero
2011-Apr-28 09:51 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo,
> spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab
>
> Is this something I should worry about?
>
> uname -a
> SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris

I thought we were talking about Solaris 11 Express, not OI? Anyway, I have no idea how OpenIndiana should or shouldn't behave here.

Yours
Markus Kovero
Stephan Budach
2011-Apr-28 12:39 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> And calculating the amount of RAM needed:
>
> /usr/sbin/amd64/zdb -mm <poolname> > /tmp/zdb-mm.out
>
> awk '/segments/ {s+=$2} END {printf("sum=%d\n",s)}' /tmp/zdb-mm.out
>
> [rest of the calculation trimmed]
>
> Yours
> Markus Kovero

Out of curiosity, I just tried the command on one of my zpools, which has a number of ZFS volumes, and was presented with the following:

root at solaris11c:/obelixData/99999_Testkunde/01_Etat01# /usr/sbin/amd64/zdb -mm obelixData > /tmp/zdb-mm.out
WARNING: can't open objset for obelixData/15035_RWE
zdb: can't open 'obelixData': I/O error

So one of the volumes seems to have a problem I wasn't aware of - but that volume seems to behave just normally and doesn't show any erratic behaviour.

Can anybody shed some light on what might be wrong with that particular volume?

Thanks,
budy
Stephan Budach
2011-Apr-28 13:04 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 28.04.11 11:51, Markus Kovero wrote:
>> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo,
>> spa->spa_meta_objset) == 0, file ../zdb.c, line 571, function dump_metaslab
>>
>> Is this something I should worry about?
>>
>> uname -a
>> SunOS E55000 5.11 oi_148 i86pc i386 i86pc Solaris
>
> I thought we were talking about Solaris 11 Express, not OI?
> Anyway, I have no idea how OpenIndiana should or shouldn't behave here.
>
> Yours
> Markus Kovero

Maybe this doesn't have anything to do with Sol11Expr vs. oi. I get the same error on one of my Sol11Expr hosts when I run it against a 50 TB zpool:

root at solaris11a:~# uname -a
SunOS solaris11a 5.11 snv_151a i86pc i386 i86pc

root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab

Cheers,
budy
Victor Latushkin
2011-Apr-28 13:16 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On Apr 28, 2011, at 5:04 PM, Stephan Budach wrote:

> Maybe this doesn't have anything to do with Sol11Expr vs. oi. I get the same error on one of my Sol11Expr hosts when I run it against a 50 TB zpool:
>
> root at solaris11a:~# uname -a
> SunOS solaris11a 5.11 snv_151a i86pc i386 i86pc
>
> root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab

zdb is not intended to be run against changing pools/datasets.
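(One way to get a consistent zdb run - a general suggestion on my part, not something stated in this thread - is to point zdb at a pool that isn't changing underneath it, for example an exported pool via zdb -e; that is rarely practical on a busy production pool, though:)

# zpool export backupPool_01
# zdb -e -mm backupPool_01 > /tmp/zdb-mm.out
# zpool import backupPool_01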
Stephan Budach
2011-Apr-28 13:21 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 28.04.11 15:16, Victor Latushkin wrote:
>> root at solaris11a:~# /usr/sbin/amd64/zdb -mm backupPool_01 > /tmp/zdb-mm.out
>> Assertion failed: space_map_load(sm, zfs_metaslab_ops, SM_FREE, smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 593, function dump_metaslab
>
> zdb is not intended to be run against changing pools/datasets.

Well, that explains it then (and probably also the I/O error I got from my other zpool). Thanks.

Cheers,
budy
Jamie Krier
2011-Jun-28 18:13 UTC
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Markus Kovero <Markus.Kovero at nebula.fi> writes:

>> Sync was disabled on the main pool and then left to inherit to everything else. The
>> reason for disabling it in the first place was to fix bad NFS write performance (even with
>> the ZIL on an X25-E SSD it was under 1MB/s).
>> I've also tried setting logbias to throughput and to latency, but both perform
>> at around the same level.
>
> I believe you're hitting bug "7000208: Space map trashing affects NFS write throughput". We did too, and it impacted iSCSI as well.
>
> If you have enough RAM you can try enabling metaslab debug (which makes the problem vanish):
>
> # echo metaslab_debug/W1 | mdb -kw
>
> [RAM calculation trimmed]
>
> Yours
> Markus Kovero

We are running Solaris 11 Express. We hit this same problem when we crossed 80% usage on our mirror pool. Write performance dropped to 3-8 MB/sec on 48 mirror sets!

echo metaslab_debug/W1 | mdb -kw

resolves the problem instantly. What are the ramifications of running this command?

Thanks
- Jamie
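(A note on persistence, based on the usual Solaris tunable mechanism rather than anything confirmed in this thread: the mdb -kw write only patches the running kernel, so the setting is lost at reboot. The conventional way to make such a tunable stick across reboots was an /etc/system entry along these lines:)

* keep metaslab space maps cached in memory (persistent form of the mdb -kw tweak)
set zfs:metaslab_debug = 1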