I have a huge problem with space maps on thumper. Space maps take over 3GB,
and write operations generate massive read operations: before every spa sync
phase, zfs reads the space maps from disk.

I decided to turn on compression for the pool ( only for the pool, not the
filesystems ) and it helps. Now the space maps, intent log, and spa history
are compressed.

Now I'm thinking about disabling checksums. All metadata is written in 2
copies, so when I have compression=on do I need checksums? Will zfs try to
read the second block when zio_decompress_data returns an error?

Is there another way to check the space map compression ratio? Now I'm using
"# zdb -bb pool" but it takes hours.


This message posted from opensolaris.org
On Sep 14, 2007, at 8:16 AM, Łukasz wrote:

> I have a huge problem with space maps on thumper. Space maps take
> over 3GB
> and write operations generate massive read operations.
> Before every spa sync phase zfs reads space maps from disk.
>
> I decided to turn on compression for pool ( only for pool, not
> filesystems ) and it helps.
> Now space maps, intent log, spa history are compressed.

How did you do that?

> Now I'm thinking about disabling checksums. All metadata is
> written in 2 copies,
> so when I have compression=on do I need checksums ?

They are separate things. If you want data integrity, then you need
to leave checksums enabled.

> Will zfs try to read the second block when zio_decompress_data
> returns an error ?
>
> Is there another way to check space map compression ratio ?
> Now I'm using "# zdb -bb pool" but it takes hours.

# zdb -v <pool>
...
Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:                103
        bp logical:          1572352    avg:  15265
        bp physical:          120832    avg:   1173    compression:  13.01
        bp allocated:         296448    avg:   2878    compression:   5.30
        SPA allocated:        296448    used:  0.00%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     3  12.0K   1.50K   4.50K   1.50K    8.00     1.55  deferred free
     1    512     512   1.50K   1.50K    1.00     0.52  object directory
     2     1K      1K   3.00K   1.50K    1.00     1.04  object array
     1    16K   1.50K   4.50K   4.50K   10.67     1.55  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
     5   528K   34.0K    102K   20.4K   15.53    35.23  bplist
     -      -       -       -       -       -        -  bplist header
     -      -       -       -       -       -        -  SPA space map header
     3  12.0K   1.50K   4.50K   1.50K    8.00     1.55  SPA space map
     -      -       -       -       -       -        -  ZIL intent log
    51   816K   54.0K    113K   2.21K   15.11    38.86  DMU dnode
     8     8K      4K   8.50K   1.06K    2.00     2.94  DMU objset
     -      -       -       -       -       -        -  DSL directory
     4     2K      2K   6.00K   1.50K    1.00     2.07  DSL directory child map
     3  1.50K   1.50K   4.50K   1.50K    1.00     1.55  DSL dataset snap map
     4     2K      2K   6.00K   1.50K    1.00     2.07  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS ACL
     6  3.00K   3.00K   3.00K     512    1.00     1.04  ZFS plain file
     5  2.50K   2.50K   5.00K      1K    1.00     1.73  ZFS directory
     3  1.50K   1.50K   3.00K      1K    1.00     1.04  ZFS master node
     3  1.50K   1.50K   3.00K      1K    1.00     1.04  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
     1   128K   6.00K   18.0K   18.0K   21.33     6.22  SPA history
     -      -       -       -       -       -        -  SPA history offsets
     -      -       -       -       -       -        -  Pool properties
     -      -       -       -       -       -        -  DSL permissions
   103  1.50M    118K    290K   2.81K   13.01   100.00  Total

                     capacity   operations   bandwidth   ---- errors ----
description         used avail  read write  read write   read write cksum
d                   290K 24.4G   874     0 1023K     0      0     0     0
  /dev/dsk/c1t1d0s7 290K 24.4G   874     0 1023K     0      0     0     0
fsh-hake#

eric
> > On Sep 14, 2007, at 8:16 AM, Łukasz wrote:
> >
> > > I have a huge problem with space maps on thumper.
> > > Space maps take over 3GB
> > > and write operations generate massive read operations.
> > > Before every spa sync phase zfs reads space maps from disk.
> > >
> > > I decided to turn on compression for pool ( only for pool, not
> > > filesystems ) and it helps.
> > > Now space maps, intent log, spa history are compressed.
> >
> > How did you do that?

# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
zpool       7.99G  59.0G    19K  /zpool
zpool/data  7.98G  59.0G  6.16G  /zpool/data

Then:

# zfs set compress=off zpool/data
# zfs set compress=on zpool

If you do not keep any files in /zpool, then only metadata blocks
will be compressed.

# zdb -bbb zpool

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     1    16K      1K   3.00K   3.00K   16.00     0.00    L1 deferred free
     3  12.0K      2K   6.00K      2K    6.00     0.00    L0 deferred free
     4  28.0K   3.00K   9.00K   2.25K    9.33     0.00  deferred free
     -      -       -       -       -       -        -  SPA space map header
     5  80.0K   6.00K   18.0K   3.60K   13.33     0.00    L1 SPA space map
    56   224K    158K    473K   8.44K    1.42     0.01    L0 SPA space map
    61   304K    164K    491K   8.04K    1.86     0.01  SPA space map

> > > Now I'm thinking about disabling checksums. All metadata is
> > > written in 2 copies,
> > > so when I have compression=on do I need checksums ?
> >
> > They are separate things. If you want data integrity, then you need
> > to leave checksums enabled.

Why not keep the checksum inside the compressed block, after the
compressed data? Then we would not have to use 2 blocks.

> > > Will zfs try to read the second block when zio_decompress_data
> > > returns an error ?
> > >
> > > Is there another way to check space map compression ratio ?
> > > Now I'm using "# zdb -bb pool" but it takes hours.
> >
> > # zdb -v <pool>
> > ...
> > Traversing all blocks to verify checksums and verify
> > nothing leaked ...

I don't want to traverse all blocks.
Łukasz wrote:
> I have a huge problem with space maps on thumper. Space maps take over 3GB
> and write operations generate massive read operations.
> Before every spa sync phase zfs reads space maps from disk.
>
> I decided to turn on compression for pool ( only for pool, not filesystems ) and it helps.

That is extremely hard to believe (given that all you actually did was
turn on compression for a 19k filesystem).

> Now space maps, intent log, spa history are compressed.

All normal metadata (including space maps and spa history) is always
compressed. The intent log is never compressed.

> Now I'm thinking about disabling checksums. All metadata is written in 2 copies,
> so when I have compression=on do I need checksums ?

Yes, you need checksums, otherwise silent hardware errors will become
silent data corruption. You can not turn off checksums on metadata.
Turning off checksums may have some tiny impact because it will cause
the level-1 indirect blocks to compress better.

> Is there another way to check space map compression ratio ?
> Now I'm using "# zdb -bb pool" but it takes hours.

You can probably do it with "zdb -vvv pool | less" and look for each of
the space map files in the MOS. This is printed pretty early on, after
which you can kill off the zdb.

--matt
>> Now space maps, intent log, spa history are compressed.
>
> All normal metadata (including space maps and spa history) is always
> compressed. The intent log is never compressed.

Can you tell me where the space map is compressed? The buffer is
filled up with:

	*entry++ = SM_OFFSET_ENCODE(start) |
	    SM_TYPE_ENCODE(maptype) |
	    SM_RUN_ENCODE(run_len);

and later dmu_write is called.

I want to propose a few optimizations here:

- The space map block size should be dynamic ( the 4KB buffer is a bug ).
  My space map on thumper takes over 3.5 GB / 4kB = 855k blocks.
- The space map should be compressed before dividing:
  1. fill a larger block with data
  2. compress it
  3. divide it into blocks and then write
- The other thing is memory usage: the space map uses "kmem_alloc_40"
  for allocating the space map in memory. During the sync phase, after
  removing a snapshot, kmem_alloc_40 takes over 13GB RAM and the system
  is swapping.

My question is: when are you going to optimize the space map? We are
having big problems here with ZFS due to the space map and fragmentation.
We have to lower recordsize and disable the zil.
Łukasz K wrote:
>>> Now space maps, intent log, spa history are compressed.
>>
>> All normal metadata (including space maps and spa history) is always
>> compressed. The intent log is never compressed.
>
> Can you tell me where the space map is compressed ?

We specify that it should be compressed in dbuf_sync_leaf():

	if (dmu_ot[dn->dn_type].ot_metadata) {
		checksum = os->os_md_checksum;
		compress = zio_compress_select(dn->dn_compress,
		    os->os_md_compress);

os_md_compress is set to ZIO_COMPRESS_LZJB in dmu_objset_open_impl(),
so the compression will happen in lzjb_compress().

> I want to propose a few optimizations here:
> - The space map block size should be dynamic ( the 4KB buffer is a bug ).
>   My space map on thumper takes over 3.5 GB / 4kB = 855k blocks.

A small block size is used because we typically have to keep the last
block of every space map in memory (as we are constantly appending to
it). This is a trade-off between memory usage and time taken to load
the space map.

--matt