I have a huge problem with space maps on thumper. Space maps take over 3GB,
and write operations generate massive read operations: before every spa sync
phase, zfs reads the space maps from disk.

I decided to turn on compression for the pool ( only for the pool, not the
filesystems ) and it helps. Now the space maps, intent log, and spa history
are compressed.

Now I'm thinking about disabling checksums. All metadata is written in 2
copies, so when I have compression=on do I need checksums? Will zfs try to
read the second block when zio_decompress_data returns an error?

Is there another way to check the space map compression ratio? Now I'm using
"# zdb -bb pool" but it takes hours.


This message posted from opensolaris.org
On Sep 14, 2007, at 8:16 AM, Łukasz wrote:

> I have a huge problem with space maps on thumper. Space maps take
> over 3GB
> and write operations generate massive read operations.
> Before every spa sync phase zfs reads space maps from disk.
>
> I decided to turn on compression for pool ( only for pool, not
> filesystems ) and it helps.
> Now space maps, intent log, spa history are compressed.

How did you do that?

> Now I'm thinking about disabling checksums. All metadata is
> written in 2 copies,
> so when I have compression=on do I need checksums ?

They are separate things. If you want data integrity, then you need
to leave checksums enabled.

> Will zfs try to read the second block when zio_decompress_data
> returns an error ?
>
> Is there another way to check space map compression ratio ?
> Now I'm using "# zdb -bb pool" but it takes hours.

# zdb -v <pool>
...
Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:                103
        bp logical:          1572352    avg:  15265
        bp physical:          120832    avg:   1173    compression:  13.01
        bp allocated:         296448    avg:   2878    compression:   5.30
        SPA allocated:        296448    used:  0.00%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     3  12.0K   1.50K   4.50K   1.50K    8.00     1.55  deferred free
     1    512     512   1.50K   1.50K    1.00     0.52  object directory
     2     1K      1K   3.00K   1.50K    1.00     1.04  object array
     1    16K   1.50K   4.50K   4.50K   10.67     1.55  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
     5   528K   34.0K    102K   20.4K   15.53    35.23  bplist
     -      -       -       -       -       -        -  bplist header
     -      -       -       -       -       -        -  SPA space map header
     3  12.0K   1.50K   4.50K   1.50K    8.00     1.55  SPA space map
     -      -       -       -       -       -        -  ZIL intent log
    51   816K   54.0K    113K   2.21K   15.11    38.86  DMU dnode
     8     8K      4K   8.50K   1.06K    2.00     2.94  DMU objset
     -      -       -       -       -       -        -  DSL directory
     4     2K      2K   6.00K   1.50K    1.00     2.07  DSL directory child map
     3  1.50K   1.50K   4.50K   1.50K    1.00     1.55  DSL dataset snap map
     4     2K      2K   6.00K   1.50K    1.00     2.07  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS ACL
     6  3.00K   3.00K   3.00K     512    1.00     1.04  ZFS plain file
     5  2.50K   2.50K   5.00K      1K    1.00     1.73  ZFS directory
     3  1.50K   1.50K   3.00K      1K    1.00     1.04  ZFS master node
     3  1.50K   1.50K   3.00K      1K    1.00     1.04  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
     1   128K   6.00K   18.0K   18.0K   21.33     6.22  SPA history
     -      -       -       -       -       -        -  SPA history offsets
     -      -       -       -       -       -        -  Pool properties
     -      -       -       -       -       -        -  DSL permissions
   103  1.50M    118K    290K   2.81K   13.01   100.00  Total

                     capacity   operations   bandwidth   ---- errors ----
description         used avail  read write  read write   read write cksum
d                   290K 24.4G   874     0 1023K     0      0     0     0
  /dev/dsk/c1t1d0s7 290K 24.4G   874     0 1023K     0      0     0     0
fsh-hake#

eric
> > On Sep 14, 2007, at 8:16 AM, Łukasz wrote:
> >
> > > I have a huge problem with space maps on thumper.
> > > Space maps take over 3GB
> > > and write operations generate massive read operations.
> > > Before every spa sync phase zfs reads space maps from disk.
> > >
> > > I decided to turn on compression for pool ( only for pool, not
> > > filesystems ) and it helps.
> > > Now space maps, intent log, spa history are compressed.
> >
> > How did you do that?

# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
zpool       7.99G  59.0G    19K  /zpool
zpool/data  7.98G  59.0G  6.16G  /zpool/data

Then:

# zfs set compress=off zpool/data
# zfs set compress=on zpool

If you do not keep any files in /zpool, then only metadata blocks
will be compressed.

# zdb -bbb zpool

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     1    16K      1K   3.00K   3.00K   16.00     0.00    L1 deferred free
     3  12.0K      2K   6.00K      2K    6.00     0.00    L0 deferred free
     4  28.0K   3.00K   9.00K   2.25K    9.33     0.00  deferred free
     -      -       -       -       -       -        -  SPA space map header
     5  80.0K   6.00K   18.0K   3.60K   13.33     0.00    L1 SPA space map
    56   224K    158K    473K   8.44K    1.42     0.01    L0 SPA space map
    61   304K    164K    491K   8.04K    1.86     0.01  SPA space map

> > > Now I'm thinking about disabling checksums. All metadata is
> > > written in 2 copies,
> > > so when I have compression=on do I need checksums ?
> >
> > They are separate things. If you want data integrity, then you need
> > to leave checksums enabled.

Why not keep the checksum inside the compressed block, after the
compressed data? Then we would not have to use 2 blocks.

> > > Will zfs try to read the second block when zio_decompress_data
> > > returns an error ?
> > >
> > > Is there another way to check space map compression ratio ?
> > > Now I'm using "# zdb -bb pool" but it takes hours.
> >
> > # zdb -v <pool>
> > ...
> > Traversing all blocks to verify checksums and verify
> > nothing leaked ...

I don't want to traverse all blocks.
Łukasz wrote:
> I have a huge problem with space maps on thumper. Space maps take over 3GB
> and write operations generate massive read operations.
> Before every spa sync phase zfs reads space maps from disk.
>
> I decided to turn on compression for pool ( only for pool, not filesystems ) and it helps.

That is extremely hard to believe (given that all you actually did was
turn on compression for a 19k filesystem).

> Now space maps, intent log, spa history are compressed.

All normal metadata (including space maps and spa history) is always
compressed. The intent log is never compressed.

> Now I'm thinking about disabling checksums. All metadata is written in 2 copies,
> so when I have compression=on do I need checksums ?

Yes, you need checksums, otherwise silent hardware errors will become
silent data corruption. You can not turn off checksums on metadata.
Turning off checksums may have some tiny impact because it will cause
the level-1 indirect blocks to compress better.

> Is there another way to check space map compression ratio ?
> Now I'm using "# zdb -bb pool" but it takes hours.

You can probably do it with "zdb -vvv pool | less" and look for each of
the space map files in the MOS. This is printed pretty early on, after
which you can kill off the zdb.

--matt
>> Now space maps, intent log, spa history are compressed.
>
> All normal metadata (including space maps and spa history) is always
> compressed. The intent log is never compressed.

Can you tell me where the space map is compressed? The buffer is
filled up with:

	*entry++ = SM_OFFSET_ENCODE(start) |
	    SM_TYPE_ENCODE(maptype) |
	    SM_RUN_ENCODE(run_len);

and later dmu_write is called.

I want to propose a few optimizations here:

- The space map block size should be dynamic ( the 4KB buffer is a bug ).
  My space map on thumper takes over 3.5 GB / 4kB = 855k blocks.
- The space map should be compressed before dividing:
  1. fill a larger block with data
  2. compress it
  3. divide it into blocks and then write
- The other thing is memory usage: the space map uses "kmem_alloc_40"
  for allocating the space map in memory. During the sync phase, after
  removing a snapshot, kmem_alloc_40 takes over 13GB RAM and the system
  is swapping.

My question is: when are you going to optimize the space map? We are
having big problems here with ZFS due to the space map and fragmentation.
We have to lower recordsize and disable the zil.
Łukasz K wrote:
>>> Now space maps, intent log, spa history are compressed.
>>
>> All normal metadata (including space maps and spa history) is always
>> compressed. The intent log is never compressed.
>
> Can you tell me where the space map is compressed ?

We specify that it should be compressed in dbuf_sync_leaf():

	if (dmu_ot[dn->dn_type].ot_metadata) {
		checksum = os->os_md_checksum;
		compress = zio_compress_select(dn->dn_compress,
		    os->os_md_compress);

os_md_compress is set to ZIO_COMPRESS_LZJB in dmu_objset_open_impl(),
so the compression will happen in lzjb_compress().

> I want to propose a few optimizations here:
> - The space map block size should be dynamic ( the 4KB buffer is a bug ).
>   My space map on thumper takes over 3.5 GB / 4kB = 855k blocks.

A small block size is used because we typically have to keep the last
block of every space map in memory (as we are constantly appending to
it). This is a trade-off between memory usage and time taken to load
the space map.

--matt