Constantin Gonzalez
2006-May-05 08:19 UTC
[zfs-discuss] ZFS: More information on ditto blocks?
Hi,

(apologies if this was discussed before; I _did_ some research, but this one may have slipped past me...)

Looking through the current Sun ZFS Technical presentation, I found a ZFS feature that was new to me: Ditto Blocks. In search of more information, I asked Google, but there seems to be no real information on Ditto Blocks other than the source code.

From the Ditto Block slide, I conclude that:

- ZFS blocks can have multiple copies (up to 3), even on the same disk, but preferably on multiple disks, if possible.
- The uber-block has an additional 3 copies (we already knew that).
- The ZFS metadata structure has 2 or more copies (that was new to me).
- In the future, users will be able to ask for multiple copies of their data (wow, what a great feature for laptop users with big, but single disks!)

Can someone elaborate on ditto blocks? Perhaps that would be a great blog entry (Google didn't find anything for "site:blogs.sun.com zfs ditto blocks").

In particular:

- Are regular data blocks multiplied by default if the disk isn't mirrored/raid-z'ed and there's enough space?
- What are the general rules on what blocks get multiplied how often?

Best regards,
   Constantin

--
Constantin Gonzalez
Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions
http://www.sun.de/
Tel.: +49 89/4 60 08-25 91
http://blogs.sun.com/constantin/
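The placement preference in the slide summary (up to three copies, on separate disks when possible, otherwise on the same disk) can be pictured with a toy round-robin policy. This is a hypothetical sketch, not ZFS's allocator: the function name `place_copies`, the vdev names and the round-robin choice are all assumptions made for illustration.

```python
# Toy model of ditto-block placement, for illustration only.
# Assumption: this is NOT how ZFS allocates blocks; it only mirrors the
# preference described on the slide: spread copies over different
# top-level vdevs when possible, otherwise put several copies on one vdev.

MAX_COPIES = 3  # the slide says a block can have up to 3 copies


def place_copies(copies: int, vdevs: list[str], start: int = 0) -> list[str]:
    """Return the (hypothetical) vdev chosen for each copy of one block."""
    if not 1 <= copies <= MAX_COPIES:
        raise ValueError(f"copies must be between 1 and {MAX_COPIES}")
    if not vdevs:
        raise ValueError("need at least one top-level vdev")
    # Walk the vdevs round-robin from `start`; with fewer vdevs than
    # copies, the walk wraps around and a vdev receives more than one copy.
    return [vdevs[(start + i) % len(vdevs)] for i in range(copies)]


if __name__ == "__main__":
    # A single-disk laptop pool: every copy lands on the same disk.
    print(place_copies(3, ["disk0"]))
    # A pool with three top-level vdevs: each copy gets its own vdev.
    print(place_copies(3, ["raidz-0", "raidz-1", "raidz-2"], start=1))
```

On a single-disk pool every copy shares one device, which protects against bad sectors but not against losing the whole disk; with three vdevs each copy can sit on its own vdev.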
On Fri, May 05, 2006 at 10:19:56AM +0200, Constantin Gonzalez wrote:
> (apologies if this was discussed before, I _did_ some research, but this
> one may have slipped for me...)

I'm in the process of writing a blog entry on this one. Give me another day or so.

> Looking through the current Sun ZFS Technical presentation, I found a ZFS
> feature that was new to me: Ditto Blocks.

It is (relatively speaking) a new feature. I did the on-disk work back in October (before the initial ZFS putback), but the complete functionality was only recently integrated (beginning of April). It is in Nevada build 38, and will appear in S10u2.

> In particular:
>
> - Are regular data blocks multiplied by default if the disk isn't mirrored/
>   raid-z'ed and there's enough space?

No, not yet. That's a future feature.

> - What are the general rules on what blocks get multiplied how often?

The rule right now is pretty straightforward:

   User data blocks:                           1 copy
   Per-filesystem metadata:                    2 copies
   Global (across all filesystems) metadata:   3 copies

My blog entry will hopefully explain things in more detail.

--Bill
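Bill's rule amounts to a fixed lookup from block category to copy count. The sketch below encodes it; `BlockClass`, `DITTO_COPIES` and `allocated_size` are hypothetical names for illustration, not ZFS identifiers. The size helper deliberately ignores mirror and RAID-Z overhead since, as we understand it, each ditto copy is stored with whatever redundancy its top-level vdev already provides.

```python
from enum import Enum


class BlockClass(Enum):
    # Hypothetical labels for the three categories in the rule above.
    USER_DATA = "user data"
    FS_METADATA = "per-filesystem metadata"
    GLOBAL_METADATA = "global (pool-wide) metadata"


# Copies per category as stated in the thread (Nevada build 38 era).
# The table is illustrative; ZFS does not expose it under these names.
DITTO_COPIES = {
    BlockClass.USER_DATA: 1,
    BlockClass.FS_METADATA: 2,
    BlockClass.GLOBAL_METADATA: 3,
}


def ditto_copies(kind: BlockClass) -> int:
    """How many copies of a block of this class get written."""
    return DITTO_COPIES[kind]


def allocated_size(kind: BlockClass, physical_bytes: int) -> int:
    """Bytes used by all copies, ignoring mirror/RAID-Z overhead."""
    return ditto_copies(kind) * physical_bytes


if __name__ == "__main__":
    for kind in BlockClass:
        print(f"{kind.value}: {ditto_copies(kind)} copies")
```

For example, `allocated_size(BlockClass.GLOBAL_METADATA, 4096)` gives 12288: a 4 KiB pool-wide metadata block costs three times its size before any vdev-level redundancy is counted.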
Bill Sommerfeld
2006-May-12 22:43 UTC
[zfs-discuss] ZFS: More information on ditto blocks?
Since I was curious, I've hacked up a version of zdb to collect statistics on ditto block counts and diversity (how many ditto blocks are actually on different top-level vdevs). diffs available on request.

here's some sample output on a pool which started life with one top-level vdev before the introduction of ditto blocks, was upgraded to support them, had about 120GB of data migrated into it, and then was grown to three top-level vdevs:

    : 3 #; time ./amd64/zdb -L -bD z

    Traversing all blocks to verify nothing leaked ...
    *** Live pool traversal; block counts are only approximate ***

    No leaks (block sum matches space maps exactly)

    bp count:           8266676
    bp logical:    269002290176    avg: 32540
    bp physical:   123969312768    avg: 14996    compression: 2.17
    bp allocated:  161739043840    avg: 19565    compression: 1.66
    SPA allocated: 161739043840    used: 14.77%

    Ditto blocks:
            1        %          2        %        3        %      Total  Type
          256     7.25          -                  3273    92.75       3529  R3 blocks
        63557     4.11    1484190    95.89        8     0.00    1547755  R2 blocks
      6715392   100.00                                          6715392  R1 blocks

    Ditto block diversity:
            1        %          2        %        3        %      Total  Type
          640    18.14          -                  2889    81.86       3529  R3 vdevs
       926581    59.87     621166    40.13        8     0.00    1547755  R2 vdevs
      6715392   100.00                                          6715392  R1 vdevs
------

presentation could use some work, but to summarize: the "R3"/"R2"/"R1" indicates counts of blocks at each expected replication level, and the column headers indicate either the actual number of ditto blocks, or the number of different vdevs on which the ditto blocks appear. the first table is unlikely to be interesting on pools born with ditto blocks enabled.

Anyhow, to explain this all a little better, for this pool, 18% of the triply-replicated blocks are on a single vdev and 82% are on all three; 60% of the doubly-replicated are on one vdev and 40% are on two.

93% of the triply-replicated blocks are actually triply replicated, while 7% aren't (presumably they haven't been rewritten since the pool was upgraded). Similarly, 4% of the doubly-replicated blocks haven't been rewritten since the upgrade.
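The percentages and ratios in that zdb output can be recomputed from its raw counts, which makes the summary above easy to check. A small sketch, assuming a "-" or blank cell means zero; all identifiers are ours, not zdb's.

```python
# Recompute the derived figures in the zdb output from its raw counters.
# Counter values are copied from that output; the names below are ours.

BP_COUNT = 8266676
BP_LOGICAL = 269002290176
BP_PHYSICAL = 123969312768
BP_ALLOCATED = 161739043840

# Counts with 1, 2 and 3 copies (first table) or on 1, 2 and 3 distinct
# vdevs (second table). A "-" or blank cell is taken as 0 (assumption).
# The 8 R2 blocks showing up with 3 copies are the ones suspected to be
# a counting bug.
DITTO_BLOCKS = {
    "R3": (256, 0, 3273),
    "R2": (63557, 1484190, 8),
    "R1": (6715392, 0, 0),
}
DITTO_DIVERSITY = {
    "R3": (640, 0, 2889),
    "R2": (926581, 621166, 8),
    "R1": (6715392, 0, 0),
}


def percentages(row: tuple[int, int, int]) -> list[float]:
    """Each column's share of the row total, to two decimals as zdb prints."""
    total = sum(row)
    return [round(100 * n / total, 2) for n in row]


def summary() -> dict[str, float]:
    """The avg and compression columns of the 'bp ...' lines."""
    return {
        # zdb's averages match integer division of bytes by block count
        "avg_logical": BP_LOGICAL // BP_COUNT,
        "avg_physical": BP_PHYSICAL // BP_COUNT,
        "avg_allocated": BP_ALLOCATED // BP_COUNT,
        # logical vs. physical: the saving from compression alone
        "compression": round(BP_LOGICAL / BP_PHYSICAL, 2),
        # logical vs. allocated: lower, since allocated space also pays for
        # the extra ditto copies and (as we read it) RAID-Z parity
        "allocated_ratio": round(BP_LOGICAL / BP_ALLOCATED, 2),
    }


if __name__ == "__main__":
    for name, table in (("copies", DITTO_BLOCKS), ("vdevs", DITTO_DIVERSITY)):
        for level, row in table.items():
            print(name, level, percentages(row))
    print(summary())
```

The R3 rows reproduce the 7.25/92.75 and 18.14/81.86 splits quoted in the summary, and the two ratios come out at 2.17 and 1.66, matching the two "compression" figures.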
(I believe the 8 blocks with an expected replication level of 2 that turn up with a replication level of 3 are due to a bug in my counting.)

By comparison, zpool iostat reports:

                  capacity     operations    bandwidth
    pool          used  avail   read  write   read  write
    -----------  -----  -----  -----  -----  -----  -----
    z             151G   869G     71     83   605K  1.11M
      raidz       105G   235G     55     28   497K   392K
        ...
      raidz      23.9G   316G      8     30  64.1K   431K
        ...
      raidz      21.8G   318G      9     33  60.9K   428K
        ...
    -----------  -----  -----  -----  -----  -----  -----

so, as I might have expected, the metadata is spreading out a lot faster than the unreplicated data.

- Bill
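To put numbers on that last remark, the sketch below turns the `used` column of the zpool iostat output into per-vdev shares. The values are the ones printed there; the labels `raidz-0` to `raidz-2` are hypothetical names for the three raidz vdevs in listed order, and reading the first one as the pool's original vdev is an inference from its size.

```python
# Share of allocated space per top-level vdev, from the zpool iostat
# output. Values are in GB as printed; because zpool rounds them, they
# sum to 150.7 rather than exactly the pool's 151G.
# raidz-0..raidz-2 are hypothetical labels in the order listed.
USED_GB = {"raidz-0": 105.0, "raidz-1": 23.9, "raidz-2": 21.8}


def shares(used: dict[str, float]) -> dict[str, float]:
    """Percentage of the summed `used` space held by each vdev (1 decimal)."""
    total = sum(used.values())
    return {name: round(100 * gb / total, 1) for name, gb in used.items()}


if __name__ == "__main__":
    for name, pct in shares(USED_GB).items():
        print(f"{name}: {pct}% of allocated space")
```

`shares(USED_GB)` gives roughly 69.7%, 15.9% and 14.5%: about 70% of allocated space still sits on the first vdev, while the diversity table shows 82% of triply-replicated blocks already spanning all three vdevs.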