valrhona at gmail.com
2010-Apr-11 06:32 UTC
[zfs-discuss] ZFS effective short-stroking and connection to thin provisioning?
A theoretical question on how ZFS works, for the experts on this board. I am wondering how and where ZFS puts the physical data on a mechanical hard drive. In the past, I have spent a lot of money on 15K rpm SCSI and then SAS drives, which of course have great performance. However, given the increase in areal density in modern consumer SATA drives, similar performance can be reached by short-stroking those drives: the outermost tracks are similar in performance to the average of the 15K drives, and sometimes exceed their peak.

My question is how ZFS lays the data out on the disk, and whether there's a way to capture some of this effect. It seems inefficient to physically short-stroke any of the drives; it would be more sensible to have ZFS handle this (if in fact it has this capability). If I am using mirrored pairs of 2 TB drives but only have a few hundred GB of data, and only the outer tracks are used, then in practice the performance should be similar to nearly-full 15K drives. Given that ZFS can also thin provision, thereby disconnecting the virtual space from the physical space on the drives, how does the data layout maximize performance?

The practical question: I have something like 600 GB of data on a mirrored pair of 2 TB Hitachi SATA drives, with compression and deduplication. Before, I had a RAID5 of four 147 GB 10K rpm Seagate Savvio 10K.2 2.5" SAS drives on a Dell PERC5/i caching RAID controller. The old RAID was nearly full (20-30 GB free) and performed substantially slower than the current setup in daily use (noticeably slower disk access and transfer rates), presumably because the drives were nearly full. I'm curious whether, if I switched from these two disks to the new Western Digital Velociraptors (10K rpm SATA), I could even tell the difference. Or, because those drives would be nearly full, would the whole setup be slower?
-- 
This message posted from opensolaris.org
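For what it's worth, one rough way to see the outer-track effect directly is to compare sequential reads from the two ends of a disk. This is only a sketch: the raw device path c1t0d0p0 is a placeholder for one of the 2 TB drives, and the skip value simply aims near the end of a 2 TB disk.

    # time a 1 GB sequential read from the start of the disk (low LBAs, outer tracks)
    time dd if=/dev/rdsk/c1t0d0p0 of=/dev/null bs=1024k count=1024

    # same amount from near the end of the disk (high LBAs, inner tracks)
    time dd if=/dev/rdsk/c1t0d0p0 of=/dev/null bs=1024k skip=1800000 count=1024

On most drives the first read finishes noticeably faster; that difference is what short-stroking (and, indirectly, a mostly empty pool) exploits.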
Richard Elling
2010-Apr-11 15:50 UTC
[zfs-discuss] ZFS effective short-stroking and connection to thin provisioning?
On Apr 10, 2010, at 11:32 PM, valrhona at gmail.com wrote:

> A theoretical question on how ZFS works, for the experts on this board. I am wondering how and where ZFS puts the physical data on a mechanical hard drive. In the past, I have spent a lot of money on 15K rpm SCSI and then SAS drives, which of course have great performance. However, given the increase in areal density in modern consumer SATA drives, similar performance can be reached by short-stroking those drives: the outermost tracks are similar in performance to the average of the 15K drives, and sometimes exceed their peak.

HDDs and performance do not mix. SSDs win. Game over.

> My question is how ZFS lays the data out on the disk, and whether there's a way to capture some of this effect. It seems inefficient to physically short-stroke any of the drives; it would be more sensible to have ZFS handle this (if in fact it has this capability). If I am using mirrored pairs of 2 TB drives but only have a few hundred GB of data, and only the outer tracks are used, then in practice the performance should be similar to nearly-full 15K drives. Given that ZFS can also thin provision, thereby disconnecting the virtual space from the physical space on the drives, how does the data layout maximize performance?

In general, the space with the lower-numbered LBA is allocated first. For many HDDs, the lower-numbered LBAs are on the outer cylinders. An easy way to see the allocations at a high level is to look at the metaslab statistics:

    # zdb -m syspool
    Metaslabs:
            vdev          0
            metaslabs   148   offset                spacemap          free
            ---------------   -------------------   ---------------   -------------
            metaslab      0   offset            0   spacemap     26   free     476M
            metaslab      1   offset     40000000   spacemap     41   free     481M
            metaslab      2   offset     80000000   spacemap     44   free     974M
            metaslab      3   offset     c0000000   spacemap     45   free     935M
            metaslab      4   offset    100000000   spacemap     46   free    1007M
            metaslab      5   offset    140000000   spacemap    110   free     935M
            metaslab      6   offset    180000000   spacemap    111   free    1019M
            metaslab      7   offset    1c0000000   spacemap      0   free       1G
            metaslab      8   offset    200000000   spacemap      0   free       1G
            metaslab      9   offset    240000000   spacemap      0   free       1G
            ...
            metaslab     27   offset    6c0000000   spacemap      0   free       1G
            metaslab     28   offset    700000000   spacemap     25   free    1012M
            metaslab     29   offset    740000000   spacemap     40   free    1011M
            metaslab     30   offset    780000000   spacemap      0   free       1G
            metaslab     31   offset    7c0000000   spacemap      0   free       1G
            metaslab     32   offset    800000000   spacemap      0   free       1G
            ...

Most of the data is allocated in the lower-numbered metaslabs. A bit later you can see where the redundant metadata is written. The rest is mostly free space. Remember that ZFS uses COW, so new writes will go to the free areas.

> The practical question: I have something like 600 GB of data on a mirrored pair of 2 TB Hitachi SATA drives, with compression and deduplication. Before, I had a RAID5 of four 147 GB 10K rpm Seagate Savvio 10K.2 2.5" SAS drives on a Dell PERC5/i caching RAID controller. The old RAID was nearly full (20-30 GB free) and performed substantially slower than the current setup in daily use (noticeably slower disk access and transfer rates), presumably because the drives were nearly full. I'm curious whether, if I switched from these two disks to the new Western Digital Velociraptors (10K rpm SATA), I could even tell the difference. Or, because those drives would be nearly full, would the whole setup be slower?

Yes, the drives will be able to push more media under the head. It is not clear that this will always give better performance.
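A quick way to tell whether the disks are actually the bottleneck before and after such a swap is to watch the device-level numbers under your normal workload. A sketch only; 'tank' is a placeholder pool name.

    # per-vdev bandwidth and IOPS, sampled every 5 seconds
    zpool iostat -v tank 5

    # per-disk service times and %busy
    iostat -xn 5

If the disks are rarely close to 100% busy, a faster spindle is unlikely to make a noticeable difference.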
Also, for writes, as the pool fills, it becomes more difficult to allocate free space. This is not a ZFS-only phenomenon; all file systems have some sort of allocation policy. However, there have been improvements in this area for ZFS over the past year or so.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com