I've got two recent examples of SSDs. Their pristine state from the
manufacturer shows:

Device Model: OCZ-VERTEX3
# hexdump -C /dev/sdd
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
1bf2976000

Device Model: OCZ VERTEX PLUS (OCZ VERTEX 2E)
# hexdump -C /dev/sdd
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
df99e6000

What's a good way to test what state they get erased to from a TRIM
operation?

Can btrfs detect the erase state and pad unused space in filesystem
writes with the same value so as to reduce SSD wear?

Regards,

Martin

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
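[A rough sketch of how one could test the TRIMed read-back state. The safe
part below simulates the two pristine states with small files so you know
what hexdump output to expect; the real test (commented out) needs root,
a Linux blkdiscard(8) that the drive supports, and DESTROYS DATA on the
target device. /dev/sdX is a placeholder, not a real device name here.]

```shell
# Simulate the two pristine states seen above with 4 KiB files.
# All-zero, as the Vertex 3 reports for unmapped LBAs:
dd if=/dev/zero of=zeros.bin bs=4096 count=1 2>/dev/null
hexdump -C zeros.bin

# All-0xff, the raw erased-flash state, as on the Vertex Plus:
tr '\0' '\377' < zeros.bin > ffs.bin
hexdump -C ffs.bin

# On a real scratch device, the test would be roughly:
#   dd if=/dev/urandom of=/dev/sdX bs=1M count=16 oflag=direct  # known data
#   blkdiscard --offset 0 --length 16M /dev/sdX                 # issue TRIM
#   echo 1 > /proc/sys/vm/drop_caches                           # avoid cached reads
#   hexdump -C /dev/sdX | head                                  # observe erase state
```

Whichever value (00 or ff, or even the old data, since TRIM results are
not guaranteed deterministic unless the drive advertises it) comes back
is what the drive returns for discarded sectors.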
On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
> I've got two recent examples of SSDs. Their pristine state from the
> manufacturer shows:
>
> Device Model: OCZ-VERTEX3
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>
> Device Model: OCZ VERTEX PLUS
> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>
> What's a good way to test what state they get erased to from a TRIM
> operation?

This pristine state probably matches up with the result of a trim
command on the drive. In particular, a freshly erased flash block is in
a state where the bits are all 1, so the Vertex Plus drive is showing
you the flash contents directly. The Vertex 3 has substantially more
processing, and the 0s are effectively generated on the fly for unmapped
flash blocks (similar to how the missing portions of a sparse file
contain 0s).

> Can btrfs detect the erase state and pad unused space in filesystem
> writes with the same value so as to reduce SSD wear?

On the Vertex 3, this wouldn't actually do what you'd hope. The firmware
in that drive compresses, deduplicates, and encrypts all the data prior
to writing it to flash - and as a result, the data that hits the flash
looks nothing like what the filesystem wrote.

(For best performance, it might make sense to disable btrfs's built-in
compression on the Vertex 3 drive to allow the drive's compression to
kick in. Let us know if you benchmark it either way.)

The benefit to doing this on the Vertex Plus is probably fairly small,
since rewriting a block - even one that is partially unwritten - is
still likely to require a read-modify-write cycle with an erase step.
The granularity of the erase blocks is just too big for the savings to
be very meaningful.

-- 
Calvin Walton <calvin.walton@kepstin.ca>
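[The sparse-file analogy above is easy to demonstrate: unwritten regions
of a sparse file read back as zeros even though no storage backs them,
much as the Vertex 3 synthesizes zeros for unmapped LBAs. A small sketch;
the filenames are illustrative, and the allocated-block count reported by
`ls -ls` will vary by filesystem.]

```shell
# Create a 1 MiB sparse file by seeking to the last byte and writing it.
dd if=/dev/zero of=sparse.bin bs=1 count=1 seek=1048575 2>/dev/null

# Apparent size is 1 MiB, but allocated blocks are far fewer:
ls -ls sparse.bin

# Reading the hole returns zeros generated on the fly, not stored data:
hexdump -C sparse.bin | head -n 3
```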
On 23/05/12 05:19, Calvin Walton wrote:
> On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
>> I've got two recent examples of SSDs. Their pristine state from the
>> manufacturer shows:
>
>> Device Model: OCZ-VERTEX3
>> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>
>> Device Model: OCZ VERTEX PLUS
>> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>
>> What's a good way to test what state they get erased to from a TRIM
>> operation?
>
> This pristine state probably matches up with the result of a trim
> command on the drive. In particular, a freshly erased flash block is in
> a state where the bits are all 1, so the Vertex Plus drive is showing
> you the flash contents directly. The Vertex 3 has substantially more
> processing, and the 0s are effectively generated on the fly for unmapped
> flash blocks (similar to how the missing portions of a sparse file
> contain 0s).

So for that example of reading an 'empty' drive, the OCZ-VERTEX3 might
not even be reading the flash chips at all!...

>> Can btrfs detect the erase state and pad unused space in filesystem
>> writes with the same value so as to reduce SSD wear?
>
> On the Vertex 3, this wouldn't actually do what you'd hope. The firmware
> in that drive actually compresses, deduplicates, and encrypts all the
> data prior to writing it to flash - and as a result the data that hits
> the flash looks nothing like what the filesystem wrote.
>
> (For best performance, it might make sense to disable btrfs's built-in
> compression on the Vertex 3 drive to allow the drive's compression to
> kick in. Let us know if you benchmark it either way.)

Very good comment, thanks.

That leaves a very good question of how the Sandforce controller uses
the flash. Does it implement its own 'virtual block level' interface and
then manage the underlying flash with structures that are not visible
externally? What does that do to concerns about alignment?...
And for what granularity of write chunks?

> The benefit to doing this on the Vertex Plus is probably fairly small,
> since to rewrite a block - even if the block is partially unwritten - is
> still likely to require a read-modify-write cycle with an erase step.
> The granularity of the erase blocks is just too big for the savings to
> be very meaningful.

My understanding is that the 'wear' mechanism in flash is a problem of
charge getting trapped in the insulating material that surrounds the
floating gate of a cell. The trapped charge accumulates further with
each change of state, until the resulting offset voltage exceeds what
can be tolerated for correct operation of the cell.

Hence, writing the *same value* as that already stored in a cell should
not cause any wear, since you are not changing the state of the cell.
(No change in charge levels.)

For non-Sandforce controllers, that suggests doing a read-modify-write
to pad out whatever the minimum-sized write chunk is. That would be
rather poor for performance, and the manufacturers' secrecy means we
cannot be sure of the underlying write block size for minimum-sized
alignment.

Alternatively, padding out writes with the erased-state value means that
no further wear should be caused when that block is eventually
TRIMed/erased for rewriting.

That should also be a 'soft' option for the Sandforce controllers, in
that /hopefully/ their compression/deduplication will compress down the
padding so as not to be a problem.

(Damn the manufacturers' secrecy!)

Regards,

Martin
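[The padding idea above can be sketched in a few lines. This is purely
illustrative: the 4096-byte chunk size is an assumption (as noted, the
real NAND page size is not published), and the filenames are made up. The
point is just the arithmetic - round a partial write up to the assumed
chunk boundary and fill the tail with the erased-state value 0xff, so
those cells are never programmed away from their erased state.]

```shell
CHUNK=4096                          # assumed write-chunk size (unknown in reality)

printf 'some metadata payload' > payload.bin
SIZE=$(wc -c < payload.bin)

# Bytes of 0xff padding needed to reach the next CHUNK boundary:
PAD=$(( (CHUNK - SIZE % CHUNK) % CHUNK ))

# Append PAD bytes of 0xff (erased-state value) to the payload.
dd if=/dev/zero bs=1 count="$PAD" 2>/dev/null | tr '\0' '\377' >> payload.bin

wc -c < payload.bin                 # now an exact multiple of CHUNK
```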
On Wed, 2012-05-23 at 16:44 +0100, Martin wrote:
> On 23/05/12 05:19, Calvin Walton wrote:
> > On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
> >> I've got two recent examples of SSDs. Their pristine state from the
> >> manufacturer shows:
> >
> >> Device Model: OCZ-VERTEX3
> >> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> >
> >> Device Model: OCZ VERTEX PLUS
> >> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>
> >> Can btrfs detect the erase state and pad unused space in filesystem
> >> writes with the same value so as to reduce SSD wear?
>
> > The benefit to doing this on the Vertex Plus is probably fairly small,
> > since to rewrite a block - even if the block is partially unwritten - is
> > still likely to require a read-modify-write cycle with an erase step.
> > The granularity of the erase blocks is just too big for the savings to
> > be very meaningful.
>
> My understanding is that the 'wear' mechanism in flash is a problem of
> charge getting trapped in the insulation material itself that surrounds
> the floating gate of a cell. The permanently trapped charge accumulates
> further for each change of state until a high enough offset voltage has
> accumulated to exceed what can be tolerated for correct operation of the
> cell.
>
> Hence, writing the *same value* as that already stored for a cell
> should not cause any wear, as you are not changing the state of a
> cell. (No change in charge levels.)
>
> For non-Sandforce controllers, that suggests doing a read-modify-write
> to pad out whatever minimum sized write chunk. That would be rather poor
> for performance, and the manufacturer's secrecy means we cannot be sure
> of the underlying write block size for minimum sized alignment.

It's very unlikely that the firmware in any modern high-performance SSD
would ever do an in-place read-modify-write sequence.
If you write data to the same sector on the disc twice, it is more
likely to actually be written to two different places in the flash. A
flash erase block typically won't be reused until all of the data that
had been in it has been rewritten somewhere else. The Indilinx
controller in the Vertex 1 drives has a garbage collector that runs in
the background, looking for flash erase blocks that have been partially
rewritten and consolidating the remaining data from multiple blocks into
one block to free space for future writes.

> Alternatively, padding out writes with the erased state value means that
> no further wear should be caused for when that block is eventually
> TRIMed/erased for rewriting.

It is certainly possible that this could be the case. The difference is
likely to be fairly minimal. But unless you are an SSD manufacturer,
you'll probably never know how much actual difference it would make :)

-- 
Calvin Walton <calvin.walton@kepstin.ca>