thr3ads.net - Btrfs devel - [Btrfs-devel] btrfs and Solid State Disks (SSD). [Feb 2008]


Miguel Figueiredo Mascarenhas Sousa Filipe

2008-Feb-11 10:39 UTC

[Btrfs-devel] btrfs and Solid State Disks (SSD).

Hi there,

This might sound stupid, but I cannot infer the answer to my doubts
from the documentation and features of btrfs, so here it goes:

Is the data and metadata on-disk layout of btrfs favorable for SSDs?

From what I read, current SSDs are characterized by:
- poor performance in random writes (because of block erasure)
- require wear leveling, even those that emulate sata/ide/scsi disk
with onboard wear leveling logic.
- excellent seek latency
- excellent read (random and sequential) performance.
- good at sequential writes.

I've read that journaling file systems are usually bad for SSD because
of (from what I suppose are) two things:
- increased "random" write load (journal + proper data)
- write hot spot on the journal, causing lots of write cycles on a
given set of blocks.

Theoretically, ext2 would be better for a SSD than ext3 because of these issues.

So, is the design of btrfs a good match for the peculiarities of SSDs ?

Kind regards,
patiently waiting for btrfs to mature and become the default linux filesystem ;D

--
Miguel Sousa Filipe




Chris Mason

2008-Feb-11 12:14 UTC


On Monday 11 February 2008, Miguel Figueiredo Mascarenhas Sousa Filipe wrote:
> Hi there,
>
> This might sound stupid, but I cannot infer the answer to my doubts
> from the documentation and features of btrfs, so here it goes:
>
> Is the data and metadata ondisk layout of btrfs favorable for SSDs ?
Yes, SSDs are a big target of mine, and so the parts that are not currently 
favorable for SSDs will be changed.  The big problem right now is that btrfs 
writes to a fixed super block for every commit.  That will change to a 
rotating set of fixed super blocks to lower wear and improve redundancy.
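
The rotation described above can be sketched roughly as follows. This is only an illustration, not Btrfs code: the slot offsets, the "generation modulo slot count" rule, and the function names are all assumptions made for the example.

```python
from typing import Dict

# Assumed byte offsets of the fixed super block slots (hypothetical values).
SUPER_SLOTS = [64 * 1024, 64 * 1024 * 1024, 256 * 1024 * 1024]


def slot_for_commit(generation: int) -> int:
    """Return the offset where the super block for this commit is written."""
    return SUPER_SLOTS[generation % len(SUPER_SLOTS)]


def newest_super(copies: Dict[int, int]) -> int:
    """Given {offset: generation} read back at mount, pick the newest copy."""
    return max(copies, key=copies.get)


def writes_per_slot(commits: int) -> Dict[int, int]:
    """Count how many super block writes each slot receives."""
    counts = {offset: 0 for offset in SUPER_SLOTS}
    for generation in range(commits):
        counts[slot_for_commit(generation)] += 1
    return counts


if __name__ == "__main__":
    # Each slot takes a third of the writes instead of one slot taking all.
    print(writes_per_slot(300))
```

With several slots, each one absorbs only a fraction of the commits, and an older copy remains readable if the newest one is damaged.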
>
> From what I read, current SSDs are characterized by:
>
> - poor performance in random writes (because of block erasure)
Random writes are very fast, as long as they fill an entire erasure block 
(often 128KB).  There are rumors that SSDs will have smaller erasure blocks 
in the future.
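
To see why filling a whole erasure block matters, here is a small arithmetic sketch. It assumes a deliberately naive device model in which every touched erasure block is rewritten in full and writes start on an erasure block boundary; real firmware is more sophisticated, and the function names are made up for the example.

```python
ERASE_BLOCK = 128 * 1024  # "often 128KB", as mentioned above


def bytes_programmed(write_size: int, erase_block: int = ERASE_BLOCK) -> int:
    """Bytes the flash must program for an aligned write under the naive model."""
    blocks_touched = -(-write_size // erase_block)  # ceiling division
    return blocks_touched * erase_block


def write_amplification(write_size: int, erase_block: int = ERASE_BLOCK) -> float:
    """Ratio of bytes programmed on flash to bytes requested by the host."""
    return bytes_programmed(write_size, erase_block) / write_size


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 128 * 1024):
        print(size, write_amplification(size))
```

Under this model a lone 4KB write costs as much flash programming as a full 128KB write, while a write that fills the erasure block has no overhead.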
> - require wear leveling, even those that emulate sata/ide/scsi disk
> with onboard wear leveling logic.
Yes, they all require wear leveling, and usually they do this internally.  
Btrfs will not do wear leveling.
> - excellent seek latency
> - excellent read (random and sequential) performance.
> - good at sequential writes.
>
> I've read that journaling file systems are usually bad for SSD because
> of (from what I suppose are) two things:
> - increased "random" write load (journal + proper data)
> - write hot spot on the journal, causing lots of write cycles on a
> given set of blocks.
>
> Theoretically, ext2 would be better for a SSD than ext3 because of these
> issues.
Journaled filesystems will definitely exercise the wear leveling firmware, but 
so will ext2.  In both cases the metadata and file data blocks are in fixed 
locations and use small block sizes.  So, metadata-heavy workloads will hammer 
on the SSD either way.
>
> So, is the design of btrfs a good match for the peculiarities of SSDs ?
>
Yes, because Btrfs is copy on write it is able to always cluster metadata and 
data writes in an optimal fashion on the SSD.  On traditional storage you get 
very bad performance with this type of allocation model because it will have 
many more seeks on reads.  But with SSD, there is no read penalty.
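
A toy model of the clustering effect described above: because a copy-on-write filesystem never overwrites in place, it is free to put each new block at the next free physical location, so logically random writes land sequentially on the device. The class below is a hypothetical illustration, not the Btrfs allocator, and it ignores free space reclamation entirely.

```python
class CowAllocator:
    """Append-only allocator: every write goes to the next free physical block."""

    def __init__(self, start: int = 0):
        self.next_free = start
        self.mapping = {}  # logical block number -> current physical block

    def write(self, logical_block: int) -> int:
        """Place the new version of logical_block and return its physical block."""
        physical = self.next_free
        self.next_free += 1
        # The old physical block (if any) is simply no longer referenced.
        self.mapping[logical_block] = physical
        return physical


if __name__ == "__main__":
    alloc = CowAllocator()
    # Random logical writes become one sequential physical run.
    print([alloc.write(block) for block in (907, 3, 512, 44, 3)])
```

The price is that logically adjacent blocks end up scattered over time, which costs seeks on a spinning disk but not on an SSD.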

With v0.12, I introduced a knob to tune the allocator for SSD (clearly there 
is much more work to do here).  You mount with -o ssd to enable the tuning.  
Here is an example graph on an SSD device with postmark, which basically does 
random writes to a bunch of files:

http://oss.oracle.com/~mason/seekwatcher/pm-compare.png

Here is the same workload on a traditional sata drive, but with ext3 in the 
results:

http://oss.oracle.com/~mason/seekwatcher/postmark/postmark-compare.png

Notice that on the spinning sata drive btrfs -o ssd isn't much faster
overall.  This is because the write cache is enabled on the drive.  With the 
write cache turned off, -o ssd is about 2x faster than the defaults.
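
When comparing runs like these, it can help to confirm which option a filesystem was actually mounted with. The sketch below parses text in the /proc/mounts format (device, mount point, type, options, two numeric fields). It assumes the kernel reports the ssd option there, which may not hold for every Btrfs version, and the helper names and sample device are hypothetical.

```python
def mount_options(mounts_text: str, mount_point: str):
    """Return the option list for mount_point, or None if it is not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def has_ssd_option(mounts_text: str, mount_point: str) -> bool:
    """True if mount_point is listed with the 'ssd' mount option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ssd" in options


if __name__ == "__main__":
    # Sample text; on a live system you would read /proc/mounts instead.
    sample = "/dev/sdb1 /mnt/flash btrfs rw,ssd 0 0\n/dev/sda1 / ext3 rw 0 0\n"
    print(has_ssd_option(sample, "/mnt/flash"))
```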

-chris

Claudio Martins

2008-Feb-11 12:35 UTC


On Monday 11 February 2008, Miguel Figueiredo Mascarenhas Sousa Filipe wrote:
>
> From what I read, current SSDs are characterized by:
>
> - poor performance in random writes (because of block erasure)
> - require wear leveling, even those that emulate sata/ide/scsi disk
> with onboard wear leveling logic.
 I thought that SSDs that emulate scsi disks (like USB pen drives, etc.), and 
also CompactFlash and SD cards, had no need of additional wear leveling.  As 
you say, they have onboard wear leveling logic, because they were basically 
designed to work well with FAT16/32.  Still, I'd really like to hear if anyone 
on the list has more information about this problem.
> - excellent seek latency
> - excellent read (random and sequential) performance.
> - good at sequential writes.
>
> I've read that journaling file systems are usually bad for SSD because
> of (from what I suppose are) two things:
> - increased "random" write load (journal + proper data)
> - write hot spot on the journal, causing lots of write cycles on a
> given set of blocks.
 From what I understand of the design (please correct me if I'm wrong), btrfs 
shouldn't have much of a problem in this particular area, since there really 
is no specific journal area. When something is changed (data or metadata) the 
data is not rewritten in place. Instead, new blocks are allocated and then 
the tree is updated all the way up to the root node.
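
The update path described above can be sketched as a path-copying tree. This is a generic illustration of the technique, not the Btrfs b-tree: the Node class, the tuple paths, and cow_update are all made up for the example.

```python
class Node:
    """Tree node that is never modified after construction."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data


def cow_update(node, path, data):
    """Return a new root with data stored at path; the old tree is untouched."""
    if not path:
        return Node(children=node.children if node else None, data=data)
    head, rest = path[0], path[1:]
    old_children = node.children if node else {}
    new_children = dict(old_children)  # siblings are shared, not copied
    new_children[head] = cow_update(old_children.get(head), rest, data)
    return Node(children=new_children, data=node.data if node else None)


if __name__ == "__main__":
    root1 = cow_update(None, ("a", "x"), "v1")
    root1 = cow_update(root1, ("b", "y"), "w1")
    root2 = cow_update(root1, ("a", "x"), "v2")
    # Only the nodes on the path a -> x were newly allocated for root2.
    print(root2.children["b"] is root1.children["b"])
```

Every node on the path from the changed leaf to the root gets a fresh copy, while untouched subtrees are shared between the old and new roots, so no fixed location is rewritten repeatedly.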

 Best regards

Cláudio Martins
