thr3ads.net - Btrfs devel - Future Linux filesystems [Jun 2008]

Thomas King

2008-Jun-02 21:46 UTC

Future Linux filesystems

Folks,

I am writing an article for Linux.com to answer Henry Newman's article at
http://www.enterprisestorageforum.com/sans/features/article.php/3749926
concerning Linux and massive filesystems. Is there someone here that can field
some questions about BTRFS?

Thanks!
Tom King
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas King

2008-Jun-03 14:37 UTC

Re: Future Linux filesystems

> All the issues he complains about actually are solved by XFS, and XFS
> actually does better in exactly these environments than either zfs on
> Solaris or JFS2 on AIX.

I asked the author that question and he states XFS is actually a pretty good
answer to most of those issues but believes it still falls short where "the
metadata areas are not aligned with RAID strips and allocation units are FAR too
small but better than ext." Another detail he brought out was sending data and
metadata to different devices in those environments and referenced RT XFS.
Otherwise having them on the same device increases the possibility of corruption
and/or a longer filesystem check/repair. Will btrfs offer something like this in
the future?

Do y'all foresee btrfs being used in exabyte installations?
Does/Will btrfs have RAID awareness in that it will align "the
superblock and metadata to the RAID stripe"?
What is the largest block allocation available?
Will btrfs be T10 DIF/block protect aware?
I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
specifically version 4.1?

Thanks!
Tom King






Joe Peterson

2008-Jun-03 15:02 UTC

Re: Future Linux filesystems

Thomas King wrote:
>> All the issues he complains about actually are solved by XFS, and XFS
>> actually does better in exactly these environments than either zfs on
>> Solaris or JFS2 on AIX.
>
> I asked the author that question and he states XFS is actually a pretty good
> answer to most of those issues but believes it still falls short where "the
> metadata areas are not aligned with RAID strips and allocation units are FAR
> too small but better than ext." Another detail he brought out was sending data
> and metadata to different devices in those environments and referenced RT XFS.
> Otherwise having them on the same device increases the possibility of
> corruption and/or a longer filesystem check/repair. Will btrfs offer something
> like this in the future?
>
> Do y'all foresee btrfs being used in exabyte installations?
> Does/Will btrfs have RAID awareness in that it will align "the
> superblock and metadata to the RAID stripe"?
> What is the largest block allocation available?
> Will btrfs be T10 DIF/block protect aware?
> I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
> specifically version 4.1?

You don't mention what I believe is the *key* issue (and I don't think
the author did either, but I skimmed his article): data integrity.  I'm
not talking about blatant failures or known need for an fsck, but rather
silent corruption.

Where I work, we are considering multi-petabyte scenarios, and with the
specs of current drives, we are talking hundreds of silent errors per
read of the volume of data - unacceptable.  With large filesystems (and
he's talking 100 PB, etc.), this is the #1 issue for me.
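
A back-of-the-envelope sketch of that kind of estimate, in Python. The error rate used here is an assumption picked purely for illustration (one bad bit per 10^14 bits read); it is not a figure from this thread, and real silent-error rates depend on the hardware. The function and constant names are made up for the example.

```python
def expected_bad_bits(volume_bytes: float, bit_error_rate: float) -> float:
    """Expected number of erroneous bits when every byte of the volume is read once."""
    return volume_bytes * 8 * bit_error_rate


PB = 10**15  # one petabyte in bytes (decimal)

# ASSUMPTION: one erroneous bit per 1e14 bits read. Illustrative only.
ASSUMED_BIT_ERROR_RATE = 1e-14

for petabytes in (1, 5, 100):
    errors = expected_bad_bits(petabytes * PB, ASSUMED_BIT_ERROR_RATE)
    print(f"{petabytes:>4} PB read once -> about {errors:.0f} expected bad bits")
```

Under that assumed rate, a volume of a few petabytes already lands in the hundreds per full read, and the count grows linearly with the volume size.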

						-Joe

Evgeniy Polyakov

2008-Jun-03 15:52 UTC

Re: Future Linux filesystems

Hi.

On Tue, Jun 03, 2008 at 09:37:27AM -0500, Thomas King (kingttx@tomslinux.homelinux.org) wrote:
> I asked the author that question and he states XFS is actually a pretty good
> answer to most of those issues but believes it still falls short where "the
> metadata areas are not aligned with RAID strips and allocation units are FAR
> too small but better than ext." Another detail he brought out was sending data
> and metadata to different devices in those environments and referenced RT XFS.
> Otherwise having them on the same device increases the possibility of
> corruption and/or a longer filesystem check/repair. Will btrfs offer something
> like this in the future?

Right now btrfs can be created on top of multiple devices.
AFAIK, there are no policies on how to put data and metadata between them.

> Do y'all foresee btrfs being used in exabyte installations?
> Does/Will btrfs have RAID awareness in that it will align "the
> superblock and metadata to the RAID stripe"?
> What is the largest block allocation available?
> Will btrfs be T10 DIF/block protect aware?
> I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
> specifically version 4.1?

The original author does not believe in networked filesystems as a key method
to organize large storage :)
The changes a filesystem needs in order to be exported via NFS are quite
simple, so that should not be a problem.

-- 
	Evgeniy Polyakov

Martin K. Petersen

2008-Jun-03 16:06 UTC

Re: Future Linux filesystems

>>>>> "Joe" == Joe Peterson <lavajoe@gentoo.org> writes:

Joe> You don't mention what I believe is the *key* issue (and I don't
Joe> think the author did either, but I skimmed his article): data
Joe> integrity.  I'm not talking about blatant failures or known need
Joe> for an fsck, but rather silent corruption.

We're very concerned about data integrity.  With btrfs everything is
checksummed at the logical level.  This allows you to detect data
corruption, repair bad blocks using redundant, good copies, perform
data scrubbing, etc.
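
The detect-and-repair idea can be sketched in a few lines of Python. This is a toy model and not btrfs code: the MirroredStore class, its two in-memory "devices", and the use of zlib's plain CRC-32 as the checksum are all assumptions made for the illustration.

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in checksum: plain CRC-32 from zlib (an assumption for this sketch).
    return zlib.crc32(block)


class MirroredStore:
    """Toy store keeping two copies of each block plus one checksum per block."""

    def __init__(self) -> None:
        self.copies = [{}, {}]  # two "devices": block number -> bytes
        self.sums = {}          # block number -> checksum recorded at write time

    def write(self, blockno: int, data: bytes) -> None:
        for copy in self.copies:
            copy[blockno] = data
        self.sums[blockno] = checksum(data)

    def read(self, blockno: int) -> bytes:
        expected = self.sums[blockno]
        for copy in self.copies:
            data = copy[blockno]
            if checksum(data) == expected:
                # Rewrite any copy that no longer matches the recorded checksum.
                for other in self.copies:
                    if checksum(other[blockno]) != expected:
                        other[blockno] = data
                return data
        raise IOError(f"block {blockno}: every copy fails its checksum")

    def scrub(self) -> None:
        """Read every block once so that bad copies are found and repaired."""
        for blockno in self.sums:
            self.read(blockno)


store = MirroredStore()
store.write(7, b"important data")
store.copies[0][7] = b"important dat\x00"  # simulate silent corruption on one device
print(store.read(7))                       # served from the good copy
print(store.copies[0][7])                  # the bad copy was rewritten on read
```

The point of the sketch is only that a checksum stored apart from the data lets a read tell a good copy from a bad one, which a plain mirror cannot do by itself.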

A related, but orthogonal data integrity measure is the T10 DIF
infrastructure that I am working on.  DIF enables protection at the
sector level and includes stuff like a data checksum and a locality
check which ensures that the sector ends up in the right place on disk.

If there is a mismatch, the I/O will be rejected by either the HBA or the
storage device.  That allows us to catch a lot of the corruption
scenarios where we accidentally write bad stuff to disk.
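
A minimal Python sketch of the two checks described above, assuming the usual DIF layout as I understand it: 8 bytes of protection information per sector, made of a 2-byte guard tag (a CRC-16 of the data using polynomial 0x8BB7), a 2-byte application tag, and a 4-byte reference tag holding the low 32 bits of the target LBA. The function names are made up here, and real HBAs and drives do this in hardware or firmware.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build an 8-byte protection tuple: guard tag, application tag, reference tag."""
    guard = crc16_t10dif(sector)      # data checksum
    ref = lba & 0xFFFFFFFF            # locality check: low 32 bits of the LBA
    return struct.pack(">HHI", guard, app_tag, ref)


def verify(sector: bytes, lba: int, protection: bytes) -> bool:
    """Return True only if both the data checksum and the locality check match."""
    guard, _app_tag, ref = struct.unpack(">HHI", protection)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)


sector = bytes(512)
protection = make_tuple(sector, lba=1000)
print(verify(sector, 1000, protection))                   # data and location match
print(verify(sector, 1001, protection))                   # misdirected write
print(verify(b"\x01" + sector[1:], 1000, protection))     # corrupted data
```

In this sketch a False result stands in for the HBA or the storage device rejecting the I/O.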

Right now the DIF checksum is added at the block layer level.  Work is
in progress to move it up into the filesystems and from there into
user space.  Eventually we'd like to be able to generate the checksum
in the application and pass it along the I/O path all the way out to
the physical disk.

-- 
Martin K. Petersen	Oracle Linux Engineering


Miguel Sousa Filipe

2008-Jun-03 16:17 UTC

Re: Future Linux filesystems

Hi,

On Tue, Jun 3, 2008 at 4:52 PM, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> Hi.
>
> On Tue, Jun 03, 2008 at 09:37:27AM -0500, Thomas King (kingttx@tomslinux.homelinux.org) wrote:
>> I asked the author that question and he states XFS is actually a pretty good
>> answer to most of those issues but believes it still falls short where "the
>> metadata areas are not aligned with RAID strips and allocation units are FAR
>> too small but better than ext." Another detail he brought out was sending
>> data and metadata to different devices in those environments and referenced
>> RT XFS. Otherwise having them on the same device increases the possibility of
>> corruption and/or a longer filesystem check/repair. Will btrfs offer
>> something like this in the future?
>
> Right now btrfs can be created on top of multiple devices.
> AFAIK, there are no policies on how to put data and metadata between them.
>
But it does allow you to specify different replication/striping
policies for metadata and data, such as: configure a raid0 with N drives,
but mirror the metadata across all of them.
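
As a sketch of what such a setup looks like from the command line, the Python snippet below only builds and prints the command; it does not run it. It assumes the -d (data profile) and -m (metadata profile) options of mkfs.btrfs as I know them, so check the man page of your btrfs-progs version. The device names are placeholders and the helper name is made up.

```python
import shlex


def mkfs_btrfs_command(devices, data_profile="raid0", metadata_profile="raid1"):
    """Return the argv list for a multi-device mkfs.btrfs call (does not run it)."""
    if len(devices) < 2:
        raise ValueError("striping or mirroring needs at least two devices")
    return ["mkfs.btrfs", "-d", data_profile, "-m", metadata_profile, *devices]


# Hypothetical device names; substitute real block devices before running anything.
argv = mkfs_btrfs_command(["/dev/sdX", "/dev/sdY"])
print(shlex.join(argv))
```

Keeping the command as a list makes it easy to hand to subprocess.run later without shell quoting problems.
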
>> Do y'all foresee btrfs being used in exabyte installations?
>> Does/Will btrfs have RAID awareness in that it will align "the
>> superblock and metadata to the RAID stripe"?

This is a feature that is intended to be provided in the future; this was
talked about in the #btrfs@freenode.org IRC channel.
There isn't code for this currently.



-- 
Miguel Sousa Filipe

Joe Peterson

2008-Jun-03 16:46 UTC

Re: Future Linux filesystems

Martin K. Petersen wrote:
> We're very concerned about data integrity.  With btrfs everything is
> checksummed at the logical level.  This allows you to detect data
> corruption, repair bad blocks using redundant, good copies, perform
> data scrubbing, etc.

That's the main reason I am interested in btrfs, actually.  :)

> A related, but orthogonal data integrity measure is the T10 DIF
> infrastructure that I am working on.  DIF enables protection at the
> sector level and includes stuff like a data checksum and a locality
> check which ensures that the sector ends up in the right place on disk.

Great!  Really great to hear that this issue is being actively worked on.

> Right now the DIF checksum is added at the block layer level.  Work is
> in progress to move it up into the filesystems and from there into
> user space.  Eventually we'd like to be able to generate the checksum
> in the application and pass it along the I/O path all the way out to
> the physical disk.

Yep, end-to-end is a great idea.  Kudos to this and to btrfs!

					-Joe

Chris Mason

2008-Jun-04 02:14 UTC

Re: Future Linux filesystems

On Tue, Jun 03, 2008 at 09:37:27AM -0500, Thomas King wrote:
> > All the issues he complains about actually are solved by XFS, and XFS
> > actually does better in exactly these environments than either zfs on
> > Solaris or JFS2 on AIX.
>
> I asked the author that question and he states XFS is actually a pretty good
> answer to most of those issues but believes it still falls short where "the
> metadata areas are not aligned with RAID strips and allocation units are FAR
> too small but better than ext."

I think it would be best to let the XFS developers answer this part.
But, XFS is designed for and used in massive installations, and I think
it represents a scalability goal for Btrfs.

> Another detail he brought out was sending data and
> metadata to different devices in those environments and referenced RT XFS.
> Otherwise having them on the same device increases the possibility of
> corruption and/or a longer filesystem check/repair. Will btrfs offer something
> like this in the future?

Btrfs can duplicate metadata via the internal raid1 and raid10 code.  On
single spindles it will duplicate metadata as well.  This is different
from RT XFS which I do not understand well.

There is no code today in btrfs to force data and metadata to different
devices, but the disk format has the bits it needs to make that happen.
I think it is an oversimplification to say that splitting the two
between devices changes the chances of a corruption, or changes the time
a repair takes.

Btrfs does split data and metadata allocations, grouping metadata
together in large chunks on the drive.  This does make FS check/repair
faster by reducing seeks between metadata blocks.
> 
> Do y'all foresee btrfs being used in exabyte installations?
Yes
> Does/Will btrfs have RAID awareness in that it will align "the
> superblock and metadata to the RAID stripe"?
Today the superblock is not stripe aligned, but it will be in a future
release that supports superblock duplication.  At least, the
blocks that are frequently written will be stripe aligned.
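
In terms of byte offsets, "stripe aligned" just means the block starts on a multiple of the stripe size. The Python sketch below shows that arithmetic only; it is not btrfs allocator code, and the 64 KiB stripe size and the helper names are assumptions for the example.

```python
def is_aligned(offset: int, stripe_size: int) -> bool:
    """True if offset falls exactly on a stripe boundary."""
    return offset % stripe_size == 0


def align_up(offset: int, stripe_size: int) -> int:
    """Round offset up to the next multiple of stripe_size."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    return -(-offset // stripe_size) * stripe_size


STRIPE = 64 * 1024  # hypothetical stripe size in bytes

offset = 70_000
print(is_aligned(offset, STRIPE))   # False: the block would straddle a boundary
print(align_up(offset, STRIPE))     # next boundary at or after the offset
```

A write that starts on a boundary and fits inside one stripe touches fewer devices than one that straddles two stripes, which is the motivation for aligning frequently written blocks.
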
> What is the largest block allocation available?
2^64 bytes.  But, in COW filesystems massive extents have different
costs than they do in traditional filesystems.  It isn't always a good
idea to make a huge extent.
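
For scale, a quick Python conversion of that limit into binary units (the variable names are only for the example):

```python
EIB = 2**60                # one exbibyte in bytes
max_extent_bytes = 2**64   # the limit quoted above

print(f"2^64 bytes = {max_extent_bytes} bytes = {max_extent_bytes // EIB} EiB")
```

So the extent size limit alone is already in exabyte territory; the practical limit comes from the COW costs mentioned above, not from the field width.
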
> Will btrfs be T10 DIF/block protect aware?
I work closely with Martin, and we'll leverage the T10 DIF code as much
as possible.
> I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
> specifically version 4.1?
> 
We'll definitely support NFS.  It doesn't work today, but it will before
1.0.

-chris


Dongjun Shin

2008-Jun-04 02:34 UTC

Re: Future Linux filesystems

On Tue, Jun 3, 2008 at 11:37 PM, Thomas King <kingttx@tomslinux.homelinux.org> wrote:
>> All the issues he complains about actually are solved by XFS, and XFS
>> actually does better in exactly these environments than either zfs on
>> Solaris or JFS2 on AIX.
>
> I asked the author that question and he states XFS is actually a pretty good
> answer to most of those issues but believes it still falls short where "the
> metadata areas are not aligned with RAID strips and allocation units are FAR
> too small but better than ext." Another detail he brought out was sending data
> and metadata to different devices in those environments and referenced RT XFS.
> Otherwise having them on the same device increases the possibility of
> corruption and/or a longer filesystem check/repair. Will btrfs offer something
> like this in the future?
>
> Do y'all foresee btrfs being used in exabyte installations?
> Does/Will btrfs have RAID awareness in that it will align "the
> superblock and metadata to the RAID stripe"?
> What is the largest block allocation available?
> Will btrfs be T10 DIF/block protect aware?
> I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
> specifically version 4.1?

I also would like to comment that btrfs is ready for the future storage
- the solid state drive. Btrfs performs well on both HDD and SSD.

AFAIK, the ssd option of btrfs only affects the block allocation behavior.
However, under a hybrid combination of HDD and SSD with the multi-device
support of btrfs, there can be more interesting optimizations that utilize
the physical characteristics of each device.

-- 
Dongjun

Thomas King

2008-Jun-04 14:00 UTC

Re: Future Linux filesystems

> On Tue, Jun 03, 2008 at 09:37:27AM -0500, Thomas King wrote:
>> > All the issues he complains about actually are solved by XFS, and XFS
>> > actually does better in exactly these environments than either zfs on
>> > Solaris or JFS2 on AIX.
>>
>> I asked the author that question and he states XFS is actually a pretty good
>> answer to most of those issues but believes it still falls short where "the
>> metadata areas are not aligned with RAID strips and allocation units are FAR
>> too small but better than ext."
>
> I think it would be best to let the XFS developers answer this part.
> But, XFS is designed for and used in massive installations, and I think
> it represents a scalability goal for Btrfs.
>
>> Another detail he brought out was sending data and
>> metadata to different devices in those environments and referenced RT XFS.
>> Otherwise having them on the same device increases the possibility of
>> corruption and/or a longer filesystem check/repair. Will btrfs offer
>> something like this in the future?
>
> Btrfs can duplicate metadata via the internal raid1 and raid10 code.  On
> single spindles it will duplicate metadata as well.  This is different
> from RT XFS which I do not understand well.
>
> There is no code today in btrfs to force data and metadata to different
> devices, but the disk format has the bits it needs to make that happen.
> I think it is an oversimplification to say that splitting the two
> between devices changes the chances of a corruption, or changes the time
> a repair takes.
>
> Btrfs does split data and metadata allocations, grouping metadata
> together in large chunks on the drive.  This does make FS check/repair
> faster by reducing seeks between metadata blocks.
>
>> Do y'all foresee btrfs being used in exabyte installations?
>
> Yes
>
>> Does/Will btrfs have RAID awareness in that it will align "the
>> superblock and metadata to the RAID stripe"?
>
> Today the superblock is not stripe aligned, but it will be in a future
> release that supports superblock duplication.  At least, the
> blocks that are frequently written will be stripe aligned.
>
>> What is the largest block allocation available?
>
> 2^64 bytes.  But, in COW filesystems massive extents have different
> costs than they do in traditional filesystems.  It isn't always a good
> idea to make a huge extent.
>
>> Will btrfs be T10 DIF/block protect aware?
>
> I work closely with Martin, and we'll leverage the T10 DIF code as much
> as possible.
>
>> I remember reading that CRFS relies on btrfs, but will btrfs support NFS,
>> specifically version 4.1?
>
> We'll definitely support NFS.  It doesn't work today, but it will before
> 1.0.
>
> -chris

Chris,

Thanks a ton for answering all these questions. I've asked the XFS developers
what was discussed here and they gave some excellent info as well.

Enjoy your day!
Tom King

Tomasz Chmielewski

2008-Jun-11 09:38 UTC

Re: Future Linux filesystems

> I also would like to comment that btrfs is ready for the future storage
> - the solid state drive. Btrfs performs well on both HDD and SSD.
SSD is still very expensive when compared to traditional hard disks.

*If* btrfs supported compression, I would second your opinion that btrfs
is (will be, when it's stable) ready for the future storage.


-- 
Tomasz Chmielewski
http://wpkg.org


Zach Brown

2008-Jun-11 16:27 UTC

Re: Future Linux filesystems

> SSD is still very expensive when compared to traditional hard disks.
When measured by GB/$, sure.

Many data centers, though, care more about (ops/sec) / ($ * power *
heat).  SSDs look much more compelling by that metric.

- z
