Welcome to the ZFS Community!

Today, build 27 of OpenSolaris was released to the community. Included in this release is ZFS, Sun's next-generation filesystem. We are proud to announce the creation of the ZFS community to discuss and develop ZFS for the future. You can find the community at:

  http://www.opensolaris.org/os/community/zfs

Be sure to look for blogs relating to ZFS at:

  http://blogs.sun.com

As well as an introductory screencast produced by Dan Price:

  http://www.opensolaris.org/os/community/zfs/demos/basics/

For the developers out there, you can find an overview of the source code at:

  http://www.opensolaris.org/os/community/zfs/source/

Many thanks to the ZFS and OpenSolaris teams for making this a reality.

So what is ZFS? ZFS is a new kind of filesystem that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. We've blown away 20 years of obsolete assumptions, eliminated complexity at the source, and created a storage system that's actually a pleasure to use.

ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth, and stranded storage. Thousands of filesystems can draw from a common storage pool, each one consuming only as much space as it actually needs.

All operations are copy-on-write transactions, so the on-disk state is always valid. There is no need to fsck(1M) a ZFS filesystem, ever. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations.

ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.

ZFS administration is both simple and powerful. The tools are designed from the ground up to eliminate the traditional headaches of managing filesystems. Storage can be added, disks replaced, and data scrubbed with straightforward commands. Filesystems can be created instantaneously, snapshots and clones taken, and native backups made, and a simplified property mechanism allows quotas, reservations, compression, and more to be set per filesystem.

Give it a spin, and let us know what you think!

- The ZFS Team
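A quick taste of the administrative model described above. This is a minimal sketch: the pool name "tank", the device names, and the dataset names are just examples, not a prescription:

  # create a mirrored pool and a couple of filesystems
  zpool create tank mirror c1t0d0 c1t1d0
  zfs create tank/home
  zfs create tank/home/user1

  # properties: quotas, reservations, compression
  zfs set quota=10g tank/home/user1
  zfs set reservation=5g tank/home/user1
  zfs set compression=on tank/home

  # snapshots and clones
  zfs snapshot tank/home/user1@tuesday
  zfs clone tank/home/user1@tuesday tank/home/user1-clone

  # grow the pool, replace a disk, scrub everything
  zpool add tank mirror c2t0d0 c2t1d0
  zpool replace tank c1t0d0 c3t0d0
  zpool scrub tank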
Thank you for the message. Is b27 already out? Any ideas what the differences between b27a and b27 are? Should we wait for b27?

ZFS should be available in b27, am I right?

Thanks,
stefan
On Wed 16 Nov 2005 at 11:28AM, Stefan Parvu wrote:

> Thank you for the message. Is b27 already out? Any ideas what the
> differences between b27a and b27 are? Should we wait for b27?
>
> ZFS should be available in b27, am I right?

Yes, it's out. b27a is out, I mean. 27a means 27 + a bug fix.

http://www.opensolaris.org/os/downloads

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp
Casper.Dik at Sun.COM
2005-Nov-16 19:46 UTC
[zfs-discuss] Re: Welcome to the ZFS community!
> Thank you for the message. Is b27 already out? Any ideas what the
> differences between b27a and b27 are? Should we wait for b27?

b27a post-dates b27.

> ZFS should be available in b27, am I right?

It's in b27a; b27 will not be made available externally. Some additional fixes for ZFS were pulled into the first release, so build 27 was respun, giving build 27a. (Most importantly, some changes to improve performance on debug kernels.)

Casper
OK, thanks Casper. I'll start downloading b27a then.

stefan
b27a is just a respin of b27. ZFS was released in b27, so it is also in b27a.

Gary

Stefan Parvu wrote:

> Thank you for the message. Is b27 already out? Any ideas what the
> differences between b27a and b27 are? Should we wait for b27?
>
> ZFS should be available in b27, am I right?
>
> Thanks,
> stefan

--
Gary Combs
Product Architect, Sun Microsystems, Inc.
3295 NW 211th Terrace, Hillsboro, OR 97124 US
Phone x32604 / +1 503 715 3517, Fax 503-715-3517
Email Gary.Combs at Sun.COM
It's nice that ZFS eliminates the concept of volumes, but how does that fit in with a SAN where volume management is performed by the external storage system and not by the Solaris server?
ZFS will be perfectly happy in this setting. However, you may not be getting the best of the features that ZFS has to offer. For example, if you have your SAN box set up with internal mirroring and export it as a single volume to ZFS, the SAN doesn't understand ZFS checksums and self-healing data. A corrupt block would still be detected by ZFS at a higher level, but ZFS would be unable to retrieve the correct data and proactively repair it. Similar smaller features will also be unavailable in other ZFS subsystems (RAID-Z, dynamic striping, I/O load balancing, etc.).

Overall, ZFS will continue to function as designed, but we encourage users to expose ZFS to hardware that is as close to raw as possible, to fully leverage all the available features.

- Eric

On Wed, Nov 16, 2005 at 01:15:58PM -0800, Luke wrote:
> It's nice that ZFS eliminates the concept of volumes, but how does that
> fit in with a SAN where volume management is performed by the external
> storage system and not by the Solaris server?

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Wed, Nov 16, 2005 at 01:15:58PM -0800, Luke wrote:
> It's nice that ZFS eliminates the concept of volumes, but how does
> that fit in with a SAN where volume management is performed by the
> external storage system and not by the Solaris server?

It will work fine, since at the bottom-most layer of ZFS we simply read/write blocks to a block storage device.

The thing you lose, however, is the ability to do self-healing. Since ZFS checksums the data, we can detect silent data corruption. Furthermore, since ZFS also does redundancy (mirroring, RAID-Z, etc.), when an error is detected we can look to redundant copies of the data in the hope of finding a good copy.

You can't do this if the filesystem and volume manager are separate products (as they would be if the redundancy were implemented in an external RAID array). There is no interface to a block device to ask it to read the other side of the mirror, for example. And the volume manager itself can't do this, since it has no idea what the data was supposed to look like.

"But wait," you say, "my storage array vendor says that they checksum all of the data, too." That is correct; some do. But there is a whole class of errors that you can't fix with that architecture:

  - Accidental overwrites: The array has no idea whether the block writes
    coming in are from the filesystem or from some rogue agent overwriting
    filesystem blocks.

  - Phantom writes: This is where the block device says it did the
    requested write, but actually dropped the data on the floor.

  - Mis-directed reads/writes: The filesystem requests block X, but
    instead gets block Y. How often can these errors (along with phantom
    writes) actually happen? Well, these are industry-standard terms for
    the phenomenon, so it happens often enough to be given a proper name.
    Also, most folks working on storage for any length of time have
    actually witnessed this sort of thing for real.

  - Data path errors: If the data gets corrupted on the way to the
    storage array, then the storage array nicely checksums corrupt data
    and ensures that it stays corrupt.

  - Software bugs in the array: A storage array is nothing but a big pile
    of software sitting on top of a bunch of disks. Any software product
    has bugs (as does ZFS, I'm sure). And two stacks of software will, by
    definition, have more failures than a single stack. But in ZFS, we
    mitigate this risk by having a very simple, extremely well-tested I/O
    path that verifies the integrity of the data before the rest of ZFS
    even gets to look at it.

So to sum up my answer: ZFS will work correctly on HW RAID devices, but you lose the ability to survive silent data corruption.

--Bill
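To put the self-healing point in concrete terms: if the SAN exports two LUNs rather than one big virtualized volume, the redundancy can live in ZFS, where the checksums are. A minimal sketch, assuming two LUNs visible to Solaris as c4t0d0 and c4t1d0 (the device names and the pool name "tank" are illustrative):

  # mirror the two LUNs so ZFS holds a redundant copy it can repair from
  zpool create tank mirror c4t0d0 c4t1d0

  # walk every block, verify it against its checksum, repair from the mirror
  zpool scrub tank
  zpool status -v tank    # per-device read/write/checksum error counters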
Luke wrote:
> It's nice that ZFS eliminates the concept of volumes, but how does that
> fit in with a SAN where volume management is performed by the external
> storage system and not by the Solaris server?

Luke, as long as you have multiple paths you'll be fine. Besides, not all SAN-resident storage is managed for you (A5200 -> switch -> host, etc.).

What concerns do you have about ZFS and SAN-resident storage?

James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
Luke wrote:
> It's nice that ZFS eliminates the concept of volumes, but how does that
> fit in with a SAN where volume management is performed by the external
> storage system and not by the Solaris server?

Someone from the ZFS team will probably have a better/longer/more-quote-worthy answer :) but the general takeaway is that ZFS operates on the LUNs offered to the host. It has no view into the SAN, storage arrays, etc. on the other side of the LUN.

If you have a storage array that provides virtualization, for example a Sun StorEdge 6920, and it has exported a LUN to the host running ZFS that is really composed of ...

  * a storage pool with five LUNs that come from ...
  * five different 6020 trays, in each of which the LUN is really ...
  * a 7+1+1 RAID-5 stripe of 144 GB disk drives ...

... ZFS isn't going to manage or dive down to any of those levels. It's really up to the customer to determine if and where volume management should be done to best suit their environment. ZFS greatly simplifies many storage management tasks, and it would be advantageous to look at the new features available to see where any simplification can be achieved.
On Nov 16, 2005, at 3:50 PM, Bill Moore wrote:
> On Wed, Nov 16, 2005 at 01:15:58PM -0800, Luke wrote:
>> It's nice that ZFS eliminates the concept of volumes, but how does
>> that fit in with a SAN where volume management is performed by the
>> external storage system and not by the Solaris server?
>
> It will work fine, since at the bottom-most layer of ZFS we simply
> read/write blocks to a block storage device.
>
> The thing you lose, however, is the ability to do self-healing. Since
> ZFS checksums the data, we can detect silent data corruption.
> Furthermore, since ZFS also does redundancy (mirroring, RAID-Z, etc.),
> when an error is detected we can look to redundant copies of the data
> in the hope of finding a good copy.

Just to make sure the point is not lost: you can (and should) still mirror or RAID-Z across multiple LUNs in your SAN so you can get all the features that Bill laid out. The fact that those LUNs may be anything from a dumb disk to a complex RAID is not known to ZFS; each is just a block device.

Doing mirroring in two places is a bit of a belt-and-suspenders duplication; what value, if any, that adds is a subject of interesting discussion.

-David
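For example, if the array exports three LUNs, a single RAID-Z group across them gives ZFS the redundancy it needs for self-healing at roughly one LUN's worth of overhead. A sketch, with illustrative device and pool names:

  # single-parity RAID-Z across three SAN LUNs
  zpool create tank raidz c5t0d0 c5t1d0 c5t2d0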
Thanks everyone for the great responses.

Eric wrote:
> Overall, ZFS will continue to function as designed, but we encourage
> users to expose ZFS to hardware that is as close to raw as possible,
> to fully leverage all the available features.

Bill wrote:
> So to sum up my answer: ZFS will work correctly on HW RAID devices,
> but you lose the ability to survive silent data corruption.

David wrote:
> Just to make sure the point is not lost: you can (and should) still
> mirror or RAID-Z across multiple LUNs in your SAN so you can get all
> the features that Bill laid out.

You guys have explained quite clearly the benefits of doing some volume management on the Solaris server even when using SAN storage with its own volume management. Here are some downsides I see with this:

(1) As David pointed out, two levels of mirroring/RAID involves dubious
    duplication (and is more wasteful of disk space than should be
    necessary);

(2) RAID on the SAN target is a lot more flexible than at the Solaris
    server, because a single RAID group can be shared between multiple
    servers;

(3) Mirroring on the Solaris server will double the SAN bandwidth
    requirements (is this what James was alluding to when he wrote "as
    long as you have multiple paths you'll be fine"?).

All this makes me wonder whether there is an opportunity here to make a SAN target able to collaborate somehow with a server running ZFS.
On Nov 16, 2005, at 6:06 PM, Luke wrote:
> You guys have explained quite clearly the benefits of doing some
> volume management on the Solaris server even when using SAN storage
> with its own volume management. Here are some downsides I see with
> this:
> (1) As David pointed out, two levels of mirroring/RAID involves
> dubious duplication (and is more wasteful of disk space than should
> be necessary);
> (2) RAID on the SAN target is a lot more flexible than at the Solaris
> server, because a single RAID group can be shared between multiple
> servers;
> (3) Mirroring on the Solaris server will double the SAN bandwidth
> requirements (is this what James was alluding to when he wrote "as
> long as you have multiple paths you'll be fine"?).
>
> All this makes me wonder whether there is an opportunity here to make
> a SAN target able to collaborate somehow with a server running ZFS.

From an integrity perspective, there is no real choice: you have to mirror or RAID-Z in the software. Doing it only in the array leaves the entire data path from the array to the CPU open to corruption; even if you exposed the checksums and detected the error, you would still lack a mechanism to recover without a software mirror. With ZFS it is "safe" through to the CPU/memory subsystem, one that is usually built with various hardware protections (ECC, etc.) to minimize internal corruption. But there are added bandwidth and links; integrity is not free.

The contrary argument can be made that you should just buy cheap JBODs: ZFS is all that you need, and everything else is expensive duplication. But there is a middle ground; as you say, in a shared array it may be easier to have one RAID group that can be accessed by systems that are not blessed with ZFS.

-David
I'm not sure if I should continue the thread or start working on a blueprint titled something like "Best Practices for ZFS in a Heterogeneous SAN Environment". More inline ...

Luke wrote:
> (1) As David pointed out, two levels of mirroring/RAID involves
> dubious duplication (and is more wasteful of disk space than should
> be necessary);

True, but as you then state ...

> (2) RAID on the SAN target is a lot more flexible than at the Solaris
> server, because a single RAID group can be shared between multiple
> servers;

Clearly, this depends on your environment. A data center working on heterogeneous storage consolidation between hundreds of systems would take a different approach than a customer running ZFS on a system with a JBOD storage system.

> (3) Mirroring on the Solaris server will double the SAN bandwidth
> requirements (is this what James was alluding to when he wrote "as
> long as you have multiple paths you'll be fine"?).

I believe James's point was that as long as you have transport redundancy, for example Traffic Manager, you negate some of the other data availability concerns. For example, if your FC link goes down between your HBA and the storage array or switch, ZFS can't do much about it. Having a redundant link would alleviate that concern to a large degree.

Mirroring on the server would double the write requirements, but the read bandwidth would be the same. ZFS won't read both sides of a mirror unless it detects an invalid checksum. I can't say for sure, but I would assume (perhaps incorrectly, so someone jump in here if I'm wrong) that reads are spread round-robin across the LUNs in a mirror to increase throughput.

> All this makes me wonder whether there is an opportunity here to make
> a SAN target able to collaborate somehow with a server running ZFS.

It's an idea a few of us have had for some time, but not an easy problem to fix for various reasons.
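One way to watch how reads spread across the sides of a mirror, and whether any checksum errors have been hit, is the pool statistics and status output. A sketch; the pool name "tank" is just an example:

  # per-device operations and bandwidth, sampled every 5 seconds
  zpool iostat -v tank 5

  # per-device read, write, and checksum error counters
  zpool status -v tank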
On Wed, 2005-11-16 at 19:50, Torrey McMahon wrote:
> > All this makes me wonder whether there is an opportunity here to
> > make a SAN target able to collaborate somehow with a server running
> > ZFS.
>
> It's an idea a few of us have had for some time, but not an easy
> problem to fix for various reasons.

Yup. I'll plug my blog post from earlier today:

http://blogs.sun.com/roller/page/sommerfeld?entry=the_end_to_end_argument

And it's worth rereading "End-to-End Arguments in System Design" ...

If you're going to build a shared storage box to provide space on a SAN to systems which will be running ZFS, I'm willing to bet that the answer is likely to look very different from today's SAN systems. They go to immense lengths to provide the illusion of a huge, extremely reliable disk. ZFS doesn't need that, so maybe some of the complexity in there is not actually needed ...

It screams out for taking the same start-with-a-blank-sheet design approach that was taken with ZFS itself. Move the functions around between boxes to where they can best be performed, and tweak the interfaces between them to make this possible.

- Bill