Over the months that this list has been active, there have been several queries about using ZFS with clusters. I have responded that we are looking into these issues. We are starting a project to define, in detail, what would be required to make ZFS into a true cluster file system. (It is much too early to speculate on when this project might produce results, so please don't ask.) And we would like some help from the community.

If you are interested in using a cluster file system, please let us know how you would like to do so. We are gathering use cases, and we would like a broad spectrum of possibilities to evaluate. Any description that you can provide of how you would like to share data among the nodes of a cluster would be valuable to us; in particular, we're interested in such things as:

- how many nodes would likely be in the cluster?
- how many of the nodes participate in active data sharing?
- what applications would be sharing data?
- what is the sharing model? Is it single-writer, multi-reader? multi-writer? multi-append? something else? sharing via a single shared file or multiple files?

Initially, at least, we are probably going to confine our thinking to clusters where all nodes have direct access to the storage (e.g., a SAN environment). But we are still interested in use cases that would apply to other circumstances, as well.

Please note that this is in the context of the Sun Cluster product, which provides high-availability clustering for a modest number of nodes (from two to 64). This product is not a high-performance computing solution encompassing hundreds or thousands of nodes.

Thank you very much.

--Ed

-- 
Ed Gould, File System Architect, Sun Cluster, Sun Microsystems
ed.gould at sun.com
Outside of a few rare questions here and there, most of the time I get asked about what turns out to be a cluster filesystem, it's in the context of a web farm that would probably otherwise rely on NFS (server failure issues) or over-the-network syncing (extra storage requirements, data update latency).

They wanted a filesystem with basically a single writer (or at most a single writer at a time) and multiple simultaneous readers (usually fewer than 10). Now this certainly isn't everyone (nor is it a wishlist), but I hear that one come up more than other cases. As this is just my observation over the past couple of years, I have no firm numbers.

In fact, if that came about, I would wish to have a way to override the default endianness used for writing on ZFS. I could easily see a case where I've got a beefy SPARC crunching stuff at the center and doing a few writes to a shared storage where the I/O performance isn't an issue, but lots and lots of reads on x86/x64 boxes at the edge where we'd want to speed things up as much as possible. For now, it sounds like we'd have to incur a translation on almost all the I/O in a setup like that.

-- Darren
On Tue, Jan 24, 2006 at 04:10:54PM -0800, Darren Dunham wrote:
> In fact, if that came about, I would wish to have a way to override
> the default endianness used for writing on ZFS. I could easily see a case
> where I've got a beefy SPARC crunching stuff at the center and doing a
> few writes to a shared storage where the I/O performance isn't an
> issue, but lots and lots of reads on x86/x64 boxes at the edge where
> we'd want to speed things up as much as possible. For now, it sounds
> like we'd have to incur a translation on almost all the I/O in a setup
> like that.

That's not true. You'd only have to pay the byte-swapping tax on ZFS metadata (since that's the only data we know how to byte-swap). This is typically a very small percentage of the overall data in the storage pool (<1%), and we byte-swap it before we put it in the cache. If this is actually a measured performance problem, I'd really like to know so we can look into it.

--Bill
Ed Gould wrote:
> If you are interested in using a cluster file system, please let us
> know how you would like to do so. We are gathering use cases, and we
> would like a broad spectrum of possibilities to evaluate. Any
> description that you can provide of how you would like to share data
> among the nodes of a cluster would be valuable to us; in particular,
> we're interested in such things as

This is a use case for my company, and probably most of the CG/post industry.

> - how many nodes would likely be in the cluster?

Two scenarios here:

1) (which you've already basically ruled out) a large (300-3000) node cluster. Each node is responsible for processing a job and would directly read the data it needs (3d models, textures, etc.) from the file system.

2) (our case) A medium number of file servers that service a much larger pool of nodes. We currently use NetApps, and distribute load with Microsoft's DFS. Servers and clients talk CIFS. A smaller number of NFS clients need access as well.

> - how many of the nodes participate in active data sharing?
> - what applications would be sharing data?
> - what is the sharing model? Is it single-writer, multi-reader? multi-writer?
>   multi-append? something else? sharing via a single shared file or multiple files?

Almost 100% single writer, multi-reader, i.e. a single node writes an image, which is then read by render wranglers, other nodes for Quicktime generation, compositors, etc. Or a 3d scene used by potentially thousands of processes simultaneously to render images. Having a single bandwidth path to a file is a huge bottleneck.

So you have a huge amount of generated data (hair, particles, geometry, animation data, images), of which a relatively tiny amount is in demand at any one time. Lots of disk spindles spread between different heads.

Can elaborate further if you want more gory details on data life cycle...

cheers,
Barry
> Outside of a few rare questions here and there, most
> of the time I get asked about what turns out to be a
> cluster filesystem, it's in the context of a web farm
> that would probably otherwise rely on NFS (server
> failure issues) or over-the-network syncing (extra
> storage requirements, data update latency).
>
> They wanted a filesystem with basically a single
> writer (or at most a single writer at a time) and
> multiple simultaneous readers (usually fewer than 10).
> Now this certainly isn't everyone (nor is it a
> wishlist), but I hear that one come up more than
> other cases. As this is just my observation over the
> past couple of years, I have no firm numbers.

For what it's worth, I encountered this exact scenario about a month ago for a project. I believe we ended up going with clustered VxFS instead (in other words, I definitely concur).
Barry Robison writes:
> > - how many of the nodes participate in active data sharing?
> > - what applications would be sharing data?
> > - what is the sharing model? Is it single-writer, multi-reader? multi-writer?
> >   multi-append? something else? sharing via a single shared file or multiple files?
>
> Almost 100% single writer, multi-reader. ie a single node writes an
> image, which is then read by render wranglers, other nodes for Quicktime
> generation, compositors, etc. Or a 3d scene used by potentially
> thousands of processes simultaneously to render images. Having a single
> bandwidth path to a file is a huge bottleneck.

Pardon my sidetracking, but this doesn't make sense to me except for the case where the system engineer assumes that bandwidth to storage >> bandwidth between nodes. Since that is not the case with today's technology, nor will it ever be the case going forward with magnetic disks, are you making an assumption which is already technologically obsolete?

Back on the main track, QFS today has the model of single writer, multiple reader, which relieves the major architectural bottleneck of arbitration. But it means that workloads with multiple writers and readers suffer; no trade-off is free. I think this is the track Ed is following: where do we make the trade-off for writer arbitration?

But I do not necessarily think it is a good idea to design a system based on today's magnetic disk technology, which may be obsoleted long before the file system is obsoleted. Rather, we should expect radical changes in the storage technology as well. Considering that today's storage technology is at least an order of magnitude slower and smaller in bandwidth than interconnect technology, does that change your architectural view of the system? Is 10:1 readers:writers a reasonable target? 100:1? 1000:1? N.B. arbitration is latency-sensitive, data movement is bandwidth-sensitive, so it is often difficult to determine where the right mix should be. I'm not convinced that the general case has a viable solution (I've not seen one yet).

-- richard
Richard Elling wrote:
> Barry Robison writes:
> > Almost 100% single writer, multi-reader. ie a single node writes an
> > image, which is then read by render wranglers, other nodes for Quicktime
> > generation, compositors, etc. Or a 3d scene used by potentially
> > thousands of processes simultaneously to render images. Having a single
> > bandwidth path to a file is a huge bottleneck.
>
> Pardon my sidetracking, but this doesn't make sense to me
> except for the case where the system engineer assumes that
> bandwidth to storage >> bandwidth between nodes. Since that
> is not the case with today's technology, nor will it ever be the
> case going forward with magnetic disks, are you making an
> assumption which is already technologically obsolete?

Well yes, the first scenario, where all the nodes participate in the cluster, is superior. However that's not the architecture we have currently, and Ed struck down that scenario with the 2-64 cluster member limit. We do have an in-house p2p application that attempts to get requested files from peers that have already cached them from the filers. But it requires hooks into applications, and has its own issues of course.

cheers,
Barry
On Jan 24, 2006, at 22:31, Barry Robison wrote:
> Well yes, the first scenario where all the nodes participate in the
> cluster is superior. However that's not the architecture we have
> currently, and Ed struck down that scenario with the 2-64 cluster
> member limit. We do have an in-house p2p application that attempts to
> get requested files from peers that have already cached them from the
> filers. But it requires hooks into applications, and has its own
> issues of course.

I certainly didn't mean to suggest that the high-performance cluster case (hundreds or thousands of nodes) wasn't also interesting. But the project I'm concerned with (because that's what Sun's cluster product is) is for high-availability clustering, with a modest number of nodes. Perhaps I was too focussed on my task at hand when I phrased my query.

If there is significant interest in adapting ZFS to the high-performance cluster arena as well, we would like to know that, too. As I think more about this possibility, even though it is not, and will not be, the focus of the project I'm working on, it may be valuable to keep the HPC case in mind as we architect for the HA case.

--Ed

-- 
Ed Gould, File System Architect, Sun Cluster, Sun Microsystems
ed.gould at sun.com
On Wed, 2006-01-25 at 05:39, Richard Elling wrote:
> Barry Robison writes:
> > Almost 100% single writer, multi-reader. ie a single node writes an
> > image, which is then read by render wranglers, other nodes for Quicktime
> > generation, compositors, etc. Or a 3d scene used by potentially
> > thousands of processes simultaneously to render images. Having a single
> > bandwidth path to a file is a huge bottleneck.
>
> Pardon my sidetracking, but this doesn't make sense to me
> except for the case where the system engineer assumes that
> bandwidth to storage >> bandwidth between nodes. Since that
> is not the case with today's technology, nor will it ever be the
> case going forward with magnetic disks, are you making an
> assumption which is already technologically obsolete?

While it's true that the bandwidth to a single storage device may be smaller than the bandwidth between nodes, what about the aggregate bandwidth to a large number of storage devices? If I have a large number of devices on a SAN, for example, then having to route all the requests to them through one node is a major bottleneck.

At my previous employer, the single-writer multi-reader scenario would have been a great boon, as we were limited by the NFS server (which could saturate gigabit, and often did). Currently, I'm thinking more about multi-writer - think Oracle RAC.

-- 
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Bandwidth to storage can be greater than node-to-node bandwidth in high-end installations; typically the node interconnect is 1Gb Ethernet today, while storage is 2Gb or 4Gb FC per port with perhaps 8 ports into an array (e.g. StorageTek FLX380, DataDirect S2A9500), and data spread across multiple arrays.

QFS supports both a single-writer/multiple-reader model (typically used for web server farms or video distribution) and a multiple-writer/multiple-reader model (for a true "shared" file system). The former doesn't require a network link between nodes, which is an advantage in environments where network security requires that the writer be "firewalled" from the readers; however, this limits how much synchronization is possible between writer & readers. For clustering, the multiple-writer/multiple-reader model makes more sense.

For distributed-compute applications (e.g. seismic analysis), there are two common cases. All nodes read from a common data file; then either each node writes to an independent file, or all the nodes write to non-overlapping ranges of the same file. This is a function of the structure of the computation; changing the relative speeds of storage & interconnect won't change it.

One can, of course, choose to route all writes through a single node attached to the storage by sending data across the network, saving the cost of a storage interconnect at the cost of increased latency and increased utilization of the network interconnect. That's reasonable for some applications. (Obviously one would want to use at least two storage-connected nodes for redundancy.) However, the industry direction appears (IMHO) to be attaching some types of storage directly to the network interconnect (Infiniband or Ethernet), which eliminates the need to centralize I/O through a server which could become a bottleneck.
On Wed, Jan 25, 2006 at 04:33:11PM +0000, Peter Tribble wrote:
> On Wed, 2006-01-25 at 05:39, Richard Elling wrote:
> > Barry Robison writes:
> > > Almost 100% single writer, multi-reader. ie a single node writes an
> > > image, which is then read by render wranglers, other nodes for Quicktime
> > > generation, compositors, etc. Or a 3d scene used by potentially
> > > thousands of processes simultaneously to render images. Having a single
> > > bandwidth path to a file is a huge bottleneck.
> >
> > Pardon my sidetracking, but this doesn't make sense to me
> > except for the case where the system engineer assumes that
> > bandwidth to storage >> bandwidth between nodes. Since that
> > is not the case with today's technology, nor will it ever be the
> > case going forward with magnetic disks, are you making an
> > assumption which is already technologically obsolete?
>
> While it's true that the bandwidth to a single storage device
> may be smaller than the bandwidth between nodes, what about the
> aggregate bandwidth to a large number of storage devices?
>
> If I have a large number of devices on a SAN, for example,
> then having to route all the requests to them through one
> node is a major bottleneck.
>
> At my previous employer, the single-writer multi-reader
> scenario would have been a great boon, as we were limited
> by the NFS server (which could saturate gigabit, and often
> did). Currently, I'm thinking more about multi-writer - think
> Oracle RAC.

Would a combination of single-writer/multi-reader ZFS clustering + pNFS[*] help?

[*] pNFS -> parallelized NFS, where one server handles most filesystem metadata and redirects clients to data servers for file I/O.
> Bandwidth to storage can be greater than node-to-node
> bandwidth in high-end installations; typically the
> node interconnect is 1Gb Ethernet today, while
> storage is 2Gb or 4Gb FC per port with perhaps 8
> ports into an array (e.g. StorageTek FLX380,
> DataDirect S2A9500), and data spread across multiple
> arrays.

By the time this gets specified and developed, GbE will be passe, if it isn't already. The higher speed networks are approaching main memory bandwidth, and that essentially flips the architecture upside down.

> QFS supports both a single-writer/multiple-reader
> model (typically used for web server farms or video
> distribution) and a multiple-writer/multiple-reader
> model (for a true "shared" file system). The former
> doesn't require a network link between nodes, which
> is an advantage in environments where network
> security requires that the writer be "firewalled"
> from the readers; however, this limits how much
> synchronization is possible between writer & readers.
> For clustering, the multiple-writer/multiple-reader
> model makes more sense.

Yes.

> For distributed-compute applications (e.g. seismic
> analysis), there are two common cases. All nodes read
> from a common data file; then either each node writes
> to an independent file, or all the nodes write to
> non-overlapping ranges of the same file. This is a
> function of the structure of the computation;
> changing the relative speeds of storage &
> interconnect won't change it.

Agree.

> One can, of course, choose to route all writes
> through a single node attached to the storage by
> sending data across the network, saving the cost of a
> storage interconnect at the cost of increased latency
> and increased utilization of the network
> interconnect. That's reasonable for some
> applications. (Obviously one would want to use at
> least two storage-connected nodes for redundancy.)
> However, the industry direction appears (IMHO) to be
> attaching some types of storage directly to the
> network interconnect (Infiniband or Ethernet), which
> eliminates the need to centralize I/O through a
> server which could become a bottleneck.

When I hear people make this argument, I always ask them "what is a RAID array?" Usually they don't really know. A RAID array is really just a server. So in such architectures you really need multiple RAID arrays, for the reasons stated above. This also puts another protocol or two in between your processors and the media, as well as a hop or three.

For the high bandwidth, single writer scenarios, this can work quite well. For multi-writer it adds complexity because the RAID array doesn't understand the context of the data.

[the doors are open... just gotta choose which one... :-)]
-- richard
Hello Ed,

Wednesday, January 25, 2006, 8:42:27 AM, you wrote:

EG> On Jan 24, 2006, at 22:31, Barry Robison wrote:
>> Well yes, the first scenario where all the nodes participate in the
>> cluster is superior. However that's not the architecture we have
>> currently, and Ed struck down that scenario with the 2-64 cluster
>> member limit. We do have an in-house p2p application that attempts to
>> get requested files from peers that have already cached them from the
>> filers. But it requires hooks into applications, and has its own
>> issues of course.

EG> I certainly didn't mean to suggest that the high-performance cluster
EG> case (hundreds or thousands of nodes) wasn't also interesting. But the
EG> project I'm concerned with (because that's what Sun's cluster product
EG> is) is for high-availability clustering, with a modest number of nodes.

It isn't clear to me if you plan to make ZFS clustering dependent on Sun Cluster? I hope not. I would really like it if you could get 2-16 nodes even without Sun Cluster and use clustered ZFS (shared, I should probably say). Then using Sun Cluster would be only an option to provide HA and/or scalability to an application.

I also believe that as ZFS is going to hit S10U2, it would be really useful if an equivalent of the HAStorage+ agent (HAZFS?) were created ASAP so people could use Sun Cluster with S10U2 and ZFS (of course not a shared filesystem, yet). I know I would use it immediately.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
I would be most interested in understanding where QFS is going in relationship with ZFS. ZFS seems to be the long term focus, file systems wise, for Sun; someone can correct me if I am wrong. I have heard/seen references to features such as encryption and clustering coming either soon or, in the case of clustering (multi-reader/multi-writer), in the longer term. Sun already has an HSM product, SAMFS, which works only with QFS, so my questions are simply...

[a] How does it make sense to support both QFS and ZFS longer term? If both will be, or are, "high performance" and reliable, and in the case of ZFS extended to support security features such as encryption or labels. Stating what appears obvious, once ZFS is bootable it seems UFS will "go away"; won't the same hold true for QFS, once performance and other issues are worked out with ZFS?

[b] Will SAMFS ever run with ZFS? Please note, I don't expect Sun to "open source" SAMFS.

[c] What technologies, if any, will be shared between ZFS and QFS? I understand ZFS is much more advanced and different in many areas, but it seems from my perspective that there could be a large degree of value in the 2 teams joining efforts in the file systems area.

I already work with some folks I do consulting for that run SAMFS, so I would love to understand the road map wrt this subject area.

Thanks.
Robert Milkowski wrote:
> It isn't clear to me if you plan to make ZFS clustering dependent on
> Sun Cluster? I hope not. I would really like it if you could get 2-16
> nodes even without Sun Cluster and use clustered ZFS (shared I should
> probably say). Then using Sun Cluster would be only an option to
> provide HA and/or scalability to an application.

At the moment, we are only considering clusterized ZFS in the context of Sun Cluster. People often ask for this sort of decoupling, without really understanding what it entails. In particular, there are parts of Sun Cluster (e.g., membership management) that are most likely required for a cluster file system to function properly (at least with reasonable performance when failures occur), but are integral to the Cluster product and cannot be factored out easily. The request really seems to amount to, "Let me pick and choose the parts of Sun Cluster that I want at the moment, even though they were not designed to be separable." We'll keep this idea in mind, however, and if there is a reasonable way to decouple sharing ZFS from Clustering, we'll look at it.

> I also believe that as ZFS is going to hit S10U2 it would be really
> useful if an equivalent of the HAStorage+ agent (HAZFS?) were created
> ASAP so people could use Sun Cluster with S10U2 and ZFS (of course not
> a shared filesystem, yet). I know I would use it immediately.

We agree that HA-ZFS would be very useful. It is planned for Sun Cluster 3.2; coding is done and testing has begun. I do not know the release schedule for this, however. Due to testing requirements, and that Sun Cluster is part of JES (it's the JES Availability Suite), it won't be concurrent with S10U2, but I do not imagine that it should be too long afterwards.

--Ed

-- 
Ed Gould, File System Architect, Sun Cluster, Sun Microsystems
ed.gould at sun.com
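To make the HA-ZFS discussion a bit more concrete: for a non-shared pool, a failover agent's start and stop methods essentially just move pool ownership between nodes. A minimal sketch of the underlying steps is below; "tank" is a hypothetical pool name, and a real HAStoragePlus-style agent would add fencing, health probes, and dependency ordering on top of this.

  # On the node giving up the service (stop method):
  zpool export tank          # unmount all datasets and release the pool

  # On the node taking over the service (start method):
  zpool import -f tank       # -f: take over even if the old node died uncleanly
  zfs mount -a               # mount the pool's datasets before starting the application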
Hello Ed,

Wednesday, January 25, 2006, 11:30:28 PM, you wrote:

EG> Robert Milkowski wrote:
>> It isn't clear to me if you plan to make ZFS clustering dependent on
>> Sun Cluster? I hope not. I would really like it if you could get 2-16
>> nodes even without Sun Cluster and use clustered ZFS (shared I should
>> probably say). Then using Sun Cluster would be only an option to
>> provide HA and/or scalability to an application.

EG> At the moment, we are only considering clusterized ZFS in the context of
EG> Sun Cluster. People often ask for this sort of decoupling, without
EG> really understanding what it entails. In particular, there are parts of
EG> Sun Cluster (e.g., membership management) that are most likely required
EG> for a cluster file system to function properly (at least with reasonable
EG> performance when failures occur), but are integral to the Cluster
EG> product and cannot be factored out easily. The request really seems to
EG> amount to, "Let me pick and choose the parts of Sun Cluster that I want
EG> at the moment, even though they were not designed to be separable."
EG> We'll keep this idea in mind, however, and if there is a reasonable way
EG> to decouple sharing ZFS from Clustering, we'll look at it.

I haven't used QFS - but doesn't it allow a shared filesystem between nodes and yet doesn't require Sun Cluster?

>> I also believe that as ZFS is going to hit S10U2 it would be really
>> useful if an equivalent of the HAStorage+ agent (HAZFS?) were created
>> ASAP so people could use Sun Cluster with S10U2 and ZFS (of course not
>> a shared filesystem, yet). I know I would use it immediately.

EG> We agree that HA-ZFS would be very useful. It is planned for Sun
EG> Cluster 3.2; coding is done and testing has begun. I do not know the
EG> release schedule for this, however. Due to testing requirements, and
EG> that Sun Cluster is part of JES (it's the JES Availability Suite), it
EG> won't be concurrent with S10U2, but I do not imagine that it should be
EG> too long afterwards.

This is great news! Is it possible to get some beta bits of it?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
QFS and ZFS presently address somewhat different markets. ZFS is a general-purpose file system which offers very high reliability at some performance cost. QFS can be used as a general-purpose file system as well, but is at its best in high-performance scenarios, where it can be tuned to get absolute peak performance for a particular application. (For instance, metadata can be stored on a separate disk to avoid head seeks when reading or writing data files, and files can be allocated contiguously on disk.)

SAM at present is tied to QFS, which enables some interesting features (for instance, the ability to read from a file on tape without having to ever copy it to disk, increasing performance over a traditional HSM). There is an internal project which has begun looking at decoupling SAM functionality from the file system, with the eventual intent of providing HSM capabilities to ZFS and possibly other file systems. I can't say anything about dates, of course.

There is some contact between the ZFS and QFS teams, but given the fundamentally different architectures, it's more likely that features and interfaces may be shared than implementations. (QFS also has an existing customer base, so features in new releases are driven primarily by customer requests.)
Anton B. Rang (2006-Jan-26 15:50 UTC), [zfs-discuss] Re: Cluster File System Use Cases:

> I haven't used QFS - but doesn't it allow a shared filesystem
> between nodes and yet doesn't require Sun Cluster?

Yes. You do get somewhat more functionality when running QFS in conjunction with SunCluster, though. (In particular, without SunCluster the system administrator is responsible for issuing the commands to reconfigure QFS if the metadata server fails.)
Robert Milkowski wrote:
> I haven't used QFS - but doesn't it allow a shared filesystem
> between nodes and yet doesn't require Sun Cluster?

Yes, there is a Shared QFS that does not depend on Sun Cluster. But, as Anton Rang has already commented, the QFS architecture is substantially different from that of ZFS, and there is functionality that is only available when Shared QFS is coupled with Sun Cluster. It's not at all clear to me that we could do the same with ZFS and maintain the performance and correctness characteristics that we want. But, as I said, we'll keep it in mind, and if there's a way to do it, we'll consider it.

--Ed

-- 
Ed Gould, File System Architect, Sun Cluster, Sun Microsystems
ed.gould at sun.com
> Robert Milkowski wrote:
> > It isn't clear to me if you plan to make ZFS clustering dependent on
> > Sun Cluster? I hope not. I would really like it if you could get 2-16
> > nodes even without Sun Cluster and use clustered ZFS (shared I should
> > probably say). Then using Sun Cluster would be only an option to
> > provide HA and/or scalability to an application.
>
> At the moment, we are only considering clusterized ZFS in the context of
> Sun Cluster. People often ask for this sort of decoupling, without
> really understanding what it entails. In particular, there are parts of
> Sun Cluster (e.g., membership management) that are most likely required
> for a cluster file system to function properly (at least with reasonable
> performance when failures occur), but are integral to the Cluster
> product and cannot be factored out easily. The request really seems to
> amount to, "Let me pick and choose the parts of Sun Cluster that I want
> at the moment, even though they were not designed to be separable."

In my experience, reactions like Robert's are often due to policy decisions made in the Sun Cluster design rather than a desire to not cluster. In particular, the Sun Cluster policy is oriented towards making a cluster have the same data integrity as a single host. For modern single hosts, if hardware breaks, beyond built-in resiliency, then the OS will panic or otherwise attempt to work around the issue. Sun Cluster will do this too, by use of fencing and failfast panics. The problem is that this is counterintuitive to most system administrators, who likely do not have the same expectations of a cluster as we do for single systems. When a failfast panic occurs, they tend to blame the Sun Cluster software as broken rather than recognize that a failfast panic is a symptom that something else in the cluster is broken. For a single system, they would have just seen a panic and immediately understood that the hardware is broken. I don't know how to directly solve this recognition problem.

If we were to allow this policy to be tunable, then we could eliminate some of the real or perceived deficiencies. However, this must be done in a manner such that the right data is protected. I feel that ZFS offers some opportunities here which we simply don't have in other file systems. There are also opportunities to improve fencing at a level more appropriate than LUN reservations and all of the grief associated with vendor implementations of LUN reservations.

Back to Robert's thread, what is it about Sun Cluster that is distasteful?
+ Membership, which is needed to ensure protection of data access and enable cluster-wide system administration?
+ Data protection via LUN reservation?
+ General complexity, system admin interfaces?
+ Resource group management and agent interfaces (which is similar to SMF)?
+ Policies?
+ LVM management?

-- richard
Robert Milkowski (2006-Jan-30 08:14 UTC), [zfs-discuss] Re: Cluster File System Use Cases:

Hello Richard,

Friday, January 27, 2006, 8:33:16 PM, you wrote:

>> Robert Milkowski wrote:
>> > It isn't clear to me if you plan to make ZFS clustering dependent on
>> > Sun Cluster? I hope not. I would really like it if you could get 2-16
>> > nodes even without Sun Cluster and use clustered ZFS (shared I should
>> > probably say). Then using Sun Cluster would be only an option to
>> > provide HA and/or scalability to an application.
>>
>> At the moment, we are only considering clusterized ZFS in the context of
>> Sun Cluster. People often ask for this sort of decoupling, without
>> really understanding what it entails. In particular, there are parts of
>> Sun Cluster (e.g., membership management) that are most likely required
>> for a cluster file system to function properly (at least with reasonable
>> performance when failures occur), but are integral to the Cluster
>> product and cannot be factored out easily. The request really seems to
>> amount to, "Let me pick and choose the parts of Sun Cluster that I want
>> at the moment, even though they were not designed to be separable."

RE> In my experience, reactions like Robert's are often due to
RE> policy decisions made in the Sun Cluster design rather
RE> than a desire to not cluster. In particular, the Sun Cluster
RE> policy is oriented towards making a cluster have the same
RE> data integrity as a single host. For modern single hosts, if
RE> hardware breaks, beyond built-in resiliency, then the OS
RE> will panic or otherwise attempt to work around the issue.
RE> Sun Cluster will do this too, by use of fencing and failfast
RE> panics. The problem is that this is counterintuitive to most
RE> system administrators, who likely do not have the same
RE> expectations of a cluster as we do for single systems.
RE> When a failfast panic occurs, they tend to blame the Sun
RE> Cluster software as broken rather than recognize that a
RE> failfast panic is a symptom that something else in the
RE> cluster is broken. For a single system, they would have
RE> just seen a panic and immediately understood that the
RE> hardware is broken. I don't know how to directly solve
RE> this recognition problem.

RE> If we were to allow this policy to be tunable, then we
RE> could eliminate some of the real or perceived deficiencies.
RE> However, this must be done in a manner such that
RE> the right data is protected. I feel that ZFS offers some
RE> opportunities here which we simply don't have in other
RE> file systems. There are also opportunities to improve
RE> fencing at a level more appropriate than LUN
RE> reservations and all of the grief associated with
RE> vendor implementations of LUN reservations.

RE> Back to Robert's thread, what is it about Sun Cluster
RE> that is distasteful?
RE> + Membership, which is needed to ensure protection
RE>   of data access and enable cluster-wide system
RE>   administration?
RE> + Data protection via LUN reservation?
RE> + General complexity, system admin interfaces?
RE> + Resource group management and agent
RE>   interfaces (which is similar to SMF)?
RE> + Policies?
RE> + LVM management?

I do use SC and I find it a really great product. I'm almost sure I will use ZFS+SC in the near future (in a way I already do). However, I can think of some other environments where SC is just too much complexity and all that is needed is a shared filesystem. If one can just mount the same filesystem on two or three nodes with a standard Solaris installation and not worry about interconnects, clusters, etc., sure there'll be less functionality, but it's not always needed.

I'm also not sure if you can set up SC with different architectures in the same cluster - in theory it should be possible with ZFS and there shouldn't be such a limitation.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Robert Milkowski wrote:
> I haven't used QFS - but doesn't it allow a shared filesystem
> between nodes and yet doesn't require Sun Cluster?

You are right. There is a flavor of QFS that works with Sun Cluster. QFS has a metadata server. When working with Sun Cluster, the metadata server can be made highly available. When the node hosting the metadata server goes down, it is brought up on another node. All this is done without a tight integration with Sun Cluster. The HA metadata server is an RGM service. It just automates what the sysadmin would have done manually.

By the way, the functionality/guarantees Shared QFS provides are different from what a cluster filesystem like GFS aka PxFS provides. PxFS provides transparent access during failover/switchover scenarios. By transparent, I mean the client would not see an EIO as long as a node is able to master the filesystem. I do not think Shared QFS provides this functionality - HA metadata server or otherwise. My take is that hooking into Sun Cluster configuration changes would be needed to do this. The design team would have to decide how tight the integration is going to be.

The svm-sc aka 'sun cluster svm' aka oban team went with a loosely coupled design. You can actually get it up without installing Sun Cluster (though it is probably not supported). IMO, this loose integration has performance penalties during reconfigurations.

Disclaimer: I have not looked at the QFS source code nor have I worked with it extensively. The above is based on my understanding of how it works. :)

Regards,
Manoj
Hi,

Please see responses inline.

Ellard

> Hello Richard,
>
> Friday, January 27, 2006, 8:33:16 PM, you wrote:
>
> >> Robert Milkowski wrote:
> >> > It isn't clear to me if you plan to make ZFS clustering dependent on
> >> > Sun Cluster? I hope not. I would really like it if you could get 2-16
> >> > nodes even without Sun Cluster and use clustered ZFS (shared I should
> >> > probably say). Then using Sun Cluster would be only an option to
> >> > provide HA and/or scalability to an application.
> >>
> >> At the moment, we are only considering clusterized ZFS in the context of
> >> Sun Cluster. [...]
>
> RE> In my experience, reactions like Robert's are often due to
> RE> policy decisions made in the Sun Cluster design rather
> RE> than a desire to not cluster. [...]
>
> RE> Back to Robert's thread, what is it about Sun Cluster
> RE> that is distasteful?
> RE> + Membership, which is needed to ensure protection
> RE>   of data access and enable cluster-wide system
> RE>   administration?
> RE> + Data protection via LUN reservation?
> RE> + General complexity, system admin interfaces?
> RE> + Resource group management and agent
> RE>   interfaces (which is similar to SMF)?
> RE> + Policies?
> RE> + LVM management?
>
> I do use SC and I find it a really great product.
> I'm almost sure I will use ZFS+SC in the near future (in a way I already
> do).
>
> However, I can think of some other environments where SC is just too
> much complexity and all that is needed is a shared filesystem. If one can
> just mount the same filesystem on two or three nodes with a standard
> Solaris installation and not worry about interconnects, clusters,
> etc., sure there'll be less functionality, but it's not always needed.

Complexity is one complaint that we have received about Sun Cluster. Specifically, the administrative work and hardware restrictions have been cited. We have some ideas about significantly improving each of these areas. I would love to hear from anyone outside the Sun Cluster organization who is familiar with our SC product as to their concerns and issues. It is always good to get independent feedback. Since this email is on "zfs-discuss" and this is really a Sun Cluster topic, please send responses to just me so that we do not flood the ZFS people with SC stuff.

> I'm also not sure if you can set up SC with different architectures in
> the same cluster - in theory it should be possible with ZFS and there
> shouldn't be such a limitation.

At this time Sun Cluster only supports a cluster consisting of machines of the same architecture: either SPARC or x86. Sun Cluster operates mostly at a level where the differences between SPARC and x86 do not matter. The two limitations that I know about are:
1) little-endian vs big-endian translation
2) pxfs assumes all machines have the same OS flavor.

One reason that we have not pursued a mixed SPARC/x86 cluster is that we have not found much customer interest. It would be interesting if you are encountering potential customers for such a product.
Do you believe it is necessary for a host to be part of the Sun Cluster to mount the filesystem? I understand the need for some actions to require an HA component, but it seems to me it would be possible to mount and R/W the filesystem without being part of the cluster, as long as the host could access the services required to be HA. Perhaps by putting the location of the service in a label? Certain restrictions, like requiring z* commands to be executed from inside the cluster, would be acceptable.

To give a real world example, we run ~40 web servers for images. I don't see it as practical to put all those servers into a cluster.
> Do you believe it is necessary for a host to be part
> of the Sun Cluster to mount the filesystem?

Yes, as long as the cluster owns the data.

> I understand the need for some actions to require an HA
> component but it seems to me it would be possible to
> mount and R/W the filesystem without being part of the
> cluster as long as the host could access the services
> required to be HA. Perhaps by putting the location of
> the service in a label?
> Certain restrictions like requiring z* commands to be
> executed from inside the cluster would be acceptable.
>
> To give a real world example we run ~40 web servers
> for images. I don't see it as practical to put all
> those servers into a cluster.

Lots of people do this today. Current sharing technologies seem to work quite well. How would a tight coupling between a (mostly?) read-only client and read-write clients be an improvement? Or, what problem are you trying to solve?

-- richard
Not necessarily a cluster file system use case, but I'd like to be able to make filesystems available to multiple boxes over SAN-attached disk for the purpose of using clones. We have several test environments for each production environment which refresh data from production snapshots regularly. Currently, all systems must have enough disk to hold all of the production data even though the testing usually only changes relatively small amounts of data.

I understand I could create an architecture using NFS to share the clone(s) to the test systems, but the problems with that are:
- I would have to add at least 1 more server to the environment (probably a cluster, so 2+ servers).
- Our systems are pre-wired w/ 2 GigE interfaces and I'd be worried about sharing the general network bandwidth w/ file access.
- Our systems are already pre-wired w/ SAN connections, so it would be a waste to not utilize them.

I also understand that it would be better to just load test data on the test instances, but I have very limited influence in that area :-(

So in essence, what would be really cool would be to be able to import a pool on multiple machines w/ one read/writer of the primary data, and be able to build read/write clones (writable by only one machine) to facilitate near-instant refreshes of data.

Any chance of that becoming a reality?
I'm trying to solve the problem of lots of copies of the same files and a file distribution model based on rsync. I'll have to take a look at Sun Cluster. It's been two years since I touched Solaris. Truthfully, it would have to be shockingly cost effective to consider, and that's no dig on Sun. Just looking around at all the new stuff, I'm seeing amazing value for the money.
> I'm trying to solve the problem of lots of copies of
> the same files and a file distribution model based on
> rsync. I'll have to take a look at Sun Cluster. It's
> been two years since I touched Solaris. Truthfully,
> it would have to be shockingly cost effective to
> consider and that's no dig on Sun. Just looking
> around at all the new stuff I'm seeing amazing value
> for the money.

Your model is backwards. Rather than pushing out to the masses, have the masses cache. q.v. cachefsd(1m), mount_cachefs(1m) [note: cachefs is a nop for NFSv4] This is a much simpler model to manage, almost a no-brainer.

-- richard
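For anyone who has not used CacheFS, the setup is small. A minimal sketch follows; the server name, export path, and cache directory are made up for illustration, and the cachefs man pages should be checked for the options available on your release.

  # Create the local cache on each web-server client:
  cfsadmin -c /var/cache/webimages

  # Mount the NFS export through the cache; repeated reads are then served locally:
  mount -F cachefs -o backfstype=nfs,cachedir=/var/cache/webimages \
      nfsserver:/export/images /images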
> Not necessarily a cluster file system use case, but I'd like to be able
> to make filesystems available to multiple boxes over SAN-attached disk
> for the purpose of using clones. We have several test environments for
> each production environment which refresh data from production snapshots
> regularly. Currently, all systems must have enough disk to hold all of
> the production data even though the testing usually only changes
> relatively small amounts of data.

It is the changing that is problematic. Read-only is a much simpler case.

> I understand I could create an architecture using NFS to share the
> clone(s) to the test systems, but the problems with that are:

If I understand what you are saying, you want something like:

  disk --<SCSI>-- server --<SCSI>-- server[s] --<IP>-- clients

while I tend to advocate:

  disk --<SCSI>-- server --<NFS>-- server[s] --<IP>-- clients

habit, I suppose. But the difference is that the SCSI protocol has no context of the data, whereas NFS has some knowledge of the data. NFS does not have as much knowledge of the data as ZFS, though. Hint: use NFS to share the ZFS clones. Note, in this model a RAID array is a server which speaks the SCSI protocol. You still need to get to a file system level of abstraction at the edge servers.

> - I would have to add at least 1 more server to the environment
>   (probably a cluster, so 2+ servers).

With a RAID array this would be:

  disk --<SCSI>-- server --<SCSI>-- server[s] --<NFS>-- server[s] --<IP>-- clients

Not especially palatable, though common.

> - Our systems are pre-wired w/ 2 GigE interfaces and I'd be worried
>   about sharing the general network bandwidth w/ file access.
> - Our systems are already pre-wired w/ SAN connections, so it would
>   be a waste to not utilize them.
>
> I also understand that it would be better to just load test data on the
> test instances, but I have very limited influence in that area :-(
>
> So in essence, what would be really cool would be to be able to import
> a pool on multiple machines w/ one read/writer of the primary data, and
> be able to build read/write clones (writable by only one machine) to
> facilitate near-instant refreshes of data.
>
> Any chance of that becoming a reality?

This is already available with QFS today. It has an arbitration/synchronization method which is especially suitable for such environments. It follows your desired model. It isn't quite like ZFS, though, so there are some feature trade-offs.

Bringing this back towards ZFS-land, I think that there are some clever things we can do with snapshots and clones. But the age-old problem of arbitration rears its ugly head. I think I could write an option to expose ZFS snapshots to read-only clients. But in doing so, I don't see how to prevent an ill-behaved client from clobbering the data. To solve that problem, an arbiter must decide who can write where. The SCSI protocol has almost nothing to assist us in this cause, but NFS, QFS, and pxfs do. There is room for cleverness, but not at the SCSI or block level.

-- richard
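To make the "use NFS to share the ZFS clones" hint concrete, here is a minimal sketch of a clone-based refresh on the storage server, using hypothetical pool, dataset, and host names; the test boxes simply NFS-mount the clone.

  # On the storage server: refresh a test environment from a production snapshot
  # without copying the full data set.
  zfs snapshot tank/prod/db@refresh
  zfs clone tank/prod/db@refresh tank/test/db1     # writable clone; space is used only for changes
  zfs set sharenfs=on tank/test/db1                # export the clone over NFS

  # On a test host:
  mount -F nfs storageserver:/tank/test/db1 /testdb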
Probably a typical scenario: a three to five node Oracle RAC. Two to 4 nodes read/write, with the last node a data warehouse needing read access.

Specifically:
- how many nodes would likely be in the cluster? 5 at the upper end
- how many of the nodes participate in active data sharing? All
- what applications would be sharing data? Oracle
- what is the sharing model? multi-writer
Hello frank,

Thursday, February 2, 2006, 12:10:18 AM, you wrote:

fg> I'm trying to solve the problem of lots of copies of the same files and a file distribution model based on rsync. I'll have to take a look at Sun Cluster. It's been two years since I touched
fg> Solaris. Truthfully, it would have to be shockingly cost effective to consider and that's no dig on Sun. Just looking around at all the new stuff I'm seeing amazing value for the money.

Sun Cluster is free now. And wouldn't NFS be a better solution?

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
> Probably a typical scenario. Three to five node
> Oracle RAC. Two to 4 nodes read/write, with the last
> node a data warehouse needing read access.

Disagree, this is not typical. You really do want isolation between your OLTP and DSS workloads. RAC tends to run as fast as the slowest node can handle the arbitration, so you don't want to mix your workloads in the same cluster.

> Specifically:
> - how many nodes would likely be in the cluster? 5 at the upper end
> - how many of the nodes participate in active data sharing? All
> - what applications would be sharing data? Oracle
> - what is the sharing model? multi-writer

However, Oracle is heavily invested in, and promoting, ASM. I do not think it is wise for ZFS to try to out-ASM ASM. In fact, most of the recent Sun+Oracle world record performance benchmarks are using ASM.

-- richard
> If I understand what you are saying, you want something like:
>
>   disk --<SCSI>-- server --<SCSI>-- server[s] --<IP>-- clients
>
> while I tend to advocate:
>
>   disk --<SCSI>-- server --<NFS>-- server[s] --<IP>-- clients
>
> habit, I suppose. But the difference is that the SCSI protocol has
> no context of the data, whereas NFS has some knowledge of
> the data. NFS does not have as much knowledge of the data
> as ZFS, though. Hint: use NFS to share the ZFS clones.

I would want something like:

  disk --<SCSI>-- server1 (r/w) --<IP>-- clients (this would actually be a BC/DR copy of the production data, so there would only be client access in a disaster)
       |
       -- server2 (ro w/ the exception of r/w clones) --<IP>-- clients
       |
       -- server3 (ro w/ the exception of different r/w clones) --<IP>-- clients
       .
       .
       .

I understand the possibility of using NFS to share the ZFS clones. My concern is more around our current, well-embedded infrastructure being able to handle this case.

> This is already available with QFS today. It has an arbitration/
> synchronization method which is especially suitable for such environments.
> It follows your desired model. It isn't quite like ZFS, though, so there are
> some feature trade-offs.

QFS doesn't seem to have the "clone" feature, which is the whole point of the approach. The idea is to virtualize the data. Let's see if I can draw this out a little further. If I have a production server hosting a database with 1TB of mirrored disk attached, and 4 test/dev environments which at this point also require 1TB of disk which is in one way or another copied from the production database on a monthly basis, I'm required to have 5TB + whatever redundancy is required for each environment. I would like to be able to create clones of the production data and only require disk to hold the changes for each environment, which is really quite small relatively.

Again, I do understand this could be done over NFS. The question is: would it be possible to implement something like "sub-pools" in ZFS, where the primary pool is available/imported r/w on a master server, while slave servers could import that same pool read-only and create a read/writeable sub-pool on which clones could be created?

> Bringing this back towards ZFS-land, I think that there are some clever
> things we can do with snapshots and clones. But the age-old problem
> of arbitration rears its ugly head. I think I could write an option to expose
> ZFS snapshots to read-only clients. But in doing so, I don't see how to
> prevent an ill-behaved client from clobbering the data.

An ill-behaved client being one that might attempt to take over the primary pool read/writeable? I could understand a concern there, but I would prefer to handle that farther up the stack, probably at a process layer.

> To solve that
> problem, an arbiter must decide who can write where. The SCSI
> protocol has almost nothing to assist us in this cause, but NFS, QFS,
> and pxfs do. There is room for cleverness, but not at the SCSI or block
> level.
> -- richard

Thanks for the inputs so far...

--Andy
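Until something like shared pools or sub-pools exists, one workaround worth sketching is to replicate the production data to each test server once and then ship only the periodic deltas, cloning locally on the test side. The pool, dataset, and host names below are hypothetical, and this assumes the ZFS send/receive facility is available on your release.

  # One-time seeding of the test server:
  zfs snapshot tank/prod/db@base
  zfs send tank/prod/db@base | ssh testhost zfs receive testpool/db

  # Monthly refresh: send only the blocks changed since the last snapshot.
  # (The target must be unmodified since @base, or use "zfs receive -F".)
  zfs snapshot tank/prod/db@feb06
  zfs send -i tank/prod/db@base tank/prod/db@feb06 | ssh testhost zfs receive testpool/db

  # On testhost: a writable clone per test environment, holding only its changes.
  #   zfs clone testpool/db@feb06 testpool/test1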
Ahh. True, I should not have said typical. We use the read access on the "DSS node" to avoid sending that massive amount of data across the network during DW loads. The node reads it "locally". Apparently I'll have to look more closely at the effects.

Yes, ASM would make my life easier, but it's another matter to convince the rest of the organization. From my limited understanding of ASM, it would also require using RMAN. We're currently stuck on file system backups. Something we'll try to rectify this year.

I didn't know about the benchmarks using ASM. That is great ammunition for me. Thanks!!
> Ahh. True I should not have said typical. Using the
> read access on the "DSS node" to avoid sending that
> massive amount of data across the network during DW
> loads. The node reads it "locally". Apparently I'll
> have to look more closely at the effects.

This implies a new use case, which I think we should consider for ZFS: QoS. One of the problems with mixing OLTP and DSS workloads in the same storage is that where OLTP needs a lot of latency-sensitive but small iops, DSS needs fewer, larger iops. Without any QoS, the OLTP app will get killed by the DSS hog. It would seem to reason that ZFS would need an interface into whatever I/O QoS mechanisms are being developed.

-- richard
I'm an Oracle DBA and we are doing ASM on Sun with RAC. I am happy with ASM's performance but am interested in clustering. I mentioned to Bob Netherton that if Sun could make it a clustering file system, that helps them enable the grid further. Oracle wrote and gave OCFS2 to the Linux kernel. Since Solaris is GPL too and CDDL (correct me if I'm wrong), then couldn't they take OCFS2 and port it into Solaris? Any chance at adding clustering to ZFS? Just to see it and play with it would be fun. ZFS is open source, so if someone cares to write their own clustering file system, they can : )
Sun is supposed to have work ongoing on clustered ZFS, but it is also supposed to be out in the 2+ year timeframe. I for one would love it if someone involved in this work would give a little bit of visibility into the effort, and possibly how community members could help, if one were sufficiently talented and inclined.

thanks,
paul

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org on behalf of Thomas Roach
Sent: Wed 2/28/2007 9:23 AM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Re: Cluster File System Use Cases
"Also Oracle forums and SUN forums have the SAME exact look and feel... hmmm. Even the options are exactly the same... weird." Both are from a company called Jive Software that does enterprise forums. This message posted from opensolaris.org
On Wed, Feb 28, 2007 at 07:23:44AM -0800, Thomas Roach wrote:
> I'm an Oracle DBA and we are doing ASM on Sun with RAC. I am happy with
> ASM's performance but am interested in clustering. I mentioned to Bob
> Netherton that if Sun could make ZFS a clustering file system, that would
> help them enable the grid further. Oracle wrote OCFS2 and gave it to the
> Linux kernel. Since Solaris is open source too, under the CDDL (correct me
> if I'm wrong), couldn't they take OCFS2 and port it to Solaris? Any chance
> of adding clustering to ZFS?

ASM was StorageTek's rebranding of SAM-QFS. SAM-QFS is already a shared (clustering) filesystem. You need to upgrade :) Look for "Shared QFS".

And yes, we're actively pushing the SAM-QFS code through the open-source process. Here's the first blog entry:

http://blogs.sun.com/samqfs/entry/welcome_to_sam_qfs_weblog

Dean
Hi,

my main interest is sharing a zpool between machines, so that the ZFS filesystems on different hosts can share a single LUN. When you run several applications, each in a different zone, and allow the zones to be run on one of several hosts individually (!), this currently means at least one separate LUN for each zone. Therefore you can't use any of the cool features like cloning a zone with snapshots, dynamic space sharing between zones, or easy resizing. Mounting a ZFS filesystem on multiple hosts would be nice to have, but it's not essential for me.

Just my .02

-- Dagobert
This message posted from opensolaris.org
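For comparison, here is a minimal sketch of what those features look like today on a single host that has the pool imported, using hypothetical dataset names under tank/zones; the missing piece is precisely that a second host cannot import the same pool:

# Sketch only: per-zone ZFS datasets on ONE host that owns the pool.
# Pool and dataset names (tank/zones/...) are invented for illustration.
import subprocess

def zfs(*args):
    subprocess.check_call(["zfs"] + list(args))

# Each zone gets its own dataset carved out of the shared pool...
for zone in ("web01", "web02", "db01"):
    ds = "tank/zones/%s" % zone
    zfs("create", "-p", ds)
    zfs("set", "quota=20g", ds)        # cap that can be resized later with one command
    zfs("set", "reservation=5g", ds)   # guaranteed floor; remaining space is shared dynamically

# ...and "cloning a zone" at the storage level is a snapshot plus a clone,
# not a full copy of the data.
zfs("snapshot", "tank/zones/web01@golden")
zfs("clone", "tank/zones/web01@golden", "tank/zones/web04")

# None of this helps when zones fail over to other hosts individually,
# because only one host at a time can have 'tank' imported.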
I read this paper on Sunday. Seems interesting:

The Architecture of PolyServe Matrix Server: Implementing a Symmetric Cluster File System
http://www.polyserve.com/requestinfo_formq1.php?pdf=2

What interested me the most is that the metadata and lock management are spread across all the nodes. I also read the "Parallel NFS (pNFS)" presentation, and it seems like pNFS still has the metadata on one server... (Lisa, correct me if I am wrong).

http://opensolaris.org/os/community/os_user_groups/frosug/pNFS/FROSUG-pNFS.pdf

Rayson
On 2/28/07, Dean Roehrich <dean.roehrich at sun.com> wrote:
> ASM was StorageTek's rebranding of SAM-QFS. SAM-QFS is already a shared
> (clustering) filesystem. You need to upgrade :) Look for "Shared QFS".

ASM, as Oracle uses the term, is Automatic Storage Management. To the best of my knowledge, it shares no heritage with SAM-QFS.

http://www.oracle.com/technology/products/database/asm/index.html

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
On Mon, Mar 05, 2007 at 08:20:33PM -0600, Mike Gerdts wrote:
> On 2/28/07, Dean Roehrich <dean.roehrich at sun.com> wrote:
> > ASM was StorageTek's rebranding of SAM-QFS. SAM-QFS is already a shared
> > (clustering) filesystem. You need to upgrade :) Look for "Shared QFS".
>
> ASM, as Oracle uses the term, is Automatic Storage Management. To the best
> of my knowledge, it shares no heritage with SAM-QFS.
>
> http://www.oracle.com/technology/products/database/asm/index.html

Thanks. This is the ASM I know:

http://www.storagetek.com/products/product_page86.html

Dean
The pNFS protocol doesn't preclude varying metadata server designs and their various locking strategies. As an example, there has been work going on at the University of Michigan/CITI to extend the Linux/NFSv4 implementation to allow for a pNFS server on top of the PolyServe solution.

Spencer

On Mar 5, 2007, at 2:37 PM, Rayson Ho wrote:

> I read this paper on Sunday. Seems interesting:
>
> The Architecture of PolyServe Matrix Server: Implementing a Symmetric
> Cluster File System
>
> http://www.polyserve.com/requestinfo_formq1.php?pdf=2
>
> What interested me the most is that the metadata and lock management are
> spread across all the nodes. I also read the "Parallel NFS (pNFS)"
> presentation, and it seems like pNFS still has the metadata on one
> server... (Lisa, correct me if I am wrong).
>
> http://opensolaris.org/os/community/os_user_groups/frosug/pNFS/FROSUG-pNFS.pdf
>
> Rayson
On Wed, Feb 28, 2007 at 09:54:37AM -0600, Dean Roehrich wrote:
> And yes, we're actively pushing the SAM-QFS code through the open-source
> process. Here's the first blog entry:
>
> http://blogs.sun.com/samqfs/entry/welcome_to_sam_qfs_weblog

I see that libSAM has been released. How long until we see QFS out in the wild?

-brian

--
"Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke
> Bringing this back towards ZFS-land, I think that there are some clever
> things we can do with snapshots and clones. But the age-old problem
> of arbitration rears its ugly head. I think I could write an option to expose
> ZFS snapshots to read-only clients. But in doing so, I don't see how to
> prevent an ill-behaved client from clobbering the data. To solve that
> problem, an arbiter must decide who can write where. The SCSI
> protocol has almost nothing to assist us in this cause, but NFS, QFS,
> and pxfs do. There is room for cleverness, but not at the SCSI or block
> level.
> -- richard

Yeah; ISTR that IBM mainframe complexes with what they called "shared DASD" (DASD == Direct Access Storage Device, i.e. disk, drum, or the like) depended on extent reserves. IIRC, SCSI dropped extent-reserve support, and indeed it was never widely nor reliably available anyway. AFAIK, all SCSI offers is reservation of an entire LUN; that doesn't even help with slices, let alone anything else. Nor is ZFS extent-based anyway (unlike the VTOC structure on MVS or VxFS); so even if extent reserves were available, they would only help a little. Which means, as he says, some sort of arbitration.

I wonder whether the hooks for putting the ZIL on a separate device will be of any use for the cluster filesystem problem; it almost makes me wonder if there could be any parallels between pNFS and a refactored ZFS.
This message posted from opensolaris.org
On Jul 13, 2007, at 2:20 AM, Richard L. Hamilton wrote:

>> Bringing this back towards ZFS-land, I think that there are some clever
>> things we can do with snapshots and clones. But the age-old problem
>> of arbitration rears its ugly head. I think I could write an option to
>> expose ZFS snapshots to read-only clients. But in doing so, I don't see
>> how to prevent an ill-behaved client from clobbering the data. To solve
>> that problem, an arbiter must decide who can write where. The SCSI
>> protocol has almost nothing to assist us in this cause, but NFS, QFS,
>> and pxfs do. There is room for cleverness, but not at the SCSI or block
>> level.
>> -- richard
>
> Yeah; ISTR that IBM mainframe complexes with what they called "shared DASD"
> (DASD == Direct Access Storage Device, i.e. disk, drum, or the like)
> depended on extent reserves. IIRC, SCSI dropped extent-reserve support, and
> indeed it was never widely nor reliably available anyway. AFAIK, all SCSI
> offers is reservation of an entire LUN; that doesn't even help with slices,
> let alone anything else. Nor is ZFS extent-based anyway (unlike the VTOC
> structure on MVS or VxFS); so even if extent reserves were available, they
> would only help a little. Which means, as he says, some sort of arbitration.
>
> I wonder whether the hooks for putting the ZIL on a separate device
> will be of any use for the cluster filesystem problem; it almost makes me
> wonder if there could be any parallels between pNFS and a refactored ZFS.

We are busy layering pNFS on ZFS in the NFSv4.1 project and hope to allow for coordination with client access and other interesting features.

Spencer