Jeff Mahoney
2005-Oct-18 16:52 UTC
[Ocfs2-devel] [RFC] Integration with external clustering
Hey all -

We're interested in using OCFS2 with an external, userspace clustering solution - specifically, the heartbeat2 project from linux-ha.org. Obviously, the internal cluster manager would still be available for users with no interest in deploying and configuring a full cluster manager just to use the file system. I'd like to make the interface between the two as consistent as possible.

The obvious mapping to an external cluster manager is one file system to one cluster resource, managed individually. The userspace cluster manager would take over most of the cluster management infrastructure currently supplied by o2cb, including heartbeat, fencing, etc. The node manager would still be used to coordinate DLM operations, which would remain in-kernel.

The o2cb code is pretty well structured for this kind of integration without a lot of hacking, but there are a few sticking points. The good news is that the infrastructure for fixing most of them is already in place, just waiting to be used.

The existing code has a notion of one global cluster, with each node owning a particular node number and a single IP address/port. This node number is mapped 1:1 to file system slots and DLM domain node numbers, regardless of how many nodes are actually involved in mounting any particular file system. A large cluster may deploy a cluster-global file system, but also many smaller file systems shared by small subsets of nodes. The smaller file systems, even though they are deployed on a small number of nodes, still require slots for every member of the larger cluster. If separate network connectivity is desired for the smaller file systems, separate node numbers must be allocated in order to utilize the alternate network, making the problem worse.

The one-cluster notion appears to be rooted in o2net, where the assumption of a 1:1 IP address:node mapping is made.
The node manager is aware of multiple clusters, and even has to provide an interface to fake the single cluster membership. o2net itself already understands that an internode connection will be used for multiple virtual connections.

One of the larger issues for integration with a userspace cluster manager is how nodes are organized and exported to userspace. Currently, there is only one instance of a node. If a heartbeat down event is triggered for a particular node, all file systems are told about it, even if they don't care. To integrate a userspace cluster manager, we need more fine-grained configuration of node membership.

I'd like to address these issues in my proposal: individual file systems should be represented individually, with resources and connectivity assignable independently to each. I'll start with what I'd like the configfs space to look like, since I think that will illustrate it best:

/config/cluster/ocfs2/<fs uuid>/<node>/
        ip address
        port
        fs slot
        local
        active                  (for userspace)
        heartbeat/              (for kernelspace)
                block_bytes
                blocks
                dev
                start_block

Rather than having one global cluster, each file system would be its own cluster. Nodes would be created and destroyed as needed on a per-file-system basis.

The current o2net concept of a node would be replaced by something specific to connectivity. The current implementation of one connection per ip/port would stay, but rather than assuming a particular connection-node mapping at accept time, it would broker messages later, once the key has been observed in the message.

Since heartbeat and node management would end up having similar trees with different attributes, the node and heartbeat attributes would be unified under a single fs instance.

Obviously, modifications to the o2cb userspace tools would be required to make this work.
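The proposed layout could be exercised from the command line. Here is a hedged sketch, assuming made-up attribute names (ip_address, port, fs_slot, local) and a made-up UUID; a scratch directory stands in for a mounted configfs so the script can be dry-run, whereas on a live system the mkdir would cause the kernel to populate the attribute files itself:

```shell
#!/bin/sh
# Sketch only: the paths and attribute names below follow the proposed
# layout but are assumptions, not a shipped interface.
# CONFIGFS_ROOT defaults to a scratch directory so this can be dry-run
# without the kernel module; on a live system it would be the configfs
# mount point (e.g. /config).
CONFIGFS_ROOT="${CONFIGFS_ROOT:-$(mktemp -d)}"
FS_UUID="0f1e2d3c-0000-0000-0000-hypothetical"   # made-up filesystem UUID
NODE="node0"

node_dir="$CONFIGFS_ROOT/cluster/ocfs2/$FS_UUID/$NODE"
mkdir -p "$node_dir"

# Write the per-node connectivity and slot attributes for this filesystem.
echo 192.168.0.1 > "$node_dir/ip_address"
echo 7777        > "$node_dir/port"
echo 0           > "$node_dir/fs_slot"
echo 1           > "$node_dir/local"

echo "configured $NODE for $FS_UUID"
```

The point is only that per-filesystem node membership becomes a handful of mkdir/echo operations, which any userspace cluster manager can drive.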
I think that the changes required for cluster.conf could be minimal -- just keep the existing format and add overrides for file systems that want to use different slots/networks/etc.

I'm volunteering to code all this up; I just didn't want to post code that nobody wanted.

Opinions?

-Jeff

--
Jeff Mahoney
SUSE Labs
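For illustration, such an override might look something like the fragment below. The cluster: and node: stanzas follow the existing cluster.conf key = value format; the filesystem: stanza is invented here purely to sketch the idea and is not real o2cb syntax:

```
cluster:
	name = mycluster
	node_count = 3

node:
	cluster = mycluster
	name = node0
	number = 0
	ip_address = 192.168.0.1
	ip_port = 7777

# Hypothetical override stanza: a small file system deployed on a
# subset of nodes, with its own slot numbering and an alternate port.
filesystem:
	uuid = <fs uuid>
	cluster = mycluster
	nodes = node0,node1
	ip_port = 7778
```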
Joel Becker
2005-Oct-18 17:18 UTC
[Ocfs2-devel] [RFC] Integration with external clustering
On Tue, Oct 18, 2005 at 05:56:27PM -0400, Jeff Mahoney wrote:
> I'll start with an idea of what I'd like to see the configfs space look
> like, since I think it will probably illustrate it best:
>
> /config/cluster/ocfs2/<fs uuid>/<node>/

If you are treating each mount as a 'cluster', the ocfs2 path element is pretty redundant, and /config/cluster/<fs uuid> would suffice.

Given that heartbeat regions can and should be shared, you need a way to describe this. We don't have userspace doing global heartbeat yet, but there is no reason that all OCFS2 volumes can't share one heartbeat region (see http://oss.oracle.com/projects/ocfs2-tools/src/branches/global-heartbeat/documentation/o2cb/).

Have you also considered what this will or won't do to possible interaction with the CMan stack? We'd love OCFS2 to handle both stacks.

Finally, have you considered the user barriers to this? The absolute bottom-line goal of O2CB is minimum input by the user. For this to work, the user should not have to see the plethora of XML config files that heartbeat has (or at least, used to have). I'm talking about the user-visible part here, not the technical reality. The O2CB frontend or some other piece of software can take the user's name:ip node mapping and turn it into whatever XML it needs, but the user shouldn't have to do anything more than ocfs2console requires of them today.

Joel

--
"If you took all of the grains of sand in the world, and lined them up
 end to end in a row, you'd be working for the government!"
        - Mr. Interesting

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
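Joel's point - a frontend turning the user's minimal name:ip mapping into whatever XML the cluster manager needs - could be sketched like this. The element and attribute names are invented placeholders, not heartbeat2's actual configuration schema:

```shell
#!/bin/sh
# Sketch: turn "name ip" pairs read from stdin into an XML fragment.
# The <nodes>/<node> element names are placeholders invented for this
# example; the real frontend would emit whatever schema heartbeat2 wants.
nodes_to_xml() {
    printf '<nodes>\n'
    while read -r name ip; do
        [ -n "$name" ] || continue
        printf '  <node name="%s" ip="%s"/>\n' "$name" "$ip"
    done
    printf '</nodes>\n'
}

nodes_to_xml <<EOF
node0 192.168.0.1
node1 192.168.0.2
EOF
```

The user only ever supplies the name:ip list; everything else is generated, which keeps the user-visible surface as small as O2CB's today.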
Lars Marowsky-Bree
2005-Oct-28 10:11 UTC
[Ocfs2-devel] Re: [RFC] Integration with external clustering
On 2005-10-18T17:56:27, Jeff Mahoney <jeffm@suse.com> wrote:

Hi all,

just want to make sure this doesn't get lost. Where are we currently at?

FYI, I'd like to ask for an additional way of documenting a suggested approach: please show how to set up a, say, 3-node "cluster" (statically) and how to shut it down again - on the command line with shell scripts ;-) Hey, we're only operating on configfs/sysfs-style "text files" and directories, no? That should be possible.

Not only will it be a good basis for a regression test of the API, but it'll also help us understand what the scripts for the Cluster Resource Manager integration will have to look like, and whether that's a workable approach.

Anybody thinking I'm on drugs? ;-)

Sincerely,
    Lars Marowsky-Bree <lmb@suse.de>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

"Ignorance more frequently begets confidence than does knowledge"
        -- Charles Darwin
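A hedged sketch of the kind of script Lars asks for, against the per-filesystem configfs layout Jeff proposed. All paths, attribute names, and the UUID are assumptions, and a scratch directory stands in for a mounted configfs so the script can be dry-run without the kernel module:

```shell
#!/bin/sh
# Statically bring up a 3-node "cluster" for one filesystem and tear it
# down again, using only mkdir/echo/rm on a configfs-style tree.
# Everything here is a sketch of the *proposed* interface, not a real one.
CONFIGFS_ROOT="${CONFIGFS_ROOT:-$(mktemp -d)}"
FS_UUID="deadbeef-0000-0000-0000-hypothetical"   # made-up filesystem UUID
FS_DIR="$CONFIGFS_ROOT/cluster/ocfs2/$FS_UUID"

setup() {
    slot=0
    for node in node0 node1 node2; do
        d="$FS_DIR/$node"
        mkdir -p "$d"
        echo "192.168.0.$((slot + 1))" > "$d/ip_address"
        echo 7777                      > "$d/port"
        echo "$slot"                   > "$d/fs_slot"
        slot=$((slot + 1))
    done
}

teardown() {
    for node in node0 node1 node2; do
        rm -rf "$FS_DIR/$node"   # on real configfs this would be a plain rmdir
    done
    rmdir "$FS_DIR" 2>/dev/null || true
}

setup
echo "setup complete"
teardown
echo "teardown complete"
```

As Lars suggests, the same script doubles as a crude API regression test and as a template for the Cluster Resource Manager agent's start/stop actions.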