Response inline.
Jeff White
Linux/Unix Systems Engineer
University of Pittsburgh - CSSD
Jaw171 at pitt.edu
On 10/17/2011 01:11 PM, Jeff Shaw wrote:
> Hello Gluster users,
> Before I put Gluster into production, I am wondering how it determines
> whether a byte can be written, and where I should look in the source code to
> change these behaviors. My experience is with glusterfs 3.2.4 on CentOS 6
> 64-bit.
>
> Suppose I have a Gluster volume made up of four 1 MB bricks, like this:
>
> Volume Name: test
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gluster0-node0:/brick0
> Brick2: gluster0-node1:/brick1
> Brick3: gluster0-node0:/brick2
> Brick4: gluster0-node1:/brick3
>
> The mounted Gluster volume will report that the size of the volume is 2 MB,
> which creates a false impression that it can hold a 2 MB file. This isn't
> too bad, since people are used to a file system's maximum file size being
> smaller than the file system's maximum total size.
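
The arithmetic behind those two numbers can be sketched as follows. This is only a simplified model, not Gluster code: it assumes bricks are paired into replica sets in the order listed, that a replica set holds no more than its smallest brick, and that each file lives whole on one replica set. The function name `volume_capacity` is made up for the illustration.

```python
def volume_capacity(brick_sizes, replica_count):
    """Return (reported_size, largest_file) for a distributed-replicate layout.

    Simplified model (an assumption, not Gluster's actual logic): bricks are
    grouped into replica sets in listing order, and every file is stored
    whole on exactly one replica set.
    """
    if len(brick_sizes) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    # A replica set can only hold as much as its smallest brick.
    set_sizes = [
        min(brick_sizes[i:i + replica_count])
        for i in range(0, len(brick_sizes), replica_count)
    ]
    # The volume size is the sum over sets; one file must fit in a single set.
    return sum(set_sizes), max(set_sizes)


MB = 1024 * 1024
reported, largest = volume_capacity([MB, MB, MB, MB], replica_count=2)
print(reported // MB, largest // MB)  # prints "2 1"
```

So the 2 x 2 layout above reports 2 MB in total, while the largest single file it can hold under this model is 1 MB.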
>
> Scenario 1: One brick runs out of space first.
>
> Taking this a step further, suppose brick0 is actually 2 MB, and I attempt
> to copy a 2 MB file to the Gluster volume. If Gluster chooses to copy the
> file to brick0 and brick1, then the copy succeeds, although brick1 only stores
> half the file. When brick0 fails, only half of the file is available for
> reading. It would be better if Gluster failed the write as soon as one brick
> in the replication group ran out of space.
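
The behavior being asked for in Scenario 1 can be illustrated with a toy model. This is a sketch under loud assumptions: `Brick` and `replicated_write` are hypothetical names, the bricks are in-memory stand-ins, and the whole file size is known up front, which a real streamed write would not know.

```python
class Brick:
    """Toy in-memory stand-in for a brick with a fixed capacity in bytes."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.files = {}

    def free(self):
        """Bytes still available on this brick."""
        return self.capacity - sum(len(d) for d in self.files.values())


def replicated_write(replicas, filename, data):
    """Write data to every replica, or to none if any replica lacks space."""
    short = [b.name for b in replicas if b.free() < len(data)]
    if short:
        # Fail up front instead of leaving a partial copy on one replica.
        raise OSError(f"no space left on: {', '.join(short)}")
    for b in replicas:
        b.files[filename] = data


MB = 1024 * 1024
brick0, brick1 = Brick("brick0", 2 * MB), Brick("brick1", 1 * MB)
try:
    replicated_write([brick0, brick1], "big.bin", b"x" * (2 * MB))
except OSError as err:
    print(err)  # no space left on: brick1
```

With this all-or-nothing rule, the 2 MB copy is rejected outright and neither brick ends up holding a fragment.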
>
> Scenario 2: One brick is unmounted.
>
> Suppose after Scenario 1 completes, brick0 goes offline. Then, a user
> attempts to retrieve the 2 MB file. The user receives the file fragment. Because
> gluster0-node0:/brick0 is unmounted, the file doesn't exist at that
> location, and so the gateway copies the file fragment from
> gluster0-node1:/brick1 onto gluster0-node0:/brick0. Then, even worse, the user
> starts copying files onto the Gluster volume. All the files destined for the
> first replication group appear under /brick0, even though it's unmounted.
> This eventually will fill up the root file system.
>
> I think to fix this, when creating a file, Gluster should make sure that
> the file system that the brick was originally created on is mounted.
I had an idea for this already: http://bugs.gluster.com/show_bug.cgi?id=3578
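
One possible shape for such a check, as a minimal sketch (not taken from Gluster's source or from the bug report): remember the device ID of the filesystem at brick creation and compare it before writing. The function names are hypothetical, and since device IDs are not guaranteed to be stable across reboots, a real check might rely on a marker file or an extended attribute instead.

```python
import os
import tempfile


def record_brick_device(brick_path):
    """Return the device ID of the filesystem currently holding brick_path."""
    return os.stat(brick_path).st_dev


def brick_is_on_original_fs(brick_path, recorded_dev):
    """True if brick_path still resides on the filesystem seen at creation."""
    try:
        return os.stat(brick_path).st_dev == recorded_dev
    except FileNotFoundError:
        # The brick directory itself is gone, so it is certainly not usable.
        return False


# Usage: a temporary directory stands in for a brick directory.
with tempfile.TemporaryDirectory() as brick:
    dev = record_brick_device(brick)
    print(brick_is_on_original_fs(brick, dev))
```

If the brick's filesystem were unmounted, the same path would resolve to the parent filesystem, the device ID would differ, and the write could be refused instead of landing on the root file system.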
> Also, perhaps bricks should only be able to be created at mount points.
I think this would be too limiting. Some people might have a large
/data mount point but only want /data/gluster to hold the gluster files.
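
A brick in a subdirectory can still be tied to the mount it depends on. As a rough sketch (the helper name `containing_mount_point` is made up here), walk up from the brick path until a mount point is found:

```python
import os


def containing_mount_point(path):
    """Walk up from path until a mount point is reached and return it."""
    path = os.path.realpath(path)
    # The filesystem root is always a mount point, so the loop terminates.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


print(containing_mount_point(os.getcwd()))
```

For a brick at /data/gluster on a /data mount, this would be expected to return /data. If it returned / instead, the storage would not be mounted and the brick should not be written to.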
> A colleague of mine suggested mounting all the Gluster bricks within
> another file system's path that's read-only.
This would be more complicated for a Gluster admin to set up, but it could
work. You could also mount tmpfs or something similar at /data and then your
real storage at /data/gluster. That might work, although I don't think tmpfs
itself works with Gluster, so I'm not sure what would happen if /data/gluster
were unmounted and Gluster suddenly found itself on an unsupported filesystem
type. In any case, I wouldn't make this a requirement; it would just be one
way an admin could design their servers if they wish.
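
To see which filesystem type a brick path would fall into after an unmount, one could look up the deepest mount that contains it. The sketch below parses text in the /proc/mounts layout (device, mount point, type, ...). The helper name and the sample mount tables are hypothetical, and escaped characters in mount points (such as \040 for a space) are ignored.

```python
import posixpath


def fstype_for_path(path, mounts_text):
    """Return the filesystem type of the deepest mount containing path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = fields[1], fields[2]
        inside = path == mount or path.startswith(mount.rstrip("/") + "/")
        # Prefer the longest mount point; on a tie the later entry wins,
        # because a later mount shadows an earlier one at the same path.
        if inside and len(mount) >= len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type


# Hypothetical mount tables, before and after /data/gluster is unmounted.
mounted = """/dev/sda1 / ext4 rw 0 0
tmpfs /data tmpfs rw 0 0
/dev/sdb1 /data/gluster ext4 rw 0 0
"""
unmounted = """/dev/sda1 / ext4 rw 0 0
tmpfs /data tmpfs rw 0 0
"""
print(fstype_for_path("/data/gluster/file", mounted))    # ext4
print(fstype_for_path("/data/gluster/file", unmounted))  # tmpfs
```

A check like this could let an admin's tooling notice that the brick path has dropped back onto tmpfs before anything is written there.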
> Gluster's source code is quite large, so if someone could point me to
> the right files to edit, I'd be happy to change its behavior to match what I
> expect.
>
> Thanks,
> Jeff