Brian Candler
2012-Jul-11 10:27 UTC
[Gluster-users] Recovering a broken distributed volume
I had a RAID array fail due to a number of Seagate drives going down, so this gave me an opportunity to check the recovery of gluster volumes. I found that the replicated volumes came up just fine, but the non-replicated ones have not. I'm wondering if there's a better solution than simply blowing them away and creating fresh ones (especially to keep the half data set in the distributed volume).

The platform is Ubuntu 12.04, glusterfs 3.3.0. There are two nodes, dev-storage1/2, and four volumes:

* A distributed volume across the two nodes

    Volume Name: fast
    Type: Distribute
    Volume ID: 864fd12d-d879-4310-abaa-a2cb99b7f695
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/fast
    Brick2: dev-storage2:/disk/storage2/fast

* A replicated volume across the two nodes

    Volume Name: safe
    Type: Replicate
    Volume ID: 47a8f326-0e48-4a71-9cfe-f9ef8d555db7
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/safe
    Brick2: dev-storage2:/disk/storage2/safe

* Two single-brick volumes, one on each node

    Volume Name: single1
    Type: Distribute
    Volume ID: 74d62eb4-176e-4671-8471-779d909e19f0
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/single1

    Volume Name: single2
    Type: Distribute
    Volume ID: edab496f-c204-4122-ad10-c5f2e2ac92bd
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage2:/disk/storage2/single2

These four volumes are FUSE-mounted on /gluster/safe, /gluster/fast, /gluster/single1 and /gluster/single2 on both servers. The bricks share their underlying filesystems, i.e. dev-storage1:/disk/storage1 and dev-storage2:/disk/storage2.

Now, the filesystem dev-storage1:/disk/storage1 failed. I created a new filesystem mounted on dev-storage1:/disk/storage1, did mkdir /disk/storage1/{single1,safe,fast} and restarted glusterd.
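For reference, the replacement of the failed brick filesystem amounted to something like the following sketch (the device name /dev/md0 and the choice of XFS are assumptions for illustration, not details from my actual setup):

```shell
# Rebuild and mount a fresh filesystem where the failed brick lived
# (/dev/md0 and XFS are hypothetical here)
mkfs.xfs -f /dev/md0
mount /dev/md0 /disk/storage1

# Re-create the empty brick directories the volumes expect to find
mkdir -p /disk/storage1/{single1,safe,fast}

# Restart the gluster daemon so it re-exports the bricks
service glusterd restart
```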
After a couple of minutes, the contents of the replicated volume ("safe") were synchronised between the two nodes. That is,

    ls -lR /gluster/safe
    ls -lR /disk/storage1/safe   # on dev-storage1
    ls -lR /disk/storage2/safe   # on dev-storage2

all showed the same. This is excellent.

However, the other two volumes which depend on dev-storage1 are broken. As this is a dev system I could just blow them away, but I would like to use this as an exercise for fixing broken volumes, which I may have to do in production later. Here are the problems:

(1) The "single1" volume is empty, which I expected since it's a brand new empty directory, but I cannot create files in it.

    root at dev-storage1:~# touch /gluster/single1/test
    touch: cannot touch `/gluster/single1/test': Read-only file system

I guess gluster doesn't like the lack of metadata on this directory. Is there a quick recovery procedure here, or do I need to destroy the volume and recreate it?

(2) The "fast" (distributed) volume appears empty to the clients:

    root at dev-storage1:~# ls /gluster/fast
    root at dev-storage1:~#

However, there is still half the content available in the brick which didn't fail:

    root at dev-storage2:~# ls /disk/storage2/fast
    images  iso
    root at dev-storage2:~#

Although this is a test system, ideally I would like to reactivate this volume and make the half data set available. I guess I could destroy the volume, move the data to a safe place, create a new volume and copy in the data. Is there a more direct way?

Thanks, Brian.
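For the record, in 3.3 you don't have to wait for glustershd to notice the replicated volume on its own; you can kick off and monitor the heal explicitly (sketched below for the "safe" volume as named above):

```shell
# Trigger a self-heal pass on files that need it
gluster volume heal safe

# List files still pending heal, and files healed so far
gluster volume heal safe info
gluster volume heal safe info healed
```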
Brian Candler
2012-Jul-11 14:07 UTC
[Gluster-users] Recovering a broken distributed volume
On Wed, Jul 11, 2012 at 11:27:58AM +0100, Brian Candler wrote:
> (1) The "single1" volume is empty, which I expected since it's a brand new
> empty directory, but I cannot create files in it.
>
>     root at dev-storage1:~# touch /gluster/single1/test
>     touch: cannot touch `/gluster/single1/test': Read-only file system

Sorry, this was my problem: it turns out a few more drives failed, and the underlying brick filesystem went read-only. Unbelievably, that's 7 Seagate drives failed out of an array of 12!

Anyway, after rebuilding the array with the remaining 5 working disks, the single volume came up fine. Also, the distributed volume healed itself after I did 'ls' a few times on it.

    root at dev-storage1:~# ls /gluster/fast
    ...
    root at dev-storage1:~# ls /gluster/fast
    images  iso
    root at dev-storage1:~# ls /gluster/fast/images/
    root at dev-storage1:~# ls /gluster/fast/iso
    linuxmint-11-gnome-dvd-64bit.iso
    root at dev-storage1:~# ls /gluster/fast/images/
    lucidtest
    root at dev-storage1:~# ls /gluster/fast/images/lucidtest/
    tmpaJqTD9.qcow2

I can only see one other strange thing: the newly-created replica appears to have made a sparse copy of a file which wasn't sparse on the original.

On the original working side of the replicated volume:

    root at dev-storage2:~# ls -l /disk/storage2/safe/images/lucidtest/
    total 756108
    -rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
    root at dev-storage2:~# du -k /disk/storage2/safe/images/lucidtest/
    756116  /disk/storage2/safe/images/lucidtest/

On the newly-created side, which glustershd rebuilt automatically:

    root at dev-storage1:~# ls -l /disk/storage1/safe/images/lucidtest/
    total 422728
    -rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
    root at dev-storage1:~# du -k /disk/storage1/safe/images/lucidtest/
    422736  /disk/storage1/safe/images/lucidtest/

Is this intentional? Does glustershd notice runs of zeros and create a sparse file on the target? (This may or may not be desirable; e.g. for performance you might want to fully preallocate a VM image.)

Regards, Brian.
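A quick way to spot sparseness like this is to compare a file's apparent size against the blocks actually allocated, as ls vs du did above. Here is a small self-contained illustration with a deliberately sparse temp file (not the qcow2 image itself):

```shell
# Make a 10 MiB file that is entirely a hole
f=$(mktemp)
truncate -s 10M "$f"

# %s = apparent size in bytes, %b = 512-byte blocks actually allocated
apparent=$(stat -c %s "$f")
allocated=$(( $(stat -c %b "$f") * 512 ))
echo "apparent=$apparent allocated=$allocated"

rm -f "$f"
```

If allocated is well below apparent, the file is sparse; on the rebuilt brick above, 422736 KiB allocated for a 774307840-byte file shows exactly that.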
Arnold Krille
2012-Jul-11 21:01 UTC
[Gluster-users] Recovering a broken distributed volume
On 11.07.2012 22:37, Mailing Lists wrote:
> I had that some years ago on two servers at a customer's office, 2 disks
> in each in raid 1 so 4 disks. Same series ... failing in the same
> afternoon after 10 months of service!

I can only repeat myself: most people argue "but it's two devices, it's statistically independent". Well, two devices(*) manufactured at the same time and on the same assembly line (preferably with consecutive serial numbers), running the same firmware version, bought at the same time, used in the same array with the same external stress and the same usage patterns. Even all my non-mathematical, non-informatics friends know that this isn't what you call "statistically independent".

(*) Doesn't matter if it's disks or switches or motherboards or processors or memory chips or power supplies or UPSes or backplanes or power-distribution boards. If you "remove" your SPOF by using two of the exact same kind, statistics (and sad experience) say that it's still a SPOF.

So, subs (or is it "Mailing Lists"? :) and probably Brian, thanks for another data point for this never-happens-in-real-life scenario. We feel with you for such scenarios.

Have fun,

Arnold

--
This email was created electronically and is valid without a handwritten signature.