Brian Candler
2012-Jul-11 10:27 UTC
[Gluster-users] Recovering a broken distributed volume
I had a RAID array fail due to a number of Seagate drives going down, so
this gave me an opportunity to check the recovery of gluster volumes.
I found that the replicated volumes came up just fine, but the
non-replicated ones have not. I'm wondering if there's a better
solution
than simply blowing them away and creating fresh ones (especially to keep
the half data set in the distributed volume).
The platform is Ubuntu 12.04 with GlusterFS 3.3.0.
There are two nodes, dev-storage1/2, and four volumes:
* A distributed volume across the two nodes
Volume Name: fast
Type: Distribute
Volume ID: 864fd12d-d879-4310-abaa-a2cb99b7f695
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: dev-storage1:/disk/storage1/fast
Brick2: dev-storage2:/disk/storage2/fast
* A replicated volume across the two nodes
Volume Name: safe
Type: Replicate
Volume ID: 47a8f326-0e48-4a71-9cfe-f9ef8d555db7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dev-storage1:/disk/storage1/safe
Brick2: dev-storage2:/disk/storage2/safe
* Two single-brick volumes, one on each node.
Volume Name: single1
Type: Distribute
Volume ID: 74d62eb4-176e-4671-8471-779d909e19f0
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: dev-storage1:/disk/storage1/single1
Volume Name: single2
Type: Distribute
Volume ID: edab496f-c204-4122-ad10-c5f2e2ac92bd
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: dev-storage2:/disk/storage2/single2
These four volumes are FUSE-mounted on
/gluster/safe
/gluster/fast
/gluster/single1
/gluster/single2
on both servers.
The bricks on each node share an underlying filesystem:
dev-storage1:/disk/storage1 and dev-storage2:/disk/storage2 respectively.
Now, the filesystem at dev-storage1:/disk/storage1 failed. I created a
new filesystem, mounted it at /disk/storage1, did
mkdir /disk/storage1/{single1,safe,fast}
and restarted glusterd.
After a couple of minutes, the contents of the replicated volume
("safe") were synchronised between the two nodes. That is,
ls -lR /gluster/safe
ls -lR /disk/storage1/safe # on dev-storage1
ls -lR /disk/storage2/safe # on dev-storage2
all showed the same. This is excellent.
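For confirmation beyond comparing `ls -lR` output, 3.3 can report self-heal state directly. A minimal sketch, run from either node (volume name "safe" as above):

```shell
# List entries the self-heal daemon still considers pending
gluster volume heal safe info
# List entries healed since the daemon started
gluster volume heal safe info healed
```

Empty "pending" output after the rebuild is the quickest sign that glustershd has caught up.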
However, the other two volumes which depend on dev-storage1 are broken.
As this is a dev system I could just blow them away, but I would like to use
this as an exercise for fixing broken filesystems which I may have to do in
production later.
Here are the problems:
(1) The "single1" volume is empty, which I expected since it's a
brand new
empty directory, but I cannot create files in it.
root at dev-storage1:~# touch /gluster/single1/test
touch: cannot touch `/gluster/single1/test': Read-only file system
I guess gluster doesn't like the lack of metadata on this directory. Is
there a quick recovery procedure here, or do I need to destroy the volume
and recreate it?
(2) The "fast" (distributed) volume appears empty to the clients:
root at dev-storage1:~# ls /gluster/fast
root at dev-storage1:~#
However there is still half the content available in the brick which didn't
fail:
root at dev-storage2:~# ls /disk/storage2/fast
images iso
root at dev-storage2:~#
Although this is a test system, ideally I would like to reactivate this
volume and make the half data set available.
I guess I could destroy the volume, move the data to a safe place, create a
new volume and copy in the data. Is there a more direct way?
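A less destructive route worth trying first (a sketch, not verified against this exact failure): a recreated brick directory lacks the trusted.glusterfs.volume-id xattr that glusterd checks before starting a brick, so restore it and then walk the mount to make DHT recreate the layout. The hex value below is just the "fast" volume's UUID from `gluster volume info` with the dashes removed.

```shell
# On dev-storage2: read the volume-id xattr from the surviving brick
getfattr -n trusted.glusterfs.volume-id -e hex /disk/storage2/fast

# On dev-storage1: stamp the recreated brick root with the same value
setfattr -n trusted.glusterfs.volume-id \
         -v 0x864fd12dd8794310abaaa2cb99b7f695 /disk/storage1/fast

# Restart glusterd so the brick comes back, then stat everything through
# the FUSE mount so the distribute translator re-creates directories
service glusterd restart
find /gluster/fast -print0 | xargs -0 stat > /dev/null
```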
Thanks,
Brian.
Brian Candler
2012-Jul-11 14:07 UTC
[Gluster-users] Recovering a broken distributed volume
On Wed, Jul 11, 2012 at 11:27:58AM +0100, Brian Candler wrote:
> (1) The "single1" volume is empty, which I expected since it's a brand new
> empty directory, but I cannot create files in it.
>
> root at dev-storage1:~# touch /gluster/single1/test
> touch: cannot touch `/gluster/single1/test': Read-only file system

Sorry, this was my problem: it turns out a few more drives had failed, and
the underlying brick filesystem had gone read-only. Unbelievably, that's 7
Seagate drives failed out of an array of 12!

Anyway, after rebuilding the array with the remaining 5 working disks, the
single volume came up fine. The distributed volume also healed itself after
I ran 'ls' on it a few times:

root at dev-storage1:~# ls /gluster/fast
...
root at dev-storage1:~# ls /gluster/fast
images iso
root at dev-storage1:~# ls /gluster/fast/images/
root at dev-storage1:~# ls /gluster/fast/iso
linuxmint-11-gnome-dvd-64bit.iso
root at dev-storage1:~# ls /gluster/fast/images/
lucidtest
root at dev-storage1:~# ls /gluster/fast/images/lucidtest/
tmpaJqTD9.qcow2

I can only see one other strange thing: the newly-created replica appears
to have made a sparse copy of a file which wasn't sparse on the original.

On the original working side of the replicated volume:

root at dev-storage2:~# ls -l /disk/storage2/safe/images/lucidtest/
total 756108
-rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
root at dev-storage2:~# du -k /disk/storage2/safe/images/lucidtest/
756116 /disk/storage2/safe/images/lucidtest/

On the newly-created side, which glustershd rebuilt automatically:

root at dev-storage1:~# ls -l /disk/storage1/safe/images/lucidtest/
total 422728
-rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
root at dev-storage1:~# du -k /disk/storage1/safe/images/lucidtest/
422736 /disk/storage1/safe/images/lucidtest/

Is this intentional? Does glustershd notice runs of zeros and create a
sparse file on the target? (This may or may not be desirable, e.g.
for performance you might want to fully preallocate a VM image.)

Regards,
Brian.
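The gap between apparent size (what `ls -l` shows) and allocated blocks (what `du` counts) is easy to reproduce locally, independent of gluster. A minimal sketch:

```shell
tmp=$(mktemp -d)
# A file with a 100 MiB apparent size but no data blocks written,
# i.e. the situation where ls -l and du -k disagree.
truncate -s 100M "$tmp/sparse.img"
apparent=$(stat -c %s "$tmp/sparse.img")                # bytes, as ls -l reports
allocated=$(( $(stat -c %b "$tmp/sparse.img") * 512 ))  # bytes on disk, as du counts
echo "apparent=$apparent allocated=$allocated"
rm -r "$tmp"
```

Whether preserving holes like this is right for a VM image depends on the workload; writing into the holes later costs allocations at runtime.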
Arnold Krille
2012-Jul-11 21:01 UTC
[Gluster-users] Recovering a broken distributed volume
On 11.07.2012 22:37, Mailing Lists wrote:
> I had that some years ago on two servers at a customer's office, 2 disks
> in each in raid 1 so 4 disks. Same series ... failing in the same
> afternoon after 10 months of service!

I can only repeat myself: most people argue "but it's two devices, it's
statistically independent". Well: two devices(*) manufactured at the same
time and on the same assembly line (preferably with consecutive serial
numbers), running the same firmware version, bought at the same time, used
in the same array with the same external stress and the same usage
patterns. Even all my non-mathematical, non-informatics friends know that
this isn't what you call "statistically independent".

(*) It doesn't matter whether it's disks or switches or motherboards or
processors or memory chips or power supplies or UPSes or backplanes or
power-distribution boards. If you "remove" your SPOF by using two of the
exact same kind, statistics (and sad experience) say that it's still a
SPOF.

So, subs (or is it "Mailing Lists"? :) and probably Brian, thanks for
another data point for this never-happens-in-real-life scenario. We feel
with you.

Have fun,

Arnold

--
This email was created electronically and is valid without a handwritten
signature.