On March 27, 2014 11:08:03 PM PDT, Nicolas Ochem <nicolas.ochem at
gmail.com> wrote:
>Hi list,
>I would like to describe an issue I had today with Gluster and ask for
>your opinion:
>
>I have a replicated volume with 2 replicas. There is about 1 TB of
>production data in there, in around 100,000 files. The bricks sit on
>2x Supermicro x9dr3-ln4f machines with a RAID array of 18 TB each,
>64 GB of RAM and 2x Xeon CPUs, as recommended in the Red Hat hardware
>guidelines for storage servers. They have a 10 Gb link between each
>other. I am running Gluster 3.4.2 on CentOS 6.5.
>
>This storage is NFS-mounted on a lot of production servers. Only a
>very small part of this data is actually useful; the rest is legacy.
>
>Due to an unrelated issue with one of the Supermicro servers (faulty
>memory), I had to take one of the nodes offline for 3 days.
>
>When I brought it back up, some files and directories ended up in a
>heal-failed state (but no split-brain). Unfortunately those were the
>critical files that had been edited during those 3 days. On the NFS
>mounts, attempts to read these files resulted in I/O errors.
>
>I was able to fix a few of these files by manually removing them from
>each brick and then copying them to the mounted volume again. But I
>did not know what to do when entire directories were unreachable
>because of "heal failed".
>
>I later read that healing can take time and that heal-failed may be a
>transient state (is that correct?
>http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount),
>but at the time I thought the data was beyond recovery, so I proceeded
>to destroy the Gluster volume. Then, on one of the replicas, I moved
>the content of the brick to another directory, created another volume
>with the same name, and copied that content back to the mounted
>volume. This took around 2 hours. Then I had to reboot all my
>NFS-mounted machines, which were stuck in "stale NFS file handle"
>state.
>
>A few questions:
>- I realize that I cannot expect 1 TB of data to heal instantly, but
>is there any way for me to know whether the system would have
>recovered eventually despite being shown as "heal failed"?
>- If yes, how many files, and of what total size, would I have to
>clean up from my volume to bring this time under 10 minutes?
>- Would native Gluster mounts instead of NFS have helped here?
>- Would any other course of action have resulted in a faster recovery
>time?
>- Is there a way, in such a situation, to make one replica
>authoritative about the correct state of the filesystem?
>
>Thanks in advance for your replies.
>
>
Although the self-heal daemon can take time to heal all the files, accessing a
file that needs healing does trigger the heal to be performed immediately by
the client (the NFS server is the client in this case).
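
As an illustration (the volume name "myvol" and the mount point are made up
for the example), one way to force those client-side heals is simply to stat
the affected files through a mount, and to watch progress with the heal
commands available in 3.4:

    # trigger heals by accessing every file through a FUSE or NFS mount
    find /mnt/myvol -noleaf -print0 | xargs -0 stat > /dev/null 2>&1

    # ask the self-heal daemon to crawl the whole volume, then check status
    gluster volume heal myvol full
    gluster volume heal myvol info
    gluster volume heal myvol info heal-failed
    gluster volume heal myvol info split-brain
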
As with pretty much all errors in GlusterFS, you would have had to look in the
logs to find out why something as vague as "heal failed" happened.
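
For reference, on a stock CentOS install the relevant logs normally live
under /var/log/glusterfs/; the file names below assume that default layout:

    # self-heal daemon log (heals performed by glustershd)
    less /var/log/glusterfs/glustershd.log

    # built-in Gluster NFS server log (heals triggered through NFS clients)
    less /var/log/glusterfs/nfs.log

    # per-brick logs on each storage server
    ls /var/log/glusterfs/bricks/

    # quick way to pull out the failures
    grep -iE 'self-heal|split-brain|failed' /var/log/glusterfs/glustershd.log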