I do not mean to be argumentative, but I have to admit a little frustration with Gluster. I know an enormous amount of effort has gone into this product, and I just can't believe that with all the effort behind it and so many people using it, it could be so fragile.

So here goes. Perhaps someone here can point to the error of my ways. I really want this to work because it would be ideal for our environment, but ...

Please note that all of the nodes below are OpenVZ nodes with nfs/nfsd/fuse modules loaded on the hosts.

After spending months trying to get 3.2.5 and 3.2.6 working in a production environment, I gave up on Gluster and went with a Linux-HA/NFS cluster which just works. The problems I had with gluster were strange lock-ups, split brains, and too many instances where the whole cluster was off-line until I reloaded the data.

So with the release of 3.3, I decided to give it another try. I created one replicated volume on my two NFS servers.

I then mounted the volume on a client as follows:

    10.10.10.7:/pub2 /pub2 nfs rw,noacl,noatime,nodiratime,soft,proto=tcp,vers=3,defaults 0 0

I threw some data at it:

    find / -mount -print | cpio -pvdum /pub2/test

Within 10 seconds it locked up solid. No error messages on any of the servers, the client was unresponsive, and load on the client was 15+. I restarted glusterd on both of my NFS servers, and the client remained locked. Finally I killed the cpio process on the client. When I started another cpio, it ran further than before, but now the logs on my NFS/Gluster server say:

    [2012-06-16 13:37:35.242754] I [afr-self-heal-common.c:1318:afr_sh_missing_entries_lookup_done] 0-pub2-replicate-0: No sources for dir of <gfid:4a787ad7-ab91-46ef-9b31-715e49f5f818>/log/secure, in missing entry self-heal, continuing with the rest of the self-heals
    [2012-06-16 13:37:35.243315] I [afr-self-heal-common.c:994:afr_sh_missing_entries_done] 0-pub2-replicate-0: split brain found, aborting selfheal of <gfid:4a787ad7-ab91-46ef-9b31-715e49f5f818>/log/secure
    [2012-06-16 13:37:35.243350] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-pub2-replicate-0: background data gfid self-heal failed on <gfid:4a787ad7-ab91-46ef-9b31-715e49f5f818>/log/secure

This still seems to be an INCREDIBLY fragile system. Why would it lock solid while copying a large file? Why no errors in the logs?

Am I the only one seeing this kind of behavior?

sean

--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203
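As a first diagnostic, the split-brain state reported in those logs can be listed with the heal CLI that shipped in 3.3. A minimal sketch, assuming the volume is named pub2 as in the fstab line above:

    # Run on either server (volume name pub2 taken from the fstab line):
    gluster volume heal pub2 info                # entries pending self-heal
    gluster volume heal pub2 info split-brain    # entries flagged split-brain
    gluster volume heal pub2 info heal-failed    # entries where self-heal failed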
A little more information on this, which makes it more puzzling:

1) The split-brain message is strange because there are only two server nodes and 1 client node which has mounted the volume via NFS on a floating IP. This was done to guarantee that only one node gets written to at any point in time, so there is zero chance that two nodes were updated simultaneously.

2) I re-ran the cpio command to put a file tree on the gluster volume, then made a tar of the file tree. The tree was not being written to by anyone, and yet every 5 to 10 files tar would report "file changed as I read it." At first I thought there was some sort of healing operation going on, but since I was only writing to one node at a time, then making the backup, I don't see how this was possible.

I've checked the network, resources, etc., and there are no issues there: no packet loss, and all machines share the same time via NTP. The OS is SL 6.1.

So this is all very strange behavior.

sean

On 06/16/2012 01:48 PM, Sean Fulton wrote:
> I do not mean to be argumentative, but I have to admit a little
> frustration with Gluster. I know an enormous amount of effort has gone
> into this product, and I just can't believe that with all the effort
> behind it and so many people using it, it could be so fragile. [...]
--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203
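One way to tell whether healing really was in flight during that tar is to look at the AFR changelog xattrs directly on each brick. A minimal sketch, with /export/pub2 as a placeholder for the actual brick path on each server:

    # On each server, dump the AFR changelog xattrs for a file tar
    # complained about (/export/pub2 is a placeholder brick path):
    getfattr -d -m trusted.afr -e hex /export/pub2/test/log/secure

    # Non-zero trusted.afr.pub2-client-* values on one brick mean writes
    # are pending heal toward the other replica; non-zero on both bricks,
    # each blaming the other, is the classic split-brain signature.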
Was there anything in dmesg on the servers?

If you are able to reproduce the hang, can you get the output of 'gluster volume status <name> callpool' and 'gluster volume status <name> nfs callpool'?

How big is the 'log/secure' file? Is it so large that the client was just busy writing it for a very long time? Are there any signs of disconnections or ping timeouts in the logs?

Avati

On Sat, Jun 16, 2012 at 10:48 AM, Sean Fulton <sean at gcnpublishing.com> wrote:
> I do not mean to be argumentative, but I have to admit a little
> frustration with Gluster. I know an enormous amount of effort has gone
> into this product, and I just can't believe that with all the effort
> behind it and so many people using it, it could be so fragile. [...]
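For reference, gathering the diagnostics Avati asks for during a hang would look roughly like this (volume name pub2 assumed from the original post):

    # Run on a server while the client is hung:
    dmesg | tail -n 50                       # kernel-side errors on the server
    gluster volume status pub2 callpool      # pending call frames on the bricks
    gluster volume status pub2 nfs callpool  # pending call frames in the NFS server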
On Sat, Jun 16, 2012 at 04:47:51PM -0400, Sean Fulton wrote:
> 1) The split-brain message is strange because there are only two
> server nodes and 1 client node which has mounted the volume via NFS
> on a floating IP. This was done to guarantee that only one node gets
> written to at any point in time, so there is zero chance that two
> nodes were updated simultaneously.

Are you using a distributed volume, or a replicated volume? Writes to a replicated volume go to both nodes.

> [586898.273283] INFO: task flush-0:45:633954 blocked for more than 120 seconds.
> [586898.273290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [586898.273295] flush-0:45 D ffff8806037592d0 0 633954 20 0x00000000
> [586898.273304] ffff88000d1ebbe0 0000000000000046 ffff88000d1ebd6c 0000000000000000
> [586898.273312] ffff88000d1ebce0 ffffffff81054444 ffff88000d1ebc80 ffff88000d1ebbf0
> [586898.273319] ffff8806050ac5f8 ffff880603759888 ffff88000d1ebfd8 ffff88000d1ebfd8
> [586898.273326] Call Trace:
> [586898.273335] [<ffffffff81054444>] ? find_busiest_group+0x244/0xb20
> [586898.273343] [<ffffffff811ab050>] ? inode_wait+0x0/0x20
> [586898.273349] [<ffffffff811ab05e>] inode_wait+0xe/0x20

Are you using XFS by any chance? I started with XFS, because that was what the gluster docs recommend, but eventually gave up on it. I can replicate that sort of kernel lockup on a 24-disk MD array within a short space of time, without gluster, just by throwing four bonnie++ processes at it. The same tests run with either ext4 or btrfs do not hang, at least not during two days of continuous testing.

Of course, any kernel problem cannot be the fault of glusterfs, since glusterfs runs entirely in userland.

Regards,

Brian.
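For what it's worth, a minimal sketch of that kind of reproduction, with /mnt/md0 as a placeholder for the array's mount point (the directory must be writable by the user passed to -u):

    # Four parallel bonnie++ runs against the filesystem, no gluster involved
    # (/mnt/md0 is a placeholder; -u sets the user bonnie++ runs as):
    for i in 1 2 3 4; do
        bonnie++ -d /mnt/md0 -u nobody &
    done
    wait
    # Watch dmesg for "blocked for more than 120 seconds" hung-task
    # warnings like the trace quoted above.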
BTW, the thing which is unusual about your configuration is the HA setup. Are you completely sure that the HA IP has not been moving between the nodes? What if you point the NFS client at one server's fixed IP address instead of the HA address? I can imagine that the HA IP moving would cause the split-brain situations you describe.
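A quick way to run that test is to reuse the fstab line from the original post with one server's fixed address substituted for the floating IP (10.10.10.5 below is a placeholder for a real server address):

    # Same mount options as before, but pointed at a fixed server IP
    # instead of the HA address:
    10.10.10.5:/pub2 /pub2 nfs rw,noacl,noatime,nodiratime,soft,proto=tcp,vers=3,defaults 0 0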
On Sat, 2012-06-16 at 13:48 -0400, Sean Fulton wrote:
> I do not mean to be argumentative, but I have to admit a little
> frustration with Gluster. I know an enormous amount of effort has gone
> into this product, and I just can't believe that with all the effort
> behind it and so many people using it, it could be so fragile.

Often it's not individual pieces that are fragile but combinations of pieces. For example, two possible interactions might be involved for you:

(1) There are known problems with the interaction between FUSE and transparent hugepages (https://bugzilla.redhat.com/show_bug.cgi?id=764964). This could cause one or more of your server processes to lock up.

(2) There are known problems with OpenVZ and our use of the "trusted" extended-attribute namespace (http://forum.openvz.org/index.php?t=msg&goto=35230&). This should result in a clean failure, but it's possible that it's leading to problems tracking which replicas need updates instead.

If you're still having problems with the workarounds for those two issues, please let us know.
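For (1), a minimal sketch of the transparent-hugepage workaround, assuming a RHEL/SL 6-era system (the sysfs path differs between mainline and Red Hat kernels, so check which one exists on your hosts):

    # Mainline kernels:
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    # RHEL/SL 6 kernels use a different path:
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    # To persist across reboots, add transparent_hugepage=never to the
    # kernel command line in grub.conf.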