gluster1206 at akxnet.de
2011-Aug-30 20:29 UTC
[Gluster-users] Gluster 3.2.1 : Mounted volumes "vanish" on client side
Hi!

I am using Gluster 3.2.1 on a two/three-node openSUSE 11.3/11.4 server cluster, where the Gluster nodes act as both server and client.

While migrating the cluster to servers with higher performance, I tried the Gluster 3.3 beta. Both versions show the same problem:

A single volume (holding the mail base, accessed by the POP3, IMAP and SMTP servers) reports an "Input/output error" shortly after mounting and becomes inaccessible. The same volume mounted on another, idle server still works.

ls /var/vmail
ls: cannot access /var/vmail: Input/output error

lsof /var/vmail
lsof: WARNING: can't stat() fuse.glusterfs file system /var/vmail
      Output information may be incomplete.
lsof: status error on /var/vmail: Input/output error

After unmounting and remounting the volume, the same thing happens.

I tried to recreate the volume, but this does not help. Although the volume was just created, the log is full of "self healing" entries (but those should not cause the volume to disappear, right?).

I tried it initially with three bricks (one of which I had to remove) and the following parameters:

Volume Name: vmail
Type: Replicate
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: mx00.akxnet.de:/data/vmail
Brick2: mx02.akxnet.de:/data/vmail
Brick3: mx01.akxnet.de:/data/vmail
Options Reconfigured:
network.ping-timeout: 15
performance.write-behind-window-size: 2097152
auth.allow: xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz,127.0.0.1
performance.io-thread-count: 64
performance.io-cache: on
performance.stat-prefetch: on
performance.quick-read: off
nfs.disable: on
performance.cache-size: 32MB (also tried 64MB)

and, after the delete/create, with two bricks and the following parameters:

Volume Name: vmail
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: mx02.akxnet.de:/data/vmail
Brick2: mx01.akxnet.de:/data/vmail
Options Reconfigured:
performance.quick-read: off
nfs.disable: on
auth.allow: xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz,127.0.0.1

Always the same result.

The log shows entries like:

[2011-08-30 22:10:45.376568] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-vmail-replicate-0: background data data self-heal completed on /xxxxx.de/yyyyyyyyyy/.Tauchen/courierimapuiddb
[2011-08-30 22:10:45.385541] I [afr-common.c:801:afr_lookup_done] 0-vmail-replicate-0: background meta-data self-heal triggered. path: /xxxxx.de/yyyyyyyyy/.Tauchen/courierimapkeywords

The volume is presently unusable. Any hint?
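PS: For reference, the two-brick configuration above corresponds to roughly the following commands. The create/set/start lines follow from the volume info shown; which node serves the volfile in the mount line is a guess, and the actual setup may have differed slightly.

gluster volume create vmail replica 2 transport tcp \
    mx02.akxnet.de:/data/vmail mx01.akxnet.de:/data/vmail
gluster volume set vmail performance.quick-read off
gluster volume set vmail nfs.disable on
gluster volume set vmail auth.allow xx.xx.xx.xx,yy.yy.yy.yy,zz.zz.zz.zz,127.0.0.1
gluster volume start vmail

# FUSE mount on each client node (volfile server is assumed to be mx01)
mount -t glusterfs mx01.akxnet.de:/vmail /var/vmail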
Pranith Kumar K
2011-Aug-31 03:05 UTC
[Gluster-users] Gluster 3.2.1 : Mounted volumes "vanish" on client side
hi,
     This can happen if there is a split-brain on that directory. Could you post the output of "getfattr -d -m . /data/vmail/var/vmail" on all the bricks, so that we can confirm whether that is the case?

Pranith.

On 08/31/2011 01:59 AM, gluster1206 at akxnet.de wrote:
> ls /var/vmail
> ls: cannot access /var/vmail: Input/output error
>
> After unmounting and remounting the volume, the same thing happens.
>
> Although the volume was just created, the log is full of "self healing"
> entries (but those should not cause the volume to disappear, right?).
>
> The volume is presently unusable. Any hint?
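For reference, a minimal sketch of the check being asked for, assuming the affected directory is the volume root so that the matching directory on each brick is /data/vmail (taken from the volume info above); the trusted.afr.vmail-client-* attribute names mentioned below are the ones AFR would normally use for a volume named vmail, given here as an expectation rather than actual output from this cluster:

# run as root on each brick server (mx01 and mx02);
# -e hex prints the binary AFR changelog values in readable form
getfattr -d -m . -e hex /data/vmail

If both bricks report non-zero pending counters in each other's trusted.afr.vmail-client-0 / trusted.afr.vmail-client-1 attributes for the same directory, that directory is in split-brain, and AFR will typically return EIO (Input/output error) for it until the conflict is resolved on one of the bricks.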