wodel youchi
2015-Feb-24 14:34 UTC
[Gluster-users] Geo-replication big logs and large number of pending files
Hi,

I have a 3-node setup (CentOS 7 x86_64, latest updates, GlusterFS 3.6.1, latest updates).

Master: two nodes (g1 and g2) in replicated mode with two volumes, data1 and data2, each volume made of a single brick.
Slave: the third node (g3) is for geo-replication, also with two volumes, slavedata1 and slavedata2.

I am running geo-replication as the user geoaccount1 with the group geogroup1. The setup completed successfully and geo-replication was started.

Problems:

- After a few days I found geo-replication in a Faulty state. The reason: /var was full on g1 and on g3, the slave node. The geo-replication slave log file on g3 had grown to 11 GB and was full of messages like these:

[2015-02-24 11:29:26.526285] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 0-slavedata2-client-0: remote operation failed: File exists. Path: (<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S15bind9 to ../init.d/bind9)
[2015-02-24 11:29:26.526297] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 1100: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d => -1 (File exists)
[2015-02-24 11:29:26.526602] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 0-slavedata2-client-0: remote operation failed: File exists. Path: (<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S20modules_dep.sh to ../init.d/modules_dep.sh)
[2015-02-24 11:29:26.526618] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 1101: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d => -1 (File exists)

I emptied the log files on both servers, then changed the geo-replication logrotate configuration on all nodes from "rotate 52" to:

rotate 7
size 50M

Does geo-replication normally produce logs this big?

The change worked on g1 and g2, but on g3 I hit a problem:

[root@glustersrv3 logrotate.d]# logrotate -f /etc/logrotate.d/glusterfs-georep
error: skipping "/var/log/glusterfs/geo-replication-slaves/967ddac3-af34-4c70-8d2b-eb201ebb645d:gluster%3A%2F%2F127.0.0.1%3Aslavedata1.gluster.log" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation

So I added the following to /etc/logrotate.d/glusterfs-georep:

su root geogroup1

It now seems to be working. Is that correct?
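For reference, after these changes the stanza in my /etc/logrotate.d/glusterfs-georep looks roughly like this. The "su", "rotate" and "size" lines are the ones I added or changed; the path glob and the "missingok"/"compress" lines are only my approximation of the stock file from memory, so don't take them literally:

/var/log/glusterfs/geo-replication-slaves/*.log {
    su root geogroup1
    rotate 7
    size 50M
    missingok
    compress
}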
After cleaning up the logs I tried to restart geo-replication but did not succeed: I got a "no active session between g1 and g3" error, so I had to restart the glusterfs daemon on all three nodes.

After geo-replication was restarted and its state became stable, I ran "status detail" and got this:

[root@glustersrv1 ~]# gluster volume geo-replication data1 geoaccount1@gserver3.domain.tld::slavedata1 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Active     N/A                  Hybrid Crawl    25784          8191             0                0                  0
glustersrv2.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Passive    N/A                  N/A             0              0                0                0                  0

[root@glustersrv1 ~]# gluster volume geo-replication data2 geoaccount1@gserver3.domain.tld::slavedata2 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Active     N/A                  Hybrid Crawl    11768408       8191             0                0                  3833
glustersrv2.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Passive    N/A                  N/A             0              0                0                0                  0

What does FILES PENDING mean? The number did not change during the hour after restarting geo-replication; I thought it would decrease over time, but it did not. And what does FILES SKIPPED mean?

I tried something else: I stopped geo-replication, stopped the volumes on g3, then deleted them. Then I cleaned up the .glusterfs directory on both bricks and removed all the GlusterFS extended attributes from them with the setfattr command, but I did not delete my data (files and directories). Then I recreated the slave volumes, started them, and finally restarted geo-replication. After initialization and stabilization I got the same result from the geo-replication status command: the same values for FILES PENDING and FILES SKIPPED.

Is that OK? How can I be sure that I have all my data on g3?

Thanks in advance
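PS: for completeness, the slave-brick cleanup I described above looked roughly like the following. The brick path is just an example, not necessarily the real path on g3, so treat this as a sketch rather than a transcript:

# run on g3 for each slave brick, after stopping and deleting the volume
# (/mnt/slavebrick1/brick is an example path; the data files themselves stay in place)
setfattr -x trusted.glusterfs.volume-id /mnt/slavebrick1/brick
setfattr -x trusted.gfid /mnt/slavebrick1/brick
rm -rf /mnt/slavebrick1/brick/.glusterfs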
Atin Mukherjee
2015-Feb-24 14:51 UTC
[Gluster-users] Geo-replication big logs and large number of pending files
CCing Geo-rep folks.

~Atin

On 02/24/2015 08:04 PM, wodel youchi wrote:
> [...]

--
~Atin