wodel youchi
2015-Feb-24 14:34 UTC
[Gluster-users] Geo-replication big logs and large number of pending files
Hi,

I have a 3-node setup (CentOS 7 x86_64, latest updates, GlusterFS 3.6.1, latest updates).

Master: two nodes (g1 and g2) in replicated mode with two volumes, data1 and data2, each volume made of a single brick.
Slave: the third node (g3) is for geo-replication, also with two volumes, slavedata1 and slavedata2.

I am running geo-replication as the user geoaccount1 with the group geogroup1. The setup completed successfully and geo-replication was started.

Problems:

- After a few days I found geo-replication in a Faulty state. The reason: /var was full on g1 and on g3, the slave node. The geo-replication slave log file on g3 had grown to 11 GB and was full of messages like these:

[2015-02-24 11:29:26.526285] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 0-slavedata2-client-0: remote operation failed: File exists. Path: (<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S15bind9 to ../init.d/bind9)
[2015-02-24 11:29:26.526297] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 1100: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d => -1 (File exists)
[2015-02-24 11:29:26.526602] W [client-rpc-fops.c:172:client3_3_symlink_cbk] 0-slavedata2-client-0: remote operation failed: File exists. Path: (<gfid:ce5d8b13-1961-4126-93e8-e4ee2fd6b34d>/S20modules_dep.sh to ../init.d/modules_dep.sh)
[2015-02-24 11:29:26.526618] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 1101: SETXATTR() /.gfid/ce5d8b13-1961-4126-93e8-e4ee2fd6b34d => -1 (File exists)

I emptied the log files on both servers, then changed the geo-replication logrotate configuration on all nodes from "rotate 52" to:

rotate 7
size 50M

Does geo-replication normally produce logs this big?

The change worked on g1 and g2, but on g3 I hit a problem:

[root@glustersrv3 logrotate.d]# logrotate -f /etc/logrotate.d/glusterfs-georep
error: skipping "/var/log/glusterfs/geo-replication-slaves/967ddac3-af34-4c70-8d2b-eb201ebb645d:gluster%3A%2F%2F127.0.0.1%3Aslavedata1.gluster.log" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation

So I added the following to /etc/logrotate.d/glusterfs-georep:

su root geogroup1

It now seems to be working. Is that correct?
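For reference, after these changes the stanza in my /etc/logrotate.d/glusterfs-georep looks roughly like this. The "su", "rotate" and "size" lines are the ones I added or changed; the path glob and the "missingok"/"compress" lines are only my approximation of the stock file from memory, so don't take them literally:

/var/log/glusterfs/geo-replication-slaves/*.log {
    su root geogroup1
    rotate 7
    size 50M
    missingok
    compress
}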
After cleaning up the logs I tried to restart geo-replication but did not succeed: I got a "no active session between g1 and g3" error, so I had to restart the glusterfs daemon on all three nodes.

After geo-replication was restarted and its state became stable, I ran "status detail" and got this:

[root@glustersrv1 ~]# gluster volume geo-replication data1 geoaccount1@gserver3.domain.tld::slavedata1 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Active     N/A                  Hybrid Crawl    25784          8191             0                0                  0
glustersrv2.domain.tld    data1         /mnt/brick1/brick    gserver3.domain.tld::slavedata1    Passive    N/A                  N/A             0              0                0                0                  0

[root@glustersrv1 ~]# gluster volume geo-replication data2 geoaccount1@gserver3.domain.tld::slavedata2 status detail

MASTER NODE               MASTER VOL    MASTER BRICK         SLAVE                              STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
glustersrv1.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Active     N/A                  Hybrid Crawl    11768408       8191             0                0                  3833
glustersrv2.domain.tld    data2         /mnt/brick2/brick    gserver3.domain.tld::slavedata2    Passive    N/A                  N/A             0              0                0                0                  0

What does FILES PENDING mean? The number did not change during the hour after restarting geo-replication; I thought it would decrease over time, but it did not. And what does FILES SKIPPED mean?

I tried something else: I stopped geo-replication, stopped the volumes on g3, then deleted them. Then I cleaned up the .glusterfs directory on both bricks and removed all the GlusterFS extended attributes from them with the setfattr command, but I did not delete my data (files and directories). Then I recreated the slave volumes, started them, and finally restarted geo-replication. After initialization and stabilization I got the same result from the geo-replication status command: the same values for FILES PENDING and FILES SKIPPED.

Is that OK? How can I be sure that I have all my data on g3?

Thanks in advance
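PS: for completeness, the slave-brick cleanup I described above looked roughly like the following. The brick path is just an example, not necessarily the real path on g3, so treat this as a sketch rather than a transcript:

# run on g3 for each slave brick, after stopping and deleting the volume
# (/mnt/slavebrick1/brick is an example path; the data files themselves stay in place)
setfattr -x trusted.glusterfs.volume-id /mnt/slavebrick1/brick
setfattr -x trusted.gfid /mnt/slavebrick1/brick
rm -rf /mnt/slavebrick1/brick/.glusterfs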
Atin Mukherjee
2015-Feb-24 14:51 UTC
[Gluster-users] Geo-replication big logs and large number of pending files
CCing Geo-rep folks.

~Atin

On 02/24/2015 08:04 PM, wodel youchi wrote:
> [...]

--
~Atin