Kingsley
2014-Oct-13 15:51 UTC
[Gluster-users] geo-replication breaks on CentOS 6.5 + gluster 3.6.0 beta3
Hi,

I have a small script to simulate file activity for an application we have. It breaks geo-replication within about 15 - 20 seconds when I try it.

This is on a small Gluster test environment running in some VMs running CentOS 6.5 and using gluster 3.6.0 beta3. I have 6 VMs - test1, test2, test3, test4, test5 and test6. test1, test2, test3 and test4 are gluster servers while test5 and test6 are the clients. test3 is actually not used in this test.

Before the test, I had a single gluster volume as follows:

test1# gluster volume status
Status of volume: gv0
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick test1:/data/brick/gv0                     49168   Y       12017
Brick test2:/data/brick/gv0                     49168   Y       11835
NFS Server on localhost                         2049    Y       12032
Self-heal Daemon on localhost                   N/A     Y       12039
NFS Server on test4                             2049    Y       7934
Self-heal Daemon on test4                       N/A     Y       7939
NFS Server on test3                             2049    Y       11768
Self-heal Daemon on test3                       N/A     Y       11775
NFS Server on test2                             2049    Y       11849
Self-heal Daemon on test2                       N/A     Y       11855

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

I created a new volume and set up geo-replication as follows (as these are test machines I only have one file system on each, hence using "force" to create the bricks in the root FS):

test4# date ; gluster volume create gv0-slave test4:/data/brick/gv0-slave force; date
Mon Oct 13 15:03:14 BST 2014
volume create: gv0-slave: success: please start the volume to access data
Mon Oct 13 15:03:15 BST 2014

test4# date ; gluster volume start gv0-slave; date
Mon Oct 13 15:03:36 BST 2014
volume start: gv0-slave: success
Mon Oct 13 15:03:39 BST 2014

test4# date ; gluster volume geo-replication gv0 test4::gv0-slave create push-pem force ; date
Mon Oct 13 15:05:59 BST 2014
Creating geo-replication session between gv0 & test4::gv0-slave has been successful
Mon Oct 13 15:06:11 BST 2014

I then mount volume gv0 on one of the client machines. I can create files within the gv0 volume and can see the changes being replicated to the gv0-slave volume, so I know that geo-replication is working at the start.

When I run my script (which quickly creates, deletes and renames files), geo-replication breaks within a very short time. The test script output is in http://gluster.dogwind.com/files/georep20141013/test6_script-output.log (I interrupted the script once I saw that geo-replication was broken). Note that when it deletes a file, it renames any later-numbered file so that the file numbering remains sequential with no gaps; this simulates a real world application that we use.

If you want a copy of the test script, it's here:
http://gluster.dogwind.com/files/georep20141013/test_script.tar.gz

The various gluster log files can be downloaded from here:
http://gluster.dogwind.com/files/georep20141013/ - each log file has the actual log file path at the top of the file.

If you want to run the test script on your own system, edit test.pl so that @mailstores contains a directory path to a gluster volume.

My systems' timezone is BST (GMT+1 / UTC+1), so any timestamps outside of gluster logs are in this timezone.

Let me know if you need any more info.

--
Cheers,
Kingsley.
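In case it helps to see the shape of the workload without downloading the tarball, the following is a minimal sketch of the create/delete/rename churn described above. It is not the actual test.pl; the mount point, file names and probabilities are illustrative only, and it assumes gv0 is FUSE-mounted on the client (e.g. via "mount -t glusterfs test1:/gv0 /mnt/gv0").

#!/usr/bin/perl
# Minimal sketch of the churn described above -- not the real test.pl.
# Assumes the gv0 volume is mounted at /mnt/gv0 (illustrative path).
use strict;
use warnings;

my $dir = "/mnt/gv0/mailstore";
mkdir $dir unless -d $dir;

my $count = 0;
while (1) {
    # create the next sequentially-numbered file
    $count++;
    open(my $fh, '>', "$dir/msg.$count") or die "create msg.$count: $!";
    print $fh "test data\n";
    close $fh;

    # now and then delete a random file, then rename every later-numbered
    # file down by one so the numbering stays sequential with no gaps
    if ($count > 5 && rand() < 0.3) {
        my $victim = 1 + int(rand($count));
        unlink "$dir/msg.$victim" or die "unlink msg.$victim: $!";
        for my $i ($victim + 1 .. $count) {
            rename "$dir/msg.$i", "$dir/msg." . ($i - 1)
                or die "rename msg.$i: $!";
        }
        $count--;
    }
}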
Kingsley
2014-Oct-14 13:27 UTC
[Gluster-users] geo-replication breaks on CentOS 6.5 + gluster 3.6.0 beta3
It's worth me adding that since geo-replication broke, if I query the volume status (in this instance, on test1), I get this:

test1# gluster volume status
Another transaction is in progress. Please try again after sometime.

It's still giving this error, 24 hours later.

Cheers,
Kingsley.
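For reference, the state of the geo-replication session itself can also be queried from a master node with the status command; the line below just shows the invocation with the volume and host names used in this setup (its output is not captured here). Running "gluster volume geo-replication status" with no session arguments lists all sessions.

test1# gluster volume geo-replication gv0 test4::gv0-slave status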
James Payne
2014-Oct-14 21:10 UTC
[Gluster-users] geo-replication breaks on CentOS 6.5 + gluster 3.6.0 beta3
Just adding that I have verified this as well with the 3.6 beta; I added a log to the ticket regarding this:

https://bugzilla.redhat.com/show_bug.cgi?id=1141379

Please feel free to add to the bug report; I think we are seeing the same issue. It isn't present in the 3.4 series, which is the one I'm testing currently (no distributed geo-rep though).

Regards
James