Hi,
I am running some tests on v3.5.1 in a two-node configuration, with one disk per node, in replica 2 mode.

I have two servers connected by a cable, and glusterd communicates over this cable. I start dd to create a relatively large file. In the middle of the write I disconnect the cable, so when the write finishes one server (node1) has all the data and the other one (node2) has only part of the file. No surprise so far.

Then I plug the cable back in. After a while the peers are discovered and the self-heal daemons start to communicate, so I can see:

gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

But no data is moving over the network, which I verify with df.

Any help? I would expect the nodes to synchronize after a while, but after 20 minutes of waiting still nothing has happened (the file was 2G).

Thanks
Milos
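One way to make the "is any data moving" check more specific than df is to sample the disk usage of the affected file directly on the brick that holds the partial copy. The following is only a sketch under assumptions: the path is built from the brick directory and file name shown in the heal output above, and BRICK_FILE, INTERVAL, usage_kb and watch_heal_progress are names made up for illustration. It also assumes that a running heal shows up as growing on-disk usage, which may not hold in every case.

```shell
#!/bin/sh
# Hypothetical defaults: brick path and file name are taken from the
# "heal info" output in the message above; override them as needed.
BRICK_FILE="${BRICK_FILE:-/dist1/brick/fs/node-middle}"
INTERVAL="${INTERVAL:-60}"    # seconds between the two samples

# Disk usage of a single file in KiB, as reported by du.
usage_kb() {
    du -k "$1" | cut -f1
}

# Sample the usage twice. Returns 0 if it grew, 1 if it did not,
# and 2 if the file is missing.
watch_heal_progress() {
    [ -f "$BRICK_FILE" ] || return 2
    before=$(usage_kb "$BRICK_FILE")
    sleep "$INTERVAL"
    after=$(usage_kb "$BRICK_FILE")
    echo "before=${before}KiB after=${after}KiB"
    [ "$after" -gt "$before" ]
}
```

Run on node2, calling watch_heal_progress prints both samples, and its exit status tells whether usage grew during the interval.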
On 07/02/2014 02:28 AM, Miloš Kozák wrote:
> I start dd to create a relatively large file. In the middle of
> writing process I disconnect the cable, so on one server (node1)
> I can see all data and on the other one (node2) I can see just
> a split of the file when writing is finished

Does this mean your client (mount point) is also on node1?

> But on the network there are no data moving, which I verify by df..

When you get "Possibly undergoing heal" and no I/O is going on from the client, it means the self-heal daemon is healing the file. Can you check whether there are messages in glustershd.log on node1 about self-heal completion?

> Any help? In my opinion after a while I should get my nodes
> synchronized, but after 20minuts of waiting still nothing (the file
> was 2G big)

Does gluster volume status show all processes as online?
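The two checks asked for in this reply could be run roughly as follows. This is a sketch, not a definitive procedure: the log path is the usual default location and is an assumption here, the grep pattern is a guess because the thread does not quote the exact wording of the completion message, and GLUSTER, VOLUME, SHD_LOG, heal_log_lines and heal_overview are illustrative names.

```shell
#!/bin/sh
# Hypothetical, overridable settings. The volume name comes from the thread;
# the log path is assumed to be the default location.
GLUSTER="${GLUSTER:-gluster}"
VOLUME="${VOLUME:-vg0}"
SHD_LOG="${SHD_LOG:-/var/log/glusterfs/glustershd.log}"

# Print every line of the self-heal daemon log that mentions self-heal.
# The pattern is deliberately broad ("selfheal" or "self-heal", any case)
# because the exact message text is not known from the thread.
heal_log_lines() {
    grep -i -E 'self-?heal' "$SHD_LOG"
}

# Show whether all volume processes are online, then the pending heal entries.
heal_overview() {
    "$GLUSTER" volume status "$VOLUME"
    "$GLUSTER" volume heal "$VOLUME" info
}
```

On node1, heal_log_lines followed by heal_overview gives the log excerpts and the process status in one go.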
Hi there

Not sure whether this is related, but we see the same problem with glusterfs-3.4(.2). Several files are listed as being healed, but the heal never finishes even though the checksums are identical.

We had some problems with NTP, meaning that the clocks on the nodes diverged by a couple of seconds. I suspect this may be the root cause, but I could not run any further tests and the files are still in the same state (self-healing).

Interestingly, there are other threads describing this sort of problem, but nothing has come of them so far.

Best,
Tiziano

--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern
Phone: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@@stepping-stone.ch
On 07/02/2014 02:28 AM, Miloš Kozák wrote:
> gluster volume heal vg0 info
> Brick node1:/dist1/brick/fs/
> /node-middle - Possibly undergoing heal
> Number of entries: 1
>
> Brick node2:/dist1/brick/fs/
> /node-middle - Possibly undergoing heal
> Number of entries: 1
>
> But on the network there are no data moving, which I verify by df..

Could you execute "gluster volume statedump vg0" twice, 2 minutes apart, and attach the files in /var/run/gluster to the bug you raised? We need to verify whether it is running into the bug fixed by http://review.gluster.com/8187 for 3.5.2.

Pranith
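The request above (two statedumps, two minutes apart) can be scripted so the interval is kept exactly. This is a minimal sketch: the command and volume name are the ones quoted in the reply, while GLUSTER, VOLUME, INTERVAL and take_statedumps are illustrative names that do not come from the thread.

```shell
#!/bin/sh
# Hypothetical, overridable settings. The volume name is the one in the thread.
GLUSTER="${GLUSTER:-gluster}"
VOLUME="${VOLUME:-vg0}"
INTERVAL="${INTERVAL:-120}"   # seconds; "2 minutes apart" as requested

# Trigger a statedump, wait, then trigger a second one.
# Stops with a non-zero status if either command fails.
take_statedumps() {
    "$GLUSTER" volume statedump "$VOLUME" || return 1
    sleep "$INTERVAL"
    "$GLUSTER" volume statedump "$VOLUME" || return 1
}
```

After take_statedumps returns, the dump files to attach should be in /var/run/gluster, as stated in the reply.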