Mariusz Sobisiak
2013-Dec-10 10:59 UTC
[Gluster-users] Error after crash of Virtual Machine during migration
Greetings,

Legend:
storage-gfs-3-prd - the first gluster.
storage-1-saas - new gluster where "the first gluster" had to be migrated.
storage-gfs-4-prd - the second gluster (which had to be migrated later).

I've started the replace-brick command:

'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start'

During that, the Virtual Machine (Xen) crashed. Now I can't abort the migration and continue it again. When I try:

'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'

the command runs for about 5 minutes and then finishes with no result. Apart from that, after that command Gluster starts to behave very strangely. For example, I can't run '# gluster volume heal sa_bookshelf info' because it hangs for about 5 minutes and returns a blank screen (the same as abort).

After I restart the Gluster server, Gluster returns to normal work, except for the replace-brick commands. When I do:

'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'

I get:

Number of files migrated = 0        Current file

I can run 'volume heal info' commands etc. until I call the command:
'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.

# gluster --version
glusterfs 3.3.1 built on Oct 22 2012 07:54:24
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Brick (/ydp/shared) logs (the same entries repeat constantly):

[2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options

# gluster volume info

Volume Name: sa_bookshelf
Type: Distributed-Replicate
Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: storage-gfs-3-prd:/ydp/shared
Brick2: storage-gfs-4-prd:/ydp/shared
Brick3: storage-gfs-3-prd:/ydp/shared2
Brick4: storage-gfs-4-prd:/ydp/shared2

# gluster volume status
Status of volume: sa_bookshelf
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick storage-gfs-3-prd:/ydp/shared            24009   Y       758
Brick storage-gfs-4-prd:/ydp/shared            24009   Y       730
Brick storage-gfs-3-prd:/ydp/shared2           24010   Y       764
Brick storage-gfs-4-prd:/ydp/shared2           24010   Y       4578
NFS Server on localhost                        38467   Y       770
Self-heal Daemon on localhost                  N/A     Y       776
NFS Server on storage-1-saas                   38467   Y       840
Self-heal Daemon on storage-1-saas             N/A     Y       846
NFS Server on storage-gfs-4-prd                38467   Y       4584
Self-heal Daemon on storage-gfs-4-prd          N/A     Y       4590

storage-gfs-3-prd:~# gluster peer status
Number of Peers: 2

Hostname: storage-1-saas
Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07
State: Peer in Cluster (Connected)

Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)

(from storage-1-saas)# gluster peer status
Number of Peers: 2

Hostname: 172.16.3.60
Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884
State: Peer in Cluster (Connected)

Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)

Clients work properly. I googled for this and found a similar bug, but in version 3.3.0. How can I repair this and continue my migration? Thank you for any help.

BTW: I moved the Gluster server by following the "Gluster 3.4: Brick Restoration - Replace Crashed Server" how-to.

Regards,
Mariusz
Joe Julian
2014-Jan-21 15:42 UTC
[Gluster-users] Error after crash of Virtual Machine during migration
On 12/10/2013 02:59 AM, Mariusz Sobisiak wrote:
> Greetings,
>
> Legend:
> storage-gfs-3-prd - the first gluster.

What's a "gluster"?

> storage-1-saas - new gluster where "the first gluster" had to be migrated.
> storage-gfs-4-prd - the second gluster (which had to be migrated later).

What do you mean "migrated"?

> I've started the replace-brick command:
>
> 'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start'
>
> During that, the Virtual Machine (Xen) crashed. Now I can't abort the migration and continue it again.

I don't know what state that leaves your files in. I think the original brick, "storage-gfs-3-prd:/ydp/shared", should still have all the data.

The rest of the problem has to do with settings in /var/lib/glusterd/vols/sa_bookshelf/info. Make a backup of that file and edit it, removing anything to do with replace-brick or rebalance. Feel free to put the info file on fpaste.org and ping me on IRC if you need help with that. Stop the volume and glusterd. Copy that same edited info file to the same path on both servers. Start glusterd again. That should clear the replace-brick status so you can try again with 3.4.2.
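Something like the following should do it -- a rough sketch only, assuming the 3.3.x on-disk layout where glusterd keeps replace-brick state as rb_* lines (rb_status, rb_src, rb_dst) in the volume's info file; check your own info file before deleting anything:

# stop the volume, then glusterd (volume stop needs glusterd running)
gluster volume stop sa_bookshelf
service glusterd stop                  # or /etc/init.d/glusterd stop

# back up the volume definition, then strip the replace-brick state
cd /var/lib/glusterd/vols/sa_bookshelf
cp info info.bak
sed -i '/^rb_/d' info                  # drops rb_status=, rb_src=, rb_dst= lines

# copy the edited file to the same path on the other servers
scp info storage-1-saas:/var/lib/glusterd/vols/sa_bookshelf/
scp info storage-gfs-4-prd:/var/lib/glusterd/vols/sa_bookshelf/

# restart glusterd on every server, then start the volume again
service glusterd start
gluster volume start sa_bookshelf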