張為超
2014-Mar-04 09:31 UTC
[Gluster-users] add-brick and remove-brick on a nearly full volume
Hi all,

I have 3 peers (peer-A, peer-B and peer-C). I tried to use add-brick and remove-brick to replace peers (version: glusterfs 3.4).

What I did:

1. Created a distribute volume with two 10-GB bricks (peer-A:/brick and peer-B:/brick; they are actually 9.7 GB after ext4 formatting).
2. Mounted it and wrote 16 1-GB files into it (command: seq 16 | xargs -i dd if=/dev/zero of=/mnt/file-{} bs=1G count=1).
3. Added peer-C:/brick (also 10 GB) to this volume.
4. Executed remove-brick peer-A:/brick start.
5. Checked the remove-brick status and waited until all of the hosts showed "completed".
6. Executed remove-brick peer-A:/brick commit.

After step 6, I lost 2 files in the volume.

I listed the files in the bricks after step 2 and after step 5.

After step 2:

peer-A:/brick:
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-1
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-12
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-14
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-15
-rw-r--r-- 2 root root 1073741824 Mar 4 17:08 file-16
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-3
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-6

peer-B:/brick:
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-10
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-11
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-13
---------T 2 root root 0 Mar 4 17:07 file-15
---------T 2 root root 0 Mar 4 17:07 file-16
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-2
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-4
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-5
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-7
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-8
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-9

After step 5:

peer-A:/brick:
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-15
-rw-r--r-- 2 root root 1073741824 Mar 4 17:08 file-16

peer-B:/brick:
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-10
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-11
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-13
---------T 2 root root 1073741824 Mar 4 17:17 file-15
---------T 2 root root 1073741824 Mar 4 17:17 file-16
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-2
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-4
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-5
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-7
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-8
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-9

peer-C:/brick:
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-1
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-12
-rw-r--r-- 2 root root 1073741824 Mar 4 17:07 file-14
-rw-r--r-- 2 root root 1073741824 Mar 4 17:05 file-3
-rw-r--r-- 2 root root 1073741824 Mar 4 17:06 file-6

After step 6, file-15 and file-16 were gone from the volume.

Does anyone know why file-15 and file-16 were not moved to peer-C? If it is because peer-B is full, why does the status show "completed"?

Node Rebalanced-files size scanned failures skipped status run-time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 5 5.0GB 21 0 completed 126.00
localhost 5 5.0GB 21 0 completed 126.00
localhost 5 5.0GB 21 0 completed 126.00
localhost 5 5.0GB 21 0 completed 126.00

--
Best regards,
Johnny
j1899j1899 at gmail.com
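A note for readers following the steps above: one way to double-check step 5 before running the commit in step 6 is to look at what is still stored on the brick being removed. The Python sketch below is only an illustration under stated assumptions, not a GlusterFS tool. It assumes that DHT link files are the entries that ls shows with mode ---------T (sticky bit only), as in the listings above, and that the brick keeps its internal metadata in a directory named .glusterfs. The function names are hypothetical.

```python
import os
import stat
from typing import List


def is_sticky_only(st: os.stat_result) -> bool:
    """True for a regular file whose mode is sticky-bit-only (ls: ---------T)."""
    return stat.S_ISREG(st.st_mode) and stat.S_IMODE(st.st_mode) == stat.S_ISVTX


def leftover_data_files(brick_root: str) -> List[str]:
    """Return regular files under brick_root that are not sticky-only entries.

    Assumption: sticky-only files are DHT link files and carry no user data,
    and any directory named .glusterfs holds internal metadata only.
    """
    leftovers = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into directories named .glusterfs.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and not is_sticky_only(st):
                leftovers.append(os.path.relpath(path, brick_root))
    return sorted(leftovers)
```

Run on the brick that is about to be removed, for example leftover_data_files("/brick") on peer-A. With the "after step 5" listing above it would report file-15 and file-16, which suggests holding off on the commit even though the status says "completed".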
João Pagaime
2014-Mar-04 09:43 UTC
[Gluster-users] 3.3.0 -> 3.4.2 / Rolling upgrades with no downtime
Hello all,

Has anyone tried a rolling upgrade with no downtime [1] from 3.3.0 to 3.4.2, or a similar upgrade? Any comments?

For testing purposes we installed a 3.4.2 server, and it won't peer; it gives the error "peer probe: failed: Peer X does not support required op-version". I guess this is expected behavior for a new entry in the cluster.

What about changing the software on an existing peer of the cluster? Will it also refuse to re-enter the cluster after the upgrade for the same reason (peers not supporting the required op-version)?

After all servers and clients are upgraded, how do we increase the op-version of the cluster as a whole?

Best regards,
João

[1] http://vbellur.wordpress.com/2013/07/15/upgrading-to-glusterfs-3-4/
張為超
2014-Mar-10 07:03 UTC
[Gluster-users] add-brick and remove-brick on a nearly full volume
Hi all,

Here is my command history and some error messages from the logs. I hope these are helpful.

*** .cmd_log_history: ***
[2014-03-10 06:05:04.176225] : volume create gv1 192.168.13.93:/brick 192.168.5.198:/brick : SUCCESS
[2014-03-10 06:05:07.048590] : volume start gv1 : SUCCESS
[2014-03-10 06:08:20.781311] : volume add-brick gv1 192.168.14.36:/brick : SUCCESS
[2014-03-10 06:09:10.373150] : volume remove-brick gv1 192.168.13.93:/brick start : SUCCESS
[2014-03-10 06:11:24.400824] : volume remove-brick gv1 192.168.13.93:/brick status : SUCCESS
[2014-03-10 06:12:30.043228] : volume remove-brick gv1 192.168.13.93:/brick commit : SUCCESS
[2014-03-10 06:12:34.567630] : volume remove-brick gv1 192.168.5.198:/brick status : SUCCESS

*** volume info after last command ***
Volume Name: gv1
Type: Distribute
Volume ID: 0d889dae-1a28-4f07-8991-0bb7a8c9da33
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.5.198:/brick
Brick2: 192.168.14.36:/brick

Because the brick 192.168.5.198:/brick is full, some files were written to the other brick (192.168.13.93:/brick).

I found that the missing files are the files "linked" from the full brick (192.168.5.198:/brick).

Here are some error messages about the missing files. Does anyone know why setting xattrs fails?
*** 192.168.13.93: brick.log ***
[2014-03-10 06:07:53.017207] W [posix-helpers.c:737:posix_handle_pair] 0-gv1-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2014-03-10 06:07:53.029126] E [posix.c:1751:posix_create] 0-gv1-posix: setting xattrs on /brick/file-15 failed (Operation not supported)
[2014-03-10 06:08:04.218437] E [posix.c:1751:posix_create] 0-gv1-posix: setting xattrs on /brick/file-16 failed (Operation not supported)
[2014-03-10 06:12:27.801988] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x7f9e1d22a78d] (-->/lib64/libpthread.so.0(+0x7def) [0x7f9e1d8b1def] (-->/var/packages/GlusterfsMgmt/target/sbin/glusterfsd(glusterfs_sigwaiter+0xdc) [0x40816c]))) 0-: received signum (15), shutting down

*** 192.168.13.93: glusterd.vol.log ***
[2014-03-10 06:05:04.866511] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick on port 49152
[2014-03-10 06:05:04.868654] I [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-10 06:05:04.868806] I [socket.c:3480:socket_init] 0-management: SSL support is NOT enabled
[2014-03-10 06:05:04.868833] I [socket.c:3495:socket_init] 0-management: using system polling thread
[2014-03-10 06:05:05.718552] I [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-10 06:05:05.718819] I [socket.c:3480:socket_init] 0-management: SSL support is NOT enabled
[2014-03-10 06:05:05.718870] I [socket.c:3495:socket_init] 0-management: using system polling thread
[2014-03-10 06:05:05.719358] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:05:05.719457] I [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-03-10 06:08:17.350770] I [glusterd-brick-ops.c:370:__glusterd_handle_add_brick] 0-management: Received add brick req
[2014-03-10 06:08:17.381885] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.381952] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.381984] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.383519] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.383582] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.383613] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:08:17.384776] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.703267] I [glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2014-03-10 06:09:02.704150] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.705531] I [glusterd-utils.c:7658:glusterd_generate_and_set_task_id] 0-management: Generated task-id b3b5bf8f-86e6-4ded-8c5a-a951db48d705 for key remove
[2014-03-10 06:09:02.707226] I [glusterd-op-sm.c:4168:glusterd_bricks_select_remove_brick] 0-management: force flag is not set
[2014-03-10 06:09:02.707904] W [dict.c:480:dict_unref] (-->/var/packages/GlusterfsMgmt/target/lib/glusterfs/3.4git/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xe1) [0x7f589
[2014-03-10 06:09:02.708634] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.755359] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.755436] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.755483] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.757110] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.757218] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:02.757252] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:09:08.870637] I [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-03-10 06:09:08.870857] I [socket.c:3480:socket_init] 0-management: SSL support is NOT enabled
[2014-03-10 06:09:08.870908] I [socket.c:3495:socket_init] 0-management: using system polling thread
[2014-03-10 06:11:15.089162] I [glusterd-handshake.c:358:__server_event_notify] 0-: received defrag status updated
[2014-03-10 06:11:15.094016] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-03-10 06:11:15.094104] W [socket.c:1962:__socket_proto_state_machine] 0-management: reading from socket failed. Error (No data available), peer (/tmp/glusterfs-rebala
[2014-03-10 06:11:15.204709] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2014-03-10 06:11:15.204829] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2014-03-10 06:12:27.789846] I [glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2014-03-10 06:12:27.790629] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:27.793175] I [glusterd-op-sm.c:4168:glusterd_bricks_select_remove_brick] 0-management: force flag is not set
[2014-03-10 06:12:27.794029] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:27.794700] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-03-10 06:12:27.801447] E [glusterd-utils.c:1618:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/1c5239eb7451335304f0aee8712e7cdf.socket err
[2014-03-10 06:12:27.825757] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:27.825819] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:27.826741] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:27.826804] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
[2014-03-10 06:12:28.710599] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2014-03-10 06:12:28.710705] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2014-03-10 06:12:28.710894] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /brick on port 49152
[2014-03-10 06:12:28.713848] W [socket.c:514:__socket_rwv] 0-socket.management: writev failed (Broken pipe)
[2014-03-10 06:12:28.713985] I [socket.c:2236:socket_event_handler] 0-transport: disconnecting now

Thanks.
Johnny Chang
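A note on the brick.log warning in this message ("Extended attributes not supported (try remounting brick with 'user_xattr' flag)"): a quick way to see whether a brick's filesystem accepts extended attributes at all is to set one on a scratch file. The Python sketch below is an illustration only. The function name and the attribute name user.xattr_probe are hypothetical, and it probes the user.* namespace that the log hint refers to. A positive result does not prove that the attribute namespaces GlusterFS itself uses will work on that brick.

```python
import errno
import os
import tempfile


def supports_user_xattr(directory: str) -> bool:
    """Try to set and read back a user.* xattr on a scratch file in directory."""
    if not hasattr(os, "setxattr"):
        # os.setxattr / os.getxattr are only provided on Linux.
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.setxattr(path, "user.xattr_probe", b"1")
        return os.getxattr(path, "user.xattr_probe") == b"1"
    except OSError as exc:
        # "Operation not supported" or a permission error: report unsupported.
        if exc.errno in (errno.ENOTSUP, errno.EPERM, errno.EACCES):
            return False
        raise
    finally:
        # Always remove the scratch file, whatever the outcome.
        os.unlink(path)
```

Calling supports_user_xattr("/brick") on each peer would show whether 192.168.13.93 differs from the others. If it returns False there, the mount options of that brick are the first thing to compare.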