thr3ads.net - Gluster users - [Gluster-users] add-brick and remove-brick on a nearly full volume [Mar 2014]

張為超

2014-Mar-04 09:31 UTC

[Gluster-users] add-brick and remove-brick on a nearly full volume

Hi all,

I have 3 peers (peer-A, peer-B and peer-C). I tried to use add-brick and
remove-brick to replace peers.
(version: glusterfs 3.4)

What I did:

   1. Created a distribute volume with two 10-GB bricks (peer-A:/brick and
   peer-B:/brick; they are actually 9.7 GB after ext4 formatting).
   2. Mounted it and wrote sixteen 1-GB files into it (command:
   seq 16 | xargs -i dd if=/dev/zero of=/mnt/file-{} bs=1G count=1).
   3. Added peer-C:/brick (also 10 GB) to this volume.
   4. Ran remove-brick peer-A:/brick start.
   5. Checked the remove-brick status and waited until all hosts showed completed.
   6. Ran remove-brick peer-A:/brick commit.

After step 6, I lost 2 files in the volume.
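
For reference, the capacity numbers in the steps above can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it takes the 9.7 GB usable size and the 1-GB file size from the post at face value, treats them as the same unit, and ignores filesystem overhead. The helper name is made up for the illustration.

```python
# Rough capacity check for the scenario in the post (illustrative only).
# Assumptions: 9.7 units usable per brick, 1 unit per file, no metadata overhead.

def files_that_fit(brick_size_gb: float, file_size_gb: float = 1.0) -> int:
    """Number of whole files of file_size_gb that fit on one brick."""
    return int(brick_size_gb // file_size_gb)

BRICK_GB = 9.7      # usable size reported after ext4 formatting
TOTAL_FILES = 16    # files written in step 2

per_brick = files_that_fit(BRICK_GB)   # whole 1-GB files per brick
two_bricks = 2 * per_brick             # room in any two-brick layout

print(per_brick, two_bricks, two_bricks - TOTAL_FILES)
```

Under these assumptions each brick holds at most nine files, so any two-brick layout has room for only two files beyond the sixteen already written.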


I listed the files on each brick after step 2 and after step 5:
After step 2:

peer-A:/brick:

-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-1
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-12
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-14
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-15
-rw-r--r--    2 root     root     1073741824 Mar  4 17:08 file-16
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-3
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-6


peer-B:/brick:
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-10
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-11
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-13
---------T    2 root     root             0 Mar  4 17:07 file-15
---------T    2 root     root             0 Mar  4 17:07 file-16
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-2
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-4
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-5
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-7
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-8
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-9

After step 5:

peer-A:/brick:
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-15
-rw-r--r--    2 root     root     1073741824 Mar  4 17:08 file-16

peer-B:/brick:
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-10
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-11
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-13
---------T    2 root     root     1073741824 Mar  4 17:17 file-15
---------T    2 root     root     1073741824 Mar  4 17:17 file-16
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-2
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-4
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-5
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-7
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-8
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-9

peer-C:/brick:
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-1
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-12
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-14
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-3
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-6
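
One way to read the listings above is to separate regular data files from the entries with mode `---------T`, which GlusterFS's distribute translator uses for link (pointer) files. The sketch below is an illustration under assumptions, not a Gluster tool: the function names are hypothetical, the embedded sample is a shortened copy of the "after step 5" listing, and it only inspects `ls -l` text. It treats every `---------T` entry as "not regular data", even though the peer-B entries after step 5 show a non-zero size, since the listing alone does not say what state they are in.

```python
# Illustrative parser for `ls -l` output from each brick (not a Gluster tool).
# Assumption: entries with mode "---------T" are DHT link files, not regular data.

LINK_MODE = "---------T"

def parse_listing(text):
    """Return {filename: mode} for each `ls -l` line in text."""
    entries = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 9:            # mode links owner group size mon day time name
            entries[fields[-1]] = fields[0]
    return entries

def only_on_removed_brick(bricks, removed):
    """Files stored as regular data on `removed` but on no remaining brick."""
    kept_data = set()
    for name, listing in bricks.items():
        if name != removed:
            kept_data |= {f for f, mode in listing.items() if mode != LINK_MODE}
    removed_data = {f for f, mode in bricks[removed].items() if mode != LINK_MODE}
    return sorted(removed_data - kept_data)

# Shortened sample taken from the "after step 5" listing.
bricks = {
    "peer-A": parse_listing("""
-rw-r--r--    2 root     root     1073741824 Mar  4 17:07 file-15
-rw-r--r--    2 root     root     1073741824 Mar  4 17:08 file-16
"""),
    "peer-B": parse_listing("""
-rw-r--r--    2 root     root     1073741824 Mar  4 17:06 file-10
---------T    2 root     root     1073741824 Mar  4 17:17 file-15
---------T    2 root     root     1073741824 Mar  4 17:17 file-16
"""),
    "peer-C": parse_listing("""
-rw-r--r--    2 root     root     1073741824 Mar  4 17:05 file-1
"""),
}

at_risk = only_on_removed_brick(bricks, removed="peer-A")
print(at_risk)
```

Run against the sample, the check flags file-15 and file-16 as the files whose only regular copy sits on the brick about to be removed.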


After step 6, file-15 and file-16 were missing from the volume.
Does anyone know why file-15 and file-16 were not moved to peer-C?
If it is because peer-B is full, why does the status show "completed"?

Node Rebalanced-files          size       scanned      failures       skipped         status run-time in secs
---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
localhost                5         5.0GB            21             0      completed           126.00
localhost                5         5.0GB            21             0      completed           126.00
localhost                5         5.0GB            21             0      completed           126.00
localhost                5         5.0GB            21             0      completed           126.00


--
Best regards,
Johnny
j1899j1899 at gmail.com

João Pagaime

2014-Mar-04 09:43 UTC

[Gluster-users] 3.3.0 -> 3.4.2 / Rolling upgrades with no downtime

Hello all

Has anyone tried a rolling upgrade with no downtime [1] from 3.3.0 to
3.4.2, or a similar upgrade? Any comments?

For testing purposes we installed a 3.4.2 server and it won't peer,
giving the error "peer probe: failed: Peer X does not support required
op-version".

I guess this is expected behavior for a new entry in the cluster.

What about changing the software on an existing peer of the cluster?
Will it also refuse to re-enter the cluster after the upgrade for the
same reason (peers not supporting the required op-version)?

After all servers and clients are upgraded, how do we increase the
op-version of the whole cluster?

best regards,
João

[1]
http://vbellur.wordpress.com/2013/07/15/upgrading-to-glusterfs-3-4/

張為超

2014-Mar-10 07:03 UTC

[Gluster-users] add-brick and remove-brick on a nearly full volume

Hi all,

Here's my command history and some error messages in the log.
I hope these are helpful.

*** .cmd_log_history: ***
[2014-03-10 06:05:04.176225]  : volume create gv1 192.168.13.93:/brick 192.168.5.198:/brick : SUCCESS
[2014-03-10 06:05:07.048590]  : volume start gv1 : SUCCESS
[2014-03-10 06:08:20.781311]  : volume add-brick gv1 192.168.14.36:/brick : SUCCESS
[2014-03-10 06:09:10.373150]  : volume remove-brick gv1 192.168.13.93:/brick start : SUCCESS
[2014-03-10 06:11:24.400824]  : volume remove-brick gv1 192.168.13.93:/brick status : SUCCESS
[2014-03-10 06:12:30.043228]  : volume remove-brick gv1 192.168.13.93:/brick commit : SUCCESS
[2014-03-10 06:12:34.567630]  : volume remove-brick gv1 192.168.5.198:/brick status : SUCCESS
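
The sequence recorded in the command history can be sketched as a small Python helper that builds the corresponding `gluster` command lines. The volume name and brick addresses are the ones from the log; the function name and the idea of wrapping the CLI in Python are assumptions made for illustration, and the sketch only prints the commands rather than running them.

```python
# Builds (does not run) the gluster CLI calls matching the history above.
# replace_brick_commands is a hypothetical helper, not part of GlusterFS.

def replace_brick_commands(volume, new_brick, old_brick):
    """Return argv lists for add-brick followed by remove-brick start/status/commit."""
    base = ["gluster", "volume"]
    cmds = [base + ["add-brick", volume, new_brick]]
    for phase in ("start", "status", "commit"):
        cmds.append(base + ["remove-brick", volume, old_brick, phase])
    return cmds

commands = replace_brick_commands(
    "gv1", "192.168.14.36:/brick", "192.168.13.93:/brick"
)
for argv in commands:
    print(" ".join(argv))
```

In practice the status step would be repeated until migration finishes, and given the outcome described in this thread it seems prudent to list the old brick's contents before issuing commit.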


*** volume info after last command ***
Volume Name: gv1
Type: Distribute
Volume ID: 0d889dae-1a28-4f07-8991-0bb7a8c9da33
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.5.198:/brick
Brick2: 192.168.14.36:/brick


*** Because one brick (192.168.5.198:/brick) is full, some files were written
to the other brick (192.168.13.93:/brick). ***
*** I found that the missing files are the ones "linked" from the full
brick (192.168.5.198:/brick). ***
*** Here are some error messages about the missing files. Does anyone know
why setting xattrs fails? ***
*** 192.168.13.93: brick.log ***
[2014-03-10 06:07:53.017207] W [posix-helpers.c:737:posix_handle_pair] 0-gv1-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2014-03-10 06:07:53.029126] E [posix.c:1751:posix_create] 0-gv1-posix: setting xattrs on /brick/file-15 failed (Operation not supported)
[2014-03-10 06:08:04.218437] E [posix.c:1751:posix_create] 0-gv1-posix: setting xattrs on /brick/file-16 failed (Operation not supported)
[2014-03-10 06:12:27.801988] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x7f9e1d22a78d] (-->/lib64/libpthread.so.0(+0x7def) [0x7f9e1d8b1def] (-->/var/packages/GlusterfsMgmt/target/sbin/glusterfsd(glusterfs_sigwaiter+0xdc) [0x40816c]))) 0-: received signum (15), shutting down
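
The warning above suggests the brick filesystem rejected an extended-attribute operation. A quick way to probe this from Python on Linux is to try setting a `user.*` attribute on a scratch file in the brick directory. This is a minimal sketch under assumptions: `supports_user_xattr` and the attribute name `user.xattr_probe` are made up here, `os.setxattr` is only available on Linux, any failure is simply reported as "not supported", and the probe does not cover the `trusted.*` attributes that Gluster also uses.

```python
import os
import tempfile

def supports_user_xattr(directory):
    """Return True if a user.* xattr can be set on a temp file in directory."""
    if not hasattr(os, "setxattr"):      # os.setxattr exists only on Linux
        return False
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        try:
            os.setxattr(tmp.name, "user.xattr_probe", b"1")
            return True
        except OSError:
            # "Operation not supported" in the brick log would surface here.
            return False

# The system temp directory stands in for the brick path in this sketch.
result = supports_user_xattr(tempfile.gettempdir())
print("user xattrs supported:", result)
```

If the probe fails on the brick path itself, remounting the brick with the `user_xattr` option, as the log message suggests, would be the first thing to try.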


*** 192.168.13.93: glusterd.vol.log ***
[2014-03-10 06:05:04.866511] I [glusterd-pmap.c:227:pmap_registry_bind]
0-pmap: adding brick /brick on port 49152
[2014-03-10 06:05:04.868654] I [rpc-clnt.c:962:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-03-10 06:05:04.868806] I [socket.c:3480:socket_init] 0-management:
SSL support is NOT enabled
[2014-03-10 06:05:04.868833] I [socket.c:3495:socket_init] 0-management:
using system polling thread
[2014-03-10 06:05:05.718552] I [rpc-clnt.c:962:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-03-10 06:05:05.718819] I [socket.c:3480:socket_init] 0-management:
SSL support is NOT enabled
[2014-03-10 06:05:05.718870] I [socket.c:3495:socket_init] 0-management:
using system polling thread
[2014-03-10 06:05:05.719358] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:05:05.719457] I [socket.c:2236:socket_event_handler]
0-transport: disconnecting now
[2014-03-10 06:08:17.350770] I
[glusterd-brick-ops.c:370:__glusterd_handle_add_brick] 0-management:
Received add brick req
[2014-03-10 06:08:17.381885] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.381952] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.381984] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.383519] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.383582] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.383613] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:08:17.384776] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.703267] I
[glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management:
Received rem brick req
[2014-03-10 06:09:02.704150] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.705531] I
[glusterd-utils.c:7658:glusterd_generate_and_set_task_id] 0-management:
Generated task-id b3b5bf8f-86e6-4ded-8c5a-a951db48d705 for key remove
[2014-03-10 06:09:02.707226] I
[glusterd-op-sm.c:4168:glusterd_bricks_select_remove_brick] 0-management:
force flag is not set
[2014-03-10 06:09:02.707904] W [dict.c:480:dict_unref]
(-->/var/packages/GlusterfsMgmt/target/lib/glusterfs/3.4git/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xe1)
[0x7f589
[2014-03-10 06:09:02.708634] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.755359] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.755436] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.755483] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.757110] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.757218] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:02.757252] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:09:08.870637] I [rpc-clnt.c:962:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-03-10 06:09:08.870857] I [socket.c:3480:socket_init] 0-management:
SSL support is NOT enabled
[2014-03-10 06:09:08.870908] I [socket.c:3495:socket_init] 0-management:
using system polling thread
[2014-03-10 06:11:15.089162] I
[glusterd-handshake.c:358:__server_event_notify] 0-: received defrag status
updated
[2014-03-10 06:11:15.094016] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-03-10 06:11:15.094104] W [socket.c:1962:__socket_proto_state_machine]
0-management: reading from socket failed. Error (No data available), peer
(/tmp/glusterfs-rebala
[2014-03-10 06:11:15.204709] I [mem-pool.c:541:mem_pool_destroy]
0-management: size=2236 max=0 total=0
[2014-03-10 06:11:15.204829] I [mem-pool.c:541:mem_pool_destroy]
0-management: size=124 max=0 total=0
[2014-03-10 06:12:27.789846] I
[glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management:
Received rem brick req
[2014-03-10 06:12:27.790629] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:27.793175] I
[glusterd-op-sm.c:4168:glusterd_bricks_select_remove_brick] 0-management:
force flag is not set
[2014-03-10 06:12:27.794029] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:27.794700] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-03-10 06:12:27.801447] E
[glusterd-utils.c:1618:glusterd_brick_unlink_socket_file] 0-management:
Failed to remove /var/run/1c5239eb7451335304f0aee8712e7cdf.socket err
[2014-03-10 06:12:27.825757] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:27.825819] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:27.826741] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:27.826804] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found
brick
[2014-03-10 06:12:28.710599] I [mem-pool.c:541:mem_pool_destroy]
0-management: size=2236 max=0 total=0
[2014-03-10 06:12:28.710705] I [mem-pool.c:541:mem_pool_destroy]
0-management: size=124 max=0 total=0
[2014-03-10 06:12:28.710894] I [glusterd-pmap.c:271:pmap_registry_remove]
0-pmap: removing brick /brick on port 49152
[2014-03-10 06:12:28.713848] W [socket.c:514:__socket_rwv]
0-socket.management: writev failed (Broken pipe)
[2014-03-10 06:12:28.713985] I [socket.c:2236:socket_event_handler]
0-transport: disconnecting now


Thanks.
Johnny Chang


