Hu Bert
2018-Aug-16 07:01 UTC
[Gluster-users] Previously replaced brick not coming up after reboot
Hi there,

Twice I had to replace a brick, on 2 different servers; the replace went fine, and the heal took very long but finally finished. From time to time you have to reboot a server (kernel upgrades), and I've noticed that the replaced brick doesn't come up after the reboot. Status after reboot:

gluster volume status
Status of volume: shared
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster11:/gluster/bricksda1/shared      49164     0          Y       6425
Brick gluster12:/gluster/bricksda1/shared      49152     0          Y       2078
Brick gluster13:/gluster/bricksda1/shared      49152     0          Y       2478
Brick gluster11:/gluster/bricksdb1/shared      49165     0          Y       6452
Brick gluster12:/gluster/bricksdb1/shared      49153     0          Y       2084
Brick gluster13:/gluster/bricksdb1/shared      49153     0          Y       2497
Brick gluster11:/gluster/bricksdc1/shared      49166     0          Y       6479
Brick gluster12:/gluster/bricksdc1/shared      49154     0          Y       2090
Brick gluster13:/gluster/bricksdc1/shared      49154     0          Y       2485
Brick gluster11:/gluster/bricksdd1/shared      49168     0          Y       7897
Brick gluster12:/gluster/bricksdd1_new/shared  49157     0          Y       7632
Brick gluster13:/gluster/bricksdd1_new/shared  N/A       N/A        N       N/A
Self-heal Daemon on localhost                  N/A       N/A        Y       25483
Self-heal Daemon on gluster13                  N/A       N/A        Y       2463
Self-heal Daemon on gluster12                  N/A       N/A        Y       17619

Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks

Here gluster13:/gluster/bricksdd1_new/shared is not up. Related log messages after the reboot in glusterd.log:

[2018-08-16 05:22:52.986757] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/02d086b75bfc97f2cce96fe47e26dcf3.socket failed (No data available)
[2018-08-16 05:22:52.987648] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick gluster13:/gluster/bricksdd1_new/shared has disconnected from glusterd.
[2018-08-16 05:22:52.987908] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7fdbaa398b8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fdbaa15f111] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdbaa15f23e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fdbaa1608d1] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7fdbaa1613f8] ))))) 0-management: forced unwinding frame type(brick operations) op(--(4)) called at 2018-08-16 05:22:52.941332 (xid=0x2)
[2018-08-16 05:22:52.988058] W [dict.c:426:dict_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.12/xlator/mgmt/glusterd.so(+0xd1e59) [0x7fdba4f9ce59] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_int32+0x2b) [0x7fdbaa39122b] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set+0xd3) [0x7fdbaa38fa13] ) 0-dict: !this || !value for key=index [Invalid argument]
[2018-08-16 05:22:52.988092] E [MSGID: 106060] [glusterd-syncop.c:1014:gd_syncop_mgmt_brick_op] 0-management: Error setting index on brick status rsp dict

This problem could be related to my previous mail. After executing "gluster volume start shared force" (exact commands below) the brick comes up, which then triggers a heal of the brick (and high load, too). Is there any way to track down why this happens and how to ensure that the brick comes up at boot?

Best regards
Hubert
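PS: For reference, this is roughly what I run after a reboot to bring the brick back and to watch the heal afterwards. The brick log path is just the usual /var/log/glusterfs/bricks/ naming (brick path with slashes turned into dashes) on my Debian setup, so it may differ elsewhere:

  # see which bricks are down, then look at the log of the failed brick
  gluster volume status shared
  tail -n 100 /var/log/glusterfs/bricks/gluster-bricksdd1_new-shared.log

  # force-start the volume; bricks that are already running are left alone,
  # only the missing brick process gets started
  gluster volume start shared force

  # watch the heal that follows
  gluster volume heal shared info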
Serkan Çoban
2018-Aug-16 07:26 UTC
[Gluster-users] Previously replaced brick not coming up after reboot
What is your gluster version? There was a bug in 3.10 where some bricks may not come online after a node reboot, but it was fixed in later versions. You can check the installed version and the cluster op-version as shown below.
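A quick check (assuming the stock packages, where the CLI and glusterd report the same version):

  # installed glusterfs version on each node
  gluster --version

  # operating version the cluster is currently running at
  gluster volume get all cluster.op-version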
On 8/16/18, Hu Bert <revirii at googlemail.com> wrote:
> Hi there,
>
> Twice I had to replace a brick, on 2 different servers; the replace went
> fine, and the heal took very long but finally finished.
> [...]
> Is there any way to track down why this happens and how to ensure that
> the brick comes up at boot?
>
> Best regards
> Hubert

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users