Lindsay Mathieson
2016-Apr-20  06:07 UTC
[Gluster-users] 3.7.11 - Brick died, can't restart
A brick has died on node vnb of my cluster. Unfortunately it has left a zombie glusterfsd process which is holding the brick socket, so I can't restart it. Any advice on how to work round that asap would be appreciated.

Tail of brick logging:

[2016-04-20 05:41:37.325846] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:41:37.328255] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:41:37.599402] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:41:37.601843] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:41:37.604164] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:41:37.682886] I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]
[2016-04-20 05:55:16.203806] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7f5ff6a4f0a4] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x5629ffed26f5] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x59) [0x5629ffed2569] ) 0-: received signum (15), shutting down
[2016-04-20 05:55:35.536514] I [MSGID: 100030] [glusterfsd.c:2332:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.11 (args: /usr/sbin/glusterfsd -s vnb.proxmox.softlog --volfile-id datastore4.vnb.proxmox.softlog.tank-vmdata-datastore4 -p /var/lib/glusterd/vols/datastore4/run/vnb.proxmox.softlog-tank-vmdata-datastore4.pid -S /var/run/gluster/5ca23018ece7b94960f0580687e60650.socket --brick-name /tank/vmdata/datastore4 -l /var/log/glusterfs/bricks/tank-vmdata-datastore4.log --xlator-option *-posix.glusterd-uuid=43a1bf8c-3e69-4581-8e16-f2e1462cfc36 --brick-port 49156 --xlator-option datastore4-server.listen-port=49156)
[2016-04-20 05:55:35.541739] E [socket.c:770:__socket_server_bind] 0-socket.glusterfsd: binding to  failed: Address already in use
[2016-04-20 05:55:35.541777] E [socket.c:773:__socket_server_bind] 0-socket.glusterfsd: Port is already in use
[2016-04-20 05:55:35.541794] W [rpcsvc.c:1604:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
[2016-04-20 05:55:35.547990] I [MSGID: 100030] [glusterfsd.c:2332:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.11 (args: /usr/sbin/glusterfsd -s vnb.proxmox.softlog --volfile-id datastore4.vnb.proxmox.softlog.tank-vmdata-datastore4 -p /var/lib/glusterd/vols/datastore4/run/vnb.proxmox.softlog-tank-vmdata-datastore4.pid -S /var/run/gluster/5ca23018ece7b94960f0580687e60650.socket --brick-name /tank/vmdata/datastore4 -l /var/log/glusterfs/bricks/tank-vmdata-datastore4.log --xlator-option *-posix.glusterd-uuid=43a1bf8c-3e69-4581-8e16-f2e1462cfc36 --brick-port 49156 --xlator-option datastore4-server.listen-port=49156)

I did a quick check of the other bricks; their logs are all filled with:

I [dict.c:473:dict_get] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xab) [0x7f5ff77d239b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7) [0x7f5feb9c88e7] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0x93) [0x7f5ff77c30f3] ) 0-dict: !this || key=() [Invalid argument]

Thanks,
-- 
Lindsay
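The log shows the restarted brick failing to bind its port (49156) because something still holds it. A rough sketch for confirming which process that is, assuming iproute2's `ss` is installed and run as root: pull the `pid=` fields out of `ss -ltnp` output and compare them with the PID glusterd recorded in the brick's pid file (the path passed via `-p` in the log). `listener_pids` and `holder_matches_pidfile` are hypothetical helper names, not part of GlusterFS.

```shell
#!/bin/sh
# Hypothetical helpers: find who is listening on the brick port.

# Read `ss -ltnp` output on stdin and print the unique listener PIDs.
# ss lists socket owners as users:(("name",pid=1234,fd=9)).
listener_pids() {
    grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -u
}

# Succeed if the PID stored in pid file $1 is among the listeners on stdin.
holder_matches_pidfile() {
    recorded=$(tr -d '[:space:]' < "$1") || return 2
    listener_pids | grep -qx "$recorded"
}

# Example, with the port and pid file from the log:
#   ss -ltnp 'sport = :49156' | listener_pids
#   ss -ltnp 'sport = :49156' | holder_matches_pidfile \
#     /var/lib/glusterd/vols/datastore4/run/vnb.proxmox.softlog-tank-vmdata-datastore4.pid
```

If the holder matches the recorded PID, the old brick process never released its socket; if not, some other process grabbed the port.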
Try restarting glusterd:
# service glusterd restart
If that doesn't work, try killing the glusterfsd PID(s):
# kill $(ps -ef | grep '[g]lusterfsd' | awk '{print $2}')
Then restart glusterd:
# service glusterd restart
PS: killing glusterfsd that way will kill all the bricks on that node, but
restarting glusterd should bring the bricks back online.
Bishoy
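Because the `kill` above takes every brick on the node down, a narrower variant is to signal only the PID in the affected brick's pid file. A rough sketch, assuming Linux `/proc` and a pid file that contains just the PID; `stop_brick_pidfile` and `is_running` are hypothetical names. It sends SIGTERM, waits, escalates to SIGKILL, and counts a zombie (state Z) as stopped, since a zombie has already exited and is only waiting to be reaped.

```shell
#!/bin/sh
# Hypothetical helpers (Linux /proc only): stop a single brick by the PID
# glusterd recorded in its pid file, instead of killing every glusterfsd.

# Succeeds if PID $1 exists and is not a zombie (state Z = already exited,
# only waiting for its parent to reap it).
is_running() {
    state=$(sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f1)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Usage: stop_brick_pidfile PIDFILE [SECONDS_TO_WAIT]
# e.g. stop_brick_pidfile /var/lib/glusterd/vols/datastore4/run/vnb.proxmox.softlog-tank-vmdata-datastore4.pid
# then: service glusterd restart
stop_brick_pidfile() {
    pid=$(tr -d '[:space:]' < "$1") || return 1
    [ -n "$pid" ] || return 1
    wait_s=${2:-10}
    if ! is_running "$pid"; then
        echo "pid $pid not running"
        return 0
    fi
    kill -TERM "$pid" 2>/dev/null    # signal 15, a normal shutdown request
    i=0
    while is_running "$pid" && [ "$i" -lt "$wait_s" ]; do
        sleep 1
        i=$((i + 1))
    done
    if is_running "$pid"; then
        kill -KILL "$pid" 2>/dev/null
        sleep 1
    fi
    if is_running "$pid"; then
        # Typically a task blocked in the kernel (state D): the signal stays
        # pending until the blocking call returns.
        echo "pid $pid still alive after SIGKILL" >&2
        return 2
    fi
    echo "pid $pid stopped"
}
```

A return code of 2 means the process ignored even SIGKILL, which points at the kernel-level hang rather than at glusterd.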
A zombied glusterfsd means it's stuck in a kernel operation, most likely an I/O wait that hung in the kernel. Since there's no way to clear that from the kernel, the only option was to reboot.
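Strictly speaking, a zombie (state Z in `ps`) has already exited; a process stuck in a kernel I/O wait is normally in uninterruptible sleep (state D), where even SIGKILL typically stays pending until the blocking call returns, which is why a reboot can be the only way out. A small sketch for telling the two apart, reading `/proc/PID/stat` and `/proc/PID/wchan` (Linux-specific); `describe_state` and `report_states` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (Linux /proc only).

# Translate a one-letter process state from /proc/PID/stat.
describe_state() {
    case "$1" in
        D) echo "uninterruptible sleep: blocked in the kernel, signals wait until the call returns" ;;
        Z) echo "zombie: already exited, waiting for its parent to reap it" ;;
        R) echo "running" ;;
        S) echo "interruptible sleep" ;;
        T|t) echo "stopped" ;;
        *) echo "other ($1)" ;;
    esac
}

# Print "PID STATE WCHAN: description" for every process whose command
# name (/proc/PID/comm) equals $1, e.g. report_states glusterfsd
report_states() {
    for dir in /proc/[0-9]*; do
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$comm" = "$1" ] || continue
        pid=${dir#/proc/}
        # Strip "PID (comm) " so the first remaining field is the state.
        state=$(sed 's/.*) //' "$dir/stat" 2>/dev/null | cut -d' ' -f1)
        # Kernel function the task is sleeping in, if the kernel exposes it.
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s %s %s: %s\n' "$pid" "$state" "${wchan:-?}" "$(describe_state "$state")"
    done
}
```

A glusterfsd reported in state D, with a wchan inside the filesystem or block layer, would fit the hung-I/O explanation above.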