Alastair Neil
2017-Oct-24 17:43 UTC
[Gluster-users] brick is down but gluster volume status says it's fine
Gluster version 3.10.6, replica 3 volume: the brick daemon is present but does not appear to be functioning.

Peculiar behaviour: if I kill the glusterfs brick daemon and restart glusterd, the brick becomes available, but then a brick of one of my other volumes on the same server goes down in the same way. It's like whack-a-mole.

Any ideas?

[root@gluster-2 bricks]# glv status digitalcorpora
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098
Self-heal Daemon on localhost               N/A       N/A        Y       126625
Self-heal Daemon on gluster1                N/A       N/A        Y       15405
Self-heal Daemon on gluster0                N/A       N/A        Y       18584

Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster-2 bricks]# glv heal digitalcorpora info
Brick gluster-2:/export/brick7/digitalcorpora
Status: Transport endpoint is not connected
Number of entries: -

Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3

Brick gluster0:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3

[2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down
[2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
[2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-unix' is not recognized
[2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-null' is not recognized
[2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'ping-timeout' is not recognized
[2017-10-24 17:19:04.616203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'rpc-auth-allow-insecure' is not recognized
[2017-10-24 17:19:04.616215] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616237] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is not recognized
[2017-10-24 17:19:04.616248] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616283] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-quota: option 'timeout' is not recognized
[2017-10-24 17:19:04.616367] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-trash: option 'brick-path' is not recognized
Final graph:
+------------------------------------------------------------------------------+
  1: volume digitalcorpora-posix
  2:     type storage/posix
  3:     option glusterd-uuid 032c17f5-8cc9-445f-aa45-897b5a066b43
  4:     option directory /export/brick7/digitalcorpora
  5:     option volume-id 61efe58a-ae5b-4d8b-b9f9-67829867c442
  6:     option brick-uid 36
  7:     option brick-gid 36
  8: end-volume
  9:
 10: volume digitalcorpora-trash
 11:     type features/trash
 12:     option trash-dir .trashcan
 13:     option brick-path /export/brick7/digitalcorpora
 14:     option trash-internal-op off
 15:     subvolumes digitalcorpora-posix
 16: end-volume
 17:
 18: volume digitalcorpora-changetimerecorder
 19:     type features/changetimerecorder
 20:     option db-type sqlite3
 21:     option hot-brick off
 22:     option db-name digitalcorpora.db
 23:     option db-path /export/brick7/digitalcorpora/.glusterfs/
 24:     option record-exit off
 25:     option ctr_link_consistency off
 26:     option ctr_lookupheal_link_timeout 300
 27:     option ctr_lookupheal_inode_timeout 300
 28:     option record-entry on
 29:     option ctr-enabled off
 30:     option record-counters off
 31:     option ctr-record-metadata-heat off
 32:     option sql-db-cachesize 12500
 33:     option sql-db-wal-autocheckpoint 25000
 34:     subvolumes digitalcorpora-trash
 35: end-volume
 36:
 37: volume digitalcorpora-changelog
 38:     type features/changelog
 39:     option changelog-brick /export/brick7/digitalcorpora
 40:     option changelog-dir /export/brick7/digitalcorpora/.glusterfs/changelogs
 41:     option changelog-barrier-timeout 120
 42:     subvolumes digitalcorpora-changetimerecorder
 43: end-volume
 44:
 45: volume digitalcorpora-bitrot-stub
 46:     type features/bitrot-stub
 47:     option export /export/brick7/digitalcorpora
 48:     subvolumes digitalcorpora-changelog
 49: end-volume
 50:
 51: volume digitalcorpora-access-control
 52:     type features/access-control
 53:     subvolumes digitalcorpora-bitrot-stub
 54: end-volume
 55:
 56: volume digitalcorpora-locks
 57:     type features/locks
 58:     subvolumes digitalcorpora-access-control
 59: end-volume
 60:
 61: volume digitalcorpora-worm
 62:     type features/worm
 63:     option worm off
 64:     option worm-file-level off
 65:     subvolumes digitalcorpora-locks
 66: end-volume
 67:
 68: volume digitalcorpora-read-only
 69:     type features/read-only
 70:     option read-only off
 71:     subvolumes digitalcorpora-worm
 72: end-volume
 73:
 74: volume digitalcorpora-leases
 75:     type features/leases
 76:     option leases off
 77:     subvolumes digitalcorpora-read-only
 78: end-volume
 79:
 80: volume digitalcorpora-upcall
 81:     type features/upcall
 82:     option cache-invalidation off
 83:     subvolumes digitalcorpora-leases
 84: end-volume
 85:
 86: volume digitalcorpora-io-threads
 87:     type performance/io-threads
 88:     subvolumes digitalcorpora-upcall
 89: end-volume
 90:
 91: volume digitalcorpora-marker
 92:     type features/marker
 93:     option volume-uuid 61efe58a-ae5b-4d8b-b9f9-67829867c442
 94:     option timestamp-file /var/lib/glusterd/vols/digitalcorpora/marker.tstamp
 95:     option quota-version 0
 96:     option xtime off
 97:     option gsync-force-xtime off
 98:     option quota off
 99:     option inode-quota off
100:     subvolumes digitalcorpora-io-threads
101: end-volume
102:
103: volume digitalcorpora-barrier
104:     type features/barrier
105:     option barrier disable
106:     option barrier-timeout 120
107:     subvolumes digitalcorpora-marker
108: end-volume
109:
110: volume digitalcorpora-index
111:     type features/index
112:     option index-base /export/brick7/digitalcorpora/.glusterfs/indices
113:     option xattrop-dirty-watchlist trusted.afr.dirty
114:     option xattrop-pending-watchlist trusted.afr.digitalcorpora-
115:     subvolumes digitalcorpora-barrier
116: end-volume
117:
118: volume digitalcorpora-quota
119:     type features/quota
120:     option volume-uuid digitalcorpora
121:     option server-quota off
122:     option timeout 0
123:     option deem-statfs off
124:     subvolumes digitalcorpora-index
125: end-volume
126:
127: volume digitalcorpora-io-stats
128:     type debug/io-stats
129:     option unique-id /export/brick7/digitalcorpora
130:     option log-level WARNING
131:     option latency-measurement off
132:     option count-fop-hits off
133:     subvolumes digitalcorpora-quota
134: end-volume
135:
136: volume /export/brick7/digitalcorpora
137:     type performance/decompounder
138:     option rpc-auth-allow-insecure on
139:     option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
140:     option auth-path /export/brick7/digitalcorpora
141:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
142:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
143:     subvolumes digitalcorpora-io-stats
144: end-volume
145:
146: volume digitalcorpora-server
147:     type protocol/server
148:     option transport.socket.listen-port 49154
149:     option rpc-auth.auth-glusterfs on
150:     option rpc-auth.auth-unix on
151:     option rpc-auth.auth-null on
152:     option transport-type tcp
153:     option transport.address-family inet
154:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
155:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
156:     option auth-path /export/brick7/digitalcorpora
157:     option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
158:     option ping-timeout 42
159:     option transport.socket.keepalive 1
160:     option rpc-auth-allow-insecure on
161:     option transport.tcp-user-timeout 0
162:     option transport.socket.keepalive-time 20
163:     option transport.socket.keepalive-interval 2
164:     option transport.socket.keepalive-count 9
165:     subvolumes /export/brick7/digitalcorpora
166: end-volume
167:
+------------------------------------------------------------------------------+
[2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 129.174.126.87:24007 failed (No data available)
Atin Mukherjee
2017-Oct-24 17:56 UTC
[Gluster-users] brick is down but gluster volume status says it's fine
On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <ajneil.tech at gmail.com> wrote:

> gluster version 3.10.6, replica 3 volume, daemon is present but does not
> appear to be functioning
>
> peculiar behaviour. If I kill the glusterfs brick daemon and restart
> glusterd then the brick becomes available - but one of my other volumes
> bricks on the same server goes down in the same way it's like whack-a-mole.
>
> any ideas?

The subject and the data look contradictory to me. The brick log you shared doesn't have a cleanup_and_exit() trigger for a shutdown. Are you sure the brick is down?

OTOH, I see a port mismatch for brick7/digitalcorpora: the brick process has 49154, but gluster volume status shows 49152. There is an issue with stale ports which we're trying to address through https://review.gluster.org/18541 .

But could you specify what exactly the problem is? Is it the stale port, or the conflict between the volume status output and the actual brick health? If it's the latter, I'd need further information, such as the output of the "gluster get-state" command from the same node.
Alastair Neil
2017-Oct-24 19:32 UTC
[Gluster-users] brick is down but gluster volume status says it's fine
It looks like this is to do with the stale port issue.

I think it's pretty clear from the below that volume status shows the digitalcorpora brick process as having the same TCP port as the public volume brick on gluster-2, 49156, while it is actually listening on 49154. So although the brick process is technically up, nothing is talking to it. I am surprised I don't see more errors in the brick log for brick8/public.

It also explains the whack-a-mole problem: every time I kill and restart the daemon, it must be grabbing the port of another brick, and then that volume's brick goes silent.

I killed all the brick processes and restarted glusterd, and everything came up OK.

[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098

Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster-2 ~]# glv status public | grep -v ^Self
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/export/brick8/public        49156     0          Y       3519
Brick gluster2:/export/brick8/public        49156     0          Y       8578
Brick gluster0:/export/brick8/public        49156     0          Y       3176

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      8578/glusterfsd
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      125708/glusterfsd
[root@gluster-2 ~]# ps -c --pid 125708 8578
   PID CLS PRI TTY      STAT   TIME COMMAND
  8578 TS   19 ?        Ssl  224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS   19 ?        Ssl    0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#

On 24 October 2017 at 13:56, Atin Mukherjee <amukherj at redhat.com> wrote:

> On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <ajneil.tech at gmail.com>
> wrote:
>
>> gluster version 3.10.6, replica 3 volume, daemon is present but does not
>> appear to be functioning
>>
>> peculiar behaviour. If I kill the glusterfs brick daemon and restart
>> glusterd then the brick becomes available - but one of my other volumes
>> bricks on the same server goes down in the same way it's like whack-a-mole.
>>
>> any ideas?
>
> The subject and the data looks to be contradictory to me. Brick log (what
> you shared) doesn't have a cleanup_and_exit () trigger for a shutdown. Are
> you sure brick is down? OTOH, I see a mismatch of port for
> brick7/digitalcorpora where the brick process has 49154 but gluster volume
> status shows 49152. There is an issue with stale port which we're trying to
> address through https://review.gluster.org/18541 . But could you specify
> what exactly the problem is? Is it the stale port or the conflict between
> volume status output and actual brick health? If it's the latter, I'd need
> further information like output of "gluster get-state" command from the
> same node.
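The manual check in this message (look up each brick PID in `netstat -pant` and compare its listening port with the port volume status advertises) could be scripted along the following lines. This is a minimal sketch under stated assumptions: `listening_ports` and `stale_bricks` are hypothetical helper names, the netstat column layout is the Linux net-tools one shown above, and the PID-to-port mapping from volume status is typed in by hand rather than parsed.

```python
def listening_ports(netstat_text):
    """Map PID -> set of TCP ports in LISTEN state, from `netstat -pant` text."""
    result = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        pid = fields[6].split("/", 1)[0]
        if not pid.isdigit():
            continue  # netstat prints '-' when the owning process is not visible
        port = int(fields[3].rsplit(":", 1)[1])
        result.setdefault(int(pid), set()).add(port)
    return result


def stale_bricks(advertised, listening):
    """Return (pid, advertised_port, actual_ports) for each brick whose
    advertised port is not among the ports its PID is listening on."""
    stale = []
    for pid, port in sorted(advertised.items()):
        actual = listening.get(pid, set())
        if port not in actual:
            stale.append((pid, port, sorted(actual)))
    return stale


# Sample data from this thread: both bricks are advertised on 49156.
NETSTAT = """\
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      8578/glusterfsd
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      125708/glusterfsd
"""
ADVERTISED = {8578: 49156, 125708: 49156}  # PID -> TCP port from volume status

for pid, port, actual in stale_bricks(ADVERTISED, listening_ports(NETSTAT)):
    print(f"pid {pid}: status says {port}, process listens on {actual}")
```

On the sample data only PID 125708 (the digitalcorpora brick) is reported, since PID 8578 really does listen on 49156. The check only detects the mismatch; the recovery used in this thread was to kill all brick processes and restart glusterd.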
digitalcorpora-leases >>> 84: end-volume >>> 85: >>> 86: volume digitalcorpora-io-threads >>> 87: type performance/io-threads >>> 88: subvolumes digitalcorpora-upcall >>> 89: end-volume >>> 90: >>> 91: volume digitalcorpora-marker >>> 92: type features/marker >>> 93: option volume-uuid 61efe58a-ae5b-4d8b-b9f9-67829867c442 >>> 94: option timestamp-file /var/lib/glusterd/vols/digital >>> corpora/marker.tstamp >>> 95: option quota-version 0 >>> 96: option xtime off >>> 97: option gsync-force-xtime off >>> 98: option quota off >>> 99: option inode-quota off >>> 100: subvolumes digitalcorpora-io-threads >>> 101: end-volume >>> 102: >>> 103: volume digitalcorpora-barrier >>> 104: type features/barrier >>> 105: option barrier disable >>> 106: option barrier-timeout 120 >>> 107: subvolumes digitalcorpora-marker >>> 108: end-volume >>> 109: >>> 110: volume digitalcorpora-index >>> 111: type features/index >>> 112: option index-base /export/brick7/digitalcorpora/ >>> .glusterfs/indices >>> 113: option xattrop-dirty-watchlist trusted.afr.dirty >>> 114: option xattrop-pending-watchlist trusted.afr.digitalcorpora- >>> 115: subvolumes digitalcorpora-barrier >>> 116: end-volume >>> 117: >>> 118: volume digitalcorpora-quota >>> 119: type features/quota >>> 120: option volume-uuid digitalcorpora >>> 121: option server-quota off >>> 122: option timeout 0 >>> 123: option deem-statfs off >>> 124: subvolumes digitalcorpora-index >>> 125: end-volume >>> 126: >>> 127: volume digitalcorpora-io-stats >>> 128: type debug/io-stats >>> 129: option unique-id /export/brick7/digitalcorpora >>> 130: option log-level WARNING >>> 131: option latency-measurement off >>> 132: option count-fop-hits off >>> 133: subvolumes digitalcorpora-quota >>> 134: end-volume >>> 135: >>> 136: volume /export/brick7/digitalcorpora >>> 137: type performance/decompounder >>> 138: option rpc-auth-allow-insecure on >>> 139: option auth.addr./export/brick7/digitalcorpora.allow >>> 129.174.125.204,129.174.93.204 >>> 140: 
option auth-path /export/brick7/digitalcorpora >>> 141: option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password >>> 6c007ad0-b5a2-4564-8464-300f8317e5c7 >>> 142: option auth.login./export/brick7/digitalcorpora.allow >>> b17f2513-7d9c-4174-a0c5-de4a752d46ca >>> 143: subvolumes digitalcorpora-io-stats >>> 144: end-volume >>> 145: >>> 146: volume digitalcorpora-server >>> 147: type protocol/server >>> 148: option transport.socket.listen-port 49154 >>> 149: option rpc-auth.auth-glusterfs on >>> 150: option rpc-auth.auth-unix on >>> 151: option rpc-auth.auth-null on >>> 152: option transport-type tcp >>> 153: option transport.address-family inet >>> 154: option auth.login./export/brick7/digitalcorpora.allow >>> b17f2513-7d9c-4174-a0c5-de4a752d46ca >>> 155: option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password >>> 6c007ad0-b5a2-4564-8464-300f8317e5c7 >>> 156: option auth-path /export/brick7/digitalcorpora >>> 157: option auth.addr./export/brick7/digitalcorpora.allow >>> 129.174.125.204,129.174.93.204 >>> 158: option ping-timeout 42 >>> 159: option transport.socket.keepalive 1 >>> 160: option rpc-auth-allow-insecure on >>> 161: option transport.tcp-user-timeout 0 >>> 162: option transport.socket.keepalive-time 20 >>> 163: option transport.socket.keepalive-interval 2 >>> 164: option transport.socket.keepalive-count 9 >>> 165: subvolumes /export/brick7/digitalcorpora >>> 166: end-volume >>> 167: >>> +----------------------------------------------------------- >>> -------------------+ >>> [2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs: >>> readv on 129.174.126.87:24007 failed (No data available) >>> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://lists.gluster.org/mailman/listinfo/gluster-users >> > >-------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20171024/f9580462/attachment.html>