Erik Jacobson
2020-Apr-01 03:27 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
THANK YOU for the hints. Very happy to have the help. I'll reply to a couple of things, then dig in:

On Tue, Mar 31, 2020 at 03:27:59PM +0530, Ravishankar N wrote:
> From your reply in the other thread, I'm assuming that the file/gfid in
> question is not in genuine split-brain or needing heal. i.e. for example

Right, they were not tagged split-brain either, just needing heal, which is expected for those 76 files.

> with that 1 brick down and 2 bricks up test case, if you tried to read the
> file from say a temporary fuse mount (which is also now connected to only
> 2 bricks since the 3rd one is down) it works fine and there is no EIO
> error...

Looking at the heal info, all of the files are files I expected to have write changes, and I *think* they are outside the scope of this issue. To close the loop, I ran 'strings' on the top of one of the files from a fuse mount to confirm, and had no trouble.

> ...which means that what you have observed is true, i.e.
> afr_read_txn_refresh_done() is called with err=EIO. You can add logs to see
> at what point it is set to EIO. The call graph is like this:
> afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
>
> Maybe https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-common.c#L1188
> in afr_txn_refresh_done() is causing it, either due to ret being -EIO or
> event_generation being zero.
>
> If you are comfortable with gdb, you can put a conditional break point in
> afr_read_txn_refresh_done() at https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-read-txn.c#L283
> when err=EIO and then check the backtrace for who is setting err to EIO.

OK, so the main event! :) I'm not a gdb expert, but I think I figured it out well enough to paste some backtraces. However, I'm having trouble interpreting them exactly. It looks to me to be the "event" case. (I got permission to use this MFG system at night for a couple more nights, avoiding the 24-hour-reserved internal larger system we have.)

Here is what I did; feel free to suggest something better:

- I am using an RPM build, so I changed the spec file to create debuginfo packages. I'm on RHEL 8.1.
- I installed the updated packages and the debuginfo packages.
- When glusterd started the NFS glusterfs process, I killed it.
- I ran this:
  gdb -d /root/rpmbuild/BUILD/glusterfs-7.2 -d /root/rpmbuild/BUILD/glusterfs-7.2/xlators/cluster/afr/src/ /usr/sbin/glusterfs
- Then from gdb, I ran this:
  (gdb) run -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket -N
- I hit ctrl-c, then set the break point:
  (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
- I have some debugging statements in my tree, but in gluster 7.2 line 280 is this (I think gdb moved it to 281 internally):
      if (err) {
          if (!priv->thin_arbiter_count) {
- continue
- Then I ran the test case.

Here are some backtraces. They make my head hurt. Maybe you can suggest something else to try next? In the morning I'll try to unwind this myself too in the source code, but I suspect it will be tough for me.

(gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
(gdb) continue
Continuing.
[Switching to Thread 0x7ffecffff700 (LWP 50175)]

Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
281         if (err) {
(gdb) bt
#0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
#1  0x00007fff68901fdb in afr_txn_refresh_done (frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, err=5, err@entry=0) at afr-common.c:1223
#2  0x00007fff689022b3 in afr_inode_refresh_done (frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, error=0) at afr-common.c:1295
#3  0x00007fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7fff48325d78, cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>, op_errno=<optimized out>, buf=buf@entry=0x7ffecfffdaa0, xdata=0x7ffeb806ef08, par=0x7ffecfffdb40) at afr-common.c:1333
#4  0x00007fff6890f42a in afr_inode_refresh_subvol_with_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7ffecfffdaa0, xdata=0x7ffeb806ef08, par=0x7ffecfffdb40) at afr-common.c:1344
#5  0x00007fff68b8e96f in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fff483147b8) at client-rpc-fops_v2.c:2640
#6  0x00007fffed293115 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fff640671b0, pollin=pollin@entry=0x7ffeb81aa110) at rpc-clnt.c:764
#7  0x00007fffed2934b3 in rpc_clnt_notify (trans=0x7fff64067540, mydata=0x7fff640671e0, event=<optimized out>, data=0x7ffeb81aa110) at rpc-clnt.c:931
#8  0x00007fffed29007b in rpc_transport_notify (this=this@entry=0x7fff64067540, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7ffeb81aa110) at rpc-transport.c:545
#9  0x00007fff7b6298dc in socket_event_poll_in_async (xl=<optimized out>, async=0x7ffeb81aa238) at socket.c:2592
#10 0x00007fff7b63134c in gf_async (cbk=0x7fff7b6298b0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7ffeb81aa238) at ../../../../libglusterfs/src/glusterfs/async.h:189
#11 socket_event_poll_in (notify_handled=true, this=0x7fff64067540) at socket.c:2633
#12 socket_event_handler (fd=fd@entry=18, idx=idx@entry=9, gen=gen@entry=1, data=data@entry=0x7fff64067540, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:3031
#13 0x00007fffed54da8b in event_dispatch_epoll_handler (event=0x7ffecfffe014, event_pool=0x5555557cb750) at event-epoll.c:650
#14 event_dispatch_epoll_worker (data=0x7fff64038a80) at event-epoll.c:763
#15 0x00007fffec0702de in start_thread () from /lib64/libpthread.so.0
#16 0x00007fffeb8c8a63 in clone () from /lib64/libc.so.6
(gdb) continue
Continuing.
[Switching to Thread 0x7fff2ffff700 (LWP 50171)]

Thread 11 "glfs_epoll003" hit Breakpoint 1, afr_read_txn_refresh_done (frame=0x7ffc64544158, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
281         if (err) {
(gdb) bt
#0  afr_read_txn_refresh_done (frame=0x7ffc64544158, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
#1  0x00007fff68901fdb in afr_txn_refresh_done (frame=frame@entry=0x7ffc64544158, this=this@entry=0x7fff640137b0, err=5, err@entry=0) at afr-common.c:1223
#2  0x00007fff689022b3 in afr_inode_refresh_done (frame=frame@entry=0x7ffc64544158, this=this@entry=0x7fff640137b0, error=0) at afr-common.c:1295
#3  0x00007fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7ffc64544158, cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>, op_errno=<optimized out>, buf=buf@entry=0x7fff2fffdaa0, xdata=0x7fff18523298, par=0x7fff2fffdb40) at afr-common.c:1333
#4  0x00007fff6890f42a in afr_inode_refresh_subvol_with_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7fff2fffdaa0, xdata=0x7fff18523298, par=0x7fff2fffdb40) at afr-common.c:1344
#5  0x00007fff68b8e96f in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ffc64319a08) at client-rpc-fops_v2.c:2640
#6  0x00007fffed293115 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fff640671b0, pollin=pollin@entry=0x7fff18685b40) at rpc-clnt.c:764
#7  0x00007fffed2934b3 in rpc_clnt_notify (trans=0x7fff64067540, mydata=0x7fff640671e0, event=<optimized out>, data=0x7fff18685b40) at rpc-clnt.c:931
#8  0x00007fffed29007b in rpc_transport_notify (this=this@entry=0x7fff64067540, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fff18685b40) at rpc-transport.c:545
#9  0x00007fff7b6298dc in socket_event_poll_in_async (xl=<optimized out>, async=0x7fff18685c68) at socket.c:2592
#10 0x00007fff7b63134c in gf_async (cbk=0x7fff7b6298b0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7fff18685c68) at ../../../../libglusterfs/src/glusterfs/async.h:189
#11 socket_event_poll_in (notify_handled=true, this=0x7fff64067540) at socket.c:2633
#12 socket_event_handler (fd=fd@entry=18, idx=idx@entry=9, gen=gen@entry=1, data=data@entry=0x7fff64067540, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:3031
#13 0x00007fffed54da8b in event_dispatch_epoll_handler (event=0x7fff2fffe014, event_pool=0x5555557cb750) at event-epoll.c:650
#14 event_dispatch_epoll_worker (data=0x7fff64038340) at event-epoll.c:763
#15 0x00007fffec0702de in start_thread () from /lib64/libpthread.so.0
#16 0x00007fffeb8c8a63 in clone () from /lib64/libc.so.6
Continuing.
[Switching to Thread 0x7ffd53fff700 (LWP 50191)]

Thread 31 "glfs_epoll016" hit Breakpoint 1, afr_read_txn_refresh_done (frame=0x7ffc949cf7c8, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
281         if (err) {
(gdb) bt
#0  afr_read_txn_refresh_done (frame=0x7ffc949cf7c8, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
#1  0x00007fff68901fdb in afr_txn_refresh_done (frame=frame@entry=0x7ffc949cf7c8, this=this@entry=0x7fff640137b0, err=5, err@entry=0) at afr-common.c:1223
#2  0x00007fff689022b3 in afr_inode_refresh_done (frame=frame@entry=0x7ffc949cf7c8, this=this@entry=0x7fff640137b0, error=0) at afr-common.c:1295
#3  0x00007fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7ffc949cf7c8, cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>, op_errno=<optimized out>, buf=buf@entry=0x7ffd53ffdaa0, xdata=0x7ffd3c6764f8, par=0x7ffd53ffdb40) at afr-common.c:1333
#4  0x00007fff6890f42a in afr_inode_refresh_subvol_with_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7ffd53ffdaa0, xdata=0x7ffd3c6764f8, par=0x7ffd53ffdb40) at afr-common.c:1344
#5  0x00007fff68b8e96f in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ffc9498ce38) at client-rpc-fops_v2.c:2640
#6  0x00007fffed293115 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fff6406b4a0, pollin=pollin@entry=0x7ffd3c1685d0) at rpc-clnt.c:764
#7  0x00007fffed2934b3 in rpc_clnt_notify (trans=0x7fff6406b7b0, mydata=0x7fff6406b4d0, event=<optimized out>, data=0x7ffd3c1685d0) at rpc-clnt.c:931
#8  0x00007fffed29007b in rpc_transport_notify (this=this@entry=0x7fff6406b7b0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7ffd3c1685d0) at rpc-transport.c:545
#9  0x00007fff7b6298dc in socket_event_poll_in_async (xl=<optimized out>, async=0x7ffd3c1686f8) at socket.c:2592
#10 0x00007fff7b63134c in gf_async (cbk=0x7fff7b6298b0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7ffd3c1686f8) at ../../../../libglusterfs/src/glusterfs/async.h:189
#11 socket_event_poll_in (notify_handled=true, this=0x7fff6406b7b0) at socket.c:2633
#12 socket_event_handler (fd=fd@entry=14, idx=idx@entry=6, gen=gen@entry=4, data=data@entry=0x7fff6406b7b0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:3031
#13 0x00007fffed54da8b in event_dispatch_epoll_handler (event=0x7ffd53ffe014, event_pool=0x5555557cb750) at event-epoll.c:650
#14 event_dispatch_epoll_worker (data=0x7fff64065ee0) at event-epoll.c:763
#15 0x00007fffec0702de in start_thread () from /lib64/libpthread.so.0
#16 0x00007fffeb8c8a63 in clone () from /lib64/libc.so.6
(gdb) continue
Continuing.
[Switching to Thread 0x7fff60dc6700 (LWP 50169)]

Thread 9 "glfs_epoll001" hit Breakpoint 1, afr_read_txn_refresh_done (frame=0x7ffe589b0848, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
281         if (err) {
(gdb) bt
#0  afr_read_txn_refresh_done (frame=0x7ffe589b0848, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
#1  0x00007fff68901fdb in afr_txn_refresh_done (frame=frame@entry=0x7ffe589b0848, this=this@entry=0x7fff640137b0, err=5, err@entry=0) at afr-common.c:1223
#2  0x00007fff689022b3 in afr_inode_refresh_done (frame=frame@entry=0x7ffe589b0848, this=this@entry=0x7fff640137b0, error=0) at afr-common.c:1295
#3  0x00007fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7ffe589b0848, cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>, op_errno=<optimized out>, buf=buf@entry=0x7fff60dc4aa0, xdata=0x7fff48a70bc8, par=0x7fff60dc4b40) at afr-common.c:1333
#4  0x00007fff6890f42a in afr_inode_refresh_subvol_with_lookup_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7fff60dc4aa0, xdata=0x7fff48a70bc8, par=0x7fff60dc4b40) at afr-common.c:1344
#5  0x00007fff68b8e96f in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ffe584cfeb8) at client-rpc-fops_v2.c:2640
#6  0x00007fffed293115 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fff640671b0, pollin=pollin@entry=0x7fff484cc110) at rpc-clnt.c:764
#7  0x00007fffed2934b3 in rpc_clnt_notify (trans=0x7fff64067540, mydata=0x7fff640671e0, event=<optimized out>, data=0x7fff484cc110) at rpc-clnt.c:931
#8  0x00007fffed29007b in rpc_transport_notify (this=this@entry=0x7fff64067540, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fff484cc110) at rpc-transport.c:545
#9  0x00007fff7b6298dc in socket_event_poll_in_async (xl=<optimized out>, async=0x7fff484cc238) at socket.c:2592
#10 0x00007fff7b63134c in gf_async (cbk=0x7fff7b6298b0 <socket_event_poll_in_async>, xl=<optimized out>, async=0x7fff484cc238) at ../../../../libglusterfs/src/glusterfs/async.h:189
#11 socket_event_poll_in (notify_handled=true, this=0x7fff64067540) at socket.c:2633
#12 socket_event_handler (fd=fd@entry=18, idx=idx@entry=9, gen=gen@entry=1, data=data@entry=0x7fff64067540, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:3031
#13 0x00007fffed54da8b in event_dispatch_epoll_handler (event=0x7fff60dc5014, event_pool=0x5555557cb750) at event-epoll.c:650
#14 event_dispatch_epoll_worker (data=0x555555825900) at event-epoll.c:763
#15 0x00007fffec0702de in start_thread () from /lib64/libpthread.so.0
#16 0x00007fffeb8c8a63 in clone () from /lib64/libc.so.6
(gdb) continue
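
Side note for my next run: I may drive this from a gdb command file so I don't have to babysit the session. Below is just a rough sketch reusing the same breakpoint and condition as above; it is not tested on this build, and "breaks.gdb" and the log file name are only placeholders I'd pick:

  # breaks.gdb -- collect a backtrace every time afr_read_txn_refresh_done()
  # is entered with err == 5 (EIO), then keep running.
  set pagination off
  # Log output to a file so the backtraces survive an overnight run.
  set logging file afr-eio-backtraces.log
  set logging on
  # Same conditional breakpoint as in the interactive session above;
  # line 280/281 matches my patched glusterfs 7.2 tree.
  break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
  commands
    silent
    bt
    continue
  end
  run -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket -N

started with the same source dirs as before:

  gdb -d /root/rpmbuild/BUILD/glusterfs-7.2 -d /root/rpmbuild/BUILD/glusterfs-7.2/xlators/cluster/afr/src/ -x breaks.gdb /usr/sbin/glusterfs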
Ravishankar N
2020-Apr-01 04:18 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 01/04/20 8:57 am, Erik Jacobson wrote:
> Here are some backtraces. They make my head hurt. Maybe you can suggest
> something else to try next? In the morning I'll try to unwind this
> myself too in the source code, but I suspect it will be tough for me.
>
> (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
> Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
> (gdb) continue
> Continuing.
> [Switching to Thread 0x7ffecffff700 (LWP 50175)]
>
> Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
> 281         if (err) {
> (gdb) bt
> #0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
> #1  0x00007fff68901fdb in afr_txn_refresh_done (frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, err=5, err@entry=0) at afr-common.c:1223
> #2  0x00007fff689022b3 in afr_inode_refresh_done (frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, error=0) at afr-common.c:1295

Hmm, afr_inode_refresh_done() is called with error=0, and by the time we reach afr_txn_refresh_done() it becomes 5 (i.e. EIO). So afr_inode_refresh_done() is changing it to 5. Maybe you can put breakpoints/log messages in afr_inode_refresh_done() at the places where error is getting changed and see where the assignment happens.

Regards,
Ravi
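
P.S. In case it helps, here is roughly what I mean, expressed as gdb commands rather than log messages. This is only a sketch: it breaks on entry to both functions and prints the incoming value, so you can see on which side of the call the 0 becomes 5. The function and parameter names are taken from your backtraces; adjust as needed for your patched tree:

  set pagination off
  # Print the error value each time afr_inode_refresh_done() is entered.
  break afr_inode_refresh_done
  commands
    silent
    printf "afr_inode_refresh_done: error=%d\n", error
    continue
  end
  # Print err each time afr_txn_refresh_done() is entered.
  break afr_txn_refresh_done
  commands
    silent
    printf "afr_txn_refresh_done: err=%d\n", err
    continue
  end

These are hot paths, so unconditional breakpoints will slow the gnfs process down under your load test; log messages at the actual assignment sites would be lighter if that becomes a problem.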