Hi all,

Our glusterfs system has crashed many times recently. The coredumps always show the crash point in FRAME_DESTROY or STACK_WIND. Any reference to frame->next->prev crashes, even a plain value check such as 'if (frame->next->prev == NULL) {return}'. This mainly occurs when opendir is called. There are about 500 bricks in this system. The coredumps are listed below. Can anybody help me?

Coredump 1:

#0  0x00002aad0cd465f4 in FRAME_DESTROY (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>) at ../../../../libglusterfs/src/stack.h:143
143             frame->next->prev = frame->prev;
(gdb) bt
#0  0x00002aad0cd465f4 in FRAME_DESTROY (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>) at ../../../../libglusterfs/src/stack.h:143
#1  STACK_DESTROY (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>) at ../../../../libglusterfs/src/stack.h:180
#2  fuse_fd_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>) at fuse-bridge.c:594
#3  0x00002aad0e34e7a9 in io_stats_opendir_cbk (frame=0x2aadc5b329c0, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, fd=0x2aad67f98954) at io-stats.c:1492
#4  0x00002aad0e12eab2 in sp_fd_cbk (frame=0x2aadc5bc0fc0, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, fd=0x2aad67f98954) at stat-prefetch.c:1506
#5  0x00002aad0df00238 in dht_fd_cbk (frame=0x2aadc5bc10c0, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>) at dht-common.c:2615
#6  0x00002aad0dc8f941 in afr_examine_dir_readdir_cbk (frame=0x2aadc3a59ce0, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, entries=<value optimized out>) at afr-dir-read.c:185
#7  0x00002aad0da6485d in client3_1_readdir_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x36a4600) at client3_1-fops.c:1883
#8  0x00002aad0a317315 in rpc_clnt_handle_reply (clnt=0x18a2b30, pollin=0x24054b0) at rpc-clnt.c:741
#9  0x00002aad0a317569 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x18a2b60, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:854
#10 0x00002aad0a312418 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:919
#11 0x00002aad0d41f254 in socket_event_poll_in (this=0x18a2c50) at socket.c:1647
#12 0x00002aad0d41f337 in socket_event_handler (fd=<value optimized out>, idx=405, data=0x18a2c50, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1762
#13 0x00002aad0a0e3014 in event_dispatch_epoll_handler (event_pool=0x1602330) at event.c:794
#14 event_dispatch_epoll (event_pool=0x1602330) at event.c:856
#15 0x0000000000405e69 in main (argc=5, argv=0x7fff4b224d28) at glusterfsd.c:1462

Coredump 2:

#0  CHECK_FRAME (frame=0x2af752e9bba0, this=<value optimized out>, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at ../../../../libglusterfs/src/stack.h:205
205             if (frame->root->frames.next->prev == NULL){
(gdb) bt
#0  CHECK_FRAME (frame=0x2af752e9bba0, this=<value optimized out>, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at ../../../../libglusterfs/src/stack.h:205
#1  afr_opendir (frame=0x2af752e9bba0, this=<value optimized out>, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at afr-dir-read.c:343
#2  0x00002af65d2d036a in dht_opendir (frame=0x2af751f91ec0, this=<value optimized out>, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at dht-common.c:3092
#3  0x00002af65d4ff86a in sp_opendir (frame=<value optimized out>, this=0xa1a730, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at stat-prefetch.c:1854
#4  0x00002af65d719c4a in io_stats_opendir (frame=<value optimized out>, this=0xa1b940, loc=0x2af7538acd38, fd=0x2af6b71f6b4c) at io-stats.c:2137
#5  0x00002af65c111540 in fuse_opendir_resume (state=0x2af7538acd20) at fuse-bridge.c:2011
#6  0x00002af65c0ff512 in fuse_resolve_and_resume (state=0x2af7538acd20, fn=0x2af65c1113a0 <fuse_opendir_resume>) at fuse-resolve.c:763
#7  0x00002af65c10c5bd in fuse_thread_proc (data=0x7d16d0) at fuse-bridge.c:3223
#8  0x0000003c394077e1 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c390e152d in clone () from /lib64/libc.so.6

Note: CHECK_FRAME is a function I added to show the crashing line in the coredump.

Thank you very much
2012-04-09
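A note on the check quoted in the post: in C, the expression frame->next->prev loads frame->next and dereferences it before any comparison happens, so a test written as 'if (frame->next->prev == NULL)' faults at the same spot as the original assignment whenever frame->next is NULL or points at freed memory. The sketch below shows an unlink that tests each link before following it. It uses a hypothetical `struct frame` and a hypothetical `frame_unlink` helper, not the real glusterfs frame type or macros. It only guards against NULL links; it cannot detect a dangling pointer, and it is not the fix discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for a call frame in a doubly linked list.
 * This is NOT the glusterfs frame structure, only an illustration. */
struct frame {
    struct frame *next;
    struct frame *prev;
};

/* Unlink f from its neighbours.
 * Each neighbour pointer is tested before it is dereferenced, so a
 * NULL next or prev link is skipped instead of causing a fault.
 * A pointer to already freed memory is not NULL and is NOT caught. */
static void frame_unlink(struct frame *f)
{
    if (f == NULL)
        return;

    if (f->next != NULL)
        f->next->prev = f->prev;   /* safe: f->next was checked first */

    if (f->prev != NULL)
        f->prev->next = f->next;   /* safe: f->prev was checked first */

    /* Clear the links so a second unlink of the same frame is harmless. */
    f->next = NULL;
    f->prev = NULL;
}
```

If a crash persists with NULL guards like these in place, the link is more likely dangling than NULL, which would point toward a frame being destroyed twice or accessed from two threads at once; that is a guess from the symptoms described, not something the coredumps above establish.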
Krishnan Parthasarathi
2012-Apr-09 06:00 UTC
[Gluster-users] glusterfs crash on frame->next->prev
chyd,

Which version of glusterfs are you using? We need to check if this happens even after the following fix was merged,
http://review.gluster.com/774

thanks,
krish

----- Original Message -----
From: "chyd" <chyd at ihep.ac.cn>
To: "gluster-users" <gluster-users at gluster.org>
Sent: Monday, April 9, 2012 10:06:58 AM
Subject: [Gluster-users] glusterfs crash on frame->next->prev