Zhang Qiang

2016-Nov-28 07:29 UTC

[CentOS] CentOS 6.4 tcp_fastretrans_alert causes panic

Hi all,

Our kernel is 2.6.32-358.14.1.x86_64. Recently dozens of machines running
it panicked. They had been fine for a long time and the problem appeared
all of a sudden, so I'm not sure whether an upgrade caused it. Here's what
I got from the backtrace:

PID: 8136   TASK: ffff8803341aead0  CPU: 2   COMMAND: ""
 #0 [ffff880028283610] panic at ffffffff815286b8
 #1 [ffff880028283690] oops_end at ffffffff8152c8a2
 #2 [ffff8800282836c0] no_context at ffffffff81046c1b
 #3 [ffff880028283710] __bad_area_nosemaphore at ffffffff81046ea5
 #4 [ffff880028283760] bad_area_nosemaphore at ffffffff81046f73
 #5 [ffff880028283770] __do_page_fault at ffffffff810476d1
 #6 [ffff880028283890] do_page_fault at ffffffff8152e7be
 #7 [ffff8800282838c0] page_fault at ffffffff8152bb75
    [exception RIP: tcp_fastretrans_alert+2754]
    RIP: ffffffff814aed62  RSP: ffff880028283970  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffff88003d22c940  RCX: 0000000000000002
    RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: ffff8800282839b0   R8: 000000018033a9ac   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000d03  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #8 [ffff8800282839b8] tcp_ack at ffffffff814afb2c
 #9 [ffff880028283a88] tcp_rcv_state_process at ffffffff814b1128
#10 [ffff880028283b18] tcp_v4_do_rcv at ffffffff814b94f0
#11 [ffff880028283bb8] tcp_v4_rcv at ffffffff814baf9a
#12 [ffff880028283c48] ip_local_deliver_finish at ffffffff8149648d
#13 [ffff880028283c78] ip_local_deliver at ffffffff81496718
#14 [ffff880028283ca8] ip_rcv_finish at ffffffff81495bbd
#15 [ffff880028283ce8] ip_rcv at ffffffff81496155
#16 [ffff880028283d28] __netif_receive_skb at ffffffff8145db5b
#17 [ffff880028283d88] netif_receive_skb at ffffffff814621b8
#18 [ffff880028283dc8] virtnet_poll at ffffffffa0130565 [virtio_net]
#19 [ffff880028283e68] net_rx_action at ffffffff81463193
#20 [ffff880028283ec8] __do_softirq at ffffffff81078c71
#21 [ffff880028283f38] call_softirq at ffffffff8100c1cc
#22 [ffff880028283f50] do_softirq at ffffffff8100de05
#23 [ffff880028283f70] irq_exit at ffffffff81078a55
#24 [ffff880028283f80] do_IRQ at ffffffff81532365
--- <IRQ stack> ---
#25 [ffff88001e851f58] ret_from_intr at ffffffff8100b9d3
    RIP: 00007fa080e1a538  RSP: 00007fa0781ec960  RFLAGS: 00000206
    RAX: 0000000000000001  RBX: 00007fa0781ec9a0  RCX: 000000000001ef8c
    RDX: 0000000000001000  RSI: 0000000000000006  RDI: 00007fa07c093df8
    RBP: ffffffff8100b9ce   R8: 0000000000000006   R9: 0000000004000001
    R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
    R13: 00007fa0710a18f0  R14: 0000000000000120  R15: 0000000000001000
    ORIG_RAX: ffffffffffffff8e  CS: 0033  SS: 002b

disassemble tcp_fastretrans_alert+2754 gives:

0xffffffff814aed62 <tcp_fastretrans_alert+2754>:        sub    0x58(%rdi),%r8d

I know this kernel is a bit old, but since these machines are in a
production environment, I can't just upgrade them all to test whether the
problem is specific to the old version. So I need some advice on how to
debug this, or a pointer to a relevant bug report.
Thanks.
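
A note on reading the faulting instruction: in AT&T syntax,
`sub 0x58(%rdi),%r8d` subtracts the 32-bit value stored at address
RDI + 0x58 from R8D. The register dump shows RDI as 0, so the load targets
address 0x58, which is consistent with dereferencing a NULL structure
pointer at field offset 0x58. Which structure and field that is cannot be
told from the trace alone; it would have to be checked against the source
or debuginfo for this exact kernel build. The arithmetic, as a small Python
sketch that uses only values copied from the backtrace:

```python
# Register values copied from the exception frame in the backtrace.
registers = {
    "rdi": 0x0000000000000000,
    "r8": 0x000000018033a9ac,
}

# Faulting instruction: sub 0x58(%rdi),%r8d
# AT&T syntax writes a memory operand as displacement(base), so the CPU
# reads 4 bytes from address RDI + 0x58 and subtracts them from R8D.
displacement = 0x58
effective_address = registers["rdi"] + displacement

# R8D is the low 32 bits of R8.
r8d = registers["r8"] & 0xFFFFFFFF

print(f"memory operand address: {effective_address:#x}")
print(f"r8d before the subtract: {r8d:#x}")
print("base pointer is NULL" if registers["rdi"] == 0 else "base pointer is non-NULL")
```

An address this low is never mapped in kernel space, which matches the
page_fault / bad_area_nosemaphore frames at the top of the trace.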

Phil Wyett

2016-Nov-28 07:45 UTC

[CentOS] CentOS 6.4 tcp_fastretrans_alert causes panic

On Mon, 2016-11-28 at 15:29 +0800, Zhang Qiang wrote:
> Hi all,
> 
> Our kernel is 2.6.32-358.14.1.x86_64, recently dozens of them panicked,
> since it's been OK for a long time and the problem emerged all of a sudden,
> I'm not sure if an upgrade caused this problem. Here's what I got from
> backtracing:
> 
> PID: 8136   TASK: ffff8803341aead0  CPU: 2   COMMAND: ""
>  #0 [ffff880028283610] panic at ffffffff815286b8
>  #1 [ffff880028283690] oops_end at ffffffff8152c8a2
>  #2 [ffff8800282836c0] no_context at ffffffff81046c1b
>  #3 [ffff880028283710] __bad_area_nosemaphore at ffffffff81046ea5
>  #4 [ffff880028283760] bad_area_nosemaphore at ffffffff81046f73
>  #5 [ffff880028283770] __do_page_fault at ffffffff810476d1
>  #6 [ffff880028283890] do_page_fault at ffffffff8152e7be
>  #7 [ffff8800282838c0] page_fault at ffffffff8152bb75
>     [exception RIP: tcp_fastretrans_alert+2754]
>     RIP: ffffffff814aed62  RSP: ffff880028283970  RFLAGS: 00010246
>     RAX: 0000000000000002  RBX: ffff88003d22c940  RCX: 0000000000000002
>     RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000000
>     RBP: ffff8800282839b0   R8: 000000018033a9ac   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>     R13: 0000000000000000  R14: 0000000000000d03  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>  #8 [ffff8800282839b8] tcp_ack at ffffffff814afb2c
>  #9 [ffff880028283a88] tcp_rcv_state_process at ffffffff814b1128
> #10 [ffff880028283b18] tcp_v4_do_rcv at ffffffff814b94f0
> #11 [ffff880028283bb8] tcp_v4_rcv at ffffffff814baf9a
> #12 [ffff880028283c48] ip_local_deliver_finish at ffffffff8149648d
> #13 [ffff880028283c78] ip_local_deliver at ffffffff81496718
> #14 [ffff880028283ca8] ip_rcv_finish at ffffffff81495bbd
> #15 [ffff880028283ce8] ip_rcv at ffffffff81496155
> #16 [ffff880028283d28] __netif_receive_skb at ffffffff8145db5b
> #17 [ffff880028283d88] netif_receive_skb at ffffffff814621b8
> #18 [ffff880028283dc8] virtnet_poll at ffffffffa0130565 [virtio_net]
> #19 [ffff880028283e68] net_rx_action at ffffffff81463193
> #20 [ffff880028283ec8] __do_softirq at ffffffff81078c71
> #21 [ffff880028283f38] call_softirq at ffffffff8100c1cc
> #22 [ffff880028283f50] do_softirq at ffffffff8100de05
> #23 [ffff880028283f70] irq_exit at ffffffff81078a55
> #24 [ffff880028283f80] do_IRQ at ffffffff81532365
> --- <IRQ stack> ---
> #25 [ffff88001e851f58] ret_from_intr at ffffffff8100b9d3
>     RIP: 00007fa080e1a538  RSP: 00007fa0781ec960  RFLAGS: 00000206
>     RAX: 0000000000000001  RBX: 00007fa0781ec9a0  RCX: 000000000001ef8c
>     RDX: 0000000000001000  RSI: 0000000000000006  RDI: 00007fa07c093df8
>     RBP: ffffffff8100b9ce   R8: 0000000000000006   R9: 0000000004000001
>     R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
>     R13: 00007fa0710a18f0  R14: 0000000000000120  R15: 0000000000001000
>     ORIG_RAX: ffffffffffffff8e  CS: 0033  SS: 002b
> 
> disassemble tcp_fastretrans_alert+2754 gives:
> 
> 0xffffffff814aed62 <tcp_fastretrans_alert+2754>:        sub    0x58(%rdi),%r8d
> 
> I know this kernel is a bit old, but since these kernels are in production
> environment, I can't just upgrade them all to test if it's the problem of
> the old version. So I need some advice on how to debug or a bug report.
> Thanks.

Hi,

Being in a production environment is all the more reason to have an
upgrade plan in place and to be running the latest package versions with
all the fixes provided.

Is this isolated to one machine or many?

Can it be reproduced and if so how?

Regards

Phil
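
One way to approach the "one machine or many" question, assuming a saved
`bt` output is available for each panicked host, is to compare the
`[exception RIP: ...]` line across dumps and see whether they all fault at
the same site. A minimal Python sketch; the host names and the
`exception_site` and `tally_sites` helpers are hypothetical, and only the
line format is taken from the backtrace in this thread:

```python
import re
from collections import Counter

# Matches lines such as: [exception RIP: tcp_fastretrans_alert+2754]
EXCEPTION_RIP = re.compile(r"\[exception RIP: (\w+)\+(\d+)\]")

def exception_site(bt_text):
    """Return (symbol, offset) from the first exception RIP line, or None."""
    match = EXCEPTION_RIP.search(bt_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def tally_sites(backtraces):
    """Count how many hosts crashed at each (symbol, offset) site."""
    return Counter(exception_site(text) for text in backtraces.values())

# Hypothetical host names; the text is the relevant line from the thread.
backtraces = {
    "host-a": "    [exception RIP: tcp_fastretrans_alert+2754]",
    "host-b": "    [exception RIP: tcp_fastretrans_alert+2754]",
}
for site, count in tally_sites(backtraces).items():
    print(site, count)
```

If every dump reports the same symbol and offset, the panics most likely
share one cause; a spread of different sites would point elsewhere.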
