Alessio Cecchi

2014-Jan-30 11:24 UTC

[CentOS] CentOS 6.5: NFS server crashes with list_add corruption errors

Hi,

I'm running CentOS 6.5 as an NFS server (v3 and v4), exporting ext4 and
XFS filesystems.

After many months of everything working fine, the server crashed today:

Jan 30 09:46:13 qb-storage kernel: ------------[ cut here ]------------
Jan 30 09:46:13 qb-storage kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Not tainted)
Jan 30 09:46:13 qb-storage kernel: Hardware name: PowerEdge
Jan 30 09:46:13 qb-storage kernel: list_add corruption. next->prev should be prev (ffff8804366c5df0), but was ffff8803f611fa68. (next=ffff8803f611fa68).
Jan 30 09:46:13 qb-storage kernel: Modules linked in: nfsd lockd nfs_acl 
auth_rpcgss sunrpc act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf 
sch_prio sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc 
xt_statistic xt_time xt_connlimit xt_realm iptable_raw xt_comment 
xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP 
ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_set 
ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip 
nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp 
nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane 
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite 
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre 
nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast 
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY 
nf_tproxy_core nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner 
xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac 
xt_limit xt_length xt_iprange xt_help
Jan 30 09:46:13 qb-storage kernel: er xt_hashlimit xt_DSCP xt_dscp 
xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_AUDIT 
ipt_LOG xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables ipv6 xfs 
exportfs microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas sg 
bnx2 lpc_ich mfd_core usb_storage ext4 jbd2 mbcache raid1 sr_mod cdrom 
sd_mod crc_t10dif ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last 
unloaded: speedstep_lib]
Jan 30 09:46:13 qb-storage kernel: Pid: 5759, comm: nfsd4 Not tainted 2.6.32-431.1.2.0.1.el6.x86_64 #1
Jan 30 09:46:13 qb-storage kernel: Call Trace:
Jan 30 09:46:13 qb-storage kernel: [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Jan 30 09:46:13 qb-storage kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
Jan 30 09:46:13 qb-storage kernel: [<ffffffff81527920>] ? thread_return+0x4e/0x76e
Jan 30 09:46:13 qb-storage kernel: [<ffffffff812944ed>] ? __list_add+0x6d/0xa0
Jan 30 09:46:13 qb-storage kernel: [<ffffffffa05bd60a>] ? laundromat_main+0x23a/0x3f0 [nfsd]
Jan 30 09:46:13 qb-storage kernel: [<ffffffffa05bd3d0>] ? laundromat_main+0x0/0x3f0 [nfsd]
Jan 30 09:46:13 qb-storage kernel: [<ffffffff81094d30>] ? worker_thread+0x170/0x2a0
Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109b2b0>] ? autoremove_wake_function+0x0/0x40
Jan 30 09:46:13 qb-storage kernel: [<ffffffff81094bc0>] ? worker_thread+0x0/0x2a0
Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109af06>] ? kthread+0x96/0xa0
Jan 30 09:46:13 qb-storage kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109ae70>] ? kthread+0x0/0xa0
Jan 30 09:46:13 qb-storage kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jan 30 09:46:13 qb-storage kernel: ---[ end trace 13fa6e7d5ee2d668 ]---

and:

BUG: soft lockup - CPU#0 stuck for 67s! [nfsd4:3519]

The error looks exactly like this one:

https://access.redhat.com/site/solutions/166583

Does anyone know whether the problem has been solved, and how?
Thanks

-- 
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux systems support -> http://www.cecchi.biz
Cloud Email Hosting -> http://www.qboxmail.com
@ PLUG -> former president, now senator for life, http://www.prato.linux.it
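To check whether another server has hit the same failure, one could search the system log for the two symptoms reported above: the list_add corruption warning with laundromat_main in the call trace, and the nfsd4 soft lockup. The following is a minimal sketch; the function name check_nfsd_list_corruption and the default log path /var/log/messages (the usual syslog file on CentOS 6) are assumptions, not something taken from this thread. It only checks that the strings occur somewhere in the file, not that they belong to the same event.

```shell
# Hypothetical helper: report which of the two symptoms from this thread
# appear in a syslog file. The default path is an assumption.
# Returns 0 if at least one symptom was found, 1 otherwise.
check_nfsd_list_corruption() {
    log="${1:-/var/log/messages}"
    found=0
    # Both strings must occur in the file (possibly on different lines).
    if grep -q 'list_add corruption' "$log" 2>/dev/null &&
       grep -q 'laundromat_main' "$log" 2>/dev/null; then
        echo "list_add corruption warning with laundromat_main in the call trace"
        found=1
    fi
    # Matches e.g. "BUG: soft lockup - CPU#0 stuck for 67s! [nfsd4:3519]".
    if grep -q 'soft lockup.*\[nfsd4:' "$log" 2>/dev/null; then
        echo "nfsd4 soft lockup"
        found=1
    fi
    [ "$found" -eq 1 ]
}
```

Called without arguments it scans /var/log/messages; pass another path to scan a rotated log such as /var/log/messages-20140130 (an example name).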

Daniel Bird

2014-Jan-30 12:32 UTC

On 30/01/2014 11:24, Alessio Cecchi wrote:
> The error is exactly like this:
>
> https://access.redhat.com/site/solutions/166583
>
> Does anyone know if the problem is solved and how?
> Thanks

That page lists the workaround if you log in, but no "resolution" as
yet, although there are internal Bugzilla IDs for both RHEL 5 and 6.
The workaround is to disable file leases:
echo 0 >/proc/sys/fs/leases-enable
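The echo above only lasts until the next reboot. A minimal sketch for making it persistent, assuming the setting is kept in /etc/sysctl.conf (the file CentOS 6 loads at boot); the function name persist_leases_workaround is hypothetical, and the script relies on GNU sed's in-place editing as shipped with CentOS:

```shell
# Hypothetical helper: persist the fs.leases-enable=0 workaround so it
# survives a reboot. Defaults to /etc/sysctl.conf (assumed location).
persist_leases_workaround() {
    conf="${1:-/etc/sysctl.conf}"
    # Avoid gluing an appended line onto an unterminated last line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    if grep -q '^[[:space:]]*fs\.leases-enable[[:space:]]*=' "$conf" 2>/dev/null; then
        # An existing (uncommented) setting is rewritten in place.
        sed -i 's/^[[:space:]]*fs\.leases-enable[[:space:]]*=.*/fs.leases-enable = 0/' "$conf"
    else
        # Otherwise the setting is appended (creating the file if needed).
        printf 'fs.leases-enable = 0\n' >> "$conf"
    fi
}
```

Run it as root, then load the file with `sysctl -p`. Presumably the workaround helps because nfsd builds NFSv4 delegations on top of file leases, so disabling leases should also stop the server from granting delegations; this is not confirmed in the thread.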

Jeffrey Hass

2014-Jan-30 15:36 UTC

Alessio,

Are these VMs? Did you move the VM files to their respective backup
locations, named per VM? e.g.:
/DB
/CORE business Server
/etc

It looks like your PowerEdge system choked, and you may have to:

A: get it back online and fire up the systems, or
B: replace it, or
C: restore/replace.

The errors are pretty obvious to me at first pass, and if I were there
I could tell in 5 minutes what is 'probably' wrong, but that's just my
first pass at this.

I hope you had some kind of failover/redundancy for the "appliance".

Good luck,

JJ Hass

