master@bradleyland.com
2007-Jan-23 17:21 UTC
[Fedora-xen] FC6/Xen crash -- isolated to rsnapshot job
I've isolated my FC6/Xen crashing problems to cron backup jobs that were running in my dom0. I have moved the jobs into a domU and now observe the same crashes in the domU. At least the entire environment doesn't come down when the crash is confined to a domU.

In my crontab, I have a series of rsnapshot backup jobs that back up a handful of Windows and Linux servers. For Windows machines, the script mounts a share on the Windows machine using CIFS (Samba). It seems that only the Windows backup jobs crash the machine, and then only when two are scheduled to start at exactly the same time. I can replicate the problem by running the crontab commands from the command line. If I run the commands one at a time, there is no crash. If I start them both back to back, the crash occurs within 30 seconds or so.

Under FC4, these scripts/backup jobs worked fine for almost a year without intervention. I've read there have been a host of problems with CIFS in FC6, but I thought they had been resolved. As a workaround, I can change the job schedule for now, but something is still broken in the kernel, Samba, or both.

Here's a trace:

list_del corruption. prev->next should be c2f5c640, but was c2f50080
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:65!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /block/ram0/range
Modules linked in: nls_utf8 cifs ipv6 autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype ip_nat_tftp ip_nat_snmp_basic ip_nat_pptp ip_nat_irc ip_nat_ftp ip_nat_amanda ip_conntrack_tftp ip_conntrack_pptp ip_conntrack_netbios_ns ip_conntrack_irc ip_conntrack_ftp ts_kmp ip_conntrack_amanda xt_tcpmss xt_pkttype xt_physdev bridge xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat ip_nat ip_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables tun sunrpc xennet parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod raid456 xor ext3 jbd xenblk
CPU: 0
EIP: 0061:[<c04e9d0b>] Not tainted VLI
EFLAGS: 00010082 (2.6.19-1.2895.fc6xen #1)
EIP is at list_del+0x23/0x6c
eax: 00000048 ebx: c2f5c640 ecx: c0683b30 edx: f5416000
esi: c117a7c0 edi: c32af000 ebp: c117eda0 esp: c0d2def0
ds: 007b es: 007b ss: 0069
Process events/0 (pid: 5, ti=c0d2d000 task=c006e030 task.ti=c0d2d000)
Stack: c0646145 c2f5c640 c2f50080 c2f5c640 c0467706 c078afc0 c028c980 c0619b9d
       00000014 00000002 c1176228 c1176220 00000014 c1176200 00000000 c0467809
       00000000 00000000 c117eda0 c117a7e4 c117a7c0 c117eda0 c0d404a0 00000000
Call Trace:
 [<c0467706>] free_block+0x77/0xf0
 [<c0467809>] drain_array+0x8a/0xb5
 [<c0468e22>] cache_reap+0x85/0x117
 [<c042d603>] run_workqueue+0x97/0xdd
 [<c042dfc0>] worker_thread+0xd9/0x10d
 [<c043058c>] kthread+0xc0/0xec
 [<c0405253>] kernel_thread_helper+0x7/0x10
 ======================
Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 45 61 64 c0 e8 9a 4b f3 ff <0f> 0b 41 00 82 61 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04
EIP: [<c04e9d0b>] list_del+0x23/0x6c SS:ESP 0069:c0d2def0
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c04056ff>] dump_trace+0x69/0x1b6
 [<c0405864>] show_trace_log_lvl+0x18/0x2c
 [<c0405e4b>] show_trace+0xf/0x11
 [<c0405e7a>] dump_stack+0x15/0x17
 [<c0433252>] down_read+0x12/0x28
 [<c042aca2>] blocking_notifier_call_chain+0xe/0x29
 [<c0420d75>] do_exit+0x1b/0x787
 [<c0405dec>] die+0x2af/0x2d4
 [<c0406262>] do_invalid_op+0xa2/0xab
 [<c0619deb>] error_code+0x2b/0x30
 [<c04e9d0b>] list_del+0x23/0x6c
 [<c0467706>] free_block+0x77/0xf0
 [<c0467809>] drain_array+0x8a/0xb5
 [<c0468e22>] cache_reap+0x85/0x117
 [<c042d603>] run_workqueue+0x97/0xdd
 [<c042dfc0>] worker_thread+0xd9/0x10d
 [<c043058c>] kthread+0xc0/0xec
 [<c0405253>] kernel_thread_helper+0x7/0x10
 ======================
BUG: spinlock lockup on CPU#0, rsync/11148, c117a7e4 (Not tainted)
 [<c04056ff>] dump_trace+0x69/0x1b6
 [<c0405864>] show_trace_log_lvl+0x18/0x2c
 [<c0405e4b>] show_trace+0xf/0x11
 [<c0405e7a>] dump_stack+0x15/0x17
 [<c04e9b6f>] _raw_spin_lock+0xbf/0xdc
 [<c0467a45>] cache_alloc_refill+0x74/0x4dc
 [<c04679b8>] kmem_cache_alloc+0x54/0x6d
 [<c0413ec1>] pgd_alloc+0x54/0x230
 [<c041c020>] mm_init+0x94/0xb9
 [<c047056d>] do_execve+0x6f/0x1f5
 [<c0402e08>] sys_execve+0x2f/0x4f
 [<c0404efb>] syscall_call+0x7/0xb
 [<00b98402>] 0xb98402
 =======================
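
Since the crash only shows up when two Windows jobs start together, one alternative to staggering the schedule would be to make the jobs wait on a shared lock. The following is only a minimal sketch of that idea, assuming Python is available on the backup host. The lock path and the wrapper itself are hypothetical and not part of my actual setup, and it would only avoid the trigger, not fix whatever is broken underneath.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock path; any file writable by the cron user would do.
LOCK_PATH = "/tmp/rsnapshot-cifs.lock"


def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run cmd while holding an exclusive lock, so overlapping jobs queue up."""
    with open(lock_path, "w") as lock_file:
        # Blocks here until any other job holding the lock has finished.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Returns the exit status of the wrapped command.
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the backup command.
    sys.exit(run_serialized(sys.argv[1:]))
```

Each Windows backup entry in the crontab would then call this wrapper with the original rsnapshot command as its arguments, so two jobs scheduled for the same minute would run one after the other rather than at once.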