Chris Miller
2008-Aug-13 21:54 UTC
[CentOS] DRBD 8.2 crashes CentOS 5.2 on rsync from remote host
I've got a pair of HA servers I'm trying to get into production. Here are some specs:

Xeon X3210 Quad Core (aka Core 2 Quad) 2.13GHz (four logical processors, no Hyper-Threading)
4GB memory
Hardware (3ware) RAID 1 mirror, 2 x Seagate 750GB SATA2
650GB DRBD partition run on top of an LVM2 partition

CentOS 5.2 2.6.18-92.1.6.el5.centos.plus
DRBD 8.2 (drbd82-8.2.6-1.el5.centos)
Kernel module kmod-drbd82-8.2.6-1.2.6.18_92.1.6.el5.centos.plus

I've been trying to rsync data from a remote server and it has crashed a couple of times now. It does not happen immediately, but over time. I connected a serial console and got the panic message below. The last file copied was ~1GB in size, but previous files up to 4GB had been copied. I do not have kernel core dumping enabled, but that's a possibility if needed.

Not sure if this is a bug or is caused by something I've done. This isn't my first DRBD install (although it is the first on top of LVM) and I believe I've gotten everything set up correctly. I did have a full sync rate (110M) enabled over GbE, if that's relevant.

Thoughts?
Regards,
Chris

[root@haws2 ~]# pvscan
  PV /dev/sda2   VG VolGroup00   lvm2 [698.28 GB / 0 free]
  Total: 1 [698.28 GB] / in use: 1 [698.28 GB] / in no VG: 0 [0 ]
[root@haws2 ~]# lvscan
  ACTIVE   '/dev/VolGroup00/LogVol00' [39.06 GB] inherit
  ACTIVE   '/dev/VolGroup00/LogVol02' [658.72 GB] inherit
  ACTIVE   '/dev/VolGroup00/LogVol01' [512.00 MB] inherit
[root@haws2 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      39676508   1938880  35689628   6% /
/dev/sda1               194442     23650    160753  13% /boot
tmpfs                  1684156         0   1684156   0% /dev/shm
/dev/drbd0           679824572 113321224 531970212  18% /home

[root@haws1 ~]# BUG: unable to handle kernel paging request at virtual address c
 printing eip:
c04e9291
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: softdog drbd(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc id
CPU: 0
EIP: 0060:[<c04e9291>] Tainted: G VLI
EFLAGS: 00010046 (2.6.18-92.1.6.el5.centos.plus #1)
EIP is at list_del+0x25/0x5c
eax: fe187128 ebx: f04a6ab8 ecx: f04a6a8c edx: f04a6a8c
esi: fe187128 edi: f4e355a0 ebp: f426c800 esp: f385df3c
ds: 007b es: 007b ss: 0068
Process drbd0_asender (pid: 2900, ti=f385d000 task=f4932000 task.ti=f385d000)
Stack: 000000e6 f8d1953b 00000000 f04a6a8c 000000e6 00000001 ee187b14 00000046
       f49e7bc0 f04a6ab8 f04a6a8c f426c800 fe187128 f4e355a0 0000349f f8d24805
       00000800 f426c800 f426c800 00000008 f426c9f4 f8d14d47 f385dfbc f8d15fbc
Call Trace:
 [<f8d1953b>] _req_may_be_done+0x4ea/0x710 [drbd]
 [<f8d24805>] tl_release+0x35/0x172 [drbd]
 [<f8d14d47>] got_BarrierAck+0x10/0x6b [drbd]
 [<f8d15fbc>] drbd_asender+0x3b1/0x4e7 [drbd]
 [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
 [<f8d24adb>] drbd_thread_setup+0x88/0x14e [drbd]
 [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
 ======================
Code: 89 c3 eb eb 90 90 53 89 c3 8b 40 04 8b 00 39 d8 74 17 50 53 68 9b 9a 63 c
EIP: [<c04e9291>] list_del+0x25/0x5c SS:ESP 0068:f385df3c
<0>Kernel panic - not syncing: Fatal exception
BUG: warning at arch/i386/kernel/smp.c:550/smp_call_function() (Tainted: G )
 [<c0417ae0>] stop_this_cpu+0x0/0x33
 [<c04178cf>] smp_call_function+0x57/0xc3
 [<c0426682>] printk+0x18/0x8e
 [<c041794e>] smp_send_stop+0x13/0x1c
 [<c0425c53>] panic+0x4c/0x16d
 [<c04064dd>] die+0x25d/0x291
 [<c060c48b>] do_page_fault+0x3ea/0x4b8
 [<c060c0a1>] do_page_fault+0x0/0x4b8
 [<c0405a71>] error_code+0x39/0x40
 [<c04e9291>] list_del+0x25/0x5c
 [<f8d1953b>] _req_may_be_done+0x4ea/0x710 [drbd]
 [<f8d24805>] tl_release+0x35/0x172 [drbd]
 [<f8d14d47>] got_BarrierAck+0x10/0x6b [drbd]
 [<f8d15fbc>] drbd_asender+0x3b1/0x4e7 [drbd]
 [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
 [<f8d24adb>] drbd_thread_setup+0x88/0x14e [drbd]
 [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
 =======================
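
[Editor's sketch] For anyone triaging a trace like the one above, it can help to list which module each frame belongs to. The following is a minimal Python sketch, not part of the original post and not an existing kernel tool: the function name parse_frames, the regular expression, and the "kernel" label for frames without a module tag are all illustrative assumptions. The sample input is copied from the call trace in this message.

```python
import re

# Matches frames such as " [<f8d1953b>] _req_may_be_done+0x4ea/0x710 [drbd]".
# The trailing "[module]" tag is optional; frames without it are core kernel code.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(oops_text):
    """Return one dict per stack frame found in the oops text (hypothetical helper)."""
    frames = []
    for match in FRAME_RE.finditer(oops_text):
        frames.append({
            "addr": match.group("addr"),
            "symbol": match.group("symbol"),
            # Assumption: label frames with no module tag as "kernel".
            "module": match.group("module") or "kernel",
        })
    return frames


# Excerpt of the call trace posted above.
SAMPLE = """\
Call Trace:
 [<f8d1953b>] _req_may_be_done+0x4ea/0x710 [drbd]
 [<f8d24805>] tl_release+0x35/0x172 [drbd]
 [<f8d14d47>] got_BarrierAck+0x10/0x6b [drbd]
 [<f8d15fbc>] drbd_asender+0x3b1/0x4e7 [drbd]
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame["module"], frame["symbol"])
```

In this trace every frame between list_del and kernel_thread_helper carries the [drbd] tag, which is why the DRBD module is the first thing to look at alongside the hardware.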
Chris Miller wrote:
> I've got a pair of HA servers I'm trying to get into production.
> Here are some specs :
> [..]
> [root@haws1 ~]# BUG: unable to handle kernel paging request at
> virtual address c

This typically means bad RAM

nate
Toshaan Bharvani
2008-Aug-18 13:28 UTC
[CentOS] Re: DRBD 8.2 crashes CentOS 5.2 on rsync from remote host
Scott Silva wrote:
> on 8-14-2008 12:55 AM Chris Miller spake the following:
>> nate wrote:
>>> Chris Miller wrote:
>>>> I've got a pair of HA servers I'm trying to get into production.
>>>> Here are some specs :
>>>
>>> [..]
>>>> [root@haws1 ~]# BUG: unable to handle kernel paging request at
>>>> virtual address c
>>>
>>> This typically means bad RAM
>>
>> While I won't rule this out, my local hardware vendor does a 48 hour
>> burn-in including a full gamut of tests (including memory) before
>> handing over the servers. These servers are less than two weeks old...
>>
>> Seems like this is a common type of error in some situations. I tried
>> to boot in kexec/kdump mode (CentOS 5 replacement for diskdumputils),
>> but the e1000 driver isn't seeing the NICs after a reboot via the
>> "capture kernel", so I can't replicate the (rsync induced) problem
>> and perform kernel debugging. I'll explore this more tomorrow.
>>
>> Chris
> When the servers are shipped to you, do you open them and make sure
> all modules are seated completely, and haven't been dislodged by the
> shipping?

Why not try a memtest? You can download a bootable CD/USB image and run a check.
nightduke
2008-Aug-18 13:34 UTC
[CentOS] Re: DRBD 8.2 crashes CentOS 5.2 on rsync from remote host
What servers are these? IBM, HP, Dell?

2008/8/18 Toshaan Bharvani <toshaan at hotmail.com>:
> [..]
> why not try a memtest, you can download a bootable cd/usb and do a check