Hi,
We have set up a 16-node cluster running xen3.0.2-3/Linux 2.6.16-13, with MPI
installed and working. I built a ring of 4 machines (3 physical nodes and 1
virtual machine) and ran an MPI application (the smg2000 benchmark) on it.
It completes fine.
When I live-migrate the virtual machine while the MPI benchmark is running on
the ring, the virtual machine hits a "Kernel BUG" with the dump pasted below.
I am also pasting the error thrown by the MPI benchmark application. (It looks
like some kind of memory corruption during migration.)
Has anyone successfully done a live migration while running an MPI
application? Could you please suggest how to approach this? (After seeing the
glibc errors, I moved /lib64/tls to /lib64/tls.disabled, but it made no
difference.)
Thank you,
Arun
1. Error message on the console of the virtual machine running MPI:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/mmap.c:1961
invalid opcode: 0000 [3] SMP
CPU 0
Modules linked in: ipv6 autofs4 i2c_dev i2c_core dm_mirror dm_mod lp parport_pc parport
Pid: 4790, comm: smg2000 Not tainted 2.6.16.13-xen #7
RIP: e030:[<ffffffff8016a42b>] <ffffffff8016a42b>{exit_mmap+235}
RSP: e02b:ffff880012cddcd8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800021a42c0 RCX: 000000000000011d
RDX: ffffffffff578000 RSI: ffff88002b33e6b8 RDI: ffff88000116da80
RBP: 0000000000000000 R08: ffff8800395445b0 R09: 0000000000000000
R10: 0000000000000537 R11: ffffffff801dac20 R12: ffff880002700880
R13: 0000000000000001 R14: 0000000000000006 R15: ffffffff8010b45d
FS: 00002b1d2333e6f0(0000) GS:ffffffff80514000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process smg2000 (pid: 4790, threadinfo ffff880012cdc000, task ffff88003f9828b0)
Stack: 0000000000002181 ffff8800021a42c0 ffff880002700880 ffff880002700900
       ffff88003f982f1c ffffffff8012ef94 0000000000000006 0000000000000006
       ffff88003f9828b0 ffffffff80135479
Call Trace: <ffffffff8012ef94>{mmput+52} <ffffffff80135479>{do_exit+521}
       <ffffffff8013deae>{__dequeue_signal+478} <ffffffff8010b45d>{sysret_signal+56}
       <ffffffff80135c28>{do_group_exit+264} <ffffffff8014062c>{get_signal_to_deliver+1708}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010a5ed>{do_signal+157}
       <ffffffff801378cb>{current_fs_time+59} <ffffffff803a3c62>{__down_read+18}
       <ffffffff80129eec>{try_to_wake_up+924} <ffffffff80196864>{dput+84}
       <ffffffff8013d62c>{sigprocmask+220} <ffffffff8013ef23>{sys_rt_sigprocmask+99}
       <ffffffff8017b768>{filp_close+104} <ffffffff8013d62c>{sigprocmask+220}
       <ffffffff8010b45d>{sysret_signal+56} <ffffffff8010b735>{ptregscall_common+61}
Code: 0f 0b 68 95 2b 3d 80 c2 a9 07 48 83 c4 10 5b 5d 41 5c c3 66
RIP <ffffffff8016a42b>{exit_mmap+235} RSP <ffff880012cddcd8>
<1>Fixing recursive fault but reboot is needed!
2. Error thrown by the MPI benchmark application:
*** glibc detected *** smg2000: free(): invalid pointer: 0x00000000017ef1a0 ***
======= Backtrace: ========
/lib64/libc.so.6[0x2b1d23162c43]
/lib64/libc.so.6(__libc_free+0x84)[0x2b1d23162dc4]
smg2000[0x42b632]
smg2000[0x4289ee]
smg2000[0x41d261]
smg2000[0x405dee]
smg2000[0x4056a8]
smg2000[0x408aad]
smg2000[0x403c05]
smg2000[0x403730]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b1d23111e54]
smg2000[0x402269]
======= Memory map: =======
00400000-004bd000 r-xp 00000000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005bc000-005be000 rw-p 000bc000 00:15 33425271 /nfsroot/home/abnagara/code/bm/smg2000/test/smg2000
005be000-0180f000 rw-p 005be000 00:00 0 [heap]
36f8e00000-36f8e0d000 r-xp 00000000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8e0d000-36f8f0d000 ---p 0000d000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
36f8f0d000-36f8f0e000 rw-p 0000d000 00:0c 37044253 /nfsroot/lib64/libgcc_s-4.1.0-20060304.so.1
2b1d22c37000-2b1d22c51000 r-xp 00000000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22c51000-2b1d22c52000 rw-p 2b1d22c51000 00:00 0
2b1d22c73000-2b1d22c74000 rw-p 2b1d22c73000 00:00 0
2b1d22d50000-2b1d22d51000 r--p 00019000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22d51000-2b1d22d52000 rw-p 0001a000 00:0c 37044225 /nfsroot/lib64/ld-2.4.so
2b1d22d52000-2b1d22dd2000 r-xp 00000000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22dd2000-2b1d22ed2000 ---p 00080000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed2000-2b1d22ed3000 r--p 00080000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed3000-2b1d22ed4000 rw-p 00081000 00:0c 37044256 /nfsroot/lib64/libm-2.4.so
2b1d22ed4000-2b1d22ee6000 r-xp 00000000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22ee6000-2b1d22fe6000 ---p 00012000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe6000-2b1d22fe7000 r--p 00012000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe7000-2b1d22fe8000 rw-p 00013000 00:0c 37044273 /nfsroot/lib64/libpthread-2.4.so
2b1d22fe8000-2b1d22fec000 rw-p 2b1d22fe8000 00:00 0
2b1d22fec000-2b1d22ff3000 r-xp 00000000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d22ff3000-2b1d230f2000 ---p 00007000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f2000-2b1d230f3000 r--p 00006000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f3000-2b1d230f4000 rw-p 00007000 00:0c 37044275 /nfsroot/lib64/librt-2.4.so
2b1d230f4000-2b1d230f5000 rw-p 2b1d230f4000 00:00 0
2b1d230f5000-2b1d23234000 r-xp 00000000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23234000-2b1d23334000 ---p 0013f000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23334000-2b1d23338000 r--p 0013f000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23338000-2b1d23339000 rw-p 00143000 00:0c 37044234 /nfsroot/lib64/libc-2.4.so
2b1d23339000-2b1d23458000 rw-p 2b1d23339000 00:00 0
2b1d2348d000-2b1d2350f000 rw-p 2b1d2348d000 00:00 0
2b1d2352b000-2b1d2426d000 rw-p 2b1d2352b000 00:00 0
2b1d24300000-2b1d24321000 rw-p 2b1d24300000 00:00 0
2b1d24321000-2b1d24400000 ---p 2b1d24321000 00:00 0
7fffffd72000-7fffffd87000 rw-p 7fffffd72000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
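
As a quick sanity check on the glibc report, the sketch below looks up which mapping contains the pointer that free() rejected. The addresses are copied from the report above; the helper name find_region and the short region labels are only illustrative, and just a few of the mappings are listed. The pointer falls inside [heap], which is consistent with (but does not prove) corrupted heap contents rather than a completely wild pointer.

```python
# Pointer reported by glibc: "free(): invalid pointer: 0x00000000017ef1a0"
BAD_POINTER = 0x00000000017EF1A0

# (start, end, label) for a subset of the mappings in the memory map above.
# The labels are shorthand chosen for this sketch.
REGIONS = [
    (0x00400000, 0x004BD000, "smg2000 text"),
    (0x005BC000, 0x005BE000, "smg2000 data"),
    (0x005BE000, 0x0180F000, "[heap]"),
    (0x7FFFFFD72000, 0x7FFFFFD87000, "[stack]"),
]


def find_region(addr, regions):
    """Return the label of the mapping containing addr, or None."""
    for start, end, label in regions:
        # /proc-style ranges are half-open: start inclusive, end exclusive
        if start <= addr < end:
            return label
    return None


region = find_region(BAD_POINTER, REGIONS)
print(f"{BAD_POINTER:#x} is in {region}")
```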