thr3ads.net - Xen devel - [Xen-devel] Crash on boot with 2.6.37-rc8-git3 [Jan 2011]

M A Young

2011-Jan-04 22:01 UTC

[Xen-devel] Crash on boot with 2.6.37-rc8-git3

The latest Fedora based 2.6.37 kernels have stopped booting for me under
xen. They stopped working around -rc7 but I think the trigger is that
various debug options were turned off. My hardware won't let me get serial
output, so I have tried booting it within kvm, and got the attached output
- the behaviour was similar to bare metal, though I don't see enough to
know if it is exactly the same crash. The kernel used has no additional
xen patches, though I am seeing similar behaviour for kernels with patches
from xen-next-2.6.37. The crash looks like it is something to do with irq.

 	Michael Young

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Jan-05 15:43 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, Jan 04, 2011 at 10:01:56PM +0000, M A Young wrote:
> The latest Fedora based 2.6.37 kernels have stopped booting for me
> under xen. They stopped working around -rc7 but I think the trigger
> is that various debug options were turned off. My hardware won't let
> me get serial output, so I have tried booting it within kvm, and got
> the attached output - the behaviour was similar to bare metal,
> though I don't see enough to know if it is exactly the same crash.
> The kernel used has no additional xen patches, though I am seeing
> similar behaviour for kernels with patches from xen-next-2.6.37. The
> crash looks like it is something to do with irq.

Ahh, I hit this. Can you try 'stable/bug-fixes' branch of mine?
It has "xen/irq: Don't fall over when nr_irqs_gsi > nr_irqs." patch
which will fix the below problem you are seeing.

But I am not sure if it fixes the problem you are having with hardware?

(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)

> [    0.008220] ------------[ cut here ]------------
> [    0.008999] WARNING: at drivers/xen/events.c:432 find_unbound_irq+0x88/0x9f()
> [    0.008999] Hardware name: Bochs
> [    0.008999] Modules linked in:
> [    0.008999] Pid: 1, comm: swapper Not tainted 2.6.37-0.rc8.git3.1.fc15.x86_64 #1
> [    0.008999] Call Trace:
> [    0.008999]  [<ffffffff810505d7>] warn_slowpath_common+0x85/0x9d
> [    0.008999]  [<ffffffff81050609>] warn_slowpath_null+0x1a/0x1c
> [    0.008999]  [<ffffffff812abfea>] find_unbound_irq+0x88/0x9f
> [    0.008999]  [<ffffffff812ac90e>] bind_ipi_to_irqhandler+0x64/0x153
> [    0.008999]  [<ffffffff81007979>] ? xen_reschedule_interrupt+0x0/0x18
> [    0.008999]  [<ffffffff81234511>] ? kasprintf+0x38/0x3b
> [    0.008999]  [<ffffffff81007b92>] xen_smp_intr_init+0x46/0x1f3
> [    0.008999]  [<ffffffff81b5839a>] xen_smp_prepare_cpus+0x3d/0x107
> [    0.008999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.008999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.008999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.008999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.008999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> [    0.008999] ---[ end trace a7919e7f17c0a725 ]---
> [    0.008999] ------------[ cut here ]------------
> [    0.008999] WARNING: at kernel/irq/manage.c:904 __free_irq+0xa3/0x1ab()
> [    0.008999] Hardware name: Bochs
> [    0.008999] Trying to free already-free IRQ 0
> [    0.008999] Modules linked in:
> [    0.008999] Pid: 1, comm: swapper Tainted: G        W   2.6.37-0.rc8.git3.1.fc15.x86_64 #1
> [    0.008999] Call Trace:
> [    0.008999]  [<ffffffff810505d7>] warn_slowpath_common+0x85/0x9d
> [    0.008999]  [<ffffffff81050692>] warn_slowpath_fmt+0x46/0x48
> [    0.008999]  [<ffffffff8107d246>] ? arch_local_irq_save+0x18/0x1e
> [    0.008999]  [<ffffffff810ac901>] __free_irq+0xa3/0x1ab
> [    0.008999]  [<ffffffff810aca41>] free_irq+0x38/0x50
> [    0.008999]  [<ffffffff812abead>] unbind_from_irqhandler+0x15/0x20
> [    0.008999]  [<ffffffff81007cce>] xen_smp_intr_init+0x182/0x1f3
> [    0.008999]  [<ffffffff81b5839a>] xen_smp_prepare_cpus+0x3d/0x107
> [    0.008999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.008999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.008999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.008999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.008999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> [    0.008999] ---[ end trace a7919e7f17c0a726 ]---
> [    0.008999] ------------[ cut here ]------------
> [    0.008999] WARNING: at kernel/irq/manage.c:904 __free_irq+0xa3/0x1ab()
> [    0.008999] Hardware name: Bochs
> [    0.008999] Trying to free already-free IRQ 0
> [    0.008999] Modules linked in:
> [    0.008999] Pid: 1, comm: swapper Tainted: G        W   2.6.37-0.rc8.git3.1.fc15.x86_64 #1
> [    0.008999] Call Trace:
> [    0.008999]  [<ffffffff810505d7>] warn_slowpath_common+0x85/0x9d
> [    0.008999]  [<ffffffff81050692>] warn_slowpath_fmt+0x46/0x48
> [    0.008999]  [<ffffffff8107d246>] ? arch_local_irq_save+0x18/0x1e
> [    0.008999]  [<ffffffff810ac901>] __free_irq+0xa3/0x1ab
> [    0.008999]  [<ffffffff810aca41>] free_irq+0x38/0x50
> [    0.008999]  [<ffffffff812abead>] unbind_from_irqhandler+0x15/0x20
> [    0.008999]  [<ffffffff81007cf0>] xen_smp_intr_init+0x1a4/0x1f3
> [    0.008999]  [<ffffffff81b5839a>] xen_smp_prepare_cpus+0x3d/0x107
> [    0.008999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.008999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.008999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.008999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.008999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> [    0.008999] ---[ end trace a7919e7f17c0a727 ]---
> [    0.008999] ------------[ cut here ]------------
> [    0.008999] WARNING: at kernel/irq/manage.c:904 __free_irq+0xa3/0x1ab()
> [    0.008999] Hardware name: Bochs
> [    0.008999] Trying to free already-free IRQ 0
> [    0.008999] Modules linked in:
> [    0.008999] Pid: 1, comm: swapper Tainted: G        W   2.6.37-0.rc8.git3.1.fc15.x86_64 #1
> [    0.008999] Call Trace:
> [    0.008999]  [<ffffffff810505d7>] warn_slowpath_common+0x85/0x9d
> [    0.008999]  [<ffffffff81050692>] warn_slowpath_fmt+0x46/0x48
> [    0.008999]  [<ffffffff8107d246>] ? arch_local_irq_save+0x18/0x1e
> [    0.008999]  [<ffffffff810ac901>] __free_irq+0xa3/0x1ab
> [    0.008999]  [<ffffffff810aca41>] free_irq+0x38/0x50
> [    0.008999]  [<ffffffff812abead>] unbind_from_irqhandler+0x15/0x20
> [    0.008999]  [<ffffffff81007d34>] xen_smp_intr_init+0x1e8/0x1f3
> [    0.008999]  [<ffffffff81b5839a>] xen_smp_prepare_cpus+0x3d/0x107
> [    0.008999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.008999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.008999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.008999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.008999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> [    0.008999] ---[ end trace a7919e7f17c0a728 ]---
> [    0.009018] ------------[ cut here ]------------
> [    0.009999] kernel BUG at arch/x86/xen/smp.c:217!
> [    0.009999] invalid opcode: 0000 [#1] SMP 
> [    0.009999] last sysfs file: 
> [    0.009999] CPU 0 
> [    0.009999] Modules linked in:
> [    0.009999] 
> [    0.009999] Pid: 1, comm: swapper Tainted: G        W   2.6.37-0.rc8.git3.1.fc15.x86_64 #1 /Bochs
> [    0.009999] RIP: e030:[<ffffffff81b5839e>]  [<ffffffff81b5839e>] xen_smp_prepare_cpus+0x41/0x107
> [    0.009999] RSP: e02b:ffff880033841eb0  EFLAGS: 00010286
> [    0.009999] RAX: 00000000ffffffff RBX: ffffffff81c1c7b0 RCX: 0000000000000100
> [    0.009999] RDX: ffff88003a410000 RSI: 0000000000000000 RDI: ffffffff81d64d50
> [    0.009999] RBP: ffff880033841ed0 R08: 0000000000000002 R09: 00000000fffffffe
> [    0.009999] R10: ffff880033841e50 R11: 0000000000000000 R12: 0000000000000100
> [    0.009999] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    0.009999] FS:  0000000000000000(0000) GS:ffff88003b063000(0000) knlGS:0000000000000000
> [    0.009999] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    0.009999] CR2: 0000000000000000 CR3: 0000000001a03000 CR4: 0000000000000660
> [    0.009999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.009999] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.009999] Process swapper (pid: 1, threadinfo ffff880033840000, task ffff880033838000)
> [    0.009999] Stack:
> [    0.009999]  ffff880033838000 ffffffff81c1c7b0 0000000000000000 0000000000000000
> [    0.009999]  ffff880033841f40 ffffffff81b53cf3 0000000000000001 0000000000000000
> [    0.009999]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.009999] Call Trace:
> [    0.009999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.009999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.009999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.009999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.009999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> [    0.009999] Code: ff 48 8b 15 25 b9 fd ff 31 ff 48 c7 c0 00 36 01 00 66 c7 84 10 c0 00 00 00 01 00 e8 3c 76 91 ff 31 ff e8 b2 f7 4a ff 85 c0 74 02 <0f> 0b 31 ff e8 a9 f5 4a ff 48 c7 c2 00 20 c3 81 b9 08 00 00 00
> [    0.009999] RIP  [<ffffffff81b5839e>] xen_smp_prepare_cpus+0x41/0x107
> [    0.009999]  RSP <ffff880033841eb0>
> [    0.009999] ---[ end trace a7919e7f17c0a729 ]---
> [    0.010021] Kernel panic - not syncing: Attempted to kill init!
> [    0.010999] Pid: 1, comm: swapper Tainted: G      D W   2.6.37-0.rc8.git3.1.fc15.x86_64 #1
> [    0.010999] Call Trace:
> [    0.010999]  [<ffffffff814759d5>] panic+0x91/0x1a4
> [    0.010999]  [<ffffffff810d6093>] ? perf_event_exit_task+0xb8/0x1c7
> [    0.010999]  [<ffffffff81053b89>] do_exit+0x7c/0x75d
> [    0.010999]  [<ffffffff8107d21f>] ? arch_local_irq_restore+0xb/0xd
> [    0.010999]  [<ffffffff8147795f>] ? _raw_spin_unlock_irqrestore+0x17/0x19
> [    0.010999]  [<ffffffff8100022a>] ? _stext+0x9a/0xe70
> [    0.010999]  [<ffffffff81478c8b>] oops_end+0xbf/0xc7
> [    0.010999]  [<ffffffff8100022a>] ? _stext+0x9a/0xe70
> [    0.010999]  [<ffffffff8100022a>] ? _stext+0x9a/0xe70
> [    0.010999]  [<ffffffff8100e6ec>] die+0x5a/0x66
> [    0.010999]  [<ffffffff81478518>] do_trap+0x121/0x130
> [    0.010999]  [<ffffffff8100c06d>] do_invalid_op+0x98/0xa1
> [    0.010999]  [<ffffffff81b5839e>] ? xen_smp_prepare_cpus+0x41/0x107
> [    0.010999]  [<ffffffff8107d246>] ? arch_local_irq_save+0x18/0x1e
> [    0.010999]  [<ffffffff8107d21f>] ? arch_local_irq_restore+0xb/0xd
> [    0.010999]  [<ffffffff8147795f>] ? _raw_spin_unlock_irqrestore+0x17/0x19
> [    0.010999]  [<ffffffff810ac90d>] ? __free_irq+0xaf/0x1ab
> [    0.010999]  [<ffffffff8100b95b>] invalid_op+0x1b/0x20
> [    0.010999]  [<ffffffff81b5839e>] ? xen_smp_prepare_cpus+0x41/0x107
> [    0.010999]  [<ffffffff81b53cf3>] kernel_init+0x92/0x2b6
> [    0.010999]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
> [    0.010999]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.010999]  [<ffffffff81477edd>] ? retint_restore_args+0x5/0x6
> [    0.010999]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.

M A Young

2011-Jan-05 23:11 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Wed, 5 Jan 2011, Konrad Rzeszutek Wilk wrote:
> Ahh, I hit this. Can you try 'stable/bug-fixes' branch of mine?
> It has "xen/irq: Don't fall over when nr_irqs_gsi > nr_irqs." patch
> which will fix the below problem you are seeing.
>
> But I am not sure if it fixes the problem you are having with hardware?

That fixes the kvm boot, but unfortunately booting directly on the
hardware doesn't. Incidentally it is definitely turning debug options off
that trigger the crash, as I realized I was building a kernel-debug
package as well as a kernel package from the same source RPM, and it boots
with the debug kernel but not the ordinary kernel.

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-06 14:56 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Wed, Jan 05, 2011 at 11:11:03PM +0000, M A Young wrote:
> On Wed, 5 Jan 2011, Konrad Rzeszutek Wilk wrote:
>
> >Ahh, I hit this. Can you try 'stable/bug-fixes' branch of mine?
> >It has "xen/irq: Don't fall over when nr_irqs_gsi > nr_irqs." patch
> >which will fix the below problem you are seeing.
> >
> >But I am not sure if it fixes the problem you are having with hardware?
>
> That fixes the kvm boot, but unfortunately booting directly on the
> hardware doesn't. Incidentally it is definitely turning debug
> options off that trigger the crash, as I realized I was building a
> kernel-debug package as well as a kernel package from the same
> source RPM, and it boots with the debug kernel but not the ordinary
> kernel.
>
> 	Michael Young

Ok, I think we need a serial output. I don't remember if you said that
your docking station has a serial port or not.

If the docking station does not, this card ought to do the trick:

http://www.newegg.com/Product/Product.aspx?Item=N82E16839328018&Tpk=SDEXP15005

You can use it under Xen as a normal PCI type serial card. For details:

http://wiki.xensource.com/xenwiki/XenSerialConsole

M A Young

2011-Jan-07 00:37 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Thu, 6 Jan 2011, Konrad Rzeszutek Wilk wrote:
> Ok, I think we need a serial output. I don't remember if you said that
> your docking station has a serial port or not.

I don't have any good way of getting a serial port on this computer. I
have however managed to get output on the screen and have a poor quality
photo. The relevant lines look like
BUG unable to handle kernel NULL pointer dereference at
IP: [<ffffffff81b69b92>] setup_node_bootmem+0x16b/0x199

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-07 19:18 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Fri, Jan 07, 2011 at 12:37:36AM +0000, M A Young wrote:
> On Thu, 6 Jan 2011, Konrad Rzeszutek Wilk wrote:
>
> >Ok, I think we need a serial output. I don't remember if you said that
> >your docking station has a serial port or not.
>
> I don't have any good way of getting a serial port on this computer.
> I have however managed to get output on the screen and have a poor
> quality photo. The relevant lines look like
> BUG unable to handle kernel NULL pointer dereference at
> IP: [<ffffffff81b69b92>] setup_node_bootmem+0x16b/0x199

Hmmm, I did see something similar to this in 2.6.37-rc1, but we fixed
that quickly. It was triggered by having 4GB of memory or so and
the work-around was to use dom0_mem=max:2GB.

Can you send the photo? Maybe the caller stack will shed some light.

M A Young

2011-Jan-07 20:34 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

[This email is either empty or too large to be displayed at this time]

Konrad Rzeszutek Wilk

2011-Jan-07 21:23 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Fri, Jan 07, 2011 at 08:34:43PM +0000, M A Young wrote:
> On Fri, 7 Jan 2011, Konrad Rzeszutek Wilk wrote:
> >>BUG unable to handle kernel NULL pointer dereference at
> >>IP: [<ffffffff81b69b92>] setup_node_bootmem+0x16b/0x199
> 
> >Hmmm, I did see something similar to this in 2.6.37-rc1, but we fixed
> >that quickly. It was triggered by having 4GB of memory or so and
> >the work-around was to use dom0_mem=max:2GB.
> >
> >Can you send the photo? Maybe the caller stack will shed some light.
> 
> Here are two photos of the output at different times. The context is
> 
>    0xffffffff81b69b6d <setup_node_bootmem+326>:	callq  0xffffffff81475ec9 <printk>
>    0xffffffff81b69b72 <setup_node_bootmem+331>:	movslq %ebx,%rdx
>    0xffffffff81b69b75 <setup_node_bootmem+334>:	xor    %eax,%eax
>    0xffffffff81b69b77 <setup_node_bootmem+336>:	mov    $0x4fc0,%ecx
>    0xffffffff81b69b7c <setup_node_bootmem+341>:	mov    -0x7e4cb750(,%rdx,8),%rsi
>    0xffffffff81b69b84 <setup_node_bootmem+349>:	shr    $0xc,%r13
>    0xffffffff81b69b88 <setup_node_bootmem+353>:	shr    $0xc,%r12
>    0xffffffff81b69b8c <setup_node_bootmem+357>:	sub    %r13,%r12
>    0xffffffff81b69b8f <setup_node_bootmem+360>:	mov    %rsi,%rdi
>    0xffffffff81b69b92 <setup_node_bootmem+363>:	rep stos %eax,%es:(%rdi)
That looks like:

	memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));

From the photo, %eax is zero, and this is perfect code for copying values in.
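As a rough cross-check of that reading (a sketch, not taken from the kernel source): `rep stos %eax,%es:(%rdi)` stores 4 bytes per iteration and repeats %ecx times, so the 0x4fc0 loaded into %ecx gives the number of bytes being cleared. The variable names are illustrative, and treating the -0x7e4cb750 displacement as the base of a per-node pointer table is an assumption.

```python
# Decode the operands seen in the disassembly quoted above.
DWORD = 4            # rep stos with %eax stores 4 bytes per iteration
PAGE_SIZE = 4096     # x86 base page size

count = 0x4fc0                                  # value loaded into %ecx
bytes_cleared = count * DWORD                   # size of the zeroed structure
pages_needed = -(-bytes_cleared // PAGE_SIZE)   # round up to whole pages

# The displacement is a sign-extended 32-bit value; viewed as a 64-bit address:
table_base = (-0x7e4cb750) & 0xFFFFFFFFFFFFFFFF

print(f"bytes cleared: {bytes_cleared} ({bytes_cleared:#x})")
print(f"pages needed:  {pages_needed}")
print(f"table base:    {table_base:#x}")
```

The 81664 bytes round up to 20 pages, which agrees with the 20-page NODE_DATA range discussed later in the thread.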
>    0xffffffff81b69b94 <setup_node_bootmem+365>:	mov    %ebx,%edi
>    0xffffffff81b69b96 <setup_node_bootmem+367>:	mov    -0x7e4cb750(,%rdx,8),%rax
> 
> which is somewhere around line 224 in arch/x86/mm/numa_64.c
> 
>         if (nid != nodeid)
>                 printk(KERN_INFO "    NODE_DATA(%d) on node %d\n",
>                        nodeid, nid);
Can you make sure that 419db274bed4269f475a8e78cbe9c917192cfe8b is in? That
is the patch that fixed this issue last time.

However, the more I look at the code, the less it seems to be that issue, and
that is the last fix in that file.

Do you see any messages about 'Cannot find 20 bytes in node X' (where X
I think is 0)?


M A Young

2011-Jan-08 00:10 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Fri, 7 Jan 2011, Konrad Rzeszutek Wilk wrote:
> Can you make sure that 419db274bed4269f475a8e78cbe9c917192cfe8b is in? That
> is the patch that fixed this issue last time.
Yes it is.
> Do you see any messages about 'Cannot find 20 bytes in node X' (where X
> I think is 0)?
I haven't spotted any such message.

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-10 18:42 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

> >Do you see any messages about 'Cannot find 20 bytes in node X' (where X
> >I think is 0)?
> 
> I haven't spotted any such message.
Try fiddling with the dom0_mem.. to see at what point it starts failing. Is
this happening only on this machine or do you see it on other boxes too?

Your E820 looks as so:
 BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000df66d800 (usable)
 BIOS-e820: 00000000df66d800 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
 BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)

Which looks completely normal.. I am really at a loss here. You could
also sprinkle printk's around that code (or xen_raw_printk and inhibit
the Linux kernel console output - that way you would only see the Xen
output and the output from xen_raw_printk).

Let me bootup 2.6.37 on a 4GB machine just to see if I am seeing this.

M A Young

2011-Jan-10 21:43 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 10 Jan 2011, Konrad Rzeszutek Wilk wrote:
> Try fiddling with the dom0_mem.. to see at what point it starts failing. Is
> this happening only on this machine or do you see it on other boxes too?
dom0_mem=max:3574MB boots, dom0_mem=max:3575MB doesn't. I haven't tried it
on other boxes yet.
> Which looks completely normal.. I am really at loss here. You could
> also sprinkle printk's around that code (or xen_raw_printk and inhibit
> the Linux kernel console output - that way you would only see the Xen
> and output from xen_raw_printk).
I will think about where the printk's should go, but probably not
tonight.
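Finding such a threshold by hand amounts to a bisection over dom0_mem values. A minimal sketch, assuming a hypothetical `boots(mb)` check (in practice a manual reboot with dom0_mem=max:<mb>MB) and assuming that every value above some cutoff fails to boot:

```python
def last_booting_value(boots, lo, hi):
    """Return the largest value in [lo, hi) for which boots(value) is true.

    Assumes boots(lo) is true, boots(hi) is false, and that the result
    flips from true to false exactly once as the value grows.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(mid):
            lo = mid      # mid still boots, so the cutoff is at or above it
        else:
            hi = mid      # mid fails, so the cutoff is below it
    return lo


# Stand-in for the manual reboot test, using the result reported above.
reported = lambda mb: mb <= 3574
print(last_booting_value(reported, 2048, 4096))
```

Starting from a 2GB to 4GB window this needs eleven boots to reach 1MB resolution.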

 	Michael Young

M A Young

2011-Jan-16 20:48 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 10 Jan 2011, Konrad Rzeszutek Wilk wrote:
> Your E820 looks as so:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000df66d800 (usable)
> BIOS-e820: 00000000df66d800 - 00000000e0000000 (reserved)
> BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
> BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
> BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
> BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
> BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
>
> Which looks completely normal.. I am really at loss here.

I have looked at this again and I am worried by the last section, which is
a chunk from 4GB to 4.5GB. The problem is that I only have 4GB. My tests
show that dom0_mem=max:3574MB boots, dom0_mem=max:3575MB doesn't. The
first two "usable" chunks add up to a few KB over 3574MB so the problems
come when it tries to use the final "usable" chunk which I interpret as
being beyond the memory I have.

3574MB is a bit less than 3.5GB so I would guess that the final chunk is
trying to make up the memory to 4GB. There are also gaps in these memory
pieces which add up to about 445MB. Hence I think there are some issues
with the memory allocation mechanism.
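The arithmetic can be checked directly from the E820 list quoted above. This is only a sketch: the variable names are illustrative and "MB" is taken to mean 2^20 bytes.

```python
MIB = 1 << 20

# (start, end, type) entries copied from the BIOS-e820 list quoted above
E820 = [
    (0x0000000000000000, 0x000000000009f000, "usable"),
    (0x000000000009f000, 0x00000000000a0000, "reserved"),
    (0x0000000000100000, 0x00000000df66d800, "usable"),
    (0x00000000df66d800, 0x00000000e0000000, "reserved"),
    (0x00000000f8000000, 0x00000000fc000000, "reserved"),
    (0x00000000fec00000, 0x00000000fec10000, "reserved"),
    (0x00000000fed18000, 0x00000000fed1c000, "reserved"),
    (0x00000000fed20000, 0x00000000fed90000, "reserved"),
    (0x00000000feda0000, 0x00000000feda6000, "reserved"),
    (0x00000000fee00000, 0x00000000fee10000, "reserved"),
    (0x00000000ffe00000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x0000000120000000, "usable"),
]

usable = [(start, end) for start, end, kind in E820 if kind == "usable"]
first_two = sum(end - start for start, end in usable[:2])

# Address ranges that no entry describes (entries are sorted and disjoint)
gaps = sum(nxt[0] - cur[1] for cur, nxt in zip(E820, E820[1:]))

print(f"first two usable chunks: {first_two // MIB} MB "
      f"+ {(first_two % MIB) // 1024} KB")
print(f"gaps between entries:    {gaps / MIB:.1f} MB")
```

This works out to 3574 MB plus 50 KB for the first two usable chunks and a little under 446 MB of gaps, consistent with the figures above.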

 	Michael Young

Keir Fraser

2011-Jan-16 20:56 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On 16/01/2011 20:48, "M A Young" <m.a.young@durham.ac.uk> wrote:
>> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
>> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
>> BIOS-e820: 0000000000100000 - 00000000df66d800 (usable)
>> BIOS-e820: 00000000df66d800 - 00000000e0000000 (reserved)
>> BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
>> BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
>> BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
>> BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
>> BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
>> BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
>> BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
>> BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
>> 
>> Which looks completely normal.. I am really at loss here.
> 
> I have looked at this again and I am worried by the last section, which is
> a chunk from 4GB to 4.5GB. The problem is that I only have 4GB. My tests
> show that dom0_mem=max:3574MB boots, dom0_mem=max:3575MB doesn't. The
> first two "usable" chunks add up to a few KB over 3574MB so the problems
> come when it tries to use the final "usable" chunk which I interpret as
> being beyond the memory I have.
> 
> 3574MB is a bit less than 3.5GB so I would guess that the final chunk is
> trying to make up the memory to 4GB. There are also gaps in these memory
> pieces which add up to about 445MB. Hence I think there are some issues
> with the memory allocation mechanism.

Device memory gets mapped just below 4GB, so the last piece of your RAM gets
re-mapped above 4GB by your BIOS, so that it can still be accessed. If you
add up the size of all the usable regions in the list above, it will sum to
a bit less than 4GB.

The bug will be something in the kernel code that can''t handle physical
addresses wider than 32 bits (i.e., physical addresses 4GB and above).

 -- Keir
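
A quick check of both points, as a sketch only: the three usable ranges are copied from the list above, and the 32-bit mask merely illustrates what truncation would do to an address at or above 4GB. It does not identify where, or whether, such truncation happens in the kernel.

```python
GIB = 1 << 30
MIB = 1 << 20

# Usable ranges (start, end) from the E820 list quoted above
usable = [
    (0x0000000000000000, 0x000000000009f000),
    (0x0000000000100000, 0x00000000df66d800),
    (0x0000000100000000, 0x0000000120000000),
]

total = sum(end - start for start, end in usable)
shortfall = 4 * GIB - total

high_start = usable[-1][0]
truncated = high_start & 0xFFFFFFFF    # what a 32-bit variable would hold

print(f"usable RAM: {total / MIB:.1f} MB "
      f"({shortfall / MIB:.1f} MB short of 4GB)")
print(f"{high_start:#x} needs {high_start.bit_length()} bits; "
      f"low 32 bits = {truncated:#x}")
```

The usable regions sum to roughly 10 MB under 4GB, and the start of the remapped region needs 33 bits, so any 32-bit holder would see it as address 0.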



M A Young

2011-Jan-18 00:52 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 10 Jan 2011, Konrad Rzeszutek Wilk wrote:
>>> Do you see any messages about 'Cannot find 20 bytes in node X' (where X
>>> I think is 0)?
>>
>> I haven't spotted any such message.
>
> Try fiddling with the dom0_mem.. to see at what point it starts failing. Is
> this happening only on this machine or do you see it on other boxes too?
>
> Your E820 looks as so:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000df66d800 (usable)
> BIOS-e820: 00000000df66d800 - 00000000e0000000 (reserved)
> BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
> BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
> BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
> BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
> BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
>
> Which looks completely normal.. I am really at loss here. You could
> also sprinkle printk's around that code (or xen_raw_printk and inhibit
> the Linux kernel console output - that way you would only see the Xen
> and output from xen_raw_printk).
>
> Let me bootup 2.6.37 on a 4GB machine just to see if I am seeing this.

My next theory is that the issue is an alignment issue.
The NODE DATA is put in the range 00000000df659800 to 00000000df66d7ff
(the top end of the second "usable" chunk) and the problem comes when it
tries to write to the final 2K piece (00000000df66d000 to
00000000df66d800 - 00000000df66d000 occurs on the stack) which hasn't been
initialized properly because it isn't a 4K piece.
Does this sound plausible?
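
The layout behind this theory can be worked out from the addresses alone. A sketch, assuming 4K pages and using illustrative names:

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

node_data_start = 0xdf659800
node_data_end = 0xdf66d7ff        # inclusive
usable_end = 0xdf66d800           # end of the second "usable" E820 chunk

size = node_data_end - node_data_start + 1
first_pfn = node_data_start >> PAGE_SHIFT
last_pfn = node_data_end >> PAGE_SHIFT
pages_touched = last_pfn - first_pfn + 1

# How much of the last page frame lies inside the usable chunk
last_page_start = last_pfn << PAGE_SHIFT
usable_in_last_page = usable_end - last_page_start

print(f"size: {size:#x} bytes = {size // PAGE_SIZE} pages")
print(f"PFNs {first_pfn:#x}..{last_pfn:#x} ({pages_touched} frames touched)")
print(f"usable bytes in last frame: {usable_in_last_page}")
```

The range is 20 pages long but, because it starts halfway into a page, it touches 21 page frames, and only the first 2K of the last frame (df66d) lies inside the usable chunk.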

 	Michael Young

M A Young

2011-Jan-19 22:54 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 18 Jan 2011, M A Young wrote:
> My next theory is that the issue is an alignment issue.
> The NODE DATA is put in the range 00000000df659800 to 00000000df66d7ff (the
> top end of the second "usable" chunk) and the problem comes when it tries to
> write to the final 2K piece (00000000df66d000 to 00000000df66d800 -
> 00000000df66d000 occurs on the stack) which hasn't been initialized properly
> because it isn't a 4K piece.
> Does this sound plausible?

Further experiments confirm that it is this 2K piece causing the problem -
if I reserve the 2K chunk in the same way that NODE DATA is reserved
(though without zeroing it) the system boots, if I reduce this to
reserving only 1K then it doesn't.

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-20 19:24 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Wed, Jan 19, 2011 at 10:54:00PM +0000, M A Young wrote:
> On Tue, 18 Jan 2011, M A Young wrote:
> 
> >My next theory is that the issue is an
> >alignment issue. The NODE DATA is put in the range
> >00000000df659800 to 00000000df66d7ff (the top end of the second
> >"usable" chunk) and the problem comes when it tries to write to the
> >final 2K piece (00000000df66d000 to 00000000df66d800 -
> >00000000df66d000 occurs on the stack) which hasn't been
> >initialized properly because it isn't a 4K piece.
> >Does this sound plausible?
> 
> Further experiments confirm that it is this 2K piece causing the
> problem - if I reserve the 2K chunk in the same way that NODE DATA
> is reserved (though without zeroing it) the system boots, if I
> reduce this to reserving only 1K then it doesn't.

I think my math is off here. The reserve call is made on the
df659800 -> df66d7ff, that would be 20 pages of data. The last
PFN df66d is where it dies b/c there is no PTE entry set for it?

What happens if you fudge the code so it allocates those pages to be
page aligned. So df65a000->df66e000 ? We skip this way the region
df659800->df659fff and start on a new PFN (and pte).
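
For reference, the two obvious ways of page-aligning that range, as a sketch with hypothetical helper names (`align_down`, `align_up`) rather than the kernel's own macros:

```python
PAGE_SIZE = 4096


def align_down(addr, align=PAGE_SIZE):
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


def align_up(addr, align=PAGE_SIZE):
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


start = 0xdf659800
size = 0xdf66d7ff - start + 1     # 0x14000, i.e. 20 pages
usable_end = 0xdf66d800           # end of the usable E820 chunk

up = align_up(start)
down = align_down(start)
print(f"aligned up:   {up:#x} -> {up + size:#x}")
print(f"aligned down: {down:#x} -> {down + size - 1:#x}")
print(f"aligned-up range ends {up + size - usable_end:#x} bytes "
      f"past the usable chunk")
```

Rounding up gives the df65a000->df66e000 range mentioned above, which would end 2K beyond the usable chunk's end at df66d800; rounding down gives df659000->df66cfff.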


M A Young

2011-Jan-20 22:39 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Thu, 20 Jan 2011, Konrad Rzeszutek Wilk wrote:
> I think my math is off here. The reserve call is made on the
> df659800 -> df66d7ff, that would be 20 pages of data. The last
> PFN df66d is where it dies b/c there is no PTE entry set for it?
>
> What happens if you fudge the code so it allocates those pages to be
> page aligned. So df65a000->df66e000 ? We skip this way the region
> df659800->df659fff and start on a new PFN (and pte).
I get (though the photo isn't clear in places) df659000->df66cfff and it
crashes at find_range_array+0x4d/0x56 which traces back to the
call of memblock_find_dma_reserve from setup_arch in
arch/x86/kernel/setup.c. So it still crashes, but at a slightly later
stage.

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-21 15:27 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Thu, Jan 20, 2011 at 10:39:17PM +0000, M A Young wrote:
> On Thu, 20 Jan 2011, Konrad Rzeszutek Wilk wrote:
> 
> >I think my math is off here. The reserve call is made on the
> >df659800 -> df66d7ff, that would be 20 pages of data. The last
> >PFN df66d is where it dies b/c there is no PTE entry set for it?
> >
> >What happens if you fudge the code so it allocates those pages to be
> >page aligned. So df65a000->df66e000 ? We skip this way the region
> >df659800->df659fff and start on a new PFN (and pte).
> 
> I get (though the photo isn't clear in places) df659000->df66cfff
> and it crashes at find_range_array+0x4d/0x56 which traces back to
> the call of memblock_find_dma_reserve from setup_arch in
> arch/x86/kernel/setup.c . So it still crashes, but at a slightly
> later stage.

Ok, so we just pass the buck, so to say, to the next user of that PFN.

We should find out why that PTE is not being set up... And I think
this might be a missing entry in the MFN (thanks to Stefan Bader
finding a bug there).  Looking at your E820:

[    0.000000]  Xen: 0000000000100000 - 000000003b0e2000 (usable)

Your memory ends at 3b0e, which is not on a nice page boundary.
Can you try this patch (you will need to re-jigger it, as in 2.6.38-rc1
the p2m code moved out of xen/mmu.c to xen/p2m.c):

https://patchwork.kernel.org/patch/492011/

BTW, you are doing great detective work here. Thanks for
being willing to dig into this.

M A Young

2011-Jan-21 21:43 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Fri, 21 Jan 2011, Konrad Rzeszutek Wilk wrote:
> We should find out why that PTE is not being setup.... And I think
> this might be a missing entry in the MFN (thanks to Stefan Bader
> finding a bug there).  Looking at your E820:
>
> [    0.000000]  Xen: 0000000000100000 - 000000003b0e2000 (usable)
Mine is
[    0.000000]  Xen: 0000000000100000 - 00000000df66d800 (usable)
> Your memory ends a 3b0e, which is not on a nice page boundary.
Mine isn't on a page boundary at all!
> Can you try this patch (you will need to re-gigger as in 2.6.38-rc1
> the p2m code moved out of xen/mmu.c to xen/p2m.c):
It doesn't help, and crashes at the same place as the unaltered kernel.
My problem may not be happening in the xen code at all. From the boot logs of
one of my hack attempts that actually booted I have

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 000000000009f000 (usable)
[    0.000000]  Xen: 000000000009f000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 00000000df66d800 (usable)
[    0.000000]  Xen: 00000000df66d800 - 00000000e0000000 (reserved)
[    0.000000]  Xen: 00000000f8000000 - 00000000fc000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
[    0.000000]  Xen: 00000000fed20000 - 00000000fed90000 (reserved)
[    0.000000]  Xen: 00000000feda0000 - 00000000feda6000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 00000001342cb000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1342cb max_arch_pfn = 0x400000000
[    0.000000] last_pfn = 0xdf66d max_arch_pfn = 0x400000000
[    0.000000] init_memory_mapping: 0000000000000000-00000000df66d000
[    0.000000] init_memory_mapping: 0000000100000000-00000001342cb000

The last_pfn figure above is actually one more than the last pfn that is
initialized and is obtained by right-shifting the start memory address
plus the length of the memory piece. That is fine if the memory ends on a
page boundary, but not if it doesn't because the partial page doesn't get
a pfn. Thus it is available for early allocations such as the NODE DATA
chunk. Xen goes for the memory chunk just below the 4GB mark and hits this
region; bare metal (2.6.35) starts the NODE DATA at the 4GB mark and
doesn't.

I am not sure if bare metal is clever enough not to try to use this
partial page, or whether it could but misses it because of how it places
the NODE_DATA (at the bottom end of a memory region rather than the top
end).
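
The arithmetic described above can be sketched as follows. This is an illustration only, assuming 4 KiB pages; region_end_pfn and region_partial_tail are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* (start + length) >> PAGE_SHIFT: one past the last fully covered page frame. */
static uint64_t region_end_pfn(uint64_t addr, uint64_t size)
{
	assert(addr + size >= addr);     /* no wrap-around */
	return (addr + size) >> PAGE_SHIFT;
}

/* Bytes of the region lying in a trailing partial page (0 if the end is aligned). */
static uint64_t region_partial_tail(uint64_t addr, uint64_t size)
{
	assert(addr + size >= addr);
	return (addr + size) & (PAGE_SIZE - 1);
}
```

For the region 0x100000 - 0xdf66d800 from the log, region_end_pfn gives 0xdf66d, matching the second last_pfn line, and region_partial_tail gives 0x800 (2048 bytes) sitting in a page that never gets a pfn. For the region ending at 0x1342cb000 the tail is 0.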

 	Michael Young

Konrad Rzeszutek Wilk

2011-Jan-24 14:14 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Fri, Jan 21, 2011 at 09:43:34PM +0000, M A Young wrote:
> On Fri, 21 Jan 2011, Konrad Rzeszutek Wilk wrote:
> 
> >We should find out why that PTE is not being setup.... And I think
> >this might be a missing entry in the MFN (thanks to Stefan Bader
> >finding a bug there).  Looking at your E820:
> >
> >[    0.000000]  Xen: 0000000000100000 - 000000003b0e2000 (usable)
> 
> Mine is
> [    0.000000]  Xen: 0000000000100000 - 00000000df66d800 (usable)
> 
> >Your memory ends a 3b0e, which is not on a nice page boundary.
> 
> Mine isn't on a page boundary at all!
Whoa.
> 
> >Can you try this patch (you will need to re-gigger as in 2.6.38-rc1
> >the p2m code moved out of xen/mmu.c to xen/p2m.c):
> 
> It doesn't help, and crashes at the same place as the unaltered
> kernel. My problem may not be happening in the xen code at all. From
> the boot logs of one of my hack attempts that actually booted I have
> 
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  Xen: 0000000000000000 - 000000000009f000 (usable)
> [    0.000000]  Xen: 000000000009f000 - 0000000000100000 (reserved)
> [    0.000000]  Xen: 0000000000100000 - 00000000df66d800 (usable)
> [    0.000000]  Xen: 00000000df66d800 - 00000000e0000000 (reserved)
> [    0.000000]  Xen: 00000000f8000000 - 00000000fc000000 (reserved)
> [    0.000000]  Xen: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  Xen: 00000000fed18000 - 00000000fed1c000 (reserved)
> [    0.000000]  Xen: 00000000fed20000 - 00000000fed90000 (reserved)
> [    0.000000]  Xen: 00000000feda0000 - 00000000feda6000 (reserved)
> [    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
> [    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
> [    0.000000]  Xen: 0000000100000000 - 00000001342cb000 (usable)
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI 2.4 present.
> [    0.000000] No AGP bridge found
> [    0.000000] last_pfn = 0x1342cb max_arch_pfn = 0x400000000
> [    0.000000] last_pfn = 0xdf66d max_arch_pfn = 0x400000000
> [    0.000000] init_memory_mapping: 0000000000000000-00000000df66d000
> [    0.000000] init_memory_mapping: 0000000100000000-00000001342cb000
> 
> The last_pfn figure above is actually one more than the last pfn
> that is initialized and is obtained by right-shifting the start
> memory address plus the length of the memory piece. That is fine if
> the memory ends on a page boundary, but not if it doesn't because
> the partial page doesn't get a pfn. Thus it is available for early
We can fix how the E820 is done.
Look in arch/x86/xen/setup.c for the 'xen_memory_setup' function.
Try to make map[i].size be map[i].size & ~(PAGE_SIZE-1);
that should trim off the last 2048 bytes.
> allocations such as the NODE DATA chunk. Xen goes for the memory
> chunk just below the 4GB mark and hits this region, bare metal
> (2.6.35) starts the NODE DATA at the 4GB mark and doesn't.
That should be generic and hit both cases - but I think this got
fixed in 2.6.36-ish, where going for the region right underneath
4GB is not done (don't remember the details, sadly).
> 
> I am not sure if bare metal is clever enough not to try to use this
> partial page, or whether it could but misses it because of how it
> places the NODE_DATA (at the bottom end of a memory region rather
> than the top end).
If you leave the instrumentation you placed in and add 'memblock=debug',
that should give you a good idea of how it does it?
> 
> 	Michael Young

Stefano Stabellini

2011-Jan-24 19:04 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

I have a work-in-progress patch that fixes a booting issue on one of my
testboxes. Could you please give it a try, passing dom0_mem=700M to the
Xen command line?



diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..ebc0221 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -291,10 +291,23 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 		 * located on different 2M pages. cleanup_highmap(), however,
 		 * can only consider _end when it runs, so destroy any
 		 * mappings beyond _brk_end here.
+		 * Be careful not to go over _end.
 		 */
 		pud = pud_offset(pgd_offset_k(_brk_end), _brk_end);
 		pmd = pmd_offset(pud, _brk_end - 1);
-		while (++pmd <= pmd_offset(pud, (unsigned long)_end - 1))
+		while (++pmd < pmd_offset(pud, (unsigned long)_end - 1))
+			pmd_clear(pmd);
+		if (((unsigned long)_end) & ~PMD_MASK) {
+			pte_t *pte;
+			unsigned long addr;
+			for (addr = ((unsigned long)_end) & PMD_MASK;
+					addr < ((unsigned long)_end);
+					addr += PAGE_SIZE) {
+				pte = pte_offset_map(pmd, addr);
+				pte_clear(&init_mm, addr, pte);
+				pte_unmap(pte);
+			}
+		} else
 			pmd_clear(pmd);
 	}
 #endif

M A Young

2011-Jan-24 23:12 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 24 Jan 2011, Konrad Rzeszutek Wilk wrote:
> We can fix how the E820 is done.
> Look in arch/x86/xen/setup.c for the 'xen_memory_setup' function.
> Try to make map[i].size be map[i].size & ~(PAGE_SIZE-1);
> that should trim off the last 2048 bytes.
The attached patch works for me, though it does assume the memory region 
starts on a page boundary.

 	Michael Young

M A Young

2011-Jan-25 00:22 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 24 Jan 2011, Stefano Stabellini wrote:
> I have a work-in-progress patch that fixes a booting issue on one of my
> testboxes. Could you please give it a try, passing dom0_mem=700M to the
> Xen command line?
It wouldn't prove anything in my case as booting with dom0_mem=700M
works.

 	Michael Young

Stefano Stabellini

2011-Jan-25 12:03 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Mon, 24 Jan 2011, M A Young wrote:
> On Mon, 24 Jan 2011, Konrad Rzeszutek Wilk wrote:
> 
> > We can fix how the E820 is done.
> > Look in arch/x86/xen/setup.c for the 'xen_memory_setup' function.
> > Try to make map[i].size be map[i].size & ~(PAGE_SIZE-1);
> > that should trim off the last 2048 bytes.
> 
> The attached patch works for me, though it does assume the memory region 
> starts on a page boundary.
It turns out that it is me having the same issue you have and not the
other way around :)

Your patch (in addition to my previous patch) makes my testbox boot, no
matter what dom0_mem parameter I choose.

Appended is a version of the patch that doesn't assume that the memory
region starts on a page boundary.

---

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b5a7f92..a3d28a1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
 	e820.nr_map = 0;
 	xen_extra_mem_start = mem_end;
 	for (i = 0; i < memmap.nr_entries; i++) {
-		unsigned long long end = map[i].addr + map[i].size;
+		unsigned long long end;
+		if (map[i].type == E820_RAM)
+			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
+		end = map[i].addr + map[i].size;
 
 		if (map[i].type == E820_RAM && end > mem_end) {
 			/* RAM off the end - may be partially included */

Ian Campbell

2011-Jan-25 13:24 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 2011-01-25 at 12:03 +0000, Stefano Stabellini wrote:
> On Mon, 24 Jan 2011, M A Young wrote:
> > On Mon, 24 Jan 2011, Konrad Rzeszutek Wilk wrote:
> > 
> > > We can fix how the E820 is done.
> > > Look in arch/x86/xen/setup.c for the 'xen_memory_setup' function.
> > > Try to make map[i].size be map[i].size & ~(PAGE_SIZE-1);
> > > that should trim off the last 2048 bytes.
> > 
> > The attached patch works for me, though it does assume the memory region
> > starts on a page boundary.
> 
> It turns out that it is me having the same issue you have and not the
> other way around :)
> 
> Your patch (in addition to my previous patch) makes my testbox boot, no
> matter what dom0_mem parameter I choose.
> 
> Appended is a version of the patch that doesn't assume that the memory
> region starts on a page boundary.
> 
> ---
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index b5a7f92..a3d28a1 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
>  	e820.nr_map = 0;
>  	xen_extra_mem_start = mem_end;
>  	for (i = 0; i < memmap.nr_entries; i++) {
> -		unsigned long long end = map[i].addr + map[i].size;
> +		unsigned long long end;
> +		if (map[i].type == E820_RAM)
> +			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
The more normal idiom to round down to a page boundary in the kernel is:
	map[i].size &= ~(PAGE_SIZE-1);

Do you also need to page align map[i].addr upwards for maximum safety?

Ian.
> +		end = map[i].addr + map[i].size;
>  
>  		if (map[i].type == E820_RAM && end > mem_end) {
>  			/* RAM off the end - may be partially included */

Stefano Stabellini

2011-Jan-25 13:31 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 25 Jan 2011, Ian Campbell wrote:
> > It turns out that it is me having the same issue you have and not the
> > other way around :)
> > 
> > Your patch (in addition to my previous patch) makes my testbox boot, no
> > matter what dom0_mem parameter I choose.
> > 
> > Appended is a version of the patch that doesn't assume that the memory
> > region starts on a page boundary.
> > 
> > ---
> > 
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index b5a7f92..a3d28a1 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
> >  	e820.nr_map = 0;
> >  	xen_extra_mem_start = mem_end;
> >  	for (i = 0; i < memmap.nr_entries; i++) {
> > -		unsigned long long end = map[i].addr + map[i].size;
> > +		unsigned long long end;
> > +		if (map[i].type == E820_RAM)
> > +			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
> 
> The more normal idiom to round down to a page boundary in the kernel is:
> 	map[i].size &= ~(PAGE_SIZE-1);
> 
> Do you also need to page align map[i].addr upwards for maximum safety?
> 
unless I am very confused

map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE

is not the same as:

map[i].size &= ~(PAGE_SIZE-1)

because it also takes into account the possibility that map[i].addr is
not page aligned. It doesn't move map[i].addr upward but still makes
sure that the region ends at a page boundary anyway.
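
The difference between the two expressions can be seen in a small sketch (illustrative only, assuming 4 KiB pages; the function names are hypothetical, and the guard against a too-small region is an addition for this sketch that is not in the posted patch):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL                /* assumption: 4 KiB pages */

/* Mask idiom: rounds the *size* down to a multiple of PAGE_SIZE. */
static uint64_t trim_size_mask(uint64_t size)
{
	/* The mask trick is only valid for power-of-two page sizes. */
	assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);
	return size & ~(PAGE_SIZE - 1);
}

/* Modulo idiom from the patch: shrinks size so that addr + size is aligned. */
static uint64_t trim_size_to_aligned_end(uint64_t addr, uint64_t size)
{
	uint64_t tail = (addr + size) % PAGE_SIZE;

	/* Sketch-only guard: a region that never reaches a page boundary
	 * has nothing left once its partial tail is removed. */
	if (tail > size)
		return 0;
	return size - tail;
}
```

When addr is page aligned (0x100000 with size 0xdf56d800) both give 0xdf56d000. When addr is 0x100800 and size is 0x2000, the mask idiom leaves the size at 0x2000, so the region still ends at the unaligned 0x102800, while the modulo idiom yields 0x1800 and an end of 0x102000.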

Ian Campbell

2011-Jan-25 13:45 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 2011-01-25 at 13:31 +0000, Stefano Stabellini wrote:
> On Tue, 25 Jan 2011, Ian Campbell wrote:
> > > It turns out that it is me having the same issue you have and not the
> > > other way around :)
> > > 
> > > Your patch (in addition to my previous patch) makes my testbox boot, no
> > > matter what dom0_mem parameter I choose.
> > > 
> > > Appended is a version of the patch that doesn't assume that the memory
> > > region starts on a page boundary.
> > > 
> > > ---
> > > 
> > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > > index b5a7f92..a3d28a1 100644
> > > --- a/arch/x86/xen/setup.c
> > > +++ b/arch/x86/xen/setup.c
> > > @@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
> > >  	e820.nr_map = 0;
> > >  	xen_extra_mem_start = mem_end;
> > >  	for (i = 0; i < memmap.nr_entries; i++) {
> > > -		unsigned long long end = map[i].addr + map[i].size;
> > > +		unsigned long long end;
> > > +		if (map[i].type == E820_RAM)
> > > +			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
> > 
> > The more normal idiom to round down to a page boundary in the kernel is:
> > 	map[i].size &= ~(PAGE_SIZE-1);
> > 
> > Do you also need to page align map[i].addr upwards for maximum safety?
> > 
> 
> unless I am very confused
> 
> map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE
> 
> is not the same as:
> 
> map[i].size &= ~(PAGE_SIZE-1)
> 
> because it also takes into account the possibility that map[i].addr is
> not page aligned.
Oh yes, I didn't notice that aspect of it.
>  It doesn't move map[i].addr upward but still makes sure that
> the region ends at a page boundary anyway.
Which returns to my second question ;-) Why do we not need to align addr
too?
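
What aligning both ends would look like can be sketched as below. This is an illustration of the question only: the patch in this thread does not do this, 4 KiB pages are assumed, and struct region and page_align_region are hypothetical names rather than kernel ones.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL                /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one E820 map entry. */
struct region {
	uint64_t addr;
	uint64_t size;
};

/* Keep only the whole pages of r: round the start up and the end down. */
static struct region page_align_region(struct region r)
{
	struct region out;
	uint64_t start, end;

	assert(r.addr + r.size >= r.addr);   /* no wrap-around */
	start = (r.addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
	end = (r.addr + r.size) & ~(PAGE_SIZE - 1);

	out.addr = start;
	out.size = end > start ? end - start : 0;  /* empty if no whole page fits */
	return out;
}
```

A region at 0x100800 of size 0x2000 shrinks to the single whole page at 0x101000, and a region that is already aligned at the start only loses its partial tail.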

Ian.


Stefano Stabellini

2011-Jan-25 15:19 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 25 Jan 2011, Ian Campbell wrote:
> > unless I am very confused
> > 
> > map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE
> > 
> > is not the same as:
> > 
> > map[i].size &= ~(PAGE_SIZE-1)
> > 
> > because it also takes into account the possibility that map[i].addr is
> > not page aligned.
> 
> Oh yes, I didn't notice that aspect of it.
> 
> >  It doesn't move map[i].addr upward but still makes sure that
> > the region ends at a page boundary anyway.
> 
> Which returns to my second question ;-) Why do we not need to align addr
> too?
My machine can boot fine with a map[i].addr not page aligned.

Konrad Rzeszutek Wilk

2011-Jan-25 15:52 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, Jan 25, 2011 at 03:19:22PM +0000, Stefano Stabellini wrote:
> On Tue, 25 Jan 2011, Ian Campbell wrote:
> > > unless I am very confused
> > > 
> > > map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE
> > > 
> > > is not the same as:
> > > 
> > > map[i].size &= ~(PAGE_SIZE-1)
> > > 
> > > because it also takes into account the possibility that map[i].addr is
> > > not page aligned.
> > 
> > Oh yes, I didn't notice that aspect of it.
> > 
> > >  It doesn't move map[i].addr upward but still makes sure that
> > > the region ends at a page boundary anyway.
> > 
> > Which returns to my second question ;-) Why do we not need to align addr
> > too?
> 
> My machine can boot fine with a map[i].addr not page aligned.
OK, so then the patch that M A Young came up with ought to do it?

Stefano Stabellini

2011-Jan-25 15:56 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 25 Jan 2011, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 25, 2011 at 03:19:22PM +0000, Stefano Stabellini wrote:
> > On Tue, 25 Jan 2011, Ian Campbell wrote:
> > > > unless I am very confused
> > > > 
> > > > map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE
> > > > 
> > > > is not the same as:
> > > > 
> > > > map[i].size &= ~(PAGE_SIZE-1)
> > > > 
> > > > because it also takes into account the possibility that map[i].addr is
> > > > not page aligned.
> > > 
> > > Oh yes, I didn't notice that aspect of it.
> > > 
> > > >  It doesn't move map[i].addr upward but still makes sure that
> > > > the region ends at a page boundary anyway.
> > > 
> > > Which returns to my second question ;-) Why do we not need to align addr
> > > too?
> > 
> > My machine can boot fine with a map[i].addr not page aligned.
> 
> OK, so then the patch that M A Young came up with ought to do it?
> 
I think you need the slightly improved version I posted before that can
handle map[i].addr not page aligned (I silently added an s-o-b for Young, I
hope he's OK with this).

---


commit b84683ad1e704c2a296d08ff0cbe29db936f94a7
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Tue Jan 25 12:03:42 2011 +0000

    xen: make sure the e820 memory regions end at page boundary
    
    Signed-off-by: M A Young <m.a.young@durham.ac.uk>
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b5a7f92..a3d28a1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
 	e820.nr_map = 0;
 	xen_extra_mem_start = mem_end;
 	for (i = 0; i < memmap.nr_entries; i++) {
-		unsigned long long end = map[i].addr + map[i].size;
+		unsigned long long end;
+		if (map[i].type == E820_RAM)
+			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
+		end = map[i].addr + map[i].size;
 
 		if (map[i].type == E820_RAM && end > mem_end) {
 			/* RAM off the end - may be partially included */

M A Young

2011-Jan-25 16:05 UTC

Re: [Xen-devel] Crash on boot with 2.6.37-rc8-git3

On Tue, 25 Jan 2011, Stefano Stabellini wrote:
> I think you need the slightly improved version I posted before that can
> handle map[i].addr not page aligned (I silently added an s-o-b for Young, I
> hope he's OK with this).
Yes and yes. My version doesn't work if map[i].addr is not page aligned.
The aim is to make sure the end address is page aligned, and avoid ending
with a partial page which won't have a PFN and might also require
different treatment if there is reserved content in the rest of the page
(which is true in my case).

 	Michael Young
> commit b84683ad1e704c2a296d08ff0cbe29db936f94a7
> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Date:   Tue Jan 25 12:03:42 2011 +0000
>
>    xen: make sure the e820 memory regions end at page boundary
>
>    Signed-off-by: M A Young <m.a.young@durham.ac.uk>
>    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index b5a7f92..a3d28a1 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -179,7 +179,10 @@ char * __init xen_memory_setup(void)
> 	e820.nr_map = 0;
> 	xen_extra_mem_start = mem_end;
> 	for (i = 0; i < memmap.nr_entries; i++) {
> -		unsigned long long end = map[i].addr + map[i].size;
> +		unsigned long long end;
> +		if (map[i].type == E820_RAM)
> +			map[i].size -= (map[i].size + map[i].addr) % PAGE_SIZE;
> +		end = map[i].addr + map[i].size;
>
> 		if (map[i].type == E820_RAM && end > mem_end) {
> 			/* RAM off the end - may be partially included */
>