Konrad Rzeszutek Wilk
2012-Sep-04 16:33 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
> Hi Konrad,
>
> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.

Is this only with Xen 4.2? As, does Xen 4.1 work?

>
> Dom0 and guest kernel are 3.6.0-rc4 with config:

If you back out:

f393387d160211f60398d58463a7e65
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri Aug 17 16:43:28 2012 -0400

    xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.

Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?

> [*] Xen memory balloon driver
> [*] Scrub pages before returning them to system
>
> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
>
> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
>
> From the:
> "mapping kernel into physical memory
> about to get started..."
>
> I would almost say it's trying to reload dom0 ?
>
>
> [ 897.161119] device vif1.0 entered promiscuous mode
> mapping kernel into physical memory
> about to get started...
> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
> [ 898.129465] ------------[ cut here ]------------
> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
Sander Eikelenboom
2012-Sep-04 16:37 UTC
dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Hi Konrad,

This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.

Dom0 and guest kernel are 3.6.0-rc4 with config:
[*] Xen memory balloon driver
[*] Scrub pages before returning them to system

From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay

But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).

From the:
"mapping kernel into physical memory
about to get started..."

I would almost say it's trying to reload dom0 ?


[ 897.161119] device vif1.0 entered promiscuous mode
mapping kernel into physical memory
about to get started...
[ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
[ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
[ 898.129465] ------------[ cut here ]------------
[ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
[ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
[ 898.132209] Modules linked in:
[ 898.132209] CPU 0
[ 898.132209] Pid: 3338, comm: kworker/0:1 Not tainted 3.6.0-rc4-20120830+ #66 System manufacturer System Product Name/P5Q-EM DO
[ 898.132209] RIP: e030:[<ffffffff8133b206>] [<ffffffff8133b206>] balloon_process+0x336/0x340
[ 898.132209] RSP: e02b:ffff880037b4dce0 EFLAGS: 00010213
[ 898.132209] RAX: 00000000242b0000 RBX: ffffea0000dfadc0 RCX: 0000000000000000
[ 898.132209] RDX: 0000000000037eb7 RSI: 00000000deadbeef RDI: 00000000000000b7
[ 898.132209] RBP: ffff880037b4dd40 R08: ffffea0000dfade0 R09: 2222222222222222
[ 898.132209] R10: 2222222222222222 R11: 2222222222222222 R12: 0000000000000000
[ 898.132209] R13: ffffea0000dfade0 R14: 0000160000000000 R15: 0000000000000001
[ 898.132209] FS: 00007fd4bd0ec740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 898.132209] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 898.132209] CR2: 00007fd4b387d000 CR3: 000000003920a000 CR4: 0000000000042660
[ 898.132209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 898.132209] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 898.132209] Process kworker/0:1 (pid: 3338, threadinfo ffff880037b4c000, task ffff8800398fe180)
[ 898.132209] Stack:
[ 898.132209] 0000000000037eb7 0000000000000001 ffffffff8286c540 0000000000000001
[ 898.132209] 0000000000000000 0000000000007ff0 ffff880037b4dd20 ffffffff81e42a60
[ 898.132209] ffff88003799c6c0 ffff88003fc16700 ffff88003fc0e000 ffff880037b4dd90
[ 898.132209] Call Trace:
[ 898.132209] [<ffffffff8107fb8f>] process_one_work+0x1bf/0x4a0
[ 898.132209] [<ffffffff8107fb30>] ? process_one_work+0x160/0x4a0
[ 898.132209] [<ffffffff81849191>] ? __schedule+0x471/0x8a0
[ 898.132209] [<ffffffff8133aed0>] ? decrease_reservation+0x2d0/0x2d0
[ 898.132209] [<ffffffff81080252>] worker_thread+0x152/0x470
[ 898.132209] [<ffffffff8184ad85>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
[ 898.132209] [<ffffffff810ae4dd>] ? trace_hardirqs_on+0xd/0x10
[ 898.132209] [<ffffffff8184ad63>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
[ 898.132209] [<ffffffff81080100>] ? manage_workers+0x290/0x290
[ 898.132209] [<ffffffff81087696>] kthread+0x96/0xa0
[ 898.132209] [<ffffffff8184cb84>] kernel_thread_helper+0x4/0x10
[ 898.132209] [<ffffffff8184b134>] ? retint_restore_args+0x13/0x13
[ 898.132209] [<ffffffff8184cb80>] ? gs_change+0x13/0x13
[ 898.132209] Code: ff 0f 1f 40 00 48 89 d8 e9 22 fe ff ff 0f 0b eb fe 48 89 d7 48 89 55 a0 e8 18 e7 cc ff 48 83 f8 ff 48 8b 55 a0 0f 84 74 fe ff ff <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 89 d6
[ 898.132209] RIP [<ffffffff8133b206>] balloon_process+0x336/0x340
[ 898.132209] RSP <ffff880037b4dce0>
[ 898.738233] ---[ end trace 3f7af50285edb7bb ]---
[ 898.749003] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 898.752237] IP: [<ffffffff81086eeb>] kthread_data+0xb/0x20
[ 898.752237] PGD 1e0d067 PUD 1e0e067 PMD 0
[ 898.752237] Oops: 0000 [#2] PREEMPT SMP
[ 898.752237] Modules linked in:
[ 898.752237] CPU 0
[ 898.752237] Pid: 3338, comm: kworker/0:1 Tainted: G D 3.6.0-rc4-20120830+ #66 System manufacturer System Product Name/P5Q-EM DO
[ 898.752237] RIP: e030:[<ffffffff81086eeb>] [<ffffffff81086eeb>] kthread_data+0xb/0x20
[ 898.752237] RSP: e02b:ffff880037b4d898 EFLAGS: 00010082
[ 898.752237] RAX: 0000000000000000 RBX: ffff88003fc12e80 RCX: 0000000000000000
[ 898.752237] RDX: ffffffff820057a0 RSI: 0000000000000000 RDI: ffff8800398fe180
[ 898.752237] RBP: ffff880037b4d898 R08: ffff8800398fe1f0 R09: 0000000000000400
[ 898.752237] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 898.752237] R13: 0000000000000000 R14: ffff880037b4d7b8 R15: ffff880037b4da90
[ 898.752237] FS: 00007fd4bd0ec740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 898.752237] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 898.752237] CR2: fffffffffffffff8 CR3: 000000003920a000 CR4: 0000000000042660
[ 898.752237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 898.752237] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 898.752237] Process kworker/0:1 (pid: 3338, threadinfo ffff880037b4c000, task ffff8800398fe180)
[ 898.752237] Stack:
[ 898.752237] ffff880037b4d8c8 ffffffff8108306c 0000000000000000 ffff88003fc12e80
[ 898.752237] 0000000000000000 ffff8800398fe528 ffff880037b4da18 ffffffff8184931f
[ 898.752237] 0000000000000000 ffffffff81083bb8 ffff8800398fe180 0000000000012e80
[ 898.752237] Call Trace:
[ 898.752237] [<ffffffff8108306c>] wq_worker_sleeping+0x1c/0x90
[ 898.752237] [<ffffffff8184931f>] __schedule+0x5ff/0x8a0
[ 898.752237] [<ffffffff81083bb8>] ? free_pid+0x18/0xc0
[ 898.752237] [<ffffffff810602a7>] ? sha1_transform_ssse3+0x187/0xd00
[ 898.752237] [<ffffffff810b1a94>] ? lock_acquire+0xe4/0x110
[ 898.752237] [<ffffffff8106cfa7>] ? do_exit+0x4e7/0x8e0
[ 898.752237] [<ffffffff810e27f2>] ? call_rcu+0x12/0x20
[ 898.752237] [<ffffffff810b1f01>] ? lock_release+0x111/0x260
[ 898.752237] [<ffffffff81849654>] schedule+0x24/0x70
[ 898.752237] [<ffffffff8106d074>] do_exit+0x5b4/0x8e0
[ 898.752237] [<ffffffff81010240>] oops_end+0xb0/0xf0
[ 898.752237] [<ffffffff810103b6>] die+0x56/0x90
[ 898.752237] [<ffffffff8100d6c4>] do_trap+0xc4/0x170
[ 898.752237] [<ffffffff8100dbe2>] ? do_invalid_op+0x72/0xc0
[ 898.752237] [<ffffffff8100dc16>] do_invalid_op+0xa6/0xc0
[ 898.752237] [<ffffffff8133b206>] ? balloon_process+0x336/0x340
[ 898.752237] [<ffffffff810ac9e8>] ? trace_hardirqs_off_caller+0x78/0x150
[ 898.752237] [<ffffffff812b29fd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 898.752237] [<ffffffff8184b164>] ? restore_args+0x30/0x30
[ 898.752237] [<ffffffff8184c9fb>] invalid_op+0x1b/0x20
[ 898.752237] [<ffffffff8133b206>] ? balloon_process+0x336/0x340
[ 898.752237] [<ffffffff8107fb8f>] process_one_work+0x1bf/0x4a0
[ 898.752237] [<ffffffff8107fb30>] ? process_one_work+0x160/0x4a0
[ 898.752237] [<ffffffff81849191>] ? __schedule+0x471/0x8a0
[ 898.752237] [<ffffffff8133aed0>] ? decrease_reservation+0x2d0/0x2d0
[ 898.752237] [<ffffffff81080252>] worker_thread+0x152/0x470
[ 898.752237] [<ffffffff8184ad85>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
[ 898.752237] [<ffffffff810ae4dd>] ? trace_hardirqs_on+0xd/0x10
[ 898.752237] [<ffffffff8184ad63>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
[ 898.752237] [<ffffffff81080100>] ? manage_workers+0x290/0x290
[ 898.752237] [<ffffffff81087696>] kthread+0x96/0xa0
[ 898.752237] [<ffffffff8184cb84>] kernel_thread_helper+0x4/0x10
[ 898.752237] [<ffffffff8184b134>] ? retint_restore_args+0x13/0x13
[ 898.752237] [<ffffffff8184cb80>] ? gs_change+0x13/0x13
[ 898.752237] Code: 55 65 48 8b 04 25 80 c6 00 00 48 8b 80 50 03 00 00 48 89 e5 8b 40 f0 c9 c3 0f 1f 80 00 00 00 00 48 8b 87 50 03 00 00 55 48 89 e5 <48> 8b 40 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 898.752237] RIP [<ffffffff81086eeb>] kthread_data+0xb/0x20
[ 898.752237] RSP <ffff880037b4d898>
[ 898.752237] CR2: fffffffffffffff8
[ 898.752237] ---[ end trace 3f7af50285edb7bc ]---
[ 898.752237] Fixing recursive fault but reboot is needed!
[ 912.746625] xen_bridge: port 1(vif1.0) entered forwarding state

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Sep-04 16:39 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
> Hi Konrad,
>
> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>
> Dom0 and guest kernel are 3.6.0-rc4 with config:
> [*] Xen memory balloon driver
> [*] Scrub pages before returning them to system

Can you also try this patch out and provide the full log (bootup and such). Thanks!

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 31ab82f..871a93c 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -355,8 +355,12 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		BUG_ON(page == NULL);
 
 		pfn = page_to_pfn(page);
-		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
-		       phys_to_machine_mapping_valid(pfn));
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			if (phys_to_machine_mapping_valid(pfn)) {
+				printk(KERN_DEBUG "%lx is %lx!\n", pfn, get_phys_to_machine(pfn));
+				continue;
+			}
+		}
 
 		set_phys_to_machine(pfn, frame_list[i]);
 
@@ -572,6 +576,7 @@ static void __init balloon_add_region(unsigned long start_pfn,
 	 */
 	extra_pfn_end = min(max_pfn, start_pfn + pages);
 
+	printk(KERN_INFO "%s: [%lx->%lx]\n", __func__, start_pfn, extra_pfn_end);
 	for (pfn = start_pfn; pfn < extra_pfn_end; pfn++) {
 		page = pfn_to_page(pfn);
 		/* totalram_pages and totalhigh_pages do not
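The effect of the first hunk of the debug patch can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions: every `model_` name, the eight-entry table and the `MODEL_INVALID_MFN` sentinel are hypothetical stand-ins, not the kernel's P2M implementation. It shows the one behavioural change the patch makes: a PFN that already has a valid mapping is logged and skipped instead of being treated as fatal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the P2M table (assumption). */
#define MODEL_NR_PFNS     8
#define MODEL_INVALID_MFN (~0UL)

static unsigned long model_p2m[MODEL_NR_PFNS];

/* Mark every PFN as having no machine frame behind it. */
static void model_p2m_reset(void)
{
	for (size_t i = 0; i < MODEL_NR_PFNS; i++)
		model_p2m[i] = MODEL_INVALID_MFN;
}

/* Non-zero if the PFN already maps to a machine frame. */
static int model_mapping_valid(unsigned long pfn)
{
	assert(pfn < MODEL_NR_PFNS);
	return model_p2m[pfn] != MODEL_INVALID_MFN;
}

/*
 * Same shape as the patched loop: a ballooned-out PFN is expected to be
 * unmapped. If it is not, print "<pfn> is <mfn>!" and move on to the
 * next page rather than stopping.
 * Returns how many PFNs were actually populated.
 */
static size_t model_increase_reservation(const unsigned long *pfns,
					 const unsigned long *frames,
					 size_t nr)
{
	size_t populated = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long pfn = pfns[i];

		if (model_mapping_valid(pfn)) {
			printf("%lx is %lx!\n", pfn, model_p2m[pfn]);
			continue;
		}
		model_p2m[pfn] = frames[i];
		populated++;
	}
	return populated;
}
```

Calling `model_p2m_reset()`, pre-populating one entry and then passing that PFN to `model_increase_reservation()` reproduces the situation the original `BUG_ON` guarded against: the existing mapping is left alone and reported, and the remaining pages are still populated.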
Sander Eikelenboom
2012-Sep-04 17:19 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 6:33:47 PM, you wrote:

> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.

> Is this only with Xen 4.2? As, does Xen 4.1 work?
>>
>> Dom0 and guest kernel are 3.6.0-rc4 with config:

> If you back out:

> f393387d160211f60398d58463a7e65
> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Fri Aug 17 16:43:28 2012 -0400

> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.

> Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?

With c96aae1f7f393387d160211f60398d58463a7e65 reverted i still see this bug (with Xen 4.2).

Will use the debug patch you mailed and send back the results ...

>> [*] Xen memory balloon driver
>> [*] Scrub pages before returning them to system
>>
>> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
>>
>> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
>>
>> From the:
>> "mapping kernel into physical memory
>> about to get started..."
>>
>> I would almost say it's trying to reload dom0 ?
>>
>>
>> [ 897.161119] device vif1.0 entered promiscuous mode
>> mapping kernel into physical memory
>> about to get started...
>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
>> [ 898.129465] ------------[ cut here ]------------
>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
Konrad Rzeszutek Wilk
2012-Sep-04 17:58 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Tue, Sep 04, 2012 at 08:02:41PM +0200, Sander Eikelenboom wrote:
>
> Tuesday, September 4, 2012, 6:39:03 PM, you wrote:
>
> > On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
> >> Hi Konrad,
> >>
> >> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
> >> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
> >>
> >> Dom0 and guest kernel are 3.6.0-rc4 with config:
> >> [*] Xen memory balloon driver
> >> [*] Scrub pages before returning them to system
>
> > Can you also try this patch out and provide the full log (bootup and such). Thanks!
>
> After applying this patch and due to the removal of the BUG_ON the domU boots and is reachable by SSH.
> Serial log attached.

Wow. That is a lot of .. And if you use Xen 4.1 it works fine?
Sander Eikelenboom
2012-Sep-04 18:02 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 6:39:03 PM, you wrote:

> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>>
>> Dom0 and guest kernel are 3.6.0-rc4 with config:
>> [*] Xen memory balloon driver
>> [*] Scrub pages before returning them to system

> Can you also try this patch out and provide the full log (bootup and such). Thanks!

After applying this patch and due to the removal of the BUG_ON the domU boots and is reachable by SSH.
Serial log attached.

> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 31ab82f..871a93c 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -355,8 +355,12 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>  		BUG_ON(page == NULL);
>
>  		pfn = page_to_pfn(page);
> -		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> -		       phys_to_machine_mapping_valid(pfn));
> +		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> +			if (phys_to_machine_mapping_valid(pfn)) {
> +				printk(KERN_DEBUG "%lx is %lx!\n", pfn, get_phys_to_machine(pfn));
> +				continue;
> +			}
> +		}
>
>  		set_phys_to_machine(pfn, frame_list[i]);
>
> @@ -572,6 +576,7 @@ static void __init balloon_add_region(unsigned long start_pfn,
>  	 */
>  	extra_pfn_end = min(max_pfn, start_pfn + pages);
>
> +	printk(KERN_INFO "%s: [%lx->%lx]\n", __func__, start_pfn, extra_pfn_end);
>  	for (pfn = start_pfn; pfn < extra_pfn_end; pfn++) {
>  		page = pfn_to_page(pfn);
>  		/* totalram_pages and totalhigh_pages do not
Ben Guthro
2012-Sep-04 18:07 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
We ran into the same issue, in newer kernels - but had not yet
submitted this fix.

One of the developers here came up with a fix (attached, and CC'ed
here) that fixes an issue where the p2m code reuses a structure member
where it shouldn't.
The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
structure, instead of re-using dev_bus_addr.


If this also works for you, I can re-submit it with a Signed-off-by
line, if you prefer, Konrad.

Ben


On Tue, Sep 4, 2012 at 1:19 PM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>
> Tuesday, September 4, 2012, 6:33:47 PM, you wrote:
>
>> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>>> Hi Konrad,
>>>
>>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>
>> Is this only with Xen 4.2? As, does Xen 4.1 work?
>>>
>>> Dom0 and guest kernel are 3.6.0-rc4 with config:
>
>> If you back out:
>
>> f393387d160211f60398d58463a7e65
>> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Date: Fri Aug 17 16:43:28 2012 -0400
>
>> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
>
>> Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?
>
> With c96aae1f7f393387d160211f60398d58463a7e65 reverted i still see this bug (with Xen 4.2).
>
> Will use the debug patch you mailed and send back the results ...
>
>
>>> [*] Xen memory balloon driver
>>> [*] Scrub pages before returning them to system
>>>
>>> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
>>>
>>> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
>>>
>>> From the:
>>> "mapping kernel into physical memory
>>> about to get started..."
>>>
>>> I would almost say it's trying to reload dom0 ?
>>>
>>>
>>> [ 897.161119] device vif1.0 entered promiscuous mode
>>> mapping kernel into physical memory
>>> about to get started...
>>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
>>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
>>> [ 898.129465] ------------[ cut here ]------------
>>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
>>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Sep-04 18:22 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Tue, Sep 04, 2012 at 02:07:11PM -0400, Ben Guthro wrote:
> We ran into the same issue, in newer kernels - but had not yet
> submitted this fix.
>
> One of the developers here came up with a fix (attached, and CC'ed
> here) that fixes an issue where the p2m code reuses a structure member
> where it shouldn't.
> The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
> structure, instead of re-using dev_bus_addr.

Wow. So that implies the m2p code had some new wonkiness in it.

Perhaps this b9e0d95c041ca2d7ad297ee37c2e9cfab67a188f
or
0930bba674e248b921ea659b036ff02564e5a5f4

both courtesy of Stefano (who is on vacation this week :-())
are at fault?

Would it be possible to revert one of them (or both) and see if the
issues disappear?

>
>
> If this also works for you, I can re-submit it with a Signed-off-by
> line, if you prefer, Konrad.
>
> Ben
>
>
> On Tue, Sep 4, 2012 at 1:19 PM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> >
> > Tuesday, September 4, 2012, 6:33:47 PM, you wrote:
> >
> >> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
> >>> Hi Konrad,
> >>>
> >>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
> >>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
> >
> >> Is this only with Xen 4.2? As, does Xen 4.1 work?
> >>>
> >>> Dom0 and guest kernel are 3.6.0-rc4 with config:
> >
> >> If you back out:
> >
> >> f393387d160211f60398d58463a7e65
> >> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> Date: Fri Aug 17 16:43:28 2012 -0400
> >
> >> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
> >
> >> Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?
> >
> > With c96aae1f7f393387d160211f60398d58463a7e65 reverted i still see this bug (with Xen 4.2).
> >
> > Will use the debug patch you mailed and send back the results ...
> >
> >
> >>> [*] Xen memory balloon driver
> >>> [*] Scrub pages before returning them to system
> >>>
> >>> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
> >>>
> >>> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
> >>>
> >>> From the:
> >>> "mapping kernel into physical memory
> >>> about to get started..."
> >>>
> >>> I would almost say it's trying to reload dom0 ?
> >>>
> >>>
> >>> [ 897.161119] device vif1.0 entered promiscuous mode
> >>> mapping kernel into physical memory
> >>> about to get started...
> >>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
> >>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
> >>> [ 898.129465] ------------[ cut here ]------------
> >>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
> >>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
> >
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
Sander Eikelenboom
2012-Sep-04 18:57 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 8:22:41 PM, you wrote:

> On Tue, Sep 04, 2012 at 02:07:11PM -0400, Ben Guthro wrote:
>> We ran into the same issue, in newer kernels - but had not yet
>> submitted this fix.
>>
>> One of the developers here came up with a fix (attached, and CC'ed
>> here) that fixes an issue where the p2m code reuses a structure member
>> where it shouldn't.
>> The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
>> structure, instead of re-using dev_bus_addr.

> Wow. So that implies the m2p code had some new wonkiness in it.

> Perhaps this b9e0d95c041ca2d7ad297ee37c2e9cfab67a188f
> or
> 0930bba674e248b921ea659b036ff02564e5a5f4

> both courtesy of Stefano (who is on vacation this week :-())
> are at fault?

> Would it be possible to revert one of them (or both) and see if the
> issues disappear?

reverting b9e0d95c041ca2d7ad297ee37c2e9cfab67a188f didn't help
reverting 0930bba674e248b921ea659b036ff02564e5a5f4 didn't work out due to a lot of merge conflicts :S

>>
>>
>> If this also works for you, I can re-submit it with a Signed-off-by
>> line, if you prefer, Konrad.
>>
>> Ben
>>
>>
>> On Tue, Sep 4, 2012 at 1:19 PM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> >
>> > Tuesday, September 4, 2012, 6:33:47 PM, you wrote:
>> >
>> >> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> >>> Hi Konrad,
>> >>>
>> >>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> >>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>> >
>> >> Is this only with Xen 4.2? As, does Xen 4.1 work?
>> >>>
>> >>> Dom0 and guest kernel are 3.6.0-rc4 with config:
>> >
>> >> If you back out:
>> >
>> >> f393387d160211f60398d58463a7e65
>> >> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> Date: Fri Aug 17 16:43:28 2012 -0400
>> >
>> >> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
>> >
>> >> Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?
>> >
>> > With c96aae1f7f393387d160211f60398d58463a7e65 reverted i still see this bug (with Xen 4.2).
>> >
>> > Will use the debug patch you mailed and send back the results ...
>> >
>> >
>> >>> [*] Xen memory balloon driver
>> >>> [*] Scrub pages before returning them to system
>> >>>
>> >>> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
>> >>>
>> >>> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
>> >>>
>> >>> From the:
>> >>> "mapping kernel into physical memory
>> >>> about to get started..."
>> >>>
>> >>> I would almost say it's trying to reload dom0 ?
>> >>>
>> >>>
>> >>> [ 897.161119] device vif1.0 entered promiscuous mode
>> >>> mapping kernel into physical memory
>> >>> about to get started...
>> >>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
>> >>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
>> >>> [ 898.129465] ------------[ cut here ]------------
>> >>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
>> >>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
>> >
>> >
>> >
>> > _______________________________________________
>> > Xen-devel mailing list
>> > Xen-devel@lists.xen.org
>> > http://lists.xen.org/xen-devel
Sander Eikelenboom
2012-Sep-04 19:01 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 7:58:41 PM, you wrote:

> On Tue, Sep 04, 2012 at 08:02:41PM +0200, Sander Eikelenboom wrote:
>>
>> Tuesday, September 4, 2012, 6:39:03 PM, you wrote:
>>
>> > On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> >> Hi Konrad,
>> >>
>> >> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> >> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>> >>
>> >> Dom0 and guest kernel are 3.6.0-rc4 with config:
>> >> [*] Xen memory balloon driver
>> >> [*] Scrub pages before returning them to system
>>
>> > Can you also try this patch out and provide the full log (bootup and such). Thanks!
>>
>> After applying this patch and due to the removal of the BUG_ON the domU boots and is reachable by SSH.
>> Serial log attached.

> Wow. That is a lot of .. And if you use Xen 4.1 it works fine?

Uhmm don't know, didn't use this machine for a while, doing things like writing a master thesis :)
Upgraded xen and kernel from 2.6.36 kernel and xen 4.0something to 3.6.0-rc4 and 4.2-rc4
Trying to make it work and try to make some xen patches :-p

But i seem to be stumbling over quite a lot of things with both machines(amd and intel) while going to xen 4.2 (and from xm to xl)
Sander Eikelenboom
2012-Sep-04 19:34 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 8:07:11 PM, you wrote:> We ran into the same issue, in newer kernels - but had not yet > submitted this fix.> One of the developers here came up with a fix (attached, and CC''ed > here) that fixes an issue where the p2m code reuses a structure member > where it shouldn''t. > The patch adds a new "old_mfn" member to the gnttab_map_grant_ref > structure, instead of re-using dev_bus_addr.> If this also works for you, I can re-submit it with a Signed-off-by > line, if you prefer, Konrad.Hi Ben, This patch doesn''t work for me: When starting the PV-guest i get: (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (68b69070). (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0). (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0). and from the dom0 kernel: [ 374.425727] BUG: unable to handle kernel paging request at ffff8800fffd9078 [ 374.428901] IP: [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270 [ 374.428901] PGD 1e0c067 PUD 0 [ 374.428901] Oops: 0000 [#1] PREEMPT SMP [ 374.428901] Modules linked in: [ 374.428901] CPU 0 [ 374.428901] Pid: 4308, comm: qemu-system-i38 Not tainted 3.6.0-rc4-20120830+ #70 System manufacturer System Product Name/P5Q-EM DO [ 374.428901] RIP: e030:[<ffffffff81336e4e>] [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270 [ 374.428901] RSP: e02b:ffff88002f185ca8 EFLAGS: 00010206 [ 374.428901] RAX: ffff880000000000 RBX: ffff88001471cf00 RCX: 00000000fffd9078 [ 374.428901] RDX: 0000000000000050 RSI: 40000000000fffd9 RDI: 00003ffffffff000 [ 374.428901] RBP: ffff88002f185d08 R08: 0000000000000078 R09: 0000000000000000 [ 374.428901] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004 [ 374.428901] R13: ffff88001471c480 R14: 0000000000000002 R15: 0000000000000002 [ 374.428901] FS: 00007f6def9f2740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 374.428901] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 374.428901] CR2: 
ffff8800fffd9078 CR3: 000000002d30e000 CR4: 0000000000042660
[ 374.428901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 374.428901] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 374.428901] Process qemu-system-i38 (pid: 4308, threadinfo ffff88002f184000, task ffff8800376f1040)
[ 374.428901] Stack:
[ 374.428901] ffffffffffffffff 0000000000000050 00000000fffd9078 00000000000fffd9
[ 374.428901] 0000000001000000 ffff8800382135a0 ffff88002f185d08 ffff880038211960
[ 374.428901] ffff88002f11d2c0 0000000000000004 0000000000000003 0000000000000001
[ 374.428901] Call Trace:
[ 374.428901] [<ffffffff8134212e>] gntdev_mmap+0x20e/0x520
[ 374.428901] [<ffffffff8111c502>] ? mmap_region+0x312/0x5a0
[ 374.428901] [<ffffffff810ae0a0>] ? lockdep_trace_alloc+0xa0/0x130
[ 374.428901] [<ffffffff8111c5be>] mmap_region+0x3ce/0x5a0
[ 374.428901] [<ffffffff8111c9e0>] do_mmap_pgoff+0x250/0x350
[ 374.428901] [<ffffffff81109e88>] vm_mmap_pgoff+0x68/0x90
[ 374.428901] [<ffffffff8111a5b2>] sys_mmap_pgoff+0x152/0x170
[ 374.428901] [<ffffffff812b29be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 374.428901] [<ffffffff81011f29>] sys_mmap+0x29/0x30
[ 374.428901] [<ffffffff8184b939>] system_call_fastpath+0x16/0x1b
[ 374.428901] Code: 0f 84 e7 00 00 00 48 89 f1 48 c1 e1 0c 41 81 e0 ff 0f 00 00 48 b8 00 00 00 00 00 88 ff ff 48 bf 00 f0 ff ff ff 3f 00 00 4c 01 c1 <48> 23 3c 01 48 c1 ef 0c 49 8d 54 15 00 4d 85 ed b8 00 00 00 00
[ 374.428901] RIP [<ffffffff81336e4e>] gnttab_map_refs+0x14e/0x270
[ 374.428901] RSP <ffff88002f185ca8>
[ 374.428901] CR2: ffff8800fffd9078
[ 374.428901] ---[ end trace 0e0a5a49f6503c0a ]---

> Ben

> On Tue, Sep 4, 2012 at 1:19 PM, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>
>> Tuesday, September 4, 2012, 6:33:47 PM, you wrote:
>>
>>> On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>>>> Hi Konrad,
>>>>
>>>> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>>>> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>>
>>> Is this only with Xen 4.2? As, does Xen 4.1 work?
>>>>
>>>> Dom0 and guest kernel are 3.6.0-rc4 with config:
>>
>>> If you back out:
>>
>>> f393387d160211f60398d58463a7e65
>>> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Date: Fri Aug 17 16:43:28 2012 -0400
>>
>>> xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
>>
>>> Do you see this bug? (Either with Xen 4.1 or Xen 4.2)?
>>
>> With c96aae1f7f393387d160211f60398d58463a7e65 reverted i still see this bug (with Xen 4.2).
>>
>> Will use the debug patch you mailed and send back the results ...
>>
>>
>>>> [*] Xen memory balloon driver
>>>> [*] Scrub pages before returning them to system
>>>>
>>>> From http://wiki.xen.org/wiki/Do%EF%BB%BFm0_Memory_%E2%80%94_Where_It_Has_Not_Gone , I thought this should be okay
>>>>
>>>> But when trying to start a PV guest with 512MB mem, the machine (dom0) crashes with the stacktrace below (complete serial-log.txt attached).
>>>>
>>>> From the:
>>>> "mapping kernel into physical memory
>>>> about to get started..."
>>>>
>>>> I would almost say it's trying to reload dom0 ?
>>>>
>>>>
>>>> [ 897.161119] device vif1.0 entered promiscuous mode
>>>> mapping kernel into physical memory
>>>> about to get started...
>>>> [ 897.696619] xen_bridge: port 1(vif1.0) entered forwarding state
>>>> [ 897.716219] xen_bridge: port 1(vif1.0) entered forwarding state
>>>> [ 898.129465] ------------[ cut here ]------------
>>>> [ 898.132209] kernel BUG at drivers/xen/balloon.c:359!
>>>> [ 898.132209] invalid opcode: 0000 [#1] PREEMPT SMP
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
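A reader's note on the paging-request oops above: the faulting address in CR2 can be rebuilt from the register dump alone. The sketch below is only an illustration of that arithmetic, not code from the kernel. It assumes that RAX (0xffff880000000000) is the base of the kernel's direct mapping and that RSI and R08 hold a frame number and an in-page offset; the names `fault_address`, `DIRECT_MAP_BASE` and `SKETCH_PAGE_SHIFT` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
/* Value of RAX in the oops; assumed here to be the direct-map base. */
#define DIRECT_MAP_BASE   0xffff880000000000ULL

/* Rebuild the faulting address from a frame register and a page offset. */
uint64_t fault_address(uint64_t frame_reg, uint64_t page_offset)
{
    /* Like a 64-bit shl by 12: bits shifted past bit 63 are discarded. */
    uint64_t phys = (frame_reg << SKETCH_PAGE_SHIFT) + (page_offset & 0xfffULL);

    return DIRECT_MAP_BASE + phys;
}

/* Usage: RSI and R08 as printed in the register dump give back CR2. */
void demo_fault_address(void)
{
    assert(fault_address(0x40000000000fffd9ULL, 0x78) == 0xffff8800fffd9078ULL);
}
```

With these inputs the intermediate value is 0xfffd9078, which matches RCX in the dump, and the sum matches CR2. That offset sits just below the 4 GiB mark, well above the 1024M given to dom0, which would be consistent with the direct-map access faulting ("PUD 0"), though the thread itself does not confirm this reading.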
Sander Eikelenboom
2012-Sep-04 20:13 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 7:58:41 PM, you wrote:

> On Tue, Sep 04, 2012 at 08:02:41PM +0200, Sander Eikelenboom wrote:
>>
>> Tuesday, September 4, 2012, 6:39:03 PM, you wrote:
>>
>> > On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> >> Hi Konrad,
>> >>
>> >> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> >> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>> >>
>> >> Dom0 and guest kernel are 3.6.0-rc4 with config:
>> >> [*] Xen memory balloon driver
>> >> [*] Scrub pages before returning them to system
>>
>> > Can you also try this patch out and provide the full log (bootup and such). Thanks!
>>
>> After applying this patch and due to the removal of the BUG_ON the domU boots and is reachable by SSH.
>> Serial log attached.

> Wow. That is a lot of .. And if you use Xen 4.1 it works fine?

Ok 3.5.3 crashes as well .. will see what xen 4.1.3 does with both kernels on this machine ...
Robert Phillips
2012-Sep-04 20:27 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Ben,

You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
Here are my findings.

I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.

That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c which is where I made my change.

The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.

kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.

The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c

Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359

My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.

I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.

-- Robert Phillips

-----Original Message-----
From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
Sent: Tuesday, September 04, 2012 3:35 PM
To: Ben Guthro
Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Robert Phillips
Subject: Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set

Tuesday, September 4, 2012, 8:07:11 PM, you wrote:

> We ran into the same issue, in newer kernels - but had not yet
> submitted this fix.

> One of the developers here came up with a fix (attached, and CC'ed
> here) that fixes an issue where the p2m code reuses a structure member
> where it shouldn't.
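The field-overloading problem Robert describes can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: `struct map_op_sketch` is a hypothetical, trimmed-down stand-in for struct gnttab_map_grant_ref, and `fake_map_grant_ref()` merely imitates the hypervisor writing its result into dev_bus_addr. None of this is the actual patch or the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct gnttab_map_grant_ref. */
struct map_op_sketch {
    uint64_t dev_bus_addr; /* output slot that the hypervisor fills in */
    uint64_t old_mfn;      /* dedicated slot, in the spirit of the patch */
};

/* Stand-in for the hypervisor side writing its result into the op. */
static void fake_map_grant_ref(struct map_op_sketch *op, uint64_t bus_addr)
{
    op->dev_bus_addr = bus_addr;
}

/* Old mfn parked in dev_bus_addr: a later hypervisor write clobbers it. */
static uint64_t old_mfn_when_overloaded(uint64_t old_mfn, uint64_t bus_addr)
{
    struct map_op_sketch op = { 0, 0 };

    op.dev_bus_addr = old_mfn;
    fake_map_grant_ref(&op, bus_addr);
    return op.dev_bus_addr;
}

/* Old mfn kept in its own member: the hypervisor write leaves it alone. */
static uint64_t old_mfn_when_dedicated(uint64_t old_mfn, uint64_t bus_addr)
{
    struct map_op_sketch op = { 0, 0 };

    op.old_mfn = old_mfn;
    fake_map_grant_ref(&op, bus_addr);
    return op.old_mfn;
}

/* Usage: the overloaded variant loses the value, the dedicated one keeps it. */
void demo_old_mfn(void)
{
    assert(old_mfn_when_overloaded(0x1234, 0xabcd000) == 0xabcd000);
    assert(old_mfn_when_dedicated(0x1234, 0xabcd000) == 0x1234);
}
```

Calling demo_old_mfn() from a main() runs both checks. In the overloaded case the value read back at unmap time is whatever the hypervisor wrote last, which is the kind of stale bookkeeping the BUG_ON in balloon.c could then trip over.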
Sander Eikelenboom
2012-Sep-04 21:23 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 4, 2012, 7:58:41 PM, you wrote:

> On Tue, Sep 04, 2012 at 08:02:41PM +0200, Sander Eikelenboom wrote:
>>
>> Tuesday, September 4, 2012, 6:39:03 PM, you wrote:
>>
>> > On Tue, Sep 04, 2012 at 06:37:57PM +0200, Sander Eikelenboom wrote:
>> >> Hi Konrad,
>> >>
>> >> This seems to happen only on a intel machine i'm trying to setup as a development machine (haven't seen it on my amd).
>> >> It boots fine, i have dom0_mem=1024M,max:1024M set, the machine has 2G of mem.
>> >>
>> >> Dom0 and guest kernel are 3.6.0-rc4 with config:
>> >> [*] Xen memory balloon driver
>> >> [*] Scrub pages before returning them to system
>>
>> > Can you also try this patch out and provide the full log (bootup and such). Thanks!
>>
>> After applying this patch and due to the removal of the BUG_ON the domU boots and is reachable by SSH.
>> Serial log attached.

> Wow. That is a lot of .. And if you use Xen 4.1 it works fine?

Ok .. to sum it up after todays compile day :-p

- xen-4.2.0-rc4-pre + linux 3.6-rc4 -> BUG_ON on start PV guest
- xen-4.2.0-rc4-pre + linux 3.5.3 -> BUG_ON on start PV guest
- xen-4.1.4-pre + linux 3.5.3 -> BUG_ON on start PV guest
- xen-4.1.4-pre + linux 3.4.1 -> Works OK
- xen-4.2.0-rc4-pre + linux 3.6-rc4 -> Works, BUG_ON removed by patch (http://lists.xen.org/archives/html/xen-devel/2012-09/msg00142.html)
Konrad Rzeszutek Wilk
2012-Sep-05 14:06 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
> Ben,
>
> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
> Here are my findings.
>
> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
>
> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c

And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..

> which is where I made my change.
>
> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
>
> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.

Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
used anymore..

>
> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c

Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
Even before this patch set?

>
> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359
>

Hmm, I believe the storage for holding the old mfn was/is page->index.

> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
>
> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.

Right. Somehow he ends up with valid mappings where there should be none. And lots of them.

>
> -- Robert Phillips
>
>
> -----Original Message-----
> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
> Sent: Tuesday, September 04, 2012 3:35 PM
> To: Ben Guthro
> Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Robert Phillips
> Subject: Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
>
>
> Tuesday, September 4, 2012, 8:07:11 PM, you wrote:
>
> > We ran into the same issue, in newer kernels - but had not yet
> > submitted this fix.
>
> > One of the developers here came up with a fix (attached, and CC'ed
> > here) that fixes an issue where the p2m code reuses a structure member
> > where it shouldn't.
> > The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
> > structure, instead of re-using dev_bus_addr.
>
>
> > If this also works for you, I can re-submit it with a Signed-off-by
> > line, if you prefer, Konrad.
>
> Hi Ben,
>
> This patch doesn't work for me:
>
> When starting the PV-guest i get:
>
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (68b69070).
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
Sander Eikelenboom
2012-Sep-05 14:38 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Wednesday, September 5, 2012, 4:06:01 PM, you wrote:

> On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
>> Ben,
>>
>> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
>> Here are my findings.
>>
>> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
>>
>> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c

> And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..

>> which is where I made my change.
>>
>> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
>>
>> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.

> Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
> use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
> used anymore..

>>
>> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c

> Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
> Even before this patch set?
>>
>> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359
>>

> Hmm, I believe the storage for holding the old mfn was/is page->index.

>> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
>>
>> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.

> Right. Somehow he ends up with valid mappings where there should be none. And lots of them.

It's something between kernel v3.4.1 and v3.5.3, haven't had time to narrow it down yet.
Any suggestions for specific commits i could try to quickly bisect this one ?

>>
>> -- Robert Phillips
>>
>>
>> -----Original Message-----
>> From: Sander Eikelenboom [mailto:linux@eikelenboom.it]
>> Sent: Tuesday, September 04, 2012 3:35 PM
>> To: Ben Guthro
>> Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Robert Phillips
>> Subject: Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
>>
>>
>> Tuesday, September 4, 2012, 8:07:11 PM, you wrote:
>>
>> > We ran into the same issue, in newer kernels - but had not yet
>> > submitted this fix.
>>
>> > One of the developers here came up with a fix (attached, and CC'ed
>> > here) that fixes an issue where the p2m code reuses a structure member
>> > where it shouldn't.
>> > The patch adds a new "old_mfn" member to the gnttab_map_grant_ref
>> > structure, instead of re-using dev_bus_addr.
>>
>>
>> > If this also works for you, I can re-submit it with a Signed-off-by
>> > line, if you prefer, Konrad.
>>
>> Hi Ben,
>>
>> This patch doesn't work for me:
>>
>> When starting the PV-guest i get:
>>
>> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (68b69070).
>> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
>> (XEN) [2012-09-04 20:31:37] grant_table.c:499:d0 Bad flags in grant map op (0).
Konrad Rzeszutek Wilk
2012-Sep-05 20:19 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Wed, Sep 05, 2012 at 04:38:48PM +0200, Sander Eikelenboom wrote:
>
> Wednesday, September 5, 2012, 4:06:01 PM, you wrote:
>
> > On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
> >> Ben,
> >>
> >> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
> >> Here are my findings.
> >>
> >> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
> >>
> >> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
>
> > And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..
>
> >> which is where I made my change.
> >>
> >> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
> >>
> >> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
>
> > Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
> > use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
> > used anymore..
>
> >>
> >> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c
>
> > Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
> > Even before this patch set?
> >>
> >> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359
> >>
>
> > Hmm, I believe the storage for holding the old mfn was/is page->index.
>
>
> >> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
> >>
> >> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.
>
> > Right. Somehow he ends up with valid mappings where there should be none. And lots of them.
>
> It's something between kernel v3.4.1 and v3.5.3, haven't had time to narrow it down yet.
> Any suggestions for specific commits i could try to quickly bisect this one ?

These are the ones that went in:

ea61fc0 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
6878c32 xen/blkfront: Add WARN to deal with misbehaving backends.
5e62625 xen/setup: filter APERFMPERF cpuid feature out
8c9ce60 xen/blkback: Copy id field when doing BLKIF_DISCARD.
58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
780dbcd xen/pci: Check for PCI bridge before using it.
5e152e6 xen/events: Add WARN_ON when quick lookup found invalid type.
5842f57 xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
a32c88b xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN
2e5ad6b xen/hvc: Collapse error logic.
7664810 xen: do not disable netfront in dom0
68c2c39 xen: do not map the same GSI twice in PVHVM guests.
201a52b hvc_xen: NULL dereference on allocation failure
d79d595 xen: Add selfballoning memory reservation tunable.
d2fb4c5 xenbus: Add support for xenbus backend in stub domain
2f1bd67 xen/smp: unbind irqworkX when unplugging vCPUs.
87e4baa x86/xen/apic: Add missing #include <xen/xen.h>
323f90a xen-acpi-processor: Add missing #include <xen/xen.h>
8605067 xen-blkfront: module exit handling adjustments
e77c78c xen-blkfront: properly name all devices
f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
211063d xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
1ff2b0c xen: implement IRQ_WORK_VECTOR handler
f447d56 xen: implement apic ipi interface
83d51ab xen/setup: update VA mapping when releasing memory during setup
96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
9438ef7 x86/apic: Fix UP boot crash
ab6ec39 xen/apic: implement io apic read with hypercall
27abd14 Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
31b3c9d xen/x86: Implement x86_apic_ops
4a8e2a3 x86/apic: Replace io_apic_ops with x86_io_apic_ops.
977f857 PCI: move mutex locking out of pci_dev_reset function
569ca5b xen/gnttab: add deferred freeing logic
9fe2a70 debugfs: Add support to print u32 array in debugfs
940713b xen/p2m: An early bootup variant of set_phys_to_machine
d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
3f3aaea xen/p2m: Move code around to allow for better re-usage.

Narrowing this down (so ignore APIC bootup, drivers, etc) these could be it:

b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
d79d595 xen: Add selfballoning memory reservation tunable.
f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
83d51ab xen/setup: update VA mapping when releasing memory during setup
96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
940713b xen/p2m: An early bootup variant of set_phys_to_machine
d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
3f3aaea xen/p2m: Move code around to allow for better re-usage.

About nine of them deal with dom0_mem=max ballooning up right, so if you ignore those:

b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
d79d595 xen: Add selfballoning memory reservation tunable.
f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls

Try reverting any of those. And if nothing works there then we can try to revert the ones that deal with 'dom0_mem=max:XX'..

I also need to be able to reproduce this. You said you can only reproduce this on your Intel box - is this a fast Intel machine? It also looks like you only have 2GB in the machine - and reserve 1GB to the dom0.

If you manually (so don't start the guest), balloon down - say to 512MB and then launch a guest do you see this problem?
Sander Eikelenboom
2012-Sep-05 22:52 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Wednesday, September 5, 2012, 10:19:33 PM, you wrote:

> On Wed, Sep 05, 2012 at 04:38:48PM +0200, Sander Eikelenboom wrote:
>>
>> Wednesday, September 5, 2012, 4:06:01 PM, you wrote:
>>
>> > On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
>> >> Ben,
>> >>
>> >> You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
>> >> Here are my findings.
>> >>
>> >> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
>> >>
>> >> That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
>>
>> > And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..
>>
>> >> which is where I made my change.
>> >>
>> >> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
>> >>
>> >> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
>>
>> > Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
>> > use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
>> > used anymore..
>>
>> >>
>> >> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c
>>
>> > Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
>> > Even before this patch set?
>> >>
>> >> Since the storage holding the old mfn got overwritten, the unmapping was being done incorrectly. The balloon code detected that and bugged at drivers/xen/balloon.c:359
>> >>
>>
>> > Hmm, I believe the storage for holding the old mfn was/is page->index.
>> >
>> >> My patch simply adds another member called old_mfn to struct gnttab_map_grant_ref rather than trying to overload dev_bus_addr.
>> >>
>> >> I don't know if Sander's bug is the same or related. The BUG_ON at drivers/xen/balloon.c:359 is quite general. It simply asserts that we are not trying to re-map a valid mapping.
>>
>> > Right. Somehow he ends up with valid mappings where there should be none. And lots of them.
>>
>> It's something between kernel v3.4.1 and v3.5.3, haven't had time to narrow it down yet.
>> Any suggestions for specific commits i could try to quickly bisect this one ?

> These are the ones that went in:

> ea61fc0 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> 6878c32 xen/blkfront: Add WARN to deal with misbehaving backends.
> 5e62625 xen/setup: filter APERFMPERF cpuid feature out
> 8c9ce60 xen/blkback: Copy id field when doing BLKIF_DISCARD.
> 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
> 780dbcd xen/pci: Check for PCI bridge before using it.
> 5e152e6 xen/events: Add WARN_ON when quick lookup found invalid type.
> 5842f57 xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
> a32c88b xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN
> 2e5ad6b xen/hvc: Collapse error logic.
> 7664810 xen: do not disable netfront in dom0
> 68c2c39 xen: do not map the same GSI twice in PVHVM guests.
> 201a52b hvc_xen: NULL dereference on allocation failure
> d79d595 xen: Add selfballoning memory reservation tunable.
> d2fb4c5 xenbus: Add support for xenbus backend in stub domain
> 2f1bd67 xen/smp: unbind irqworkX when unplugging vCPUs.
> 87e4baa x86/xen/apic: Add missing #include <xen/xen.h>
> 323f90a xen-acpi-processor: Add missing #include <xen/xen.h>
> 8605067 xen-blkfront: module exit handling adjustments
> e77c78c xen-blkfront: properly name all devices
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
> 211063d xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
> 1ff2b0c xen: implement IRQ_WORK_VECTOR handler
> f447d56 xen: implement apic ipi interface
> 83d51ab xen/setup: update VA mapping when releasing memory during setup
> 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
> 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
> ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
> 9438ef7 x86/apic: Fix UP boot crash
> ab6ec39 xen/apic: implement io apic read with hypercall
> 27abd14 Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
> 31b3c9d xen/x86: Implement x86_apic_ops
> 4a8e2a3 x86/apic: Replace io_apic_ops with x86_io_apic_ops.
> 977f857 PCI: move mutex locking out of pci_dev_reset function
> 569ca5b xen/gnttab: add deferred freeing logic
> 9fe2a70 debugfs: Add support to print u32 array in debugfs
> 940713b xen/p2m: An early bootup variant of set_phys_to_machine
> d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
> cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
> 3f3aaea xen/p2m: Move code around to allow for better re-usage.

> Narrowing this down (so ignore APIC bootup, drivers, etc) these could be it:

> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated.
> d79d595 xen: Add selfballoning memory reservation tunable.
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
> 83d51ab xen/setup: update VA mapping when releasing memory during setup
> 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar.
> 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
> ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
> 940713b xen/p2m: An early bootup variant of set_phys_to_machine
> d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
> cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
> 3f3aaea xen/p2m: Move code around to allow for better re-usage.

> About nine of them deal with dom0_mem=max ballooning up right, so if you
> ignore those:

> b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> d79d595 xen: Add selfballoning memory reservation tunable.
> f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls

> Try reverting any of those.

Ah, I missed your email since my hosting provider was down :-(
But anyway, I did a git bisect in the meantime, and it leads to:

[f62805f1f30a40e354bd036b4cb799863a39be4b] xen: enter/exit lazy_mmu_mode around m2p_override calls

> And if nothing works there then we can try to revert the ones that
> deal with 'dom0_mem=max:XX'..

> I also need to be able to reproduce this. You said you can only reproduce this
> on your Intel box - is this a fast Intel machine? It also looks like you only
> have 2GB in the machine - and reserve 1GB to the dom0.

Machine is a quad core Q9400 @ 2.66GHz, not very fast .. not very slow either

> If you manually (so don't start the guest), balloon down - say to 512MB and then launch
> a guest do you see this problem?

Should I use

xl mem-max domain-id mem

or

xl mem-set domain-id mem

for that ?

Perhaps a silly question, but why is it ballooning anyway ?
I have set dom0's memory and there is enough left to create the domain ... or at least there should be ...

--
Sander
Konrad Rzeszutek Wilk
2012-Sep-06 10:57 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
> > About nine of them deal with dom0_mem=max ballooning up right, so if you
> > ignore those:
>
> > b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
> > d79d595 xen: Add selfballoning memory reservation tunable.
> > f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
>
> > Try reverting any of those.
>
> Ah i missed your email since my hostingprovider was down :-(
> But anyway done a git bisect in the mean time that leads to:
>
> [f62805f1f30a40e354bd036b4cb799863a39be4b] xen: enter/exit lazy_mmu_mode around m2p_override calls

OK. Hmm, that will take a bit of thinking to fix.

>
> > And if nothing works there then we can try to revert the ones that
> > deal with 'dom0_mem=max:XX'..
>
> > I also need to be able to reproduce this. You said you can only reproduce this
> > on your Intel box - is this a fast Intel machine? It also looks like you only
> > have 2GB in the machine - and reserve 1GB to the dom0.
>
> Machine is a quad core q9400 @ 2.66GHz, not very fast .. not very slow either

That is a fast machine. I was thinking you had a Core2 Solo or a Pentium IV Prescott.

>
> > If you manually (so don't start the guest), balloon down - say to 512MB and then launch
> > a guest do you see this problem?
>
>
> Should i use
>
> xl mem-max domain-id mem
>
> or
>
> xl mem-set domain-id mem

The latter.

>
> for that ?
>
>
> Perhaps a silly question, but why is it ballooning anyway ?
> I have set dom0's memory and there is enough left to create the domain ... or at least there should be ...

There was a bug in xl that would autoballoon. You can turn it off using some xl.conf file.

>
> --
> Sander
Sander Eikelenboom
2012-Sep-06 11:16 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Thursday, September 6, 2012, 12:57:46 PM, you wrote:

>> > About nine of them deal with dom0_mem=max ballooning up right, so if you
>> > ignore those:
>>
>> > b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
>> > d79d595 xen: Add selfballoning memory reservation tunable.
>> > f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
>>
>> > Try reverting any of those.
>>
>> Ah i missed your email since my hostingprovider was down :-(
>> But anyway done a git bisect in the mean time that leads to:
>>
>> [f62805f1f30a40e354bd036b4cb799863a39be4b] xen: enter/exit lazy_mmu_mode around m2p_override calls

> OK. Hmm, that will take a bit of thinking to fix.

>>
>> > And if nothing works there then we can try to revert the ones that
>> > deal with 'dom0_mem=max:XX'..
>>
>> > I also need to be able to reproduce this. You said you can only reproduce this
>> > on your Intel box - is this a fast Intel machine? It also looks like you only
>> > have 2GB in the machine - and reserve 1GB to the dom0.
>>
>> Machine is a quad core q9400 @ 2.66GHz, not very fast .. not very slow either

> That is a fast machine. I was thinking you had a Core2 Solo or a Pentium IV Prescott.

>>
>> > If you manually (so don't start the guest), balloon down - say to 512MB and then launch
>> > a guest do you see this problem?
>>
>>
>> Should i use
>>
>> xl mem-max domain-id mem
>>
>> or
>>
>> xl mem-set domain-id mem

Will test that shortly.

> The latter.

>>
>> for that ?
>>
>>
>> Perhaps a silly question, but why is it ballooning anyway ?
>> I have set dom0's memory and there is enough left to create the domain ... or at least there should be ...

> There was a bug in xl that would autoballoon. You can turn it off using some xl.conf file.

Should that have been fixed ?

A) You describe it as a bug, so even without tinkering with the default xl.conf, the tools shouldn't be autoballooning when "dom0_mem=X, max:X" is set, right ?
B) Or was it fixed by letting the user turn it off in xl.conf ?

If A .. that would mean the "bug" is still present in xen-4.2-rc4 ...
Because i was under the impression the "dom0_mem=X, max:X" would prevent the whole autoballooning stuff :-)

>>
>> --
>> Sander
Sander Eikelenboom
2012-Sep-06 16:46 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Thursday, September 6, 2012, 12:57:46 PM, you wrote:

>> > About nine of them deal with dom0_mem=max ballooning up right, so if you
>> > ignore those:
>>
>> > b9e0d95 xen: mark local pages as FOREIGN in the m2p_override
>> > d79d595 xen: Add selfballoning memory reservation tunable.
>> > f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls
>>
>> > Try reverting any of those.
>>
>> Ah i missed your email since my hostingprovider was down :-(
>> But anyway done a git bisect in the mean time that leads to:
>>
>> [f62805f1f30a40e354bd036b4cb799863a39be4b] xen: enter/exit lazy_mmu_mode around m2p_override calls

> OK. Hmm, that will take a bit of thinking to fix.

>>
>> > And if nothing works there then we can try to revert the ones that
>> > deal with 'dom0_mem=max:XX'..
>>
>> > I also need to be able to reproduce this. You said you can only reproduce this
>> > on your Intel box - is this a fast Intel machine? It also looks like you only
>> > have 2GB in the machine - and reserve 1GB to the dom0.
>>
>> Machine is a quad core q9400 @ 2.66GHz, not very fast .. not very slow either

> That is a fast machine. I was thinking you had a Core2 Solo or a Pentium IV Prescott.

>>
>> > If you manually (so don't start the guest), balloon down - say to 512MB and then launch
>> > a guest do you see this problem?
>>
>>
>> Should i use
>>
>> xl mem-max domain-id mem
>>
>> or
>>
>> xl mem-set domain-id mem

> The latter.

Ok, tested that as well:

- After "xl mem-set 0 768M", xentop reports the new mem values (free and for dom0) correctly, nothing else happens.
- After a small wait, i tried to start a guest and it crashes dom0 with the ballooning as before.

>>
>> for that ?
>>
>>
>> Perhaps a silly question, but why is it ballooning anyway ?
>> I have set dom0's memory and there is enough left to create the domain ... or at least there should be ...

> There was a bug in xl that would autoballoon. You can turn it off using some xl.conf file.

>>
>> --
>> Sander
Stefano Stabellini
2012-Sep-11 16:02 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Wed, 5 Sep 2012, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
> > Ben,
> >
> > You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
> > Here are my findings.
> >
> > I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
> >
> > That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
>
> And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..
>
> > which is where I made my change.
> >
> > The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
> >
> > kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
>
> Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
> use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
> used anymore..
>
> >
> > The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c
>
> Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
> Even before this patch set?

I think that Robert identified the real problem: dev_bus_addr shouldn't
have been used here. However the bug only shows up if we are batching
the grant table operations, that we started doing since
f62805f1f30a40e354bd036b4cb799863a39be4b.
That's why Sander's bisection found that
f62805f1f30a40e354bd036b4cb799863a39be4b is the culprit.

However the fix is incorrect because it is modifying a struct that is
part of the Xen ABI.
I am appending an alternative fix that doesn't need any changes to
public headers.

Sander, could you please let me know if it fixes the problem for you?

---

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 93971e8..472b9b7 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -51,7 +51,8 @@ extern unsigned long set_phys_range_identity(unsigned long pfn_s,
 
 extern int m2p_add_override(unsigned long mfn, struct page *page,
 			    struct gnttab_map_grant_ref *kmap_op);
-extern int m2p_remove_override(struct page *page, bool clear_pte);
+extern int m2p_remove_override(struct page *page,
+				struct gnttab_map_grant_ref *kmap_op);
 extern struct page *m2p_find_override(unsigned long mfn);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
 
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 64effdc..2825594 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -734,9 +734,6 @@ int m2p_add_override(unsigned long mfn, struct page *page,
 
 			xen_mc_issue(PARAVIRT_LAZY_MMU);
 		}
-		/* let's use dev_bus_addr to record the old mfn instead */
-		kmap_op->dev_bus_addr = page->index;
-		page->index = (unsigned long) kmap_op;
 	}
 	spin_lock_irqsave(&m2p_override_lock, flags);
 	list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]);
@@ -763,7 +760,8 @@ int m2p_add_override(unsigned long mfn, struct page *page,
 	return 0;
 }
 EXPORT_SYMBOL_GPL(m2p_add_override);
-int m2p_remove_override(struct page *page, bool clear_pte)
+int m2p_remove_override(struct page *page,
+		struct gnttab_map_grant_ref *kmap_op)
 {
 	unsigned long flags;
 	unsigned long mfn;
@@ -793,10 +791,8 @@ int m2p_remove_override(struct page *page, bool clear_pte)
 	WARN_ON(!PagePrivate(page));
 	ClearPagePrivate(page);
 
-	if (clear_pte) {
-		struct gnttab_map_grant_ref *map_op =
-			(struct gnttab_map_grant_ref *) page->index;
-		set_phys_to_machine(pfn, map_op->dev_bus_addr);
+	set_phys_to_machine(pfn, page->index);
+	if (kmap_op != NULL) {
 		if (!PageHighMem(page)) {
 			struct multicall_space mcs;
 			struct gnttab_unmap_grant_ref *unmap_op;
@@ -808,13 +804,13 @@ int m2p_remove_override(struct page *page, bool clear_pte)
 			 * issued. In this case handle is going to -1 because
 			 * it hasn't been modified yet.
 			 */
-			if (map_op->handle == -1)
+			if (kmap_op->handle == -1)
 				xen_mc_flush();
 			/*
-			 * Now if map_op->handle is negative it means that the
+			 * Now if kmap_op->handle is negative it means that the
 			 * hypercall actually returned an error.
 			 */
-			if (map_op->handle == GNTST_general_error) {
+			if (kmap_op->handle == GNTST_general_error) {
 				printk(KERN_WARNING "m2p_remove_override: "
 						"pfn %lx mfn %lx, failed to modify kernel mappings",
 						pfn, mfn);
@@ -824,8 +820,8 @@ int m2p_remove_override(struct page *page, bool clear_pte)
 			mcs = xen_mc_entry(
 					sizeof(struct gnttab_unmap_grant_ref));
 			unmap_op = mcs.args;
-			unmap_op->host_addr = map_op->host_addr;
-			unmap_op->handle = map_op->handle;
+			unmap_op->host_addr = kmap_op->host_addr;
+			unmap_op->handle = kmap_op->handle;
 			unmap_op->dev_bus_addr = 0;
 
 			MULTI_grant_table_op(mcs.mc,
@@ -836,10 +832,9 @@ int m2p_remove_override(struct page *page, bool clear_pte)
 			set_pte_at(&init_mm, address, ptep,
 					pfn_pte(pfn, PAGE_KERNEL));
 			__flush_tlb_single(address);
-			map_op->host_addr = 0;
+			kmap_op->host_addr = 0;
 		}
-	} else
-		set_phys_to_machine(pfn, page->index);
+	}
 
 	/* p2m(m2p(mfn)) == FOREIGN_FRAME(mfn): the mfn is already present
 	 * somewhere in this domain, even before being added to the
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 73f196c..c6decb9 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -337,7 +337,7 @@ static void xen_blkbk_unmap(struct pending_req *req)
 		invcount++;
 	}
 
-	ret = gnttab_unmap_refs(unmap, pages, invcount, false);
+	ret = gnttab_unmap_refs(unmap, NULL, pages, invcount);
 	BUG_ON(ret);
 }
 
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 1ffd03b..7f12416 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -314,8 +314,9 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
-	err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset,
-				pages, true);
+	err = gnttab_unmap_refs(map->unmap_ops + offset,
+			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
+			pages);
 	if (err)
 		return err;
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 0bfc1ef..0067266 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -870,7 +870,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 EXPORT_SYMBOL_GPL(gnttab_map_refs);
 
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		struct page **pages, unsigned int count, bool clear_pte)
+		struct gnttab_map_grant_ref *kmap_ops,
+		struct page **pages, unsigned int count)
 {
 	int i, ret;
 	bool lazy = false;
@@ -888,7 +889,8 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 	}
 
 	for (i = 0; i < count; i++) {
-		ret = m2p_remove_override(pages[i], clear_pte);
+		ret = m2p_remove_override(pages[i], kmap_ops ?
+				       &kmap_ops[i] : NULL);
 		if (ret)
 			return ret;
 	}
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 11e27c3..f19fff8 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -187,6 +187,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		      struct page **pages, unsigned int count, bool clear_pte);
+		      struct gnttab_map_grant_ref *kunmap_ops,
+		      struct page **pages, unsigned int count);
 
 #endif /* __ASM_GNTTAB_H__ */
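
The failure mode described in this message (an old mfn stashed in dev_bus_addr and then clobbered once the batched hypercall is finally issued) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct map_op`, `struct page_model`, `queue_map()`, `flush_batch()` and the two `old_mfn_via_*()` helpers are hypothetical names invented for illustration, and the "hypervisor" is reduced to a loop that writes the output field of every queued operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gnttab_map_grant_ref: only the
 * output field that the hypervisor writes on completion is modelled. */
struct map_op {
	uint64_t dev_bus_addr;
};

/* Hypothetical stand-in for the part of struct page used here. */
struct page_model {
	unsigned long index;
};

#define BATCH_MAX 8
static struct map_op *batch[BATCH_MAX];
static size_t batch_len;

/* Queue an operation instead of issuing it at once (models lazy MMU mode). */
static void queue_map(struct map_op *op)
{
	assert(batch_len < BATCH_MAX);
	batch[batch_len++] = op;
}

/* Issue the batch: the modelled hypervisor fills in dev_bus_addr
 * for every queued operation, overwriting whatever was there. */
static void flush_batch(uint64_t machine_addr)
{
	for (size_t i = 0; i < batch_len; i++)
		batch[i]->dev_bus_addr = machine_addr;
	batch_len = 0;
}

/* Old scheme: stash the old mfn in op->dev_bus_addr while the operation
 * is still queued. Returns what is read back after the flush. */
unsigned long old_mfn_via_dev_bus_addr(unsigned long old_mfn,
				       uint64_t machine_addr)
{
	struct map_op op = { 0 };

	queue_map(&op);
	op.dev_bus_addr = old_mfn;	/* stash in an output field */
	flush_batch(machine_addr);	/* deferred write clobbers the stash */
	return (unsigned long)op.dev_bus_addr;
}

/* Scheme used by the patch above: the old mfn stays in page->index,
 * which the batched operation never touches. */
unsigned long old_mfn_via_page_index(unsigned long old_mfn,
				     uint64_t machine_addr)
{
	struct map_op op = { 0 };
	struct page_model page = { .index = old_mfn };

	queue_map(&op);
	flush_batch(machine_addr);
	return page.index;
}
```

In this model the first helper hands back the machine address written at flush time rather than the old mfn, which corresponds to restoring a wrong p2m entry on unmap; the second helper returns the old mfn unchanged. Without batching the write would land before the stash, which is consistent with the bug only appearing after f62805f.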
Sander Eikelenboom
2012-Sep-12 10:28 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Tuesday, September 11, 2012, 6:02:47 PM, you wrote:> On Wed, 5 Sep 2012, Konrad Rzeszutek Wilk wrote: >> On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote: >> > Ben, >> > >> > You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today. >> > Here are my findings. >> > >> > I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c. >> > >> > That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c >> >> And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address.. >> >> > which is where I made my change. >> > >> > The unpatched code was saving the pfn''s old mfn in kmap_op->dev_bus_addr. >> > >> > kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly. >> >> Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should >> use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not >> used anymore.. >> >> > >> > The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c >> >> Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then? >> Even before this patch set?> I think that Robert identified the real problem: dev_bus_addr shouldn''t > have been used here. However the bug only shows up if we are batching > the grant table operations, that we started doing since > f62805f1f30a40e354bd036b4cb799863a39be4b. > That''s why Sander''s bisection found that > f62805f1f30a40e354bd036b4cb799863a39be4b is the culprit.> However the fix is incorrect because it is modifying a struct that is > part of the Xen ABI. 
> I am appending an alternative fix that doesn''t need any changes to > public headers.> Sander, could you please let me know if it fixes the problem for you?It does ! Tested-By: Sander Eikelenboom <linux@eikelenboom.it>> ---> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h > index 93971e8..472b9b7 100644 > --- a/arch/x86/include/asm/xen/page.h > +++ b/arch/x86/include/asm/xen/page.h > @@ -51,7 +51,8 @@ extern unsigned long set_phys_range_identity(unsigned long pfn_s, > > extern int m2p_add_override(unsigned long mfn, struct page *page, > struct gnttab_map_grant_ref *kmap_op); > -extern int m2p_remove_override(struct page *page, bool clear_pte); > +extern int m2p_remove_override(struct page *page, > + struct gnttab_map_grant_ref *kmap_op); > extern struct page *m2p_find_override(unsigned long mfn); > extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn); > > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c > index 64effdc..2825594 100644 > --- a/arch/x86/xen/p2m.c > +++ b/arch/x86/xen/p2m.c > @@ -734,9 +734,6 @@ int m2p_add_override(unsigned long mfn, struct page *page, > > xen_mc_issue(PARAVIRT_LAZY_MMU); > } > - /* let''s use dev_bus_addr to record the old mfn instead */ > - kmap_op->dev_bus_addr = page->index; > - page->index = (unsigned long) kmap_op; > } > spin_lock_irqsave(&m2p_override_lock, flags); > list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]); > @@ -763,7 +760,8 @@ int m2p_add_override(unsigned long mfn, struct page *page, > return 0; > } > EXPORT_SYMBOL_GPL(m2p_add_override); > -int m2p_remove_override(struct page *page, bool clear_pte) > +int m2p_remove_override(struct page *page, > + struct gnttab_map_grant_ref *kmap_op) > { > unsigned long flags; > unsigned long mfn; > @@ -793,10 +791,8 @@ int m2p_remove_override(struct page *page, bool clear_pte) > WARN_ON(!PagePrivate(page)); > ClearPagePrivate(page); > > - if (clear_pte) { > - struct gnttab_map_grant_ref *map_op > - (struct 
gnttab_map_grant_ref *) page->index;
> -		set_phys_to_machine(pfn, map_op->dev_bus_addr);
> +	set_phys_to_machine(pfn, page->index);
> +	if (kmap_op != NULL) {
>  		if (!PageHighMem(page)) {
>  			struct multicall_space mcs;
>  			struct gnttab_unmap_grant_ref *unmap_op;
> @@ -808,13 +804,13 @@ int m2p_remove_override(struct page *page, bool clear_pte)
>  			 * issued. In this case handle is going to -1 because
>  			 * it hasn't been modified yet.
>  			 */
> -			if (map_op->handle == -1)
> +			if (kmap_op->handle == -1)
>  				xen_mc_flush();
>  			/*
> -			 * Now if map_op->handle is negative it means that the
> +			 * Now if kmap_op->handle is negative it means that the
>  			 * hypercall actually returned an error.
>  			 */
> -			if (map_op->handle == GNTST_general_error) {
> +			if (kmap_op->handle == GNTST_general_error) {
>  				printk(KERN_WARNING "m2p_remove_override: "
>  						"pfn %lx mfn %lx, failed to modify kernel mappings",
>  						pfn, mfn);
> @@ -824,8 +820,8 @@ int m2p_remove_override(struct page *page, bool clear_pte)
>  			mcs = xen_mc_entry(
>  					sizeof(struct gnttab_unmap_grant_ref));
>  			unmap_op = mcs.args;
> -			unmap_op->host_addr = map_op->host_addr;
> -			unmap_op->handle = map_op->handle;
> +			unmap_op->host_addr = kmap_op->host_addr;
> +			unmap_op->handle = kmap_op->handle;
>  			unmap_op->dev_bus_addr = 0;
>
>  			MULTI_grant_table_op(mcs.mc,
> @@ -836,10 +832,9 @@ int m2p_remove_override(struct page *page, bool clear_pte)
>  			set_pte_at(&init_mm, address, ptep,
>  					pfn_pte(pfn, PAGE_KERNEL));
>  			__flush_tlb_single(address);
> -			map_op->host_addr = 0;
> +			kmap_op->host_addr = 0;
>  		}
> -	} else
> -		set_phys_to_machine(pfn, page->index);
> +	}
>
>  	/* p2m(m2p(mfn)) == FOREIGN_FRAME(mfn): the mfn is already present
>  	 * somewhere in this domain, even before being added to the
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 73f196c..c6decb9 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -337,7 +337,7 @@ static void xen_blkbk_unmap(struct pending_req *req)
>  		invcount++;
>  	}
>
> -	ret = gnttab_unmap_refs(unmap, pages, invcount, false);
> +	ret = gnttab_unmap_refs(unmap, NULL, pages, invcount);
>  	BUG_ON(ret);
>  }
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 1ffd03b..7f12416 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -314,8 +314,9 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
>  		}
>  	}
>
> -	err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset,
> -			pages, true);
> +	err = gnttab_unmap_refs(map->unmap_ops + offset,
> +			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
> +			pages);
>  	if (err)
>  		return err;
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 0bfc1ef..0067266 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -870,7 +870,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  EXPORT_SYMBOL_GPL(gnttab_map_refs);
>
>  int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> -		struct page **pages, unsigned int count, bool clear_pte)
> +		struct gnttab_map_grant_ref *kmap_ops,
> +		struct page **pages, unsigned int count)
>  {
>  	int i, ret;
>  	bool lazy = false;
> @@ -888,7 +889,8 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
>  	}
>
>  	for (i = 0; i < count; i++) {
> -		ret = m2p_remove_override(pages[i], clear_pte);
> +		ret = m2p_remove_override(pages[i], kmap_ops ?
> +				&kmap_ops[i] : NULL);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 11e27c3..f19fff8 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -187,6 +187,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  		struct gnttab_map_grant_ref *kmap_ops,
>  		struct page **pages, unsigned int count);
>  int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> -		struct page **pages, unsigned int count, bool clear_pte);
> +		struct gnttab_map_grant_ref *kunmap_ops,
> +		struct page **pages, unsigned int count);
>
>  #endif /* __ASM_GNTTAB_H__ */
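The call-site changes in the diff follow one pattern: the `bool clear_pte` argument goes away, and callers instead pass either an array of kernel map operations or NULL. The following is a minimal user-space sketch of that dispatch, not kernel code. Every name in it (`fake_kmap_op`, `fake_remove_override`, `fake_unmap_refs`, `demo_kernel_unmaps`) is a hypothetical stand-in invented for illustration; only the `kmap_ops ? &kmap_ops[i] : NULL` expression is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct gnttab_map_grant_ref. */
struct fake_kmap_op {
	int handle;
};

/*
 * Stand-in for m2p_remove_override(): it only counts how often a kernel
 * mapping would have to be torn down, i.e. how often kmap_op is non-NULL.
 */
static int fake_remove_override(int page_id, struct fake_kmap_op *kmap_op,
				unsigned int *kernel_unmaps)
{
	(void)page_id;
	if (kmap_op != NULL)
		(*kernel_unmaps)++;
	return 0;
}

/* Same per-page dispatch as the patched loop in gnttab_unmap_refs(). */
static int fake_unmap_refs(struct fake_kmap_op *kmap_ops, const int *pages,
			   unsigned int count, unsigned int *kernel_unmaps)
{
	unsigned int i;
	int ret;

	for (i = 0; i < count; i++) {
		ret = fake_remove_override(pages[i],
					   kmap_ops ? &kmap_ops[i] : NULL,
					   kernel_unmaps);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Unmap three pages either the way blkback does (kmap_ops == NULL) or the
 * way gntdev does with use_ptemod set, and return the kernel unmap count.
 */
static unsigned int demo_kernel_unmaps(int use_ptemod)
{
	struct fake_kmap_op kmap_ops[3] = { { 1 }, { 2 }, { 3 } };
	const int pages[3] = { 10, 11, 12 };
	unsigned int kernel_unmaps = 0;
	int ret;

	ret = fake_unmap_refs(use_ptemod ? kmap_ops : NULL, pages, 3,
			      &kernel_unmaps);
	assert(ret == 0);
	return kernel_unmaps;
}
```

In this sketch `demo_kernel_unmaps(0)` models the blkback caller and `demo_kernel_unmaps(1)` the gntdev caller. Passing the pointer explicitly means the unmap path no longer has to recover `kmap_op` from `page->index`, which is what frees `page->index` to hold the old mfn.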
Stefano Stabellini
2012-Sep-12 11:28 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Wed, 12 Sep 2012, Sander Eikelenboom wrote:
> Tuesday, September 11, 2012, 6:02:47 PM, you wrote:
>
> > On Wed, 5 Sep 2012, Konrad Rzeszutek Wilk wrote:
> >> On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote:
> >> > Ben,
> >> >
> >> > You have asked me to provide the rationale behind the gnttab_old_mfn patch, which you emailed to Sander earlier today.
> >> > Here are my findings.
> >> >
> >> > I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has changed from our previous version. It now calls gnttab_map_refs() in drivers/xen/grant-table.c.
> >> >
> >> > That function first calls HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls m2p_add_override() in p2m.c
> >>
> >> And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with the machine address..
> >>
> >> > which is where I made my change.
> >> >
> >> > The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr.
> >> >
> >> > kmap_op is of type struct gnttab_map_grant_ref. That data type is used to record grant table mappings so later they can be unmapped correctly.
> >>
> >> Right, but the blkback makes a distinction by passing NULL as kmap_op, which means it should
> >> use the old mechanism. Meaning that once the hypercall is done, the map_ops[i].bus_addr is not
> >> used anymore..
> >>
> >> >
> >> > The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c
> >>
> >> Uh, so the problem of saving the old mfn in dev_bus_addr has been there for a long long time then?
> >> Even before this patch set?
>
> > I think that Robert identified the real problem: dev_bus_addr shouldn't
> > have been used here. However the bug only shows up if we are batching
> > the grant table operations, that we started doing since
> > f62805f1f30a40e354bd036b4cb799863a39be4b.
> > That's why Sander's bisection found that
> > f62805f1f30a40e354bd036b4cb799863a39be4b is the culprit.
>
> > However the fix is incorrect because it is modifying a struct that is
> > part of the Xen ABI.
> > I am appending an alternative fix that doesn't need any changes to
> > public headers.
>
> > Sander, could you please let me know if it fixes the problem for you?
>
> It does !
>
> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
>

Thanks for testing!
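For readers following the discussion quoted in this message, here is a small user-space model of the failure mode, under one reading of the thread: the old mfn is stashed in `dev_bus_addr`, and the map hypercall, which runs later when operations are batched, then overwrites that field. Nothing here is kernel code. The types and helpers (`fake_map_op`, `fake_page`, `fake_map_hypercall`, `restore_mfn_buggy`, `restore_mfn_fixed`) are hypothetical and exist only to show the ordering problem and why keeping the value in `page->index`, as the alternative fix appears to do, avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gnttab_map_grant_ref (one field only). */
struct fake_map_op {
	uint64_t dev_bus_addr;	/* written by the "hypervisor" on map */
};

/* Hypothetical stand-in for struct page (only the field of interest). */
struct fake_page {
	unsigned long index;
};

/* Pretend hypercall: fills dev_bus_addr with the granted machine address. */
static void fake_map_hypercall(struct fake_map_op *op, uint64_t granted_addr)
{
	op->dev_bus_addr = granted_addr;
}

/*
 * Problematic ordering: the old mfn is stashed in the ABI struct first,
 * then the deferred (batched) hypercall overwrites the same field.
 */
static unsigned long restore_mfn_buggy(unsigned long old_mfn,
				       uint64_t granted_addr)
{
	struct fake_map_op op = { 0 };

	op.dev_bus_addr = old_mfn;		/* stash in the ABI struct */
	fake_map_hypercall(&op, granted_addr);	/* batched call runs later */
	return (unsigned long)op.dev_bus_addr;	/* no longer the old mfn */
}

/* Alternative: keep the old mfn in page->index, outside the ABI struct. */
static unsigned long restore_mfn_fixed(unsigned long old_mfn,
				       uint64_t granted_addr)
{
	struct fake_map_op op = { 0 };
	struct fake_page page = { 0 };

	page.index = old_mfn;
	fake_map_hypercall(&op, granted_addr);
	assert(op.dev_bus_addr == granted_addr);
	return page.index;			/* old mfn survives */
}
```

The point of the model is only that a field the hypervisor is entitled to write is a poor place for the guest to park private state. The model makes no claim about the exact sequence inside Xen or the kernel.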
Konrad Rzeszutek Wilk
2012-Sep-13 13:32 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
> > Sander, could you please let me know if it fixes the problem for you?
>
> It does !
>
> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>

Excellent. Applied. Thx for reporting and testing.
Robert Phillips
2012-Sep-13 13:42 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
In our tree, I have tested Stefano's patch (replacing the "gnttab_old_mfn" patch which Ben previously provided).
It seems to work just fine.

Thanks, Stefano.

-- rsp

-----Original Message-----
From: Konrad Rzeszutek [mailto:ketuzsezr@gmail.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Thursday, September 13, 2012 9:32 AM
To: Sander Eikelenboom
Cc: Stefano Stabellini; Robert Phillips; xen-devel@lists.xen.org; Ben Guthro; Konrad Rzeszutek Wilk
Subject: Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set

> > Sander, could you please let me know if it fixes the problem for you?
>
> It does !
>
> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>

Excellent. Applied. Thx for reporting and testing.
Conny Seidel
2012-Sep-14 14:53 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Hi,

On Thu, 13 Sep 2012 09:32:14 -0400
Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:

>> > Sander, could you please let me know if it fixes the problem for
>> > you?
>>
>> It does !
>>
>> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
>
>Excellent. Applied. Thx for reporting and testing.

Is it possible that this patch is backported to stable?

--
Kind regards.

Conny Seidel

##################################################################
# Email : conny.seidel@amd.com GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##################################################################
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach #
# General Managers: Alberto Bozzo #
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
# HRB Nr. 43632 #
##################################################################

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Sep-14 17:00 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Fri, Sep 14, 2012 at 04:53:33PM +0200, Conny Seidel wrote:
> Hi,
>
>
> On Thu, 13 Sep 2012 09:32:14 -0400
> Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
>
> >> > Sander, could you please let me know if it fixes the problem for
> >> > you?
> >>
> >> It does !
> >>
> >> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
> >
> >Excellent. Applied. Thx for reporting and testing.
>
> Is it possible that this patch is backported to stable?

It is on the stable release train.
Conny Seidel
2012-Sep-14 17:38 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Fri, 14 Sep 2012 13:00:42 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Fri, Sep 14, 2012 at 04:53:33PM +0200, Conny Seidel wrote:
>> Hi,
>>
>>
>> On Thu, 13 Sep 2012 09:32:14 -0400
>> Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
>>
>> >> > Sander, could you please let me know if it fixes the problem for
>> >> > you?
>> >>
>> >> It does !
>> >>
>> >> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
>> >
>> >Excellent. Applied. Thx for reporting and testing.
>>
>> Is it possible that this patch is backported to stable?
>
>It is on the stable release train.
>

Thank you, that's nice to know.

--
Kind regards.

Conny Seidel
Sander Eikelenboom
2012-Sep-17 19:14 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
Thursday, September 13, 2012, 3:32:14 PM, you wrote:

>> > Sander, could you please let me know if it fixes the problem for you?
>>
>> It does !
>>
>> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>

> Excellent. Applied. Thx for reporting and testing.

Hi Konrad,

Could it be that i haven't seen a pull request for this one for the 3.6.0 kernel yet ?

--
Sander
Konrad Rzeszutek Wilk
2012-Sep-17 19:23 UTC
Re: dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Mon, Sep 17, 2012 at 09:14:52PM +0200, Sander Eikelenboom wrote:
> Thursday, September 13, 2012, 3:32:14 PM, you wrote:
>
> >> > Sander, could you please let me know if it fixes the problem for you?
> >>
> >> It does !
> >>
> >> Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
>
> > Excellent. Applied. Thx for reporting and testing.
>
> Hi Konrad,
>
> Could it be that i haven't seen a pull request for this one for the 3.6.0 kernel yet ?

Correct. I am waiting for Andre Przyrwa to give me heads up on the AMD NUMA bugfix so I can push two bug-fixes to Linus ASAP.

>
> --
>
> Sander
>