Xen vnuma introduction.

The patchset introduces vnuma to paravirtualized Xen guests running as domU.
A Xen subop hypercall is used to retrieve the vnuma topology information.
Based on the topology retrieved from Xen, the number of NUMA nodes, the
memory ranges, the distance table and the cpumask are set. If initialization
fails, a 'dummy' node is set and the nodemask is unset. The vNUMA topology is
constructed by the Xen toolstack. The Xen patchset is available at
https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.

Example dmesg of a vnuma-enabled PV domain:

[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009ffff]
[    0.000000]   node   0: [mem 0x00100000-0xffffffff]
[    0.000000]   node   1: [mem 0x100000000-0x1ffffffff]
[    0.000000]   node   2: [mem 0x200000000-0x2ffffffff]
[    0.000000]   node   3: [mem 0x300000000-0x3ffffffff]
[    0.000000] On node 0 totalpages: 1048479
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3999 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 1044480 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 1048576
[    0.000000]   Normal zone: 14336 pages used for memmap
[    0.000000]   Normal zone: 1048576 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 1048576
[    0.000000]   Normal zone: 14336 pages used for memmap
[    0.000000]   Normal zone: 1048576 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 1048576
[    0.000000]   Normal zone: 14336 pages used for memmap
[    0.000000]   Normal zone: 1048576 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] nr_irqs_gsi: 16
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: [mem 0x400100000-0x4004fffff] available for PCI devices
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.4-unstable (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:4
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff8800ffc00000 s85376 r8192 d21120 u2097152
[    0.000000] pcpu-alloc: s85376 r8192 d21120 u2097152 alloc=1*2097152

numactl output:

root@heatpipe:~# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0
node 0 size: 4031 MB
node 0 free: 3997 MB
node 1 cpus: 1
node 1 size: 4039 MB
node 1 free: 4022 MB
node 2 cpus: 2
node 2 size: 4039 MB
node 2 free: 4023 MB
node 3 cpus: 3
node 3 size: 3975 MB
node 3 free: 3963 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

The current Linux patchset is available at
https://git.gitorious.org/xenvnuma/linuxvnuma.git:v3

The Xen patchset is available at:
https://git.gitorious.org/xenvnuma/xenvnuma.git:v3

TODO:
* dom0, PVH and HVM vnuma support;
* multiple memory ranges per node support;
* benchmarking.

Elena Ufimtseva (2):
  xen: vnuma support for PV guests running as domU
  xen: enable vnuma for PV guest

 arch/x86/include/asm/xen/vnuma.h |   12 ++++
 arch/x86/mm/numa.c               |    3 +
 arch/x86/xen/Makefile            |    2 +-
 arch/x86/xen/setup.c             |    6 +-
 arch/x86/xen/vnuma.c             |  127 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/memory.h   |   44 +++++++++++++
 6 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/vnuma.h
 create mode 100644 arch/x86/xen/vnuma.c

-- 
1.7.10.4
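Schematically, the guest-side initialization described above amounts to
something like the sketch below. This is not the code from the patch: the
subop name, the structure layout and the field names are placeholders for
illustration only; the real definitions live in arch/x86/xen/vnuma.c and
include/xen/interface/memory.h of the series, and the exact ABI may differ.

/* Hedged sketch of a PV guest pulling its vNUMA topology from Xen and
 * feeding it into the x86 NUMA layer. All names marked "placeholder"
 * are made up for illustration. */
#include <linux/init.h>
#include <linux/cpumask.h>
#include <asm/numa.h>
#include <asm/xen/hypercall.h>

struct vnuma_topology_info_example {      /* placeholder layout */
        unsigned int nr_nodes;
        u64 *mem_start, *mem_end;         /* one memory range per node (for now) */
        unsigned int *distance;           /* nr_nodes * nr_nodes matrix          */
        unsigned int *cpu_to_node;        /* one entry per vcpu                  */
};

static int __init xen_numa_init_example(void)
{
        struct vnuma_topology_info_example numa;
        unsigned int i, j;
        int cpu, rc;

        /* Subop added by the Xen side of the series; name is a placeholder. */
        rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa);
        if (rc < 0 || numa.nr_nodes == 0)
                return -EINVAL;           /* caller falls back to a dummy node */

        for (i = 0; i < numa.nr_nodes; i++)
                numa_add_memblk(i, numa.mem_start[i], numa.mem_end[i]);

        for (i = 0; i < numa.nr_nodes; i++)
                for (j = 0; j < numa.nr_nodes; j++)
                        numa_set_distance(i, j,
                                numa.distance[i * numa.nr_nodes + j]);

        for_each_possible_cpu(cpu)
                numa_set_node(cpu, numa.cpu_to_node[cpu]);

        return 0;
}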
Konrad Rzeszutek Wilk
2013-Nov-19 15:38 UTC
Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
> Xen vnuma introduction.
>
> The patchset introduces vnuma to paravirtualized Xen guests
> running as domU.
> A Xen subop hypercall is used to retrieve the vnuma topology information.
> Based on the topology retrieved from Xen, the number of NUMA nodes, the
> memory ranges, the distance table and the cpumask are set.
> If initialization fails, a 'dummy' node is set and the nodemask is
> unset.
> The vNUMA topology is constructed by the Xen toolstack. The Xen patchset
> is available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.

Yeey!

One question - I know you had questions about the
PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
be harvested for AutoNUMA balancing.

And that the hypercall to set such a PTE entry disallows
PROT_GLOBAL (it strips it off)? That means that when the
Linux page system kicks in (as it has ~PAGE_PRESENT) the
Linux page fault handler won't see PROT_GLOBAL (as it has
been filtered out). Which means that the AutoNUMA code won't
kick in.

(see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)

Was that problem ever answered?
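A toy illustration of the concern above, in ordinary userspace C. This is
not kernel or Xen code, and the bit positions are made up: it only shows
that if the hypervisor filters a flag bit out of a PTE written by the
guest, a later fault handler that keys on that bit can no longer tell a
NUMA-hinting entry apart from an ordinary non-present one.

#include <stdio.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0)          /* hypothetical "present" bit        */
#define PTE_GLOBAL  (1u << 8)          /* hypothetical bit reused as "hint" */

static uint32_t hypervisor_validate(uint32_t pte)
{
        return pte & ~PTE_GLOBAL;      /* models the hypercall stripping it */
}

int main(void)
{
        /* Present bit cleared, hint bit set: what the balancing scan writes. */
        uint32_t hinted = (0x1234u << 12) | PTE_GLOBAL;
        uint32_t stored = hypervisor_validate(hinted);

        printf("guest wrote:  present=%d hint=%d\n",
               !!(hinted & PTE_PRESENT), !!(hinted & PTE_GLOBAL));
        printf("fault sees:   present=%d hint=%d\n",
               !!(stored & PTE_PRESENT), !!(stored & PTE_GLOBAL));
        return 0;
}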
Dario Faggioli
2013-Nov-19 18:29 UTC
Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
> > The patchset introduces vnuma to paravirtualized Xen guests
> > running as domU.
> > A Xen subop hypercall is used to retrieve the vnuma topology information.
> > Based on the topology retrieved from Xen, the number of NUMA nodes, the
> > memory ranges, the distance table and the cpumask are set.
> > If initialization fails, a 'dummy' node is set and the nodemask is
> > unset.
> > The vNUMA topology is constructed by the Xen toolstack. The Xen patchset
> > is available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.
>
> Yeey!
>
:-)

> One question - I know you had questions about the
> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
> be harvested for AutoNUMA balancing.
>
> And that the hypercall to set such a PTE entry disallows
> PROT_GLOBAL (it strips it off)? That means that when the
> Linux page system kicks in (as it has ~PAGE_PRESENT) the
> Linux page fault handler won't see PROT_GLOBAL (as it has
> been filtered out). Which means that the AutoNUMA code won't
> kick in.
>
> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)
>
> Was that problem ever answered?
>
I think the issue is a twofold one.

If I remember correctly (Elena, please correct me if I'm wrong), Elena
was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest.
That's what pushed her to investigate the issue, and led to what you're
summing up above.

However, it appears the crash was due to something completely unrelated
to Xen and vNUMA, was affecting baremetal too, and got fixed, which
means the crash is now gone.

It remains to be seen (I think) whether that also means that AutoNUMA
works. In fact, chatting about this in Edinburgh, Elena managed to
convince me pretty badly that we should --as part of the vNUMA support--
do something about this, in order to make it work. At that time I
thought we should be doing something to avoid the system going ka-boom,
but as I said, even now that it does not crash anymore, she was so
persuasive that I now find it quite hard to believe that we really don't
need to do anything. :-P

I guess, as soon as we get the chance, we should see if this actually
works, i.e., in addition to seeing the proper topology and not crashing,
verify that AutoNUMA in the guest is actually doing its job.

What do you think? Again, Elena, please chime in and explain how things
are, if I got something wrong. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Elena Ufimtseva
2013-Dec-04 00:35 UTC
Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
On Tue, Nov 19, 2013 at 1:29 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote:
>> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
>> > The patchset introduces vnuma to paravirtualized Xen guests
>> > running as domU.
>> > A Xen subop hypercall is used to retrieve the vnuma topology information.
>> > Based on the topology retrieved from Xen, the number of NUMA nodes, the
>> > memory ranges, the distance table and the cpumask are set.
>> > If initialization fails, a 'dummy' node is set and the nodemask is
>> > unset.
>> > The vNUMA topology is constructed by the Xen toolstack. The Xen patchset
>> > is available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.
>>
>> Yeey!
>>
> :-)
>
>> One question - I know you had questions about the
>> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
>> be harvested for AutoNUMA balancing.
>>
>> And that the hypercall to set such a PTE entry disallows
>> PROT_GLOBAL (it strips it off)? That means that when the
>> Linux page system kicks in (as it has ~PAGE_PRESENT) the
>> Linux page fault handler won't see PROT_GLOBAL (as it has
>> been filtered out). Which means that the AutoNUMA code won't
>> kick in.
>>
>> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)
>>
>> Was that problem ever answered?
>>
> I think the issue is a twofold one.
>
> If I remember correctly (Elena, please correct me if I'm wrong), Elena
> was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest.
> That's what pushed her to investigate the issue, and led to what you're
> summing up above.
>
> However, it appears the crash was due to something completely unrelated
> to Xen and vNUMA, was affecting baremetal too, and got fixed, which
> means the crash is now gone.
>
> It remains to be seen (I think) whether that also means that AutoNUMA
> works. In fact, chatting about this in Edinburgh, Elena managed to
> convince me pretty badly that we should --as part of the vNUMA support--
> do something about this, in order to make it work. At that time I
> thought we should be doing something to avoid the system going ka-boom,
> but as I said, even now that it does not crash anymore, she was so
> persuasive that I now find it quite hard to believe that we really don't
> need to do anything. :-P

Yes, you were right Dario :) See at the end. PV guests do not crash,
but they have user space memory corruption.
Ok, so I will try to understand what again happened during this
weekend. Meanwhile, I am posting the patches for Xen.

>
> I guess, as soon as we get the chance, we should see if this actually
> works, i.e., in addition to seeing the proper topology and not crashing,
> verify that AutoNUMA in the guest is actually doing its job.
>
> What do you think? Again, Elena, please chime in and explain how things
> are, if I got something wrong. :-)
>

Oh guys, I feel really bad about not replying to these emails... Somehow
these replies all got deleted.. weird.

Ok, about the automatic balancing. At the moment of the last patch,
automatic NUMA balancing seemed to work, but after rebasing on top of
3.12-rc2 I see similar issues. I will try to figure out which commits
broke it and will contact Ingo Molnar and Mel Gorman.

Konrad,
as of the PROT_GLOBAL flag, I will double check once more to exclude
errors from my side.
Last time I was able to have numa_balancing working without any
modifications from the hypervisor side.
But again, I want to double check this, some experiments might only have
appeared to be good :)

> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 
Elena
Elena Ufimtseva
2013-Dec-04 06:20 UTC
Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
> On Tue, Nov 19, 2013 at 1:29 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
>> On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
>>> > The patchset introduces vnuma to paravirtualized Xen guests
>>> > running as domU.
>>> > A Xen subop hypercall is used to retrieve the vnuma topology information.
>>> > Based on the topology retrieved from Xen, the number of NUMA nodes, the
>>> > memory ranges, the distance table and the cpumask are set.
>>> > If initialization fails, a 'dummy' node is set and the nodemask is
>>> > unset.
>>> > The vNUMA topology is constructed by the Xen toolstack. The Xen patchset
>>> > is available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.
>>>
>>> Yeey!
>>>
>> :-)
>>
>>> One question - I know you had questions about the
>>> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
>>> be harvested for AutoNUMA balancing.
>>>
>>> And that the hypercall to set such a PTE entry disallows
>>> PROT_GLOBAL (it strips it off)? That means that when the
>>> Linux page system kicks in (as it has ~PAGE_PRESENT) the
>>> Linux page fault handler won't see PROT_GLOBAL (as it has
>>> been filtered out). Which means that the AutoNUMA code won't
>>> kick in.
>>>
>>> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)
>>>
>>> Was that problem ever answered?
>>>
>> I think the issue is a twofold one.
>>
>> If I remember correctly (Elena, please correct me if I'm wrong), Elena
>> was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest.
>> That's what pushed her to investigate the issue, and led to what you're
>> summing up above.
>>
>> However, it appears the crash was due to something completely unrelated
>> to Xen and vNUMA, was affecting baremetal too, and got fixed, which
>> means the crash is now gone.
>>
>> It remains to be seen (I think) whether that also means that AutoNUMA
>> works. In fact, chatting about this in Edinburgh, Elena managed to
>> convince me pretty badly that we should --as part of the vNUMA support--
>> do something about this, in order to make it work. At that time I
>> thought we should be doing something to avoid the system going ka-boom,
>> but as I said, even now that it does not crash anymore, she was so
>> persuasive that I now find it quite hard to believe that we really don't
>> need to do anything. :-P
>
> Yes, you were right Dario :) See at the end. PV guests do not crash,
> but they have user space memory corruption.
> Ok, so I will try to understand what again happened during this
> weekend. Meanwhile, I am posting the patches for Xen.
>
>>
>> I guess, as soon as we get the chance, we should see if this actually
>> works, i.e., in addition to seeing the proper topology and not crashing,
>> verify that AutoNUMA in the guest is actually doing its job.
>>
>> What do you think? Again, Elena, please chime in and explain how things
>> are, if I got something wrong. :-)
>>
>
> Oh guys, I feel really bad about not replying to these emails... Somehow
> these replies all got deleted.. weird.
>
> Ok, about the automatic balancing. At the moment of the last patch,
> automatic NUMA balancing seemed to work, but after rebasing on top of
> 3.12-rc2 I see similar issues. I will try to figure out which commits
> broke it and will contact Ingo Molnar and Mel Gorman.
>
> Konrad,
> as of the PROT_GLOBAL flag, I will double check once more to exclude
> errors from my side.
> Last time I was able to have numa_balancing working without any
> modifications from the hypervisor side.
> But again, I want to double check this, some experiments might only have
> appeared to be good :)
>
>> Regards,
>> Dario
>>
>> --
>> <<This happens because I choose it to happen!>> (Raistlin Majere)
>> -----------------------------------------------------------------
>> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>>

As of now I have patch v4 ready for review. Not sure if it will be more
beneficial to post it for review or to look closer at the current problem.

The issue I am seeing right now is different from what was happening
before. The corruption happens on the change_prot_numa path:

[ 6638.021439] pfn 45e602, highest_memmap_pfn - 14ddd7
[ 6638.021444] BUG: Bad page map in process dd pte:800000045e602166 pmd:abf1a067
[ 6638.021449] addr:00007f4fda2d8000 vm_flags:00100073 anon_vma:ffff8800abf77b90 mapping: (null) index:7f4fda2d8
[ 6638.021457] CPU: 1 PID: 1033 Comm: dd Tainted: G B W 3.13.0-rc2+ #10
[ 6638.021462]  0000000000000000 00007f4fda2d8000 ffffffff813ca5b1 ffff88010d68deb8
[ 6638.021471]  ffffffff810f2c88 00000000abf1a067 800000045e602166 0000000000000000
[ 6638.021482]  000000000045e602 ffff88010d68deb8 00007f4fda2d8000 800000045e602166
[ 6638.021492] Call Trace:
[ 6638.021497]  [<ffffffff813ca5b1>] ? dump_stack+0x41/0x51
[ 6638.021503]  [<ffffffff810f2c88>] ? print_bad_pte+0x19d/0x1c9
[ 6638.021509]  [<ffffffff810f3aef>] ? vm_normal_page+0x94/0xb3
[ 6638.021519]  [<ffffffff810fb788>] ? change_protection+0x35c/0x5a8
[ 6638.021527]  [<ffffffff81107965>] ? change_prot_numa+0x13/0x24
[ 6638.021533]  [<ffffffff81071697>] ? task_numa_work+0x1fb/0x299
[ 6638.021539]  [<ffffffff8105ef54>] ? task_work_run+0x7b/0x8f
[ 6638.021545]  [<ffffffff8100e658>] ? do_notify_resume+0x53/0x68
[ 6638.021552]  [<ffffffff813d4432>] ? int_signal+0x12/0x17
[ 6638.021560] pfn 45d732, highest_memmap_pfn - 14ddd7
[ 6638.021565] BUG: Bad page map in process dd pte:800000045d732166 pmd:10d684067
[ 6638.021572] addr:00007fff7c143000 vm_flags:00100173 anon_vma:ffff8800abf77960 mapping: (null) index:7fffffffc
[ 6638.021582] CPU: 1 PID: 1033 Comm: dd Tainted: G B W 3.13.0-rc2+ #10
[ 6638.021587]  0000000000000000 00007fff7c143000 ffffffff813ca5b1 ffff8800abf339b0
[ 6638.021595]  ffffffff810f2c88 000000010d684067 800000045d732166 0000000000000000
[ 6638.021603]  000000000045d732 ffff8800abf339b0 00007fff7c143000 800000045d732166

The code has changed since the last problem; I will work on this to see
where it comes from.

Elena

>
>
> --
> Elena

-- 
Elena
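For reference, the "BUG: Bad page map" report above comes from a check of
roughly the following shape in mm/memory.c of that era (a paraphrased,
simplified sketch, not the verbatim source). The pfn printed in the trace
(0x45e602) is larger than highest_memmap_pfn (0x14ddd7), which is exactly
the condition that makes vm_normal_page() complain.

struct page *vm_normal_page_sketch(struct vm_area_struct *vma,
                                   unsigned long addr, pte_t pte)
{
        unsigned long pfn = pte_pfn(pte);

        /* pte_special() / VM_MIXEDMAP / VM_PFNMAP handling elided */

        if (unlikely(pfn > highest_memmap_pfn)) {
                /* pfn has no struct page behind it */
                print_bad_pte(vma, addr, pte, NULL);   /* "BUG: Bad page map" */
                return NULL;
        }
        return pfn_to_page(pfn);
}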
Dario Faggioli
2013-Dec-05 01:13 UTC
Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
On mer, 2013-12-04 at 01:20 -0500, Elena Ufimtseva wrote:
> On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
> > Oh guys, I feel really bad about not replying to these emails... Somehow
> > these replies all got deleted.. weird.
> >
No worries... You should see *my* backlog. :-P

> > Ok, about the automatic balancing. At the moment of the last patch,
> > automatic NUMA balancing seemed to work, but after rebasing on top of
> > 3.12-rc2 I see similar issues. I will try to figure out which commits
> > broke it and will contact Ingo Molnar and Mel Gorman.
> >
> As of now I have patch v4 ready for review. Not sure if it will be more
> beneficial to post it for review or to look closer at the current
> problem.
>
You mean the Linux side? Perhaps stick somewhere a reference to the git
tree/branch where it lives, but, before re-sending, let's wait for it to
be as issue free as we can tell?

> The issue I am seeing right now is different from what was happening
> before. The corruption happens on the change_prot_numa path:
>
Ok, so, I think I need to step back a bit from the actual stack trace
and look at the big picture. Please, Elena or anyone, correct me if I'm
saying something wrong about how Linux's autonuma works and interacts
with Xen.

The way it worked when I last looked at it was sort of like this:
 - there was a kthread scanning all the pages, removing the PAGE_PRESENT
   bit from actually present pages, and adding a new special one
   (PAGE_NUMA or something like that);
 - when a page fault is triggered and the PAGE_NUMA flag is found, it
   figures out the page is actually there, so no swap or anything.
   However, it tracks from what node the access to that page came from,
   matches it with the node where the page actually is, and collects
   some statistics about that;
 - at some point (and here I don't remember the exact logic, since it
   changed quite a few times) pages ranking badly in the stats above are
   moved from one node to another.

Is this description still accurate? If yes, here's what I would (double)
check, when running this in a PV guest on top of Xen:

 1. the NUMA hinting page faults: are we getting and handling them
    correctly in the PV guest? Are the stats in the guest kernel being
    updated in a sensible way, i.e., do they make sense and properly
    relate to the virtual topology of the guest? At some point we
    thought it would have been necessary to intercept these faults and
    make sure the above is true with some help from the hypervisor...
    Is this the case? Why? Why not?

 2. what happens when autonuma tries to move pages from one node to
    another? For us, that would mean moving from one virtual node to
    another... Is there a need to do anything at all? I mean, is this,
    from our perspective, just copying the content of an MFN from node X
    into another MFN on node Y, or do we need to update some of our
    vnuma tracking data structures in Xen?

If we have this figured out already, then I think we just chase bugs and
repost the series. If not, well, I think we should. :-D

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
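A schematic of the three steps described above, as simplified pseudo-kernel
C rather than the actual implementation. At the time the hinting flag was
spelled _PAGE_NUMA on x86 and the scan was driven from task_numa_work(),
which is visible in the trace earlier in the thread; task_numa_fault_stats()
below is a made-up placeholder for the real accounting hook.

/* Step 1: the periodic scan (task_numa_work() -> change_prot_numa())
 * marks present user PTEs for hinting: present bit off, hint flag on. */
static void numa_hint_scan_sketch(pte_t *ptep)
{
        if (pte_present(*ptep))
                set_pte(ptep, pte_mknuma(*ptep));
}

/* Step 2: the fault handler recognises a hinting fault: the page is still
 * resident, so record which node the faulting CPU sits on versus where
 * the page lives, then make the pte ordinary again. */
static void numa_hint_fault_sketch(unsigned long addr, pte_t *ptep,
                                   struct page *page)
{
        int cpu_node  = numa_node_id();
        int page_node = page_to_nid(page);

        task_numa_fault_stats(cpu_node, page_node);   /* placeholder name */
        set_pte(ptep, pte_mknonnuma(*ptep));
}

/* Step 3 (not shown): pages whose statistics show repeated remote access
 * are migrated toward the node that keeps touching them. */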