Hi all,

Attached are patches that enable kexec/kdump support in Xen PV domU.
Only the x86_64 architecture is supported. There is no support for
i386, but some of the code could easily be reused.

Here is a description of the patches:
- kexec-tools-2.0.3_20120522.patch: patch for kexec-tools
  which applies cleanly to version 2.0.3,
- kexec-kernel-only_20120522.patch: main kexec/kdump kernel patch;
  it was prepared for a fairly old custom version of Xen Linux
  Kernel 2.6.18; it should apply to the publicly available Xen Linux
  Kernel 2.6.18 after some necessary changes,
- kexec-kernel-only_20121119.patch: fixes overwriting of the initial
  boot structures on machines with more than 1 GiB of memory;
  this is a partial solution,
- kexec-kernel-only_20121203.patch: fixes a timer issue
  on Amazon EC2 machines.

The kexec-tools patch implements a new xen-pv loader. It reads
a vmlinux ELF file (which may be compressed with gzip; the bzImage
format is not supported) and builds segments containing the kernel,
purgatory and the required boot structures (start_info, initial P2M,
initial page table, etc.). Some required data (P2M table, hypercall
page and start_info) is taken from the kernel via a sysfs interface.
Finally, the kexec syscall is called and everything is put in its
proper place. Additionally, this patch contains fixes for issues
which surfaced during work on kexec/kdump support for Xen PV domU
(e.g. ELF notes issues) and minor cleanups.
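
To illustrate the input formats mentioned above, here is a minimal
sketch (not code from the patch) of how a loader could tell a plain
or gzip-compressed vmlinux ELF file from a bzImage. The function name
classify_kernel_image is hypothetical; the magic values are the
standard ELF and gzip signatures and the "HdrS" signature that the
x86 Linux boot protocol places at offset 0x202.

```python
import gzip
import io
import zlib

ELF_MAGIC = b"\x7fELF"
GZIP_MAGIC = b"\x1f\x8b"
BZIMAGE_MAGIC = b"HdrS"        # x86 Linux boot protocol header signature
BZIMAGE_MAGIC_OFFSET = 0x202


def classify_kernel_image(data: bytes) -> str:
    """Return 'elf', 'gzip-elf', 'bzImage' or 'unknown' (hypothetical helper)."""
    if data[:4] == ELF_MAGIC:
        return "elf"
    if data[:2] == GZIP_MAGIC:
        # Decompress only the first bytes to see whether an ELF file is inside.
        try:
            head = gzip.GzipFile(fileobj=io.BytesIO(data)).read(4)
        except (OSError, EOFError, zlib.error):
            return "unknown"
        return "gzip-elf" if head == ELF_MAGIC else "unknown"
    if data[BZIMAGE_MAGIC_OFFSET:BZIMAGE_MAGIC_OFFSET + 4] == BZIMAGE_MAGIC:
        return "bzImage"
    return "unknown"
```

Under the description above, only the first two results would be
accepted by the xen-pv loader; a bzImage would be rejected.
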

The Linux kernel code loads the segments, stops processors
if needed, destroys page tables, moves pages in the P2M table
if needed, etc. The kernel patches also contain some fixes
and minor cleanups.

During the work on kexec/kdump support for Xen PV domU it was
assumed that nothing could be changed in the hypervisor or dom0.
As a result, some hacks had to be used. There are two major tricks:
one concerns the management of CPUs, page tables, LDT and GDT, and
the other concerns stopping CPUs during a crash.

Xen does not allow a CPU context to be destroyed once it has been
started. This causes a lot of difficulties when an SMP system must be
restarted. Every CPU can be stopped with VCPUOP_down, but the page
tables, LDT and GDT in use when VCPUOP_down is executed on a given
processor remain locked (e.g. page tables are pinned and must be
unpinned before they can be destroyed). Consequently, those CPU
structures cannot be destroyed. There is a workaround which stops all
unneeded processors in a state where the relevant structures are owned
by the new kernel, so there is no need to destroy them. That way all
old kernel structures can be destroyed and the new kernel can be
started. However, this means that the new system must run on only one
CPU (the others are stopped in a special way). I now think that this
could be fixed, but it requires some work on the code that stops the
unneeded CPUs (the final and correct solution would be to add a special
hypercall that destroys a given CPU context; as far as I remember the
relevant code exists in the Xen hypervisor but cannot be called from
within a guest). This issue does not appear if a UP kernel (or one
configured accordingly) runs on an SMP PV domU and the user executes
kexec or kdump. In that case the new kernel can start all CPUs without
any issues.
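
The constraint and the workaround described above can be pictured with
a toy model. This is only an illustration of the reasoning, not Xen or
kernel code: the classes and functions below are hypothetical, and the
"locking" is reduced to remembering which VCPUs went down while using
a given set of structures.

```python
class LockedError(Exception):
    """Raised when structures still locked by a stopped VCPU are destroyed."""


class Context:
    """Stands in for the page tables, LDT and GDT of one kernel (toy model)."""

    def __init__(self, owner):
        self.owner = owner
        self.parked_vcpus = set()  # VCPUs that went down while using it

    def destroy(self):
        if self.parked_vcpus:
            raise LockedError(
                f"{self.owner} context still locked by VCPUs "
                f"{sorted(self.parked_vcpus)}"
            )
        return True


def vcpu_down(vcpu_id, context):
    """Model of VCPUOP_down: the context in use at that moment stays locked."""
    context.parked_vcpus.add(vcpu_id)


def stop_secondary_vcpus(vcpu_ids, old_ctx, new_ctx, use_workaround):
    """Park every secondary VCPU, optionally on the new kernel's context."""
    for vcpu in vcpu_ids:
        vcpu_down(vcpu, new_ctx if use_workaround else old_ctx)
```

In this model, stopping the secondary VCPUs on the old context makes
old_ctx.destroy() fail, while stopping them on the new context lets the
old one be destroyed. The price is that the parked VCPUs stay tied to
the new kernel's structures, which matches the single-CPU limitation
described above.
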

Additionally, because NMI is not implemented in PV domU, IPIs are
used to stop the extra CPUs during a crash. This is not very reliable,
but it often works. As far as I know, Konrad Wilk and Boris Ostrovsky
were working on an NMI implementation, and this issue has probably
been solved in one way or another.

Last but not least, new kernels store P2M data as a 3-level tree
instead of a flat array. Hence, exporting the P2M via sysfs would not
be so easy. It should also be mentioned that this kexec/kdump
implementation cannot work if the balloon driver is used.
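
To show why the 3-level P2M tree mentioned above is harder to export
than a flat array, here is a small sketch of the index arithmetic. It
assumes 512 entries per level (what 4 KiB pages and 8-byte entries
would give on x86_64); the dict-based tree, the helper names and the
INVALID_P2M_ENTRY placeholder are hypothetical and are not taken from
the kernel.

```python
P2M_PER_PAGE = 512       # leaf entries per page (assumed: 4096 / 8)
P2M_MID_PER_PAGE = 512   # leaf-page pointers per mid-level page (assumed)
INVALID_P2M_ENTRY = -1   # placeholder for "no MFN"; not the kernel's value


def split_pfn(pfn):
    """Split a PFN into (top, mid, leaf) indexes of the 3-level tree."""
    top = pfn // (P2M_MID_PER_PAGE * P2M_PER_PAGE)
    mid = (pfn // P2M_PER_PAGE) % P2M_MID_PER_PAGE
    idx = pfn % P2M_PER_PAGE
    return top, mid, idx


def tree_set(tree, pfn, mfn):
    """Store pfn -> mfn, creating intermediate levels on demand."""
    top, mid, idx = split_pfn(pfn)
    tree.setdefault(top, {}).setdefault(mid, {})[idx] = mfn


def tree_lookup(tree, pfn):
    """Walk the three levels; missing levels mean an invalid entry."""
    top, mid, idx = split_pfn(pfn)
    return tree.get(top, {}).get(mid, {}).get(idx, INVALID_P2M_ENTRY)


def flatten(tree, max_pfn):
    """Rebuild the flat array form for PFNs 0 .. max_pfn - 1."""
    return [tree_lookup(tree, pfn) for pfn in range(max_pfn)]
```

With a flat array the export is a single contiguous buffer; with the
tree, something like flatten() (or an interface that understands the
three levels) would be needed before the data could be handed to
kexec-tools.
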

Looking at it now, I can see that this is not a perfect implementation
and that some things could be done differently. However, some ideas
are still valid. I have tried to comment all non-obvious things, but
if you find something unclear, drop me a line.

All code is licensed under GPL 2 (http://www.gnu.org/licenses/gpl-2.0.html).
Feel free to base your development on this patchset, but please do not
remove any copyrights. Additionally, I am happy to help anybody who is
interested in working on this stuff.

Many thanks to Acunu Ltd. (http://www.acunu.com/) for
sponsoring the initial work on Xen PV domU.
Daniel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel