Hello,

The Xen community is working on a new virtualization mode (or perhaps I should say an extension of HVM) that runs PV guests inside HVM containers without requiring a device model (Qemu). One advantage of this new mode is that porting a guest to it is much easier than porting it to pure PV.

Given that FreeBSD already supports PVHVM, adding PVH support is quite easy: we only need some glue for the PV entry point, plus hooks so that some early init functions (like fetching the e820 map or starting the APs) can diverge from bare metal.

The attached patch contains all these changes and allows an SMP FreeBSD guest to fully boot (and, AFAIK, work) under PVH mode. The patch can also be found in my git repo:

git://xenbits.xen.org/people/royger/freebsd.git pvh_v2

The patch touches quite a lot of the early init code, so I've Cc'ed the people who maintain those areas so they can review it.

Since the PVH changes are not yet merged into upstream Xen, testing requires a patched Xen. I've collected the PVH guest support patches from George Dunlap (v13) and fixed some bugs on top of them; the tree can be found at:

git://xenbits.xen.org/people/royger/xen.git fix_pvh

For those curious, here is the dmesg of a FreeBSD PVH guest booting:

GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
SMAP type=01 base=0000000000000000 len=0000000138800000
ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223)
APIC: Using the Xen PV enumerator.
SMP: Added CPU 0 (BSP)
SMP: Added CPU 2 (AP)
SMP: Added CPU 4 (AP)
SMP: Added CPU 6 (AP)
SMP: Added CPU 8 (AP)
SMP: Added CPU 10 (AP)
SMP: Added CPU 12 (AP)
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #420: Mon Oct 28 13:07:53 CET 2013
    root@odin:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
WARNING: WITNESS option enabled, expect reduced performance.
Hypervisor: Origin = "XenVMMXenVMM"
Calibrating TSC clock ... TSC clock: 3066775691 Hz
CPU: Intel(R) Xeon(R) CPU W3550 @ 3.07GHz (3066.78-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x106a5 Family = 0x6 Model = 0x1a Stepping = 5
  Features=0x1fc98b75<FPU,DE,TSC,MSR,PAE,CX8,APIC,SEP,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
real memory = 5242880000 (5000 MB)
Physical memory chunk(s):
0x0000000000010000 - 0x00000000001fffff, 2031616 bytes (496 pages)
0x0000000002708000 - 0x0000000130864fff, 5068148736 bytes (1237341 pages)
avail memory = 5035581440 (4802 MB)
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 4 as a target
INTR: Adding local APIC 6 as a target
INTR: Adding local APIC 8 as a target
INTR: Adding local APIC 10 as a target
INTR: Adding local APIC 12 as a target
FreeBSD/SMP: Multiprocessor System Detected: 7 CPUs
FreeBSD/SMP: 1 package(s) x 7 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 2
 cpu2 (AP): APIC ID: 4
 cpu3 (AP): APIC ID: 6
 cpu4 (AP): APIC ID: 8
 cpu5 (AP): APIC ID: 10
 cpu6 (AP): APIC ID: 12
XEN: CPU 0 has VCPU ID 0
XEN: CPU 1 has VCPU ID 1
XEN: CPU 2 has VCPU ID 2
XEN: CPU 3 has VCPU ID 3
XEN: CPU 4 has VCPU ID 4
XEN: CPU 5 has VCPU ID 5
XEN: CPU 6 has VCPU ID 6
x86bios: IVT 0x000000-0x0004ff at 0xfffff80000000000
x86bios: SSEG 0x010000-0x010fff at 0xfffffe012e79d000
x86bios: ROM 0x0a0000-0x0fefff at 0xfffff800000a0000
random device not loaded; using insecure entropy
ULE: setup cpu 0
ULE: setup cpu 1
ULE: setup cpu 2
ULE: setup cpu 3
ULE: setup cpu 4
ULE: setup cpu 5
ULE: setup cpu 6
Event-channel device installed.
snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024]
feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25
wlan: <802.11 Link Layer>
Hardware, VIA Nehemiah Padlock RNG: VIA Padlock RNG not present
Hardware, Intel IvyBridge+ RNG: RDRAND is not present
null: <null device, zero device>
Falling back to <Software, Yarrow> random adaptor
random: <Software, Yarrow> initialized
nfslock: pseudo-device
kbd0 at kbdmux0
module_register_init: MOD_LOAD (vesa, 0xffffffff80d21c60, 0) error 19
io: <I/O>
VMBUS: load
mem: <memory>
hpt27xx: RocketRAID 27xx controller driver v1.1
hptrr: RocketRAID 17xx/2xxx SATA controller driver v1.2
hptnr: R750/DC7280 controller driver v1.0
ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223)
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
xenstore0: <XenStore> on motherboard
Grant table initialized
xc0: <Xen Console> on motherboard
xen_et0: <Xen PV Clock> on motherboard
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
xen_et0: registered as a time-of-day clock (resolution 10000000us, adjustment 5.000000000s)
pvcpu0: <Xen PV CPU> on motherboard
pvcpu1: <Xen PV CPU> on motherboard
pvcpu2: <Xen PV CPU> on motherboard
pvcpu3: <Xen PV CPU> on motherboard
pvcpu4: <Xen PV CPU> on motherboard
pvcpu5: <Xen PV CPU> on motherboard
pvcpu6: <Xen PV CPU> on motherboard
legacy_pcib_identify: no bridge found, adding pcib0 anyway
pcib0 pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
pci0: domain=0, physical bus=0
cpu0 on motherboard
cpu1 on motherboard
cpu2 on motherboard
cpu3 on motherboard
cpu4 on motherboard
cpu5 on motherboard
cpu6 on motherboard
isa0: <ISA bus> on motherboard
qpi0: <QPI system bus> on motherboard
ex_isa_identify()
isa_probe_children: disabling PnP devices
isa_probe_children: probing non-PnP devices
fb: new array size 4
sc0: <System console> on isa0
sc0: MDA <16 virtual consoles, flags=0x100>
sc0: fb0, kbd0, terminal emulator: scteken (teken terminal)
vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff on isa0
isa_probe_children: probing PnP devices
Device configuration finished.
procfs registered
Timecounters tick every 1.000 msec
vlan: initialized, using hash tables with chaining
tcp_init: net.inet.tcp.tcbhashsize auto tuned to 65536
lo0: bpf attached
hpt27xx: no controller detected.
hptrr: no controller detected.
hptnr: no controller detected.
xenbusb_front0: <Xen Frontend Devices> on xenstore0
xenbusb_add_device: Device device/suspend/event-channel ignored. State 6
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xn0: bpf attached
xn0: Ethernet address: 00:16:3e:0b:a4:b1
xenbusb_back0: <Xen Backend Devices> on xenstore0
xctrl0: <Xen Control Device> on xenstore0
xn0: backend features: feature-sg feature-gso-tcp4
xbd0: 20480MB <Virtual Block Device> at device/vbd/51712 on xenbusb_front0
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
GEOM: new disk xbd0
random: unblocking device.
Netvsc initializing... SMP: AP CPU #5 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #4 Launched!
TSC timecounter discards lower 1 bit(s)
Timecounter "TSC-low" frequency 1533387845 Hz quality -100
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/xbd0p2 []...
start_init: trying /sbin/init
Setting hostuuid: c9230f36-1a54-489e-877c-1d15b8f463e9.
Setting hostid: 0xd52252c7.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Entropy harvesting: interrupts ethernet point_to_pointsha256: /kernel: No such file or directory kickstart.
Starting file system checks:
/dev/xbd0p2: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/xbd0p2: clean, 2213647 free (17111 frags, 274567 blocks, 0.4% fragmentation)
Mounting local file systems:.
Writing entropy file:.
xn0: link state changed to DOWN
xn0: link state changed to UP
Starting Network: lo0 xn0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=503<RXCSUM,TXCSUM,TSO4,LRO>
	ether 00:16:3e:0b:a4:b1
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet manual
	status: active
Starting devd.
Starting dhclient.
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 7
DHCPOFFER from 172.16.1.1
DHCPREQUEST on xn0 to 255.255.255.255 port 67
DHCPACK from 172.16.1.1
bound to 172.16.1.149 -- renewal in 43200 seconds.
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
add net fe80::: gateway ::1
add net ff02::: gateway ::1
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Creating and/or trimming log files.
Starting syslogd.
No core dumps found.
lock order reversal:
 1st 0xfffffe012e861e28 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3050
 2nd 0xfffff80005b87c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
KDB: stack backtrace:
X_db_symbol_values() at X_db_symbol_values+0x10b/frame 0xfffffe012fb8c410
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe012fb8c4c0
witness_checkorder() at witness_checkorder+0xd23/frame 0xfffffe012fb8c550
_sx_xlock() at _sx_xlock+0x75/frame 0xfffffe012fb8c590
ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xfffffe012fb8c5d0
ufs_direnter() at ufs_direnter+0x688/frame 0xfffffe012fb8c690
ufs_vinit() at ufs_vinit+0x33f3/frame 0xfffffe012fb8c890
VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xfffffe012fb8c8c0
kern_mkdirat() at kern_mkdirat+0x1ff/frame 0xfffffe012fb8cae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe012fb8cbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe012fb8cbf0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x80092faaa, rsp = 0x7fffffffd788, rbp = 0x7fffffffdc70 ---
Clearing /tmp (X related).
Updating motd:.
Configuring syscons: keymap blanktime.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.
Starting background file system checks in 60 seconds.

Mon Oct 28 13:22:52 CET 2013

FreeBSD/amd64 (Amnesiac) (xc0)

From 16de1566ada65e5838105870df576ab8258ed8b6 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 14 Oct 2013 18:33:17 +0200
Subject: [PATCH] Xen x86 PVH support

This is still very experimental, and PVH support has not yet been merged into upstream Xen.

PVH mode is basically a PV guest inside an HVM container, and shares a great amount of code with PVHVM.
The main difference is the way the guest is started: PVH uses the PV start sequence, jumping directly into the kernel entry point in long mode and with page tables set.

The main work of this patch consists of setting up an environment as similar as possible to what native FreeBSD expects, and then adding hooks to the PV ops when necessary.

sys/amd64/amd64/locore.S:
 * Add PV entry point, hypercall_page and the necessary elfnotes.

sys/amd64/amd64/machdep.c:
 * Add hooks to replace bare metal operations that should use a PV helper. This includes:
    - Preload metadata
    - i8254_init and i8254_delay
    - Fetching the e820 memory map
    - Reserving the MP bootstrap region
 * Create a DELAY function that uses the PV hooks.
 * Introduce a new hammer_time_xen that sets the necessary stuff when running in PVH mode.

sys/amd64/amd64/mp_machdep.c:
 * Introduce a hook to replace start_all_aps.
 * Introduce a lapic_disabled variable to prevent polluting the code with xen specific gates.

sys/amd64/include/asmacros.h:
 * Copy the ELFNOTE macro from the i386 Xen PV port.

sys/amd64/include/clock.h:
sys/i386/include/clock.h:
 * Prototypes for the xen early delay initialization and usage.

sys/amd64/include/cpu.h:
 * Introduce a new cpu hook to init APs.

sys/amd64/include/sysarch.h:
 * Declare the init_ops structure.

sys/amd64/include/xen/hypercall.h:
sys/i386/include/xen/hypercall.h:
 * Switch to the PV style hypercall mechanism for HVM also.

sys/conf/files:
 * Make the PV console available on XENHVM also.

sys/conf/files.amd64:
 * Include the new files for the PVH port.

sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
 * Gate the PV console attach so it is only used on PV ports.
 * Use HYPERVISOR_start_info instead of xen_start_info.
 * Use HYPERVISOR_event_channel_op to kick the event channel before xen interrupts are set up.

sys/dev/xen/control/control.c:
 * Use the PV shutdown on PVH.

sys/dev/xen/timer/timer.c:
 * Pass a vcpu_info to xen_fetch_vcpu_time; this allows using the function at very early init, before the per-cpu vcpu_info is set.
 * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can be used at early boot; place them in the callers instead.
 * Introduce two new functions, xen_delay_init and xen_delay, that can be used at early boot to implement the generic DELAY function.

sys/i386/i386/locore.s:
 * Reserve space for the hypercall page.

sys/i386/i386/machdep.c:
 * Create a generic DELAY function.

sys/i386/xen/xen_machdep.c:
 * Set HYPERVISOR_start_info.

sys/x86/isa/clock.c:
 * Rename the generic DELAY function to i8254_delay.

sys/x86/x86/delay.c:
 * Put generic delay helpers here, get_tsc and delay_tc.

sys/x86/x86/local_apic.c:
 * Prevent the local apic from attaching when running in PVH mode.

sys/x86/xen/hvm.c:
 * Set the start_all_aps hook.
 * Fix the setting of the hypercall page now that we are using the same mechanism as the PV port.
 * Initialize Xen CPU hooks for the PVH port.
 * Introduce the xen_early_printf debug function, which prints directly to the hypervisor console.

sys/x86/xen/mptable.c:
 * Create a dummy PV CPU enumerator for the PVH port.

sys/x86/xen/pv.c:
 * Implement the PV functions for the early boot hooks, parse_preload_data and fetch_e820_map.
 * Implement the PV function for the start_all_aps hook.

sys/x86/xen/pvcpu.c:
 * Dummy Xen PV CPU device that we use to set the per-cpu pc_device.

sys/xen/gnttab.c:
 * Allocate resume_frames for the PVH port.

sys/xen/interface/arch-x86/xen.h:
 * Interface change for the PVH port (not used on FreeBSD).

sys/xen/pv.h:
 * Header that exports the specific PV functions.

sys/xen/xen-os.h:
 * Declare prototypes for the newly added functions.

sys/xen/xenstore/xenstore.c:
 * Make the xenstore driver hang from both xenpci and the nexus when running XENHVM; this is because we don't have a xenpci device on the PVH port.
 * Gate xenstore addition to parent == xenpci on the HVM case.
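
As an illustration of the hook mechanism described above for machdep.c and mp_machdep.c, here is a minimal userland sketch; it is not the kernel code from the patch. The names early_ops, pv_set_early_ops, generic_delay and reserve_boot_region, and all the stub bodies, are hypothetical. That the PV side leaves mp_bootaddress unset is an assumption, suggested only by the NULL check the patch adds in getmemsize(). The idea: a table of function pointers defaults to the native implementations, the PV entry path overrides entries once, and common code only ever calls through the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the patch's struct init_ops. */
struct early_ops {
	void		(*early_delay_init)(void);
	void		(*early_delay)(int usec);
	unsigned int	(*mp_bootaddress)(unsigned int basemem); /* may be NULL */
};

/* Call counters so the dispatch can be observed from outside. */
static int native_delay_calls;
static int pv_delay_calls;

/* Stub "bare metal" implementations (placeholders, no real hardware access). */
static void native_delay_init(void) { }
static void native_delay(int usec) { (void)usec; native_delay_calls++; }
static unsigned int
native_mp_bootaddress(unsigned int basemem)
{
	/* Pretend to reserve 1 KB below basemem for AP bootstrap code. */
	return (basemem - 1);
}

/* Stub "PV" implementations (placeholders as well). */
static void pv_delay_init(void) { }
static void pv_delay(int usec) { (void)usec; pv_delay_calls++; }

/* Default table: native implementations. */
static struct early_ops early_ops = {
	.early_delay_init = native_delay_init,
	.early_delay = native_delay,
	.mp_bootaddress = native_mp_bootaddress,
};

/* Called once by the (hypothetical) PV entry path before common init runs. */
static void
pv_set_early_ops(void)
{
	early_ops.early_delay_init = pv_delay_init;
	early_ops.early_delay = pv_delay;
	early_ops.mp_bootaddress = NULL;	/* assumed: nothing to reserve */
}

/* Common code dispatches through the table and never names an implementation. */
static void
generic_delay(int usec)
{
	assert(early_ops.early_delay != NULL);
	early_ops.early_delay(usec);
}

/* Optional hook: skip it when the active environment did not provide one. */
static unsigned int
reserve_boot_region(unsigned int basemem)
{
	if (early_ops.mp_bootaddress != NULL)
		return (early_ops.mp_bootaddress(basemem));
	return (basemem);
}
```

With this layout the call sites in common init code need no Xen-specific conditionals: they call through the table, and the entry path that ran first decides what sits behind each pointer.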
---
 sys/amd64/amd64/locore.S           |   53 ++++++++
 sys/amd64/amd64/machdep.c          |  179 ++++++++++++++++++++++----
 sys/amd64/amd64/mp_machdep.c       |   27 +++--
 sys/amd64/include/asmacros.h       |   26 ++++
 sys/amd64/include/clock.h          |    6 +
 sys/amd64/include/cpu.h            |    1 +
 sys/amd64/include/sysarch.h        |   19 +++
 sys/amd64/include/xen/hypercall.h  |    7 -
 sys/conf/files                     |    4 +-
 sys/conf/files.amd64               |    4 +
 sys/conf/files.i386                |    1 +
 sys/dev/xen/console/console.c      |   23 +++-
 sys/dev/xen/console/xencons_ring.c |   15 ++-
 sys/dev/xen/control/control.c      |   37 +++---
 sys/dev/xen/timer/timer.c          |   59 +++++++--
 sys/i386/i386/locore.s             |    9 ++
 sys/i386/i386/machdep.c            |    9 ++
 sys/i386/include/clock.h           |    6 +
 sys/i386/include/xen/hypercall.h   |    7 -
 sys/i386/xen/xen_machdep.c         |    4 +-
 sys/x86/isa/clock.c                |   53 +--------
 sys/x86/x86/delay.c                |   95 ++++++++++++++
 sys/x86/x86/local_apic.c           |    8 +-
 sys/x86/xen/hvm.c                  |   93 ++++++++++----
 sys/x86/xen/mptable.c              |  136 ++++++++++++++++++++
 sys/x86/xen/pv.c                   |  247 ++++++++++++++++++++++++++++++++++++
 sys/x86/xen/pvcpu.c                |   98 ++++++++++++++
 sys/xen/gnttab.c                   |   21 +++-
 sys/xen/interface/arch-x86/xen.h   |   11 ++-
 sys/xen/pv.h                       |   29 ++++
 sys/xen/xen-os.h                   |    8 +
 sys/xen/xenstore/xenstore.c        |   32 ++++--
 32 files changed, 1141 insertions(+), 186 deletions(-)
 create mode 100644 sys/x86/x86/delay.c
 create mode 100644 sys/x86/xen/mptable.c
 create mode 100644 sys/x86/xen/pv.c
 create mode 100644 sys/x86/xen/pvcpu.c
 create mode 100644 sys/xen/pv.h

diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S
index 55cda3a..e04cc48 100644
--- a/sys/amd64/amd64/locore.S
+++ b/sys/amd64/amd64/locore.S
@@ -31,6 +31,12 @@
 #include <machine/pmap.h>
 #include <machine/specialreg.h>
 
+#ifdef XENHVM
+#include <xen/xen-os.h>
+#define __ASSEMBLY__
+#include <xen/interface/elfnote.h>
+#endif
+
 #include "assym.s"
 
 /*
@@ -86,3 +92,50 @@ NON_GPROF_ENTRY(btext)
 	ALIGN_DATA /* just to be sure */
 	.space 0x1000 /* space for bootstack - temporary stack */
 bootstack:
+
+#ifdef XENHVM
+/* Xen */
+.section __xen_guest
+	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD")
+	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD")
+	ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
+	ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE)
+	ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */
+	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start)
+	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page)
+	ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START)
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
+	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
+	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V)
+	ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
+	ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0)
+	ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes")
+
+	.text
+.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */
+
+NON_GPROF_ENTRY(hypercall_page)
+	.skip 0x1000, 0x90 /* Fill with "nop"s */
+
+NON_GPROF_ENTRY(xen_start)
+	/* Don't trust what the loader gives for rflags. */
+	pushq $PSL_KERNEL
+	popfq
+
+	/* Parameters for the xen init function */
+	movq %rsi, %rdi /* shared_info (arg 1) */
+	movq %rsp, %rsi /* xenstack (arg 2) */
+
+	/* Use our own stack */
+	movq $bootstack,%rsp
+	xorl %ebp, %ebp
+
+	/* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */
+	call hammer_time_xen
+	movq %rax, %rsp /* set up kstack for mi_startup() */
+	call mi_startup /* autoconfiguration, mountroot etc */
+
+	/* NOTREACHED */
+0:	hlt
+	jmp 0b
+#endif
diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index 2b2e47f..b649def 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$");
 #include <machine/reg.h>
 #include <machine/sigframe.h>
 #include <machine/specialreg.h>
+#include <machine/sysarch.h>
 #ifdef PERFMON
 #include <machine/perfmon.h>
 #endif
@@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$");
 #include <isa/isareg.h>
 #include <isa/rtc.h>
 
+#ifdef XENHVM
+/* Xen */
+#include <xen/xen-os.h>
+#include <xen/hvm.h>
+#include <xen/pv.h>
+#endif
+
 /* Sanity check for __curthread() */
 CTASSERT(offsetof(struct pcpu, pc_curthread) == 0);
 
 extern u_int64_t hammer_time(u_int64_t, u_int64_t);
+#ifdef XENHVM
+extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t);
+#endif
 
 extern void printcpuinfo(void); /* XXX header file */
 extern void identify_cpu(void);
@@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp,
     char *xfpustate, size_t xfpustate_len);
 SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL);
 
+/* Preload data parse function */
+static caddr_t native_parse_preload_data(u_int64_t);
+
+/* Native function to fetch the e820 map */
+static void native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *);
+
+/* Default init_ops implementation. */
+struct init_ops init_ops = {
+	.parse_preload_data = native_parse_preload_data,
+	.early_delay_init = i8254_init,
+	.early_delay = i8254_delay,
+	.fetch_e820_map = native_fetch_e820_map,
+#ifdef SMP
+	.mp_bootaddress = mp_bootaddress,
+#endif
+};
+
 /*
  * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is
  * the physical address at which the kernel is loaded.
@@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc;
 
 struct mtx dt_lock; /* lock for GDT and LDT */
 
+void
+DELAY(int n)
+{
+	if (delay_tc(n))
+		return;
+
+	init_ops.early_delay(n);
+}
+
 static void
 cpu_startup(dummy)
 	void *dummy;
@@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp)
 	return (1);
 }
 
+static void
+native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size)
+{
+	/*
+	 * get memory map from INT 15:E820, kindly supplied by the
+	 * loader.
+	 *
+	 * subr_module.c says:
+	 * "Consumer may safely assume that size value precedes data."
+	 * ie: an int32_t immediately precedes smap.
+	 */
+	*smap = (struct bios_smap *)preload_search_info(kmdp,
+	    MODINFO_METADATA | MODINFOMD_SMAP);
+	if (*smap == NULL)
+		panic("No BIOS smap info from loader!");
+	*size = *((u_int32_t *)*smap - 1);
+}
+
 /*
  * Populate the (physmap) array with base/bound pairs describing the
  * available physical memory in the system, then test this memory and
@@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first)
 	basemem = 0;
 	physmap_idx = 0;
 
-	/*
-	 * get memory map from INT 15:E820, kindly supplied by the loader.
-	 *
-	 * subr_module.c says:
-	 * "Consumer may safely assume that size value precedes data."
-	 * ie: an int32_t immediately precedes smap.
-	 */
-	smapbase = (struct bios_smap *)preload_search_info(kmdp,
-	    MODINFO_METADATA | MODINFOMD_SMAP);
-	if (smapbase == NULL)
-		panic("No BIOS smap info from loader!");
+	init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize);
 
-	smapsize = *((u_int32_t *)smapbase - 1);
 	smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize);
 
 	for (smap = smapbase; smap < smapend; smap++)
@@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first)
 
 #ifdef SMP
 	/* make hole for AP bootstrap code */
-	physmap[1] = mp_bootaddress(physmap[1] / 1024);
+	if (init_ops.mp_bootaddress)
+		physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024);
 #endif
 
 	/*
@@ -1681,6 +1726,98 @@ do_next:
 	msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]);
 }
 
+static caddr_t
+native_parse_preload_data(u_int64_t modulep)
+{
+	caddr_t kmdp;
+
+	preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE);
+	preload_bootstrap_relocate(KERNBASE);
+	kmdp = preload_search_by_type("elf kernel");
+	if (kmdp == NULL)
+		kmdp = preload_search_by_type("elf64 kernel");
+	boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int);
+	kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE;
+#ifdef DDB
+	ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t);
+	ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t);
+#endif
+
+	return (kmdp);
+}
+
+#ifdef XENHVM
+/*
+ * First function called by the Xen PVH boot sequence.
+ *
+ * Set some Xen global variables and prepare the environment so it is
+ * as similar as possible to what native FreeBSD init function expects.
+ */
+u_int64_t
+hammer_time_xen(start_info_t *si, u_int64_t xenstack)
+{
+	u_int64_t physfree;
+	u_int64_t *PT4 = (u_int64_t *)xenstack;
+	u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE);
+	u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE);
+	int i;
+
+	KASSERT((si != NULL && xenstack != 0),
+	    ("invalid start_info or xenstack"));
+
+	xen_early_printf("FreeBSD PVH running on %s\n", si->magic);
+
+	/* We use 3 pages of xen stack for the boot pagetables */
+	physfree = xenstack + 3 * PAGE_SIZE - KERNBASE;
+
+	/* Setup Xen global variables */
+	HYPERVISOR_start_info = si;
+	HYPERVISOR_shared_info =
+	    (shared_info_t *)(si->shared_info + KERNBASE);
+
+	/*
+	 * Setup some misc global variables for Xen devices
+	 *
+	 * XXX: devices that need this specific variables should
+	 * be rewritten to fetch this info by themselves from the
+	 * start_info page.
+	 */
+	console_page =
+	    (char *)(ptoa(si->console.domU.mfn) + KERNBASE);
+	xen_store = (struct xenstore_domain_interface *)
+	    (ptoa(si->store_mfn) + KERNBASE);
+
+	xen_domain_type = XEN_PV_DOMAIN;
+	vm_guest = VM_GUEST_XEN;
+
+	/*
+	 * Use the stack Xen gives us to build the page tables
+	 * as native FreeBSD expects to find them (created
+	 * by the boot trampoline).
+	 */
+	for (i = 0; i < 512; i++) {
+		/* Each slot of the level 4 pages points to the same level 3 page */
+		PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE;
+		PT4[i] |= PG_V | PG_RW | PG_U;
+
+		/* Each slot of the level 3 pages points to the same level 2 page */
+		PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE;
+		PT3[i] |= PG_V | PG_RW | PG_U;
+
+		/* The level 2 page slots are mapped with 2MB pages for 1GB. */
+		PT2[i] = i * (2 * 1024 * 1024);
+		PT2[i] |= PG_V | PG_RW | PG_PS | PG_U;
+	}
+	load_cr3(((u_int64_t)&PT4[0]) - KERNBASE);
+
+	/* Set the hooks for early functions that diverge from bare metal */
+	xen_pv_set_init_ops();
+
+	/* Now we can jump into the native init function */
+	return hammer_time(0, physfree);
+}
+#endif
+
 u_int64_t
 hammer_time(u_int64_t modulep, u_int64_t physfree)
 {
@@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
 	 */
 	proc_linkup0(&proc0, &thread0);
 
-	preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE);
-	preload_bootstrap_relocate(KERNBASE);
-	kmdp = preload_search_by_type("elf kernel");
-	if (kmdp == NULL)
-		kmdp = preload_search_by_type("elf64 kernel");
-	boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int);
-	kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE;
-#ifdef DDB
-	ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t);
-	ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t);
-#endif
+	kmdp = init_ops.parse_preload_data(modulep);
 
 	/* Init basic tunables, hz etc */
 	init_param1();
@@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
 	lidt(&r_idt);
 
 	/*
-	 * Initialize the i8254 before the console so that console
+	 * Initialize the early delay before the console so that console
 	 * initialization can use DELAY().
 	 */
-	i8254_init();
+	init_ops.early_delay_init();
 
 	/*
 	 * Initialize the console before we print anything out.
diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index 4ef4b3d..44c2a45 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -90,7 +90,8 @@ extern struct pcpu __pcpu[];
 
 /* AP uses this during bootstrap. Do not staticize. */
 char *bootSTK;
-static int bootAP;
+int bootAP;
+bool lapic_disabled = false;
 
 /* Free these after use */
 void *bootstacks[MAXCPU];
@@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU];
 static u_long *ipi_hardclock_counts[MAXCPU];
 #endif
 
+int native_start_all_aps(void);
+
 /* Default cpu_ops implementation. */
 struct cpu_ops cpu_ops = {
-	.ipi_vectored = lapic_ipi_vectored
+	.ipi_vectored = lapic_ipi_vectored,
+	.start_all_aps = native_start_all_aps,
 };
 
 extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32);
@@ -138,7 +142,7 @@ extern int pmap_pcid_enabled;
 static volatile cpuset_t ipi_nmi_pending;
 
 /* used to hold the AP's until we are ready to release them */
-static struct mtx ap_boot_mtx;
+struct mtx ap_boot_mtx;
 
 /* Set to 1 once we're ready to let the APs out of the pen. */
 static volatile int aps_ready = 0;
@@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */
 
 static void assign_cpu_ids(void);
 static void set_interrupt_apic_ids(void);
-static int start_all_aps(void);
 static int start_ap(int apic_id);
 static void release_aps(void *dummy);
 
@@ -569,7 +572,7 @@ cpu_mp_start(void)
 	assign_cpu_ids();
 
 	/* Start each Application Processor */
-	start_all_aps();
+	cpu_ops.start_all_aps();
 
 	set_interrupt_apic_ids();
 }
@@ -707,7 +710,8 @@ init_secondary(void)
 	wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
 
 	/* Disable local APIC just to be sure. */
-	lapic_disable();
+	if (!lapic_disabled)
+		lapic_disable();
 
 	/* signal our startup to the BSP. */
 	mp_naps++;
@@ -733,7 +737,7 @@ init_secondary(void)
 
 	/* A quick check from sanity claus */
 	cpuid = PCPU_GET(cpuid);
-	if (PCPU_GET(apic_id) != lapic_id()) {
+	if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) {
 		printf("SMP: cpuid = %d\n", cpuid);
 		printf("SMP: actual apic_id = %d\n", lapic_id());
 		printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id));
@@ -749,7 +753,8 @@ init_secondary(void)
 	mtx_lock_spin(&ap_boot_mtx);
 
 	/* Init local apic for irq's */
-	lapic_setup(1);
+	if (!lapic_disabled)
+		lapic_setup(1);
 
 	/* Set memory range attributes for this CPU to match the BSP */
 	mem_range_AP_init();
@@ -764,7 +769,7 @@ init_secondary(void)
 	if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0)
 		CPU_SET(cpuid, &logical_cpus_mask);
 
-	if (bootverbose)
+	if (!lapic_disabled && bootverbose)
 		lapic_dump("AP");
 
 	if (smp_cpus == mp_ncpus) {
@@ -908,8 +913,8 @@ assign_cpu_ids(void)
 /*
  * start each AP in our list
  */
-static int
-start_all_aps(void)
+int
+native_start_all_aps(void)
 {
 	vm_offset_t va = boot_address + KERNBASE;
 	u_int64_t *pt4, *pt3, *pt2;
diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h
index 1fb592a..ce8dce4 100644
--- a/sys/amd64/include/asmacros.h
+++ b/sys/amd64/include/asmacros.h
@@ -201,4 +201,30 @@
 
 #endif /* LOCORE */
 
+#ifdef __STDC__
+#define ELFNOTE(name, type, desctype, descdata...) \
+.pushsection .note.name ; \
+  .align 4 ; \
+  .long 2f - 1f /* namesz */ ; \
+  .long 4f - 3f /* descsz */ ; \
+  .long type ; \
+1:.asciz #name ; \
+2:.align 4 ; \
+3:desctype descdata ; \
+4:.align 4 ; \
+.popsection
+#else /* !__STDC__, i.e. -traditional */
+#define ELFNOTE(name, type, desctype, descdata) \
+.pushsection .note.name ; \
+  .align 4 ; \
+  .long 2f - 1f /* namesz */ ; \
+  .long 4f - 3f /* descsz */ ; \
+  .long type ; \
+1:.asciz "name" ; \
+2:.align 4 ; \
+3:desctype descdata ; \
+4:.align 4 ; \
+.popsection
+#endif /* __STDC__ */
+
 #endif /* !_MACHINE_ASMACROS_H_ */
diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h
index d7f7d82..e7817ab 100644
--- a/sys/amd64/include/clock.h
+++ b/sys/amd64/include/clock.h
@@ -25,6 +25,12 @@ extern int smp_tsc;
 #endif
 
 void i8254_init(void);
+void i8254_delay(int);
+#ifdef XENHVM
+void xen_delay_init(void);
+void xen_delay(int);
+#endif
+int delay_tc(int);
 
 /*
  * Driver to clock driver interface.
diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h
index 3d9ff531..ed9f1db 100644
--- a/sys/amd64/include/cpu.h
+++ b/sys/amd64/include/cpu.h
@@ -64,6 +64,7 @@ struct cpu_ops {
 	void (*cpu_init)(void);
 	void (*cpu_resume)(void);
 	void (*ipi_vectored)(u_int, int);
+	int (*start_all_aps)(void);
 };
 
 extern struct cpu_ops cpu_ops;
diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h
index cd380d4..27fd3ba 100644
--- a/sys/amd64/include/sysarch.h
+++ b/sys/amd64/include/sysarch.h
@@ -4,3 +4,22 @@
 /* $FreeBSD$ */
 
 #include <x86/sysarch.h>
+
+#include <machine/pc/bios.h>
+/*
+ * Struct containing pointers to init functions whose
+ * implementation is run time selectable. Selection can be made,
+ * for example, based on detection of a BIOS variant or
+ * hypervisor environment.
+ */ +struct init_ops { + caddr_t (*parse_preload_data)(u_int64_t); + void (*early_delay_init)(void); + void (*early_delay)(int); + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *); +#ifdef SMP + u_int (*mp_bootaddress)(u_int); +#endif +}; + +extern struct init_ops init_ops; diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h index a1b2a5c..499fb4d 100644 --- a/sys/amd64/include/xen/hypercall.h +++ b/sys/amd64/include/xen/hypercall.h @@ -51,15 +51,8 @@ #define CONFIG_XEN_COMPAT 0x030002 #define __must_check -#ifdef XEN #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\ - "add hypercall_stubs(%%rip),%%rax; " \ - "call *%%rax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/conf/files b/sys/conf/files index f3e298c..6040447 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -2508,8 +2508,8 @@ dev/xe/if_xe_pccard.c optional xe pccard dev/xen/balloon/balloon.c optional xen | xenhvm dev/xen/blkfront/blkfront.c optional xen | xenhvm dev/xen/blkback/blkback.c optional xen | xenhvm -dev/xen/console/console.c optional xen -dev/xen/console/xencons_ring.c optional xen +dev/xen/console/console.c optional xen | xenhvm +dev/xen/console/xencons_ring.c optional xen | xenhvm dev/xen/control/control.c optional xen | xenhvm dev/xen/netback/netback.c optional xen | xenhvm dev/xen/netfront/netfront.c optional xen | xenhvm diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 1914c48..bd52e8f 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -554,5 +554,9 @@ x86/x86/mptable_pci.c optional mptable pci x86/x86/msi.c optional pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/mptable.c optional xenhvm +x86/xen/pvcpu.c optional xenhvm +x86/xen/pv.c optional xenhvm diff --git 
a/sys/conf/files.i386 b/sys/conf/files.i386 index e259659..15a3aae 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -577,5 +577,6 @@ x86/x86/mptable_pci.c optional apic native pci x86/x86/msi.c optional apic pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c index 65a0e7d..86dc2a4 100644 --- a/sys/dev/xen/console/console.c +++ b/sys/dev/xen/console/console.c @@ -69,11 +69,14 @@ struct mtx cn_mtx; static char wbuf[WBUF_SIZE]; static char rbuf[RBUF_SIZE]; static int rc, rp; -static unsigned int cnsl_evt_reg; +unsigned int cnsl_evt_reg; static unsigned int wc, wp; /* write_cons, write_prod */ xen_intr_handle_t xen_intr_handle; device_t xencons_dev; +/* Virt address of the shared console page */ +char *console_page; + #ifdef KDB static int xc_altbrk; #endif @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = { static void xc_cnprobe(struct consdev *cp) { + if (!xen_pv_domain()) + return; + cp->cn_pri = CN_REMOTE; sprintf(cp->cn_name, "%s0", driver_name); } @@ -175,7 +181,7 @@ static void xc_cnputc(struct consdev *dev, int c) { - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) xc_cnputc_dom0(dev, c); else xc_cnputc_domu(dev, c); @@ -206,8 +212,7 @@ xcons_putc(int c) xcons_force_flush(); #endif } - if (cnsl_evt_reg) - __xencons_tx_flush(); + __xencons_tx_flush(); /* inform start path that we're pretty full */ return ((wp - wc) >= WBUF_SIZE - 100) ?
TRUE : FALSE; @@ -217,6 +222,10 @@ static void xc_identify(driver_t *driver, device_t parent) { device_t child; + + if (!xen_pv_domain()) + return; + child = BUS_ADD_CHILD(parent, 0, driver_name, 0); device_set_driver(child, driver); device_set_desc(child, "Xen Console"); @@ -245,7 +254,7 @@ xc_attach(device_t dev) cnsl_evt_reg = 1; callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL, xencons_priv_interrupt, NULL, INTR_TYPE_TTY, &xen_intr_handle); @@ -309,7 +318,7 @@ __xencons_tx_flush(void) sz = wp - wc; if (sz > (WBUF_SIZE - WBUF_MASK(wc))) sz = WBUF_SIZE - WBUF_MASK(wc); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { HYPERVISOR_console_io(CONSOLEIO_write, sz, &wbuf[WBUF_MASK(wc)]); wc += sz; } else { @@ -424,7 +433,7 @@ xcons_force_flush(void) { int sz; - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) return; /* Spin until console data is flushed through to the domain controller. 
*/ diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c index 3701551..3046498 100644 --- a/sys/dev/xen/console/xencons_ring.c +++ b/sys/dev/xen/console/xencons_ring.c @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$"); #define console_evtchn console.domU.evtchn xen_intr_handle_t console_handle; -extern char *console_page; extern struct mtx cn_mtx; extern device_t xencons_dev; +extern unsigned int cnsl_evt_reg; static inline struct xencons_interface * xencons_interface(void) @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len) struct xencons_interface *intf; XENCONS_RING_IDX cons, prod; int sent; + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn }; intf = xencons_interface(); cons = intf->out_cons; @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len) wmb(); intf->out_prod = prod; - xen_intr_signal(console_handle); + if (cnsl_evt_reg) + xen_intr_signal(console_handle); + else + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send); + return sent; @@ -125,11 +130,11 @@ xencons_ring_init(void) { int err; - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return 0; err = xen_intr_bind_local_port(xencons_dev, - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL, + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL, INTR_TYPE_MISC | INTR_MPSAFE, &console_handle); if (err) { return err; @@ -145,7 +150,7 @@ void xencons_suspend(void) { - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return; xen_intr_unbind(&console_handle); diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c index a9f8d1b..35c923d 100644 --- a/sys/dev/xen/control/control.c +++ b/sys/dev/xen/control/control.c @@ -317,21 +317,6 @@ xctrl_suspend() EVENTHANDLER_INVOKE(power_resume); } -static void -xen_pv_shutdown_final(void *arg, int howto) -{ - /* - * Inform the hypervisor that shutdown is complete.
- * This is not necessary in HVM domains since Xen - * emulates ACPI in that mode and FreeBSD's ACPI - * support will request this transition. - */ - if (howto & (RB_HALT | RB_POWEROFF)) - HYPERVISOR_shutdown(SHUTDOWN_poweroff); - else - HYPERVISOR_shutdown(SHUTDOWN_reboot); -} - #else /* HVM mode suspension. */ @@ -447,6 +432,21 @@ xctrl_halt() shutdown_nice(RB_HALT); } +static void +xen_pv_shutdown_final(void *arg, int howto) +{ + /* + * Inform the hypervisor that shutdown is complete. + * This is not necessary in HVM domains since Xen + * emulates ACPI in that mode and FreeBSD's ACPI + * support will request this transition. + */ + if (howto & (RB_HALT | RB_POWEROFF)) + HYPERVISOR_shutdown(SHUTDOWN_poweroff); + else + HYPERVISOR_shutdown(SHUTDOWN_reboot); +} + /*------------------------------ Event Reception -----------------------------*/ static void xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len) @@ -529,10 +529,9 @@ xctrl_attach(device_t dev) xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl; xs_register_watch(&xctrl->xctrl_watch); -#ifndef XENHVM - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, - SHUTDOWN_PRI_LAST); -#endif + if (xen_pv_domain()) + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, + SHUTDOWN_PRI_LAST); return (0); } diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c index 824c75b..13bd852 100644 --- a/sys/dev/xen/timer/timer.c +++ b/sys/dev/xen/timer/timer.c @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$"); #include <machine/_inttypes.h> #include <machine/smp.h> +/* For the declaration of clock_lock */ +#include <isa/rtc.h> + #include "clock_if.h" static devclass_t xentimer_devclass; @@ -234,18 +237,16 @@ xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src) * it happens to be less than another CPU's previously determined value.
*/ static uint64_t -xen_fetch_vcpu_time(void) +xen_fetch_vcpu_time(struct vcpu_info *vcpu) { struct vcpu_time_info dst; struct vcpu_time_info *src; uint32_t pre_version; uint64_t now; volatile uint64_t last; - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info); src = &vcpu->time; - critical_enter(); do { pre_version = xen_fetch_vcpu_tinfo(&dst, src); barrier(); @@ -266,16 +267,19 @@ xen_fetch_vcpu_time(void) } } while (!atomic_cmpset_64(&xen_timer_last_time, last, now)); - critical_exit(); - return (now); } static uint32_t xentimer_get_timecount(struct timecounter *tc) { + uint32_t xen_time; + + critical_enter(); + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX; + critical_exit(); - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX); + return xen_time; } /** @@ -305,7 +309,12 @@ xen_fetch_wallclock(struct timespec *ts) static void xen_fetch_uptime(struct timespec *ts) { - uint64_t uptime = xen_fetch_vcpu_time(); + uint64_t uptime; + + critical_enter(); + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); + critical_exit(); + ts->tv_sec = uptime / NSEC_IN_SEC; ts->tv_nsec = uptime % NSEC_IN_SEC; } @@ -354,7 +363,7 @@ xentimer_intr(void *arg) struct xentimer_softc *sc = (struct xentimer_softc *)arg; struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu); - pcpu->last_processed = xen_fetch_vcpu_time(); + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); if (pcpu->timer != 0 && sc->et.et_active) sc->et.et_event_cb(&sc->et, sc->et.et_arg); @@ -415,7 +424,9 @@ xentimer_et_start(struct eventtimer *et, do { if (++i == 60) panic("can't schedule timer"); - next_time = xen_fetch_vcpu_time() + first_in_ns; + critical_enter(); + next_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns; + critical_exit(); error = xentimer_vcpu_start_timer(cpu, next_time); } while (error == -ETIME); @@ -573,6 +584,36 @@ xentimer_suspend(device_t dev) return (0); } +/* + * Xen delay early init + */ +void xen_delay_init(void) +{ + /* Init the
clock lock */ + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE); +} +/* + * Xen PV DELAY function + * + * When running in PVH mode we don't have an emulated i8254, so + * make use of the Xen time info in order to code a simple DELAY + * function that can be used during early boot. + */ +void xen_delay(int n) +{ + uint64_t end_ns; + uint64_t current; + + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + end_ns += n * NSEC_IN_USEC; + + for (;;) { + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + if (current >= end_ns) + break; + } +} + static device_method_t xentimer_methods[] = { DEVMETHOD(device_identify, xentimer_identify), DEVMETHOD(device_probe, xentimer_probe), diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s index 68cb430..bd136b1 100644 --- a/sys/i386/i386/locore.s +++ b/sys/i386/i386/locore.s @@ -898,3 +898,12 @@ done_pde: #endif ret + +#ifdef XENHVM +/* Xen Hypercall page */ + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ +#endif diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c index c430316..8bd9a8e 100644 --- a/sys/i386/i386/machdep.c +++ b/sys/i386/i386/machdep.c @@ -254,6 +254,15 @@ struct mtx icu_lock; struct mem_range_softc mem_range_softc; +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + i8254_delay(n); +} + static void cpu_startup(dummy) void *dummy; diff --git a/sys/i386/include/clock.h b/sys/i386/include/clock.h index d980ec7..287b2c8 100644 --- a/sys/i386/include/clock.h +++ b/sys/i386/include/clock.h @@ -22,6 +22,12 @@ extern int tsc_is_invariant; extern int tsc_perf_stat; void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface.
diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h index edc13f4..1c15b0f 100644 --- a/sys/i386/include/xen/hypercall.h +++ b/sys/i386/include/xen/hypercall.h @@ -40,15 +40,8 @@ #define CONFIG_XEN_COMPAT 0x030002 -#if defined(XEN) #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov hypercall_stubs,%%eax; " \ - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \ - "call *%%eax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c index 7049be6..1b1c74d 100644 --- a/sys/i386/xen/xen_machdep.c +++ b/sys/i386/xen/xen_machdep.c @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl), int xendebug_flags; start_info_t *xen_start_info; +start_info_t *HYPERVISOR_start_info; shared_info_t *HYPERVISOR_shared_info; xen_pfn_t *xen_machine_phys = machine_to_phys_mapping; xen_pfn_t *xen_phys_machine; @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo); struct xenstore_domain_interface; extern struct xenstore_domain_interface *xen_store; -char *console_page; +extern char *console_page; void * bootmem_alloc(unsigned int size) @@ -927,6 +928,7 @@ initvalues(start_info_t *startinfo) HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments_notify); #endif xen_start_info = startinfo; + HYPERVISOR_start_info = startinfo; xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list; IdlePTD = (pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE); diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c index a12e175..a5aed1c 100644 --- a/sys/x86/isa/clock.c +++ b/sys/x86/isa/clock.c @@ -247,61 +247,13 @@ getit(void) return ((high << 8) | low); } -#ifndef DELAYDEBUG -static u_int -get_tsc(__unused struct timecounter *tc) -{ - - return (rdtsc32()); -} - -static __inline int -delay_tc(int n) -{ - struct timecounter *tc; - timecounter_get_t *func; - uint64_t end, freq, now; - u_int 
last, mask, u; - - tc = timecounter; - freq = atomic_load_acq_64(&tsc_freq); - if (tsc_is_invariant && freq != 0) { - func = get_tsc; - mask = ~0u; - } else { - if (tc->tc_quality <= 0) - return (0); - func = tc->tc_get_timecount; - mask = tc->tc_counter_mask; - freq = tc->tc_frequency; - } - now = 0; - end = freq * n / 1000000; - if (func == get_tsc) - sched_pin(); - last = func(tc) & mask; - do { - cpu_spinwait(); - u = func(tc) & mask; - if (u < last) - now += mask - last + u + 1; - else - now += u - last; - last = u; - } while (now < end); - if (func == get_tsc) - sched_unpin(); - return (1); -} -#endif - /* * Wait "n" microseconds. * Relies on timer 1 counting down from (i8254_freq / hz) * Note: timer had better have been programmed before this is first used! */ void -DELAY(int n) +i8254_delay(int n) { int delta, prev_tick, tick, ticks_left; #ifdef DELAYDEBUG @@ -317,9 +269,6 @@ DELAY(int n) } if (state == 1) printf("DELAY(%d)...", n); -#else - if (delay_tc(n)) - return; #endif /* * Read the counter first, so that the rest of the setup overhead is diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c new file mode 100644 index 0000000..7ea70b1 --- /dev/null +++ b/sys/x86/x86/delay.c @@ -0,0 +1,95 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * Copyright (c) 2010 Alexander Motin <mav@FreeBSD.org> + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz and Don Ahn. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91 + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +/* Generic x86 routines to handle delay */ + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/timetc.h> +#include <sys/proc.h> +#include <sys/kernel.h> +#include <sys/sched.h> + +#include <machine/clock.h> +#include <machine/cpu.h> + +static u_int +get_tsc(__unused struct timecounter *tc) +{ + + return (rdtsc32()); +} + +int +delay_tc(int n) +{ + struct timecounter *tc; + timecounter_get_t *func; + uint64_t end, freq, now; + u_int last, mask, u; + + tc = timecounter; + freq = atomic_load_acq_64(&tsc_freq); + if (tsc_is_invariant && freq != 0) { + func = get_tsc; + mask = ~0u; + } else { + if (tc->tc_quality <= 0) + return (0); + func = tc->tc_get_timecount; + mask = tc->tc_counter_mask; + freq = tc->tc_frequency; + } + now = 0; + end = freq * n / 1000000; + if (func == get_tsc) + sched_pin(); + last = func(tc) & mask; + do { + cpu_spinwait(); + u = func(tc) & mask; + if (u <
last) + now += mask - last + u + 1; + else + now += u - last; + last = u; + } while (now < end); + if (func == get_tsc) + sched_unpin(); + return (1); +} diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c index 8c8eef6..d8d7701 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused) if (retval != 0) printf("%s: Failed to setup I/O APICs: returned %d\n", best_enum->apic_name, retval); -#ifdef XEN - return; + +#if defined(XEN) || defined(XENHVM) + /* There's no lapic on PV Xen */ + if (xen_pv_domain()) + return; #endif + /* * Finish setting up the local APIC on the BSP once we know how to * properly program the LINT pins. diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c index 72811dc..be15594 100644 --- a/sys/x86/xen/hvm.c +++ b/sys/x86/xen/hvm.c @@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$"); #include <sys/proc.h> #include <sys/smp.h> #include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> #include <vm/vm.h> #include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> #include <dev/pci/pcivar.h> #include <machine/cpufunc.h> #include <machine/cpu.h> #include <machine/smp.h> +#include <machine/stdarg.h> #include <x86/apicreg.h> @@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$"); #include <xen/gnttab.h> #include <xen/hypervisor.h> #include <xen/hvm.h> +#ifdef __amd64__ +#include <xen/pv.h> +#endif #include <xen/xen_intr.h> #include <xen/interface/hvm/params.h> @@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void); /* Variables used by mp_machdep to perform the bitmap IPI */ extern volatile u_int cpu_ipi_pending[MAXCPU]; +#ifdef __amd64__ +/* Native AP start used on PVHVM */ +extern int native_start_all_aps(void); +#endif + /*---------------------------------- Macros ----------------------------------*/ #define IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS) @@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE; struct cpu_ops
xen_hvm_cpu_ops = { .ipi_vectored = lapic_ipi_vectored, .cpu_init = xen_hvm_cpu_init, - .cpu_resume = xen_hvm_cpu_resume + .cpu_resume = xen_hvm_cpu_resume, +#ifdef __amd64__ + .start_all_aps = native_start_all_aps, +#endif }; static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support"); @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]); /*------------------ Hypervisor Access Shared Memory Regions -----------------*/ /** Hypercall table accessed via HYPERVISOR_*_op() methods. */ -char *hypercall_stubs; +extern char *hypercall_page; shared_info_t *HYPERVISOR_shared_info; +start_info_t *HYPERVISOR_start_info; #ifdef SMP /*---------------------------- XEN PV IPI Handlers ---------------------------*/ @@ -522,7 +540,7 @@ xen_setup_cpus(void) { int i; - if (!xen_hvm_domain() || !xen_vector_callback_enabled) + if (!xen_vector_callback_enabled) return; #ifdef __amd64__ @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void) * Allocate and fill in the hypcall page. */ static int -xen_hvm_init_hypercall_stubs(void) +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type) { uint32_t base, regs[4]; int i; @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void) if (base == 0) return (ENXIO); - if (hypercall_stubs == NULL) { + if (init_type == XEN_HVM_INIT_COLD) { do_cpuid(base + 1, regs); printf("XEN: Hypervisor version %d.%d detected.\n", regs[0] >> 16, regs[0] & 0xffff); @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void) * Find the hypercall pages. 
*/ do_cpuid(base + 2, regs); - - if (hypercall_stubs == NULL) { - size_t call_region_size; - - call_region_size = regs[0] * PAGE_SIZE; - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT); - if (hypercall_stubs == NULL) - panic("Unable to allocate Xen hypercall region"); - } for (i = 0; i < regs[0]; i++) - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i); + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i); return (0); } @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void) if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC) return; - if (bootverbose) - printf("XEN: Disabling emulated block and network devices\n"); outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS); } @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND) return; - error = xen_hvm_init_hypercall_stubs(); + if (xen_pv_domain()) { + /* hypercall page is already set in the PV case */ + error = 0; + } else { + error = xen_hvm_init_hypercall_stubs(init_type); + } switch (init_type) { case XEN_HVM_INIT_COLD: @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) setup_xen_features(); cpu_ops = xen_hvm_cpu_ops; vm_guest = VM_GUEST_XEN; +#ifdef __amd64__ + if (xen_pv_domain()) + cpu_ops.start_all_aps = xen_pv_start_all_aps; + else +#endif + printf("XEN: Disabling emulated block and network devices\n"); break; case XEN_HVM_INIT_RESUME: if (error != 0) @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type) } xen_vector_callback_enabled = 0; - xen_domain_type = XEN_HVM_DOMAIN; - xen_hvm_init_shared_info_page(); xen_hvm_set_callback(NULL); - xen_hvm_disable_emulated_devices(); + + if (!xen_pv_domain()) { + xen_domain_type = XEN_HVM_DOMAIN; + xen_hvm_init_shared_info_page(); + xen_hvm_disable_emulated_devices(); + } } void @@ -749,10 +770,11 @@ xen_set_vcpu_id(void) struct pcpu *pc; int i; - /* Set vcpu_id to acpi_id */ + /* Set vcpu_id to acpi_id for PVHVM guests */ CPU_FOREACH(i) { pc 
= pcpu_find(i); - pc->pc_vcpu_id = pc->pc_acpi_id; + if (xen_hvm_domain()) + pc->pc_vcpu_id = pc->pc_acpi_id; if (bootverbose) printf("XEN: CPU %u has VCPU ID %u\n", i, pc->pc_vcpu_id); @@ -790,6 +812,31 @@ xen_hvm_cpu_init(void) DPCPU_SET(vcpu_info, vcpu_info); } +/*----------------------------- Debug functions ------------------------------*/ +#define PRINTK_BUFSIZE 1024 +static int +vprintk(const char *fmt, __va_list ap) +{ + int retval, len; + static char buf[PRINTK_BUFSIZE]; + + retval = vsnprintf(buf, sizeof(buf), fmt, ap); + buf[sizeof(buf) - 1] = 0; + len = strlen(buf); + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf); + return retval; +} + +void +xen_early_printf(const char *fmt, ...) +{ + __va_list ap; + + va_start(ap, fmt); + vprintk(fmt, ap); + va_end(ap); +} + SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL); #ifdef SMP SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL); diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c new file mode 100644 index 0000000..8916314 --- /dev/null +++ b/sys/x86/xen/mptable.c @@ -0,0 +1,136 @@ +/*- + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of any co-contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission.
+ * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/smp.h> +#include <sys/pcpu.h> +#include <vm/vm.h> +#include <vm/pmap.h> + +#include <machine/intr_machdep.h> +#include <machine/apicvar.h> + +#include <machine/cpu.h> +#include <machine/smp.h> + +#include <xen/xen-os.h> +#include <xen/hypervisor.h> + +#include <xen/interface/vcpu.h> + +static int xenpv_probe(void); +static int xenpv_probe_cpus(void); +static int xenpv_setup_local(void); +static int xenpv_setup_io(void); + +static struct apic_enumerator xenpv_enumerator = { + "Xen PV", + xenpv_probe, + xenpv_probe_cpus, + xenpv_setup_local, + xenpv_setup_io +}; + +/* + * Return a fixed probe priority (only registered on Xen PV domains). + */ +static int +xenpv_probe(void) +{ + return (-100); +} + +/* + * Query Xen for each possible vCPU and add those for which the call succeeds. + */ +static int +xenpv_probe_cpus(void) +{ + int i, ret; + + for (i = 0; i < MAXCPU; i++) { + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL); + if (ret >= 0) + cpu_add((i * 2), (i == 0)); + } + + return (0); +} + +/* + * Set the vCPU ID of the BSP.
+ */ +static int +xenpv_setup_local(void) +{ + PCPU_SET(vcpu_id, 0); + return (0); +} + +/* + * No I/O APIC setup is done here; this is a no-op. + */ +static int +xenpv_setup_io(void) +{ + return (0); +} + +static void +xenpv_register(void *dummy __unused) +{ + if (xen_pv_domain()) { + apic_register_enumerator(&xenpv_enumerator); + } +} +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL); + +/* + * Setup per-CPU vCPU IDs. + */ +static void +xenpv_set_ids(void *dummy) +{ + struct pcpu *pc; + int i; + + CPU_FOREACH(i) { + pc = pcpu_find(i); + pc->pc_vcpu_id = i; + } + return; +} +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL); diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c new file mode 100644 index 0000000..6756dec --- /dev/null +++ b/sys/x86/xen/pv.c @@ -0,0 +1,247 @@ +/* + * Copyright (c) 2004 Christian Limpach. + * Copyright (c) 2004-2006,2008 Kip Macy + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/malloc.h> +#include <sys/proc.h> +#include <sys/smp.h> +#include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> + +#include <vm/vm.h> +#include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> + +#include <dev/pci/pcivar.h> + +#include <machine/cpufunc.h> +#include <machine/cpu.h> +#include <machine/smp.h> +#include <machine/tss.h> +#include <machine/sysarch.h> +#include <machine/clock.h> + +#include <x86/apicreg.h> + +#include <xen/xen-os.h> +#include <xen/features.h> +#include <xen/gnttab.h> +#include <xen/hypervisor.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#include <xen/xen_intr.h> + +#include <xen/interface/hvm/params.h> +#include <xen/interface/vcpu.h> + +#define MAX_E820_ENTRIES 128 + +/*--------------------------- Forward Declarations ---------------------------*/ +static caddr_t xen_pv_parse_preload_data(u_int64_t); +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/*---------------------------- Extern Declarations ---------------------------*/ +/* Variables used by amd64 mp_machdep to start APs */ +extern struct mtx ap_boot_mtx; +extern void *bootstacks[]; +extern char *doublefault_stack; +extern char *nmi_stack; +extern void *dpcpu; +extern int bootAP; +extern char *bootSTK; +extern bool 
lapic_disabled; + +/*-------------------------------- Global Data -------------------------------*/ +/* Xen init_ops implementation. */ +struct init_ops xen_init_ops = { + .parse_preload_data = xen_pv_parse_preload_data, + .early_delay_init = xen_delay_init, + .early_delay = xen_delay, + .fetch_e820_map = xen_pv_fetch_e820_map, +}; + +static struct +{ + const char *ev; + int mask; +} howto_names[] = { + {"boot_askname", RB_ASKNAME}, + {"boot_single", RB_SINGLE}, + {"boot_nosync", RB_NOSYNC}, + {"boot_halt", RB_HALT}, + {"boot_serial", RB_SERIAL}, + {"boot_cdrom", RB_CDROM}, + {"boot_gdb", RB_GDB}, + {"boot_gdb_pause", RB_RESERVED1}, + {"boot_verbose", RB_VERBOSE}, + {"boot_multicons", RB_MULTIPLE}, + {NULL, 0} +}; + +static struct bios_smap xen_smap[MAX_E820_ENTRIES]; + +static int +start_xen_ap(int cpu) +{ + struct vcpu_guest_context *ctxt; + int ms, cpus = mp_naps; + + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO); + if (ctxt == NULL) + panic("unable to allocate memory"); + + ctxt->flags = VGCF_IN_KERNEL; + ctxt->user_regs.rip = (unsigned long) init_secondary; + ctxt->user_regs.rsp = (unsigned long) bootSTK; + + /* Set the CPU to use the same page tables and CR4 value */ + ctxt->ctrlreg[3] = KPML4phys; + ctxt->ctrlreg[4] = rcr4(); + + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt)) + panic("unable to initialize CPU#%d\n", cpu); + + free(ctxt, M_TEMP); + + /* Launch the vCPU */ + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) + panic("unable to start AP#%d\n", cpu); + + /* Wait up to 5 seconds for it to start.
*/ + for (ms = 0; ms < 5000; ms++) { + if (mp_naps > cpus) + return 1; /* return SUCCESS */ + DELAY(1000); + } + + return 0; +} + +int +xen_pv_start_all_aps(void) +{ + int cpu; + + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN); + lapic_disabled = true; + + for (cpu = 1; cpu < mp_ncpus; cpu++) { + + /* allocate and set up an idle stack data page */ + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena, + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO); + doublefault_stack = (char *)kmem_malloc(kernel_arena, + PAGE_SIZE, M_WAITOK | M_ZERO); + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE, + M_WAITOK | M_ZERO); + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE, + M_WAITOK | M_ZERO); + + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8; + bootAP = cpu; + + /* attempt to start the Application Processor */ + if (!start_xen_ap(cpu)) + panic("AP #%d failed to start!", cpu); + + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */ + } + + return mp_naps; +} + +/* + * Functions to convert the "extra" parameters passed by Xen + * into FreeBSD boot options (from the i386 Xen port). 
+ */ +static char * +xen_setbootenv(char *cmd_line) +{ + char *cmd_line_next; + + /* Skip leading spaces */ + for (; *cmd_line == ' '; cmd_line++); + + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;); + return (cmd_line); +} + +static int +xen_boothowto(char *envp) +{ + int i, howto = 0; + + /* get equivalents from the environment */ + for (i = 0; howto_names[i].ev != NULL; i++) + if (getenv(howto_names[i].ev) != NULL) + howto |= howto_names[i].mask; + return (howto); +} + +static caddr_t +xen_pv_parse_preload_data(u_int64_t modulep) +{ + /* Parse the extra boot information given by Xen */ + if (HYPERVISOR_start_info->cmd_line) + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line); + boothowto |= xen_boothowto(kern_envp); + + return (NULL); +} + +static void +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + struct xen_memory_map memmap; + int rc; + + /* Fetch the E820 map from Xen */ + memmap.nr_entries = MAX_E820_ENTRIES; + set_xen_guest_handle(memmap.buffer, xen_smap); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); + if (rc) + panic("unable to fetch Xen E820 memory map"); + + *smap = xen_smap; + *size = memmap.nr_entries * sizeof(xen_smap[0]); +} + +void +xen_pv_set_init_ops(void) +{ + /* Init ops for Xen PV */ + init_ops = xen_init_ops; +} diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c new file mode 100644 index 0000000..00e063b --- /dev/null +++ b/sys/x86/xen/pvcpu.c @@ -0,0 +1,98 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/module.h> +#include <sys/pcpu.h> +#include <sys/smp.h> + +#include <xen/xen-os.h> + +static void +xenpvcpu_identify(driver_t *driver, device_t parent) +{ + int i; + + if (!xen_pv_domain()) + return; + + CPU_FOREACH(i) + BUS_ADD_CHILD(parent, 0, "pvcpu", i); +} + +static int +xenpvcpu_probe(device_t dev) +{ + if (!xen_pv_domain()) + return (ENXIO); + + device_set_desc(dev, "Xen PV CPU"); + return (0); +} + +static int +xenpvcpu_attach(device_t dev) +{ + struct pcpu *pc; + int cpu; + + cpu = device_get_unit(dev); + pc = pcpu_find(cpu); + pc->pc_device = dev; + return (0); +} + +static int +xenpvcpu_detach(device_t dev) +{ + + return (0); +} + +static device_method_t xenpvcpu_methods[] = { + DEVMETHOD(device_identify, xenpvcpu_identify), + DEVMETHOD(device_probe, xenpvcpu_probe), + DEVMETHOD(device_attach, xenpvcpu_attach), + 
DEVMETHOD(device_detach, xenpvcpu_detach), + DEVMETHOD_END +}; + +static driver_t xenpvcpu_driver = { + "pvcpu", + xenpvcpu_methods, + 0, +}; + +devclass_t xenpvcpu_devclass; + +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0); +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1); diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c index 03c32b7..909378a 100644 --- a/sys/xen/gnttab.c +++ b/sys/xen/gnttab.c @@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$"); #include <sys/lock.h> #include <sys/malloc.h> #include <sys/mman.h> +#include <sys/limits.h> #include <xen/xen-os.h> #include <xen/hypervisor.h> @@ -607,6 +608,7 @@ gnttab_resume(void) { int error; unsigned int max_nr_gframes, nr_gframes; + void *alloc_mem; nr_gframes = nr_grant_frames; max_nr_gframes = max_nr_grant_frames(); @@ -614,11 +616,20 @@ gnttab_resume(void) return (ENOSYS); if (!resume_frames) { - error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, - &resume_frames); - if (error) { - printf("error mapping gnttab share frames\n"); - return (error); + if (xen_pv_domain()) { + alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE, + M_DEVBUF, M_NOWAIT, 0, + ULONG_MAX, PAGE_SIZE, 0); + KASSERT((alloc_mem != NULL), + ("unable to alloc memory for gnttab")); + resume_frames = vtophys(alloc_mem); + } else { + error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, + &resume_frames); + if (error) { + printf("error mapping gnttab share frames\n"); + return (error); + } } } diff --git a/sys/xen/interface/arch-x86/xen.h b/sys/xen/interface/arch-x86/xen.h index 1c186d7..6cc15d3 100644 --- a/sys/xen/interface/arch-x86/xen.h +++ b/sys/xen/interface/arch-x86/xen.h @@ -147,7 +147,16 @@ struct vcpu_guest_context { struct cpu_user_regs user_regs; /* User-level CPU registers */ struct trap_info trap_ctxt[256]; /* Virtual IDT */ unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */ - unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */ + union { + struct { + /* PV: GDT (machine frames, # 
ents).*/ + unsigned long gdt_frames[16], gdt_ents; + } pv; + struct { + /* PVH: GDTR addr and size */ + unsigned long gdtaddr, gdtsz; + } pvh; + } u; unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */ diff --git a/sys/xen/pv.h b/sys/xen/pv.h new file mode 100644 index 0000000..bbb1048 --- /dev/null +++ b/sys/xen/pv.h @@ -0,0 +1,29 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * $FreeBSD$ + */ + +#ifndef __XEN_PV_H__ +#define __XEN_PV_H__ + +int xen_pv_start_all_aps(void); +void xen_pv_set_init_ops(void); + +#endif /* __XEN_PV_H__ */ \ No newline at end of file diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h index 95e8c6a..d3dccad 100644 --- a/sys/xen/xen-os.h +++ b/sys/xen/xen-os.h @@ -53,6 +53,11 @@ void force_evtchn_callback(void); extern int gdtset; extern shared_info_t *HYPERVISOR_shared_info; +extern start_info_t *HYPERVISOR_start_info; + +/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */ +extern struct xenstore_domain_interface *xen_store; +extern char *console_page; enum xen_domain_type { XEN_NATIVE, /* running on bare hardware */ @@ -80,6 +85,9 @@ xen_hvm_domain(void) return (xen_domain_type == XEN_HVM_DOMAIN); } +/* Debug function, prints directly to hypervisor console */ +void xen_early_printf(const char *, ...); + #ifndef xen_mb #define xen_mb() mb() #endif diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c index d404862..b9885af 100644 --- a/sys/xen/xenstore/xenstore.c +++ b/sys/xen/xenstore/xenstore.c @@ -1082,6 +1082,19 @@ xs_init_comms(void) static void xs_identify(driver_t *driver, device_t parent) { + const char *parent_name; + + if (!xen_domain()) + return; + + /* + * On HVM domains we will get called twice, once from the nexus + * and another time after the xenpci device is attached, we should + * only attach after the xenpci device has been added. + */ + parent_name = device_get_name(parent); + if (xen_hvm_domain() && strncmp(parent_name, "xenpci", 6) != 0) + return; BUS_ADD_CHILD(parent, 0, "xenstore", 0); } @@ -1147,13 +1160,15 @@ xs_attach(device_t dev) /* Initialize the interface to xenstore. 
*/ struct proc *p; -#ifdef XENHVM - xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); - xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); - xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); -#else - xs.evtchn = xen_start_info->store_evtchn; -#endif + if (xen_hvm_domain()) { + xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); + xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); + xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); + } else if (xen_pv_domain()) { + xs.evtchn = HYPERVISOR_start_info->store_evtchn; + } else { + panic("Unknown domain type, cannot initialize xenstore\n"); + } TAILQ_INIT(&xs.reply_list); TAILQ_INIT(&xs.watch_events); @@ -1263,9 +1278,8 @@ static devclass_t xenstore_devclass; #ifdef XENHVM DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0); -#else -DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); #endif +DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); /*------------------------------- Sysctl Data --------------------------------*/ /* XXX Shouldn't the node be somewhere else? */ -- 1.7.7.5 (Apple Git-26)
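For anyone skimming the patch, the init_ops mechanism it relies on (a global table of function pointers with native defaults, replaced wholesale by xen_pv_set_init_ops(), with mp_bootaddress left unset in the Xen table so the caller has to check it) can be sketched in standalone C roughly as below. This is only an illustration under stated assumptions: every demo_* name, the returned strings and the 4 KB figure are hypothetical and do not appear in the patch.

```c
/*
 * Standalone sketch of the init_ops hook pattern.
 * All demo_* names and values are hypothetical, for illustration only.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_init_ops {
	const char *(*fetch_memory_map)(void);	/* mandatory hook */
	int (*mp_bootaddress)(int basemem);	/* optional hook, may be NULL */
};

static const char *
demo_native_fetch_memory_map(void)
{
	return ("loader");	/* stands in for the loader-supplied SMAP */
}

static int
demo_native_mp_bootaddress(int basemem)
{
	return (basemem - 4);	/* pretend to reserve 4 KB for the AP trampoline */
}

static const char *
demo_xen_fetch_memory_map(void)
{
	return ("hypercall");	/* stands in for the XENMEM_memory_map path */
}

/* Native defaults, installed at compile time. */
static struct demo_init_ops demo_init_ops = {
	.fetch_memory_map = demo_native_fetch_memory_map,
	.mp_bootaddress = demo_native_mp_bootaddress,
};

/* PV replacement table; mp_bootaddress is deliberately left NULL. */
static const struct demo_init_ops demo_xen_init_ops = {
	.fetch_memory_map = demo_xen_fetch_memory_map,
};

/* Swap the whole table, as xen_pv_set_init_ops() does in the patch. */
static void
demo_set_xen_init_ops(void)
{
	demo_init_ops = demo_xen_init_ops;
}

/* Callers must tolerate a missing optional hook. */
static int
demo_reserve_bootstrap(int basemem)
{
	if (demo_init_ops.mp_bootaddress != NULL)
		return (demo_init_ops.mp_bootaddress(basemem));
	return (basemem);
}
```

Calling demo_set_xen_init_ops() early, before any hook is used, is what keeps the generic code free of Xen-specific conditionals in this sketch; the only check left in the caller is the NULL test for the optional hook.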
On Mon, Oct 28, 2013 at 02:35:03PM +0100, Roger Pau Monné wrote:
> Hello, > > The Xen community is working on a new virtualization mode (or maybe I > should say an extension of HVM) to be able to run PV guests inside HVM > containers without requiring a device-model (Qemu). One of the > advantages of this new virtualization mode is that now it is much more > easier to port guests to run under it (as compared to pure PV guests). > > Given that FreeBSD already supports PVHVM, adding PVH support is quite > easy, we only need some glue for the PV entry point and then support > for diverging some early init functions (like fetching the e820 map or > starting the APs). > > The attached patch contains all this changes, and allows a SMP FreeBSD > guest to fully boot (and AFAIK work) under this new PVH mode. The patch > can also be found on my git repo: > > git://xenbits.xen.org/people/royger/freebsd.git pvh_v2

Awesome! That is really fantastic!

> > The patch touches quite a lot of the early init, so I've Cced the > persons that maintain those areas, so they can review it. > > In order to test it, and since the PVH changes are not yet merged into > upstream Xen, the use of a patched Xen is necessary. I've collected the > patches for PVH guest support from George Dunlap (v13) and fixed some > bugs on top of them, the tree can be found at: > > git://xenbits.xen.org/people/royger/xen.git fix_pvh > > For those curious, here is a dmesg of a FreeBSD PVH guest booting: > > GDB: no debug ports present > KDB: debugger backends: ddb > KDB: current backend: ddb > SMAP type=01 base=0000000000000000 len=0000000138800000 > ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223) > APIC: Using the Xen PV enumerator. > SMP: Added CPU 0 (BSP) > SMP: Added CPU 2 (AP) > SMP: Added CPU 4 (AP) > SMP: Added CPU 6 (AP) > SMP: Added CPU 8 (AP) > SMP: Added CPU 10 (AP) > SMP: Added CPU 12 (AP) > Copyright (c) 1992-2013 The FreeBSD Project. 
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 11.0-CURRENT #420: Mon Oct 28 13:07:53 CET 2013 > root@odin:/usr/obj/usr/src/sys/GENERIC amd64 > FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610 > WARNING: WITNESS option enabled, expect reduced performance. > Hypervisor: Origin = "XenVMMXenVMM" > Calibrating TSC clock ... TSC clock: 3066775691 Hz > CPU: Intel(R) Xeon(R) CPU W3550 @ 3.07GHz (3066.78-MHz K8-class CPU) > Origin = "GenuineIntel" Id = 0x106a5 Family = 0x6 Model = 0x1a Stepping = 5 > Features=0x1fc98b75<FPU,DE,TSC,MSR,PAE,CX8,APIC,SEP,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT> > Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV> > AMD Features=0x20100800<SYSCALL,NX,LM> > AMD Features2=0x1<LAHF> > real memory = 5242880000 (5000 MB) > Physical memory chunk(s): > 0x0000000000010000 - 0x00000000001fffff, 2031616 bytes (496 pages) > 0x0000000002708000 - 0x0000000130864fff, 5068148736 bytes (1237341 pages) > avail memory = 5035581440 (4802 MB) > INTR: Adding local APIC 2 as a target > INTR: Adding local APIC 4 as a target > INTR: Adding local APIC 6 as a target > INTR: Adding local APIC 8 as a target > INTR: Adding local APIC 10 as a target > INTR: Adding local APIC 12 as a target > FreeBSD/SMP: Multiprocessor System Detected: 7 CPUs > FreeBSD/SMP: 1 package(s) x 7 core(s) > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 2 > cpu2 (AP): APIC ID: 4 > cpu3 (AP): APIC ID: 6 > cpu4 (AP): APIC ID: 8 > cpu5 (AP): APIC ID: 10 > cpu6 (AP): APIC ID: 12 > XEN: CPU 0 has VCPU ID 0 > XEN: CPU 1 has VCPU ID 1 > XEN: CPU 2 has VCPU ID 2 > XEN: CPU 3 has VCPU ID 3 > XEN: CPU 4 has VCPU ID 4 > XEN: CPU 5 has VCPU ID 5 > XEN: CPU 6 has VCPU ID 6 > x86bios: IVT 0x000000-0x0004ff at 0xfffff80000000000 > x86bios: SSEG 0x010000-0x010fff at 0xfffffe012e79d000 > x86bios: ROM 
0x0a0000-0x0fefff at 0xfffff800000a0000 > random device not loaded; using insecure entropy > ULE: setup cpu 0 > ULE: setup cpu 1 > ULE: setup cpu 2 > ULE: setup cpu 3 > ULE: setup cpu 4 > ULE: setup cpu 5 > ULE: setup cpu 6 > Event-channel device installed. > snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024] > feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 > wlan: <802.11 Link Layer> > Hardware, VIA Nehemiah Padlock RNG: VIA Padlock RNG not present > Hardware, Intel IvyBridge+ RNG: RDRAND is not present > null: <null device, zero device> > Falling back to <Software, Yarrow> random adaptor > random: <Software, Yarrow> initialized > nfslock: pseudo-device > kbd0 at kbdmux0 > module_register_init: MOD_LOAD (vesa, 0xffffffff80d21c60, 0) error 19 > io: <I/O> > VMBUS: load > mem: <memory> > hpt27xx: RocketRAID 27xx controller driver v1.1 > hptrr: RocketRAID 17xx/2xxx SATA controller driver v1.2 > hptnr: R750/DC7280 controller driver v1.0 > ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223) > ACPI: Table initialisation failed: AE_NOT_FOUND > ACPI: Try disabling either ACPI or apic support. 
> xenstore0: <XenStore> on motherboard > Grant table initialized > xc0: <Xen Console> on motherboard > xen_et0: <Xen PV Clock> on motherboard > Event timer "XENTIMER" frequency 1000000000 Hz quality 950 > Timecounter "XENTIMER" frequency 1000000000 Hz quality 950 > xen_et0: registered as a time-of-day clock (resolution 10000000us, adjustment 5.000000000s) > pvcpu0: <Xen PV CPU> on motherboard > pvcpu1: <Xen PV CPU> on motherboard > pvcpu2: <Xen PV CPU> on motherboard > pvcpu3: <Xen PV CPU> on motherboard > pvcpu4: <Xen PV CPU> on motherboard > pvcpu5: <Xen PV CPU> on motherboard > pvcpu6: <Xen PV CPU> on motherboard > legacy_pcib_identify: no bridge found, adding pcib0 anyway > pcib0 pcibus 0 on motherboard > pci0: <PCI bus> on pcib0 > pci0: domain=0, physical bus=0 > cpu0 on motherboard > cpu1 on motherboard > cpu2 on motherboard > cpu3 on motherboard > cpu4 on motherboard > cpu5 on motherboard > cpu6 on motherboard > isa0: <ISA bus> on motherboard > qpi0: <QPI system bus> on motherboard > ex_isa_identify() > isa_probe_children: disabling PnP devices > isa_probe_children: probing non-PnP devices > fb: new array size 4 > sc0: <System console> on isa0 > sc0: MDA <16 virtual consoles, flags=0x100> > sc0: fb0, kbd0, terminal emulator: scteken (teken terminal) > vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff on isa0 > isa_probe_children: probing PnP devices > Device configuration finished. > procfs registered > Timecounters tick every 1.000 msec > vlan: initialized, using hash tables with chaining > tcp_init: net.inet.tcp.tcbhashsize auto tuned to 65536 > lo0: bpf attached > hpt27xx: no controller detected. > hptrr: no controller detected. > hptnr: no controller detected. > xenbusb_front0: <Xen Frontend Devices> on xenstore0 > xenbusb_add_device: Device device/suspend/event-channel ignored. 
State 6 > xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0 > xn0: bpf attached > xn0: Ethernet address: 00:16:3e:0b:a4:b1 > xenbusb_back0: <Xen Backend Devices> on xenstore0 > xctrl0: <Xen Control Device> on xenstore0 > xn0: backend features: feature-sg feature-gso-tcp4 > xbd0: 20480MB <Virtual Block Device> at device/vbd/51712 on xenbusb_front0 > xbd0: features: flush, write_barrier > xbd0: synchronize cache commands enabled. > GEOM: new disk xbd0 > random: unblocking device. > Netvsc initializing... SMP: AP CPU #5 Launched! > SMP: AP CPU #2 Launched! > SMP: AP CPU #1 Launched! > SMP: AP CPU #3 Launched! > SMP: AP CPU #6 Launched! > SMP: AP CPU #4 Launched! > TSC timecounter discards lower 1 bit(s) > Timecounter "TSC-low" frequency 1533387845 Hz quality -100 > WARNING: WITNESS option enabled, expect reduced performance. > Trying to mount root from ufs:/dev/xbd0p2 []... > start_init: trying /sbin/init > Setting hostuuid: c9230f36-1a54-489e-877c-1d15b8f463e9. > Setting hostid: 0xd52252c7. > ZFS filesystem version: 5 > ZFS storage pool version: features support (5000) > Entropy harvesting: interrupts ethernet point_to_pointsha256: /kernel: No such file or directory > kickstart. > Starting file system checks: > /dev/xbd0p2: FILE SYSTEM CLEAN; SKIPPING CHECKS > /dev/xbd0p2: clean, 2213647 free (17111 frags, 274567 blocks, 0.4% fragmentation) > Mounting local file systems:. > Writing entropy file:. > xn0: link state changed to DOWN > xn0: link state changed to UP > Starting Network: lo0 xn0. 
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384 > options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6> > inet6 ::1 prefixlen 128 > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 > inet 127.0.0.1 netmask 0xff000000 > nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> > xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 > options=503<RXCSUM,TXCSUM,TSO4,LRO> > ether 00:16:3e:0b:a4:b1 > nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> > media: Ethernet manual > status: active > Starting devd. > Starting dhclient. > DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 7 > DHCPOFFER from 172.16.1.1 > DHCPREQUEST on xn0 to 255.255.255.255 port 67 > DHCPACK from 172.16.1.1 > bound to 172.16.1.149 -- renewal in 43200 seconds. > add net ::ffff:0.0.0.0: gateway ::1 > add net ::0.0.0.0: gateway ::1 > add net fe80::: gateway ::1 > add net ff02::: gateway ::1 > ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib > 32-bit compatibility ldconfig path: /usr/lib32 > Creating and/or trimming log files. > Starting syslogd. > No core dumps found. 
> lock order reversal: > 1st 0xfffffe012e861e28 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3050 > 2nd 0xfffff80005b87c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284 > KDB: stack backtrace: > X_db_symbol_values() at X_db_symbol_values+0x10b/frame 0xfffffe012fb8c410 > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe012fb8c4c0 > witness_checkorder() at witness_checkorder+0xd23/frame 0xfffffe012fb8c550 > _sx_xlock() at _sx_xlock+0x75/frame 0xfffffe012fb8c590 > ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xfffffe012fb8c5d0 > ufs_direnter() at ufs_direnter+0x688/frame 0xfffffe012fb8c690 > ufs_vinit() at ufs_vinit+0x33f3/frame 0xfffffe012fb8c890 > VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xfffffe012fb8c8c0 > kern_mkdirat() at kern_mkdirat+0x1ff/frame 0xfffffe012fb8cae0 > amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe012fb8cbf0 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe012fb8cbf0 > --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x80092faaa, rsp = 0x7fffffffd788, rbp = 0x7fffffffdc70 --- > Clearing /tmp (X related). > Updating motd:. > Configuring syscons: keymap blanktime. > Performing sanity check on sshd configuration. > Starting sshd. > Starting cron. > Starting background file system checks in 60 seconds. > > Mon Oct 28 13:22:52 CET 2013 > > FreeBSD/amd64 (Amnesiac) (xc0)> >From 16de1566ada65e5838105870df576ab8258ed8b6 Mon Sep 17 00:00:00 2001 > From: Roger Pau Monne <roger.pau@citrix.com> > Date: Mon, 14 Oct 2013 18:33:17 +0200 > Subject: [PATCH] Xen x86 PVH support > > This is still very experimental, and PVH support has not yet been > merged into upstream Xen. > > PVH mode is basically a PV guest inside an HVM container, and shares > a great amount of code with PVHVM. The main difference is the way the > guest is started, PVH uses the PV start sequence, jumping directly > into the kernel entry point in long mode and with page tables set. 
> The main work of this patch consists in setting the environment as > similar as possible to what native FreeBSD expects, and then adding > hooks to the PV ops when necessary. > > sys/amd64/amd64/locore.S: > * Add PV entry point, hypervisor_page and the necessary elfnotes. > > sys/amd64/amd64/machdep.c: > * Add hooks to replace bare metal operations that should use a PV > helper, this includes: > - Preload metadata > - i8254_init and i8254_delay > - Fetching the e820 memory map > - Reserve of the MP bootstrap region > > * Create a DELAY function that uses the PV hooks. > * Introduce a new hammer_time_xen that sets the necessary stuff when > running in PVH mode. > > sys/amd64/amd64/mp_machdep.c: > * Introduce a hook to replace start_all_aps. > * Introduce a lapic_disabled variable to prevent polluting the code > with xen specific gates. > > sys/amd64/include/asmacros.h: > * Copy the ELFNOTE macro from the i386 Xen PV port. > > sys/amd64/include/clock.h: > sys/i386/include/clock.h: > * Prototypes for the xen early delay initialization and usage. > > sys/amd64/include/cpu.h: > * Introduce a new cpu hook to init APs. > > sys/amd64/include/sysarch.h: > * Declare the init_ops structure. > > sys/amd64/include/xen/hypercall.h: > sys/i386/include/xen/hypercall.h > * Switch to the PV style hypercall mechanism for HVM also. > > sys/conf/files: > * Make the PV console available on XENHVM also. > > sys/conf/files.amd64: > * Include the new files for the PVH port. > > sys/dev/xen/console/console.c: > sys/dev/xen/console/xencons_ring.c: > * Gate the PV console attach so it is only used on PV ports. > * Use HYPERVISOR_start_info instead of xen_start_info. > * Use HYPERVISOR_event_channel_op to kick the event channel before > xen interrupts are setup. > > sys/dev/xen/control/control.c: > * Use the PV shutdown on PVH. > > sys/dev/xen/timer/timer.c: > * Pass a vcpu_info to xen_fetch_vcpu_time, this allows using this > function at very early init, before per-cpu vcpu_info is set. 
> * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can be > used at early boot, instead place them on the callers. > * Introduce two new functions, xen_delay_init and xen_delay that can > be used at early boot to implement the generic DELAY function. > > sys/i386/i386/locore.s: > * Reserve space for the hypercall page. > > sys/i386/i386/machdep.c: > * Create a generic DELAY function. > > sys/i386/xen/xen_machdep.c: > * Set HYPERVISOR_start_info. > > sys/x86/isa/clock.c: > * Rename the generic DELAY function to i8254_delay. > > sys/x86/x86/delay.c: > * Put generic delay helpers here, get_tsc and delay_tc. > > sys/x86/x86/local_apic.c: > * Prevent the local apic from attaching when running on PVH mode. > > sys/x86/xen/hvm.c: > * Set the start_all_aps hook. > * Fix the setting of the hypercall page now that we are using the > same mechanism as the PV port. > * Initialize Xen CPU hooks for the PVH port. > * Introduce the xen_early_printf debug function, which prints > directly to the hypervisor console. > > sys/x86/xen/mptable.c: > * Create a dummy PV CPU enumerator for the PVH port. > > sys/x86/xen/pv.c: > * Implement the PV functions for the early boot hooks, > parse_preload_data and fetch_e820_map. > * Implement the PV function for the start_all_aps hook. > > sys/x86/xen/pvcpu.c: > * Dummy Xen PV CPU device, that we use to set the per-cpu pc_device. > > sys/xen/gnttab.c: > * Allocate resume_frames for the PVH port. > > sys/xen/interface/arch-x86/xen.h: > * Interface change for the PVH port (not used on FreeBSD). > > sys/xen/pv.h: > * Header that exports the specific PV functions. > > sys/xen/xen-os.h: > * Declare prototypes for the newly added functions. > > sys/xen/xenstore/xenstore.c: > * Make the xenstore driver hang from both xenpci and the nexus when > running XENHVM, this is because we don't have a xenpci device on > the PVH port. > * Gate xenstore addition to parent == xenpci on the HVM case. 
> --- > sys/amd64/amd64/locore.S | 53 ++++++++ > sys/amd64/amd64/machdep.c | 179 ++++++++++++++++++++++---- > sys/amd64/amd64/mp_machdep.c | 27 +++-- > sys/amd64/include/asmacros.h | 26 ++++ > sys/amd64/include/clock.h | 6 + > sys/amd64/include/cpu.h | 1 + > sys/amd64/include/sysarch.h | 19 +++ > sys/amd64/include/xen/hypercall.h | 7 - > sys/conf/files | 4 +- > sys/conf/files.amd64 | 4 + > sys/conf/files.i386 | 1 + > sys/dev/xen/console/console.c | 23 +++- > sys/dev/xen/console/xencons_ring.c | 15 ++- > sys/dev/xen/control/control.c | 37 +++--- > sys/dev/xen/timer/timer.c | 59 +++++++-- > sys/i386/i386/locore.s | 9 ++ > sys/i386/i386/machdep.c | 9 ++ > sys/i386/include/clock.h | 6 + > sys/i386/include/xen/hypercall.h | 7 - > sys/i386/xen/xen_machdep.c | 4 +- > sys/x86/isa/clock.c | 53 +-------- > sys/x86/x86/delay.c | 95 ++++++++++++++ > sys/x86/x86/local_apic.c | 8 +- > sys/x86/xen/hvm.c | 93 ++++++++++---- > sys/x86/xen/mptable.c | 136 ++++++++++++++++++++ > sys/x86/xen/pv.c | 247 ++++++++++++++++++++++++++++++++++++ > sys/x86/xen/pvcpu.c | 98 ++++++++++++++ > sys/xen/gnttab.c | 21 +++- > sys/xen/interface/arch-x86/xen.h | 11 ++- > sys/xen/pv.h | 29 ++++ > sys/xen/xen-os.h | 8 + > sys/xen/xenstore/xenstore.c | 32 ++++-- > 32 files changed, 1141 insertions(+), 186 deletions(-) > create mode 100644 sys/x86/x86/delay.c > create mode 100644 sys/x86/xen/mptable.c > create mode 100644 sys/x86/xen/pv.c > create mode 100644 sys/x86/xen/pvcpu.c > create mode 100644 sys/xen/pv.h > > diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S > index 55cda3a..e04cc48 100644 > --- a/sys/amd64/amd64/locore.S > +++ b/sys/amd64/amd64/locore.S > @@ -31,6 +31,12 @@ > #include <machine/pmap.h> > #include <machine/specialreg.h> > > +#ifdef XENHVM > +#include <xen/xen-os.h> > +#define __ASSEMBLY__ > +#include <xen/interface/elfnote.h> > +#endif > + > #include "assym.s" > > /* > @@ -86,3 +92,50 @@ NON_GPROF_ENTRY(btext) > ALIGN_DATA /* just to be sure */ > .space 0x1000 /* space 
for bootstack - temporary stack */ > bootstack: > + > +#ifdef XENHVM > +/* Xen */ > +.section __xen_guest > + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD") > + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD") > + ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0") > + ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE) > + ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */ > + ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start) > + ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page) > + ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START) > + ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector") > + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes") > + ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V) > + ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic") > + ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0) > + ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes") > + > + .text > +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ > + > +NON_GPROF_ENTRY(hypercall_page) > + .skip 0x1000, 0x90 /* Fill with "nop"s */ > + > +NON_GPROF_ENTRY(xen_start) > + /* Don't trust what the loader gives for rflags. 
*/ > + pushq $PSL_KERNEL > + popfq > + > + /* Parameters for the xen init function */ > + movq %rsi, %rdi /* start_info (arg 1) */ > + movq %rsp, %rsi /* xenstack (arg 2) */ > + > + /* Use our own stack */ > + movq $bootstack,%rsp > + xorl %ebp, %ebp > + > + /* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */ > + call hammer_time_xen > + movq %rax, %rsp /* set up kstack for mi_startup() */ > + call mi_startup /* autoconfiguration, mountroot etc */ > + > + /* NOTREACHED */ > +0: hlt > + jmp 0b > +#endif > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c > index 2b2e47f..b649def 100644 > --- a/sys/amd64/amd64/machdep.c > +++ b/sys/amd64/amd64/machdep.c > @@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$"); > #include <machine/reg.h> > #include <machine/sigframe.h> > #include <machine/specialreg.h> > +#include <machine/sysarch.h> > #ifdef PERFMON > #include <machine/perfmon.h> > #endif > @@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$"); > #include <isa/isareg.h> > #include <isa/rtc.h> > > +#ifdef XENHVM > +/* Xen */ > +#include <xen/xen-os.h> > +#include <xen/hvm.h> > +#include <xen/pv.h> > +#endif > + > /* Sanity check for __curthread() */ > CTASSERT(offsetof(struct pcpu, pc_curthread) == 0); > > extern u_int64_t hammer_time(u_int64_t, u_int64_t); > +#ifdef XENHVM > +extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t); > +#endif > > extern void printcpuinfo(void); /* XXX header file */ > extern void identify_cpu(void); > @@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp, > char *xfpustate, size_t xfpustate_len); > SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL); > > +/* Preload data parse function */ > +static caddr_t native_parse_preload_data(u_int64_t); > + > +/* Native function to fetch the e820 map */ > +static void native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); > + > +/* Default init_ops implementation. 
*/ > +struct init_ops init_ops = { > + .parse_preload_data = native_parse_preload_data, > + .early_delay_init = i8254_init, > + .early_delay = i8254_delay, > + .fetch_e820_map = native_fetch_e820_map, > +#ifdef SMP > + .mp_bootaddress = mp_bootaddress, > +#endif > +}; > + > /* > * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is > * the physical address at which the kernel is loaded. > @@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc; > > struct mtx dt_lock; /* lock for GDT and LDT */ > > +void > +DELAY(int n) > +{ > + if (delay_tc(n)) > + return; > + > + init_ops.early_delay(n); > +} > + > static void > cpu_startup(dummy) > void *dummy; > @@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp) > return (1); > } > > +static void > +native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) > +{ > + /* > + * get memory map from INT 15:E820, kindly supplied by the > + * loader. > + * > + * subr_module.c says: > + * "Consumer may safely assume that size value precedes data." > + * ie: an int32_t immediately precedes smap. > + */ > + *smap = (struct bios_smap *)preload_search_info(kmdp, > + MODINFO_METADATA | MODINFOMD_SMAP); > + if (*smap == NULL) > + panic("No BIOS smap info from loader!"); > + *size = *((u_int32_t *)*smap - 1); > +} > + > /* > * Populate the (physmap) array with base/bound pairs describing the > * available physical memory in the system, then test this memory and > @@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) > basemem = 0; > physmap_idx = 0; > > - /* > - * get memory map from INT 15:E820, kindly supplied by the loader. > - * > - * subr_module.c says: > - * "Consumer may safely assume that size value precedes data." > - * ie: an int32_t immediately precedes smap. 
> - */ > - smapbase = (struct bios_smap *)preload_search_info(kmdp, > - MODINFO_METADATA | MODINFOMD_SMAP); > - if (smapbase == NULL) > - panic("No BIOS smap info from loader!"); > + init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize); > > - smapsize = *((u_int32_t *)smapbase - 1); > smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize); > > for (smap = smapbase; smap < smapend; smap++) > @@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) > > #ifdef SMP > /* make hole for AP bootstrap code */ > - physmap[1] = mp_bootaddress(physmap[1] / 1024); > + if (init_ops.mp_bootaddress) > + physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024); > #endif > > /* > @@ -1681,6 +1726,98 @@ do_next: > msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]); > } > > +static caddr_t > +native_parse_preload_data(u_int64_t modulep) > +{ > + caddr_t kmdp; > + > + preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); > + preload_bootstrap_relocate(KERNBASE); > + kmdp = preload_search_by_type("elf kernel"); > + if (kmdp == NULL) > + kmdp = preload_search_by_type("elf64 kernel"); > + boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); > + kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; > +#ifdef DDB > + ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); > + ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); > +#endif > + > + return (kmdp); > +} > + > +#ifdef XENHVM > +/* > + * First function called by the Xen PVH boot sequence. > + * > + * Set some Xen global variables and prepare the environment so it is > + * as similar as possible to what native FreeBSD init function expects. 
> + */ > +u_int64_t > +hammer_time_xen(start_info_t *si, u_int64_t xenstack) > +{ > + u_int64_t physfree; > + u_int64_t *PT4 = (u_int64_t *)xenstack; > + u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE); > + u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE); > + int i; > + > + KASSERT((si != NULL && xenstack != 0), > + ("invalid start_info or xenstack")); > + > + xen_early_printf("FreeBSD PVH running on %s\n", si->magic); > + > + /* We use 3 pages of xen stack for the boot pagetables */ > + physfree = xenstack + 3 * PAGE_SIZE - KERNBASE; > + > + /* Setup Xen global variables */ > + HYPERVISOR_start_info = si; > + HYPERVISOR_shared_info = > + (shared_info_t *)(si->shared_info + KERNBASE); > + > + /* > + * Setup some misc global variables for Xen devices > + * > + * XXX: devices that need these specific variables should > + * be rewritten to fetch this info by themselves from the > + * start_info page. > + */ > + console_page = > + (char *)(ptoa(si->console.domU.mfn) + KERNBASE); > + xen_store = (struct xenstore_domain_interface *) > + (ptoa(si->store_mfn) + KERNBASE); > + > + xen_domain_type = XEN_PV_DOMAIN; > + vm_guest = VM_GUEST_XEN; > + > + /* > + * Use the stack Xen gives us to build the page tables > + * as native FreeBSD expects to find them (created > + * by the boot trampoline). > + */ > + for (i = 0; i < 512; i++) { > + /* Each slot of the level 4 pages points to the same level 3 page */ > + PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE; > + PT4[i] |= PG_V | PG_RW | PG_U; > + > + /* Each slot of the level 3 pages points to the same level 2 page */ > + PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE; > + PT3[i] |= PG_V | PG_RW | PG_U; > + > + /* The level 2 page slots are mapped with 2MB pages for 1GB.
*/ > + PT2[i] = i * (2 * 1024 * 1024); > + PT2[i] |= PG_V | PG_RW | PG_PS | PG_U; > + } > + load_cr3(((u_int64_t)&PT4[0]) - KERNBASE); > + > + /* Set the hooks for early functions that diverge from bare metal */ > + xen_pv_set_init_ops(); > + > + /* Now we can jump into the native init function */ > + return hammer_time(0, physfree); > +} > +#endif > + > u_int64_t > hammer_time(u_int64_t modulep, u_int64_t physfree) > { > @@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) > */ > proc_linkup0(&proc0, &thread0); > > - preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); > - preload_bootstrap_relocate(KERNBASE); > - kmdp = preload_search_by_type("elf kernel"); > - if (kmdp == NULL) > - kmdp = preload_search_by_type("elf64 kernel"); > - boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); > - kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; > -#ifdef DDB > - ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); > - ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); > -#endif > + kmdp = init_ops.parse_preload_data(modulep); > > /* Init basic tunables, hz etc */ > init_param1(); > @@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) > lidt(&r_idt); > > /* > - * Initialize the i8254 before the console so that console > + * Initialize the early delay before the console so that console > * initialization can use DELAY(). > */ > - i8254_init(); > + init_ops.early_delay_init(); > > /* > * Initialize the console before we print anything out. > diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c > index 4ef4b3d..44c2a45 100644 > --- a/sys/amd64/amd64/mp_machdep.c > +++ b/sys/amd64/amd64/mp_machdep.c > @@ -90,7 +90,8 @@ extern struct pcpu __pcpu[]; > > /* AP uses this during bootstrap. Do not staticize. 
*/ > char *bootSTK; > -static int bootAP; > +int bootAP; > +bool lapic_disabled = false; > > /* Free these after use */ > void *bootstacks[MAXCPU]; > @@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU]; > static u_long *ipi_hardclock_counts[MAXCPU]; > #endif > > +int native_start_all_aps(void); > + > /* Default cpu_ops implementation. */ > struct cpu_ops cpu_ops = { > - .ipi_vectored = lapic_ipi_vectored > + .ipi_vectored = lapic_ipi_vectored, > + .start_all_aps = native_start_all_aps, > }; > > extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32); > @@ -138,7 +142,7 @@ extern int pmap_pcid_enabled; > static volatile cpuset_t ipi_nmi_pending; > > /* used to hold the AP's until we are ready to release them */ > -static struct mtx ap_boot_mtx; > +struct mtx ap_boot_mtx; > > /* Set to 1 once we're ready to let the APs out of the pen. */ > static volatile int aps_ready = 0; > @@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */ > > static void assign_cpu_ids(void); > static void set_interrupt_apic_ids(void); > -static int start_all_aps(void); > static int start_ap(int apic_id); > static void release_aps(void *dummy); > > @@ -569,7 +572,7 @@ cpu_mp_start(void) > assign_cpu_ids(); > > /* Start each Application Processor */ > - start_all_aps(); > + cpu_ops.start_all_aps(); > > set_interrupt_apic_ids(); > } > @@ -707,7 +710,8 @@ init_secondary(void) > wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D); > > /* Disable local APIC just to be sure. */ > - lapic_disable(); > + if (!lapic_disabled) > + lapic_disable(); > > /* signal our startup to the BSP.
*/ > mp_naps++; > @@ -733,7 +737,7 @@ init_secondary(void) > > /* A quick check from sanity claus */ > cpuid = PCPU_GET(cpuid); > - if (PCPU_GET(apic_id) != lapic_id()) { > + if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) { > printf("SMP: cpuid = %d\n", cpuid); > printf("SMP: actual apic_id = %d\n", lapic_id()); > printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id)); > @@ -749,7 +753,8 @@ init_secondary(void) > mtx_lock_spin(&ap_boot_mtx); > > /* Init local apic for irq's */ > - lapic_setup(1); > + if (!lapic_disabled) > + lapic_setup(1); > > /* Set memory range attributes for this CPU to match the BSP */ > mem_range_AP_init(); > @@ -764,7 +769,7 @@ init_secondary(void) > if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0) > CPU_SET(cpuid, &logical_cpus_mask); > > - if (bootverbose) > + if (!lapic_disabled && bootverbose) > lapic_dump("AP"); > > if (smp_cpus == mp_ncpus) { > @@ -908,8 +913,8 @@ assign_cpu_ids(void) > /* > * start each AP in our list > */ > -static int > -start_all_aps(void) > +int > +native_start_all_aps(void) > { > vm_offset_t va = boot_address + KERNBASE; > u_int64_t *pt4, *pt3, *pt2; > diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h > index 1fb592a..ce8dce4 100644 > --- a/sys/amd64/include/asmacros.h > +++ b/sys/amd64/include/asmacros.h > @@ -201,4 +201,30 @@ > > #endif /* LOCORE */ > > +#ifdef __STDC__ > +#define ELFNOTE(name, type, desctype, descdata...) \ > +.pushsection .note.name ; \ > + .align 4 ; \ > + .long 2f - 1f /* namesz */ ; \ > + .long 4f - 3f /* descsz */ ; \ > + .long type ; \ > +1:.asciz #name ; \ > +2:.align 4 ; \ > +3:desctype descdata ; \ > +4:.align 4 ; \ > +.popsection > +#else /* !__STDC__, i.e.
-traditional */ > +#define ELFNOTE(name, type, desctype, descdata) \ > +.pushsection .note.name ; \ > + .align 4 ; \ > + .long 2f - 1f /* namesz */ ; \ > + .long 4f - 3f /* descsz */ ; \ > + .long type ; \ > +1:.asciz "name" ; \ > +2:.align 4 ; \ > +3:desctype descdata ; \ > +4:.align 4 ; \ > +.popsection > +#endif /* __STDC__ */ > + > #endif /* !_MACHINE_ASMACROS_H_ */ > diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h > index d7f7d82..e7817ab 100644 > --- a/sys/amd64/include/clock.h > +++ b/sys/amd64/include/clock.h > @@ -25,6 +25,12 @@ extern int smp_tsc; > #endif > > void i8254_init(void); > +void i8254_delay(int); > +#ifdef XENHVM > +void xen_delay_init(void); > +void xen_delay(int); > +#endif > +int delay_tc(int); > > /* > * Driver to clock driver interface. > diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h > index 3d9ff531..ed9f1db 100644 > --- a/sys/amd64/include/cpu.h > +++ b/sys/amd64/include/cpu.h > @@ -64,6 +64,7 @@ struct cpu_ops { > void (*cpu_init)(void); > void (*cpu_resume)(void); > void (*ipi_vectored)(u_int, int); > + int (*start_all_aps)(void); > }; > > extern struct cpu_ops cpu_ops; > diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h > index cd380d4..27fd3ba 100644 > --- a/sys/amd64/include/sysarch.h > +++ b/sys/amd64/include/sysarch.h > @@ -4,3 +4,22 @@ > /* $FreeBSD$ */ > > #include <x86/sysarch.h> > + > +#include <machine/pc/bios.h> > +/* > + * Struct containing pointers to init functions whose > + * implementation is run time selectable. Selection can be made, > + * for example, based on detection of a BIOS variant or > + * hypervisor environment. 
> + */ > +struct init_ops { > + caddr_t (*parse_preload_data)(u_int64_t); > + void (*early_delay_init)(void); > + void (*early_delay)(int); > + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *); > +#ifdef SMP > + u_int (*mp_bootaddress)(u_int); > +#endif > +}; > + > +extern struct init_ops init_ops; > diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h > index a1b2a5c..499fb4d 100644 > --- a/sys/amd64/include/xen/hypercall.h > +++ b/sys/amd64/include/xen/hypercall.h > @@ -51,15 +51,8 @@ > #define CONFIG_XEN_COMPAT 0x030002 > #define __must_check > > -#ifdef XEN > #define HYPERCALL_STR(name) \ > "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" > -#else > -#define HYPERCALL_STR(name) \ > - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\ > - "add hypercall_stubs(%%rip),%%rax; " \ > - "call *%%rax" > -#endif > > #define _hypercall0(type, name) \ > ({ \ > diff --git a/sys/conf/files b/sys/conf/files > index f3e298c..6040447 100644 > --- a/sys/conf/files > +++ b/sys/conf/files > @@ -2508,8 +2508,8 @@ dev/xe/if_xe_pccard.c optional xe pccard > dev/xen/balloon/balloon.c optional xen | xenhvm > dev/xen/blkfront/blkfront.c optional xen | xenhvm > dev/xen/blkback/blkback.c optional xen | xenhvm > -dev/xen/console/console.c optional xen > -dev/xen/console/xencons_ring.c optional xen > +dev/xen/console/console.c optional xen | xenhvm > +dev/xen/console/xencons_ring.c optional xen | xenhvm > dev/xen/control/control.c optional xen | xenhvm > dev/xen/netback/netback.c optional xen | xenhvm > dev/xen/netfront/netfront.c optional xen | xenhvm > diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 > index 1914c48..bd52e8f 100644 > --- a/sys/conf/files.amd64 > +++ b/sys/conf/files.amd64 > @@ -554,5 +554,9 @@ x86/x86/mptable_pci.c optional mptable pci > x86/x86/msi.c optional pci > x86/x86/nexus.c standard > x86/x86/tsc.c standard > +x86/x86/delay.c standard > x86/xen/hvm.c optional xenhvm > x86/xen/xen_intr.c optional xen | 
xenhvm > +x86/xen/mptable.c optional xenhvm > +x86/xen/pvcpu.c optional xenhvm > +x86/xen/pv.c optional xenhvm > diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 > index e259659..15a3aae 100644 > --- a/sys/conf/files.i386 > +++ b/sys/conf/files.i386 > @@ -577,5 +577,6 @@ x86/x86/mptable_pci.c optional apic native pci > x86/x86/msi.c optional apic pci > x86/x86/nexus.c standard > x86/x86/tsc.c standard > +x86/x86/delay.c standard > x86/xen/hvm.c optional xenhvm > x86/xen/xen_intr.c optional xen | xenhvm > diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c > index 65a0e7d..86dc2a4 100644 > --- a/sys/dev/xen/console/console.c > +++ b/sys/dev/xen/console/console.c > @@ -69,11 +69,14 @@ struct mtx cn_mtx; > static char wbuf[WBUF_SIZE]; > static char rbuf[RBUF_SIZE]; > static int rc, rp; > -static unsigned int cnsl_evt_reg; > +unsigned int cnsl_evt_reg; > static unsigned int wc, wp; /* write_cons, write_prod */ > xen_intr_handle_t xen_intr_handle; > device_t xencons_dev; > > +/* Virt address of the shared console page */ > +char *console_page; > + > #ifdef KDB > static int xc_altbrk; > #endif > @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = { > static void > xc_cnprobe(struct consdev *cp) > { > + if (!xen_pv_domain()) > + return; > + > cp->cn_pri = CN_REMOTE; > sprintf(cp->cn_name, "%s0", driver_name); > } > @@ -175,7 +181,7 @@ static void > xc_cnputc(struct consdev *dev, int c) > { > > - if (xen_start_info->flags & SIF_INITDOMAIN) > + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) > xc_cnputc_dom0(dev, c); > else > xc_cnputc_domu(dev, c); > @@ -206,8 +212,7 @@ xcons_putc(int c) > xcons_force_flush(); > #endif > } > - if (cnsl_evt_reg) > - __xencons_tx_flush(); > + __xencons_tx_flush(); > > /* inform start path that we're pretty full */ > return ((wp - wc) >= WBUF_SIZE - 100) ?
TRUE : FALSE; > @@ -217,6 +222,10 @@ static void > xc_identify(driver_t *driver, device_t parent) > { > device_t child; > + > + if (!xen_pv_domain()) > + return; > + > child = BUS_ADD_CHILD(parent, 0, driver_name, 0); > device_set_driver(child, driver); > device_set_desc(child, "Xen Console"); > @@ -245,7 +254,7 @@ xc_attach(device_t dev) > cnsl_evt_reg = 1; > callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons); > > - if (xen_start_info->flags & SIF_INITDOMAIN) { > + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { > error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL, > xencons_priv_interrupt, NULL, > INTR_TYPE_TTY, &xen_intr_handle); > @@ -309,7 +318,7 @@ __xencons_tx_flush(void) > sz = wp - wc; > if (sz > (WBUF_SIZE - WBUF_MASK(wc))) > sz = WBUF_SIZE - WBUF_MASK(wc); > - if (xen_start_info->flags & SIF_INITDOMAIN) { > + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { > HYPERVISOR_console_io(CONSOLEIO_write, sz, &wbuf[WBUF_MASK(wc)]); > wc += sz; > } else { > @@ -424,7 +433,7 @@ xcons_force_flush(void) > { > int sz; > > - if (xen_start_info->flags & SIF_INITDOMAIN) > + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) > return; > > /* Spin until console data is flushed through to the domain controller. 
*/ > diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c > index 3701551..3046498 100644 > --- a/sys/dev/xen/console/xencons_ring.c > +++ b/sys/dev/xen/console/xencons_ring.c > @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$"); > > #define console_evtchn console.domU.evtchn > xen_intr_handle_t console_handle; > -extern char *console_page; > extern struct mtx cn_mtx; > extern device_t xencons_dev; > +extern int cnsl_evt_reg; > > static inline struct xencons_interface * > xencons_interface(void) > @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len) > struct xencons_interface *intf; > XENCONS_RING_IDX cons, prod; > int sent; > + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn }; > > intf = xencons_interface(); > cons = intf->out_cons; > @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len) > wmb(); > intf->out_prod = prod; > > - xen_intr_signal(console_handle); > + if (cnsl_evt_reg) > + xen_intr_signal(console_handle); > + else > + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send); > + > > return sent; > > @@ -125,11 +130,11 @@ xencons_ring_init(void) > { > int err; > > - if (!xen_start_info->console_evtchn) > + if (!HYPERVISOR_start_info->console_evtchn) > return 0; > > err = xen_intr_bind_local_port(xencons_dev, > - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL, > + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL, > INTR_TYPE_MISC | INTR_MPSAFE, &console_handle); > if (err) { > return err; > @@ -145,7 +150,7 @@ void > xencons_suspend(void) > { > > - if (!xen_start_info->console_evtchn) > + if (!HYPERVISOR_start_info->console_evtchn) > return; > > xen_intr_unbind(&console_handle); > diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c > index a9f8d1b..35c923d 100644 > --- a/sys/dev/xen/control/control.c > +++ b/sys/dev/xen/control/control.c > @@ -317,21 +317,6 @@ xctrl_suspend() > EVENTHANDLER_INVOKE(power_resume); > } 
> > -static void > -xen_pv_shutdown_final(void *arg, int howto) > -{ > - /* > - * Inform the hypervisor that shutdown is complete. > - * This is not necessary in HVM domains since Xen > - * emulates ACPI in that mode and FreeBSD's ACPI > - * support will request this transition. > - */ > - if (howto & (RB_HALT | RB_POWEROFF)) > - HYPERVISOR_shutdown(SHUTDOWN_poweroff); > - else > - HYPERVISOR_shutdown(SHUTDOWN_reboot); > -} > - > #else > > /* HVM mode suspension. */ > @@ -447,6 +432,21 @@ xctrl_halt() > shutdown_nice(RB_HALT); > } > > +static void > +xen_pv_shutdown_final(void *arg, int howto) > +{ > + /* > + * Inform the hypervisor that shutdown is complete. > + * This is not necessary in HVM domains since Xen > + * emulates ACPI in that mode and FreeBSD's ACPI > + * support will request this transition. > + */ > + if (howto & (RB_HALT | RB_POWEROFF)) > + HYPERVISOR_shutdown(SHUTDOWN_poweroff); > + else > + HYPERVISOR_shutdown(SHUTDOWN_reboot); > +} > + > /*------------------------------ Event Reception -----------------------------*/ > static void > xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len) > @@ -529,10 +529,9 @@ xctrl_attach(device_t dev) > xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl; > xs_register_watch(&xctrl->xctrl_watch); > > -#ifndef XENHVM > - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, > - SHUTDOWN_PRI_LAST); > -#endif > + if (xen_pv_domain()) > + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, > + SHUTDOWN_PRI_LAST); > > return (0); > } > diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c > index 824c75b..13bd852 100644 > --- a/sys/dev/xen/timer/timer.c > +++ b/sys/dev/xen/timer/timer.c > @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$"); > #include <machine/_inttypes.h> > #include <machine/smp.h> > > +/* For the declaration of clock_lock */ > +#include <isa/rtc.h> > + > #include "clock_if.h" > > static devclass_t xentimer_devclass; > @@ -234,18 +237,16 @@
xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src) > * it happens to be less than another CPU's previously determined value. > */ > static uint64_t > -xen_fetch_vcpu_time(void) > +xen_fetch_vcpu_time(struct vcpu_info *vcpu) > { > struct vcpu_time_info dst; > struct vcpu_time_info *src; > uint32_t pre_version; > uint64_t now; > volatile uint64_t last; > - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info); > > src = &vcpu->time; > > - critical_enter(); > do { > pre_version = xen_fetch_vcpu_tinfo(&dst, src); > barrier(); > @@ -266,16 +267,19 @@ xen_fetch_vcpu_time(void) > } > } while (!atomic_cmpset_64(&xen_timer_last_time, last, now)); > > - critical_exit(); > - > return (now); > } > > static uint32_t > xentimer_get_timecount(struct timecounter *tc) > { > + uint32_t xen_time; > + > + critical_enter(); > + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX; > + critical_exit(); > > - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX); > + return xen_time; > } > > /** > @@ -305,7 +309,12 @@ xen_fetch_wallclock(struct timespec *ts) > static void > xen_fetch_uptime(struct timespec *ts) > { > - uint64_t uptime = xen_fetch_vcpu_time(); > + uint64_t uptime; > + > + critical_enter(); > + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); > + critical_exit(); > + > ts->tv_sec = uptime / NSEC_IN_SEC; > ts->tv_nsec = uptime % NSEC_IN_SEC; > } > @@ -354,7 +363,7 @@ xentimer_intr(void *arg) > struct xentimer_softc *sc = (struct xentimer_softc *)arg; > struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu); > > - pcpu->last_processed = xen_fetch_vcpu_time(); > + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); > if (pcpu->timer != 0 && sc->et.et_active) > sc->et.et_event_cb(&sc->et, sc->et.et_arg); > > @@ -415,7 +424,9 @@ xentimer_et_start(struct eventtimer *et, > do { > if (++i == 60) > panic("can't schedule timer"); > - next_time = xen_fetch_vcpu_time() + first_in_ns; > + critical_enter(); > + next_time =
xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns; > + critical_exit(); > error = xentimer_vcpu_start_timer(cpu, next_time); > } while (error == -ETIME); > > @@ -573,6 +584,36 @@ xentimer_suspend(device_t dev) > return (0); > } > > +/* > + * Xen delay early init > + */ > +void xen_delay_init(void) > +{ > + /* Init the clock lock */ > + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE); > +} > +/* > + * Xen PV DELAY function > + * > + * When running in PVH mode we don't have an emulated i8254, so > + * make use of the Xen time info in order to code a simple DELAY > + * function that can be used during early boot. > + */ > +void xen_delay(int n) > +{ > + uint64_t end_ns; > + uint64_t current; > + > + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); > + end_ns += n * NSEC_IN_USEC; > + > + for (;;) { > + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); > + if (current >= end_ns) > + break; > + } > +} > + > static device_method_t xentimer_methods[] = { > DEVMETHOD(device_identify, xentimer_identify), > DEVMETHOD(device_probe, xentimer_probe), > diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s > index 68cb430..bd136b1 100644 > --- a/sys/i386/i386/locore.s > +++ b/sys/i386/i386/locore.s > @@ -898,3 +898,12 @@ done_pde: > #endif > > ret > + > +#ifdef XENHVM > +/* Xen Hypercall page */ > + .text > +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ > + > +NON_GPROF_ENTRY(hypercall_page) > + .skip 0x1000, 0x90 /* Fill with "nop"s */ > +#endif > diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c > index c430316..8bd9a8e 100644 > --- a/sys/i386/i386/machdep.c > +++ b/sys/i386/i386/machdep.c > @@ -254,6 +254,15 @@ struct mtx icu_lock; > > struct mem_range_softc mem_range_softc; > > +void > +DELAY(int n) > +{ > + if (delay_tc(n)) > + return; > + > + i8254_delay(n); > +} > + > static void > cpu_startup(dummy) > void *dummy; > diff --git a/sys/i386/include/clock.h
b/sys/i386/include/clock.h > index d980ec7..287b2c8 100644 > --- a/sys/i386/include/clock.h > +++ b/sys/i386/include/clock.h > @@ -22,6 +22,12 @@ extern int tsc_is_invariant; > extern int tsc_perf_stat; > > void i8254_init(void); > +void i8254_delay(int); > +#ifdef XENHVM > +void xen_delay_init(void); > +void xen_delay(int); > +#endif > +int delay_tc(int); > > /* > * Driver to clock driver interface. > diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h > index edc13f4..1c15b0f 100644 > --- a/sys/i386/include/xen/hypercall.h > +++ b/sys/i386/include/xen/hypercall.h > @@ -40,15 +40,8 @@ > #define CONFIG_XEN_COMPAT 0x030002 > > > -#if defined(XEN) > #define HYPERCALL_STR(name) \ > "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" > -#else > -#define HYPERCALL_STR(name) \ > - "mov hypercall_stubs,%%eax; " \ > - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \ > - "call *%%eax" > -#endif > > #define _hypercall0(type, name) \ > ({ \ > diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c > index 7049be6..1b1c74d 100644 > --- a/sys/i386/xen/xen_machdep.c > +++ b/sys/i386/xen/xen_machdep.c > @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl), > > int xendebug_flags; > start_info_t *xen_start_info; > +start_info_t *HYPERVISOR_start_info; > shared_info_t *HYPERVISOR_shared_info; > xen_pfn_t *xen_machine_phys = machine_to_phys_mapping; > xen_pfn_t *xen_phys_machine; > @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo); > struct xenstore_domain_interface; > extern struct xenstore_domain_interface *xen_store; > > -char *console_page; > +extern char *console_page; > > void * > bootmem_alloc(unsigned int size) > @@ -927,6 +928,7 @@ initvalues(start_info_t *startinfo) > HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments_notify); > #endif > xen_start_info = startinfo; > + HYPERVISOR_start_info = startinfo; > xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list; > > IdlePTD = 
(pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE); > diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c > index a12e175..a5aed1c 100644 > --- a/sys/x86/isa/clock.c > +++ b/sys/x86/isa/clock.c > @@ -247,61 +247,13 @@ getit(void) > return ((high << 8) | low); > } > > -#ifndef DELAYDEBUG > -static u_int > -get_tsc(__unused struct timecounter *tc) > -{ > - > - return (rdtsc32()); > -} > - > -static __inline int > -delay_tc(int n) > -{ > - struct timecounter *tc; > - timecounter_get_t *func; > - uint64_t end, freq, now; > - u_int last, mask, u; > - > - tc = timecounter; > - freq = atomic_load_acq_64(&tsc_freq); > - if (tsc_is_invariant && freq != 0) { > - func = get_tsc; > - mask = ~0u; > - } else { > - if (tc->tc_quality <= 0) > - return (0); > - func = tc->tc_get_timecount; > - mask = tc->tc_counter_mask; > - freq = tc->tc_frequency; > - } > - now = 0; > - end = freq * n / 1000000; > - if (func == get_tsc) > - sched_pin(); > - last = func(tc) & mask; > - do { > - cpu_spinwait(); > - u = func(tc) & mask; > - if (u < last) > - now += mask - last + u + 1; > - else > - now += u - last; > - last = u; > - } while (now < end); > - if (func == get_tsc) > - sched_unpin(); > - return (1); > -} > -#endif > - > /* > * Wait "n" microseconds. > * Relies on timer 1 counting down from (i8254_freq / hz) > * Note: timer had better have been programmed before this is first used! > */ > void > -DELAY(int n) > +i8254_delay(int n) > { > int delta, prev_tick, tick, ticks_left; > #ifdef DELAYDEBUG > @@ -317,9 +269,6 @@ DELAY(int n) > } > if (state == 1) > printf("DELAY(%d)...", n); > -#else > - if (delay_tc(n)) > - return; > #endif > /* > * Read the counter first, so that the rest of the setup overhead is > diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c > new file mode 100644 > index 0000000..7ea70b1 > --- /dev/null > +++ b/sys/x86/x86/delay.c > @@ -0,0 +1,95 @@ > +/*- > + * Copyright (c) 1990 The Regents of the University of California. 
> + * Copyright (c) 2010 Alexander Motin <mav@FreeBSD.org> > + * All rights reserved. > + * > + * This code is derived from software contributed to Berkeley by > + * William Jolitz and Don Ahn. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * 4. Neither the name of the University nor the names of its contributors > + * may be used to endorse or promote products derived from this software > + * without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE.
> + * > + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91 > + */ > + > +#include <sys/cdefs.h> > +__FBSDID("$FreeBSD$"); > + > +/* Generic x86 routines to handle delay */ > + > +#include <sys/param.h> > +#include <sys/systm.h> > +#include <sys/timetc.h> > +#include <sys/proc.h> > +#include <sys/kernel.h> > +#include <sys/sched.h> > + > +#include <machine/clock.h> > +#include <machine/cpu.h> > + > +static u_int > +get_tsc(__unused struct timecounter *tc) > +{ > + > + return (rdtsc32()); > +} > + > +int > +delay_tc(int n) > +{ > + struct timecounter *tc; > + timecounter_get_t *func; > + uint64_t end, freq, now; > + u_int last, mask, u; > + > + tc = timecounter; > + freq = atomic_load_acq_64(&tsc_freq); > + if (tsc_is_invariant && freq != 0) { > + func = get_tsc; > + mask = ~0u; > + } else { > + if (tc->tc_quality <= 0) > + return (0); > + func = tc->tc_get_timecount; > + mask = tc->tc_counter_mask; > + freq = tc->tc_frequency; > + } > + now = 0; > + end = freq * n / 1000000; > + if (func == get_tsc) > + sched_pin(); > + last = func(tc) & mask; > + do { > + cpu_spinwait(); > + u = func(tc) & mask; > + if (u < last) > + now += mask - last + u + 1; > + else > + now += u - last; > + last = u; > + } while (now < end); > + if (func == get_tsc) > + sched_unpin(); > + return (1); > +} > diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c > index 8c8eef6..d8d7701 100644 > --- a/sys/x86/x86/local_apic.c > +++ b/sys/x86/x86/local_apic.c > @@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused) > if (retval != 0) > printf("%s: Failed to setup I/O APICs: returned %d\n", > best_enum->apic_name, retval); > -#ifdef XEN > - return; > + > +#if defined(XEN) || defined(XENHVM) > + /* There's no lapic on PV Xen */ > + if (xen_pv_domain()) > + return; > #endif > + > /* > * Finish setting up the local APIC on the BSP once we know how to > * properly program the LINT pins.
> diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c > index 72811dc..be15594 100644 > --- a/sys/x86/xen/hvm.c > +++ b/sys/x86/xen/hvm.c > @@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$"); > #include <sys/proc.h> > #include <sys/smp.h> > #include <sys/systm.h> > +#include <sys/lock.h> > +#include <sys/mutex.h> > +#include <sys/reboot.h> > > #include <vm/vm.h> > #include <vm/pmap.h> > +#include <vm/vm_kern.h> > +#include <vm/vm_extern.h> > > #include <dev/pci/pcivar.h> > > #include <machine/cpufunc.h> > #include <machine/cpu.h> > #include <machine/smp.h> > +#include <machine/stdarg.h> > > #include <x86/apicreg.h> > > @@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$"); > #include <xen/gnttab.h> > #include <xen/hypervisor.h> > #include <xen/hvm.h> > +#ifdef __amd64__ > +#include <xen/pv.h> > +#endif > #include <xen/xen_intr.h> > > #include <xen/interface/hvm/params.h> > @@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void); > /* Variables used by mp_machdep to perform the bitmap IPI */ > extern volatile u_int cpu_ipi_pending[MAXCPU]; > > +#ifdef __amd64__ > +/* Native AP start used on PVHVM */ > +extern int native_start_all_aps(void); > +#endif > + > /*---------------------------------- Macros ----------------------------------*/ > #define IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS) > > @@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE; > struct cpu_ops xen_hvm_cpu_ops = { > .ipi_vectored = lapic_ipi_vectored, > .cpu_init = xen_hvm_cpu_init, > - .cpu_resume = xen_hvm_cpu_resume > + .cpu_resume = xen_hvm_cpu_resume, > +#ifdef __amd64__ > + .start_all_aps = native_start_all_aps, > +#endif > }; > > static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support"); > @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]); > > /*------------------ Hypervisor Access Shared Memory Regions -----------------*/ > /** Hypercall table accessed via HYPERVISOR_*_op() methods. 
*/ > -char *hypercall_stubs; > +extern char *hypercall_page; > shared_info_t *HYPERVISOR_shared_info; > +start_info_t *HYPERVISOR_start_info; > > #ifdef SMP > /*---------------------------- XEN PV IPI Handlers ---------------------------*/ > @@ -522,7 +540,7 @@ xen_setup_cpus(void) > { > int i; > > - if (!xen_hvm_domain() || !xen_vector_callback_enabled) > + if (!xen_vector_callback_enabled) > return; > > #ifdef __amd64__ > @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void) > * Allocate and fill in the hypcall page. > */ > static int > -xen_hvm_init_hypercall_stubs(void) > +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type) > { > uint32_t base, regs[4]; > int i; > @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void) > if (base == 0) > return (ENXIO); > > - if (hypercall_stubs == NULL) { > + if (init_type == XEN_HVM_INIT_COLD) { > do_cpuid(base + 1, regs); > printf("XEN: Hypervisor version %d.%d detected.\n", > regs[0] >> 16, regs[0] & 0xffff); > @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void) > * Find the hypercall pages. 
> */ > do_cpuid(base + 2, regs); > - > - if (hypercall_stubs == NULL) { > - size_t call_region_size; > - > - call_region_size = regs[0] * PAGE_SIZE; > - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT); > - if (hypercall_stubs == NULL) > - panic("Unable to allocate Xen hypercall region"); > - } > > for (i = 0; i < regs[0]; i++) > - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i); > + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i); > > return (0); > } > @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void) > if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC) > return; > > - if (bootverbose) > - printf("XEN: Disabling emulated block and network devices\n"); > outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS); > } > > @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) > if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND) > return; > > - error = xen_hvm_init_hypercall_stubs(); > + if (xen_pv_domain()) { > + /* hypercall page is already set in the PV case */ > + error = 0; > + } else { > + error = xen_hvm_init_hypercall_stubs(init_type); > + } > > switch (init_type) { > case XEN_HVM_INIT_COLD: > @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) > setup_xen_features(); > cpu_ops = xen_hvm_cpu_ops; > vm_guest = VM_GUEST_XEN; > +#ifdef __amd64__ > + if (xen_pv_domain()) > + cpu_ops.start_all_aps = xen_pv_start_all_aps; > + else > +#endif > + printf("XEN: Disabling emulated block and network devices\n"); > break; > case XEN_HVM_INIT_RESUME: > if (error != 0) > @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type) > } > > xen_vector_callback_enabled = 0; > - xen_domain_type = XEN_HVM_DOMAIN; > - xen_hvm_init_shared_info_page(); > xen_hvm_set_callback(NULL); > - xen_hvm_disable_emulated_devices(); > + > + if (!xen_pv_domain()) { > + xen_domain_type = XEN_HVM_DOMAIN; > + xen_hvm_init_shared_info_page(); > + xen_hvm_disable_emulated_devices(); > + } > } > > void > @@ -749,10 +770,11 @@ 
xen_set_vcpu_id(void) > struct pcpu *pc; > int i; > > - /* Set vcpu_id to acpi_id */ > + /* Set vcpu_id to acpi_id for PVHVM guests */ > CPU_FOREACH(i) { > pc = pcpu_find(i); > - pc->pc_vcpu_id = pc->pc_acpi_id; > + if (xen_hvm_domain()) > + pc->pc_vcpu_id = pc->pc_acpi_id; > if (bootverbose) > printf("XEN: CPU %u has VCPU ID %u\n", > i, pc->pc_vcpu_id); > @@ -790,6 +812,31 @@ xen_hvm_cpu_init(void) > DPCPU_SET(vcpu_info, vcpu_info); > } > > +/*----------------------------- Debug functions ------------------------------*/ > +#define PRINTK_BUFSIZE 1024 > +static int > +vprintk(const char *fmt, __va_list ap) > +{ > + int retval, len; > + static char buf[PRINTK_BUFSIZE]; > + > + retval = vsnprintf(buf, PRINTK_BUFSIZE - 1, fmt, ap); > + buf[retval] = 0; > + len = strlen(buf); > + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf); > + return retval; > +} > + > +void > +xen_early_printf(const char *fmt, ...) > +{ > + __va_list ap; > + > + va_start(ap, fmt); > + vprintk(fmt, ap); > + va_end(ap); > +} > + > SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL); > #ifdef SMP > SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL); > diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c > new file mode 100644 > index 0000000..8916314 > --- /dev/null > +++ b/sys/x86/xen/mptable.c > @@ -0,0 +1,136 @@ > +/*- > + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> > + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. 
Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * 3. Neither the name of the author nor the names of any co-contributors > + * may be used to endorse or promote products derived from this software > + * without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE.
> + */ > + > +#include <sys/cdefs.h> > +__FBSDID("$FreeBSD$"); > + > +#include <sys/param.h> > +#include <sys/systm.h> > +#include <sys/bus.h> > +#include <sys/kernel.h> > +#include <sys/smp.h> > +#include <sys/pcpu.h> > +#include <vm/vm.h> > +#include <vm/pmap.h> > + > +#include <machine/intr_machdep.h> > +#include <machine/apicvar.h> > + > +#include <machine/cpu.h> > +#include <machine/smp.h> > + > +#include <xen/xen-os.h> > +#include <xen/hypervisor.h> > + > +#include <xen/interface/vcpu.h> > + > +static int xenpv_probe(void); > +static int xenpv_probe_cpus(void); > +static int xenpv_setup_local(void); > +static int xenpv_setup_io(void); > + > +static struct apic_enumerator xenpv_enumerator = { > + "Xen PV", > + xenpv_probe, > + xenpv_probe_cpus, > + xenpv_setup_local, > + xenpv_setup_io > +}; > + > +/* > + * Look for an ACPI Multiple APIC Description Table ("APIC") > + */ > +static int > +xenpv_probe(void) > +{ > + return (-100); > +} > + > +/* > + * Run through the MP table enumerating CPUs. > + */ > +static int > +xenpv_probe_cpus(void) > +{ > + int i, ret; > + > + for (i = 0; i < MAXCPU; i++) { > + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL); > + if (ret >= 0) > + cpu_add((i * 2), (i == 0)); > + } > + > + return (0); > +} > + > +/* > + * Initialize the local APIC on the BSP. > + */ > +static int > +xenpv_setup_local(void) > +{ > + PCPU_SET(vcpu_id, 0); > + return (0); > +} > + > +/* > + * Enumerate I/O APICs and setup interrupt sources. > + */ > +static int > +xenpv_setup_io(void) > +{ > + return (0); > +} > + > +static void > +xenpv_register(void *dummy __unused) > +{ > + if (xen_pv_domain()) { > + apic_register_enumerator(&xenpv_enumerator); > + } > +} > +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL); > + > +/* > + * Setup per-CPU ACPI IDs. 
> + */ > +static void > +xenpv_set_ids(void *dummy) > +{ > + struct pcpu *pc; > + int i; > + > + CPU_FOREACH(i) { > + pc = pcpu_find(i); > + pc->pc_vcpu_id = i; > + } > + return; > +} > +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL); > diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c > new file mode 100644 > index 0000000..6756dec > --- /dev/null > +++ b/sys/x86/xen/pv.c > @@ -0,0 +1,247 @@ > +/* > + * Copyright (c) 2004 Christian Limpach. > + * Copyright (c) 2004-2006,2008 Kip Macy > + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE.
> + */ > + > +#include <sys/cdefs.h> > +__FBSDID("$FreeBSD$"); > + > +#include <sys/param.h> > +#include <sys/bus.h> > +#include <sys/kernel.h> > +#include <sys/malloc.h> > +#include <sys/proc.h> > +#include <sys/smp.h> > +#include <sys/systm.h> > +#include <sys/lock.h> > +#include <sys/mutex.h> > +#include <sys/reboot.h> > + > +#include <vm/vm.h> > +#include <vm/pmap.h> > +#include <vm/vm_kern.h> > +#include <vm/vm_extern.h> > + > +#include <dev/pci/pcivar.h> > + > +#include <machine/cpufunc.h> > +#include <machine/cpu.h> > +#include <machine/smp.h> > +#include <machine/tss.h> > +#include <machine/sysarch.h> > +#include <machine/clock.h> > + > +#include <x86/apicreg.h> > + > +#include <xen/xen-os.h> > +#include <xen/features.h> > +#include <xen/gnttab.h> > +#include <xen/hypervisor.h> > +#include <xen/hvm.h> > +#include <xen/pv.h> > +#include <xen/xen_intr.h> > + > +#include <xen/interface/hvm/params.h> > +#include <xen/interface/vcpu.h> > + > +#define MAX_E820_ENTRIES 128 > + > +/*--------------------------- Forward Declarations ---------------------------*/ > +static caddr_t xen_pv_parse_preload_data(u_int64_t); > +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); > + > +/*---------------------------- Extern Declarations ---------------------------*/ > +/* Variables used by amd64 mp_machdep to start APs */ > +extern struct mtx ap_boot_mtx; > +extern void *bootstacks[]; > +extern char *doublefault_stack; > +extern char *nmi_stack; > +extern void *dpcpu; > +extern int bootAP; > +extern char *bootSTK; > +extern bool lapic_disabled; > + > +/*-------------------------------- Global Data -------------------------------*/ > +/* Xen init_ops implementation. 
*/ > +struct init_ops xen_init_ops = { > + .parse_preload_data = xen_pv_parse_preload_data, > + .early_delay_init = xen_delay_init, > + .early_delay = xen_delay, > + .fetch_e820_map = xen_pv_fetch_e820_map, > +}; > + > +static struct > +{ > + const char *ev; > + int mask; > +} howto_names[] = { > + {"boot_askname", RB_ASKNAME}, > + {"boot_single", RB_SINGLE}, > + {"boot_nosync", RB_NOSYNC}, > + {"boot_halt", RB_ASKNAME}, > + {"boot_serial", RB_SERIAL}, > + {"boot_cdrom", RB_CDROM}, > + {"boot_gdb", RB_GDB}, > + {"boot_gdb_pause", RB_RESERVED1}, > + {"boot_verbose", RB_VERBOSE}, > + {"boot_multicons", RB_MULTIPLE}, > + {NULL, 0} > +}; > + > +static struct bios_smap xen_smap[MAX_E820_ENTRIES]; > + > +static int > +start_xen_ap(int cpu) > +{ > + struct vcpu_guest_context *ctxt; > + int ms, cpus = mp_naps; > + > + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO); > + if (ctxt == NULL) > + panic("unable to allocate memory"); > + > + ctxt->flags = VGCF_IN_KERNEL; > + ctxt->user_regs.rip = (unsigned long) init_secondary; > + ctxt->user_regs.rsp = (unsigned long) bootSTK; > + > + /* Set the CPU to use the same page tables and CR4 value */ > + ctxt->ctrlreg[3] = KPML4phys; > + ctxt->ctrlreg[4] = rcr4(); > + > + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt)) > + panic("unable to initialize CPU#%d\n", cpu); > + > + free(ctxt, M_TEMP); > + > + /* Launch the vCPU */ > + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) > + panic("unable to start AP#%d\n", cpu); > + > + /* Wait up to 5 seconds for it to start. 
*/ > + for (ms = 0; ms < 5000; ms++) { > + if (mp_naps > cpus) > + return 1; /* return SUCCESS */ > + DELAY(1000); > + } > + > + return 0; > +} > + > +int > +xen_pv_start_all_aps(void) > +{ > + int cpu; > + > + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN); > + lapic_disabled = true; > + > + for (cpu = 1; cpu < mp_ncpus; cpu++) { > + > + /* allocate and set up an idle stack data page */ > + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena, > + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO); > + doublefault_stack = (char *)kmem_malloc(kernel_arena, > + PAGE_SIZE, M_WAITOK | M_ZERO); > + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE, > + M_WAITOK | M_ZERO); > + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE, > + M_WAITOK | M_ZERO); > + > + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8; > + bootAP = cpu; > + > + /* attempt to start the Application Processor */ > + if (!start_xen_ap(cpu)) > + panic("AP #%d failed to start!", cpu); > + > + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */ > + } > + > + return mp_naps; > +} > + > +/* > + * Functions to convert the "extra" parameters passed by Xen > + * into FreeBSD boot options (from the i386 Xen port). 
> + */ > +static char * > +xen_setbootenv(char *cmd_line) > +{ > + char *cmd_line_next; > + > + /* Skip leading spaces */ > + for (; *cmd_line == ' '; cmd_line++); > + > + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;); > + return (cmd_line); > +} > + > +static int > +xen_boothowto(char *envp) > +{ > + int i, howto = 0; > + > + /* get equivalents from the environment */ > + for (i = 0; howto_names[i].ev != NULL; i++) > + if (getenv(howto_names[i].ev) != NULL) > + howto |= howto_names[i].mask; > + return (howto); > +} > + > +static caddr_t > +xen_pv_parse_preload_data(u_int64_t modulep) > +{ > + /* Parse the extra boot information given by Xen */ > + if (HYPERVISOR_start_info->cmd_line) > + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line); > + boothowto |= xen_boothowto(kern_envp); > + > + return (NULL); > +} > + > +static void > +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) > +{ > + struct xen_memory_map memmap; > + int rc; > + > + /* Fetch the E820 map from Xen */ > + memmap.nr_entries = MAX_E820_ENTRIES; > + set_xen_guest_handle(memmap.buffer, xen_smap); > + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); > + if (rc) > + panic("unable to fetch Xen E820 memory map"); > + > + *smap = xen_smap; > + *size = memmap.nr_entries * sizeof(xen_smap[0]); > +} > + > +void > +xen_pv_set_init_ops(void) > +{ > + /* Init ops for Xen PV */ > + init_ops = xen_init_ops; > +} > diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c > new file mode 100644 > index 0000000..00e063b > --- /dev/null > +++ b/sys/x86/xen/pvcpu.c > @@ -0,0 +1,98 @@ > +/* > + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1.
Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE.
> + */ > + > +#include <sys/cdefs.h> > +__FBSDID("$FreeBSD$"); > + > +#include <sys/param.h> > +#include <sys/systm.h> > +#include <sys/bus.h> > +#include <sys/kernel.h> > +#include <sys/module.h> > +#include <sys/pcpu.h> > +#include <sys/smp.h> > + > +#include <xen/xen-os.h> > + > +static void > +xenpvcpu_identify(driver_t *driver, device_t parent) > +{ > + int i; > + > + if (!xen_pv_domain()) > + return; > + > + CPU_FOREACH(i) > + BUS_ADD_CHILD(parent, 0, "pvcpu", i); > +} > + > +static int > +xenpvcpu_probe(device_t dev) > +{ > + if (!xen_pv_domain()) > + return (ENXIO); > + > + device_set_desc(dev, "Xen PV CPU"); > + return (0); > +} > + > +static int > +xenpvcpu_attach(device_t dev) > +{ > + struct pcpu *pc; > + int cpu; > + > + cpu = device_get_unit(dev); > + pc = pcpu_find(cpu); > + pc->pc_device = dev; > + return (0); > +} > + > +static int > +xenpvcpu_detach(device_t dev) > +{ > + > + return (0); > +} > + > +static device_method_t xenpvcpu_methods[] = { > + DEVMETHOD(device_identify, xenpvcpu_identify), > + DEVMETHOD(device_probe, xenpvcpu_probe), > + DEVMETHOD(device_attach, xenpvcpu_attach), > + DEVMETHOD(device_detach, xenpvcpu_detach), > + DEVMETHOD_END > +}; > + > +static driver_t xenpvcpu_driver = { > + "pvcpu", > + xenpvcpu_methods, > + 0, > +}; > + > +devclass_t xenpvcpu_devclass; > + > +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0); > +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1); > diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c > index 03c32b7..909378a 100644 > --- a/sys/xen/gnttab.c > +++ b/sys/xen/gnttab.c > @@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$"); > #include <sys/lock.h> > #include <sys/malloc.h> > #include <sys/mman.h> > +#include <sys/limits.h> > > #include <xen/xen-os.h> > #include <xen/hypervisor.h> > @@ -607,6 +608,7 @@ gnttab_resume(void) > { > int error; > unsigned int max_nr_gframes, nr_gframes; > + void *alloc_mem; > > nr_gframes = nr_grant_frames; > max_nr_gframes = max_nr_grant_frames(); > @@ -614,11 +616,20 
@@ gnttab_resume(void) > return (ENOSYS); > > if (!resume_frames) { > - error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, > - &resume_frames); > - if (error) { > - printf("error mapping gnttab share frames\n"); > - return (error); > + if (xen_pv_domain()) { > + alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE, > + M_DEVBUF, M_NOWAIT, 0, > + ULONG_MAX, PAGE_SIZE, 0); > + KASSERT((alloc_mem != NULL), > + ("unable to alloc memory for gnttab")); > + resume_frames = vtophys(alloc_mem); > + } else { > + error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, > + &resume_frames); > + if (error) { > + printf("error mapping gnttab share frames\n"); > + return (error); > + } > } > } > > diff --git a/sys/xen/interface/arch-x86/xen.h b/sys/xen/interface/arch-x86/xen.h > index 1c186d7..6cc15d3 100644 > --- a/sys/xen/interface/arch-x86/xen.h > +++ b/sys/xen/interface/arch-x86/xen.h > @@ -147,7 +147,16 @@ struct vcpu_guest_context { > struct cpu_user_regs user_regs; /* User-level CPU registers */ > struct trap_info trap_ctxt[256]; /* Virtual IDT */ > unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */ > - unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */ > + union { > + struct { > + /* PV: GDT (machine frames, # ents).*/ > + unsigned long gdt_frames[16], gdt_ents; > + } pv; > + struct { > + /* PVH: GDTR addr and size */ > + unsigned long gdtaddr, gdtsz; > + } pvh; > + } u; > unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */ > /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. 
*/ > unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */ > diff --git a/sys/xen/pv.h b/sys/xen/pv.h > new file mode 100644 > index 0000000..bbb1048 > --- /dev/null > +++ b/sys/xen/pv.h > @@ -0,0 +1,29 @@ > +/* > + * Permission is hereby granted, free of charge, to any person obtaining a copy > + * of this software and associated documentation files (the "Software"), to > + * deal in the Software without restriction, including without limitation the > + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or > + * sell copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > + * DEALINGS IN THE SOFTWARE. 
> + * > + * $FreeBSD$ > + */ > + > +#ifndef __XEN_PV_H__ > +#define __XEN_PV_H__ > + > +int xen_pv_start_all_aps(void); > +void xen_pv_set_init_ops(void); > + > +#endif /* __XEN_PV_H__ */ > \ No newline at end of file > diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h > index 95e8c6a..d3dccad 100644 > --- a/sys/xen/xen-os.h > +++ b/sys/xen/xen-os.h > @@ -53,6 +53,11 @@ void force_evtchn_callback(void); > extern int gdtset; > > extern shared_info_t *HYPERVISOR_shared_info; > +extern start_info_t *HYPERVISOR_start_info; > + > +/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */ > +extern struct xenstore_domain_interface *xen_store; > +extern char *console_page; > > enum xen_domain_type { > XEN_NATIVE, /* running on bare hardware */ > @@ -80,6 +85,9 @@ xen_hvm_domain(void) > return (xen_domain_type == XEN_HVM_DOMAIN); > } > > +/* Debug function, prints directly to hypervisor console */ > +void xen_early_printf(const char *, ...); > + > #ifndef xen_mb > #define xen_mb() mb() > #endif > diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c > index d404862..b9885af 100644 > --- a/sys/xen/xenstore/xenstore.c > +++ b/sys/xen/xenstore/xenstore.c > @@ -1082,6 +1082,19 @@ xs_init_comms(void) > static void > xs_identify(driver_t *driver, device_t parent) > { > + const char *parent_name; > + > + if (!xen_domain()) > + return; > + > + /* > + * On HVM domains we will get called twice, once from the nexus > + * and another time after the xenpci device is attached, we should > + * only attach after the xenpci device has been added. > + */ > + parent_name = device_get_name(parent); > + if (xen_hvm_domain() && strncmp(parent_name, "xenpci", 6) != 0) > + return; > > BUS_ADD_CHILD(parent, 0, "xenstore", 0); > } > @@ -1147,13 +1160,15 @@ xs_attach(device_t dev) > /* Initialize the interface to xenstore. 
*/ > struct proc *p; > > -#ifdef XENHVM > - xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); > - xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); > - xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); > -#else > - xs.evtchn = xen_start_info->store_evtchn; > -#endif > + if (xen_hvm_domain()) { > + xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); > + xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); > + xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); > + } else if (xen_pv_domain()) { > + xs.evtchn = HYPERVISOR_start_info->store_evtchn; > + } else { > + panic("Unknown domain type, cannot initialize xenstore\n"); > + } > > TAILQ_INIT(&xs.reply_list); > TAILQ_INIT(&xs.watch_events); > @@ -1263,9 +1278,8 @@ static devclass_t xenstore_devclass; > > #ifdef XENHVM > DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0); > -#else > -DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); > #endif > +DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); > > /*------------------------------- Sysctl Data --------------------------------*/ > /* XXX Shouldn't the node be somewhere else? */ > -- > 1.7.7.5 (Apple Git-26)
On 28/10/13 14:35, Roger Pau Monné wrote:
> Hello,
>
> The Xen community is working on a new virtualization mode (or maybe I
> should say an extension of HVM) to be able to run PV guests inside HVM
> containers without requiring a device-model (Qemu). One of the
> advantages of this new virtualization mode is that now it is much more
> easier to port guests to run under it (as compared to pure PV guests).
>
> Given that FreeBSD already supports PVHVM, adding PVH support is quite
> easy, we only need some glue for the PV entry point and then support
> for diverging some early init functions (like fetching the e820 map or
> starting the APs).
>
> The attached patch contains all this changes, and allows a SMP FreeBSD
> guest to fully boot (and AFAIK work) under this new PVH mode. The patch
> can also be found on my git repo:
>
> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2
>
> The patch touches quite a lot of the early init, so I've Cced the
> persons that maintain those areas, so they can review it.
>
> In order to test it, and since the PVH changes are not yet merged into
> upstream Xen, the use of a patched Xen is necessary. I've collected the
> patches for PVH guest support from George Dunlap (v13) and fixed some
> bugs on top of them, the tree can be found at:
>
> git://xenbits.xen.org/people/royger/xen.git fix_pvh

I've updated the patch (as suggested by John Baldwin) and added a Xen nexus that attaches all the Xen top-level devices; this gets rid of the legacy bus. The new patch can be found at:

git://xenbits.xen.org/people/royger/freebsd.git pvh_v2

It is also attached to this email.

Thanks for the review, Roger.
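For readers new to the approach, the "diverging some early init functions" idea from the quoted message can be pictured as a table of function pointers that defaults to the native paths and is overwritten once when the kernel is entered through the PV entry point. The following is only a minimal user-space sketch under that assumption: `struct early_ops`, `struct mem_range`, `early_ops_use_pv` and the stand-in functions are hypothetical names invented for illustration, not the structures or functions the patch adds, and the returned values are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory-map entry (not FreeBSD's struct bios_smap). */
struct mem_range {
	uint64_t base;
	uint64_t len;
};

/* Hypothetical hook table, loosely modelled on the idea of early init hooks. */
struct early_ops {
	/* Fill 'out' with up to 'max' ranges; return how many were written. */
	size_t (*fetch_memory_map)(struct mem_range *out, size_t max);
	/* Return the number of application processors started. */
	int (*start_all_aps)(void);
};

/* Stand-in for reading the map handed over by the boot loader. */
static size_t
native_fetch_map(struct mem_range *out, size_t max)
{
	assert(out != NULL);
	if (max < 1)
		return (0);
	out[0].base = 0;
	out[0].len = 640 * 1024;
	return (1);
}

/* Stand-in for the native AP start-up path (placeholder count). */
static int
native_start_aps(void)
{
	return (3);
}

/* Stand-in for asking the hypervisor for the memory map. */
static size_t
pv_fetch_map(struct mem_range *out, size_t max)
{
	assert(out != NULL);
	if (max < 1)
		return (0);
	out[0].base = 0;
	out[0].len = UINT64_C(0x138800000);
	return (1);
}

/* Stand-in for bringing vCPUs up through hypercalls (placeholder count). */
static int
pv_start_aps(void)
{
	return (6);
}

/* The table defaults to the native implementations. */
static struct early_ops early_ops = {
	.fetch_memory_map = native_fetch_map,
	.start_all_aps = native_start_aps,
};

/* Called once from the PV entry path, before any hook is used. */
static void
early_ops_use_pv(void)
{
	early_ops.fetch_memory_map = pv_fetch_map;
	early_ops.start_all_aps = pv_start_aps;
}
```

The point of the pattern is that common early-boot code only ever calls through the table, so it needs no Xen-specific conditionals of its own.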
--------------010605090609060304010908
Content-Type: text/plain; charset="UTF-8"; x-mac-type=0; x-mac-creator=0; name="0001-Xen-x86-PVH-support.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-Xen-x86-PVH-support.patch"

From 325c95ccd941bdb3101e9b6dd6c6a66274865fa9 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 7 Nov 2013 17:07:50 +0100
Subject: [PATCH] Xen x86 PVH support

This is still very experimental, and PVH support has not yet been
merged into upstream Xen.

PVH mode is basically a PV guest inside an HVM container, and shares
a great amount of code with PVHVM. The main difference is the way the
guest is started: PVH uses the PV start sequence, jumping directly
into the kernel entry point in long mode with page tables already
set. The main work of this patch consists of setting up an
environment as similar as possible to what native FreeBSD expects,
and then adding hooks for the PV ops where necessary.

sys/amd64/amd64/locore.S:
 * Add PV entry point, hypercall_page and the necessary elfnotes.

sys/amd64/amd64/machdep.c:
 * Add hooks to replace bare metal operations that should use a PV
   helper. This includes:
    - Preload metadata
    - i8254_init and i8254_delay
    - Fetching the e820 memory map
    - Reservation of the MP bootstrap region
 * Create a DELAY function that uses the PV hooks.
 * Introduce a new hammer_time_xen that performs the setup needed
   when running in PVH mode.

sys/amd64/amd64/mp_machdep.c:
 * Introduce a hook to replace start_all_aps.
 * Introduce a lapic_disabled variable to prevent polluting the code
   with Xen-specific gates.

sys/amd64/include/asmacros.h:
 * Copy the ELFNOTE macro from the i386 Xen PV port.

sys/amd64/include/clock.h:
sys/i386/include/clock.h:
 * Prototypes for the Xen early delay initialization and usage.

sys/amd64/include/cpu.h:
 * Introduce a new cpu hook to init APs.

sys/amd64/include/sysarch.h:
 * Declare the init_ops structure.

sys/amd64/include/xen/hypercall.h:
sys/i386/include/xen/hypercall.h:
 * Switch to the PV style hypercall mechanism for HVM also.

sys/conf/files:
 * Make the PV console available on XENHVM also.

sys/conf/files.amd64:
 * Include the new files for the PVH port.

sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
 * Remove the identify method and instead add the device from
   nexus_xen.
 * Use HYPERVISOR_start_info instead of xen_start_info.
 * Use HYPERVISOR_event_channel_op to kick the event channel before
   Xen interrupts are set up.

sys/dev/xen/control/control.c:
 * Use the PV shutdown on PVH.

sys/dev/xen/timer/timer.c:
 * Pass a vcpu_info to xen_fetch_vcpu_time; this allows using the
   function at very early init, before the per-cpu vcpu_info is set.
 * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can
   be used at early boot; place them in the callers instead.
 * Introduce two new functions, xen_delay_init and xen_delay, that
   can be used at early boot to implement the generic DELAY function.
 * Remove the identify method that used to add the device; it is now
   added manually from either xenpci (HVM) or nexus_xen (PV).

sys/i386/i386/locore.s:
 * Reserve space for the hypercall page.

sys/i386/i386/machdep.c:
 * Create a generic DELAY function.

sys/i386/xen/xen_machdep.c:
 * Set HYPERVISOR_start_info.

sys/x86/isa/clock.c:
 * Rename the generic DELAY function to i8254_delay.

sys/x86/x86/delay.c:
 * Put generic delay helpers here, get_tsc and delay_tc.

sys/x86/x86/local_apic.c:
 * Prevent the local APIC from attaching when running in PVH mode.

sys/x86/xen/hvm.c:
 * Set the start_all_aps hook.
 * Fix the setting of the hypercall page now that we are using the
   same mechanism as the PV port.
 * Initialize Xen CPU hooks for the PVH port.
 * Introduce the xen_early_printf debug function, which prints
   directly to the hypervisor console.
 * Initialize APs before SI_SUB_SMP (SI_SUB_SMP-1).

sys/x86/xen/mptable.c:
 * Create a dummy PV CPU enumerator for the PVH port.

sys/x86/xen/pv.c:
 * Implement the PV functions for the early boot hooks,
   parse_preload_data and fetch_e820_map.
 * Implement the PV function for the start_all_aps hook.

sys/x86/xen/pvcpu.c:
 * Dummy Xen PV CPU device that we use to set the per-cpu pc_device.

sys/xen/gnttab.c:
 * Allocate resume_frames for the PVH port.

sys/xen/interface/arch-x86/xen.h:
 * Interface change for the PVH port (not used on FreeBSD).

sys/xen/pv.h:
 * Header that exports the specific PV functions.

sys/xen/xen-os.h:
 * Declare prototypes for the newly added functions.

sys/xen/xenstore/xenstore.c:
 * Make the xenstore driver hang from both xenpci and the nexus when
   running XENHVM; this is needed because we don't have a xenpci
   device on the PVH port.
 * Remove the identify routine that added the device; instead add it
   from either xenpci (HVM) or nexus_xen (PV).

sys/dev/xen/xenpci/xenpci.c:
 * Add the xenstore and xen_et devices on successful attach.

sys/i386/xen/mp_machdep.c:
 * Modify cpu_initialize_context to match the changes in the Xen
   interface.

sys/x86/xen/xen_nexus.c:
 * Create a specific nexus for Xen PV guests that takes care of
   adding the top level Xen PV devices.
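
As a reading aid for the delay helpers listed above under sys/x86/x86/delay.c, the part of a counter-based busy-wait that is easiest to get wrong is accounting for the counter wrapping around between two reads. The sketch below isolates that arithmetic in plain user-space C. It is an illustration under stated assumptions, not the kernel code: `counter_delta`, `usec_to_ticks`, `spin_until` and the `fake_counter` stand-in are hypothetical names, and the sketch assumes the mask has the form 2^n - 1 and that at most one wraparound happens between consecutive reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ticks elapsed between two reads of a free-running counter that is
 * 'mask' wide. Assumes mask is 2^n - 1 and that the counter wrapped
 * at most once between the two reads.
 */
static uint64_t
counter_delta(uint32_t last, uint32_t now, uint32_t mask)
{
	assert((mask & (mask + 1)) == 0);
	if (now < last)
		return ((uint64_t)mask - last + now + 1);
	return ((uint64_t)now - last);
}

/* Number of counter ticks that make up 'usec' microseconds. */
static uint64_t
usec_to_ticks(uint64_t freq_hz, uint64_t usec)
{
	return (freq_hz * usec / 1000000);
}

typedef uint32_t (*read_counter_t)(void);

/*
 * Poll the counter until at least 'target' ticks have been observed,
 * and return the number of ticks actually accumulated.
 */
static uint64_t
spin_until(read_counter_t read_counter, uint32_t mask, uint64_t target)
{
	uint64_t elapsed = 0;
	uint32_t last, now;

	last = read_counter() & mask;
	do {
		now = read_counter() & mask;
		elapsed += counter_delta(last, now, mask);
		last = now;
	} while (elapsed < target);
	return (elapsed);
}

/* Hypothetical counter for experiments: advances 7 ticks per read. */
static uint32_t fake_counter_value;

static uint32_t
fake_counter(void)
{
	fake_counter_value += 7;
	return (fake_counter_value);
}
```

Starting `fake_counter_value` just below a 16-bit boundary and calling `spin_until(fake_counter, 0xffff, 100)` exercises the wraparound branch on the second poll, which is the case a naive `now - last` would get wrong for a counter narrower than the integer type.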
--- sys/amd64/amd64/locore.S | 53 ++++++++ sys/amd64/amd64/machdep.c | 179 ++++++++++++++++++++++---- sys/amd64/amd64/mp_machdep.c | 27 +++-- sys/amd64/include/asmacros.h | 26 ++++ sys/amd64/include/clock.h | 6 + sys/amd64/include/cpu.h | 1 + sys/amd64/include/sysarch.h | 19 +++ sys/amd64/include/xen/hypercall.h | 7 - sys/conf/files | 4 +- sys/conf/files.amd64 | 5 + sys/conf/files.i386 | 2 + sys/dev/xen/console/console.c | 29 ++--- sys/dev/xen/console/xencons_ring.c | 15 ++- sys/dev/xen/control/control.c | 37 +++--- sys/dev/xen/timer/timer.c | 73 +++++++---- sys/dev/xen/xenpci/xenpci.c | 8 + sys/i386/i386/locore.s | 9 ++ sys/i386/i386/machdep.c | 11 ++ sys/i386/include/clock.h | 6 + sys/i386/include/xen/hypercall.h | 7 - sys/i386/xen/mp_machdep.c | 6 +- sys/i386/xen/xen_machdep.c | 4 +- sys/x86/isa/clock.c | 53 +-------- sys/x86/isa/isa.c | 3 + sys/x86/x86/delay.c | 95 ++++++++++++++ sys/x86/x86/local_apic.c | 8 +- sys/x86/xen/hvm.c | 98 +++++++++++---- sys/x86/xen/mptable.c | 136 ++++++++++++++++++++ sys/x86/xen/pv.c | 247 ++++++++++++++++++++++++++++++++++++ sys/x86/xen/pvcpu.c | 77 +++++++++++ sys/x86/xen/xen_nexus.c | 99 ++++++++++++++ sys/xen/gnttab.c | 21 +++- sys/xen/interface/arch-x86/xen.h | 11 ++- sys/xen/pv.h | 29 ++++ sys/xen/xen-os.h | 8 + sys/xen/xenstore/xenstore.c | 24 ++-- 36 files changed, 1225 insertions(+), 218 deletions(-) create mode 100644 sys/x86/x86/delay.c create mode 100644 sys/x86/xen/mptable.c create mode 100644 sys/x86/xen/pv.c create mode 100644 sys/x86/xen/pvcpu.c create mode 100644 sys/x86/xen/xen_nexus.c create mode 100644 sys/xen/pv.h diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S index 55cda3a..e04cc48 100644 --- a/sys/amd64/amd64/locore.S +++ b/sys/amd64/amd64/locore.S @@ -31,6 +31,12 @@ #include <machine/pmap.h> #include <machine/specialreg.h> +#ifdef XENHVM +#include <xen/xen-os.h> +#define __ASSEMBLY__ +#include <xen/interface/elfnote.h> +#endif + #include "assym.s" /* @@ -86,3 +92,50 @@ 
NON_GPROF_ENTRY(btext) ALIGN_DATA /* just to be sure */ .space 0x1000 /* space for bootstack - temporary stack */ bootstack: + +#ifdef XENHVM +/* Xen */ +.section __xen_guest + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD") + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD") + ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0") + ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE) + ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */ + ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start) + ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page) + ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START) + ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector") + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes") + ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V) + ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic") + ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0) + ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes") + + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ + +NON_GPROF_ENTRY(xen_start) + /* Don't trust what the loader gives for rflags. 
*/ + pushq $PSL_KERNEL + popfq + + /* Parameters for the xen init function */ + movq %rsi, %rdi /* shared_info (arg 1) */ + movq %rsp, %rsi /* xenstack (arg 2) */ + + /* Use our own stack */ + movq $bootstack,%rsp + xorl %ebp, %ebp + + /* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */ + call hammer_time_xen + movq %rax, %rsp /* set up kstack for mi_startup() */ + call mi_startup /* autoconfiguration, mountroot etc */ + + /* NOTREACHED */ +0: hlt + jmp 0b +#endif diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c index 2b2e47f..b649def 100644 --- a/sys/amd64/amd64/machdep.c +++ b/sys/amd64/amd64/machdep.c @@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$"); #include <machine/reg.h> #include <machine/sigframe.h> #include <machine/specialreg.h> +#include <machine/sysarch.h> #ifdef PERFMON #include <machine/perfmon.h> #endif @@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$"); #include <isa/isareg.h> #include <isa/rtc.h> +#ifdef XENHVM +/* Xen */ +#include <xen/xen-os.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#endif + /* Sanity check for __curthread() */ CTASSERT(offsetof(struct pcpu, pc_curthread) == 0); extern u_int64_t hammer_time(u_int64_t, u_int64_t); +#ifdef XENHVM +extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t); +#endif extern void printcpuinfo(void); /* XXX header file */ extern void identify_cpu(void); @@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp, char *xfpustate, size_t xfpustate_len); SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL); +/* Preload data parse function */ +static caddr_t native_parse_preload_data(u_int64_t); + +/* Native function to fetch the e820 map */ +static void native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/* Default init_ops implementation. 
*/ +struct init_ops init_ops = { + .parse_preload_data = native_parse_preload_data, + .early_delay_init = i8254_init, + .early_delay = i8254_delay, + .fetch_e820_map = native_fetch_e820_map, +#ifdef SMP + .mp_bootaddress = mp_bootaddress, +#endif +}; + /* * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is * the physical address at which the kernel is loaded. @@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc; struct mtx dt_lock; /* lock for GDT and LDT */ +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + init_ops.early_delay(n); +} + static void cpu_startup(dummy) void *dummy; @@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp) return (1); } +static void +native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + /* + * get memory map from INT 15:E820, kindly supplied by the + * loader. + * + * subr_module.c says: + * "Consumer may safely assume that size value precedes data." + * ie: an int32_t immediately precedes smap. + */ + *smap = (struct bios_smap *)preload_search_info(kmdp, + MODINFO_METADATA | MODINFOMD_SMAP); + if (*smap == NULL) + panic("No BIOS smap info from loader!"); + *size = *((u_int32_t *)*smap - 1); +} + /* * Populate the (physmap) array with base/bound pairs describing the * available physical memory in the system, then test this memory and @@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) basemem = 0; physmap_idx = 0; - /* - * get memory map from INT 15:E820, kindly supplied by the loader. - * - * subr_module.c says: - * "Consumer may safely assume that size value precedes data." - * ie: an int32_t immediately precedes smap. 
- */ - smapbase = (struct bios_smap *)preload_search_info(kmdp, - MODINFO_METADATA | MODINFOMD_SMAP); - if (smapbase == NULL) - panic("No BIOS smap info from loader!"); + init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize); - smapsize = *((u_int32_t *)smapbase - 1); smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize); for (smap = smapbase; smap < smapend; smap++) @@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) #ifdef SMP /* make hole for AP bootstrap code */ - physmap[1] = mp_bootaddress(physmap[1] / 1024); + if (init_ops.mp_bootaddress) + physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024); #endif /* @@ -1681,6 +1726,98 @@ do_next: msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]); } +static caddr_t +native_parse_preload_data(u_int64_t modulep) +{ + caddr_t kmdp; + + preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); + preload_bootstrap_relocate(KERNBASE); + kmdp = preload_search_by_type("elf kernel"); + if (kmdp == NULL) + kmdp = preload_search_by_type("elf64 kernel"); + boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); + kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; +#ifdef DDB + ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); + ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); +#endif + + return (kmdp); +} + +#ifdef XENHVM +/* + * First function called by the Xen PVH boot sequence. + * + * Set some Xen global variables and prepare the environment so it is + * as similar as possible to what native FreeBSD init function expects. 
+ */ +u_int64_t +hammer_time_xen(start_info_t *si, u_int64_t xenstack) +{ + u_int64_t physfree; + u_int64_t *PT4 = (u_int64_t *)xenstack; + u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE); + u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE); + int i; + + KASSERT((si != NULL && xenstack != 0), + ("invalid start_info or xenstack")); + + xen_early_printf("FreeBSD PVH running on %s\n", si->magic); + + /* We use 3 pages of xen stack for the boot pagetables */ + physfree = xenstack + 3 * PAGE_SIZE - KERNBASE; + + /* Setup Xen global variables */ + HYPERVISOR_start_info = si; + HYPERVISOR_shared_info = + (shared_info_t *)(si->shared_info + KERNBASE); + + /* + * Setup some misc global variables for Xen devices + * + * XXX: devices that need this specific variables should + * be rewritten to fetch this info by themselves from the + * start_info page. + */ + console_page = + (char *)(ptoa(si->console.domU.mfn) + KERNBASE); + xen_store = (struct xenstore_domain_interface *) + (ptoa(si->store_mfn) + KERNBASE); + + xen_domain_type = XEN_PV_DOMAIN; + vm_guest = VM_GUEST_XEN; + + /* + * Use the stack Xen gives us to build the page tables + * as native FreeBSD expects to find them (created + * by the boot trampoline). + */ + for (i = 0; i < 512; i++) { + /* Each slot of the level 4 pages points to the same level 3 page */ + PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE; + PT4[i] |= PG_V | PG_RW | PG_U; + + /* Each slot of the level 3 pages points to the same level 2 page */ + PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE; + PT3[i] |= PG_V | PG_RW | PG_U; + + /* The level 2 page slots are mapped with 2MB pages for 1GB. 
*/ + PT2[i] = i * (2 * 1024 * 1024); + PT2[i] |= PG_V | PG_RW | PG_PS | PG_U; + } + load_cr3(((u_int64_t)&PT4[0]) - KERNBASE); + + /* Set the hooks for early functions that diverge from bare metal */ + xen_pv_set_init_ops(); + + /* Now we can jump into the native init function */ + return hammer_time(0, physfree); +} +#endif + u_int64_t hammer_time(u_int64_t modulep, u_int64_t physfree) { @@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) */ proc_linkup0(&proc0, &thread0); - preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); - preload_bootstrap_relocate(KERNBASE); - kmdp = preload_search_by_type("elf kernel"); - if (kmdp == NULL) - kmdp = preload_search_by_type("elf64 kernel"); - boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); - kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; -#ifdef DDB - ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); - ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); -#endif + kmdp = init_ops.parse_preload_data(modulep); /* Init basic tunables, hz etc */ init_param1(); @@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) lidt(&r_idt); /* - * Initialize the i8254 before the console so that console + * Initialize the early delay before the console so that console * initialization can use DELAY(). */ - i8254_init(); + init_ops.early_delay_init(); /* * Initialize the console before we print anything out. diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c index 4ef4b3d..44c2a45 100644 --- a/sys/amd64/amd64/mp_machdep.c +++ b/sys/amd64/amd64/mp_machdep.c @@ -90,7 +90,8 @@ extern struct pcpu __pcpu[]; /* AP uses this during bootstrap. Do not staticize. 
*/ char *bootSTK; -static int bootAP; +int bootAP; +bool lapic_disabled = false; /* Free these after use */ void *bootstacks[MAXCPU]; @@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU]; static u_long *ipi_hardclock_counts[MAXCPU]; #endif +int native_start_all_aps(void); + /* Default cpu_ops implementation. */ struct cpu_ops cpu_ops = { - .ipi_vectored = lapic_ipi_vectored + .ipi_vectored = lapic_ipi_vectored, + .start_all_aps = native_start_all_aps, }; extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32); @@ -138,7 +142,7 @@ extern int pmap_pcid_enabled; static volatile cpuset_t ipi_nmi_pending; /* used to hold the AP's until we are ready to release them */ -static struct mtx ap_boot_mtx; +struct mtx ap_boot_mtx; /* Set to 1 once we're ready to let the APs out of the pen. */ static volatile int aps_ready = 0; @@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */ static void assign_cpu_ids(void); static void set_interrupt_apic_ids(void); -static int start_all_aps(void); static int start_ap(int apic_id); static void release_aps(void *dummy); @@ -569,7 +572,7 @@ cpu_mp_start(void) assign_cpu_ids(); /* Start each Application Processor */ - start_all_aps(); + cpu_ops.start_all_aps(); set_interrupt_apic_ids(); } @@ -707,7 +710,8 @@ init_secondary(void) wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D); /* Disable local APIC just to be sure. */ - lapic_disable(); + if (!lapic_disabled) + lapic_disable(); /* signal our startup to the BSP. 
*/ mp_naps++; @@ -733,7 +737,7 @@ init_secondary(void) /* A quick check from sanity claus */ cpuid = PCPU_GET(cpuid); - if (PCPU_GET(apic_id) != lapic_id()) { + if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) { printf("SMP: cpuid = %d\n", cpuid); printf("SMP: actual apic_id = %d\n", lapic_id()); printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id)); @@ -749,7 +753,8 @@ init_secondary(void) mtx_lock_spin(&ap_boot_mtx); /* Init local apic for irq's */ - lapic_setup(1); + if (!lapic_disabled) + lapic_setup(1); /* Set memory range attributes for this CPU to match the BSP */ mem_range_AP_init(); @@ -764,7 +769,7 @@ init_secondary(void) if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0) CPU_SET(cpuid, &logical_cpus_mask); - if (bootverbose) + if (!lapic_disabled && bootverbose) lapic_dump("AP"); if (smp_cpus == mp_ncpus) { @@ -908,8 +913,8 @@ assign_cpu_ids(void) /* * start each AP in our list */ -static int -start_all_aps(void) +int +native_start_all_aps(void) { vm_offset_t va = boot_address + KERNBASE; u_int64_t *pt4, *pt3, *pt2; diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h index 1fb592a..ce8dce4 100644 --- a/sys/amd64/include/asmacros.h +++ b/sys/amd64/include/asmacros.h @@ -201,4 +201,30 @@ #endif /* LOCORE */ +#ifdef __STDC__ +#define ELFNOTE(name, type, desctype, descdata...) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz #name ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#else /* !__STDC__, i.e. 
-traditional */ +#define ELFNOTE(name, type, desctype, descdata) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz "name" ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#endif /* __STDC__ */ + #endif /* !_MACHINE_ASMACROS_H_ */ diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h index d7f7d82..e7817ab 100644 --- a/sys/amd64/include/clock.h +++ b/sys/amd64/include/clock.h @@ -25,6 +25,12 @@ extern int smp_tsc; #endif void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h index 3d9ff531..ed9f1db 100644 --- a/sys/amd64/include/cpu.h +++ b/sys/amd64/include/cpu.h @@ -64,6 +64,7 @@ struct cpu_ops { void (*cpu_init)(void); void (*cpu_resume)(void); void (*ipi_vectored)(u_int, int); + int (*start_all_aps)(void); }; extern struct cpu_ops cpu_ops; diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h index cd380d4..27fd3ba 100644 --- a/sys/amd64/include/sysarch.h +++ b/sys/amd64/include/sysarch.h @@ -4,3 +4,22 @@ /* $FreeBSD$ */ #include <x86/sysarch.h> + +#include <machine/pc/bios.h> +/* + * Struct containing pointers to init functions whose + * implementation is run time selectable. Selection can be made, + * for example, based on detection of a BIOS variant or + * hypervisor environment. 
+ */ +struct init_ops { + caddr_t (*parse_preload_data)(u_int64_t); + void (*early_delay_init)(void); + void (*early_delay)(int); + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *); +#ifdef SMP + u_int (*mp_bootaddress)(u_int); +#endif +}; + +extern struct init_ops init_ops; diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h index a1b2a5c..499fb4d 100644 --- a/sys/amd64/include/xen/hypercall.h +++ b/sys/amd64/include/xen/hypercall.h @@ -51,15 +51,8 @@ #define CONFIG_XEN_COMPAT 0x030002 #define __must_check -#ifdef XEN #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\ - "add hypercall_stubs(%%rip),%%rax; " \ - "call *%%rax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/conf/files b/sys/conf/files index 3c20141..e711ddf 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -2512,8 +2512,8 @@ dev/xe/if_xe_pccard.c optional xe pccard dev/xen/balloon/balloon.c optional xen | xenhvm dev/xen/blkfront/blkfront.c optional xen | xenhvm dev/xen/blkback/blkback.c optional xen | xenhvm -dev/xen/console/console.c optional xen -dev/xen/console/xencons_ring.c optional xen +dev/xen/console/console.c optional xen | xenhvm +dev/xen/console/xencons_ring.c optional xen | xenhvm dev/xen/control/control.c optional xen | xenhvm dev/xen/netback/netback.c optional xen | xenhvm dev/xen/netfront/netfront.c optional xen | xenhvm diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 33c4297..d736d84 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -564,5 +564,10 @@ x86/x86/mptable_pci.c optional mptable pci x86/x86/msi.c optional pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/mptable.c optional xenhvm +x86/xen/pvcpu.c optional xenhvm +x86/xen/pv.c optional xenhvm 
+x86/xen/xen_nexus.c optional xenhvm diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 index 696d4e7..10a4da8 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -587,5 +587,7 @@ x86/x86/mptable_pci.c optional apic native pci x86/x86/msi.c optional apic pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/xen_nexus.c optional xen | xenhvm diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c index 23eaee2..33d7cce 100644 --- a/sys/dev/xen/console/console.c +++ b/sys/dev/xen/console/console.c @@ -69,11 +69,14 @@ struct mtx cn_mtx; static char wbuf[WBUF_SIZE]; static char rbuf[RBUF_SIZE]; static int rc, rp; -static unsigned int cnsl_evt_reg; +unsigned int cnsl_evt_reg; static unsigned int wc, wp; /* write_cons, write_prod */ xen_intr_handle_t xen_intr_handle; device_t xencons_dev; +/* Virt address of the shared console page */ +char *console_page; + #ifdef KDB static int xc_altbrk; #endif @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = { static void xc_cnprobe(struct consdev *cp) { + if (!xen_pv_domain()) + return; + cp->cn_pri = CN_REMOTE; sprintf(cp->cn_name, "%s0", driver_name); } @@ -175,7 +181,7 @@ static void xc_cnputc(struct consdev *dev, int c) { - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) xc_cnputc_dom0(dev, c); else xc_cnputc_domu(dev, c); @@ -206,22 +212,12 @@ xcons_putc(int c) xcons_force_flush(); #endif } - if (cnsl_evt_reg) - __xencons_tx_flush(); + __xencons_tx_flush(); /* inform start path that we're pretty full */ return ((wp - wc) >= WBUF_SIZE - 100) ? 
TRUE : FALSE; } -static void -xc_identify(driver_t *driver, device_t parent) -{ - device_t child; - child = BUS_ADD_CHILD(parent, 0, driver_name, 0); - device_set_driver(child, driver); - device_set_desc(child, "Xen Console"); -} - static int xc_probe(device_t dev) { @@ -245,7 +241,7 @@ xc_attach(device_t dev) cnsl_evt_reg = 1; callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL, xencons_priv_interrupt, NULL, INTR_TYPE_TTY, &xen_intr_handle); @@ -309,7 +305,7 @@ __xencons_tx_flush(void) sz = wp - wc; if (sz > (WBUF_SIZE - WBUF_MASK(wc))) sz = WBUF_SIZE - WBUF_MASK(wc); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { HYPERVISOR_console_io(CONSOLEIO_write, sz, &wbuf[WBUF_MASK(wc)]); wc += sz; } else { @@ -405,7 +401,6 @@ xc_timeout(void *v) } static device_method_t xc_methods[] = { - DEVMETHOD(device_identify, xc_identify), DEVMETHOD(device_probe, xc_probe), DEVMETHOD(device_attach, xc_attach), @@ -424,7 +419,7 @@ xcons_force_flush(void) { int sz; - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) return; /* Spin until console data is flushed through to the domain controller. 
*/ diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c index 3701551..3046498 100644 --- a/sys/dev/xen/console/xencons_ring.c +++ b/sys/dev/xen/console/xencons_ring.c @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$"); #define console_evtchn console.domU.evtchn xen_intr_handle_t console_handle; -extern char *console_page; extern struct mtx cn_mtx; extern device_t xencons_dev; +extern int cnsl_evt_reg; static inline struct xencons_interface * xencons_interface(void) @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len) struct xencons_interface *intf; XENCONS_RING_IDX cons, prod; int sent; + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn }; intf = xencons_interface(); cons = intf->out_cons; @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len) wmb(); intf->out_prod = prod; - xen_intr_signal(console_handle); + if (cnsl_evt_reg) + xen_intr_signal(console_handle); + else + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send); + return sent; @@ -125,11 +130,11 @@ xencons_ring_init(void) { int err; - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return 0; err = xen_intr_bind_local_port(xencons_dev, - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL, + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL, INTR_TYPE_MISC | INTR_MPSAFE, &console_handle); if (err) { return err; @@ -145,7 +150,7 @@ void xencons_suspend(void) { - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return; xen_intr_unbind(&console_handle); diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c index a9f8d1b..35c923d 100644 --- a/sys/dev/xen/control/control.c +++ b/sys/dev/xen/control/control.c @@ -317,21 +317,6 @@ xctrl_suspend() EVENTHANDLER_INVOKE(power_resume); } -static void -xen_pv_shutdown_final(void *arg, int howto) -{ - /* - * Inform the hypervisor that shutdown is complete. 
- * This is not necessary in HVM domains since Xen - * emulates ACPI in that mode and FreeBSD's ACPI - * support will request this transition. - */ - if (howto & (RB_HALT | RB_POWEROFF)) - HYPERVISOR_shutdown(SHUTDOWN_poweroff); - else - HYPERVISOR_shutdown(SHUTDOWN_reboot); -} - #else /* HVM mode suspension. */ @@ -447,6 +432,21 @@ xctrl_halt() shutdown_nice(RB_HALT); } +static void +xen_pv_shutdown_final(void *arg, int howto) +{ + /* + * Inform the hypervisor that shutdown is complete. + * This is not necessary in HVM domains since Xen + * emulates ACPI in that mode and FreeBSD's ACPI + * support will request this transition. + */ + if (howto & (RB_HALT | RB_POWEROFF)) + HYPERVISOR_shutdown(SHUTDOWN_poweroff); + else + HYPERVISOR_shutdown(SHUTDOWN_reboot); +} + /*------------------------------ Event Reception -----------------------------*/ static void xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len) @@ -529,10 +529,9 @@ xctrl_attach(device_t dev) xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl; xs_register_watch(&xctrl->xctrl_watch); -#ifndef XENHVM - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, - SHUTDOWN_PRI_LAST); -#endif + if (xen_pv_domain()) + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, + SHUTDOWN_PRI_LAST); return (0); } diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c index 354085b..333f1b0 100644 --- a/sys/dev/xen/timer/timer.c +++ b/sys/dev/xen/timer/timer.c @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$"); #include <machine/_inttypes.h> #include <machine/smp.h> +/* For the declaration of clock_lock */ +#include <isa/rtc.h> + #include "clock_if.h" static devclass_t xentimer_devclass; @@ -95,19 +98,6 @@ struct xentimer_softc { /* Last time; this guarantees a monotonically increasing clock. 
*/ volatile uint64_t xen_timer_last_time = 0; -static void -xentimer_identify(driver_t *driver, device_t parent) -{ - if (!xen_domain()) - return; - - /* Handle all Xen PV timers in one device instance. */ - if (devclass_get_device(xentimer_devclass, 0)) - return; - - BUS_ADD_CHILD(parent, 0, "xen_et", 0); -} - static int xentimer_probe(device_t dev) { @@ -234,18 +224,16 @@ xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src) * it happens to be less than another CPU's previously determined value. */ static uint64_t -xen_fetch_vcpu_time(void) +xen_fetch_vcpu_time(struct vcpu_info *vcpu) { struct vcpu_time_info dst; struct vcpu_time_info *src; uint32_t pre_version; uint64_t now; volatile uint64_t last; - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info); src = &vcpu->time; - critical_enter(); do { pre_version = xen_fetch_vcpu_tinfo(&dst, src); barrier(); @@ -266,16 +254,19 @@ xen_fetch_vcpu_time(void) } } while (!atomic_cmpset_64(&xen_timer_last_time, last, now)); - critical_exit(); - return (now); } static uint32_t xentimer_get_timecount(struct timecounter *tc) { + uint32_t xen_time; + + critical_enter(); + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX; + critical_exit(); - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX); + return xen_time; } /** @@ -305,7 +296,12 @@ xen_fetch_wallclock(struct timespec *ts) static void xen_fetch_uptime(struct timespec *ts) { - uint64_t uptime = xen_fetch_vcpu_time(); + uint64_t uptime; + + critical_enter(); + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); + critical_exit(); + ts->tv_sec = uptime / NSEC_IN_SEC; ts->tv_nsec = uptime % NSEC_IN_SEC; } @@ -354,7 +350,7 @@ xentimer_intr(void *arg) struct xentimer_softc *sc = (struct xentimer_softc *)arg; struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu); - pcpu->last_processed = xen_fetch_vcpu_time(); + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); if (pcpu->timer != 0 && sc->et.et_active) 
sc->et.et_event_cb(&sc->et, sc->et.et_arg); @@ -415,7 +411,9 @@ xentimer_et_start(struct eventtimer *et, do { if (++i == 60) panic("can't schedule timer"); - next_time = xen_fetch_vcpu_time() + first_in_ns; + critical_enter(); + next_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns; + critical_exit(); error = xentimer_vcpu_start_timer(cpu, next_time); } while (error == -ETIME); @@ -573,8 +571,37 @@ xentimer_suspend(device_t dev) return (0); } +/* + * Xen delay early init + */ +void xen_delay_init(void) +{ + /* Init the clock lock */ + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE); +} +/* + * Xen PV DELAY function + * + * When running on PVH mode we don't have an emulated i8254, so + * make use of the Xen time info in order to code a simple DELAY + * function that can be used during early boot. + */ +void xen_delay(int n) +{ + uint64_t end_ns; + uint64_t current; + + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + end_ns += n * NSEC_IN_USEC; + + for (;;) { + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + if (current >= end_ns) + break; + } +} + static device_method_t xentimer_methods[] = { - DEVMETHOD(device_identify, xentimer_identify), DEVMETHOD(device_probe, xentimer_probe), DEVMETHOD(device_attach, xentimer_attach), DEVMETHOD(device_detach, xentimer_detach), diff --git a/sys/dev/xen/xenpci/xenpci.c b/sys/dev/xen/xenpci/xenpci.c index dd2ad92..a19ebcb 100644 --- a/sys/dev/xen/xenpci/xenpci.c +++ b/sys/dev/xen/xenpci/xenpci.c @@ -240,6 +240,7 @@ xenpci_attach(device_t dev) { struct xenpci_softc *scp = device_get_softc(dev); devclass_t dc; + device_t child; int error; /* @@ -270,6 +271,13 @@ xenpci_attach(device_t dev) goto errexit; } + if (BUS_ADD_CHILD(dev, 0, "xenstore", 0) == NULL) + panic("xenpci: unable to add xenstore device"); + child = BUS_ADD_CHILD(nexus, 0, "xen_et", 0); + if (child == NULL) + panic("xenpci: unable to add xen pv timer device"); + 
device_probe_and_attach(child); + return (bus_generic_attach(dev)); errexit: diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s index 68cb430..bd136b1 100644 --- a/sys/i386/i386/locore.s +++ b/sys/i386/i386/locore.s @@ -898,3 +898,12 @@ done_pde: #endif ret + +#ifdef XENHVM +/* Xen Hypercall page */ + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ +#endif diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c index c430316..af12b1d 100644 --- a/sys/i386/i386/machdep.c +++ b/sys/i386/i386/machdep.c @@ -254,6 +254,17 @@ struct mtx icu_lock; struct mem_range_softc mem_range_softc; +#ifndef XEN +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + i8254_delay(n); +} +#endif + static void cpu_startup(dummy) void *dummy; diff --git a/sys/i386/include/clock.h b/sys/i386/include/clock.h index d980ec7..287b2c8 100644 --- a/sys/i386/include/clock.h +++ b/sys/i386/include/clock.h @@ -22,6 +22,12 @@ extern int tsc_is_invariant; extern int tsc_perf_stat; void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. 
diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h index edc13f4..1c15b0f 100644 --- a/sys/i386/include/xen/hypercall.h +++ b/sys/i386/include/xen/hypercall.h @@ -40,15 +40,8 @@ #define CONFIG_XEN_COMPAT 0x030002 -#if defined(XEN) #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov hypercall_stubs,%%eax; " \ - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \ - "call *%%eax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/i386/xen/mp_machdep.c b/sys/i386/xen/mp_machdep.c index c48fcb2..adf7627 100644 --- a/sys/i386/xen/mp_machdep.c +++ b/sys/i386/xen/mp_machdep.c @@ -928,9 +928,9 @@ cpu_initialize_context(unsigned int cpu) smp_trap_init(ctxt.trap_ctxt); ctxt.ldt_ents = 0; - ctxt.gdt_frames[0] = + ctxt.u.pv.gdt_frames[0] = (uint32_t)((uint64_t)vtomach(bootAPgdt) >> PAGE_SHIFT); - ctxt.gdt_ents = 512; + ctxt.u.pv.gdt_ents = 512; #ifdef __i386__ ctxt.user_regs.esp = boot_stack + PAGE_SIZE; @@ -959,7 +959,7 @@ cpu_initialize_context(unsigned int cpu) #endif printf("gdtpfn=%lx pdptpfn=%lx\n", - ctxt.gdt_frames[0], + ctxt.u.pv.gdt_frames[0], ctxt.ctrlreg[3] >> PAGE_SHIFT); PANIC_IF(HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, &ctxt)); diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c index 7049be6..1b1c74d 100644 --- a/sys/i386/xen/xen_machdep.c +++ b/sys/i386/xen/xen_machdep.c @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl), int xendebug_flags; start_info_t *xen_start_info; +start_info_t *HYPERVISOR_start_info; shared_info_t *HYPERVISOR_shared_info; xen_pfn_t *xen_machine_phys = machine_to_phys_mapping; xen_pfn_t *xen_phys_machine; @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo); struct xenstore_domain_interface; extern struct xenstore_domain_interface *xen_store; -char *console_page; +extern char *console_page; void * bootmem_alloc(unsigned int size) @@ -927,6 +928,7 @@ 
initvalues(start_info_t *startinfo) HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments_notify); #endif xen_start_info = startinfo; + HYPERVISOR_start_info = startinfo; xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list; IdlePTD = (pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE); diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c index a12e175..a5aed1c 100644 --- a/sys/x86/isa/clock.c +++ b/sys/x86/isa/clock.c @@ -247,61 +247,13 @@ getit(void) return ((high << 8) | low); } -#ifndef DELAYDEBUG -static u_int -get_tsc(__unused struct timecounter *tc) -{ - - return (rdtsc32()); -} - -static __inline int -delay_tc(int n) -{ - struct timecounter *tc; - timecounter_get_t *func; - uint64_t end, freq, now; - u_int last, mask, u; - - tc = timecounter; - freq = atomic_load_acq_64(&tsc_freq); - if (tsc_is_invariant && freq != 0) { - func = get_tsc; - mask = ~0u; - } else { - if (tc->tc_quality <= 0) - return (0); - func = tc->tc_get_timecount; - mask = tc->tc_counter_mask; - freq = tc->tc_frequency; - } - now = 0; - end = freq * n / 1000000; - if (func == get_tsc) - sched_pin(); - last = func(tc) & mask; - do { - cpu_spinwait(); - u = func(tc) & mask; - if (u < last) - now += mask - last + u + 1; - else - now += u - last; - last = u; - } while (now < end); - if (func == get_tsc) - sched_unpin(); - return (1); -} -#endif - /* * Wait "n" microseconds. * Relies on timer 1 counting down from (i8254_freq / hz) * Note: timer had better have been programmed before this is first used! 
*/ void -DELAY(int n) +i8254_delay(int n) { int delta, prev_tick, tick, ticks_left; #ifdef DELAYDEBUG @@ -317,9 +269,6 @@ DELAY(int n) } if (state == 1) printf("DELAY(%d)...", n); -#else - if (delay_tc(n)) - return; #endif /* * Read the counter first, so that the rest of the setup overhead is diff --git a/sys/x86/isa/isa.c b/sys/x86/isa/isa.c index 1a57137..09d1ab7 100644 --- a/sys/x86/isa/isa.c +++ b/sys/x86/isa/isa.c @@ -241,3 +241,6 @@ isa_release_resource(device_t bus, device_t child, int type, int rid, * On this platform, isa can also attach to the legacy bus. */ DRIVER_MODULE(isa, legacy, isa_driver, isa_devclass, 0, 0); +#ifdef XENHVM +DRIVER_MODULE(isa, nexus, isa_driver, isa_devclass, 0, 0); +#endif diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c new file mode 100644 index 0000000..7ea70b1 --- /dev/null +++ b/sys/x86/x86/delay.c @@ -0,0 +1,95 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * Copyright (c) 2010 Alexander Motin <mav@FreeBSD.org> + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz and Don Ahn. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'''' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91 + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +/* Generic x86 routines to handle delay */ + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/timetc.h> +#include <sys/proc.h> +#include <sys/kernel.h> +#include <sys/sched.h> + +#include <machine/clock.h> +#include <machine/cpu.h> + +static u_int +get_tsc(__unused struct timecounter *tc) +{ + + return (rdtsc32()); +} + +int +delay_tc(int n) +{ + struct timecounter *tc; + timecounter_get_t *func; + uint64_t end, freq, now; + u_int last, mask, u; + + tc = timecounter; + freq = atomic_load_acq_64(&tsc_freq); + if (tsc_is_invariant && freq != 0) { + func = get_tsc; + mask = ~0u; + } else { + if (tc->tc_quality <= 0) + return (0); + func = tc->tc_get_timecount; + mask = tc->tc_counter_mask; + freq = tc->tc_frequency; + } + now = 0; + end = freq * n / 1000000; + if (func == get_tsc) + sched_pin(); + last = func(tc) & mask; + do { + cpu_spinwait(); + u = func(tc) & mask; + if (u < last) + now += mask - last + u + 1; + else + now += u - last; + last = u; + } while (now < end); + if (func == get_tsc) + sched_unpin(); + return (1); +} diff --git a/sys/x86/x86/local_apic.c 
b/sys/x86/x86/local_apic.c
index 8c8eef6..d8d7701 100644
--- a/sys/x86/x86/local_apic.c
+++ b/sys/x86/x86/local_apic.c
@@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused)
 	if (retval != 0)
 		printf("%s: Failed to setup I/O APICs: returned %d\n",
 		    best_enum->apic_name, retval);
-#ifdef XEN
-	return;
+
+#if defined(XEN) || defined(XENHVM)
+	/* There's no lapic on PV Xen */
+	if (xen_pv_domain())
+		return;
 #endif
+
 	/*
 	 * Finish setting up the local APIC on the BSP once we know how to
 	 * properly program the LINT pins.
diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c
index 72811dc..dc8d9a2 100644
--- a/sys/x86/xen/hvm.c
+++ b/sys/x86/xen/hvm.c
@@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$");
 #include <sys/proc.h>
 #include <sys/smp.h>
 #include <sys/systm.h>
+#include <sys/lock.h>
+#include <sys/mutex.h>
+#include <sys/reboot.h>
 
 #include <vm/vm.h>
 #include <vm/pmap.h>
+#include <vm/vm_kern.h>
+#include <vm/vm_extern.h>
 
 #include <dev/pci/pcivar.h>
 
 #include <machine/cpufunc.h>
 #include <machine/cpu.h>
 #include <machine/smp.h>
+#include <machine/stdarg.h>
 
 #include <x86/apicreg.h>
 
@@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$");
 #include <xen/gnttab.h>
 #include <xen/hypervisor.h>
 #include <xen/hvm.h>
+#ifdef __amd64__
+#include <xen/pv.h>
+#endif
 #include <xen/xen_intr.h>
 
 #include <xen/interface/hvm/params.h>
@@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void);
 /* Variables used by mp_machdep to perform the bitmap IPI */
 extern volatile u_int cpu_ipi_pending[MAXCPU];
 
+#ifdef __amd64__
+/* Native AP start used on PVHVM */
+extern int native_start_all_aps(void);
+#endif
+
 /*---------------------------------- Macros ----------------------------------*/
 #define	IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS)
 
@@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE;
 struct cpu_ops xen_hvm_cpu_ops = {
 	.ipi_vectored	= lapic_ipi_vectored,
 	.cpu_init	= xen_hvm_cpu_init,
-	.cpu_resume	= xen_hvm_cpu_resume
+	.cpu_resume	= xen_hvm_cpu_resume,
+#ifdef __amd64__
+	.start_all_aps =
native_start_all_aps, +#endif }; static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support"); @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]); /*------------------ Hypervisor Access Shared Memory Regions -----------------*/ /** Hypercall table accessed via HYPERVISOR_*_op() methods. */ -char *hypercall_stubs; +extern char *hypercall_page; shared_info_t *HYPERVISOR_shared_info; +start_info_t *HYPERVISOR_start_info; #ifdef SMP /*---------------------------- XEN PV IPI Handlers ---------------------------*/ @@ -522,7 +540,7 @@ xen_setup_cpus(void) { int i; - if (!xen_hvm_domain() || !xen_vector_callback_enabled) + if (!xen_vector_callback_enabled) return; #ifdef __amd64__ @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void) * Allocate and fill in the hypcall page. */ static int -xen_hvm_init_hypercall_stubs(void) +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type) { uint32_t base, regs[4]; int i; @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void) if (base == 0) return (ENXIO); - if (hypercall_stubs == NULL) { + if (init_type == XEN_HVM_INIT_COLD) { do_cpuid(base + 1, regs); printf("XEN: Hypervisor version %d.%d detected.\n", regs[0] >> 16, regs[0] & 0xffff); @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void) * Find the hypercall pages. 
*/ do_cpuid(base + 2, regs); - - if (hypercall_stubs == NULL) { - size_t call_region_size; - - call_region_size = regs[0] * PAGE_SIZE; - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT); - if (hypercall_stubs == NULL) - panic("Unable to allocate Xen hypercall region"); - } for (i = 0; i < regs[0]; i++) - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i); + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i); return (0); } @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void) if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC) return; - if (bootverbose) - printf("XEN: Disabling emulated block and network devices\n"); outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS); } @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND) return; - error = xen_hvm_init_hypercall_stubs(); + if (xen_pv_domain()) { + /* hypercall page is already set in the PV case */ + error = 0; + } else { + error = xen_hvm_init_hypercall_stubs(init_type); + } switch (init_type) { case XEN_HVM_INIT_COLD: @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) setup_xen_features(); cpu_ops = xen_hvm_cpu_ops; vm_guest = VM_GUEST_XEN; +#ifdef __amd64__ + if (xen_pv_domain()) + cpu_ops.start_all_aps = xen_pv_start_all_aps; + else +#endif + printf("XEN: Disabling emulated block and network devices\n"); break; case XEN_HVM_INIT_RESUME: if (error != 0) @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type) } xen_vector_callback_enabled = 0; - xen_domain_type = XEN_HVM_DOMAIN; - xen_hvm_init_shared_info_page(); xen_hvm_set_callback(NULL); - xen_hvm_disable_emulated_devices(); + + if (!xen_pv_domain()) { + xen_domain_type = XEN_HVM_DOMAIN; + xen_hvm_init_shared_info_page(); + xen_hvm_disable_emulated_devices(); + } } void @@ -749,10 +770,14 @@ xen_set_vcpu_id(void) struct pcpu *pc; int i; - /* Set vcpu_id to acpi_id */ + if (!xen_domain()) + return; + + /* Set vcpu_id to acpi_id for 
PVHVM guests */ CPU_FOREACH(i) { pc = pcpu_find(i); - pc->pc_vcpu_id = pc->pc_acpi_id; + if (xen_hvm_domain()) + pc->pc_vcpu_id = pc->pc_acpi_id; if (bootverbose) printf("XEN: CPU %u has VCPU ID %u\n", i, pc->pc_vcpu_id); @@ -790,9 +815,34 @@ xen_hvm_cpu_init(void) DPCPU_SET(vcpu_info, vcpu_info); } +/*----------------------------- Debug functions ------------------------------*/ +#define PRINTK_BUFSIZE 1024 +static int +vprintk(const char *fmt, __va_list ap) +{ + int retval, len; + static char buf[PRINTK_BUFSIZE]; + + retval = vsnprintf(buf, PRINTK_BUFSIZE - 1, fmt, ap); + buf[retval] = 0; + len = strlen(buf); + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf); + return retval; +} + +void +xen_early_printf(const char *fmt, ...) +{ + __va_list ap; + + va_start(ap, fmt); + vprintk(fmt, ap); + va_end(ap); +} + SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL); #ifdef SMP -SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL); +SYSINIT(xen_setup_cpus, SI_SUB_SMP-1, SI_ORDER_ANY, xen_setup_cpus, NULL); #endif SYSINIT(xen_hvm_cpu_init, SI_SUB_INTR, SI_ORDER_FIRST, xen_hvm_cpu_init, NULL); SYSINIT(xen_set_vcpu_id, SI_SUB_CPU, SI_ORDER_ANY, xen_set_vcpu_id, NULL); diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c new file mode 100644 index 0000000..8916314 --- /dev/null +++ b/sys/x86/xen/mptable.c @@ -0,0 +1,136 @@ +/*- + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of any co-contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'''' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/smp.h> +#include <sys/pcpu.h> +#include <vm/vm.h> +#include <vm/pmap.h> + +#include <machine/intr_machdep.h> +#include <machine/apicvar.h> + +#include <machine/cpu.h> +#include <machine/smp.h> + +#include <xen/xen-os.h> +#include <xen/hypervisor.h> + +#include <xen/interface/vcpu.h> + +static int xenpv_probe(void); +static int xenpv_probe_cpus(void); +static int xenpv_setup_local(void); +static int xenpv_setup_io(void); + +static struct apic_enumerator xenpv_enumerator = { + "Xen PV", + xenpv_probe, + xenpv_probe_cpus, + xenpv_setup_local, + xenpv_setup_io +}; + +/* + * Look for an ACPI Multiple APIC Description Table ("APIC") + */ +static int +xenpv_probe(void) +{ + return (-100); +} + +/* + * Run through the MP table enumerating CPUs. + */ +static int +xenpv_probe_cpus(void) +{ + int i, ret; + + for (i = 0; i < MAXCPU; i++) { + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL); + if (ret >= 0) + cpu_add((i * 2), (i == 0)); + } + + return (0); +} + +/* + * Initialize the local APIC on the BSP. + */ +static int +xenpv_setup_local(void) +{ + PCPU_SET(vcpu_id, 0); + return (0); +} + +/* + * Enumerate I/O APICs and setup interrupt sources. + */ +static int +xenpv_setup_io(void) +{ + return (0); +} + +static void +xenpv_register(void *dummy __unused) +{ + if (xen_pv_domain()) { + apic_register_enumerator(&xenpv_enumerator); + } +} +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL); + +/* + * Setup per-CPU ACPI IDs. 
+ */ +static void +xenpv_set_ids(void *dummy) +{ + struct pcpu *pc; + int i; + + CPU_FOREACH(i) { + pc = pcpu_find(i); + pc->pc_vcpu_id = i; + } + return; +} +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL); diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c new file mode 100644 index 0000000..6756dec --- /dev/null +++ b/sys/x86/xen/pv.c @@ -0,0 +1,247 @@ +/* + * Copyright (c) 2004 Christian Limpach. + * Copyright (c) 2004-2006,2008 Kip Macy + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'''' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/malloc.h> +#include <sys/proc.h> +#include <sys/smp.h> +#include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> + +#include <vm/vm.h> +#include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> + +#include <dev/pci/pcivar.h> + +#include <machine/cpufunc.h> +#include <machine/cpu.h> +#include <machine/smp.h> +#include <machine/tss.h> +#include <machine/sysarch.h> +#include <machine/clock.h> + +#include <x86/apicreg.h> + +#include <xen/xen-os.h> +#include <xen/features.h> +#include <xen/gnttab.h> +#include <xen/hypervisor.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#include <xen/xen_intr.h> + +#include <xen/interface/hvm/params.h> +#include <xen/interface/vcpu.h> + +#define MAX_E820_ENTRIES 128 + +/*--------------------------- Forward Declarations ---------------------------*/ +static caddr_t xen_pv_parse_preload_data(u_int64_t); +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/*---------------------------- Extern Declarations ---------------------------*/ +/* Variables used by amd64 mp_machdep to start APs */ +extern struct mtx ap_boot_mtx; +extern void *bootstacks[]; +extern char *doublefault_stack; +extern char *nmi_stack; +extern void *dpcpu; +extern int bootAP; +extern char *bootSTK; +extern bool lapic_disabled; + +/*-------------------------------- Global Data -------------------------------*/ +/* Xen init_ops implementation. 
*/ +struct init_ops xen_init_ops = { + .parse_preload_data = xen_pv_parse_preload_data, + .early_delay_init = xen_delay_init, + .early_delay = xen_delay, + .fetch_e820_map = xen_pv_fetch_e820_map, +}; + +static struct +{ + const char *ev; + int mask; +} howto_names[] = { + {"boot_askname", RB_ASKNAME}, + {"boot_single", RB_SINGLE}, + {"boot_nosync", RB_NOSYNC}, + {"boot_halt", RB_ASKNAME}, + {"boot_serial", RB_SERIAL}, + {"boot_cdrom", RB_CDROM}, + {"boot_gdb", RB_GDB}, + {"boot_gdb_pause", RB_RESERVED1}, + {"boot_verbose", RB_VERBOSE}, + {"boot_multicons", RB_MULTIPLE}, + {NULL, 0} +}; + +static struct bios_smap xen_smap[MAX_E820_ENTRIES]; + +static int +start_xen_ap(int cpu) +{ + struct vcpu_guest_context *ctxt; + int ms, cpus = mp_naps; + + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO); + if (ctxt == NULL) + panic("unable to allocate memory"); + + ctxt->flags = VGCF_IN_KERNEL; + ctxt->user_regs.rip = (unsigned long) init_secondary; + ctxt->user_regs.rsp = (unsigned long) bootSTK; + + /* Set the CPU to use the same page tables and CR4 value */ + ctxt->ctrlreg[3] = KPML4phys; + ctxt->ctrlreg[4] = rcr4(); + + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt)) + panic("unable to initialize CPU#%d\n", cpu); + + free(ctxt, M_TEMP); + + /* Launch the vCPU */ + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) + panic("unable to start AP#%d\n", cpu); + + /* Wait up to 5 seconds for it to start. 
*/ + for (ms = 0; ms < 5000; ms++) { + if (mp_naps > cpus) + return 1; /* return SUCCESS */ + DELAY(1000); + } + + return 0; +} + +int +xen_pv_start_all_aps(void) +{ + int cpu; + + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN); + lapic_disabled = true; + + for (cpu = 1; cpu < mp_ncpus; cpu++) { + + /* allocate and set up an idle stack data page */ + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena, + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO); + doublefault_stack = (char *)kmem_malloc(kernel_arena, + PAGE_SIZE, M_WAITOK | M_ZERO); + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE, + M_WAITOK | M_ZERO); + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE, + M_WAITOK | M_ZERO); + + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8; + bootAP = cpu; + + /* attempt to start the Application Processor */ + if (!start_xen_ap(cpu)) + panic("AP #%d failed to start!", cpu); + + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */ + } + + return mp_naps; +} + +/* + * Functions to convert the "extra" parameters passed by Xen + * into FreeBSD boot options (from the i386 Xen port). 
+ */ +static char * +xen_setbootenv(char *cmd_line) +{ + char *cmd_line_next; + + /* Skip leading spaces */ + for (; *cmd_line == '' ''; cmd_line++); + + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;); + return (cmd_line); +} + +static int +xen_boothowto(char *envp) +{ + int i, howto = 0; + + /* get equivalents from the environment */ + for (i = 0; howto_names[i].ev != NULL; i++) + if (getenv(howto_names[i].ev) != NULL) + howto |= howto_names[i].mask; + return (howto); +} + +static caddr_t +xen_pv_parse_preload_data(u_int64_t modulep) +{ + /* Parse the extra boot information given by Xen */ + if (HYPERVISOR_start_info->cmd_line) + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line); + boothowto |= xen_boothowto(kern_envp); + + return (NULL); +} + +static void +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + struct xen_memory_map memmap; + int rc; + + /* Fetch the E820 map from Xen */ + memmap.nr_entries = MAX_E820_ENTRIES; + set_xen_guest_handle(memmap.buffer, xen_smap); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); + if (rc) + panic("unable to fetch Xen E820 memory map"); + + *smap = xen_smap; + *size = memmap.nr_entries * sizeof(xen_smap[0]); +} + +void +xen_pv_set_init_ops(void) +{ + /* Init ops for Xen PV */ + init_ops = xen_init_ops; +} diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c new file mode 100644 index 0000000..35d88148 --- /dev/null +++ b/sys/x86/xen/pvcpu.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'''' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/module.h> +#include <sys/pcpu.h> +#include <sys/smp.h> + +#include <xen/xen-os.h> + +static int +xenpvcpu_probe(device_t dev) +{ + if (!xen_pv_domain()) + return (ENXIO); + + device_set_desc(dev, "Xen PV CPU"); + return (0); +} + +static int +xenpvcpu_attach(device_t dev) +{ + struct pcpu *pc; + int cpu; + + cpu = device_get_unit(dev); + pc = pcpu_find(cpu); + pc->pc_device = dev; + return (0); +} + +static device_method_t xenpvcpu_methods[] = { + DEVMETHOD(device_probe, xenpvcpu_probe), + DEVMETHOD(device_attach, xenpvcpu_attach), + DEVMETHOD_END +}; + +static driver_t xenpvcpu_driver = { + "pvcpu", + xenpvcpu_methods, + 0, +}; + +devclass_t xenpvcpu_devclass; + +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0); +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1); diff --git a/sys/x86/xen/xen_nexus.c 
b/sys/x86/xen/xen_nexus.c new file mode 100644 index 0000000..288e6b6 --- /dev/null +++ b/sys/x86/xen/xen_nexus.c @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS AS IS'''' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/bus.h>
+#include <sys/kernel.h>
+#include <sys/module.h>
+#include <sys/sysctl.h>
+#include <sys/systm.h>
+#include <sys/smp.h>
+
+#include <machine/nexusvar.h>
+
+#include <xen/xen-os.h>
+
+static const char *xen_devices[] =
+{
+	"xenstore",		/* XenStore bus */
+	"xen_et",		/* Xen PV timer (provides: tc, et, clk) */
+	"xc",			/* Xen PV console */
+	"isa",			/* Dummy ISA bus for sc to attach */
+};
+
+/*
+ * Xen nexus(4) driver.
+ */
+static int
+nexus_xen_probe(device_t dev)
+{
+	if (!xen_pv_domain())
+		return (ENXIO);
+
+	return (BUS_PROBE_DEFAULT);
+}
+
+static int
+nexus_xen_attach(device_t dev)
+{
+	int i, error = 0;
+
+	nexus_init_resources();
+	bus_generic_probe(dev);
+
+	/*
+	 * Since we have no ACPI, we need to create a dummy CPU device
+	 * in order to set pcpu->pc_device.
+	 */
+	CPU_FOREACH(i)
+		if (BUS_ADD_CHILD(dev, 0, "pvcpu", i) == NULL)
+			panic("unable to add pvcpu#%d device", i);
+
+	for (i = 0; i < nitems(xen_devices); i++) {
+		if (BUS_ADD_CHILD(dev, 0, xen_devices[i], 0) == NULL)
+			panic("%s: could not add", xen_devices[i]);
+	}
+
+	bus_generic_attach(dev);
+
+	return (error);
+}
+
+static device_method_t nexus_xen_methods[] = {
+	/* Device interface */
+	DEVMETHOD(device_probe,		nexus_xen_probe),
+	DEVMETHOD(device_attach,	nexus_xen_attach),
+
+	{ 0, 0 }
+};
+
+DEFINE_CLASS_1(nexus, nexus_xen_driver, nexus_xen_methods, 1, nexus_driver);
+static devclass_t nexus_devclass;
+
+DRIVER_MODULE(nexus_xen, root, nexus_xen_driver, nexus_devclass, 0, 0);
diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c
index 03c32b7..909378a 100644
--- a/sys/xen/gnttab.c
+++ b/sys/xen/gnttab.c
@@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/mman.h>
+#include <sys/limits.h>
 
 #include <xen/xen-os.h>
 #include <xen/hypervisor.h>
@@ -607,6 +608,7 @@ gnttab_resume(void)
 {
 	int error;
 	unsigned int max_nr_gframes, nr_gframes;
+	void *alloc_mem;
nr_gframes = nr_grant_frames; max_nr_gframes = max_nr_grant_frames(); @@ -614,11 +616,20 @@ gnttab_resume(void) return (ENOSYS); if (!resume_frames) { - error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, - &resume_frames); - if (error) { - printf("error mapping gnttab share frames\n"); - return (error); + if (xen_pv_domain()) { + alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE, + M_DEVBUF, M_NOWAIT, 0, + ULONG_MAX, PAGE_SIZE, 0); + KASSERT((alloc_mem != NULL), + ("unable to alloc memory for gnttab")); + resume_frames = vtophys(alloc_mem); + } else { + error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, + &resume_frames); + if (error) { + printf("error mapping gnttab share frames\n"); + return (error); + } } } diff --git a/sys/xen/interface/arch-x86/xen.h b/sys/xen/interface/arch-x86/xen.h index 1c186d7..6cc15d3 100644 --- a/sys/xen/interface/arch-x86/xen.h +++ b/sys/xen/interface/arch-x86/xen.h @@ -147,7 +147,16 @@ struct vcpu_guest_context { struct cpu_user_regs user_regs; /* User-level CPU registers */ struct trap_info trap_ctxt[256]; /* Virtual IDT */ unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */ - unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */ + union { + struct { + /* PV: GDT (machine frames, # ents).*/ + unsigned long gdt_frames[16], gdt_ents; + } pv; + struct { + /* PVH: GDTR addr and size */ + unsigned long gdtaddr, gdtsz; + } pvh; + } u; unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. 
*/ unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */ diff --git a/sys/xen/pv.h b/sys/xen/pv.h new file mode 100644 index 0000000..bbb1048 --- /dev/null +++ b/sys/xen/pv.h @@ -0,0 +1,29 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * $FreeBSD$ + */ + +#ifndef __XEN_PV_H__ +#define __XEN_PV_H__ + +int xen_pv_start_all_aps(void); +void xen_pv_set_init_ops(void); + +#endif /* __XEN_PV_H__ */ \ No newline at end of file diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h index 87644e9..70e4719 100644 --- a/sys/xen/xen-os.h +++ b/sys/xen/xen-os.h @@ -51,6 +51,11 @@ void force_evtchn_callback(void); extern shared_info_t *HYPERVISOR_shared_info; +extern start_info_t *HYPERVISOR_start_info; + +/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */ +extern struct xenstore_domain_interface *xen_store; +extern char *console_page; enum xen_domain_type { XEN_NATIVE, /* running on bare hardware */ @@ -78,6 +83,9 @@ xen_hvm_domain(void) return (xen_domain_type == XEN_HVM_DOMAIN); } +/* Debug function, prints directly to hypervisor console */ +void xen_early_printf(const char *, ...); + #ifndef xen_mb #define xen_mb() mb() #endif diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c index d404862..a4ef369 100644 --- a/sys/xen/xenstore/xenstore.c +++ b/sys/xen/xenstore/xenstore.c @@ -1079,12 +1079,6 @@ xs_init_comms(void) } /*------------------ Private Device Attachment Functions --------------------*/ -static void -xs_identify(driver_t *driver, device_t parent) -{ - - BUS_ADD_CHILD(parent, 0, "xenstore", 0); -} /** * Probe for the existance of the XenStore. 
@@ -1148,11 +1142,17 @@ xs_attach(device_t dev)
 	struct proc *p;
 
 #ifdef XENHVM
-	xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
-	xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
-	xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
+	if (xen_hvm_domain()) {
+		xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+		xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
+		xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
+	} else if (xen_pv_domain()) {
+		xs.evtchn = HYPERVISOR_start_info->store_evtchn;
+	} else {
+		panic("Unknown domain type, cannot initialize xenstore\n");
+	}
 #else
-	xs.evtchn = xen_start_info->store_evtchn;
+	xs.evtchn = HYPERVISOR_start_info->store_evtchn;
 #endif
 
 	TAILQ_INIT(&xs.reply_list);
@@ -1240,7 +1240,6 @@ xs_resume(device_t dev __unused)
 /*-------------------- Private Device Attachment Data -----------------------*/
 static device_method_t xenstore_methods[] = {
 	/* Device interface */
-	DEVMETHOD(device_identify, xs_identify),
 	DEVMETHOD(device_probe, xs_probe),
 	DEVMETHOD(device_attach, xs_attach),
 	DEVMETHOD(device_detach, bus_generic_detach),
@@ -1263,9 +1262,8 @@ static devclass_t xenstore_devclass;
 
 #ifdef XENHVM
 DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0);
-#else
-DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
 #endif
+DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
 
 /*------------------------------- Sysctl Data --------------------------------*/
 /* XXX Shouldn't the node be somewhere else? */
-- 
1.7.7.5 (Apple Git-26)

--------------010605090609060304010908
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--------------010605090609060304010908--
On 07/11/13 19:10, Roger Pau Monné wrote:
> On 28/10/13 14:35, Roger Pau Monné wrote:
>> Hello,
>>
>> The Xen community is working on a new virtualization mode (or maybe I
>> should say an extension of HVM) to be able to run PV guests inside HVM
>> containers without requiring a device-model (Qemu). One of the
>> advantages of this new virtualization mode is that now it is much more
>> easier to port guests to run under it (as compared to pure PV guests).
>>
>> Given that FreeBSD already supports PVHVM, adding PVH support is quite
>> easy, we only need some glue for the PV entry point and then support
>> for diverging some early init functions (like fetching the e820 map or
>> starting the APs).
>>
>> The attached patch contains all this changes, and allows a SMP FreeBSD
>> guest to fully boot (and AFAIK work) under this new PVH mode. The patch
>> can also be found on my git repo:
>>
>> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2
>>
>> The patch touches quite a lot of the early init, so I've Cced the
>> persons that maintain those areas, so they can review it.
>>
>> In order to test it, and since the PVH changes are not yet merged into
>> upstream Xen, the use of a patched Xen is necessary. I've collected the
>> patches for PVH guest support from George Dunlap (v13) and fixed some
>> bugs on top of them, the tree can be found at:
>>
>> git://xenbits.xen.org/people/royger/xen.git fix_pvh
>
> I've updated the patch (as suggested by John Baldwin) and added a Xen
> Nexus, that attaches all the Xen top-level devices, this gets rid of the
> legacy bus.
>
> The new patch can be found at:
>
> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2

The correct branch is pvh_v3, not pvh_v2:

http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=shortlog;h=refs/heads/pvh_v3
On 09/11/13 13:04, Roger Pau Monné wrote:
> On 07/11/13 19:10, Roger Pau Monné wrote:
>> On 28/10/13 14:35, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> The Xen community is working on a new virtualization mode (or maybe I
>>> should say an extension of HVM) to be able to run PV guests inside HVM
>>> containers without requiring a device-model (Qemu). One of the
>>> advantages of this new virtualization mode is that now it is much more
>>> easier to port guests to run under it (as compared to pure PV guests).
>>>
>>> Given that FreeBSD already supports PVHVM, adding PVH support is quite
>>> easy, we only need some glue for the PV entry point and then support
>>> for diverging some early init functions (like fetching the e820 map or
>>> starting the APs).
>>>
>>> The attached patch contains all this changes, and allows a SMP FreeBSD
>>> guest to fully boot (and AFAIK work) under this new PVH mode. The patch
>>> can also be found on my git repo:
>>>
>>> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2
>>>
>>> The patch touches quite a lot of the early init, so I've Cced the
>>> persons that maintain those areas, so they can review it.
>>>
>>> In order to test it, and since the PVH changes are not yet merged into
>>> upstream Xen, the use of a patched Xen is necessary. I've collected the
>>> patches for PVH guest support from George Dunlap (v13) and fixed some
>>> bugs on top of them, the tree can be found at:
>>>
>>> git://xenbits.xen.org/people/royger/xen.git fix_pvh

PVH DomU support has been committed to upstream Xen, and I've updated
the patch to match the interface. The main change is that Xen no longer
sets cr4 from ctrlreg[4], so the AP is launched without the PSE flag
set and we have to set it ourselves in init_secondary.

The patch can be found here:

http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=commit;h=8db6aa8cbc5b7a2a88f4e4fb51f99a166c128cec

It is also attached to this email.

Thanks for the review, Roger.
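As a rough illustration of the CR4 change described above, here is a standalone sketch that uses a plain integer in place of the real register. It is not code from the patch: the helper name ap_cr4_value is hypothetical, and the two constants are assumed to match the architectural CR4 bit positions (PSE is bit 4, PGE is bit 7).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the architectural CR4 bit positions. */
#define CR4_PSE 0x00000010u /* page size extensions */
#define CR4_PGE 0x00000080u /* global pages */

/*
 * Hypothetical helper: given the CR4 value an AP starts with, return
 * the value it should load. OR-ing the flags in is idempotent, so a
 * bare-metal AP that already has PSE set ends up with the same value.
 */
static uint64_t
ap_cr4_value(uint64_t cr4)
{
	uint64_t updated = cr4 | CR4_PGE | CR4_PSE;

	/* Both flags must be present regardless of the starting value. */
	assert((updated & (CR4_PGE | CR4_PSE)) == (CR4_PGE | CR4_PSE));
	return (updated);
}
```

Because the operation only sets bits, applying it on a path where PSE is already enabled changes nothing, which is why a single unconditional statement can serve both the Xen and the bare-metal case.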
--------------040406040602030408060208
Content-Type: text/plain; charset="UTF-8"; x-mac-type=0; x-mac-creator=0;
 name="0001-Xen-x86-DomU-PVH-support.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment;
 filename="0001-Xen-x86-DomU-PVH-support.patch"

From c45d62c5c78aca948f652397cb70dd5720f46583 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 7 Nov 2013 17:07:50 +0100
Subject: [PATCH] Xen x86 DomU PVH support

PVH mode is basically a PV guest inside an HVM container, and shares
a great amount of code with PVHVM. The main difference is the way the
guest is started, PVH uses the PV start sequence, jumping directly
into the kernel entry point in long mode and with page tables set.
The main work of this patch consists in setting the environment as
similar as possible to what native FreeBSD expects, and then adding
hooks to the PV ops when necessary.

sys/amd64/amd64/locore.S:
 * Add PV entry point, hypervisor_page and the necessary elfnotes.

sys/amd64/amd64/machdep.c:
 * Add hooks to replace bare metal operations that should use a PV
   helper, this includes:
    - Preload metadata
    - i8254_init and i8254_delay
    - Fetching the e820 memory map
    - Reserve of the MP bootstrap region
 * Create a DELAY function that uses the PV hooks.
 * Introduce a new hammer_time_xen that sets the necessary stuff when
   running in PVH mode.

sys/amd64/amd64/mp_machdep.c:
 * Introduce a hook to replace start_all_aps.
 * Introduce a lapic_disabled variable to prevent polluting the code
   with xen specific gates.

sys/amd64/include/asmacros.h:
 * Copy the ELFNOTE macro from the i386 Xen PV port.

sys/amd64/include/clock.h:
sys/i386/include/clock.h:
 * Prototypes for the xen early delay initialization and usage.

sys/amd64/include/cpu.h:
 * Introduce a new cpu hook to init APs.

sys/amd64/include/sysarch.h:
 * Declare the init_ops structure.

sys/amd64/include/xen/hypercall.h:
sys/i386/include/xen/hypercall.h:
 * Switch to the PV style hypercall mechanism for HVM also.

sys/conf/files:
 * Make the PV console available on XENHVM also.

sys/conf/files.amd64:
 * Include the new files for the PVH port.

sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
 * Remove the identify method and instead add the device from
   nexus_xen.
 * Use HYPERVISOR_start_info instead of xen_start_info.
 * Use HYPERVISOR_event_channel_op to kick the event channel before
   xen interrupts are setup.

sys/dev/xen/control/control.c:
 * Use the PV shutdown on PVH.

sys/dev/xen/timer/timer.c:
 * Pass a vcpu_info to xen_fetch_vcpu_time, this allows using this
   function at very early init, before per-cpu vcpu_info is set.
 * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can be
   used at early boot, instead place them on the callers.
 * Introduce two new functions, xen_delay_init and xen_delay that can
   be used at early boot to implement the generic DELAY function.
 * Remove the identify method that used to add the device, now it is
   manually added from either xenpci (HVM) or nexus_xen (PV).

sys/i386/i386/locore.s:
 * Reserve space for the hypercall page.

sys/i386/i386/machdep.c:
 * Create a generic DELAY function.

sys/i386/xen/xen_machdep.c:
 * Set HYPERVISOR_start_info.

sys/x86/isa/clock.c:
 * Rename the generic DELAY function to i8254_delay.

sys/x86/x86/delay.c:
 * Put generic delay helpers here, get_tsc and delay_tc.

sys/x86/x86/local_apic.c:
 * Prevent the local apic from attaching when running on PVH mode.

sys/x86/xen/hvm.c:
 * Set the start_all_aps hook.
 * Fix the setting of the hypercall page now that we are using the
   same mechanism as the PV port.
 * Initialize Xen CPU hooks for the PVH port.
 * Introduce the xen_early_printf debug function, which prints
   directly to the hypervisor console.
 * Initialize APs before SI_SUB_SMP (SI_SUB_SMP-1).

sys/x86/xen/mptable.c:
 * Create a dummy PV CPU enumerator for the PVH port.

sys/x86/xen/pv.c:
 * Implement the PV functions for the early boot hooks,
   parse_preload_data and fetch_e820_map.
 * Implement the PV function for the start_all_aps hook.

sys/x86/xen/pvcpu.c:
 * Dummy Xen PV CPU device, that we use to set the per-cpu pc_device.

sys/xen/gnttab.c:
 * Allocate resume_frames for the PVH port.

sys/xen/pv.h:
 * Header that exports the specific PV functions.

sys/xen/xen-os.h:
 * Declare prototypes for the newly added functions.

sys/xen/xenstore/xenstore.c:
 * Make the xenstore driver hang from both xenpci and the nexus when
   running XENHVM, this is because we don't have a xenpci device on
   the PVH port.
 * Remove the identify routine that added the device, instead add it
   from either xenpci (HVM) or nexus_xen (PV).

sys/dev/xen/xenpci/xenpci.c:
 * Add the xenstore and xen_et devices on successful attach.

sys/i386/xen/mp_machdep.c:
 * Modify cpu_initialize_context to match the changes in the Xen
   interface.

sys/x86/xen/xen_nexus.c:
 * Create a specific nexus for Xen PV guests that takes care of adding
   the top level Xen PV devices.
---
 sys/amd64/amd64/locore.S           |   53 ++++++++
 sys/amd64/amd64/machdep.c          |  179 ++++++++++++++++++++++----
 sys/amd64/amd64/mp_machdep.c       |   33 +++--
 sys/amd64/include/asmacros.h       |   26 ++++
 sys/amd64/include/clock.h          |    6 +
 sys/amd64/include/cpu.h            |    1 +
 sys/amd64/include/sysarch.h        |   19 +++
 sys/amd64/include/xen/hypercall.h  |    7 -
 sys/conf/files                     |    4 +-
 sys/conf/files.amd64               |    5 +
 sys/conf/files.i386                |    2 +
 sys/dev/xen/console/console.c      |   29 ++---
 sys/dev/xen/console/xencons_ring.c |   15 ++-
 sys/dev/xen/control/control.c      |   37 +++---
 sys/dev/xen/timer/timer.c          |   73 ++++++++----
 sys/dev/xen/xenpci/xenpci.c        |    8 ++
 sys/i386/i386/locore.s             |    9 ++
 sys/i386/i386/machdep.c            |   11 ++
 sys/i386/include/clock.h           |    6 +
 sys/i386/include/xen/hypercall.h   |    7 -
 sys/i386/xen/mp_machdep.c          |    6 +-
 sys/i386/xen/xen_machdep.c         |    4 +-
 sys/x86/isa/clock.c                |   53 +--------
 sys/x86/isa/isa.c                  |    3 +
 sys/x86/x86/delay.c                |   95 ++++++++++++++
 sys/x86/x86/local_apic.c           |    8 +-
 sys/x86/xen/hvm.c                  |   98 +++++++++++----
 sys/x86/xen/mptable.c              |  136 ++++++++++++++++++++
 sys/x86/xen/pv.c                   |  246 ++++++++++++++++++++++++++++++++++++
 sys/x86/xen/pvcpu.c                |   77 +++++++++++
 sys/x86/xen/xen_nexus.c            |   99 +++++++++++++++
 sys/xen/gnttab.c                   |   21 +++-
 sys/xen/pv.h                       |   29 +++++
 sys/xen/xen-os.h                   |    8 ++
 sys/xen/xenstore/xenstore.c        |   24 ++--
 35 files changed, 1219 insertions(+), 218 deletions(-)
 create mode 100644 sys/x86/x86/delay.c
 create mode 100644 sys/x86/xen/mptable.c
 create mode 100644 sys/x86/xen/pv.c
 create mode 100644 sys/x86/xen/pvcpu.c
 create mode 100644 sys/x86/xen/xen_nexus.c
 create mode 100644 sys/xen/pv.h

diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S
index 55cda3a..e04cc48 100644
--- a/sys/amd64/amd64/locore.S
+++ b/sys/amd64/amd64/locore.S
@@ -31,6 +31,12 @@
 #include <machine/pmap.h>
 #include <machine/specialreg.h>
 
+#ifdef XENHVM
+#include <xen/xen-os.h>
+#define __ASSEMBLY__
+#include <xen/interface/elfnote.h>
+#endif
+
 #include "assym.s"
 
 /*
@@ -86,3 +92,50 @@ NON_GPROF_ENTRY(btext)
 	ALIGN_DATA			/* just to be sure */
 	.space	0x1000			/* space for bootstack - temporary stack */
 bootstack:
+
+#ifdef XENHVM
+/* Xen */
+.section __xen_guest
+	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD")
+	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD")
+	ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
+	ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE)
+	ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */
+	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start)
+	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page)
+	ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START)
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
+	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
+	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V)
+	ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
+	ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0)
+	ELFNOTE(Xen,
XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes") + + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ + +NON_GPROF_ENTRY(xen_start) + /* Don''t trust what the loader gives for rflags. */ + pushq $PSL_KERNEL + popfq + + /* Parameters for the xen init function */ + movq %rsi, %rdi /* shared_info (arg 1) */ + movq %rsp, %rsi /* xenstack (arg 2) */ + + /* Use our own stack */ + movq $bootstack,%rsp + xorl %ebp, %ebp + + /* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */ + call hammer_time_xen + movq %rax, %rsp /* set up kstack for mi_startup() */ + call mi_startup /* autoconfiguration, mountroot etc */ + + /* NOTREACHED */ +0: hlt + jmp 0b +#endif diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c index 2b2e47f..b649def 100644 --- a/sys/amd64/amd64/machdep.c +++ b/sys/amd64/amd64/machdep.c @@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$"); #include <machine/reg.h> #include <machine/sigframe.h> #include <machine/specialreg.h> +#include <machine/sysarch.h> #ifdef PERFMON #include <machine/perfmon.h> #endif @@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$"); #include <isa/isareg.h> #include <isa/rtc.h> +#ifdef XENHVM +/* Xen */ +#include <xen/xen-os.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#endif + /* Sanity check for __curthread() */ CTASSERT(offsetof(struct pcpu, pc_curthread) == 0); extern u_int64_t hammer_time(u_int64_t, u_int64_t); +#ifdef XENHVM +extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t); +#endif extern void printcpuinfo(void); /* XXX header file */ extern void identify_cpu(void); @@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp, char *xfpustate, size_t xfpustate_len); SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL); +/* Preload data parse function */ +static caddr_t native_parse_preload_data(u_int64_t); + +/* Native function to fetch the e820 map */ +static void 
native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/* Default init_ops implementation. */ +struct init_ops init_ops = { + .parse_preload_data = native_parse_preload_data, + .early_delay_init = i8254_init, + .early_delay = i8254_delay, + .fetch_e820_map = native_fetch_e820_map, +#ifdef SMP + .mp_bootaddress = mp_bootaddress, +#endif +}; + /* * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is * the physical address at which the kernel is loaded. @@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc; struct mtx dt_lock; /* lock for GDT and LDT */ +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + init_ops.early_delay(n); +} + static void cpu_startup(dummy) void *dummy; @@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp) return (1); } +static void +native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + /* + * get memory map from INT 15:E820, kindly supplied by the + * loader. + * + * subr_module.c says: + * "Consumer may safely assume that size value precedes data." + * ie: an int32_t immediately precedes smap. + */ + *smap = (struct bios_smap *)preload_search_info(kmdp, + MODINFO_METADATA | MODINFOMD_SMAP); + if (*smap == NULL) + panic("No BIOS smap info from loader!"); + *size = *((u_int32_t *)*smap - 1); +} + /* * Populate the (physmap) array with base/bound pairs describing the * available physical memory in the system, then test this memory and @@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) basemem = 0; physmap_idx = 0; - /* - * get memory map from INT 15:E820, kindly supplied by the loader. - * - * subr_module.c says: - * "Consumer may safely assume that size value precedes data." - * ie: an int32_t immediately precedes smap. 
- */ - smapbase = (struct bios_smap *)preload_search_info(kmdp, - MODINFO_METADATA | MODINFOMD_SMAP); - if (smapbase == NULL) - panic("No BIOS smap info from loader!"); + init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize); - smapsize = *((u_int32_t *)smapbase - 1); smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize); for (smap = smapbase; smap < smapend; smap++) @@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first) #ifdef SMP /* make hole for AP bootstrap code */ - physmap[1] = mp_bootaddress(physmap[1] / 1024); + if (init_ops.mp_bootaddress) + physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024); #endif /* @@ -1681,6 +1726,98 @@ do_next: msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]); } +static caddr_t +native_parse_preload_data(u_int64_t modulep) +{ + caddr_t kmdp; + + preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); + preload_bootstrap_relocate(KERNBASE); + kmdp = preload_search_by_type("elf kernel"); + if (kmdp == NULL) + kmdp = preload_search_by_type("elf64 kernel"); + boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); + kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; +#ifdef DDB + ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); + ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); +#endif + + return (kmdp); +} + +#ifdef XENHVM +/* + * First function called by the Xen PVH boot sequence. + * + * Set some Xen global variables and prepare the environment so it is + * as similar as possible to what native FreeBSD init function expects. 
+ */
+u_int64_t
+hammer_time_xen(start_info_t *si, u_int64_t xenstack)
+{
+	u_int64_t physfree;
+	u_int64_t *PT4 = (u_int64_t *)xenstack;
+	u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE);
+	u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE);
+	int i;
+
+	KASSERT((si != NULL && xenstack != 0),
+	    ("invalid start_info or xenstack"));
+
+	xen_early_printf("FreeBSD PVH running on %s\n", si->magic);
+
+	/* We use 3 pages of xen stack for the boot pagetables */
+	physfree = xenstack + 3 * PAGE_SIZE - KERNBASE;
+
+	/* Setup Xen global variables */
+	HYPERVISOR_start_info = si;
+	HYPERVISOR_shared_info =
+	    (shared_info_t *)(si->shared_info + KERNBASE);
+
+	/*
+	 * Setup some misc global variables for Xen devices
+	 *
+	 * XXX: devices that need this specific variables should
+	 * be rewritten to fetch this info by themselves from the
+	 * start_info page.
+	 */
+	console_page =
+	    (char *)(ptoa(si->console.domU.mfn) + KERNBASE);
+	xen_store = (struct xenstore_domain_interface *)
+	    (ptoa(si->store_mfn) + KERNBASE);
+
+	xen_domain_type = XEN_PV_DOMAIN;
+	vm_guest = VM_GUEST_XEN;
+
+	/*
+	 * Use the stack Xen gives us to build the page tables
+	 * as native FreeBSD expects to find them (created
+	 * by the boot trampoline).
+	 */
+	for (i = 0; i < 512; i++) {
+		/* Each slot of the level 4 pages points to the same level 3 page */
+		PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE;
+		PT4[i] |= PG_V | PG_RW | PG_U;
+
+		/* Each slot of the level 3 pages points to the same level 2 page */
+		PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE;
+		PT3[i] |= PG_V | PG_RW | PG_U;
+
+		/* The level 2 page slots are mapped with 2MB pages for 1GB.
*/ + PT2[i] = i * (2 * 1024 * 1024); + PT2[i] |= PG_V | PG_RW | PG_PS | PG_U; + } + load_cr3(((u_int64_t)&PT4[0]) - KERNBASE); + + /* Set the hooks for early functions that diverge from bare metal */ + xen_pv_set_init_ops(); + + /* Now we can jump into the native init function */ + return hammer_time(0, physfree); +} +#endif + u_int64_t hammer_time(u_int64_t modulep, u_int64_t physfree) { @@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) */ proc_linkup0(&proc0, &thread0); - preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE); - preload_bootstrap_relocate(KERNBASE); - kmdp = preload_search_by_type("elf kernel"); - if (kmdp == NULL) - kmdp = preload_search_by_type("elf64 kernel"); - boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); - kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE; -#ifdef DDB - ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); - ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); -#endif + kmdp = init_ops.parse_preload_data(modulep); /* Init basic tunables, hz etc */ init_param1(); @@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree) lidt(&r_idt); /* - * Initialize the i8254 before the console so that console + * Initialize the early delay before the console so that console * initialization can use DELAY(). */ - i8254_init(); + init_ops.early_delay_init(); /* * Initialize the console before we print anything out. diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c index 4ef4b3d..a751055 100644 --- a/sys/amd64/amd64/mp_machdep.c +++ b/sys/amd64/amd64/mp_machdep.c @@ -90,7 +90,8 @@ extern struct pcpu __pcpu[]; /* AP uses this during bootstrap. Do not staticize. 
*/ char *bootSTK; -static int bootAP; +int bootAP; +bool lapic_disabled = false; /* Free these after use */ void *bootstacks[MAXCPU]; @@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU]; static u_long *ipi_hardclock_counts[MAXCPU]; #endif +int native_start_all_aps(void); + /* Default cpu_ops implementation. */ struct cpu_ops cpu_ops = { - .ipi_vectored = lapic_ipi_vectored + .ipi_vectored = lapic_ipi_vectored, + .start_all_aps = native_start_all_aps, }; extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32); @@ -138,7 +142,7 @@ extern int pmap_pcid_enabled; static volatile cpuset_t ipi_nmi_pending; /* used to hold the AP''s until we are ready to release them */ -static struct mtx ap_boot_mtx; +struct mtx ap_boot_mtx; /* Set to 1 once we''re ready to let the APs out of the pen. */ static volatile int aps_ready = 0; @@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */ static void assign_cpu_ids(void); static void set_interrupt_apic_ids(void); -static int start_all_aps(void); static int start_ap(int apic_id); static void release_aps(void *dummy); @@ -569,7 +572,7 @@ cpu_mp_start(void) assign_cpu_ids(); /* Start each Application Processor */ - start_all_aps(); + cpu_ops.start_all_aps(); set_interrupt_apic_ids(); } @@ -707,7 +710,8 @@ init_secondary(void) wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D); /* Disable local APIC just to be sure. */ - lapic_disable(); + if (!lapic_disabled) + lapic_disable(); /* signal our startup to the BSP. 
*/ mp_naps++; @@ -733,7 +737,7 @@ init_secondary(void) /* A quick check from sanity claus */ cpuid = PCPU_GET(cpuid); - if (PCPU_GET(apic_id) != lapic_id()) { + if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) { printf("SMP: cpuid = %d\n", cpuid); printf("SMP: actual apic_id = %d\n", lapic_id()); printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id)); @@ -749,7 +753,8 @@ init_secondary(void) mtx_lock_spin(&ap_boot_mtx); /* Init local apic for irq''s */ - lapic_setup(1); + if (!lapic_disabled) + lapic_setup(1); /* Set memory range attributes for this CPU to match the BSP */ mem_range_AP_init(); @@ -764,7 +769,7 @@ init_secondary(void) if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0) CPU_SET(cpuid, &logical_cpus_mask); - if (bootverbose) + if (!lapic_disabled && bootverbose) lapic_dump("AP"); if (smp_cpus == mp_ncpus) { @@ -776,9 +781,13 @@ init_secondary(void) /* * Enable global pages TLB extension * This also implicitly flushes the TLB + * + * Also set PSE, because on Xen AP bringup + * it is not set, and it doesn''t do any harm + * to set it again here on the bare-metal case. */ - load_cr4(rcr4() | CR4_PGE); + load_cr4(rcr4() | CR4_PGE | CR4_PSE); if (pmap_pcid_enabled) load_cr4(rcr4() | CR4_PCIDE); load_ds(_udatasel); @@ -908,8 +917,8 @@ assign_cpu_ids(void) /* * start each AP in our list */ -static int -start_all_aps(void) +int +native_start_all_aps(void) { vm_offset_t va = boot_address + KERNBASE; u_int64_t *pt4, *pt3, *pt2; diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h index 1fb592a..ce8dce4 100644 --- a/sys/amd64/include/asmacros.h +++ b/sys/amd64/include/asmacros.h @@ -201,4 +201,30 @@ #endif /* LOCORE */ +#ifdef __STDC__ +#define ELFNOTE(name, type, desctype, descdata...) 
\ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz #name ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#else /* !__STDC__, i.e. -traditional */ +#define ELFNOTE(name, type, desctype, descdata) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz "name" ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#endif /* __STDC__ */ + #endif /* !_MACHINE_ASMACROS_H_ */ diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h index d7f7d82..e7817ab 100644 --- a/sys/amd64/include/clock.h +++ b/sys/amd64/include/clock.h @@ -25,6 +25,12 @@ extern int smp_tsc; #endif void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h index 3d9ff531..ed9f1db 100644 --- a/sys/amd64/include/cpu.h +++ b/sys/amd64/include/cpu.h @@ -64,6 +64,7 @@ struct cpu_ops { void (*cpu_init)(void); void (*cpu_resume)(void); void (*ipi_vectored)(u_int, int); + int (*start_all_aps)(void); }; extern struct cpu_ops cpu_ops; diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h index cd380d4..27fd3ba 100644 --- a/sys/amd64/include/sysarch.h +++ b/sys/amd64/include/sysarch.h @@ -4,3 +4,22 @@ /* $FreeBSD$ */ #include <x86/sysarch.h> + +#include <machine/pc/bios.h> +/* + * Struct containing pointers to init functions whose + * implementation is run time selectable. Selection can be made, + * for example, based on detection of a BIOS variant or + * hypervisor environment. 
+ */ +struct init_ops { + caddr_t (*parse_preload_data)(u_int64_t); + void (*early_delay_init)(void); + void (*early_delay)(int); + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *); +#ifdef SMP + u_int (*mp_bootaddress)(u_int); +#endif +}; + +extern struct init_ops init_ops; diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h index a1b2a5c..499fb4d 100644 --- a/sys/amd64/include/xen/hypercall.h +++ b/sys/amd64/include/xen/hypercall.h @@ -51,15 +51,8 @@ #define CONFIG_XEN_COMPAT 0x030002 #define __must_check -#ifdef XEN #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\ - "add hypercall_stubs(%%rip),%%rax; " \ - "call *%%rax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/conf/files b/sys/conf/files index 3c20141..e711ddf 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -2512,8 +2512,8 @@ dev/xe/if_xe_pccard.c optional xe pccard dev/xen/balloon/balloon.c optional xen | xenhvm dev/xen/blkfront/blkfront.c optional xen | xenhvm dev/xen/blkback/blkback.c optional xen | xenhvm -dev/xen/console/console.c optional xen -dev/xen/console/xencons_ring.c optional xen +dev/xen/console/console.c optional xen | xenhvm +dev/xen/console/xencons_ring.c optional xen | xenhvm dev/xen/control/control.c optional xen | xenhvm dev/xen/netback/netback.c optional xen | xenhvm dev/xen/netfront/netfront.c optional xen | xenhvm diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 33c4297..d736d84 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -564,5 +564,10 @@ x86/x86/mptable_pci.c optional mptable pci x86/x86/msi.c optional pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/mptable.c optional xenhvm +x86/xen/pvcpu.c optional xenhvm +x86/xen/pv.c optional xenhvm 
+x86/xen/xen_nexus.c optional xenhvm diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 index 696d4e7..10a4da8 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -587,5 +587,7 @@ x86/x86/mptable_pci.c optional apic native pci x86/x86/msi.c optional apic pci x86/x86/nexus.c standard x86/x86/tsc.c standard +x86/x86/delay.c standard x86/xen/hvm.c optional xenhvm x86/xen/xen_intr.c optional xen | xenhvm +x86/xen/xen_nexus.c optional xen | xenhvm diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c index 23eaee2..33d7cce 100644 --- a/sys/dev/xen/console/console.c +++ b/sys/dev/xen/console/console.c @@ -69,11 +69,14 @@ struct mtx cn_mtx; static char wbuf[WBUF_SIZE]; static char rbuf[RBUF_SIZE]; static int rc, rp; -static unsigned int cnsl_evt_reg; +unsigned int cnsl_evt_reg; static unsigned int wc, wp; /* write_cons, write_prod */ xen_intr_handle_t xen_intr_handle; device_t xencons_dev; +/* Virt address of the shared console page */ +char *console_page; + #ifdef KDB static int xc_altbrk; #endif @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = { static void xc_cnprobe(struct consdev *cp) { + if (!xen_pv_domain()) + return; + cp->cn_pri = CN_REMOTE; sprintf(cp->cn_name, "%s0", driver_name); } @@ -175,7 +181,7 @@ static void xc_cnputc(struct consdev *dev, int c) { - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) xc_cnputc_dom0(dev, c); else xc_cnputc_domu(dev, c); @@ -206,22 +212,12 @@ xcons_putc(int c) xcons_force_flush(); #endif } - if (cnsl_evt_reg) - __xencons_tx_flush(); + __xencons_tx_flush(); /* inform start path that we''re pretty full */ return ((wp - wc) >= WBUF_SIZE - 100) ? 
TRUE : FALSE; } -static void -xc_identify(driver_t *driver, device_t parent) -{ - device_t child; - child = BUS_ADD_CHILD(parent, 0, driver_name, 0); - device_set_driver(child, driver); - device_set_desc(child, "Xen Console"); -} - static int xc_probe(device_t dev) { @@ -245,7 +241,7 @@ xc_attach(device_t dev) cnsl_evt_reg = 1; callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL, xencons_priv_interrupt, NULL, INTR_TYPE_TTY, &xen_intr_handle); @@ -309,7 +305,7 @@ __xencons_tx_flush(void) sz = wp - wc; if (sz > (WBUF_SIZE - WBUF_MASK(wc))) sz = WBUF_SIZE - WBUF_MASK(wc); - if (xen_start_info->flags & SIF_INITDOMAIN) { + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) { HYPERVISOR_console_io(CONSOLEIO_write, sz, &wbuf[WBUF_MASK(wc)]); wc += sz; } else { @@ -405,7 +401,6 @@ xc_timeout(void *v) } static device_method_t xc_methods[] = { - DEVMETHOD(device_identify, xc_identify), DEVMETHOD(device_probe, xc_probe), DEVMETHOD(device_attach, xc_attach), @@ -424,7 +419,7 @@ xcons_force_flush(void) { int sz; - if (xen_start_info->flags & SIF_INITDOMAIN) + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) return; /* Spin until console data is flushed through to the domain controller. 
*/ diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c index 3701551..3046498 100644 --- a/sys/dev/xen/console/xencons_ring.c +++ b/sys/dev/xen/console/xencons_ring.c @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$"); #define console_evtchn console.domU.evtchn xen_intr_handle_t console_handle; -extern char *console_page; extern struct mtx cn_mtx; extern device_t xencons_dev; +extern int cnsl_evt_reg; static inline struct xencons_interface * xencons_interface(void) @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len) struct xencons_interface *intf; XENCONS_RING_IDX cons, prod; int sent; + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn }; intf = xencons_interface(); cons = intf->out_cons; @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len) wmb(); intf->out_prod = prod; - xen_intr_signal(console_handle); + if (cnsl_evt_reg) + xen_intr_signal(console_handle); + else + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send); + return sent; @@ -125,11 +130,11 @@ xencons_ring_init(void) { int err; - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return 0; err = xen_intr_bind_local_port(xencons_dev, - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL, + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL, INTR_TYPE_MISC | INTR_MPSAFE, &console_handle); if (err) { return err; @@ -145,7 +150,7 @@ void xencons_suspend(void) { - if (!xen_start_info->console_evtchn) + if (!HYPERVISOR_start_info->console_evtchn) return; xen_intr_unbind(&console_handle); diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c index a9f8d1b..35c923d 100644 --- a/sys/dev/xen/control/control.c +++ b/sys/dev/xen/control/control.c @@ -317,21 +317,6 @@ xctrl_suspend() EVENTHANDLER_INVOKE(power_resume); } -static void -xen_pv_shutdown_final(void *arg, int howto) -{ - /* - * Inform the hypervisor that shutdown is complete. 
- * This is not necessary in HVM domains since Xen - * emulates ACPI in that mode and FreeBSD's ACPI - * support will request this transition. - */ - if (howto & (RB_HALT | RB_POWEROFF)) - HYPERVISOR_shutdown(SHUTDOWN_poweroff); - else - HYPERVISOR_shutdown(SHUTDOWN_reboot); -} - #else /* HVM mode suspension. */ @@ -447,6 +432,21 @@ xctrl_halt() shutdown_nice(RB_HALT); } +static void +xen_pv_shutdown_final(void *arg, int howto) +{ + /* + * Inform the hypervisor that shutdown is complete. + * This is not necessary in HVM domains since Xen + * emulates ACPI in that mode and FreeBSD's ACPI + * support will request this transition. + */ + if (howto & (RB_HALT | RB_POWEROFF)) + HYPERVISOR_shutdown(SHUTDOWN_poweroff); + else + HYPERVISOR_shutdown(SHUTDOWN_reboot); +} + /*------------------------------ Event Reception -----------------------------*/ static void xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len) @@ -529,10 +529,9 @@ xctrl_attach(device_t dev) xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl; xs_register_watch(&xctrl->xctrl_watch); -#ifndef XENHVM - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, - SHUTDOWN_PRI_LAST); -#endif + if (xen_pv_domain()) + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL, + SHUTDOWN_PRI_LAST); return (0); } diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c index 354085b..333f1b0 100644 --- a/sys/dev/xen/timer/timer.c +++ b/sys/dev/xen/timer/timer.c @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$"); #include <machine/_inttypes.h> #include <machine/smp.h> +/* For the declaration of clock_lock */ +#include <isa/rtc.h> + #include "clock_if.h" static devclass_t xentimer_devclass; @@ -95,19 +98,6 @@ struct xentimer_softc { /* Last time; this guarantees a monotonically increasing clock.
*/ volatile uint64_t xen_timer_last_time = 0; -static void -xentimer_identify(driver_t *driver, device_t parent) -{ - if (!xen_domain()) - return; - - /* Handle all Xen PV timers in one device instance. */ - if (devclass_get_device(xentimer_devclass, 0)) - return; - - BUS_ADD_CHILD(parent, 0, "xen_et", 0); -} - static int xentimer_probe(device_t dev) { @@ -234,18 +224,16 @@ xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src) * it happens to be less than another CPU's previously determined value. */ static uint64_t -xen_fetch_vcpu_time(void) +xen_fetch_vcpu_time(struct vcpu_info *vcpu) { struct vcpu_time_info dst; struct vcpu_time_info *src; uint32_t pre_version; uint64_t now; volatile uint64_t last; - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info); src = &vcpu->time; - critical_enter(); do { pre_version = xen_fetch_vcpu_tinfo(&dst, src); barrier(); @@ -266,16 +254,19 @@ xen_fetch_vcpu_time(void) } } while (!atomic_cmpset_64(&xen_timer_last_time, last, now)); - critical_exit(); - return (now); } static uint32_t xentimer_get_timecount(struct timecounter *tc) { + uint32_t xen_time; + + critical_enter(); + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX; + critical_exit(); - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX); + return xen_time; } /** @@ -305,7 +296,12 @@ xen_fetch_wallclock(struct timespec *ts) static void xen_fetch_uptime(struct timespec *ts) { - uint64_t uptime = xen_fetch_vcpu_time(); + uint64_t uptime; + + critical_enter(); + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); + critical_exit(); + ts->tv_sec = uptime / NSEC_IN_SEC; ts->tv_nsec = uptime % NSEC_IN_SEC; } @@ -354,7 +350,7 @@ xentimer_intr(void *arg) struct xentimer_softc *sc = (struct xentimer_softc *)arg; struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu); - pcpu->last_processed = xen_fetch_vcpu_time(); + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)); if (pcpu->timer != 0 && sc->et.et_active)
sc->et.et_event_cb(&sc->et, sc->et.et_arg); @@ -415,7 +411,9 @@ xentimer_et_start(struct eventtimer *et, do { if (++i == 60) panic("can't schedule timer"); - next_time = xen_fetch_vcpu_time() + first_in_ns; + critical_enter(); + next_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns; + critical_exit(); error = xentimer_vcpu_start_timer(cpu, next_time); } while (error == -ETIME); @@ -573,8 +571,37 @@ xentimer_suspend(device_t dev) return (0); } +/* + * Xen delay early init + */ +void xen_delay_init(void) +{ + /* Init the clock lock */ + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE); +} +/* + * Xen PV DELAY function + * + * When running on PVH mode we don't have an emulated i8254, so + * make use of the Xen time info in order to code a simple DELAY + * function that can be used during early boot. + */ +void xen_delay(int n) +{ + uint64_t end_ns; + uint64_t current; + + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + end_ns += n * NSEC_IN_USEC; + + for (;;) { + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]); + if (current >= end_ns) + break; + } +} + static device_method_t xentimer_methods[] = { - DEVMETHOD(device_identify, xentimer_identify), DEVMETHOD(device_probe, xentimer_probe), DEVMETHOD(device_attach, xentimer_attach), DEVMETHOD(device_detach, xentimer_detach), diff --git a/sys/dev/xen/xenpci/xenpci.c b/sys/dev/xen/xenpci/xenpci.c index dd2ad92..a19ebcb 100644 --- a/sys/dev/xen/xenpci/xenpci.c +++ b/sys/dev/xen/xenpci/xenpci.c @@ -240,6 +240,7 @@ xenpci_attach(device_t dev) { struct xenpci_softc *scp = device_get_softc(dev); devclass_t dc; + device_t child; int error; /* @@ -270,6 +271,13 @@ xenpci_attach(device_t dev) goto errexit; } + if (BUS_ADD_CHILD(dev, 0, "xenstore", 0) == NULL) + panic("xenpci: unable to add xenstore device"); + child = BUS_ADD_CHILD(nexus, 0, "xen_et", 0); + if (child == NULL) + panic("xenpci: unable to add xen pv timer device"); +
device_probe_and_attach(child); + return (bus_generic_attach(dev)); errexit: diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s index 68cb430..bd136b1 100644 --- a/sys/i386/i386/locore.s +++ b/sys/i386/i386/locore.s @@ -898,3 +898,12 @@ done_pde: #endif ret + +#ifdef XENHVM +/* Xen Hypercall page */ + .text +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */ + +NON_GPROF_ENTRY(hypercall_page) + .skip 0x1000, 0x90 /* Fill with "nop"s */ +#endif diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c index c430316..af12b1d 100644 --- a/sys/i386/i386/machdep.c +++ b/sys/i386/i386/machdep.c @@ -254,6 +254,17 @@ struct mtx icu_lock; struct mem_range_softc mem_range_softc; +#ifndef XEN +void +DELAY(int n) +{ + if (delay_tc(n)) + return; + + i8254_delay(n); +} +#endif + static void cpu_startup(dummy) void *dummy; diff --git a/sys/i386/include/clock.h b/sys/i386/include/clock.h index d980ec7..287b2c8 100644 --- a/sys/i386/include/clock.h +++ b/sys/i386/include/clock.h @@ -22,6 +22,12 @@ extern int tsc_is_invariant; extern int tsc_perf_stat; void i8254_init(void); +void i8254_delay(int); +#ifdef XENHVM +void xen_delay_init(void); +void xen_delay(int); +#endif +int delay_tc(int); /* * Driver to clock driver interface. 
diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h index edc13f4..1c15b0f 100644 --- a/sys/i386/include/xen/hypercall.h +++ b/sys/i386/include/xen/hypercall.h @@ -40,15 +40,8 @@ #define CONFIG_XEN_COMPAT 0x030002 -#if defined(XEN) #define HYPERCALL_STR(name) \ "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" -#else -#define HYPERCALL_STR(name) \ - "mov hypercall_stubs,%%eax; " \ - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \ - "call *%%eax" -#endif #define _hypercall0(type, name) \ ({ \ diff --git a/sys/i386/xen/mp_machdep.c b/sys/i386/xen/mp_machdep.c index c48fcb2..adf7627 100644 --- a/sys/i386/xen/mp_machdep.c +++ b/sys/i386/xen/mp_machdep.c @@ -928,9 +928,9 @@ cpu_initialize_context(unsigned int cpu) smp_trap_init(ctxt.trap_ctxt); ctxt.ldt_ents = 0; - ctxt.gdt_frames[0] = + ctxt.u.pv.gdt_frames[0] = (uint32_t)((uint64_t)vtomach(bootAPgdt) >> PAGE_SHIFT); - ctxt.gdt_ents = 512; + ctxt.u.pv.gdt_ents = 512; #ifdef __i386__ ctxt.user_regs.esp = boot_stack + PAGE_SIZE; @@ -959,7 +959,7 @@ cpu_initialize_context(unsigned int cpu) #endif printf("gdtpfn=%lx pdptpfn=%lx\n", - ctxt.gdt_frames[0], + ctxt.u.pv.gdt_frames[0], ctxt.ctrlreg[3] >> PAGE_SHIFT); PANIC_IF(HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, &ctxt)); diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c index 7049be6..1b1c74d 100644 --- a/sys/i386/xen/xen_machdep.c +++ b/sys/i386/xen/xen_machdep.c @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl), int xendebug_flags; start_info_t *xen_start_info; +start_info_t *HYPERVISOR_start_info; shared_info_t *HYPERVISOR_shared_info; xen_pfn_t *xen_machine_phys = machine_to_phys_mapping; xen_pfn_t *xen_phys_machine; @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo); struct xenstore_domain_interface; extern struct xenstore_domain_interface *xen_store; -char *console_page; +extern char *console_page; void * bootmem_alloc(unsigned int size) @@ -927,6 +928,7 @@
initvalues(start_info_t *startinfo) HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments_notify); #endif xen_start_info = startinfo; + HYPERVISOR_start_info = startinfo; xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list; IdlePTD = (pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE); diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c index a12e175..a5aed1c 100644 --- a/sys/x86/isa/clock.c +++ b/sys/x86/isa/clock.c @@ -247,61 +247,13 @@ getit(void) return ((high << 8) | low); } -#ifndef DELAYDEBUG -static u_int -get_tsc(__unused struct timecounter *tc) -{ - - return (rdtsc32()); -} - -static __inline int -delay_tc(int n) -{ - struct timecounter *tc; - timecounter_get_t *func; - uint64_t end, freq, now; - u_int last, mask, u; - - tc = timecounter; - freq = atomic_load_acq_64(&tsc_freq); - if (tsc_is_invariant && freq != 0) { - func = get_tsc; - mask = ~0u; - } else { - if (tc->tc_quality <= 0) - return (0); - func = tc->tc_get_timecount; - mask = tc->tc_counter_mask; - freq = tc->tc_frequency; - } - now = 0; - end = freq * n / 1000000; - if (func == get_tsc) - sched_pin(); - last = func(tc) & mask; - do { - cpu_spinwait(); - u = func(tc) & mask; - if (u < last) - now += mask - last + u + 1; - else - now += u - last; - last = u; - } while (now < end); - if (func == get_tsc) - sched_unpin(); - return (1); -} -#endif - /* * Wait "n" microseconds. * Relies on timer 1 counting down from (i8254_freq / hz) * Note: timer had better have been programmed before this is first used! 
*/ void -DELAY(int n) +i8254_delay(int n) { int delta, prev_tick, tick, ticks_left; #ifdef DELAYDEBUG @@ -317,9 +269,6 @@ DELAY(int n) } if (state == 1) printf("DELAY(%d)...", n); -#else - if (delay_tc(n)) - return; #endif /* * Read the counter first, so that the rest of the setup overhead is diff --git a/sys/x86/isa/isa.c b/sys/x86/isa/isa.c index 1a57137..09d1ab7 100644 --- a/sys/x86/isa/isa.c +++ b/sys/x86/isa/isa.c @@ -241,3 +241,6 @@ isa_release_resource(device_t bus, device_t child, int type, int rid, * On this platform, isa can also attach to the legacy bus. */ DRIVER_MODULE(isa, legacy, isa_driver, isa_devclass, 0, 0); +#ifdef XENHVM +DRIVER_MODULE(isa, nexus, isa_driver, isa_devclass, 0, 0); +#endif diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c new file mode 100644 index 0000000..7ea70b1 --- /dev/null +++ b/sys/x86/x86/delay.c @@ -0,0 +1,95 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * Copyright (c) 2010 Alexander Motin <mav@FreeBSD.org> + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz and Don Ahn. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91 + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +/* Generic x86 routines to handle delay */ + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/timetc.h> +#include <sys/proc.h> +#include <sys/kernel.h> +#include <sys/sched.h> + +#include <machine/clock.h> +#include <machine/cpu.h> + +static u_int +get_tsc(__unused struct timecounter *tc) +{ + + return (rdtsc32()); +} + +int +delay_tc(int n) +{ + struct timecounter *tc; + timecounter_get_t *func; + uint64_t end, freq, now; + u_int last, mask, u; + + tc = timecounter; + freq = atomic_load_acq_64(&tsc_freq); + if (tsc_is_invariant && freq != 0) { + func = get_tsc; + mask = ~0u; + } else { + if (tc->tc_quality <= 0) + return (0); + func = tc->tc_get_timecount; + mask = tc->tc_counter_mask; + freq = tc->tc_frequency; + } + now = 0; + end = freq * n / 1000000; + if (func == get_tsc) + sched_pin(); + last = func(tc) & mask; + do { + cpu_spinwait(); + u = func(tc) & mask; + if (u < last) + now += mask - last + u + 1; + else + now += u - last; + last = u; + } while (now < end); + if (func == get_tsc) + sched_unpin(); + return (1); +} diff --git a/sys/x86/x86/local_apic.c
b/sys/x86/x86/local_apic.c index 8c8eef6..d8d7701 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused) if (retval != 0) printf("%s: Failed to setup I/O APICs: returned %d\n", best_enum->apic_name, retval); -#ifdef XEN - return; + +#if defined(XEN) || defined(XENHVM) + /* There's no lapic on PV Xen */ + if (xen_pv_domain()) + return; #endif + /* * Finish setting up the local APIC on the BSP once we know how to * properly program the LINT pins. diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c index 72811dc..dc8d9a2 100644 --- a/sys/x86/xen/hvm.c +++ b/sys/x86/xen/hvm.c @@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$"); #include <sys/proc.h> #include <sys/smp.h> #include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> #include <vm/vm.h> #include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> #include <dev/pci/pcivar.h> #include <machine/cpufunc.h> #include <machine/cpu.h> #include <machine/smp.h> +#include <machine/stdarg.h> #include <x86/apicreg.h> @@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$"); #include <xen/gnttab.h> #include <xen/hypervisor.h> #include <xen/hvm.h> +#ifdef __amd64__ +#include <xen/pv.h> +#endif #include <xen/xen_intr.h> #include <xen/interface/hvm/params.h> @@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void); /* Variables used by mp_machdep to perform the bitmap IPI */ extern volatile u_int cpu_ipi_pending[MAXCPU]; +#ifdef __amd64__ +/* Native AP start used on PVHVM */ +extern int native_start_all_aps(void); +#endif + /*---------------------------------- Macros ----------------------------------*/ #define IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS) @@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE; struct cpu_ops xen_hvm_cpu_ops = { .ipi_vectored = lapic_ipi_vectored, .cpu_init = xen_hvm_cpu_init, - .cpu_resume = xen_hvm_cpu_resume + .cpu_resume = xen_hvm_cpu_resume, +#ifdef __amd64__ + .start_all_aps =
native_start_all_aps, +#endif }; static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support"); @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]); /*------------------ Hypervisor Access Shared Memory Regions -----------------*/ /** Hypercall table accessed via HYPERVISOR_*_op() methods. */ -char *hypercall_stubs; +extern char *hypercall_page; shared_info_t *HYPERVISOR_shared_info; +start_info_t *HYPERVISOR_start_info; #ifdef SMP /*---------------------------- XEN PV IPI Handlers ---------------------------*/ @@ -522,7 +540,7 @@ xen_setup_cpus(void) { int i; - if (!xen_hvm_domain() || !xen_vector_callback_enabled) + if (!xen_vector_callback_enabled) return; #ifdef __amd64__ @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void) * Allocate and fill in the hypcall page. */ static int -xen_hvm_init_hypercall_stubs(void) +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type) { uint32_t base, regs[4]; int i; @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void) if (base == 0) return (ENXIO); - if (hypercall_stubs == NULL) { + if (init_type == XEN_HVM_INIT_COLD) { do_cpuid(base + 1, regs); printf("XEN: Hypervisor version %d.%d detected.\n", regs[0] >> 16, regs[0] & 0xffff); @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void) * Find the hypercall pages. 
*/ do_cpuid(base + 2, regs); - - if (hypercall_stubs == NULL) { - size_t call_region_size; - - call_region_size = regs[0] * PAGE_SIZE; - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT); - if (hypercall_stubs == NULL) - panic("Unable to allocate Xen hypercall region"); - } for (i = 0; i < regs[0]; i++) - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i); + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i); return (0); } @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void) if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC) return; - if (bootverbose) - printf("XEN: Disabling emulated block and network devices\n"); outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS); } @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND) return; - error = xen_hvm_init_hypercall_stubs(); + if (xen_pv_domain()) { + /* hypercall page is already set in the PV case */ + error = 0; + } else { + error = xen_hvm_init_hypercall_stubs(init_type); + } switch (init_type) { case XEN_HVM_INIT_COLD: @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type) setup_xen_features(); cpu_ops = xen_hvm_cpu_ops; vm_guest = VM_GUEST_XEN; +#ifdef __amd64__ + if (xen_pv_domain()) + cpu_ops.start_all_aps = xen_pv_start_all_aps; + else +#endif + printf("XEN: Disabling emulated block and network devices\n"); break; case XEN_HVM_INIT_RESUME: if (error != 0) @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type) } xen_vector_callback_enabled = 0; - xen_domain_type = XEN_HVM_DOMAIN; - xen_hvm_init_shared_info_page(); xen_hvm_set_callback(NULL); - xen_hvm_disable_emulated_devices(); + + if (!xen_pv_domain()) { + xen_domain_type = XEN_HVM_DOMAIN; + xen_hvm_init_shared_info_page(); + xen_hvm_disable_emulated_devices(); + } } void @@ -749,10 +770,14 @@ xen_set_vcpu_id(void) struct pcpu *pc; int i; - /* Set vcpu_id to acpi_id */ + if (!xen_domain()) + return; + + /* Set vcpu_id to acpi_id for 
PVHVM guests */ CPU_FOREACH(i) { pc = pcpu_find(i); - pc->pc_vcpu_id = pc->pc_acpi_id; + if (xen_hvm_domain()) + pc->pc_vcpu_id = pc->pc_acpi_id; if (bootverbose) printf("XEN: CPU %u has VCPU ID %u\n", i, pc->pc_vcpu_id); @@ -790,9 +815,34 @@ xen_hvm_cpu_init(void) DPCPU_SET(vcpu_info, vcpu_info); } +/*----------------------------- Debug functions ------------------------------*/ +#define PRINTK_BUFSIZE 1024 +static int +vprintk(const char *fmt, __va_list ap) +{ + int retval, len; + static char buf[PRINTK_BUFSIZE]; + + retval = vsnprintf(buf, PRINTK_BUFSIZE - 1, fmt, ap); + buf[retval] = 0; + len = strlen(buf); + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf); + return retval; +} + +void +xen_early_printf(const char *fmt, ...) +{ + __va_list ap; + + va_start(ap, fmt); + vprintk(fmt, ap); + va_end(ap); +} + SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL); #ifdef SMP -SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL); +SYSINIT(xen_setup_cpus, SI_SUB_SMP-1, SI_ORDER_ANY, xen_setup_cpus, NULL); #endif SYSINIT(xen_hvm_cpu_init, SI_SUB_INTR, SI_ORDER_FIRST, xen_hvm_cpu_init, NULL); SYSINIT(xen_set_vcpu_id, SI_SUB_CPU, SI_ORDER_ANY, xen_set_vcpu_id, NULL); diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c new file mode 100644 index 0000000..8916314 --- /dev/null +++ b/sys/x86/xen/mptable.c @@ -0,0 +1,136 @@ +/*- + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of any co-contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE.
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/smp.h> +#include <sys/pcpu.h> +#include <vm/vm.h> +#include <vm/pmap.h> + +#include <machine/intr_machdep.h> +#include <machine/apicvar.h> + +#include <machine/cpu.h> +#include <machine/smp.h> + +#include <xen/xen-os.h> +#include <xen/hypervisor.h> + +#include <xen/interface/vcpu.h> + +static int xenpv_probe(void); +static int xenpv_probe_cpus(void); +static int xenpv_setup_local(void); +static int xenpv_setup_io(void); + +static struct apic_enumerator xenpv_enumerator = { + "Xen PV", + xenpv_probe, + xenpv_probe_cpus, + xenpv_setup_local, + xenpv_setup_io +}; + +/* + * Look for an ACPI Multiple APIC Description Table ("APIC") + */ +static int +xenpv_probe(void) +{ + return (-100); +} + +/* + * Run through the MP table enumerating CPUs. + */ +static int +xenpv_probe_cpus(void) +{ + int i, ret; + + for (i = 0; i < MAXCPU; i++) { + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL); + if (ret >= 0) + cpu_add((i * 2), (i == 0)); + } + + return (0); +} + +/* + * Initialize the local APIC on the BSP. + */ +static int +xenpv_setup_local(void) +{ + PCPU_SET(vcpu_id, 0); + return (0); +} + +/* + * Enumerate I/O APICs and setup interrupt sources. + */ +static int +xenpv_setup_io(void) +{ + return (0); +} + +static void +xenpv_register(void *dummy __unused) +{ + if (xen_pv_domain()) { + apic_register_enumerator(&xenpv_enumerator); + } +} +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL); + +/* + * Setup per-CPU ACPI IDs. 
+ */ +static void +xenpv_set_ids(void *dummy) +{ + struct pcpu *pc; + int i; + + CPU_FOREACH(i) { + pc = pcpu_find(i); + pc->pc_vcpu_id = i; + } + return; +} +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL); diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c new file mode 100644 index 0000000..ea1706f --- /dev/null +++ b/sys/x86/xen/pv.c @@ -0,0 +1,246 @@ +/* + * Copyright (c) 2004 Christian Limpach. + * Copyright (c) 2004-2006,2008 Kip Macy + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE.
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/malloc.h> +#include <sys/proc.h> +#include <sys/smp.h> +#include <sys/systm.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/reboot.h> + +#include <vm/vm.h> +#include <vm/pmap.h> +#include <vm/vm_kern.h> +#include <vm/vm_extern.h> + +#include <dev/pci/pcivar.h> + +#include <machine/cpufunc.h> +#include <machine/cpu.h> +#include <machine/smp.h> +#include <machine/tss.h> +#include <machine/sysarch.h> +#include <machine/clock.h> + +#include <x86/apicreg.h> + +#include <xen/xen-os.h> +#include <xen/features.h> +#include <xen/gnttab.h> +#include <xen/hypervisor.h> +#include <xen/hvm.h> +#include <xen/pv.h> +#include <xen/xen_intr.h> + +#include <xen/interface/hvm/params.h> +#include <xen/interface/vcpu.h> + +#define MAX_E820_ENTRIES 128 + +/*--------------------------- Forward Declarations ---------------------------*/ +static caddr_t xen_pv_parse_preload_data(u_int64_t); +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *); + +/*---------------------------- Extern Declarations ---------------------------*/ +/* Variables used by amd64 mp_machdep to start APs */ +extern struct mtx ap_boot_mtx; +extern void *bootstacks[]; +extern char *doublefault_stack; +extern char *nmi_stack; +extern void *dpcpu; +extern int bootAP; +extern char *bootSTK; +extern bool lapic_disabled; + +/*-------------------------------- Global Data -------------------------------*/ +/* Xen init_ops implementation. 
*/ +struct init_ops xen_init_ops = { + .parse_preload_data = xen_pv_parse_preload_data, + .early_delay_init = xen_delay_init, + .early_delay = xen_delay, + .fetch_e820_map = xen_pv_fetch_e820_map, +}; + +static struct +{ + const char *ev; + int mask; +} howto_names[] = { + {"boot_askname", RB_ASKNAME}, + {"boot_single", RB_SINGLE}, + {"boot_nosync", RB_NOSYNC}, + {"boot_halt", RB_ASKNAME}, + {"boot_serial", RB_SERIAL}, + {"boot_cdrom", RB_CDROM}, + {"boot_gdb", RB_GDB}, + {"boot_gdb_pause", RB_RESERVED1}, + {"boot_verbose", RB_VERBOSE}, + {"boot_multicons", RB_MULTIPLE}, + {NULL, 0} +}; + +static struct bios_smap xen_smap[MAX_E820_ENTRIES]; + +static int +start_xen_ap(int cpu) +{ + struct vcpu_guest_context *ctxt; + int ms, cpus = mp_naps; + + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO); + if (ctxt == NULL) + panic("unable to allocate memory"); + + ctxt->flags = VGCF_IN_KERNEL; + ctxt->user_regs.rip = (unsigned long) init_secondary; + ctxt->user_regs.rsp = (unsigned long) bootSTK; + + /* Set the CPU to use the same page tables and CR4 value */ + ctxt->ctrlreg[3] = KPML4phys; + + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt)) + panic("unable to initialize CPU#%d\n", cpu); + + free(ctxt, M_TEMP); + + /* Launch the vCPU */ + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL)) + panic("unable to start AP#%d\n", cpu); + + /* Wait up to 5 seconds for it to start. 
*/ + for (ms = 0; ms < 5000; ms++) { + if (mp_naps > cpus) + return 1; /* return SUCCESS */ + DELAY(1000); + } + + return 0; +} + +int +xen_pv_start_all_aps(void) +{ + int cpu; + + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN); + lapic_disabled = true; + + for (cpu = 1; cpu < mp_ncpus; cpu++) { + + /* allocate and set up an idle stack data page */ + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena, + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO); + doublefault_stack = (char *)kmem_malloc(kernel_arena, + PAGE_SIZE, M_WAITOK | M_ZERO); + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE, + M_WAITOK | M_ZERO); + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE, + M_WAITOK | M_ZERO); + + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8; + bootAP = cpu; + + /* attempt to start the Application Processor */ + if (!start_xen_ap(cpu)) + panic("AP #%d failed to start!", cpu); + + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */ + } + + return mp_naps; +} + +/* + * Functions to convert the "extra" parameters passed by Xen + * into FreeBSD boot options (from the i386 Xen port). 
+ */ +static char * +xen_setbootenv(char *cmd_line) +{ + char *cmd_line_next; + + /* Skip leading spaces */ + for (; *cmd_line == ' '; cmd_line++); + + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;); + return (cmd_line); +} + +static int +xen_boothowto(char *envp) +{ + int i, howto = 0; + + /* get equivalents from the environment */ + for (i = 0; howto_names[i].ev != NULL; i++) + if (getenv(howto_names[i].ev) != NULL) + howto |= howto_names[i].mask; + return (howto); +} + +static caddr_t +xen_pv_parse_preload_data(u_int64_t modulep) +{ + /* Parse the extra boot information given by Xen */ + if (HYPERVISOR_start_info->cmd_line) + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line); + boothowto |= xen_boothowto(kern_envp); + + return (NULL); +} + +static void +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size) +{ + struct xen_memory_map memmap; + int rc; + + /* Fetch the E820 map from Xen */ + memmap.nr_entries = MAX_E820_ENTRIES; + set_xen_guest_handle(memmap.buffer, xen_smap); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); + if (rc) + panic("unable to fetch Xen E820 memory map"); + + *smap = xen_smap; + *size = memmap.nr_entries * sizeof(xen_smap[0]); +} + +void +xen_pv_set_init_ops(void) +{ + /* Init ops for Xen PV */ + init_ops = xen_init_ops; +} diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c new file mode 100644 index 0000000..35d88148 --- /dev/null +++ b/sys/x86/xen/pvcpu.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2.
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/module.h> +#include <sys/pcpu.h> +#include <sys/smp.h> + +#include <xen/xen-os.h> + +static int +xenpvcpu_probe(device_t dev) +{ + if (!xen_pv_domain()) + return (ENXIO); + + device_set_desc(dev, "Xen PV CPU"); + return (0); +} + +static int +xenpvcpu_attach(device_t dev) +{ + struct pcpu *pc; + int cpu; + + cpu = device_get_unit(dev); + pc = pcpu_find(cpu); + pc->pc_device = dev; + return (0); +} + +static device_method_t xenpvcpu_methods[] = { + DEVMETHOD(device_probe, xenpvcpu_probe), + DEVMETHOD(device_attach, xenpvcpu_attach), + DEVMETHOD_END +}; + +static driver_t xenpvcpu_driver = { + "pvcpu", + xenpvcpu_methods, + 0, +}; + +devclass_t xenpvcpu_devclass; + +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0); +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1); diff --git a/sys/x86/xen/xen_nexus.c
b/sys/x86/xen/xen_nexus.c new file mode 100644 index 0000000..288e6b6 --- /dev/null +++ b/sys/x86/xen/xen_nexus.c @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE.
+ */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/bus.h> +#include <sys/kernel.h> +#include <sys/module.h> +#include <sys/sysctl.h> +#include <sys/systm.h> +#include <sys/smp.h> + +#include <machine/nexusvar.h> + +#include <xen/xen-os.h> + +static const char *xen_devices[] = +{ + "xenstore", /* XenStore bus */ + "xen_et", /* Xen PV timer (provides: tc, et, clk) */ + "xc", /* Xen PV console */ + "isa", /* Dummy ISA bus for sc to attach */ +}; + +/* + * Xen nexus(4) driver. + */ +static int +nexus_xen_probe(device_t dev) +{ + if (!xen_pv_domain()) + return (ENXIO); + + return (BUS_PROBE_DEFAULT); +} + +static int +nexus_xen_attach(device_t dev) +{ + int i, error = 0; + + nexus_init_resources(); + bus_generic_probe(dev); + + /* + * Since we have no ACPI, we need to create a dummy CPU device + * in order to set pcpu->pc_device. + */ + CPU_FOREACH(i) + if (BUS_ADD_CHILD(dev, 0, "pvcpu", i) == NULL) + panic("unable to add pvcpu#%d device", i); + + for (i = 0; i < nitems(xen_devices); i++) { + if (BUS_ADD_CHILD(dev, 0, xen_devices[i], 0) == NULL) + panic("%s: could not add", xen_devices[i]); + } + + bus_generic_attach(dev); + + return (error); +} + +static device_method_t nexus_xen_methods[] = { + /* Device interface */ + DEVMETHOD(device_probe, nexus_xen_probe), + DEVMETHOD(device_attach, nexus_xen_attach), + + { 0, 0 } +}; + +DEFINE_CLASS_1(nexus, nexus_xen_driver, nexus_xen_methods, 1, nexus_driver); +static devclass_t nexus_devclass; + +DRIVER_MODULE(nexus_xen, root, nexus_xen_driver, nexus_devclass, 0, 0); diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c index 03c32b7..909378a 100644 --- a/sys/xen/gnttab.c +++ b/sys/xen/gnttab.c @@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$"); #include <sys/lock.h> #include <sys/malloc.h> #include <sys/mman.h> +#include <sys/limits.h> #include <xen/xen-os.h> #include <xen/hypervisor.h> @@ -607,6 +608,7 @@ gnttab_resume(void) { int error; unsigned int max_nr_gframes, nr_gframes; + void *alloc_mem;
nr_gframes = nr_grant_frames; max_nr_gframes = max_nr_grant_frames(); @@ -614,11 +616,20 @@ gnttab_resume(void) return (ENOSYS); if (!resume_frames) { - error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, - &resume_frames); - if (error) { - printf("error mapping gnttab share frames\n"); - return (error); + if (xen_pv_domain()) { + alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE, + M_DEVBUF, M_NOWAIT, 0, + ULONG_MAX, PAGE_SIZE, 0); + KASSERT((alloc_mem != NULL), + ("unable to alloc memory for gnttab")); + resume_frames = vtophys(alloc_mem); + } else { + error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes, + &resume_frames); + if (error) { + printf("error mapping gnttab share frames\n"); + return (error); + } } } diff --git a/sys/xen/pv.h b/sys/xen/pv.h new file mode 100644 index 0000000..bbb1048 --- /dev/null +++ b/sys/xen/pv.h @@ -0,0 +1,29 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ * + * $FreeBSD$ + */ + +#ifndef __XEN_PV_H__ +#define __XEN_PV_H__ + +int xen_pv_start_all_aps(void); +void xen_pv_set_init_ops(void); + +#endif /* __XEN_PV_H__ */ \ No newline at end of file diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h index 87644e9..70e4719 100644 --- a/sys/xen/xen-os.h +++ b/sys/xen/xen-os.h @@ -51,6 +51,11 @@ void force_evtchn_callback(void); extern shared_info_t *HYPERVISOR_shared_info; +extern start_info_t *HYPERVISOR_start_info; + +/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */ +extern struct xenstore_domain_interface *xen_store; +extern char *console_page; enum xen_domain_type { XEN_NATIVE, /* running on bare hardware */ @@ -78,6 +83,9 @@ xen_hvm_domain(void) return (xen_domain_type == XEN_HVM_DOMAIN); } +/* Debug function, prints directly to hypervisor console */ +void xen_early_printf(const char *, ...); + #ifndef xen_mb #define xen_mb() mb() #endif diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c index d404862..a4ef369 100644 --- a/sys/xen/xenstore/xenstore.c +++ b/sys/xen/xenstore/xenstore.c @@ -1079,12 +1079,6 @@ xs_init_comms(void) } /*------------------ Private Device Attachment Functions --------------------*/ -static void -xs_identify(driver_t *driver, device_t parent) -{ - - BUS_ADD_CHILD(parent, 0, "xenstore", 0); -} /** * Probe for the existance of the XenStore. 
@@ -1148,11 +1142,17 @@ xs_attach(device_t dev) struct proc *p; #ifdef XENHVM - xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); - xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); - xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); + if (xen_hvm_domain()) { + xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN); + xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN); + xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE); + } else if (xen_pv_domain()) { + xs.evtchn = HYPERVISOR_start_info->store_evtchn; + } else { + panic("Unknown domain type, cannot initialize xenstore\n"); + } #else - xs.evtchn = xen_start_info->store_evtchn; + xs.evtchn = HYPERVISOR_start_info->store_evtchn; #endif TAILQ_INIT(&xs.reply_list); @@ -1240,7 +1240,6 @@ xs_resume(device_t dev __unused) /*-------------------- Private Device Attachment Data -----------------------*/ static device_method_t xenstore_methods[] = { /* Device interface */ - DEVMETHOD(device_identify, xs_identify), DEVMETHOD(device_probe, xs_probe), DEVMETHOD(device_attach, xs_attach), DEVMETHOD(device_detach, bus_generic_detach), @@ -1263,9 +1262,8 @@ static devclass_t xenstore_devclass; #ifdef XENHVM DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0); -#else -DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); #endif +DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0); /*------------------------------- Sysctl Data --------------------------------*/ /* XXX Shouldn't the node be somewhere else? */ -- 1.7.7.5 (Apple Git-26)
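For anyone following the boot-option handling in pv.c (xen_setbootenv() and xen_boothowto()), here is a minimal userland sketch of the same idea: the comma-separated "extra" string from Xen is split in place into NUL-separated "name=value" entries, and a small table then maps known variable names to flag bits. This is only an illustration under assumptions, not the kernel code: the demo_* names and the DEMO_* flag values are hypothetical (the kernel uses the RB_* flags from <sys/reboot.h> and its own getenv()), and strchr() stands in for the strsep() call used in the patch so that the sketch builds as plain ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for illustration only (the kernel uses RB_*). */
#define DEMO_SINGLE	0x01
#define DEMO_VERBOSE	0x02
#define DEMO_SERIAL	0x04

/* Table mapping environment variable names to flag bits. */
static const struct {
	const char *ev;
	int mask;
} demo_howto_names[] = {
	{"boot_single", DEMO_SINGLE},
	{"boot_verbose", DEMO_VERBOSE},
	{"boot_serial", DEMO_SERIAL},
	{NULL, 0}
};

/*
 * Split "a=1,b=2" in place into "a=1\0b=2\0". Returns the first entry
 * and stores the number of entries in *count.
 */
static char *
demo_setbootenv(char *cmd_line, int *count)
{
	char *p;

	assert(cmd_line != NULL && count != NULL);

	/* Skip leading spaces. */
	while (*cmd_line == ' ')
		cmd_line++;

	/* Replace each ',' with NUL so every entry is its own string. */
	*count = 1;
	for (p = cmd_line; (p = strchr(p, ',')) != NULL; p++) {
		*p = '\0';
		(*count)++;
	}
	return (cmd_line);
}

/* Look up "name" among nentries NUL-separated "name=value" strings. */
static const char *
demo_getenv(const char *env, int nentries, const char *name)
{
	size_t nlen = strlen(name);
	int i;

	for (i = 0; i < nentries; i++) {
		if (strncmp(env, name, nlen) == 0 && env[nlen] == '=')
			return (env + nlen + 1);
		env += strlen(env) + 1;	/* step over this entry and its NUL */
	}
	return (NULL);
}

/* OR together the masks of every table entry present in the environment. */
static int
demo_boothowto(const char *env, int nentries)
{
	int i, howto = 0;

	for (i = 0; demo_howto_names[i].ev != NULL; i++)
		if (demo_getenv(env, nentries, demo_howto_names[i].ev) != NULL)
			howto |= demo_howto_names[i].mask;
	return (howto);
}
```

With a writable buffer holding "boot_verbose=1,boot_serial=YES", calling demo_setbootenv() and then demo_boothowto() on the result yields DEMO_VERBOSE | DEMO_SERIAL. As in the patch, only the presence of a variable matters, not its value.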
Hello, I've updated the branch once more to cope with the recent HEAD changes to SMAP parsing. As usual, the branch can be found at: http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=shortlog;h=refs/heads/pvh_v5 I've also created a wiki page that describes how to set up a FreeBSD PVH guest, in case anyone wants to give it a try :) http://wiki.xen.org/wiki/FreeBSD_PVH Thanks.