Chris Wright wrote:
> * Zachary Amsden (zach@vmware.com) wrote:
>> Does Xen assume page aligned descriptor tables? I assume from this
>
> Yes.
>
>> patch and snippets I have gathered from others, that is a yes, and other
>> things here imply that DT pages are not shadowed. If so, Xen itself
>> must have live segments in the GDT pages, so how do you allocate space
>> for the per-CPU GDT pages on SMP?
>
> early during boot.

Doesn't that require 16 pages per CPU?  That seems excessive to impose
on a native build.  Perhaps we could get away with 1 page per CPU for
the GDT on native boots and bump that up to 16 if compiling for a
virtualized sub-architecture - i.e. move the GDT to a page-aligned
struct for native (doesn't cost too much), and give it MACH_GDT_PAGES
of space, which is defined by the sub-architecture.

Let's take this thread over to virtualization@lists.osdl.org as well.

Zach
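A minimal sketch of the page-aligned GDT idea, for concreteness
(MACH_GDT_PAGES is only the name proposed above, not an existing kernel
symbol, and the layout here is illustrative):

	/* mach-default would define MACH_GDT_PAGES as 1; a hypervisor
	 * sub-arch could define it as 16.  Common code just sees one
	 * page-aligned, sub-arch-sized GDT region per CPU. */
	union gdt_space {
		struct desc_struct gdt[GDT_ENTRIES];
		char pad[MACH_GDT_PAGES * PAGE_SIZE];
	} __attribute__((__aligned__(PAGE_SIZE)));

	static union gdt_space cpu_gdt_space[NR_CPUS];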
Martin J. Bligh wrote:
> I can easily leave those bits out. There's going to be lots of bits common
> with std i386, and bits that are common amongst the hypervisor layers,
> then bits that are specific. Hopefully more bits that are common, but
> still.
>
> Humpf. I shall go back into my corner and have a rethink. Will read through
> your patches some more, then send you some email.

Pulling LKML this time.

For the most part, I have encapsulated everything Xen could possibly
need from a CPU / page table / descriptor table point of view: all
privileged operations, all sensitive operations, and all privileged
data access.  The easiest way to go about this is probably to find the
common code we share in this area and get our trees cleaned to that
level.

This leaves behind the higher-level operations.  I have deliberately
refrained from introducing higher-level operations and hints at this
stage, because I know they might require a bit of a rethink when the
Xen picture becomes more clear.  It appears the first high-level
operations you have provided are the set_page_writable /
set_page_readonly hypervisor interfaces, which I think are a great
start.  Since I haven't tried to introduce these types of interfaces
yet, you can expect very little collision with the important bits in
your tree.

I think you'll find going file by file through the low-level stuff will
be quite easy, since I've encapsulated for the most part the minimal
superset of what Xen could possibly need.  I do expect some issues with
the MMU patches; the dual compile-time PAE / non-PAE stuff throws a
giant monkeywrench into the middle of the code when trying to rework it
for a hypervisor.  I've attached my MMU patch so you can take a look.

I do have patches for higher-level interfaces that I will be sending
along to the virtualization list so we can get cohesion on the best
approach before moving forward with more hints and high-level paradigms
to LKML.

Zach

-------------- next part --------------
i386 Transparent paravirtualization sub-arch patch #8.

Transparent paravirtualization support for MMU operations.  All
operations which update live page table entries have been moved to the
sub-architecture layer.  Unfortunately, this required yet another
parallel set of pgtable-Nlevel-ops.h files, but this avoids the
ugliness of having to use #ifdefs all over the code.

This is pure code motion.  Anything else would be a bug.

Signed-off-by: Zachary Amsden <zach@vmware.com>

Index: linux-2.6.13/include/asm-i386/pgtable-2level.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/pgtable-2level.h	2005-08-04 13:42:31.000000000 -0700
+++ linux-2.6.13/include/asm-i386/pgtable-2level.h	2005-08-04 14:02:16.000000000 -0700
@@ -8,17 +8,6 @@
 #define pgd_ERROR(e) \
 	printk("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
-/*
- * Certain architectures need to do special things when PTEs
- * within a page table are directly modified.  Thus, the following
- * hook is made available.
- */
-#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-#define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval)
-#define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
-
-#define ptep_get_and_clear(mm,addr,xp)	__pte(xchg(&(xp)->pte_low, 0))
 
 #define pte_same(a, b)		((a).pte_low == (b).pte_low)
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
 #define pte_none(x)		(!(x).pte_low)

Index: linux-2.6.13/include/asm-i386/pgtable-3level.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/pgtable-3level.h	2005-08-04 13:42:31.000000000 -0700
+++ linux-2.6.13/include/asm-i386/pgtable-3level.h	2005-08-04 14:02:16.000000000 -0700
@@ -44,28 +44,6 @@
 	return pte_x(pte);
 }
 
-/* Rules for using set_pte: the pte being assigned *must* be
- * either not present or in a state where the hardware will
- * not attempt to update the pte.  In places where this is
- * not possible, use pte_get_and_clear to obtain the old pte
- * value and then use set_pte to update it.  -ben
- */
-static inline void set_pte(pte_t *ptep, pte_t pte)
-{
-	ptep->pte_high = pte.pte_high;
-	smp_wmb();
-	ptep->pte_low = pte.pte_low;
-}
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-
-#define __HAVE_ARCH_SET_PTE_ATOMIC
-#define set_pte_atomic(pteptr,pteval) \
-		set_64bit((unsigned long long *)(pteptr),pte_val(pteval))
-#define set_pmd(pmdptr,pmdval) \
-		set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
-#define set_pud(pudptr,pudval) \
-		(*(pudptr) = (pudval))
-
 /*
  * Pentium-II erratum A13: in PAE mode we explicitly have to flush
  * the TLB via cr3 if the top-level pgd is changed...
@@ -90,18 +68,6 @@
 #define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \
 			pmd_index(address))
 
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	pte_t res;
-
-	/* xchg acts as a barrier before the setting of the high bits */
-	res.pte_low = xchg(&ptep->pte_low, 0);
-	res.pte_high = ptep->pte_high;
-	ptep->pte_high = 0;
-
-	return res;
-}
-
 static inline int pte_same(pte_t a, pte_t b)
 {
 	return a.pte_low == b.pte_low && a.pte_high == b.pte_high;

Index: linux-2.6.13/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/pgtable.h	2005-08-04 13:47:07.000000000 -0700
+++ linux-2.6.13/include/asm-i386/pgtable.h	2005-08-04 14:02:29.000000000 -0700
@@ -201,11 +201,9 @@
 extern unsigned long pg0[];
 
 #define pte_present(x)	((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
-#define pte_clear(mm,addr,xp)	do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
 
 #define pmd_none(x)	(!pmd_val(x))
 #define pmd_present(x)	(pmd_val(x) & _PAGE_PRESENT)
-#define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
 #define	pmd_bad(x)	((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
@@ -243,20 +241,12 @@
 #else
 # include <asm/pgtable-2level.h>
 #endif
+#include <pgtable-ops.h>
+
+#define set_pte_at(mm,addr,pteptr,pteval) set_pte(pteptr,pteval)
 
-static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
-{
-	if (!pte_dirty(*ptep))
-		return 0;
-	return test_and_clear_bit(_PAGE_BIT_DIRTY, &ptep->pte_low);
-}
-
-static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
-{
-	if (!pte_young(*ptep))
-		return 0;
-	return test_and_clear_bit(_PAGE_BIT_ACCESSED, &ptep->pte_low);
-}
+#define pte_clear(mm,addr,xp)	do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+#define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
 
 static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long addr, pte_t *ptep, int full)
 {
@@ -270,26 +260,6 @@
 	return pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
-	clear_bit(_PAGE_BIT_RW, &ptep->pte_low);
-}
-
-/*
- * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
- *
- *  dst - pointer to pgd range anwhere on a pgd page
- *  src - ""
- *  count - the number of pgds to copy.
- *
- * dst and src can be on the same page, but the range must not overlap,
- * and must not cross a page boundary.
- */
-static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
-{
-	memcpy(dst, src, count * sizeof(pgd_t));
-}
-
 /*
  * Macro to mark a page protection value as "uncacheable".  On processors which do not support
  * it, this is a no-op.
@@ -414,14 +384,6 @@
  * bit at the same time.
 */
 #define update_mmu_cache(vma,address,pte) do { } while (0)
 
-#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
-#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-	do {								  \
-		if (__dirty) {						  \
-			(__ptep)->pte_low = (__entry).pte_low;		  \
-			flush_tlb_page(__vma, __address);		  \
-		}							  \
-	} while (0)
 
 #endif /* !__ASSEMBLY__ */

Index: linux-2.6.13/include/asm-i386/mach-default/pgtable-2level-ops.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/mach-default/pgtable-2level-ops.h	2005-08-04 14:02:04.000000000 -0700
+++ linux-2.6.13/include/asm-i386/mach-default/pgtable-2level-ops.h	2005-08-04 14:02:16.000000000 -0700
@@ -0,0 +1,15 @@
+#ifndef _MACH_PGTABLE_LEVEL_OPS_H
+#define _MACH_PGTABLE_LEVEL_OPS_H
+
+/*
+ * Certain architectures need to do special things when PTEs
+ * within a page table are directly modified.  Thus, the following
+ * hook is made available.
+ */
+#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
+#define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval)
+#define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
+
+#define ptep_get_and_clear(mm,addr,xp)	__pte(xchg(&(xp)->pte_low, 0))
+
+#endif /* _PGTABLE_OPS_H */

Index: linux-2.6.13/include/asm-i386/mach-default/pgtable-3level-ops.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/mach-default/pgtable-3level-ops.h	2005-08-04 14:02:04.000000000 -0700
+++ linux-2.6.13/include/asm-i386/mach-default/pgtable-3level-ops.h	2005-08-04 14:02:16.000000000 -0700
@@ -0,0 +1,37 @@
+#ifndef _MACH_PGTABLE_LEVEL_OPS_H
+#define _MACH_PGTABLE_LEVEL_OPS_H
+
+/* Rules for using set_pte: the pte being assigned *must* be
+ * either not present or in a state where the hardware will
+ * not attempt to update the pte.  In places where this is
+ * not possible, use pte_get_and_clear to obtain the old pte
+ * value and then use set_pte to update it.  -ben
+ */
+static inline void set_pte(pte_t *ptep, pte_t pte)
+{
+	ptep->pte_high = pte.pte_high;
+	smp_wmb();
+	ptep->pte_low = pte.pte_low;
+}
+
+#define __HAVE_ARCH_SET_PTE_ATOMIC
+#define set_pte_atomic(pteptr,pteval) \
+		set_64bit((unsigned long long *)(pteptr),pte_val(pteval))
+#define set_pmd(pmdptr,pmdval) \
+		set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
+#define set_pud(pudptr,pudval) \
+		(*(pudptr) = (pudval))
+
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	pte_t res;
+
+	/* xchg acts as a barrier before the setting of the high bits */
+	res.pte_low = xchg(&ptep->pte_low, 0);
+	res.pte_high = ptep->pte_high;
+	ptep->pte_high = 0;
+
+	return res;
+}
+
+#endif

Index: linux-2.6.13/include/asm-i386/mach-default/pgtable-ops.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/mach-default/pgtable-ops.h	2005-08-04 14:02:04.000000000 -0700
+++ linux-2.6.13/include/asm-i386/mach-default/pgtable-ops.h	2005-08-04 14:02:44.000000000 -0700
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2005, VMware, Inc.
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send feedback to zach@vmware.com
+ *
+ */
+
+#ifndef _PGTABLE_OPS_H
+#define _PGTABLE_OPS_H
+
+#ifdef CONFIG_X86_PAE
+# include <pgtable-3level-ops.h>
+#else
+# include <pgtable-2level-ops.h>
+#endif
+
+static inline int ptep_test_and_clear_dirty(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+{
+	if (!pte_dirty(*ptep))
+		return 0;
+	return test_and_clear_bit(_PAGE_BIT_DIRTY, &ptep->pte_low);
+}
+
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+{
+	if (!pte_young(*ptep))
+		return 0;
+	return test_and_clear_bit(_PAGE_BIT_ACCESSED, &ptep->pte_low);
+}
+
+static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	clear_bit(_PAGE_BIT_RW, &ptep->pte_low);
+}
+
+/*
+ * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
+ *
+ *  dst - pointer to pgd range anwhere on a pgd page
+ *  src - ""
+ *  count - the number of pgds to copy.
+ *
+ * dst and src can be on the same page, but the range must not overlap,
+ * and must not cross a page boundary.
+ */
+static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
+{
+	memcpy(dst, src, count * sizeof(pgd_t));
+}
+
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
+	do {								  \
+		if (__dirty) {						  \
+			(__ptep)->pte_low = (__entry).pte_low;		  \
+			flush_tlb_page(__vma, __address);		  \
+		}							  \
+	} while (0)
+
+#endif /* _PGTABLE_OPS_H */
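To make the hook concrete: a hypervisor sub-arch would supply its own
pgtable-2level-ops.h, and every live page table write in common code
would funnel through it.  The following is only a sketch - the
hypervisor_update_pte() family are made-up names standing in for
whatever hypercall or queued-update mechanism a real hypervisor
provides:

/* include/asm-i386/mach-hypervisor/pgtable-2level-ops.h (sketch) */
#ifndef _MACH_PGTABLE_LEVEL_OPS_H
#define _MACH_PGTABLE_LEVEL_OPS_H

/* Placeholders for the hypervisor's real update primitives. */
extern void hypervisor_update_pte(pte_t *ptep, pte_t pteval);
extern void hypervisor_update_pmd(pmd_t *pmdptr, pmd_t pmdval);
extern pte_t hypervisor_xchg_pte(pte_t *ptep);

/* Same names and signatures as mach-default, different implementation,
 * so common code compiles unchanged. */
#define set_pte(pteptr, pteval)		hypervisor_update_pte(pteptr, pteval)
#define set_pte_atomic(pteptr, pteval)	set_pte(pteptr, pteval)
#define set_pmd(pmdptr, pmdval)		hypervisor_update_pmd(pmdptr, pmdval)

#define ptep_get_and_clear(mm,addr,xp)	hypervisor_xchg_pte(xp)

#endif /* _MACH_PGTABLE_LEVEL_OPS_H */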
Martin J. Bligh wrote:
> So I'd really, really rather merge stuff together in a smaller group
> first - and yes, mea culpa for throwing stuff out on LKML to start with
> ... I'll assign that to frustration from mindless hours of patch
> refactoring.

Agreed.  How about a file-by-file merge from 2.6.13-rc4-mm1 (prior to
the code movement changes), and then we work our differences into
updated patches for those code movement changes to rc5-mm1?  This
alleviates a lot of grief on Andrew's part, and lets us present a
better united front when pushing the code movement to LKML again (as
you all saw, there was definitely some objection to this).

We've got selected movement from the following:

include/asm-i386/desc.h -> include/asm-i386/mach-default/mach_desc.h
include/asm-i386/io.h -> include/asm-i386/mach-default/mach_io.h
include/asm-i386/msr.h -> include/asm-i386/mach-default/mach_msr.h
include/asm-i386/processor.h -> include/asm-i386/mach-default/mach_processor.h
include/asm-i386/system.h -> include/asm-i386/mach-default/mach_system.h
include/asm-i386/tlbflush.h -> include/asm-i386/mach-default/mach_tlbflush.h

And two oddballs:

include/asm-i386/pgtable* -> various
include/asm-i386/ptrace.h -> include/asm-i386/mach-default/mach_segment.h

Want to start at the top?  Here are my patches for descriptor-related
goo.

-------------- next part --------------
Introduce a write accessor for updating the current LDT.  This is
required for hypervisors like Xen that do not allow LDT pages to be
directly written.

Testing - here's a fun little LDT test that can be trivially modified
to test limits as well.

/*
 * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
 * This is licensed under the GPL.
 */
#include <stdio.h>
#include <signal.h>
#include <asm/ldt.h>
#include <asm/segment.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#define __KERNEL__
#include <asm/page.h>

int main(void)
{
	struct user_desc desc;
	char *code;
	unsigned long long tsc;

	code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	desc.entry_number = 0;
	desc.base_addr = (unsigned long)code;
	desc.limit = 1;
	desc.seg_32bit = 1;
	desc.contents = MODIFY_LDT_CONTENTS_CODE;
	desc.read_exec_only = 0;
	desc.limit_in_pages = 1;
	desc.seg_not_present = 0;
	desc.useable = 1;
	if (modify_ldt(1, &desc, sizeof(desc)) != 0) {
		perror("modify_ldt");
		return 1;
	}
	printf("code base is 0x%08x\n", (unsigned)code);
	code[0x0ffe] = 0x0f;	/* rdtsc */
	code[0x0fff] = 0x31;
	code[0x1000] = 0xcb;	/* lret */
	__asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc));
	printf("TSC is 0x%016llx\n", tsc);
	return 0;
}

Signed-off-by: Zachary Amsden <zach@vmware.com>

Index: linux-2.6.13/arch/i386/kernel/ldt.c
===================================================================
--- linux-2.6.13.orig/arch/i386/kernel/ldt.c	2005-08-03 15:44:24.000000000 -0700
+++ linux-2.6.13/arch/i386/kernel/ldt.c	2005-08-03 15:48:53.000000000 -0700
@@ -177,7 +177,7 @@
 static int write_ldt(void __user * ptr, unsigned long bytecount, int oldmode)
 {
 	struct mm_struct * mm = current->mm;
-	__u32 entry_1, entry_2, *lp;
+	__u32 entry_1, entry_2;
 	int error;
 	struct user_desc ldt_info;
 
@@ -205,8 +205,6 @@
 		goto out_unlock;
 	}
 
-	lp = (__u32 *) ((ldt_info.entry_number << 3) + (char *) mm->context.ldt);
-
 	/* Allow LDTs to be cleared by the user.
 	 */
 	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
 		if (oldmode || LDT_empty(&ldt_info)) {
@@ -223,8 +221,7 @@
 
 	/* Install the new entry ...  */
 install:
-	*lp	= entry_1;
-	*(lp+1)	= entry_2;
+	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, entry_1, entry_2);
 	error = 0;
 
 out_unlock:

Index: linux-2.6.13/include/asm-i386/desc.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/desc.h	2005-08-03 15:44:24.000000000 -0700
+++ linux-2.6.13/include/asm-i386/desc.h	2005-08-03 16:17:25.000000000 -0700
@@ -96,6 +96,13 @@
 	(info)->seg_not_present	== 1	&& \
 	(info)->useable		== 0	)
 
+static inline void write_ldt_entry(void *ldt, int entry, __u32 entry_a, __u32 entry_b)
+{
+	__u32 *lp = (__u32 *)((char *)ldt + entry*8);
+	*lp	= entry_a;
+	*(lp+1)	= entry_b;
+}
+
 #if TLS_SIZE != 24
 # error update this code.
 #endif

-------------- next part --------------
i386 Transparent paravirtualization subarch patch #5

This change encapsulates descriptor and task register management.

Diffs against: 2.6.13-rc4-mm1

Signed-off-by: Zachary Amsden <zach@vmware.com>

Index: linux-2.6.13/include/asm-i386/desc.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/desc.h	2005-08-03 16:24:09.000000000 -0700
+++ linux-2.6.13/include/asm-i386/desc.h	2005-08-03 16:31:40.000000000 -0700
@@ -27,19 +27,6 @@
 
 extern struct Xgt_desc_struct idt_descr, cpu_gdt_descr[NR_CPUS];
 
-#define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
-#define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))
-
-#define load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
-#define load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
-#define load_tr(tr) __asm__ __volatile("ltr %0"::"mr" (tr))
-#define load_ldt(ldt) __asm__ __volatile("lldt %0"::"mr" (ldt))
-
-#define store_gdt(dtr) __asm__ ("sgdt %0":"=m" (*dtr))
-#define store_idt(dtr) __asm__ ("sidt %0":"=m" (*dtr))
-#define store_tr(tr) __asm__ ("str %0":"=mr" (tr))
-#define store_ldt(ldt) __asm__ ("sldt %0":"=mr" (ldt))
-
 /*
  * This is the ldt that every process will get unless we need
  * something other than this.
@@ -58,19 +45,10 @@
 		"rorl $16,%1" \
 	: "=m"(*(n)) : "q" (addr), "r"(n), "ir"(limit), "i"(type))
 
-static inline void __set_tss_desc(unsigned int cpu, unsigned int entry, void *addr)
-{
-	_set_tssldt_desc(&per_cpu(cpu_gdt_table, cpu)[entry], (int)addr,
-		offsetof(struct tss_struct, __cacheline_filler) - 1, 0x89);
-}
+#include <mach_desc.h>
 
 #define set_tss_desc(cpu,addr) __set_tss_desc(cpu, GDT_ENTRY_TSS, addr)
 
-static inline void set_ldt_desc(unsigned int cpu, void *addr, unsigned int size)
-{
-	_set_tssldt_desc(&per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_LDT], (int)addr, ((size << 3)-1), 0x82);
-}
-
 #define LDT_entry_a(info) \
 	((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff))
@@ -96,24 +74,6 @@
 	(info)->seg_not_present	== 1	&& \
 	(info)->useable		== 0	)
 
-static inline void write_ldt_entry(void *ldt, int entry, __u32 entry_a, __u32 entry_b)
-{
-	__u32 *lp = (__u32 *)((char *)ldt + entry*8);
-	*lp	= entry_a;
-	*(lp+1)	= entry_b;
-}
-
-#if TLS_SIZE != 24
-# error update this code.
-#endif
-
-static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
-{
-#define C(i) per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]
-	C(0); C(1); C(2);
-#undef C
-}
-
 static inline void clear_LDT(void)
 {
 	int cpu = get_cpu();

Index: linux-2.6.13/include/asm-i386/mach-default/mach_desc.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/mach-default/mach_desc.h	2005-08-03 16:31:40.000000000 -0700
+++ linux-2.6.13/include/asm-i386/mach-default/mach_desc.h	2005-08-03 16:32:52.000000000 -0700
@@ -0,0 +1,83 @@
+/*
+ * Copyright (C) 2005, VMware, Inc.
+ * Copyright (C) 1992-2004, Linus Torvalds and authors
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ */
+
+#ifndef __MACH_DESC_H
+#define __MACH_DESC_H
+
+#define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
+#define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))
+
+#define load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+#define load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+#define load_tr(tr) __asm__ __volatile("ltr %0"::"mr" (tr))
+#define load_ldt(ldt) __asm__ __volatile("lldt %0"::"mr" (ldt))
+
+#define store_gdt(dtr) __asm__ ("sgdt %0":"=m" (*dtr))
+#define store_idt(dtr) __asm__ ("sidt %0":"=m" (*dtr))
+#define store_tr(tr) __asm__ ("str %0":"=mr" (tr))
+#define store_ldt(ldt) __asm__ ("sldt %0":"=mr" (ldt))
+
+static inline unsigned int get_TR_desc(void)
+{
+	unsigned int tr;
+	__asm__ ("str %w0":"=q" (tr));
+	return tr;
+}
+
+static inline unsigned int get_LDT_desc(void)
+{
+	unsigned int ldt;
+	__asm__ ("sldt %w0":"=q" (ldt));
+	return ldt;
+}
+
+static inline void __set_tss_desc(unsigned int cpu, unsigned int entry, void *addr)
+{
+	_set_tssldt_desc(&per_cpu(cpu_gdt_table, cpu)[entry], (int)addr,
+		offsetof(struct tss_struct, __cacheline_filler) - 1, 0x89);
+}
+
+static inline void set_ldt_desc(unsigned int cpu, void *addr, unsigned int size)
+{
+	_set_tssldt_desc(&per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_LDT], (int)addr, ((size << 3)-1), 0x82);
+}
+
+static inline void write_ldt_entry(void *ldt, int entry, __u32 entry_a, __u32 entry_b)
+{
+	__u32 *lp = (__u32 *)((char *)ldt + entry*8);
+	*lp	= entry_a;
+	*(lp+1)	= entry_b;
+}
+
+#if TLS_SIZE != 24
+# error update this code.
+#endif
+
+static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
+{
+#define C(i) per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]
+	C(0); C(1); C(2);
+#undef C
+}
+
+#endif

-------------- next part --------------
When reviewing GDT updates, I found the code:

	set_tss_desc(cpu,t);	/* This just modifies memory; ... */
	per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;

The second line is unnecessary, since set_tss_desc() has already
cleared the busy bit.

Commented disassembly, line 1:

c028b8bd:	8b 0c 86	mov    (%esi,%eax,4),%ecx
c028b8c0:	01 cb		add    %ecx,%ebx
c028b8c2:	8d 0c 39	lea    (%ecx,%edi,1),%ecx

	=> %ecx = per_cpu(cpu_gdt_table, cpu)

c028b8c5:	8d 91 80 00 00 00	lea    0x80(%ecx),%edx

	=> %edx = &per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS]

c028b8cb:	66 c7 42 00 73 20	movw   $0x2073,0x0(%edx)
c028b8d1:	66 89 5a 02		mov    %bx,0x2(%edx)
c028b8d5:	c1 cb 10		ror    $0x10,%ebx
c028b8d8:	88 5a 04		mov    %bl,0x4(%edx)
c028b8db:	c6 42 05 89		movb   $0x89,0x5(%edx)

	=> ((char *)%edx)[5] = 0x89
	(equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] = 0x89

c028b8df:	c6 42 06 00		movb   $0x0,0x6(%edx)
c028b8e3:	88 7a 07		mov    %bh,0x7(%edx)
c028b8e6:	c1 cb 10		ror    $0x10,%ebx

	=> other bits

Commented disassembly, line 2:

c028b8e9:	8b 14 86	mov    (%esi,%eax,4),%edx
c028b8ec:	8d 04 3a	lea    (%edx,%edi,1),%eax

	=> %eax = per_cpu(cpu_gdt_table, cpu)

c028b8ef:	81 a0 84 00 00 00 ff fd ff ff	andl   $0xfffffdff,0x84(%eax)

	=> per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
	(equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] &= 0xfd

Note that (0x89 & ~0xfd) == 0; i.e., set_tss_desc(cpu,t) has already
stored the type field in the GDT with the busy bit clear.

Eliminating redundant and obscure code is always a good thing; in fact,
I pointed out this same optimization many moons ago in
arch/i386/setup.c, back when it used to be called that.

Signed-off-by: Zachary Amsden <zach@vmware.com>

Index: linux-2.6.13/arch/i386/power/cpu.c
===================================================================
--- linux-2.6.13.orig/arch/i386/power/cpu.c	2005-08-02 17:06:21.000000000 -0700
+++ linux-2.6.13/arch/i386/power/cpu.c	2005-08-03 15:27:57.000000000 -0700
@@ -84,7 +84,6 @@
 	struct tss_struct * t = &per_cpu(init_tss, cpu);
 
 	set_tss_desc(cpu,t);	/* This just modifies memory; should not be necessary. But... This is necessary, because 386 hardware has concept of busy TSS or some similar stupidity. */
-	per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
 	load_TR_desc();	/* This does ltr */
 	load_LDT(&current->active_mm->context);	/* This does lldt */
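The busy-bit claim is easy to sanity-check in userspace; this is just
the mask arithmetic from the disassembly above, not kernel code:

#include <assert.h>

int main(void)
{
	/* Byte 5 of a segment descriptor is the access byte.
	 * set_tss_desc() stores 0x89 there: present, DPL 0, type 0x9
	 * (32-bit available TSS).  The busy bit is bit 1 of that byte
	 * (the type becomes 0xB when busy). */
	unsigned char access = 0x89;

	/* The deleted andl $0xfffffdff clears bit 9 of the .b word,
	 * i.e. bit 1 of byte 5 - already zero in 0x89. */
	assert((access & 0xfd) == access);
	return 0;
}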
* Zachary Amsden (zach@vmware.com) wrote:
> Doesn't that require 16 pages per CPU?  That seems excessive to impose
> on a native build.  Perhaps we could get away with 1 page per CPU for
> the GDT on native boots and bump that up to 16 if compiling for a
> virtualized sub-architecture - i.e. move the GDT to a page-aligned
> struct for native (doesn't cost too much), and give it MACH_GDT_PAGES
> of space, which is defined by the sub-architecture.

For Xen, the GDT is one page per CPU (so it's not one page per table
entry).

thanks,
-chris