I didn't think this through well enough. It's an ABI change.

The reason Solaris worked at all before was that we didn't remove _PAGE_USER for kernel PTEs when compatibility was broken last time (3.0.3 I think). Thus the combination of our bug and the hypervisor's bug conspired to work.

Now that the hypervisor is fixed, we'll be getting _PAGE_GLOBAL on our kernel pages - not a good idea. But we can't just fix Solaris, because other hypervisors without the fix will then not be putting _PAGE_USER on kernel PTEs - much worse!!

I think the right thing to do is:

 - finally start the page listing incompatibilities on the Wiki (theoretical or otherwise) [1]

 - fix Solaris to add _PAGE_USER (or PT_USER as we know it) iff we have a 'broken' hypervisor. I'm not sure how to do that though, beyond an "if it's our hypervisor, or Xen 3.1.2 or higher" check. BTW it would be nice to see this in 3.1.2

Does that make sense, Keir?

thanks
john

[1]

This is the list I'm aware of that breaks Solaris domUs:

 * Xen 3.1.1 is broken for 64-bit in b75, b76 (6616864). xen-unstable post 2007-10-15 is OK, as is Xen 3.1
 * Xen 3.0.4 upstream is broken for Solaris domU (doesn't save/restore trap interrupt settings)
 * pre-3.0.4 doesn't work on 64-bit (changes in PTE handling?)
 * pre-3.0.4 doesn't work with SMP guests (spurious page fault code)
 * To quote Jan Beulich:

   Subject: [Xen-devel] c/s 15147 change to struct vcpu_register_vcpu_info

   This changeset changed the layout of the structure, and 3.1 as well as 2.6.23 use the old layout, while 3.1.1 uses the new one.

   We don't use this on Solaris yet, however

 * the fix for cmpxchg and PT_GLOBAL means that newer Solaris versions (or anything cmpxchg'ing a PTE) that correctly don't set PT_USER will break on hypervisors without this changeset: 16129:2173fe77dcd2 from xen-unstable

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Yes, deciding whether to add _PAGE_USER based on Xen version seems the best way to go.

Bear in mind that the ABI bug affects *only* cmpxchg of pagetables. Any of the following methods of writing a pte with _PAGE_USER set will also set _PAGE_GLOBAL (unless our software flag _PAGE_GUEST_KERNEL is also set):

 * update_va_mapping()
 * MMU_NORMAL_PT_UPDATE
 * direct modification of a not-yet-pinned pagetable (the _PAGE_GLOBAL will be added on each pte when the pagetable becomes pinned)

So, unless you *only* ever update kernel ptes with cmpxchg, you have quite a nasty problem with older Xen: some update methods will adjust the l1e, while direct cmpxchg won't.

Our Linux guests have not had problems because I'm pretty sure we basically never cmpxchg a kernel pte.

 -- Keir
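
To make the inconsistency described above concrete, here is a toy model in C. It is only a sketch built from the behaviour described in this thread, not Xen source: the function names are hypothetical, the bit positions are assumptions chosen for illustration, and the step that tags a kernel entry with the software flag is likewise assumed.

```c
#include <stdint.h>

/* Illustrative bit positions; assumptions for this sketch only. */
#define PTE_PRESENT      (1ULL << 0)
#define PTE_USER         (1ULL << 2)   /* _PAGE_USER, PT_USER on Solaris */
#define PTE_GLOBAL       (1ULL << 8)   /* _PAGE_GLOBAL, PT_GLOBAL on Solaris */
#define PTE_GUEST_KERNEL (1ULL << 12)  /* software flag; position assumed */

/*
 * Hypothetical model of an update path on which the hypervisor adjusts
 * the entry (update_va_mapping(), MMU_NORMAL_PT_UPDATE, pinning).
 */
static uint64_t model_adjusted_write(uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return pte;                           /* nothing to adjust */
    if (!(pte & PTE_USER))
        pte |= PTE_GUEST_KERNEL | PTE_USER;   /* kernel entry: tag it, add USER */
    if (!(pte & PTE_GUEST_KERNEL))
        pte |= PTE_GLOBAL | PTE_USER;         /* untagged entry: also gets GLOBAL */
    return pte;
}

/*
 * Hypothetical model of a direct cmpxchg on a hypervisor without the
 * fix: the value is stored exactly as the guest supplied it.
 */
static uint64_t model_unfixed_cmpxchg(uint64_t pte)
{
    return pte;
}
```

In this model a kernel entry written without PTE_USER ends up with PTE_USER through the adjusted path but not through the unfixed cmpxchg path, and an entry written with PTE_USER but without the software flag picks up PTE_GLOBAL. That is the pair of behaviours a guest mixing both update methods would see on an older hypervisor.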

On 18/10/07 20:58, "John Levon" <levon@movementarian.org> wrote:

> * To quote Jan Beulich:
>
>   Subject: [Xen-devel] c/s 15147 change to struct vcpu_register_vcpu_info
>
>   This changeset changed the layout of the structure, and 3.1 as well as
>   2.6.23 use the old layout, while 3.1.1 uses the new one.
>
>   We don't use this on Solaris yet, however

Since the old structure layout was never present in a stable release of Xen (that operation was not supported at all in 3.1.0) this one doesn't really belong on the incompatibility list.

 -- Keir

Keir Fraser wrote:
> On 18/10/07 20:58, "John Levon" <levon@movementarian.org> wrote:
>
>> * To quote Jan Beulich:
>>
>>   Subject: [Xen-devel] c/s 15147 change to struct vcpu_register_vcpu_info
>>
>>   This changeset changed the layout of the structure, and 3.1 as well as
>>   2.6.23 use the old layout, while 3.1.1 uses the new one.
>>
>>   We don't use this on Solaris yet, however
>
> Since the old structure layout was never present in a stable release of Xen
> (that operation was not supported at all in 3.1.0) this one doesn't really
> belong on the incompatibility list.

I think the structure was present, but there was no implementation to back it up. It's remotely possible someone decided to implement the vcpu_info placement without ever testing it, but it seems unlikely.

    J

On Fri, Oct 19, 2007 at 08:38:14AM +0100, Keir Fraser wrote:

> Yes, deciding whether to add _PAGE_USER based on Xen version seems the best
> way to go.

OK.

> Bear in mind that the ABI bug affects *only* cmpxchg of
> pagetables.

Yep. Unfortunately we use that rather heavily, although Linux doesn't :)

regards
john
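
For reference, the version-based decision agreed on in this thread could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the Solaris implementation: both function names are hypothetical, the caller is assumed to have already obtained the major and minor numbers and the extraversion string (for example ".2") from the hypervisor, `vendor_patched` stands in for the "it's our hypervisor" case, and 3.1.2 is assumed to be the first upstream release carrying changeset 16129:2173fe77dcd2, which the thread only requests.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: true if the hypervisor is assumed to carry the
 * cmpxchg fix, i.e. it is a vendor build known to include it or it
 * reports Xen 3.1.2 or higher.
 */
static bool xen_has_cmpxchg_fix(unsigned major, unsigned minor,
                                const char *extra, bool vendor_patched)
{
    unsigned long micro = 0;

    if (vendor_patched)
        return true;

    /* An extraversion such as ".2" or ".2-rc1" carries the micro number. */
    if (extra != NULL && extra[0] == '.')
        micro = strtoul(extra + 1, NULL, 10);

    if (major != 3)
        return major > 3;
    if (minor != 1)
        return minor > 1;
    return micro >= 2;
}

/* Hypothetical helper: add PT_USER to kernel PTEs iff the hypervisor is 'broken'. */
static bool kernel_pte_needs_pt_user(unsigned major, unsigned minor,
                                     const char *extra, bool vendor_patched)
{
    return !xen_has_cmpxchg_fix(major, minor, extra, vendor_patched);
}
```

A pure version comparison cannot tell an unstable or vendor tree with the changeset from one without it when both report the same version, which is why the sketch takes the separate `vendor_patched` input instead of inferring it.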