Stephen C. Tweedie
2008-Mar-19 15:39 UTC
[Xen-devel] Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Hi, On paravirt x86 (both 32- and 64-bit), since cset 13998: http://xenbits.xensource.com/xen-unstable.hg?rev/13998 we translate all ptes from being mfn-based to pfn-based when the hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages, which appear to the HV to be non-present, but which are special-cased in the kernel to appear present (a different bit in the pte remains set for these pages and is caught by the pte_present() tests.) Unfortunately, it looks like recent X servers are attempting to do mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we do so, the translation of mfn to pfn results on x86_64 in end_pfn: maddr.h: static inline unsigned long mfn_to_pfn(unsigned long mfn) { ... if (unlikely((mfn >> machine_to_phys_order) != 0)) return end_pfn; and when we do mprotect(PROT_READ) later on on the same ptes, this end_pfn value is illegal: maddr.h: static inline unsigned long pfn_to_mfn(unsigned long pfn) { BUG_ON(end_pfn && pfn >= end_pfn); so we BUG(). It needs both an updated X and an updated kernel to show the bug, but given that, this results in an instant, completely repeatable kernel panic on starting X on both 32- and 64-bits on some hardware. Any suggestions? The obvious fix is to special-case these mfn_to_pfn translations so that they can be recognised as "untranslated" by a later pfn_to_mfn, but ideally we''d want such special pfns to be easily recognised so that we don''t completely lose the value of the BUG_ON() above. We''d also ideally like the HV to be able to spot such pte contents, as they won''t (indeed, cannot) be translated on migrate. That''s not a problem for dom0, of course, but might be for domUs with pci passthrough . Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-19 16:00 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests, but can be picked up and converted back to a valid mfn by pfn_to_mfn(). The key is that most of the time invalid pfns are explicitly == end_pfn, or max_page, or similar, so we are distinguishing from those and also can still bug on that specific value in pfn_to_mfn(). As for picking this up in the save/restore code -- sounds a bit tricky to me. We''re better off not allowing migration of a I/O-privileged domain in the first place. And indeed I believe the tools already have some such safety checks. -- Keir On 19/3/08 15:39, "Stephen C. Tweedie" <sct@redhat.com> wrote:> Hi, > > On paravirt x86 (both 32- and 64-bit), since cset 13998: > > http://xenbits.xensource.com/xen-unstable.hg?rev/13998 > > we translate all ptes from being mfn-based to pfn-based when the > hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages, > which appear to the HV to be non-present, but which are special-cased in > the kernel to appear present (a different bit in the pte remains set for > these pages and is caught by the pte_present() tests.) > > Unfortunately, it looks like recent X servers are attempting to do > mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we > do so, the translation of mfn to pfn results on x86_64 in end_pfn: > > maddr.h: > static inline unsigned long mfn_to_pfn(unsigned long mfn) > { > ... > if (unlikely((mfn >> machine_to_phys_order) != 0)) > return end_pfn; > > and when we do mprotect(PROT_READ) later on on the same ptes, this > end_pfn value is illegal: > > maddr.h: > static inline unsigned long pfn_to_mfn(unsigned long pfn) > { > BUG_ON(end_pfn && pfn >= end_pfn); > > so we BUG(). > > It needs both an updated X and an updated kernel to show the bug, but > given that, this results in an instant, completely repeatable kernel > panic on starting X on both 32- and 64-bits on some hardware. > > > Any suggestions? The obvious fix is to special-case these mfn_to_pfn > translations so that they can be recognised as "untranslated" by a later > pfn_to_mfn, but ideally we''d want such special pfns to be easily > recognised so that we don''t completely lose the value of the BUG_ON() > above. > > We''d also ideally like the HV to be able to spot such pte contents, as > they won''t (indeed, cannot) be translated on migrate. That''s not a > problem for dom0, of course, but might be for domUs with pci > passthrough . > > Cheers, > Stephen > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephen C. Tweedie
2008-Mar-19 16:31 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Hi, On Wed, 2008-03-19 at 16:00 +0000, Keir Fraser wrote:> Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests,Right...> The key > is that most of the time invalid pfns are explicitly == end_pfn, or > max_page, or similar, so we are distinguishing from those and also can still > bug on that specific value in pfn_to_mfn().Yep, it will still let pfn_to_mfn pick up that magic number --- it won''t catch corrupted ptes, though, which is a shame. ~mfn on its own probably won''t work, though. On non-PAE, we store mfns in a pte that only has 20 bits for the information, so we have to deal with that restricted range. It might be easier to do this at the pte_machine_to_phys level instead, where we can potentially take advantage of other bits of the pte to encode the special casing.> As for picking this up in the save/restore code -- sounds a bit tricky to > me. We''re better off not allowing migration of a I/O-privileged domain in > the first place.Right, clearly. Ideally, though, we''d make this mechanism robust, and either have the HV fail the migrate if it finds such ptes, or alternatively have the kernel BUG on the ptes if the guest is marked migratable. The latter sounds cleaner, but would require a mechanism to mark guests as unmigratable/saveable while they have IO space mapped; I''m not sure we have a clean way for marking such things right now. Thinking about it, if a guest could be clearly marked as having IO permission, we could simply disable both save/migrate AND !PAGE_PRESENT pfn-to-mfn translation in one go, and solve both problems at once. Would either SIF_INITDOMAIN or SIF_PRIVILEGED work for this? SIF_INITDOMAIN would be the safe fix for this particular dom0 case, I think. We could do it for SIF_PRIVILEGED too, but ONLY if we were sure that such domains would never be migrated or saved (we''ll corrupt PROT_NONE mappings if we migrate such domains.) Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-19 16:34 UTC
Re: [Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Another possibility, if we can modify all places that convert to/from PROT_NONE (or otherwise specifically hook those places) would be to use a software-available pte flag as PAGE_IO and avoid pte-mfn conversions on such ptes. I reckon the ~mfn trick, or similar mapping of untranslated mfns into the pfn ''namespace'', is the easier and more incremental solution. -- Keir On 19/3/08 16:00, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:> Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests, but > can be picked up and converted back to a valid mfn by pfn_to_mfn(). The key > is that most of the time invalid pfns are explicitly == end_pfn, or > max_page, or similar, so we are distinguishing from those and also can still > bug on that specific value in pfn_to_mfn(). > > As for picking this up in the save/restore code -- sounds a bit tricky to > me. We''re better off not allowing migration of a I/O-privileged domain in > the first place. And indeed I believe the tools already have some such > safety checks. > > -- Keir > > On 19/3/08 15:39, "Stephen C. Tweedie" <sct@redhat.com> wrote: > >> Hi, >> >> On paravirt x86 (both 32- and 64-bit), since cset 13998: >> >> http://xenbits.xensource.com/xen-unstable.hg?rev/13998 >> >> we translate all ptes from being mfn-based to pfn-based when the >> hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages, >> which appear to the HV to be non-present, but which are special-cased in >> the kernel to appear present (a different bit in the pte remains set for >> these pages and is caught by the pte_present() tests.) >> >> Unfortunately, it looks like recent X servers are attempting to do >> mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we >> do so, the translation of mfn to pfn results on x86_64 in end_pfn: >> >> maddr.h: >> static inline unsigned long mfn_to_pfn(unsigned long mfn) >> { >> ... >> if (unlikely((mfn >> machine_to_phys_order) != 0)) >> return end_pfn; >> >> and when we do mprotect(PROT_READ) later on on the same ptes, this >> end_pfn value is illegal: >> >> maddr.h: >> static inline unsigned long pfn_to_mfn(unsigned long pfn) >> { >> BUG_ON(end_pfn && pfn >= end_pfn); >> >> so we BUG(). >> >> It needs both an updated X and an updated kernel to show the bug, but >> given that, this results in an instant, completely repeatable kernel >> panic on starting X on both 32- and 64-bits on some hardware. >> >> >> Any suggestions? The obvious fix is to special-case these mfn_to_pfn >> translations so that they can be recognised as "untranslated" by a later >> pfn_to_mfn, but ideally we''d want such special pfns to be easily >> recognised so that we don''t completely lose the value of the BUG_ON() >> above. >> >> We''d also ideally like the HV to be able to spot such pte contents, as >> they won''t (indeed, cannot) be translated on migrate. That''s not a >> problem for dom0, of course, but might be for domUs with pci >> passthrough . >> >> Cheers, >> Stephen >> >> > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-19 16:42 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
On 19/3/08 16:31, "Stephen C. Tweedie" <sct@redhat.com> wrote:> ~mfn on its own probably won''t work, though. On non-PAE, we store mfns > in a pte that only has 20 bits for the information, so we have to deal > with that restricted range.Sure. Personally I could live with a solution that didn''t work for non-PAE. I guess that might not be the popular choice however. ;-)> It might be easier to do this at the pte_machine_to_phys level instead, > where we can potentially take advantage of other bits of the pte to > encode the special casing.Oh yes, the PAGE_IO type of trick I mentioned in my other email just now. To expand on that -- we''d like a PAGE_IO flag for foreign and I/O mappings anyway. It would simplify pte_pfn and similar routines which currently do a nasty little MFN->PFN->MFN conversion check to detect such ''foreign'' mappings.> Thinking about it, if a guest could be clearly marked as having IO > permission, we could simply disable both save/migrate AND !PAGE_PRESENT > pfn-to-mfn translation in one go, and solve both problems at once. > > Would either SIF_INITDOMAIN or SIF_PRIVILEGED work for this? > SIF_INITDOMAIN would be the safe fix for this particular dom0 case, I > think. We could do it for SIF_PRIVILEGED too, but ONLY if we were sure > that such domains would never be migrated or saved (we''ll corrupt > PROT_NONE mappings if we migrate such domains.)SIF_PRIVILEGED is no longer used. It''s still set for dom0, but not for hardware-capable domUs. It''s tricky anyway, since a domU can be given hardware capabilities after it has booted through mechanisms like pciback-pcifront PCI hotplug. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephen C. Tweedie
2008-Mar-19 17:12 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Hi, On Wed, 2008-03-19 at 16:42 +0000, Keir Fraser wrote:> > It might be easier to do this at the pte_machine_to_phys level instead, > > where we can potentially take advantage of other bits of the pte to > > encode the special casing. > > Oh yes, the PAGE_IO type of trick I mentioned in my other email just now.Yep. There are a number of bits that could be used: for example, I don''t think PROT_WRITE is ever going to be set on a PROT_NONE page, either. But...> SIF_PRIVILEGED is no longer used. It''s still set for dom0, but not for > hardware-capable domUs. It''s tricky anyway, since a domU can be given > hardware capabilities after it has booted through mechanisms like > pciback-pcifront PCI hotplug.Right; but I still think it''s reasonable to expect this sort of thing to be declared in advance for a domain. We clearly cannot enter a non- translating mode dynamically, because of the risk of already having untranslated PROT_NONE ptes in the page tables. But becoming unable to save or migrate a guest is a Big Thing, and it would seem not unreasonable to ask for a startup flag such as SIF_PRIVILEGED to be declared in advance before we permit that. I''ll probably do a quick patch based in SIF_INITDOMAIN for now, simply to be able to test whether it does in fact fix the symptoms we''re seeing; but that might not be the best solution to merge long term. Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-19 18:52 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
On 19/3/08 17:12, "Stephen C. Tweedie" <sct@redhat.com> wrote:> On Wed, 2008-03-19 at 16:42 +0000, Keir Fraser wrote: > >>> It might be easier to do this at the pte_machine_to_phys level instead, >>> where we can potentially take advantage of other bits of the pte to >>> encode the special casing. >> >> Oh yes, the PAGE_IO type of trick I mentioned in my other email just now. > > Yep. There are a number of bits that could be used: for example, I > don''t think PROT_WRITE is ever going to be set on a PROT_NONE page, > either.My point is the flag would be set even when the protection is not PROT_NONE. It has other uses beyond fixing this bug. I might look into doing this myself as I think it is probably the best fix for this issue. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-20 11:39 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
On 19/3/08 18:52, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>> Yep. There are a number of bits that could be used: for example, I >> don''t think PROT_WRITE is ever going to be set on a PROT_NONE page, >> either. > > My point is the flag would be set even when the protection is not PROT_NONE. > It has other uses beyond fixing this bug. > > I might look into doing this myself as I think it is probably the best fix for > this issue....and the issue is now fixed (changeset 488) in http://xenbits.xensource.com/staging/linux-2.6.18-xen.hg The patch is a minimal one given the approach of using a new _PAGE_IO flag. For example, pte_pfn() probably no longer needs to use mfn_to_local_pfn() and could use the simpler and faster mfn_to_pfn() instead. But that belongs in a separate patch. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephen C. Tweedie
2008-Mar-28 16:41 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
Hi, On Thu, 2008-03-20 at 11:39 +0000, Keir Fraser wrote:> ...and the issue is now fixed (changeset 488) in > http://xenbits.xensource.com/staging/linux-2.6.18-xen.hgI''ve merged this for RHEL-5.2 (along with cset 492), and now have multiple reports that it has fixed the X crashes --- thanks! We''ll need to keep it queued for pvops, too, as that is going to need the whole PROT_NONE special handling in order to support migrate. I''m away all next week, I''ll have to deal with it when I get back. Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-28 16:44 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
On 28/3/08 16:41, "Stephen C. Tweedie" <sct@redhat.com> wrote:>> ...and the issue is now fixed (changeset 488) in >> http://xenbits.xensource.com/staging/linux-2.6.18-xen.hg > > I''ve merged this for RHEL-5.2 (along with cset 492), and now have > multiple reports that it has fixed the X crashes --- thanks! > > We''ll need to keep it queued for pvops, too, as that is going to need > the whole PROT_NONE special handling in order to support migrate. I''m > away all next week, I''ll have to deal with it when I get back.Great. You also need c/s 492, unless you already fixed the i386 non-PAE build breakage that c/s 488 introduces. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Mar-28 16:45 UTC
[Xen-devel] Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
On 28/3/08 16:44, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>> I''ve merged this for RHEL-5.2 (along with cset 492), and now haveDuh, sorry for the noise. -- Keir>> multiple reports that it has fixed the X crashes --- thanks! >> >> We''ll need to keep it queued for pvops, too, as that is going to need >> the whole PROT_NONE special handling in order to support migrate. I''m >> away all next week, I''ll have to deal with it when I get back. > > Great. You also need c/s 492, unless you already fixed the i386 non-PAE build > breakage that c/s 488 introduces._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Possibly Parallel Threads
- [PATCH] xen/balloon: don't set P2M entry for auto translated guest
- [PATCH] Arch-neutral balloon driver
- [PATCH v8 0/19] enable swiotlb-xen on arm and arm64
- [PATCH] xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override
- [PATCH] xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override