Jan Beulich
2006-Mar-13  16:40 UTC
[Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
The compiler isn''t required to order the two stores ptep_get_and_clear_full() in any particular way, and we saw cases where the upper 32 bits get stored before the lower ones, which causes the access to fail (page-fault propagated out of Xen). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-14  10:09 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
On 13 Mar 2006, at 16:40, Jan Beulich wrote:> The compiler isn''t required to order the two stores > ptep_get_and_clear_full() in any particular way, and we saw cases > where the upper 32 bits get stored before the lower ones, which causes > the access to fail (page-fault propagated out of > Xen).Won''t this need a barrier() (compile barrier) between the updates of low and high portions? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-14  10:40 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 14.03.06 11:09:29 >>> > >On 13 Mar 2006, at 16:40, Jan Beulich wrote: > >> The compiler isn''t required to order the two stores >> ptep_get_and_clear_full() in any particular way, and we saw cases >> where the upper 32 bits get stored before the lower ones, which causes >> the access to fail (page-fault propagated out of >> Xen). > >Won''t this need a barrier() (compile barrier) between the updates of >low and high portions?No, I can''t see why. The compiler isn''t permitted to re-order separate writes across sequence points (which is different from a single 64-bit write, where the compiler is only expected to carry out the full 64-bit write prior to the next sequence point, but nothing requires it to do this in any particular order). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-14  11:53 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
On 14 Mar 2006, at 10:40, Jan Beulich wrote:>> Won''t this need a barrier() (compile barrier) between the updates of >> low and high portions? > > No, I can''t see why. The compiler isn''t permitted to re-order separate > writes across > sequence points (which is different from a single 64-bit write, where > the compiler is > only expected to carry out the full 64-bit write prior to the next > sequence point, but > nothing requires it to do this in any particular order).The compiler is certainly allowed to reorder those updates. The constraints on a conforming implementation of C99 are pretty weak, and don''t say anything about obeying the rules for access/update ordering on non-volatile objects. Whether gcc reorders such updates is another matter. :-) If I build the following with -O2 on x86/64 gcc 4.1, the compiler removes the first update of x. If I take a signal after the update of y, the signal handler can see y==2, z!=2 but also x!=2, which disobeys the semantics of the C99 abstract machine semantics: int x, y, z; void foo(void) { x = 2; y = 2; z = 2; x = 0; } Really it''s safer just to include the barrier(), and makes our ordering requirement explicit in the code. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Petersson, Mats
2006-Mar-14  12:15 UTC
RE: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
> -----Original Message----- > From: xen-devel-bounces@lists.xensource.com > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of > Keir Fraser > Sent: 14 March 2006 11:53 > To: Jan Beulich > Cc: xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE > kernel work when built with newer gcc > > > On 14 Mar 2006, at 10:40, Jan Beulich wrote: > > >> Won''t this need a barrier() (compile barrier) between the > updates of > >> low and high portions? > > > > No, I can''t see why. The compiler isn''t permitted to > re-order separate > > writes across sequence points (which is different from a > single 64-bit > > write, where the compiler is only expected to carry out the full > > 64-bit write prior to the next sequence point, but nothing > requires it > > to do this in any particular order). > > The compiler is certainly allowed to reorder those updates. > The constraints on a conforming implementation of C99 are > pretty weak, and don''t say anything about obeying the rules > for access/update ordering on non-volatile objects. Whether > gcc reorders such updates is another matter. :-) > > If I build the following with -O2 on x86/64 gcc 4.1, the > compiler removes the first update of x. If I take a signal > after the update of y, the signal handler can see y==2, z!=2 > but also x!=2, which disobeys the semantics of the C99 > abstract machine semantics: > int x, y, z; > void foo(void) { x = 2; y = 2; z = 2; x = 0; } > > Really it''s safer just to include the barrier(), and makes > our ordering requirement explicit in the code.In my personal opinion this is also better in the sense that it makes it CLEAR to the reader of the code, what''s meant to happen. Relying on the compiler to "do the right thing" in these circumstances tend to break when a new compiler version comes out. And it can be VERY hard to figure out right in some cases [where it works 99999 times out of 100000, and then randomly crashes on the 100000th time...]. -- Mats> > -- Keir > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-14  12:40 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>The compiler is certainly allowed to reorder those updates. The >constraints on a conforming implementation of C99 are pretty weak, and >don''t say anything about obeying the rules for access/update ordering >on non-volatile objects. Whether gcc reorders such updates is another >matter. :-)I''m not sure this is correct - the standard, in section 5.1.2.3, says "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place." As we''re talking about modifying an object, this being considered a side-effect means it must have been carried out by the time the ; is (logically) reached.>If I build the following with -O2 on x86/64 gcc 4.1, the compiler >removes the first update of x. If I take a signal after the update of >y, the signal handler can see y==2, z!=2 but also x!=2, which disobeys >the semantics of the C99 abstract machine semantics: >int x, y, z; >void foo(void) { x = 2; y = 2; z = 2; x = 0; }I believe it doing so is permitted by the fact that a signal here can only come from outside of the program (the compiler can prove that none of these accesses can fault, at least not in the sense the program cares about), and the standard doesn''t care about ''outside''. If you inserted a function call after the update to y, I''m sure the compiler would leave the first write to x.>Really it''s safer just to include the barrier(), and makes our ordering >requirement explicit in the code.It might still be safer, I agree, to prevent broken compiler versions to do bad. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-14  14:06 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
On 14 Mar 2006, at 12:40, Jan Beulich wrote:> I''m not sure this is correct - the standard, in section 5.1.2.3, says > > "Accessing a volatile object, modifying an object, modifying a file, > or calling a function > that does any of those operations are all side effects, which are > changes in the state of > the execution environment. Evaluation of an expression may produce > side effects. At > certain specified points in the execution sequence called sequence > points, all side effects > of previous evaluations shall be complete and no side effects of > subsequent evaluations > shall have taken place." > > As we''re talking about modifying an object, this being considered a > side-effect means it > must have been carried out by the time the ; is (logically) reached.See 5.1.2.3.5 and then the examples 5.1.2.3.8 and 5.1.2.3.9 (particularly the first sentence, which summarises the guarantee provided by an aggressive optimising C compiler). The standard is really not clear in this respect, but I take 5.1.2.3.2 to apply to the abstract machine, and not necessarily an implementation. But I am not a compiler writer. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-14  14:31 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>See 5.1.2.3.5 and then the examples 5.1.2.3.8 and 5.1.2.3.9 >(particularly the first sentence, which summarises the guarantee >provided by an aggressive optimising C compiler). > >The standard is really not clear in this respect, but I take 5.1.2.3.2 >to apply to the abstract machine, and not necessarily an >implementation. But I am not a compiler writer. :-)Okay, I agree. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-14  14:39 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
> Okay, I agree. > > JanI applied your patch with the barrier added. Thanks. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-17  10:19 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
On 13 Mar 2006, at 16:40, Jan Beulich wrote:> The compiler isn''t required to order the two stores > ptep_get_and_clear_full() in any particular way, and we saw cases > where the upper 32 bits get stored before the lower ones, which causes > the access to fail (page-fault propagated out of > Xen).Jan, Isn''t this patch needed for native also? Even though this fastpath is (I think) only called from exit_mmap(), processors may still be running on those pagetables at that point (e.g., due to lazy switching). So what if: 1. Compiler causes high to be cleared before low. 2. This causes an invalid PTE (e.g., pointing into an uncacheable I/O region) 3. A processor speculatively loads the PTE into its TLB 4. A processor speculatively fetches a cacheline from the bogus area 5. You get a bug like the old AMD GART hang, where the CPU writes back the cache line at an inopportune moment, when it should never have been cached in the first place. ?? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-17  11:35 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 17.03.06 11:19:21 >>> > >On 13 Mar 2006, at 16:40, Jan Beulich wrote: > >> The compiler isn''t required to order the two stores >> ptep_get_and_clear_full() in any particular way, and we saw cases >> where the upper 32 bits get stored before the lower ones, which causes >> the access to fail (page-fault propagated out of >> Xen). > >Isn''t this patch needed for native also? Even though this fastpath is >(I think) only called from exit_mmap(), processors may still be running >on those pagetables at that point (e.g., due to lazy switching).I thought about that, too, but wasn''t sure.>So what if: > >1. Compiler causes high to be cleared before low. >2. This causes an invalid PTE (e.g., pointing into an uncacheable I/O >region) >3. A processor speculatively loads the PTE into its TLB >4. A processor speculatively fetches a cacheline from the bogus area >5. You get a bug like the old AMD GART hang, where the CPU writes back >the cache line at an inopportune moment, when it should never have been >cached in the first place.It, at least as per the spec, cannot write this cache line back, as everything up to here was speculative, and hence no change to the cache line may have occurred. But the caching inconsistencies would still be a problem and could, if I recall right, lead to processor hangs. If such a speculative access is considered theoretically possible, then yes, I''d think the same change is needed for native. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-17  11:51 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>> 1. Compiler causes high to be cleared before low. >> 2. This causes an invalid PTE (e.g., pointing into an uncacheable I/O >> region) >> 3. A processor speculatively loads the PTE into its TLB >> 4. A processor speculatively fetches a cacheline from the bogus area >> 5. You get a bug like the old AMD GART hang, where the CPU writes back >> the cache line at an inopportune moment, when it should never have >> been >> cached in the first place. > > It, at least as per the spec, cannot write this cache line back, as > everything > up to here was speculative, and hence no change to the cache line may > have > occurred. But the caching inconsistencies would still be a problem and > could, > if I recall right, lead to processor hangs. > If such a speculative access is considered theoretically possible, > then yes, I''d > think the same change is needed for native.Not sure which spec you refer to but it definitely can happen, as demonstrated by the infamous AMD GART problem some time back: http://lwn.net/2002/0124/a/athlon-agp-problem.php3 The above link gives plenty of detail, but briefly: The processor can speculatively decide it''s found a store instruction and thus write-allocate an arbitrary cache line. This line will ultimately get written back even if the speculative store never actually gets executed. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-17  12:28 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>Not sure which spec you refer to but it definitely can happen, as >demonstrated by the infamous AMD GART problem some time back: >http://lwn.net/2002/0124/a/athlon-agp-problem.php3 > >The above link gives plenty of detail, but briefly: The processor can >speculatively decide it''s found a store instruction and thus >write-allocate an arbitrary cache line. This line will ultimately get >written back even if the speculative store never actually gets >executed.Indeed, then this should be needed on native, too. I have to admit that I hadn''t read about x86 processors using speculative writes (Intel''s SDM, Vol 3, section 7.2.2 clearly doesn''t say anything like that, and AMD doesn''t seem to have references to such behavior in their SDM either). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2006-Mar-17  13:24 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
On 17 Mar 2006, at 12:28, Jan Beulich wrote:> Indeed, then this should be needed on native, too. I have to admit that > I hadn''t read about x86 processors using speculative writes (Intel''s > SDM, > Vol 3, section 7.2.2 clearly doesn''t say anything like that, and AMD > doesn''t > seem to have references to such behavior in their SDM either).It wouldn''t be described there since the effect of this behaviour is only seen on the bus, but not visible to programmers. From the p.o.v of the programmer, stores will always be seen by software to happen in order and no earlier than previous loads. Hence no serialising LOCK prefix needed in spin_unlock(). So actually executing a store speculatively would be both bizarre and illegal anyway on x86. Will you send the patch to the appropriate person for it to be committed to native? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2006-Mar-17  13:37 UTC
Re: [Xen-devel] [PATCH] i386 linux: make 32-bit PAE kernel work when built with newer gcc
>Will you send the patch to the appropriate person for it to be >committed to native?"to the appropriate person" is quite a funny statement for i386... But yes, I will, though perhaps not immediately. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel