thr3ads.net - Virtualization - [v3,11/41] mips: reuse asm-generic/barrier.h [Jan 2016]

If this information is useful, please help other people find it:
Share via:

Leonid Yegoshin

2016-Jan-14 23:33 UTC

[v3,11/41] mips: reuse asm-generic/barrier.h

On 01/14/2016 02:55 PM, Paul E. McKenney wrote:> OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> about WRC+sync+addr.(He actually asked twice about this and that too but skip this)
> I am guessing that the manual's "Older instructions which must be
globally
> performed when the SYNC instruction completes" provides the equivalent
> of ARM/Power A-cumulativity, which can be thought of as transitivity
> backwards in time.  This leads me to believe that your smp_mb() needs
> to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> discussion in this thread.
Don't be fooled here by words "ordered" and "completed"
- it is HW
design items and actually written poorly.
Just assume that SYNC_MB is absolutely the same as SYNC for any CPU and 
coherent device (besides performance). The difference can be in 
non-coherent devices because SYNC actually tries to make a barrier for 
them too. In some SoCs it is just the same because there is no need to 
barrier a non-coherent device (device register access usually strictly 
ordered... if there is no bridge in between).
>
> Suppose you have something like this:
> ...
> Does your hardware guarantee that it is not possible for all of r0,
> r1, r2, and r3 to be equal to zero at the end of the test, assuming
> that a, b, c, and d are all initially zero, and the four functions
> above run concurrently?
It is assumed to be so from Arch point of view. HW bugs are possible, of 
course.
> Another (more academic) case is this one, with x and y initially zero:
>
> ...
> Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1
&& r4 == 0?
It is assumed to be so from Arch point of view. HW bugs are possible, of 
course.

Note: I am not sure about ANY past MIPS R2 CPU because that stuff is 
implemented some time but nobody made it in Linux kernel (it was used by 
some vendor for non-Linux system). For that reason my patch for 
lightweight SYNCs has an option - implement it or implement a generic 
SYNC. It is possible that some vendor did it in different way but nobody 
knows or test it. But as a minimum - SYNC must be implemented in 
spinlocks/atomics/bitops, in recent P5600 it is proven that read can 
pass write in atomics.

MIPS R6 is a different story, I verified lightweight SYNCs from the 
beginning and it also should use SYNCs.

- Leonid.

Paul E. McKenney

2016-Jan-15 00:47 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin
wrote:> On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> >OK, so it looks like Will was asking not about WRC+addr+addr, but
instead
> >about WRC+sync+addr.
> (He actually asked twice about this and that too but skip this)
Fair enough!  ;-)
> >I am guessing that the manual's "Older instructions which must
be globally
> >performed when the SYNC instruction completes" provides the
equivalent
> >of ARM/Power A-cumulativity, which can be thought of as transitivity
> >backwards in time.  This leads me to believe that your smp_mb() needs
> >to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> >discussion in this thread.
> 
> Don't be fooled here by words "ordered" and
"completed" - it is HW
> design items and actually written poorly.
> Just assume that SYNC_MB is absolutely the same as SYNC for any CPU
> and coherent device (besides performance). The difference can be in
> non-coherent devices because SYNC actually tries to make a barrier
> for them too. In some SoCs it is just the same because there is no
> need to barrier a non-coherent device (device register access
> usually strictly ordered... if there is no bridge in between).
So smp_mb() can be SYNC_MB.  However, mb() needs to be SYNC for MMIO
purposes, correct?
> >Suppose you have something like this:
> >...
> >Does your hardware guarantee that it is not possible for all of r0,
> >r1, r2, and r3 to be equal to zero at the end of the test, assuming
> >that a, b, c, and d are all initially zero, and the four functions
> >above run concurrently?
> 
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.
Indeed!
> >Another (more academic) case is this one, with x and y initially zero:
> >
> >...
> >Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1
&& r4 == 0?
> 
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.
Looks to me like smp_mb() can be SYNC_MB, then.
> Note: I am not sure about ANY past MIPS R2 CPU because that stuff is
> implemented some time but nobody made it in Linux kernel (it was
> used by some vendor for non-Linux system). For that reason my patch
> for lightweight SYNCs has an option - implement it or implement a
> generic SYNC. It is possible that some vendor did it in different
> way but nobody knows or test it. But as a minimum - SYNC must be
> implemented in spinlocks/atomics/bitops, in recent P5600 it is
> proven that read can pass write in atomics.
> 
> MIPS R6 is a different story, I verified lightweight SYNCs from the
> beginning and it also should use SYNCs.
So you need to build a different kernel for some types of MIPS systems?
Or do you do boot-time rewriting, like a number of other arches do?

							Thanx, Paul

Leonid Yegoshin

2016-Jan-15 01:07 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On 01/14/2016 04:47 PM, Paul E. McKenney wrote:> On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
>> Don't be fooled here by words "ordered" and
"completed" - it is HW
>> design items and actually written poorly.
>> Just assume that SYNC_MB is absolutely the same as SYNC for any CPU
>> and coherent device (besides performance). The difference can be in
>> non-coherent devices because SYNC actually tries to make a barrier
>> for them too. In some SoCs it is just the same because there is no
>> need to barrier a non-coherent device (device register access
>> usually strictly ordered... if there is no bridge in between).
> So smp_mb() can be SYNC_MB.  However, mb() needs to be SYNC for MMIO
> purposes, correct?
Absolutely. For MIPS R2 which is not Octeon.
>> Note: I am not sure about ANY past MIPS R2 CPU because that stuff is
>> implemented some time but nobody made it in Linux kernel (it was
>> used by some vendor for non-Linux system). For that reason my patch
>> for lightweight SYNCs has an option - implement it or implement a
>> generic SYNC. It is possible that some vendor did it in different
>> way but nobody knows or test it. But as a minimum - SYNC must be
>> implemented in spinlocks/atomics/bitops, in recent P5600 it is
>> proven that read can pass write in atomics.
>>
>> MIPS R6 is a different story, I verified lightweight SYNCs from the
>> beginning and it also should use SYNCs.
> So you need to build a different kernel for some types of MIPS systems?
> Or do you do boot-time rewriting, like a number of other arches do?
I don't know. I would like to have responses. Ralf asked Maciej about 
old systems and that came nowhere. Even rewrite - don't know what to do 
with that: no lightweight SYNC or no SYNC at all - yes, it is still 
possible that SYNC on some systems can be too heavy or even harmful, 
nobody tested that.

- Leonid.

Ralf Baechle

2016-Jan-27 10:40 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Thu, Jan 14, 2016 at 04:47:53PM -0800, Paul E. McKenney wrote:
> So you need to build a different kernel for some types of MIPS systems?
Yes.  We can't really do without.  Classic MIPS code is not relocatable
without the complexity of PIC code as used by ELF DSOs - and their
performanc penalty.  Plus we have a number of architecture revisions
ovr the decades, big and little endian, 32 and 64 bit as the major
stumbling stones.  There however are groups of similar systems that
can share kernel binaries.
> Or do you do boot-time rewriting, like a number of other arches do?
We don't rewrite the code (as in the .text of the vmlinux binary) but we
do runtime code generation for a few highly performance sensitive area
of the kernel code such as copy_page() or TLB exception handlers.  This
allows more flexibility than just inserting templates into the kernel
code.  Downside - it means we have some of the complexity of as and ld
in the kernel.

  Ralf

Seemingly Similar Threads

Search for more seemingly similar threads

Virtualization - Jan 2016 - [v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

Seemingly Similar Threads