search for: ftz

Displaying 20 results from an estimated 70 matches for "ftz".

2018 Sep 08
0
[PATCH] maxwell,pascal: add scheduling data to shaders
...0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0x1 wr 0x0 wt 0x5) (st 0xe) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x1) (st 0xf wt 0x3f) (st 0x1) mov $r0 $r3 0xf exit +nop 0x0 #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..ffc2bdc 100644 --- a/src/shader/exac8nv110.fpc +++ b/sr...
2017 Jul 01
2
[PATCH 1/2] nv110/exa: Remove depbars
...ndex ce78036..220d7e5 100644 --- a/src/shader/exac8nv110.fp +++ b/src/shader/exac8nv110.fp @@ -36,12 +36,11 @@ ipa $r3 a[0x84] $r0 0x0 0x1 sched (st 0x0) (st 0x0) (st 0x0) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 -depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) fmul ftz $r3 $r0 $r1 +sched (st 0x0) (st 0x0) (st 0x0) mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) mov $r0 $r3 0xf +sched (st 0x0) (st 0x0) (st 0x0) exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..d8d5517 100644 --- a/src/shader/exac8n...
2019 Mar 16
3
[RFC] Making space for a flush-to-zero flag in FastMathFlags
Hi, I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags, but we've already used up the 7 bits available in Value::SubclassOptionalData (the "backing storage" for FPMathOperator::getFastMathFlags()). These are the possibilities I can think of: 1. Increase the size of FPMathOperator. This gives us some additional b...
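The bit-budget problem described in this message can be sketched in a few lines. This is only an illustration under stated assumptions: the enum below is a stand-in that lists the seven fast-math flags LLVM documents, `kHypotheticalFtz` is a made-up name for the proposed eighth flag, and none of it is LLVM's actual FastMathFlags code.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the seven existing fast-math flags.
// This is NOT LLVM's FastMathFlags class; the bit positions are assumptions.
enum FastMathBit : std::uint32_t {
  AllowReassoc    = 1u << 0,
  NoNaNs          = 1u << 1,
  NoInfs          = 1u << 2,
  NoSignedZeros   = 1u << 3,
  AllowReciprocal = 1u << 4,
  AllowContract   = 1u << 5,
  ApproxFunc      = 1u << 6,
};

// Width of the backing storage as quoted in the message (7 bits).
constexpr unsigned kStorageBits = 7;

// Hypothetical eighth flag for flush-denormals-to-zero.
constexpr std::uint32_t kHypotheticalFtz = 1u << 7;

constexpr std::uint32_t kAllExistingFlags =
    AllowReassoc | NoNaNs | NoInfs | NoSignedZeros |
    AllowReciprocal | AllowContract | ApproxFunc;

// True when every set bit of `mask` lies inside the 7-bit field.
constexpr bool fitsInStorage(std::uint32_t mask) {
  return (mask >> kStorageBits) == 0;
}

// The seven existing flags fill the field exactly; an eighth does not fit.
void checkFlagBudget() {
  assert(fitsInStorage(kAllExistingFlags));
  assert(!fitsInStorage(kAllExistingFlags | kHypotheticalFtz));
}
```

Calling `checkFlagBudget()` restates the point of the RFC: with seven flags already defined, any new flag needs either a wider field or a different encoding.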
2017 Jun 27
4
[PATCH v4] nv110/exa: update sched codes
...0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x1) (st 0xf) (st 0x0) mov $r0 $r3 0xf exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..1f7d649 100644 --- a/src/shader/exac8nv110.fpc +++ b/src/shader/exac8nv1...
2016 Oct 16
2
[PATCH] exa: add GM10x acceleration support
...$r0 0x0 0x1 +sched (st 0x0) (st 0x0) (st 0x0) +ipa $r2 a[0x90] $r0 0x0 0x1 +tex nodep $r1 $r2 0x0 0x1 t2d 0x8 +ipa $r3 a[0x84] $r0 0x0 0x1 +sched (st 0x0) (st 0x0) (st 0x0) +ipa $r2 a[0x80] $r0 0x0 0x1 +tex nodep $r0 $r2 0x0 0x0 t2d 0x8 +depbar le 0x5 0x0 0x0 +sched (st 0x0) (st 0x0) (st 0x0) +fmul ftz $r3 $r0 $r1 +mov $r2 $r3 0xf +mov $r1 $r3 0xf +sched (st 0x0) (st 0x0) (st 0x0) +mov $r0 $r3 0xf +exit +#endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc new file mode 100644 index 0000000..4aa1368 --- /dev/null +++ b/src/shader/exac8nv110.fpc @@ -0,0 +1,38 @@ +0xfc0007e0, +...
2017 Jun 10
2
[PATCH v3] nv110/exa: update sched codes
...0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x1) (st 0xf) (st 0x0) mov $r0 $r3 0xf exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..1f7d649 100644 --- a/src/shader/exac8nv110.fpc +++ b/src/shader/exac8nv1...
2016 Oct 27
0
[PATCH v2 1/7] exa: add GM10x acceleration support
...$r0 0x0 0x1 +sched (st 0x0) (st 0x0) (st 0x0) +ipa $r2 a[0x90] $r0 0x0 0x1 +tex nodep $r1 $r2 0x0 0x1 t2d 0x8 +ipa $r3 a[0x84] $r0 0x0 0x1 +sched (st 0x0) (st 0x0) (st 0x0) +ipa $r2 a[0x80] $r0 0x0 0x1 +tex nodep $r0 $r2 0x0 0x0 t2d 0x8 +depbar le 0x5 0x0 0x0 +sched (st 0x0) (st 0x0) (st 0x0) +fmul ftz $r3 $r0 $r1 +mov $r2 $r3 0xf +mov $r1 $r3 0xf +sched (st 0x0) (st 0x0) (st 0x0) +mov $r0 $r3 0xf +exit +#endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc new file mode 100644 index 0000000..4aa1368 --- /dev/null +++ b/src/shader/exac8nv110.fpc @@ -0,0 +1,38 @@ +0xfc0007e0, +...
2017 Jun 03
2
[PATCH v2] nv110/exa: update sched codes
...0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6) (st 0xf) (st 0x0) mov $r0 $r3 0xf exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..46943b7 100644 --- a/src/shader/exac8nv110.fpc +++ b/src/shader/exac8nv1...
2016 Oct 17
0
[PATCH] exa: add GM10x acceleration support
...> +ipa $r2 a[0x90] $r0 0x0 0x1 > +tex nodep $r1 $r2 0x0 0x1 t2d 0x8 > +ipa $r3 a[0x84] $r0 0x0 0x1 > +sched (st 0x0) (st 0x0) (st 0x0) > +ipa $r2 a[0x80] $r0 0x0 0x1 > +tex nodep $r0 $r2 0x0 0x0 t2d 0x8 > +depbar le 0x5 0x0 0x0 > +sched (st 0x0) (st 0x0) (st 0x0) > +fmul ftz $r3 $r0 $r1 > +mov $r2 $r3 0xf > +mov $r1 $r3 0xf > +sched (st 0x0) (st 0x0) (st 0x0) > +mov $r0 $r3 0xf > +exit > +#endif > diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc > new file mode 100644 > index 0000000..4aa1368 > --- /dev/null > +++ b/sr...
2017 Jun 28
1
[PATCH v4] nv110/exa: update sched codes
...t 0x0) > > +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) > > ipa $r2 a[0x80] $r0 0x0 0x1 > > tex nodep $r0 $r2 0x0 0x0 t2d 0x8 > > depbar le 0x5 0x0 0x0 > > -sched (st 0x0) (st 0x0) (st 0x0) > > +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) > > fmul ftz $r3 $r0 $r1 > > mov $r2 $r3 0xf > > mov $r1 $r3 0xf > > -sched (st 0x0) (st 0x0) (st 0x0) > > +sched (st 0x1) (st 0xf) (st 0x0) > > mov $r0 $r3 0xf > > exit > > #endif > > diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc > >...
2019 Mar 18
2
[RFC] Making space for a flush-to-zero flag in FastMathFlags
...e problem). Let's see if we can agree on a more future proof solution. -- Sanjoy > > ~Craig > > > On Sat, Mar 16, 2019 at 12:51 PM Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi, >> >> I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags, >> but we've already used up the 7 bits available in >> Value::SubclassOptionalData (the "backing storage" for >> FPMathOperator::getFastMathFlags()). These are the possibilities I >> can think of: >> >> 1. Increase the siz...
2017 Jun 07
2
[PATCH v2] nv110/exa: update sched codes
...2 0x0 0x0 t2d 0x8 >> > > Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here? Missed it, thanks for pointing it out. > > > depbar le 0x5 0x0 0x0 >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) >> fmul ftz $r3 $r0 $r1 >> mov $r2 $r3 0xf >> > > You can stall for only one cycle here, but the 6 cycles on fmul is needed. > > mov $r1 $r3 0xf >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6) (st 0xf) (st 0x0) >> mov $r0 $r3 0xf >> > > Same h...
2017 Jul 01
0
[PATCH v5 2/2] nv110/exa: update sched codes
...xf wr 0x0 wt 0x3) (st 0xf wr 0x0 wt 0x1) ipa $r2 a[0x90] $r0 0x0 0x1 tex nodep $r1 $r2 0x0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x1) (st 0xf wr 0x0 wt 0x3) (st 0x6 wt 0x1) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 fmul ftz $r3 $r0 $r1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x1) (st 0x1) (st 0x1) mov $r2 $r3 0xf mov $r1 $r3 0xf mov $r0 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf) (st 0x0) (st 0x0) exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index d8d5517..7eb9...
2017 Jun 03
0
[PATCH] nv110/exa: update sched codes
...0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6) (st 0xf) (st 0x0) mov $r0 $r3 0xf exit #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..46943b7 100644 --- a/src/shader/exac8nv110.fpc +++ b/src/shader/exac8nv1...
2019 Sep 16
3
Handling of FP denormal values
...not entirely certain whether it is intended to control the target hardware or just the optimizer. In addition, when either -Ofast or -ffast-math is used, we attempt to link 'crtfastmath.o' if it can be found. For X86 targets, this object file adds a static constructor that sets the DAZ and FTZ bits of the MXCSR register. I expect that it has analogous behavior for other architectures when it is available. This object file is typically available on Linux systems, possibly also with things like MinGW. If it isn't found, the denormal control flags will be left in their default state. T...
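As a rough illustration of what setting the DAZ and FTZ bits of MXCSR means on x86-64, the sketch below sets both bits by hand. It is not the source of crtfastmath.o. The helper names (`enableFtzDaz`, `halveSmallestNormal`, `demoFlush`) are invented for this example, and it assumes an x86-64 build where float arithmetic uses SSE; on other targets `enableFtzDaz()` does nothing and returns false.

```cpp
#include <cassert>
#include <limits>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr
#define SKETCH_HAVE_MXCSR 1
#else
#define SKETCH_HAVE_MXCSR 0
#endif

// MXCSR control bits: DAZ is bit 6, FTZ is bit 15.
constexpr unsigned kMxcsrDaz = 1u << 6;   // treat denormal inputs as zero
constexpr unsigned kMxcsrFtz = 1u << 15;  // flush denormal results to zero

// Sets FTZ and DAZ for the calling thread. Returns false where MXCSR
// is not available (this sketch only handles x86-64).
bool enableFtzDaz() {
#if SKETCH_HAVE_MXCSR
  _mm_setcsr(_mm_getcsr() | kMxcsrFtz | kMxcsrDaz);
  return true;
#else
  return false;
#endif
}

// Multiplies the smallest normal float by 0.5. The exact result is a
// denormal; `volatile` keeps the compiler from folding it at build time.
float halveSmallestNormal() {
  volatile float smallestNormal = std::numeric_limits<float>::min();
  volatile float half = 0.5f;
  return smallestNormal * half;
}

// With FTZ set, the denormal product should come back as zero.
void demoFlush() {
  if (enableFtzDaz()) {
    assert(halveSmallestNormal() == 0.0f);
  }
}
```

MXCSR is per-thread state, so a static constructor of the kind the message describes affects only the thread that runs it and any threads that inherit its floating-point environment.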
2017 Jun 29
0
[PATCH v4] nv110/exa: update sched codes
...> -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) > ipa $r2 a[0x80] $r0 0x0 0x1 > tex nodep $r0 $r2 0x0 0x0 t2d 0x8 > depbar le 0x5 0x0 0x0 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) > fmul ftz $r3 $r0 $r1 > mov $r2 $r3 0xf > mov $r1 $r3 0xf > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x1) (st 0xf) (st 0x0) > mov $r0 $r3 0xf > exit > #endif > diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc > index 4aa1368..1f7d649 100644 > --...
2017 Jun 10
0
[PATCH v3] nv110/exa: update sched codes
...x1 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) > ipa $r2 a[0x80] $r0 0x0 0x1 > tex nodep $r0 $r2 0x0 0x0 t2d 0x8 > depbar le 0x5 0x0 0x0 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) > fmul ftz $r3 $r0 $r1 > mov $r2 $r3 0xf > mov $r1 $r3 0xf > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x1) (st 0xf) (st 0x0) > mov $r0 $r3 0xf > exit > #endif > diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc > index 4aa1368..1f7d649 100644 > --- a/s...
2017 Jun 28
0
[PATCH v4] nv110/exa: update sched codes
...x1 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0xf wr 0x2) (st 0xf wr 0x1 wt 0x6) (st 0xf) > ipa $r2 a[0x80] $r0 0x0 0x1 > tex nodep $r0 $r2 0x0 0x0 t2d 0x8 > depbar le 0x5 0x0 0x0 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) > fmul ftz $r3 $r0 $r1 > mov $r2 $r3 0xf > mov $r1 $r3 0xf > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x1) (st 0xf) (st 0x0) > mov $r0 $r3 0xf > exit > #endif > diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc > index 4aa1368..1f7d649 100644 > --- a/s...
2017 Jun 05
0
[PATCH v2] nv110/exa: update sched codes
...0x1 wt 0x6) (st 0xf) > ipa $r2 a[0x80] $r0 0x0 0x1 > tex nodep $r0 $r2 0x0 0x0 t2d 0x8 Out of curiosity, why didn't you add a read-dep-bar on $r2:$r3 here? > depbar le 0x5 0x0 0x0 > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) > fmul ftz $r3 $r0 $r1 > mov $r2 $r3 0xf You can stall for only one cycle here, but the 6 cycles on fmul is needed. > mov $r1 $r3 0xf > -sched (st 0x0) (st 0x0) (st 0x0) > +sched (st 0x6) (st 0xf) (st 0x0) > mov $r0 $r3 0xf Same here. > exit > #endif > diff --git a/src/sh...
2017 Jun 08
1
[PATCH v2] nv110/exa: update sched codes
...at since the two 'ipa' instructions are already waited on, $r0 will be ready? > > > >> >> >> depbar le 0x5 0x0 0x0 >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) >> fmul ftz $r3 $r0 $r1 >> mov $r2 $r3 0xf >> >> >> You can stall for only one cycle here, but the 6 cycles on fmul is >> needed. >> >> mov $r1 $r3 0xf >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6) (st...