thr3ads.net - search: "s64"

2016 Apr 19

2

[PATCH v4 20/37] volt: add coefficients

...100644
> --- a/drm/nouveau/nvkm/subdev/volt/base.c
> +++ b/drm/nouveau/nvkm/subdev/volt/base.c
> @@ -110,13 +110,47 @@ nvkm_volt_map(struct nvkm_volt *volt, u8 id, u8 temp)
>
> vmap = nvbios_vmap_entry_parse(bios, id, &ver, &len, &info);
> if (vmap) {
> + s64 result;
> +
> + if (volt->speedo < 0)
> + return volt->speedo;
Hmm, so you will refuse reclocking if the speedo cannot be read... Fair enough, but I would like to see a warning in the kernel logs.
> +
> + if (ver == 0x10 || (ver == 0x20 && info.mode == 0)) {
...

[GlobalISel] Legalize generic instructions that also depend on type of scalar, not only scalar size

2018 Sep 21

2

Hi, Mips32 has 64 bit floating point instructions, while i64 instructions have to be emulated with i32 instructions. This means that G_LOAD should be custom legalized for s64 integer value, and be legal for s64 floating point value. There are also other generic instructions with the same problem: G_STORE, G_SELECT, G_EXTRACT, and G_INSERT. There are also other configurations where integer and floating point instructions of the same size are not simultaneously availa...
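The problem can be pictured with a toy legality function whose key includes the scalar's kind as well as its size; an LLT such as s64 records only the size, so it cannot tell these cases apart. This is purely an illustrative sketch: the enum, struct and function names are hypothetical and are not LLVM's LegalizerInfo API, and the rule that only 64-bit integers need custom handling is an assumption drawn from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not LLVM's API. */
enum action { LEGAL, CUSTOM };
enum opcode { OP_LOAD, OP_STORE, OP_SELECT };

/* A scalar described by size AND kind. A GlobalISel LLT such as s64
 * records only the size, which is the limitation discussed above. */
struct scalar_ty {
    unsigned bits;
    bool is_float;
};

/* Assumed rules for a MIPS32-like target: i32, f32 and f64 are native,
 * while i64 must be emulated with i32 operations (custom lowering). */
static enum action mips32_action(enum opcode op, struct scalar_ty ty)
{
    assert(ty.bits == 32 || ty.bits == 64); /* other sizes out of scope */
    (void)op; /* same rule assumed for G_LOAD, G_STORE and G_SELECT */
    if (ty.bits == 64 && !ty.is_float)
        return CUSTOM;
    return LEGAL;
}
```

A 64-bit integer and a 64-bit float have the same size yet get different answers here; a table keyed on size alone could only return one of them.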

[PATCH v4 20/37] volt: add coefficients

2016 Apr 18

0

...dev/volt/base.c
index cecfac6..5e35d96 100644
--- a/drm/nouveau/nvkm/subdev/volt/base.c
+++ b/drm/nouveau/nvkm/subdev/volt/base.c
@@ -110,13 +110,47 @@ nvkm_volt_map(struct nvkm_volt *volt, u8 id, u8 temp)
 vmap = nvbios_vmap_entry_parse(bios, id, &ver, &len, &info);
 if (vmap) {
+ s64 result;
+
+ if (volt->speedo < 0)
+ return volt->speedo;
+
+ if (ver == 0x10 || (ver == 0x20 && info.mode == 0)) {
+ result = (s64)info.arg[0] / 10;
+ result += ((s64)info.arg[1] * volt->speedo) / 10;
+ result += ((s64)info.arg[2] * volt->speedo * volt->speedo)...

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

2014 Oct 24

3

...ut widening, the loop body in the PTX (a low-level assembly-like language generated by NVPTX64) is:
BB0_2: // =>This Inner Loop Header: Depth=1
    mul.lo.s32 %r5, %r6, %r6;
    st.u32 [%rd4], %r5;
    add.s32 %r6, %r6, 3;
    add.s64 %rd4, %rd4, 12;
    setp.lt.s32 %p2, %r6, %r3;
    @%p2 bra BB0_2;
in which %r6 is the induction variable i. With widening, the loop body becomes:
BB0_2: // =>This Inner Loop Header: Depth=1
    mul.lo.s64 %rd8, %rd10, %rd10;...
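Assuming a source loop of roughly the following shape (hypothetical, reconstructed only from the PTX above: i multiplied by itself, a 32-bit store, a step of 3 and a 12-byte pointer increment), this C sketch contrasts the original 32-bit induction variable with a widened 64-bit one. Both store the same values whenever i * i fits in 32 bits; the widened form simply multiplies in 64 bits, which typically costs several 32-bit instructions and an extra register on NVIDIA GPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source loop: store i*i into every third int32 element
 * (3 elements * 4 bytes = the 12-byte pointer step seen in the PTX). */
static void squares_i32(int32_t *a, int32_t start, int32_t n)
{
    for (int32_t i = start; i < n; i += 3)
        a[i] = i * i;              /* 32-bit multiply, like mul.lo.s32 */
}

/* Roughly what widening produces: the induction variable is promoted to
 * 64 bits, so the multiply is 64-bit and its result is truncated on store.
 * This is only equivalent while i * i does not overflow 32 bits. */
static void squares_widened(int32_t *a, int32_t start, int32_t n)
{
    for (int64_t i = start; i < n; i += 3)
        a[i] = (int32_t)(i * i);   /* 64-bit multiply, like mul.lo.s64 */
}
```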

[PATCH] volt: use kernel's 64-bit signed division function

2016 Sep 16

1

Doing direct 64 bit divisions in kernel code leads to references to undefined symbols on 32 bit architectures. Replace such divisions with calls to div64_s64 to make the module usable on 32 bit archs.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drm/nouveau/nvkm/subdev/volt/base.c | 6 +++---
 lib/include/nvif/os.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drm/nouveau/nvkm/subdev/volt/bas...
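On a 32-bit target, the compiler lowers a plain / on 64-bit operands to a call into a runtime helper (for signed division with GCC, libgcc's __divdi3), which the kernel generally does not provide; hence the undefined symbols. The userspace sketch below shows the semantics a signed 64-bit division helper such as div64_s64 has to provide: divide the magnitudes with an unsigned division, then restore the sign. The name div64_s64_sketch is hypothetical, its bare / stands in for the kernel's unsigned helper, and it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

static int64_t div64_s64_sketch(int64_t dividend, int64_t divisor)
{
    /* Division by zero and INT64_MIN / -1 (overflow) are out of scope. */
    assert(divisor != 0);
    assert(!(dividend == INT64_MIN && divisor == -1));

    /* Magnitudes in unsigned arithmetic, so |INT64_MIN| is representable. */
    uint64_t ua = dividend < 0 ? 0 - (uint64_t)dividend : (uint64_t)dividend;
    uint64_t ub = divisor < 0 ? 0 - (uint64_t)divisor : (uint64_t)divisor;

    /* Unsigned 64-by-64 division; in kernel code this would be a helper
     * such as div64_u64() rather than a bare '/'. */
    uint64_t q = ua / ub;

    /* C rounds toward zero, so negate exactly when the signs differ.
     * Negation is done on the unsigned value to avoid signed overflow. */
    int negative = (dividend < 0) != (divisor < 0);
    return (int64_t)(negative ? 0 - q : q);
}
```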

[PATCH] virtio_balloon: fix towards_target when deflating balloon

2008 Aug 18

2

Both v and vb->num_pages are u32 and unsigned int respectively. If v is less than vb->num_pages (and it is, when deflating the balloon), the result is a very large 32-bit number. Since we're returning a s64, instead of getting the same negative number we desire, we get a very large positive number. This handles the case where v < vb->num_pages and ensures we get a small, negative, s64 as the result. Rusty: please push this for 2.6.27-rc4. It's probably appropriate for the stable tree too...
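The wrap-around is easy to reproduce outside the kernel. In the sketch below (function names hypothetical, assuming a 32-bit unsigned int), the buggy version subtracts in unsigned 32-bit arithmetic and only then widens to a signed 64-bit result, while the fixed version widens both operands before subtracting. This is one way to obtain the intended small negative value; it is not necessarily the exact change the patch makes.

```c
#include <assert.h>
#include <stdint.h>

/* v is the host's target (u32), num_pages is an unsigned int. */
static int64_t towards_target_buggy(uint32_t v, unsigned int num_pages)
{
    /* The subtraction wraps in 32-bit unsigned arithmetic first;
     * widening to 64 bits afterwards just preserves the huge value. */
    return v - num_pages;
}

static int64_t towards_target_fixed(uint32_t v, unsigned int num_pages)
{
    /* Widen to signed 64 bits first, so v < num_pages goes negative. */
    return (int64_t)v - (int64_t)num_pages;
}
```

For example, with v = 100 and num_pages = 150 the buggy version yields 4294967246 (2^32 - 50) instead of -50.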

[PATCH] virtio_balloon: Fix endless deflation and inflation on arm64

2023 Aug 29

2

...rs/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b15936a5214..625caac35264 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -386,6 +386,17 @@ static void stats_handle_request(struct virtio_balloon *vb)
 virtqueue_kick(vq);
 }
 
+static inline s64 align_pages_up(s64 diff)
+{
+ if (diff == 0)
+ return diff;
+
+ if (diff > 0)
+ return ALIGN(diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
+
+ return -ALIGN(-diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
+}
+
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
 s64 target;
@@ -396,7 +407,7 @@ sta...
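A userspace sketch of the same helper makes the rounding visible. It assumes, for illustration only, 64 KiB kernel pages and 4 KiB balloon pages, so one kernel page corresponds to 16 balloon pages; PAGES_PER_PAGE and ALIGN_UP are hypothetical stand-ins for VIRTIO_BALLOON_PAGES_PER_PAGE and the kernel's ALIGN(). Positive (inflate) and negative (deflate) differences are both rounded away from zero to a whole number of kernel pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ratio: 64 KiB kernel page / 4 KiB balloon page = 16. */
#define PAGES_PER_PAGE 16

/* Round x up to a multiple of the power-of-two a (like ALIGN()). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((int64_t)(a) - 1))

static int64_t align_pages_up(int64_t diff)
{
    if (diff == 0)
        return diff;
    if (diff > 0)
        return ALIGN_UP(diff, PAGES_PER_PAGE);   /* inflate: round up */
    return -ALIGN_UP(-diff, PAGES_PER_PAGE);     /* deflate: round magnitude up */
}
```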

[GlobalISel][MIPS] Legality and instruction combining

2018 Sep 14

2

...at for TypeIdx==1. Is it intentionally implemented this way?
>> b) Is the plan to sometimes let s1 as legal type and ignore it later?
> I'm not sure what you mean here
>
For example, let's look at AArch64 G_SELECT:
getActionDefinitionsBuilder(G_SELECT)
    .legalFor({{s32, s1}, {s64, s1}, {p0, s1}})
    .clampScalar(0, s32, s64)
    .widenScalarToNextPow2(0);
In this case the LLT of operand 1 (s1) in G_SELECT has size 1, and the corresponding register class in the selected instruction has size 32 (that is $src1 in AArch64::ANDSWri, which has the GPR32 register class). For that reason s1...
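As a rough model of the rule chain above, the sketch below computes the size each piece of type index 0 ends up with when it is not already one of the legalFor types: clampScalar(0, s32, s64) widens anything below 32 bits and narrows anything above 64 bits, and widenScalarToNextPow2(0) rounds the remaining in-range sizes up to a power of two. Type index 1 stays s1 in every rule, which is what the question is about. The function names are hypothetical and this is a toy, not LLVM's LegalizeRuleSet implementation.

```c
#include <assert.h>

/* Smallest power of two >= bits. */
static unsigned next_pow2(unsigned bits)
{
    unsigned p = 1;
    while (p < bits)
        p <<= 1;
    return p;
}

/* Toy model of the G_SELECT rules for type index 0. */
static unsigned legalized_select_bits(unsigned bits)
{
    assert(bits > 0);
    if (bits < 32)
        return 32;          /* clampScalar: widen small scalars to s32 */
    if (bits > 64)
        return 64;          /* clampScalar: narrow into s64 pieces */
    return next_pow2(bits); /* widenScalarToNextPow2: e.g. s48 -> s64 */
}
```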

[PATCH] virtio_balloon: Fix endless deflation and inflation on arm64

2023 Aug 30

1

.../virtio_balloon.c
> index 5b15936a5214..625caac35264 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -386,6 +386,17 @@ static void stats_handle_request(struct virtio_balloon *vb)
> virtqueue_kick(vq);
> }
>
> +static inline s64 align_pages_up(s64 diff)
> +{
> + if (diff == 0)
> + return diff;
> +
> + if (diff > 0)
> + return ALIGN(diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
> +
> + return -ALIGN(-diff, VIRTIO_BALLOON_PAGES_PER_PAGE);
> +}
> +
> static inline s64 towards_target(struct virt...

[PATCH v4 20/37] volt: add coefficients

2016 Apr 19

0

On Tue, Apr 19, 2016 at 5:52 PM, Martin Peres <martin.peres at free.fr> wrote:
>> + result = ((s64)info.arg[0] * 15625) >> 18;
>> + result += ((s64)info.arg[1] * volt->speedo * 15625) >> 18;
>> + result += ((s64)info.arg[2] * temp * 15625) >> 10;
>> +...
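One way to read the constants: since 1000000 = 64 * 15625 and 2^24 = 64 * 2^18, multiplying by 15625 and shifting right by 18 is the same as scaling by 10^6 / 2^24, i.e. converting a value with 24 fractional bits into millionths, with the common factor of 64 cancelled so the intermediate product is smaller. Likewise, * 15625 >> 10 corresponds to 10^6 / 2^16. The fixed-point interpretation is an assumption; the sketch below (function names hypothetical) only checks the arithmetic identity for non-negative inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a value with 24 fractional bits to millionths using the reduced
 * constants: 15625 / 2^18 == 10^6 / 2^24. */
static int64_t q24_to_millionths(int64_t x)
{
    assert(x >= 0); /* right shift of negative values is implementation-defined */
    return (x * 15625) >> 18;
}

/* Same scaling with the unreduced constants; the intermediate product
 * is 64 times larger, so it overflows sooner. */
static int64_t q24_to_millionths_unreduced(int64_t x)
{
    assert(x >= 0);
    return (x * 1000000) >> 24;
}
```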

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

2014 Apr 19

4

...s PTX code that literally computes the pointer address of each GEP, wasting tons of registers. E.g., it emits the following PTX for the first load and similar PTX for other loads:
    mov.u32 %r1, %tid.x;
    mov.u32 %r2, %tid.y;
    mul.wide.u32 %rl2, %r1, 128;
    mov.u64 %rl3, a;
    add.s64 %rl4, %rl3, %rl2;
    mul.wide.u32 %rl5, %r2, 4;
    add.s64 %rl6, %rl4, %rl5;
    ld.shared.f32 %f1, [%rl6];
The resultant register pressure causes up to 20% slowdown on some of our benchmarks. To reduce register pressure, the optimization implemented in this patch merges the common sub...
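The PTX above computes a + tid.x * 128 + tid.y * 4 for each access, which matches a row-major array of 32 floats per row. As a hypothetical illustration (the four-neighbour access pattern and function names are assumptions, not taken from the patch), the C sketch below contrasts recomputing every address with computing the common base once and reaching the other elements through constant offsets of 4, 128 and 132 bytes, which PTX can encode as immediate offsets such as [%rl6+4].

```c
#include <assert.h>

#define N 32 /* 32 floats per row = 128 bytes, as in "mul.wide.u32 ..., 128" */

/* Naive form: each access recomputes its full address from a, x and y. */
static float sum_naive(const float *a, unsigned x, unsigned y)
{
    return a[x * N + y] + a[x * N + (y + 1)]
         + a[(x + 1) * N + y] + a[(x + 1) * N + (y + 1)];
}

/* After merging the common sub-expression: one base address,
 * then constant offsets of 1, N and N + 1 elements. */
static float sum_cse(const float *a, unsigned x, unsigned y)
{
    const float *base = a + x * N + y;
    return base[0] + base[1] + base[N] + base[N + 1];
}
```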

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

4

...ples_2E_mandelbrot_2F_square_param_0
> > )
> > {
> > .reg .pred %p<396>;
> > .reg .s16 %rc<396>;
> > .reg .s16 %rs<396>;
> > .reg .s32 %r<396>;
> > .reg .s64 %rl<396>;
> > .reg .f32 %f<396>;
> > .reg .f64 %fl<396>;
> >
> > mov.f64 %fl0, examples_2E_mandelbrot_2F_square_param_0;
> > mul.f64 %fl0, %fl0, %fl0;
> >...

[RFC] Tablegen-erated GlobalISel Combine Rules

2018 Nov 27

2

...ICombineRule<
  (defs reg:$D, reg:$A),
  (match (G_LOAD $t1, $D),
         (G_SEXT $A, $t1)),
  (apply (G_SEXTLOAD $A, $D))> {
  let MatchStartsFrom = (roots $D);
};
def : GICombineRule<
  (defs reg:$D, reg:$A, reg:$B, reg:$C),
  (match (G_TRUNC s32:$t1, s64:$A),
         (G_TRUNC s32:$t2, s64:$B),
         (G_ADD $D, $t1, $t2)
         (G_SEXT s64:$C, $D)),
  (apply (G_ADD $D, $A, $B),
         (G_SEXT_INREG $C, $D))> {
  let MatchStartsFrom = (roots $D);
};
def : GICombineRule<
  (defs reg:$D1, reg:$D2, reg:$...
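The second rule relies on the low 32 bits of a sum depending only on the low 32 bits of its operands, so truncating before a 32-bit add and sign-extending afterwards gives the same s64 result as a 64-bit add followed by a 32-bit G_SEXT_INREG. The C sketch below (helper names hypothetical) evaluates both sides of that rewrite using unsigned arithmetic, so wrap-around is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* G_SEXT_INREG v, 32: sign-extend the low 32 bits of v to 64 bits.
 * The final conversion assumes the usual two's-complement wrap. */
static int64_t sext_inreg32(uint64_t v)
{
    uint64_t low = v & 0xffffffffu;
    return (int64_t)((low ^ 0x80000000u) - 0x80000000u);
}

/* Matched pattern: G_TRUNC both inputs, 32-bit G_ADD, then G_SEXT. */
static int64_t combine_lhs(uint64_t a, uint64_t b)
{
    uint32_t t1 = (uint32_t)a, t2 = (uint32_t)b;
    uint32_t d = t1 + t2;        /* wraps modulo 2^32 */
    return sext_inreg32(d);      /* sign-extending an s32 value */
}

/* Replacement: 64-bit G_ADD, then G_SEXT_INREG of 32 bits. */
static int64_t combine_rhs(uint64_t a, uint64_t b)
{
    return sext_inreg32(a + b);
}
```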

[GlobalISel] G_LOAD/G_STORE i64/f64 handling

2017 Jul 02

2

...rm + float/double configuration (-mtriple=i386-linux-gnu -mattr=+sse2):
load i64, i64* %p1 - illegal, requires narrowScalar action
load double, double* %p1 - legal
What is the best approach to legalize this case? Should I mark G_LOAD/G_STORE s64 as Custom?
Regards,
Igor Breger
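For context, narrowScalar on the i64 load means splitting it into two s32 loads at offsets 0 and 4 and merging the halves, because i386 general-purpose registers are 32 bits wide, whereas a double can still be loaded as one 64-bit SSE2 value. The C sketch below models that split; the function names are hypothetical, and which word holds the low half is decided by a runtime byte-order check rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Model of a narrowed i64 load: two 32-bit loads, then a merge. */
static uint64_t load_i64_as_two_i32(const void *p)
{
    uint32_t w0, w1;
    memcpy(&w0, (const unsigned char *)p, 4);      /* load at offset 0 */
    memcpy(&w1, (const unsigned char *)p + 4, 4);  /* load at offset 4 */
    uint32_t lo = host_is_little_endian() ? w0 : w1;
    uint32_t hi = host_is_little_endian() ? w1 : w0;
    return ((uint64_t)hi << 32) | lo;
}
```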

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

0

...brot_2F_square(
> > > .reg .b64 examples_2E_mandelbrot_2F_square_param_0
> > > )
> > > {
> > > .reg .pred %p<396>;
> > > .reg .s16 %rc<396>;
> > > .reg .s16 %rs<396>;
> > > .reg .s32 %r<396>;
> > > .reg .s64 %rl<396>;
> > > .reg .f32 %f<396>;
> > > .reg .f64 %fl<396>;
> > >
> > > mov.f64 %fl0, examples_2E_mandelbrot_2F_square_param_0;
> > > mul.f64 %fl0, %fl0, %fl0;
> > > mov.f64 func_retval0, %fl0;...

[RFC] Tablegen-erated GlobalISel Combine Rules

2018 Nov 30

2

...:$A),
>> (match (G_LOAD $t1, $D),
>> (G_SEXT $A, $t1)),
>> (apply (G_SEXTLOAD $A, $D))> {
>> let MatchStartsFrom = (roots $D);
>> };
>> def : GICombineRule<
>> (defs reg:$D, reg:$A, reg:$B, reg:$C),
>> (match (G_TRUNC s32:$t1, s64:$A),
>> (G_TRUNC s32:$t2, s64:$B),
>> (G_ADD $D, $t1, $t2)
>> (G_SEXT s64:$C, $D)),
>> (apply (G_ADD $D, $A, $B),
>> (G_SEXT_INREG $C, $D))> {
>> let MatchStartsFrom = (roots $D);
>> };
>> def : GICombineRule<...

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

0

..._param_0
>> > )
>> > {
>> > .reg .pred %p<396>;
>> > .reg .s16 %rc<396>;
>> > .reg .s16 %rs<396>;
>> > .reg .s32 %r<396>;
>> > .reg .s64 %rl<396>;
>> > .reg .f32 %f<396>;
>> > .reg .f64 %fl<396>;
>> >
>> > mov.f64 %fl0, examples_2E_mandelbrot_2F_square_param_0;
>> > mul.f64 %fl0, %fl0, %fl0;...

How to print out float/double arguments from arg0, arg1, ...?

2006 Nov 15

0

I want to print out arguments of float and double type, such as those passed to sin(), cos(), etc. By trial and error, I came up with the following macros:
union {
    double d64;
    float f32[2];
    int64_t s64;
    int32_t s32[2];
} VALUE;
#define PRINT_F32_sparc(val) \
    VALUE.s64 = val; \
    printf("\n%s = %f\n", \
        "val", VALUE.f32[1]);
#define PRINT_F32_i386(...
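A minimal alternative sketch, assuming each argument arrives as the raw 64-bit contents of an argument slot (as the arg0, arg1, ... in the question appear to): copy the bits with memcpy instead of converting the value numerically, and take a float from the low-order 32 bits. With the union above, the low-order half is f32[1] on a big-endian machine such as SPARC and f32[0] on little-endian i386. Whether a given ABI really places a float argument in the low half of its slot is an assumption to verify per platform; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A double in a 64-bit slot: reinterpret all 64 bits, no conversion. */
static double arg_as_double(uint64_t raw)
{
    double d;
    memcpy(&d, &raw, sizeof d);
    return d;
}

/* A float in a 64-bit slot, assumed to occupy the low-order 32 bits.
 * Taking (uint32_t)raw selects those bits regardless of byte order. */
static float arg_as_float(uint64_t raw)
{
    uint32_t bits = (uint32_t)raw;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```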

Removing the register block in MIR

2017 Oct 21

2

The MIR format currently has a shorthand syntax for declaring vreg classes and banks in the function body, so you can write something like this:
  name: foo
  body: |
    %3:gpr(s64) = ...
rather than the much more verbose and awkward:
  name: foo
  registers:
    - { id: 3, class: gpr }
  body: |
    %3(s64) = ...
I'd like to make this shorthand the only way to do this. There are a few things that need to be handled here:
- We should only print the class on defs, not...
