thr3ads.net - Nouveau - [Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM [Jun 2015]

If this information is useful, please help other people find it:
Share via:

Ilia Mirkin

2015-Jun-22 20:52 UTC

[Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

If query_create fails, why would any of these functions get called?

On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:> This may happen when nv50_query_create() fails to create a new query.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
> index 55fcac8..1162110 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
> @@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct
nv50_query *q, int size)
>  static void
>  nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
>  {
> +   if (!pq)
> +      return;
> +
>     nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
>     nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
>     FREE(nv50_query(pq));
> @@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
>     struct nouveau_pushbuf *push = nv50->base.pushbuf;
>     struct nv50_query *q = nv50_query(pq);
>
> +   if (!pq)
> +      return FALSE;
> +
>     /* For occlusion queries we have to change the storage, because a
previous
>      * query might set the initial render conition to FALSE even *after* we
re-
>      * initialized it to TRUE.
> @@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct
pipe_query *pq)
>     struct nouveau_pushbuf *push = nv50->base.pushbuf;
>     struct nv50_query *q = nv50_query(pq);
>
> +   if (!pq)
> +      return;
> +
>     q->state = NV50_QUERY_STATE_ENDED;
>
>     switch (q->type) {
> @@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
>     uint64_t *res64 = (uint64_t *)result;
>     uint32_t *res32 = (uint32_t *)result;
>     boolean *res8 = (boolean *)result;
> -   uint64_t *data64 = (uint64_t *)q->data;
> +   uint64_t *data64;
>     int i;
>
> +   if (!pq)
> +      return FALSE;
> +
>     if (q->state != NV50_QUERY_STATE_READY)
>        nv50_query_update(q);
>
> @@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
>     }
>     q->state = NV50_QUERY_STATE_READY;
>
> +   data64 = (uint64_t *)q->data;
>     switch (q->type) {
>     case PIPE_QUERY_GPU_FINISHED:
>        res8[0] = TRUE;
> --
> 2.4.4
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 0/8] nv50: expose global performance counters

Hello there,

This series exposes NVIDIA's global performance counters for Tesla through
the
Gallium's HUD and the GL_AMD_performance_monitor extension.

This adds support for 24 hardware events which have been reverse engineered
with PerfKit (Windows) and CUPTI (Linux). These hardware events will allow
developers to profile OpenGL applications.

To reduce latency and to improve accuracy, these global performance counters
are tied to the command stream of the GPU using a set of software methods
instead of ioctls. Results are then written by the kernel to a mapped notifier
buffer object that allows the userspace to read back them.

However, the libdrm branch which implements the new nvif interface exposed by
Nouveau and the software methods interface are not upstream yet. I hope this
should done in the next days.

The code of this series can be found here:
http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nouveau_perfmon

The libdrm branch can be found here:
http://cgit.freedesktop.org/~hakzsam/drm/log/?h=nouveau_perfmon

The code of the software methods interface can be found here (two last commits):
http://cgit.freedesktop.org/~hakzsam/nouveau/log/?h=nouveau_perfmon

An other series which exposes global performance counters for Fermi and Kepler
will be submitted once I have got enough reviews for this one.

Feel free to make a review.

Thanks,
Samuel.

Samuel Pitoiset (8):
  nouveau: implement the nvif hardware performance counters interface
  nv50: allocate a software object class
  nv50: allocate and map a notifier buffer object for PM
  nv50: configure the ring buffer for reading back PM counters
  nv50: prevent NULL pointer dereference with pipe_query functions
  nv50: add support for compute/graphics global performance counters
  nv50: expose global performance counters to the HUD
  nv50: enable GL_AMD_performance_monitor

 src/gallium/drivers/nouveau/Makefile.sources   |    2 +
 src/gallium/drivers/nouveau/nouveau_perfmon.c  |  302 +++++++
 src/gallium/drivers/nouveau/nouveau_perfmon.h  |   59 ++
 src/gallium/drivers/nouveau/nouveau_screen.c   |    5 +
 src/gallium/drivers/nouveau/nouveau_screen.h   |    1 +
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1148 +++++++++++++++++++++++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |   49 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   51 ++
 src/gallium/drivers/nouveau/nv50/nv50_winsys.h |    1 +
 9 files changed, 1612 insertions(+), 6 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.c
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.h

-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 1/8] nouveau: implement the nvif hardware performance counters interface

This commit implements the base interface for hardware performance
counters that will be shared between nv50 and nvc0 drivers.

TODO: Bump libdrm version of mesa when nvif will be merged.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/Makefile.sources  |   2 +
 src/gallium/drivers/nouveau/nouveau_perfmon.c | 302 ++++++++++++++++++++++++++
 src/gallium/drivers/nouveau/nouveau_perfmon.h |  59 +++++
 src/gallium/drivers/nouveau/nouveau_screen.c  |   5 +
 src/gallium/drivers/nouveau/nouveau_screen.h  |   1 +
 5 files changed, 369 insertions(+)
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.c
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources
b/src/gallium/drivers/nouveau/Makefile.sources
index 3fae3bc..3da0bdc 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -10,6 +10,8 @@ C_SOURCES := \
 	nouveau_heap.h \
 	nouveau_mm.c \
 	nouveau_mm.h \
+	nouveau_perfmon.c \
+	nouveau_perfmon.h \
 	nouveau_screen.c \
 	nouveau_screen.h \
 	nouveau_statebuf.h \
diff --git a/src/gallium/drivers/nouveau/nouveau_perfmon.c
b/src/gallium/drivers/nouveau/nouveau_perfmon.c
new file mode 100644
index 0000000..3798612
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nouveau_perfmon.c
@@ -0,0 +1,302 @@
+/*
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
"Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <errno.h>
+
+#include "util/u_memory.h"
+
+#include "nouveau_debug.h"
+#include "nouveau_winsys.h"
+#include "nouveau_perfmon.h"
+
+static int
+nouveau_perfmon_query_sources(struct nouveau_perfmon *pm,
+                              struct nouveau_perfmon_dom *dom,
+                              struct nouveau_perfmon_sig *sig)
+{
+	struct nvif_perfmon_query_source_v0 args = {};
+
+	args.domain = dom->id;
+	args.signal = sig->signal;
+	do {
+		uint8_t prev_iter = args.iter;
+		struct nouveau_perfmon_src *src;
+		int ret;
+
+		ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SOURCE,
+				                    &args, sizeof(args));
+		if (ret)
+			return ret;
+
+		if (prev_iter) {
+			args.iter = prev_iter;
+			ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SOURCE,
+					                    &args, sizeof(args));
+			if (ret)
+				return ret;
+
+			src = CALLOC_STRUCT(nouveau_perfmon_src);
+			if (!src)
+				return -ENOMEM;
+
+#if 0
+			debug_printf("id   = %d\n", args.source);
+			debug_printf("name = %s\n", args.name);
+			debug_printf("mask = %08x\n", args.mask);
+			debug_printf("\n");
+#endif
+
+		   src->id = args.source;
+         strncpy(src->name, args.name, sizeof(src->name));
+			list_addtail(&src->head, &sig->sources);
+		}
+	} while (args.iter != 0xff);
+
+	return 0;
+}
+
+static int
+nouveau_perfmon_query_signals(struct nouveau_perfmon *pm,
+                              struct nouveau_perfmon_dom *dom)
+{
+   struct nvif_perfmon_query_signal_v0 args = {};
+
+   args.domain = dom->id;
+   do {
+      uint16_t prev_iter = args.iter;
+      struct nouveau_perfmon_sig *sig;
+      int ret;
+
+      ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SIGNAL,
+                                &args, sizeof(args));
+      if (ret)
+         return ret;
+
+      if (prev_iter) {
+         args.iter = prev_iter;
+         ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SIGNAL,
+                                   &args, sizeof(args));
+         if (ret)
+            return ret;
+
+         sig = CALLOC_STRUCT(nouveau_perfmon_sig);
+         if (!sig)
+            return -ENOMEM;
+         list_inithead(&sig->sources);
+
+#if 0
+         debug_printf("name      = %s\n", args.name);
+         debug_printf("signal    = 0x%02x\n", args.signal);
+         debug_printf("source_nr = %d\n", args.source_nr);
+         debug_printf("\n");
+#endif
+
+         sig->signal = args.signal;
+         strncpy(sig->name, args.name, sizeof(sig->name));
+         list_addtail(&sig->head, &dom->signals);
+
+         /* Query all available sources for this signal. */
+         if (args.source_nr > 0) {
+            ret = nouveau_perfmon_query_sources(pm, dom, sig);
+            if (ret)
+               return ret;
+         }
+      }
+   } while (args.iter != 0xffff);
+
+   return 0;
+}
+
+static int
+nouveau_perfmon_query_domains(struct nouveau_perfmon *pm)
+{
+   struct nvif_perfmon_query_domain_v0 args = {};
+
+   do {
+      uint8_t prev_iter = args.iter;
+      struct nouveau_perfmon_dom *dom;
+      int ret;
+
+      ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_DOMAIN,
+                                &args, sizeof(args));
+      if (ret)
+         return ret;
+
+      if (prev_iter) {
+         args.iter = prev_iter;
+         ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_DOMAIN,
+                                   &args, sizeof(args));
+         if (ret)
+            return ret;
+
+         dom = CALLOC_STRUCT(nouveau_perfmon_dom);
+         if (!dom)
+            return -ENOMEM;
+         list_inithead(&dom->signals);
+
+#if 0
+         debug_printf("id         = %d\n", args.id);
+         debug_printf("name       = %s\n", args.name);
+         debug_printf("counter_nr = %d\n", args.counter_nr);
+         debug_printf("signal_nr  = %d\n", args.signal_nr);
+         debug_printf("\n");
+#endif
+
+         dom->id              = args.id;
+         dom->max_active_cntr = args.counter_nr;
+         strncpy(dom->name, args.name, sizeof(dom->name));
+         list_addtail(&dom->head, &pm->domains);
+
+         /* Query all available signals for this domain. */
+         if (args.signal_nr > 0) {
+            ret = nouveau_perfmon_query_signals(pm, dom);
+            if (ret)
+               return ret;
+         }
+      }
+   } while (args.iter != 0xff);
+
+   return 0;
+}
+
+static void
+nouveau_perfmon_free_sources(struct nouveau_perfmon_sig *sig)
+{
+   struct nouveau_perfmon_src *src, *next;
+
+   LIST_FOR_EACH_ENTRY_SAFE(src, next, &sig->sources, head) {
+      list_del(&src->head);
+      free(src);
+   }
+}
+
+static void
+nouveau_perfmon_free_signals(struct nouveau_perfmon_dom *dom)
+{
+   struct nouveau_perfmon_sig *sig, *next;
+
+   LIST_FOR_EACH_ENTRY_SAFE(sig, next, &dom->signals, head) {
+      nouveau_perfmon_free_sources(sig);
+      list_del(&sig->head);
+      free(sig);
+   }
+}
+
+static void
+nouveau_perfmon_free_domains(struct nouveau_perfmon *pm)
+{
+   struct nouveau_perfmon_dom *dom, *next;
+
+   LIST_FOR_EACH_ENTRY_SAFE(dom, next, &pm->domains, head) {
+      nouveau_perfmon_free_signals(dom);
+      list_del(&dom->head);
+      free(dom);
+   }
+}
+
+struct nouveau_perfmon *
+nouveau_perfmon_create(struct nouveau_device *dev)
+{
+   struct nouveau_perfmon *pm = NULL;
+   int ret;
+
+   pm = CALLOC_STRUCT(nouveau_perfmon);
+   if (!pm) {
+       NOUVEAU_ERR("Failed to allocate pm struct!\n");
+       return NULL;
+   }
+   list_inithead(&pm->domains);
+
+   /* init handle for perfdom objects */
+   pm->handle = 0xbeef9751;
+
+   ret = nouveau_object_new(&dev->object, 0xbeef9750,
+                            NVIF_IOCTL_NEW_V0_PERFMON, NULL, 0,
&pm->object);
+   if (ret)
+      goto fail;
+
+   /* Query all available domains, signals and sources for this device. */
+   ret = nouveau_perfmon_query_domains(pm);
+   if (ret)
+      goto fail;
+
+   return pm;
+
+fail:
+   nouveau_perfmon_destroy(pm);
+   return NULL;
+}
+
+void
+nouveau_perfmon_destroy(struct nouveau_perfmon *pm)
+{
+   if (!pm)
+      return;
+
+   nouveau_perfmon_free_domains(pm);
+   nouveau_object_del(&pm->object);
+   FREE(pm);
+}
+
+struct nouveau_perfmon_dom *
+nouveau_perfmon_get_dom_by_id(struct nouveau_perfmon *pm, uint8_t dom_id)
+{
+   struct nouveau_perfmon_dom *dom;
+
+   if (pm) {
+      LIST_FOR_EACH_ENTRY(dom, &pm->domains, head) {
+         if (dom->id == dom_id)
+            return dom;
+      }
+   }
+   return NULL;
+}
+
+struct nouveau_perfmon_sig *
+nouveau_perfmon_get_sig_by_name(struct nouveau_perfmon_dom *dom,
+                                const char *name)
+{
+   struct nouveau_perfmon_sig *sig;
+
+   if (dom) {
+      LIST_FOR_EACH_ENTRY(sig, &dom->signals, head) {
+         if (!strcmp(sig->name, name))
+            return sig;
+      }
+   }
+   return NULL;
+}
+
+struct nouveau_perfmon_src *
+nouveau_perfmon_get_src_by_name(struct nouveau_perfmon_sig *sig,
+                                const char *name)
+{
+   struct nouveau_perfmon_src *src;
+
+   if (sig) {
+      LIST_FOR_EACH_ENTRY(src, &sig->sources, head) {
+         if (!strcmp(src->name, name))
+            return src;
+      }
+   }
+   return NULL;
+}
diff --git a/src/gallium/drivers/nouveau/nouveau_perfmon.h
b/src/gallium/drivers/nouveau/nouveau_perfmon.h
new file mode 100644
index 0000000..49dcad3
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nouveau_perfmon.h
@@ -0,0 +1,59 @@
+#ifndef __NOUVEAU_PERFMON_H__
+#define __NOUVEAU_PERFMON_H__
+
+#include <drm.h>
+#include <nouveau_class.h>
+#include <nouveau_ioctl.h>
+
+#include "util/list.h"
+
+struct nouveau_perfmon
+{
+   struct nouveau_object *object;
+   struct list_head domains;
+   uint64_t handle;
+};
+
+struct nouveau_perfmon_dom
+{
+   struct list_head head;
+   struct list_head signals;
+   uint8_t id;
+   char name[64];
+   uint8_t max_active_cntr;
+   uint8_t num_active_cntr;
+};
+
+struct nouveau_perfmon_sig
+{
+   struct list_head head;
+   struct list_head sources;
+   uint8_t signal;
+   char name[64];
+};
+
+struct nouveau_perfmon_src
+{
+   struct list_head head;
+   uint8_t id;
+   char name[64];
+};
+
+struct nouveau_perfmon *
+nouveau_perfmon_create(struct nouveau_device *dev);
+
+void
+nouveau_perfmon_destroy(struct nouveau_perfmon *pm);
+
+struct nouveau_perfmon_dom *
+nouveau_perfmon_get_dom_by_id(struct nouveau_perfmon *pm, uint8_t dom_id);
+
+struct nouveau_perfmon_sig *
+nouveau_perfmon_get_sig_by_name(struct nouveau_perfmon_dom *dom,
+                                const char *name);
+
+struct nouveau_perfmon_src *
+nouveau_perfmon_get_src_by_name(struct nouveau_perfmon_sig *sig,
+                                const char *name);
+
+#endif /* __NOUVEAU_PERFMON_H__ */
diff --git a/src/gallium/drivers/nouveau/nouveau_screen.c
b/src/gallium/drivers/nouveau/nouveau_screen.c
index c6e5074..3c14a77 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.c
+++ b/src/gallium/drivers/nouveau/nouveau_screen.c
@@ -21,6 +21,7 @@
 #include "nouveau_fence.h"
 #include "nouveau_mm.h"
 #include "nouveau_buffer.h"
+#include "nouveau_perfmon.h"
 
 /* XXX this should go away */
 #include "state_tracker/drm_driver.h"
@@ -226,6 +227,8 @@ nouveau_screen_init(struct nouveau_screen *screen, struct
nouveau_device *dev)
 					    NOUVEAU_BO_GART | NOUVEAU_BO_MAP,
 					    &mm_config);
 	screen->mm_VRAM = nouveau_mm_create(dev, NOUVEAU_BO_VRAM, &mm_config);
+
+	screen->perfmon = nouveau_perfmon_create(dev);
 	return 0;
 }
 
@@ -235,6 +238,8 @@ nouveau_screen_fini(struct nouveau_screen *screen)
 	nouveau_mm_destroy(screen->mm_GART);
 	nouveau_mm_destroy(screen->mm_VRAM);
 
+	nouveau_perfmon_destroy(screen->perfmon);
+
 	nouveau_pushbuf_del(&screen->pushbuf);
 
 	nouveau_client_del(&screen->client);
diff --git a/src/gallium/drivers/nouveau/nouveau_screen.h
b/src/gallium/drivers/nouveau/nouveau_screen.h
index 30041b2..fd7ecdb 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.h
+++ b/src/gallium/drivers/nouveau/nouveau_screen.h
@@ -21,6 +21,7 @@ struct nouveau_screen {
 	struct nouveau_object *channel;
 	struct nouveau_client *client;
 	struct nouveau_pushbuf *pushbuf;
+	struct nouveau_perfmon *perfmon;
 
 	int refcount;
 
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 2/8] nv50: allocate a software object class

This will allow to monitor global performance counters through the
command stream of the GPU instead of using ioctls.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 11 +++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_winsys.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 6583a35..c985344 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -367,6 +367,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen)
    nouveau_object_del(&screen->eng2d);
    nouveau_object_del(&screen->m2mf);
    nouveau_object_del(&screen->sync);
+   nouveau_object_del(&screen->sw);
 
    nouveau_screen_fini(&screen->base);
 
@@ -437,6 +438,9 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
    BEGIN_NV04(push, SUBC_3D(NV01_SUBCHAN_OBJECT), 1);
    PUSH_DATA (push, screen->tesla->handle);
 
+   BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1);
+   PUSH_DATA (push, screen->sw->handle);
+
    BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
    PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
 
@@ -768,6 +772,13 @@ nv50_screen_create(struct nouveau_device *dev)
       goto fail;
    }
 
+   ret = nouveau_object_new(chan, 0xbeef506e, 0x506e,
+                            NULL, 0, &screen->sw);
+   if (ret) {
+      NOUVEAU_ERR("Failed to allocate SW object: %d\n", ret);
+      goto fail;
+   }
+
    ret = nouveau_object_new(chan, 0xbeef5039, NV50_M2MF_CLASS,
                             NULL, 0, &screen->m2mf);
    if (ret) {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 881051b..69fdfdb 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -93,6 +93,7 @@ struct nv50_screen {
    struct nouveau_object *tesla;
    struct nouveau_object *eng2d;
    struct nouveau_object *m2mf;
+   struct nouveau_object *sw;
 };
 
 static INLINE struct nv50_screen *
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
index e8578c8..5cb33ef 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
@@ -60,6 +60,7 @@ PUSH_REFN(struct nouveau_pushbuf *push, struct nouveau_bo *bo,
uint32_t flags)
 #define SUBC_COMPUTE(m) 6, (m)
 #define NV50_COMPUTE(n) SUBC_COMPUTE(NV50_COMPUTE_##n)
 
+#define SUBC_SW(m) 7, (m)
 
 static INLINE uint32_t
 NV50_FIFO_PKHDR(int subc, int mthd, unsigned size)
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

This notifier buffer object will be used to read back global performance
counters results written by the kernel.

For each domain, we will store the handle of the perfdom object, an
array of 4 counters and the number of cycles. Like the Gallium's HUD,
we keep a list of busy queries in a ring in order to prevent stalls
when reading queries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 29 ++++++++++++++++++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 ++++++
 2 files changed, 35 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index c985344..3a99cc8 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -368,6 +368,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen)
    nouveau_object_del(&screen->m2mf);
    nouveau_object_del(&screen->sync);
    nouveau_object_del(&screen->sw);
+   nouveau_object_del(&screen->query);
 
    nouveau_screen_fini(&screen->base);
 
@@ -699,9 +700,11 @@ nv50_screen_create(struct nouveau_device *dev)
    struct nv50_screen *screen;
    struct pipe_screen *pscreen;
    struct nouveau_object *chan;
+   struct nv04_fifo *fifo;
    uint64_t value;
    uint32_t tesla_class;
    unsigned stack_size;
+   uint32_t length;
    int ret;
 
    screen = CALLOC_STRUCT(nv50_screen);
@@ -727,6 +730,7 @@ nv50_screen_create(struct nouveau_device *dev)
    screen->base.pushbuf->rsvd_kick = 5;
 
    chan = screen->base.channel;
+   fifo = chan->data;
 
    pscreen->destroy = nv50_screen_destroy;
    pscreen->context_create = nv50_create;
@@ -772,6 +776,23 @@ nv50_screen_create(struct nouveau_device *dev)
       goto fail;
    }
 
+   /* Compute size (in bytes) of the notifier buffer object which is used
+    * in order to read back global performance counters results written
+    * by the kernel. For each domain, we store the handle of the perfdom
+    * object, an array of 4 counters and the number of cycles. Like for
+    * the Gallium's HUD, we keep a list of busy queries in a ring in order
+    * to prevent stalls when reading queries. */
+   length = (1 + (NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6) *
+      NV50_HW_PM_RING_BUFFER_MAX_QUERIES) * 4;
+
+   ret = nouveau_object_new(chan, 0xbeef0302, NOUVEAU_NOTIFIER_CLASS,
+                            &(struct nv04_notify){ .length = length },
+                            sizeof(struct nv04_notify), &screen->query);
+   if (ret) {
+       NOUVEAU_ERR("Failed to allocate notifier object for PM: %d\n",
ret);
+       goto fail;
+   }
+
    ret = nouveau_object_new(chan, 0xbeef506e, 0x506e,
                             NULL, 0, &screen->sw);
    if (ret) {
@@ -845,6 +866,14 @@ nv50_screen_create(struct nouveau_device *dev)
    nouveau_heap_init(&screen->gp_code_heap, 0, 1 <<
NV50_CODE_BO_SIZE_LOG2);
    nouveau_heap_init(&screen->fp_code_heap, 0, 1 <<
NV50_CODE_BO_SIZE_LOG2);
 
+   ret = nouveau_bo_wrap(screen->base.device, fifo->notify,
&screen->notify_bo);
+   if (ret == 0)
+      nouveau_bo_map(screen->notify_bo, 0, screen->base.client);
+   if (ret) {
+      NOUVEAU_ERR("Failed to map notifier object for PM: %d\n", ret);
+      goto fail;
+   }
+
    nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value);
 
    screen->TPs = util_bitcount(value & 0xffff);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 69fdfdb..71a5247 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -59,6 +59,7 @@ struct nv50_screen {
    struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
    struct nouveau_bo *stack_bo;
    struct nouveau_bo *tls_bo;
+   struct nouveau_bo *notify_bo;
 
    unsigned TPs;
    unsigned MPsInTP;
@@ -89,6 +90,7 @@ struct nv50_screen {
    } fence;
 
    struct nouveau_object *sync;
+   struct nouveau_object *query;
 
    struct nouveau_object *tesla;
    struct nouveau_object *eng2d;
@@ -96,6 +98,10 @@ struct nv50_screen {
    struct nouveau_object *sw;
 };
 
+/* Parameters of the ring buffer used to read back global PM counters. */
+#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8
+#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9 /* HUD_NUM_QUERIES + 1 */
+
 static INLINE struct nv50_screen *
 nv50_screen(struct pipe_screen *screen)
 {
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

To write data at the right offset, the kernel has to know some
parameters of this ring buffer, like the number of domains and the
maximum number of queries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 3a99cc8..53817c0 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -441,6 +441,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 
    BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1);
    PUSH_DATA (push, screen->sw->handle);
+   BEGIN_NV04(push, SUBC_SW(0x0190), 1);
+   PUSH_DATA (push, screen->query->handle);
+   // XXX: Maybe add a check for DRM version here ?
+   BEGIN_NV04(push, SUBC_SW(0x0600), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_MAX_QUERIES);
+   BEGIN_NV04(push, SUBC_SW(0x0604), 1);
+   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_NUM_DOMAINS);
 
    BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
    PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

This may happen when nv50_query_create() fails to create a new query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 55fcac8..1162110 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct
nv50_query *q, int size)
 static void
 nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
 {
+   if (!pq)
+      return;
+
    nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
    nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
    FREE(nv50_query(pq));
@@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
    struct nouveau_pushbuf *push = nv50->base.pushbuf;
    struct nv50_query *q = nv50_query(pq);
 
+   if (!pq)
+      return FALSE;
+
    /* For occlusion queries we have to change the storage, because a previous
     * query might set the initial render conition to FALSE even *after* we re-
     * initialized it to TRUE.
@@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query
*pq)
    struct nouveau_pushbuf *push = nv50->base.pushbuf;
    struct nv50_query *q = nv50_query(pq);
 
+   if (!pq)
+      return;
+
    q->state = NV50_QUERY_STATE_ENDED;
 
    switch (q->type) {
@@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
    uint64_t *res64 = (uint64_t *)result;
    uint32_t *res32 = (uint32_t *)result;
    boolean *res8 = (boolean *)result;
-   uint64_t *data64 = (uint64_t *)q->data;
+   uint64_t *data64;
    int i;
 
+   if (!pq)
+      return FALSE;
+
    if (q->state != NV50_QUERY_STATE_READY)
       nv50_query_update(q);
 
@@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
    }
    q->state = NV50_QUERY_STATE_READY;
 
+   data64 = (uint64_t *)q->data;
    switch (q->type) {
    case PIPE_QUERY_GPU_FINISHED:
       res8[0] = TRUE;
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters

This commit adds support for both compute and graphics global
performance counters which have been reverse engineered with
CUPTI (Linux) and PerfKit (Windows).

Currently, only one query type can be monitored at the same time because
the Gallium's HUD doesn't fit pretty well. This will be improved later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1057 +++++++++++++++++++++++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   35 +
 2 files changed, 1087 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 1162110..b9d2914 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -27,6 +27,8 @@
 #include "nv50/nv50_context.h"
 #include "nv_object.xml.h"
 
+#include "nouveau_perfmon.h"
+
 #define NV50_QUERY_STATE_READY   0
 #define NV50_QUERY_STATE_ACTIVE  1
 #define NV50_QUERY_STATE_ENDED   2
@@ -51,10 +53,25 @@ struct nv50_query {
    boolean is64bit;
    struct nouveau_mm_allocation *mm;
    struct nouveau_fence *fence;
+   struct nouveau_object *perfdom;
 };
 
 #define NV50_QUERY_ALLOC_SPACE 256
 
+#ifdef DEBUG
+static void nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args);
+#endif
+
+static boolean
+nv50_hw_pm_query_create(struct nv50_context *, struct nv50_query *);
+static void
+nv50_hw_pm_query_destroy(struct nv50_context *, struct nv50_query *);
+static boolean
+nv50_hw_pm_query_begin(struct nv50_context *, struct nv50_query *);
+static void nv50_hw_pm_query_end(struct nv50_context *, struct nv50_query *);
+static boolean nv50_hw_pm_query_result(struct nv50_context *,
+                                    struct nv50_query *, boolean, void *);
+
 static INLINE struct nv50_query *
 nv50_query(struct pipe_query *pipe)
 {
@@ -96,12 +113,18 @@ nv50_query_allocate(struct nv50_context *nv50, struct
nv50_query *q, int size)
 static void
 nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
 {
+   struct nv50_context *nv50 = nv50_context(pipe);
+   struct nv50_query *q = nv50_query(pq);
+
    if (!pq)
       return;
 
-   nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
-   nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
-   FREE(nv50_query(pq));
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST))
+      nv50_hw_pm_query_destroy(nv50, q);
+
+   nv50_query_allocate(nv50, q, 0);
+   nouveau_fence_ref(NULL, &q->fence);
+   FREE(q);
 }
 
 static struct pipe_query *
@@ -130,6 +153,11 @@ nv50_query_create(struct pipe_context *pipe, unsigned type,
unsigned index)
       q->data -= 32 / sizeof(*q->data); /* we advance before query_begin
! */
    }
 
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
+      if (!nv50_hw_pm_query_create(nv50, q))
+         return NULL;
+   }
+
    return (struct pipe_query *)q;
 }
 
@@ -154,6 +182,7 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
    struct nv50_context *nv50 = nv50_context(pipe);
    struct nouveau_pushbuf *push = nv50->base.pushbuf;
    struct nv50_query *q = nv50_query(pq);
+   boolean ret = TRUE;
 
    if (!pq)
       return FALSE;
@@ -211,10 +240,13 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
       nv50_query_get(push, q, 0x10, 0x00005002);
       break;
    default:
+      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
+         ret = nv50_hw_pm_query_begin(nv50, q);
+      }
       break;
    }
    q->state = NV50_QUERY_STATE_ACTIVE;
-   return true;
+   return ret;
 }
 
 static void
@@ -274,7 +306,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query
*pq)
       q->state = NV50_QUERY_STATE_READY;
       break;
    default:
-      assert(0);
+      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
+         nv50_hw_pm_query_end(nv50, q);
+      }
       break;
    }
 
@@ -309,6 +343,10 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
    if (!pq)
       return FALSE;
 
+   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
+      return nv50_hw_pm_query_result(nv50, q, wait, result);
+   }
+
    if (q->state != NV50_QUERY_STATE_READY)
       nv50_query_update(q);
 
@@ -488,6 +526,1015 @@ nva0_so_target_save_offset(struct pipe_context *pipe,
    nv50_query_end(pipe, targ->pq);
 }
 
+/* === HARDWARE GLOBAL PERFORMANCE COUNTERS for NV50 === */
+
+struct nv50_hw_pm_source_cfg
+{
+   const char *name;
+   uint64_t value;
+};
+
+struct nv50_hw_pm_signal_cfg
+{
+   const char *name;
+   const struct nv50_hw_pm_source_cfg src[8];
+};
+
+struct nv50_hw_pm_counter_cfg
+{
+   uint16_t logic_op;
+   const struct nv50_hw_pm_signal_cfg sig[4];
+};
+
+enum nv50_hw_pm_query_display
+{
+   NV50_HW_PM_EVENT_DISPLAY_RAW,
+   NV50_HW_PM_EVENT_DISPLAY_RATIO,
+};
+
+enum nv50_hw_pm_query_count
+{
+   NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   NV50_HW_PM_EVENT_COUNT_B4,
+   NV50_HW_PM_EVENT_COUNT_B6,
+};
+
+struct nv50_hw_pm_event_cfg
+{
+   const char *name;
+   const char *desc;
+   enum nv50_hw_pm_query_display display;
+   enum nv50_hw_pm_query_count count;
+   uint8_t domain;
+};
+
+struct nv50_hw_pm_query_cfg
+{
+   const struct nv50_hw_pm_event_cfg *event;
+   const struct nv50_hw_pm_counter_cfg ctr[4];
+};
+
+#define SRC(name, val) { name, val }
+#define SIG(name, ...) { name, { __VA_ARGS__ } }
+#define CTR(func, ...) { func, { __VA_ARGS__ } }
+
+/*
+ * GPU
+ */
+/* gpu_idle */
+static const struct nv50_hw_pm_event_cfg
+nv50_gpu_idle_event +{
+   .name    = "gpu_idle",
+   .desc    = "The \% of time the GPU is idle/busy since the last call.
"
+              "Having the GPU idle at all is a waste of valuable
resources. "
+              "You want to balance the GPU and CPU workloads so that no
one "
+              "processor is starved for work. Time management or using
"
+              "multithreading in your application can help balance CPU
based "
+              "tasks (world management, etc.) with the rendering
pipeline.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_gpu_idle_query +{
+   .event  = &nv50_gpu_idle_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_gr_idle")),
+};
+
+/*
+ * INPUT ASSEMBLER
+ */
+/* input_assembler_busy */
+static const struct nv50_hw_pm_event_cfg
+nv50_ia_busy_event +{
+   .name    = "input_assembler_busy",
+   .desc    = "The \% of time the input assembler unit is busy. This is
mainly "
+              "impacted by both the number of vertices processed as well
as "
+              "the size of the attributes on those vertices. You can
optimize "
+              "this by reducing vertex size as much as possible and using
"
+              "indexed primitives to take advantage of the vertex
cache.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_ia_busy_query +{
+   .event   = &nv50_ia_busy_event,
+   .ctr[0]  = CTR(0xf888, SIG("pc01_vfetch_18",
+                              SRC("pgraph_vfetch_unk0c_unk0", 0x1)),
+                          SIG("pc01_vfetch_17"),
+                          SIG("pc01_vfetch_03"),
+                          SIG("pc01_vfetch_02")),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nva0_ia_busy_query +{
+   .event   = &nv50_ia_busy_event,
+   .ctr[0]  = CTR(0xf888, SIG("pc01_vfetch_15",
+                              SRC("pgraph_vfetch_unk0c_unk0", 0x1)),
+                          SIG("pc01_vfetch_14"),
+                          SIG("pc01_vfetch_03"),
+                          SIG("pc01_vfetch_02")),
+};
+
+/* input_assembler_waits_for_fb */
+static const struct nv50_hw_pm_event_cfg
+nv50_ia_waits_for_fb_event = {
+   .name    = "input_assembler_waits_for_fb",
+   .desc    = "This is the amount of time the input assembler unit was
"
+              "waiting for data from the frame buffer unit.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_ia_waits_for_fb_query +{
+   .event   = &nv50_ia_waits_for_fb_event,
+   .ctr[0]  = CTR(0xaaaa, SIG("pc01_vfetch_0e",
+                              SRC("pgraph_vfetch_unk0c_unk0", 0x1))),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nva0_ia_waits_for_fb_query +{
+   .event   = &nv50_ia_waits_for_fb_event,
+   .ctr[0]  = CTR(0xaaaa, SIG("pc01_vfetch_0b",
+                              SRC("pgraph_vfetch_unk0c_unk0", 0x1))),
+};
+
+/* vertex_attribute_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_vertex_attr_count_event +{
+   .name    = "vertex_attribute_count",
+   .desc    = "The number of vertex attributes that are fetched and passed
to "
+              "the geometry unit is returned in this counter. A large
number "
+              "of attributes (or unaligned vertices) can hurt vertex cache
"
+              "performance and reduce the overall vertex processing "
+              "capabilities of the pipeline.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_vertex_attr_count_query +{
+   .event = &nv50_vertex_attr_count_event,
+   .ctr[0] = CTR(0xf888, SIG("pc01_vfetch_18",
+                             SRC("pgraph_vfetch_unk0c_unk0", 0x1)),
+                         SIG("pc01_vfetch_17"),
+                         SIG("pc01_vfetch_03"),
+                         SIG("pc01_vfetch_02")),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nva0_vertex_attr_count_query +{
+   .event  = &nv50_vertex_attr_count_event,
+   .ctr[0] = CTR(0xf888, SIG("pc01_vfetch_15",
+                             SRC("pgraph_vfetch_unk0c_unk0", 0x1)),
+                         SIG("pc01_vfetch_14"),
+                         SIG("pc01_vfetch_03"),
+                         SIG("pc01_vfetch_02")),
+};
+
+/*
+ * GEOM
+ */
+/* geom_vertex_in_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_geom_vertex_in_count_event +{
+   .name    = "geom_vertex_in_count",
+   .desc    = "The number of vertices input to the geom unit.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_B4,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_geom_vertex_in_count_query +{
+   .event  = &nv50_geom_vertex_in_count_event,
+   .ctr[1] = CTR(0xffff, SIG("pc01_vfetch_0e",
+                             SRC("pgraph_vfetch_unk0c_unk0", 0x0)),
+                         SIG("pc01_vfetch_0f"),
+                         SIG("pc01_vfetch_10"),
+                         SIG("pc01_trailer")),
+   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
+                         SIG("pc01_trailer"),
+                         SIG("pc01_trailer"),
+                         SIG("pc01_trailer")),
+};
+
+/* geom_vertex_out_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_geom_vertex_out_count_event +{
+   .name    = "geom_vertex_out_count",
+   .desc    = "The number of vertices coming out of the geom unit after
any "
+              "geometry shader expansion.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_geom_vertex_out_count_query +{
+   .event  = &nv50_geom_vertex_out_count_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_vattr_01")),
+};
+
+/* geom_primitive_in_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_geom_primitive_in_count_event +{
+   .name    = "geom_primitive_in_count",
+   .desc    = "The number of primitives input to the geom unit.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_geom_primitive_in_count_query +{
+   .event  = &nv50_geom_primitive_in_count_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_vfetch_08",
+                             SRC("pgraph_vfetch_unk0c_unk0", 0x0))),
+};
+
+/* geom_primitive_out_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_geom_primitive_out_count_event +{
+   .name    = "geom_primitive_out_count",
+   .desc    = "The number of primitives coming out the geom unit after any
"
+              "geometry shader expansion.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_geom_primitive_out_count_query +{
+   .event  = &nv50_geom_primitive_out_count_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_vattr_00")),
+};
+
+/*
+ * STREAM OUT
+ */
+/* stream_out_busy */
+static const struct nv50_hw_pm_event_cfg
+nv50_so_busy_event +{
+   .name    = "stream_out_busy",
+   .desc    = "This unit manages the writing of vertices to the frame
buffer "
+              "when using stream out. If a significant number of vertices
are "
+              "written, this can become a bottleneck.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_so_busy_query +{
+   .event  = &nv50_so_busy_event,
+   .ctr[0] = CTR(0x8888, SIG("pc01_strmout_00"),
+                         SIG("pc01_strmout_01")),
+};
+
+/*
+ * SETUP
+ */
+/* setup_primitive_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_setup_primitive_count_event +{
+   .name    = "setup_primitive_count",
+   .desc    = "Returns the number of primitives processed in the geometry
"
+              "subsystem. This experiments counts points, lines and
triangles. "
+              "To count only triangles, use the setup_triangle_count
counter. "
+              "Balance these counts with the number of pixels being drawn
to "
+              "see if you could simplify your geometry and use "
+              "bump/displacement maps, for example.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_setup_primitive_count_query +{
+   .event  = &nv50_setup_primitive_count_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_trast_00")),
+};
+
+/* setup_point_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_setup_point_count_event +{
+   .name    = "setup_point_count",
+   .desc    = "The number of points seen by the primitive setup unit (just
"
+              "before rasterization).",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_setup_point_count_query +{
+   .event  = &nv50_setup_point_count_event,
+   .ctr[0] = CTR(0x8080, SIG("pc01_trast_01"),
+                         SIG("pc01_trast_04"),
+                         SIG("pc01_trast_05")),
+};
+
+/* setup_line_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_setup_line_count_event +{
+   .name    = "setup_line_count",
+   .desc    = "The number of lines seen by the primitive setup unit (just
"
+              "before rasterization).",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_setup_line_count_query +{
+   .event  = &nv50_setup_line_count_event,
+   .ctr[0] = CTR(0x8080, SIG("pc01_trast_02"),
+                         SIG("pc01_trast_04"),
+                         SIG("pc01_trast_05")),
+};
+
+/* setup_triangle_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_setup_triangle_count_event +{
+   .name    = "setup_triangle_count",
+   .desc    = "Returns the number of triangles processed in the geometry
"
+              "subsystem.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_setup_triangle_count_query +{
+   .event  = &nv50_setup_triangle_count_event,
+   .ctr[0] = CTR(0x8080, SIG("pc01_trast_03"),
+                         SIG("pc01_trast_04"),
+                         SIG("pc01_trast_05")),
+};
+
+/* setup_primitive_culled_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_setup_primitive_culled_count_event +{
+   .name    = "setup_primitive_culled_count",
+   .desc    = "Returns the number of primitives culled in primitive setup.
If "
+              "you are performing viewport culling, this gives you an
"
+              "indication of the accuracy of the algorithm being used, and
can "
+              "give you and idea if you need to improves this culling.
This "
+              "includes primitives culled when using backface culling.
Drawing "
+              "a fully visible sphere on the screen should cull half of
the "
+              "triangles if backface culling is turned on and all the
"
+              "triangles are ordered consistently (CW or CCW).",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_setup_primitive_culled_count_query +{
+   .event  = &nv50_setup_primitive_culled_count_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc01_unk00")),
+};
+
+/*
+ * RASTERIZER
+ */
+/* rast_tiles_killed_by_zcull_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_rast_tiles_killed_by_zcull_event +{
+   .name    = "rasterizer_tiles_killed_by_zcull_count",
+   .desc    = "The number of pixels killed by the zcull unit in the
rasterizer.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_B6,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rast_tiles_killed_by_zcull_query +{
+   .event  = &nv50_rast_tiles_killed_by_zcull_event,
+   .ctr[1] = CTR(0xffff, SIG("pc01_zcull_00",
+                             SRC("pgraph_zcull_pm_unka4_unk0", 0x7)),
+                         SIG("pc01_zcull_01"),
+                         SIG("pc01_zcull_02"),
+                         SIG("pc01_zcull_03")),
+   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
+                         SIG("pc01_trailer"),
+                         SIG("pc01_zcull_04"),
+                         SIG("pc01_zcull_05")),
+};
+
+/* rast_tiles_in_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_rast_tiles_in_count_event +{
+   .name    = "rasterizer_tiles_in_count",
+   .desc    = "Count of tiles (each of which contain 1-8 pixels) seen by
the "
+              "rasterizer stage.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_B6,
+   .domain  = 1,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rast_tiles_in_count_query +{
+   .event  = &nv50_rast_tiles_in_count_event,
+   .ctr[1] = CTR(0xffff, SIG("pc01_zcull_00",
+                             SRC("pgraph_zcull_pm_unka4_unk0", 0x0)),
+                         SIG("pc01_zcull_01"),
+                         SIG("pc01_zcull_02"),
+                         SIG("pc01_zcull_03")),
+   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
+                         SIG("pc01_trailer"),
+                         SIG("pc01_zcull_04"),
+                         SIG("pc01_zcull_05")),
+};
+
+/*
+ * ROP
+ */
+/* rop_busy */
+static const struct nv50_hw_pm_event_cfg
+nv50_rop_busy_event +{
+   .name    = "rop_busy",
+   .desc    = "\% of time that the ROP unit is actively doing work. "
+              "This can be high if alpha blending is turned on, of
overdraw "
+              "is high, etc.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rop_busy_query +{
+   .event  = &nv50_rop_busy_event,
+   .ctr[0] = CTR(0xf888, SIG("pc02_prop_02",
+                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x0)),
+                         SIG("pc02_prop_03"),
+                         SIG("pc02_prop_04"),
+                         SIG("pc02_prop_05")),
+};
+
+/* rop_waits_for_fb */
+static const struct nv50_hw_pm_event_cfg
+nv50_rop_waits_for_fb_event +{
+   .name    = "rop_waits_for_fb",
+   .desc    = "The amount of time the blending unit spent waiting for data
"
+              "from the frame buffer unit. If blending is enabled and
there "
+              "is a lot of traffic here (since this is a read/modify/write
"
+              "operation) this can become a bottleneck.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rop_waits_for_fb_query +{
+   .event  = &nv50_rop_waits_for_fb_event,
+   .ctr[0] = CTR(0x22f2, SIG("pc02_crop_03",
+                             SRC("pgraph_rop0_crop_pm_mux_sel0",
0x0)),
+                         SIG("pc02_crop_02"),
+                         SIG("pc02_zrop_03",
+                             SRC("pgraph_rop0_zrop_pm_mux_sel0",
0x0)),
+                         SIG("pc02_zrop_02")),
+};
+
+/* rop_waits_for_shader */
+static const struct nv50_hw_pm_event_cfg
+nv50_rop_waits_for_shader_event +{
+   .name    = "rop_waits_for_shader",
+   .desc    = "This is a measurement of how often the blending unit was
"
+              "waiting on new work (fragments to be placed into the render
"
+              "target). If the pixel shaders are particularly expensive,
the "
+              "ROP unit could be starved waiting for results.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rop_waits_for_shader_query +{
+   .event  = &nv50_rop_waits_for_shader_event,
+   .ctr[0] = CTR(0x2222, SIG("pc02_prop_6",
+                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x0)),
+                         SIG("pc02_prop_7")),
+};
+
+/* rop_samples_killed_by_earlyz_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_rop_samples_killed_by_earlyz_event +{
+   .name    = "rop_samples_killed_by_earlyz_count",
+   .desc    = "This returns the number of pixels that were killed in the
"
+              "earlyZ hardware. This signal will give you an idea of, for
"
+              "instance, a Z only pass was successful in setting up the
depth "
+              "buffer.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_B6,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rop_samples_killed_by_earlyz_query +{
+   .event  = &nv50_rop_samples_killed_by_earlyz_event,
+   .ctr[1] = CTR(0xffff, SIG("pc02_prop_00",
+                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x1a)),
+                         SIG("pc02_prop_01"),
+                         SIG("pc02_prop_02"),
+                         SIG("pc02_prop_03")),
+   .ctr[2] = CTR(0x5555, SIG("pc02_prop_07"),
+                         SIG("pc02_trailer"),
+                         SIG("pc02_prop_04"),
+                         SIG("pc02_prop_05")),
+};
+
+/* rop_samples_killed_by_latez_count */
+static const struct nv50_hw_pm_event_cfg
+nv50_rop_samples_killed_by_latez_event +{
+   .name    = "rop_samples_killed_by_latez_count",
+   .desc    = "This returns the number of pixels that were killed after
the "
+              "pixel shader ran. This can happen if the early Z is unable
to "
+              "cull the pixel because of an API setup issue like changing
the "
+              "Z direction or modifying Z in the pixel shader.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_B6,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_rop_samples_killed_by_latez_query +{
+   .event  = &nv50_rop_samples_killed_by_latez_event,
+   .ctr[1] = CTR(0xffff, SIG("pc02_prop_00",
+                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x1b)),
+                         SIG("pc02_prop_01"),
+                         SIG("pc02_prop_02"),
+                         SIG("pc02_prop_03")),
+   .ctr[2] = CTR(0x5555, SIG("pc02_prop_07"),
+                         SIG("pc02_trailer"),
+                         SIG("pc02_prop_04"),
+                         SIG("pc02_prop_05")),
+};
+
+/*
+ * TEXTURE
+ */
+/* tex_cache_miss */
+static const struct nv50_hw_pm_event_cfg
+nv50_tex_cache_miss_event +{
+   .name    = "tex_cache_miss",
+   .desc    = "Number of texture cache misses.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_tex_cache_miss_query +{
+   .event  = &nv50_tex_cache_miss_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_04",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv84_tex_cache_miss_query +{
+   .event  = &nv50_tex_cache_miss_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_04",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
+};
+
+/* tex_cache_hit */
+static const struct nv50_hw_pm_event_cfg
+nv50_tex_cache_hit_event +{
+   .name    = "tex_cache_hit",
+   .desc    = "Number of texture cache hits.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_tex_cache_hit_query +{
+   .event  = &nv50_tex_cache_hit_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_05",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv84_tex_cache_hit_query +{
+   .event  = &nv50_tex_cache_hit_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_05",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
+};
+
+/* tex_waits_for_fb */
+static const struct nv50_hw_pm_event_cfg
+nv50_tex_waits_for_fb_event +{
+   .name    = "tex_waits_for_fb",
+   .desc    = "This is the amount of time the texture unit spent waiting
on "
+              "samples to return from the frame buffer unit. It is a
potential "
+              "indication of poor texture cache utilization.",
+   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
+   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
+   .domain  = 2,
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv50_tex_waits_for_fb_query +{
+   .event  = &nv50_tex_waits_for_fb_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_06",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
+};
+
+static const struct nv50_hw_pm_query_cfg
+nv84_tex_waits_for_fb_query +{
+   .event  = &nv50_tex_waits_for_fb_event,
+   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_06",
+                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
+};
+
+static const struct nv50_hw_pm_query_cfg
*nv50_hw_pm_queries[NV50_HW_PM_QUERY_COUNT];
+
+#define _Q(n, q) nv50_hw_pm_queries[NV50_HW_PM_QUERY_##n] = &q;
+
+static void
+nv50_identify_events(struct nv50_screen *screen)
+{
+  _Q(GPU_IDLE,                      nv50_gpu_idle_query);
+  _Q(IA_BUSY,                       nv50_ia_busy_query);
+  _Q(IA_WAITS_FOR_FB,               nv50_ia_waits_for_fb_query);
+  _Q(VERTEX_ATTR_COUNT,             nv50_vertex_attr_count_query);
+  _Q(GEOM_VERTEX_IN_COUNT,          nv50_geom_vertex_in_count_query);
+  _Q(GEOM_VERTEX_OUT_COUNT,         nv50_geom_vertex_out_count_query);
+  _Q(GEOM_PRIMITIVE_IN_COUNT,       nv50_geom_primitive_in_count_query);
+  _Q(GEOM_PRIMITIVE_OUT_COUNT,      nv50_geom_primitive_out_count_query);
+  _Q(SO_BUSY,                       nv50_so_busy_query);
+  _Q(SETUP_PRIMITIVE_COUNT,         nv50_setup_primitive_count_query);
+  _Q(SETUP_POINT_COUNT,             nv50_setup_point_count_query);
+  _Q(SETUP_LINE_COUNT,              nv50_setup_line_count_query);
+  _Q(SETUP_TRIANGLE_COUNT,          nv50_setup_triangle_count_query);
+  _Q(SETUP_PRIMITIVE_CULLED_COUNT,  nv50_setup_primitive_culled_count_query);
+  _Q(RAST_TILES_KILLED_BY_ZCULL,    nv50_rast_tiles_killed_by_zcull_query);
+  _Q(RAST_TILES_IN_COUNT,           nv50_rast_tiles_in_count_query);
+  _Q(ROP_BUSY,                      nv50_rop_busy_query);
+  _Q(ROP_WAITS_FOR_FB,              nv50_rop_waits_for_fb_query);
+  _Q(ROP_WAITS_FOR_SHADER,          nv50_rop_waits_for_shader_query);
+  _Q(ROP_SAMPLES_KILLED_BY_EARLYZ,  nv50_rop_samples_killed_by_earlyz_query);
+  _Q(ROP_SAMPLES_KILLED_BY_LATEZ,   nv50_rop_samples_killed_by_latez_query );
+  _Q(TEX_CACHE_MISS,                nv50_tex_cache_miss_query);
+  _Q(TEX_CACHE_HIT,                 nv50_tex_cache_hit_query);
+  _Q(TEX_WAITS_FOR_FB,              nv50_tex_waits_for_fb_query);
+
+   if (screen->base.class_3d >= NV84_3D_CLASS) {
+      /* Variants for NV84+ */
+      _Q(TEX_CACHE_MISS,   nv84_tex_cache_miss_query);
+      _Q(TEX_CACHE_HIT,    nv84_tex_cache_hit_query);
+      _Q(TEX_WAITS_FOR_FB, nv84_tex_waits_for_fb_query);
+   }
+
+   if (screen->base.class_3d >= NVA0_3D_CLASS) {
+      /* Variants for NVA0+ */
+      _Q(IA_BUSY,           nva0_ia_busy_query);
+      _Q(IA_WAITS_FOR_FB,   nva0_ia_waits_for_fb_query);
+      _Q(VERTEX_ATTR_COUNT, nva0_vertex_attr_count_query);
+   }
+}
+
+#undef _Q
+
+#ifdef DEBUG
+static void
+nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args)
+{
+   int i, j, k;
+
+   debug_printf("PERFDOM CONFIGURATION:\n");
+   debug_printf("domaine: 0x%02x\n", args->domain);
+   debug_printf("mode: 0x%02x\n", args->mode);
+   for (i = 0; i < 4; i++) {
+      uint32_t signal = 0;
+      for (j = 0; j < 4; j++)
+         signal |= args->ctr[i].signal[j] << (j * 8);
+
+      debug_printf("ctr[%d]: func = 0x%04x, signal=0x%08x\n",
+                   i, args->ctr[i].logic_op, signal);
+
+      for (j = 0; j < 4; j++) {
+         for (k = 0; k < 8; k++) {
+            uint32_t source, value;
+            if (!args->ctr[i].source[j][k])
+               continue;
+
+            source = args->ctr[i].source[j][k];
+            value  = args->ctr[i].source[j][k] >> 32;
+            debug_printf("  src[%d][%d]: source = 0x%08x, value =
0x%08x\n",
+                         j, k, source, value);
+         }
+      }
+   }
+}
+#endif
+
+static const struct nv50_hw_pm_query_cfg *
+nv50_hw_pm_query_get_cfg(struct nv50_screen *screen, uint32_t query_type)
+{
+   return nv50_hw_pm_queries[query_type - NV50_HW_PM_QUERY(0)];
+}
+
+static boolean
+nv50_hw_pm_query_create(struct nv50_context *nv50, struct nv50_query *q)
+{
+   struct nv50_screen *screen = nv50->screen;
+   struct nouveau_perfmon *perfmon = screen->base.perfmon;
+   static const struct nv50_hw_pm_query_cfg *cfg;
+   struct nvif_perfdom_v0 args = {};
+   struct nouveau_perfmon_dom *dom;
+   int i, j, k;
+   int ret;
+
+   if (!screen->pm.num_active) {
+      /* TODO: Currently, only one query type can be monitored simultaneously
+       * because the Gallium's HUD doesn't fit well with the perfdom
interface.
+       *
+       * With two different query types, the current scenario is as follows:
+       * CREATE Q1, BEGIN Q1, CREATE Q2, BEGIN Q2, END Q1, RESULT Q1, BEGIN Q1,
+       * END Q2, RESULT Q2, BEGIN Q2, END Q1, and so on.
+       *
+       * This behaviour doesn't allow to schedule multiple counters because
+       * we have to do that at query creation (ie. when a perfdom is created).
+       *
+       * To get rid of this limitation, a better scenario would be:
+       * CREATE Q1, CREATE Q2, BEGIN Q1, BEGIN Q2, END Q1, END Q2, RESULT Q1,
+       * RESULT Q2, BEGIN Q1, BEGIN Q2, END Q1, and so on.
+       *
+       * With this kind of behaviour, we could introduce
+       * {create,begin,end}_all_queries() functions to be able to configure
+       * all queries in one shot.
+       */
+      screen->pm.query_type = q->type;
+   }
+   screen->pm.num_active++;
+
+   if (screen->pm.query_type != q->type) {
+      NOUVEAU_ERR("Only one query type can be monitored at the same
time!");
+      return FALSE;
+   }
+
+   cfg = nv50_hw_pm_query_get_cfg(nv50->screen, q->type);
+
+   dom = nouveau_perfmon_get_dom_by_id(perfmon, cfg->event->domain);
+   if (!dom) {
+      NOUVEAU_ERR("Failed to find domain %d\n",
cfg->event->domain);
+      return FALSE;
+   }
+
+   /* configure domain and counting mode */
+   args.domain = dom->id;
+   args.mode   = cfg->event->count;
+
+   /* configure counters for this hardware event */
+   for (i = 0; i < ARRAY_SIZE(cfg->ctr); i++) {
+      const struct nv50_hw_pm_counter_cfg *sctr = &cfg->ctr[i];
+
+      if (!sctr->logic_op)
+         continue;
+      args.ctr[i].logic_op = sctr->logic_op;
+
+      /* configure signals for this counter */
+      for (j = 0; j < ARRAY_SIZE(sctr->sig); j++) {
+         const struct nv50_hw_pm_signal_cfg *ssig = &sctr->sig[j];
+         struct nouveau_perfmon_sig *sig;
+
+         if (!ssig->name)
+            continue;
+
+         sig = nouveau_perfmon_get_sig_by_name(dom, ssig->name);
+         if (!sig) {
+            NOUVEAU_ERR("Failed to find signal %s\n", ssig->name);
+            return FALSE;
+         }
+         args.ctr[i].signal[j] = sig->signal;
+
+         /* configure sources for this signal */
+         for (k = 0; k < ARRAY_SIZE(ssig->src); k++) {
+            const struct nv50_hw_pm_source_cfg *ssrc = &ssig->src[k];
+            struct nouveau_perfmon_src *src;
+
+            if (!ssrc->name)
+               continue;
+
+            src = nouveau_perfmon_get_src_by_name(sig, ssrc->name);
+            if (!src) {
+               NOUVEAU_ERR("Failed to find source %s\n",
ssrc->name);
+               return FALSE;
+            }
+            args.ctr[i].source[j][k] = (ssrc->value << 32) |
src->id;
+         }
+      }
+   }
+
+#ifdef DEBUG
+   if (debug_get_num_option("NV50_PM_DEBUG", 0))
+      nv50_hw_pm_dump_perfdom(&args);
+#endif
+
+   ret = nouveau_object_new(perfmon->object, perfmon->handle++,
+                            NVIF_IOCTL_NEW_V0_PERFDOM,
+                            &args, sizeof(args), &q->perfdom);
+   if (ret) {
+      NOUVEAU_ERR("Failed to create perfdom object: %d\n", ret);
+      return FALSE;
+   }
+
+   return TRUE;
+}
+
+static void
+nv50_hw_pm_query_destroy(struct nv50_context *nv50, struct nv50_query *q)
+{
+   struct nv50_screen *screen = nv50->screen;
+
+   nouveau_object_del(&q->perfdom);
+   screen->pm.num_active--;
+}
+
+static boolean
+nv50_hw_pm_query_begin(struct nv50_context *nv50, struct nv50_query *q)
+{
+   struct nouveau_pushbuf *push = nv50->base.pushbuf;
+
+   /* start the next batch of counters */
+   PUSH_SPACE(push, 2);
+   BEGIN_NV04(push, SUBC_SW(0x0608), 1);
+   PUSH_DATA (push, q->perfdom->handle);
+
+   return TRUE;
+}
+
+static void
+nv50_hw_pm_query_end(struct nv50_context *nv50, struct nv50_query *q)
+{
+   struct nouveau_pushbuf *push = nv50->base.pushbuf;
+   struct nv50_screen *screen = nv50->screen;
+
+   /* set sequence field (used to check if result is available) */
+   q->sequence = ++screen->pm.sequence;
+
+   /* sample the previous batch of counters */
+   PUSH_SPACE(push, 2);
+   BEGIN_NV04(push, SUBC_SW(0x060c), 1);
+   PUSH_DATA (push, q->perfdom->handle);
+
+   /* read back counters values */
+   PUSH_SPACE(push, 2);
+   BEGIN_NV04(push, SUBC_SW(0x0700), 1);
+   PUSH_DATA (push, screen->pm.sequence);
+}
+
+static volatile void *
+nv50_ntfy(struct nv50_screen *screen)
+{
+   struct nv04_notify *query = screen->query->data;
+   struct nouveau_bo *notify = screen->notify_bo;
+
+   return (char *)notify->map + query->offset;
+}
+
+static INLINE uint32_t
+nv50_hw_pm_query_get_offset(struct nv50_query *q)
+{
+   return (1 + (q->sequence % NV50_HW_PM_RING_BUFFER_MAX_QUERIES) *
+           NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6);
+}
+
+static INLINE boolean
+nv50_hw_pm_query_read_data(struct nv50_context *nv50, struct nv50_query *q,
+                           boolean wait, uint32_t ctr[4], uint32_t *clk)
+{
+   volatile uint32_t *ntfy = nv50_ntfy(nv50->screen);
+   uint32_t offset = nv50_hw_pm_query_get_offset(q);
+   boolean found = FALSE;
+   int i;
+
+   while (ntfy[0] < q->sequence) {
+      if (!wait)
+         return FALSE;
+      usleep(100);
+   }
+
+   if (ntfy[0] > q->sequence + NV50_HW_PM_RING_BUFFER_MAX_QUERIES - 1)
+      return FALSE;
+
+   for (i = 0; i < NV50_HW_PM_RING_BUFFER_NUM_DOMAINS; i++) {
+      if (ntfy[offset + i * 6] == q->perfdom->handle) {
+         found = TRUE;
+         break;
+      }
+   }
+
+   if (!found) {
+      NOUVEAU_ERR("Failed to find perfdom object %" PRIu64
"!\n",
+                  q->perfdom->handle);
+      return FALSE;
+   }
+
+   for (i = 0; i < 4; i++)
+      ctr[i] = ntfy[offset + i + 1];
+   *clk = ntfy[offset + 5];
+
+   return TRUE;
+}
+
+static boolean
+nv50_hw_pm_query_result(struct nv50_context *nv50, struct nv50_query *q,
+                        boolean wait, void *result)
+{
+   struct nv50_screen *screen = nv50->screen;
+   const struct nv50_hw_pm_query_cfg *cfg;
+   uint32_t ctr[4], clk;
+   uint64_t value = 0;
+   int ret;
+
+   ret = nv50_hw_pm_query_read_data(nv50, q, wait, ctr, &clk);
+   if (!ret)
+      return FALSE;
+
+   cfg = nv50_hw_pm_query_get_cfg(screen, q->type);
+   if (cfg->event->count == NV50_HW_PM_EVENT_COUNT_SIMPLE) {
+      /* SIMPLE hardware events are sampled on PRE_CTR. */
+      value = ctr[0];
+   } else {
+      /* EVENT_B4/EVENT_B6 hardware events are sampled on EVENT_CTR. */
+      value = ctr[2];
+   }
+
+   if (cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) {
+      if (clk)
+         value = (value * 100) / (float)clk;
+   }
+
+   fprintf(stderr, "ctr[0]=%d, ctr[1]=%d, ctr[2]=%d, ctr[3]=%d, clk=%d,
val=%d\n",
+           ctr[0], ctr[1], ctr[2], ctr[3], clk, value);
+
+   *(uint64_t *)result = value;
+   return TRUE;
+}
+
 void
 nv50_init_query_functions(struct nv50_context *nv50)
 {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 71a5247..0449659 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -89,6 +89,12 @@ struct nv50_screen {
       struct nouveau_bo *bo;
    } fence;
 
+   struct {
+      uint32_t sequence;
+      uint32_t query_type;
+      uint32_t num_active;
+   } pm;
+
    struct nouveau_object *sync;
    struct nouveau_object *query;
 
@@ -108,6 +114,35 @@ nv50_screen(struct pipe_screen *screen)
    return (struct nv50_screen *)screen;
 }
 
+/* Hardware global performance counters. */
+#define NV50_HW_PM_QUERY_COUNT  24
+#define NV50_HW_PM_QUERY(i)    (PIPE_QUERY_DRIVER_SPECIFIC + (i))
+#define NV50_HW_PM_QUERY_LAST   NV50_HW_PM_QUERY(NV50_HW_PM_QUERY_COUNT - 1)
+#define NV50_HW_PM_QUERY_GPU_IDLE                            0
+#define NV50_HW_PM_QUERY_IA_BUSY                             1
+#define NV50_HW_PM_QUERY_IA_WAITS_FOR_FB                     2
+#define NV50_HW_PM_QUERY_VERTEX_ATTR_COUNT                   3
+#define NV50_HW_PM_QUERY_GEOM_VERTEX_IN_COUNT                4
+#define NV50_HW_PM_QUERY_GEOM_VERTEX_OUT_COUNT               5
+#define NV50_HW_PM_QUERY_GEOM_PRIMITIVE_IN_COUNT             6
+#define NV50_HW_PM_QUERY_GEOM_PRIMITIVE_OUT_COUNT            7
+#define NV50_HW_PM_QUERY_SO_BUSY                             8
+#define NV50_HW_PM_QUERY_SETUP_PRIMITIVE_COUNT               9
+#define NV50_HW_PM_QUERY_SETUP_POINT_COUNT                  10
+#define NV50_HW_PM_QUERY_SETUP_LINE_COUNT                   11
+#define NV50_HW_PM_QUERY_SETUP_TRIANGLE_COUNT               12
+#define NV50_HW_PM_QUERY_SETUP_PRIMITIVE_CULLED_COUNT       13
+#define NV50_HW_PM_QUERY_RAST_TILES_KILLED_BY_ZCULL         14
+#define NV50_HW_PM_QUERY_RAST_TILES_IN_COUNT                15
+#define NV50_HW_PM_QUERY_ROP_BUSY                           16
+#define NV50_HW_PM_QUERY_ROP_WAITS_FOR_FB                   17
+#define NV50_HW_PM_QUERY_ROP_WAITS_FOR_SHADER               18
+#define NV50_HW_PM_QUERY_ROP_SAMPLES_KILLED_BY_EARLYZ       19
+#define NV50_HW_PM_QUERY_ROP_SAMPLES_KILLED_BY_LATEZ        20
+#define NV50_HW_PM_QUERY_TEX_CACHE_MISS                     21
+#define NV50_HW_PM_QUERY_TEX_CACHE_HIT                      22
+#define NV50_HW_PM_QUERY_TEX_WAITS_FOR_FB                   23
+
 boolean nv50_blitter_create(struct nv50_screen *);
 void nv50_blitter_destroy(struct nv50_screen *);
 
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 7/8] nv50: expose global performance counters to the HUD

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 41 ++++++++++++++++++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  3 ++
 3 files changed, 45 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index b9d2914..062d427 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -1535,6 +1535,47 @@ nv50_hw_pm_query_result(struct nv50_context *nv50, struct
nv50_query *q,
    return TRUE;
 }
 
+int
+nv50_screen_get_driver_query_info(struct pipe_screen *pscreen,
+                                  unsigned id,
+                                  struct pipe_driver_query_info *info)
+{
+   struct nv50_screen *screen = nv50_screen(pscreen);
+   int count = 0;
+
+   // TODO: Check DRM version when nvif will be merged in libdrm!
+   if (screen->base.perfmon) {
+      nv50_identify_events(screen);
+      count += NV50_HW_PM_QUERY_COUNT;
+   }
+
+   if (!info)
+      return count;
+
+   /* Init default values. */
+   info->name = "this_is_not_the_query_you_are_looking_for";
+   info->query_type = 0xdeadd01d;
+   info->type = PIPE_DRIVER_QUERY_TYPE_UINT64;
+   info->max_value.u64 = 0;
+   info->group_id = -1;
+
+   if (id < count) {
+      if (screen->base.perfmon) {
+         const struct nv50_hw_pm_query_cfg *cfg +           
nv50_hw_pm_query_get_cfg(screen, NV50_HW_PM_QUERY(id));
+
+         info->name = cfg->event->name;
+         info->query_type = NV50_HW_PM_QUERY(id);
+         info->max_value.u64 +            (cfg->event->display ==
NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0;
+         return 1;
+      }
+   }
+
+   /* User asked for info about non-existing query. */
+   return 0;
+}
+
 void
 nv50_init_query_functions(struct nv50_context *nv50)
 {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 53817c0..f07798e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -745,6 +745,7 @@ nv50_screen_create(struct nouveau_device *dev)
    pscreen->get_param = nv50_screen_get_param;
    pscreen->get_shader_param = nv50_screen_get_shader_param;
    pscreen->get_paramf = nv50_screen_get_paramf;
+   pscreen->get_driver_query_info = nv50_screen_get_driver_query_info;
 
    nv50_screen_init_resource_functions(pscreen);
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 0449659..69127c0 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -143,6 +143,9 @@ nv50_screen(struct pipe_screen *screen)
 #define NV50_HW_PM_QUERY_TEX_CACHE_HIT                      22
 #define NV50_HW_PM_QUERY_TEX_WAITS_FOR_FB                   23
 
+int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned,
+                                      struct pipe_driver_query_info *);
+
 boolean nv50_blitter_create(struct nv50_screen *);
 void nv50_blitter_destroy(struct nv50_screen *);
 
-- 
2.4.4

Samuel Pitoiset

2015-Jun-22 20:53 UTC

head link

[Nouveau] [RFC PATCH 8/8] nv50: enable GL_AMD_performance_monitor

This exposes a group of global performance counters that enables
GL_AMD_performance_monitor. All piglit tests are okay.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
---
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 35 ++++++++++++++++++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 +++++
 3 files changed, 42 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
index 062d427..6638e82 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
@@ -1566,6 +1566,7 @@ nv50_screen_get_driver_query_info(struct pipe_screen
*pscreen,
 
          info->name = cfg->event->name;
          info->query_type = NV50_HW_PM_QUERY(id);
+         info->group_id = NV50_HW_PM_QUERY_GROUP;
          info->max_value.u64              (cfg->event->display ==
NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0;
          return 1;
@@ -1576,6 +1577,40 @@ nv50_screen_get_driver_query_info(struct pipe_screen
*pscreen,
    return 0;
 }
 
+int
+nv50_screen_get_driver_query_group_info(struct pipe_screen *pscreen,
+                                        unsigned id,
+                                        struct pipe_driver_query_group_info
*info)
+{
+   struct nv50_screen *screen = nv50_screen(pscreen);
+   int count = 0;
+
+   // TODO: Check DRM version when nvif will be merged in libdrm!
+   if (screen->base.perfmon) {
+      count++; /* NV50_HW_PM_QUERY_GROUP */
+   }
+
+   if (!info)
+      return count;
+
+   if (id == NV50_HW_PM_QUERY_GROUP) {
+      if (screen->base.perfmon) {
+         info->name = "Global performance counters";
+         info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU;
+         info->num_queries = NV50_HW_PM_QUERY_COUNT;
+         info->max_active_queries = 1; /* TODO: get rid of this limitation!
*/
+         return 1;
+      }
+   }
+
+   /* user asked for info about non-existing query group */
+   info->name = "this_is_not_the_query_group_you_are_looking_for";
+   info->max_active_queries = 0;
+   info->num_queries = 0;
+   info->type = 0;
+   return 0;
+}
+
 void
 nv50_init_query_functions(struct nv50_context *nv50)
 {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index f07798e..dfe20c9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -746,6 +746,7 @@ nv50_screen_create(struct nouveau_device *dev)
    pscreen->get_shader_param = nv50_screen_get_shader_param;
    pscreen->get_paramf = nv50_screen_get_paramf;
    pscreen->get_driver_query_info = nv50_screen_get_driver_query_info;
+   pscreen->get_driver_query_group_info =
nv50_screen_get_driver_query_group_info;
 
    nv50_screen_init_resource_functions(pscreen);
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
index 69127c0..807ae0e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
@@ -114,6 +114,9 @@ nv50_screen(struct pipe_screen *screen)
    return (struct nv50_screen *)screen;
 }
 
+/* Hardware global performance counters groups. */
+#define NV50_HW_PM_QUERY_GROUP 0
+
 /* Hardware global performance counters. */
 #define NV50_HW_PM_QUERY_COUNT  24
 #define NV50_HW_PM_QUERY(i)    (PIPE_QUERY_DRIVER_SPECIFIC + (i))
@@ -146,6 +149,9 @@ nv50_screen(struct pipe_screen *screen)
 int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned,
                                       struct pipe_driver_query_info *);
 
+int nv50_screen_get_driver_query_group_info(struct pipe_screen *, unsigned,
+                                            struct pipe_driver_query_group_info
*);
+
 boolean nv50_blitter_create(struct nv50_screen *);
 void nv50_blitter_destroy(struct nv50_screen *);
 
-- 
2.4.4

Ilia Mirkin

2015-Jun-25 23:02 UTC

head link

[Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:> This notifier buffer object will be used to read back global performance
> counters results written by the kernel.
>
> For each domain, we will store the handle of the perfdom object, an
> array of 4 counters and the number of cycles. Like the Gallium's HUD,
> we keep a list of busy queries in a ring in order to prevent stalls
> when reading queries.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 29
++++++++++++++++++++++++++
>  src/gallium/drivers/nouveau/nv50/nv50_screen.h |  6 ++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> index c985344..3a99cc8 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> @@ -368,6 +368,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen)
>     nouveau_object_del(&screen->m2mf);
>     nouveau_object_del(&screen->sync);
>     nouveau_object_del(&screen->sw);
> +   nouveau_object_del(&screen->query);
>
>     nouveau_screen_fini(&screen->base);
>
> @@ -699,9 +700,11 @@ nv50_screen_create(struct nouveau_device *dev)
>     struct nv50_screen *screen;
>     struct pipe_screen *pscreen;
>     struct nouveau_object *chan;
> +   struct nv04_fifo *fifo;
>     uint64_t value;
>     uint32_t tesla_class;
>     unsigned stack_size;
> +   uint32_t length;
>     int ret;
>
>     screen = CALLOC_STRUCT(nv50_screen);
> @@ -727,6 +730,7 @@ nv50_screen_create(struct nouveau_device *dev)
>     screen->base.pushbuf->rsvd_kick = 5;
>
>     chan = screen->base.channel;
> +   fifo = chan->data;
>
>     pscreen->destroy = nv50_screen_destroy;
>     pscreen->context_create = nv50_create;
> @@ -772,6 +776,23 @@ nv50_screen_create(struct nouveau_device *dev)
>        goto fail;
>     }
>
> +   /* Compute size (in bytes) of the notifier buffer object which is used
> +    * in order to read back global performance counters results written
> +    * by the kernel. For each domain, we store the handle of the perfdom
> +    * object, an array of 4 counters and the number of cycles. Like for
> +    * the Gallium's HUD, we keep a list of busy queries in a ring in
order
> +    * to prevent stalls when reading queries. */
> +   length = (1 + (NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6) *
> +      NV50_HW_PM_RING_BUFFER_MAX_QUERIES) * 4;
This calculation may become apparent to me later, but it certainly
isn't now. What's the *6? You refer to an array of 4 counters...
should that have been 6 counters? Or should this have been a 4?
> +
> +   ret = nouveau_object_new(chan, 0xbeef0302, NOUVEAU_NOTIFIER_CLASS,
> +                            &(struct nv04_notify){ .length = length },
> +                            sizeof(struct nv04_notify),
&screen->query);
> +   if (ret) {
> +       NOUVEAU_ERR("Failed to allocate notifier object for PM:
%d\n", ret);
> +       goto fail;
> +   }
> +
>     ret = nouveau_object_new(chan, 0xbeef506e, 0x506e,
>                              NULL, 0, &screen->sw);
>     if (ret) {
> @@ -845,6 +866,14 @@ nv50_screen_create(struct nouveau_device *dev)
>     nouveau_heap_init(&screen->gp_code_heap, 0, 1 <<
NV50_CODE_BO_SIZE_LOG2);
>     nouveau_heap_init(&screen->fp_code_heap, 0, 1 <<
NV50_CODE_BO_SIZE_LOG2);
>
> +   ret = nouveau_bo_wrap(screen->base.device, fifo->notify,
&screen->notify_bo);
> +   if (ret == 0)
> +      nouveau_bo_map(screen->notify_bo, 0, screen->base.client);
ret = ...
> +   if (ret) {
> +      NOUVEAU_ERR("Failed to map notifier object for PM: %d\n",
ret);
> +      goto fail;
> +   }
> +
>     nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value);
>
>     screen->TPs = util_bitcount(value & 0xffff);
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> index 69fdfdb..71a5247 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> @@ -59,6 +59,7 @@ struct nv50_screen {
>     struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
>     struct nouveau_bo *stack_bo;
>     struct nouveau_bo *tls_bo;
> +   struct nouveau_bo *notify_bo;
>
>     unsigned TPs;
>     unsigned MPsInTP;
> @@ -89,6 +90,7 @@ struct nv50_screen {
>     } fence;
>
>     struct nouveau_object *sync;
> +   struct nouveau_object *query;
>
>     struct nouveau_object *tesla;
>     struct nouveau_object *eng2d;
> @@ -96,6 +98,10 @@ struct nv50_screen {
>     struct nouveau_object *sw;
>  };
>
> +/* Parameters of the ring buffer used to read back global PM counters. */
> +#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8
> +#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9 /* HUD_NUM_QUERIES + 1 */
> +
>  static INLINE struct nv50_screen *
>  nv50_screen(struct pipe_screen *screen)
>  {
> --
> 2.4.4
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

Ilia Mirkin

2015-Jun-25 23:04 UTC

head link

[Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

Yeah, this whole thing has to be guarded by a drm version check,
otherwise it'll end up with errors in dmesg I assume. Perhaps only
allocate screen->query when the drm version matches, and gate things
on that for the rest of the code?

On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:> To write data at the right offset, the kernel has to know some
> parameters of this ring buffer, like the number of domains and the
> maximum number of queries.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> index 3a99cc8..53817c0 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> @@ -441,6 +441,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
>
>     BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1);
>     PUSH_DATA (push, screen->sw->handle);
> +   BEGIN_NV04(push, SUBC_SW(0x0190), 1);
> +   PUSH_DATA (push, screen->query->handle);
> +   // XXX: Maybe add a check for DRM version here ?
> +   BEGIN_NV04(push, SUBC_SW(0x0600), 1);
> +   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_MAX_QUERIES);
> +   BEGIN_NV04(push, SUBC_SW(0x0604), 1);
> +   PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_NUM_DOMAINS);
FYI you can do BEGIN_NV04(..., 2), since they're sequential.
>
>     BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
>     PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS);
> --
> 2.4.4
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

Ilia Mirkin

2015-Jun-25 23:09 UTC

head link

[Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters

What's with the \%'s everywhere?

On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:> This commit adds support for both compute and graphics global
> performance counters which have been reverse engineered with
> CUPTI (Linux) and PerfKit (Windows).
>
> Currently, only one query type can be monitored at the same time because
> the Gallium's HUD doesn't fit pretty well. This will be improved
later.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1057
+++++++++++++++++++++++-
>  src/gallium/drivers/nouveau/nv50/nv50_screen.h |   35 +
>  2 files changed, 1087 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c
b/src/gallium/drivers/nouveau/nv50/nv50_query.c
> index 1162110..b9d2914 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c
> @@ -27,6 +27,8 @@
>  #include "nv50/nv50_context.h"
>  #include "nv_object.xml.h"
>
> +#include "nouveau_perfmon.h"
> +
>  #define NV50_QUERY_STATE_READY   0
>  #define NV50_QUERY_STATE_ACTIVE  1
>  #define NV50_QUERY_STATE_ENDED   2
> @@ -51,10 +53,25 @@ struct nv50_query {
>     boolean is64bit;
>     struct nouveau_mm_allocation *mm;
>     struct nouveau_fence *fence;
> +   struct nouveau_object *perfdom;
>  };
>
>  #define NV50_QUERY_ALLOC_SPACE 256
>
> +#ifdef DEBUG
> +static void nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args);
> +#endif
> +
> +static boolean
> +nv50_hw_pm_query_create(struct nv50_context *, struct nv50_query *);
> +static void
> +nv50_hw_pm_query_destroy(struct nv50_context *, struct nv50_query *);
> +static boolean
> +nv50_hw_pm_query_begin(struct nv50_context *, struct nv50_query *);
> +static void nv50_hw_pm_query_end(struct nv50_context *, struct nv50_query
*);
> +static boolean nv50_hw_pm_query_result(struct nv50_context *,
> +                                    struct nv50_query *, boolean, void *);
> +
>  static INLINE struct nv50_query *
>  nv50_query(struct pipe_query *pipe)
>  {
> @@ -96,12 +113,18 @@ nv50_query_allocate(struct nv50_context *nv50, struct
nv50_query *q, int size)
>  static void
>  nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq)
>  {
> +   struct nv50_context *nv50 = nv50_context(pipe);
> +   struct nv50_query *q = nv50_query(pq);
> +
>     if (!pq)
>        return;
>
> -   nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0);
> -   nouveau_fence_ref(NULL, &nv50_query(pq)->fence);
> -   FREE(nv50_query(pq));
> +   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST))
> +      nv50_hw_pm_query_destroy(nv50, q);
> +
> +   nv50_query_allocate(nv50, q, 0);
> +   nouveau_fence_ref(NULL, &q->fence);
> +   FREE(q);
>  }
>
>  static struct pipe_query *
> @@ -130,6 +153,11 @@ nv50_query_create(struct pipe_context *pipe, unsigned
type, unsigned index)
>        q->data -= 32 / sizeof(*q->data); /* we advance before
query_begin ! */
>     }
>
> +   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
> +      if (!nv50_hw_pm_query_create(nv50, q))
> +         return NULL;
> +   }
> +
>     return (struct pipe_query *)q;
>  }
>
> @@ -154,6 +182,7 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
>     struct nv50_context *nv50 = nv50_context(pipe);
>     struct nouveau_pushbuf *push = nv50->base.pushbuf;
>     struct nv50_query *q = nv50_query(pq);
> +   boolean ret = TRUE;
>
>     if (!pq)
>        return FALSE;
> @@ -211,10 +240,13 @@ nv50_query_begin(struct pipe_context *pipe, struct
pipe_query *pq)
>        nv50_query_get(push, q, 0x10, 0x00005002);
>        break;
>     default:
> +      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type
<= NV50_HW_PM_QUERY_LAST)) {
> +         ret = nv50_hw_pm_query_begin(nv50, q);
> +      }
>        break;
>     }
>     q->state = NV50_QUERY_STATE_ACTIVE;
> -   return true;
> +   return ret;
>  }
>
>  static void
> @@ -274,7 +306,9 @@ nv50_query_end(struct pipe_context *pipe, struct
pipe_query *pq)
>        q->state = NV50_QUERY_STATE_READY;
>        break;
>     default:
> -      assert(0);
> +      if ((q->type >= NV50_HW_PM_QUERY(0) && q->type
<= NV50_HW_PM_QUERY_LAST)) {
> +         nv50_hw_pm_query_end(nv50, q);
> +      }
>        break;
>     }
>
> @@ -309,6 +343,10 @@ nv50_query_result(struct pipe_context *pipe, struct
pipe_query *pq,
>     if (!pq)
>        return FALSE;
>
> +   if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <=
NV50_HW_PM_QUERY_LAST)) {
> +      return nv50_hw_pm_query_result(nv50, q, wait, result);
> +   }
> +
>     if (q->state != NV50_QUERY_STATE_READY)
>        nv50_query_update(q);
>
> @@ -488,6 +526,1015 @@ nva0_so_target_save_offset(struct pipe_context
*pipe,
>     nv50_query_end(pipe, targ->pq);
>  }
>
> +/* === HARDWARE GLOBAL PERFORMANCE COUNTERS for NV50 === */
> +
> +struct nv50_hw_pm_source_cfg
> +{
> +   const char *name;
> +   uint64_t value;
> +};
> +
> +struct nv50_hw_pm_signal_cfg
> +{
> +   const char *name;
> +   const struct nv50_hw_pm_source_cfg src[8];
> +};
> +
> +struct nv50_hw_pm_counter_cfg
> +{
> +   uint16_t logic_op;
> +   const struct nv50_hw_pm_signal_cfg sig[4];
> +};
> +
> +enum nv50_hw_pm_query_display
> +{
> +   NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +};
> +
> +enum nv50_hw_pm_query_count
> +{
> +   NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   NV50_HW_PM_EVENT_COUNT_B4,
> +   NV50_HW_PM_EVENT_COUNT_B6,
> +};
> +
> +struct nv50_hw_pm_event_cfg
> +{
> +   const char *name;
> +   const char *desc;
> +   enum nv50_hw_pm_query_display display;
> +   enum nv50_hw_pm_query_count count;
> +   uint8_t domain;
> +};
> +
> +struct nv50_hw_pm_query_cfg
> +{
> +   const struct nv50_hw_pm_event_cfg *event;
> +   const struct nv50_hw_pm_counter_cfg ctr[4];
> +};
> +
> +#define SRC(name, val) { name, val }
> +#define SIG(name, ...) { name, { __VA_ARGS__ } }
> +#define CTR(func, ...) { func, { __VA_ARGS__ } }
> +
> +/*
> + * GPU
> + */
> +/* gpu_idle */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_gpu_idle_event > +{
> +   .name    = "gpu_idle",
> +   .desc    = "The \% of time the GPU is idle/busy since the last
call. "
> +              "Having the GPU idle at all is a waste of valuable
resources. "
> +              "You want to balance the GPU and CPU workloads so that
no one "
> +              "processor is starved for work. Time management or
using "
> +              "multithreading in your application can help balance
CPU based "
> +              "tasks (world management, etc.) with the rendering
pipeline.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_gpu_idle_query > +{
> +   .event  = &nv50_gpu_idle_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_gr_idle")),
> +};
> +
> +/*
> + * INPUT ASSEMBLER
> + */
> +/* input_assembler_busy */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_ia_busy_event > +{
> +   .name    = "input_assembler_busy",
> +   .desc    = "The \% of time the input assembler unit is busy. This
is mainly "
> +              "impacted by both the number of vertices processed as
well as "
> +              "the size of the attributes on those vertices. You can
optimize "
> +              "this by reducing vertex size as much as possible and
using "
> +              "indexed primitives to take advantage of the vertex
cache.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_ia_busy_query > +{
> +   .event   = &nv50_ia_busy_event,
> +   .ctr[0]  = CTR(0xf888, SIG("pc01_vfetch_18",
> +                              SRC("pgraph_vfetch_unk0c_unk0",
0x1)),
> +                          SIG("pc01_vfetch_17"),
> +                          SIG("pc01_vfetch_03"),
> +                          SIG("pc01_vfetch_02")),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nva0_ia_busy_query > +{
> +   .event   = &nv50_ia_busy_event,
> +   .ctr[0]  = CTR(0xf888, SIG("pc01_vfetch_15",
> +                              SRC("pgraph_vfetch_unk0c_unk0",
0x1)),
> +                          SIG("pc01_vfetch_14"),
> +                          SIG("pc01_vfetch_03"),
> +                          SIG("pc01_vfetch_02")),
> +};
> +
> +/* input_assembler_waits_for_fb */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_ia_waits_for_fb_event = {
> +   .name    = "input_assembler_waits_for_fb",
> +   .desc    = "This is the amount of time the input assembler unit
was "
> +              "waiting for data from the frame buffer unit.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_ia_waits_for_fb_query > +{
> +   .event   = &nv50_ia_waits_for_fb_event,
> +   .ctr[0]  = CTR(0xaaaa, SIG("pc01_vfetch_0e",
> +                              SRC("pgraph_vfetch_unk0c_unk0",
0x1))),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nva0_ia_waits_for_fb_query > +{
> +   .event   = &nv50_ia_waits_for_fb_event,
> +   .ctr[0]  = CTR(0xaaaa, SIG("pc01_vfetch_0b",
> +                              SRC("pgraph_vfetch_unk0c_unk0",
0x1))),
> +};
> +
> +/* vertex_attribute_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_vertex_attr_count_event > +{
> +   .name    = "vertex_attribute_count",
> +   .desc    = "The number of vertex attributes that are fetched and
passed to "
> +              "the geometry unit is returned in this counter. A large
number "
> +              "of attributes (or unaligned vertices) can hurt vertex
cache "
> +              "performance and reduce the overall vertex processing
"
> +              "capabilities of the pipeline.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_vertex_attr_count_query > +{
> +   .event = &nv50_vertex_attr_count_event,
> +   .ctr[0] = CTR(0xf888, SIG("pc01_vfetch_18",
> +                             SRC("pgraph_vfetch_unk0c_unk0",
0x1)),
> +                         SIG("pc01_vfetch_17"),
> +                         SIG("pc01_vfetch_03"),
> +                         SIG("pc01_vfetch_02")),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nva0_vertex_attr_count_query > +{
> +   .event  = &nv50_vertex_attr_count_event,
> +   .ctr[0] = CTR(0xf888, SIG("pc01_vfetch_15",
> +                             SRC("pgraph_vfetch_unk0c_unk0",
0x1)),
> +                         SIG("pc01_vfetch_14"),
> +                         SIG("pc01_vfetch_03"),
> +                         SIG("pc01_vfetch_02")),
> +};
> +
> +/*
> + * GEOM
> + */
> +/* geom_vertex_in_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_geom_vertex_in_count_event > +{
> +   .name    = "geom_vertex_in_count",
> +   .desc    = "The number of vertices input to the geom unit.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_B4,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_geom_vertex_in_count_query > +{
> +   .event  = &nv50_geom_vertex_in_count_event,
> +   .ctr[1] = CTR(0xffff, SIG("pc01_vfetch_0e",
> +                             SRC("pgraph_vfetch_unk0c_unk0",
0x0)),
> +                         SIG("pc01_vfetch_0f"),
> +                         SIG("pc01_vfetch_10"),
> +                         SIG("pc01_trailer")),
> +   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
> +                         SIG("pc01_trailer"),
> +                         SIG("pc01_trailer"),
> +                         SIG("pc01_trailer")),
> +};
> +
> +/* geom_vertex_out_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_geom_vertex_out_count_event > +{
> +   .name    = "geom_vertex_out_count",
> +   .desc    = "The number of vertices coming out of the geom unit
after any "
> +              "geometry shader expansion.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_geom_vertex_out_count_query > +{
> +   .event  = &nv50_geom_vertex_out_count_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_vattr_01")),
> +};
> +
> +/* geom_primitive_in_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_geom_primitive_in_count_event > +{
> +   .name    = "geom_primitive_in_count",
> +   .desc    = "The number of primitives input to the geom
unit.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_geom_primitive_in_count_query > +{
> +   .event  = &nv50_geom_primitive_in_count_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_vfetch_08",
> +                             SRC("pgraph_vfetch_unk0c_unk0",
0x0))),
> +};
> +
> +/* geom_primitive_out_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_geom_primitive_out_count_event > +{
> +   .name    = "geom_primitive_out_count",
> +   .desc    = "The number of primitives coming out the geom unit
after any "
> +              "geometry shader expansion.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_geom_primitive_out_count_query > +{
> +   .event  = &nv50_geom_primitive_out_count_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_vattr_00")),
> +};
> +
> +/*
> + * STREAM OUT
> + */
> +/* stream_out_busy */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_so_busy_event > +{
> +   .name    = "stream_out_busy",
> +   .desc    = "This unit manages the writing of vertices to the frame
buffer "
> +              "when using stream out. If a significant number of
vertices are "
> +              "written, this can become a bottleneck.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_so_busy_query > +{
> +   .event  = &nv50_so_busy_event,
> +   .ctr[0] = CTR(0x8888, SIG("pc01_strmout_00"),
> +                         SIG("pc01_strmout_01")),
> +};
> +
> +/*
> + * SETUP
> + */
> +/* setup_primitive_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_setup_primitive_count_event > +{
> +   .name    = "setup_primitive_count",
> +   .desc    = "Returns the number of primitives processed in the
geometry "
> +              "subsystem. This experiments counts points, lines and
triangles. "
> +              "To count only triangles, use the setup_triangle_count
counter. "
> +              "Balance these counts with the number of pixels being
drawn to "
> +              "see if you could simplify your geometry and use "
> +              "bump/displacement maps, for example.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_setup_primitive_count_query > +{
> +   .event  = &nv50_setup_primitive_count_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_trast_00")),
> +};
> +
> +/* setup_point_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_setup_point_count_event > +{
> +   .name    = "setup_point_count",
> +   .desc    = "The number of points seen by the primitive setup unit
(just "
> +              "before rasterization).",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_setup_point_count_query > +{
> +   .event  = &nv50_setup_point_count_event,
> +   .ctr[0] = CTR(0x8080, SIG("pc01_trast_01"),
> +                         SIG("pc01_trast_04"),
> +                         SIG("pc01_trast_05")),
> +};
> +
> +/* setup_line_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_setup_line_count_event > +{
> +   .name    = "setup_line_count",
> +   .desc    = "The number of lines seen by the primitive setup unit
(just "
> +              "before rasterization).",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_setup_line_count_query > +{
> +   .event  = &nv50_setup_line_count_event,
> +   .ctr[0] = CTR(0x8080, SIG("pc01_trast_02"),
> +                         SIG("pc01_trast_04"),
> +                         SIG("pc01_trast_05")),
> +};
> +
> +/* setup_triangle_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_setup_triangle_count_event > +{
> +   .name    = "setup_triangle_count",
> +   .desc    = "Returns the number of triangles processed in the
geometry "
> +              "subsystem.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_setup_triangle_count_query > +{
> +   .event  = &nv50_setup_triangle_count_event,
> +   .ctr[0] = CTR(0x8080, SIG("pc01_trast_03"),
> +                         SIG("pc01_trast_04"),
> +                         SIG("pc01_trast_05")),
> +};
> +
> +/* setup_primitive_culled_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_setup_primitive_culled_count_event > +{
> +   .name    = "setup_primitive_culled_count",
> +   .desc    = "Returns the number of primitives culled in primitive
setup. If "
> +              "you are performing viewport culling, this gives you an
"
> +              "indication of the accuracy of the algorithm being
used, and can "
> +              "give you and idea if you need to improves this
culling. This "
> +              "includes primitives culled when using backface
culling. Drawing "
> +              "a fully visible sphere on the screen should cull half
of the "
> +              "triangles if backface culling is turned on and all the
"
> +              "triangles are ordered consistently (CW or CCW).",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_setup_primitive_culled_count_query > +{
> +   .event  = &nv50_setup_primitive_culled_count_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc01_unk00")),
> +};
> +
> +/*
> + * RASTERIZER
> + */
> +/* rast_tiles_killed_by_zcull_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rast_tiles_killed_by_zcull_event > +{
> +   .name    = "rasterizer_tiles_killed_by_zcull_count",
> +   .desc    = "The number of pixels killed by the zcull unit in the
rasterizer.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_B6,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rast_tiles_killed_by_zcull_query > +{
> +   .event  = &nv50_rast_tiles_killed_by_zcull_event,
> +   .ctr[1] = CTR(0xffff, SIG("pc01_zcull_00",
> +                             SRC("pgraph_zcull_pm_unka4_unk0",
0x7)),
> +                         SIG("pc01_zcull_01"),
> +                         SIG("pc01_zcull_02"),
> +                         SIG("pc01_zcull_03")),
> +   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
> +                         SIG("pc01_trailer"),
> +                         SIG("pc01_zcull_04"),
> +                         SIG("pc01_zcull_05")),
> +};
> +
> +/* rast_tiles_in_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rast_tiles_in_count_event > +{
> +   .name    = "rasterizer_tiles_in_count",
> +   .desc    = "Count of tiles (each of which contain 1-8 pixels) seen
by the "
> +              "rasterizer stage.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_B6,
> +   .domain  = 1,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rast_tiles_in_count_query > +{
> +   .event  = &nv50_rast_tiles_in_count_event,
> +   .ctr[1] = CTR(0xffff, SIG("pc01_zcull_00",
> +                             SRC("pgraph_zcull_pm_unka4_unk0",
0x0)),
> +                         SIG("pc01_zcull_01"),
> +                         SIG("pc01_zcull_02"),
> +                         SIG("pc01_zcull_03")),
> +   .ctr[2] = CTR(0x5555, SIG("pc01_trailer"),
> +                         SIG("pc01_trailer"),
> +                         SIG("pc01_zcull_04"),
> +                         SIG("pc01_zcull_05")),
> +};
> +
> +/*
> + * ROP
> + */
> +/* rop_busy */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rop_busy_event > +{
> +   .name    = "rop_busy",
> +   .desc    = "\% of time that the ROP unit is actively doing work.
"
> +              "This can be high if alpha blending is turned on, of
overdraw "
> +              "is high, etc.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rop_busy_query > +{
> +   .event  = &nv50_rop_busy_event,
> +   .ctr[0] = CTR(0xf888, SIG("pc02_prop_02",
> +                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x0)),
> +                         SIG("pc02_prop_03"),
> +                         SIG("pc02_prop_04"),
> +                         SIG("pc02_prop_05")),
> +};
> +
> +/* rop_waits_for_fb */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rop_waits_for_fb_event > +{
> +   .name    = "rop_waits_for_fb",
> +   .desc    = "The amount of time the blending unit spent waiting for
data "
> +              "from the frame buffer unit. If blending is enabled and
there "
> +              "is a lot of traffic here (since this is a
read/modify/write "
> +              "operation) this can become a bottleneck.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rop_waits_for_fb_query > +{
> +   .event  = &nv50_rop_waits_for_fb_event,
> +   .ctr[0] = CTR(0x22f2, SIG("pc02_crop_03",
> +                             SRC("pgraph_rop0_crop_pm_mux_sel0",
0x0)),
> +                         SIG("pc02_crop_02"),
> +                         SIG("pc02_zrop_03",
> +                             SRC("pgraph_rop0_zrop_pm_mux_sel0",
0x0)),
> +                         SIG("pc02_zrop_02")),
> +};
> +
> +/* rop_waits_for_shader */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rop_waits_for_shader_event > +{
> +   .name    = "rop_waits_for_shader",
> +   .desc    = "This is a measurement of how often the blending unit
was "
> +              "waiting on new work (fragments to be placed into the
render "
> +              "target). If the pixel shaders are particularly
expensive, the "
> +              "ROP unit could be starved waiting for results.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rop_waits_for_shader_query > +{
> +   .event  = &nv50_rop_waits_for_shader_event,
> +   .ctr[0] = CTR(0x2222, SIG("pc02_prop_6",
> +                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x0)),
> +                         SIG("pc02_prop_7")),
> +};
> +
> +/* rop_samples_killed_by_earlyz_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rop_samples_killed_by_earlyz_event > +{
> +   .name    = "rop_samples_killed_by_earlyz_count",
> +   .desc    = "This returns the number of pixels that were killed in
the "
> +              "earlyZ hardware. This signal will give you an idea of,
for "
> +              "instance, a Z only pass was successful in setting up
the depth "
> +              "buffer.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_B6,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rop_samples_killed_by_earlyz_query > +{
> +   .event  = &nv50_rop_samples_killed_by_earlyz_event,
> +   .ctr[1] = CTR(0xffff, SIG("pc02_prop_00",
> +                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x1a)),
> +                         SIG("pc02_prop_01"),
> +                         SIG("pc02_prop_02"),
> +                         SIG("pc02_prop_03")),
> +   .ctr[2] = CTR(0x5555, SIG("pc02_prop_07"),
> +                         SIG("pc02_trailer"),
> +                         SIG("pc02_prop_04"),
> +                         SIG("pc02_prop_05")),
> +};
> +
> +/* rop_samples_killed_by_latez_count */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_rop_samples_killed_by_latez_event > +{
> +   .name    = "rop_samples_killed_by_latez_count",
> +   .desc    = "This returns the number of pixels that were killed
after the "
> +              "pixel shader ran. This can happen if the early Z is
unable to "
> +              "cull the pixel because of an API setup issue like
changing the "
> +              "Z direction or modifying Z in the pixel shader.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_B6,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_rop_samples_killed_by_latez_query > +{
> +   .event  = &nv50_rop_samples_killed_by_latez_event,
> +   .ctr[1] = CTR(0xffff, SIG("pc02_prop_00",
> +                             SRC("pgraph_tpc0_prop_pm_mux_sel",
0x1b)),
> +                         SIG("pc02_prop_01"),
> +                         SIG("pc02_prop_02"),
> +                         SIG("pc02_prop_03")),
> +   .ctr[2] = CTR(0x5555, SIG("pc02_prop_07"),
> +                         SIG("pc02_trailer"),
> +                         SIG("pc02_prop_04"),
> +                         SIG("pc02_prop_05")),
> +};
> +
> +/*
> + * TEXTURE
> + */
> +/* tex_cache_miss */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_tex_cache_miss_event > +{
> +   .name    = "tex_cache_miss",
> +   .desc    = "Number of texture cache misses.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_tex_cache_miss_query > +{
> +   .event  = &nv50_tex_cache_miss_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_04",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv84_tex_cache_miss_query > +{
> +   .event  = &nv50_tex_cache_miss_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_04",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
> +};
> +
> +/* tex_cache_hit */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_tex_cache_hit_event > +{
> +   .name    = "tex_cache_hit",
> +   .desc    = "Number of texture cache hits.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RAW,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_tex_cache_hit_query > +{
> +   .event  = &nv50_tex_cache_hit_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_05",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv84_tex_cache_hit_query > +{
> +   .event  = &nv50_tex_cache_hit_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_05",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
> +};
> +
> +/* tex_waits_for_fb */
> +static const struct nv50_hw_pm_event_cfg
> +nv50_tex_waits_for_fb_event > +{
> +   .name    = "tex_waits_for_fb",
> +   .desc    = "This is the amount of time the texture unit spent
waiting on "
> +              "samples to return from the frame buffer unit. It is a
potential "
> +              "indication of poor texture cache utilization.",
> +   .display = NV50_HW_PM_EVENT_DISPLAY_RATIO,
> +   .count   = NV50_HW_PM_EVENT_COUNT_SIMPLE,
> +   .domain  = 2,
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv50_tex_waits_for_fb_query > +{
> +   .event  = &nv50_tex_waits_for_fb_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_06",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x200))),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
> +nv84_tex_waits_for_fb_query > +{
> +   .event  = &nv50_tex_waits_for_fb_event,
> +   .ctr[0] = CTR(0xaaaa, SIG("pc02_tex_06",
> +                             SRC("pgraph_tpc0_tex_unk08_unk0",
0x800))),
> +};
> +
> +static const struct nv50_hw_pm_query_cfg
*nv50_hw_pm_queries[NV50_HW_PM_QUERY_COUNT];
> +
> +#define _Q(n, q) nv50_hw_pm_queries[NV50_HW_PM_QUERY_##n] = &q;
> +
> +static void
> +nv50_identify_events(struct nv50_screen *screen)
> +{
> +  _Q(GPU_IDLE,                      nv50_gpu_idle_query);
> +  _Q(IA_BUSY,                       nv50_ia_busy_query);
> +  _Q(IA_WAITS_FOR_FB,               nv50_ia_waits_for_fb_query);
> +  _Q(VERTEX_ATTR_COUNT,             nv50_vertex_attr_count_query);
> +  _Q(GEOM_VERTEX_IN_COUNT,          nv50_geom_vertex_in_count_query);
> +  _Q(GEOM_VERTEX_OUT_COUNT,         nv50_geom_vertex_out_count_query);
> +  _Q(GEOM_PRIMITIVE_IN_COUNT,       nv50_geom_primitive_in_count_query);
> +  _Q(GEOM_PRIMITIVE_OUT_COUNT,      nv50_geom_primitive_out_count_query);
> +  _Q(SO_BUSY,                       nv50_so_busy_query);
> +  _Q(SETUP_PRIMITIVE_COUNT,         nv50_setup_primitive_count_query);
> +  _Q(SETUP_POINT_COUNT,             nv50_setup_point_count_query);
> +  _Q(SETUP_LINE_COUNT,              nv50_setup_line_count_query);
> +  _Q(SETUP_TRIANGLE_COUNT,          nv50_setup_triangle_count_query);
> +  _Q(SETUP_PRIMITIVE_CULLED_COUNT, 
nv50_setup_primitive_culled_count_query);
> +  _Q(RAST_TILES_KILLED_BY_ZCULL,   
nv50_rast_tiles_killed_by_zcull_query);
> +  _Q(RAST_TILES_IN_COUNT,           nv50_rast_tiles_in_count_query);
> +  _Q(ROP_BUSY,                      nv50_rop_busy_query);
> +  _Q(ROP_WAITS_FOR_FB,              nv50_rop_waits_for_fb_query);
> +  _Q(ROP_WAITS_FOR_SHADER,          nv50_rop_waits_for_shader_query);
> +  _Q(ROP_SAMPLES_KILLED_BY_EARLYZ, 
nv50_rop_samples_killed_by_earlyz_query);
> +  _Q(ROP_SAMPLES_KILLED_BY_LATEZ,   nv50_rop_samples_killed_by_latez_query
);
> +  _Q(TEX_CACHE_MISS,                nv50_tex_cache_miss_query);
> +  _Q(TEX_CACHE_HIT,                 nv50_tex_cache_hit_query);
> +  _Q(TEX_WAITS_FOR_FB,              nv50_tex_waits_for_fb_query);
> +
> +   if (screen->base.class_3d >= NV84_3D_CLASS) {
> +      /* Variants for NV84+ */
> +      _Q(TEX_CACHE_MISS,   nv84_tex_cache_miss_query);
> +      _Q(TEX_CACHE_HIT,    nv84_tex_cache_hit_query);
> +      _Q(TEX_WAITS_FOR_FB, nv84_tex_waits_for_fb_query);
> +   }
> +
> +   if (screen->base.class_3d >= NVA0_3D_CLASS) {
> +      /* Variants for NVA0+ */
> +      _Q(IA_BUSY,           nva0_ia_busy_query);
> +      _Q(IA_WAITS_FOR_FB,   nva0_ia_waits_for_fb_query);
> +      _Q(VERTEX_ATTR_COUNT, nva0_vertex_attr_count_query);
> +   }
> +}
> +
> +#undef _Q
> +
> +#ifdef DEBUG
> +static void
> +nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args)
> +{
> +   int i, j, k;
> +
> +   debug_printf("PERFDOM CONFIGURATION:\n");
> +   debug_printf("domaine: 0x%02x\n", args->domain);
> +   debug_printf("mode: 0x%02x\n", args->mode);
> +   for (i = 0; i < 4; i++) {
> +      uint32_t signal = 0;
> +      for (j = 0; j < 4; j++)
> +         signal |= args->ctr[i].signal[j] << (j * 8);
> +
> +      debug_printf("ctr[%d]: func = 0x%04x, signal=0x%08x\n",
> +                   i, args->ctr[i].logic_op, signal);
> +
> +      for (j = 0; j < 4; j++) {
> +         for (k = 0; k < 8; k++) {
> +            uint32_t source, value;
> +            if (!args->ctr[i].source[j][k])
> +               continue;
> +
> +            source = args->ctr[i].source[j][k];
> +            value  = args->ctr[i].source[j][k] >> 32;
> +            debug_printf("  src[%d][%d]: source = 0x%08x, value =
0x%08x\n",
> +                         j, k, source, value);
> +         }
> +      }
> +   }
> +}
> +#endif
> +
> +static const struct nv50_hw_pm_query_cfg *
> +nv50_hw_pm_query_get_cfg(struct nv50_screen *screen, uint32_t query_type)
> +{
> +   return nv50_hw_pm_queries[query_type - NV50_HW_PM_QUERY(0)];
> +}
> +
> +static boolean
> +nv50_hw_pm_query_create(struct nv50_context *nv50, struct nv50_query *q)
> +{
> +   struct nv50_screen *screen = nv50->screen;
> +   struct nouveau_perfmon *perfmon = screen->base.perfmon;
> +   static const struct nv50_hw_pm_query_cfg *cfg;
> +   struct nvif_perfdom_v0 args = {};
> +   struct nouveau_perfmon_dom *dom;
> +   int i, j, k;
> +   int ret;
> +
> +   if (!screen->pm.num_active) {
> +      /* TODO: Currently, only one query type can be monitored
simultaneously
> +       * because the Gallium's HUD doesn't fit well with the
perfdom interface.
> +       *
> +       * With two different query types, the current scenario is as
follows:
> +       * CREATE Q1, BEGIN Q1, CREATE Q2, BEGIN Q2, END Q1, RESULT Q1,
BEGIN Q1,
> +       * END Q2, RESULT Q2, BEGIN Q2, END Q1, and so on.
> +       *
> +       * This behaviour doesn't allow to schedule multiple counters
because
> +       * we have to do that at query creation (ie. when a perfdom is
created).
> +       *
> +       * To get rid of this limitation, a better scenario would be:
> +       * CREATE Q1, CREATE Q2, BEGIN Q1, BEGIN Q2, END Q1, END Q2, RESULT
Q1,
> +       * RESULT Q2, BEGIN Q1, BEGIN Q2, END Q1, and so on.
> +       *
> +       * With this kind of behaviour, we could introduce
> +       * {create,begin,end}_all_queries() functions to be able to
configure
> +       * all queries in one shot.
> +       */
> +      screen->pm.query_type = q->type;
> +   }
> +   screen->pm.num_active++;
> +
> +   if (screen->pm.query_type != q->type) {
> +      NOUVEAU_ERR("Only one query type can be monitored at the same
time!");
> +      return FALSE;
> +   }
> +
> +   cfg = nv50_hw_pm_query_get_cfg(nv50->screen, q->type);
> +
> +   dom = nouveau_perfmon_get_dom_by_id(perfmon, cfg->event->domain);
> +   if (!dom) {
> +      NOUVEAU_ERR("Failed to find domain %d\n",
cfg->event->domain);
> +      return FALSE;
> +   }
> +
> +   /* configure domain and counting mode */
> +   args.domain = dom->id;
> +   args.mode   = cfg->event->count;
> +
> +   /* configure counters for this hardware event */
> +   for (i = 0; i < ARRAY_SIZE(cfg->ctr); i++) {
> +      const struct nv50_hw_pm_counter_cfg *sctr = &cfg->ctr[i];
> +
> +      if (!sctr->logic_op)
> +         continue;
> +      args.ctr[i].logic_op = sctr->logic_op;
> +
> +      /* configure signals for this counter */
> +      for (j = 0; j < ARRAY_SIZE(sctr->sig); j++) {
> +         const struct nv50_hw_pm_signal_cfg *ssig = &sctr->sig[j];
> +         struct nouveau_perfmon_sig *sig;
> +
> +         if (!ssig->name)
> +            continue;
> +
> +         sig = nouveau_perfmon_get_sig_by_name(dom, ssig->name);
> +         if (!sig) {
> +            NOUVEAU_ERR("Failed to find signal %s\n",
ssig->name);
> +            return FALSE;
> +         }
> +         args.ctr[i].signal[j] = sig->signal;
> +
> +         /* configure sources for this signal */
> +         for (k = 0; k < ARRAY_SIZE(ssig->src); k++) {
> +            const struct nv50_hw_pm_source_cfg *ssrc =
&ssig->src[k];
> +            struct nouveau_perfmon_src *src;
> +
> +            if (!ssrc->name)
> +               continue;
> +
> +            src = nouveau_perfmon_get_src_by_name(sig, ssrc->name);
> +            if (!src) {
> +               NOUVEAU_ERR("Failed to find source %s\n",
ssrc->name);
> +               return FALSE;
> +            }
> +            args.ctr[i].source[j][k] = (ssrc->value << 32) |
src->id;
> +         }
> +      }
> +   }
> +
> +#ifdef DEBUG
> +   if (debug_get_num_option("NV50_PM_DEBUG", 0))
> +      nv50_hw_pm_dump_perfdom(&args);
> +#endif
> +
> +   ret = nouveau_object_new(perfmon->object, perfmon->handle++,
> +                            NVIF_IOCTL_NEW_V0_PERFDOM,
> +                            &args, sizeof(args), &q->perfdom);
> +   if (ret) {
> +      NOUVEAU_ERR("Failed to create perfdom object: %d\n", ret);
> +      return FALSE;
> +   }
> +
> +   return TRUE;
> +}
> +
> +static void
> +nv50_hw_pm_query_destroy(struct nv50_context *nv50, struct nv50_query *q)
> +{
> +   struct nv50_screen *screen = nv50->screen;
> +
> +   nouveau_object_del(&q->perfdom);
> +   screen->pm.num_active--;
> +}
> +
> +static boolean
> +nv50_hw_pm_query_begin(struct nv50_context *nv50, struct nv50_query *q)
> +{
> +   struct nouveau_pushbuf *push = nv50->base.pushbuf;
> +
> +   /* start the next batch of counters */
> +   PUSH_SPACE(push, 2);
> +   BEGIN_NV04(push, SUBC_SW(0x0608), 1);
> +   PUSH_DATA (push, q->perfdom->handle);
> +
> +   return TRUE;
> +}
> +
> +static void
> +nv50_hw_pm_query_end(struct nv50_context *nv50, struct nv50_query *q)
> +{
> +   struct nouveau_pushbuf *push = nv50->base.pushbuf;
> +   struct nv50_screen *screen = nv50->screen;
> +
> +   /* set sequence field (used to check if result is available) */
> +   q->sequence = ++screen->pm.sequence;
> +
> +   /* sample the previous batch of counters */
> +   PUSH_SPACE(push, 2);
> +   BEGIN_NV04(push, SUBC_SW(0x060c), 1);
> +   PUSH_DATA (push, q->perfdom->handle);
> +
> +   /* read back counters values */
> +   PUSH_SPACE(push, 2);
Do this once as PUSH_SPACE(4). Or even better, PUSH_SPACE(3) and only
do 1 begin with length 2.
> +   BEGIN_NV04(push, SUBC_SW(0x0700), 1);
> +   PUSH_DATA (push, screen->pm.sequence);
> +}
> +
> +static volatile void *
> +nv50_ntfy(struct nv50_screen *screen)
> +{
> +   struct nv04_notify *query = screen->query->data;
> +   struct nouveau_bo *notify = screen->notify_bo;
> +
> +   return (char *)notify->map + query->offset;
> +}
> +
> +static INLINE uint32_t
> +nv50_hw_pm_query_get_offset(struct nv50_query *q)
> +{
> +   return (1 + (q->sequence % NV50_HW_PM_RING_BUFFER_MAX_QUERIES) *
> +           NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6);
> +}
> +
> +static INLINE boolean
> +nv50_hw_pm_query_read_data(struct nv50_context *nv50, struct nv50_query
*q,
> +                           boolean wait, uint32_t ctr[4], uint32_t *clk)
> +{
> +   volatile uint32_t *ntfy = nv50_ntfy(nv50->screen);
> +   uint32_t offset = nv50_hw_pm_query_get_offset(q);
> +   boolean found = FALSE;
> +   int i;
> +
> +   while (ntfy[0] < q->sequence) {
> +      if (!wait)
> +         return FALSE;
> +      usleep(100);
> +   }
Yeah this won't fly. Take a look at nouveau_fence_wait for it does
that. I don't suppose you can hook into the fence mechanism for all
this, instead of implementing your own version, right?

BTW, what makes sure that the query has been kicked out? You never do
a PUSH_KICK that I can see...
> +
> +   if (ntfy[0] > q->sequence + NV50_HW_PM_RING_BUFFER_MAX_QUERIES -
1)
> +      return FALSE;
> +
> +   for (i = 0; i < NV50_HW_PM_RING_BUFFER_NUM_DOMAINS; i++) {
> +      if (ntfy[offset + i * 6] == q->perfdom->handle) {
> +         found = TRUE;
> +         break;
> +      }
> +   }
> +
> +   if (!found) {
> +      NOUVEAU_ERR("Failed to find perfdom object %" PRIu64
"!\n",
> +                  q->perfdom->handle);
> +      return FALSE;
> +   }
> +
> +   for (i = 0; i < 4; i++)
> +      ctr[i] = ntfy[offset + i + 1];
> +   *clk = ntfy[offset + 5];
> +
> +   return TRUE;
> +}
> +
> +static boolean
> +nv50_hw_pm_query_result(struct nv50_context *nv50, struct nv50_query *q,
> +                        boolean wait, void *result)
> +{
> +   struct nv50_screen *screen = nv50->screen;
> +   const struct nv50_hw_pm_query_cfg *cfg;
> +   uint32_t ctr[4], clk;
> +   uint64_t value = 0;
> +   int ret;
> +
> +   ret = nv50_hw_pm_query_read_data(nv50, q, wait, ctr, &clk);
> +   if (!ret)
> +      return FALSE;
> +
> +   cfg = nv50_hw_pm_query_get_cfg(screen, q->type);
> +   if (cfg->event->count == NV50_HW_PM_EVENT_COUNT_SIMPLE) {
> +      /* SIMPLE hardware events are sampled on PRE_CTR. */
> +      value = ctr[0];
> +   } else {
> +      /* EVENT_B4/EVENT_B6 hardware events are sampled on EVENT_CTR. */
> +      value = ctr[2];
> +   }
> +
> +   if (cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) {
> +      if (clk)
> +         value = (value * 100) / (float)clk;
> +   }
> +
> +   fprintf(stderr, "ctr[0]=%d, ctr[1]=%d, ctr[2]=%d, ctr[3]=%d,
clk=%d, val=%d\n",
> +           ctr[0], ctr[1], ctr[2], ctr[3], clk, value);
> +
> +   *(uint64_t *)result = value;
> +   return TRUE;
> +}
> +
>  void
>  nv50_init_query_functions(struct nv50_context *nv50)
>  {
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> index 71a5247..0449659 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h
> @@ -89,6 +89,12 @@ struct nv50_screen {
>        struct nouveau_bo *bo;
>     } fence;
>
> +   struct {
> +      uint32_t sequence;
> +      uint32_t query_type;
> +      uint32_t num_active;
> +   } pm;
> +
>     struct nouveau_object *sync;
>     struct nouveau_object *query;
>
> @@ -108,6 +114,35 @@ nv50_screen(struct pipe_screen *screen)
>     return (struct nv50_screen *)screen;
>  }
>
> +/* Hardware global performance counters. */
> +#define NV50_HW_PM_QUERY_COUNT  24
> +#define NV50_HW_PM_QUERY(i)    (PIPE_QUERY_DRIVER_SPECIFIC + (i))
> +#define NV50_HW_PM_QUERY_LAST   NV50_HW_PM_QUERY(NV50_HW_PM_QUERY_COUNT -
1)
> +#define NV50_HW_PM_QUERY_GPU_IDLE                            0
> +#define NV50_HW_PM_QUERY_IA_BUSY                             1
> +#define NV50_HW_PM_QUERY_IA_WAITS_FOR_FB                     2
> +#define NV50_HW_PM_QUERY_VERTEX_ATTR_COUNT                   3
> +#define NV50_HW_PM_QUERY_GEOM_VERTEX_IN_COUNT                4
> +#define NV50_HW_PM_QUERY_GEOM_VERTEX_OUT_COUNT               5
> +#define NV50_HW_PM_QUERY_GEOM_PRIMITIVE_IN_COUNT             6
> +#define NV50_HW_PM_QUERY_GEOM_PRIMITIVE_OUT_COUNT            7
> +#define NV50_HW_PM_QUERY_SO_BUSY                             8
> +#define NV50_HW_PM_QUERY_SETUP_PRIMITIVE_COUNT               9
> +#define NV50_HW_PM_QUERY_SETUP_POINT_COUNT                  10
> +#define NV50_HW_PM_QUERY_SETUP_LINE_COUNT                   11
> +#define NV50_HW_PM_QUERY_SETUP_TRIANGLE_COUNT               12
> +#define NV50_HW_PM_QUERY_SETUP_PRIMITIVE_CULLED_COUNT       13
> +#define NV50_HW_PM_QUERY_RAST_TILES_KILLED_BY_ZCULL         14
> +#define NV50_HW_PM_QUERY_RAST_TILES_IN_COUNT                15
> +#define NV50_HW_PM_QUERY_ROP_BUSY                           16
> +#define NV50_HW_PM_QUERY_ROP_WAITS_FOR_FB                   17
> +#define NV50_HW_PM_QUERY_ROP_WAITS_FOR_SHADER               18
> +#define NV50_HW_PM_QUERY_ROP_SAMPLES_KILLED_BY_EARLYZ       19
> +#define NV50_HW_PM_QUERY_ROP_SAMPLES_KILLED_BY_LATEZ        20
> +#define NV50_HW_PM_QUERY_TEX_CACHE_MISS                     21
> +#define NV50_HW_PM_QUERY_TEX_CACHE_HIT                      22
> +#define NV50_HW_PM_QUERY_TEX_WAITS_FOR_FB                   23
> +
>  boolean nv50_blitter_create(struct nv50_screen *);
>  void nv50_blitter_destroy(struct nv50_screen *);
>
> --
> 2.4.4
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

Nouveau - Jun 2015 - [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

[Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

[Nouveau] [RFC PATCH 0/8] nv50: expose global performance counters

[Nouveau] [RFC PATCH 1/8] nouveau: implement the nvif hardware performance counters interface

[Nouveau] [RFC PATCH 2/8] nv50: allocate a software object class

[Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

[Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

[Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions

[Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters

[Nouveau] [RFC PATCH 7/8] nv50: expose global performance counters to the HUD

[Nouveau] [RFC PATCH 8/8] nv50: enable GL_AMD_performance_monitor

[Nouveau] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM

[Nouveau] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters

[Nouveau] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters