Hello, I have my work on the nv vpe video decoder in a functional state. In case you didn't know this decoder accelerates mpeg2 video at the idct/mc level. I have verified that it works on nv40 hardware. I believe it works on nv30 hardware (and maybe some earlier hardware), but I cannot verify since I have none. I will reply with patches against the kernel, drm, ddx and mesa for your review. Any comments are welcome. An important note is that the performance isn't that great in my opinion. There is something preventing the nv vpe engine from rendering cmds at the same rate as the blob. Either caching or whatever. Jimmy Rentz
This patch includes all the relevant nv vpe kernel support. This patch applies against the latest nouveau-linux-2.6. Though, the makefile might need adjusting. Some notes about the decoder engine: * It is composed of the mmio control registers, fifo and the output surfaces. * The fifo pushbuffer can be allocated from vram or agp. AGP is not working right now but it should in theory. * Output surfaces for the luma+chroma data can be only be allocated from vram. * Since only one set of mmio control registers exist only one client app can use the engine at a time. I suppose it might be possible to support context switching but that might be too slow to be useful. Client usage: * Client app calls the vpe channel create ioctl to setup the hw and fifo pushbuffer. * Client app creates all the output surfaces via buffer objects. * Client apps writes a set of cmds to the pushbuffer then calls the fire ioctl to kick of a decode of a cmd sequence. * Client app calls the query ioctl to see when an output surface is done rendering. Some notes about the kernel implementation: * Both user and kernel submission of pushbuffers is supported. I originally implemented the kernel submission via a copy of the pushbuffer. The user-space pushbuffer was added later for performance reaons. Though, you still need to call the kernel to fire since mmio access is not allowed for user-mode. * The output surface must be pinned in memory until the rendering is done. A sequence type fence exists that lets you query when a given output surface is done decoding. This would make it possible to free a surface if you want. The kernel would then automatically unpin the surface if you replace it later. Realistically, it wouldn't be smart for performance reasons to free these surfaces. 
Signed-off-by: Jimmy Rentz <jb17bsome at gmail.com> diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index 2405d5e..7a6d699 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -23,7 +23,7 @@ nouveau-y := nouveau_drv.o nouveau_state.o nouveau_channel.o nouveau_mem.o \ nv04_dac.o nv04_dfp.o nv04_tv.o nv17_tv.o nv17_tv_modes.o \ nv04_crtc.o nv04_display.o nv04_cursor.o nv04_fbcon.o \ nv10_gpio.o nv50_gpio.o \ - nv50_calc.o + nv50_calc.o nouveau_vd_vpe.o nouveau-$(CONFIG_DRM_NOUVEAU_DEBUG) += nouveau_debugfs.o nouveau-$(CONFIG_COMPAT) += nouveau_ioc32.o diff --git a/drivers/gpu/drm/nouveau/nouveau_channel.c b/drivers/gpu/drm/nouveau/nouveau_channel.c index e952c3b..cfbc981 100644 --- a/drivers/gpu/drm/nouveau/nouveau_channel.c +++ b/drivers/gpu/drm/nouveau/nouveau_channel.c @@ -336,6 +336,15 @@ nouveau_channel_cleanup(struct drm_device *dev, struct drm_file *file_priv) if (chan && chan->file_priv == file_priv) nouveau_channel_free(chan); } + + if (dev_priv->vpe_channel) { + NV_DEBUG(dev, "clearing VPE channel from file_priv\n"); + struct nouveau_vd_vpe_channel *vpe_channel; + vpe_channel = dev_priv->vpe_channel; + + if (vpe_channel->file_priv == file_priv) + nouveau_vpe_channel_free(vpe_channel); + } } int @@ -437,6 +446,14 @@ struct drm_ioctl_desc nouveau_ioctls[] = { DRM_IOCTL_DEF(DRM_NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_AUTH), DRM_IOCTL_DEF(DRM_NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_AUTH), DRM_IOCTL_DEF(DRM_NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_AUTH), + DRM_IOCTL_DEF(DRM_NOUVEAU_VD_VPE_CHANNEL_ALLOC, + nouveau_vd_vpe_ioctl_channel_alloc, DRM_AUTH), + DRM_IOCTL_DEF(DRM_NOUVEAU_VD_VPE_CHANNEL_FREE, + nouveau_vd_vpe_ioctl_channel_free, DRM_AUTH), + DRM_IOCTL_DEF(DRM_NOUVEAU_VD_VPE_PUSHBUF_FIRE, + nouveau_vd_vpe_ioctl_pushbuf_fire, DRM_AUTH), + DRM_IOCTL_DEF(DRM_NOUVEAU_VD_VPE_SURFACE_QUERY, + nouveau_vd_vpe_ioctl_surface_query, DRM_AUTH), }; int nouveau_max_ioctl 
= DRM_ARRAY_SIZE(nouveau_ioctls); diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c index 7933de4..cc3387d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c @@ -117,6 +117,117 @@ nouveau_debugfs_channel_fini(struct nouveau_channel *chan) } } +static +int nouveau_debugfs_vpe_channel_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct nouveau_vd_vpe_channel *chan = node->info_ent->data; + int i; + uint32_t val; + + seq_printf(m, "cpu fifo state:\n"); + seq_printf(m, " max: 0x%08x\n", chan->dma.max << 2); + seq_printf(m, " cur: 0x%08x\n", chan->dma.cur << 2); + seq_printf(m, " put: 0x%08x\n", chan->dma.put << 2); + seq_printf(m, " free: 0x%08x\n", chan->dma.free << 2); + + seq_printf(m, "vpe fifo state:\n"); + seq_printf(m, " config: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_USER_CONFIG)); + seq_printf(m, " offset: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_USER_OFFSET)); + seq_printf(m, " size: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_USER_SIZE)); + seq_printf(m, " get: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_USER_GET)); + seq_printf(m, " put: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_USER_PUT)); + seq_printf(m, " get.seq: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_SEQUENCE_GET)); + seq_printf(m, " put.seq: 0x%08x\n", + chan->dma.sequence); + + seq_printf(m, "vpe engine status:\n"); + seq_printf(m, " engine_config_1: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_CONFIG_1)); + seq_printf(m, " engine_config_2: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_CONFIG_2)); + seq_printf(m, " engine_setup_1: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_SETUP_1)); + seq_printf(m, " engine_setup_2: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_SETUP_2)); + seq_printf(m, " engine_reader_config: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_READER_CONFIG)); + 
seq_printf(m, " engine_processing_status: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_STATUS)); + seq_printf(m, " engine_status: 0x%08x\n", + nv_rd32(chan->dev, NV_VPE_MPEG2_ENGINE_CONTROL)); + + seq_printf(m, "vpe decode surface config:\n"); + val = nv_rd32(chan->dev, NV_VPE_MPEG2_SURFACE_INFO); + seq_printf(m, " info: 0x%08X\n", + val); + val = nv_rd32(chan->dev, NV_VPE_MPEG2_CONTEXT_DIMENSIONS); + seq_printf(m, " dimensions: width = %d, height = %d\n", + (val >> 16) & 0xFFF, val & 0xFFF); + + seq_printf(m, "vpe decode surface fb offsets:\n"); + for (i = 0; i < ARRAY_SIZE(chan->surface); i++) { + seq_printf(m, " luma.[0x%08X] = 0x%08X\n", i, + nv_rd32(chan->dev, NV_VPE_MPEG2_LUMA_SURFACE_OFFSET_GET(i))); + seq_printf(m, " chroma.[0x%08X] = 0x%08X\n", i, + nv_rd32(chan->dev, NV_VPE_MPEG2_CHROMA_SURFACE_OFFSET_GET(i))); + } + + return 0; +} + +int nouveau_debugfs_vpe_channel_init(struct nouveau_vd_vpe_channel *chan) +{ + struct drm_nouveau_private *dev_priv = chan->dev->dev_private; + struct drm_minor *minor = chan->dev->primary; + int ret; + + if (!dev_priv->debugfs.vpe_channel_root) { + dev_priv->debugfs.vpe_channel_root + debugfs_create_dir("vpe_channel", minor->debugfs_root); + if (!dev_priv->debugfs.vpe_channel_root) + return -ENOENT; + } + + strcpy(chan->debugfs.name, "0"); + chan->debugfs.info.name = chan->debugfs.name; + chan->debugfs.info.show = nouveau_debugfs_vpe_channel_info; + chan->debugfs.info.driver_features = 0; + chan->debugfs.info.data = chan; + + ret = drm_debugfs_create_files(&chan->debugfs.info, 1, + dev_priv->debugfs.vpe_channel_root, + chan->dev->primary); + if (ret == 0) + chan->debugfs.active = true; + return ret; +} + +void +nouveau_debugfs_vpe_channel_fini(struct nouveau_vd_vpe_channel *chan) +{ + struct drm_nouveau_private *dev_priv = chan->dev->dev_private; + + if (!chan->debugfs.active) + return; + + drm_debugfs_remove_files(&chan->debugfs.info, 1, chan->dev->primary); + chan->debugfs.active = false; + + if (chan == 
dev_priv->vpe_channel) { + debugfs_remove(dev_priv->debugfs.vpe_channel_root); + dev_priv->debugfs.vpe_channel_root = NULL; + } +} + + + static int nouveau_debugfs_chipset_info(struct seq_file *m, void *data) { diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index da62e92..150cbf9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -502,6 +502,38 @@ struct nv04_mode_state { struct nv04_crtc_reg crtc_reg[2]; }; +struct nouveau_vd_vpe_surface { + struct nouveau_bo *luma_bo; + struct nouveau_bo *chroma_bo; + uint32_t dma_sequence; +}; + +struct nouveau_vd_vpe_channel { + struct drm_device *dev; + struct drm_file *file_priv; + uint32_t width; + uint32_t height; + + /* Push buffer state */ + struct { + uint32_t max; + uint32_t cur; + uint32_t put; + uint32_t free; + uint32_t sequence; + /* access via pushbuf_bo */ + } dma; + + struct nouveau_bo *pushbuf_bo; + struct nouveau_vd_vpe_surface surface[8]; + + struct { + bool active; + char name[32]; + struct drm_info_list info; + } debugfs; +}; + enum nouveau_card_type { NV_04 = 0x00, NV_10 = 0x10, @@ -626,10 +658,13 @@ struct drm_nouveau_private { struct { struct dentry *channel_root; + struct dentry *vpe_channel_root; } debugfs; struct nouveau_fbdev *nfbdev; struct apertures_struct *apertures; + + struct nouveau_vd_vpe_channel *vpe_channel; }; static inline struct drm_nouveau_private * @@ -667,6 +702,16 @@ nouveau_bo_ref(struct nouveau_bo *ref, struct nouveau_bo **pnvbo) (ch) = nv->fifos[(id)]; \ } while (0) +#define NOUVEAU_GET_VPE_CHANNEL_WITH_RETURN(id, ch) do { \ + struct drm_nouveau_private *nv = dev->dev_private; \ + if (nv->vpe_channel && (nv->vpe_channel->file_priv != id)) { \ + NV_ERROR(dev, "pid %d doesn't own vpe channel\n", \ + DRM_CURRENTPID); \ + return -EPERM; \ + } \ + (ch) = nv->vpe_channel; \ +} while (0) + /* nouveau_drv.c */ extern int nouveau_noagp; extern int nouveau_duallink; @@ -811,6 +856,8 @@ extern int 
nouveau_debugfs_init(struct drm_minor *); extern void nouveau_debugfs_takedown(struct drm_minor *); extern int nouveau_debugfs_channel_init(struct nouveau_channel *); extern void nouveau_debugfs_channel_fini(struct nouveau_channel *); +extern int nouveau_debugfs_vpe_channel_init(struct nouveau_vd_vpe_channel *); +extern void nouveau_debugfs_vpe_channel_fini(struct nouveau_vd_vpe_channel *); #else static inline int nouveau_debugfs_init(struct drm_minor *minor) @@ -832,6 +879,17 @@ static inline void nouveau_debugfs_channel_fini(struct nouveau_channel *chan) { } + +static inline int +nouveau_debugfs_vpe_channel_init(struct nouveau_vd_vpe_channel *chan) +{ + return 0; +} + +static inline void +nouveau_debugfs_vpe_channel_fini(struct nouveau_vd_vpe_channel *chan) +{ +} #endif /* nouveau_dma.c */ @@ -1161,6 +1219,17 @@ extern int nouveau_gem_ioctl_cpu_fini(struct drm_device *, void *, extern int nouveau_gem_ioctl_info(struct drm_device *, void *, struct drm_file *); +/* nouveau_vd_vpe.c */ +extern void nouveau_vpe_channel_free(struct nouveau_vd_vpe_channel *); +extern int nouveau_vd_vpe_ioctl_channel_alloc(struct drm_device *, void *, + struct drm_file *); +extern int nouveau_vd_vpe_ioctl_channel_free(struct drm_device *, void *, + struct drm_file *); +extern int nouveau_vd_vpe_ioctl_pushbuf_fire(struct drm_device *, void *, + struct drm_file *); +extern int nouveau_vd_vpe_ioctl_surface_query(struct drm_device *, void *, + struct drm_file *); + /* nv10_gpio.c */ int nv10_gpio_get(struct drm_device *dev, enum dcb_gpio_tag tag); int nv10_gpio_set(struct drm_device *dev, enum dcb_gpio_tag tag, int state); diff --git a/drivers/gpu/drm/nouveau/nouveau_reg.h b/drivers/gpu/drm/nouveau/nouveau_reg.h index 9c1056c..3dd8308 100644 --- a/drivers/gpu/drm/nouveau/nouveau_reg.h +++ b/drivers/gpu/drm/nouveau/nouveau_reg.h @@ -176,6 +176,37 @@ #define NV04_PTIMER_TIME_1 0x00009410 #define NV04_PTIMER_ALARM_0 0x00009420 +/* The NV VPE MPEG2 control registers that exist on NV40 and NV30 
and + * some other older boards possibly.*/ +#define NV_VPE_MPEG2_ENGINE_CONFIG_1 0x0000B0E0 +#define NV_VPE_MPEG2_ENGINE_CONFIG_2 0x0000B0E8 +#define NV_VPE_MPEG2_ENGINE_SETUP_1 0x0000B100 +#define NV_VPE_MPEG2_ENGINE_SETUP_2 0x0000B140 +#define NV_VPE_MPEG2_ENGINE_STATUS 0x0000B200 +#define NV_VPE_MPEG2_ENGINE_READER_CONFIG 0x0000B204 +#define NV_VPE_MPEG2_USER_CONFIG 0x0000B300 +# define NV_VPE_MPEG2_USER_NOT_PRESENT 0x020F0200 +# define NV_VPE_MPEG2_USER_PRESENT 0x02001ec1 +# define NV_VPE_MPEG2_USER_VRAM (0 << 16) +# define NV_VPE_MPEG2_USER_AGP_OR_PCI (1 << 16) +# define NV_VPE_MPEG2_USER_AGP_OR_PCI_READY (2 << 16) +/* Complete guess here about pcie.*/ +# define NV_VPE_MPEG2_USER_PCIE (8 << 16) +#define NV_VPE_MPEG2_UNKNOWN_SETUP_3 0x0000B314 +#define NV_VPE_MPEG2_USER_OFFSET 0x0000B320 +#define NV_VPE_MPEG2_USER_SIZE 0x0000B324 +#define NV_VPE_MPEG2_USER_PUT 0x0000B328 +#define NV_VPE_MPEG2_USER_GET 0x0000B330 +#define NV_VPE_MPEG2_ENGINE_CONTROL 0x0000B32C +# define NV_VPE_MPEG2_ENGINE_STOP 0 +# define NV_VPE_MPEG2_ENGINE_START 1 +#define NV_VPE_MPEG2_SEQUENCE_GET 0x0000B340 +#define NV_VPE_MPEG2_SURFACE_INFO 0x0000B378 +#define NV_VPE_MPEG2_CONTEXT_DIMENSIONS 0x0000B37C +#define NV_VPE_MPEG2_LUMA_SURFACE_OFFSET_GET(s) (0x0000B450 + (s * 8)) +#define NV_VPE_MPEG2_CHROMA_SURFACE_OFFSET_GET(s) (0x0000B454 + (s * 8)) +#define NV_VPE_MPEG2_ENGINE_STATUS_1 0x0000B848 + #define NV04_PGRAPH_DEBUG_0 0x00400080 #define NV04_PGRAPH_DEBUG_1 0x00400084 #define NV04_PGRAPH_DEBUG_2 0x00400088 diff --git a/drivers/gpu/drm/nouveau/nouveau_vd_vpe.c b/drivers/gpu/drm/nouveau/nouveau_vd_vpe.c new file mode 100644 index 0000000..149f10b --- /dev/null +++ b/drivers/gpu/drm/nouveau/nouveau_vd_vpe.c @@ -0,0 +1,1218 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include "drmP.h" +#include "drm.h" + +#include "nouveau_drv.h" +#include "nouveau_drm.h" +#include "nouveau_vpe_hw.h" + +/* VPE MPEG2 HW notes: + * - There is a 64byte fetch size. That is why each set of commands must + * be aligned on a 64 byte boundary for firing. + * - One fetch of cmds seem to process in 1 microsecond on my nv4e. + * However, I presume this can vary based on the hw and nature of commands. + * - Each firing of a set of commands must be followed by a small delay. + * The main reason is to avoid overwhelming the hw. + * The delays below were determined from testing/measuring. 
I doubt they + are perfect and they could be tweaked a bit.*/ + +/* Channel/Surface init commands should not take long to process.*/ +#define VPE_UDELAY_FIRE_INIT 4 + +/* Normal firing needs this type of delay.*/ +#define VPE_UDELAY_FIRE_NORMAL 35 + +/* Need a longer delay at the end of the fifo since it takes longer.*/ +#define VPE_UDELAY_FIRE_END 100 + +/* Set if you want to validate vpe user cmds. + * Otherwise, they are copied asis. + * The reason this exists is because a user could set a vpe surface to + * point to the visible framebuffer, etc. However, the user could never + * make a vpe surface use a gart address since it isn't supported by the + * hardware.*/ +/*#define NOUVEAU_VPE_VALIDATE_USER_CMDS*/ + +/* TODO - Export this from nouveau_gem.c*/ +/* Needed to copy userspace pushbuffers that are sent to the vpe hw.*/ +static inline void * +_u_memcpya(uint64_t user, unsigned nmemb, unsigned size) +{ + void *mem; + void __user *userptr = (void __force __user *)(uintptr_t)user; + + mem = kmalloc(nmemb * size, GFP_KERNEL); + if (!mem) + return ERR_PTR(-ENOMEM); + + if (DRM_COPY_FROM_USER(mem, userptr, nmemb * size)) { + kfree(mem); + return ERR_PTR(-EFAULT); + } + + return mem; +} + +/* Internal */ +static inline void +nouveau_vpe_cmd_write(struct nouveau_vd_vpe_channel *vpe_channel, + uint32_t value) +{ + nouveau_bo_wr32(vpe_channel->pushbuf_bo, vpe_channel->dma.cur++, + value); + vpe_channel->dma.free--; + + if (vpe_channel->dma.cur == vpe_channel->dma.max) { + vpe_channel->dma.cur = 0; + vpe_channel->dma.free = vpe_channel->dma.max; + } +} + +static inline void +nouveau_vpe_cmd_align(struct nouveau_vd_vpe_channel *vpe_channel) +{ + uint32_t nop_count; + uint32_t cmd_sequence_count; + int i; + + /* Alignment is needed when ending cmd sequences.*/ + cmd_sequence_count = vpe_channel->dma.cur - vpe_channel->dma.put; + nop_count = ALIGN(cmd_sequence_count, NV_VPE_CMD_ALIGNMENT); + nop_count -= cmd_sequence_count; + + for (i = 0; i < nop_count; i++) + 
nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_NOP << + NV_VPE_CMD_TYPE_SHIFT); +} + +static inline void +nouveau_vpe_fire(struct nouveau_vd_vpe_channel *vpe_channel, uint64_t delay) +{ + struct drm_device *dev = vpe_channel->dev; + uint32_t put; + + DRM_MEMORYBARRIER(); + + put = (vpe_channel->dma.cur / NV_VPE_CMD_ALIGNMENT) * + NV_VPE_CMD_ALIGNMENT; + + nouveau_bo_rd32(vpe_channel->pushbuf_bo, put); + + nv_wr32(dev, NV_VPE_MPEG2_USER_PUT, put << 2); + + vpe_channel->dma.put = put; + + if (delay) + DRM_UDELAY(delay); +} + +static uint32_t +nouveau_vpe_channel_read_get(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev = vpe_channel->dev; + + return nv_rd32(dev, NV_VPE_MPEG2_USER_GET) >> 2; +} + +static int +nouveau_vpe_wait_until_engine_idle(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev = vpe_channel->dev; + + if (!nouveau_wait_until(dev, 10000000, NV_VPE_MPEG2_ENGINE_STATUS, + 0x0FFFFFFF, 0)) { + NV_ERROR(dev, "nouveau_vpe_wait_until_engine_idle - engine is not" + " idle. status = 0x%08X.\n", + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_STATUS)); + return -EINVAL; + } + + return 0; +} + +static int +nouveau_vpe_channel_wait(struct nouveau_vd_vpe_channel *vpe_channel, + uint32_t put) +{ + uint32_t get; + uint32_t prev_get = 0; + bool is_beg = (put == 0) || (vpe_channel->dma.put == 0); + uint32_t cnt = 0; + + get = prev_get = nouveau_vpe_channel_read_get(vpe_channel); + + while ((!is_beg && (get < put)) || + (is_beg && (get != 0))) { + + /* reset counter as long as GET is still advancing, this is + * to avoid misdetecting a GPU lockup if the GPU happens to + * just be processing an operation that takes a long time + */ + get = nouveau_vpe_channel_read_get(vpe_channel); + if (get != prev_get) { + prev_get = get; + cnt = 0; + } + + if ((++cnt & 0xff) == 0) { + DRM_UDELAY(1); + if (cnt > 100000) { + NV_ERROR(vpe_channel->dev, "nouveau_vpe_channel_wait - lockup. 
" + "cur = 0x%08X, put = 0x%08X, get = 0x%08X, put.seq = %u," + "get.seq = %u, ec1 = 0x%08X, ec2 = 0x%08X, es = 0x%08X.\n", + vpe_channel->dma.cur, put, + nouveau_vpe_channel_read_get(vpe_channel), + vpe_channel->dma.sequence, + nv_rd32(vpe_channel->dev, NV_VPE_MPEG2_SEQUENCE_GET), + nv_rd32(vpe_channel->dev, NV_VPE_MPEG2_ENGINE_CONFIG_1), + nv_rd32(vpe_channel->dev, NV_VPE_MPEG2_ENGINE_CONFIG_2), + nv_rd32(vpe_channel->dev, NV_VPE_MPEG2_ENGINE_STATUS)); + return -EBUSY; + } + } + } + + return 0; +} + +static void +nouveau_vpe_cmd_end_sequence_header(struct nouveau_vd_vpe_channel *vpe_channel) +{ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_END_SEQUENCE << + NV_VPE_CMD_TYPE_SHIFT | NV_VPE_CMD_SEQUENCE << 24); + + nouveau_vpe_cmd_write(vpe_channel, ++vpe_channel->dma.sequence); +} + +static void +nouveau_vpe_cmd_end_sequence_trailer(struct nouveau_vd_vpe_channel *vpe_channel) +{ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_END_SEQUENCE << + NV_VPE_CMD_TYPE_SHIFT); +} + +static void +nouveau_vpe_cmd_end_sequence_finish(struct nouveau_vd_vpe_channel *vpe_channel) +{ + nouveau_vpe_cmd_align(vpe_channel); + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_NORMAL); +} + +#ifndef NOUVEAU_VPE_VALIDATE_USER_CMDS +static void +_OUT_RINGp(struct nouveau_vd_vpe_channel *chan, const void *data, + unsigned nr_dwords) +{ + bool is_iomem; + u32 *mem = ttm_kmap_obj_virtual(&chan->pushbuf_bo->kmap, &is_iomem); + mem = &mem[chan->dma.cur]; + if (is_iomem) + memcpy_toio((void __force __iomem *)mem, data, nr_dwords * 4); + else + memcpy(mem, data, nr_dwords * 4); + chan->dma.cur += nr_dwords; +} +#endif + +static int +nouveau_vpe_cmd_write_user_batch(struct nouveau_vd_vpe_channel *chan, + const void *data, unsigned nr_dwords) +{ +#ifdef NOUVEAU_VPE_VALIDATE_USER_CMDS + bool is_iomem; + u32 *mem = ttm_kmap_obj_virtual(&chan->pushbuf_bo->kmap, &is_iomem); + u32 *user_data = (u32 *) data; + uint32_t val; + int i; + bool in_mb_db = false; + bool at_end_mb_db = false; + + mem = 
&mem[chan->dma.cur]; + + for (i = 0; i < nr_dwords; i++) { + val = user_data[i]; + + if (in_mb_db) { + if (at_end_mb_db) { + if (val == (NV_VPE_CMD_DCT_SEPARATOR << NV_VPE_CMD_TYPE_SHIFT)) + at_end_mb_db = false; + else + in_mb_db = false; + } else if (val & NV_VPE_DCT_BLOCK_TERMINATOR) + at_end_mb_db = true; + } + if (!in_mb_db) { + switch (val & 0xF0000000) { + case NV_VPE_CMD_DCT_SEPARATOR << NV_VPE_CMD_TYPE_SHIFT: + in_mb_db = true; + at_end_mb_db = false; + break; + case NV_VPE_CMD_DCT_CHROMA_HEADER << NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_DCT_LUMA_HEADER << NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_DCT_COORDINATE << NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_CHROMA_MOTION_VECTOR_HEADER << + NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_LUMA_MOTION_VECTOR_HEADER << NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_MOTION_VECTOR << NV_VPE_CMD_TYPE_SHIFT: + case NV_VPE_CMD_NOP << NV_VPE_CMD_TYPE_SHIFT: + break; + default: + NV_ERROR(chan->dev, "vpe - invalid cmd 0x%08X detected. " + "Aborting cmd sequence.\n", val); + return -EINVAL; + } + } + + /* Always iomem/vram for vpe.*/ + iowrite32_native(val, (void __force __iomem *)&mem[i]); + } + + chan->dma.cur += nr_dwords; +#else + _OUT_RINGp(chan, data, nr_dwords); +#endif + + return 0; +} + +static bool +nouveau_vpe_validate_surface(struct nouveau_vd_vpe_channel *vpe_channel, + uint32_t handle, + struct nouveau_bo *target_nvbo) +{ + struct drm_device *dev = vpe_channel->dev; + struct drm_gem_object *gem; + struct nouveau_bo *nvbo; + bool result; + + gem = drm_gem_object_lookup(dev, vpe_channel->file_priv, handle); + if (unlikely(!gem)) { + result = false; + NV_ERROR(dev, "nouveau_vpe_validate_gem_handle - " + "Unknown handle 0x%08X.\n", handle); + goto out; + } + nvbo = nouveau_gem_object(gem); + if (unlikely(!nvbo || (nvbo != target_nvbo))) { + result = false; + NV_ERROR(dev, "nouveau_vpe_validate_gem_handle - " + "Unknown bo 0x%08X.\n", handle); + goto out; + } + + result = true; + +out: + + mutex_lock(&dev->struct_mutex); + 
drm_gem_object_unreference(gem); + mutex_unlock(&dev->struct_mutex); + + return result; +} + +static int +nouveau_vpe_pin_surface(struct nouveau_vd_vpe_channel *vpe_channel, + uint32_t handle, uint32_t required_size, + struct nouveau_bo **pnvbo) +{ + struct drm_device *dev = vpe_channel->dev; + struct drm_gem_object *gem; + struct nouveau_bo *nvbo; + uint32_t mem_type; + unsigned long size; + int ret; + + gem = drm_gem_object_lookup(dev, vpe_channel->file_priv, handle); + if (!gem) { + NV_ERROR(dev, "nouveau_vpe_pin_surface - " + " Unknown handle 0x%08X.\n", handle); + return -EINVAL; + } + nvbo = nouveau_gem_object(gem); + if (!nvbo) { + ret = -EINVAL; + NV_ERROR(dev, "nouveau_vpe_pin_surface - " + "Unknown bo 0x%08X.\n", handle); + goto out; + } + ret = ttm_bo_reserve(&nvbo->bo, false, false, false, 0); + if (ret) + goto out; + + mem_type = nvbo->bo.mem.mem_type; + size = nvbo->bo.mem.size; + + ttm_bo_unreserve(&nvbo->bo); + + if (mem_type != TTM_PL_VRAM) { + ret = -EINVAL; + NV_ERROR(dev, "nouveau_vpe_pin_surface - bo must be in vram.\n"); + goto out; + } + if (size < required_size) { + ret = -EINVAL; + NV_ERROR(dev, "nouveau_vpe_pin_surface - bo 0x%08X has size %lu, " + "required %u.\n", handle, + size, required_size); + goto out; + } + ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM); + if (ret) { + NV_ERROR(dev, "nouveau_vpe_pin_surface - " + "Could not pin handle 0x%08X.\n", handle); + goto out; + } + + *pnvbo = nvbo; + ret = 0; + +out: + + mutex_lock(&dev->struct_mutex); + drm_gem_object_unreference(gem); + mutex_unlock(&dev->struct_mutex); + + return ret; +} + +static void +nouveau_vpe_unpin_surface(struct nouveau_vd_vpe_channel *vpe_channel, + struct nouveau_bo *nvbo) +{ + if (nvbo && nvbo->pin_refcnt) + nouveau_bo_unpin(nvbo); +} + +static void +nouveau_vpe_reset_pushbuf_to_start(struct nouveau_vd_vpe_channel *vpe_channel) +{ + int i; + uint32_t nop_count; + + if (vpe_channel->dma.cur) { + /* Just write nops till the end since alignment is a non-issue + * 
here.*/ + nop_count = vpe_channel->dma.max - vpe_channel->dma.cur; + + for (i = 0; i < nop_count; i++) + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_NOP << + NV_VPE_CMD_TYPE_SHIFT); + } + + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_END); +} + +static int +nouveau_vpe_channel_pushbuf_alloc(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev = vpe_channel->dev; + struct nouveau_bo *pushbuf_bo; + int ret; + uint32_t flags; + + if (0) + /*dev_priv->gart_info.type == NOUVEAU_GART_AGP) + * agp init is broken right now it seems.*/ + flags = TTM_PL_FLAG_TT; + else + flags = TTM_PL_FLAG_VRAM; + + ret = nouveau_gem_new(dev, NULL, NV_VPE_PUSHBUFFER_SIZE, 0, + flags, 0, 0x0000, false, true, &pushbuf_bo); + if (ret) + return ret; + + ret = nouveau_bo_pin(pushbuf_bo, flags); + if (ret) + goto out_err; + + ret = nouveau_bo_map(pushbuf_bo); + if (ret) + goto out_err; + + vpe_channel->pushbuf_bo = pushbuf_bo; + vpe_channel->dma.max = vpe_channel->pushbuf_bo->bo.mem.size >> 2; + vpe_channel->dma.free = vpe_channel->dma.max; + +out_err: + if (ret) { + mutex_lock(&dev->struct_mutex); + drm_gem_object_unreference(pushbuf_bo->gem); + mutex_unlock(&dev->struct_mutex); + } + + return ret; +} + +static int +nouveau_vpe_channel_hw_init(struct nouveau_vd_vpe_channel *vpe_channel) +{ + uint32_t value; + struct drm_device *dev = vpe_channel->dev; + struct drm_nouveau_private *dev_priv = dev->dev_private; + uint32_t pushbuf_offset = 0; + + /* Turn off the mpeg2 decoder.*/ + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_NOT_PRESENT); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONTROL, NV_VPE_MPEG2_ENGINE_STOP); + nv_wr32(dev, NV_VPE_MPEG2_USER_PUT, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, 0); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_1, 0); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_2, 0); + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_CONTROL); + + /* Pause a tiny bit to let the hardware reset. 
+ * This might be needed.*/ + DRM_UDELAY(100); + + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_1, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_2, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_UNKNOWN_SETUP_3, 0x100); + + /* Some type of mpeg2 engine config. + * It seems that the hardware automatically sets this to 0x20. + * However, I have an nv4a mmio trace where the nvidia driver + * actually writes 0x20. + * Also I have noticed that when the mpeg2 engine hw locks + * up after playing video, this register gets reset to 0x1. + */ + if (nv_rd32(dev, NV_VPE_MPEG2_ENGINE_CONFIG_1) != 0x20) + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONFIG_1, 0x20); + if (nv_rd32(dev, NV_VPE_MPEG2_ENGINE_CONFIG_2) != 0x20) + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONFIG_2, 0x20); + + /* Make sure the decoder is ready. + * So, we check each status register. + * Well, that is what these registers seem to be. + */ + value = nv_rd32(dev, NV_VPE_MPEG2_ENGINE_STATUS); + + /* Is the hw still busy? */ + if (value & 0x1) + if (!nouveau_wait_until(dev, 10000000, NV_VPE_MPEG2_ENGINE_STATUS, + 0x0FFFFFFF, 0)) { + NV_ERROR(dev, "nouveau_vpe_channel_hw_init - " + "unknown status value of 0x%08X for engine " + "status reg. Must exit.\n", + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_STATUS)); + return -EINVAL; + } + + /* Make sure the decoder is ready. */ + value = nv_rd32(dev, NV_VPE_MPEG2_ENGINE_STATUS_1); + + /* If we got this value then we might have a problem. */ + if (value & 0x200) { + NV_ERROR(dev, "nouveau_vpe_channel_hw_init - " + "unknown status value of 0x%08X for engine status 1 reg. " + "Must exit.\n", + value); + return -EINVAL; + } + + /* Is the status reg still busy? */ + if (value & 0x1) + if (!nouveau_wait_until(dev, 10000000, NV_VPE_MPEG2_ENGINE_STATUS_1, + 0x0FFFFFFF, 0)) { + NV_ERROR(dev, "nouveau_vpe_channel_hw_init - " + "unknown status value of 0x%08X for engine status 1 reg. 
" + "Must exit.\n", + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_STATUS_1)); + return -EINVAL; + } + + /* Reset the mpeg2 pushbuffer/user. */ + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONTROL, NV_VPE_MPEG2_ENGINE_STOP); + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, 0); + + /* The setup of the command buffer is different for agp and pci/pcie. + * NOTE: Agp is not working right now so it is disabled.*/ + if (vpe_channel->pushbuf_bo->bo.mem.mem_type == TTM_PL_TT) { + + pushbuf_offset = lower_32_bits(dev_priv->gart_info.aper_base) + + lower_32_bits(vpe_channel->pushbuf_bo->bo.offset); + + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_PRESENT | NV_VPE_MPEG2_USER_AGP_OR_PCI); + /* This needs the agp aperature in the offset.*/ + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, + pushbuf_offset); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, + vpe_channel->dma.max << 2); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_1, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_2, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_PRESENT | NV_VPE_MPEG2_USER_AGP_OR_PCI | + NV_VPE_MPEG2_USER_AGP_OR_PCI_READY); + } else { + /* For pci, only the fb offset is used. + * However, have to init the pushbuffer/user using the fb size? + * This is not related to decoding but strictly for reading from + * the pushbuffer/user. It might be caching related. + * The nv driver uses different values but it looks fb size related. + * So, I will go with that for now. 
+ */ + pushbuf_offset = lower_32_bits(vpe_channel->pushbuf_bo->bo.offset); + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_PRESENT | NV_VPE_MPEG2_USER_VRAM); + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, dev_priv->fb_available_size); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_1, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_2, 0x01010000); + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_PRESENT | NV_VPE_MPEG2_USER_VRAM); + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, + pushbuf_offset); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, + vpe_channel->dma.max << 2); + } + + /* Start up the mpeg2 engine */ + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONTROL, NV_VPE_MPEG2_ENGINE_STOP); + nv_wr32(dev, NV_VPE_MPEG2_USER_PUT, 0); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONTROL, NV_VPE_MPEG2_ENGINE_START); + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_CONTROL); + + return 0; +} + +static int +nouveau_vpe_channel_init(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev = vpe_channel->dev; + int ret; + int i; + uint32_t value; + + /* Reset decoder to the initial state.*/ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT | NV_VPE_CMD_INIT_CHANNEL_ACCEL + << 24); + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT); + /* NOTE: The surface group info value might be tiling related. 
*/ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT | + NV_VPE_CMD_INIT_CHANNEL_SURFACE_GROUP_INFO << 24); + + nouveau_vpe_cmd_end_sequence_header(vpe_channel); + /* No body/trailer for the init cmd.*/ + nouveau_vpe_cmd_end_sequence_finish(vpe_channel); + + ret = nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.put); + if (ret) + return ret; + + /* Clear out all surface references.*/ + for (i = 0; i < NV_VPE_MAX_SURFACES; i++) { + + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_SURFACE << + NV_VPE_CMD_TYPE_SHIFT | + NV_VPE_CMD_INIT_SURFACE_LUMA(i)); + nouveau_vpe_cmd_align(vpe_channel); + + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_INIT); + ret = nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.put); + if (ret) + return ret; + + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_SURFACE << + NV_VPE_CMD_TYPE_SHIFT | + NV_VPE_CMD_INIT_SURFACE_CHROMA(i)); + nouveau_vpe_cmd_align(vpe_channel); + + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_INIT); + ret = nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.put); + if (ret) + return ret; + } + + /* Init the decoder channel.*/ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT | + NV_VPE_CMD_INIT_CHANNEL_ACCEL << 24 + /* If IDCT is disabled then only MC is done.*/ + | NV_VPE_CMD_INIT_CHANNEL_ACCEL_IDCT); + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT | + (vpe_channel->width << 12 | vpe_channel->height)); + /* NOTE: The surface group info value might be tiling related. 
*/ + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_CHANNEL << + NV_VPE_CMD_TYPE_SHIFT | + NV_VPE_CMD_INIT_CHANNEL_SURFACE_GROUP_INFO << 24 + | (ALIGN(vpe_channel->width, 112) / 32)); + + nouveau_vpe_cmd_end_sequence_header(vpe_channel); + /* No body/trailer for the init cmd.*/ + nouveau_vpe_cmd_end_sequence_finish(vpe_channel); + + ret = nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.put); + if (ret) + return ret; + + ret = nouveau_vpe_wait_until_engine_idle(vpe_channel); + if (ret) + return ret; + + /* Make sure hardware context is setup correctly */ + + value = nv_rd32(dev, NV_VPE_MPEG2_SURFACE_INFO); + if (value != (0x10000 | (ALIGN(vpe_channel->width, 128)))) { + NV_ERROR(dev, "nouveau_vpe_channel_init - " + "channel surface setup wrong for width = %d," + "height = %d, got = 0x%08X.\n", + vpe_channel->width, vpe_channel->height, value); + return -EINVAL; + } + + value = nv_rd32(dev, NV_VPE_MPEG2_CONTEXT_DIMENSIONS); + if (value != (((vpe_channel->width & 0xFFF) << 16) | (vpe_channel->height & 0xFFF))) { + NV_ERROR(dev, "nouveau_vpe_channel_init - " + "channel dimensions wrong for width = %d," + "height = %d, got = 0x%08X.\n", + vpe_channel->width, vpe_channel->height, value); + return -EINVAL; + } + + return 0; +} + +static void +nouveau_vpe_channel_shutdown(struct nouveau_vd_vpe_channel *vpe_channel) +{ + nouveau_vpe_cmd_end_sequence_header(vpe_channel); + /* No body/trailer for the init cmd.*/ + nouveau_vpe_cmd_end_sequence_finish(vpe_channel); +} + +static void +nouveau_vpe_channel_hw_shutdown(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev = vpe_channel->dev; + + nouveau_vpe_channel_shutdown(vpe_channel); + + nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.cur); + + /* Just a slight pause. This might not be needed. 
*/ + DRM_UDELAY(100); + + /* Turn off the mpeg2 decoder.*/ + nv_wr32(dev, NV_VPE_MPEG2_USER_CONFIG, + NV_VPE_MPEG2_USER_NOT_PRESENT); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_CONTROL, NV_VPE_MPEG2_ENGINE_STOP); + nv_wr32(dev, NV_VPE_MPEG2_USER_PUT, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_OFFSET, 0); + nv_wr32(dev, NV_VPE_MPEG2_USER_SIZE, 0); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_1, 0); + nv_wr32(dev, NV_VPE_MPEG2_ENGINE_SETUP_2, 0); + nv_rd32(dev, NV_VPE_MPEG2_ENGINE_CONTROL); +} + +static int +nouveau_vpe_channel_alloc(struct drm_device *dev, + struct drm_nouveau_vd_vpe_channel_alloc *req, + struct drm_file *file_priv) +{ + struct drm_nouveau_private *dev_priv = dev->dev_private; + struct nouveau_vd_vpe_channel *vpe_channel; + int ret; + + if (dev_priv->vpe_channel) { + NV_ERROR(dev, "vpe channel is already in use.\n"); + return -EPERM; + } + + if ((dev_priv->card_type != NV_40) && + (dev_priv->card_type != NV_30)) { + NV_ERROR(dev, "vpe is not supported on NV%d.\n", + dev_priv->card_type); + return -EINVAL; + } + + if ((req->width < NV_VPE_MIN_WIDTH) || + (req->width > NV_VPE_MAX_WIDTH) || + (req->height < NV_VPE_MIN_HEIGHT) || + (req->height > NV_VPE_MAX_HEIGHT)) { + NV_ERROR(dev, "vpe does not support width = %d, height = %d\n", + req->width, req->height); + return -EINVAL; + } + + vpe_channel = kzalloc(sizeof(*vpe_channel), GFP_KERNEL); + if (!vpe_channel) + return -ENOMEM; + + req->width = ALIGN(req->width, 16); + req->height = ALIGN(req->height, 16); + vpe_channel->dev = dev; + vpe_channel->width = req->width; + vpe_channel->height = req->height; + + ret = nouveau_vpe_channel_pushbuf_alloc(vpe_channel); + if (ret) + goto out_err; + + ret = nouveau_vpe_channel_hw_init(vpe_channel); + if (ret) + goto out_err; + + ret = nouveau_vpe_channel_init(vpe_channel); + if (ret) + goto out_err; + + ret = drm_gem_handle_create(file_priv, vpe_channel->pushbuf_bo->gem, + &req->pushbuf_handle); + if (ret) + goto out_err; + + nouveau_debugfs_vpe_channel_init(vpe_channel); + + 
vpe_channel->file_priv = file_priv; + dev_priv->vpe_channel = vpe_channel; + + NV_INFO(dev, "initialized vpe channel\n"); + +out_err: + if (ret) + nouveau_vpe_channel_free(vpe_channel); + + return ret; +} + +void +nouveau_vpe_channel_free(struct nouveau_vd_vpe_channel *vpe_channel) +{ + struct drm_device *dev; + struct drm_nouveau_private *dev_priv; + struct nouveau_vd_vpe_surface *vpe_surface; + int i; + + if (!vpe_channel) + return; + + dev = vpe_channel->dev; + dev_priv = dev->dev_private; + + nouveau_vpe_channel_hw_shutdown(vpe_channel); + + nouveau_debugfs_vpe_channel_fini(vpe_channel); + + for (i = 0; i < ARRAY_SIZE(vpe_channel->surface); i++) { + vpe_surface = &vpe_channel->surface[i]; + if (vpe_surface->luma_bo) + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->luma_bo); + if (vpe_surface->chroma_bo) + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->chroma_bo); + } + + if (vpe_channel->pushbuf_bo) { + nouveau_bo_unmap(vpe_channel->pushbuf_bo); + mutex_lock(&vpe_channel->dev->struct_mutex); + drm_gem_object_unreference(vpe_channel->pushbuf_bo->gem); + mutex_unlock(&vpe_channel->dev->struct_mutex); + } + + NV_INFO(vpe_channel->dev, "shutdown vpe channel\n"); + + dev_priv->vpe_channel = NULL; + + kfree(vpe_channel); +} + +static int +nouveau_vpe_reference_surface(struct nouveau_vd_vpe_channel *vpe_channel, + uint32_t surface_index, uint64_t addr_offset, + bool is_luma) +{ + struct drm_device *dev = vpe_channel->dev; + uint32_t value; + int ret; + + if (vpe_channel->dma.free < 8) + nouveau_vpe_reset_pushbuf_to_start(vpe_channel); + + nouveau_vpe_cmd_write(vpe_channel, NV_VPE_CMD_INIT_SURFACE << + NV_VPE_CMD_TYPE_SHIFT | (is_luma ? 
+ NV_VPE_CMD_INIT_SURFACE_LUMA(surface_index) : + NV_VPE_CMD_INIT_SURFACE_CHROMA(surface_index)) + | NV_VPE_CMD_INIT_SURFACE_OFFSET_DIV(lower_32_bits(addr_offset))); + nouveau_vpe_cmd_align(vpe_channel); + + if (vpe_channel->dma.free >= NV_VPE_CMD_ALIGNMENT) + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_INIT); + else + nouveau_vpe_reset_pushbuf_to_start(vpe_channel); + + ret = nouveau_vpe_channel_wait(vpe_channel, vpe_channel->dma.cur); + if (ret) + return ret; + + ret = nouveau_vpe_wait_until_engine_idle(vpe_channel); + if (ret) + return ret; + + if (is_luma) { + value = nv_rd32(dev, NV_VPE_MPEG2_LUMA_SURFACE_OFFSET_GET(surface_index)); + if (lower_32_bits(addr_offset) != value) { + NV_ERROR(dev, "vpe - surface.luma ref is wrong. " + "Expected 0x%08X, Got 0x%08X.\n", + lower_32_bits(addr_offset), value); + return -EINVAL; + } + } else { + value = nv_rd32(dev, NV_VPE_MPEG2_CHROMA_SURFACE_OFFSET_GET(surface_index)); + if (lower_32_bits(addr_offset) != value) { + NV_ERROR(dev, "vpe - surface.chroma ref is wrong. 
" + "Expected 0x%08X, Got 0x%08X.\n", + lower_32_bits(addr_offset), value); + return -EINVAL; + } + } + + return 0; +} + +static int +nouveau_vpe_channel_validate_surfaces(struct nouveau_vd_vpe_channel *vpe_channel, + struct drm_nouveau_vd_vpe_surface *surfaces, int nr_surfaces, + struct nouveau_vd_vpe_surface **target_vpe_surface) +{ + struct drm_device *dev = vpe_channel->dev; + int ret; + int i; + struct nouveau_vd_vpe_surface *vpe_surface; + struct drm_nouveau_vd_vpe_surface *surface; + uint32_t decoder_surface_size = 0; + + for (i = 0, surface = surfaces; i < nr_surfaces; i++, surface++) { + if (unlikely(surface->surface_index >= ARRAY_SIZE(vpe_channel->surface))) { + NV_ERROR(dev, "nouveau_vpe_channel_validate_surfaces - " + "surface_index %d is invalid.\n", surface->surface_index); + return -EINVAL; + } + + vpe_surface = &vpe_channel->surface[surface->surface_index]; + if (!vpe_surface->luma_bo || + !nouveau_vpe_validate_surface(vpe_channel, surface->luma_handle, vpe_surface->luma_bo)) { + if (!decoder_surface_size) + decoder_surface_size = vpe_channel->width * vpe_channel->height; + + if (vpe_surface->luma_bo) { + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->luma_bo); + vpe_surface->luma_bo = NULL; + } + + ret = nouveau_vpe_pin_surface(vpe_channel, surface->luma_handle, + decoder_surface_size, &vpe_surface->luma_bo); + if (ret) { + NV_ERROR(dev, "nouveau_vpe_channel_validate_surfaces - " + "could not pin surface_index %d, luma handle 0x%08X, " + "error %d.\n", surface->surface_index, + surface->luma_handle, ret); + return ret; + } + + ret = nouveau_vpe_reference_surface(vpe_channel, surface->surface_index, + vpe_surface->luma_bo->bo.offset, true); + if (ret) { + NV_ERROR(dev, "nouveau_vpe_channel_validate_surfaces - " + "could not reference surface_index %d, luma handle 0x%08X, " + "error %d.\n", surface->surface_index, + surface->luma_handle, ret); + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->luma_bo); + vpe_surface->luma_bo = NULL; + 
return ret; + } + + vpe_surface->dma_sequence = 0; + } + if (!vpe_surface->chroma_bo || + !nouveau_vpe_validate_surface(vpe_channel, surface->chroma_handle, vpe_surface->chroma_bo)) { + + if (!decoder_surface_size) + decoder_surface_size = vpe_channel->width * vpe_channel->height; + + if (vpe_surface->chroma_bo) { + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->chroma_bo); + vpe_surface->chroma_bo = NULL; + } + + /* The chroma surface is 1/2 the size of the luma in both the width + * and height.*/ + ret = nouveau_vpe_pin_surface(vpe_channel, surface->chroma_handle, + decoder_surface_size / 4, &vpe_surface->chroma_bo); + if (ret) { + NV_ERROR(dev, "nouveau_vpe_channel_validate_surfaces - " + "could not pin surface_index %d, chroma handle 0x%08X, " + "error %d.\n", surface->surface_index, + surface->chroma_handle, ret); + return ret; + } + + ret = nouveau_vpe_reference_surface(vpe_channel, surface->surface_index, + vpe_surface->chroma_bo->bo.offset, false); + if (ret) { + NV_ERROR(dev, "nouveau_vpe_channel_validate_surfaces - " + "could not reference surface_index %d, " + "chroma handle 0x%08X, error %d.\n", + surface->surface_index, surface->chroma_handle, ret); + nouveau_vpe_unpin_surface(vpe_channel, vpe_surface->chroma_bo); + vpe_surface->chroma_bo = NULL; + return ret; + } + + vpe_surface->dma_sequence = 0; + } + + /* First surface is considered the target.*/ + if (i == 0) + *target_vpe_surface = vpe_surface; + } + + return 0; +} + +static int +nouveau_vpe_channel_pushbuf_fire(struct nouveau_vd_vpe_channel *vpe_channel, + struct drm_nouveau_vd_vpe_pushbuf_fire *req) +{ + int ret; + uint32_t *pushbuf = NULL; + uint32_t *batches = NULL; + struct drm_nouveau_vd_vpe_surface *surfaces = NULL; + struct nouveau_vd_vpe_surface *vpe_surface = NULL; + int i; + uint32_t offset = 0; + uint32_t batch_size; + bool is_end_sequence = req->flags & + NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_END_SEQUENCE; + bool is_update_dma_pos = req->flags & + 
NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_UPDATE_DMA_POS; + bool do_fire_batch; + + if (req->nr_surfaces) { + surfaces = _u_memcpya(req->surfaces, req->nr_surfaces, sizeof(*surfaces)); + if (unlikely(IS_ERR(surfaces))) { + ret = PTR_ERR(surfaces); + goto out; + } + } + + if (req->nr_dwords) { + pushbuf = _u_memcpya(req->dwords, req->nr_dwords, sizeof(uint32_t)); + if (unlikely(IS_ERR(pushbuf))) { + ret = PTR_ERR(pushbuf); + goto out; + } + } + + if (req->nr_batches) { + batches = _u_memcpya(req->batches, req->nr_batches, sizeof(uint32_t)); + if (unlikely(IS_ERR(batches))) { + ret = PTR_ERR(batches); + goto out; + } + } + + if (req->nr_surfaces) { + ret = nouveau_vpe_channel_validate_surfaces(vpe_channel, + surfaces, req->nr_surfaces, + &vpe_surface); + if (unlikely(ret)) + goto out; + } + + if (is_update_dma_pos) { + if (req->dma_cur >= vpe_channel->dma.max) { + ret = -EINVAL; + goto out; + } + vpe_channel->dma.cur = req->dma_cur; + vpe_channel->dma.free = vpe_channel->dma.max - vpe_channel->dma.cur; + if (!is_end_sequence) + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_NORMAL); + } + + for (i = 0; i < req->nr_batches; i++) { + batch_size = batches[i]; + + do_fire_batch = !(batch_size & + NOUVEAU_VD_VPE_PUSHBUF_FIRE_BATCH_DO_NOT_FIRE); + + batch_size &= 0xFFFF; + + if (unlikely(!batch_size)) { + ret = -EINVAL; + goto out; + } + + if (unlikely((batch_size + offset) > req->nr_dwords)) { + ret = -EINVAL; + goto out; + } + + if (batch_size > vpe_channel->dma.free) + nouveau_vpe_reset_pushbuf_to_start(vpe_channel); + + ret = nouveau_vpe_cmd_write_user_batch(vpe_channel, + (const void *)((uint64_t)pushbuf + (offset << 2)), batch_size); + if (ret) + goto out; + + offset += batch_size; + vpe_channel->dma.free -= batch_size; + + if (!vpe_channel->dma.free) { + vpe_channel->dma.cur = 0; + vpe_channel->dma.free = vpe_channel->dma.max; + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_END); + } + + if (do_fire_batch) + nouveau_vpe_fire(vpe_channel, VPE_UDELAY_FIRE_NORMAL); + } + + if 
(req->nr_dwords) { + if (vpe_channel->dma.free < NV_VPE_MAX_MB) + nouveau_vpe_reset_pushbuf_to_start(vpe_channel); + } + + if (is_end_sequence) { + if (vpe_channel->dma.free < NV_VPE_CMD_ALIGNMENT) + nouveau_vpe_reset_pushbuf_to_start(vpe_channel); + nouveau_vpe_cmd_end_sequence_header(vpe_channel); + nouveau_vpe_cmd_end_sequence_trailer(vpe_channel); + nouveau_vpe_cmd_end_sequence_finish(vpe_channel); + + if (vpe_surface) + vpe_surface->dma_sequence = vpe_channel->dma.sequence; + } + + req->dma_free = vpe_channel->dma.free; + req->dma_cur = vpe_channel->dma.cur; + ret = 0; +out: + if (!IS_ERR(surfaces) && surfaces) + kfree(surfaces); + if (!IS_ERR(batches) && batches) + kfree(batches); + if (!IS_ERR(pushbuf) && pushbuf) + kfree(pushbuf); + + return ret; +} + +static int +nouveau_vpe_surface_query(struct nouveau_vd_vpe_channel *vpe_channel, + struct drm_nouveau_vd_vpe_surface_query *req) +{ + struct drm_device *dev = vpe_channel->dev; + struct nouveau_vd_vpe_surface *vpe_surface; + uint32_t i; + uint32_t value; + + if (unlikely(req->surface_index >= ARRAY_SIZE(vpe_channel->surface))) { + NV_ERROR(dev, "nouveau_vpe_surface_query - invalid surface index %d.\n", + req->surface_index); + return -EINVAL; + } + + req->is_busy = 0; + + vpe_surface = &vpe_channel->surface[req->surface_index]; + + /* This is set when a cmd sequence is done for the target surface.*/ + if (vpe_surface->dma_sequence) { + /* Read the current sequence and see if any surfaces have + * finished rendering.*/ + value = nv_rd32(dev, NV_VPE_MPEG2_SEQUENCE_GET); + for (i = 0; i < ARRAY_SIZE(vpe_channel->surface); i++) { + if (vpe_channel->surface[i].luma_bo || + vpe_channel->surface[i].chroma_bo) { + if (value >= vpe_channel->surface[i].dma_sequence) + vpe_channel->surface[i].dma_sequence = 0; + else if (i == req->surface_index) + req->is_busy = 1; + } + } + } + + return 0; +} + +/* IOCtls.*/ + +int +nouveau_vd_vpe_ioctl_channel_alloc(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ 
+ + struct drm_nouveau_vd_vpe_channel_alloc *req = data; + + return nouveau_vpe_channel_alloc(dev, req, file_priv); +} + +int +nouveau_vd_vpe_ioctl_channel_free(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct nouveau_vd_vpe_channel *vpe_channel; + + NOUVEAU_GET_VPE_CHANNEL_WITH_RETURN(file_priv, vpe_channel); + + nouveau_vpe_channel_free(vpe_channel); + + return 0; +} + +int nouveau_vd_vpe_ioctl_pushbuf_fire(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct nouveau_vd_vpe_channel *vpe_channel; + struct drm_nouveau_vd_vpe_pushbuf_fire *req = data; + + NOUVEAU_GET_VPE_CHANNEL_WITH_RETURN(file_priv, vpe_channel); + + return nouveau_vpe_channel_pushbuf_fire(vpe_channel, req); +} + +int nouveau_vd_vpe_ioctl_surface_query(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct nouveau_vd_vpe_channel *vpe_channel; + struct drm_nouveau_vd_vpe_surface_query *req = data; + + NOUVEAU_GET_VPE_CHANNEL_WITH_RETURN(file_priv, vpe_channel); + + return nouveau_vpe_surface_query(vpe_channel, req); +} diff --git a/drivers/gpu/drm/nouveau/nouveau_vpe_hw.h b/drivers/gpu/drm/nouveau/nouveau_vpe_hw.h new file mode 100644 index 0000000..8e3dfb9 --- /dev/null +++ b/drivers/gpu/drm/nouveau/nouveau_vpe_hw.h @@ -0,0 +1,153 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __NOUVEAU_VPE_HW_H__ +#define __NOUVEAU_VPE_HW_H__ + +/* VPE is the video decoder engine that is found in nv30, nv40 and some + * older hardware (geforce 4 and higher I believe). + * It contains an mpeg2 decoder with the following properties: + * (-) Decodes at the idct level. However, I believe older cards only + * support mc level. + * (-) 32x64 to 2032x2032 profiles. + * (-) 4:2:0 chroma sampling. 
+ * (-) Only one set of registers so only one user unless some type of + * context/channel switching is added.*/ + +#define NV_VPE_MAX_CHANNELS 1 +#define NV_VPE_MAX_SURFACES 8 +#define NV_VPE_MIN_WIDTH 32 +#define NV_VPE_MIN_HEIGHT 64 +#define NV_VPE_MAX_WIDTH 2032 +#define NV_VPE_MAX_HEIGHT 2032 +#define NV_VPE_PUSHBUFFER_SIZE (1 * 1024 * 1024) +#define NV_VPE_CMD_ALIGNMENT 16 + +#define NV_VPE_MAX_MB_BATCH 16 +#define NV_VPE_MAX_MB_HEADER 20 +#define NV_VPE_MAX_MB_DCT (33 * 6) +#define NV_VPE_MAX_MB (NV_VPE_MAX_MB_HEADER + NV_VPE_MAX_MB_DCT) + +#define NV_VPE_CMD_TYPE_SHIFT 28 + +/* All cmd info.*/ +#define NV_VPE_CMD_NOP 0x1 + +#define NV_VPE_CMD_INIT_SURFACE 0x2 + #define NV_VPE_CMD_INIT_SURFACE_LUMA(index) ((index * 2) << 24) + #define NV_VPE_CMD_INIT_SURFACE_CHROMA(index) (((index * 2) + 1) << 24) + #define NV_VPE_CMD_INIT_SURFACE_OFFSET_DIV(offset) (offset >> 5) + +#define NV_VPE_CMD_INIT_CHANNEL 0x3 + /* ( (width round to 112) / 32 */ + #define NV_VPE_CMD_INIT_CHANNEL_SURFACE_GROUP_INFO 0x1 + #define NV_VPE_CMD_INIT_CHANNEL_ACCEL 0x2 + /* (0x1 to turn on idct operations). */ + #define NV_VPE_CMD_INIT_CHANNEL_ACCEL_IDCT 0x1 + +#define NV_VPE_CMD_DCT_SEPARATOR 0x6 +#define NV_VPE_CMD_END_SEQUENCE 0x7 + #define NV_VPE_CMD_SEQUENCE 0x1 + +/* DCT Blocks */ +#define NV_VPE_CMD_DCT_CHROMA_HEADER 0x8 +#define NV_VPE_CMD_DCT_LUMA_HEADER 0x9 + /* The block pattern is used for chroma and luma blocks */ + #define NV_VPE_CMD_DCT_BLOCK_PATTERN(p) ((p) << 24) + /* Not sure what this is for. This is always set in the dct block header */ + #define NV_VPE_CMD_DCT_BLOCK_UNKNOWN 0x10000 + /* Target surface index. Is 0 based. 
*/ + #define NV_VPE_CMD_DCT_BLOCK_TARGET_SURFACE(s) (s << 20) + /* If picture element is frame */ + #define NV_VPE_CMD_PICT_FRAME 0x80000 + /* If field based encoding and a luma block */ + #define NV_VPE_CMD_PICT_FRAME_FIELD 0x800000 + /* If picture element or field encoding is bottom field */ + #define NV_VD_VPE_CMD_BOTTOM_FIELD 0x20000 + /* If macroblock x coordinate is even */ + #define NV_VD_VPE_CMD_EVEN_X_COORD 0x8000 + +/* Used to terminate a set of dct data blocks.*/ +#define NV_VPE_DCT_BLOCK_TERMINATOR 0x1 + +/* Used to designate dct data blocks that are all zero.*/ +#define NV_VPE_DCT_BLOCK_NULL (0x80040000 | NV_VPE_DCT_BLOCK_TERMINATOR) + +/* Coordinates of dct */ +#define NV_VPE_CMD_DCT_COORDINATE 0xA + #define NV_VPE_DCT_POINTS_LUMA(x, y, p) (((y * 16 * p) << 12) | (x * 16)) + #define NV_VPE_DCT_POINTS_CHROMA(x, y, p) (((y * 8 * p) << 12) | (x * 16)) + +/* Motion Vectors */ +#define NV_VPE_CMD_LUMA_MOTION_VECTOR_HEADER 0xD +#define NV_VPE_CMD_CHROMA_MOTION_VECTOR_HEADER 0xC +#define NV_VPE_CMD_MOTION_VECTOR 0xE + + /* Motion Vector Header */ + + /* Set if 2 motion vectors exist for this header. + * Otherwise, it is cleared and only 1 exists.*/ + #define NV_VPE_CMD_MC_MV_COUNT_2 (0x1 << 16) + + /* [Field Picture or Field Motion Only] + * motion_vertical_field_select is set here. + * This means that the bottom field is selected for the given vertical + * vector. However, dual-prime blocks do not follow this rule. + * It is treated specially for them.*/ + #define NV_VPE_CMD_BOTTOM_FIELD_VERTICAL_MOTION_SELECT_FIRST (0x1 << 17) + + /* [Frame Picture and Frame Motion Type only] */ + #define NV_VPE_CMD_FRAME_PICT_FRAME_MOTION (0x1 << 19) + + /* MC prediction surface index. Is 0 based. */ + #define NV_VPE_CMD_PREDICTION_SURFACE(s) (s << 20) + + /* Set if this is a second motion vector. 
Otherwise, the first one is + * assumed.*/ + #define NV_VPE_CMD_MOTION_VECTOR_TYPE_SECOND (0x1 << 23) + + /* [Frame Picture and Frame Motion Type OR Field Picture only]*/ + #define NV_VPE_CMD_FRAME_FRAME_PICT_OR_FIELD (0x1 << 24) + + /* If Vertical Motion Vector is odd then set. This is before any + * operations are done. */ + #define NV_VPE_CMD_ODD_VERTICAL_MOTION_VECTOR (0x1 << 25) + + /* If Horizontal Motion Vector is odd then set. This is before any + * operations are done. */ + #define NV_VPE_CMD_ODD_HORIZONTAL_MOTION_VECTOR (0x1 << 26) + + /* If set then the motion vectors are backward. Otherwise, + * they are forward.*/ + #define NV_VPE_CMD_MOTION_VECTOR_BACKWARD (0x1 << 27) + + /* Motion Vectors. This is the equation used for each motion vector. + * d is only used as a second vector displacement in a couple of cases. + */ + #define NV_VPE_MOTION_VECTOR_VERTICAL(y, c, v, q, d) (((y * c) + (v / q) + d) << 12) + #define NV_VPE_MOTION_VECTOR_HORIZONTAL(x, c, v, q, d) ((x * c) + (v / q) + d) + +#endif diff --git a/include/drm/nouveau_drm.h b/include/drm/nouveau_drm.h index fe917de..c597c0a 100644 --- a/include/drm/nouveau_drm.h +++ b/include/drm/nouveau_drm.h @@ -184,6 +184,52 @@ enum nouveau_bus_type { struct drm_nouveau_sarea { }; +/* VPE Supports mpeg2 only.*/ +struct drm_nouveau_vd_vpe_channel_alloc { + uint32_t width; + uint32_t height; + /* Used for user pushbuf access. + * mmio access is not allowed so you still need to fire as normal.*/ + uint32_t pushbuf_handle; +}; + +struct drm_nouveau_vd_vpe_channel_free { +}; + +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_END_SEQUENCE 0x00000001 +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_UPDATE_DMA_POS 0x00000002 +/* structure for surface.*/ +struct drm_nouveau_vd_vpe_surface { + uint32_t luma_handle; + uint32_t chroma_handle; + uint32_t surface_index; +}; + +/* This flag lets you turn off firing for a specific batch. 
+ * This is needed in some cases to avoid locking up the decoder.*/ +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_BATCH_DO_NOT_FIRE 0x10000000 +struct drm_nouveau_vd_vpe_pushbuf_fire { + /* [in] */ + uint32_t nr_dwords; + uint64_t dwords; + uint32_t nr_batches; + uint64_t batches; + /* Surface[0] is always the target.*/ + uint32_t nr_surfaces; + uint64_t surfaces; + uint32_t flags; + /* Needed when writing to the hw pushbuf from user space. + * This also will perform a fire.*/ + uint32_t dma_cur; + /* [out] */ + uint32_t dma_free; +}; + +struct drm_nouveau_vd_vpe_surface_query { + uint32_t surface_index; + uint32_t is_busy; +}; + #define DRM_NOUVEAU_GETPARAM 0x00 #define DRM_NOUVEAU_SETPARAM 0x01 #define DRM_NOUVEAU_CHANNEL_ALLOC 0x02 @@ -196,5 +242,9 @@ struct drm_nouveau_sarea { #define DRM_NOUVEAU_GEM_CPU_PREP 0x42 #define DRM_NOUVEAU_GEM_CPU_FINI 0x43 #define DRM_NOUVEAU_GEM_INFO 0x44 +#define DRM_NOUVEAU_VD_VPE_CHANNEL_ALLOC 0x49 +#define DRM_NOUVEAU_VD_VPE_CHANNEL_FREE 0x50 +#define DRM_NOUVEAU_VD_VPE_PUSHBUF_FIRE 0x51 +#define DRM_NOUVEAU_VD_VPE_SURFACE_QUERY 0x52 #endif /* __NOUVEAU_DRM_H__ */
This patch includes all the relevant nv vpe drm support. This patch applies against the latest drm. The drm portion has a couple of classes that interface with the kernel vpe layer: * nouveau_vpe_channel class - Manages the drm vpe channel. This includes opening the vpe channel in the kernel, setting up the pushbuf and each output surface. Right now I force the pushbuffer to be allocated/managed from user-mode. This is for performance reasons. However, the kernel can always decline this. * nouveau_vpe_pushbuf class - Manages the pushbuf. This includes starting/ending cmd sequences, writing the cmds to the pushbuf and firing. Signed-off-by: Jimmy Rentz <jb17bsome at gmail.com> diff --git a/include/drm/nouveau_drm.h b/include/drm/nouveau_drm.h index a6a9f4a..c597c0a 100644 --- a/include/drm/nouveau_drm.h +++ b/include/drm/nouveau_drm.h @@ -79,6 +79,7 @@ struct drm_nouveau_gpuobj_free { #define NOUVEAU_GETPARAM_CHIPSET_ID 11 #define NOUVEAU_GETPARAM_VM_VRAM_BASE 12 #define NOUVEAU_GETPARAM_GRAPH_UNITS 13 +#define NOUVEAU_GETPARAM_PTIMER_TIME 14 struct drm_nouveau_getparam { uint64_t param; uint64_t value; @@ -183,6 +184,52 @@ enum nouveau_bus_type { struct drm_nouveau_sarea { }; +/* VPE Supports mpeg2 only.*/ +struct drm_nouveau_vd_vpe_channel_alloc { + uint32_t width; + uint32_t height; + /* Used for user pushbuf access. + * mmio access is not allowed so you still need to fire as normal.*/ + uint32_t pushbuf_handle; +}; + +struct drm_nouveau_vd_vpe_channel_free { +}; + +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_END_SEQUENCE 0x00000001 +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_UPDATE_DMA_POS 0x00000002 +/* structure for surface.*/ +struct drm_nouveau_vd_vpe_surface { + uint32_t luma_handle; + uint32_t chroma_handle; + uint32_t surface_index; +}; + +/* This flag lets you turn off firing for a specific batch. 
+ * This is needed in some cases to avoid locking up the decoder.*/ +#define NOUVEAU_VD_VPE_PUSHBUF_FIRE_BATCH_DO_NOT_FIRE 0x10000000 +struct drm_nouveau_vd_vpe_pushbuf_fire { + /* [in] */ + uint32_t nr_dwords; + uint64_t dwords; + uint32_t nr_batches; + uint64_t batches; + /* Surface[0] is always the target.*/ + uint32_t nr_surfaces; + uint64_t surfaces; + uint32_t flags; + /* Needed when writing to the hw pushbuf from user space. + * This also will perform a fire.*/ + uint32_t dma_cur; + /* [out] */ + uint32_t dma_free; +}; + +struct drm_nouveau_vd_vpe_surface_query { + uint32_t surface_index; + uint32_t is_busy; +}; + #define DRM_NOUVEAU_GETPARAM 0x00 #define DRM_NOUVEAU_SETPARAM 0x01 #define DRM_NOUVEAU_CHANNEL_ALLOC 0x02 @@ -195,5 +242,9 @@ struct drm_nouveau_sarea { #define DRM_NOUVEAU_GEM_CPU_PREP 0x42 #define DRM_NOUVEAU_GEM_CPU_FINI 0x43 #define DRM_NOUVEAU_GEM_INFO 0x44 +#define DRM_NOUVEAU_VD_VPE_CHANNEL_ALLOC 0x49 +#define DRM_NOUVEAU_VD_VPE_CHANNEL_FREE 0x50 +#define DRM_NOUVEAU_VD_VPE_PUSHBUF_FIRE 0x51 +#define DRM_NOUVEAU_VD_VPE_SURFACE_QUERY 0x52 #endif /* __NOUVEAU_DRM_H__ */ diff --git a/nouveau/Makefile.am b/nouveau/Makefile.am index de3f4df..5c148f1 100644 --- a/nouveau/Makefile.am +++ b/nouveau/Makefile.am @@ -19,7 +19,9 @@ libdrm_nouveau_la_SOURCES = \ nouveau_bo.c \ nouveau_resource.c \ nouveau_private.h \ - nouveau_reloc.c + nouveau_reloc.c \ + nouveau_vpe_channel.c \ + nouveau_vpe_pushbuf.c libdrm_nouveaucommonincludedir = ${includedir}/nouveau libdrm_nouveaucommoninclude_HEADERS = \ @@ -30,7 +32,10 @@ libdrm_nouveaucommoninclude_HEADERS = \ nouveau_pushbuf.h \ nouveau_bo.h \ nouveau_resource.h \ - nouveau_reloc.h + nouveau_reloc.h \ + nouveau_vpe_channel.h \ + nouveau_vpe_pushbuf.h \ + nouveau_vpe_hw.h libdrm_nouveauincludedir = ${includedir}/libdrm diff --git a/nouveau/nouveau_vpe_channel.c b/nouveau/nouveau_vpe_channel.c new file mode 100644 index 0000000..22092ae --- /dev/null +++ b/nouveau/nouveau_vpe_channel.c @@ -0,0 +1,301 @@ +/* + 
* Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <errno.h> + +#include "nouveau_drmif.h" +#include <nouveau_drm.h> +#include "nouveau_bo.h" +#include "nouveau_vpe_hw.h" +#include "nouveau_vpe_channel.h" +#include "nouveau_vpe_pushbuf.h" + +static int +nouveau_vpe_channel_hw_alloc(struct nouveau_device *dev, + uint32_t *width, uint32_t *height, + uint32_t *hw_pushbuf_handle) +{ + struct drm_nouveau_vd_vpe_channel_alloc vpe_channel_alloc; + struct nouveau_device_priv *nvdev = nouveau_device(dev); + int ret; + + vpe_channel_alloc.width = *width; + vpe_channel_alloc.height = *height; + + ret = drmCommandWriteRead(nvdev->fd, DRM_NOUVEAU_VD_VPE_CHANNEL_ALLOC, + &vpe_channel_alloc, sizeof(vpe_channel_alloc)); + if (ret) { + fprintf(stderr, "vpe - could not initialize channel. error %d.\n", ret); + return ret; + } + + *width = vpe_channel_alloc.width; + *height = vpe_channel_alloc.height; + *hw_pushbuf_handle = vpe_channel_alloc.pushbuf_handle; + + return 0; +} + +static void +nouveau_vpe_channel_hw_free(struct nouveau_device *dev) +{ + struct drm_nouveau_vd_vpe_channel_free vpe_channel_free; + struct nouveau_device_priv *nvdev = nouveau_device(dev); + + drmCommandWriteRead(nvdev->fd, DRM_NOUVEAU_VD_VPE_CHANNEL_FREE, + &vpe_channel_free, sizeof(vpe_channel_free)); +} + +int +nouveau_vpe_channel_alloc(struct nouveau_device *dev, uint32_t width, + uint32_t height, struct nouveau_vpe_channel **vpe_channel) +{ + int ret; + struct nouveau_vpe_channel *chan = NULL; + struct nouveau_vpe_pushbuf *pushbuf = NULL; + struct nouveau_vpe_surface *surfaces = NULL; + + if (!dev) + return -EINVAL; + + chan = calloc(1, sizeof(*chan)); + + if (!chan) { + ret = -ENOMEM; + goto out_err; + } + + pushbuf = calloc(1, sizeof(*pushbuf)); + + if (!pushbuf) { + ret = -ENOMEM; + goto out_err; + } + + /* For Past, Target, Future.*/ + pushbuf->nr_surfaces = 3; + pushbuf->surfaces = calloc(pushbuf->nr_surfaces, sizeof(*surfaces)); + + if 
(!pushbuf->surfaces) { + ret = -ENOMEM; + goto out_err; + } + + pushbuf->mb_buffer = calloc(NV_VPE_MAX_MB, sizeof(uint32_t)); + if (!pushbuf->mb_buffer) { + ret = -ENOMEM; + goto out_err; + } + + chan->nr_surfaces = NV_VPE_MAX_SURFACES; + surfaces = calloc(chan->nr_surfaces, sizeof(*surfaces)); + + if (!surfaces) { + ret = -ENOMEM; + goto out_err; + } + + chan->width = width; + chan->height = height; + + ret = nouveau_vpe_channel_hw_alloc(dev, &chan->width, &chan->height, + &pushbuf->hw_handle); + if (ret) + goto out_err; + + pushbuf->use_hw_pushbuf = 1; + + if (pushbuf->use_hw_pushbuf && pushbuf->hw_handle) { + ret = nouveau_bo_wrap(dev, pushbuf->hw_handle, &pushbuf->hw_bo); + if (ret) + goto out_err; + + ret = nouveau_bo_map(pushbuf->hw_bo, NOUVEAU_BO_RDWR); + if (ret) + goto out_err; + + pushbuf->buf = (uint32_t*)pushbuf->hw_bo->map; + pushbuf->buf_max = pushbuf->hw_bo->size >> 2; + pushbuf->max = pushbuf->buf_max; + } + else { + pushbuf->use_hw_pushbuf = 0; + pushbuf->buf_max = NV_VPE_USER_PUSHBUFFER_SIZE >> 2; + + pushbuf->buf = calloc(pushbuf->buf_max, sizeof(*pushbuf->buf)); + + if (!pushbuf->buf) { + ret = -ENOMEM; + goto out_err; + } + } + + chan->pushbuf = pushbuf; + chan->surfaces = surfaces; + chan->device = dev; + + *vpe_channel = chan; + +out_err: + if (ret) { + if (surfaces) + free(surfaces); + + if (pushbuf) { + if (pushbuf->surfaces) + free(pushbuf->surfaces); + if (pushbuf->use_hw_pushbuf) { + if (pushbuf->hw_bo) + nouveau_bo_ref(NULL, &pushbuf->hw_bo); + } + else { + if (pushbuf->buf) + free(pushbuf->buf); + } + + if (pushbuf->mb_buffer) + free(pushbuf->mb_buffer); + + free(pushbuf); + } + if (chan) + free(chan); + } + + return ret; +} + +void +nouveau_vpe_channel_free(struct nouveau_vpe_channel **vpe_channel) +{ + struct nouveau_vpe_channel *chan; + + if (!vpe_channel || !*vpe_channel) + return; + + chan = *vpe_channel; + + nouveau_vpe_channel_hw_free(chan->device); + + if (chan->surfaces) + free(chan->surfaces); + if (chan->pushbuf) { + if 
(chan->pushbuf->surfaces) + free(chan->pushbuf->surfaces); + if (chan->pushbuf->use_hw_pushbuf) { + if (chan->pushbuf->hw_bo) { + nouveau_bo_unmap(chan->pushbuf->hw_bo); + nouveau_bo_ref(NULL, &chan->pushbuf->hw_bo); + } + } + else { + if (chan->pushbuf->buf) + free(chan->pushbuf->buf); + } + if (chan->pushbuf->mb_buffer) + free(chan->pushbuf->mb_buffer); + free(chan->pushbuf); + } + + free(chan); + *vpe_channel = NULL; +} + +static int +nouveau_vpe_surface_hw_query(struct nouveau_device *dev, + uint32_t surface_index, uint32_t *is_busy) +{ + struct drm_nouveau_vd_vpe_surface_query query; + struct nouveau_device_priv *nvdev = nouveau_device(dev); + int ret; + + query.surface_index = surface_index; + do { + ret = drmCommandWriteRead(nvdev->fd, DRM_NOUVEAU_VD_VPE_SURFACE_QUERY, + &query, sizeof(query)); + } while (ret == -EAGAIN); + if (!ret) + *is_busy = query.is_busy; + else + fprintf(stderr, "vpe - could not query status for surface %d. error %d.\n", + surface_index, ret); + + return ret; +} + +int +nouveau_vpe_surface_alloc(struct nouveau_vpe_channel *vpe_channel, + uint32_t luma_handle, uint32_t chroma_handle, + uint32_t *surface_index) +{ + int i; + + if (!vpe_channel || !vpe_channel->surfaces || !luma_handle || + !chroma_handle) + return -EINVAL; + + for (i = 0; i < (int)vpe_channel->nr_surfaces; i++) { + if (!vpe_channel->surfaces[i].used) { + vpe_channel->surfaces[i].luma_handle = luma_handle; + vpe_channel->surfaces[i].chroma_handle = chroma_handle; + vpe_channel->surfaces[i].used = 1; + *surface_index = i; + return 0; + } + } + + fprintf(stderr, "vpe - all %d surfaces are in use.\n", vpe_channel->nr_surfaces); + + return -EINVAL; +} + +void +nouveau_vpe_surface_free(struct nouveau_vpe_channel *vpe_channel, + uint32_t surface_index) +{ + if (!vpe_channel) + return; + + if (surface_index >= vpe_channel->nr_surfaces) + return; + + vpe_channel->surfaces[surface_index].used = 0; +} + +int +nouveau_vpe_surface_query(struct nouveau_vpe_channel *vpe_channel, + 
uint32_t surface_index, uint32_t *is_busy) +{ + if (!vpe_channel || !is_busy) + return -EINVAL; + + return nouveau_vpe_surface_hw_query(vpe_channel->device, surface_index, + is_busy); +} diff --git a/nouveau/nouveau_vpe_channel.h b/nouveau/nouveau_vpe_channel.h new file mode 100644 index 0000000..a4d4a71 --- /dev/null +++ b/nouveau/nouveau_vpe_channel.h @@ -0,0 +1,72 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#ifndef __NOUVEAU_VPE_CHANNEL_H__ +#define __NOUVEAU_VPE_CHANNEL_H__ + +/*#define NV_VPE_USER_HW_PUSHBUFFER*/ +#define NV_VPE_USER_PUSHBUFFER_SIZE 128 * 1024 + +struct nouveau_vpe_surface { + uint32_t luma_handle; + uint32_t chroma_handle; + char kernel_referenced; + char used; +}; + +struct nouveau_vpe_channel { + struct nouveau_device *device; + + uint32_t width; + uint32_t height; + + struct nouveau_vpe_pushbuf *pushbuf; + + uint32_t nr_surfaces; + struct nouveau_vpe_surface *surfaces; +}; + +int +nouveau_vpe_channel_alloc(struct nouveau_device *, uint32_t width, + uint32_t height, struct nouveau_vpe_channel **); + +void +nouveau_vpe_channel_free(struct nouveau_vpe_channel **); + +int +nouveau_vpe_surface_alloc(struct nouveau_vpe_channel *, + uint32_t luma_handle, uint32_t chroma_handle, + uint32_t *surface_index); + +void +nouveau_vpe_surface_free(struct nouveau_vpe_channel *, + uint32_t surface_index); + +int +nouveau_vpe_surface_query(struct nouveau_vpe_channel *, + uint32_t surface_index, uint32_t *is_busy); + +#endif diff --git a/nouveau/nouveau_vpe_hw.h b/nouveau/nouveau_vpe_hw.h new file mode 100644 index 0000000..8e3dfb9 --- /dev/null +++ b/nouveau/nouveau_vpe_hw.h @@ -0,0 +1,153 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __NOUVEAU_VPE_HW_H__ +#define __NOUVEAU_VPE_HW_H__ + +/* VPE is the video decoder engine that is found in nv30, nv40 and some + * older hardware (geforce 4 and higher I believe). + * It contains an mpeg2 decoder with the following properties: + * (-) Decodes at the idct level. However, I believe older cards only + * support the mc level. + * (-) 32x64 to 2032x2032 profiles. + * (-) 4:2:0 chroma sampling. + * (-) Only one set of registers, so only one user unless some type of + * context/channel switching is added.*/ + +#define NV_VPE_MAX_CHANNELS 1 +#define NV_VPE_MAX_SURFACES 8 +#define NV_VPE_MIN_WIDTH 32 +#define NV_VPE_MIN_HEIGHT 64 +#define NV_VPE_MAX_WIDTH 2032 +#define NV_VPE_MAX_HEIGHT 2032 +#define NV_VPE_PUSHBUFFER_SIZE (1 * 1024 * 1024) +#define NV_VPE_CMD_ALIGNMENT 16 + +#define NV_VPE_MAX_MB_BATCH 16 +#define NV_VPE_MAX_MB_HEADER 20 +#define NV_VPE_MAX_MB_DCT (33 * 6) +#define NV_VPE_MAX_MB (NV_VPE_MAX_MB_HEADER + NV_VPE_MAX_MB_DCT) + +#define NV_VPE_CMD_TYPE_SHIFT 28 + +/* All cmd info.*/ +#define NV_VPE_CMD_NOP 0x1 + +#define NV_VPE_CMD_INIT_SURFACE 0x2 + #define NV_VPE_CMD_INIT_SURFACE_LUMA(index) (((index) * 2) << 24) + #define NV_VPE_CMD_INIT_SURFACE_CHROMA(index) ((((index) * 2) + 1) << 24) + #define NV_VPE_CMD_INIT_SURFACE_OFFSET_DIV(offset) ((offset) >> 5) + +#define NV_VPE_CMD_INIT_CHANNEL 0x3 + /* (width rounded to 112) / 32 */ + #define NV_VPE_CMD_INIT_CHANNEL_SURFACE_GROUP_INFO 0x1 + #define NV_VPE_CMD_INIT_CHANNEL_ACCEL 0x2 + /* (0x1 to
turn on idct operations). */ + #define NV_VPE_CMD_INIT_CHANNEL_ACCEL_IDCT 0x1 + +#define NV_VPE_CMD_DCT_SEPARATOR 0x6 +#define NV_VPE_CMD_END_SEQUENCE 0x7 + #define NV_VPE_CMD_SEQUENCE 0x1 + +/* DCT Blocks */ +#define NV_VPE_CMD_DCT_CHROMA_HEADER 0x8 +#define NV_VPE_CMD_DCT_LUMA_HEADER 0x9 + /* The block pattern is used for chroma and luma blocks */ + #define NV_VPE_CMD_DCT_BLOCK_PATTERN(p) ((p) << 24) + /* Not sure what this is for. This is always set in the dct block header */ + #define NV_VPE_CMD_DCT_BLOCK_UNKNOWN 0x10000 + /* Target surface index. Is 0 based. */ + #define NV_VPE_CMD_DCT_BLOCK_TARGET_SURFACE(s) (s << 20) + /* If picture element is frame */ + #define NV_VPE_CMD_PICT_FRAME 0x80000 + /* If field based encoding and a luma block */ + #define NV_VPE_CMD_PICT_FRAME_FIELD 0x800000 + /* If picture element or field encoding is bottom field */ + #define NV_VD_VPE_CMD_BOTTOM_FIELD 0x20000 + /* If macroblock x coordinate is even */ + #define NV_VD_VPE_CMD_EVEN_X_COORD 0x8000 + +/* Used to terminate a set of dct data blocks.*/ +#define NV_VPE_DCT_BLOCK_TERMINATOR 0x1 + +/* Used to designate dct data blocks that are all zero.*/ +#define NV_VPE_DCT_BLOCK_NULL (0x80040000 | NV_VPE_DCT_BLOCK_TERMINATOR) + +/* Coordinates of dct */ +#define NV_VPE_CMD_DCT_COORDINATE 0xA + #define NV_VPE_DCT_POINTS_LUMA(x, y, p) (((y * 16 * p) << 12) | (x * 16)) + #define NV_VPE_DCT_POINTS_CHROMA(x, y, p) (((y * 8 * p) << 12) | (x * 16)) + +/* Motion Vectors */ +#define NV_VPE_CMD_LUMA_MOTION_VECTOR_HEADER 0xD +#define NV_VPE_CMD_CHROMA_MOTION_VECTOR_HEADER 0xC +#define NV_VPE_CMD_MOTION_VECTOR 0xE + + /* Motion Vector Header */ + + /* Set if 2 motion vectors exist for this header. + * Otherwise, it is cleared and only 1 exists.*/ + #define NV_VPE_CMD_MC_MV_COUNT_2 (0x1 << 16) + + /* [Field Picture or Field Motion Only] + * motion_vertical_field_select is set here. + * This means that the bottom field is selected for the given vertical + * vector. 
However, dual-prime blocks do not follow this rule. + * They are treated specially.*/ + #define NV_VPE_CMD_BOTTOM_FIELD_VERTICAL_MOTION_SELECT_FIRST (0x1 << 17) + + /* [Frame Picture and Frame Motion Type only] */ + #define NV_VPE_CMD_FRAME_PICT_FRAME_MOTION (0x1 << 19) + + /* MC prediction surface index. It is 0-based. */ + #define NV_VPE_CMD_PREDICTION_SURFACE(s) ((s) << 20) + + /* Set if this is a second motion vector. Otherwise, the first one is + * assumed.*/ + #define NV_VPE_CMD_MOTION_VECTOR_TYPE_SECOND (0x1 << 23) + + /* [Frame Picture and Frame Motion Type OR Field Picture only]*/ + #define NV_VPE_CMD_FRAME_FRAME_PICT_OR_FIELD (0x1 << 24) + + /* Set if the vertical motion vector is odd. This is before any + * operations are done. */ + #define NV_VPE_CMD_ODD_VERTICAL_MOTION_VECTOR (0x1 << 25) + + /* Set if the horizontal motion vector is odd. This is before any + * operations are done. */ + #define NV_VPE_CMD_ODD_HORIZONTAL_MOTION_VECTOR (0x1 << 26) + + /* If set then the motion vectors are backward. Otherwise, + * they are forward.*/ + #define NV_VPE_CMD_MOTION_VECTOR_BACKWARD (0x1 << 27) + + /* Motion vectors. This is the equation used for each motion vector. + * d is only used as a second vector displacement in a couple of cases. + */ + #define NV_VPE_MOTION_VECTOR_VERTICAL(y, c, v, q, d) ((((y) * (c)) + ((v) / (q)) + (d)) << 12) + #define NV_VPE_MOTION_VECTOR_HORIZONTAL(x, c, v, q, d) (((x) * (c)) + ((v) / (q)) + (d)) + +#endif diff --git a/nouveau/nouveau_vpe_pushbuf.c b/nouveau/nouveau_vpe_pushbuf.c new file mode 100644 index 0000000..aeba8fb --- /dev/null +++ b/nouveau/nouveau_vpe_pushbuf.c @@ -0,0 +1,429 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved.
+ * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <errno.h> +#include <unistd.h> + +#include "nouveau_drmif.h" +#include <nouveau_drm.h> +#include "nouveau_vpe_hw.h" +#include "nouveau_vpe_channel.h" +#include "nouveau_vpe_pushbuf.h" + +static char +nouveau_vpe_pushbuf_reference_surface(struct nouveau_vpe_channel *vpe_channel, + int surface_index) +{ + int index = vpe_channel->pushbuf->nr_surfaces++; + vpe_channel->pushbuf->surfaces[index].surface_index = surface_index; + vpe_channel->pushbuf->surfaces[index].luma_handle = vpe_channel->surfaces[surface_index].luma_handle; + vpe_channel->pushbuf->surfaces[index].chroma_handle = vpe_channel->surfaces[surface_index].chroma_handle; + + return !vpe_channel->surfaces[surface_index].kernel_referenced; +} + +static int +nouveau_vpe_pushbuf_hw_start(struct nouveau_vpe_channel *vpe_channel, + int target_surface_index, int past_surface_index, + int future_surface_index) +{ + struct drm_nouveau_vd_vpe_pushbuf_fire vpe_pushbuf; + struct nouveau_device_priv *nvdev = nouveau_device(vpe_channel->device); + int ret; + char do_kernel_surface_reference = 0; + unsigned int i; + + vpe_channel->pushbuf->nr_surfaces = 0; + + do_kernel_surface_reference |= nouveau_vpe_pushbuf_reference_surface(vpe_channel, target_surface_index); + + if (past_surface_index >= 0) + do_kernel_surface_reference |= nouveau_vpe_pushbuf_reference_surface(vpe_channel, past_surface_index); + if (future_surface_index >= 0) + do_kernel_surface_reference |= nouveau_vpe_pushbuf_reference_surface(vpe_channel, future_surface_index); + + if (!do_kernel_surface_reference) { + vpe_channel->pushbuf->last_batch_put = 0; + vpe_channel->pushbuf->nr_batches = 0; + vpe_channel->pushbuf->is_near_end = 0; + return 0; + } + + memset(&vpe_pushbuf, 0, sizeof(vpe_pushbuf)); + vpe_pushbuf.nr_surfaces = vpe_channel->pushbuf->nr_surfaces; + vpe_pushbuf.surfaces = (uint64_t)vpe_channel->pushbuf->surfaces; + + do { + ret =
drmCommandWriteRead(nvdev->fd, DRM_NOUVEAU_VD_VPE_PUSHBUF_FIRE, + &vpe_pushbuf, sizeof(vpe_pushbuf)); + } while (ret == -EAGAIN); + if (!ret) { + if (vpe_channel->pushbuf->use_hw_pushbuf) { + vpe_channel->pushbuf->cur = vpe_pushbuf.dma_cur; + vpe_channel->pushbuf->free = vpe_pushbuf.dma_free; + } + else { + vpe_channel->pushbuf->cur = 0; + if (vpe_pushbuf.dma_free > vpe_channel->pushbuf->buf_max) + vpe_channel->pushbuf->max = vpe_channel->pushbuf->buf_max; + else + /* hw pushbuf is almost empty so only use that much.*/ + vpe_channel->pushbuf->max = vpe_pushbuf.dma_free; + vpe_channel->pushbuf->free = vpe_channel->pushbuf->max; + } + for (i = 0; i < vpe_channel->pushbuf->nr_surfaces; i++) + vpe_channel->surfaces[vpe_channel->pushbuf->surfaces[i].surface_index].kernel_referenced = 1; + } + else { + fprintf(stderr, "vpe - could not start pushbuf sequence. error %d.\n", ret); + vpe_channel->pushbuf->max = 0; + return ret; + } + + vpe_channel->pushbuf->last_batch_put = 0; + vpe_channel->pushbuf->nr_batches = 0; + vpe_channel->pushbuf->is_near_end = 0; + + return 0; +} + +static int +nouveau_vpe_pushbuf_hw_fire(struct nouveau_vpe_channel *vpe_channel, + char end_sequence) +{ + struct drm_nouveau_vd_vpe_pushbuf_fire vpe_pushbuf; + struct nouveau_device_priv *nvdev = nouveau_device(vpe_channel->device); + int ret; + + if (!vpe_channel->pushbuf->use_hw_pushbuf) { + vpe_pushbuf.nr_dwords = vpe_channel->pushbuf->cur; + vpe_pushbuf.dwords = (uint64_t)vpe_channel->pushbuf->buf; + vpe_pushbuf.nr_batches = vpe_channel->pushbuf->nr_batches; + vpe_pushbuf.batches = (uint64_t)vpe_channel->pushbuf->batches; + } + else { + vpe_pushbuf.nr_dwords = 0; + vpe_pushbuf.dwords = 0; + vpe_pushbuf.nr_batches = 0; + vpe_pushbuf.batches = 0; + } + + if (!end_sequence) { + vpe_pushbuf.nr_surfaces = 0; + vpe_pushbuf.flags = 0; + } + else { + vpe_pushbuf.flags = NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_END_SEQUENCE; + /* Target surface (0) is the only one that needs to be referenced. 
+ * That surface will get updated with the hw sequence for + * later queries.*/ + vpe_pushbuf.nr_surfaces = 1; + vpe_pushbuf.surfaces = (uint64_t)&vpe_channel->pushbuf->surfaces[0]; + } + + if (vpe_channel->pushbuf->use_hw_pushbuf) { + vpe_pushbuf.dma_free = vpe_channel->pushbuf->free; + vpe_pushbuf.dma_cur = vpe_channel->pushbuf->cur; + vpe_pushbuf.flags |= NOUVEAU_VD_VPE_PUSHBUF_FIRE_FLAG_UPDATE_DMA_POS; + } + + do { + ret = drmCommandWriteRead(nvdev->fd, DRM_NOUVEAU_VD_VPE_PUSHBUF_FIRE, + &vpe_pushbuf, sizeof(vpe_pushbuf)); + } while (ret == -EAGAIN); + if (!ret) { + if (vpe_channel->pushbuf->use_hw_pushbuf) { + vpe_channel->pushbuf->cur = vpe_pushbuf.dma_cur; + vpe_channel->pushbuf->free = vpe_pushbuf.dma_free; + } + else { + vpe_channel->pushbuf->cur = 0; + if (vpe_pushbuf.dma_free > vpe_channel->pushbuf->buf_max) + vpe_channel->pushbuf->max = vpe_channel->pushbuf->buf_max; + else + /* hw pushbuf is almost empty so only use that much.*/ + vpe_channel->pushbuf->max = vpe_pushbuf.dma_free; + vpe_channel->pushbuf->free = vpe_channel->pushbuf->max; + } + } + else { + fprintf(stderr, "vpe - could not fire pushbuf (%d). 
error %d.\n", end_sequence, ret); + vpe_channel->pushbuf->max = 0; + return ret; + } + + vpe_channel->pushbuf->last_batch_put = 0; + vpe_channel->pushbuf->nr_batches = 0; + vpe_channel->pushbuf->is_near_end = 0; + + return 0; +} + +static int +nouveau_vpe_pushbuf_end_batch(struct nouveau_vpe_channel *vpe_channel, + char do_flush) +{ + uint32_t size; + uint32_t *batches; + + if (vpe_channel->pushbuf->use_hw_pushbuf) + return nouveau_vpe_pushbuf_hw_fire(vpe_channel, 0); + + size = vpe_channel->pushbuf->cur - vpe_channel->pushbuf->last_batch_put; + + if (!size) + return 0; + + batches = realloc(vpe_channel->pushbuf->batches, + (++vpe_channel->pushbuf->nr_batches) << 2); + if (!batches) { + --vpe_channel->pushbuf->nr_batches; + return -ENOMEM; + } + vpe_channel->pushbuf->batches = batches; + + if (!do_flush) + size |= NOUVEAU_VD_VPE_PUSHBUF_FIRE_BATCH_DO_NOT_FIRE; + vpe_channel->pushbuf->batches[vpe_channel->pushbuf->nr_batches - 1] = size; + vpe_channel->pushbuf->last_batch_put = vpe_channel->pushbuf->cur; + + return 0; +} + +static void +nouveau_vpe_pushbuf_direct_write(struct nouveau_vpe_channel *vpe_channel, + uint32_t val) +{ + if (vpe_channel->pushbuf->use_hw_pushbuf) { + vpe_channel->pushbuf->buf[vpe_channel->pushbuf->cur++] = val; + vpe_channel->pushbuf->free--; + + if (vpe_channel->pushbuf->cur == vpe_channel->pushbuf->max) { + vpe_channel->pushbuf->cur = 0; + vpe_channel->pushbuf->free = vpe_channel->pushbuf->buf_max; + } + } + else { + if (vpe_channel->pushbuf->cur < vpe_channel->pushbuf->max) { + vpe_channel->pushbuf->buf[vpe_channel->pushbuf->cur++] = val; + vpe_channel->pushbuf->free--; + } + } +} + +static int +nouveau_vpe_pushbuf_reset_to_start(struct nouveau_vpe_channel *vpe_channel) +{ + int nop_count; + int i; + int ret; + + if (vpe_channel->pushbuf->max <= vpe_channel->pushbuf->buf_max) { + /* We are at the end of the hw pushbuf. + * So, write nops and flush to hw.
+ */ + nop_count = vpe_channel->pushbuf->max - vpe_channel->pushbuf->cur; + for (i = 0; i < nop_count; i++) + nouveau_vpe_pushbuf_direct_write(vpe_channel, + NV_VPE_CMD_NOP << NV_VPE_CMD_TYPE_SHIFT); + } + + ret = nouveau_vpe_pushbuf_end_batch(vpe_channel, 0); + + if (!ret) + ret = nouveau_vpe_pushbuf_hw_fire(vpe_channel, 0); + + if (!ret && (vpe_channel->pushbuf->max < NV_VPE_MAX_MB)) { + /* No space left after fire so try again. + * This condition should be very rare since the kernel + * will reset automatically.*/ + ret = nouveau_vpe_pushbuf_reset_to_start(vpe_channel); + } + + return ret; +} + +int +nouveau_vpe_pushbuf_start(struct nouveau_vpe_channel *vpe_channel, + unsigned int first_mb, unsigned int end_mb, + int target_surface_index, int past_surface_index, + int future_surface_index) +{ + int ret; + + if (!vpe_channel || !vpe_channel->pushbuf || !vpe_channel->pushbuf->surfaces) + return -EINVAL; + + ret = nouveau_vpe_pushbuf_hw_start(vpe_channel, target_surface_index, + past_surface_index, future_surface_index); + if (ret) + return ret; + + /* The hardware can decode up to 16 macroblocks at a time. 
+ * So, split up macroblocks into groups of 16...ending on 16 if possible.*/ + vpe_channel->pushbuf->next_batch_mb = (end_mb - first_mb) % NV_VPE_MAX_MB_BATCH; + if (!vpe_channel->pushbuf->next_batch_mb) + vpe_channel->pushbuf->next_batch_mb = NV_VPE_MAX_MB_BATCH; + + vpe_channel->pushbuf->cur_mb = 0; + + return ret; +} + +int +nouveau_vpe_pushbuf_fire(struct nouveau_vpe_channel *vpe_channel, + char end_sequence) +{ + if (!vpe_channel || !vpe_channel->pushbuf || !vpe_channel->pushbuf->buf + || !vpe_channel->pushbuf->surfaces || (!vpe_channel->pushbuf->use_hw_pushbuf && !vpe_channel->pushbuf->batches)) + return -EINVAL; + + return nouveau_vpe_pushbuf_hw_fire(vpe_channel, end_sequence); +} + +static void +nouveau_vpe_pushbuf_write_batch(struct nouveau_vpe_channel *vpe_channel) +{ + uint32_t len = vpe_channel->pushbuf->nr_mb_buffer; + + if (vpe_channel->pushbuf->use_hw_pushbuf) { + if (len <= vpe_channel->pushbuf->free) { + memcpy(&vpe_channel->pushbuf->buf[vpe_channel->pushbuf->cur], + vpe_channel->pushbuf->mb_buffer, + len * sizeof(uint32_t)); + vpe_channel->pushbuf->cur += len; + vpe_channel->pushbuf->free -= len; + + if (vpe_channel->pushbuf->cur == vpe_channel->pushbuf->max) { + vpe_channel->pushbuf->cur = 0; + vpe_channel->pushbuf->free = vpe_channel->pushbuf->buf_max; + } + } + } + else { + if ( (vpe_channel->pushbuf->cur + len) < vpe_channel->pushbuf->max) { + memcpy(&vpe_channel->pushbuf->buf[vpe_channel->pushbuf->cur], + vpe_channel->pushbuf->mb_buffer, + len * sizeof(uint32_t)); + + vpe_channel->pushbuf->cur += len; + vpe_channel->pushbuf->free -= len; + } + } +} + +void +nouveau_vpe_pushbuf_write(struct nouveau_vpe_channel *vpe_channel, + uint32_t val) +{ + if (!vpe_channel || !vpe_channel->pushbuf || (vpe_channel->pushbuf->nr_mb_buffer >= NV_VPE_MAX_MB) ) + return; + + vpe_channel->pushbuf->mb_buffer[vpe_channel->pushbuf->nr_mb_buffer++] = val; +} + +void +nouveau_vpe_pushbuf_last_or(struct nouveau_vpe_channel *vpe_channel, + uint32_t val) +{ + if 
(!vpe_channel || !vpe_channel->pushbuf || !vpe_channel->pushbuf->nr_mb_buffer) + return; + + vpe_channel->pushbuf->mb_buffer[vpe_channel->pushbuf->nr_mb_buffer - 1] |= val; +} + +int +nouveau_vpe_pushbuf_start_mb(struct nouveau_vpe_channel *vpe_channel) +{ + int ret; + + if (!vpe_channel || !vpe_channel->pushbuf) + return -EINVAL; + + if (vpe_channel->pushbuf->free > NV_VPE_MAX_MB) { + ret = 0; + } + else { + return nouveau_vpe_pushbuf_reset_to_start(vpe_channel); + /* + * This causes alignment problems in the kernel and will lockup + * the decoder. The idea here is to put as much of a mb in the + * pushbuffer. This maximizes the usage of the hw fifo. + * However, this seems to make the vpe decoder get behind more + * often and eventually lockup. Yes, adding more delays in the + * kernel help but it slows it down too much. + * So, for now disable this.*/ + /* + if (vpe_channel->pushbuf->max == vpe_channel->pushbuf->buf_max) + ret = nouveau_vpe_pushbuf_reset_to_start(vpe_channel); + else { + if (vpe_channel->pushbuf->free >= NV_VPE_MAX_MB_HEADER) { + vpe_channel->pushbuf->is_near_end = 1; + ret = 0; + } + else { + ret = nouveau_vpe_pushbuf_reset_to_start(vpe_channel); + } + } + **/ + } + + return ret; +} + +int +nouveau_vpe_pushbuf_start_mb_db(struct nouveau_vpe_channel *vpe_channel) +{ + if (!vpe_channel || !vpe_channel->pushbuf) + return -EINVAL; + + if (!vpe_channel->pushbuf->is_near_end) + return 0; + else + return nouveau_vpe_pushbuf_reset_to_start(vpe_channel); +} + +int +nouveau_vpe_pushbuf_end_mb(struct nouveau_vpe_channel *vpe_channel) +{ + if (!vpe_channel || !vpe_channel->pushbuf) + return -EINVAL; + + if (vpe_channel->pushbuf->nr_mb_buffer) { + nouveau_vpe_pushbuf_write_batch(vpe_channel); + vpe_channel->pushbuf->nr_mb_buffer = 0; + } + + ++vpe_channel->pushbuf->cur_mb; + if (vpe_channel->pushbuf->cur_mb == vpe_channel->pushbuf->next_batch_mb) { + nouveau_vpe_pushbuf_end_batch(vpe_channel, 1); + vpe_channel->pushbuf->next_batch_mb += NV_VPE_MAX_MB_BATCH; 
+ } + + return 0; +} diff --git a/nouveau/nouveau_vpe_pushbuf.h b/nouveau/nouveau_vpe_pushbuf.h new file mode 100644 index 0000000..0112952 --- /dev/null +++ b/nouveau/nouveau_vpe_pushbuf.h @@ -0,0 +1,89 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __NOUVEAU_VPE_PUSHBUF_H__ +#define __NOUVEAU_VPE_PUSHBUF_H__ + +struct nouveau_vpe_pushbuf { + uint32_t *buf; + uint32_t buf_max; + uint32_t max; + uint32_t cur; + uint32_t free; + char is_near_end; + + char use_hw_pushbuf; + uint32_t hw_handle; + struct nouveau_bo *hw_bo; + + /* mb cmds are sent in batches. 
+ * This is necessary to avoid hw lockups or corruption.*/ + uint32_t cur_mb; + uint32_t next_batch_mb; + uint32_t nr_batches; + uint32_t *batches; + /* Used only for flushing mb batches.*/ + uint32_t last_batch_put; + + uint32_t nr_mb_buffer; + uint32_t *mb_buffer; + + /* Set prior to rendering. + * It is used so that any surfaces are automatically pinned + * by the hw.*/ + uint32_t nr_surfaces; + struct drm_nouveau_vd_vpe_surface *surfaces; +}; + + +int +nouveau_vpe_pushbuf_start(struct nouveau_vpe_channel *, + unsigned int first_mb, unsigned int end_mb, + int target_surface_index, int past_surface_index, + int future_surface_index); + +int +nouveau_vpe_pushbuf_fire(struct nouveau_vpe_channel *, + char end_sequence); + +void +nouveau_vpe_pushbuf_write(struct nouveau_vpe_channel *, + uint32_t val); + +void +nouveau_vpe_pushbuf_last_or(struct nouveau_vpe_channel *, + uint32_t val); + +int +nouveau_vpe_pushbuf_start_mb(struct nouveau_vpe_channel *); + +int +nouveau_vpe_pushbuf_start_mb_db(struct nouveau_vpe_channel *); + +int +nouveau_vpe_pushbuf_end_mb(struct nouveau_vpe_channel *); + +#endif
This patch includes all the relevant nv vpe ddx support. This patch applies against the latest xf86-video-nouveau. This hooks up the XV blit adaptor to be used for XvMC IDCT on NV30/NV40 cards. This makes it possible to fall back to g3dvl XvMC if nv vpe is in use. Signed-off-by: Jimmy Rentz <jb17bsome at gmail.com> diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c index d1f87c3..aa07193 100644 --- a/src/nouveau_xv.c +++ b/src/nouveau_xv.c @@ -2062,6 +2062,7 @@ NVInitVideo(ScreenPtr pScreen) XF86VideoAdaptorPtr blitAdaptor = NULL; XF86VideoAdaptorPtr textureAdaptor[2] = {NULL, NULL}; int num_adaptors; + bool create_vpe_xvmc = false; /* * Driving the blitter requires the DMA FIFO. Using the FIFO @@ -2140,14 +2141,28 @@ NVInitVideo(ScreenPtr pScreen) * associate with any/all adapters since VL doesn't depend on Xv for color conversion. */ if (textureAdaptor[0]) { - XF86MCAdaptorPtr *adaptorsXvMC = xalloc(sizeof(XF86MCAdaptorPtr)); + if (blitAdaptor && ( (pNv->Architecture == NV_ARCH_30) || + (pNv->Architecture == NV_ARCH_40) ) ) { + create_vpe_xvmc = true; + num_adaptors = 2; + } + else + num_adaptors = 1; + + XF86MCAdaptorPtr *adaptorsXvMC = xalloc(num_adaptors * sizeof(XF86MCAdaptorPtr)); if (adaptorsXvMC) { adaptorsXvMC[0] = vlCreateAdaptorXvMC(pScreen, textureAdaptor[0]->name); + if (create_vpe_xvmc) + adaptorsXvMC[1] = vlCreateAdaptorXvMCVPE(pScreen, blitAdaptor->name); + if (adaptorsXvMC[0]) { - vlInitXvMC(pScreen, 1, adaptorsXvMC); + vlInitXvMC(pScreen, num_adaptors, adaptorsXvMC); vlDestroyAdaptorXvMC(adaptorsXvMC[0]); + + if (num_adaptors > 1) + vlDestroyAdaptorXvMC(adaptorsXvMC[1]); } xfree(adaptorsXvMC); diff --git a/src/vl_hwmc.c b/src/vl_hwmc.c index d8d8860..800fc88 100644 --- a/src/vl_hwmc.c +++ b/src/vl_hwmc.c @@ -58,11 +58,30 @@ static XF86MCSurfaceInfoRec yv12_mpeg2_surface &subpicture_list }; +static XF86MCSurfaceInfoRec yv12_mpeg2_vpe_surface = +{ + FOURCC_YV12, + XVMC_CHROMA_FORMAT_420, + 0, + 2032, + 2032, + 2048, + 2048, + XVMC_IDCT | XVMC_MPEG_2, +
XVMC_SUBPICTURE_INDEPENDENT_SCALING | XVMC_BACKEND_SUBPICTURE, + &subpicture_list +}; + static XF86MCSurfaceInfoPtr surfaces[] = { (XF86MCSurfaceInfoPtr)&yv12_mpeg2_surface }; +static XF86MCSurfaceInfoPtr vpe_surfaces[] = +{ + (XF86MCSurfaceInfoPtr)&yv12_mpeg2_vpe_surface +}; + static XF86ImageRec rgb_subpicture = XVIMAGE_RGB; static XF86ImagePtr subpictures[] @@ -85,6 +104,21 @@ static XF86MCAdaptorRec adaptor_template (xf86XvMCDestroySubpictureProcPtr)NULL }; +static XF86MCAdaptorRec vpe_adaptor_template = +{ + "", + 1, + vpe_surfaces, + 1, + subpictures, + (xf86XvMCCreateContextProcPtr)NULL, + (xf86XvMCDestroyContextProcPtr)NULL, + (xf86XvMCCreateSurfaceProcPtr)NULL, + (xf86XvMCDestroySurfaceProcPtr)NULL, + (xf86XvMCCreateSubpictureProcPtr)NULL, + (xf86XvMCDestroySubpictureProcPtr)NULL +}; + XF86MCAdaptorPtr vlCreateAdaptorXvMC(ScreenPtr pScreen, char *xv_adaptor_name) { XF86MCAdaptorPtr adaptor; @@ -110,6 +144,31 @@ XF86MCAdaptorPtr vlCreateAdaptorXvMC(ScreenPtr pScreen, char *xv_adaptor_name) return adaptor; } +XF86MCAdaptorPtr vlCreateAdaptorXvMCVPE(ScreenPtr pScreen, char *xv_adaptor_name) +{ + XF86MCAdaptorPtr adaptor; + ScrnInfoPtr pScrn; + + assert(pScreen); + assert(xv_adaptor_name); + + pScrn = xf86Screens[pScreen->myNum]; + adaptor = xf86XvMCCreateAdaptorRec(); + + if (!adaptor) + { + xf86DrvMsg(pScrn->scrnIndex, X_ERROR, "[XvMC.VPE] Memory allocation failed.\n"); + return NULL; + } + + *adaptor = vpe_adaptor_template; + adaptor->name = xv_adaptor_name; + + xf86DrvMsg(pScrn->scrnIndex, X_INFO, "[XvMC.VPE] Associated with %s.\n", xv_adaptor_name); + + return adaptor; +} + void vlDestroyAdaptorXvMC(XF86MCAdaptorPtr adaptor) { assert(adaptor); diff --git a/src/vl_hwmc.h b/src/vl_hwmc.h index 715120d..efb2d56 100644 --- a/src/vl_hwmc.h +++ b/src/vl_hwmc.h @@ -4,6 +4,7 @@ #include <xf86xvmc.h> XF86MCAdaptorPtr vlCreateAdaptorXvMC(ScreenPtr pScreen, char *xv_adaptor_name); +XF86MCAdaptorPtr vlCreateAdaptorXvMCVPE(ScreenPtr pScreen, char *xv_adaptor_name); void
vlDestroyAdaptorXvMC(XF86MCAdaptorPtr adaptor); void vlInitXvMC(ScreenPtr pScreen, unsigned int num_adaptors, XF86MCAdaptorPtr *adaptors);
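For reference, the limits advertised by the new yv12_mpeg2_vpe_surface record work out as below. This is a minimal sketch; the helper name and the reading of 2032 as a 127-macroblock hardware limit are my assumptions, not something the patch states:

```c
#include <assert.h>

/* Caps advertised by yv12_mpeg2_vpe_surface in the patch: max_width and
 * max_height of 2032 (2032 / 16 = 127 macroblocks, one short of 2048),
 * mc_type XVMC_IDCT | XVMC_MPEG_2. */
enum { VPE_MAX_WIDTH = 2032, VPE_MAX_HEIGHT = 2032, MB_SIZE = 16 };

/* Hypothetical helper: would a given MPEG-2 coded size fit the advertised
 * caps?  MPEG-2 coded sizes are macroblock (16 pixel) aligned. */
static int vpe_size_ok(unsigned coded_w, unsigned coded_h)
{
    return coded_w % MB_SIZE == 0 && coded_h % MB_SIZE == 0 &&
           coded_w <= VPE_MAX_WIDTH && coded_h <= VPE_MAX_HEIGHT;
}
```

So standard-definition and HD MPEG-2 streams fit, while anything wider or taller than 127 macroblocks would have to fall back to another decode path.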
This patch includes all the relevant nv vpe mesa support. This patch applies against the latest mesa.pipe-video branch. This is where all the real nv vpe work is done. There are several changes required: * Modify the existing pipe-video arch to support nv vpe - 1. Push decode verification down into each context creation method. 2. Add some new decode flags into the context creation method. 3. Add some surface init/query methods - This includes full XvMCQuerySurface/XvMCSyncSurface support. * Modify the pipe-video vl compositor to support nv12 surfaces. This means the frag shader/renderer needs to deal with 2 luma+chroma surfaces. * Add the nv vpe video context - This uses the nv vpe video renderer for rendering and the vl compositor for display. Since the render output is nv12, the pipe video surface is actually made up of 2 surfaces: one for luma (R8) and one for chroma (G8B8, at 1/2 the luma resolution). * Add the nv vpe video renderer - This includes: * Open/close vpe channel. * Allocate/free vpe output surface from a buffer object. * Query vpe output surface status. * MB render. * Hook up the nv vpe video context in the nvfx video context chain. * Add the G8B8_UNORM pipe texture format. This is required for the chroma portion of the nv12 format for component swizzling purposes. * Add nv40 fragtex texture mapping for R8_UNORM and G8B8_UNORM with the correct component swizzles. 
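Since the decoder writes NV12, the mesa patch models one video surface as two textures: a full-resolution R8 luma plane and a G8B8 chroma plane at half resolution in each dimension. A rough sketch of the plane sizes this implies (the struct and helper names are mine, and real driver pitches would be tiled/aligned, which this ignores):

```c
#include <assert.h>
#include <stddef.h>

/* NV12: 8-bit luma samples at full resolution, plus a plane of interleaved
 * 8-bit Cb/Cr pairs, one pair per 2x2 luma block (4:2:0 subsampling). */
struct nv12_planes {
    size_t luma_size;   /* width * height bytes (R8, 1 byte/texel) */
    size_t chroma_size; /* (width/2) * (height/2) texels * 2 bytes (G8B8) */
};

static struct nv12_planes nv12_plane_sizes(unsigned width, unsigned height)
{
    struct nv12_planes p;
    p.luma_size = (size_t)width * height;
    p.chroma_size = (size_t)(width / 2) * (height / 2) * 2;
    return p;
}
```

For a 720x576 PAL frame this gives a 414720-byte luma plane and a 207360-byte chroma plane, i.e. 1.5 bytes per pixel total, which is why the chroma texture can reuse the luma pitch at half the height.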
Signed-off-by: Jimmy Rentz <jb17bsome at gmail.com> diff --git a/src/gallium/auxiliary/util/u_format.csv b/src/gallium/auxiliary/util/u_format.csv index 0811280..178e854 100644 --- a/src/gallium/auxiliary/util/u_format.csv +++ b/src/gallium/auxiliary/util/u_format.csv @@ -201,6 +201,7 @@ PIPE_FORMAT_R16G16B16_SSCALED , plain, 1, 1, s16 , s16 , s16 , , xyz1, r PIPE_FORMAT_R16G16B16A16_SSCALED , plain, 1, 1, s16 , s16 , s16 , s16 , xyzw, rgb PIPE_FORMAT_R8_UNORM , plain, 1, 1, un8 , , , , x001, rgb PIPE_FORMAT_R8G8_UNORM , plain, 1, 1, un8 , un8 , , , xy01, rgb +PIPE_FORMAT_G8B8_UNORM , plain, 1, 1, un8 , un8 , , , yxyx, rgb PIPE_FORMAT_R8G8B8_UNORM , plain, 1, 1, un8 , un8 , un8 , , xyz1, rgb PIPE_FORMAT_R8G8B8A8_UNORM , plain, 1, 1, un8 , un8 , un8 , un8 , xyzw, rgb PIPE_FORMAT_R8_USCALED , plain, 1, 1, u8 , , , , x001, rgb diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c index 0640b1a..8dc0cef 100644 --- a/src/gallium/auxiliary/vl/vl_compositor.c +++ b/src/gallium/auxiliary/vl/vl_compositor.c @@ -98,6 +98,53 @@ create_vert_shader(struct vl_compositor *c) } static bool +create_frag_shader_nv12_2_rgb(struct vl_compositor *c) +{ + struct ureg_program *shader; + struct ureg_src tc; + struct ureg_src csc[4]; + struct ureg_src luma_sampler; + struct ureg_src chroma_sampler; + struct ureg_dst texel; + struct ureg_dst fragment; + unsigned i; + + shader = ureg_create(TGSI_PROCESSOR_FRAGMENT); + if (!shader) + return false; + + tc = ureg_DECL_fs_input(shader, TGSI_SEMANTIC_GENERIC, 1, TGSI_INTERPOLATE_LINEAR); + for (i = 0; i < 4; ++i) + csc[i] = ureg_DECL_constant(shader, i); + luma_sampler = ureg_DECL_sampler(shader, 0); + chroma_sampler = ureg_DECL_sampler(shader, 1); + texel = ureg_DECL_temporary(shader); + fragment = ureg_DECL_output(shader, TGSI_SEMANTIC_COLOR, 0); + + ureg_MOV(shader, texel, ureg_imm4f(shader, 0.0f, 0.0f, 0.0f, 1.0f)); + ureg_TEX(shader, ureg_writemask(texel, TGSI_WRITEMASK_X), TGSI_TEXTURE_2D, tc, 
luma_sampler); + ureg_TEX(shader, ureg_writemask(texel, TGSI_WRITEMASK_Y | TGSI_WRITEMASK_Z), TGSI_TEXTURE_2D, tc, chroma_sampler); + + /* + * texel = {0,0,0,1} + * texel.x = tex(tc, luma_sampler) + * texel.yz = tex(tc, chroma_sampler) + * fragment = csc * texel + */ + for (i = 0; i < 4; ++i) + ureg_DP4(shader, ureg_writemask(fragment, TGSI_WRITEMASK_X << i), csc[i], ureg_src(texel)); + + ureg_release_temporary(shader, texel); + ureg_END(shader); + + c->fragment_shader.ycbcr_2_rgb = ureg_create_shader_and_destroy(shader, c->pipe); + if (!c->fragment_shader.ycbcr_2_rgb) + return false; + + return true; +} + +static bool create_frag_shader_ycbcr_2_rgb(struct vl_compositor *c) { struct ureg_program *shader; @@ -190,8 +237,10 @@ init_pipe_state(struct vl_compositor *c) /*sampler.max_lod = ;*/ /*sampler.border_color[i] = ;*/ /*sampler.max_anisotropy = ;*/ - c->sampler = c->pipe->create_sampler_state(c->pipe, &sampler); - + c->sampler[0] = c->pipe->create_sampler_state(c->pipe, &sampler); + if (c->src_surface_format == PIPE_FORMAT_NV12) + c->sampler[1] = c->pipe->create_sampler_state(c->pipe, &sampler); + return true; } @@ -199,7 +248,9 @@ static void cleanup_pipe_state(struct vl_compositor *c) { assert(c); - c->pipe->delete_sampler_state(c->pipe, c->sampler); + c->pipe->delete_sampler_state(c->pipe, c->sampler[0]); + if (c->src_surface_format == PIPE_FORMAT_NV12) + c->pipe->delete_sampler_state(c->pipe, c->sampler[1]); } static bool @@ -207,14 +258,25 @@ init_shaders(struct vl_compositor *c) { assert(c); + if (c->src_surface_format != PIPE_FORMAT_NV12) { + if (!create_frag_shader_ycbcr_2_rgb(c)) { + debug_printf("Unable to create YCbCr-to-RGB fragment shader.\n"); + return false; + } + } + else { + + if (!create_frag_shader_nv12_2_rgb(c)) { + debug_printf("Unable to create NV12-to-RGB fragment shader.\n"); + return false; + } + } + if (!create_vert_shader(c)) { debug_printf("Unable to create vertex shader.\n"); return false; - } - if (!create_frag_shader_ycbcr_2_rgb(c)) 
{ - debug_printf("Unable to create YCbCr-to-RGB fragment shader.\n"); - return false; - } + } + if (!create_frag_shader_rgb_2_rgb(c)) { debug_printf("Unable to create RGB-to-RGB fragment shader.\n"); return false; @@ -308,7 +370,8 @@ texview_map_delete(const struct keymap *map, pipe_sampler_view_reference(&sv, NULL); } -bool vl_compositor_init(struct vl_compositor *compositor, struct pipe_context *pipe) +bool vl_compositor_init(struct vl_compositor *compositor, struct pipe_context *pipe, + enum pipe_format src_surface_format) { unsigned i; @@ -317,6 +380,7 @@ bool vl_compositor_init(struct vl_compositor *compositor, struct pipe_context *p memset(compositor, 0, sizeof(struct vl_compositor)); compositor->pipe = pipe; + compositor->src_surface_format = src_surface_format; compositor->texview_map = util_new_keymap(sizeof(struct pipe_surface*), -1, texview_map_delete); @@ -519,6 +583,84 @@ static unsigned gen_data(struct vl_compositor *c, return num_rects; } +static void draw_nv12_layers(struct vl_compositor *c, + struct pipe_surface *src_luma_surface, + struct pipe_surface *src_chroma_surface, + struct pipe_video_rect *src_rect, + struct pipe_video_rect *dst_rect) +{ + unsigned num_rects; + struct pipe_surface *src_surfaces[VL_COMPOSITOR_MAX_LAYERS + 2]; + void *frag_shaders[VL_COMPOSITOR_MAX_LAYERS + 2]; + unsigned i; + boolean is_video_surface = FALSE; + struct pipe_sampler_view templat; + struct pipe_surface *src_chroma_surf_ref = src_chroma_surface; + struct pipe_sampler_view *surface_views[2] = {NULL, NULL}; + + assert(c); + assert(src_luma_surface); + assert(src_chroma_surface); + assert(src_rect); + assert(dst_rect); + + num_rects = gen_data(c, src_luma_surface, src_rect, dst_rect, src_surfaces, + frag_shaders); + + for (i = 0; i < num_rects; ++i) { + boolean delete_view = FALSE; + is_video_surface = FALSE; + + surface_views[0] = (struct pipe_sampler_view*)util_keymap_lookup(c->texview_map, &src_surfaces[i]); + if (!surface_views[0]) { + 
u_sampler_view_default_template(&templat, src_surfaces[i]->texture, + src_surfaces[i]->texture->format); + surface_views[0] = c->pipe->create_sampler_view(c->pipe, src_surfaces[i]->texture, + &templat); + if (!surface_views[0]) + return; + + delete_view = !util_keymap_insert(c->texview_map, &src_surfaces[i], + surface_views[0], c->pipe); + } + + if (src_surfaces[i] == src_luma_surface) + is_video_surface = TRUE; + + c->pipe->bind_fs_state(c->pipe, frag_shaders[i]); + if (is_video_surface) { + boolean delete_cview = FALSE; + + surface_views[1] = (struct pipe_sampler_view*)util_keymap_lookup(c->texview_map, &src_chroma_surf_ref); + if (!surface_views[1]) { + u_sampler_view_default_template(&templat, src_chroma_surf_ref->texture, + src_chroma_surf_ref->texture->format); + surface_views[1] = c->pipe->create_sampler_view(c->pipe, src_chroma_surf_ref->texture, + &templat); + if (!surface_views[1]) + return; + + delete_cview = !util_keymap_insert(c->texview_map, &src_chroma_surf_ref, + surface_views[1], c->pipe); + } + + c->pipe->set_fragment_sampler_views(c->pipe, 2, surface_views); + c->pipe->draw_arrays(c->pipe, PIPE_PRIM_TRIANGLES, i * 6, 6); + + if (delete_cview) + pipe_sampler_view_reference(&surface_views[1], NULL); + } + else { + c->pipe->set_fragment_sampler_views(c->pipe, 1, surface_views); + c->pipe->draw_arrays(c->pipe, PIPE_PRIM_TRIANGLES, i * 6, 6); + } + + if (delete_view) { + pipe_sampler_view_reference(&surface_views[0], NULL); + } + } +} + static void draw_layers(struct vl_compositor *c, struct pipe_surface *src_surface, struct pipe_video_rect *src_rect, @@ -604,7 +746,7 @@ void vl_compositor_render(struct vl_compositor *compositor, compositor->pipe->set_framebuffer_state(compositor->pipe, &compositor->fb_state); compositor->pipe->set_viewport_state(compositor->pipe, &compositor->viewport); - compositor->pipe->bind_fragment_sampler_states(compositor->pipe, 1, &compositor->sampler); + compositor->pipe->bind_fragment_sampler_states(compositor->pipe, 1, 
compositor->sampler); compositor->pipe->bind_vs_state(compositor->pipe, compositor->vertex_shader); compositor->pipe->set_vertex_buffers(compositor->pipe, 1, &compositor->vertex_buf); compositor->pipe->bind_vertex_elements_state(compositor->pipe, compositor->vertex_elems_state); @@ -616,6 +758,62 @@ void vl_compositor_render(struct vl_compositor *compositor, compositor->pipe->flush(compositor->pipe, PIPE_FLUSH_RENDER_CACHE, fence); } +void vl_compositor_render_nv12(struct vl_compositor *compositor, + struct pipe_surface *src_luma_surface, + struct pipe_surface *src_chroma_surface, + enum pipe_mpeg12_picture_type picture_type, + /*unsigned num_past_surfaces, + struct pipe_surface *past_surfaces, + unsigned num_future_surfaces, + struct pipe_surface *future_surfaces,*/ + struct pipe_video_rect *src_area, + struct pipe_surface *dst_surface, + struct pipe_video_rect *dst_area, + struct pipe_fence_handle **fence) +{ + assert(compositor); + assert(src_luma_surface); + assert(src_chroma_surface); + assert(src_area); + assert(dst_surface); + assert(dst_area); + assert(picture_type == PIPE_MPEG12_PICTURE_TYPE_FRAME); + assert(compositor->src_surface_format == PIPE_FORMAT_NV12); + + if (compositor->fb_state.width != dst_surface->width) { + compositor->fb_inv_size.x = 1.0f / dst_surface->width; + compositor->fb_state.width = dst_surface->width; + } + if (compositor->fb_state.height != dst_surface->height) { + compositor->fb_inv_size.y = 1.0f / dst_surface->height; + compositor->fb_state.height = dst_surface->height; + } + + compositor->fb_state.cbufs[0] = dst_surface; + + compositor->viewport.scale[0] = compositor->fb_state.width; + compositor->viewport.scale[1] = compositor->fb_state.height; + compositor->viewport.scale[2] = 1; + compositor->viewport.scale[3] = 1; + compositor->viewport.translate[0] = 0; + compositor->viewport.translate[1] = 0; + compositor->viewport.translate[2] = 0; + compositor->viewport.translate[3] = 0; + + 
compositor->pipe->set_framebuffer_state(compositor->pipe, &compositor->fb_state); + compositor->pipe->set_viewport_state(compositor->pipe, &compositor->viewport); + compositor->pipe->bind_fragment_sampler_states(compositor->pipe, 2, compositor->sampler); + compositor->pipe->bind_vs_state(compositor->pipe, compositor->vertex_shader); + compositor->pipe->set_vertex_buffers(compositor->pipe, 1, &compositor->vertex_buf); + compositor->pipe->bind_vertex_elements_state(compositor->pipe, compositor->vertex_elems_state); + compositor->pipe->set_constant_buffer(compositor->pipe, PIPE_SHADER_FRAGMENT, 0, compositor->fs_const_buf); + + draw_nv12_layers(compositor, src_luma_surface, src_chroma_surface, src_area, dst_area); + + assert(!compositor->dirty_bg && !compositor->dirty_layers); + compositor->pipe->flush(compositor->pipe, PIPE_FLUSH_RENDER_CACHE, fence); +} + void vl_compositor_set_csc_matrix(struct vl_compositor *compositor, const float *mat) { struct pipe_transfer *buf_transfer; diff --git a/src/gallium/auxiliary/vl/vl_compositor.h b/src/gallium/auxiliary/vl/vl_compositor.h index 820c9ef..e53d1e2 100644 --- a/src/gallium/auxiliary/vl/vl_compositor.h +++ b/src/gallium/auxiliary/vl/vl_compositor.h @@ -42,9 +42,10 @@ struct vl_compositor { struct pipe_context *pipe; + enum pipe_format src_surface_format; struct pipe_framebuffer_state fb_state; struct vertex2f fb_inv_size; - void *sampler; + void *sampler[2]; struct pipe_sampler_view *sampler_view; void *vertex_shader; struct @@ -68,7 +69,8 @@ struct vl_compositor struct keymap *texview_map; }; -bool vl_compositor_init(struct vl_compositor *compositor, struct pipe_context *pipe); +bool vl_compositor_init(struct vl_compositor *compositor, struct pipe_context *pipe, + enum pipe_format src_surface_format); void vl_compositor_cleanup(struct vl_compositor *compositor); @@ -92,6 +94,19 @@ void vl_compositor_render(struct vl_compositor *compositor, struct pipe_surface *dst_surface, struct pipe_video_rect *dst_area, struct 
pipe_fence_handle **fence); + +void vl_compositor_render_nv12(struct vl_compositor *compositor, + struct pipe_surface *src_luma_surface, + struct pipe_surface *src_chroma_surface, + enum pipe_mpeg12_picture_type picture_type, + /*unsigned num_past_surfaces, + struct pipe_surface *past_surfaces, + unsigned num_future_surfaces, + struct pipe_surface *future_surfaces,*/ + struct pipe_video_rect *src_area, + struct pipe_surface *dst_surface, + struct pipe_video_rect *dst_area, + struct pipe_fence_handle **fence); void vl_compositor_set_csc_matrix(struct vl_compositor *compositor, const float *mat); diff --git a/src/gallium/drivers/nouveau/nouveau_winsys.h b/src/gallium/drivers/nouveau/nouveau_winsys.h index cd7da99..0bc9eeb 100644 --- a/src/gallium/drivers/nouveau/nouveau_winsys.h +++ b/src/gallium/drivers/nouveau/nouveau_winsys.h @@ -12,6 +12,9 @@ #include "nouveau/nouveau_notifier.h" #include "nouveau/nouveau_resource.h" #include "nouveau/nouveau_pushbuf.h" +#include "nouveau/nouveau_vpe_hw.h" +#include "nouveau/nouveau_vpe_channel.h" +#include "nouveau/nouveau_vpe_pushbuf.h" static inline uint32_t nouveau_screen_transfer_flags(unsigned pipe) diff --git a/src/gallium/drivers/nvfx/Makefile b/src/gallium/drivers/nvfx/Makefile index e7ca6e6..3f9c19a 100644 --- a/src/gallium/drivers/nvfx/Makefile +++ b/src/gallium/drivers/nvfx/Makefile @@ -30,7 +30,9 @@ C_SOURCES = \ nvfx_transfer.c \ nvfx_vbo.c \ nvfx_vertprog.c \ - nvfx_video_context.c + nvfx_video_context.c \ + nvfx_vpe_video_context.c \ + nvfx_vpe_mpeg2_mc_renderer.c LIBRARY_INCLUDES = \ -I$(TOP)/src/gallium/drivers/nouveau/include diff --git a/src/gallium/drivers/nvfx/nv40_fragtex.c b/src/gallium/drivers/nvfx/nv40_fragtex.c index 0068b1b..b38a777 100644 --- a/src/gallium/drivers/nvfx/nv40_fragtex.c +++ b/src/gallium/drivers/nvfx/nv40_fragtex.c @@ -87,6 +87,8 @@ nv40_texture_formats[] = { _(DXT1_RGBA , DXT1 , S1, S1, S1, S1, X, Y, Z, W, 0, 0, 0, 0), _(DXT3_RGBA , DXT3 , S1, S1, S1, S1, X, Y, Z, W, 0, 0, 0, 0), 
_(DXT5_RGBA , DXT5 , S1, S1, S1, S1, X, Y, Z, W, 0, 0, 0, 0), + _(R8_UNORM , L8 , S1, S1, S1, S1, X, X, X, X, 0, 0, 0, 0), + _(G8B8_UNORM , A8L8 , S1, S1, S1, S1, Y, X, Y, X, 0, 0, 0, 0), {}, }; diff --git a/src/gallium/drivers/nvfx/nvfx_video_context.c b/src/gallium/drivers/nvfx/nvfx_video_context.c index 9212ae5..49fe3cc 100644 --- a/src/gallium/drivers/nvfx/nvfx_video_context.c +++ b/src/gallium/drivers/nvfx/nvfx_video_context.c @@ -26,24 +26,35 @@ **************************************************************************/ #include "nvfx_video_context.h" +#include "nvfx_vpe_video_context.h" #include <softpipe/sp_video_context.h> struct pipe_video_context * nvfx_video_create(struct pipe_screen *screen, enum pipe_video_profile profile, enum pipe_video_chroma_format chroma_format, + enum pipe_video_entry_point entry_point, + unsigned decode_flags, unsigned width, unsigned height, void *priv) { struct pipe_context *pipe; + struct pipe_video_context *pvctx; assert(screen); pipe = screen->context_create(screen, priv); if (!pipe) return NULL; - - return sp_video_create_ex(pipe, profile, chroma_format, width, height, + + /* Try to create vpe context first.*/ + pvctx = nv_vpe_video_create(pipe, profile, chroma_format, entry_point, + decode_flags, width, height, priv); + + if (!pvctx) + pvctx = sp_video_create_ex(pipe, profile, chroma_format, entry_point, + decode_flags, width, height, VL_MPEG12_MC_RENDERER_BUFFER_PICTURE, VL_MPEG12_MC_RENDERER_EMPTY_BLOCK_XFER_ONE, true, PIPE_FORMAT_VUYX); + return pvctx; } diff --git a/src/gallium/drivers/nvfx/nvfx_video_context.h b/src/gallium/drivers/nvfx/nvfx_video_context.h index 6619427..50e178a 100644 --- a/src/gallium/drivers/nvfx/nvfx_video_context.h +++ b/src/gallium/drivers/nvfx/nvfx_video_context.h @@ -29,10 +29,12 @@ #define __NVFX_VIDEO_CONTEXT_H__ #include <pipe/p_video_context.h> - + struct pipe_video_context * nvfx_video_create(struct pipe_screen *screen, enum pipe_video_profile profile, enum pipe_video_chroma_format 
chroma_format, + enum pipe_video_entry_point entry_point, + unsigned decode_flags, unsigned width, unsigned height, void *priv); #endif diff --git a/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.c b/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.c new file mode 100644 index 0000000..d9796e4 --- /dev/null +++ b/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.c @@ -0,0 +1,1053 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <errno.h> + +#include <pipe/p_compiler.h> +#include <pipe/p_state.h> +#include <pipe/p_video_state.h> +#include <util/u_memory.h> +#include <util/u_rect.h> +#include <util/u_video.h> + +#include <nouveau/nouveau_winsys.h> +#include "nvfx_screen.h" +#include "nvfx_resource.h" +#include "nvfx_vpe_video_context.h" +#include "nvfx_vpe_mpeg2_mc_renderer.h" + +static __inline__ boolean +is_odd_multiple(int val, int multiplier) +{ + return ( (val / multiplier) & 1); +} + +static void +nv_vpe_mpeg2_mb_dct_header(struct nouveau_vpe_channel *vpe_channel, + boolean is_luma, + enum pipe_mpeg12_picture_type picture_type, + int target_surface_index, + struct pipe_mpeg12_macroblock *mb) +{ + unsigned int base_dct; + unsigned int is_field_dct; + unsigned int luma_dct_extra; + unsigned int x; + unsigned int y; + unsigned int p; + unsigned int cbp; + boolean is_frame_picture; + boolean is_bottom_field; + + x = mb->mbx; + y = mb->mby; + + if (picture_type == PIPE_MPEG12_PICTURE_TYPE_FRAME) { + is_frame_picture = TRUE; + is_bottom_field = FALSE; + } + else { + is_frame_picture = FALSE; + is_bottom_field = picture_type == PIPE_MPEG12_PICTURE_TYPE_FIELD_BOTTOM; + } + + /* Intra blocks always have a full set of datablocks regardless of the pattern. + * Any empty block sections are filled with a null. 
+ */ + if (mb->mb_type == PIPE_MPEG12_MACROBLOCK_TYPE_INTRA) + cbp = 0x3F; + else + cbp = mb->cbp; + + if ( !is_frame_picture && + ( (mb->mo_type == PIPE_MPEG12_MOTION_TYPE_FIELD) || + (mb->mo_type == PIPE_MPEG12_MOTION_TYPE_DUALPRIME) || + (mb->mo_type == PIPE_MPEG12_MOTION_TYPE_16x8) ) ) + p = 2; + else + p = 1; + + is_field_dct = mb->dct_type == PIPE_MPEG12_DCT_TYPE_FIELD; + + base_dct = NV_VPE_CMD_DCT_BLOCK_UNKNOWN | + NV_VPE_CMD_DCT_BLOCK_TARGET_SURFACE(target_surface_index); + + if (!is_odd_multiple(x, 1)) + base_dct |= NV_VD_VPE_CMD_EVEN_X_COORD; + + if (is_frame_picture) { + base_dct |= NV_VPE_CMD_PICT_FRAME; + luma_dct_extra = (is_field_dct) ? NV_VPE_CMD_PICT_FRAME_FIELD : 0; + } else { + luma_dct_extra = 0; + if (is_bottom_field) + base_dct |= NV_VD_VPE_CMD_BOTTOM_FIELD; + } + + if (is_luma) { + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_LUMA_HEADER << NV_VPE_CMD_TYPE_SHIFT + | NV_VPE_CMD_DCT_BLOCK_PATTERN(cbp >> 2) + | base_dct + | luma_dct_extra); + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_COORDINATE << NV_VPE_CMD_TYPE_SHIFT + | NV_VPE_DCT_POINTS_LUMA(x, y, p)); + } + else { + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_CHROMA_HEADER << NV_VPE_CMD_TYPE_SHIFT + | NV_VPE_CMD_DCT_BLOCK_PATTERN( (cbp & 3) << 2) + | base_dct); + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_COORDINATE << NV_VPE_CMD_TYPE_SHIFT + | NV_VPE_DCT_POINTS_CHROMA(x, y, p)); + } +} + +static void +nv_vpe_mpeg2_mb_dct_blocks(struct nouveau_vpe_channel *vpe_channel, + unsigned int cbp, boolean is_intra, void *blocks) +{ + short *db = (short*) blocks; + int cbb; + int i; + int packed_db = 0; + char got_dct = 0; + + for (cbb = 0x20; cbb > 0; cbb >>= 1) { + + if (cbb & cbp) { + + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_SEPARATOR << NV_VPE_CMD_TYPE_SHIFT); + + /* Pack each datablock (n datablocks of 64 entries each) into the command buffer. 
+ */ + for (i = 0; i < 64; i += 2) { + + if (db[i] || db[i + 1]) { + packed_db = ((int)(db[i] & 0xFFF) << 19) | ((int)(db[i + 1] & 0xFFF) << 6); + nouveau_vpe_pushbuf_write(vpe_channel, + packed_db | i); + got_dct = 1; + } + } + if (got_dct) + nouveau_vpe_pushbuf_last_or(vpe_channel, + NV_VPE_DCT_BLOCK_TERMINATOR); + else + /* Nothing exists so null out this datablock.*/ + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_DCT_BLOCK_NULL); + db += 64; + got_dct = 0; + } + else if (is_intra) { + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_CMD_DCT_SEPARATOR << NV_VPE_CMD_TYPE_SHIFT); + /* Intra blocks get a null data block if no data exists. + * However, we do not increment the data block offset.*/ + nouveau_vpe_pushbuf_write(vpe_channel, + NV_VPE_DCT_BLOCK_NULL); + } + } +} + +static int +nv_vpe_mpeg2_mb_ipicture(struct nouveau_vpe_channel *vpe_channel, + enum pipe_mpeg12_picture_type picture_type, + int target_surface_index, + struct pipe_mpeg12_macroblock *mb) +{ + int ret; + + ret = nouveau_vpe_pushbuf_start_mb(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] could not start ipicture. error %d.\n", ret); + return ret; + } + + nv_vpe_mpeg2_mb_dct_header(vpe_channel, TRUE, picture_type, target_surface_index, mb); + nv_vpe_mpeg2_mb_dct_header(vpe_channel, FALSE, picture_type, target_surface_index, mb); + + ret = nouveau_vpe_pushbuf_start_mb_db(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] could not start ipicture db. 
error %d.\n", ret); + return ret; + } + + nv_vpe_mpeg2_mb_dct_blocks(vpe_channel, mb->cbp, TRUE, mb->blocks); + + return 0; +} + +static __inline__ boolean +mb_has_forward_mv(enum pipe_mpeg12_macroblock_type type) +{ + return (type == PIPE_MPEG12_MACROBLOCK_TYPE_FWD)|| + (type == PIPE_MPEG12_MACROBLOCK_TYPE_BI); +} + +static __inline__ boolean +mb_has_backward_mv(enum pipe_mpeg12_macroblock_type type) +{ + return (type == PIPE_MPEG12_MACROBLOCK_TYPE_BKWD)|| + (type == PIPE_MPEG12_MACROBLOCK_TYPE_BI); +} + +static unsigned int +nv_vpe_mpeg2_mb_mc_header(unsigned int type, unsigned int mc_header_base, + boolean has_odd_horizontal_vector, + boolean has_odd_vertical_vector, + boolean is_forward, boolean is_first, + boolean is_vertical_motion) +{ + unsigned int mc_header; + + mc_header = (type << NV_VPE_CMD_TYPE_SHIFT) | mc_header_base; + + if (has_odd_horizontal_vector) + mc_header |= NV_VPE_CMD_ODD_HORIZONTAL_MOTION_VECTOR; + + if (has_odd_vertical_vector) + mc_header |= NV_VPE_CMD_ODD_VERTICAL_MOTION_VECTOR; + + if (!is_forward) + mc_header |= NV_VPE_CMD_MOTION_VECTOR_BACKWARD; + + if (!is_first) + mc_header |= NV_VPE_CMD_MOTION_VECTOR_TYPE_SECOND; + + if (is_vertical_motion) + mc_header |= NV_VPE_CMD_BOTTOM_FIELD_VERTICAL_MOTION_SELECT_FIRST; + + return mc_header; +} + +static void +nv_vpe_mpeg2_mb_1mv_luma(struct nouveau_vpe_channel *vpe_channel, + boolean is_frame_picture_type, boolean is_forward, + boolean is_dual_prime_motion, boolean is_vertical_motion, + unsigned int x, unsigned int y, + int mv_horizontal, int mv_vertical, + unsigned int mc_header_base, + int target_surface_index) +{ + unsigned int mc_header; + unsigned int mc_vector; + boolean has_odd_vertical_vector; + boolean has_odd_horizontal_vector; + + has_odd_horizontal_vector = is_odd_multiple(mv_horizontal, 1); + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 1); + + mc_header = nv_vpe_mpeg2_mb_mc_header(NV_VPE_CMD_LUMA_MOTION_VECTOR_HEADER, + mc_header_base, + has_odd_horizontal_vector, + 
has_odd_vertical_vector, + is_forward, + TRUE, is_vertical_motion); + + mc_header |= NV_VPE_CMD_PREDICTION_SURFACE(target_surface_index); + + mc_vector = NV_VPE_CMD_MOTION_VECTOR << NV_VPE_CMD_TYPE_SHIFT; + + if (mv_horizontal < 0) + mv_horizontal--; + + mc_vector |= NV_VPE_MOTION_VECTOR_HORIZONTAL(x, 16, mv_horizontal, 2, 0); + + if (is_frame_picture_type) { + + if (mv_vertical < 0) + mv_vertical--; + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 16, mv_vertical, 2, 0); + } else { + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 32, mv_vertical, 1, 0); + } + + nouveau_vpe_pushbuf_write(vpe_channel, mc_header); + nouveau_vpe_pushbuf_write(vpe_channel, mc_vector); +} + +static void +nv_vpe_mpeg2_mb_1mv_chroma(struct nouveau_vpe_channel *vpe_channel, + boolean is_frame_picture_type, boolean is_forward, + boolean is_dual_prime_motion, + boolean is_vertical_motion, + unsigned int x, unsigned int y, + int mv_horizontal, int mv_vertical, + unsigned int mc_header_base, + int target_surface_index) +{ + unsigned int mc_header; + unsigned int mc_vector; + boolean has_odd_vertical_vector; + boolean has_odd_horizontal_vector; + + has_odd_horizontal_vector = is_odd_multiple(mv_horizontal, 2); + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 2); + + mc_header = nv_vpe_mpeg2_mb_mc_header(NV_VPE_CMD_CHROMA_MOTION_VECTOR_HEADER, + mc_header_base, + has_odd_horizontal_vector, + has_odd_vertical_vector, + is_forward, + TRUE, is_vertical_motion); + mc_header |= NV_VPE_CMD_PREDICTION_SURFACE(target_surface_index); + + mc_vector = NV_VPE_CMD_MOTION_VECTOR << NV_VPE_CMD_TYPE_SHIFT; + + mv_horizontal /= 2; + + if (has_odd_horizontal_vector) + mv_horizontal--; + + mc_vector |= NV_VPE_MOTION_VECTOR_HORIZONTAL(x, 16, mv_horizontal, 1, 0); + + if (is_frame_picture_type) { + + if (mv_vertical < 0) + mv_vertical -= 2; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 8, mv_vertical, 4, 
0); + } + else { + + mv_vertical /= 2; + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 16, mv_vertical, 1, 0); + } + + nouveau_vpe_pushbuf_write(vpe_channel, mc_header); + nouveau_vpe_pushbuf_write(vpe_channel, mc_vector); +} + +static int +nv_vpe_mpeg2_mb_1fbmv(struct nouveau_vpe_channel *vpe_channel, + enum pipe_mpeg12_picture_type picture_type, + int target_surface_index, int past_surface_index, + int future_surface_index, + struct pipe_mpeg12_macroblock *mb) +{ + int ret; + unsigned int x; + unsigned int y; + boolean has_forward; + boolean has_backward; + boolean is_frame_picture_type; + boolean is_dual_prime_motion; + boolean is_vertical_forward_motion; + boolean is_vertical_backward_motion; + unsigned int mc_header_base; + int mv_horizontal_forward; + int mv_vertical_forward; + int mv_horizontal_backward; + int mv_vertical_backward; + + ret = nouveau_vpe_pushbuf_start_mb(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] - could not start 1fbmv. error %d.\n", ret); + return ret; + } + + x = mb->mbx; + y = mb->mby; + + is_frame_picture_type = picture_type == PIPE_MPEG12_PICTURE_TYPE_FRAME; + is_dual_prime_motion = mb->mo_type == PIPE_MPEG12_MOTION_TYPE_DUALPRIME; + has_forward = mb_has_forward_mv(mb->mb_type); + has_backward = mb_has_backward_mv(mb->mb_type); + + mc_header_base = NV_VPE_CMD_FRAME_FRAME_PICT_OR_FIELD; + + if (is_frame_picture_type) { + mc_header_base |= NV_VPE_CMD_FRAME_PICT_FRAME_MOTION; + /* Frame pictures never have vertical motion selection.*/ + is_vertical_forward_motion = FALSE; + is_vertical_backward_motion = FALSE; + } + else if (is_dual_prime_motion) { + /* dual-prime selects the backward vector for top field pictures. 
+ * Bottom field pictures are reversed.*/ + if (picture_type == PIPE_MPEG12_PICTURE_TYPE_FIELD_TOP) { + is_vertical_forward_motion = FALSE; + is_vertical_backward_motion = TRUE; + } + else { + is_vertical_forward_motion = TRUE; + is_vertical_backward_motion = FALSE; + } + /* dual-prime always has forward and backward vectors. + * However, for some reason the nv driver only does this if at least the forward motion vector exists. + * So, if the backward doesn't exist then each motion vector is skipped.*/ + has_backward = has_forward; + } + else { + is_vertical_forward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_FORWARD; + is_vertical_backward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_BACKWARD; + } + + /* Be sure the user passed valid predictor surfaces. + * Skip them otherwise. */ + if (has_forward && (past_surface_index == -1) ) + has_forward = FALSE; + + if (has_backward && (future_surface_index == -1) ) + has_backward = FALSE; + + if (!has_forward && !has_backward) + return 0; + + mv_horizontal_forward = mb->pmv[0][0][0]; + mv_vertical_forward = mb->pmv[0][0][1]; + mv_horizontal_backward = mb->pmv[0][1][0]; + mv_vertical_backward = mb->pmv[0][1][1]; + + /* Luma */ + if (has_forward) + nv_vpe_mpeg2_mb_1mv_luma(vpe_channel, is_frame_picture_type, TRUE, + is_dual_prime_motion, is_vertical_forward_motion, + x, y, mv_horizontal_forward, mv_vertical_forward, + mc_header_base, past_surface_index); + + if (has_backward) + nv_vpe_mpeg2_mb_1mv_luma(vpe_channel, is_frame_picture_type, !has_forward, + is_dual_prime_motion, is_vertical_backward_motion, + x, y, mv_horizontal_backward, mv_vertical_backward, + mc_header_base, future_surface_index); + + if (has_forward || has_backward) + nv_vpe_mpeg2_mb_dct_header(vpe_channel, TRUE, picture_type, + target_surface_index, mb); + + /* Chroma */ + if (has_forward) + nv_vpe_mpeg2_mb_1mv_chroma(vpe_channel, is_frame_picture_type, TRUE, + 
is_dual_prime_motion, is_vertical_forward_motion, + x, y, mv_horizontal_forward, mv_vertical_forward, + mc_header_base, past_surface_index); + if (has_backward) + nv_vpe_mpeg2_mb_1mv_chroma(vpe_channel, is_frame_picture_type, !has_forward, + is_dual_prime_motion, is_vertical_backward_motion, + x, y, mv_horizontal_backward, mv_vertical_backward, + mc_header_base, future_surface_index); + + if (has_forward || has_backward) + nv_vpe_mpeg2_mb_dct_header(vpe_channel, FALSE, picture_type, + target_surface_index, mb); + + if ( (has_forward || has_backward) && mb->cbp) { + ret = nouveau_vpe_pushbuf_start_mb_db(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] - could not start 1fbmv db. error %d.\n", ret); + return ret; + } + + nv_vpe_mpeg2_mb_dct_blocks(vpe_channel, mb->cbp, FALSE, mb->blocks); + } + + return 0; +} + +static void +nv_vpe_mpeg2_mb_2mv_luma(struct nouveau_vpe_channel *vpe_channel, boolean is_frame_picture_type, + boolean is_forward, boolean is_first, boolean is_dual_prime_motion, + boolean is_vertical_motion, unsigned int x, + unsigned int y, int mv_horizontal, int mv_vertical, + unsigned int mc_header_base, + int target_surface_index) +{ + unsigned int mc_header; + unsigned int mc_vector; + boolean has_odd_vertical_vector; + boolean has_odd_horizontal_vector; + + has_odd_horizontal_vector = is_odd_multiple(mv_horizontal, 1); + + if (is_frame_picture_type) { + if (mv_vertical < 0) + mv_vertical--; + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 2); + } + else { + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 2); + } + + mc_header = nv_vpe_mpeg2_mb_mc_header(NV_VPE_CMD_LUMA_MOTION_VECTOR_HEADER, + mc_header_base, + has_odd_horizontal_vector, + has_odd_vertical_vector, + is_forward, + is_first, is_vertical_motion); + + mc_header |= NV_VPE_CMD_PREDICTION_SURFACE(target_surface_index); + + mc_vector = NV_VPE_CMD_MOTION_VECTOR << NV_VPE_CMD_TYPE_SHIFT; + + if (mv_horizontal < 0) + mv_horizontal--; + + mc_vector |= 
NV_VPE_MOTION_VECTOR_HORIZONTAL(x, 16, mv_horizontal, 2, 0); + + if (is_frame_picture_type) { + + mv_vertical /= 2; + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 16, mv_vertical, 1, 0); + + } else if (!is_dual_prime_motion){ + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 32, mv_vertical, 1, (is_first ? 0 : 16)); + } + else { + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 16, mv_vertical, 1, 0); + } + + nouveau_vpe_pushbuf_write(vpe_channel, mc_header); + nouveau_vpe_pushbuf_write(vpe_channel, mc_vector); +} + +static void +nv_vpe_mpeg2_mb_2mv_chroma(struct nouveau_vpe_channel *vpe_channel, boolean is_frame_picture_type, + boolean is_forward, boolean is_first, boolean is_dual_prime_motion, + boolean is_vertical_motion, unsigned int x, + unsigned int y, int mv_horizontal, int mv_vertical, + unsigned int mc_header_base, + int target_surface_index) +{ + unsigned int mc_header; + unsigned int mc_vector; + boolean has_odd_vertical_vector; + boolean has_odd_horizontal_vector; + + has_odd_horizontal_vector = is_odd_multiple(mv_horizontal, 2); + + if (is_frame_picture_type) { + if (mv_vertical < 0) + mv_vertical--; + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 4); + } + else { + has_odd_vertical_vector = is_odd_multiple(mv_vertical, 2); + } + + mc_header = nv_vpe_mpeg2_mb_mc_header(NV_VPE_CMD_CHROMA_MOTION_VECTOR_HEADER, + mc_header_base, + has_odd_horizontal_vector, + has_odd_vertical_vector, + is_forward, + is_first, is_vertical_motion); + + mc_header |= NV_VPE_CMD_PREDICTION_SURFACE(target_surface_index); + + mc_vector = NV_VPE_CMD_MOTION_VECTOR << NV_VPE_CMD_TYPE_SHIFT; + + mv_horizontal /= 2; + if (has_odd_horizontal_vector) + mv_horizontal--; + + mc_vector |= NV_VPE_MOTION_VECTOR_HORIZONTAL(x, 16, mv_horizontal, 1, 0); + + if (is_frame_picture_type) { + + mv_vertical /= 4; + + if 
(has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 8, mv_vertical, 1, 0); + + } else if (!is_dual_prime_motion){ + + mv_vertical /= 2; + + if (has_odd_vertical_vector) + mv_vertical--; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 16, mv_vertical, 1, (is_first ? 0 : 8)); + } + else { + + mv_vertical /= 4; + + mc_vector |= NV_VPE_MOTION_VECTOR_VERTICAL(y, 8, mv_vertical, 1, 0); + } + + nouveau_vpe_pushbuf_write(vpe_channel, mc_header); + nouveau_vpe_pushbuf_write(vpe_channel, mc_vector); +} + +static int +nv_vpe_mpeg2_mb_2fbmv(struct nouveau_vpe_channel *vpe_channel, + enum pipe_mpeg12_picture_type picture_type, + int target_surface_index, int past_surface_index, + int future_surface_index, + struct pipe_mpeg12_macroblock *mb) +{ + unsigned int ret; + unsigned int x; + unsigned int y; + boolean has_forward; + boolean has_backward; + boolean is_frame_picture_type; + boolean is_dual_prime_motion; + boolean is_vertical_forward_motion; + boolean is_second_vertical_forward_motion; + boolean is_vertical_backward_motion; + boolean is_second_vertical_backward_motion; + unsigned int mc_header_base; + int mv_horizontal_forward; + int mv_vertical_forward; + int mv_second_horizontal_forward; + int mv_second_vertical_forward; + int mv_horizontal_backward; + int mv_vertical_backward; + int mv_second_horizontal_backward; + int mv_second_vertical_backward; + + ret = nouveau_vpe_pushbuf_start_mb(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] - could not start 2fbmv. 
error %d.\n", ret); + return ret; + } + + x = mb->mbx; + y = mb->mby; + + is_frame_picture_type = picture_type == PIPE_MPEG12_PICTURE_TYPE_FRAME; + is_dual_prime_motion = mb->mo_type == PIPE_MPEG12_MOTION_TYPE_DUALPRIME; + has_forward = mb_has_forward_mv(mb->mb_type); + has_backward = mb_has_backward_mv(mb->mb_type); + + mc_header_base = NV_VPE_CMD_MC_MV_COUNT_2; + + if (is_dual_prime_motion) { + /* Dual-prime selects the second forward and first backward vectors.*/ + is_vertical_forward_motion = FALSE; + is_second_vertical_forward_motion = TRUE; + is_vertical_backward_motion = TRUE; + is_second_vertical_backward_motion = FALSE; + /* Dual-prime always has forward and backward vectors. + * However, the nv binary driver only emits them when the forward motion vector exists, + * so if the forward vector is missing then both motion vectors are skipped.*/ + has_backward = has_forward; + } + else { + /* Note: this differs slightly from the original xvmc code, which had two + * separate else branches that both set the forward motion selection; + * the frame-picture check below was the only reason the extra branch existed.*/ + if (!is_frame_picture_type) + mc_header_base |= NV_VPE_CMD_FRAME_FRAME_PICT_OR_FIELD; + /* Only the first forward/backward vectors use vertical motion selection.*/ + is_vertical_forward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_FORWARD; + is_second_vertical_forward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_SECOND_FORWARD; + is_vertical_backward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_BACKWARD; + is_second_vertical_backward_motion = mb->motion_vertical_field_select & + PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_SECOND_BACKWARD; + } + + /* Be sure the user passed valid predictor surfaces. + * Skip them otherwise. 
*/ + if (has_forward && (past_surface_index == -1) ) + has_forward = FALSE; + + if (has_backward && (future_surface_index == -1) ) + has_backward = FALSE; + + if (!has_forward && !has_backward) + return 0; + + mv_horizontal_forward = mb->pmv[0][0][0]; + mv_vertical_forward = mb->pmv[0][0][1]; + if (!is_dual_prime_motion) { + mv_second_horizontal_forward = mb->pmv[1][0][0]; + mv_second_vertical_forward = mb->pmv[1][0][1]; + } + else { + /* For dual-prime, the second forward vector is a duplicate of the first forward vector.*/ + mv_second_horizontal_forward = mb->pmv[0][0][0]; + mv_second_vertical_forward = mb->pmv[0][0][1]; + } + + if (!is_dual_prime_motion) { + mv_horizontal_backward = mb->pmv[0][1][0]; + mv_vertical_backward = mb->pmv[0][1][1]; + } + else { + /* For dual-prime, the first backward vector actually uses the second forward vector.*/ + mv_horizontal_backward = mb->pmv[1][0][0]; + mv_vertical_backward = mb->pmv[1][0][1]; + } + mv_second_horizontal_backward = mb->pmv[1][1][0]; + mv_second_vertical_backward = mb->pmv[1][1][1]; + + /* Luma */ + if (has_forward) { + nv_vpe_mpeg2_mb_2mv_luma(vpe_channel, is_frame_picture_type, TRUE, + TRUE, is_dual_prime_motion, is_vertical_forward_motion, + x, y, mv_horizontal_forward, mv_vertical_forward, + mc_header_base, past_surface_index); + + nv_vpe_mpeg2_mb_2mv_luma(vpe_channel, is_frame_picture_type, TRUE, + FALSE, is_dual_prime_motion, is_second_vertical_forward_motion, + x, y, mv_second_horizontal_forward, + mv_second_vertical_forward, mc_header_base, + past_surface_index); + } + + if (has_backward) { + nv_vpe_mpeg2_mb_2mv_luma(vpe_channel, is_frame_picture_type, !has_forward, + TRUE, is_dual_prime_motion, is_vertical_backward_motion, + x, y, mv_horizontal_backward, + mv_vertical_backward, mc_header_base, + future_surface_index); + + nv_vpe_mpeg2_mb_2mv_luma(vpe_channel, is_frame_picture_type, !has_forward, + FALSE, is_dual_prime_motion, is_second_vertical_backward_motion, + x, y, mv_second_horizontal_backward, + 
mv_second_vertical_backward, mc_header_base, + future_surface_index); + } + + if (has_forward || has_backward) + nv_vpe_mpeg2_mb_dct_header(vpe_channel, TRUE, picture_type, + target_surface_index, mb); + + /* Chroma */ + if (has_forward) { + nv_vpe_mpeg2_mb_2mv_chroma(vpe_channel, is_frame_picture_type, TRUE, + TRUE, is_dual_prime_motion, is_vertical_forward_motion, + x, y, mv_horizontal_forward, + mv_vertical_forward, mc_header_base, + past_surface_index); + + nv_vpe_mpeg2_mb_2mv_chroma(vpe_channel, is_frame_picture_type, TRUE, + FALSE, is_dual_prime_motion, is_second_vertical_forward_motion, + x, y, mv_second_horizontal_forward, + mv_second_vertical_forward, mc_header_base, + past_surface_index); + } + if (has_backward) { + nv_vpe_mpeg2_mb_2mv_chroma(vpe_channel, is_frame_picture_type, !has_forward, + TRUE, is_dual_prime_motion, is_vertical_backward_motion, + x, y, mv_horizontal_backward, + mv_vertical_backward, mc_header_base, + future_surface_index); + + nv_vpe_mpeg2_mb_2mv_chroma(vpe_channel, is_frame_picture_type, !has_forward, + FALSE, is_dual_prime_motion, is_second_vertical_backward_motion, + x, y, mv_second_horizontal_backward, + mv_second_vertical_backward, mc_header_base, + future_surface_index); + } + + if (has_forward || has_backward) + nv_vpe_mpeg2_mb_dct_header(vpe_channel, FALSE, picture_type, + target_surface_index, mb); + + if ( (has_forward || has_backward) && mb->cbp) { + ret = nouveau_vpe_pushbuf_start_mb_db(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] - could not start 2fbmv db. 
error %d.\n", ret); + return ret; + } + nv_vpe_mpeg2_mb_dct_blocks(vpe_channel, mb->cbp, FALSE, mb->blocks); + } + + return 0; +} + +/* Exports */ + +int +nv_vpe_mpeg2_mc_renderer_create(struct nouveau_device *dev, enum pipe_video_profile profile, + enum pipe_video_chroma_format chroma_format, + enum pipe_video_entry_point entry_point, + unsigned decode_flags, + unsigned width, + unsigned height, + struct nouveau_vpe_channel **vpe_channel) +{ + switch (dev->chipset & 0xf0) { + case 0x30: + case 0x40: + case 0x60: + break; + default: + debug_printf("[nv_vpe] Chipset nv%02x is not supported. Only the nv30 and nv40 families are supported.\n", dev->chipset); + return -EINVAL; + } + + if ((profile != PIPE_VIDEO_PROFILE_MPEG2_SIMPLE) && + (profile != PIPE_VIDEO_PROFILE_MPEG2_MAIN) ) { + debug_printf("[nv_vpe] Cannot decode requested profile %d. Only mpeg2 is supported.\n", profile); + return -EINVAL; + } + + if (chroma_format != PIPE_VIDEO_CHROMA_FORMAT_420) { + debug_printf("[nv_vpe] Cannot decode requested chroma format %d. Only 4:2:0 is supported.\n", chroma_format); + return -EINVAL; + } + + if (entry_point != PIPE_VIDEO_ENTRY_POINT_IDCT) { + debug_printf("[nv_vpe] Cannot decode at requested entry point %d. Only IDCT is supported.\n", entry_point); + return -EINVAL; + } + + if (decode_flags & PIPE_VIDEO_DECODE_FLAG_MB_INTRA_UNSIGNED) { + debug_printf("[nv_vpe] Cannot decode requested surface type at the IDCT entry point. Only signed intra is supported.\n"); + return -EINVAL; + } + + if ((width < NV_VPE_MIN_WIDTH) || + (width > NV_VPE_MAX_WIDTH) || + (height < NV_VPE_MIN_HEIGHT) || + (height > NV_VPE_MAX_HEIGHT) ) { + debug_printf("[nv_vpe] Unsupported width = %d, height = %d. 
%dx%d up to %dx%d supported.\n", width, + height, NV_VPE_MIN_WIDTH, NV_VPE_MIN_HEIGHT, + NV_VPE_MAX_WIDTH, NV_VPE_MAX_HEIGHT); + return -EINVAL; + } + + return nouveau_vpe_channel_alloc(dev, width, height, vpe_channel); +} + +void +nv_vpe_mpeg2_mc_renderer_destroy(struct nouveau_vpe_channel **vpe_channel) +{ + nouveau_vpe_channel_free(vpe_channel); +} + +int +nv_vpe_mpeg2_mc_renderer_surface_create(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface) +{ + int ret; + struct nouveau_bo *luma_bo; + struct nouveau_bo *chroma_bo; + + if (!vpe_channel || !vpe_channel->device || !surface || + !surface->luma_surf || !surface->chroma_surf) + return -EINVAL; + + luma_bo = nvfx_surface_buffer(surface->luma_surf); + chroma_bo = nvfx_surface_buffer(surface->chroma_surf); + + ret = nouveau_vpe_surface_alloc(vpe_channel, + luma_bo->handle, chroma_bo->handle, + &surface->surface_index); + if (ret) + debug_printf("[nv_vpe] Could not allocate video surface. error %d.\n", ret); + + return ret; +} + +void +nv_vpe_mpeg2_mc_renderer_surface_destroy(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface) +{ + if (!vpe_channel || !surface) + return; + + nouveau_vpe_surface_free(vpe_channel, surface->surface_index); +} + +int +nv_vpe_mpeg2_mc_renderer_surface_query(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface, + enum pipe_video_surface_status *status) +{ + int ret; + uint32_t is_busy; + + if (!vpe_channel || !surface || !status) + return -EINVAL; + + ret = nouveau_vpe_surface_query(vpe_channel, surface->surface_index, + &is_busy); + if (!ret) { + if (is_busy) + *status = PIPE_VIDEO_SURFACE_STATUS_RENDERING; + else + *status = PIPE_VIDEO_SURFACE_STATUS_FREE; + } + else + debug_printf("[nv_vpe] Could not query surface %d. 
error %d.\n", + surface->surface_index, ret); + + return ret; +} + +int +nv_vpe_mpeg2_mc_renderer_decode_macroblocks(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *target, + struct nv_vpe_pipe_surface *past, + struct nv_vpe_pipe_surface *future, + enum pipe_mpeg12_picture_type picture_type, + unsigned num_macroblocks, + struct pipe_mpeg12_macroblock *mb_array) +{ + int ret; + struct pipe_mpeg12_macroblock *mb; + int target_surface_index; + int past_surface_index; + int future_surface_index; + unsigned int i; + + target_surface_index = target->surface_index; + + if (past) + past_surface_index = past->surface_index; + else + past_surface_index = -1; + + if (future) + future_surface_index = future->surface_index; + else + future_surface_index = -1; + + ret = nouveau_vpe_pushbuf_start(vpe_channel, 0, num_macroblocks, + target_surface_index, + past_surface_index, + future_surface_index); + if (ret) { + debug_printf("[nv_vpe] could not start mb sequence. error %d.\n", + ret); + return ret; + } + + for (i = 0; i < num_macroblocks; i++) { + mb = &mb_array[i]; + if (mb->mb_type == PIPE_MPEG12_MACROBLOCK_TYPE_INTRA) + ret = nv_vpe_mpeg2_mb_ipicture(vpe_channel, picture_type, + target_surface_index, mb); + else if (picture_type == PIPE_MPEG12_PICTURE_TYPE_FRAME) { + switch (mb->mo_type) { + case PIPE_MPEG12_MOTION_TYPE_FIELD: + ret = nv_vpe_mpeg2_mb_2fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + case PIPE_MPEG12_MOTION_TYPE_FRAME: + case PIPE_MPEG12_MOTION_TYPE_16x8: + ret = nv_vpe_mpeg2_mb_1fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + case PIPE_MPEG12_MOTION_TYPE_DUALPRIME: + ret = nv_vpe_mpeg2_mb_2fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + default: + /* Invalid motion type; skip without logging.*/ + continue; + } + } + else { /* Field Picture*/ + switch 
(mb->mo_type) { + case PIPE_MPEG12_MOTION_TYPE_FIELD: + ret = nv_vpe_mpeg2_mb_1fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + case PIPE_MPEG12_MOTION_TYPE_FRAME: + case PIPE_MPEG12_MOTION_TYPE_16x8: + ret = nv_vpe_mpeg2_mb_2fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + case PIPE_MPEG12_MOTION_TYPE_DUALPRIME: + ret = nv_vpe_mpeg2_mb_1fbmv(vpe_channel, picture_type, + target_surface_index, past_surface_index, + future_surface_index, mb); + break; + + default: + /* Invalid motion type; skip without logging.*/ + continue; + } + } + + if (ret) { + debug_printf("[nv_vpe] could not process mb %d. error %d.\n", + i, ret); + return ret; + } + + ret = nouveau_vpe_pushbuf_end_mb(vpe_channel); + if (ret) { + debug_printf("[nv_vpe] could not end mb %d. error %d.\n", + i, ret); + return ret; + } + } + + ret = nouveau_vpe_pushbuf_fire(vpe_channel, 1); + if (ret) + debug_printf("[nv_vpe] could not fire mb sequence. error %d.\n", + ret); + + return ret; +} diff --git a/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.h b/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.h new file mode 100644 index 0000000..05f4311 --- /dev/null +++ b/src/gallium/drivers/nvfx/nvfx_vpe_mpeg2_mc_renderer.h @@ -0,0 +1,63 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#ifndef __NV_VPE_MPEG2_MC_RENDERER_H__ +#define __NV_VPE_MPEG2_MC_RENDERER_H__ + +int +nv_vpe_mpeg2_mc_renderer_create(struct nouveau_device *dev, enum pipe_video_profile profile, + enum pipe_video_chroma_format chroma_format, + enum pipe_video_entry_point entry_point, + unsigned decode_flags, + unsigned width, + unsigned height, + struct nouveau_vpe_channel **vpe_channel); + +void +nv_vpe_mpeg2_mc_renderer_destroy(struct nouveau_vpe_channel **vpe_channel); + +int +nv_vpe_mpeg2_mc_renderer_surface_create(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface); + +void +nv_vpe_mpeg2_mc_renderer_surface_destroy(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface); + +int +nv_vpe_mpeg2_mc_renderer_surface_query(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *surface, + enum pipe_video_surface_status *status); + +int +nv_vpe_mpeg2_mc_renderer_decode_macroblocks(struct nouveau_vpe_channel *vpe_channel, + struct nv_vpe_pipe_surface *target, + struct nv_vpe_pipe_surface *past, + struct nv_vpe_pipe_surface *future, + enum pipe_mpeg12_picture_type picture_type, + unsigned num_macroblocks, + struct pipe_mpeg12_macroblock *mb_array); +#endif diff --git a/src/gallium/drivers/nvfx/nvfx_vpe_video_context.c b/src/gallium/drivers/nvfx/nvfx_vpe_video_context.c new file mode 100644 index 0000000..38e99cf --- /dev/null +++ b/src/gallium/drivers/nvfx/nvfx_vpe_video_context.c @@ -0,0 +1,532 @@ +/* + * Copyright (C) 2010 Jimmy Rentz + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial + * portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE + * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION + * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION + * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <errno.h> + +#include "util/u_inlines.h" +#include "util/u_memory.h" + +#include <util/u_memory.h> +#include <util/u_rect.h> +#include <util/u_video.h> + +#include "nouveau/nouveau_winsys.h" +#include "nvfx_screen.h" +#include "nv04_surface_2d.h" +#include "nvfx_vpe_video_context.h" +#include "nvfx_vpe_mpeg2_mc_renderer.h" + +static void +nv_vpe_video_destroy(struct pipe_video_context *vpipe) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + + ctx = nv_vpe_video_context(vpipe); + + vl_compositor_cleanup(&ctx->compositor); + + nv_vpe_mpeg2_mc_renderer_destroy(&ctx->vpe_channel); + + ctx->pipe->destroy(ctx->pipe); + + FREE(ctx); +} + +static int +nv_vpe_get_param(struct pipe_video_context *vpipe, int param) +{ + assert(vpipe); + + debug_printf("[nv_vpe]: get_param not supported\n"); + + return 0; +} + +static boolean +nv_vpe_is_format_supported(struct pipe_video_context *vpipe, + enum pipe_format format, unsigned usage, + unsigned geom) +{ + assert(vpipe); + + debug_printf("[nv_vpe]: is_format_supported not supported\n"); + + return false; +} + +static void +nv_vpe_create_decoder_surface(struct pipe_video_context *vpipe, + struct pipe_surface **surface) +{ + int ret; + struct nv_vpe_video_context *ctx; + struct nv_vpe_pipe_surface *vpe_surface; + struct pipe_resource template; + struct pipe_resource *vsfc_tex; + + assert(vpipe); + assert(surface); + + *surface = NULL; + + ctx = nv_vpe_video_context(vpipe); + + vpe_surface = CALLOC_STRUCT(nv_vpe_pipe_surface); + + if (!vpe_surface) + return; + + /* Create the NV12 luma and chroma surfaces.*/ + memset(&template, 0, sizeof(struct pipe_resource)); + template.target = PIPE_TEXTURE_2D; + template.format = PIPE_FORMAT_R8_UNORM; + template.last_level = 0; + template.width0 = vpipe->width; + template.height0 = vpipe->height; + template.depth0 = 1; + template.usage = PIPE_USAGE_DEFAULT; + template.bind = 
PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT; + + vsfc_tex = vpipe->screen->resource_create(vpipe->screen, &template); + if (!vsfc_tex) { + FREE(vpe_surface); + debug_printf("[nv_vpe] Could not allocate luma surface.\n"); + return; + } + + vpe_surface->luma_surf = vpipe->screen->get_tex_surface(vpipe->screen, vsfc_tex, 0, 0, 0, + PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT); + pipe_resource_reference(&vsfc_tex, NULL); + + memset(&template, 0, sizeof(struct pipe_resource)); + template.target = PIPE_TEXTURE_2D; + template.format = PIPE_FORMAT_G8B8_UNORM; + template.last_level = 0; + /* Chroma is 1/2 the luma size in each dimension.*/ + template.width0 = vpipe->width / 2; + template.height0 = vpipe->height / 2; + template.depth0 = 1; + template.usage = PIPE_USAGE_DEFAULT; + template.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT; + + vsfc_tex = vpipe->screen->resource_create(vpipe->screen, &template); + if (!vsfc_tex) { + pipe_surface_reference(&vpe_surface->luma_surf, NULL); + FREE(vpe_surface); + debug_printf("[nv_vpe] Could not allocate chroma surface.\n"); + return; + } + + vpe_surface->chroma_surf = vpipe->screen->get_tex_surface(vpipe->screen, vsfc_tex, 0, 0, 0, + PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT); + pipe_resource_reference(&vsfc_tex, NULL); + + ret = nv_vpe_mpeg2_mc_renderer_surface_create(ctx->vpe_channel, + vpe_surface); + if (ret) { + pipe_surface_reference(&vpe_surface->luma_surf, NULL); + pipe_surface_reference(&vpe_surface->chroma_surf, NULL); + FREE(vpe_surface); + debug_printf("[nv_vpe] Could not allocate vpe surface. 
error %d.\n", ret); + return; + } + + *surface = &vpe_surface->base; +} + +static void +nv_vpe_destroy_decoder_surface(struct pipe_video_context *vpipe, + struct pipe_surface **surface) +{ + struct nv_vpe_video_context *ctx; + struct nv_vpe_pipe_surface *vpe_surface; + + assert(vpipe); + assert(surface); + assert(*surface); + + ctx = nv_vpe_video_context(vpipe); + + vpe_surface = nv_vpe_pipe_surface(*surface); + + nv_vpe_mpeg2_mc_renderer_surface_destroy(ctx->vpe_channel, + vpe_surface); + + pipe_surface_reference(&vpe_surface->luma_surf, NULL); + pipe_surface_reference(&vpe_surface->chroma_surf, NULL); + + FREE(vpe_surface); + *surface = NULL; +} + + +static void +nv_vpe_query_decoder_surface(struct pipe_video_context *vpipe, + struct pipe_surface *surface, + enum pipe_video_surface_status *status) +{ + int ret; + struct nv_vpe_video_context *ctx; + struct nv_vpe_pipe_surface *vpe_surface; + + assert(vpipe); + assert(surface); + assert(status); + + ctx = nv_vpe_video_context(vpipe); + vpe_surface = nv_vpe_pipe_surface(surface); + + ret = nv_vpe_mpeg2_mc_renderer_surface_query(ctx->vpe_channel, vpe_surface, status); + /* how to handle errors?*/ + assert(ret == 0); +} + +static void +nv_vpe_decode_macroblocks(struct pipe_video_context *vpipe, + struct pipe_surface *past, + struct pipe_surface *future, + enum pipe_mpeg12_picture_type picture_type, + unsigned num_macroblocks, + struct pipe_macroblock *macroblocks, + struct pipe_fence_handle **fence) +{ + struct nv_vpe_video_context *ctx; + int ret; + + assert(vpipe); + assert(num_macroblocks); + assert(macroblocks); + /* Only mpeg2 is supported really...*/ + assert(macroblocks->codec == PIPE_VIDEO_CODEC_MPEG12); + + ctx = nv_vpe_video_context(vpipe); + assert(ctx->decode_target); + + ret = nv_vpe_mpeg2_mc_renderer_decode_macroblocks(ctx->vpe_channel, + nv_vpe_pipe_surface(ctx->decode_target), + past ? nv_vpe_pipe_surface(past) : NULL, + future ? 
nv_vpe_pipe_surface(future) : NULL, + picture_type, + num_macroblocks, + (struct pipe_mpeg12_macroblock *)macroblocks); + /* How to handle errors?*/ + assert(ret == 0); +} + +static void +nv_vpe_render_picture(struct pipe_video_context *vpipe, + /*struct pipe_surface *backround, + struct pipe_video_rect *backround_area,*/ + struct pipe_surface *src_surface, + enum pipe_mpeg12_picture_type picture_type, + /*unsigned num_past_surfaces, + struct pipe_video_surface *past_surfaces, + unsigned num_future_surfaces, + struct pipe_video_surface *future_surfaces,*/ + struct pipe_video_rect *src_area, + struct pipe_surface *dst_surface, + struct pipe_video_rect *dst_area, + /*unsigned num_layers, + struct pipe_surface *layers, + struct pipe_video_rect *layer_src_areas, + struct pipe_video_rect *layer_dst_areas*/ + struct pipe_fence_handle **fence) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert(src_surface); + assert(src_area); + assert(dst_surface); + assert(dst_area); + + ctx = nv_vpe_video_context(vpipe); + + vl_compositor_render_nv12(&ctx->compositor, nv_vpe_pipe_surface(src_surface)->luma_surf, + nv_vpe_pipe_surface(src_surface)->chroma_surf, + picture_type, src_area, dst_surface, dst_area, fence); +} + +static void +nv_vpe_set_picture_background(struct pipe_video_context *vpipe, + struct pipe_surface *bg, + struct pipe_video_rect *bg_src_rect) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert(bg); + assert(bg_src_rect); + + ctx = nv_vpe_video_context(vpipe); + + vl_compositor_set_background(&ctx->compositor, bg, bg_src_rect); +} + +static void +nv_vpe_set_picture_layers(struct pipe_video_context *vpipe, + struct pipe_surface *layers[], + struct pipe_video_rect *src_rects[], + struct pipe_video_rect *dst_rects[], + unsigned num_layers) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert((layers && src_rects && dst_rects) || + (!layers && !src_rects && !dst_rects)); + + ctx = nv_vpe_video_context(vpipe); + + 
vl_compositor_set_layers(&ctx->compositor, layers, src_rects, dst_rects, num_layers); +} + +static void +nv_vpe_surface_fill(struct pipe_video_context *vpipe, + struct pipe_surface *dst, + unsigned dstx, unsigned dsty, + unsigned width, unsigned height, + unsigned value) +{ + /*struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert(dst); + + ctx = nv_vpe_video_context(vpipe); + + if (ctx->pipe->surface_fill) + ctx->pipe->surface_fill(ctx->pipe, dst, dstx, dsty, width, height, value); + else + util_surface_fill(ctx->pipe, dst, dstx, dsty, width, height, value);*/ + + /* Need to fill luma+chroma surfaces somehow.*/ + debug_printf("[nv_vpe]: surface_fill is not supported\n"); +} + +static void +nv_vpe_surface_copy(struct pipe_video_context *vpipe, + struct pipe_surface *dst, + unsigned dstx, unsigned dsty, + struct pipe_surface *src, + unsigned srcx, unsigned srcy, + unsigned width, unsigned height) +{ + /*struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert(dst); + + ctx = nv_vpe_video_context(vpipe); + + if (ctx->pipe->surface_copy) + ctx->pipe->surface_copy(ctx->pipe, dst, dstx, dsty, src, srcx, srcy, width, height); + else + util_surface_copy(ctx->pipe, FALSE, dst, dstx, dsty, src, srcx, srcy, width, height); + */ + + /* Need to copy from luma+chroma surfaces somehow.*/ + + debug_printf("[nv_vpe]: surface_copy is not supported\n"); +} + +static void +nv_vpe_set_decode_target(struct pipe_video_context *vpipe, + struct pipe_surface *dt) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + assert(dt); + + ctx = nv_vpe_video_context(vpipe); + + ctx->decode_target = dt; +} + +static void +nv_vpe_set_csc_matrix(struct pipe_video_context *vpipe, const float *mat) +{ + struct nv_vpe_video_context *ctx; + + assert(vpipe); + + ctx = nv_vpe_video_context(vpipe); + + vl_compositor_set_csc_matrix(&ctx->compositor, mat); +} + +static bool +init_pipe_state(struct nv_vpe_video_context *ctx) +{ + struct pipe_rasterizer_state rast; + struct pipe_blend_state 
blend;
+   struct pipe_depth_stencil_alpha_state dsa;
+   unsigned i;
+
+   assert(ctx);
+
+   rast.flatshade = 1;
+   rast.flatshade_first = 0;
+   rast.light_twoside = 0;
+   rast.front_winding = PIPE_WINDING_CCW;
+   rast.cull_mode = PIPE_WINDING_CW;
+   rast.fill_cw = PIPE_POLYGON_MODE_FILL;
+   rast.fill_ccw = PIPE_POLYGON_MODE_FILL;
+   rast.offset_cw = 0;
+   rast.offset_ccw = 0;
+   rast.scissor = 0;
+   rast.poly_smooth = 0;
+   rast.poly_stipple_enable = 0;
+   rast.sprite_coord_enable = 0;
+   rast.point_size_per_vertex = 0;
+   rast.multisample = 0;
+   rast.line_smooth = 0;
+   rast.line_stipple_enable = 0;
+   rast.line_stipple_factor = 0;
+   rast.line_stipple_pattern = 0;
+   rast.line_last_pixel = 0;
+   rast.line_width = 1;
+   rast.point_smooth = 0;
+   rast.point_quad_rasterization = 0;
+   rast.point_size = 1;
+   rast.offset_units = 1;
+   rast.offset_scale = 1;
+   rast.gl_rasterization_rules = 1;
+   ctx->rast = ctx->pipe->create_rasterizer_state(ctx->pipe, &rast);
+   ctx->pipe->bind_rasterizer_state(ctx->pipe, ctx->rast);
+
+
+   blend.independent_blend_enable = 0;
+   blend.rt[0].blend_enable = 0;
+   blend.rt[0].rgb_func = PIPE_BLEND_ADD;
+   blend.rt[0].rgb_src_factor = PIPE_BLENDFACTOR_ONE;
+   blend.rt[0].rgb_dst_factor = PIPE_BLENDFACTOR_ONE;
+   blend.rt[0].alpha_func = PIPE_BLEND_ADD;
+   blend.rt[0].alpha_src_factor = PIPE_BLENDFACTOR_ONE;
+   blend.rt[0].alpha_dst_factor = PIPE_BLENDFACTOR_ONE;
+   blend.logicop_enable = 0;
+   blend.logicop_func = PIPE_LOGICOP_CLEAR;
+   /* Needed to allow color writes to FB, even if blending disabled */
+   blend.rt[0].colormask = PIPE_MASK_RGBA;
+   blend.dither = 0;
+   ctx->blend = ctx->pipe->create_blend_state(ctx->pipe, &blend);
+   ctx->pipe->bind_blend_state(ctx->pipe, ctx->blend);
+
+   dsa.depth.enabled = 0;
+   dsa.depth.writemask = 0;
+   dsa.depth.func = PIPE_FUNC_ALWAYS;
+   for (i = 0; i < 2; ++i) {
+      dsa.stencil[i].enabled = 0;
+      dsa.stencil[i].func = PIPE_FUNC_ALWAYS;
+      dsa.stencil[i].fail_op = PIPE_STENCIL_OP_KEEP;
+      dsa.stencil[i].zpass_op = PIPE_STENCIL_OP_KEEP;
+      dsa.stencil[i].zfail_op = PIPE_STENCIL_OP_KEEP;
+      dsa.stencil[i].valuemask = 0;
+      dsa.stencil[i].writemask = 0;
+   }
+   dsa.alpha.enabled = 0;
+   dsa.alpha.func = PIPE_FUNC_ALWAYS;
+   dsa.alpha.ref_value = 0;
+   ctx->dsa = ctx->pipe->create_depth_stencil_alpha_state(ctx->pipe, &dsa);
+   ctx->pipe->bind_depth_stencil_alpha_state(ctx->pipe, ctx->dsa);
+
+   return true;
+}
+
+struct pipe_video_context *
+nv_vpe_video_create(struct pipe_context *pipe, enum pipe_video_profile profile,
+                    enum pipe_video_chroma_format chroma_format,
+                    enum pipe_video_entry_point entry_point,
+                    unsigned decode_flags,
+                    unsigned width, unsigned height,
+                    unsigned pvctx_id)
+{
+   struct nouveau_device *dev = nouveau_screen(pipe->screen)->device;
+   struct nv_vpe_video_context *ctx;
+   int ret;
+
+   ctx = CALLOC_STRUCT(nv_vpe_video_context);
+
+   if (!ctx)
+      return NULL;
+
+   ret = nv_vpe_mpeg2_mc_renderer_create(dev, profile, chroma_format, entry_point,
+                                         decode_flags, width, height,
+                                         &ctx->vpe_channel);
+   if (ret) {
+      FREE(ctx);
+      return NULL;
+   }
+
+   ctx->base.profile = profile;
+   ctx->base.chroma_format = chroma_format;
+   /* Width and height are adjusted automatically for the hw
+    * so use those values.*/
+   ctx->base.width = ctx->vpe_channel->width;
+   ctx->base.height = ctx->vpe_channel->height;
+   ctx->base.entry_point = entry_point;
+   ctx->base.decode_flags = decode_flags;
+
+   ctx->base.screen = pipe->screen;
+   ctx->base.destroy = nv_vpe_video_destroy;
+   ctx->base.get_param = nv_vpe_get_param;
+   ctx->base.is_format_supported = nv_vpe_is_format_supported;
+   ctx->base.decode_macroblocks = nv_vpe_decode_macroblocks;
+   ctx->base.render_picture = nv_vpe_render_picture;
+   ctx->base.surface_fill = nv_vpe_surface_fill;
+   ctx->base.surface_copy = nv_vpe_surface_copy;
+   ctx->base.set_picture_background = nv_vpe_set_picture_background;
+   ctx->base.set_picture_layers = nv_vpe_set_picture_layers;
+   ctx->base.set_decode_target = nv_vpe_set_decode_target;
+   ctx->base.set_csc_matrix = nv_vpe_set_csc_matrix;
+   ctx->base.create_decoder_surface = nv_vpe_create_decoder_surface;
+   ctx->base.destroy_decoder_surface = nv_vpe_destroy_decoder_surface;
+   ctx->base.query_decoder_surface = nv_vpe_query_decoder_surface;
+
+   ctx->pipe = pipe;
+
+   if (!vl_compositor_init(&ctx->compositor, ctx->pipe, PIPE_FORMAT_NV12)) {
+      ctx->pipe->destroy(ctx->pipe);
+      FREE(ctx);
+      return NULL;
+   }
+
+   if (!init_pipe_state(ctx)) {
+      vl_compositor_cleanup(&ctx->compositor);
+      ctx->pipe->destroy(ctx->pipe);
+      FREE(ctx);
+      return NULL;
+   }
+
+   return &ctx->base;
+}
diff --git a/src/gallium/drivers/nvfx/nvfx_vpe_video_context.h b/src/gallium/drivers/nvfx/nvfx_vpe_video_context.h
new file mode 100644
index 0000000..22e97b9
--- /dev/null
+++ b/src/gallium/drivers/nvfx/nvfx_vpe_video_context.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2010 Jimmy Rentz
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
+ * a copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sublicense, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial
+ * portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
+ * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __NV_VPE_VIDEO_CONTEXT_H__
+#define __NV_VPE_VIDEO_CONTEXT_H__
+
+#include <pipe/p_video_context.h>
+#include <pipe/p_video_state.h>
+#include <vl/vl_compositor.h>
+
+struct pipe_screen;
+struct pipe_context;
+struct pipe_surface;
+struct nouveau_vpe_channel;
+
+struct nv_vpe_pipe_surface
+{
+   struct pipe_surface base;
+   /* pipe textures for a surface.
+    * They can be referenced as textures:
+    * - Linear -> So, not swizzled.
+    * - Rect
+    * - Luma Surface -> L8 texture format.
+    * - Chroma Surface -> A8L8 texture format.
+    */
+   struct pipe_surface *luma_surf;
+   struct pipe_surface *chroma_surf;
+
+   /* index in vpe hw.*/
+   uint32_t surface_index;
+};
+
+struct nv_vpe_video_context
+{
+   struct pipe_video_context base;
+   struct pipe_context *pipe;
+   struct pipe_surface *decode_target;
+   struct vl_compositor compositor;
+
+   void *rast;
+   void *dsa;
+   void *blend;
+
+   struct nouveau_vpe_channel *vpe_channel;
+};
+
+static INLINE struct nv_vpe_video_context *
+nv_vpe_video_context(struct pipe_video_context *vpipe)
+{
+   return (struct nv_vpe_video_context *)vpipe;
+}
+
+static INLINE struct nv_vpe_pipe_surface *
+nv_vpe_pipe_surface(struct pipe_surface *surface)
+{
+   return (struct nv_vpe_pipe_surface *)surface;
+}
+
+struct pipe_video_context *
+nv_vpe_video_create(struct pipe_context *pipe, enum pipe_video_profile profile,
+                    enum pipe_video_chroma_format chroma_format,
+                    enum pipe_video_entry_point entry_point,
+                    unsigned decode_flags,
+                    unsigned width, unsigned height,
+                    unsigned pvctx_id);
+#endif
diff --git a/src/gallium/drivers/softpipe/sp_video_context.c b/src/gallium/drivers/softpipe/sp_video_context.c
index 44df00e..6c0825f 100644
--- a/src/gallium/drivers/softpipe/sp_video_context.c
+++ b/src/gallium/drivers/softpipe/sp_video_context.c
@@ -36,6 +36,35 @@
 #include "sp_public.h"
 #include "sp_texture.h"
 
+static boolean
+sp_mpeg12_validate_codec_params(enum pipe_video_profile profile,
+                                enum pipe_video_chroma_format chroma_format,
+                                enum pipe_video_entry_point entry_point,
+                                unsigned decode_flags)
+{
+   if (u_reduce_video_profile(profile) != PIPE_VIDEO_CODEC_MPEG12) {
+      debug_printf("[XvMCg3dvl] Cannot decode requested profile %d. Only mpeg1/2 supported.\n", profile);
+      return false;
+   }
+
+   if (chroma_format != PIPE_VIDEO_CHROMA_FORMAT_420) {
+      debug_printf("[XvMCg3dvl] Cannot decode requested chroma format %d. Only 420 supported.\n", chroma_format);
+      return false;
+   }
+
+   if (entry_point != PIPE_VIDEO_ENTRY_POINT_MC) {
+      debug_printf("[XvMCg3dvl] Cannot decode at requested entry point %d. Only MC supported.\n", entry_point);
+      return false;
+   }
+
+   if (!(decode_flags & PIPE_VIDEO_DECODE_FLAG_MB_INTRA_UNSIGNED)) {
+      debug_printf("[XvMCg3dvl] Cannot decode requested surface type at MC entry point. Signed intra is unsupported.\n");
+      return false;
+   }
+
+   return true;
+}
+
 static void
 sp_mpeg12_destroy(struct pipe_video_context *vpipe)
 {
@@ -105,6 +134,7 @@ static void
 sp_mpeg12_decode_macroblocks(struct pipe_video_context *vpipe,
                              struct pipe_surface *past,
                              struct pipe_surface *future,
+                             enum pipe_mpeg12_picture_type picture_type,
                              unsigned num_macroblocks,
                              struct pipe_macroblock *macroblocks,
                              struct pipe_fence_handle **fence)
@@ -326,6 +356,70 @@ sp_mpeg12_set_csc_matrix(struct pipe_video_context *vpipe, const float *mat)
    vl_compositor_set_csc_matrix(&ctx->compositor, mat);
 }
 
+static void
+sp_mpeg12_create_decoder_surface(struct pipe_video_context *vpipe,
+                                 struct pipe_surface **surface)
+{
+   assert(vpipe);
+   assert(surface);
+
+   struct pipe_resource template;
+   struct pipe_resource *vsfc_tex;
+
+   *surface = NULL;
+
+   memset(&template, 0, sizeof(struct pipe_resource));
+   template.target = PIPE_TEXTURE_2D;
+   template.format = (enum pipe_format)vpipe->get_param(vpipe, PIPE_CAP_DECODE_TARGET_PREFERRED_FORMAT);
+   template.last_level = 0;
+   if (vpipe->is_format_supported(vpipe, template.format,
+                                  PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET,
+                                  PIPE_TEXTURE_GEOM_NON_POWER_OF_TWO)) {
+      template.width0 = vpipe->width;
+      template.height0 = vpipe->height;
+   }
+   else {
+      assert(vpipe->is_format_supported(vpipe, template.format,
+                                        PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET,
+                                        PIPE_TEXTURE_GEOM_NON_SQUARE));
+      template.width0 = util_next_power_of_two(vpipe->width);
+      template.height0 = util_next_power_of_two(vpipe->height);
+   }
+   template.depth0 = 1;
+   template.usage = PIPE_USAGE_DEFAULT;
+   template.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET;
+   template.flags = 0;
+   vsfc_tex = vpipe->screen->resource_create(vpipe->screen, &template);
+   if (!vsfc_tex)
+      return;
+
+   *surface = vpipe->screen->get_tex_surface(vpipe->screen, vsfc_tex, 0, 0, 0,
+                                             PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET);
+   pipe_resource_reference(&vsfc_tex, NULL);
+}
+
+static void
+sp_mpeg12_destroy_decoder_surface(struct pipe_video_context *vpipe,
+                                  struct pipe_surface **surface)
+{
+   assert(vpipe);
+   assert(surface);
+
+   pipe_surface_reference(surface, NULL);
+}
+
+
+static void
+sp_mpeg12_query_decoder_surface(struct pipe_video_context *vpipe,
+                                struct pipe_surface *surface,
+                                enum pipe_video_surface_status *status)
+{
+   assert(vpipe);
+   assert(surface);
+   assert(status);
+   *status = PIPE_VIDEO_SURFACE_STATUS_FREE;
+}
+
 static bool
 init_pipe_state(struct sp_mpeg12_context *ctx)
 {
@@ -406,6 +500,8 @@ static struct pipe_video_context *
 sp_mpeg12_create(struct pipe_context *pipe, enum pipe_video_profile profile,
                  enum pipe_video_chroma_format chroma_format,
+                 enum pipe_video_entry_point entry_point,
+                 unsigned decode_flags,
                  unsigned width, unsigned height,
                  enum VL_MPEG12_MC_RENDERER_BUFFER_MODE bufmode,
                  enum VL_MPEG12_MC_RENDERER_EMPTY_BLOCK eb_handling,
@@ -425,6 +521,8 @@ sp_mpeg12_create(struct pipe_context *pipe, enum pipe_video_profile profile,
    ctx->base.chroma_format = chroma_format;
    ctx->base.width = width;
    ctx->base.height = height;
+   ctx->base.entry_point = entry_point;
+   ctx->base.decode_flags = decode_flags;
 
    ctx->base.screen = pipe->screen;
    ctx->base.destroy = sp_mpeg12_destroy;
@@ -445,6 +543,9 @@ sp_mpeg12_create(struct pipe_context *pipe, enum pipe_video_profile profile,
    ctx->base.set_picture_layers = sp_mpeg12_set_picture_layers;
    ctx->base.set_decode_target = sp_mpeg12_set_decode_target;
    ctx->base.set_csc_matrix = sp_mpeg12_set_csc_matrix;
+   ctx->base.create_decoder_surface = sp_mpeg12_create_decoder_surface;
+   ctx->base.destroy_decoder_surface = sp_mpeg12_destroy_decoder_surface;
+   ctx->base.query_decoder_surface = sp_mpeg12_query_decoder_surface;
 
    ctx->pipe = pipe;
    ctx->decode_format = decode_format;
@@ -457,7 +558,7 @@ sp_mpeg12_create(struct pipe_context *pipe, enum pipe_video_profile profile,
       return NULL;
    }
 
-   if (!vl_compositor_init(&ctx->compositor, ctx->pipe)) {
+   if (!vl_compositor_init(&ctx->compositor, ctx->pipe, decode_format)) {
       vl_mpeg12_mc_renderer_cleanup(&ctx->mc_renderer);
       ctx->pipe->destroy(ctx->pipe);
       FREE(ctx);
@@ -478,12 +579,18 @@ sp_mpeg12_create(struct pipe_context *pipe, enum pipe_video_profile profile,
 struct pipe_video_context *
 sp_video_create(struct pipe_screen *screen, enum pipe_video_profile profile,
                 enum pipe_video_chroma_format chroma_format,
+                enum pipe_video_entry_point entry_point,
+                unsigned decode_flags,
                 unsigned width, unsigned height, void *priv)
 {
    struct pipe_context *pipe;
 
    assert(screen);
    assert(width && height);
+
+   if (!sp_mpeg12_validate_codec_params(profile, chroma_format, entry_point,
+                                        decode_flags))
+      return NULL;
 
    pipe = screen->context_create(screen, NULL);
    if (!pipe)
@@ -493,6 +600,8 @@ sp_video_create(struct pipe_screen *screen, enum pipe_video_profile profile,
 
    /* TODO: Use XFER_NONE when implemented */
    return sp_video_create_ex(pipe, profile, chroma_format,
+                             entry_point,
+                             decode_flags,
                              width, height,
                              VL_MPEG12_MC_RENDERER_BUFFER_PICTURE,
                              VL_MPEG12_MC_RENDERER_EMPTY_BLOCK_XFER_ONE,
@@ -503,6 +612,8 @@ sp_video_create(struct pipe_screen *screen, enum pipe_video_profile profile,
 
 struct pipe_video_context *
 sp_video_create_ex(struct pipe_context *pipe, enum pipe_video_profile profile,
                    enum pipe_video_chroma_format chroma_format,
+                   enum pipe_video_entry_point entry_point,
+                   unsigned decode_flags,
                    unsigned width, unsigned height,
                    enum VL_MPEG12_MC_RENDERER_BUFFER_MODE bufmode,
                    enum VL_MPEG12_MC_RENDERER_EMPTY_BLOCK eb_handling,
@@ -511,11 +622,16 @@ sp_video_create_ex(struct pipe_context *pipe, enum pipe_video_profile profile,
 {
    assert(pipe);
    assert(width && height);
+
+   if (!sp_mpeg12_validate_codec_params(profile, chroma_format, entry_point,
+                                        decode_flags))
+      return NULL;
 
    switch (u_reduce_video_profile(profile)) {
       case PIPE_VIDEO_CODEC_MPEG12:
         return sp_mpeg12_create(pipe, profile, chroma_format,
+                                entry_point, decode_flags,
                                 width, height,
                                 bufmode, eb_handling, pot_buffers,
diff --git a/src/gallium/drivers/softpipe/sp_video_context.h b/src/gallium/drivers/softpipe/sp_video_context.h
index 0fe48d7..0657a44 100644
--- a/src/gallium/drivers/softpipe/sp_video_context.h
+++ b/src/gallium/drivers/softpipe/sp_video_context.h
@@ -53,13 +53,17 @@ struct sp_mpeg12_context
 struct pipe_video_context *
 sp_video_create(struct pipe_screen *screen, enum pipe_video_profile profile,
                 enum pipe_video_chroma_format chroma_format,
-                unsigned width, unsigned height, void *priv);
+                enum pipe_video_entry_point entry_point,
+                unsigned decode_flags, unsigned width, unsigned height,
+                void *priv);
 
 /* Other drivers can call this function in their pipe_video_context constructors and pass
    it an accelerated pipe_context along with suitable buffering modes, etc */
 struct pipe_video_context *
 sp_video_create_ex(struct pipe_context *pipe, enum pipe_video_profile profile,
                    enum pipe_video_chroma_format chroma_format,
+                   enum pipe_video_entry_point entry_point,
+                   unsigned decode_flags,
                    unsigned width, unsigned height,
                    enum VL_MPEG12_MC_RENDERER_BUFFER_MODE bufmode,
                    enum VL_MPEG12_MC_RENDERER_EMPTY_BLOCK eb_handling,
diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index 2831818..a663b59 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -487,6 +487,16 @@ enum pipe_video_profile
    PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH
 };
 
+enum pipe_video_entry_point
+{
+   PIPE_VIDEO_ENTRY_POINT_CSC,
+   PIPE_VIDEO_ENTRY_POINT_MC,
+   PIPE_VIDEO_ENTRY_POINT_IDCT,
+   PIPE_VIDEO_ENTRY_POINT_BS
+};
+
+#define PIPE_VIDEO_DECODE_FLAG_MB_INTRA_UNSIGNED 1
+
 #ifdef __cplusplus
 }
diff --git a/src/gallium/include/pipe/p_format.h b/src/gallium/include/pipe/p_format.h
index 5ca27b3..304e583 100644
--- a/src/gallium/include/pipe/p_format.h
+++ b/src/gallium/include/pipe/p_format.h
@@ -199,6 +199,7 @@ enum pipe_format {
    PIPE_FORMAT_VUYX = PIPE_FORMAT_B8G8R8X8_UNORM,
    PIPE_FORMAT_IA44 = 141,
    PIPE_FORMAT_AI44 = 142,
+   PIPE_FORMAT_G8B8_UNORM = 143,
 
    PIPE_FORMAT_COUNT
 };
diff --git a/src/gallium/include/pipe/p_screen.h b/src/gallium/include/pipe/p_screen.h
index 919d162..5a1bf61 100644
--- a/src/gallium/include/pipe/p_screen.h
+++ b/src/gallium/include/pipe/p_screen.h
@@ -93,6 +93,8 @@ struct pipe_screen {
    struct pipe_video_context * (*video_context_create)( struct pipe_screen *screen,
                                                         enum pipe_video_profile profile,
                                                         enum pipe_video_chroma_format chroma_format,
+                                                        enum pipe_video_entry_point entry_point,
+                                                        unsigned decode_flags,
                                                         unsigned width, unsigned height, void *priv );
 
    /**
diff --git a/src/gallium/include/pipe/p_video_context.h b/src/gallium/include/pipe/p_video_context.h
index 294dc46..94f9ce2 100644
--- a/src/gallium/include/pipe/p_video_context.h
+++ b/src/gallium/include/pipe/p_video_context.h
@@ -52,6 +52,8 @@ struct pipe_video_context
    struct pipe_screen *screen;
    enum pipe_video_profile profile;
    enum pipe_video_chroma_format chroma_format;
+   enum pipe_video_entry_point entry_point;
+   unsigned decode_flags;
    unsigned width;
    unsigned height;
 
@@ -73,6 +75,17 @@ struct pipe_video_context
                                       unsigned geom);
 
    void (*destroy)(struct pipe_video_context *vpipe);
+
+   /**
+    * Decoder surface creation/destruction/query
+    */
+   void (*create_decoder_surface)(struct pipe_video_context *vpipe,
+                                  struct pipe_surface **surface);
+   void (*destroy_decoder_surface)(struct pipe_video_context *vpipe,
+                                   struct pipe_surface **surface);
+   void (*query_decoder_surface)(struct pipe_video_context *vpipe,
+                                 struct pipe_surface *surface,
+                                 enum pipe_video_surface_status *status);
 
    /**
    * Picture decoding and displaying
@@ -85,6 +98,7 @@ struct pipe_video_context
    void (*decode_macroblocks)(struct pipe_video_context *vpipe,
                               struct pipe_surface *past,
                               struct pipe_surface *future,
+                              enum pipe_mpeg12_picture_type picture_type,
                               unsigned num_macroblocks,
                               struct pipe_macroblock *macroblocks,
                               struct pipe_fence_handle **fence);
diff --git a/src/gallium/include/pipe/p_video_state.h b/src/gallium/include/pipe/p_video_state.h
index 5eb9635..08a6707 100644
--- a/src/gallium/include/pipe/p_video_state.h
+++ b/src/gallium/include/pipe/p_video_state.h
@@ -38,6 +38,13 @@ extern "C" {
 #endif
 
+enum pipe_video_surface_status
+{
+   PIPE_VIDEO_SURFACE_STATUS_FREE,
+   PIPE_VIDEO_SURFACE_STATUS_RENDERING,
+   PIPE_VIDEO_SURFACE_STATUS_DISPLAYING
+};
+
 struct pipe_video_rect
 {
    unsigned x, y, w, h;
@@ -79,6 +86,12 @@ struct pipe_macroblock
    enum pipe_video_codec codec;
 };
 
+/* Same motion_field_select from xvmc.*/
+#define PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_FORWARD 0x1
+#define PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_FIRST_BACKWARD 0x2
+#define PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_SECOND_FORWARD 0x4
+#define PIPE_VIDEO_MOTION_VERTICAL_FIELD_SELECT_SECOND_BACKWARD 0x8
+
 struct pipe_mpeg12_macroblock
 {
    struct pipe_macroblock base;
@@ -88,6 +101,7 @@ struct pipe_mpeg12_macroblock
    enum pipe_mpeg12_macroblock_type mb_type;
    enum pipe_mpeg12_motion_type mo_type;
    enum pipe_mpeg12_dct_type dct_type;
+   unsigned motion_vertical_field_select;
    signed pmv[2][2][2];
    unsigned cbp;
    short *blocks;
diff --git a/src/gallium/state_trackers/xorg/xvmc/context.c b/src/gallium/state_trackers/xorg/xvmc/context.c
index 5e4af9e..272ae93 100644
--- a/src/gallium/state_trackers/xorg/xvmc/context.c
+++ b/src/gallium/state_trackers/xorg/xvmc/context.c
@@ -172,6 +172,28 @@ static enum pipe_video_chroma_format FormatToPipe(int xvmc_format)
    return -1;
 }
 
+static enum pipe_video_entry_point EntryPointToPipe(int xvmc_mc_type)
+{
+   if (xvmc_mc_type & XVMC_IDCT)
+      return PIPE_VIDEO_ENTRY_POINT_IDCT;
+   if ((xvmc_mc_type & XVMC_MOCOMP) == XVMC_MOCOMP)
+      return PIPE_VIDEO_ENTRY_POINT_MC;
+
+   assert(0);
+
+   return -1;
+}
+
+static unsigned SurfaceFlagsToPipe(int surface_flags)
+{
+   unsigned flags = 0;
+
+   if (surface_flags & XVMC_INTRA_UNSIGNED)
+      flags |= PIPE_VIDEO_DECODE_FLAG_MB_INTRA_UNSIGNED;
+
+   return flags;
+}
+
 PUBLIC
 Status XvMCCreateContext(Display *dpy, XvPortID port, int surface_type_id,
                          int width, int height, int flags, XvMCContext *context)
@@ -204,20 +226,6 @@ Status XvMCCreateContext(Display *dpy, XvPortID port, int surface_type_id,
    if (ret != Success || !found_port)
       return ret;
 
-   /* XXX: Current limits */
-   if (chroma_format != XVMC_CHROMA_FORMAT_420) {
-      XVMC_MSG(XVMC_ERR, "[XvMC] Cannot decode requested surface type. Unsupported chroma format.\n");
-      return BadImplementation;
-   }
-   if (mc_type != (XVMC_MOCOMP | XVMC_MPEG_2)) {
-      XVMC_MSG(XVMC_ERR, "[XvMC] Cannot decode requested surface type. Non-MPEG2/Mocomp acceleration unsupported.\n");
-      return BadImplementation;
-   }
-   if (!(surface_flags & XVMC_INTRA_UNSIGNED)) {
-      XVMC_MSG(XVMC_ERR, "[XvMC] Cannot decode requested surface type. Signed intra unsupported.\n");
-      return BadImplementation;
-   }
-
    context_priv = CALLOC(1, sizeof(XvMCContextPrivate));
    if (!context_priv)
       return BadAlloc;
@@ -232,7 +240,8 @@ Status XvMCCreateContext(Display *dpy, XvPortID port, int surface_type_id,
    }
 
    vctx = vl_video_create(vscreen, ProfileToPipe(mc_type),
-                          FormatToPipe(chroma_format), width, height);
+                          FormatToPipe(chroma_format), EntryPointToPipe(mc_type),
+                          SurfaceFlagsToPipe(surface_flags), width, height);
 
    if (!vctx) {
       XVMC_MSG(XVMC_ERR, "[XvMC] Could not create VL context.\n");
diff --git a/src/gallium/state_trackers/xorg/xvmc/surface.c b/src/gallium/state_trackers/xorg/xvmc/surface.c
index 0decc45..2c16744 100644
--- a/src/gallium/state_trackers/xorg/xvmc/surface.c
+++ b/src/gallium/state_trackers/xorg/xvmc/surface.c
@@ -181,6 +181,7 @@ MacroBlocksToPipe(struct pipe_screen *screen,
             pipe_macroblocks->pmv[j][k][l] = xvmc_mb->PMV[j][k][l];
 
       pipe_macroblocks->cbp = xvmc_mb->coded_block_pattern;
+      pipe_macroblocks->motion_vertical_field_select = xvmc_mb->motion_vertical_field_select;
       pipe_macroblocks->blocks = xvmc_blocks->blocks + xvmc_mb->index * BLOCK_SIZE_SAMPLES;
 
       ++pipe_macroblocks;
@@ -194,8 +195,6 @@ Status XvMCCreateSurface(Display *dpy, XvMCContext *context, XvMCSurface *surfac
    XvMCContextPrivate *context_priv;
    struct pipe_video_context *vpipe;
    XvMCSurfacePrivate *surface_priv;
-   struct pipe_resource template;
-   struct pipe_resource *vsfc_tex;
    struct pipe_surface *vsfc;
 
    XVMC_MSG(XVMC_TRACE, "[XvMC] Creating surface %p.\n", surface);
@@ -214,36 +213,7 @@ Status XvMCCreateSurface(Display *dpy, XvMCContext *context, XvMCSurface *surfac
    if (!surface_priv)
       return BadAlloc;
 
-   memset(&template, 0, sizeof(struct pipe_resource));
-   template.target = PIPE_TEXTURE_2D;
-   template.format = (enum pipe_format)vpipe->get_param(vpipe, PIPE_CAP_DECODE_TARGET_PREFERRED_FORMAT);
-   template.last_level = 0;
-   if (vpipe->is_format_supported(vpipe, template.format,
-                                  PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET,
-                                  PIPE_TEXTURE_GEOM_NON_POWER_OF_TWO)) {
-      template.width0 = context->width;
-      template.height0 = context->height;
-   }
-   else {
-      assert(vpipe->is_format_supported(vpipe, template.format,
-                                        PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET,
-                                        PIPE_TEXTURE_GEOM_NON_SQUARE));
-      template.width0 = util_next_power_of_two(context->width);
-      template.height0 = util_next_power_of_two(context->height);
-   }
-   template.depth0 = 1;
-   template.usage = PIPE_USAGE_DEFAULT;
-   template.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET;
-   template.flags = 0;
-   vsfc_tex = vpipe->screen->resource_create(vpipe->screen, &template);
-   if (!vsfc_tex) {
-      FREE(surface_priv);
-      return BadAlloc;
-   }
-
-   vsfc = vpipe->screen->get_tex_surface(vpipe->screen, vsfc_tex, 0, 0, 0,
-                                         PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET);
-   pipe_resource_reference(&vsfc_tex, NULL);
+   vpipe->create_decoder_surface(vpipe, &vsfc);
    if (!vsfc) {
       FREE(surface_priv);
       return BadAlloc;
@@ -331,7 +301,7 @@ Status XvMCRenderSurface(Display *dpy, XvMCContext *context, unsigned int pictur
                      num_macroblocks, pipe_macroblocks);
 
    vpipe->set_decode_target(vpipe, t_vsfc);
-   vpipe->decode_macroblocks(vpipe, p_vsfc, f_vsfc, num_macroblocks,
+   vpipe->decode_macroblocks(vpipe, p_vsfc, f_vsfc, PictureToPipe(picture_structure), num_macroblocks,
                              &pipe_macroblocks->base, &target_surface_priv->render_fence);
 
    XVMC_MSG(XVMC_TRACE, "[XvMC] Submitted surface %p for rendering.\n", target_surface);
@@ -353,12 +323,22 @@ Status XvMCFlushSurface(Display *dpy, XvMCSurface *surface)
 PUBLIC
 Status XvMCSyncSurface(Display *dpy, XvMCSurface *surface)
 {
+   Status ret;
+   int status;
+
    assert(dpy);
 
    if (!surface)
       return XvMCBadSurface;
 
-   return Success;
+   for (;;) {
+      ret = XvMCGetSurfaceStatus(dpy, surface, &status);
+      if (ret || ((status & XVMC_RENDERING) == 0))
+         break;
+      usec_sleep(1000); /* 1ms (may be 20ms on linux) */
+   }
+
+   return ret;
 }
 
 PUBLIC
@@ -453,14 +433,40 @@ Status XvMCPutSurface(Display *dpy, XvMCSurface *surface, Drawable drawable,
 PUBLIC
 Status XvMCGetSurfaceStatus(Display *dpy, XvMCSurface *surface, int *status)
 {
+   struct pipe_video_context *vpipe;
+   XvMCSurfacePrivate *surface_priv;
+   XvMCContextPrivate *context_priv;
+   XvMCContext *context;
+   enum pipe_video_surface_status vs_status;
+
    assert(dpy);
 
-   if (!surface)
+   if (!surface || !surface->privData)
       return XvMCBadSurface;
 
    assert(status);
 
-   *status = 0;
+   surface_priv = surface->privData;
+   context = surface_priv->context;
+   context_priv = context->privData;
+   vpipe = context_priv->vctx->vpipe;
+
+   vpipe->query_decoder_surface(vpipe, surface_priv->pipe_vsfc,
+                                &vs_status);
+   switch (vs_status) {
+   case PIPE_VIDEO_SURFACE_STATUS_FREE:
+      *status = 0;
+      break;
+   case PIPE_VIDEO_SURFACE_STATUS_RENDERING:
+      *status = XVMC_RENDERING;
+      break;
+   case PIPE_VIDEO_SURFACE_STATUS_DISPLAYING:
+      *status = XVMC_DISPLAYING;
+      break;
+   default:
+      *status = XVMC_RENDERING;
+      break;
+   }
 
    return Success;
 }
@@ -468,7 +474,10 @@ Status XvMCGetSurfaceStatus(Display *dpy, XvMCSurface *surface, int *status)
 PUBLIC
 Status XvMCDestroySurface(Display *dpy, XvMCSurface *surface)
 {
+   struct pipe_video_context *vpipe;
    XvMCSurfacePrivate *surface_priv;
+   XvMCContextPrivate *context_priv;
+   XvMCContext *context;
 
    XVMC_MSG(XVMC_TRACE, "[XvMC] Destroying surface %p.\n", surface);
 
@@ -478,7 +487,11 @@ Status XvMCDestroySurface(Display *dpy, XvMCSurface *surface)
       return XvMCBadSurface;
 
    surface_priv = surface->privData;
-   pipe_surface_reference(&surface_priv->pipe_vsfc, NULL);
+   context = surface_priv->context;
+   context_priv = context->privData;
+   vpipe = context_priv->vctx->vpipe;
+
+   vpipe->destroy_decoder_surface(vpipe, &surface_priv->pipe_vsfc);
 
    FREE(surface_priv);
    surface->privData = NULL;
 
diff --git a/src/gallium/winsys/g3dvl/dri/dri_winsys.c b/src/gallium/winsys/g3dvl/dri/dri_winsys.c
index 0663184..8ac7fdc 100644
--- a/src/gallium/winsys/g3dvl/dri/dri_winsys.c
+++ b/src/gallium/winsys/g3dvl/dri/dri_winsys.c
@@ -240,6 +240,8 @@ struct vl_context*
 vl_video_create(struct vl_screen *vscreen,
                 enum pipe_video_profile profile,
                 enum pipe_video_chroma_format chroma_format,
+                enum pipe_video_entry_point entry_point,
+                unsigned decode_flags,
                 unsigned width, unsigned height)
 {
    struct vl_dri_screen *vl_dri_scrn = (struct vl_dri_screen*)vscreen;
@@ -258,6 +260,8 @@ vl_video_create(struct vl_screen *vscreen,
    vl_dri_ctx->base.vpipe = vscreen->pscreen->video_context_create(vscreen->pscreen,
                                                                    profile, chroma_format,
+                                                                   entry_point,
+                                                                   decode_flags,
                                                                    width, height,
                                                                    vl_dri_ctx);
diff --git a/src/gallium/winsys/g3dvl/vl_winsys.h b/src/gallium/winsys/g3dvl/vl_winsys.h
index 3814786..dbdff71 100644
--- a/src/gallium/winsys/g3dvl/vl_winsys.h
+++ b/src/gallium/winsys/g3dvl/vl_winsys.h
@@ -56,6 +56,8 @@ struct vl_context*
 vl_video_create(struct vl_screen *vscreen,
                 enum pipe_video_profile profile,
                 enum pipe_video_chroma_format chroma_format,
+                enum pipe_video_entry_point entry_point,
+                unsigned decode_flags,
                 unsigned width, unsigned height);
 
 void vl_video_destroy(struct vl_context *vctx);
diff --git a/src/gallium/winsys/g3dvl/xlib/xsp_winsys.c b/src/gallium/winsys/g3dvl/xlib/xsp_winsys.c
index 0a7f324..c960fdb 100644
--- a/src/gallium/winsys/g3dvl/xlib/xsp_winsys.c
+++ b/src/gallium/winsys/g3dvl/xlib/xsp_winsys.c
@@ -164,6 +164,8 @@ struct vl_context*
 vl_video_create(struct vl_screen *vscreen,
                 enum pipe_video_profile profile,
                 enum pipe_video_chroma_format chroma_format,
+                enum pipe_video_entry_point entry_point,
+                unsigned decode_flags,
                 unsigned width, unsigned height)
 {
    struct pipe_video_context *vpipe;
@@ -176,6 +178,8 @@ vl_video_create(struct vl_screen *vscreen,
    vpipe = vscreen->pscreen->video_context_create(vscreen->pscreen,
                                                   profile, chroma_format,
+                                                  entry_point,
+                                                  decode_flags,
                                                   width, height, NULL);
    if (!vpipe)
       return NULL;