thr3ads.net - Nouveau - [Nouveau] Synchronization mostly missing? [Dec 2009]

Luca Barbieri

2009-Dec-28 03:41 UTC

[Nouveau] Synchronization mostly missing?

It seems that Nouveau is assuming that once the FIFO pointer is past a
command, that command has finished executing, and all the buffers it
used are no longer needed.

However, this seems to be false at least on G71.
In particular, the card may not have even finished reading the input
vertex buffers when the pushbuffer "fence" triggers.
While Mesa does not reuse the buffer object itself, the current
allocator tends to return memory that has just been freed, resulting
in the buffer actually being reused.
Thus Mesa will overwrite the vertices before the GPU has used them.

This results in all kinds of artifacts, such as vertices going to
infinity, and random polygons appearing.
This can be seen in progs/demos/engine, progs/demos/dinoshade,
Blender, Extreme Tux Racer and probably any non-trivial OpenGL
software.

The problem can be significantly reduced by just adding a waiting loop
at the end of draw_arrays and draw_elements, or by synchronizing
drawing with the function below, called instead of pipe->flush in
nv40_vbo.c.
I think the remaining artifacts may be due to missing 2D engine
synchronization, but I'm not sure how that works.
Note that this causes the CPU to wait for rendering, which is not the
correct solution.

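/* Block the CPU until the GPU reports that rendering has finished. */
static void nv40_sync(struct nv40_context *nv40)
{
	/* Clear the sync notifier before requesting a new notification. */
	nouveau_notifier_reset(nv40->screen->sync, 0);

	/* Fence registers as used by the nVidia driver; left disabled
	 * because enabling them causes a GPU lockup here. */
//	BEGIN_RING(curie, 0x1d6c, 1);
//	OUT_RING(0x5c0);

//	static int value = 0x23;
//	BEGIN_RING(curie, 0x1d70, 1);
//	OUT_RING(value++);

	/* Request a notification; the NOP that follows seems to be required. */
	BEGIN_RING(curie, NV40TCL_NOTIFY, 1);
	OUT_RING(0);

	BEGIN_RING(curie, NV40TCL_NOP, 1);
	OUT_RING(0);

	/* Submit the pushbuffer, then wait for the notifier to signal. */
	FIRE_RING(NULL);

	nouveau_notifier_wait_status(nv40->screen->sync, 0, 0, 0);
}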

It seems that NV40TCL_NOTIFY (which must be followed by a NOP for some
reason) triggers a notification of rendering completion.
Furthermore, the card will probably write the value set with 0x1d70
somewhere, while the purpose of 0x1d6c is unknown.
The nVidia driver uses 0x1d70/0x1d6c frequently: 0x1d70 carries a
sequence number and 0x1d6c is always set to 0x5c0, whereas
NV40TCL_NOTIFY seems to be inserted on demand.
On my machine, setting 0x1d6c/0x1d70 like the nVidia driver does
causes a GPU lockup. That is probably because the location where the
GPU is supposed to put the value has not been set up correctly.

So it seems that the current model is wrong, and the current fence
should only be used to determine whether the pushbuffer itself can be
reused.
It seems that, after figuring out where the GPU writes the value and
how to use the mechanism properly, this should be used by the kernel
driver as the bo->sync_obj implementation.
This will delay destruction of the buffers, and thus prevent
reallocation of them, and artifacts, without synchronizing rendering.
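
As a rough sketch of that idea (not the actual bo->sync_obj interface, whose details are not covered here), freed buffers could be parked on a pending list tagged with the sequence number of the last command that used them, and only handed back to the allocator once the GPU reports that sequence number as completed. The struct and function names below are made up for illustration.

```c
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical bookkeeping: a freed block plus the sequence number of
 * the last GPU command that referenced it. */
struct pending_free {
	int block;
	uint32_t seq;
};

struct deferred_pool {
	struct pending_free pending[MAX_PENDING];
	int npending;	/* blocks freed by the CPU but still GPU-visible */
	int nreleased;	/* blocks actually returned to the allocator */
};

/* Wraparound-tolerant "completed is at or past seq" comparison
 * (relies on the usual two's-complement conversion). */
static int seq_passed(uint32_t completed, uint32_t seq)
{
	return (int32_t)(completed - seq) >= 0;
}

/* Queue a block for release instead of freeing it right away. */
static int defer_free(struct deferred_pool *p, int block, uint32_t seq)
{
	if (p->npending == MAX_PENDING)
		return -1;	/* a caller would have to wait here */
	p->pending[p->npending].block = block;
	p->pending[p->npending].seq = seq;
	p->npending++;
	return 0;
}

/* Release every block whose sequence number the GPU has reached.
 * Returns the number of blocks released by this call. */
static int retire(struct deferred_pool *p, uint32_t completed)
{
	int kept = 0, freed = 0;

	for (int i = 0; i < p->npending; i++) {
		if (seq_passed(completed, p->pending[i].seq))
			freed++;
		else
			p->pending[kept++] = p->pending[i];
	}
	p->npending = kept;
	p->nreleased += freed;
	return freed;
}
```

The CPU never blocks in this scheme unless the pending list fills up; it only postpones reuse, which matches the goal of avoiding artifacts without synchronizing rendering.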

I'm not sure why this hasn't been noticed before though.
Is everyone getting randomly misrendered OpenGL or is my machine
somehow more prone to reusing buffers?

What do you think? Is the analysis correct?

Francisco Jerez

2009-Dec-28 04:41 UTC

[Nouveau] Synchronization mostly missing?

Hi,

Luca Barbieri <luca at luca-barbieri.com> writes:
> It seems that Nouveau is assuming that once the FIFO pointer is past a
> command, that command has finished executing, and all the buffers it
> used are no longer needed.
>
> However, this seems to be false at least on G71.
> In particular, the card may not have even finished reading the input
> vertex buffers when the pushbuffer "fence" triggers.
> While Mesa does not reuse the buffer object itself, the current
> allocator tends to return memory that has just been freed, resulting
> in the buffer actually being reused.
> Thus Mesa will overwrite the vertices before the GPU has used them.
>
> This results in all kinds of artifacts, such as vertices going to
> infinity, and random polygons appearing.
> This can be seen in progs/demos/engine, progs/demos/dinoshade,
> Blender, Extreme Tux Racer and probably any non-trivial OpenGL
> software.
>
Can you reproduce this with your vertex buffers in VRAM instead of GART?
(to rule out that it's a fencing issue).

Luca Barbieri

2009-Dec-28 05:50 UTC

[Nouveau] Synchronization mostly missing?

I figured out the registers.

There is a fence/sync mechanism which apparently triggers after
rendering is finished.
There are two ways to use it, but they trigger at the same time
(spinning in a loop on the CPU checking them, they trigger at the same
iteration or in two successive iterations).

The first is the "sync" notifier, which involves a notifier object set
at NV40TCL_DMA_NOTIFY.
When NV40TCL_NOTIFY with argument 0, followed by NV40TCL_NOP with
argument 0, is inserted in the ring, the notifier object will be
notified when rendering is finished.
fbcon uses this to sync rendering.
Currently the Mesa driver sets an object but does not use it.
The renouveau traces use this mechanism only in the
EXT_framebuffer_object tests.
It's not clear what the purpose of the NOP is, but it seems necessary.

The second is the fence mechanism, which involves an object set at
NV40TCL_DMA_FENCE.
When register 0x1d70 is set, the value set there will be written to
the object at the offset programmed in 0x1d6c.
The offset in 0x1d6c must be 16-byte aligned, but the GPU seems to
only write 4 bytes with the sequence number.
Nouveau does not use this currently, and sets NV40TCL_DMA_FENCE to 0.
The nVidia driver uses this often. It allocates a 4KB object and asks
the GPU to put the sequence number always at offset 0x5c0. Why it does
this rather than allocating a 16 byte object and using offset 0 is
unknown.
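
The CPU side of such a fence can be sketched as follows. This is a pure simulation under stated assumptions: sim_gpu_write_fence() stands in for what the GPU is described as doing when 0x1d70 is written, the fence object is an ordinary byte array, and the function names are invented here. Only the 4KB size, the 0x5c0 offset, the 16-byte alignment rule and the 4-byte write come from the observations above.

```c
#include <stdint.h>
#include <string.h>

#define FENCE_OBJ_SIZE 4096	/* size the nVidia driver allocates */
#define FENCE_OFFSET   0x5c0	/* offset it programs into 0x1d6c */

/* Simulated GPU side: store the 4-byte sequence number at the given
 * offset. Returns -1 if the offset is not 16-byte aligned or does not
 * fit inside the object. */
static int sim_gpu_write_fence(uint8_t *obj, uint32_t offset, uint32_t seq)
{
	if (offset % 16 != 0 || offset + sizeof(seq) > FENCE_OBJ_SIZE)
		return -1;
	memcpy(obj + offset, &seq, sizeof(seq));	/* only 4 bytes change */
	return 0;
}

/* CPU side: read back the last sequence number the GPU stored.
 * A real driver would need a volatile or otherwise uncached read. */
static uint32_t read_fence(const uint8_t *obj, uint32_t offset)
{
	uint32_t seq;

	memcpy(&seq, obj + offset, sizeof(seq));
	return seq;
}

/* True once the stored sequence number has reached `wanted`; the
 * signed difference keeps the test valid across 32-bit wraparound. */
static int fence_signalled(const uint8_t *obj, uint32_t offset,
			   uint32_t wanted)
{
	return (int32_t)(read_fence(obj, offset) - wanted) >= 0;
}
```

A caller would record the sequence number it emitted with a draw and poll fence_signalled() before reusing any buffer that draw referenced.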

IMHO the fence mechanism should be implemented in the kernel along
with the current FIFO fencing, and should protect the relocated buffer
object.

Krzysztof Smiechowicz

2009-Dec-28 07:27 UTC

[Nouveau] Synchronization mostly missing?

Luca Barbieri writes:
> I'm not sure why this hasn't been noticed before though.
> Is everyone getting randomly misrendered OpenGL or is my machine
> somehow more prone to reusing buffers?
I reported a similar problem about 2 weeks ago. It first became apparent
with NV40 but I also confirmed it with NV30 - in both cases it was
visible in the morph3d demo. As long as nothing changes in memory
allocation, everything is fine. If I so much as move a window (which
causes some allocations in the system), vertices become damaged.


Some information from those previous emails:

""

I see this problem in the morph3d demo. What it does is: for each frame
create a call list and then call it 4 times.

ADDR    VRAM OFFSET
A       X
B       Y
C       X

A, B, C are the memory offsets of the 32 KB buffers created for vertex
data when call lists are compiled. X, Y are the VRAM offsets
(bo.mem.mm_node.start).

The first buffer is created (X,A). When it gets full (after around 3
frames) a second buffer is created (Y,B). Then the first one is freed.
When the second buffer is full, a third is created (X,C) - here the
problem starts: according to my observations, the card seems to read
vertices not from address C but from address A, as if it somehow
remembered the initial address binding.

Other observations:
- the data during execution of GL commands actually seems to be put into
location C - when I switch to the software path, I can track down that
it reads data from location C - rendering is done correctly in the
software path
- when I comment out freeing of the memory manager node
(bo.mem.mm_node), so that the third buffer is Z,C (paired with a not yet
used VRAM offset), hardware rendering behaves correctly - but this will
make the card "run out" of memory as no memory manager nodes will be
deallocated
- when I replace the glCallList calls with the actual rendering code and
disable invocation of glNewList/glEndList, hardware rendering also
behaves correctly
""

Best regards,
Krzysztof
