Pekka Paalanen
2008-Nov-11 22:26 UTC
[Nouveau] Memory corruption on Gallium window resize, diagnosed?
Hi, I've been playing with nouveau/mesa branch gallium-0.1, trying to get trivial/tri working on nv20 (with nv10 code). When ever I resize the window, it ends up in an assert failure: #0 0xf790744f in _debug_assert_fail (expr=0xf791908f "0", file=0xf7919050 "nv20_state_emit.c", line=139, function=0xf7919034 "nv20_state_emit_framebuffer") at p_debug.c:335 #1 0xf76fe723 in nv20_state_emit_framebuffer (nv20=0x805ef68) at nv20_state_emit.c:139 #2 0xf76fef2e in nv20_emit_hw_state (nv20=0x805ef68) at nv20_state_emit.c:255 #3 0xf76ff70b in nv20_draw_elements (pipe=0x805ef68, indexBuffer=0x0, indexSize=0, prim=4, start=0, count=3) at nv20_vbo.c:20 #4 0xf76ff922 in nv20_draw_arrays (pipe=0x805ef68, prim=4, start=0, count=3) at nv20_vbo.c:73 #5 0xf7743ac8 in st_draw_vbo (ctx=0x8074d18, arrays=0x80ade38, prims=0x80ac994, nr_prims=1, ib=0x0, min_index=0, max_index=2) at state_tracker/st_draw.c:634 #6 0xf782c140 in vbo_exec_vtx_flush (exec=0x80ac870) at vbo/vbo_exec_draw.c:248 #7 0xf782a858 in vbo_exec_FlushVertices (ctx=0x8074d18, flags=1) at vbo/vbo_exec_api.c:752 #8 0xf7761f39 in _mesa_Flush () at main/context.c:1815 #9 0xf7e00ae6 in glFlush () at ../../../src/mesa/glapi/glapitemp.h:1170 #10 0x08048dba in Draw () at tri.c:83 #11 0xf7ecdc54 in processWindowWorkList (window=0x8051200) at glut_event.c:1302 #12 0xf7ecdd3a in __glutProcessWindowWorkLists () at glut_event.c:1354 #13 0xf7ecddb1 in glutMainLoop () at glut_event.c:1375 #14 0x08048fd8 in main (argc=1, argv=0xfff760e4) at tri.c:132 The struct pipe_surface used there contains garbage: (gdb) print *fb->cbufs[0] $4 = {buffer = 0xf7d901a0, format = -136773216, status = 1, clear_value = 0, width = 189, height = 181, block = {size = 4, width = 1, height = 1}, nblocksx = 189, nblocksy = 181, stride = 768, layout = 0, offset = 0, refcount = 0, usage = 12, winsys = 0x0, texture = 0x0, face = 0, level = 0, zslice = 88} I tried to track what writes the insane value to the format field, and always came up with malloc and free, which didn't make any sense. Then I ran it with valgrind: ==5322== Invalid read of size 4 ==5322== at 0x7B6887C: pipe_surface_reference (p_inlines.h:93) ==5322== by 0x7B6881F: copy_framebuffer_state (cso_context.c:781) ==5322== by 0x7B68962: cso_set_framebuffer (cso_context.c:791) ==5322== by 0x7AB5441: update_framebuffer_state (st_atom_framebuffer.c:147) ==5322== by 0x7AB3E41: st_validate_state (st_atom.c:188) ==5322== by 0x7ABDA2E: st_clear (st_cb_clear.c:517) ==5322== by 0x7B0BED5: _mesa_Clear (clear.c:177) ==5322== by 0x47F25FA: glClear (glapitemp.h:1100) ==5322== by 0x8048CE9: Draw (tri.c:72) ==5322== by 0x46F2C53: processWindowWorkList (glut_event.c:1302) ==5322== by 0x46F2D39: __glutProcessWindowWorkLists (glut_event.c:1354) ==5322== by 0x46F2DB0: glutMainLoop (glut_event.c:1375) ==5322== Address 0x7638a0c is 68 bytes inside a block of size 84 free'd ==5322== at 0x46D99EC: free (vg_replace_malloc.c:323) ==5322== by 0x797E17D: nv20_miptree_surface_release (nv20_miptree.c:144) ==5322== by 0x7AC0C69: pipe_surface_reference (p_inlines.h:95) ==5322== by 0x7AC07D3: st_renderbuffer_alloc_storage (st_cb_fbo.c:97) ==5322== by 0x7A07844: _mesa_resize_framebuffer (framebuffer.c:292) ==5322== by 0x79C2F65: st_resize_framebuffer (st_framebuffer.c:147) ==5322== by 0x7950CB3: nouveau_context_bind (nouveau_context.c:324) ==5322== by 0x794A6E4: DoBindContext (dri_util.c:380) ==5322== by 0x794A78E: driBindContext (dri_util.c:413) ==5322== by 0x47C14B2: BindContextWrapper (glxext.c:1621) ==5322== by 0x47C15F2: MakeContextCurrent (glxext.c:1675) ==5322== by 0x47C1927: glXMakeCurrent (glxext.c:1797) ==5322== and a couple more invalid reads, and a segfault. My theory is: when the window resize occurs, somewhere along the path st_renderbuffer_alloc_storage(GLcontext * ctx, struct gl_renderbuffer *rb, GLenum internalFormat, GLuint width, GLuint height) { struct pipe_context *pipe = ctx->st->pipe; struct st_renderbuffer *strb = st_renderbuffer(rb); struct pipe_texture template; unsigned surface_usage; /* Free the old surface and texture */ pipe_surface_reference( &strb->surface, NULL ); pipe_texture_reference( &strb->texture, NULL ); is executed, which AFAICT frees the pipe_surface. My backtraces show it really is freed, since the pipe_surface::refcount = 1 and decreases to zero. Then the new pipe_surface is allocated and things are looking good so far. This is seen in the "freed" part of the valgrind stack trace. Then we get to the first glClear after resize, and hit the CSO cache. The cache still has a pointer to the already freed pipe_surface (well, the memory address is the same), and frees it again. See the valgring stack trace, the invalid read happens in three places pipe_surface_reference(struct pipe_surface **ptr, struct pipe_surface *surf) { /* bump the refcount first */ if (surf) surf->refcount++; if (*ptr) { /* There are currently two sorts of surfaces... This needs to be * fixed so that all surfaces are views into a texture. */ #-> if ((*ptr)->texture) { struct pipe_screen *screen = (*ptr)->texture->screen; screen->tex_surface_release( screen, ptr ); } else { #-> struct pipe_winsys *winsys = (*ptr)->winsys; #-> winsys->surface_release(winsys, ptr); } and then segfaults. So there's a free too much, or something forgets to increment the reference count. Or something else. What do you say, how do we fix it, is this the bug? Note, that nouveau/mesa gallium-0.1 has been last synced to mesa/mesa Sept 18th, so maybe you have already fixed it? Or not? I looked at the diffs and logs, but didn't see a fix. Thanks. -- Pekka Paalanen http://www.iki.fi/pq/
José Fonseca
2008-Nov-11 23:34 UTC
[Nouveau] [Mesa3d-dev] Memory corruption on Gallium window resize, diagnosed?
On Wed, 2008-11-12 at 00:26 +0200, Pekka Paalanen wrote:> Hi, > > I've been playing with nouveau/mesa branch gallium-0.1, trying to get > trivial/tri working on nv20 (with nv10 code). When ever I resize the window, > it ends up in an assert failure: > > #0 0xf790744f in _debug_assert_fail (expr=0xf791908f "0", file=0xf7919050 "nv20_state_emit.c", line=139, > function=0xf7919034 "nv20_state_emit_framebuffer") at p_debug.c:335 > #1 0xf76fe723 in nv20_state_emit_framebuffer (nv20=0x805ef68) at nv20_state_emit.c:139 > #2 0xf76fef2e in nv20_emit_hw_state (nv20=0x805ef68) at nv20_state_emit.c:255 > #3 0xf76ff70b in nv20_draw_elements (pipe=0x805ef68, indexBuffer=0x0, indexSize=0, prim=4, start=0, count=3) at nv20_vbo.c:20 > #4 0xf76ff922 in nv20_draw_arrays (pipe=0x805ef68, prim=4, start=0, count=3) at nv20_vbo.c:73 > #5 0xf7743ac8 in st_draw_vbo (ctx=0x8074d18, arrays=0x80ade38, prims=0x80ac994, nr_prims=1, ib=0x0, min_index=0, max_index=2) > at state_tracker/st_draw.c:634 > #6 0xf782c140 in vbo_exec_vtx_flush (exec=0x80ac870) at vbo/vbo_exec_draw.c:248 > #7 0xf782a858 in vbo_exec_FlushVertices (ctx=0x8074d18, flags=1) at vbo/vbo_exec_api.c:752 > #8 0xf7761f39 in _mesa_Flush () at main/context.c:1815 > #9 0xf7e00ae6 in glFlush () at ../../../src/mesa/glapi/glapitemp.h:1170 > #10 0x08048dba in Draw () at tri.c:83 > #11 0xf7ecdc54 in processWindowWorkList (window=0x8051200) at glut_event.c:1302 > #12 0xf7ecdd3a in __glutProcessWindowWorkLists () at glut_event.c:1354 > #13 0xf7ecddb1 in glutMainLoop () at glut_event.c:1375 > #14 0x08048fd8 in main (argc=1, argv=0xfff760e4) at tri.c:132 > > The struct pipe_surface used there contains garbage: > > (gdb) print *fb->cbufs[0] > $4 = {buffer = 0xf7d901a0, format = -136773216, status = 1, clear_value = 0, width = 189, height = 181, block = {size = 4, width = 1, > height = 1}, nblocksx = 189, nblocksy = 181, stride = 768, layout = 0, offset = 0, refcount = 0, usage = 12, winsys = 0x0, > texture = 0x0, face = 0, level = 0, zslice = 88} > > I tried to track what writes the insane value to the format field, > and always came up with malloc and free, which didn't make any sense. > > Then I ran it with valgrind: > > ==5322== Invalid read of size 4 > ==5322== at 0x7B6887C: pipe_surface_reference (p_inlines.h:93) > ==5322== by 0x7B6881F: copy_framebuffer_state (cso_context.c:781) > ==5322== by 0x7B68962: cso_set_framebuffer (cso_context.c:791) > ==5322== by 0x7AB5441: update_framebuffer_state (st_atom_framebuffer.c:147) > ==5322== by 0x7AB3E41: st_validate_state (st_atom.c:188) > ==5322== by 0x7ABDA2E: st_clear (st_cb_clear.c:517) > ==5322== by 0x7B0BED5: _mesa_Clear (clear.c:177) > ==5322== by 0x47F25FA: glClear (glapitemp.h:1100) > ==5322== by 0x8048CE9: Draw (tri.c:72) > ==5322== by 0x46F2C53: processWindowWorkList (glut_event.c:1302) > ==5322== by 0x46F2D39: __glutProcessWindowWorkLists (glut_event.c:1354) > ==5322== by 0x46F2DB0: glutMainLoop (glut_event.c:1375) > ==5322== Address 0x7638a0c is 68 bytes inside a block of size 84 free'd > ==5322== at 0x46D99EC: free (vg_replace_malloc.c:323) > ==5322== by 0x797E17D: nv20_miptree_surface_release (nv20_miptree.c:144) > ==5322== by 0x7AC0C69: pipe_surface_reference (p_inlines.h:95) > ==5322== by 0x7AC07D3: st_renderbuffer_alloc_storage (st_cb_fbo.c:97) > ==5322== by 0x7A07844: _mesa_resize_framebuffer (framebuffer.c:292) > ==5322== by 0x79C2F65: st_resize_framebuffer (st_framebuffer.c:147) > ==5322== by 0x7950CB3: nouveau_context_bind (nouveau_context.c:324) > ==5322== by 0x794A6E4: DoBindContext (dri_util.c:380) > ==5322== by 0x794A78E: driBindContext (dri_util.c:413) > ==5322== by 0x47C14B2: BindContextWrapper (glxext.c:1621) > ==5322== by 0x47C15F2: MakeContextCurrent (glxext.c:1675) > ==5322== by 0x47C1927: glXMakeCurrent (glxext.c:1797) > ==5322== > > and a couple more invalid reads, and a segfault. > My theory is: when the window resize occurs, somewhere along the path > > st_renderbuffer_alloc_storage(GLcontext * ctx, struct gl_renderbuffer *rb, > GLenum internalFormat, > GLuint width, GLuint height) > { > struct pipe_context *pipe = ctx->st->pipe; > struct st_renderbuffer *strb = st_renderbuffer(rb); > struct pipe_texture template; > unsigned surface_usage; > > /* Free the old surface and texture > */ > pipe_surface_reference( &strb->surface, NULL ); > pipe_texture_reference( &strb->texture, NULL ); > > is executed, which AFAICT frees the pipe_surface. My backtraces > show it really is freed, since the pipe_surface::refcount = 1 > and decreases to zero. Then the new pipe_surface is allocated > and things are looking good so far. This is seen in the > "freed" part of the valgrind stack trace. > > Then we get to the first glClear after resize, and hit the CSO > cache. The cache still has a pointer to the already freed pipe_surface > (well, the memory address is the same), and frees it again. See the > valgring stack trace, the invalid read happens in three places > > pipe_surface_reference(struct pipe_surface **ptr, struct pipe_surface *surf) > { > /* bump the refcount first */ > if (surf) > surf->refcount++; > > if (*ptr) { > > /* There are currently two sorts of surfaces... This needs to be > * fixed so that all surfaces are views into a texture. > */ > #-> if ((*ptr)->texture) { > struct pipe_screen *screen = (*ptr)->texture->screen; > screen->tex_surface_release( screen, ptr ); > } > else { > #-> struct pipe_winsys *winsys = (*ptr)->winsys; > #-> winsys->surface_release(winsys, ptr); > } > > and then segfaults. So there's a free too much, or something > forgets to increment the reference count. Or something else. > > What do you say, how do we fix it, is this the bug?It seems that there is a missing surface reference count increment, i.e., the surface pointer is copied without using the pipe_surface_reference function, but it is destroyed with it one time too many.> Note, that nouveau/mesa gallium-0.1 has been last synced to mesa/mesa > Sept 18th, so maybe you have already fixed it? Or not? > > I looked at the diffs and logs, but didn't see a fix.The problem might be on the winsys. Make sure that the pipe_winsys::surface_* are in sync with the xlib winsys, otherwise the texture-view-surface vs standalone-view-surface logic in pipe_surface_reference and friends will break down. Jose
Ben Skeggs
2008-Nov-12 01:43 UTC
[Nouveau] Memory corruption on Gallium window resize, diagnosed?
Am Mittwoch, den 12.11.2008, 00:26 +0200 schrieb Pekka Paalanen:> Hi, > > I've been playing with nouveau/mesa branch gallium-0.1, trying to get > trivial/tri working on nv20 (with nv10 code). When ever I resize the window, > it ends up in an assert failure:Hey, I fixed nv4x recently, gallium-0.2 branch... Ben.> > #0 0xf790744f in _debug_assert_fail (expr=0xf791908f "0", file=0xf7919050 "nv20_state_emit.c", line=139, > function=0xf7919034 "nv20_state_emit_framebuffer") at p_debug.c:335 > #1 0xf76fe723 in nv20_state_emit_framebuffer (nv20=0x805ef68) at nv20_state_emit.c:139 > #2 0xf76fef2e in nv20_emit_hw_state (nv20=0x805ef68) at nv20_state_emit.c:255 > #3 0xf76ff70b in nv20_draw_elements (pipe=0x805ef68, indexBuffer=0x0, indexSize=0, prim=4, start=0, count=3) at nv20_vbo.c:20 > #4 0xf76ff922 in nv20_draw_arrays (pipe=0x805ef68, prim=4, start=0, count=3) at nv20_vbo.c:73 > #5 0xf7743ac8 in st_draw_vbo (ctx=0x8074d18, arrays=0x80ade38, prims=0x80ac994, nr_prims=1, ib=0x0, min_index=0, max_index=2) > at state_tracker/st_draw.c:634 > #6 0xf782c140 in vbo_exec_vtx_flush (exec=0x80ac870) at vbo/vbo_exec_draw.c:248 > #7 0xf782a858 in vbo_exec_FlushVertices (ctx=0x8074d18, flags=1) at vbo/vbo_exec_api.c:752 > #8 0xf7761f39 in _mesa_Flush () at main/context.c:1815 > #9 0xf7e00ae6 in glFlush () at ../../../src/mesa/glapi/glapitemp.h:1170 > #10 0x08048dba in Draw () at tri.c:83 > #11 0xf7ecdc54 in processWindowWorkList (window=0x8051200) at glut_event.c:1302 > #12 0xf7ecdd3a in __glutProcessWindowWorkLists () at glut_event.c:1354 > #13 0xf7ecddb1 in glutMainLoop () at glut_event.c:1375 > #14 0x08048fd8 in main (argc=1, argv=0xfff760e4) at tri.c:132 > > The struct pipe_surface used there contains garbage: > > (gdb) print *fb->cbufs[0] > $4 = {buffer = 0xf7d901a0, format = -136773216, status = 1, clear_value = 0, width = 189, height = 181, block = {size = 4, width = 1, > height = 1}, nblocksx = 189, nblocksy = 181, stride = 768, layout = 0, offset = 0, refcount = 0, usage = 12, winsys = 0x0, > texture = 0x0, face = 0, level = 0, zslice = 88} > > I tried to track what writes the insane value to the format field, > and always came up with malloc and free, which didn't make any sense. > > Then I ran it with valgrind: > > ==5322== Invalid read of size 4 > ==5322== at 0x7B6887C: pipe_surface_reference (p_inlines.h:93) > ==5322== by 0x7B6881F: copy_framebuffer_state (cso_context.c:781) > ==5322== by 0x7B68962: cso_set_framebuffer (cso_context.c:791) > ==5322== by 0x7AB5441: update_framebuffer_state (st_atom_framebuffer.c:147) > ==5322== by 0x7AB3E41: st_validate_state (st_atom.c:188) > ==5322== by 0x7ABDA2E: st_clear (st_cb_clear.c:517) > ==5322== by 0x7B0BED5: _mesa_Clear (clear.c:177) > ==5322== by 0x47F25FA: glClear (glapitemp.h:1100) > ==5322== by 0x8048CE9: Draw (tri.c:72) > ==5322== by 0x46F2C53: processWindowWorkList (glut_event.c:1302) > ==5322== by 0x46F2D39: __glutProcessWindowWorkLists (glut_event.c:1354) > ==5322== by 0x46F2DB0: glutMainLoop (glut_event.c:1375) > ==5322== Address 0x7638a0c is 68 bytes inside a block of size 84 free'd > ==5322== at 0x46D99EC: free (vg_replace_malloc.c:323) > ==5322== by 0x797E17D: nv20_miptree_surface_release (nv20_miptree.c:144) > ==5322== by 0x7AC0C69: pipe_surface_reference (p_inlines.h:95) > ==5322== by 0x7AC07D3: st_renderbuffer_alloc_storage (st_cb_fbo.c:97) > ==5322== by 0x7A07844: _mesa_resize_framebuffer (framebuffer.c:292) > ==5322== by 0x79C2F65: st_resize_framebuffer (st_framebuffer.c:147) > ==5322== by 0x7950CB3: nouveau_context_bind (nouveau_context.c:324) > ==5322== by 0x794A6E4: DoBindContext (dri_util.c:380) > ==5322== by 0x794A78E: driBindContext (dri_util.c:413) > ==5322== by 0x47C14B2: BindContextWrapper (glxext.c:1621) > ==5322== by 0x47C15F2: MakeContextCurrent (glxext.c:1675) > ==5322== by 0x47C1927: glXMakeCurrent (glxext.c:1797) > ==5322== > > and a couple more invalid reads, and a segfault. > My theory is: when the window resize occurs, somewhere along the path > > st_renderbuffer_alloc_storage(GLcontext * ctx, struct gl_renderbuffer *rb, > GLenum internalFormat, > GLuint width, GLuint height) > { > struct pipe_context *pipe = ctx->st->pipe; > struct st_renderbuffer *strb = st_renderbuffer(rb); > struct pipe_texture template; > unsigned surface_usage; > > /* Free the old surface and texture > */ > pipe_surface_reference( &strb->surface, NULL ); > pipe_texture_reference( &strb->texture, NULL ); > > is executed, which AFAICT frees the pipe_surface. My backtraces > show it really is freed, since the pipe_surface::refcount = 1 > and decreases to zero. Then the new pipe_surface is allocated > and things are looking good so far. This is seen in the > "freed" part of the valgrind stack trace. > > Then we get to the first glClear after resize, and hit the CSO > cache. The cache still has a pointer to the already freed pipe_surface > (well, the memory address is the same), and frees it again. See the > valgring stack trace, the invalid read happens in three places > > pipe_surface_reference(struct pipe_surface **ptr, struct pipe_surface *surf) > { > /* bump the refcount first */ > if (surf) > surf->refcount++; > > if (*ptr) { > > /* There are currently two sorts of surfaces... This needs to be > * fixed so that all surfaces are views into a texture. > */ > #-> if ((*ptr)->texture) { > struct pipe_screen *screen = (*ptr)->texture->screen; > screen->tex_surface_release( screen, ptr ); > } > else { > #-> struct pipe_winsys *winsys = (*ptr)->winsys; > #-> winsys->surface_release(winsys, ptr); > } > > and then segfaults. So there's a free too much, or something > forgets to increment the reference count. Or something else. > > What do you say, how do we fix it, is this the bug? > > Note, that nouveau/mesa gallium-0.1 has been last synced to mesa/mesa > Sept 18th, so maybe you have already fixed it? Or not? > > I looked at the diffs and logs, but didn't see a fix. > > > Thanks. >