Hi all,

I've got an application here (flumotion) which uses libtheora (via
gstreamer, though I think that's irrelevant here) and many other bits of
code, one of which is PIL (Python Imaging Library), which also has
chunks implemented in native code.

When I run this, I get a crash in libtheora, the top of the stacktrace
looking like:

#0  0xb6a71bbf in quantize () from /usr/lib/python2.4/site-packages/PIL/_imaging.so
#1  0xb6906fcd in TransformQuantizeBlock () from /usr/lib/libtheora.so.0
#2  0xb69073ab in TransformQuantizeBlock () from /usr/lib/libtheora.so.0
#3  0xb6907830 in EncodeData () from /usr/lib/libtheora.so.0
#4  0xb6909cef in WriteFrameHeader () from /usr/lib/libtheora.so.0
#5  0xb690aaf1 in theora_encode_YUVin () from /usr/lib/libtheora.so.0
#6  0xb6922b8a in theora_enc_chain (pad=0x83c34f8, data=0x81b3db0) at theoraenc.c:746
#7  0xb79c80d5 in gst_pad_call_chain_function (pad=0x83c34f8, data=0x81b3db0) at gstpad.c:4539

Here, we see that either I'm terribly confused, or gdb is, or libtheora
is calling quantize() in PIL's _imaging.so.

So, my guess as to what's going on is as follows: quantize() (in
quant.c) is called from TransformQuantizeBlock, which is normally fine.
However, because it's an exported symbol (because... well, everything
is), in this application the call ends up getting resolved to a
different quantize(), one that has already been loaded.

So, I guess the problem (or at least one of them) is that we really
should only be exporting the API functions (theora_*), but because ELF
symbols are exported by default, we get all the internal symbols as
well. It's apparently possible to add a linker command line option to
make it only export symbols matching some particular regexp (which is
obviously much simpler than having to mess around with linker maps,
etc.), but I'd really like some feedback from someone with some actual
knowledge (rather than just random guessing) of ELF visibility, linking,
etc.

Is this analysis even vaguely plausible? If it isn't, any other
suggestions as to what might be going on?

Mike

On Wed, Aug 03, 2005 at 02:14:22PM +0200, Michael Smith wrote:
>
> So, I guess the problem (or at least one of them) is that we really
> should only be exporting the API functions (theora_*), but because ELF
> symbols are exported by default, we get all the internal symbols as
> well. It's apparently possible to add a linker command line option to
> make it only export symbols matching some particular regexp (which is
> obviously much simpler than having to mess around with linker maps,
> etc.), but I'd really like some feedback from someone with some actual
> knowledge (rather than just random guessing) of ELF visibility, linking,
> etc.
>
> Is this analysis even vaguely plausible? If it isn't, any other
> suggestions as to what might be going on?

plausible, totally, yow. I reckon we should:

1) namespace all exported symbols. There are platforms where fiddling
   with the linker is not possible. So, for example, rename the internal
   function quantize() to theora_quantize() throughout. Don't mess
   around with linker maps, just rename the symbols.

2) explicitly export API symbols. On Linux and Solaris, this can be done
   with a version script file which lists only the symbols to be
   exported, which for libtheora is a fairly small set. For an example
   see in liboggz how SHLIB_VERSION_ARG is handled in configure.ac,
   using src/liboggz/Version_script{.in}. On win32 it's normally done
   with a .def file, as in vorbis/win32/*.def

(Mike, I expect you know this already, I'm just writing it out for
completeness :-).

Although there is a slight overhead in maintaining these files, this is
only a problem if people are adding API calls willy-nilly, which of
course isn't happening here anyway.

An extra advantage of explicitly naming which symbols to export is that
it disallows use of internal functions by applications. And with
coverage testing, the code wouldn't even pass make check if the export
files were incorrect ...

cheers,

Conrad.
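
For illustration only, suggestion 1 might look roughly like the sketch below. None of these names come from libtheora: th_example_quantize(), th_example_encode_block() and clamp_q() are hypothetical stand-ins, and the arithmetic is a toy. The point is just that a helper used in one file can be static, while a helper shared across files gets a library-specific prefix so it cannot share a name with a quantize() in some other loaded object.

```c
#include <assert.h>
#include <stddef.h>

/* File-local helper (hypothetical): 'static' gives it internal linkage,
   so it never enters the shared library's dynamic symbol table. */
static int clamp_q(int q)
{
    return q < 1 ? 1 : q;
}

/* Helper shared between source files of the library, so it cannot be
   static. The hypothetical prefix "th_example_" keeps it from sharing a
   name with a quantize() exported by another loaded library. */
int th_example_quantize(int coeff, int q)
{
    int half;

    q = clamp_q(q);
    half = q / 2;
    /* Round to nearest; C integer division truncates toward zero. */
    return (coeff >= 0 ? coeff + half : coeff - half) / q;
}

/* Stand-in for a public API entry point (assumed name, toy behaviour). */
int th_example_encode_block(const int *in, int *out, size_t n, int q)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = th_example_quantize(in[i], q);
    return 0;
}
```

For suggestion 2 with GNU ld, a version script can be as small as "{ global: theora_*; local: *; };", passed to the linker with -Wl,--version-script=<file>. Whether the theora_* pattern covers exactly the intended API is an assumption that would need checking against the real headers.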