Hi me (2x - Haberman and Coalson),

On Tue, Nov 11, 2003 at 03:51:42PM -0800, Josh Coalson wrote:
> The C++ interface is a little simpler:
[code follows]

OK. Well, I found the C interface a little bit easier to understand, so I ended up using that instead of continuing to try to figure out the C++ interface.

The one problem I have is that the values in block->data.vorbis_comment.comments tend to have trailing junk. For example:

    Breakpoint 1, get_flac(Song*, char const*) (flac=0x806b8a8, path=0x8068844 "/usr/share/mp3/01-dazed_and_confused.flac") at ftfuncs.cc:53
    53          for (i = 0; i < e->length; i++)
    (gdb) p e->entry
    $6 = (FLAC__byte *) 0x8066118 "TITLE=Dazed And Confused8Ini\021"
    [...]

This only seems to happen sometimes, and the only reliable way to prevent myself from treading into wild memory is to iterate e->length times over each character in e->entry every single time and not go one byte further. I'm obviously doing something slightly wrong, I suppose.

The offending code is at <http://cvs.triplehelix.org/tagreport/ftfuncs.cc>, function get_flac().

But thanks for all your help. I might change to the C++ interface later if I have time, since the rest of the project is in C++ anyway.

Thanks,

--
Joshua Kwan
--- Joshua Kwan <joshk@triplehelix.org> wrote:
> OK. Well, I found the C interface a little bit easier to understand,
> so I ended up using that instead of continuing to try to figure out
> the C++ interface.
>
> The one problem I have is that the values in
> block->data.vorbis_comment.comments tend to have trailing junk.
> For example:

that's correct; vorbis comments are not c strings and the API does not expose them as such (you could argue that it should I guess). they are a length plus a buffer in UTF-8. if you want a C string in a specific charset you must convert it (see the xmms plugin for examples of how to do that). if you just want to pretend that it's ascii, you have to

    char str[entry->length+1];
    memcpy(str, entry->entry, entry->length);
    str[entry->length] = '\0';

Josh
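A minimal sketch of the copy described in this message, written so that it compiles without libFLAC. The struct comment_entry and the function entry_to_cstring are hypothetical names that only mirror the length-plus-buffer shape of a Vorbis comment entry; they are not part of the FLAC API. The sketch allocates with malloc rather than using a variable-length array, which is a substitution made here and not something the thread proposes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Vorbis comment entry: a byte count plus a
 * buffer that is NOT NUL-terminated. */
typedef struct {
    unsigned length;             /* number of valid bytes in entry */
    const unsigned char *entry;  /* UTF-8 bytes, no terminator */
} comment_entry;

/* Return a freshly malloc'd, NUL-terminated copy of the entry's bytes,
 * or NULL if allocation fails. The caller frees the result. */
char *entry_to_cstring(const comment_entry *e)
{
    assert(e != NULL);

    char *str = malloc((size_t)e->length + 1);
    if (str == NULL)
        return NULL;

    /* Copy exactly e->length bytes and never read past the buffer. */
    if (e->length > 0)
        memcpy(str, e->entry, e->length);
    str[e->length] = '\0';
    return str;
}
```

The result is only safe to treat as text if the tag is plain ASCII; for anything else the bytes are still UTF-8 and need converting to the charset the caller wants.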
On Wed, Nov 12, 2003 at 11:46:23AM -0800, Josh Coalson wrote:
> that's correct; vorbis comments are not c strings and the API
> does not expose them as such (you could argue that it should I
> guess). they are a length plus a buffer in UTF-8.

A buffer in UTF-8? I thought it was just junk memory - and valgrind seems to prove this:

    ==13477== Invalid read of size 1
    ==13477==    at 0x40023F18: strlen (mac_replace_strmem.c:164)
    ==13477==    by 0x80501F9: get_flac(Song*, char const*) (char_traits.h:143)
    ==13477==    by 0x8050F9F: get_artist_title(Song*, std::string, char*) (basic_string.h:717)
    ==13477==    by 0x804E28C: traverse_dir(char*) (sstream:502)
    ==13477== Address 0x415CDE5A is 0 bytes after a block of size 22 alloc'd
    ==13477==    at 0x4002CA4D: malloc (vg_replace_malloc.c:153)
    ==13477==    by 0x402AAF1D: FLAC::Metadata::VorbisComment::Entry::parse_field() (in /usr/lib/libFLAC++.so.2.1.2)
    ==13477==    by 0x8050F9F: get_artist_title(Song*, std::string, char*) (basic_string.h:717)
    ==13477==    by 0x804E28C: traverse_dir(char*) (sstream:502)

However, the C++ API exposes the field names and values as C strings:

    const char *get_field() const;
    const char *get_field_name() const;
    const char *get_field_value() const;

but there's still junk on the char* I get back from get_field_value(). Should the type be changed to prevent the ambiguity/assumption?

> char str[entry->length+1];
> memcpy(str, entry->entry, entry->length);
> str[entry->length] = '\0';

That's about the size of what I've been doing.

--
Joshua Kwan
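The valgrind trace above shows strlen reading one byte past a 22-byte allocation, which is what happens when an unterminated buffer is handed to a function that expects a C string. As a hedged illustration, the sketch below splits a "NAME=value" field using only length-bounded calls. The function field_value is a hypothetical helper written for this example, not part of libFLAC or libFLAC++.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locate the value part of a "NAME=value" field stored in an unterminated
 * buffer of `length` bytes. On success, returns a pointer to the first
 * value byte and stores the value's byte count in *value_len. Returns
 * NULL (leaving *value_len untouched) if the buffer contains no '='. */
const unsigned char *field_value(const unsigned char *field, size_t length,
                                 size_t *value_len)
{
    assert(field != NULL && value_len != NULL);

    /* memchr never looks beyond `length` bytes, unlike strchr or strlen. */
    const unsigned char *eq = memchr(field, '=', length);
    if (eq == NULL)
        return NULL;

    *value_len = length - (size_t)(eq - field) - 1;
    return eq + 1;
}
```

The returned pointer is still unterminated, so it has to be used together with its length, for example with printf("%.*s", (int)value_len, value), or copied into a terminated buffer first.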
On Mon, Nov 17, 2003 at 11:10:58AM -0800, Josh Coalson wrote:
> > but there's still junk on the char* i get back from
> > get_field_value().
>
> there's no terminating null for these routines either, they
> are returning the unterminated UTF-8 buffer just like the C
> API.

OK. I assumed that stuff passed back to the user as char* would have been made to behave like a normal C string.

> it's hard to do that without dealing with encodings right in the
> metadata API.

What about making them return wchar_t* and thus IMPLYING to the user that they have to do something about character conversion?

> > > char str[entry->length+1];
> > > memcpy(str, entry->entry, entry->length);
> > > str[entry->length] = '\0';
> >
> > That's about the size of what I've been doing.
>
> and that's working, right? (or at least it will for ASCII
> tags)

Yes. To convert, would I use iconv() or something like that?

--
Joshua Kwan
--- Joshua Kwan <joshk@triplehelix.org> wrote:
> On Mon, Nov 17, 2003 at 11:10:58AM -0800, Josh Coalson wrote:
> > > but there's still junk on the char* i get back from
> > > get_field_value().
> >
> > there's no terminating null for these routines either, they
> > are returning the unterminated UTF-8 buffer just like the C
> > API.
>
> OK. I assumed that stuff passed back to the user as char* would have
> been made to behave like a normal C string.
>
> > it's hard to do that without dealing with encodings right in the
> > metadata API.
>
> What about making them return wchar_t* and thus IMPLYING to the user
> that they have to do something about character conversion?

wchar_t still means converting to ucs-2 in the metadata library. that's not nearly as bad as dealing with multiple encodings, but it's independent of the question of whether or not to null-terminate. returning char* or wchar_t* doesn't in and of itself imply that you're getting a null-terminated C string.

> > > > char str[entry->length+1];
> > > > memcpy(str, entry->entry, entry->length);
> > > > str[entry->length] = '\0';
> > >
> > > That's about the size of what I've been doing.
> >
> > and that's working, right? (or at least it will for ASCII
> > tags)
>
> Yes. To convert, would i use iconv() or something like that?

yes, see the plugin code I mentioned before for examples.

Josh
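For readers without the plugin source at hand, here is a rough sketch of an iconv()-based conversion. It is not taken from the xmms plugin; the function name utf8_to_latin1 and the choice of ISO-8859-1 as the target charset are assumptions made for illustration, and the set of charset names that iconv_open() accepts depends on the platform's iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert `len` bytes of UTF-8 (not NUL-terminated) into a malloc'd,
 * NUL-terminated ISO-8859-1 string. Returns NULL if the conversion is
 * unavailable, the input is rejected by iconv(), or allocation fails.
 * The caller frees the result. */
char *utf8_to_latin1(const char *utf8, size_t len)
{
    assert(utf8 != NULL);

    /* iconv_open() takes the target charset first, then the source. */
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Latin-1 output never needs more bytes than the UTF-8 input. */
    char *out = malloc(len + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;  /* iconv() wants char **; input is not modified */
    size_t in_left = len;
    char *out_ptr = out;
    size_t out_left = len;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1 || in_left != 0) {
        free(out);
        return NULL;
    }

    *out_ptr = '\0';
    return out;
}
```

Passing the entry's length explicitly keeps the conversion from reading past the unterminated buffer, and the terminator is added only to the converted copy. On some systems iconv lives in a separate library and the program must be linked with -liconv.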