This is 1.3.1 on OpenBSD/amd64.
The --no-utf8-convert option of metaflac(1) does not work for me:

  $ metaflac --no-utf8-convert --set-tag="Artist=?ou?l??ek" aladin.flac
  aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8

(You probably can't see the Czech letters properly in my mail,
but that's beside the point.)

Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
supposed to just write it as specified, with this option?

Jan
Brian Willoughby
2014-Dec-06 05:50 UTC
[Flac] metaflac --no-utf8-convert complains about UTF
Hello Jan,

I assume the problem is that metaflac has no way of knowing the encoding that was provided on the command line, since it could literally be anything. The --no-utf8-convert option means that metaflac does nothing to the letters as they pass through, and then the problem becomes that the next program to read the tags has to assume the character set without any information. If the program reading the tags gets the character set wrong, then you see garbage. It's possible that the "local charset" or "locale" will be the same on the command line and in the application interpreting the characters, but that's not always true.

Or, to put it another way, isn't the assumption that all tags in a FLAC file are UTF-8? Thus, if you provide LATIN2 and don't allow metaflac to convert, then it's sure to be garbage.

By the way, I can see the Czech letters properly in your email, because it has a header saying

  Content-Type: text/plain; charset="iso-8859-2"

and my Mac uses that information to decode the characters correctly. Not that I can pronounce Czech properly, but it sure looks like some of my favorite movie titles?

I'm just guessing here, but I assume that the best way to handle this would be to provide the characters to metaflac in UTF-8 and not use that option (because it ignores the charset). Then the applications reading out the tags will know that they're UTF-8.

Obviously, if anyone has better procedures for this, please explain. I don't actually know whether this option is supposed to work on input, output, or both.

Brian

On Dec 5, 2014, at 11:16 AM, Jan Stary <hans at stare.cz> wrote:
> This is 1.3.1 on OpenBSD/amd64.
> The --no-utf8-convert option of metaflac(1) does not work for me:
>
> $ metaflac --no-utf8-convert --set-tag="Artist=?ou?l??ek" aladin.flac
> aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
> (You probably can't see the Czech letters properly in my mail,
> but that's beside the point.)
>
> Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
> supposed to just write it as specified, with this option?
On Dec 05 21:50:47, brianw at audiobanshee.com wrote:
> I assume the problem is that metaflac has no way of knowing the
> encoding that was provided on the command line, since it could
> literally be anything. The --no-utf8-convert option means that
> metaflac does nothing to the letters as they pass through,

That's what it is supposed to do, but doesn't, apparently.

> and then the problem becomes that the next program to read the tags
> has to assume the character set without any information.
> If the program reading the tags gets the character set wrong,
> then you see garbage.

Never mind the next program; the problem now is that metaflac
does not honor the --no-utf8-convert option.

> Or, to put it another way, isn't the assumption that all tags
> in a FLAC file are UTF-8? Thus, if you provide LATIN2 and don't
> allow metaflac to convert, then it's sure to be garbage.

No. It's sure to be exactly what the user provided, in my case LATIN2,
if metaflac honors the --no-utf8-convert option.

> I don't actually know whether this option is supposed to work on input,
> output, or both.

The manpage says:

  --no-utf8-convert
      Do not convert tags from UTF-8 to local charset, or vice versa.
      This is useful for scripts, and setting tags in situations
      where the locale is wrong.

"vice versa" tells me it's supposed to work for both input and output.
"setting tags" tells me it's definitely for input.
On Dec 05 20:16:47, hans at stare.cz wrote:
> This is 1.3.1 on OpenBSD/amd64.
> The --no-utf8-convert option of metaflac(1) does not work for me:
>
> $ metaflac --no-utf8-convert --set-tag="Artist=?ou?l??ek" aladin.flac
> aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
> (You probably can't see the Czech letters properly in my mail,
> but that's beside the point.)
>
> Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
> supposed to just write it as specified, with this option?

The problem seems to be in
src/metaflac/operations_shorthand_vorbiscomment.c
in the set_vc_field() function.

It does check whether utf conversion is required,

  /* move 'data' into 'converted', converting to UTF-8 if necessary */
  if(raw) {
      converted = data;
  }
  }

but later checks that FLAC__format_vorbiscomment_entry_is_legal()
whether or not we are utf converting; and this function, defined
in ./src/libFLAC/format.c, ultimately calls for utf8len_(s) no matter what.
So my LATIN2 text fails to be legal, because it's not legal UTF
-- which, indeed, it isn't.

Jan
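To illustrate why a LATIN2 value trips a UTF-8 check even when no conversion is done, here is a minimal, self-contained sketch in C. It is not the libFLAC code: is_valid_utf8() and raw_value_passes_check() are hypothetical names made up for this example, and the validator is a generic well-formedness test, assumed to be roughly what a check like utf8len_() amounts to. The byte 0xB9 is a lowercase s with caron in ISO-8859-2; the same letter in UTF-8 is the two bytes 0xC5 0xA1.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Hypothetical helper for illustration; not libFLAC's utf8len_(). */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;      /* continuation bytes expected after c */
        unsigned long cp;  /* code point being decoded */
        size_t k;

        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return 0;   /* sequence truncated at end of buffer */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;  /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (extra == 1 && cp < 0x80) return 0;
        if (extra == 2 && cp < 0x800) return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;

        i += extra + 1;
    }
    return 1;
}

/* Models the behaviour reported above for --no-utf8-convert:
 * the bytes are left unconverted, but they are still validated. */
int raw_value_passes_check(const unsigned char *value, size_t len)
{
    return is_valid_utf8(value, len);
}
```

With this sketch, the three bytes 0xB9 'o' 'u' are rejected, because 0xB9 has the bit pattern of a continuation byte and cannot start a sequence, while 0xC5 0xA1 'o' 'u' is accepted. Skipping the conversion step alone therefore cannot let LATIN2 input through as long as the check still runs.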
Brian Willoughby
2014-Dec-06 09:28 UTC
[Flac] metaflac --no-utf8-convert complains about UTF
On Dec 6, 2014, at 12:54 AM, Jan Stary <hans at stare.cz> wrote:
> On Dec 05 20:16:47, hans at stare.cz wrote:
>> This is 1.3.1 on OpenBSD/amd64.
>> The --no-utf8-convert option of metaflac(1) does not work for me:
>>
>> $ metaflac --no-utf8-convert --set-tag="Artist=?ou?l??ek" aladin.flac
>> aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
>> (You probably can't see the Czech letters properly in my mail,
>> but that's beside the point.)
>>
>> Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
>> supposed to just write it as specified, with this option?
>
> The problem seems to be in
> src/metaflac/operations_shorthand_vorbiscomment.c
> in the set_vc_field() function.
>
> It does check whether utf conversion is required,
>
> /* move 'data' into 'converted', converting to UTF-8 if necessary */
> if(raw) {
> converted = data;
> }
> }
> but later checks that FLAC__format_vorbiscomment_entry_is_legal()
> whether or not we are utf converting; and this function, defined
> in ./src/libFLAC/format.c, ultimately calls for utf8len_(s) no matter what.
> So my LATIN2 text fails to be legal, because it's not legal UTF
> -- which, indeed, it isn't.

Looks like you found the problem. One piece of code is doing the right thing, another piece of code is ignoring the option.

By the way, I've never used FLAC inside Ogg Vorbis. Instead, I use pure FLAC format files. Is there any difference between the way this option works on a straight FLAC file versus how it works on FLAC data in an Ogg Vorbis container?

Brian
On Dec 05 20:16:47, hans at stare.cz wrote:
> This is 1.3.1 on OpenBSD/amd64.
> The --no-utf8-convert option of metaflac(1) does not work for me:
>
> $ metaflac --no-utf8-convert --set-tag="Artist=?ou?l??ek" aladin.flac
> aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
> (You probably can't see the Czech letters properly in my mail,
> but that's beside the point.)
>
> Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
> supposed to just write it as specified, with this option?

On Dec 06 09:54:55, hans at stare.cz wrote:
> The problem seems to be in
> src/metaflac/operations_shorthand_vorbiscomment.c
> in the set_vc_field() function.
>
> It does check whether utf conversion is required,
>
> /* move 'data' into 'converted', converting to UTF-8 if necessary */
> if(raw) {
> converted = data;
> }
> }
> but later checks that FLAC__format_vorbiscomment_entry_is_legal()
> whether or not we are utf converting; and this function, defined
> in ./src/libFLAC/format.c, ultimately calls for utf8len_(s) no matter what.
> So my LATIN2 text fails to be legal, because it's not legal UTF
> -- which, indeed, it isn't.

On Dec 06 12:33:35, martin.leese at stanfordalumni.org wrote:
> METADATA_BLOCK_VORBIS_COMMENT is defined at:
> https://xiph.org/flac/format.html#metadata_block_vorbis_comment
> and VorbisComments at:
> http://www.xiph.org/vorbis/doc/v-comment.html
>
> Note that a VorbisComment is defined as
> being UTF-8, although metaflac --no-utf8-convert
> doesn't seem to be behaving as advertised.

Reading the above links, the Vorbis Comment is defined to be UTF8.
What is the purpose of --no-utf8-convert in setting tags then?
To specifically ask for invalid files?

Maybe I am misunderstanding the meaning of --no-utf8-convert.
Perhaps the current behaviour is intended, and --no-utf8-convert
just means "don't bother converting, it is already UTF8".
Which my example isn't, and metaflac rightfully complains.

Can anybody please shed some light on this?

> Finally, Jan might have more luck taking his
> problem with metaflac over to the flac-dev list.

On Dec 06 13:55:16, martin.leese at stanfordalumni.org wrote:
> Even better, he could submit a bug report at:
> http://sourceforge.net/p/flac/bugs/

Yes, I will move this to flac-dev and file a proper bug report
once I am sure it is a bug, and it's the bug I think it is.

BTW, the other Xiph projects track their issues
at https://trac.xiph.org/ - is it intentional that FLAC
uses the sourceforge bug tracker? Is there any relation
between the two?

Jan
Not sure if this is related, but the UTF conversion from and to
my local charset does not work for me either
(the --no-utf8-convert option is not involved in this).

  $ export LC_ALL=ISO8859-2
  $ metaflac --remove-all-tags file.flac
  $ metaflac --set-tag="TITLE=?ou?li?ka" file.flac
  $ metaflac --list --block-number=2 file.flac
  METADATA block #2
    type: 4 (VORBIS_COMMENT)
    is last: false
    length: 59
    vendor string: reference libFLAC 1.3.0 20130526
    comments: 1
      comment[0]: TITLE=#ou#li#ka
  $ metaflac --export-tags-to=- file.flac
  TITLE=#ou#li#ka

Here is how I understand this: metaflac understood the characters
in --set-tag="TITLE=?ou?li?ka", because it knows my local charset
is ISO8859-2; metaflac converted that string into UTF8 and stored it
in the Vorbis comment; that's what --list shows me.

But when I --export the tags, metaflac does _not_ convert
the UTF8 comment back to my ISO8859-2 charset.

Jan
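One way to cross-check the listing above is to recompute the block length from the numbers it reports. The sketch below is an illustration only, and its function names are made up for the example. It assumes the VORBIS_COMMENT layout from the FLAC format page (a 4-byte vendor length, the vendor string, a 4-byte comment count, then a 4-byte length before each comment), and it assumes that each of the three non-ASCII LATIN2 letters would occupy two bytes once converted to UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of one "FIELD=value" comment, given how many ASCII and
 * non-ASCII characters the value has and how many bytes each non-ASCII
 * character occupies as stored. Hypothetical helper for illustration. */
size_t comment_bytes(const char *field, size_t ascii_chars,
                     size_t non_ascii_chars, size_t bytes_per_non_ascii)
{
    return strlen(field) + 1 /* '=' */
         + ascii_chars + non_ascii_chars * bytes_per_non_ascii;
}

/* Body length of a VORBIS_COMMENT block that holds exactly one comment:
 * vendor length field + vendor string + comment count field
 * + comment length field + comment bytes. */
size_t vc_block_length(const char *vendor, size_t one_comment_bytes)
{
    return 4 + strlen(vendor) + 4 + 4 + one_comment_bytes;
}
```

For the vendor string "reference libFLAC 1.3.0 20130526" (32 bytes) and a TITLE value with six ASCII and three non-ASCII characters, this gives 59 if each non-ASCII character is stored as one byte and 62 if it is stored as two. The reported length of 59 matches the one-byte case, which, if the assumptions above hold, suggests the '#' characters were already written at --set-tag time instead of appearing only on export.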
Brian Willoughby
2014-Dec-07 18:29 UTC
[Flac] metaflac --no-utf8-convert complains about UTF
On Dec 7, 2014, at 2:43 AM, Jan Stary <hans at stare.cz> wrote:
> Maybe I am misunderstanding the meaning of --no-utf8-convert.
> Perhaps the current behaviour is intended, and --no-utf8-convert
> just means "don't bother converting, it is already UTF8".

That's exactly what I assume it means. I think that's the only thing it could mean.

Brian Willoughby