thr3ads.net - Vorbis dev - [vorbis-dev] UTF-8 stuff [Sep 2001]

Edmund GRIMLEY EVANS

2001-Sep-30 12:46 UTC

[vorbis-dev] UTF-8 stuff

Here's a proposed heavy-duty solution for your UTF-8 problems.

I'm including a patch in this message, but I'll put the new files on
my web site at http://rano.org/tmp/xiph_files.tar.gz

I've tested this by running vorbiscomment with and without
-DHAVE_ICONV=1 in vorbis-tools/share/Makefile. It seems to work.

Changed files:

acinclude.m4: Add a test for nl_langinfo(CODESET). This is the
function that lets you discover the charset of the user's locale
without forcing them to use a command-line argument.

configure.in: Use AM_LANGINFO_CODESET.

utf8.h, utf8.c: These files are totally rewritten, apart from the
Windows part. Instead of utf8_encode() and utf8_decode() there's
convert_to_utf8() and convert_from_utf8(), which no longer have an
"encoding" argument.

oggenc.c, vcomment.c: Call setlocale(), so that nl_langinfo() will
work. Remove the "encoding" option and "encoding" arguments. Instead
there's either nl_langinfo(CODESET) or the environment variable
CHARSET. Call convert_{to|from}_utf8() instead of utf8_{en|de}code().

Makefile.am: Different set of source files.
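
The charset lookup described for acinclude.m4 and oggenc.c/vcomment.c above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the helper name locale_charset() is hypothetical, a POSIX system with <langinfo.h> is assumed, and unlike the patch it also falls back to CHARSET when nl_langinfo() returns an empty string.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the patch): return the name of the
 * charset that the user's text is assumed to be in. */
const char *locale_charset(void)
{
    const char *cs;

    /* Adopt the locale from the environment (LANG, LC_ALL, ...).
     * Without this call the program stays in the "C" locale and
     * nl_langinfo(CODESET) only reports that locale's charset. */
    setlocale(LC_ALL, "");

    cs = nl_langinfo(CODESET);
    if (cs && *cs)
        return cs;

    /* Fallback: the CHARSET environment variable. */
    cs = getenv("CHARSET");
    if (cs && *cs)
        return cs;

    /* Last resort, matching the default documented in utf8.h. */
    return "US-ASCII";
}
```

A program would normally call setlocale() once at startup rather than inside a helper; it is folded in here only to keep the sketch self-contained.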

New files:

charset.c, charset.h: My functions for converting between encodings,
with header file.

charmaps.h: Mapping tables included by charset.c.

makemap.c: A program for generating a mapping table using iconv().

charset_test.c: A test suite for charset.c. Just compile it and link
it with charset.o.

iconvert.c: A function for converting a string using iconv().
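
iconvert.c itself is not shown in this message, so the following is only a sketch of what converting a string with iconv() can look like. The function name iconv_string() and its interface are hypothetical and differ from the iconvert() called in the patch; the fixed output size assumes that four output bytes per input byte is enough, which holds for conversions to UTF-8 but not necessarily for every target encoding.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the iconvert() from the patch): convert the
 * NUL-terminated string `from` from encoding `fromcode` to `tocode`.
 * Returns a malloc()ed NUL-terminated string, or NULL on any failure
 * (unknown encoding, invalid input, out of memory, buffer too small). */
char *iconv_string(const char *fromcode, const char *tocode, const char *from)
{
    iconv_t cd;
    size_t inleft = strlen(from);
    size_t outsize = 4 * inleft + 1;   /* assumption: enough for UTF-8 output */
    size_t outleft = outsize - 1;      /* keep one byte for the terminator */
    char *in = (char *)from;           /* iconv() wants char **, data is not modified */
    char *out, *outp;

    cd = iconv_open(tocode, fromcode); /* note the order: target first */
    if (cd == (iconv_t)-1)
        return NULL;

    out = outp = malloc(outsize);
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    /* Convert the input, then flush any pending shift sequence. */
    if (iconv(cd, &in, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    iconv_close(cd);
    *outp = '\0';
    return out;
}
```

On some systems the second argument of iconv() is declared const char **, which is why the old code in the patch casts with ICONV_CONST; on others the program has to be linked with -liconv.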

Old files:

8859-1.map, 8859-2.map, make_code_map.pl: These are no longer used and
can be removed.

Things to do:

Maybe modify acinclude.m4 to check that iconv() works properly.
Experience with Mutt tells us that there are some severely broken
versions of iconv around.

At present exactly one of iconvert.c and charset.c is included in the
executable, but it's possible some people might want to have both. So
there should probably be a USE_CHARSET_CONVERT macro that by default
is defined iff HAVE_ICONV is not defined, but whose value can be
altered by an argument to configure.

Implement the Windows version of convert_from_utf8().
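
The USE_CHARSET_CONVERT idea from the list above could look something like this. It is a sketch only: the macro does not exist in the patch, HAVE_ICONV is assumed to come from configure, and compiled_converters() is a hypothetical helper added just to make the effect visible.

```c
/* Default: use the built-in tables (charset.c) when iconv is absent.
 * A configure option could pass -DUSE_CHARSET_CONVERT to force the
 * built-in converter in addition to iconv. */
#if !defined(HAVE_ICONV) && !defined(USE_CHARSET_CONVERT)
#define USE_CHARSET_CONVERT 1
#endif

/* Hypothetical helper: report which converters were compiled in.
 * Bit 0 = iconv-based converter, bit 1 = built-in charset tables. */
int compiled_converters(void)
{
    int mask = 0;
#ifdef HAVE_ICONV
    mask |= 1;
#endif
#ifdef USE_CHARSET_CONVERT
    mask |= 2;
#endif
    return mask;
}
```

With this arrangement at least one converter is always compiled in, and the second #ifndef HAVE_ICONV test in convert_buffer() would become #ifdef USE_CHARSET_CONVERT.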

Edmund
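
The utf8.h comment in the patch below documents the return values of the conversion functions (-1 for allocation failure, 0 to 3 for increasingly approximate results), and the callers accept anything >= 0. The last-resort path of convert_string(), taken when no converter recognises the encoding, can be read in isolation as the following sketch. The function name ascii_fallback() is hypothetical; the logic mirrors the patch but is rewritten here as a standalone function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone version of the fallback in convert_string():
 * copy `from`, replacing every byte outside US-ASCII with `replace`.
 * Returns 3 ("unknown encoding, but still converted") on success and
 * -1 if memory allocation fails. */
int ascii_fallback(const char *from, char **to, char replace)
{
    char *s = malloc(strlen(from) + 1);

    if (!s)
        return -1;
    strcpy(s, from);
    *to = s;

    for (; *s; s++)
        if ((unsigned char)*s & 0x80)  /* high bit set: not ASCII */
            *s = replace;

    return 3;
}
```

A caller that only checks for >= 0, as add_tag() and add_comment() do in the patch, would still use the result; a stricter caller could warn on any non-zero value.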

diff -ru xiph.orig/vorbis-tools/acinclude.m4 xiph/vorbis-tools/acinclude.m4
--- xiph.orig/vorbis-tools/acinclude.m4	Tue Aug 21 15:05:09 2001
+++ xiph/vorbis-tools/acinclude.m4	Sun Sep 30 19:14:02 2001
@@ -430,3 +430,19 @@
   fi
   AC_SUBST(LIBICONV)
 ])
+
+dnl From Bruno Haible.
+dnl
+AC_DEFUN([AM_LANGINFO_CODESET],
+[
+  AC_CACHE_CHECK([for nl_langinfo and CODESET], am_cv_langinfo_codeset,
+    [AC_TRY_LINK([#include <langinfo.h>],
+      [char* cs = nl_langinfo(CODESET);],
+      am_cv_langinfo_codeset=yes,
+      am_cv_langinfo_codeset=no)
+    ])
+  if test $am_cv_langinfo_codeset = yes; then
+    AC_DEFINE(HAVE_LANGINFO_CODESET, 1,
+      [Define if you have <langinfo.h> and nl_langinfo(CODESET).])
+  fi
+])
diff -ru xiph.orig/vorbis-tools/configure.in xiph/vorbis-tools/configure.in
--- xiph.orig/vorbis-tools/configure.in	Mon Sep 24 23:42:12 2001
+++ xiph/vorbis-tools/configure.in	Sun Sep 30 19:14:19 2001
@@ -111,6 +111,7 @@
 
 AM_ICONV
 AC_FUNC_SMMAP
+AM_LANGINFO_CODESET
 
 dnl --------------------------------------------------
 dnl Work around FHS stupidity
diff -ru xiph.orig/vorbis-tools/include/utf8.h xiph/vorbis-tools/include/utf8.h
--- xiph.orig/vorbis-tools/include/utf8.h	Sat Sep 22 23:49:49 2001
+++ xiph/vorbis-tools/include/utf8.h	Sun Sep 30 18:05:14 2001
@@ -1,18 +1,23 @@
-/* OggEnc
+
+/*
+ * Convert a string between UTF-8 and the locale's charset.
+ * Invalid bytes are replaced by '#', and characters that are
+ * not available in the target encoding are replaced by '?'.
+ *
+ * If the locale's charset is not set explicitly then it is
+ * obtained using nl_langinfo(CODESET), where available, the
+ * environment variable CHARSET, or assumed to be US-ASCII.
  *
- * This program is distributed under the GNU General Public License, version 2.
- * A copy of this license is included with this source.
+ * Return value of conversion functions:
  *
- * Copyright © 2001, Daniel Resare <noa@metamatrix.se>
+ *  -1 : memory allocation failed
+ *   0 : data was converted exactly
+ *   1 : valid data was converted approximately (using '?')
+ *   2 : input was invalid (but still converted, using '#')
+ *   3 : unknown encoding (but still converted, using '?')
  */
 
-typedef struct
-{
-	char* name;
-	int mapping[256];
-} charset_map;
+void convert_set_charset(const char *charset);
 
-charset_map *get_map(const char *encoding);
-char *make_utf8_string(const unsigned short *unicode);
-int simple_utf8_encode(const char *from, char **to, const char *encoding);
-int utf8_encode(char *from, char **to, const char *encoding);
+int convert_to_utf8(const char *from, char **to);
+int convert_from_utf8(const char *from, char **to);
diff -ru xiph.orig/vorbis-tools/oggenc/oggenc.c xiph/vorbis-tools/oggenc/oggenc.c
--- xiph.orig/vorbis-tools/oggenc/oggenc.c	Sun Sep 30 17:28:43 2001
+++ xiph/vorbis-tools/oggenc/oggenc.c	Sun Sep 30 19:04:03 2001
@@ -15,6 +15,7 @@
 #include <getopt.h>
 #include <string.h>
 #include <time.h>
+#include <locale.h>
 
 #include "platform.h"
 #include "encode.h"
@@ -50,7 +51,6 @@
         {"date",1,0,'d'},
         {"tracknum",1,0,'N'},
         {"serial",1,0,'s'},
-	{"encoding",1,0,'e'},
         {NULL,0,0,0}
 };
         
@@ -75,6 +75,8 @@
         int numfiles;
         int errors=0;
 
+	setlocale(LC_ALL, "");
+
         parse_options(argc, argv, &opt);
 
         if(optind >= argc)
@@ -320,8 +322,6 @@
                 " -s, --serial         Specify a serial number for the stream. If encoding\n"
                 "                      multiple files, this will be incremented for each\n"
                 "                      stream after the first.\n"
-		" -e, --encoding       Specify an encoding for the comments given (not\n"
-		"                      supported on windows)\n"
                 "\n"
                 " Naming:\n"
                 " -o, --output=fn      Write file to fn (only valid in single-file mode)\n"
@@ -477,7 +477,7 @@
         int ret;
         int option_index = 1;
 
-	while((ret = getopt_long(argc, argv, "a:b:B:c:C:d:e:G:hl:m:M:n:N:o:P:q:QrR:s:t:vX:",
+	while((ret = getopt_long(argc, argv, "a:b:B:c:C:d:G:hl:m:M:n:N:o:P:q:QrR:s:t:vX:",
                                         long_options, &option_index)) != -1)
         {
                 switch(ret)
@@ -498,9 +498,6 @@
                                 opt->dates = realloc(opt->dates, (++opt->date_count)*sizeof(char *));
                                 opt->dates[opt->date_count - 1] = strdup(optarg);
                                 break;
-			case 'e':
-				opt->encoding = strdup(optarg);
-				break;
             case 'G':
                 opt->genre = realloc(opt->genre, (++opt->genre_count)*sizeof(char *));
                 opt->genre[opt->genre_count - 1] = strdup(optarg);
@@ -646,7 +643,7 @@
 static void add_tag(vorbis_comment *vc, oe_options *opt,char *name, char *value)
 {
         char *utf8;
-	if(utf8_encode(value, &utf8, opt->encoding) == 0)
+	if(convert_to_utf8(value, &utf8) >= 0)
         {
                 if(name == NULL)
                         vorbis_comment_add(vc, utf8);
@@ -655,7 +652,7 @@
                 free(utf8);
         }
         else
-		fprintf(stderr, "Couldn't convert comment to UTF8, cannot add\n");
+		fprintf(stderr, "Couldn't convert comment to UTF-8, cannot add\n");
 }
 
 static void build_comments(vorbis_comment *vc, oe_options *opt, int filenum, 
diff -ru xiph.orig/vorbis-tools/share/Makefile.am xiph/vorbis-tools/share/Makefile.am
--- xiph.orig/vorbis-tools/share/Makefile.am	Sun Sep 23 00:13:50 2001
+++ xiph/vorbis-tools/share/Makefile.am	Sun Sep 30 20:31:57 2001
@@ -6,12 +6,11 @@
 
 noinst_LIBRARIES = libutf8.a libgetopt.a
 
-libutf8_a_SOURCES = utf8.c
-MAP_FILES = 8859-1.map 8859-2.map
+libutf8_a_SOURCES = charset.c iconvert.c utf8.c
 
 libgetopt_a_SOURCES = getopt.c getopt1.c
 
-EXTRA_DIST = $(MAP_FILES) charsetmap.h make_code_map.pl
+EXTRA_DIST = charmaps.h makemap.c charset_test.c
 
 debug:
         $(MAKE) all CFLAGS="@DEBUG@"
diff -ru xiph.orig/vorbis-tools/share/utf8.c xiph/vorbis-tools/share/utf8.c
--- xiph.orig/vorbis-tools/share/utf8.c	Wed Sep 26 20:28:05 2001
+++ xiph/vorbis-tools/share/utf8.c	Sun Sep 30 20:23:21 2001
@@ -1,30 +1,40 @@
-/* OggEnc
- *
- * This program is distributed under the GNU General Public License, version 2.
- * A copy of this license is included with this source.
- *
- * (C) 2001 Michael Smith <msmith@labyrinth.net.au>
+/*
+ * Copyright (C) 2001 Peter Harris <peter.harris@hummingbird.com>
+ * Copyright (C) 2001 Edmund Grimley Evans <edmundo@rano.org>
  *
- * UTF-8 Conversion routines
- *   Copyright (C) 2001, Daniel Resare <noa@metamatrix.se>
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+/*
+ * Convert a string between UTF-8 and the locale's charset.
  */
 
-#include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+
 #include "utf8.h"
 
 
 #ifdef _WIN32
+#include <stdio.h>
 #include <windows.h>
 
-int utf8_encode(char *from, char **to, const char *encoding)
+int convert_to_utf8(const char *from, char **to)
 {
         /* Thanks to Peter Harris <peter.harris@hummingbird.com> for this win32
          * code.
-	 *
-	 * We ignore 'encoding' and assume that the input is in the 'code page'
-	 * of the console. Reasonable, since oggenc is a console app.
          */
 
         unsigned short *unicode;
@@ -36,14 +46,14 @@
         if(wchars == 0)
         {
                 fprintf(stderr, "Unicode translation error %d\n", GetLastError());
-		return 1;
+		return -1;
         }
 
         unicode = calloc(wchars + 1, sizeof(unsigned short));
         if(unicode == NULL) 
         {
                 fprintf(stderr, "Out of memory processing string to UTF8\n");
-		return 1;
+		return -1;
         }
 
         err = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, from, 
@@ -52,7 +62,7 @@
         {
                 free(unicode);
                 fprintf(stderr, "Unicode translation error %d\n", GetLastError());
-		return 1;
+		return -1;
         }
 
         /* On NT-based windows systems, we could use WideCharToMultiByte(), but
@@ -64,234 +74,101 @@
         return 0;
 }
 
-int utf8_decode(char *from, char **to, const char *encoding)
+int convert_from_utf8(const char *from, char **to)
 {
-	return 1;  /* Dummy stub */
+	return -1;  /* Dummy stub */
 }
 
 #else /* End win32. Rest is for real operating systems */
 
-#ifdef HAVE_ICONV
-#include <iconv.h>
-#include <errno.h>
+
+#ifdef HAVE_LANGINFO_CODESET
+#include <langinfo.h>
 #endif
 
-#include "charsetmap.h"
+static char *current_charset = 0; /* means "US-ASCII" */
 
-#define BUFSIZE 256
+void convert_set_charset(const char *charset)
+{
 
-/*
- Converts the string FROM from the encoding specified in ENCODING
- to UTF-8. The resulting string i pointed to by *TO.
+#ifdef HAVE_LANGINFO_CODESET
+  if (!charset)
+    charset = nl_langinfo(CODESET);
+#endif
 
- Return values:
- 0 indicates a successfully converted string.
- 1 indicates that the given encoding is not available.
- 2 indicates that the given string is bigger than BUFSIZE and can therefore
-   not be encoded.
- 3 indicates that given string could not be parsed.
-*/
-int utf8_encode(char *from, char **to, const char *encoding)
+  if (!charset)
+    charset = getenv("CHARSET");
+
+  free(current_charset);
+  current_charset = 0;
+  if (charset && *charset)
+    current_charset = strdup(charset);
+}
+
+static int convert_buffer(const char *fromcode, const char *tocode,
+			  const char *from, size_t fromlen,
+			  char **to, size_t *tolen)
 {
+  int ret = -1;
+
 #ifdef HAVE_ICONV
-	static unsigned char buffer[BUFSIZE];
-    char *from_p, *to_p;
-	size_t from_left, to_left;
-	iconv_t cd;
+  ret = iconvert(fromcode, tocode, from, fromlen, to, tolen);
+  if (ret != -1)
+    return ret;
 #endif
 
-	if (!strcasecmp(encoding, "UTF-8")) {
-	    /* ideally some checking of the given string should be done */
-		*to = malloc(strlen(from) + 1);
-		strcpy(*to, from);
-		return 0;
-	}
-
-#ifdef HAVE_ICONV
-	cd = iconv_open("UTF-8", encoding);
-	if(cd == (iconv_t)(-1))
-	{
-		if(errno == EINVAL) {
-			/* if iconv can't encode from this encoding, try
-			 * simple_utf8_encode()
-			 */
-			return simple_utf8_encode(from, to, encoding);
-		} else {
-			perror("iconv_open");
-		}
-	}
-	
-	from_left = strlen(from);
-	to_left = BUFSIZE;
-	from_p = from;
-	to_p = buffer;
-	
-	if(iconv(cd, (ICONV_CONST char **)(&from_p), &from_left, &to_p, 
-				&to_left) == (size_t)-1)
-	{
-		iconv_close(cd);
-		switch(errno)
-		{
-		case E2BIG:
-			/* if the buffer is too small, try simple_utf8_encode()
-			 */
-			return simple_utf8_encode(from, to, encoding);
-		case EILSEQ:
-		case EINVAL:
-			return 3;
-		default:
-			perror("iconv");
-		}
-	}
-	else
-	{
-		iconv_close(cd);
-	}
-	*to = malloc(BUFSIZE - to_left + 1);
-	buffer[BUFSIZE - to_left] = 0;
-	strcpy(*to, buffer);
-	return 0;
-#else
-	return simple_utf8_encode(from, to, encoding);
+#ifndef HAVE_ICONV /* should be ifdef USE_CHARSET_CONVERT */
+  ret = charset_convert(fromcode, tocode, from, fromlen, to, tolen);
+  if (ret != -1)
+    return ret;
 #endif
+
+  return ret;
 }
 
-/*
- This implementation has the following limitations: The given charset must
- represent each glyph with exactly one (1) byte. No multi byte or variable
- width charsets are allowed. (An exception to this i UTF-8 that is passed
- right through.) The glyhps in the charsets must have a unicode value equal
- to or less than 0xFFFF (this inclues pretty much everything). For a complete,
- free conversion implementation please have a look at libiconv.
-*/
-int simple_utf8_encode(const char *from, char **to, const char *encoding)
+static int convert_string(const char *fromcode, const char *tocode,
+			  const char *from, char **to, char replace)
 {
-	/* can you always know this will be 16 bit? */
-	unsigned short *unicode;
-	charset_map *map;
-	int index = 0;
-	unsigned char c;
-	
-	unicode = calloc((strlen(from) + 1), sizeof(short));
-
-	map = get_map(encoding);
-	
-	if (map == NULL) 
-		return 1;
+  int ret;
+  size_t fromlen;
+  char *s;
 
-	c = from[index];
-	while(c)
-	{
-		unicode[index] = map->mapping[c];
-		index++;
-		c = from[index];
-	}
+  fromlen = strlen(from);
+  ret = convert_buffer(fromcode, tocode, from, fromlen, to, 0);
+  if (ret == -2)
+    return -1;
+  if (ret != -1)
+    return ret;
 
-	*to =  make_utf8_string(unicode);
-	free(unicode);
-	return 0;
+  s = malloc(fromlen + 1);
+  if (!s)
+    return -1;
+  strcpy(s, from);
+  *to = s;
+  for (; *s; s++)
+    if (*s & ~0x7f)
+      *s = replace;
+  return 3;
 }
 
-int utf8_decode(char *from, char **to, const char *encoding)
+int convert_to_utf8(const char *from, char **to)
 {
-#ifdef HAVE_ICONV
-	static unsigned char buffer[BUFSIZE];
-    char *from_p, *to_p;
-	size_t from_left, to_left;
-	iconv_t cd;
-	cd = iconv_open(encoding, "UTF-8");
-	if(cd == (iconv_t)(-1))
-	{
-		perror("iconv_open");
-	}
-	
-	from_left = strlen(from);
-	to_left = BUFSIZE;
-	from_p = from;
-	to_p = buffer;
-	
-	if(iconv(cd, (ICONV_CONST char **)(&from_p), &from_left, &to_p, 
-				&to_left) == (size_t)-1)
-	{
-		iconv_close(cd);
-		switch(errno)
-		{
-		case E2BIG:
-		case EILSEQ:
-		case EINVAL:
-			return 3;
-		default:
-			perror("iconv");
-		}
-	}
-	else
-	{
-		iconv_close(cd);
-	}
-	*to = malloc(BUFSIZE - to_left + 1);
-	buffer[BUFSIZE - to_left] = 0;
-	strcpy(*to, buffer);
-	return 0;
-#else
-	return 1;  /* Dummy stub */
-#endif /* HAVE_ICONV */
-}
+  char *charset;
 
-charset_map *get_map(const char *encoding)
-{
-	charset_map *map_p = maps;
-	while(map_p->name != NULL)
-	{
-		if(!strcasecmp(map_p->name, encoding))
-		{
-			return map_p;
-		}
-		map_p++;
-	}
-	return NULL;
+  if (!current_charset)
+    convert_set_charset(0);
+  charset = current_charset ? current_charset : "US-ASCII";
+  return convert_string(charset, "UTF-8", from, to, '#');
 }
 
-#endif /* The rest is used by everthing */
-
-char *make_utf8_string(const unsigned short *unicode)
+int convert_from_utf8(const char *from, char **to)
 {
-	int size = 0, index = 0, out_index = 0;
-	unsigned char *out;
-	unsigned short c;
-
-    /* first calculate the size of the target string */
-	c = unicode[index++];
-	while(c) {
-		if(c < 0x0080) {
-			size += 1;
-		} else if(c < 0x0800) {
-			size += 2;
-		} else {
-			size += 3;
-		}
-		c = unicode[index++];
-	}	
-
-	out = malloc(size + 1);
-	index = 0;
-
-	c = unicode[index++];
-	while(c)
-	{
-		if(c < 0x080) {
-			out[out_index++] = c;
-		} else if(c < 0x800) {
-			out[out_index++] = 0xc0 | (c >> 6);
-			out[out_index++] = 0x80 | (c & 0x3f);
-		} else {
-			out[out_index++] = 0xe0 | (c >> 12);
-			out[out_index++] = 0x80 | ((c >> 6) & 0x3f);
-			out[out_index++] = 0x80 | (c & 0x3f);
-		}
-		c = unicode[index++];
-	}
-	out[out_index] = 0x00;
+  char *charset;
 
-	return out;
+  if (!current_charset)
+    convert_set_charset(0);
+  charset = current_charset ? current_charset : "US-ASCII";
+  return convert_string("UTF-8", charset, from, to, '?');
 }
 
+#endif
diff -ru xiph.orig/vorbis-tools/vorbiscomment/vcomment.c xiph/vorbis-tools/vorbiscomment/vcomment.c
--- xiph.orig/vorbis-tools/vorbiscomment/vcomment.c	Wed Sep 26 20:28:05 2001
+++ xiph/vorbis-tools/vorbiscomment/vcomment.c	Sun Sep 30 19:04:03 2001
@@ -12,6 +12,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
+#include <locale.h>
 #include "getopt.h"
 #include "utf8.h"
 
@@ -24,7 +25,6 @@
         {"help",0,0,'h'},
         {"quiet",0,0,'q'},
         {"commentfile",1,0,'c'},
-    {"encoding", 1,0,'e'},
         {NULL,0,0,0}
 };
 
@@ -37,7 +37,6 @@
         int commentcount;
         char **comments;
         int tempoutfile;
-	char *encoding;
 } param_t;
 
 #define MODE_NONE  0
@@ -47,8 +46,8 @@
 
 /* prototypes */
 void usage(void);
-void print_comments(FILE *out, vorbis_comment *vc, char *encoding);
-int  add_comment(char *line, vorbis_comment *vc, char *encoding);
+void print_comments(FILE *out, vorbis_comment *vc);
+int  add_comment(char *line, vorbis_comment *vc);
 
 param_t	*new_param(void);
 void parse_options(int argc, char *argv[], param_t *param);
@@ -98,7 +97,7 @@
 
                 /* extract and display the comments */
                 vc = vcedit_comments(state);
-		print_comments(param->com, vc, param->encoding);
+		print_comments(param->com, vc);
 
                 /* done */
                 vcedit_clear(state);
@@ -128,7 +127,7 @@
 
                 for(i=0; i < param->commentcount; i++)
                 {
-			if(add_comment(param->comments[i], vc, param->encoding) < 0)
+			if(add_comment(param->comments[i], vc) < 0)
                                 fprintf(stderr, "Bad comment: \"%s\"\n", param->comments[i]);
                 }
 
@@ -139,7 +138,7 @@
                         char *buf = (char *)malloc(sizeof(char)*1024);
 
                         while (fgets(buf, 1024, param->com))
-				if (add_comment(buf, vc, param->encoding) < 0) {
+				if (add_comment(buf, vc) < 0) {
                                         fprintf(stderr,
                                                 "bad comment: \"%s\"\n",
                                                 buf);
@@ -177,14 +176,14 @@
 
 ***********/
 
-void print_comments(FILE *out, vorbis_comment *vc, char *encoding)
+void print_comments(FILE *out, vorbis_comment *vc)
 {
         int i;
     char *decoded_value;
 
         for (i = 0; i < vc->comments; i++)
     {
-	    if (utf8_decode(vc->user_comments[i], &decoded_value, encoding) == 0)
+	    if (convert_from_utf8(vc->user_comments[i], &decoded_value) >= 0)
         {
                     fprintf(out, "%s\n", decoded_value);
             free(decoded_value);
@@ -197,7 +196,7 @@
 /**********
 
    Take a line of the form "TAG=value string", parse it, convert the
-   value to UTF-8 from the specified encoding, and add it to the
+   value to UTF-8, and add it to the
    vorbis_comment structure. Error checking is performed.
 
    Note that this assumes a null-terminated string, which may cause
@@ -205,7 +204,7 @@
 
 ***********/
 
-int  add_comment(char *line, vorbis_comment *vc, char *encoding)
+int  add_comment(char *line, vorbis_comment *vc)
 {
         char	*mark, *value, *utf8_value;
 
@@ -234,7 +233,7 @@
         value++;
 
         /* convert the value from the native charset to UTF-8 */
-	if (utf8_encode(value, &utf8_value, encoding) == 0) {
+	if (convert_to_utf8(value, &utf8_value) >= 0) {
                 
                 /* append the comment and return */
                 vorbis_comment_add_tag(vc, line, utf8_value);
@@ -307,9 +306,6 @@
         param->comments=NULL;
         param->tempoutfile=0;
 
-	/* character encoding */
-	param->encoding = "ISO-8859-1";
-
         return param;
 }
 
@@ -327,7 +323,9 @@
         int ret;
         int option_index = 1;
 
-	while ((ret = getopt_long(argc, argv, "ae:lwhqc:t:",
+	setlocale(LC_ALL, "");
+
+	while ((ret = getopt_long(argc, argv, "alwhqc:t:",
                         long_options, &option_index)) != -1) {
                 switch (ret) {
                         case 0:
@@ -342,9 +340,6 @@
                                 break;
                         case 'a':
                                 param->mode = MODE_APPEND;
-				break;
-			case 'e':
-				param->encoding = strdup(optarg);
                                 break;
                         case 'h':
                                 usage();

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Daniel Resare

2001-Sep-30 13:21 UTC

[vorbis-dev] UTF-8 stuff

On Sun, Sep 30, 2001 at 08:46:26PM +0100, Edmund GRIMLEY EVANS wrote:
> Here's a proposed heavy-duty solution for your UTF-8 problems.
> 
As the author of the original UTF-8 patch I must say that this is a much
better solution. Please use it :)

cheers/d


-- 
begin:vcard fn:Daniel Resare tel;cell:+46739442044 tel;work:+468332040
adr;work:Scheelegatan 36; 112 28; Stockholm; Sweden end:vcard
pgp fingerprint: 8D97 F297 CA0D 8751 D8EB  12B6 6EA6 727F 9B8D EC2A


Edmund GRIMLEY EVANS

2001-Oct-01 00:54 UTC

[vorbis-dev] UTF-8 stuff

Jack Moffitt <jack@xiph.org>:
> > utf8.h, utf8.c: These files are totally rewritten, apart from the
> > Windows part. Instead of utf8_encode() and utf8_decode() there's
> > convert_to_utf8() and convert_from_utf8(), which no longer have an
> > "encoding" argument.
> 
> The convention @xiph.org is generally that the module name be first in
> the function.  It's easy enough to rename these so that utf8 is again at
> the beginning.

That's a good convention, and I can't now remember why I changed the
names. I suggest you rename them right back to utf8_encode() and
utf8_decode().

Edmund


Michael Smith

2001-Oct-01 21:14 UTC

[vorbis-dev] UTF-8 stuff

At 08:46 PM 9/30/01 +0100, you wrote:
>Here's a proposed heavy-duty solution for your UTF-8 problems.

Thanks very much. This code looks great. I've committed it, though I'm
fairly certain it won't work on Windows (missing headers, etc.), but
that'll be easily fixed once a Windows person comes along.

It'd be very much appreciated if everyone (and if you're on a
non-English system, even better!) tested this thoroughly in the next
few days.

Michael

