I am switching to a composite primary key (string and user ID) from the Rails conventional auto-incrementing integer primary ID. The table is large (2.5 million records) and I''d rather not discard the contents. The composite_primary_key gem doesn''t appear to support altering the table with a migration to do its magic, only creating a table from scratch. So I dumped the table with mysqldump, ran the migration (table looks good), and am trying to repopulate the table. It has accented characters and is complaining about duplicates, apparently around words with and without accents, e.g., ''jose'' and ''josé''. I''ve been deleting one by hand from the dump, but it is tedious and very slow. Emacs crawls when dealing with very large files with very long lines. I just don''t understand why the accents are causing problems. The string column is utf8_general_ci collation, just like other fields in the database with strings with accents. What do I need to specify so it will import the dump? Is there a problem with strings with accents in composite indexes? The table is created with a Rails migration, but everything else is pure MySQL utilities. TIA, Jeffrey
Ensure your database.yml file has a line like: encoding: utf8 On May 26, 10:50 am, "Jeffrey L. Taylor" <r...-f/t7CGFWhwGcvWdFBKKxig@public.gmane.org> wrote:> I am switching to a composite primary key (string and user ID) from the Rails > conventional auto-incrementing integer primary ID. The table is large (2.5 > million records) and I''d rather not discard the contents. The > composite_primary_key gem doesn''t appear to support altering the table with a > migration to do its magic, only creating a table from scratch. So I dumped > the table with mysqldump, ran the migration (table looks good), and am trying > to repopulate the table. It has accented characters and is complaining about > duplicates, apparently around words with and without accents, e.g., ''jose'' and > ''josé''. I''ve been deleting one by hand from the dump, but it is tedious and > very slow. Emacs crawls when dealing with very large files with very long > lines. > > I just don''t understand why the accents are causing problems. The string > column is utf8_general_ci collation, just like other fields in the database > with strings with accents. What do I need to specify so it will import the > dump? Is there a problem with strings with accents in composite indexes? > > The table is created with a Rails migration, but everything else is pure MySQL > utilities. > > TIA, > Jeffrey
Quoting BenH <benhami-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:> > Ensure your database.yml file has a line like: > > encoding: utf8 >Thank you. It has for over a year. The data dumped was UTF-8, but something is preventing it from being re-imported. Sigh, Jeffrey
Quoting Jeffrey L. Taylor <ror-f/t7CGFWhwGcvWdFBKKxig@public.gmane.org>:> > I am switching to a composite primary key (string and user ID) from the Rails > conventional auto-incrementing integer primary ID. The table is large (2.5 > million records) and I''d rather not discard the contents. The > composite_primary_key gem doesn''t appear to support altering the table with a > migration to do its magic, only creating a table from scratch. So I dumped > the table with mysqldump, ran the migration (table looks good), and am trying > to repopulate the table. It has accented characters and is complaining about > duplicates, apparently around words with and without accents, e.g., ''jose'' and > ''josé''. I''ve been deleting one by hand from the dump, but it is tedious and > very slow. Emacs crawls when dealing with very large files with very long > lines. > > I just don''t understand why the accents are causing problems. The string > column is utf8_general_ci collation, just like other fields in the database > with strings with accents. What do I need to specify so it will import the > dump? Is there a problem with strings with accents in composite indexes? >The answer is mixed and bits of it don''t make sense. If I work with the MySQL client (mysql), I can create records with primary keys identical, except for an accent. Doing the same thing in the Rails console throws an DuplicateKey exception. Using mysqldump to dump the contents of a table, changing the table so the primary key is a integer-string composite and trying to repopulate the table from the mysqldump output also barfs on a duplicate key exception. It is non-optimal, but I can live for the moment with a table with two non-unique indexes. Jeffrey
Jeffrey L. Taylor wrote:>> The string column is utf8_general_ci collation, ...As far as I understand, utf8_general_ci is case-insensitive. Maybe, you want to give utf8_bin a shot. mysql> create table t(i int, v varchar(10) collate utf8_general_ci, constraint t_pk primary key(i, v)); Query OK, 0 rows affected (0.00 sec) mysql> insert into t select 1, ''jose'' union all select 1, ''josé''; ERROR 1062 (23000): Duplicate entry ''1-josé'' for key 1 mysql> drop table t; Query OK, 0 rows affected (0.00 sec) mysql> create table t(i int, v varchar(10) collate utf8_bin, constraint t_pk primary key(i, v)); Query OK, 0 rows affected (0.00 sec) mysql> insert into t select 1, ''jose'' union all select 1, ''josé''; Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> select * from t; +---+-------+ | i | v | +---+-------+ | 1 | jose | | 1 | josé | +---+-------+ 2 rows in set (0.00 sec) -- Posted via http://www.ruby-forum.com/.
Wisccal Wisccal wrote:> Jeffrey L. Taylor wrote: > > As far as I understand, utf8_general_ci is case-insensitive.I meant to say "accent-insensitive"... -- Posted via http://www.ruby-forum.com/.
Quoting Wisccal Wisccal <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org>:> > Wisccal Wisccal wrote: > > Jeffrey L. Taylor wrote: > > > > As far as I understand, utf8_general_ci is case-insensitive. > > I meant to say "accent-insensitive"...Thank you. I didn''t think MySQL would messup that badly. I''ll try this out this week. I just checked and my development and production machines have different collation, which explains the inconsistent results I was seeing in my tests. Jeffrey