Hi, I am working on an app that reads a .csv file and loads each cell as a row into a database. However much I discourage users to copy and past curly quotes from MS Word into their .csv, I need to build in a way to handle this. Using the create method, my generated insert statement will include values like this for cells with questionable strings: E''Dido’s Woman'' And the E'' means that there should be an escaped character in here, but Ruby doesn''t escape the character. Any idea how to fix this? I have tried using the Iconv library to convert, but that doesn''t solve the escaped character issue. -- Posted via http://www.ruby-forum.com/.
Dan Berger wrote:> Hi, I am working on an app that reads a .csv file and loads each cell as > a row into a database. However much I discourage users to copy and past > curly quotes from MS Word into their .csv, I need to build in a way to > handle this. > > Using the create method, my generated insert statement will include > values like this for cells with questionable strings: > > E''Dido’s Woman'' > > And the E'' means that there should be an escaped character in here, but > Ruby doesn''t escape the character.Are the characters getting into the database properly? Is your database encoding set to UTF8? Does your database.yml file specify UTF8 encoding? What exactly are the problems you''re seeing? Specific errors or corrupted data would be helpful.> > Any idea how to fix this? I have tried using the Iconv library to > convert, but that doesn''t solve the escaped character issue.Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
Marnen Laibow-Koser wrote:> > Are the characters getting into the database properly?No, the system is chocking at this character> > Is your database encoding set to UTF8?Yes> > Does your database.yml file specify UTF8 encoding?No....> > What exactly are the problems you''re seeing? Specific errors or > corrupted data would be helpful. >Here is my error: RuntimeError: ERROR C22021 Minvalid byte sequence for encoding "UTF8": 0x92 HThis error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". Fwchar.c L1534: INSERT INTO "table_cells" ("row_id", "updated_at", "col_span", "table_meta_id", "additional_info", "content", "column_id", "created_at") VALUES(2, ''2009-10-21 19:36:14.440000'', 1, 154, NULL, E''Dido’s Woman'', 3, ''2009-10-21 19:36:14.440000'') Another error: IOError (closed stream): I believe the problem is that the ’ in Dido’s is not being escaped, despite the E'' indicating that it should/will be. Thanks, Dan -- Posted via http://www.ruby-forum.com/.
Dan Berger wrote:> Marnen Laibow-Koser wrote: >> >> Are the characters getting into the database properly? > No, the system is chocking at this character >> >> Is your database encoding set to UTF8? > Yes >> >> Does your database.yml file specify UTF8 encoding? > No....That could be your problem, then. Change it and restart the app.>> >> What exactly are the problems you''re seeing? Specific errors or >> corrupted data would be helpful. >> > Here is my error: > RuntimeError: ERROR C22021 Minvalid byte sequence for encoding "UTF8": > 0x92 HThis error can also happen if the byte sequence does not match > the encoding expected by the server, which is controlled by > "client_encoding". Fwchar.c L1534: INSERT INTO "table_cells" > ("row_id", "updated_at", "col_span", "table_meta_id", "additional_info", > "content", "column_id", "created_at") VALUES(2, ''2009-10-21 > 19:36:14.440000'', 1, 154, NULL, E''Dido’s Woman'', 3, ''2009-10-21 > 19:36:14.440000'') > > Another error: > IOError (closed stream): > > I believe the problem is that the ’ in Dido’s is not being escaped, > despite the E'' indicating that it should/will be. >Well, the error you''re getting would indicate an encoding mismatch between the app and the DB -- hardly surprising if you''re not telling the app about the DB encoding. So fix your database.yml and see what happens.> Thanks, > DanBest, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
OK, I added this line to my database.yml file: development: encoding: utf8 ... ... And that provides the same results. When I now call this on each string: str = Iconv.conv(''UTF-8//TRANSLIT'',''ISO-8859-1'',str) those curly quotes are now preceded by a "Â", which goes into the database like this -- not what I need, either. It''s as if this "Â" is what is being used to escape the funny characters. [0mINSERT INTO "table" ("row_id", "updated_at", "col_span", "table_meta_id", "additional_info", "content", "column_id", "created_at") VALUES(2, ''2009-10-21 20:48:07.410000'', 1, 158, NULL, E''DidoÂ’s Woman'', 3, ''2009-10-21 20:48:07.410000'') Thanks! -- Posted via http://www.ruby-forum.com/.
Dan Berger wrote:> OK, I added this line to my database.yml file: > > development: > encoding: utf8 > ... > ... > > And that provides the same results. When I now call this on each string: > > str = Iconv.conv(''UTF-8//TRANSLIT'',''ISO-8859-1'',str) > > those curly quotes are now preceded by a "Â", which goes into the > database like this -- not what I need, either. It''s as if this "Â" is > what is being used to escape the funny characters.No, that''s an extra byte that''s there because UTF-8 is a multibyte encoding. -- not really an escape character. But now that your database and app encodings are identical, try taking out the Iconv call; it will probably not be necessary. If that doesn''t work, then you may need to call Iconv with reversed arguments -- after all, it should be converting *to* UTF-8. In any case, I would strongly suggest that, as a first step, you make sure that you can round-trip characters between the app and the DB. Then verify the file uploads. Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
Again, thanks. I did pull out the Iconv, and still, the same results. Using a strait form, yes, I can get a curly quote in and out. Just not when that curly quote is in a .csv, which I parse into an array of arrays, like this: [["a", "b", "c"],["cell1", "cell2", "cell3"]] and then input into the database like this: Table.create(:table_attrib => var) do |tc| tc.column_id = col_id tc.row_id = r_id tc.content = cell end -- Posted via http://www.ruby-forum.com/.
> Again, thanks. I did pull out the Iconv, and still, the same results. > Using a strait form, yes, I can get a curly quote in and out. Just not > when that curly quote is in a .csv, which I parse into an array of > arrays, like this: > > [["a", "b", "c"],["cell1", "cell2", "cell3"]] and then input into the > database like this: > > Table.create(:table_attrib => var) do |tc| > tc.column_id = col_id > tc.row_id = r_id > tc.content = cell > endTry getting the fields you want and putting them in directly, ignoring any CSV stuff... That is, get them out of the CSV file, but don''t let your CSV library mess with them. Maybe that''s the culprit. You might also fiddle around with this Iconv line... Iconv.new(''UTF-8//Ignore'', ''UTF-8'').iconv("your string here") I''ve had to do that from time to time to try to fixup borked UTF characters... The //Ignore will drop anything it can''t convert... if my memory is right. Been awhile. -philip
Dan Berger wrote:> Again, thanks. I did pull out the Iconv, and still, the same results. > Using a strait form, yes, I can get a curly quote in and out.OK. Then your app and DB are talking the same encoding. Great. (And remember to specify UTF8 in your HTML headers.)> Just not > when that curly quote is in a .csv,Well, yeah, the uploaded file could be in a different encoding, so you probably should run it through Iconv before doing any processing. Let me check the Iconv docs and see if I can find anything useful. Or you could encourage your users to save their files in UTF-8, or perhaps to paste them into a form instead of uploading (which should bypass the issue entirely, assuming your content-type headers are correct). Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
OK, we solved it with a handful or regexs cell.gsub!("\222", "''") cell.gsub!("\226", "—") cell.gsub!("\223", ''"'') cell.gsub!("\221", "''") cell.gsub!("\224", ''"'') Thanks! -- Posted via http://www.ruby-forum.com/.
A good LART is also helpful in getting users to stop pasting Word''s garbage quotes... :) --Matt Jones On Oct 22, 10:27 am, Dan Berger <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:> OK, we solved it with a handful or regexs > > cell.gsub!("\222", "''") > cell.gsub!("\226", "—") > cell.gsub!("\223", ''"'') > cell.gsub!("\221", "''") > cell.gsub!("\224", ''"'') > > Thanks! > > -- > Posted viahttp://www.ruby-forum.com/.
Dan Berger wrote:> OK, we solved it with a handful or regexs > > cell.gsub!("\222", "''") > cell.gsub!("\226", "—") > cell.gsub!("\223", ''"'') > cell.gsub!("\221", "''") > cell.gsub!("\224", ''"'')This seems like a very bad solution -- particularly the 2nd line, which is pre-escaped HTML instead of a literal character. Please read my previous message in this thread for better ideas.> > Thanks!Best, -- Marnen Laibow-Koser http://www.marnen.org marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org -- Posted via http://www.ruby-forum.com/.
Philip Hallstrom wrote:>> tc.row_id = r_id >> tc.content = cell >> end > > Try getting the fields you want and putting them in directly, ignoring > any CSV stuff... That is, get them out of the CSV file, but don''t let > your CSV library mess with them. Maybe that''s the culprit. > > You might also fiddle around with this Iconv line... > > Iconv.new(''UTF-8//Ignore'', ''UTF-8'').iconv("your string here") > > I''ve had to do that from time to time to try to fixup borked UTF > characters... > > The //Ignore will drop anything it can''t convert... if my memory is > right. Been awhile. > > -philipI found next solve of this problem: Iconv.new(''UTF-8'', ''Windows-1250'').iconv(text) because charcters as 0x92 are used in Microsoft Word. -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.