I am using CSV in a rake task (db:seed) on Rails 3.0.3, Ruby 1.9.2 to read a file with some funny chars in it. Upon breaking in at a point where the row read using CSV is in variable row, with the string with the char in row['price'], I get the following strange results which I cannot understand.

(rdb:1) row['price']
"\xA32.00"
(rdb:1) row['price'][0]
"\xA3"
(rdb:1) row['price'][0] == "\xA3"
false
(rdb:1) row['price'][0].each_byte{|c| print c, ' '}
163 "\xA3"
(rdb:1) "\xA3".each_byte{|c| print c, ' '}
163 "\xA3"
(rdb:1) "\xA3".class
String
(rdb:1) row['price'][0].class
String
(rdb:1) row['price'][0] <=> "\xA3"
-1
(rdb:1) "\xA3" <=> row['price'][0]
1
(rdb:1) row['price'][0].length
1
(rdb:1) "\xA3".length
1

So it appears that "\xA3" and row['price'][0] are both strings of length 1 and both contain the byte value 163, yet "\xA3" is definitely greater than row['price'][0].
If I do c1 = row['price'][0] and c2 = "\xA3" I still get the same effect. The variables c1 and c2 contain the same data but are different when compared.

No doubt I am doing something stupid; if someone could point out what, then I would be most grateful.

Colin

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Are you reading from XML to CSV? Line feed (newline) is &#10;; its hexadecimal representation is &#xA;.

--
Posted via http://www.ruby-forum.com/.
On 5 December 2010 21:02, Colin Law <clanlaw-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> [...]
> So it appears that "\xA3" and row['price'][0] are both strings of
> length 1 and both contain the byte value 163 yet "\xA3" is definitely
> greater than row['price'][0]
> If I do c1 = row['price'][0] and c2 = "\xA3" I still get the same
> effect. The variables c1 and c2 contain the same data but are
> different when compared.

I woke up in the middle of the night and realised that this must be an encoding issue.

If I check the encoding of the two strings then I see that '\xa3' is UTF-8 but the data read by CSV is ASCII-8BIT.

(rdb:1) '\xa3'.encoding.name
"UTF-8"
(rdb:1) row['price'].encoding.name
"ASCII-8BIT"

This makes sense as CSV is reading an ASCII text file. So it appears that in Ruby 1.9.2 two strings that have the same contents and display the same, but are of different encodings, do not compare equal. Whether they should compare or not I do not know.

Colin
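The comparison behaviour described in this message can be reproduced outside the debugger. The following is a minimal sketch, assuming Ruby 1.9 or later; the method name compare_pound_byte is made up for illustration and does not come from the thread. It builds the same single byte (163) under both encodings and records how the two strings compare.

```ruby
# Hypothetical helper (not from the original seeds.rb): builds the byte 0xA3
# under two encodings and reports how the resulting strings compare.
def compare_pound_byte
  utf8   = "\xA3".dup.force_encoding('UTF-8')       # like the literal typed in the debugger
  binary = "\xA3".dup.force_encoding('ASCII-8BIT')  # like the value CSV returned in the thread

  {
    :same_bytes     => utf8.bytes.to_a == binary.bytes.to_a,
    :equal          => utf8 == binary,
    # Relabelling a copy as UTF-8 changes only the encoding tag, not the bytes.
    :retagged_equal => utf8 == binary.dup.force_encoding('UTF-8'),
    # Pure-ASCII content is comparable across ASCII-compatible encodings.
    :ascii_equal    => "2.00".dup.force_encoding('UTF-8') ==
                       "2.00".dup.force_encoding('ASCII-8BIT')
  }
end

result = compare_pound_byte
puts result.inspect
```

Here same_bytes and retagged_equal are true while equal is false, which matches the debugger session. ascii_equal is true because strings containing only 7-bit ASCII compare by bytes regardless of their encoding tags, so the mismatch only shows up once a non-ASCII byte such as 0xA3 is involved.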
On 6 December 2010 09:30, Colin Law <clanlaw-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> On 5 December 2010 21:02, Colin Law <clanlaw-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>> [...]
>> So it appears that "\xA3" and row['price'][0] are both strings of
>> length 1 and both contain the byte value 163 yet "\xA3" is definitely
>> greater than row['price'][0]
>> If I do c1 = row['price'][0] and c2 = "\xA3" I still get the same
>> effect. The variables c1 and c2 contain the same data but are
>> different when compared.
>
> I woke up in the middle of the night and realised that this must be an
> encoding issue.
>
> If I check the encoding of the two strings then I see that '\xa3' is
> utf-8 but the data read by csv is ascii-8bit.
>
> (rdb:1) '\xa3'.encoding.name
> "UTF-8"
> (rdb:1) row['price'].encoding.name
> "ASCII-8BIT"
>
> This makes sense as CSV is reading an ascii text file. So it appears
> that in ruby 1.9.2 two strings that have the same contents and display
> the same, but are of different encodings, do not compare equal.
> Whether they should compare or not I do not know.

Just in case anybody has a similar problem and finds this in the future, here is what I had to do to sort out the issue. I needed to convert \xA3 chars in the ASCII data read by CSV into UK pound signs. I had the same encoding issues with the regular expression, and this is what I had to do to achieve the desired effect.

At the top of the file (seeds.rb):

#encoding: utf-8
...
regex = Regexp.new( "\xA3".force_encoding('ASCII-8BIT') )

Then to do the sub:

row['price'] = row['price'].gsub( regex, '£'.force_encoding('ASCII-8BIT') )

Then when it came to updating the ActiveRecord object with the data read by CSV, I had to force it to UTF-8:

model.price = row['price'].force_encoding('UTF-8')

This all works, but I have to say that I am not sure that I fully understand all the encoding issues, so there may well be better ways.

Colin
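One possible alternative to the regex substitution above, offered as a sketch rather than as the method used in the thread: byte 0xA3 is the pound sign in ISO-8859-1, so if the file really is in that encoding (an assumption, since the thread never states the file's encoding), the raw string can be relabelled and transcoded in one step. The helper name latin1_to_utf8 is hypothetical.

```ruby
# Hypothetical helper (not from the thread): treats the raw bytes as
# ISO-8859-1, which is an assumption about the file, and transcodes to UTF-8.
def latin1_to_utf8(raw)
  # dup so the caller's string keeps its original encoding tag
  raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Stand-in for the ASCII-8BIT value CSV returned in the thread.
raw_price = "\xA32.00".dup.force_encoding('ASCII-8BIT')
price = latin1_to_utf8(raw_price)

puts price.encoding.name
puts price.valid_encoding?
puts price.bytes.to_a.inspect
```

Unlike the single-character gsub, this converts every high byte in the field, which is only correct if the whole file shares that encoding. CSV's reader methods also accept an :encoding option (for example 'ISO-8859-1:UTF-8'), which may allow the same conversion to happen at read time; that has not been checked against the setup described here.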