Hi,
I am using the following code to upload a CSV file and store it in my
SQLite database:
require "csv"

@newhash = []   # rows collected for the new record
file = params[:file]
temp = CSV.new(file.tempfile, :headers => false, :col_sep => ";")
temp.each do |row|
  @newhash << {:var1 => row[0], :var2 => row[1]}
end
Finally I create a new record from @newhash, but before I get that far I
get an error whenever a row contains a special character:
"invalid byte sequence in UTF-8"
The special characters are German umlauts: ä, ö, ü
Without these characters, my code works!
How can I avoid the error by using the right encoding?
--
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.
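A minimal sketch of what the error means, assuming the file was saved as ISO-8859-1 (Latin-1), where ü is the single byte 0xFC. That byte followed by "s" is not a valid UTF-8 sequence, so a string holding it and tagged as UTF-8 is "broken", and operations such as regexp matching raise exactly this ArgumentError. The sample string is hypothetical; the fix shown is to tag the bytes with their real encoding and transcode them with String#encode.

```ruby
# Hypothetical sample: "Müsli" as saved by an editor using ISO-8859-1,
# where ü is the single byte 0xFC.
latin1_bytes = "M\xFCsli".b

# Reading those bytes as UTF-8 yields a string Ruby knows is broken.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?    # false

# Regexp matching on a broken string raises the error from this thread.
error_message =
  begin
    as_utf8 =~ /s/
    nil
  rescue ArgumentError => e
    e.message
  end
puts error_message              # invalid byte sequence in UTF-8

# Tagging the bytes with their real encoding and transcoding fixes it.
fixed = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts fixed == "M\u00FCsli"      # true
```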
It is not only the CSV upload!
If I just add the following to my controller:
# encoding: utf-8
class HomeController < ApplicationController
  def index
    @u = "Müsli"
  end
end
The error still remains: "invalid byte sequence in UTF-8"
I don't understand that, because the file says it is already
"UTF-8"...
If I use a special character directly in a view, it works, e.g.
<%"Müsli"%>
I read somewhere that the gvim editor saves files in a Latin encoding
by default. Could that be related to my issue?
Cheers,
Sebastian
(Reply to Sebastian, 20 Jun., 11:46)
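The "# encoding: utf-8" magic comment only tells Ruby how to interpret the bytes already in the file; it does not convert them. If the editor wrote the file as Latin-1, the ü in "Müsli" is stored as the byte 0xFC and the literal is still invalid UTF-8. A small sketch for spotting such files, assuming the sources live under app/ (the glob pattern and the helper name non_utf8_files are illustrative, not part of Rails):

```ruby
# Lists Ruby source files whose raw bytes are not valid UTF-8.
# The default glob pattern is an assumption about the project layout.
def non_utf8_files(pattern = "app/**/*.rb")
  Dir.glob(pattern).reject do |path|
    File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  end
end
```

Run it from the application root (for example puts non_utf8_files) and re-save any listed file as UTF-8; in Vim, :set fileencoding=utf-8 followed by :w should do that.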
Update: If I put #encoding: CP850 or #encoding: iso-8859-1 in my controller, the error message no longer appears, but the special character ü is replaced by a question mark. It looks like this: M�sli
I thought UTF-8 was able to handle German special characters. I spent the whole day on this and still haven't found a solution. I really hope that someone can help me.
Cheers,
Sebastian
(Reply to Sebastian, 21 Jun., 14:12)
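For reference, the same letter ü is stored as different bytes depending on the encoding. This is probably why declaring CP850 or ISO-8859-1 made the error go away: every byte is valid in those single-byte encodings, but when the declared table does not match what the editor actually wrote, or the result is sent to a browser expecting UTF-8 without transcoding, the character comes out wrong (the M�sli above). The three encodings listed are simply the ones mentioned in this thread.

```ruby
# Hex dump of the bytes used for a string in a given encoding.
def hex_bytes(str, encoding)
  str.encode(encoding).bytes.map { |b| format("%02X", b) }.join(" ")
end

%w[UTF-8 ISO-8859-1 CP850].each do |enc|
  puts "#{enc}: #{hex_bytes("\u00FC", enc)}"
end
# UTF-8: C3 BC
# ISO-8859-1: FC
# CP850: 81
```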
Hi Sebastian,
I personally haven't had to deal with encoding issues yet, but I remember reading a couple of posts by Yehuda Katz (of Merb fame and a core contributor to Rails) on that topic. Maybe these can help you identify and fix your problem:
http://yehudakatz.com/2010/05/17/encodings-unabridged/
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
The articles are a little long, but if you already know a good deal about encodings, you can skip towards the end of the posts, where he writes about how to deal with conversions.
Hi Chirag,
thank you for the links. I will read them and see if there is something that can help me.
I found out that the main problem was that my gvim editor did not save my *.rb files in UTF-8 encoding. I just edited them with Notepad, saved them explicitly as UTF-8, and then the German special characters worked in my controllers.
There is still the problem with the CSV class, which I need to import a CSV file. This class is not able to read the special characters. I found the documentation here:
http://www.ruby-doc.org/ruby-1.9/classes/CSV.html#M001340
There is something about encodings, but I don't understand how to use it!
Sebastian
(Reply to Chirag Singhal, 21 Jun., 14:34)
Chirag Singhal
2011-Jun-22 06:58 UTC
Re: Re: Problem with special characters and CSV upload
Per the documentation, you can probably do something like this:
file = params[:file]
CSV.open(file.tempfile, "rb:UTF-32BE:UTF-8", :headers => false, :col_sep => ";") do |csv|
  # CSV.open yields the CSV object itself, so iterate over its rows
  csv.each do |row|
    @newhash << {:var1 => row[0], :var2 => row[1]}
  end
end
Replace "UTF-32BE" with the encoding of your incoming file and "UTF-8"
with the encoding you want to parse/store your data in.
(Reply to Sebastian, Wed, Jun 22, 2011 at 12:11 PM)
--
Chirag
http://sumeruonrails.com
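A self-contained sketch of the mode-string approach, using a hypothetical CP850 sample file written to a temporary directory (CP850 is assumed here because that is what Sebastian's files report later in the thread). "rb:CP850:UTF-8" means: open in binary mode, treat the bytes as CP850, and transcode them to UTF-8 while reading. Because CSV.open yields the CSV object rather than individual rows, the rows are read inside the block.

```ruby
require "csv"
require "tmpdir"

# Hypothetical sample: a semicolon-separated file saved in CP850.
sample = "M\u00FCsli;k\u00FChler\n\u00E4pfel;\u00F6l\n".encode("CP850")

rows = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.csv")
  File.binwrite(path, sample)   # write the raw CP850 bytes unchanged

  # Read as CP850, transcode to UTF-8; the block's value is returned.
  CSV.open(path, "rb:CP850:UTF-8", :col_sep => ";") { |csv| csv.read }
end
```

With an upload, the same call should work on the tempfile's path, provided the file really is CP850.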
I tried both of these:
CSV.open(file.tempfile, "rb:CP850:UTF-8", {:headers => false, :col_sep => ";"})
CSV.open(file.tempfile, {:headers => false, :col_sep => ";", :encoding => "rb:CP850:UTF-8"})
It says 'No file to upload'!
I used the following code to show the encoding of my file:
utf8 = File.open("test.csv")
puts utf8.external_encoding.name
It says CP850.
I just opened my CSV file in Notepad and saved it with UTF-8 encoding;
after that my original code worked perfectly and the special characters
were shown normally.
Sebastian
(Reply to Chirag Singhal, 22 Jun., 08:58)
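IO#external_encoding does not inspect a file's contents: for a file opened for reading without an explicit encoding, it just reports Encoding.default_external, the encoding Ruby will assume. On a German Windows setup with Ruby 1.9 that default was likely derived from the console code page, CP850, which would explain why every file reports CP850. A small sketch showing two differently encoded files reporting the same value (reported_encoding is a hypothetical helper name):

```ruby
require "tmpdir"

# Returns the encoding Ruby *assumes* for the file, not a detected one.
def reported_encoding(path)
  File.open(path) { |f| f.external_encoding }
end

Dir.mktmpdir do |dir|
  utf8_path   = File.join(dir, "utf8.csv")
  latin1_path = File.join(dir, "latin1.csv")
  File.binwrite(utf8_path, "M\u00FCsli\n")    # ü stored as C3 BC
  File.binwrite(latin1_path, "M\xFCsli\n".b)  # ü stored as FC

  # Both lines print the same value: Encoding.default_external.
  puts reported_encoding(utf8_path)
  puts reported_encoding(latin1_path)
end
```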
How about this:
file = params[:file]
# the :encoding option takes "external:internal", without the "rb" mode prefix
CSV.foreach(file.tempfile, :encoding => "UTF-32BE:UTF-8", :headers => false, :col_sep => ";") do |row|
  @newhash << {:var1 => row[0], :var2 => row[1]}
end
Exactly the same error! It says 'No file to upload'.
The documentation says that the encoding option for CSV is available for CSV.open, CSV.foreach, CSV.read and CSV.readlines, so both open and foreach should work, shouldn't they?
(Reply to Chirag Singhal, 22 Jun., 10:26)
What does file.tempfile return? If it is a File object, then we have a problem: we need to pass in the file path here. So call path on the file object and pass that as the first argument.
file.tempfile is an object. I have a form where a CSV file can be uploaded, but it is never stored. That's why I use tempfile, which means that I probably have no path to use in that method.
BUT, the open and foreach methods of the CSV class work with the object whenever there is no German special character in my CSV file, or when the file is already UTF-8 encoded.
(Reply to Chirag Singhal, 22 Jun., 12:05)
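A Tempfile does have a real path on disk (its path method) even if the upload is never stored permanently. Alternatively, one option that needs no path is to tell the already-open IO how to decode its bytes with IO#set_encoding before handing it to CSV.new. A sketch under two assumptions: the upload really is CP850, and the tempfile is in binary mode, as Rack's multipart handling typically leaves it. The Tempfile below is a hypothetical stand-in for params[:file].tempfile.

```ruby
require "csv"
require "tempfile"

# Stand-in for params[:file].tempfile: a binary-mode Tempfile holding CP850 bytes.
upload = Tempfile.new(["upload", ".csv"])
upload.binmode
upload.write("M\u00FCsli;k\u00FChler\n".encode("CP850"))
upload.rewind

# External encoding CP850 (what the bytes are), internal UTF-8 (what we want).
upload.set_encoding("CP850", "UTF-8")
rows = CSV.new(upload, :col_sep => ";").read

upload.close!   # close and delete the temporary file
```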
OK,
it worked perfectly once I made sure that my CSV file was UTF-8 encoded.
I deleted part of my program, so I had to write a lot of it again.
Now, whenever I upload a CSV file in UTF-8 format, the first row starts with these three characters: \xEF\xBB\xBF
I read that this has something to do with Unicode and byte ordering, but I don't know where these hex chars come from.
Also, every German special character is shown as hex codes, e.g. "k\xC3\xBChler" should be "kühler".
If I use files in other encodings, these three chars are not at the beginning, but every special char is shown as "?".
Does anyone have an idea where this comes from?
Cheers,
Sebastian
(Reply to Sebastian, 22 Jun., 13:26)
On Jul 1, 11:48 am, Sebastian wrote:
> If I now upload a csv file which is in utf-8 format and then I have
> every time in the first row that the first three character are: \xEF
> \xBB\xBF
That's a UTF BOM (byte-order mark): a magic Unicode character that tells whoever is reading the stream what the endianness is, and also lets you tell UTF-8 apart from UTF-16.
You can safely strip it from the file.
> Also every german special character is also shown in this hex code,
> e.g. "k\xC3\xBChler" should be "kühler"
That is probably just an output thing if you are seeing this in a terminal window: \xC3\xBC is the UTF-8 sequence for ü.
Fred
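To illustrate Fred's point: the bytes C3 BC are already correct UTF-8 for ü. The \xC3\xBC display appears when the string is tagged as binary (ASCII-8BIT), which is what you typically get when reading from a file opened in binary mode, such as an uploaded tempfile. The sample word is taken from the thread; the rest is a sketch.

```ruby
text = "k\u00FChler"    # "kühler" as a proper UTF-8 string
raw  = text.b           # the same bytes, tagged ASCII-8BIT (binary)

puts raw.inspect        # "k\xC3\xBChler" (binary strings show high bytes as \x..)
puts text.bytes.map { |b| format("%02X", b) }.join(" ")  # 6B C3 BC 68 6C 65 72

# Re-tagging the bytes as UTF-8 is enough; no transcoding is needed.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed == text      # true
```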
Thank you for your reply!
Stripping the first chars is possible of course, but I don't understand why these chars are there.
It was working before! I could just upload the UTF-8 CSV and everything worked great. I don't really know what I changed that makes these chars appear now.
Sebastian
(Reply to Frederick Cheung, 1 Jul., 15:12)
Walter Lee Davis
2011-Jul-04 16:31 UTC
Re: Re: Problem with special characters and CSV upload
Unicode uses them to indicate to the application reading the text file which order the following bytes are in. Since UTF-8 uses compound characters to indicate the scary-high end of the Unicode character table (two bytes are needed to encode some characters), the order that the bits arrive in is of critical importance. Text files may be little-endian or big-endian, and unless you know what order to expect, you can't really know.
Walter
(Reply to Sebastian, Jul 4, 2011, at 3:02 AM)
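A quick comparison of the byte-order mark in a few encodings. The BOM is the character U+FEFF. In UTF-16 its two bytes reveal the byte order, while UTF-8 has a single fixed byte order, so there the BOM is always the same three bytes, EF BB BF (the sequence Sebastian saw), and mainly acts as a signature that the file is UTF-8. This is a sketch for comparison only.

```ruby
# Hex bytes of a string after encoding it with the given encoding.
to_hex = ->(str, enc) { str.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ") }

table = %w[UTF-8 UTF-16BE UTF-16LE].map do |enc|
  [enc, { bom: to_hex.("\uFEFF", enc), u_umlaut: to_hex.("\u00FC", enc) }]
end.to_h

table.each { |enc, hex| puts "#{enc}: BOM = #{hex[:bom]}, u-umlaut = #{hex[:u_umlaut]}" }
# UTF-8: BOM = EF BB BF, u-umlaut = C3 BC
# UTF-16BE: BOM = FE FF, u-umlaut = 00 FC
# UTF-16LE: BOM = FF FE, u-umlaut = FC 00
```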
OK,
thank you for your reply! In the meantime I figured out why my first
version of the code worked without errors!
In that version I had some regex checks before saving each row to the
database. That means the first row always got skipped, because the
Unicode identifiers (the BOM) didn't match the regex.
Now I know where my mistake is, but I don't really know how to solve it.
If the source CSV is in UTF-8, I can of course strip the first three
chars. But if it is in another encoding, I would strip off chars
that I need. How can I check which encoding the file has? I tried this,
but it always gives me CP850 as the encoding:
file = File.open("my.csv")
puts file.external_encoding.name
Or is there a way to convert a file before uploading? I use
file.tempfile for the upload.
Cheers,
Sebastian
On 4 Jul., 18:31, Walter Lee Davis
<wa...-HQgmohHLjDZWk0Htik3J/w@public.gmane.org>
wrote:> Unicode uses them to indicate to the application reading the text file
> which order the following bytes are in. Since UTF-8 uses compound
> characters to indicate the scary-high end of the unicode character
> table (two bytes needed to encode some characters) the order that the
> bits arrived in is of critical importance. Text files may be little-
> endian or big-endian, and unless you know what order to expect, you
> can''t really know.
>
> Walter
>
> On Jul 4, 2011, at 3:02 AM, Sebastian wrote:
>
>
>
>
>
>
>
> > Thank you for your reply!
>
> > Stripping the first chars is possible of course, but I don''t
> > understand why these chars are there.
>
> > It was working before! I could just upload the utf-8 csv and everthing
> > was working great before. I don''t really know what I changed
that now
> > these chars are appearing.
>
> > Sebastian
>
> > On 1 Jul., 15:12, Frederick Cheung
<frederick.che...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> On Jul 1, 11:48 am, Sebastian
<sebastian.go...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
> >>> OK,
>
> >>> it was working perfectly when I just made sure that my csv
file is
> >>> in
> >>> utf-8 encoding format.
>
> >>> I deleted some of my programm, so I had to write a lot of
stuff
> >>> again.
>
> >>> If I now upload a csv file which is in utf-8 format and then I
have
> >>> every time in the first row that the first three character
are: \xEF
> >>> \xBBxBF
>
> >> That''s a utf BOM: a magic unicode character that tells
whoever is
> >> reading the stream what endianness is and also allows to tell UTF8
> >> apart from utf16
> >> You can safely strip them from the file.
>
> >>> I read that these is something about unicode and ordering, but
i
> >>> don''t
> >>> know where these hex chars come from.
>
> >>> Also every german special character is also shown in this hex
code,
> >>> e.g. "k\xC3\xBChler" should be "kühler"
>
> >> That is probably just an output thing if you are seeing this in a
> >> terminal window- \xC3\xBC is the utf8 sequence for ü
>
> >> Fred
>
> >>> If I use files in other encodings there are not these three
chars in
> >>> the beginning, but every special char is "?"
>
> >>> Has anyone an idea where this comes from?
>
> >>> Cheers,
> >>> Sebastian
>
> >>> On 22 Jun., 13:26, Sebastian
<sebastian.go...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
> >>>> file.temp is an object. I have a form where a csv can be
> >>>> uploaded, but
> >>>> it is never stored. That''s why I use tempfile.
That means that I
> >>>> probably have no path to use in that method.
>
> >>>> BUT, the open and foreach method for the CSV class is
working
> >>>> with an
> >>>> object whenever I don''t have a german special
character in my csv
> >>>> file
> >>>> or when my csv file is already in utf-8 encoding format.
>
> >>>> On 22 Jun., 12:05, Chirag Singhal <chirag.sing...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> >>>>> What does file.tempfile return?
> >>>>> If it is a file object, then we have a problem: we need to pass in
> >>>>> the file path here. So call path on the file object and pass that
> >>>>> as the first argument.
>
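Following Chirag's suggestion, the upload can be read through its path, and Ruby's IO layer can then drop the BOM: the "bom|utf-8" encoding mode strips a leading UTF-8 BOM and labels the data as UTF-8. A sketch, assuming the upload really is UTF-8 and a reasonably current Ruby and csv library; read_rows is a hypothetical helper, and a Tempfile stands in for params[:file].tempfile:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: read a semicolon-separated CSV from a path.
# "bom|utf-8" makes Ruby skip a leading UTF-8 BOM and treat the rest as UTF-8.
def read_rows(path)
  rows = []
  CSV.foreach(path, col_sep: ";", encoding: "bom|utf-8") do |row|
    rows << { var1: row[0], var2: row[1] }
  end
  rows
end

# Stand-in for the uploaded tempfile: a UTF-8 file that starts with a BOM.
upload = Tempfile.new(["upload", ".csv"])
upload.binmode
upload.write("\xEF\xBB\xBFkühler;1\nMüsli;2\n".b)
upload.flush

p read_rows(upload.path)
```

In a controller this would be something like read_rows(params[:file].tempfile.path). Files in other encodings still have to be converted before this helps.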
Hi,
I found a partial solution. I just use this code:
file = params[:file].tempfile
content = file.read.force_encoding("UTF-8")
# remove the UTF-8 byte order mark (EF BB BF)
content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
@csv = CSV.new(content, :headers => false, :col_sep => ";")
I found it here:
http://stackoverflow.com/questions/5011504/is-there-a-way-to-remove-the-bom-from-a-utf-8-encoded-file
There is still a problem when the source file is not utf-8 encoded!
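For files that are not UTF-8, one approach is to look at the bytes first and only transcode when they are not valid UTF-8. A sketch under two assumptions: non-UTF-8 uploads come from a Western European code page, and Windows-1252 is the right guess for them (common for Excel on German Windows, but check your own files). normalize_upload and parse_upload are hypothetical names:

```ruby
require "csv"

# ASSUMPTION: legacy uploads are Windows-1252; change this if your files differ.
FALLBACK_ENCODING = Encoding.find("Windows-1252")

# Hypothetical helper: turn raw upload bytes into a UTF-8 string.
def normalize_upload(raw)
  text = raw.b                                                        # raw bytes, no label yet
  text = text.byteslice(3..-1) if text.start_with?("\xEF\xBB\xBF".b)  # drop a UTF-8 BOM
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?                                 # already UTF-8
  # Not valid UTF-8: read the bytes as the fallback encoding and transcode.
  text.force_encoding(FALLBACK_ENCODING).encode(Encoding::UTF_8)
end

# Hypothetical helper: parse the normalized text like the original code did.
def parse_upload(raw)
  CSV.parse(normalize_upload(raw), col_sep: ";").map do |row|
    { var1: row[0], var2: row[1] }
  end
end
```

In the controller this could be called as parse_upload(params[:file].tempfile.read). The validity check is a heuristic: German text in Windows-1252 is rarely valid UTF-8, but the bytes alone cannot prove which code page a file was written in.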
On 5 Jul., 10:14, Sebastian <sebastian.go...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> OK,
>
> thank you for your reply! In the meantime I figured out why this was
> working without errors in my first code!
>
> There I had some regex checks before saving each row into the
> database. That means the first row always got skipped, because the
> Unicode identifiers didn't match the regex.
>
> Now I know where my fault is, but I don't really know how to solve it.
>
> If the source CSV is in UTF-8, I can of course strip the first three
> chars. But if it is in another encoding, I would strip off chars that
> I need. How can I check which encoding the file has? I tried this,
> but it always gives me CP850 as the encoding:
>
> file = File.open("my.csv")
> puts file.external_encoding.name
>
> Or is there a way to transform a file before uploading? I use
> file.tempfile for uploading.
>
> Cheers,
> Sebastian
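On detecting the encoding: IO#external_encoding only reports the encoding Ruby has been told (or defaults) to use when reading; it does not look at the bytes, which would explain why it shows the same CP850 every time. A file's encoding can only be guessed from its content. A rough sketch; guess_encoding is a hypothetical helper, and it only recognizes BOMs and valid UTF-8:

```ruby
# Hypothetical helper: a rough guess based on the bytes alone.
def guess_encoding(data)
  bytes = data.b
  return :utf8_bom    if bytes.start_with?("\xEF\xBB\xBF".b)
  return :utf16le_bom if bytes.start_with?("\xFF\xFE".b)
  return :utf16be_bom if bytes.start_with?("\xFE\xFF".b)
  return :utf8        if bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  :legacy # not UTF-8: probably a single-byte code page such as Windows-1252
end

p guess_encoding("\xEF\xBB\xBFk\xC3\xBChler".b)  # prints :utf8_bom
p guess_encoding("kühler")                       # prints :utf8
p guess_encoding("k\xFChler".b)                  # prints :legacy
```

Plain ASCII is also reported as :utf8, which is harmless because ASCII is valid UTF-8. Reading the upload with File.binread(path) keeps the bytes unchanged before they are inspected.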
>
> On 4 Jul., 18:31, Walter Lee Davis <wa...-HQgmohHLjDZWk0Htik3J/w@public.gmane.org> wrote:
>
> > Unicode uses them to tell the application reading the text file
> > which order the following bytes are in. That matters for UTF-16 and
> > UTF-32, where characters are stored in code units of two or four
> > bytes: such text files may be little-endian or big-endian, and unless
> > you know what order to expect, you can't really know. UTF-8 is read
> > one byte at a time, so its multi-byte sequences (two bytes for ä, ö, ü)
> > always come in the same order; there the BOM (EF BB BF) only acts as a
> > signature saying "this is UTF-8".
>
> > Walter
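The byte-order point can be checked directly: the BOM is the code point U+FEFF at the start of the text, and its bytes differ per encoding. Byte order only matters where a character occupies multi-byte code units (UTF-16/UTF-32); in UTF-8 it is fixed. A small sketch using only String#encode and String#bytes:

```ruby
bom = "\uFEFF" # the byte order mark code point

%w[UTF-8 UTF-16LE UTF-16BE].each do |enc|
  hex = bom.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{enc.ljust(8)} #{hex}"
end
# UTF-8    EF BB BF
# UTF-16LE FF FE
# UTF-16BE FE FF

# "ü" (U+00FC): the two UTF-16 variants store the same code unit in opposite order
p "ü".encode("UTF-16LE").bytes # prints [252, 0]
p "ü".encode("UTF-16BE").bytes # prints [0, 252]
p "ü".encode("UTF-8").bytes    # prints [195, 188]
```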