Hi all,

I am scraping a table off of another site and inserting it onto my site. You can see an example on the initial page at: http://mthosts.heroku.com. I'm referring to the green box with the Snowbird weather and snowfall information.

This box has been scraped off of the Snowbird site at: http://www.snowbird.com/ski_board/snowreport.php

The problem is that on the Snowbird site it has degree symbols (°), but on my page they show up as: (�)

I think it has something to do with the encoding, but I'm pretty new to HTML etc. and am not sure what I can do to fix this. I've tried substituting the characters and some other things but haven't had any success yet.

Any ideas?

Thanks,
max

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Everaldo Gomes
2011-Nov-27 11:54 UTC
Re: problem scraping using nokogiri - getting wrong characters
Hi!

I opened the HTML source of the snowreport.php page and noticed that the strange symbols you mentioned are HTML-encoded characters: the degree symbol (°) is written as an HTML entity.

I had a similar problem last Monday, but I couldn't completely solve it.

Try the lib: http://htmlentities.rubyforge.org/

or use a regular expression (sub, gsub) to replace the entity with the degree symbol.

Regards,
Everaldo
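The gsub suggestion above could look roughly like the sketch below. The entity spellings it matches (`&deg;` and the numeric forms `&#176;` / `&#xB0;`) are assumptions, since the archived message does not preserve which one the page used, and `replace_degree_entities` is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: replaces HTML entities for the degree sign with the
# literal character. The entity spellings below are assumptions; the thread
# does not show which one the scraped page actually contains.
DEGREE_ENTITIES = /&deg;|&#176;|&#x[bB]0;/

def replace_degree_entities(html)
  # "\u00B0" is the Unicode degree sign (°).
  html.gsub(DEGREE_ENTITIES, "\u00B0")
end

puts replace_degree_entities("Base temp: 28&deg;F")
```

This only helps when the raw HTML really contains an entity. If the page contains a raw byte in a different encoding, the substitution will find nothing to replace.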
I tried that, but it didn't work for me. What did work was explicitly setting the encoding property in Nokogiri:

    require 'nokogiri'
    require 'open-uri'

    url = 'http://www.snowbird.com/ski_board/snowreport.php'
    page = Nokogiri::HTML(open(url))
    page.encoding = 'utf-8'

Worked great after that!

Thx,
Max
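For anyone wondering why a � can appear in the first place, here is a minimal stdlib-only sketch of one common cause. It assumes the source page sends the degree sign as the single byte 0xB0 (as in ISO-8859-1); the thread does not confirm the page's actual encoding, and `to_utf8` is a hypothetical helper name.

```ruby
# Assumption: the server sends ISO-8859-1 bytes, where 0xB0 is the degree
# sign. On its own, 0xB0 is not a valid UTF-8 sequence.
raw = "28\xB0F".b

# Hypothetical helper: label the bytes with their real encoding, then
# transcode them to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Treating the same bytes as UTF-8 leaves an invalid sequence, which
# browsers typically render as the replacement character.
mislabeled = raw.dup.force_encoding("UTF-8")
puts mislabeled.valid_encoding?
puts mislabeled.scrub

# Declaring the real source encoding first gives a proper degree sign.
puts to_utf8(raw, "ISO-8859-1")
```

Nokogiri::HTML also takes an optional encoding as its third argument, which may be another place to declare the source encoding if that turns out to be the cause.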