Hi All,

I'm trying to build an application which needs to scrape information from a web page. When I try to do this, I get an error while converting the HTML data to JSON. Has anyone experienced this before, and if so, can you please tell me how to solve this problem? Please see below for the code snippet and error log.

Thanks in advance,
Anush

    require 'net/http'
    require 'open-uri'
    require 'uri'
    require 'json'
    require 'pp'

    class Merchant < ActiveRecord::Base
      def self.grab_original_content
        ## EXAMPLE USING ZED451.COM
        uri = URI("http://www.zed451.com")
        response = Net::HTTP.get_response(uri)
        @hash = JSON(response.body)
        puts "#{@hash}"
      end
    end

I call the above method in my controller and send @hash to the view. In my browser I see the error below:

    JSON::ParserError in Original contentController#index

    706: unexpected token at '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

And the rest of the page is printed without error in HTML format.

-- 
Posted via http://www.ruby-forum.com/.

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/groups/opt_out.
Hai,

On Mon, Jan 7, 2013 at 3:01 AM, Anush J. wrote:
> I call the above method in my controller and send @hash to view.
> In my browser I see the below error:
>
> JSON::ParserError in Original contentController#index
>
> 706: unexpected token at '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>
> And the rest of the page is printed without error in html format.

It's printed out as HTML because it is HTML. HTML is not JSON and vice versa. If you wish to parse the page as it is, you need to use something like Nokogiri so it gets tokenized; if you expected JSON, you should contact them and ask them what went wrong.

---
Jordon Bedwell
http://envygeeks.com/
https://twitter.com/envygeeks
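To tell the two cases apart before parsing, a minimal sketch along these lines might help. It is only an illustration: `parse_json_body` is a hypothetical helper that does not appear in the thread, and the response bodies are hardcoded strings rather than fetched pages.

```ruby
require 'json'

# Hypothetical helper (not from the thread): returns the parsed data when the
# body really is JSON, or nil when it is something else, such as an HTML page.
def parse_json_body(content_type, body)
  # Only attempt to parse when the server says it is sending JSON.
  return nil unless content_type.to_s.include?('json')

  JSON.parse(body)
rescue JSON::ParserError
  # The server claimed JSON but sent something that does not parse.
  nil
end

# Hardcoded sample bodies (assumptions standing in for real responses).
html_body = '<!DOCTYPE html><html><body><h1>Hello</h1></body></html>'
json_body = '{"name": "Example", "items": [1, 2, 3]}'

p parse_json_body('text/html; charset=utf-8', html_body)
p parse_json_body('application/json', json_body)
p parse_json_body('application/json', html_body)
```

With Net::HTTP, the header value should be available as `response['Content-Type']`, so the helper could be called with that and `response.body`.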
Hi Jordon,

Thanks for your response. I thought JSON(response.body) performed the conversion from HTML to JSON. I also tried response.body.to_json, which gave me the same error. It would be great if you could explain a bit. Meanwhile, I will also try using Nokogiri.

Thanks,
Anush

Jordon Bedwell wrote in post #1091317:
> It's printed out as HTML because it is HTML. HTML is not JSON and vice
> versa. If you wish to parse the page as it is you need to use something
> like Nokogiri so it gets tokenized, if you expected JSON you should
> contact them and ask them what went wrong.
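The confusion here is about direction: `JSON(...)` given a string parses JSON text into Ruby objects, while `to_json` generates JSON text from Ruby objects. Neither one understands HTML. The sketch below uses only Ruby's standard `json` library and hardcoded sample strings (assumptions, not the actual page from the question).

```ruby
require 'json'

html = '<html><body><h1>Hello</h1></body></html>'

# JSON(string) PARSES: it expects the string to already be JSON text.
parsed = JSON('{"title": "Hello"}')
puts parsed['title']

# Given HTML, there is no JSON to parse, so it raises JSON::ParserError.
parse_failed = false
begin
  JSON(html)
rescue JSON::ParserError
  parse_failed = true
end
puts "parsing HTML failed: #{parse_failed}"

# to_json GENERATES: on a String it only wraps the text in a JSON string
# literal. The markup is not turned into structured data.
html_as_json = html.to_json
puts html_as_json

# On a Hash it produces a JSON object, which is usually what is wanted.
hash_as_json = { 'title' => 'Hello' }.to_json
puts hash_as_json
```

So to get meaningful JSON out of a page, the data has to be pulled out of the HTML into a Hash or Array first.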
You cannot convert HTML directly to JSON or vice versa. HTML is a markup language, while JSON is a data interchange format.

You need to parse your HTML with Nokogiri or Hpricot, extract whatever data you want from it and put it in a Hash, then call .to_json on that Hash to get the JSON response.

-- 
Dheeraj Kumar

On Tuesday 8 January 2013 at 12:33 AM, Anush J. wrote:
> Hi Jordon,
> Thanks for your response.
> I thought the JSON(response.body) performs the conversion of HTML->JSON.
> But I also tried response.body.to_json which gave me the same error.
> Will be great if you can explain a bit. Mean while I will also try using
> nokigiri.
>
> Thanks
> Anush
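The parse, extract, and serialize steps described above can be sketched roughly as follows. To keep the example runnable without installing a gem, it swaps in REXML from Ruby's standard distribution in place of Nokogiri or Hpricot, which means the sample markup must be well-formed; real-world HTML usually is not, and that is where Nokogiri earns its place. The sample page and the chosen fields are assumptions for illustration.

```ruby
require 'json'
require 'rexml/document'

# Sample markup standing in for a fetched page (an assumption; the real page
# from the question is not reproduced here). REXML needs it to be well-formed.
page = <<~HTML
  <html>
    <head><title>Example Store</title></head>
    <body>
      <h1>Welcome</h1>
      <a href="/products">Products</a>
      <a href="/contact">Contact</a>
    </body>
  </html>
HTML

# Step 1: parse the markup into a document tree.
doc = REXML::Document.new(page)

# Step 2: extract only the data you care about into plain Ruby structures.
links = REXML::XPath.match(doc, '//a').map do |a|
  { 'text' => a.text, 'href' => a.attributes['href'] }
end

data = {
  'title'   => REXML::XPath.first(doc, '//title').text,
  'heading' => REXML::XPath.first(doc, '//h1').text,
  'links'   => links
}

# Step 3: serialize the Hash as JSON.
json = data.to_json
puts json
```

With Nokogiri the shape stays the same: the page would typically be parsed with `Nokogiri::HTML(page)` and queried with `css` or `xpath`, and the resulting Hash is serialized with `to_json` in exactly the same way.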
Dheeraj Kumar wrote in post #1091355:
> You cannot convert HTML to JSON and vice versa. HTML is a markup
> language, while JSON is a data interchange format.
>
> You need to parse your HTML with Nokogiri or Hpricot, extract whatever
> data you want from it and put it in a Hash, then call .to_json on it to
> get the JSON response.
>
> --
> Dheeraj Kumar

Hi Dheeraj,

Ahh, I see. Got it now. Thanks, helps a lot in understanding.

Thanks,
Anush