Love U Ruby
2013-Jun-08 07:35 UTC
Why Nokogiri::HTML::Document#meta_encoding returns nil ?
    require "nokogiri"

    doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
    doc.class # => Nokogiri::HTML::Document

    doc = Nokogiri::HTML::Document.parse <<-eof
    <head>
    <meta name="description" content="Free Web tutorials">
    <meta name="keywords" content="HTML,CSS,XML,JavaScript">
    <meta name="author" content="Ståle Refsnes">
    <meta charset="UTF-8">
    </head>
    eof

    doc.class # => Nokogiri::HTML::Document
    doc.meta_encoding # => nil
    puts doc.to_html
    # >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    # >> <html><head>
    # >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    # >> <meta name="description" content="Free Web tutorials">
    # >> <meta name="keywords" content="HTML,CSS,XML,JavaScript">
    # >> <meta name="author" content="Ståle Refsnes">
    # >> <meta charset="UTF-8">
    # >> </head></html>

Why does Nokogiri::HTML::Document#meta_encoding return nil?

--
Posted via http://www.ruby-forum.com/.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/503711b356eb4cf916eede1e2ff59dba%40ruby-forum.com?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.
chloé r.
2013-Jun-08 11:53 UTC
Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
I think it does not recognize the HTML5 meta charset, since the following works:

    doc.meta_encoding="<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
    puts doc.meta_encoding

Love U Ruby wrote in post #1111697:
> <meta charset="UTF-8">
> </head>
> eof
>
> doc.class # => Nokogiri::HTML::Document
> doc.meta_encoding # => nil
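For reference, the two declaration forms being compared here are the HTML 4 `http-equiv` form and the HTML5 `charset` form. The plain-Ruby sketch below shows what recognizing both would involve. `declared_charset` is a hypothetical helper written for illustration, not part of Nokogiri, and it uses simple regular expressions rather than a real HTML parser.

```ruby
# Hypothetical helper (not Nokogiri API): return the charset declared
# by a <meta> tag in raw HTML, or nil when none is declared.
def declared_charset(html)
  # HTML5 short form: <meta charset="UTF-8">
  html5 = html[/<meta\s+charset\s*=\s*["']?([\w-]+)/i, 1]
  return html5 if html5

  # HTML 4 form:
  # <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  html[/<meta[^>]+http-equiv\s*=\s*["']?content-type["']?[^>]*charset\s*=\s*([\w-]+)/i, 1]
end

html5_head = '<head><meta charset="UTF-8"></head>'
html4_head = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
no_charset = '<head><meta name="author" content="Someone"></head>'

puts declared_charset(html5_head).inspect
puts declared_charset(html4_head).inspect
puts declared_charset(no_charset).inspect
```

A lookup that only checks the second pattern would return nothing for a document that declares its encoding with `<meta charset="UTF-8">` alone, which is the situation described in the original post.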
Love U Ruby
2013-Jun-08 12:09 UTC
Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
I am still getting `nil`.

    doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
    doc.meta_encoding="<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
    doc.meta_encoding # => nil
Tamara Temple
2013-Jun-08 15:02 UTC
Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> doc = Nokogiri::HTML::Document.parse <<-eof
> <head>
> <meta name="description" content="Free Web tutorials">
> <meta name="keywords" content="HTML,CSS,XML,JavaScript">
> <meta name="author" content="Ståle Refsnes">
> <meta charset="UTF-8">
> </head>
> eof

I think the problem is that when Nokogiri parses HTML, it assumes HTML 4.0 Transitional, as is evidenced by the DOCTYPE. I'm not sure how to get it to deal with HTML5...
Tamara Temple
2013-Jun-08 15:12 UTC
Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> Still I am getting `nil`.
>
> doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
> doc.meta_encoding="<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
> doc.meta_encoding # => nil

You're confusing .new with .parse, as well as what the input to #meta_encoding= should be.

    irb(main):027:0> doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
    #<Nokogiri::HTML::Document:0x4c2aa20 name="document" children=[#<Nokogiri::XML::DTD:0x4c35fc4 name="html">, #<Nokogiri::XML::Element:0x4c35a38 name="html" children=[#<Nokogiri::XML::Element:0x4c3565a name="head" children=[#<Nokogiri::XML::Element:0x4c35470 name="title" children=[#<Nokogiri::XML::Text:0x4c35196 " Save the page! ">]>]>]>]>
    irb(main):028:0> puts doc
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title> Save the page! </title>
    </head></html>
    nil
    irb(main):029:0> doc.meta_encoding
    "UTF-8"
    irb(main):030:0> doc.meta_encoding="ISO-8599-2"
    "ISO-8599-2"
    irb(main):031:0> doc.meta_encoding
    "ISO-8599-2"

Also, since you are parsing fragments instead of documents, you really should be using DocumentFragment instead of Document.

    irb(main):032:0> docf = Nokogiri::HTML::DocumentFragment.parse "<title> Save the Page! </title>"
    #<Nokogiri::HTML::DocumentFragment:0x4d30514 name="#document-fragment" children=[#<Nokogiri::XML::Element:0x4d303ca name="title" children=[#<Nokogiri::XML::Text:0x4d300a0 " Save the Page! ">]>]>
    irb(main):033:0> puts docf
    <title> Save the Page! </title>
    nil
    irb(main):034:0> docf.respond_to?(:meta_encoding)
    false

Since the encoding only makes sense when you assemble the entire document to send it out to the browser, fragments don't care.

The remaining question is how to get Nokogiri to recognize and emit HTML5.
Love U Ruby
2013-Jun-08 16:04 UTC
Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Tamara Temple wrote in post #1111724:
> Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
>> <meta charset="UTF-8">
>> </head>
>> eof

You have found my exact pain point. I am always confused about when to use these two methods:

**Nokogiri::HTML::Document.new
**Nokogiri::HTML::Document.parse

=================================

    require "nokogiri"
    require 'pp'

    doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
    doc.class # => Nokogiri::HTML::Document
    doc
    # => #(Document:0x4592d16 {
    #   name = "document",
    #   children = [
    #     #(DTD:0x4592244 { name = "html" }),
    #     #(Element:0x458dd7a {
    #       name = "html",
    #       children = [
    #         #(Element:0x45871d2 {
    #           name = "head",
    #           children = [
    #             #(Element:0x458161a {
    #               name = "title",
    #               children = [ #(Text " Save the page! ")]
    #               })]
    #           })]
    #       })]
    #   })

    doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
    doc.class # => Nokogiri::HTML::Document
    doc
    # => #(Document:0x4578128 {
    #   name = "document",
    #   children = [ #(DTD:0x45714fe { name = "html" })]
    #   })

Both calls create a `Nokogiri::HTML::Document` object, but when I print them the output is different. My questions are:

(a) Why do `Nokogiri::HTML::Document.parse` and `Nokogiri::HTML::Document.new` create the document object differently?

(b) What is the proper use case for each, i.e. when should I use which method?

Please help me to digest this basic food. :) :)

Thanks
Tamara Temple
2013-Jun-08 18:24 UTC
Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> **Nokogiri::HTML::Document.new

Use when creating a new (i.e. NON-EXISTENT) HTML document.

> **Nokogiri::HTML::Document.parse

Use when parsing an existing COMPLETE HTML document, but NOT a fragment. Use Nokogiri::HTML::DocumentFragment.parse when parsing a fragment string (i.e., INCOMPLETE DOCUMENT).

> (a) Why do `Nokogiri::HTML::Document.parse` and
> `Nokogiri::HTML::Document.new` create the document object differently?

As stated above, these do two different things. Although you may end up with the same class of object, HOW they go about getting there is different.

> (b) What is the proper use case for each, i.e. when should I use which
> method?

See above.

> Please help me to digest this basic food. :) :)

Pre-chewed and partially digested.
Love U Ruby
2013-Jun-08 18:56 UTC
Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Tamara Temple wrote in post #1111725:
> irb(main):029:0> doc.meta_encoding
> "UTF-8"
> irb(main):030:0> doc.meta_encoding="ISO-8599-2"
> "ISO-8599-2"
> irb(main):031:0> doc.meta_encoding
> "ISO-8599-2"

I don't know why this is not working for me:

    [1] pry(main)> require "nokogiri"
    => true
    [2] pry(main)> doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
    => #(Document:0x46ca5da {
      name = "document",
      children = [
        #(DTD:0x46beef6 { name = "html" }),
        #(Element:0x46be1c2 {
          name = "html",
          children = [
            #(Element:0x46b5158 {
              name = "head",
              children = [
                #(Element:0x46b4974 {
                  name = "title",
                  children = [ #(Text " Save the page! ")]
                  })]
              })]
          })]
      })
    [5] pry(main)> doc.meta_encoding
    => nil
    [6] pry(main)> doc.meta_encoding="ISO-8599-2"
    => "ISO-8599-2"
    [7] pry(main)> doc.meta_encoding
    => nil
    [8] pry(main)>
Tamara Temple
2013-Jun-08 21:51 UTC
Re: Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> For me why things not working,I don't know :

Me, neither. What version of nokogiri and ruby are you using?

Even at that, I haven't a clue.
Love U Ruby
2013-Jun-09 04:05 UTC
Re: Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Tamara Temple wrote in post #1111756:
> Me, neither. What version of nokogiri and ruby are you using?
>
> Even at that, I haven't a clue.

Hmm, I have raised this as an issue; see https://github.com/sparklemotion/nokogiri/issues/919#issuecomment-19151742

Can you give me an example of the method `parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML)` where `url` is used? I don't understand the meaning of the `url` and `options` parameters, so I am looking for an example where they are used.

Please advise.

Thanks for all your help! :)
Tamara Temple
2013-Jun-09 13:59 UTC
Re: Re: Re: Re: Why Nokogiri::HTML::Document#meta_encoding returns nil ?
Love U Ruby <lists-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> Can you give me one example for the method `parse(string_or_io, url =
> nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML)` where
> `url` is used? I don't understand the meaning of the `url` and `options`
> as a parameter. So looking for an example where those are used.

Nope, never used that.