I am new to Mechanize and was wondering whether there is a built-in method to get the elements on a page that are not part of a form.

A couple of examples: my banking site lists my entries, and I want them to go into an array so that I can handle them. Another site I use does some categorization for me, and I would like to manipulate it and present it differently to a user.

I looked through some of the mailing lists and found something Paul Lutus wrote that I should be able to use:

    array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})

This piece of code finds all the paragraph tags that have an image associated with them. It's clear to me that Paul understands regular expressions well... unfortunately, I do not.

Given how easy Mechanize has been to use with forms and such, it seemed like there would be something I could use to accomplish this task. While I'm hoping there is a method within Mechanize, I'll start working on my regular expressions.

BTW, if I wanted to create some documentation for Mechanize, how would I submit it?

Mike B.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
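For readers unfamiliar with `String#scan`, here is a small sketch of what the quoted expression returns. The sample HTML is made up for illustration; only the regular expression comes from the message above. With a capture group in the pattern, `scan` returns one inner array per match, so the result is flattened to get a plain list of paragraph texts.

```ruby
# Sample HTML, invented for this illustration.
data = <<~HTML
  <p>First entry<img src="a.png" /></p>
  <p>No image here</p>
  <p>Second entry<img src="b.png" alt="b" /></p>
HTML

# scan returns one array per match, each holding the capture groups.
array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})

# Flatten to a plain list of the captured paragraph texts.
texts = array.flatten
puts texts.inspect
```

Note that the pattern only matches paragraphs whose text is immediately followed by a self-closing `<img ... />` tag, so it is sensitive to small changes in the markup.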
On Jan 27, 2007, at 1:57 AM, barsalou wrote:
> I am new to Mechanize and was wondering if there was a built-in method
> to get the elements that are on the page that are not part of a form.
>
> A couple of examples would be my banking site lists my entries and I
> want them to go into an array so that I can handle them.
>
> array = data.scan(%r{<p>([^<]+?)<img .*?/></p>}) This piece of code
> will find all the paragraph tags that have an image associated with
> them.
>
> BTW, if I wanted to create some documentation for Mechanize, how would
> I submit it?

If you wanted to create more of a manual, I'm sure you could just email it to this list, and Aaron would probably be happy to have the help.

But first, Mechanize has decent API documentation. You may not know how to get at the API docs, though. Just run 'gem_server' on your local machine, then browse to http://localhost:8808/. You'll see an [rdoc] link for Mechanize. From there, go to the WWW::Mechanize page for an overview of the package. This is pretty standard fare for most gems. Sadly, there's nothing on the web that steers people to them.

Anyway, searching in Mechanize is powered by Hpricot, so anything that works in Hpricot will also work on a Mechanize Page.

Sadly, I don't know a really easy way to do your example. But I'd do something like this:

    page.search('p').find_all { |p| p.search('img') }

There might be something easier. But say you were interested in all the imgs that exist inside a table with id 'body'. That'd be:

    page.search('table#body img')

Which is usually just the sort of thing I'm looking for.

Anyway, check out:
http://code.whytheluckystiff.net/doc/hpricot/

It has more info about Hpricot (which is the magic behind WWW::Mechanize::Page).

Hope that helps!
-Mat
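A caveat on the `find_all` block in the message above: in Ruby, an empty array-like collection is still truthy, so a block that returns the search result itself keeps every paragraph, with or without images. The sketch below shows the difference using a hypothetical `Para` stand-in, which is not part of Hpricot or Mechanize; it only assumes that `search` returns an array-like collection, as Hpricot's does.

```ruby
# Hypothetical stand-in for a parsed paragraph. Its `search` returns an
# Array (possibly empty), mimicking an array-like search result.
Para = Struct.new(:text, :imgs) do
  def search(_selector)
    imgs
  end
end

paras = [
  Para.new("with image", ["<img src='a.png' />"]),
  Para.new("no image", []),
]

# An empty Array is truthy in Ruby, so this keeps every paragraph.
all_kept = paras.find_all { |p| p.search("img") }

# Asking whether the result is non-empty keeps only paragraphs with images.
with_imgs = paras.find_all { |p| p.search("img").any? }

puts all_kept.length
puts with_imgs.map(&:text).inspect
```

Against a real page, the equivalent adjustment would be to append `.any?` to the inner search inside the block.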
Quoting Mat Schaffer <schapht at gmail.com>:

<snip>
> I'm sure if you wanted to create more of a manual, just email it to
> this list and Aaron would probably be happy to have the help.
>
> But first, Mechanize has decent API documentation. You may not know
> how to get at the API docs though. Just run 'gem_server' on your
> local machine. Then browse to http://localhost:8808/. You'll see an
> [rdoc] link for Mechanize. Then just go to the WWW::Mechanize page
> for an overview of the package. This is pretty standard fare for
> most gems. Sadly there's nothing on the web that steers people to them.
>
> Anyway. Searching in mechanize is powered by hpricot. So anything
> that works in hpricot will also work on a mechanize Page.
>
> Sadly I don't know a real easy way to do your example. But I'd do
> something like this:
>
> page.search('p').find_all { |p| p.search('img') }
>
> There might be something easier. But say you were interested in all
> the img's that exist inside a table with id 'body'. That'd be:
>
> page.search('table#body img')
>
> Which is usually just the sort of thing I'm looking for.
>
> Anyway, check out:
> http://code.whytheluckystiff.net/doc/hpricot/
>
<snip>

I have found the API docs, but for a newbie who doesn't know anything about Hpricot and the various ways to deal with web pages, I think more examples would be helpful.

Thanks for the hints... I'll check them out and report back.

Ruby and Mechanize (which includes Hpricot) make working with HTML almost fun! :)

Mike B.
On Sat, Jan 27, 2007 at 12:05:11PM -0500, Mat Schaffer wrote:
> On Jan 27, 2007, at 1:57 AM, barsalou wrote:
> > I am new to Mechanize and was wondering if there was a built-in method
> > to get the elements that are on the page that are not part of a form.
> >
> > A couple of examples would be my banking site lists my entries and I
> > want them to go into an array so that I can handle them.
> >
> > array = data.scan(%r{<p>([^<]+?)<img .*?/></p>}) This piece of code
> > will find all the paragraph tags that have an image associated with
> > them.
> >
> > BTW, if I wanted to create some documentation for Mechanize, how would
> > I submit it?
>
> I'm sure if you wanted to create more of a manual, just email it to
> this list and Aaron would probably be happy to have the help.

Yes, I always welcome new documentation. Poor documentation really annoys me, so if something is missing or isn't clear, please let me know.

>
> But first, Mechanize has decent API documentation. You may not know
> how to get at the API docs though. Just run 'gem_server' on your
> local machine. Then browse to http://localhost:8808/. You'll see an
> [rdoc] link for Mechanize. Then just go to the WWW::Mechanize page
> for an overview of the package. This is pretty standard fare for
> most gems. Sadly there's nothing on the web that steers people to them.

Thank you! Also, you can find the documentation on the RubyForge website (although I think it is down right now):

http://mechanize.rubyforge.org/

--Aaron

-- 
Aaron Patterson
http://tenderlovemaking.com/
Just wanted to provide some feedback.

Quoting Mat Schaffer <schapht at gmail.com>:

<snip>
> Sadly I don't know a real easy way to do your example. But I'd do
> something like this:
>
> page.search('p').find_all { |p| p.search('img') }

This worked great. There was a lot more for me to learn, and I am still struggling with how to organize this stuff in my head. Hopefully my examples below will shed some light on what more I need to learn.

>
> Anyway, check out:
> http://code.whytheluckystiff.net/doc/hpricot/

This was helpful as well, especially if you first go to the README link. There is also a reference to jQuery, which was helpful too.

I realize that all the documentation is there and that duplicating it is a waste, but I believe more examples could help newer users get acclimated. However, Mechanize is the schizzle! (can I say that here :) )

The page that this code is for has two tables, and the second table contains two rows of data with two data items for every "entry".

Here is what I ended up doing:

    # more initialization code above this
    page = agent.submit(form)

    # divide the page into tables
    tables = page.search("table")

    # now break up the table into rows.
    rows = tables[1].search("tr")

    # the tested urls are stored in the testedurls array
    testedurls = rows.search("td:nth-child(0)")

    # the results from the tests are stored in urlresults
    urlresults = rows.search("td:nth-child(1)")

    i = 1
    while i < (testedurls.length + 1)
      i += 1
      answer = ""
      unless urlresults[i].nil? then
        tmp, answer = urlresults[i].split(':')
      end
      if answer == " " then
        puts "The url: #{testedurls[i-1]} is not currently categorized"
      end
    end

I know there are ways I can optimize the above code, but I thought it better to provide the feedback first.

Thanks for giving me direction.

Mike B.
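The index bookkeeping in the loop above can be avoided by pairing the two lists directly. The following is a minimal sketch, not a drop-in replacement: it assumes the cell texts have already been extracted as plain strings, and that each result looks like `label:category` with a blank category meaning "uncategorized" (both guesses based on the `split(':')` call and the `" "` comparison in the message). The sample data and the `blank_category?` helper are hypothetical.

```ruby
# Hypothetical sample data standing in for the text of each table cell.
tested_urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
url_results = ["Category: News", "Category: ", nil]

# True when a result string has nothing but whitespace after the first colon.
def blank_category?(result)
  return false if result.nil?
  _label, answer = result.split(":", 2)
  answer.to_s.strip.empty?
end

# Pair each URL with its result, then keep the URLs with a blank category.
pairs = tested_urls.zip(url_results)
uncategorized = pairs.select { |_url, result| blank_category?(result) }.map(&:first)

uncategorized.each do |url|
  puts "The url: #{url} is not currently categorized"
end
```

Using `zip` keeps each URL next to its own result, so an off-by-one between the two indexes cannot creep in.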
All,

I've recently started using SVN and have been trying to move back and forth between computers for development. So far, not so good.

Is there a way to bring Mechanize with me? I've installed it via the gem on both computers. But when I go to start up lighttpd, I get this message:

    gem_original_require': no such file to load -- mechanize (MissingSourceFile)

Though it is installed on both machines.

Is there a way to install Mechanize via SVN, so that I could map Mechanize as an external library? Or does someone else have any suggestions as to the best way to manage this?

Or am I completely missing something, and this isn't a Mechanize-related problem?

Thanks for any help!

William
On Thu, Feb 01, 2007 at 05:26:35PM -0500, William Flanagan wrote:
> All,
>
> I've recently started using SVN and trying to move back and forth between
> computers for development. So far, not so good.
>
> Is there a way to bring mechanize with me? I've installed via the GEM, and
> I've installed Mechanize on both computers. But, when I go to start up
> lighttpd, I get this message:
>
> gem_original_require': no such file to load -- mechanize (MissingSourceFile)
>

What happens when you try to load it in irb? Here it is on my system:

    irb(main):001:0> require 'rubygems'
    => true
    irb(main):002:0> require 'mechanize'
    => true
    irb(main):003:0>

> Though it is installed on both machines.
>
> Is there a way to install Mechanize via SVN, so that I could map Mechanize
> as an external library? Or, does someone else have any suggestions as to
> the best way to manage this?
>
> Or, am I completely missing something, and this isn't a Mechanize-related
> problem?

If you can load it in irb, I would suspect that there is something else wrong.

-- 
Aaron Patterson
http://tenderlovemaking.com/
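The same check can be run as a script, which makes it easier to repeat on each machine or from inside the server process. This is a minimal sketch; the `try_require` helper is a hypothetical name introduced here, not part of RubyGems or Mechanize.

```ruby
# Try to load a library and report what happened; returns true on success.
def try_require(name)
  require name
  true
rescue LoadError => e
  warn "could not load #{name}: #{e.message}"
  false
end

require 'rubygems'
if try_require('mechanize')
  puts "mechanize loaded"
else
  # Show where Ruby looked, to compare between irb and the server process.
  puts $LOAD_PATH
end
```

If the script succeeds from a shell but the same `require` fails under the web server, the two processes are probably not seeing the same load path.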
All,

FYI, this is a FastCGI problem. It works fine with WEBrick. So, if you see this error, start looking there.

Thanks for the help. I love Mechanize!

William

On 2/1/07 5:52 PM, "Aaron Patterson" <aaron_patterson at speakeasy.net> wrote:
> On Thu, Feb 01, 2007 at 05:26:35PM -0500, William Flanagan wrote:
>> All,
>>
>> I've recently started using SVN and trying to move back and forth between
>> computers for development. So far, not so good.
>>
>> Is there a way to bring mechanize with me? I've installed via the GEM, and
>> I've installed Mechanize on both computers. But, when I go to start up
>> lighttpd, I get this message:
>>
>> gem_original_require': no such file to load -- mechanize (MissingSourceFile)
>>
>
> What happens when you try to load it in irb? Here it is on my system:
>
> irb(main):001:0> require 'rubygems'
> => true
> irb(main):002:0> require 'mechanize'
> => true
> irb(main):003:0>
>
>> Though it is installed on both machines.
>>
>> Is there a way to install Mechanize via SVN, so that I could map Mechanize
>> as an external library? Or, does someone else have any suggestions as to
>> the best way to manage this?
>>
>> Or, am I completely missing something, and this isn't a Mechanize-related
>> problem?
>
> If you can load it in irb, I would suspect that there is something else
> wrong.
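The thread does not say what actually differed under FastCGI. One way to investigate a case like this, assuming the server process may start with a different environment than an interactive shell (an assumption, not something confirmed above), is to print the gem-related settings from both places and compare them. The `gem_environment` helper is a hypothetical name used only for this sketch.

```ruby
require 'rubygems'

# Collect the settings that decide where RubyGems looks for installed gems.
def gem_environment
  {
    "ruby"      => RUBY_VERSION,
    "GEM_HOME"  => ENV["GEM_HOME"],
    "GEM_PATH"  => ENV["GEM_PATH"],
    "gem paths" => Gem.path,
  }
end

gem_environment.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Running this once from a shell and once from within the FastCGI process (for example, by writing the output to a log) should show whether the two see the same gem directories.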