thr3ads.net - Mechanize users - [Mechanize-users] Getting elements from a web page [Jan 2007]

barsalou

2007-Jan-27 06:57 UTC

[Mechanize-users] Getting elements from a web page

I am new to Mechanize and was wondering if there was a built-in method 
to get the elements that are on the page that are not part of a form.

A couple of examples: my banking site lists my entries, and I want them
to go into an array so that I can handle them.

Another site I use does some categorization for me, and I would like
to manipulate it and present it differently to a user.

I looked through some of the mailing lists and found something that Paul
Lutus wrote that I should be able to use:

array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})

This piece of code will find all the paragraph tags that have an image
associated with them.
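
To see what that scan call returns, here is a minimal sketch run against a made-up HTML snippet (the sample markup is an assumption for illustration, not taken from any real page). Because the pattern has one capture group, scan yields one single-element array per match, holding the text that sits between the opening <p> and the <img> tag.

```ruby
# Sample HTML invented for illustration only.
data = <<~HTML
  <p>First entry<img src="a.png" /></p>
  <p>No image here</p>
  <p>Second entry<img src="b.png" alt="b" /></p>
HTML

# One capture group, so scan returns an array of one-element arrays.
matches = data.scan(%r{<p>([^<]+?)<img .*?/></p>})

# flatten turns [["First entry"], ["Second entry"]] into a flat list.
entries = matches.flatten
puts entries.inspect
```

Note that the pattern only matches paragraphs whose text is immediately followed by a self-closing <img ... /> tag, so markup that deviates from that shape is silently skipped.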

It's clear to me that Paul understands regular expressions well...
unfortunately, I don't.

Given how easy Mechanize has been to use with forms and such, it seemed
like there would be something I could use that would help me accomplish
my task.

While I'm hoping there is a method from within Mechanize, I'll start
working on my regular expressions.

BTW, if I wanted to create some documentation for Mechanize, how would 
I submit it?

Mike B.


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Mat Schaffer

2007-Jan-27 17:05 UTC

[Mechanize-users] Getting elements from a web page

On Jan 27, 2007, at 1:57 AM, barsalou wrote:
> I am new to Mechanize and was wondering if there was a built-in method
> to get the elements that are on the page that are not part of a form.
>
> A couple of examples would be my banking site lists my entries and I
> want them to go into an array so that I can handle them.
>
> array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})
> This piece of code will find all the paragraph tags that have an image
> associated with them.
>
> BTW, if I wanted to create some documentation for Mechanize, how would
> I submit it?

I'm sure if you wanted to create more of a manual, just email it to
this list and Aaron would probably be happy to have the help.

But first, Mechanize has decent API documentation.  You may not know
how to get at the API docs though.  Just run 'gem_server' on your
local machine.  Then browse to http://localhost:8808/.  You'll see an
[rdoc] link for Mechanize.  Then just go to the WWW::Mechanize page
for an overview of the package.  This is pretty standard fare for
most gems.  Sadly there's nothing on the web that steers people to
them.

Anyway.  Searching in mechanize is powered by hpricot.  So anything  
that works in hpricot will also work on a mechanize Page.

Sadly I don't know a real easy way to do your example.  But I'd do
something like this:

page.search('p').find_all { |p| p.at('img') }
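
One thing worth keeping in mind with a find_all block like this: in Ruby an empty array is still truthy, so a block that returns a (possibly empty) list of matches keeps every element. The block needs to return nil or false for paragraphs without an image. The sketch below uses plain arrays as stand-ins for search results; the paragraph names and file name are made up.

```ruby
# Stand-in data: each "paragraph" maps to the list of images found inside it.
paragraphs = { 'intro' => [], 'logo' => ['logo.png'], 'footer' => [] }

# An empty array is truthy, so returning the list itself keeps everything.
all_kept = paragraphs.find_all { |_name, imgs| imgs }.map(&:first)

# Testing for emptiness keeps only the paragraphs that hold an image.
with_images = paragraphs.find_all { |_name, imgs| !imgs.empty? }.map(&:first)

puts all_kept.inspect
puts with_images.inspect
```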

There might be something easier.  But say you were interested in all
the img's that exist inside a table with id 'body'.  That'd be:

page.search('table#body img')
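
For anyone who wants to try the "img inside a particular table" idea without Hpricot at hand, here is a rough sketch using REXML, the XML parser that ships with standard Ruby installs. This is a swapped-in technique, not what Mechanize does internally: it uses the XPath counterpart of the CSS selector, it needs well-formed markup (real-world HTML often is not), and the sample markup and file names are invented.

```ruby
require 'rexml/document'

# Made-up, well-formed markup for illustration only.
markup = <<~XHTML
  <html><body>
    <img src="outside.png"/>
    <table id="body">
      <tr><td><img src="one.png"/></td></tr>
      <tr><td><p>text <img src="two.png"/></p></td></tr>
    </table>
  </body></html>
XHTML

doc = REXML::Document.new(markup)

# XPath counterpart of the CSS selector 'table#body img':
# every img that is a descendant of the table whose id is "body".
images = REXML::XPath.match(doc, "//table[@id='body']//img")
sources = images.map { |img| img.attributes['src'] }
puts sources.inspect
```

The image outside the table is left out, and nesting depth inside the table does not matter, which is the same behaviour the descendant selector gives.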

Which is usually just the sort of thing I'm looking for.

Anyway, check out:
http://code.whytheluckystiff.net/doc/hpricot/

Which has more info about Hpricot (which is the magic behind  
WWW::Mechanize::Page)

Hope that helps!
-Mat

barsalou

2007-Jan-27 21:19 UTC

[Mechanize-users] Getting elements from a web page

Quoting Mat Schaffer <schapht at gmail.com>:

<snip>
> I'm sure if you wanted to create more of a manual, just email it to
> this list and Aaron would probably be happy to have the help.
>
> But first, Mechanize has decent API documentation.  You may not know
> how to get at the API docs though.  Just run 'gem_server' on your
> local machine.  Then browse to http://localhost:8808/.  You'll see an
> [rdoc] link for Mechanize.  Then just go to the WWW::Mechanize page
> for an overview of the package.  This is pretty standard fare for
> most gems.  Sadly there's nothing on the web that steers people to them.
>
> Anyway.  Searching in mechanize is powered by hpricot.  So anything
> that works in hpricot will also work on a mechanize Page.
>
> Sadly I don't know a real easy way to do your example.  But I'd do
> something like this:
>
> page.search('p').find_all { |p| p.at('img') }
>
> There might be something easier.  But say you were interested in all
> the img's that exist inside a table with id 'body'.  That'd be:
>
> page.search('table#body img')
>
> Which is usually just the sort of thing I'm looking for.
>
> Anyway, check out:
> http://code.whytheluckystiff.net/doc/hpricot/
> <snip>

I have found the API docs, but for a newbie who doesn't know anything
about Hpricot and the various ways to deal with web pages, I think more
examples would be helpful.  Thanks for the hints... I'll check them out
and report back.

Ruby and Mechanize (which includes Hpricot) make working with HTML
almost fun! :)

Mike B.


Aaron Patterson

2007-Jan-28 19:15 UTC

[Mechanize-users] Getting elements from a web page

On Sat, Jan 27, 2007 at 12:05:11PM -0500, Mat Schaffer wrote:
> On Jan 27, 2007, at 1:57 AM, barsalou wrote:
> > I am new to Mechanize and was wondering if there was a built-in method
> > to get the elements that are on the page that are not part of a form.
> >
> > A couple of examples would be my banking site lists my entries and I
> > want them to go into an array so that I can handle them.
> >
> > array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})
> > This piece of code will find all the paragraph tags that have an image
> > associated with them.
> >
> > BTW, if I wanted to create some documentation for Mechanize, how would
> > I submit it?
>
> I'm sure if you wanted to create more of a manual, just email it to
> this list and Aaron would probably be happy to have the help.

Yes, I always welcome new documentation.  Poor documentation really
annoys me, so if something is missing or isn't clear, please let me
know.

> But first, Mechanize has decent API documentation.  You may not know
> how to get at the API docs though.  Just run 'gem_server' on your
> local machine.  Then browse to http://localhost:8808/.  You'll see an
> [rdoc] link for Mechanize.  Then just go to the WWW::Mechanize page
> for an overview of the package.  This is pretty standard fare for
> most gems.  Sadly there's nothing on the web that steers people to them.

Thank you!  Also, you can find the documentation on the rubyforge
website (although I think it is down right now):

  http://mechanize.rubyforge.org/

--Aaron

-- 
Aaron Patterson
http://tenderlovemaking.com/

Mike

2007-Jan-30 04:07 UTC

[Mechanize-users] Getting elements from a web page

Just wanted to provide some feedback.

Quoting Mat Schaffer <schapht at gmail.com>:

<snip>
> Sadly I don't know a real easy way to do your example.  But I'd do
> something like this:
>
> page.search('p').find_all { |p| p.at('img') }

This worked great.  There was a lot more for me to learn, and I am still
struggling with how to organize this stuff in my head.  Hopefully my
examples below will shed some light on what more I need to learn.

> Anyway, check out:
> http://code.whytheluckystiff.net/doc/hpricot/

This was helpful as well, especially if you first go to the README
link.  Also there is a reference to jQuery, which was also helpful.

I realize that all the documentation is there and duplicating that
documentation is a waste, but I believe more examples could help newer
users get acclimated.

However, Mechanize is the schizzle! (Can I say that here? :) )

The page that this code is for has two tables, and the second table
contains two rows of data with two data items for every "entry".

Here is what I ended up doing:

# more initialization code above this
page = agent.submit(form)

# divide the page into tables
tables = page.search("table")

# now break up the second table into rows
rows = tables[1].search("tr")

# the tested urls are stored in the testedurls array
testedurls = rows.search("td:nth-child(0)")

# the results from the tests are stored in urlresults
urlresults = rows.search("td:nth-child(1)")

i = 1
while i < (testedurls.length + 1)
  i += 1
  answer = ""
  # split the cell text on ':' and keep the part after it
  unless urlresults[i].nil?
    tmp, answer = urlresults[i].inner_text.split(':')
  end
  # a lone space after the colon means no category was reported
  if answer == " "
    puts "The url: #{testedurls[i-1]} is not currently categorized"
  end
end


I know there are ways I can optimize the above code, but thought it 
better to provide the feedback.
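
As one possible way to tidy a loop like this, the two lists can be paired with zip so that no manual index is needed. The sketch below is only an illustration under stated assumptions: plain strings stand in for the table cells, the URLs are made up, and the "label: category" layout of each result is a guess based on the split(':') call above. It also treats any blank value after the colon as uncategorized, which is slightly broader than comparing against a single space.

```ruby
# Made-up cell text standing in for the two table columns.
tested_urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
url_results = ['Category: News', 'Category: ', 'Category: Sports']

# zip pairs each URL with its result; select keeps the pairs whose
# text after the first colon is blank; map pulls out just the URL.
uncategorized = tested_urls.zip(url_results).select { |_url, result|
  _label, answer = result.to_s.split(':', 2)
  answer.to_s.strip.empty?
}.map(&:first)

uncategorized.each do |url|
  puts "The url: #{url} is not currently categorized"
end
```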

Thanks for giving me direction.

Mike B.



William Flanagan

2007-Feb-01 22:26 UTC

[Mechanize-users] Mechanize, SVN, and lighttpd?

All,

I've recently started using SVN and trying to move back and forth between
computers for development.  So far, not so good.

Is there a way to bring mechanize with me?  I've installed via the GEM, and
I've installed Mechanize on both computers. But, when I go to start up
lighttpd, I get this message:

gem_original_require': no such file to load -- mechanize (MissingSourceFile)

Though it is installed on both machines.

Is there a way to install Mechanize via SVN, so that I could map Mechanize
as an external library?  Or, does someone else have any suggestions as to
the best way to manage this?

Or, am I completely missing something, and this isn't a Mechanize-related
problem?

Thanks for any help!

William

Aaron Patterson

2007-Feb-01 22:52 UTC

[Mechanize-users] Mechanize, SVN, and lighttpd?

On Thu, Feb 01, 2007 at 05:26:35PM -0500, William Flanagan wrote:
> All,
>
> I've recently started using SVN and trying to move back and forth between
> computers for development.  So far, not so good.
>
> Is there a way to bring mechanize with me?  I've installed via the GEM, and
> I've installed Mechanize on both computers. But, when I go to start up
> lighttpd, I get this message:
>
> gem_original_require': no such file to load -- mechanize (MissingSourceFile)

What happens when you try to load it in irb?  Here it is on my system:

  irb(main):001:0> require 'rubygems'
  => true
  irb(main):002:0> require 'mechanize'
  => true
  irb(main):003:0>

> Though it is installed on both machines.
>
> Is there a way to install Mechanize via SVN, so that I could map Mechanize
> as an external library?  Or, does someone else have any suggestions as to
> the best way to manage this?
>
> Or, am I completely missing something, and this isn't a Mechanize-related
> problem?

If you can load it in irb, I would suspect that there is something else
wrong.
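
If the irb check passes but the web server still cannot find the gem, one thing that may help is running the same check from inside the server's environment, since a server can launch Ruby with a different interpreter, user, or gem path than your shell. The script below is a minimal sketch of that idea; the helper name gem_loadable? is made up for this example.

```ruby
require 'rubygems'
require 'rbconfig'

# Show which interpreter is running and where it looks for gems.
puts "Ruby binary: #{RbConfig.ruby}"
puts "Gem paths:   #{Gem.path.join(', ')}"

# Hypothetical helper: true if the library can be required, false if
# the require raises LoadError (the "no such file to load" case).
def gem_loadable?(name)
  require name
  true
rescue LoadError
  false
end

puts "mechanize loadable? #{gem_loadable?('mechanize')}"
```

Comparing this output between a shell session and the server process should show whether the two are looking in the same places.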

-- 
Aaron Patterson
http://tenderlovemaking.com/

William Flanagan

2007-Feb-02 16:22 UTC

[Mechanize-users] Mechanize, SVN, and lighttpd?

All,

FYI, this is a FastCGI problem.  It works fine with WEBrick.  So, if you
see this error, start looking there.

Thanks for the help.  I love Mechanize!

William



On 2/1/07 5:52 PM, "Aaron Patterson" <aaron_patterson at speakeasy.net> wrote:
> On Thu, Feb 01, 2007 at 05:26:35PM -0500, William Flanagan wrote:
>> All,
>>
>> I've recently started using SVN and trying to move back and forth between
>> computers for development.  So far, not so good.
>>
>> Is there a way to bring mechanize with me?  I've installed via the GEM, and
>> I've installed Mechanize on both computers. But, when I go to start up
>> lighttpd, I get this message:
>>
>> gem_original_require': no such file to load -- mechanize (MissingSourceFile)
>
> What happens when you try to load it in irb?  Here it is on my system:
>
>   irb(main):001:0> require 'rubygems'
>   => true
>   irb(main):002:0> require 'mechanize'
>   => true
>   irb(main):003:0>
>
>> Though it is installed on both machines.
>>
>> Is there a way to install Mechanize via SVN, so that I could map Mechanize
>> as an external library?  Or, does someone else have any suggestions as to
>> the best way to manage this?
>>
>> Or, am I completely missing something, and this isn't a Mechanize-related
>> problem?
>
> If you can load it in irb, I would suspect that there is something else
> wrong.
