Displaying 20 results from an estimated 3000 matches similar to: "Tips on testing"
2010 Jan 26
1
Does Amazon.com block scraping?
Hi there
Does anyone know if Amazon.com has any sort of server side script that tries
to block scraping activities? I first noticed that if I didn't change the
agent alias, it would fetch a page exactly like the normal one, but without
the initial search field (maybe a silly way to prevent scraping). Then after
that, I changed to some other alias and submitted a search. I got the result
page as
2006 Nov 22
1
to_absolute_uri typo in 0.6.3?
I just started using Mechanize, and started using Ruby about thirty
seconds before that, but one of the sites I'm scraping does a redirect
on form submission to a badly-formed relative URL:
index.cfm?action=bing&bang=boom=1|a=|b=|c= (etc.)
Interestingly, Mechanize 0.6.2 handled this OK, but in 0.6.3 this causes
a URI::InvalidURIError exception from URI.parse() in to_absolute_uri
2007 Oct 10
1
Scraping AOL Webmail to login and fetch contacts?
I'm helping with a gem that is going to be published under the
contentfree project on rubyforge
(http://rubyforge.org/projects/contentfree/).
The gem is called "blackbook" and basically it will go and fetch your
contacts from the major webmail providers. So far Gmail, Yahoo!, and
MSN have been completed.
We are trying to finish up with fetching contacts from AOL Webmail.
However
2008 Jul 17
3
Convert data to utf-8
Hello, I'm trying to find a solution to convert everything returned by
mechanize to utf-8, no matter if the original page is utf-8 or iso and I
really don't know where to start from...
agent = WWW::Mechanize.new { |a| a.log =
Logger.new(File::join(RAILS_ROOT, "log/mechanize.log")) }
one_page = agent.get("www.google.fr")
My first problem is that one_page
2010 Jan 25
4
Does Amazon.com blocks scraping?
Hi there
Does anyone know if Amazon.com has any sort of server side script that tries
to block scraping activities? I first noticed that if I didn't change the
agent alias, it would fetch a page exactly like the normal one, but without
the initial search field (maybe a silly way to prevent scraping). Then after
that, I changed to some other alias and submitted a search. I got the result
page as
2007 Sep 14
1
Unable to scrap gmail.com - EOFError: End of file reached
Hi all,
I am so excited to use mechanize! It has opened a whole new world of
projects for me :)
I am trying to log in to the Gmail.com server, as described in
http://schf.uc.org/articles/2007/02/14/scraping-gmail-with-mechanize-and-hpricot
but am running into a few issues...
irb(main):010:0> page = agent.submit form
EOFError: end of file reached
from
2008 Jun 12
1
setting request headers via get()
Hey all,
Found an email thread from Jan 2007 discussing the inability to set request
headers (like ETag and If-Modified-Since) through the API, and this is
something that's bothering me a bit. Currently the "way" to do this is to
subclass Mechanize and override set_headers(). That seems fine for headers
that you'd like to send in every request or for classes of request,
2007 Mar 18
1
Submitting a form sends a file. How do I save it?
I've been using Mechanize for a project that I've been working on,
but this is the first time I'm having to use forms (scraping
previously). So, after I fill out the form, when I hit submit, it
sends me information in the form of a text file to download. For the
life of me, I can't see how to get access to it. When clicking on a
link, you can put a
2007 Nov 12
3
Weird error downloading a gzip'ed file
Hi all,
I've been using mechanize for a while and it rocks. Docs are pretty clear
and so far I've been able to do it on my own.
However, I'm stuck in a weird situation in a script to download my contact
list from hotmail.
I've used Firebug to check all urls, and tested it by hand while logged in
via browser.
Even in the script everything works well until the
2006 May 22
2
How to execute time consuming code
Hello all,
I have a screen scraping application (go to lots of sites, extract 10k
stuff, integrate the results, put them to DB etc). Now I want to use a
Rails application as a frontend to this: The user can push a button
which triggers the screen scraping app and view the results (preferably
asynchronously, but that does not really matter right now).
Questions:
- Should the screen scraping app
2007 Apr 03
2
Scraping and saving.
Hi,
I'm working to scrape and save some ebooks. Mechanize has been
wonderful so far. The link I'm having trouble with is this one.
http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H
When I click that in the browser it saves it to a file named
H_1632.zip. How do I get that name from the page? I suspect to save
this to a file I would just do
2007 May 07
6
mock frameworks
Just curious - now that rspec (as of 0.9) lets you choose your mock
framework, how many of you are actually using (or planning to use)
mocha or flexmock?
Anybody planning to use any other mock framework besides rspec, mocha
or flexmock?
Thanks,
David
2006 Jan 27
1
Caching from screen scraping
Hi all,
I need to do some screen scraping from my rails app. Given an ethernet
(MAC) address, I scrape results from an internal web page that returns
location and hostname. How can I cache the result from that screen
scraping so as to be polite to the scrapee? I would like to expire the
results daily. In perl, I would use Cache::File. Can I use rails caching
for this? What's the best
2007 Aug 31
48
Deprecating the mocking framework?
I saw in one of Dave C.'s comments to a ticket that "our current plan
is to deprecate the mocking framework." I hadn't heard anything about
that, but then again I haven't paid super close attention to the list.
Are we planning on dumping the mock framework in favor of using Mocha
(or any other framework one might want to plug in?).
Pat
2007 Nov 04
3
Returning the mock associated with an expectation.
I was reading through the FlexMock docs and noticed the expectation
method .mock, which returns the original mock associated with an
expectation.
It looks really handy for writing nice all-in-one mocks like:
mock_user = mock('User').expects(:first_name).returns('Jonah').mock
So I started playing around with mocha and found I could actually
already do this!
2009 Feb 18
1
R as a web scraping tool using RCurl
Hi List,
I am trying to leverage my knowledge of R in trying to use it for tasks that
may not make R the best choice for these tasks.
I wish to automate a web scraping task, which requires a multi-step
procedure:
1) log in to a website
2) Go to a particular page
3) From the drop down menu, click on a particular link
4) From the tabulated data presented, choose relevant information based on a
2008 Jun 10
4
adding results from threads to a collection and returning it
Forgive me if this has been addressed somewhere, but I have searched and
can't come up with anything.
I am basically trying to distribute several web page scraping tasks among
different threads, and have the results from each added to an Array which is
ultimately returned by the backgroundrb worker. Here is an example of what
I'm trying to do in a worker method:
pages =
2006 Oct 25
5
Mocha, Stubba and RSpec
Hi,
I've been reading with interest the threads trying to integrate Mocha
and Stubba with RSpec. So far, I've made the two changes in
spec_helper.rb suggested, but discovered another one that neither of
the archives mentions:
If you use traditional mocking: object = mock or the stub shortcut
: object = stub(:method => :result), you run into namespace conflicts
with
2012 Mar 05
2
How to choose a button and scrape the website data
hi all,
I'm working on scraping some website data to build a database.
In most cases, I can use package XML to get the dataset.
However, some of the websites don't give an explicit address for the downloaded tables.
To be more specific, for example, I'm interested in the website http://ets.aeso.ca/
The data we are scraping is the "Pool Weekly Summary" under the
2008 Jul 25
21
Problems with mock assigned to a constant
Hi all,
Initially I thought this was a bug in the built-in mocking framework (and it
still may be), but I better hash it out on the mailing list before I
file/reopen the ticket:
http://rspec.lighthouseapp.com/projects/5645/tickets/478-mocks-on-constants#ticket-478-6
I thought my example illustrated my problem, but obviously I was passing the
wrong arguments to the mock. I revised my example to