Displaying 20 results from an estimated 374 matches for "scrape".
2006 Jan 27
1
Caching from screen scraping
Hi all,
I need to do some screen scraping from my Rails app. Given an Ethernet
(MAC) address, I scrape results from an internal web page that returns
location and hostname. How can I cache the result from that screen
scraping so as to be polite to the scrapee? I would like to expire the
results daily. In Perl, I would use Cache::File. Can I use Rails caching
for this? What's the best way?...
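A minimal sketch of how this could look with Rails' built-in cache on a reasonably modern Rails (lookup_host and the cache key are hypothetical names, not from the thread); :expires_in gives the daily expiry that Cache::File would provide in Perl:

def cached_lookup(mac_address)
  # Hits the internal web page only on a cache miss; entry expires after a day.
  Rails.cache.fetch("host_lookup/#{mac_address}", expires_in: 1.day) do
    lookup_host(mac_address)  # hypothetical method that does the actual scrape
  end
end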
2006 May 22
2
How to execute time consuming code
Hello all,
I have a screen scraping application (it goes to lots of sites, extracts
~10k items, integrates the results, puts them into the DB, etc.). Now I
want to use a Rails application as a frontend to this: the user can push
a button which triggers the screen scraping app and then view the results
(preferably asynchronously, but that does not really matter right now).
Questions:
- Should the screen scraping app
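At the time the usual answer was a queue such as backgroundrb (today it would be Active Job); purely as a sketch of the push-a-button-and-return-immediately shape, a plain Ruby thread looks like this (run_scrapers and ScrapeRun are hypothetical names):

def start_scrape
  Thread.new do
    results = run_scrapers            # hypothetical: visits the sites and extracts the data
    ScrapeRun.create(data: results)   # hypothetical model that persists the results
  end
  head :accepted  # respond immediately; the scraping continues in the background
end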
2009 Dec 12
6
How to scrape a page without knowing its html structure
Hi,
I'm building one module of my site where I need to import the user's blog
into my site. I can use RSS feeds to read the blog information, but with
RSS feeds I'm not getting the entire content. So, I need to scrape the
user's blog page. How can I scrape a page without knowing its HTML
structure? Can anyone help me with this issue? Thanks
in advance.
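Without a known structure there is no exact selector to target; one common heuristic is to strip the obvious page chrome and keep the element holding the most text. A rough Nokogiri sketch (the scoring heuristic is an assumption, not something from the thread):

require 'nokogiri'

# Heuristic: drop script/style/nav chrome, then pick the node whose text
# is longest -- on a blog page this is often the main post body.
def guess_main_text(html)
  doc = Nokogiri::HTML(html)
  doc.css('script, style, nav, header, footer').remove
  best = doc.css('article, div, td, p').max_by { |n| n.text.strip.length }
  (best || doc).text.strip
end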
2010 Jan 25
4
Does Amazon.com block scraping?
Hi there
Does anyone know if Amazon.com has any sort of server-side script that tries
to block scraping activities? I first noticed that if I didn't change the
agent alias, it would fetch a page exactly like the normal one, but without
the initial search field (maybe a silly way to prevent scraping). Then I
changed to some other alias and submitted a search. I got the result
page as
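The "agent alias" here comes from Ruby's Mechanize, where it is set like this; whether a given alias actually gets past Amazon's filtering is not something the thread settles:

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'  # present a browser-like User-Agent
page = agent.get('http://www.amazon.com/')
# Rough check on whether the page came back with its forms intact.
puts page.forms.empty? ? 'no forms returned' : 'forms present'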
2009 Feb 18
1
R as a web scraping tool using RCurl
Hi List,
I am trying to leverage my knowledge of R by using it for tasks for which
R may not be the best choice.
I wish to automate a web scraping task, which requires a multi-step
procedure:
1) log in to a website
2) Go to a particular page
3) From the drop down menu, click on a particular link
4) From the tabulated data presented, choose relevant information based on a
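The question is about RCurl in R; purely as an illustration of the same log-in-then-navigate flow, here is the equivalent sketch in Ruby's Mechanize (the URL, field names, and link text are all placeholders):

require 'mechanize'

agent = Mechanize.new

# 1) log in to the website (form action and field names are hypothetical)
login_page = agent.get('https://example.com/login')
dashboard = login_page.form_with(action: '/login') do |f|
  f.field_with(name: 'username').value = 'me'
  f.field_with(name: 'password').value = 'secret'
end.submit

# 2) and 3) go to a particular page and follow a link from it
report_page = agent.click(dashboard.link_with(text: 'Reports'))

# 4) pull the relevant rows out of the tabulated data
report_page.search('table tr').each { |row| puts row.text.strip }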
2018 Jan 18
0
Web scraping different levels of a website
...a") %>%
html_text() %>%
as.data.frame()
}) %>% do.call(rbind, .)
I have repeated the same code in order to get all the data I was interested in, and it seems to work perfectly, although it is of course a little slow due to the Sys.sleep() calls.
My issue arose once I tried to scrape the individual project descriptions that should be included in the dataframe.
For instance, the first project description is at
http://catalog.ihsn.org/index.php/catalog/7118/study-description
the second project description is at
http://catalog.ihsn.org/index.php/catalog/6606/study-description
an...
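Since the per-project pages differ only in the catalog id, the inner scrape reduces to a loop over the collected ids. The thread itself uses R/rvest; the same shape in Ruby, with a placeholder CSS selector, would be:

require 'open-uri'
require 'nokogiri'

ids = [7118, 6606]  # catalog ids collected from the listing pages
descriptions = ids.map do |id|
  url = "http://catalog.ihsn.org/index.php/catalog/#{id}/study-description"
  doc = Nokogiri::HTML(URI.open(url))
  sleep 1  # stay polite, as the Sys.sleep() in the thread does
  doc.css('div.study-description').text.strip  # selector is a guess
end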
2007 Dec 03
1
./configure -> "libgd unusable" (shall I build from source or scrape and rebuild?)
On a relatively new Nagios 2.x / CentOS 4.x server (only like a week
old), I am experiencing gd(-devel/-progs) problems and am wondering if
I should just scrape and rebuild. Here is my situation:
I installed LAMP+Nagios+NagiosQL and was going to install PerfParse so
that I could have trending info integrated with Nagios. The configure
script would run ok, but would crap out on make && make install,
leading me to believe something was missing from...
2010 Jan 26
1
Does Amazon.com block scraping?
Hi there
Does anyone know if Amazon.com has any sort of server side script that tries
to block scraping activities? I first noticed that if I didn?t change the
agent alias, it would fetch a page exactly like the normal one, but without
the intial search field(maybe a silly way to prevent scraping). Then after
it, I changed to some other alias, and submit a search. I got the result
page as
2012 Mar 05
2
How to choose a button and scrape the website data
hi all,
I'm working on scraping some website data to build a database.
In most cases, I can use the XML package to get the dataset.
However, some websites don't give an explicit address for the downloadable tables.
To be more specific, for example, I'm interested in the website http://ets.aeso.ca/
The data we are scraping is the "Pool Weekly Summary" under the
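When a table sits behind a form button rather than a stable URL, one approach is to submit that form programmatically. A Mechanize sketch (the form and button selectors are guesses, not verified against http://ets.aeso.ca/):

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://ets.aeso.ca/')
form = page.forms.first                                   # assumption: one main form
button = form.button_with(value: /Pool Weekly Summary/i)  # hypothetical button label
report = agent.submit(form, button)
puts report.search('table').first&.text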
2007 Oct 10
1
Scraping AOL Webmail to login and fetch contacts?
I'm helping with a gem that is going to be published under the
contentfree project on rubyforge
(http://rubyforge.org/projects/contentfree/).
The gem is called "blackbook" and basically it will go and fetch your
contacts from the major webmail providers. So far Gmail, Yahoo!, and
MSN have been completed.
We are trying to finish up with fetching contacts from AOL Webmail.
However
2018 Jan 23
1
Scraping from different level URLs website
I am doing research on World Bank (WB) projects in developing countries. To do so, I am scraping their website in order to collect the data I am interested in.
The structure of the webpage I want to scrape is the following:
1. List of countries: the list of all countries in which the WB has developed projects <http://projects.worldbank.org/country?lang=en&page=>
1.1. By clicking on a single country in 1., one gets that country's project list (which spans many webpages); it includes a...
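One way to walk such a two-level structure is to collect the country links first and then visit each country's project list. A hedged Ruby sketch (the CSS selectors are placeholders; the real page structure has to be inspected):

require 'open-uri'
require 'nokogiri'

base = 'http://projects.worldbank.org'
countries = Nokogiri::HTML(URI.open("#{base}/country?lang=en&page="))

# Level 1 -> 1.1: follow each country link to its project list.
countries.css('a.country-link').each do |link|  # selector is a guess
  list = Nokogiri::HTML(URI.open(base + link['href']))
  list.css('a.project-link').each { |p| puts p.text.strip }  # selector is a guess
  sleep 1  # throttle between requests
end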
2024 Sep 03
0
Goodreader: Scrape and Analyze 'Goodreads' Book Data
Dear R Users,
I am pleased to announce that Goodreader 0.1.1 is now available on CRAN.
Goodreader offers a toolkit for scraping and analyzing book data from
Goodreads. Users can search for books, scrape detailed information and
reviews, perform sentiment analysis on reviews, and conduct topic modeling.
Here's a quick overview of how to use Goodreader:
# Search for books
AI_df <- search_goodreads(search_term = "artificial intelligence",
search_in = "title", num_books = 10, s...
2012 Jun 01
1
Help with this web scrape function
Hello,
I am looking to scrape this webpage:
http://toast.gasunie.de/gud/search.aspx?soid=GUD&lang=de
The page uses the POST method; it contains various HTML forms, mostly
lists and a couple of radio buttons. After submitting, I should get forwarded to
a new page. Which selections are made in the forms does n...
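This is a POST-and-follow-the-forward flow, which Mechanize handles directly; a sketch (the excerpt does not show the form's field names, so the selections below are placeholders):

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://toast.gasunie.de/gud/search.aspx?soid=GUD&lang=de')

form = page.forms.first                           # assumption: the search form
form.field_with(name: 'someList').value = '...'   # hypothetical list selection
form.radiobutton_with(name: 'someOption').check   # hypothetical radio button
result = form.submit                              # Mechanize follows the forward/redirect
puts result.title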
2011 Nov 27
2
problem scraping using nokogiri - getting wrong characters
Hi all,
I am scraping a table off of another site and inserting it onto my
site. You can see an example on the initial page at http://mthosts.heroku.com.
I'm referring to the green box with the Snowbird weather and snowfall
information.
This box has been scraped off of the Snowbird site at:
http://www.snowbird.com/ski_board/snowreport.php
The problem is that on the Snowbird site it has degree symbols (°), but
on my page they show up as (�).
I think it has something to do with the encoding, but I'm pretty new to
HTML etc. and am not sure what I can...
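The � usually means the source bytes are ISO-8859-1 (Latin-1) but are being re-emitted as UTF-8. Telling Nokogiri the source encoding explicitly tends to fix it; a sketch (that the snowbird.com page is Latin-1 is an assumption):

require 'open-uri'
require 'nokogiri'

html = URI.open('http://www.snowbird.com/ski_board/snowreport.php').read
# Parse with the source encoding stated explicitly; Nokogiri then returns
# proper UTF-8 strings, so the degree symbol survives re-insertion.
doc = Nokogiri::HTML(html, nil, 'ISO-8859-1')
puts doc.at('title').text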
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".h...
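Since only the seven digits before ".html" are wanted, a regular expression over the raw page source may be all that is needed. The thread is on R-help (where regmatches/gsub would do the same job); in Ruby the idea looks like:

html = '<a href="/en/Ships/Aalborg-8122830.html">Aalborg</a>'

# Capture exactly seven digits between a hyphen and ".html" in Ships hrefs.
numbers = html.scan(%r{/en/Ships/[^"]*-(\d{7})\.html}).flatten
p numbers  # => ["8122830"]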
2008 Jun 10
4
adding results from threads to a collection and returning it
...ly trying to distribute several web page scraping tasks among
different threads, and have the results from each added to an Array which is
ultimately returned by the backgroundrb worker. Here is an example of what
I'm trying to do in a worker method:
pages = Array.new
pages_to_scrape.each do |url|
  thread_pool.defer(url) do |url|
    begin
      # model object performs the scraping
      page = ScrapedPage.new(url)
      pages << page
    rescue
      logger.info "page scrape failed"
    end...
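One caveat with that pattern: pushing onto a shared Array from several threads is not guaranteed to be safe on every Ruby implementation, so the cautious version guards it with a Mutex. A sketch independent of backgroundrb, reusing the thread's own names (pages_to_scrape, ScrapedPage, logger):

pages = []
lock = Mutex.new
threads = pages_to_scrape.map do |url|
  Thread.new(url) do |u|
    begin
      page = ScrapedPage.new(u)           # model object performs the scraping
      lock.synchronize { pages << page }  # serialize writes to the shared array
    rescue => e
      logger.info "page scrape failed: #{e.message}"
    end
  end
end
threads.each(&:join)  # wait for every scrape before returning pages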
2007 Apr 03
2
Scraping and saving.
Hi,
I'm working to scrape and save some ebooks. Mechanize has been
wonderful so far. The link I'm having trouble with is this one.
http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H
When I click that in the browser it saves it to a file named
H_1632.zip. How do I get that name...
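Mechanize exposes the name the server suggests (via the Content-Disposition header) as filename on the fetched file, so the download can be saved under the server's name without hard-coding it; a sketch:

require 'mechanize'

agent = Mechanize.new
url = 'http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H'
file = agent.get(url)     # the zip comes back as a Mechanize::File
file.save(file.filename)  # filename comes from Content-Disposition, e.g. H_1632.zip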
2006 Jun 13
4
script/plugin discover breaks?
...plugins/? [Y/n]
Add http://svn.openprofile.net/plugins/? [Y/n]
Add http://terralien.com/svn/projects/plugins/? [Y/n]
(eval):3:in `each': undefined method `[]' for nil:NilClass (NoMethodError)
from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:658:in `scrape'
from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:632:in `parse!'
from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:631:in `parse!'
from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plu...
2015 Jun 03
1
Results of security honeypot experiment - scraping for IP's/credentials ?
The results of a security experiment were published this week, in which an Asterisk PBX was set out in the wild to see who would attack it and how:
http://www.telium.ca/?honeypot1
What I find particularly interesting is that people/bots are scraping support websites looking for valid IPs of PBXs, and valid credentials!
A good reminder to everyone on this list to not publish the IP