thr3ads.net - search: "scrape"

2006 Jan 27

1

Caching from screen scraping

Hi all, I need to do some screen scraping from my Rails app. Given an Ethernet (MAC) address, I scrape results from an internal web page that returns location and hostname. How can I cache the results of that screen scraping so as to be polite to the scrapee? I would like to expire the results daily. In Perl, I would use Cache::File. Can I use Rails caching for this? What's the best way?...

2006 May 22

2

How to execute time consuming code

Hello all, I have a screen scraping application (go to lots of sites, extract 10k stuff, integrate the results, put them in the DB, etc.). Now I want to use a Rails application as a frontend to this: the user can push a button which triggers the screen scraping app and view the results (preferably asynchronously, but that does not really matter right now). Questions: - Should the screen scraping app

2009 Dec 12

6

How to scrape a page without knowing its html structure

Hi, I'm building a module for my site that needs to import a user's blog. I can use RSS feeds to read the blog information, but the feeds don't give me the entire content, so I need to scrape the user's blog page. How can I scrape a page without knowing its HTML structure? Can anyone help me with this issue? Thanks in advance...

2010 Jan 25

4

Does Amazon.com blocks scraping?

Hi there, does anyone know if Amazon.com has any sort of server-side script that tries to block scraping activities? I first noticed that if I didn't change the agent alias, it would fetch a page exactly like the normal one, but without the initial search field (maybe a silly way to prevent scraping). Then I changed to some other alias and submitted a search. I got the result page as

2009 Feb 18

1

R as a web scraping tool using RCurl

Hi List, I am trying to leverage my knowledge of R in trying to use it for tasks that may not make R the best choice for these tasks. I wish to automate a web scraping task, which requires a multi-step procedure: 1) log in to a website 2) Go to a particular page 3) From the drop down menu, click on a particular link 4) From the tabulated data presented, choose relevant information based on a

2018 Jan 18

0

Web scraping different levels of a website

...a") %>% html_text() %>% as.data.frame() }) %>% do.call(rbind, .) I have repeated the same code in order to get all the data I was interested in and it seems to work perfectly, although it is of course a little slow due to the Sys.sleep() calls. My issue arose once I tried to scrape the individual project descriptions that should be included in the dataframe. For instance, the first project description is at http://catalog.ihsn.org/index.php/catalog/7118/study-description the second project description is at http://catalog.ihsn.org/index.php/catalog/6606/study-description an...

2007 Dec 03

1

./configure -> "libgd unusable" (shall I build from source or scrape and rebuild?)

On a relatively new Nagios 2.x / CentOS 4.x server (only like a week old), I am experiencing gd(-devel/-progs) problems and am wondering if I should just scrape and rebuild. Here is my situation: I installed LAMP+Nagios+NagiosQL and was going to install PerfParse so that I could have trending info integrated with Nagios. The configure script would run ok, but would crap out on make && make install, leading me to believe something was missing from...

2010 Jan 26

1

Does Amazon.com block scraping?

Hi there, does anyone know if Amazon.com has any sort of server-side script that tries to block scraping activities? I first noticed that if I didn't change the agent alias, it would fetch a page exactly like the normal one, but without the initial search field (maybe a silly way to prevent scraping). Then I changed to some other alias and submitted a search. I got the result page as

2012 Mar 05

2

How to choose a button and scrape the website data

Hi all, I'm working on scraping some website data to build a database. In most cases, I can use the XML package to get the dataset. However, some websites don't give an explicit address for the downloadable tables. To be more specific, I'm interested in the website http://ets.aeso.ca/ The data we are scraping is the "Pool Weekly Summary" under the

2007 Oct 10

1

Scraping AOL Webmail to login and fetch contacts?

I'm helping with a gem that is going to be published under the contentfree project on RubyForge (http://rubyforge.org/projects/contentfree/). The gem is called "blackbook" and basically it will go and fetch your contacts from the major webmail providers. So far Gmail, Yahoo!, and MSN have been completed. We are trying to finish up with fetching contacts from AOL Webmail. However

2018 Jan 23

1

Scraping from different level URLs website

I am doing research on World Bank (WB) projects in developing countries. To do so, I am scraping their website in order to collect the data I am interested in. The structure of the webpage I want to scrape is the following: 1. List of countries: the list of all countries in which WB has developed projects <http://projects.worldbank.org/country?lang=en&page=> 1.1. By clicking on a single country in 1., one gets that single country's project list (that includes many webpages) it includes a...

2024 Sep 03

0

Goodreader: Scrape and Analyze 'Goodreads' Book Data

Dear R Users, I am pleased to announce that Goodreader 0.1.1 is now available on CRAN. Goodreader offers a toolkit for scraping and analyzing book data from Goodreads. Users can search for books, scrape detailed information and reviews, perform sentiment analysis on reviews, and conduct topic modeling. Here's a quick overview of how to use Goodreader: # Search for books AI_df <- search_goodreads(search_term = "artificial intelligence", search_in = "title", num_books = 10, s...

2012 Jun 01

1

Help with this web scrape function

Hello, I am looking to scrape this Webpage: http://toast.gasunie.de/gud/search.aspx?soid=GUD&lang=de The page uses the method "POST", it contains various HTML Forms, mostly lists and a couple of radio buttons. After submit, I should get forwarded to a new page. Which selections are being made in the forms does n...

2011 Nov 27

2

problem scraping using nokogiri - getting wrong characters

Hi all, I am scraping a table off of another site and inserting it into my site. You can see an example on the initial page at: http://mthosts.heroku.com. I'm referring to the green box with the Snowbird weather and snowfall information. This box has been scraped off of the Snowbird site at: http://www.snowbird.com/ski_board/snowreport.php The problem is that the Snowbird site has degree symbols (°) but on my page they show up as: (�) I think it has something to do with the encoding, but I'm pretty new to HTML etc. and am not sure what I can...
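A symptom like this often appears when bytes from a Latin-1 (ISO-8859-1) page are treated as UTF-8. The Ruby sketch below illustrates that one possible cause, assuming Ruby 1.9 or later. The sample bytes are hypothetical and are not taken from the Snowbird page, whose actual encoding the post does not state.

```ruby
# Hypothetical raw bytes as they might arrive from a Latin-1 page:
# 0xB0 is the degree sign in ISO-8859-1, but it is not valid UTF-8 on its own.
raw = "-3\xB0 C".b

# Assumption: the source page is ISO-8859-1. Label the bytes with their
# real encoding, then transcode to UTF-8 before inserting into a UTF-8 page.
utf8 = raw.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8
```

If the source page turns out to use a different encoding, the label passed to `force_encoding` would need to change accordingly.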

2012 May 14

3

Scraping a web page.

Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact all I want is the (exactly) 7-digit number before ".h...
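A minimal Ruby sketch of one way to pull out those numbers, assuming the links end in ".html" as in the two quoted strings. The surrounding markup in the sample is invented for illustration, and a plain regular expression is used here rather than an HTML parser.

```ruby
# Hypothetical page fragment containing the two hrefs quoted in the post.
html = <<~HTML
  <table><tr><td><div>
    <a href="/en/Ships/A-8605507.html">A</a>
    <a href="/en/Ships/Aalborg-8122830.html">Aalborg</a>
  </div></td></tr></table>
HTML

# Match "/en/Ships/<name>-<digits>.html" and capture the digits.
# The hyphen before and the ".html" after force exactly seven digits.
SHIP_ID = %r{/en/Ships/[^"/]*-(\d{7})\.html}

# scan returns one single-element array per match, so flatten the result.
ship_ids = html.scan(SHIP_ID).flatten

puts ship_ids
```

Anchoring on the hyphen and the extension means a six- or eight-digit suffix is skipped rather than partially matched.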

2008 Jun 10

4

adding results from threads to a collection and returning it

...ly trying to distribute several web page scraping tasks among different threads, and have the results from each added to an Array which is ultimately returned by the backgroundrb worker. Here is an example of what I'm trying to do in a worker method: pages = Array.new pages_to_scrape.each do |url| thread_pool.defer(url) do |url| begin # model object performs the scraping page = ScrapedPage.new(page.url) pages << page rescue logger.info "page scrape failed" end...

2007 Apr 03

2

Scraping and saving.

Hi, I'm working to scrape and save some ebooks. Mechanize has been wonderful so far. The link I'm having trouble with is this one. http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H When I click that in the browser it saves it to a file named H_1632.zip. How do I get that name...

2006 Jun 13

4

script/plugin discover breaks?

...plugins/? [Y/n] Add http://svn.openprofile.net/plugins/? [Y/n] Add http://terralien.com/svn/projects/plugins/? [Y/n] (eval):3:in `each': undefined method `[]' for nil:NilClass (NoMethodError) from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:658:in `scrape' from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:632:in `parse!' from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plugin.rb:631:in `parse!' from /usr/local/lib/ruby/gems/1.8/gems/rails-1.1.2/lib/commands/plu...

2015 Jun 03

1

Results of security honeypot experiment - scraping for IP's/credentials ?

The results of a security experiment were published this week, in which an Asterisk PBX was set out in the wild to see who would attack it and how: http://www.telium.ca/?honeypot1 What I find particularly interesting is that people/bots are scraping support websites looking for valid IP's of PBX's, and valid credentials! A good reminder to everyone on this list to not publish the IP
