search for: scraper

Displaying 20 results from an estimated 31 matches for "scraper".

2009 Jun 07
17
ActiveRecord Classes
...lasses using ActiveRecord when a file is in my lib directory: Brief example: Here's the outline of the files in use: ....app ........controllers ............application_controller.rb ............rushing_offenses_controller.rb ........models ............rushing_offense.rb ....lib ........scraper.rb ........tasks ............scraper.rake The rushing_offense.rb file contains: class RushingOffense < ActiveRecord::Base end The scraper.rb file contains: class Scraper < ActiveRecord::Base # METHOD that defines which URL to parse # METHOD that parses the data into an instanced variable c...
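The error described here usually comes from subclassing `ActiveRecord::Base` for a class that has no database table behind it. A minimal sketch of the alternative, assuming the scraper only holds parsed rows in memory (the URL and the data format below are invented for illustration):

```ruby
# lib/scraper.rb -- a plain Ruby class rather than ActiveRecord::Base,
# since there is no `scrapers` table backing it
class Scraper
  attr_reader :url, :rows

  def initialize(url)
    @url = url
    @rows = []
  end

  # split whitespace-separated stat lines into arrays (format is hypothetical)
  def parse(raw)
    @rows = raw.lines.map(&:split)
  end
end

s = Scraper.new("http://www.ncaa.org/stats")
s.parse("1 Navy 292.4\n2 AirForce 289.1\n")
# s.rows[0] == ["1", "Navy", "292.4"]
```

Persisting the parsed rows is then the job of the real models (like RushingOffense above), not of the scraper class itself.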
2009 Jun 06
5
Rake Tasks
Hi Everyone, I just need some further help clarifying a custom rake task I'm building and how it should work. I've created a custom rake task in libs/tasks called scraper.rake which so far just contains the following: desc "This task will parse data from ncaa.org and upload the data to our db" task :scraper => :environment do # code goes here for scraping end This rake task will be parsing data from ncaa.org and placing it into my DB for further pro...
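The `:scraper => :environment` dependency means Rake runs the `:environment` task, which boots the Rails app, before the scraper body executes, so models are available inside it. A runnable sketch of that ordering, with a stand-in `:environment` task since the real one only exists inside a Rails app:

```ruby
require 'rake'
include Rake::DSL

loaded = []

# stand-in for Rails' :environment task, which normally boots the app
task :environment do
  loaded << :environment
end

desc "Parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  loaded << :scraper  # models would be usable here in a real app
end

Rake::Task[:scraper].invoke
# prerequisites run first, so loaded == [:environment, :scraper]
```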
2006 Jan 10
1
OT: Scraper library recommendation
Hi all, this is quite off-topic, but I'm sure a lot of people here have experience in the area, so... I'm writing a website scraper script that needs to download a web page, traverse the (X)HTML tree and finally insert data and HTML pieces into a DB. Eventually this data will be served up as RSS and/or Atom. I'm currently using html/tree (htmltools); I've also tried Rubyful Soup; both have their own shortcomi...
2010 Aug 01
0
ScrapeR Unanticipated XML objects
...have come across a very surprising result as I have started to learn how to use R to pull data from the web for analysis. I am trying to isolate the table headers for the quarterly income statement (qtrinc) that I pulled from Google Finance. I executed the following commands after installing the scrapeR package. require(scrapeR) htmlfile<-scrape(url="http://www.google.com/finance?q=NASDAQ:MSFT&fstype=ii",headers=TRUE,parse=TRUE) tables<-xpathSApply(htmlfile[[1]],"//table") qtrinc<-tables[[1]] xpathSApply(qtrinc,"//thead",xmlValue) I receive the result...
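The same header extraction can be sketched outside R; here is a rough analogue using Ruby's stdlib REXML against a stand-in table (the markup and header text below are invented, not the actual Google Finance page). A common XPath gotcha, and possibly the "unanticipated" result in the post, is that `//thead` is an absolute path that searches the whole document even when applied to a subtree node; anchoring the query (e.g. `.//thead`) avoids that.

```ruby
require 'rexml/document'

# stand-in for the quarterly income statement table (markup invented)
html = <<~XML
  <table>
    <thead><tr><th>In millions of USD</th><th>3 months ending</th></tr></thead>
    <tbody><tr><td>Revenue</td><td>16,195.00</td></tr></tbody>
  </table>
XML

doc = REXML::Document.new(html)
# scope the query to thead so body cells are not picked up
headers = REXML::XPath.match(doc, "//thead//th").map(&:text)
# headers == ["In millions of USD", "3 months ending"]
```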
2006 Dec 31
0
backgroundrb 0.2.1 doesn't always load rails environment
I found this stack trace in my logs. My worker name is MiscWorker, and Qualifier is a Rails model. uninitialized constant MiscWorker::Qualifier: /Users/bryan/ Workspace/sandbox/scraper-trunk/config/../vendor/rails/activerecord/ lib/../../activesupport/lib/active_support/dependencies.rb:476:in `const_missing' /Users/bryan/Workspace/sandbox/scraper-trunk/lib/workers/ envoyd_worker.rb:11:in `do_work' /Users/bryan/Workspace/sandbox/scraper-trunk/vendor/plugins/...
2016 Sep 28
2
Good Bye SAMBA?!?!?
On 28.09.2016 at 04:01, Steve Litt via samba wrote: > Why would ANYBODY type a command when they could perform a bunch of > mouse clicks. Better yet, you can automate Windows tools with a screen > scraper and a keyboard injector, or with a top notch language like > Powershell or Visual Basic *lol* why would ANYBODY click in a GUI when he has a console - and I mean that seriously, after having been a clickmonkey until 2006
2011 May 15
1
Find String Between Characters
Dear R Helpers, I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated. require(scrapeR) mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40") str(mmm) I want to get the number 0000320193 that is between the CIK= and the &. I have tried g <- grep( "CIK=|&", mmm ) and temp<-...
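Capturing the value between `CIK=` and the next `&` is a one-capture-group regex rather than a `grep` over the whole page. A sketch of the same idea in Ruby (in R, `sub` or `regmatches` with a similar pattern would do it):

```ruby
url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40"

# capture everything after "CIK=" up to (but not including) the next "&"
cik = url[/CIK=([^&]+)/, 1]
# cik == "0000320193"
```

The key detail is the negated character class `[^&]+`, which stops the match at the delimiter instead of greedily running to the end of the string.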
2011 Feb 01
1
Email Obfuscation Techniques
The other thread brought to my attention that only the <email> syntax obfuscates mailto links. Plus, while the entity encoding technique probably fools some scrapers, I doubt it's all that effective. Even Gruber uses the Hivelogic Enkoder [1]. So, what are people using for obfuscation and are you using any scripting or automation (filter that takes a pass before or after Markdown) to get this into your HTML? [1]: hivelogic.com/enkoder -- arno? s? hauta...
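Entity encoding replaces each character of the address with a numeric HTML entity: browsers render it normally, while naive scrapers see only entities. A minimal sketch (the address is a placeholder):

```ruby
# encode each character as a decimal HTML entity, e.g. "a" -> "&#97;"
def entity_encode(address)
  address.each_char.map { |c| "&##{c.ord};" }.join
end

link = %(<a href="mailto:#{entity_encode('a@b.c')}">email me</a>)
# entity_encode('a@b.c') == "&#97;&#64;&#98;&#46;&#99;"
```

As the post notes, this only defeats scrapers that don't decode entities, which is why JavaScript-based encoders like the Enkoder exist.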
2012 May 28
1
Rcurl, postForm()
...DirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss library(XML) library(RCurl) library(scrapeR) library(RHTMLForms) #Set URL bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx') #Scrape URL orig<-getURLContent(url=bus) #Parse doc doc<-htmlParse(orig[[1]], asText=TRUE) #Get The forms forms<-getNodeSet(doc, "//form"...
2004 Oct 04
3
Cisco XML 411 Interface
Hi All, Did anyone come across a 411 XML service I can feed to the "service" button with XML? Some other feed I can manipulate into an XML query? Assaf Benharoosh
2011 Jan 26
1
Error handling with frozen RCurl function calls + Identification of frozen R processes
...ime which cuts time short for all the nitty-gritty details of the "components" involved. Having said this, I'm lacking the time at the moment to deeply dive into parallel computing and HTTP requests via RCurl and I hope you can help me out with one or two imminent issues of my crawler/scraper: Once a day, I'm running 'RCurl::getURIAsynchronous(x=URL.frontier.sub, multiHandle=my.multi.handle)' within an lapply()-construct in order to read chunks of deterministically composed URLs from a host. There are courtesy time delays implemented between the individual http requests (5...
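A common way to keep one frozen request from stalling an entire crawl is to wrap each call in a hard timeout. This is a generic sketch in Ruby rather than R, with `sleep` standing in for a hanging HTTP call; RCurl can alternatively pass libcurl's own per-request timeout options through `curlOptions`.

```ruby
require 'timeout'

# wrap a potentially hanging call so one stuck request
# cannot freeze the whole run
def fetch_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

slow = fetch_with_timeout(0.2) { sleep 2; :done }  # cut off early
fast = fetch_with_timeout(1.0) { :done }           # completes normally
# slow == :timed_out, fast == :done
```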
2016 Dec 06
2
Spam messages
The problem is not specific to this list. Any kind of public list may mean that other subscribers (or even the whole world) can see your email address. So whenever you mail to a (public) list there is a good chance that afterwards, you will get more spam. Not really much can be done about it, at least not on the side of the list, since any of the subscribers may be a spammer, who can know... All
2016 Dec 06
1
Spam messages
...cause otherwise I wouldn't know how to filter her out... On Tue, Dec 06, 2016 at 10:02:11AM -0600, Marc Schwartz wrote: > Hi, > > This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. > > If you look at the online archives of the R lists, for example R-Devel for this month: > > https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html <https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html> > > and look at the individual posts, t...
2016 Sep 28
0
Good Bye SAMBA?!?!?
...m glad to see you've come to the realization that Samba is just a console based rodent wheel and that Windows offers the professional CIFS_AD_DFS solution. Why would ANYBODY type a command when they could perform a bunch of mouse clicks. Better yet, you can automate Windows tools with a screen scraper and a keyboard injector, or with a top notch language like Powershell or Visual Basic. These Samba guys just want to install servers that never go down, but that's lazy thinking. Your server going down gives you the opportunity to do maintenance and updates. A server going down is like a circu...
2010 Jul 19
2
Historical Libor Rates
Hello All, Does anyone know how to download historical LIBOR rates of different currencies into R? Or if anyone knows of a website that holds all this data...I only need up to January of 2000. Also, how can we make the row names the index of a plot (the names of the x values)?
2007 Jul 25
0
Being a polite client: maintaining history
...is also important. I see in Mechanize's code that if conditional_requests is set, it'll add the If-Modified-Since header. But this requires that the page is already in the history, and there's currently no provision for caching the history. Since RSS readers (and most scrapers in general) are likely to be run periodically, mechanize should try to maintain this kind of state between runs, don't you think? You might see a patch from me, unless someone beats me to it. [1] http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers -- epistemol...
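A conditional GET just adds an `If-Modified-Since` header carrying the timestamp of the last successful fetch; the server replies `304 Not Modified` when nothing has changed, saving the transfer. A sketch of building such a request with Ruby's stdlib (the URL is a placeholder and the request is constructed but not sent):

```ruby
require 'net/http'
require 'time'

# would normally be loaded from persisted history between runs
last_fetch = Time.now - 3600

req = Net::HTTP::Get.new(URI('http://example.com/feed.rss'))
req['If-Modified-Since'] = last_fetch.httpdate

# a 304 Not Modified response would mean the cached copy is still fresh
```

Persisting `last_fetch` (and ideally the `ETag` too) between runs is exactly the missing state the post is describing.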
2007 May 16
0
Rake Namespaces - How to keep them separate?
Good morning, In our app we have a number of custom Rake tasks living in lib/tasks. Each has a different namespace, but if I do this (yes, I know, global variables bad): namespace :spider_uk_foo do @ss = Scraper.new @ss.set_name task :perform do end end Then @ss.set_name is run when running any other rake file, though each has a different namespace. @ss would also appear to be available to other rake files, again in different namespaces, where it is not even defined. Any ideas, please? Any advic...
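The behaviour described is expected: code at the namespace level runs when the rakefile is loaded, regardless of which task is later invoked; only code inside `task` blocks is deferred. A runnable sketch (the namespace name is taken from the post; the side-effect tracking is invented to make the timing visible):

```ruby
require 'rake'
include Rake::DSL

side_effects = []

namespace :spider_uk_foo do
  side_effects << :loaded        # runs as soon as the file is loaded
  task :perform do
    side_effects << :performed   # runs only when the task is invoked
  end
end

# side_effects == [:loaded] here, before any task has run
Rake::Task['spider_uk_foo:perform'].invoke
# side_effects == [:loaded, :performed]
```

Moving the `Scraper.new` / `set_name` calls inside the `task :perform do ... end` block keeps them from executing during every rake run and from leaking into other namespaces.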
2012 May 11
0
Using xpathapply or getnodeset to get text between two distinct tags
...ween 'Translation' and 'English' so that I can mark it as 'French' and then return the text between 'English' and 'Translation' and mark it as English. Does anyone have any suggestions? Yours truly, Simon J. Kiss #Necessary libraries library(XML) library(scrapeR) #URL for links to 2012 transcripts hansard<-c('http://www.parl.gc.ca/housechamberbusiness/ChamberSittings.aspx?View=H&Language=E&Mode=1&Parl=41&Ses=1') #Scrape the page with the links doc<-scrape(url=hansard, parse=TRUE, follow=TRUE) #Not sure what exactly this does,...
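One way to collect the nodes between two marker tags is to walk the element children in document order, switching buckets whenever a marker is seen. A rough REXML sketch over an invented fragment (the real Hansard markup will differ, so tag names here are placeholders):

```ruby
require 'rexml/document'

# invented stand-in: <b> markers separate the two languages
xml = "<p><b>Translation</b><t>Bonjour</t><b>English</b><t>Hello</t></p>"

doc = REXML::Document.new(xml)
segments = Hash.new { |h, k| h[k] = [] }
current = nil
doc.root.elements.each do |el|
  if el.name == "b"
    current = el.text          # switch bucket at each marker
  elsif current
    segments[current] << el.text
  end
end
# segments["Translation"] == ["Bonjour"], segments["English"] == ["Hello"]
```

The same walk-and-bucket idea translates directly to `getNodeSet` output in R's XML package.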
2016 Dec 06
0
Spam messages
Hi, This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. If you look at the online archives of the R lists, for example R-Devel for this month: https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html <https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html> and look at the individual posts, there is only minimal munging o...
2011 Dec 05
12
Using nokogiri
HI, I want to grab some information about university names, and I found this term called "web scraping" I search about it in google, and there are tools in ruby. One of them is nokogiri but I''m a bit confused because it seems that it only gets information that its already in an html or xml I found a webpage that have a list of university names as a <select>