search for: scraper

Displaying 20 results from an estimated 31 matches for "scraper".

2009 Jun 07
17
ActiveRecord Classes
...lasses using ActiveRecord when a file is in my lib directory: Brief example: Here's the outline of the files in use: ....app ........controllers ............application_controller.rb ............rushing_offenses_controller.rb ........models ............rushing_offense.rb ....lib ........scraper.rb ........tasks ............scraper.rake The rushing_offense.rb file contains: class RushingOffense < ActiveRecord::Base end The scraper.rb file contains: class Scraper < ActiveRecord::Base # METHOD that defines which URL to parse # METHOD that parses the data into an instanced variable c...
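The error described here usually comes from subclassing `ActiveRecord::Base` for a class that has no database table behind it. A minimal sketch of the alternative, assuming the scraper only holds parsed rows in memory (the URL and the data format below are invented for illustration):

```ruby
# lib/scraper.rb -- a plain Ruby class rather than ActiveRecord::Base,
# since there is no `scrapers` table backing it
class Scraper
  attr_reader :url, :rows

  def initialize(url)
    @url = url
    @rows = []
  end

  # split whitespace-separated stat lines into arrays (format is hypothetical)
  def parse(raw)
    @rows = raw.lines.map(&:split)
  end
end

s = Scraper.new("http://www.ncaa.org/stats")
s.parse("1 Navy 292.4\n2 AirForce 289.1\n")
# s.rows[0] == ["1", "Navy", "292.4"]
```

Persisting the parsed rows is then the job of the real models (like RushingOffense above), not of the scraper class itself.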
2009 Jun 06
5
Rake Tasks
Hi Everyone, I just need some further help clarifying a custom rake task I'm building and how it should work. I've created a custom rake task in libs/tasks called scraper.rake which so far just contains the following: desc "This task will parse data from ncaa.org and upload the data to our db" task :scraper => :environment do # code goes here for scraping end This rake task will be parsing data from ncaa.org and placing it into my DB for further pro...
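The `:scraper => :environment` dependency means Rake runs the `:environment` task, which boots the Rails app, before the scraper body executes, so models are available inside it. A runnable sketch of that ordering, with a stand-in `:environment` task since the real one only exists inside a Rails app:

```ruby
require 'rake'
include Rake::DSL

loaded = []

# stand-in for Rails' :environment task, which normally boots the app
task :environment do
  loaded << :environment
end

desc "Parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  loaded << :scraper  # models would be usable here in a real app
end

Rake::Task[:scraper].invoke
# prerequisites run first, so loaded == [:environment, :scraper]
```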
2006 Jan 10
1
OT: Scraper library recommendation
Hi all, this is quite off-topic, but I'm sure a lot of people here have experience in the area, so... I'm writing a website scraper script that needs to download a web page, traverse the (X)HTML tree and finally insert data and HTML pieces into a DB. Eventually this data will be served up as RSS and/or Atom. I'm currently using html/tree (htmltools); I've also tried Rubyful Soup; both have their own shortcomi...
2010 Aug 01
0
ScrapeR Unanticipated XML objects
...have come across a very surprising result as I have started to learn how to use R to pull data from the web for analysis. I am trying to isolate the table headers for the quarterly income statement (qtrinc) that I pulled from Google Finance. I executed the following commands after installing the scrapeR package. require(scrapeR) htmlfile<-scrape(url="http://www.google.com/finance?q=NASDAQ:MSFT&fstype=ii",headers=TRUE,parse=TRUE) tables<-xpathSApply(htmlfile[[1]],"//table") qtrinc<-tables[[1]] xpathSApply(qtrinc,"//thead",xmlValue) I receive the result...
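The same header extraction can be sketched outside R; here is a rough analogue using Ruby's stdlib REXML against a stand-in table (the markup and header text below are invented, not the actual Google Finance page). A common XPath gotcha, and possibly the "unanticipated" result in the post, is that `//thead` is an absolute path that searches the whole document even when applied to a subtree node; anchoring the query (e.g. `.//thead`) avoids that.

```ruby
require 'rexml/document'

# stand-in for the quarterly income statement table (markup invented)
html = <<~XML
  <table>
    <thead><tr><th>In millions of USD</th><th>3 months ending</th></tr></thead>
    <tbody><tr><td>Revenue</td><td>16,195.00</td></tr></tbody>
  </table>
XML

doc = REXML::Document.new(html)
# scope the query to thead so body cells are not picked up
headers = REXML::XPath.match(doc, "//thead//th").map(&:text)
# headers == ["In millions of USD", "3 months ending"]
```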
2006 Dec 31
0
backgroundrb 0.2.1 doesn't always load rails environment
I found this stack trace in my logs. My worker name is MiscWorker, and Qualifier is a Rails model. uninitialized constant MiscWorker::Qualifier: /Users/bryan/ Workspace/sandbox/scraper-trunk/config/../vendor/rails/activerecord/ lib/../../activesupport/lib/active_support/dependencies.rb:476:in `const_missing' /Users/bryan/Workspace/sandbox/scraper-trunk/lib/workers/ envoyd_worker.rb:11:in `do_work' /Users/bryan/Workspace/sandbox/scraper-trunk/vendor/plugins/...
2016 Sep 28
2
Good Bye SAMBA?!?!?
On 28.09.2016 at 04:01, Steve Litt via samba wrote: > Why would ANYBODY type a command when they could perform a bunch of > mouse clicks. Better yet, you can automate Windows tools with a screen > scraper and a keyboard injector, or with a top notch language like > Powershell or Visual Basic *lol* why would ANYBODY click in a GUI when he has a console - and I mean that seriously, after having been a clickmonkey until 2006
2011 May 15
1
Find String Between Characters
Dear R Helpers, I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated. require(scrapeR) mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40") str(mmm) I want to get the number 0000320193 that is between the CIK= and the &. I have tried g <- grep( "CIK=|&", mmm ) and temp<-...
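Capturing the value between `CIK=` and the next `&` is a one-capture-group regex rather than a `grep` over the whole page. A sketch of the same idea in Ruby (in R, `sub` or `regmatches` with a similar pattern would do it):

```ruby
url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&owner=exclude&count=40"

# capture everything after "CIK=" up to (but not including) the next "&"
cik = url[/CIK=([^&]+)/, 1]
# cik == "0000320193"
```

The key detail is the negated character class `[^&]+`, which stops the match at the delimiter instead of greedily running to the end of the string.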
2011 Feb 01
1
Email Obfuscation Techniques
The other thread brought to my attention that only the <email> syntax obfuscates mailto links. Plus, while the entity encoding technique probably fools some scrapers, I doubt it's all that effective. Even Gruber uses the Hivelogic Enkoder [1]. So, what are people using for obfuscation and are you using any scripting or automation (filter that takes a pass before or after Markdown) to get this into your HTML? [1]: hivelogic.com/enkoder -- arno? s? hauta...
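Entity encoding replaces each character of the address with a numeric HTML entity: browsers render it normally, while naive scrapers see only entities. A minimal sketch (the address is a placeholder):

```ruby
# encode each character as a decimal HTML entity, e.g. "a" -> "&#97;"
def entity_encode(address)
  address.each_char.map { |c| "&##{c.ord};" }.join
end

link = %(<a href="mailto:#{entity_encode('a@b.c')}">email me</a>)
# entity_encode('a@b.c') == "&#97;&#64;&#98;&#46;&#99;"
```

As the post notes, this only defeats scrapers that don't decode entities, which is why JavaScript-based encoders like the Enkoder exist.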
2012 May 28
1
Rcurl, postForm()
...DirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss library(XML) library(RCurl) library(scrapeR) library(RHTMLForms) #Set URL bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx') #Scrape URL orig<-getURLContent(url=bus) #Parse doc doc<-htmlParse(orig[[1]], asText=TRUE) #Get The forms forms<-getNodeSet(doc, "//form"...
2004 Oct 04
3
Cisco XML 411 Interface
Hi All, Did anyone come across a 411 XML service I can feed to the "service" button with XML? Some other feed I can manipulate into an XML query? Assaf Benharoosh
2011 Jan 26
1
Error handling with frozen RCurl function calls + Identification of frozen R processes
...ime which cuts time short for all the nitty-gritty details of the "components" involved. Having said this, I'm lacking the time at the moment to deeply dive into parallel computing and HTTP requests via RCurl and I hope you can help me out with one or two imminent issues of my crawler/scraper: Once a day, I'm running 'RCurl::getURIAsynchronous(x=URL.frontier.sub, multiHandle=my.multi.handle)' within an lapply()-construct in order to read chunks of deterministically composed URLs from a host. There are courtesy time delays implemented between the individual http requests (5...
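A common way to keep one frozen request from stalling an entire crawl is to wrap each call in a hard timeout. This is a generic sketch in Ruby rather than R, with `sleep` standing in for a hanging HTTP call; RCurl can alternatively pass libcurl's own per-request timeout options through `curlOptions`.

```ruby
require 'timeout'

# wrap a potentially hanging call so one stuck request
# cannot freeze the whole run
def fetch_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

slow = fetch_with_timeout(0.2) { sleep 2; :done }  # cut off early
fast = fetch_with_timeout(1.0) { :done }           # completes normally
# slow == :timed_out, fast == :done
```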
2016 Dec 06
2
Spam messages
The problem is not specific to this list. Any kind of public list may mean that other subscribers (or even the whole world) can see your email address. So whenever you mail to a (public) list there is a good chance that afterwards, you will get more spam. Not really much can be done about it, at least not on the side of the list, since any of the subscribers may be a spammer, who can know... All
2016 Dec 06
1
Spam messages
...cause otherwise I wouldn't know how to filter her out... On Tue, Dec 06, 2016 at 10:02:11AM -0600, Marc Schwartz wrote: > Hi, > > This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. > > If you look at the online archives of the R lists, for example R-Devel for this month: > > https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html <https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html> > > and look at the individual posts, t...
2016 Sep 28
0
Good Bye SAMBA?!?!?
...m glad to see you've come to the realization that Samba is just a console based rodent wheel and that Windows offers the professional CIFS_AD_DFS solution. Why would ANYBODY type a command when they could perform a bunch of mouse clicks. Better yet, you can automate Windows tools with a screen scraper and a keyboard injector, or with a top notch language like Powershell or Visual Basic. These Samba guys just want to install servers that never go down, but that's lazy thinking. Your server going down gives you the opportunity to do maintenance and updates. A server going down is like a circu...
2010 Jul 19
2
Historical Libor Rates
Hello All, Does anyone know how to download historical LIBOR rates of different currencies into R? Or if anyone knows of a website that holds all this data...I only need up to January of 2000. Also, how can we make the row names the index of a plot (the names of the x values)?
2007 Jul 25
0
Being a polite client: maintaining history
...is also important. I see in Mechanize's code that if conditional_requests is set, it'll add the If-Modified-Since header. But this requires that the page is already in the history, and there's currently no provision for caching the history. Since RSS readers (and most scrapers in general) are likely to be run periodically, mechanize should try to maintain this kind of state between runs, don't you think? You might see a patch from me, unless someone beats me to it. [1] http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers -- epistemol...
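A conditional GET just adds an `If-Modified-Since` header carrying the timestamp of the last successful fetch; the server replies `304 Not Modified` when nothing has changed, saving the transfer. A sketch of building such a request with Ruby's stdlib (the URL is a placeholder and the request is constructed but not sent):

```ruby
require 'net/http'
require 'time'

# would normally be loaded from persisted history between runs
last_fetch = Time.now - 3600

req = Net::HTTP::Get.new(URI('http://example.com/feed.rss'))
req['If-Modified-Since'] = last_fetch.httpdate

# a 304 Not Modified response would mean the cached copy is still fresh
```

Persisting `last_fetch` (and ideally the `ETag` too) between runs is exactly the missing state the post is describing.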
2007 May 16
0
Rake Namespaces - How to keep them separate?
Good morning, In our app we have a number of custom Rake tasks living in lib/tasks. Each has a different namespace, but if I do this (yes, I know, global variables bad): namespace :spider_uk_foo do @ss = Scraper.new @ss.set_name task :perform do end end Then @ss.set_name is run when running any other rake file, though each has a different namespace. @ss would also appear to be available to other rake files, again in different namespaces, where it is not even defined. Any ideas, please? Any advic...
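The behaviour described is expected: code at the namespace level runs when the rakefile is loaded, regardless of which task is later invoked; only code inside `task` blocks is deferred. A runnable sketch (the namespace name is taken from the post; the side-effect tracking is invented to make the timing visible):

```ruby
require 'rake'
include Rake::DSL

side_effects = []

namespace :spider_uk_foo do
  side_effects << :loaded        # runs as soon as the file is loaded
  task :perform do
    side_effects << :performed   # runs only when the task is invoked
  end
end

# side_effects == [:loaded] here, before any task has run
Rake::Task['spider_uk_foo:perform'].invoke
# side_effects == [:loaded, :performed]
```

Moving the `Scraper.new` / `set_name` calls inside the `task :perform do ... end` block keeps them from executing during every rake run and from leaking into other namespaces.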
2012 May 11
0
Using xpathapply or getnodeset to get text between two distinct tags
...ween 'Translation' and 'English' so that I can mark it as 'French' and then return the text between 'English' and 'Translation' and mark it as English. Does anyone have any suggestions? Yours truly, Simon J. Kiss #Necessary libraries library(XML) library(scrapeR) #URL for links to 2012 transcripts hansard<-c('http://www.parl.gc.ca/housechamberbusiness/ChamberSittings.aspx?View=H&Language=E&Mode=1&Parl=41&Ses=1') #Scrape the page with the links doc<-scrape(url=hansard, parse=TRUE, follow=TRUE) #Not sure what exactly this does,...
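One way to collect the nodes between two marker tags is to walk the element children in document order, switching buckets whenever a marker is seen. A rough REXML sketch over an invented fragment (the real Hansard markup will differ, so tag names here are placeholders):

```ruby
require 'rexml/document'

# invented stand-in: <b> markers separate the two languages
xml = "<p><b>Translation</b><t>Bonjour</t><b>English</b><t>Hello</t></p>"

doc = REXML::Document.new(xml)
segments = Hash.new { |h, k| h[k] = [] }
current = nil
doc.root.elements.each do |el|
  if el.name == "b"
    current = el.text          # switch bucket at each marker
  elsif current
    segments[current] << el.text
  end
end
# segments["Translation"] == ["Bonjour"], segments["English"] == ["Hello"]
```

The same walk-and-bucket idea translates directly to `getNodeSet` output in R's XML package.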
2016 Dec 06
0
Spam messages
Hi, This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. If you look at the online archives of the R lists, for example R-Devel for this month: https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html <https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html> and look at the individual posts, there is only minimal munging o...
2011 Dec 05
12
Using nokogiri
HI, I want to grab some information about university names, and I found this term called "web scraping" I search about it in google, and there are tools in ruby. One of them is nokogiri but I''m a bit confused because it seems that it only gets information that its already in an html or xml I found a webpage that have a list of university names as a <select>