search for: scrapers

Displaying 20 results from an estimated 31 matches for "scrapers".

2009 Jun 07
17
ActiveRecord Classes
...shing.scrape_data offensive_rushing.clean_celldata offensive_rushing.print_values offensive_rushing.update_rushing_offense # the call to the method above end Now if I run the rake file what is going to happen is I'm going to get an error stating: Table 'project_development.scrapers' doesn't exist: I believe I understand why that's happening but I'm not sure how to fix it from a long term perspective. Here's why... The class Scraper is pushed into the ActiveRecord::Base so it believes the class is the pluralized name of the table...
2009 Jun 06
5
Rake Tasks
Hi Everyone, I just need some further help clarifying a custom rake task I'm building and the logistics of how it should be working. I've created a custom rake task in libs/tasks called scraper.rake which so far just contains the following: desc "This task will parse data from ncaa.org and upload the data to our db" task :scraper => :environment do # code goes
2006 Jan 10
1
OT: Scraper library recommendation
Hi all, this is quite off-topic, but I'm sure a lot of people here have experience in the area, so... I'm writing a website scraper script that needs to download a web page, traverse the (X)HTML tree and finally insert data and HTML pieces into a DB. Eventually this data will be served up as RSS and/or Atom. I'm currently using html/tree (htmltools); I've also
2010 Aug 01
0
ScrapeR Unanticipated XML objects
Dear All, I have come across a very surprising result as I have started to learn how to use R to pull data from the web for analysis. I am trying to isolate the table headers for the quarterly income statement (qtrinc) that I pulled from Google finance. I executed the following commands after installing the scrapeR package. require(scrapeR)
2006 Dec 31
0
backgroundrb 0.2.1 doesn't always load rails environment
I found this stack trace in my logs. My worker name is MiscWorker, and Qualifier is a Rails model. uninitialized constant MiscWorker::Qualifier: /Users/bryan/Workspace/sandbox/scraper-trunk/config/../vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:476:in `const_missing' /Users/bryan/Workspace/sandbox/scraper-trunk/lib/workers/
2016 Sep 28
2
Good Bye SAMBA?!?!?
On 28.09.2016 at 04:01, Steve Litt via samba wrote: > Why would ANYBODY type a command when they could perform a bunch of > mouse clicks. Better yet, you can automate Windows tools with a screen > scraper and a keyboard injector, or with a top notch language like > Powershell or Visual Basic *lol* why would ANYBODY click in a GUI when he has a console - and I mean that really
2011 May 15
1
Find String Between Characters
Dear R Helpers, I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated. require(scrapeR)
2011 Feb 01
1
Email Obfuscation Techniques
The other thread brought to my attention that only the <email> syntax obfuscates mailto links. Plus, while the entity encoding technique probably fools some scrapers, I doubt it's all that effective. Even Gruber uses the Hivelogic Enkoder [1]. So, what are people using for obfuscation and are you using any scripting or automation (filter that takes a pass before or after Markdown) to get this into your HTML? [1]: hivelogic.com/enkoder -- arno? s? hautal...
2012 May 28
1
Rcurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
2004 Oct 04
3
Cisco XML 411 Interface
Hi All, Has anyone come across a 411 XML service I can feed to the "service" button with XML? Or some other feed I can turn into an XML query? Assaf Benharoosh
2011 Jan 26
1
Error handling with frozen RCurl function calls + Identification of frozen R processes
Dear list, I'm tackling an empirical research problem that requires me to address a whole bunch of conceptual and/or technical details at the same time which cuts time short for all the nitty-gritty details of the "components" involved. Having said this, I'm lacking the time at the moment to deeply dive into parallel computing and HTTP requests via RCurl and I hope you can help me
2016 Dec 06
2
Spam messages
The problem is not specific to this list. Any kind of public list may mean that other subscribers (or even the whole world) can see your email address. So whenever you mail to a (public) list there is a good chance that afterwards, you will get more spam. Not really much can be done about it, at least not on the side of the list, since any of the subscribers may be a spammer, who can know... All
2016 Dec 06
1
Spam messages
...cause otherwise I wouldn't know how to filter her out... On Tue, Dec 06, 2016 at 10:02:11AM -0600, Marc Schwartz wrote: > Hi, > > This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. > > If you look at the online archives of the R lists, for example R-Devel for this month: > > https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html > > and look at the individual posts, th...
2016 Sep 28
0
Good Bye SAMBA?!?!?
On Tue, 27 Sep 2016 16:02:15 -0300 Gilberto Nunes via samba <samba at lists.samba.org> wrote: > Hi list > > I am sad, today! I start to study how windows deal with CIFS, Active > Directory and DFS, I just decide follow the other path! > I will give a try to windows tools.... > The question is: why Linux doesn't have such tools to help and improve > server
2010 Jul 19
2
Historical Libor Rates
Hello All, Does anyone know how to download historical LIBOR rates of different currencies into R? Or if anyone knows of a website that holds all this data...I only need up to January of 2000. Also, how can we make the row names the index of a plot (the names of the x values)?
2007 Jul 25
0
Being a polite client: maintaining history
...is also important. I see in Mechanize's code that if conditional_requests is set, it'll add the If-Modified-Since header. But this requires that the page is already in the history, and there's currently no provision for caching the history. Since RSS readers (and most scrapers in general) are likely to be run periodically, mechanize should try to maintain this kind of state between runs, don't you think? You might see a patch from me, unless someone beats me to it. [1] http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers -- epistemolo...
2007 May 16
0
Rake Namespaces - How to keep them separate?
Good morning, In our app we have a number of custom Rake tasks living in lib/tasks. Each has a different namespace, but if I do this (yes, I know, global variables bad): namespace :spider_uk_foo do @ss = Scraper.new @ss.set_name task :perform do end end Then @ss.set_name is run when running any other rake file, though each has a different namespace. @ss would also appear to be
2012 May 11
0
Using xpathapply or getnodeset to get text between two distinct tags
Hello: The following code extracts the links to the daily transcripts of Canada's House Of Commons. 'links' is a matrix of URLs (ncol=1), each of which points to one day's transcripts. If you inspect the code for scrape(links[1]), you will find that periodically there appears an italicize tag after a paragraph tag (<p some text ><i>Translation</i></p>.
2016 Dec 06
0
Spam messages
Hi, This topic has come up previously, across the R e-mail lists and the spammers need not be subscribers (but could be), but simply reasonably competent HTML scrapers. If you look at the online archives of the R lists, for example R-Devel for this month: https://stat.ethz.ch/pipermail/r-devel/2016-December/thread.html and look at the individual posts, there is only minimal munging of...
2011 Dec 05
12
Using nokogiri
Hi, I want to grab some information about university names, and I found this term called "web scraping". I searched for it on Google, and there are tools in Ruby. One of them is nokogiri, but I'm a bit confused because it seems that it only gets information that is already in an HTML or XML document. I found a webpage that has a list of university names as a <select>