Dear R Users -

R is a wonderful software package. CRAN provides a variety of tools to work on your data. But R is not apt to utilize all the public databases in an efficient manner.
I observed the most tedious part with R is searching and downloading the data from public databases and putting it into the right format. I could not find a package on CRAN which offers exactly this fundamental capability.
Imagine R is the unified interface to access (and analyze) all public data in the easiest way possible. That would create a real impact, would put R a big leap forward and would enable us to see the world with different eyes.

There is a lack of a direct connection to the API of these databases, to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key of information processing with R.

How can we handle the flow of information noise? R has to give an answer to that with an extensive API to public databases.

I would love your comments and ideas as a contribution in a vital discussion.

Benjamin
R is Open Source. You're welcome to write tools and submit your package to CRAN. I think some part of this has been done, based on questions to the list asking about those parts.

Personally, I've been using S-Plus and then R for 18 years, and never required data from any of them. That doesn't make them unimportant, but it suggests that public databases aren't the be-all and end-all for R use.

Sarah

On Fri, Jan 13, 2012 at 4:14 PM, Benjamin Weber <mail at bwe.im> wrote:
> Dear R Users -
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
Hi Benjamin:

What would make this easier is if these sites used standardized web services, so an interface would only need to be written once. data.gov is the worst example; they spun their own, weak service.

There is a lot of environmental data available through OPeNDAP, and that is supported in the ncdf4 package. My own group has a service called ERDDAP that is entirely RESTful, see:

http://coastwatch.pfel.noaa.gov/erddap

and

http://upwell.pfeg.noaa.gov/erddap

We provide R (and Matlab) scripts that automate the extract for certain cases, see:

http://coastwatch.pfeg.noaa.gov/xtracto/

We also have a tool called the Environmental Data Connector (EDC) that provides a GUI from within R (and ArcGIS, Matlab and Excel). It allows you to subset data that is served by OPeNDAP, ERDDAP, and certain Sensor Observation Service (SOS) servers, and have it read directly into R. It is freely available at:

http://www.pfeg.noaa.gov/products/EDC/

We can write such tools because the service is either standardized (OPeNDAP, SOS) or is easy to implement (ERDDAP).

-Roy

On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:
> Dear R Users -
> [...]

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
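To illustrate Roy's point that a RESTful service such as ERDDAP is easy to script against, here is a minimal sketch in R that only builds a tabular-data request URL and makes no network call. It assumes ERDDAP's usual "tabledap" URL layout (base, dataset ID, file type, then a comma-separated variable list followed by "&"-separated constraints). The helper name erddap_csv_url, the dataset ID and the variable names are hypothetical and are not taken from the thread.

```r
# Minimal sketch: build an ERDDAP "tabledap" request URL (no network access).
# ASSUMPTION: erddap_csv_url, "exampleDataset" and the variable names are
# hypothetical placeholders chosen for illustration only.
erddap_csv_url <- function(base, dataset_id, variables,
                           constraints = character(0)) {
  # Requested variables are comma-separated.
  query <- paste(variables, collapse = ",")
  if (length(constraints) > 0) {
    # Percent-encode characters such as ">" inside each constraint,
    # leaving reserved characters like "=" and ":" untouched.
    encoded <- vapply(constraints, URLencode, character(1),
                      reserved = FALSE, USE.NAMES = FALSE)
    # Constraints are appended to the variable list with "&".
    query <- paste(c(query, encoded), collapse = "&")
  }
  # Drop any trailing slash from the base before assembling the URL.
  paste0(sub("/+$", "", base), "/tabledap/", dataset_id, ".csv?", query)
}

url <- erddap_csv_url(
  base        = "http://upwell.pfeg.noaa.gov/erddap",
  dataset_id  = "exampleDataset",   # hypothetical dataset ID
  variables   = c("time", "latitude", "longitude", "temperature"),
  constraints = "time>=2012-01-01T00:00:00Z"
)
print(url)

# With network access and a real dataset ID, the URL could be handed to
# read.csv(url); note that ERDDAP's .csv output carries a units row
# directly under the header row.
```

Because the whole request is encoded in the URL, the same few lines would cover any dataset on such a server, which is the "write once" property the message above describes.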
The situation for this kind of interface is much more advanced (for economic time series data) than has been suggested in other postings. Several of the organizations you mention support SDMX, and I believe there is a working R interface to SDMX which has not yet been made public. A more complete list of organizations that I think already have working server-side support for SDMX is: the OECD, Eurostat, the ECB, the IMF, the UN, the BIS, the Federal Reserve Board, the World Bank, the Italian statistics agency, and, to a small extent, the Bank of Canada.

I have a working API to several time series databases (TS* packages on CRAN), and a partially working interface to SDMX, but have postponed further development of that in the hope that the already working code will be made available. Please see http://tsdbi.r-forge.r-project.org/ for more details.

I would, of course, be happy to have other developers involved in this project. If you think you can contribute, then see r-forge.r-project.org for details on how to join projects.

Paul

On 12-01-14 06:00 AM, r-help-request at r-project.org wrote:
> Date: Sat, 14 Jan 2012 02:44:07 +0530
> From: Benjamin Weber <mail at bwe.im>
> To: r-help at r-project.org
> Subject: [R] The Future of R | API to Public Databases
>
> Dear R Users -
> [...]