Hi List Apologies if this isn''t the correct place for this query (I''ve tried a search of the mail archives but not had much joy). I''m running R (2.14.0) on a Mac (OSX v 10.5.8, 2.66GHz, 4GB memory) and am having a few performance issues with reading data in from a Postres database (using RPostgreSQL). My query / code are as below # ----------------------------- library(''RPostgreSQL'') drv <- dbDriver("PostgreSQL") dbh <- dbConnect(drv,user="…",password="…",dbname="…",host="…") sql <- "select id, date, lon, lat, date_trunc(''day'' , date) as jday, extract(''hour'' from date) as hour, extract(''year'' from date) as year from observations where pt = 6 and date >= ''1990-01-01'' and date < ''1995-01-01'' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not null" dataIn <- dbGetQuery(dbh,sql) # ----------------------------- All variables are reals other than id which is varchar(10) and date which is a timestamp, approximately 1.5 million rows are returned by the query and it takes order 10 second to execute using psql (the command line client for Postgres) and a similar time using pgAdmin 3. In R it takes several minutes to run and I''m unsure where the bottleneck is occurring. I don''t believe it''s in the database access as the query disappears from the server status (of pgAdmin) quite quickly and where it only takes ~10 seconds from psql. I''ve only been using R for the last month or two so any tips / advice on how to speed up the query or how to track down the source of the poor performance would be appreciated. For info, I get similar poor performance running R (2.14.0) on a linux workstation (Linux 2.6.16.60-0.91.1-smp x86_64, 42GB memory, 2 6 core processors) running SUSE. Thanks in advance Dave Berry. -- This message (and any attachments) is for the recipient ...{{dropped:8}}
On Thu, Dec 1, 2011 at 10:02 AM, Berry, David I. <dyb at noc.ac.uk> wrote:> Hi List > > Apologies if this isn't the correct place for this query (I've tried a search of the mail archives but not had much joy). > > I'm running R (2.14.0) ?on a Mac (OSX v 10.5.8, 2.66GHz, 4GB memory) and am having a few performance issues with reading data in from a Postres database (using RPostgreSQL). My query / code are as below > > # ----------------------------- > library('RPostgreSQL') > > drv <- dbDriver("PostgreSQL") > > dbh <- dbConnect(drv,user="?",password="?",dbname="?",host="?") > > sql <- "select id, date, lon, lat, date_trunc('day' , date) as jday, extract('hour' from date) as hour, extract('year' from date) as year from observations where pt = 6 and date >= '1990-01-01' and date < '1995-01-01' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not null" > > dataIn <- dbGetQuery(dbh,sql)If this is a large table of which the desired rows are a small fraction of all rows then be sure there indexes on the variables in your where clause. You can also try it with the RpgSQL driver although there is no reason to think that that would be faster. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
>>>>> "BD" == Berry, David <dyb at noc.ac.uk> writes:BD> All variables are reals other than id which is varchar(10) and date BD> which is a timestamp, approximately 1.5 million rows are returned by BD> the query and it takes order 10 second to execute using psql (the BD> command line client for Postgres) and a similar time using pgAdmin BD> 3. In R it takes several minutes to run and I'm unsure where the BD> bottleneck is occurring. You may want to test progressively smaller chunks of the data to see how quickly R slows down as compared to psql on that query. My first guess is that something allocating and re-allocating ram in a quadratic (or worse) fashion. I don't know whether OSX has anything equivilent, but you could test on the linux box using oprofile (http://oprofile.sourceforge.net; SuSE should have an rpm for it and kernel support compiled in) to confirm where the time is spent. It is /possible/ that the (sql)NULL->(r)NA logic in RS-PostgreSQL.c may be slow (relatively speaking), but it is necessary. Nothing else jumps out as a possible choke point. Oprofile (or the equivilent) would best answer the question. -JimC -- James Cloos <cloos at jhcloos.com> OpenPGP: 1024D/ED7DAEA6
Apparently Analagous Threads
- Odd characters at beginning of file
- Package RPostgreSQL_0.1-6.tar.gz has been checked and built
- how to connect to a remote PostgreSQL database from R on mac osx
- Passing bigint (8-byte) values from Postgres to R using RPostgreSQL
- unable to load shared library (Symbol not found: _PQbackendPID)