R Version : 2.14.1 x64 Running on Windows 7 Connecting to a database on a remote Microsoft SQL Server 2012 The short form of my problem is the following. I have an unordered vectors of names, say: names<-c("A", "B", "A", "C","C") each of which have an id in a table in my db. I need to convert the names to their corresponding ids. I currently have the following code to do it. ### names<-c("A", "B", "A", "C","C") dbConn<-odbcDriverConnect(connection="connection string") #successfully connects nameToID<-function(name, dbConn){ #dbConn : active db connection formed via odbcDriverConnect #name : a char string sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep="")) } sapply(names, nameToID, dbConn=dbConn) ### Barring better ways to do this, which could involve loading the table into R then working with the problem there (which is possible), I understand why the following doesn't work, but I cannot seem to find a solution. Attempting to use parallelization via the package 'parallel' : ### names<-c("A", "B", "A", "C","C") dbConn<-odbcDriverConnect(connection="connection string") #successfully connects nameToID<-function(name, dbConn){ #dbConn : active db connection formed via odbcDriverConnect #name : a char string sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep="")) } mc<-detectCores() cl<-makeCluster(mc) clusterExport(cl, c("sqlQuery", "dbConn")) parSapply(cl, names, nameToID, dbConn=dbConn) #incorrect passing of nameToID's second argument ### As in the comment, this is not the correct way to assign the second argument to nameToID. I have also tried the following: parSapply(cl, names, function(x) nameToID(x, dbConn)) in place of the previous parSapply call, but that also does not work, with the error being thrown saying "the first parameter is not an open RODBC connection", presumably referring to the first parameter of the sqlQuery() The following code does work with parallization. ### names<-c("A", "B", "A", "C","C") dbConn<-odbcDriverConnect(connection="connection string") #successfully connects nameToID<-function(name){ #name : a char string dbConn<-odbcDriverConnect(connection="string") result<-sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep="")) odbcClose(dbConn) result } mc<-detectCores() cl<-makeCluster(mc) clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn", "nameToID")) #throwing everything in parSapply(cl, names, nameToID) ### But the constant opening and closing a ton of the gains from parallelization, and seems just a bit silly. So the overall question would be how to pass the second parameter (the open db connection) to the function within parSapply, in much the same way as it is done in the regular apply? In general, how does one pass a second, third, nth parameter to a function within a parallel routine? Thanks and if you need any more information let me know. -DT [[alternative HTML version deleted]]
Frede Aakmann Tøgersen
2014-Jul-03 07:51 UTC
[R] parLapply on sqlQuery (from package RODBC)
Hi Why are you doing duplicate queries to the database (two As and Cs in your names vector)? Why do 5 simultaneously connection to the database server? Woukld you do 500 connections? Why not do one query and let the database server do the job for you? Try this:> options(useFancyQuotes = FALSE) > > query0 <- "select id from table where name in (%s)" > > names <- paste(sQuote(LETTERS[1:5]), collapse = ",") > names[1] "'A','B','C','D','E'"> > query <- sprintf(query0, names) > query[1] "select id from table where name in ('A','B','C','D','E')"> > dataFromDB <- sqlQuery(dbConn, query)This should work for MS SQL and MySQL servers. Yours sincerely / Med venlig hilsen Frede Aakmann T?gersen Specialist, M.Sc., Ph.D. Plant Performance & Modeling Technology & Service Solutions T +45 9730 5135 M +45 2547 6050 frtog at vestas.com http://www.vestas.com Company reg. name: Vestas Wind Systems A/S This e-mail is subject to our e-mail disclaimer statement. Please refer to www.vestas.com/legal/notice If you have received this e-mail in error please contact the sender.> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Tevlin, Dylan > Sent: 2. juli 2014 23:49 > To: r-help at r-project.org > Subject: [R] parLapply on sqlQuery (from package RODBC) > > R Version : 2.14.1 x64 > Running on Windows 7 > > Connecting to a database on a remote Microsoft SQL Server 2012 > > The short form of my problem is the following. > > I have an unordered vectors of names, say: > > names<-c("A", "B", "A", "C","C") > > each of which have an id in a table in my db. I need to convert the names to > their corresponding ids. > > I currently have the following code to do it. > ### > names<-c("A", "B", "A", "C","C") > dbConn<-odbcDriverConnect(connection="connection string") #successfully > connects > > nameToID<-function(name, dbConn){ > #dbConn : active db connection formed via odbcDriverConnect > #name : a char string > > sqlQuery(dbConn, paste("select id from table where name='", name, > "'", sep="")) > } > sapply(names, nameToID, dbConn=dbConn) > ### > > Barring better ways to do this, which could involve loading the table into R > then working with the problem there (which is possible), I understand why > the following doesn't work, but I cannot seem to find a solution. Attempting > to use parallelization via the package 'parallel' : > > ### > names<-c("A", "B", "A", "C","C") > dbConn<-odbcDriverConnect(connection="connection string") #successfully > connects > > nameToID<-function(name, dbConn){ > #dbConn : active db connection formed via odbcDriverConnect > #name : a char string > > sqlQuery(dbConn, paste("select id from table where name='", name, > "'", sep="")) > } > > mc<-detectCores() > cl<-makeCluster(mc) > clusterExport(cl, c("sqlQuery", "dbConn")) > parSapply(cl, names, nameToID, dbConn=dbConn) #incorrect passing of > nameToID's second argument > ### > > As in the comment, this is not the correct way to assign the second argument > to nameToID. > > I have also tried the following: > > parSapply(cl, names, function(x) nameToID(x, dbConn)) > > in place of the previous parSapply call, but that also does not work, with the > error being thrown saying "the first parameter is not an open RODBC > connection", presumably referring to the first parameter of the sqlQuery() > > The following code does work with parallization. > > ### > names<-c("A", "B", "A", "C","C") > dbConn<-odbcDriverConnect(connection="connection string") #successfully > connects > nameToID<-function(name){ > #name : a char string > dbConn<-odbcDriverConnect(connection="string") > result<-sqlQuery(dbConn, paste("select id from table where > name='", name, "'", sep="")) > odbcClose(dbConn) > result > } > > mc<-detectCores() > cl<-makeCluster(mc) > clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn", > "nameToID")) #throwing everything in > parSapply(cl, names, nameToID) > ### > > But the constant opening and closing a ton of the gains from parallelization, > and seems just a bit silly. > > So the overall question would be how to pass the second parameter (the > open db connection) to the function within parSapply, in much the same way > as it is done in the regular apply? In general, how does one pass a second, > third, nth parameter to a function within a parallel routine? > > Thanks and if you need any more information let me know. > > -DT > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.