jim holtman
2015-Oct-14 18:23 UTC
[R] SQL Server "float" is not handled by RODBC -- Is there a workaround?
Here is the system I am using: ====================================> sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lubridate_1.3.3 RODBC_1.3-12 loaded via a namespace (and not attached): [1] magrittr_1.5 plyr_1.8.3 tools_3.2.2 memoise_0.2.1 Rcpp_0.12.1 stringi_0.5-5 digest_0.6.8 [8] stringr_1.0.0 ======================================= I have data on a SQL Server that I am connecting to where some of the fields are defined as "float" so that the data is stored in the database as an IEEE 754 value. Now when I read this is using RODBC, the data comes across the interface in the floating point format; I used Wireshark to examine the packets that were being sent. Some of the data is also defined as "int" and comes across in binary. When the data is read in with df <- sqlQuery(db, "select * from mydb", as.is = TRUE) The resulting dataframe has the floating point values as 'chr' and the integer fields as 'int'; I would have expected the floating point fields to be 'num'. Now in the "ODBC Connectivity" Vignette by Ripley there was the comment that "double" data values come back as type 8, but on some systems they may be type 6; well on SQL Server, "float" is type 6. So what appears to happen, is this data is not recognized as a floating point value and is therefore converted to a character. When the data is made available to the R script, I then have to convert this back to floating point. If I use "stringsAsFactors = FALSE" on the query, this conversion back to floating point will be done within the RODBC package. This becomes a problem when I have dataframes with several million rows and multiple columns of numerics is that the conversion to/from characters is adding time to the processing. So I was wondering is there a workaround to this problem? Is it possible to add the capability to RODBC when processing SQL Server to avoid this conversion? Or is there some other way around this problem? Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. [[alternative HTML version deleted]]
Mark Dalphin
2015-Oct-15 20:18 UTC
[R] SQL Server "float" is not handled by RODBC -- Is there a workaround?
Hi Jim, No answers over the course of 24 hours so I'll give it a shot. First, I always work under Linux, so my answers may well be worthless for your Windows scenario. Second, I don't know if my workaround works as I don't actually have a SQL Server DB using float. Now the workaround: I have had many problems in the past using ODBC to connect to databases. Nothing I could nail down to a fault in that system, but just no end of problems. Some of that, of course, is due to me generally working under Linux. My general workaround that has been clean is to use JDBC instead. There have been hassles at times to set up the RJava, but recent versions of that have installed very easily. Once RJava is in place (and under Windows, you'll have fun setting up Java cleanly), then installation of a JDBC jar (I use jtds from SourceForge for SQL Server) and finally RJDBC. The generic nature of the JDBC interface is a joy to work with, interacting with most database types very well and in a uniform manner. So, lots of work getting JDBC up and going to see if an alternative path into your DB gets you your data in a better format. Now you see why I waited 24 hours to say anything at all ... Also, it might be worth while posting on the DB specific maillist: https://stat.ethz.ch/mailman/listinfo/r-sig-db Hope this helps, Mark Dalphin On 15/10/15 07:23, jim holtman wrote:> Here is the system I am using: > ====================================>> sessionInfo() > R version 3.2.2 (2015-08-14) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > [3] LC_MONETARY=English_United States.1252 > LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] lubridate_1.3.3 RODBC_1.3-12 > loaded via a namespace (and not attached): > [1] magrittr_1.5 plyr_1.8.3 tools_3.2.2 memoise_0.2.1 Rcpp_0.12.1 > stringi_0.5-5 digest_0.6.8 > [8] stringr_1.0.0 > =======================================> > I have data on a SQL Server that I am connecting to where some of the > fields are defined as "float" so that the data is stored in the database as > an IEEE 754 value. Now when I read this is using RODBC, the data comes > across the interface in the floating point format; I used Wireshark to > examine the packets that were being sent. Some of the data is also defined > as "int" and comes across in binary. > > When the data is read in with > > df <- sqlQuery(db, "select * from mydb", as.is = TRUE) > > The resulting dataframe has the floating point values as 'chr' and the > integer fields as 'int'; I would have expected the floating point fields to > be 'num'. Now in the "ODBC Connectivity" Vignette by Ripley there was the > comment that "double" data values come back as type 8, but on some systems > they may be type 6; well on SQL Server, "float" is type 6. > > So what appears to happen, is this data is not recognized as a floating > point value and is therefore converted to a character. When the data is > made available to the R script, I then have to convert this back to > floating point. If I use "stringsAsFactors = FALSE" on the query, this > conversion back to floating point will be done within the RODBC package. > This becomes a problem when I have dataframes with several million rows and > multiple columns of numerics is that the conversion to/from characters is > adding time to the processing. > > So I was wondering is there a workaround to this problem? Is it possible > to add the capability to RODBC when processing SQL Server to avoid this > conversion? Or is there some other way around this problem? > > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > Tell me what you want to do, not how you want to do it. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
jim holtman
2015-Oct-15 21:01 UTC
[R] SQL Server "float" is not handled by RODBC -- Is there a workaround?
Mark, Thanks for the suggestion. I will have to look into that option. I assume that if I am running on a 64-bit system, I also have to use the 64-bit version of Java. We have had some problems in the past because the company standard is a 32-bit version of Java and we had to also load in the 64-bit version to work with the XLConnect package. I was reading the RJDBC package documention and they seem to list a lot of methods (e.g., 'dbReadTable'), but don't say what the parameters are, or what it returns. Where do you find this information? Also I notice that I probably have to get the JDBC driver for SQL Server from Microsoft and install that - is that correct? I hate to start mixing approaches since we have a large number of scripts that currently use the RODBC package, but I will try to see if the approach you proposed do help overcome this problem. Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Thu, Oct 15, 2015 at 4:18 PM, Mark Dalphin <mdalphin at gmail.com> wrote:> Hi Jim, > > No answers over the course of 24 hours so I'll give it a shot. > > First, I always work under Linux, so my answers may well be worthless > for your Windows scenario. > > Second, I don't know if my workaround works as I don't actually have a > SQL Server DB using float. > > Now the workaround: > > I have had many problems in the past using ODBC to connect to databases. > Nothing I could nail down to a fault in that system, but just no end of > problems. Some of that, of course, is due to me generally working under > Linux. > > My general workaround that has been clean is to use JDBC instead. There > have been hassles at times to set up the RJava, but recent versions of > that have installed very easily. Once RJava is in place (and under > Windows, you'll have fun setting up Java cleanly), then installation of > a JDBC jar (I use jtds from SourceForge for SQL Server) and finally > RJDBC. The generic nature of the JDBC interface is a joy to work with, > interacting with most database types very well and in a uniform manner. > > So, lots of work getting JDBC up and going to see if an alternative path > into your DB gets you your data in a better format. Now you see why I > waited 24 hours to say anything at all ... > > Also, it might be worth while posting on the DB specific maillist: > https://stat.ethz.ch/mailman/listinfo/r-sig-db > > Hope this helps, > Mark Dalphin > > > On 15/10/15 07:23, jim holtman wrote: > > Here is the system I am using: > > ====================================> >> sessionInfo() > > R version 3.2.2 (2015-08-14) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > > > locale: > > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > > States.1252 > > [3] LC_MONETARY=English_United States.1252 > > LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] lubridate_1.3.3 RODBC_1.3-12 > > loaded via a namespace (and not attached): > > [1] magrittr_1.5 plyr_1.8.3 tools_3.2.2 memoise_0.2.1 Rcpp_0.12.1 > > stringi_0.5-5 digest_0.6.8 > > [8] stringr_1.0.0 > > =======================================> > > > I have data on a SQL Server that I am connecting to where some of the > > fields are defined as "float" so that the data is stored in the database > as > > an IEEE 754 value. Now when I read this is using RODBC, the data comes > > across the interface in the floating point format; I used Wireshark to > > examine the packets that were being sent. Some of the data is also > defined > > as "int" and comes across in binary. > > > > When the data is read in with > > > > df <- sqlQuery(db, "select * from mydb", as.is = TRUE) > > > > The resulting dataframe has the floating point values as 'chr' and the > > integer fields as 'int'; I would have expected the floating point fields > to > > be 'num'. Now in the "ODBC Connectivity" Vignette by Ripley there was > the > > comment that "double" data values come back as type 8, but on some > systems > > they may be type 6; well on SQL Server, "float" is type 6. > > > > So what appears to happen, is this data is not recognized as a floating > > point value and is therefore converted to a character. When the data is > > made available to the R script, I then have to convert this back to > > floating point. If I use "stringsAsFactors = FALSE" on the query, this > > conversion back to floating point will be done within the RODBC package. > > This becomes a problem when I have dataframes with several million rows > and > > multiple columns of numerics is that the conversion to/from characters is > > adding time to the processing. > > > > So I was wondering is there a workaround to this problem? Is it possible > > to add the capability to RODBC when processing SQL Server to avoid this > > conversion? Or is there some other way around this problem? > > > > Jim Holtman > > Data Munger Guru > > > > What is the problem that you are trying to solve? > > Tell me what you want to do, not how you want to do it. > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]