Thierry Onkelinx
2018-Mar-14 11:15 UTC
[R] Problem with reading data from an UTF-16 database
Dear all,

We have a problem reading some characters correctly from a UTF-16 encoded database. The code below gives the correct characters on Ubuntu with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with the_driver = {SQL Server}), special characters like '?' and '?' are returned as '?'. I've added the sessionInfo() output from both machines.

Any suggestions on how to fix the problem?

Best regards,

Thierry

library(DBI)
con <- dbConnect(
  odbc::odbc(),
  .connection_string = "Driver=the_driver;Server=our_server;Database=the_database;Trusted_Connection=Yes;"
)
dbGetQuery(con, sql_statement)


R version 3.4.2 (2017-09-28)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252 LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RODBC_1.3-15 DBI_0.8

loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 R6_2.2.2 odbc_1.1.5 magrittr_1.5 pillar_1.1.0 rlang_0.2.0 testthat_2.0.0
[8] blob_1.1.0 drat_0.1.3 fortunes_1.5-4 tools_3.4.2 bit64_0.9-7 bit_1.1-12 hms_0.4.0
[15] yaml_2.1.14 compiler_3.4.2 pkgconfig_2.0.1 tibble_1.4.2


R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=nl_BE.UTF-8 LC_NUMERIC=C LC_TIME=nl_BE.UTF-8 LC_COLLATE=nl_BE.UTF-8
[5] LC_MONETARY=nl_BE.UTF-8 LC_MESSAGES=nl_BE.UTF-8 LC_PAPER=nl_BE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] DBI_0.7

loaded via a namespace (and not attached):
[1] drat_0.1.3 bit_1.1-12 odbc_1.1.2 compiler_3.4.3 hms_0.3 tools_3.4.3 pillar_1.2.1 tibble_1.4.2
[9] yaml_2.1.17 Rcpp_0.12.14 bit64_0.9-7 blob_1.1.0 rlang_0.2.0 fortunes_1.5-4


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
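The symptom described in this message (a special character coming back as '?') is what a lossy encoding conversion typically looks like. The following is only a minimal sketch of that mechanism in base R; the character "\u0161" and the ASCII target are arbitrary choices for illustration and are not taken from the database in question, and nothing here shows where in the driver stack the actual loss occurs.

```r
# Illustration only: how a character can degrade to "?" during conversion.
# "\u0161" is an arbitrary non-ASCII example, not a value from the database.
x <- "\u0161"

# Encode the string as UTF-16LE bytes, as a UTF-16 source might deliver it.
utf16 <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Decoding with the correct source encoding round-trips cleanly.
ok <- iconv(list(utf16), from = "UTF-16LE", to = "UTF-8")

# Converting to an encoding that lacks the character forces a substitution;
# `sub` replaces the bytes that cannot be converted.
lossy <- iconv(x, from = "UTF-8", to = "ASCII", sub = "?")
```

Comparing `ok` and `lossy` with `x` shows the difference between a conversion that knows the source encoding and one that targets an encoding unable to hold the character.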
Duncan Murdoch
2018-Mar-14 19:45 UTC
[R] Problem with reading data from an UTF-16 database
On 14/03/2018 7:15 AM, Thierry Onkelinx wrote:
> Dear all,
>
> We have a problem reading some characters correctly from a UTF-16
> encoded database. The code below gives the correct characters on Ubuntu
> with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with
> the_driver = {SQL Server}), special characters like '?' and '?' are
> returned as '?'. I've added the sessionInfo() output from both machines.
>
> Any suggestions on how to fix the problem?

I haven't tried it, but the odbcDriverConnect function in the RODBC package has a DBMSencoding argument. So maybe you could use that instead of DBI and odbc.

Duncan Murdoch
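The suggestion above could be tried along the following lines. This is an untested sketch, not a confirmed fix: `build_connection_string()` and `query_with_encoding()` are hypothetical helper names introduced here, the driver, server and database values are the placeholders from the original post, and the value to pass as `DBMSencoding` ("UTF-8" in the usage comment) is a guess that would need experimenting with on the Windows machine.

```r
# Hypothetical helper: assemble the connection string from the original post.
build_connection_string <- function(driver, server, database) {
  paste0(
    "Driver={", driver, "};",
    "Server=", server, ";",
    "Database=", database, ";",
    "Trusted_Connection=Yes;"
  )
}

# Hypothetical wrapper: run one query through RODBC, telling it which
# encoding to assume for character data coming from the DBMS.
query_with_encoding <- function(connection_string, sql, encoding) {
  channel <- RODBC::odbcDriverConnect(connection_string, DBMSencoding = encoding)
  # odbcDriverConnect() returns -1 (with a warning) when the connection fails.
  if (!inherits(channel, "RODBC")) {
    stop("could not connect to the database")
  }
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql, stringsAsFactors = FALSE)
}

# Usage (not run here; needs the RODBC package and a reachable server):
# cs <- build_connection_string("SQL Server", "our_server", "the_database")
# query_with_encoding(cs, sql_statement, encoding = "UTF-8")
```

Keeping the encoding as a function argument makes it easy to compare the returned strings for several candidate values against the output obtained on Ubuntu.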