Thierry Onkelinx
2018-Mar-14 11:15 UTC
[R] Problem with reading data from an UTF-16 database
Dear all,
We have a problem with reading some characters correctly from an UTF-16
encoded database. The code below givens the correct characters on Ubuntu
with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with
the_driver = {SQL Server}), special characters like '?' and '?'
are
returned as '?'. I've added the sessionInfo() output from both
machines.
Any suggestions on how to fix the problem?
Best regards,
Thierry
library(DBI)
con <- dbConnect(odbc::odbc(), .connection_string
"Driver=the_drive;Server=our_server;Database=the_database;Trusted_Connection=Yes;")
dbGetQuery(con, sql_statement)
R version 3.4.2 (2017-09-28)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RODBC_1.3-15 DBI_0.8
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 R6_2.2.2 odbc_1.1.5 magrittr_1.5
pillar_1.1.0 rlang_0.2.0 testthat_2.0.0
[8] blob_1.1.0 drat_0.1.3 fortunes_1.5-4 tools_3.4.2
bit64_0.9-7 bit_1.1-12 hms_0.4.0
[15] yaml_2.1.14 compiler_3.4.2 pkgconfig_2.0.1 tibble_1.4.2
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=nl_BE.UTF-8 LC_NUMERIC=C
LC_TIME=nl_BE.UTF-8 LC_COLLATE=nl_BE.UTF-8
[5] LC_MONETARY=nl_BE.UTF-8 LC_MESSAGES=nl_BE.UTF-8
LC_PAPER=nl_BE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DBI_0.7
loaded via a namespace (and not attached):
[1] drat_0.1.3 bit_1.1-12 odbc_1.1.2 compiler_3.4.3 hms_0.3
tools_3.4.3 pillar_1.2.1 tibble_1.4.2
[9] yaml_2.1.17 Rcpp_0.12.14 bit64_0.9-7 blob_1.1.0
rlang_0.2.0 fortunes_1.5-4
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>
[[alternative HTML version deleted]]
Duncan Murdoch
2018-Mar-14 19:45 UTC
[R] Problem with reading data from an UTF-16 database
On 14/03/2018 7:15 AM, Thierry Onkelinx wrote:> Dear all, > > We have a problem with reading some characters correctly from an UTF-16 > encoded database. The code below givens the correct characters on Ubuntu > with the_driver = {ODBC Driver 13 for SQL Server}. On Windows (with > the_driver = {SQL Server}), special characters like '?' and '?' are > returned as '?'. I've added the sessionInfo() output from both machines. > > Any suggestions on how to fix the problem?I haven't tried it, but the RODBC package includes an argument DBMSencoding in the odbcDriverConnect function. So maybe you could use that instead of DBI and odbc. Duncan Murdoch> > Best regards, > > Thierry > > library(DBI) > con <- dbConnect(odbc::odbc(), .connection_string > "Driver=the_drive;Server=our_server;Database=the_database;Trusted_Connection=Yes;") > dbGetQuery(con, sql_statement) > > > R version 3.4.2 (2017-09-28) > Platform: i386-w64-mingw32/i386 (32-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > Matrix products: default > > locale: > [1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252 > LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C > [5] LC_TIME=Dutch_Belgium.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] RODBC_1.3-15 DBI_0.8 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.14 R6_2.2.2 odbc_1.1.5 magrittr_1.5 > pillar_1.1.0 rlang_0.2.0 testthat_2.0.0 > [8] blob_1.1.0 drat_0.1.3 fortunes_1.5-4 tools_3.4.2 > bit64_0.9-7 bit_1.1-12 hms_0.4.0 > [15] yaml_2.1.14 compiler_3.4.2 pkgconfig_2.0.1 tibble_1.4.2 > > > R version 3.4.3 (2017-11-30) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 16.04.4 LTS > > Matrix products: default > BLAS: /usr/lib/libblas/libblas.so.3.6.0 > LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 > > locale: > [1] LC_CTYPE=nl_BE.UTF-8 LC_NUMERIC=C > LC_TIME=nl_BE.UTF-8 LC_COLLATE=nl_BE.UTF-8 > [5] LC_MONETARY=nl_BE.UTF-8 LC_MESSAGES=nl_BE.UTF-8 > LC_PAPER=nl_BE.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] DBI_0.7 > > loaded via a namespace (and not attached): > [1] drat_0.1.3 bit_1.1-12 odbc_1.1.2 compiler_3.4.3 hms_0.3 > tools_3.4.3 pillar_1.2.1 tibble_1.4.2 > [9] yaml_2.1.17 Rcpp_0.12.14 bit64_0.9-7 blob_1.1.0 > rlang_0.2.0 fortunes_1.5-4 > > > > ir. Thierry Onkelinx > Statisticus / Statistician > > Vlaamse Overheid / Government of Flanders > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND > FOREST > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance > thierry.onkelinx at inbo.be > Havenlaan 88 bus 73, 1000 Brussel > www.inbo.be > > /////////////////////////////////////////////////////////////////////////////////////////// > To call in the statistician after the experiment is done may be no more > than asking him to perform a post-mortem examination: he may be able to say > what the experiment died of. ~ Sir Ronald Aylmer Fisher > The plural of anecdote is not data. ~ Roger Brinner > The combination of some data and an aching desire for an answer does not > ensure that a reasonable answer can be extracted from a given body of data. > ~ John Tukey > /////////////////////////////////////////////////////////////////////////////////////////// > > <https://www.inbo.be> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >