Maciej Hoffman-Wecker
2007-Aug-14 07:56 UTC
[R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column
Dear all, I have some problems with importing data from an Access data base via RODBC to R. The data base contains several tables, which all are imported consecutively. One table has a column with column name "NO". If I run the code attached on the bottom of the mail I get no complain, but the column name (name of the respective vector of the data.frame) is "Expr1014" instead of "NO". Additionally the original column (type "text") containes "0"s and missings, but the imported column contains "0"s only (type "int"). If I change the column name in the Access data base to "NOx", the import works fine with the right name and the same data. Previously I generated a tiny Access data base which reproduced the problem. To be on the safe site I installed the latest version (2.5.1) and now the example works fine, but within my production process the error still remaines. An import into excel via ODBC works fine. So there is no way to figure it out whether this is a bug or a feature.-) The second problem I have is that when I rerun "rm(list = ls(all = T)); gc()" and the import several times I get the following error: Error in odbcTables(channel) : Calloc could not allocate (263168 of 1) memory In addition: Warning messages: 1: Reached total allocation of 447Mb: see help(memory.size) in: odbcQuery(channel, query, rows_at_time) 2: Reached total allocation of 447Mb: see help(memory.size) in: odbcQuery(channel, query, rows_at_time) 3: Reached total allocation of 447Mb: see help(memory.size) in: odbcTables(channel) 4: Reached total allocation of 447Mb: see help(memory.size) in: odbcTables(channel) which is surprising to me, as the first two statements should delete all objects and recover the memory. Is this only a matter of memory? Is there any logging that reduces the memory? Or is this issue connected to the upper problem? I added the code on the bottom - maybe there is some kind of misuse I lost sight of. Any hints are appreciated. Kind regards, Maciej> version_ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 5.1 year 2007 month 06 day 27 svn rev 42083 language R version.string R version 2.5.1 (2007-06-27) ## code get.table <- function(name, db, drop = NULL){ .con <- try(odbcConnectAccess(db), silent = T) if(!inherits(.con, "RODBC")) return(.con) ## exclude memo columns .t <- try(sqlColumns(.con, name)) if(inherits(.t, "try-error")){close(.con); return(.t)} .t <- .t[.t$"COLUMN_SIZE" < 255, "COLUMN_NAME"] .t <- paste(.t, collapse = ",") ## get table .t <- paste("select", .t, "from", name) .d <- try(sqlQuery(.con, .t), silent = T) if(inherits(.d, "try-error")){close(.con); return(.d)} .con <- try(close(.con), silent = T) if(inherits(.con, "try-error")) return(.con) .d <- .d[!names(.d) %in% drop] return(.d) } get.alltables <- function(db){ .con <- try(odbcConnectAccess(db), silent = T) if(!inherits(.con, "RODBC")) return(.con) .tbls <- try(sqlTables(.con)[["TABLE_NAME"]]) if(inherits(.tbls, "try-error")){close(.con); return(.tbls)} .con <- try(close(.con), silent = T) if(inherits(.con, "try-error")) return(.con) .tbls <- .tbls[-grep("^MSys", .tbls)] .d <- lapply(seq(along = .tbls), function(.i){ .d <- try(get.table(.tbls[.i], db = db)) return(invisible(.d)) }) names(.d) <- .tbls .ok <- !sapply(.d, inherits, "try-error") return(list(notdone = .d[!.ok], data = .d[.ok])) } library(RODBC) alldata <- get.alltables(db = "./myaccessdb.MDB") ## code end
Prof Brian Ripley
2007-Aug-14 09:51 UTC
[R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column
On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote:> > Dear all, > > I have some problems with importing data from an Access data base via > RODBC to R. The data base contains several tables, which all are > imported consecutively. One table has a column with column name "NO". If > I run the code attached on the bottom of the mail I get no complain, but > the column name (name of the respective vector of the data.frame) is > "Expr1014" instead of "NO". Additionally the original column (type > "text") containes "0"s and missings, but the imported column contains > "0"s only (type "int"). If I change the column name in the Access data > base to "NOx", the import works fine with the right name and the same > data. > > Previously I generated a tiny Access data base which reproduced the > problem. To be on the safe site I installed the latest version (2.5.1) > and now the example works fine, but within my production process the > error still remaines. An import into excel via ODBC works fine. > > So there is no way to figure it out whether this is a bug or a > feature.-)It's most likely an ODBC issue, but you have not provided a reproducible example.> The second problem I have is that when I rerun "rm(list = ls(all = T)); > gc()" and the import several times I get the following error: > > Error in odbcTables(channel) : Calloc could not allocate (263168 of 1) > memory > In addition: Warning messages: > 1: Reached total allocation of 447Mb: see help(memory.size) in: > odbcQuery(channel, query, rows_at_time) > 2: Reached total allocation of 447Mb: see help(memory.size) in: > odbcQuery(channel, query, rows_at_time) > 3: Reached total allocation of 447Mb: see help(memory.size) in: > odbcTables(channel) > 4: Reached total allocation of 447Mb: see help(memory.size) in: > odbcTables(channel) > > which is surprising to me, as the first two statements should delete allHow do you _know _what they 'should' do? That only deletes all objects in the workspace, not all objects in R, and not all memory blocks used by R. Please do read ?"Memory-limits" for the possible reasons. Where did '447Mb' come from? If this machine has less than 2Gb of RAM, buy some more.> objects and recover the memory. Is this only a matter of memory? Is > there any logging that reduces the memory? Or is this issue connected to > the upper problem? > > I added the code on the bottom - maybe there is some kind of misuse I > lost sight of. Any hints are appreciated. > > Kind regards, > Maciej > >> version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 5.1 > year 2007 > month 06 > day 27 > svn rev 42083 > language R > version.string R version 2.5.1 (2007-06-27) > > > ## code > > get.table <- function(name, db, drop = NULL){ > .con <- try(odbcConnectAccess(db), silent = T) > if(!inherits(.con, "RODBC")) return(.con) > ## exclude memo columns > .t <- try(sqlColumns(.con, name)) > if(inherits(.t, "try-error")){close(.con); return(.t)} > .t <- .t[.t$"COLUMN_SIZE" < 255, "COLUMN_NAME"] > .t <- paste(.t, collapse = ",") > ## get table > .t <- paste("select", .t, "from", name) > .d <- try(sqlQuery(.con, .t), silent = T) > if(inherits(.d, "try-error")){close(.con); return(.d)} > .con <- try(close(.con), silent = T) > if(inherits(.con, "try-error")) return(.con) > .d <- .d[!names(.d) %in% drop] > return(.d) > } > > get.alltables <- function(db){ > .con <- try(odbcConnectAccess(db), silent = T) > if(!inherits(.con, "RODBC")) return(.con) > .tbls <- try(sqlTables(.con)[["TABLE_NAME"]]) > if(inherits(.tbls, "try-error")){close(.con); return(.tbls)} > .con <- try(close(.con), silent = T) > if(inherits(.con, "try-error")) return(.con) > .tbls <- .tbls[-grep("^MSys", .tbls)] > .d <- lapply(seq(along = .tbls), function(.i){ > .d <- > try(get.table(.tbls[.i], db = db)) > return(invisible(.d)) > }) > names(.d) <- .tbls > .ok <- !sapply(.d, inherits, "try-error") > return(list(notdone = .d[!.ok], data = .d[.ok])) > } > > library(RODBC) > > alldata <- get.alltables(db = "./myaccessdb.MDB") > > ## code end > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Maciej Hoffman-Wecker
2007-Aug-14 11:33 UTC
[R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column
Dear Professor Ripley, Thank you very much for your response. I send the problem, as I didn't have any more ideas were to search for the reason. I didn't say this is a R bug, knowing the responses on such mails.-) But I succeeded in developing a tiny example, that reproduces the bug (wherever it is). I generated a small Access data base "test2.mdb" with one table "Tab1" and following columns: "Field name" "Field type" F1 Number NO Number F2 Number (sorry if the Access identifiers are not the standard ones, as I have a german Access version) The content of the "Tab1" table is: F1 NO F2 1 1 1 2 2 2 0 1 1 0 0 (The column "NO" contains one missing) Now if I import the data into R, I get the following results:> library(RODBC) > .con <- odbcConnectAccess("./test2.mdb") > (.d <- try(sqlQuery(.con, "select * from Tab1")))F1 NO F2 1 1 1 1 2 2 2 2 3 0 NA 1 4 1 0 0> (.d <- try(sqlQuery(.con, "select F1 , NO , F2 from Tab1")))F1 Expr1001 F2 1 1 0 1 2 2 0 2 3 0 0 1 4 1 0 0> close(.con)So the problem occurs if the column names are specified within the query. Is the query "select F1 , NO , F2 from Tab1" invalid? Regarding the memory issue, I _knew_ that there must be a reason for the running out of memory space. Sorry for not being more specific. My question than is: Is there a way to 'reset' the environment without quitting R and restarting it? Thank you for your help. Kind regards, Maciej -----Urspr?ngliche Nachricht----- Von: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] Gesendet: Dienstag, 14. August 2007 11:51 An: Maciej Hoffman-Wecker Cc: r-help at stat.math.ethz.ch Betreff: Re: [R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote:> > Dear all, > > I have some problems with importing data from an Access data base via > RODBC to R. The data base contains several tables, which all are > imported consecutively. One table has a column with column name "NO". > If I run the code attached on the bottom of the mail I get no > complain, but the column name (name of the respective vector of the > data.frame) is "Expr1014" instead of "NO". Additionally the original > column (type > "text") containes "0"s and missings, but the imported column contains > "0"s only (type "int"). If I change the column name in the Access data > base to "NOx", the import works fine with the right name and the same > data. > > Previously I generated a tiny Access data base which reproduced the > problem. To be on the safe site I installed the latest version (2.5.1) > and now the example works fine, but within my production process the > error still remaines. An import into excel via ODBC works fine. > > So there is no way to figure it out whether this is a bug or a > feature.-)It's most likely an ODBC issue, but you have not provided a reproducible example.> The second problem I have is that when I rerun "rm(list = ls(all = > T)); gc()" and the import several times I get the following error: > > Error in odbcTables(channel) : Calloc could not allocate (263168 of 1) > memory In addition: Warning messages: > 1: Reached total allocation of 447Mb: see help(memory.size) in: > odbcQuery(channel, query, rows_at_time) > 2: Reached total allocation of 447Mb: see help(memory.size) in: > odbcQuery(channel, query, rows_at_time) > 3: Reached total allocation of 447Mb: see help(memory.size) in: > odbcTables(channel) > 4: Reached total allocation of 447Mb: see help(memory.size) in: > odbcTables(channel) > > which is surprising to me, as the first two statements should delete > allHow do you _know _what they 'should' do? That only deletes all objects in the workspace, not all objects in R, and not all memory blocks used by R. Please do read ?"Memory-limits" for the possible reasons. Where did '447Mb' come from? If this machine has less than 2Gb of RAM, buy some more.> objects and recover the memory. Is this only a matter of memory? Is > there any logging that reduces the memory? Or is this issue connected to > the upper problem? > > I added the code on the bottom - maybe there is some kind of misuse I > lost sight of. Any hints are appreciated. > > Kind regards, > Maciej > >> version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 5.1 > year 2007 > month 06 > day 27 > svn rev 42083 > language R > version.string R version 2.5.1 (2007-06-27) > > > ## code > > get.table <- function(name, db, drop = NULL){ > .con <- try(odbcConnectAccess(db), silent = T) > if(!inherits(.con, "RODBC")) return(.con) > ## exclude memo columns > .t <- try(sqlColumns(.con, name)) > if(inherits(.t, "try-error")){close(.con); return(.t)} > .t <- .t[.t$"COLUMN_SIZE" < 255, "COLUMN_NAME"] > .t <- paste(.t, collapse = ",") > ## get table > .t <- paste("select", .t, "from", name) > .d <- try(sqlQuery(.con, .t), silent = T) > if(inherits(.d, "try-error")){close(.con); return(.d)} > .con <- try(close(.con), silent = T) > if(inherits(.con, "try-error")) return(.con) > .d <- .d[!names(.d) %in% drop] > return(.d) > } > > get.alltables <- function(db){ > .con <- try(odbcConnectAccess(db), silent = T) > if(!inherits(.con, "RODBC")) return(.con) > .tbls <- try(sqlTables(.con)[["TABLE_NAME"]]) > if(inherits(.tbls, "try-error")){close(.con); return(.tbls)} > .con <- try(close(.con), silent = T) > if(inherits(.con, "try-error")) return(.con) > .tbls <- .tbls[-grep("^MSys", .tbls)] > .d <- lapply(seq(along = .tbls), function(.i){ > .d <- > try(get.table(.tbls[.i], db = db)) > return(invisible(.d)) > }) > names(.d) <- .tbls > .ok <- !sapply(.d, inherits, "try-error") > return(list(notdone = .d[!.ok], data = .d[.ok])) > } > > library(RODBC) > > alldata <- get.alltables(db = "./myaccessdb.MDB") > > ## code end > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Dieter Menne
2007-Aug-14 16:06 UTC
[R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column
Maciej Hoffman-Wecker <Maciej.Hoffman-Wecker <at> bioskin.de> writes: ...> But I succeeded in developing a tiny example, that reproduces the bug(wherever it is).> > I generated a small Access data base "test2.mdb" with one table "Tab1" andfollowing columns: .....> > library(RODBC) > > .con <- odbcConnectAccess("./test2.mdb") > > (.d <- try(sqlQuery(.con, "select * from Tab1")))> > (.d <- try(sqlQuery(.con, "select F1 , NO , F2 from Tab1"))) > F1 Expr1001 F2 > 1 1 0 1 > 2 2 0 2 > 3 0 0 1 > 4 1 0 0 > > close(.con)NO is a reserved word in ODBC (or where...). Whenever you see Exprxxx in columns, put the column name in [] d <- try(sqlQuery(con, "select F1 ,[NO], F2 from Tab1")) works for me. Dieter
Maciej Hoffman-Wecker
2007-Aug-15 07:28 UTC
[R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column
Thank you very much Professor Ripley! Afterwards it seems obvious where to look. Have a nice day, Maciej PS: Yes, my machine has not much memory, but it is sufficient for the smaller trial data. -----Urspr?ngliche Nachricht----- Von: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] Gesendet: Dienstag, 14. August 2007 19:14 An: Maciej Hoffman-Wecker Cc: r-help at stat.math.ethz.ch Betreff: Re: AW: [R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote:> Dear Professor Ripley, > > Thank you very much for your response. I send the problem, as I didn't > have any more ideas were to search for the reason. I didn't say this > is a R bug, knowing the responses on such mails.-) > > But I succeeded in developing a tiny example, that reproduces the bug > (wherever it is).Thank you, that was helpful: much easier to follow that the previous code. ...>> library(RODBC) >> .con <- odbcConnectAccess("./test2.mdb") (.d <- try(sqlQuery(.con, >> "select * from Tab1"))) > F1 NO F2 > 1 1 1 1 > 2 2 2 2 > 3 0 NA 1 > 4 1 0 0 >> (.d <- try(sqlQuery(.con, "select F1 , NO , F2 from Tab1"))) > F1 Expr1001 F2 > 1 1 0 1 > 2 2 0 2 > 3 0 0 1 > 4 1 0 0 >> close(.con) > > So the problem occurs if the column names are specified within the query. > Is the query "select F1 , NO , F2 from Tab1" invalid?I believe so. 'NO' is an SQL92 and ODBC reserved word, at least according to http://www.bairdgroup.com/reservedwords.cfm See also http://support.microsoft.com/default.aspx?scid=kb;en-us;286335 which says For existing objects with names that contain reserved words, you can avoid errors by surrounding the object name with brackets ([ ]). and lists 'NO' as a reserved word. RODBC quotes all column names it uses to be sure (and knows about most non-standard quoting mechanisms from the ODBC driver in use). But this was a query you generated and so you need to do the quoting.> Regarding the memory issue, I _knew_ that there must be a reason for > the running out of memory space. Sorry for not being more specific. My > question than is: > > Is there a way to 'reset' the environment without quitting R and > restarting it?Sorry, no. You cannot move objects in memory. But why '477Mb' is coming up is still unexplained, and suggests that the machine has a peculiar amount of memory or some flag has been used.> > Thank you for your help. > > Kind regards, > Maciej > > > -----Urspr?ngliche Nachricht----- > Von: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] > Gesendet: Dienstag, 14. August 2007 11:51 > An: Maciej Hoffman-Wecker > Cc: r-help at stat.math.ethz.ch > Betreff: Re: [R] Import of Access data via RODBC changes column name ("NO" to "Expr1014") and the content of the column > > On Tue, 14 Aug 2007, Maciej Hoffman-Wecker wrote: > >> >> Dear all, >> >> I have some problems with importing data from an Access data base via >> RODBC to R. The data base contains several tables, which all are >> imported consecutively. One table has a column with column name "NO". >> If I run the code attached on the bottom of the mail I get no >> complain, but the column name (name of the respective vector of the >> data.frame) is "Expr1014" instead of "NO". Additionally the original >> column (type >> "text") containes "0"s and missings, but the imported column contains >> "0"s only (type "int"). If I change the column name in the Access data >> base to "NOx", the import works fine with the right name and the same >> data. >> >> Previously I generated a tiny Access data base which reproduced the >> problem. To be on the safe site I installed the latest version (2.5.1) >> and now the example works fine, but within my production process the >> error still remaines. An import into excel via ODBC works fine. >> >> So there is no way to figure it out whether this is a bug or a >> feature.-) > > It's most likely an ODBC issue, but you have not provided a reproducible example. > >> The second problem I have is that when I rerun "rm(list = ls(all >> T)); gc()" and the import several times I get the following error: >> >> Error in odbcTables(channel) : Calloc could not allocate (263168 of 1) >> memory In addition: Warning messages: >> 1: Reached total allocation of 447Mb: see help(memory.size) in: >> odbcQuery(channel, query, rows_at_time) >> 2: Reached total allocation of 447Mb: see help(memory.size) in: >> odbcQuery(channel, query, rows_at_time) >> 3: Reached total allocation of 447Mb: see help(memory.size) in: >> odbcTables(channel) >> 4: Reached total allocation of 447Mb: see help(memory.size) in: >> odbcTables(channel) >> >> which is surprising to me, as the first two statements should delete >> all > > How do you _know _what they 'should' do? That only deletes all objects in the workspace, not all objects in R, and not all memory blocks used by R. > > Please do read ?"Memory-limits" for the possible reasons. > > Where did '447Mb' come from? If this machine has less than 2Gb of RAM, buy some more. > > >> objects and recover the memory. Is this only a matter of memory? Is >> there any logging that reduces the memory? Or is this issue connected to >> the upper problem? >> >> I added the code on the bottom - maybe there is some kind of misuse I >> lost sight of. Any hints are appreciated. >> >> Kind regards, >> Maciej >> >>> version >> _ >> platform i386-pc-mingw32 >> arch i386 >> os mingw32 >> system i386, mingw32 >> status >> major 2 >> minor 5.1 >> year 2007 >> month 06 >> day 27 >> svn rev 42083 >> language R >> version.string R version 2.5.1 (2007-06-27) >> >> >> ## code >> >> get.table <- function(name, db, drop = NULL){ >> .con <- try(odbcConnectAccess(db), silent = T) >> if(!inherits(.con, "RODBC")) return(.con) >> ## exclude memo columns >> .t <- try(sqlColumns(.con, name)) >> if(inherits(.t, "try-error")){close(.con); return(.t)} >> .t <- .t[.t$"COLUMN_SIZE" < 255, "COLUMN_NAME"] >> .t <- paste(.t, collapse = ",") >> ## get table >> .t <- paste("select", .t, "from", name) >> .d <- try(sqlQuery(.con, .t), silent = T) >> if(inherits(.d, "try-error")){close(.con); return(.d)} >> .con <- try(close(.con), silent = T) >> if(inherits(.con, "try-error")) return(.con) >> .d <- .d[!names(.d) %in% drop] >> return(.d) >> } >> >> get.alltables <- function(db){ >> .con <- try(odbcConnectAccess(db), silent = T) >> if(!inherits(.con, "RODBC")) return(.con) >> .tbls <- try(sqlTables(.con)[["TABLE_NAME"]]) >> if(inherits(.tbls, "try-error")){close(.con); return(.tbls)} >> .con <- try(close(.con), silent = T) >> if(inherits(.con, "try-error")) return(.con) >> .tbls <- .tbls[-grep("^MSys", .tbls)] >> .d <- lapply(seq(along = .tbls), function(.i){ >> .d <- >> try(get.table(.tbls[.i], db = db)) >> return(invisible(.d)) >> }) >> names(.d) <- .tbls >> .ok <- !sapply(.d, inherits, "try-error") >> return(list(notdone = .d[!.ok], data = .d[.ok])) >> } >> >> library(RODBC) >> >> alldata <- get.alltables(db = "./myaccessdb.MDB") >> >> ## code end >> >> ______________________________________________ >> R-help at stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595