chen jia
2011-Feb-28 20:37 UTC
[R] Data type problem when extract data from SQLite to R by using RSQLite
Hi there, When I extract data from SQLite to R, the data types (or modes) of the extracted data seems to be determined by the value of the first row. Please see the following example. When I put the missing values first, the column extracted is of the mode character.> str(dbGetQuery(sql.industry,+ "select pya_var from annual_data3 + order by pya_var")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: chr NA NA NA NA ... When I put the non-missing values first, the column extracted is of the mode numeric.> str(dbGetQuery(sql.industry,+ "select pya_var from annual_data3 + order by pya_var desc")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: num 2.23 2.08 2.04 2 1.86 ... And, if the missing value happens to be the first observation, the whole column is converted to the mode character.> str(dbGetQuery(sql.industry,+ "select pya_var from annual_data3")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: chr NA "0.0" "0.000532137017747267" "0.00655147489334259" ... This creates a problem because this column "pya_var" is supposed to be numeric. If it is converted to character, I will encounter errors in subsequent computation. Although I can convert the data back to numeric, but it will be a hassle to check whether it is necessary and which columns to convert every time I extract data from SQLite into R. Do you have any idea how I can make sure the data type stays numeric regardless what value the first observation is? Thanks. Best, Jia -- 700 Fisher Hall 2100 Neil Ave. Columbus, Ohio? 43210 http://www.fisher.osu.edu/~chen_1002/
Seth Falcon
2011-Feb-28 23:48 UTC
[R] Data type problem when extract data from SQLite to R by using RSQLite
Hi Jia, On Mon, Feb 28, 2011 at 12:37 PM, chen jia <chen_1002 at fisher.osu.edu> wrote:> When I extract data from SQLite to R, the data types (or modes) of the > extracted data seems to be determined by the value of the first row. > Please see the following example.It would help to provide the output of sessionInfo() as well as the schema definition for the table in SQLite (or at least description of how it was created). Here's an example that works as you'd like: > library(RSQLite) > db = dbConnect(SQLite(), dbname = ":memory:") > dbGetQuery(db, "create table t (a int, b real, c text)") > df = data.frame(a=c(NA, 1L, 2L), b=c(NA, 1.1, 2.2), c=c(NA, "x", "y"),stringsAsFactors=FALSE) > df a b c 1 NA NA <NA> 2 1 1.1 x 3 2 2.2 y > dbGetPreparedQuery(db, "insert into t values (?, ?, ?)", df) > dbGetQuery(db, "select * from t") a b c 1 NA NA <NA> 2 1 1.1 x 3 2 2.2 y > sapply(dbGetQuery(db, "select * from t"), typeof) a b c "integer" "double" "character" > sapply(dbGetQuery(db, "select * from t limit 1"), typeof) a b c "integer" "double" "character" > sapply(dbGetQuery(db, "select a from t limit 1"), typeof) a "integer" > sapply(dbGetQuery(db, "select a from t limit 2"), typeof) a "integer" > sapply(dbGetQuery(db, "select a from t limit 1"), typeof) a "integer"> sessionInfo()R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] RSQLite_0.9-4 DBI_0.2-5 loaded via a namespace (and not attached): [1] tools_2.11.1 -- Seth Falcon | @sfalcon | http://userprimary.net/