On Fri, Jan 13, 2012 at 7:02 AM, Francisco <franciscororolaio at
google.com> wrote:
> Hello,
> I have a csv file with many variables, both characters and integers.
> I would like to load it on R and do some operations on integer variables,
> the problem is that R loads the entire dataset considering all variables as
> characters, instead I would like that R makes the distinction between the
> two types, because there are too many variables to do:
> x1<-as.integer(x1)
> x2<-as.integer(x2)
> x3<-as.integer(x3)
> ...
>
> I tried to specify read.table(... stringsAsFactors=FALSE) but it
> doesn't work.
There must be non-integers in some of the columns that are supposed to
be integer. Let's assume that the first row has no such garbage. Then
we can get the desired classes from that row and apply them to the
entire data frame. In this example the second column has such
garbage:
# test data
Lines <- "a,b,c
D,2,3
a,b,9
C,5,6"
# read in just row 1 and read in all rows
DF1 <- read.csv(text = Lines, nrows = 1, as.is = TRUE)
DF <- DF0 <- read.csv(text = Lines, as.is = TRUE)
# there will be a warning as it's converting garbage to NAs
to.int <- function(v, v1) if (inherits(v1, "integer")) as.integer(v) else v
DF <- mapply(to.int, DF0, DF1, SIMPLIFY = FALSE)
DF <- as.data.frame(DF)
As we see here the second column becomes integer despite garbage in it:
> str(DF0) # as read in
'data.frame': 3 obs. of 3 variables:
$ a: chr "D" "a" "C"
$ b: chr "2" "b" "5"
$ c: int 3 9 6
> str(DF) # as converted
'data.frame': 3 obs. of 3 variables:
$ a: Factor w/ 3 levels "a","C","D": 3 1 2
$ b: int 2 NA 5
$ c: int 3 9 6
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com