Mark Heckmann
2008-Dec-22 15:38 UTC
[R] imputing the numerical columns of a dataframe, returning the rest unchanged
Hi R-experts,

how can I apply a function to each numeric column of a data frame and return the whole data frame with changes in numeric columns only? In my case I want to do a median imputation of the numeric columns and retain the other columns. My dataframe (DF) contains factors, characters and numerics.

I tried the following but that does not work:

    foo <- function(x){
      if(is.numeric(x)==TRUE) return(impute(x))
      else(return(x))
    }

    sapply(DF, foo)

         day version ID   V1  V2  V3
    [1,] "4" "A"     "1a" "1" "5" "5"
    [2,] "4" "A"     "2a" "2" "3" "5"
    [3,] "4" "B"     "3a" "3" "5" "5"

All the variables are coerced to characters now ("day" and "version" were factors, "id" a character). I only want imputations on the numerics, but the rest to be returned unchanged.

Is there a function available? If not, how can I do it?

TIA and merry x-mas,
Mark
Yihui Xie
2008-Dec-24 05:46 UTC
[R] imputing the numerical columns of a dataframe, returning the rest unchanged
Hi,

?sapply will tell you

    ... 'sapply' is a user-friendly version of 'lapply' by default returning a vector or matrix if appropriate. ...

so the columns lose their classes when sapply() simplifies the result into a matrix; e.g.

    ## iris is a data.frame
    > str(iris)
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

    ## but sapply() will coerce it into a numeric matrix
    > str(sapply(iris, function(x)x))
     num [1:150, 1:5] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...

I'd suggest you get the class of each column first, then apply impute() to these columns (i.e. DF[, sapply(DF, class) == "numeric"]) and assign the new values to the original columns.

Regards,
Yihui
--
Yihui Xie
Homepage: http://www.yihui.name
School of Statistics, Renmin University of China, Beijing
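The suggestion in this reply (find the numeric columns, impute only those, assign the results back) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code posted in the thread: the example data frame is made up to resemble the one in the question, and impute_median() is a hypothetical stand-in for the impute() call, whose source package the thread does not name. The sketch selects columns with is.numeric() rather than comparing class names, since integer columns have class "integer" and would be missed by a comparison against "numeric".

```r
# Hypothetical example data; column names mirror those in the question.
DF <- data.frame(
  day     = factor(c(4, 4, 4)),
  version = factor(c("A", "A", "B")),
  ID      = c("1a", "2a", "3a"),
  V1      = c(1, NA, 3),
  V2      = c(5, 3, NA),
  V3      = c(5, 5, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): replace missing values
# with the median of the observed values in the same column.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Logical index of the numeric columns. is.numeric() is TRUE for double
# and integer columns and FALSE for factors and character columns.
num_cols <- vapply(DF, is.numeric, logical(1))

# lapply() returns a list, so nothing is simplified into a matrix.
# Assigning back into the selected columns leaves the others untouched.
DF[num_cols] <- lapply(DF[num_cols], impute_median)

str(DF)
```

The function foo from the question can also be kept as it is: DF[] <- lapply(DF, foo) applies it to every column and writes the results back into the existing data frame, so the factor and character columns keep their classes.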