Lauri Nikkinen
2007-Aug-16 14:54 UTC
[R] Trim trailing space from data.frame factor variables
Hi folks,

I would like to trim the trailing spaces in my factor variables using lapply (described in this post by Marc Schwartz: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is not functioning (in this example there is only one factor with trailing spaces):

y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20*1.7 + 1)
y3 <- rnorm(20) + (1:20*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- gl(5,12)
f <- gl(3,20, labels=paste("lev", 1:3, " ", sep=""))
d <- data.frame(x=x,y=y, f=f)
str(d)

d[] <- lapply(d, function(x) ifelse(is.factor(x), sub(" +$", "", x), x))
str(d)

How should I modify this?

-Lauri
Marc Schwartz
2007-Aug-16 16:08 UTC
[R] Trim trailing space from data.frame factor variables
On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
> Hi folks,
>
> I would like to trim the trailing spaces in my factor variables using lapply
> (described in this post by Marc Schwartz:
> http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is
> not functioning (in this example there is only one factor with trailing
> spaces):

Ayep... as I noted in that post, it was untested... my error.

The problem is that ifelse() returns a result the same length as its test. Used as I did, the test for the column being a factor, is.factor(x), returns a single result, not one result per element. Hence, the conditional code is only applied to the first element in each column, rather than being vectorized over the entire column.

> y1 <- rnorm(20) + 6.8
> y2 <- rnorm(20) + (1:20*1.7 + 1)
> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> y <- c(y1,y2,y3)
> x <- gl(5,12)
> f <- gl(3,20, labels=paste("lev", 1:3, " ", sep=""))
> d <- data.frame(x=x,y=y, f=f)
> str(d)
>
> d[] <- lapply(d, function(x) ifelse(is.factor(x), sub(" +$", "", x), x))
> str(d)
>
> How should I modify this?

Try this instead, using a plain if/else, which tests once and then operates on the whole column:

d[] <- lapply(d, function(x) if (is.factor(x)) sub(" +$", "", x) else x)

> str(d)
'data.frame': 60 obs. of 3 variables:
 $ x: chr "1" "1" "1" "1" ...
 $ y: num 6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr "lev1" "lev1" "lev1" "lev1" ...

Note that by using sub(), the factors are coerced to character vectors, as expected. You would need to coerce back to factors if you need them as such.

HTH,

Marc Schwartz
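
For readers who want the columns to stay factors, here is a minimal sketch that is not part of the original thread. It assumes it is acceptable to edit the level labels directly with levels<- rather than run sub() on the values, which avoids the coercion to character. The helper name trim_factor_levels and the set.seed() call are illustrative additions. The first check also shows why the ifelse() version failed: a length-one test gives a length-one result.

```r
# Rebuild the example data from the original post.
# set.seed() is added here only so the numbers are reproducible.
set.seed(1)
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
x <- gl(5, 12)
f <- gl(3, 20, labels = paste("lev", 1:3, " ", sep = ""))
d <- data.frame(x = x, y = y, f = f)

# ifelse() shapes its result like the test. is.factor(f) is a single
# logical value, so only one element comes back instead of all 60.
ifelse_result <- ifelse(is.factor(f), sub(" +$", "", f), f)
length(ifelse_result)

# Hypothetical helper: trim trailing spaces in the level labels only,
# so a factor column remains a factor. Non-factor columns pass through.
trim_factor_levels <- function(col) {
  if (is.factor(col)) {
    levels(col) <- sub(" +$", "", levels(col))
  }
  col
}

d[] <- lapply(d, trim_factor_levels)
str(d)
```

With this approach x and f should still show up as factors in str(d). One design point to be aware of: if two labels become identical after trimming, assigning them through levels<- merges those levels into one.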