Auston_Wei@mdanderson.org
2004-Oct-04 19:27 UTC
[R] how to change data type in data frame?
Hi, list,

suppose I have such a data frame:

trash <- data.frame(cbind(seq(1:5),c('a','a','b','a','b'),c('b','a','b','b','a')))
names(trash) <- c('age','typeI','typeII')

and I want to change all 'a's to be 0 and 'b's to be 1.

temp <- as.matrix(trash)
temp[temp=='a'] <- 0
temp[temp=='b'] <- 1
temp <- data.frame(temp)

The problem was that temp$typeI and temp$typeII were still factors, whereas I want numeric type. How can I do that?

Thanks,
Auston
Here's one way:

> temp <- as.matrix(trash)
> temp[temp=='b'] <- 1
> temp[temp=='a'] <- 0
> temp <- as.data.frame(structure(as.numeric(temp), dim=dim(trash), dimnames=dimnames(trash)))
> temp
  age typeI typeII
1   1     0      1
2   2     0      0
3   3     1      1
4   4     0      1
5   5     1      0

HTH,
Andy
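The matrix round trip above can also be avoided by converting one column at a time. What follows is only a minimal sketch, not something posted in the thread: `recode_ab` is a hypothetical helper name, and `stringsAsFactors = TRUE` is passed explicitly on the assumption that the code may be run on R 4.0.0 or later, where data.frame() no longer creates factors by default.

```r
# Rebuild the data frame from the original post, forcing factor columns
trash <- data.frame(cbind(1:5, c('a','a','b','a','b'), c('b','a','b','b','a')),
                    stringsAsFactors = TRUE)
names(trash) <- c('age', 'typeI', 'typeII')

# Hypothetical helper: map 'a' -> 0 and 'b' -> 1 through a named lookup vector
recode_ab <- function(x) {
  lookup <- c(a = 0, b = 1)
  unname(lookup[as.character(x)])
}

temp <- trash
# age is a factor of digit strings; convert via character so that the labels,
# not the internal level codes, become the numbers
temp$age <- as.numeric(as.character(temp$age))
# Apply the recoding to both type columns and assign the results back
temp[c('typeI', 'typeII')] <- lapply(temp[c('typeI', 'typeII')], recode_ab)
str(temp)
```

Going through as.character() matters for the age column: calling as.numeric() directly on a factor returns the level codes, which only happen to match the labels in this particular example.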
First, you need to be careful relative to the way in which you are creating the data frame. 'trash', as you have created it, is a data frame of all factors:

> str(trash)
`data.frame': 5 obs. of 3 variables:
 $ age   : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
 $ typeI : Factor w/ 2 levels "a","b": 1 1 2 1 2
 $ typeII: Factor w/ 2 levels "a","b": 2 1 2 2 1

This is because you used cbind(), which will first result in a matrix of characters:

> cbind(seq(1:5), c('a','a','b','a','b'), c('b','a','b','b','a'))
     [,1] [,2] [,3]
[1,] "1"  "a"  "b"
[2,] "2"  "a"  "a"
[3,] "3"  "b"  "b"
[4,] "4"  "a"  "b"
[5,] "5"  "b"  "a"

and then this matrix is converted into a data frame. In the process of converting the character matrix into a data frame, the characters are converted into factors.

Thus, if you want to preserve the multiple data types, for which a data frame is used, you can do the following, noting that you can name the columns here in the same step:

trash <- data.frame(age = 1:5,
                    typeI = I(c('a','a','b','a','b')),
                    typeII = I(c('b','a','b','b','a')))

In the above, note the use of "I(...)", which preserves the character nature of typeI and typeII:

> str(trash)
`data.frame': 5 obs. of 3 variables:
 $ age   : int 1 2 3 4 5
 $ typeI :Class 'AsIs' chr [1:5] "a" "a" "b" "a" ...
 $ typeII:Class 'AsIs' chr [1:5] "b" "a" "b" "b" ...

Once you have the data frame in this format, you can then do your replacements.
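The coercion described above is easy to check directly. The sketch below is an illustration rather than part of the reply: `m` and `trash2` are made-up names, and it uses the `stringsAsFactors = FALSE` argument of data.frame() as an alternative to wrapping each column in I(). In R 4.0.0 and later that value is already the default, so passing it explicitly is only needed on older versions.

```r
# cbind() on mixed vectors coerces everything to a common type: character
m <- cbind(1:5, c('a','a','b','a','b'), c('b','a','b','b','a'))
is.character(m)

# Passing the vectors straight to data.frame() keeps each column's own type;
# stringsAsFactors = FALSE stops the character columns from becoming factors
trash2 <- data.frame(age = 1:5,
                     typeI = c('a','a','b','a','b'),
                     typeII = c('b','a','b','b','a'),
                     stringsAsFactors = FALSE)
sapply(trash2, class)
```

Unlike I(), this leaves the columns as plain character vectors without the 'AsIs' class attached.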
You could either do the conversion one value at a time, as you have done above, or you can do them in one step:

trash[, 2:3] <- ifelse(trash[, 2:3] == 'a', 0, 1)

> trash
  age typeI typeII
1   1     0      1
2   2     0      0
3   3     1      1
4   4     0      1
5   5     1      0

> str(trash)
`data.frame': 5 obs. of 3 variables:
 $ age   : int 1 2 3 4 5
 $ typeI : num 0 0 1 0 1
 $ typeII: num 1 0 1 1 0

You should note, however, that depending upon what you intend to do with the data in typeI and typeII, you may want to keep them as factors, since many functions (e.g., modeling functions) utilize the factor data type specifically.

See ?data.frame and ?I for more information.

HTH,
Marc Schwartz
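To illustrate the closing point about keeping factors, here is a small sketch (the variable names `codes` and `mm` are made up for the example, and the choice of model.matrix() is mine, not from the thread). A factor can be turned into 0/1 codes whenever they are needed, and a modeling function builds the equivalent indicator column from the factor itself.

```r
# A two-level factor with the level order fixed explicitly
typeI <- factor(c('a','a','b','a','b'), levels = c('a', 'b'))

# Internal codes are 1-based in level order, so subtracting 1 gives a -> 0, b -> 1
codes <- as.integer(typeI) - 1L
codes

# model.matrix() expands the factor into an indicator column for level 'b'
# under the default treatment contrasts
mm <- model.matrix(~ typeI)
colnames(mm)
```

Keeping the factor means the labels 'a' and 'b' stay attached to the data, and the numeric coding is left to the function that needs it.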