Alan Smith
2009-Feb-18 21:01 UTC
[R] understanding how R determines numbers and characters when creating a data frame
Hello R Users and Developers,

I have a basic question about how R works. Over the past few years I have struggled when I try to generate a new data frame that I believe should contain numeric data in some columns and character data in others, only to find everything converted to character data. Is there a general method to create data frames that contain the data in the desired format: numbers as numeric, characters as factors, etc.? I often have this problem, and in the worst case I have to export the file and read it back in. I have emulated a simple example of the problem. It often happens while using "for" loops. Could someone explain how to avoid this problem by properly creating data frames in for loops that can contain both numeric and character data?

******** Question for example 1
Why does the cbind command convert the numeric data to character data? Why can't the character data be converted to numeric data using the fix command?

### Example 1 #############
data(iris)
obsnum<-NULL
results<-NULL
for(s in unique(as.character(iris$Species))){
  temp1<-iris[iris$Species==s,]
  obsnum<-length(unique(temp1$Sepal.Length)) # a number
  out1<-cbind(species=as.character(paste(s)),obsnum) # number converted to character
  results<-rbind(out1,results)
}
results
#fix(results) # cannot convert obsnum to numeric using fix
####################################

****** Question for example 2
Why does adding the data.frame command allow the character data to be converted to numeric data using the fix command?

### Example 2 #############
data(iris)
obsnum<-NULL
results<-NULL
for(s in unique(as.character(iris$Species))){
  temp1<-iris[iris$Species==s,]
  obsnum<-length(unique(temp1$Sepal.Length))
  out1<-data.frame(cbind(species=as.character(paste(s)),obsnum)) # number converted to character
  results<-rbind(out1,results)
}
results
#fix(results) # can now convert obsnum to numeric using fix
######

Thank you,
Alan Smith
Greg Snow
2009-Feb-18 21:27 UTC
[R] understanding how R determines numbers and characters when creating a data frame
The culprit is the cbind function. When given 2 vectors (not already something else), cbind will create a matrix, not a data frame. A matrix can only have 1 type, so the numbers get converted to character.

In your first example you never do create a data frame, you just build a matrix (try str(results)), so fix cannot change a single column to numeric in something that is a matrix.

In the second example you do create a data frame, so fix will allow changing of columns, but the cbind inside the call to data.frame is still creating a matrix (and converting numeric to character) before it is included in the data frame. Remove the cbind and just do:

out1 <- data.frame(species=as.character(paste(s)),obsnum=obsnum)

and then out1 will be a data frame without ever converting the number obsnum to a character.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
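The difference between the two constructors can be checked directly at the console. The following is only a minimal sketch: the value 15L for obsnum is a made-up count used for illustration, not a figure from the thread, and the names m and d are hypothetical.

```r
# Two vectors of different types, as in the loop from the original question.
species <- "setosa"
obsnum  <- 15L  # hypothetical count, for illustration only

# cbind() on plain vectors builds a matrix; a matrix holds a single type,
# so the integer is coerced to character.
m <- cbind(species = species, obsnum)
str(m)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame() stores each column as its own vector, so obsnum stays numeric.
d <- data.frame(species = species, obsnum = obsnum)
str(d)
is.numeric(d$obsnum)   # TRUE
```

Running str() on each object is usually the quickest way to see which of the two structures a loop has actually been building.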
Domenico Vistocco
2009-Feb-18 21:32 UTC
[R] understanding how R determines numbers and characters when creating a data frame
Alan Smith wrote:
> ********Question for example 1.
>
> Why does the cbind command convert the numeric data to character data? Why
> can't the character data be converted to numeric data using the fix command?

See ?cbind for a detailed explanation. In short, when cbind/rbind is used on vectors or matrices it returns a matrix. A matrix is necessarily composed of a single type of data (see An Introduction to R): by combining character and numeric data you are implicitly converting the "short" type (numeric) to the "long" type (character).

Instead of using cbind here:

> out1<-cbind(species=as.character(paste(s)),obsnum) # number converted to
> character

use data.frame:

out1 <- data.frame(species=as.character(paste(s)),obsnum)

This tells R to convert the character data to a factor and to preserve the numeric data:

c(class(results$species),mode(results$species))
c(class(results$obsnum),mode(results$obsnum))

You can keep the character data as character using the stringsAsFactors argument of the data.frame() function:

out1 <- data.frame(species=as.character(paste(s)),obsnum, stringsAsFactors=FALSE)

And then:

class(results$species)

The message is: if you want to mix different data types you need lists (and a data.frame is a special type of list where each component has the same number of elements).

Ciao,
domenico
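Putting the advice from both replies together, the loop from the original question might be written as below. This is a sketch rather than the posters' own code: collecting the pieces in a list (here called pieces, a hypothetical name) and binding once with do.call(rbind, ...) is a swapped-in pattern, and stringsAsFactors = FALSE is set explicitly because the default of data.frame() changed from TRUE to FALSE in R 4.0.0.

```r
data(iris)

# Collect one small data frame per species in a list, then bind once at the end.
pieces <- list()
for (s in unique(as.character(iris$Species))) {
  temp1 <- iris[iris$Species == s, ]
  pieces[[s]] <- data.frame(
    species = s,
    obsnum  = length(unique(temp1$Sepal.Length)),
    stringsAsFactors = FALSE  # keep species as character regardless of R version
  )
}
results <- do.call(rbind, pieces)
rownames(results) <- NULL

str(results)
class(results$species)  # "character"
class(results$obsnum)   # "integer"
```

Unlike the original loop, which prepends each new row with rbind(out1, results), this version keeps the species in the order they are visited. No matrix is created at any step, so the counts are never coerced to character.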