Hi R help,

I was trying to get identical data frames from a list using two methods.

#Suppose my list is:
listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))

#Creating dataframe using cbind
dat1<-data.frame(do.call("cbind",listdat1))
colnames(dat1)<-c("Var1","Var2","Var3")

#Second dataframe conversion
dat2<-data.frame(Var1=listdat1[[1]],Var2=listdat1[[2]],Var3=listdat1[[3]])

#Structure is different in two datasets
> str(dat1)
'data.frame':   10 obs. of  3 variables:
 $ Var1: Factor w/ 10 levels "18.6153321029756",..: 5 2 6 8 7 9 1 4 3 10
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
> str(dat2)
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  20.3 19.2 20.5 20.9 20.5 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5

#Converting structure of dat1 to match dat2 structure
dat1<-within(dat1,{Var1<-as.numeric(as.character(Var1))
    Var3<-as.integer(Var3)})

> head(dat1)
      Var1 Var2 Var3
1 20.27193    A    1
2 19.17586    B    2
3 20.53197    A    3
4 20.93615    B    4
5 20.53498    A    5
6 21.02044    B    1
> head(dat2)
      Var1 Var2 Var3
1 20.27193    A    1
2 19.17586    B    2
3 20.53197    A    3
4 20.93615    B    4
5 20.53498    A    5
6 21.02044    B    1

#New structure
> identical(str(dat1),str(dat2))
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  19.9 19 21.2 20.7 20.4 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5
'data.frame':   10 obs. of  3 variables:
 $ Var1: num  19.9 19 21.2 20.7 20.4 ...
 $ Var2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ Var3: int  1 2 3 4 5 1 2 3 4 5
[1] TRUE

#The structure is identical and the data frames look the same, but they are not identical.
> identical(dat1,dat2)
[1] FALSE

Is it something to do with the floating point?

Thanks,
A.K.
David L Carlson
2012-Jul-01 21:09 UTC
[R] list to dataframe conversion-testing for identical
Yes it does have something to do with the representation of floating point numbers. Using cbind() forces the list to become a matrix and that forces all of the data to become character strings since one of the list elements is character:

> set.seed(42)
> listdat1<-list(rnorm(10,20),rep(LETTERS[1:2],5),rep(1:5,2))
> str(do.call("cbind", listdat1))
 chr [1:10, 1:3] "21.3709584471467" "19.4353018286039" ...

Then you convert that to a data.frame. The default in data.frame() is to convert characters to factors so you get

> str(data.frame(do.call("cbind",listdat1)))
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 10 levels "19.4353018286039",..: 8 1 5 7 6 2 9 3 10 4
 $ X2: Factor w/ 2 levels "A","B": 1 2 1 2 1 2 1 2 1 2
 $ X3: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5

With dat2 you used data.frame() so the numeric fields were not converted to strings and then factors. Then you converted the dat1 factors back to numeric. You would be fine with just

> dat1 <- data.frame(listdat1)
> colnames(dat1) <- paste0("Var", 1:3)

Or you can name the list elements and then convert

> names(listdat1) <- paste0("Var", 1:3)
> dat1 <- data.frame(listdat1)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
> Sent: Sunday, July 01, 2012 12:56 PM
> To: R help
> Subject: [R] list to dataframe conversion-testing for identical
>
> HI R help,
>
> I was trying to get identical data frame from a list using two methods.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
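
The explanation in the reply above can be checked with a short sketch. This is only an illustration, not code from the thread: the variable names (m, roundtrip, max_diff, col_classes) are made up here, the seed is arbitrary, and stringsAsFactors = TRUE is passed explicitly on the assumption that the reader runs R 4.0.0 or later, where data.frame() no longer converts strings to factors by default.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
listdat1 <- list(rnorm(10, 20), rep(LETTERS[1:2], 5), rep(1:5, 2))

# cbind() on vectors of mixed type returns a character matrix
m <- do.call("cbind", listdat1)
is_char_matrix <- is.matrix(m) && is.character(m)
print(is_char_matrix)

# Going number -> text -> number keeps only the digits that were printed,
# so the result can differ from the original doubles in the last bits
x <- listdat1[[1]]
roundtrip <- as.numeric(as.character(x))
print(identical(roundtrip, x))
max_diff <- max(abs(roundtrip - x))
print(max_diff)

# data.frame() applied to the list itself keeps each column's own type;
# stringsAsFactors = TRUE mimics the default that applied in 2012
dat1 <- data.frame(listdat1, stringsAsFactors = TRUE)
colnames(dat1) <- paste0("Var", 1:3)
col_classes <- vapply(dat1, class, character(1))
print(col_classes)
```

Any difference reported by max_diff is far below what head() displays, which would explain why the two data frames look the same when printed and still fail identical().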
Hello,

But

> all.equal(dat1,dat2)
[1] TRUE

So I guess it does have to do with floating-point equality: identical() requires an exact match, while all.equal() compares numbers up to a tolerance, which by default is sqrt(.Machine$double.eps), about 1.5e-8. (A tolerance as tight as .Machine$double.eps could return FALSE on occasions where we would expect TRUE.)

Rui Barradas

On 01-07-2012 18:55, arun wrote:
> HI R help,
>
> I was trying to get identical data frame from a list using two methods.
> [...]
> Is it something to do with the floating point?
>
> Thanks,
>
> A.K.
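
The difference between the two comparisons can be sketched on a single numeric vector. This is a hedged illustration rather than anything posted in the thread: x, y and the result names are hypothetical, and the text round trip stands in for the factor conversion discussed above. It also shows why identical(str(dat1), str(dat2)) is not a useful test: str() prints its summary and returns NULL, so that call only compares NULL with NULL.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
x <- rnorm(10, 20)
y <- as.numeric(as.character(x))  # number -> text -> number round trip

# identical() demands an exact match of every value
same_bits <- identical(x, y)
print(same_bits)

# all.equal() accepts differences below its tolerance
# (the default is sqrt(.Machine$double.eps), about 1.5e-8)
close_enough <- isTRUE(all.equal(x, y))
print(close_enough)

# With tolerance = 0, all.equal() describes any difference instead of
# returning TRUE, so it agrees with identical() for plain numeric vectors
strict <- all.equal(x, y, tolerance = 0)
strict_is_true <- isTRUE(strict)
print(strict)

# str() prints a summary and returns NULL, so identical(str(a), str(b))
# is TRUE for any a and b
str_returns_null <- is.null(str(x))
print(str_returns_null)
```

To compare structure rather than printed output, comparing sapply(dat1, class) with sapply(dat2, class) is one option, and all.equal(dat1, dat2) already reports type and attribute mismatches.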