Chris Evans
2016-Dec-06 21:26 UTC
[R] Odd behaviour of mean() with a numeric column in a tibble
I hope I am obeying the list rules here. I am using a raw R IDE for this and running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit).

Here is a reproducible example. Code only first:

require(tibble)
tmpTibble <- tibble(ID=letters,num=1:26)
min(tmpTibble[,2]) # fine
max(tmpTibble[,2]) # fine
median(tmpTibble[,2]) # not fine
mean(tmpTibble[,2]) # not fine
newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
newMedianFun(tmpTibble[,2]) # ditto
str(tmpTibble[,2])

### then I tried this to make sure it wasn't about having fed in integers
tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
tmpTibble2
mean(tmpTibble2[,3]) # not fine, not about integers!

### before I just created tmpTibble2 I found myself trying to add a column to tmpTibble
tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
### and oddly enough ...
add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!

Now here it is with the output:

> require(tibble)
Loading required package: tibble
> tmpTibble <- tibble(ID=letters,num=1:26)
> min(tmpTibble[,2]) # fine
[1] 1
> max(tmpTibble[,2]) # fine
[1] 26
> median(tmpTibble[,2]) # not fine
Error in median.default(tmpTibble[, 2]) : need numeric data
> mean(tmpTibble[,2]) # not fine
[1] NA
Warning message:
In mean.default(tmpTibble[, 2]) :
  argument is not numeric or logical: returning NA
> newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
> newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
[1] 13.5
> newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
> newMedianFun(tmpTibble[,2]) # ditto
[1] 13.5
> str(tmpTibble[,2])
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 26 obs. of 1 variable:
 $ num: int 1 2 3 4 5 6 7 8 9 10 ...
>
> ### then I tried this to make sure it wasn't about having fed in integers
>
> tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
> tmpTibble2
# A tibble: 26 × 3
      ID   num  num2
   <chr> <int> <dbl>
1      a     1   0.1
2      b     2   0.2
3      c     3   0.3
4      d     4   0.4
5      e     5   0.5
6      f     6   0.6
7      g     7   0.7
8      h     8   0.8
9      i     9   0.9
10     j    10   1.0
# ... with 16 more rows
> mean(tmpTibble2[,3]) # not fine, not about integers!
[1] NA
Warning message:
In mean.default(tmpTibble2[, 3]) :
  argument is not numeric or logical: returning NA
>
> ### before I just created tmpTibble2 I found myself trying to add a column to tmpTibble
> tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
> tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
> ### and oddly enough ...
> add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
Error: Each variable must be a 1d atomic vector or list.
Problem variables: 'newNum'
>

I discovered this when I hit odd behaviour after using read_spss() from the haven package for the first time, as it seemed to be offering a step forward over good old read.spss() from the excellent foreign package. I am reporting it here, not directly to Prof. Wickham, as the issues seem rather general, though I'm guessing that it needs to be fixed with a fix to tibble. Or perhaps I've completely missed something.

TIA,

Chris
Ista Zahn
2016-Dec-06 21:40 UTC
[R] Odd behaviour of mean() with a numeric column in a tibble
Not at a computer to check right now, but I believe single bracket indexing a tibble always returns a tibble. To extract a vector use [[

On Dec 6, 2016 4:28 PM, "Chris Evans" <chrishold at psyctc.org> wrote:
>
> I hope I am obeying the list rules here. I am using a raw R IDE for this and running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)
>
> Here is a reproducible example. Code only first
>
> require(tibble)
> tmpTibble <- tibble(ID=letters,num=1:26)
> min(tmpTibble[,2]) # fine
> max(tmpTibble[,2]) # fine
> median(tmpTibble[,2]) # not fine
> mean(tmpTibble[,2]) # not fine

I think you want

mean(tmpTibble[[2]])

> newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
> newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
> newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
> newMedianFun(tmpTibble[,2]) # ditto
> str(tmpTibble[,2])
>
> ### then I tried this to make sure it wasn't about having fed in integers
>
> tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
> tmpTibble2
> mean(tmpTibble2[,3]) # not fine, not about integers!
>
>
> ### before I just created tmpTibble2 I found myself trying to add a column to tmpTibble
> tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
> tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
> ### and oddly enough ...
> add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
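The distinction drawn in the reply above can be sketched as follows. This is a minimal illustration rather than anything posted in the thread; it assumes the tibble package is installed, and the names oneCol and numVec are hypothetical. It also hints at why min() and max() appeared to work: base R provides Summary-group methods for data frames, whereas mean() and median() fall through to defaults that expect a vector.

```r
library(tibble)

tmpTibble <- tibble(ID = letters, num = 1:26)

# Single-bracket indexing keeps the tibble wrapper, even for one column
oneCol <- tmpTibble[, 2]
is.data.frame(oneCol)      # TRUE: still a one-column tibble, not a vector

# Double brackets (or $) pull out the underlying vector
numVec <- tmpTibble[[2]]
is.integer(numVec)         # TRUE

# mean() and median() behave as expected once they receive a vector
mean(numVec)               # 13.5
median(tmpTibble$num)      # 13.5
```

With a plain data.frame, tmpTibble[, 2] would have dropped to a vector by default, which is why the original code looks as though it ought to work.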
Chris Evans
2016-Dec-06 22:10 UTC
[R] Odd behaviour of mean() with a numeric column in a tibble
{{SIGH}} You are absolutely right. I wonder if I am losing some cognitive capacities that are needed to be part of the evolving R community. It seems to me that if a tibble is designed to be an enhanced replacement for a dataframe then it shouldn't quite so radically change things. I notice that the documentation on tibble says

"[ Never simplifies (drops), so always returns data.frame"

That is much less explicit than I would have liked and actually doesn't seem to be true. In fact, as you rightly say, it generally, but not quite always, returns a tibble. In fact it can be fooled into a vector of length 1.

> tmpTibble[[1,]]
Error in `[[.data.frame`(tmpTibble, 1, ) :
  argument "..2" is missing, with no default
> tmpTibble[1]
# A tibble: 26 × 1
      ID
   <chr>
1      a
2      b
3      c
4      d
5      e
6      f
7      g
8      h
9      i
10     j
# ... with 16 more rows
> tmpTibble[,1]
# A tibble: 26 × 1
      ID
   <chr>
1      a
2      b
3      c
4      d
5      e
6      f
7      g
8      h
9      i
10     j
# ... with 16 more rows
> tmpTibble[1,]
Error in `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
  replacement element 3 is a matrix/data frame of 26 rows, need 1
In addition: Warning messages:
1: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
  replacement element 1 has 26 rows to replace 1 rows
2: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
  replacement element 2 has 26 rows to replace 1 rows
> tmpTibble[1,1:26]
Error: Invalid column indexes: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
> tmpTibble[[1,2]]
[1] 1
> str(tmpTibble[[1,2]])
 int 1
> str(tmpTibble[[1:2,2]])
Error in col[[i, exact = exact]] :
  attempt to select more than one element in vectorIndex
>
> tmpTibble[[1,1:2]]
[1] "b"
>

So [[a,b]] works if a and b are legal with the dimensions of the tibble and if a is of length 1, but it returns NOT a tibble but a vector of length 1 (I think). I can see that's logical but it's not what it says in the documentation. [[a]] and [[,a]] return the same result; that seems excessively tolerant to me. [[a,b:c]] actually returns [[a,c]] and again as a single value, NOT a tibble. And row subsetting/indexing has gone.

Why create a replacement for a dataframe that has no row indexing and so radically redefines column indexing, in fact redefines the whole of indexing and subsetting?

OK. I will go to sleep now and hope to feel less dumb(ed) when I wake. Perhaps Prof. Wickham or someone can spell out a bit less tersely, and I think incompletely, than the tibble documentation does, why all this is good.

Thanks anyway Ista, you certainly hit the issue!

Very best all,

Chris

> From: "Ista Zahn" <istazahn at gmail.com>
> To: "Chris Evans" <chrishold at psyctc.org>
> Cc: "r-help at r-project.org" <r-help at r-project.org>
> Sent: Tuesday, 6 December, 2016 21:40:41
> Subject: Re: [R] Odd behaviour of mean() with a numeric column in a tibble

> Not at a computer to check right now, but I believe single bracket indexing a
> tibble always returns a tibble. To extract a vector use [[

> On Dec 6, 2016 4:28 PM, "Chris Evans" <chrishold at psyctc.org> wrote:

> > I hope I am obeying the list rules here. I am using a raw R IDE for this and
> > running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)

> > Here is a reproducible example. Code only first

> > require(tibble)
> > tmpTibble <- tibble(ID=letters,num=1:26)
> > min(tmpTibble[,2]) # fine
> > max(tmpTibble[,2]) # fine
> > median(tmpTibble[,2]) # not fine
> > mean(tmpTibble[,2]) # not fine

> I think you want

> mean(tmpTibble[[2]])

> > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
> > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be necessary?!
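For anyone retracing the session above, here is a minimal sketch on a freshly built tibble, assuming the tibble package is installed. The names firstRow and newNum2 are hypothetical. The row-indexing errors reported in the thread mention "replacement element 3" and invalid column indexes from 4 upwards, so one plausible reading (a guess, not something confirmed in the thread) is that the earlier assignment of tmpTibble[,2]/10 had already stored a one-column tibble as a third column. Rebuilding the tibble and deriving new columns from the extracted vector avoids that state.

```r
library(tibble)

tmpTibble <- tibble(ID = letters, num = 1:26)

# On a freshly built tibble, row indexing returns a one-row tibble
firstRow <- tmpTibble[1, ]
nrow(firstRow)                       # 1

# Build new columns from the extracted vector, not from a one-column tibble
tmpTibble$newNum <- tmpTibble[[2]] / 10
tmpTibble <- add_column(tmpTibble, newNum2 = tmpTibble$num / 10)

# Both new columns are plain numeric vectors, so mean() behaves as usual
mean(tmpTibble$newNum)               # 1.35
```

The only change from the code in the original post is [[2]] (or $num) in place of [,2] on the right-hand side of each assignment.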