Akhilesh Singh
2016-Apr-15 08:16 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
Dear All,

Thanks for your help. However, I would like to draw your attention to the following:

I was replicating Example 2.3, which uses the dataset "brainsize.txt", from Section 2.3.3 ("Summarize by group") on page 55 of the well-known book "R by Example" by Jim Albert and Maria Rizzo, published by Springer (2012) in the Use R! series. The output of the by() function printed in the book is reproduced below for everyone's information:

> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
brain$Gender: Female
      FSIQ        VIQ        PIQ     Weight     Height  MRI_Count
   111.900    109.450    110.450    137.200     65.765 862654.600
------------------------------------------------------------
brain$Gender: Male
        FSIQ          VIQ          PIQ       Weight       Height    MRI_Count
   115.00000    115.25000    111.60000    166.44444     71.43158 954855.40000

I do not know how the authors of the book could have produced the above results with the by() function. When I could not reproduce these results, I thought this might be due to missing values (NAs) in the Weight and Height variables. I then tried the same code on the "mtcars" dataset with INDICES=mtcars$am. When I got the same result there too, I reported the case to r-help at R-project.org.

With best regards,

Dr. A.K. Singh
Head, Department of Agril. Statistics
Indira Gandhi Krishi Vishwavidyalaya, Raipur
Chhattisgarh, India, PIN-492012
Mobile: +919752620740
Email: akhileshsingh.igkv at gmail.com

On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:

> I think you are not using the best function for what your intentions are.
> Try:
>
> > by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> : 0
>         mpg         cyl        disp          hp        drat          wt        qsec          vs
>  17.1473684   6.9473684 290.3789474 160.2631579   3.2863158   3.7688947  18.1831579   0.3684211
>          am        gear        carb
>   0.0000000   3.2105263   2.7368421
> ---------------------------------------------------------------------------
> : 1
>         mpg         cyl        disp          hp        drat          wt        qsec          vs
>  24.3923077   5.0769231 143.5307692 126.8461538   4.0500000   2.4110000  17.3600000   0.5384615
>          am        gear        carb
>   1.0000000   4.3846154   2.9230769
>
> See the difference between colMeans() and mean() in their respective help files.
> Hth,
> Adrian
>
> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
>
>> Dear Sirs,
>>
>> I am Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur, Chhattisgarh, India.
>>
>> While taking classes, I found the by() function producing the following error
>> when I use FUN=mean or median and some other functions; however, FUN=summary works.
>>
>> Given below is the output of the example I used on the built-in dataset "mtcars",
>> along with the error message:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
>> : 0
>> [1] NA
>> ------------------------------------------------------------
>> : 1
>> [1] NA
>> Warning messages:
>> 1: In mean.default(data[x, , drop = FALSE], ...) :
>>   argument is not numeric or logical: returning NA
>> 2: In mean.default(data[x, , drop = FALSE], ...) :
>>   argument is not numeric or logical: returning NA
>>
>> However, the same by() function works for FUN=summary; given below is the output:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
>> : 0
>>       mpg             cyl             disp             hp
>>  Min.   :10.40   Min.   :4.000   Min.   :120.1   Min.   : 62.0
>>  1st Qu.:14.95   1st Qu.:6.000   1st Qu.:196.3   1st Qu.:116.5
>>  Median :17.30   Median :8.000   Median :275.8   Median :175.0
>>  Mean   :17.15   Mean   :6.947   Mean   :290.4   Mean   :160.3
>>  3rd Qu.:19.20   3rd Qu.:8.000   3rd Qu.:360.0   3rd Qu.:192.5
>>  Max.   :24.40   Max.   :8.000   Max.   :472.0   Max.   :245.0
>>       drat             wt             qsec             vs               am
>>  Min.   :2.760   Min.   :2.465   Min.   :15.41   Min.   :0.0000   Min.   :0
>>  1st Qu.:3.070   1st Qu.:3.438   1st Qu.:17.18   1st Qu.:0.0000   1st Qu.:0
>>  Median :3.150   Median :3.520   Median :17.82   Median :0.0000   Median :0
>>  Mean   :3.286   Mean   :3.769   Mean   :18.18   Mean   :0.3684   Mean   :0
>>  3rd Qu.:3.695   3rd Qu.:3.842   3rd Qu.:19.17   3rd Qu.:1.0000   3rd Qu.:0
>>  Max.   :3.920   Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :0
>>       gear            carb
>>  Min.   :3.000   Min.   :1.000
>>  1st Qu.:3.000   1st Qu.:2.000
>>  Median :3.000   Median :3.000
>>  Mean   :3.211   Mean   :2.737
>>  3rd Qu.:3.000   3rd Qu.:4.000
>>  Max.   :4.000   Max.   :4.000
>> ------------------------------------------------------------
>> : 1
>>       mpg             cyl             disp             hp             drat
>>  Min.   :15.00   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :3.54
>>  1st Qu.:21.00   1st Qu.:4.000   1st Qu.: 79.0   1st Qu.: 66.0   1st Qu.:3.85
>>  Median :22.80   Median :4.000   Median :120.3   Median :109.0   Median :4.08
>>  Mean   :24.39   Mean   :5.077   Mean   :143.5   Mean   :126.8   Mean   :4.05
>>  3rd Qu.:30.40   3rd Qu.:6.000   3rd Qu.:160.0   3rd Qu.:113.0   3rd Qu.:4.22
>>  Max.   :33.90   Max.   :8.000   Max.   :351.0   Max.   :335.0   Max.   :4.93
>>       wt             qsec             vs               am         gear
>>  Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :1   Min.   :4.000
>>  1st Qu.:1.935   1st Qu.:16.46   1st Qu.:0.0000   1st Qu.:1   1st Qu.:4.000
>>  Median :2.320   Median :17.02   Median :1.0000   Median :1   Median :4.000
>>  Mean   :2.411   Mean   :17.36   Mean   :0.5385   Mean   :1   Mean   :4.385
>>  3rd Qu.:2.780   3rd Qu.:18.61   3rd Qu.:1.0000   3rd Qu.:1   3rd Qu.:5.000
>>  Max.   :3.570   Max.   :19.90   Max.   :1.0000   Max.   :1   Max.   :5.000
>>       carb
>>  Min.   :1.000
>>  1st Qu.:1.000
>>  Median :2.000
>>  Mean   :2.923
>>  3rd Qu.:4.000
>>  Max.   :8.000
>>
>> I am using the latest version, R-3.2.4 on Windows; however, this error is
>> generated in the previous version too.
>>
>> Hope this report will get serious attention in debugging.
>>
>> With best regards,
>>
>> Dr. A.K. Singh
>> Head, Department of Agril. Statistics
>> Indira Gandhi Krishi Vishwavidyalaya, Raipur
>> Chhattisgarh, India, PIN-492012
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania
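The difference between the three FUN arguments discussed above can be checked directly on one group. The following is a minimal sketch, assuming a current R (3.0.0 or later) with only base packages attached; the variable names are illustrative and not part of the original posts.

```r
# by() hands each group to FUN as a data frame, not as a numeric vector.
grp <- mtcars[mtcars$am == 0, ]

# mean() has no data-frame method here, so mean.default() returns NA
# with the warning "argument is not numeric or logical: returning NA".
res_mean <- suppressWarnings(mean(grp))

# colMeans() accepts an all-numeric data frame and returns one mean per column.
res_col <- colMeans(grp)

# summary() does have a data-frame method, which is why FUN = summary worked.
has_summary_method <- !is.null(getS3method("summary", "data.frame", optional = TRUE))

print(res_mean)
print(res_col["mpg"])
print(has_summary_method)
```

The point of the sketch is that by() itself behaves the same in all three cases; only the function it is asked to apply differs.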
David Winsemius
2016-Apr-15 08:54 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
> On Apr 15, 2016, at 1:16 AM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
>
> I do not know how could the writers of the book have produced the above
> results by by() function.

There was in the not-so-distant past a function named `mean.data.frame` which would have "worked" in that instance. That function was removed. I tried to find the exact date of that change by searching the NEWS file but failed. Reviewing the citations of `mean.data.frame` in the r-help archives, I see that users were being warned in mid 2012 that its use was deprecated. It's very possible that the authors of a book published in 2012 were using an earlier version of R that still had that facility available before it was deprecated.

With a newer-than-current (pre-release) build of R 3.3.0 and a modest number of loaded packages I see this:

> methods(mean)
 [1] mean,ANY-method          mean,Matrix-method       mean,Raster-method
 [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date
 [7] mean.default             mean.difftime            mean.POSIXct
[10] mean.POSIXlt             mean.yearmon*            mean.yearqtr*
[13] mean.zoo*

It is your responsibility to determine whether any particular function in your version of R satisfies the language requirements at the time of your use. Jim Albert and Maria Rizzo do not set the standards for what is an evolving piece of software.

--
David.

David Winsemius
Alameda, CA, USA
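Whether a data-frame method for mean() exists in a given session can also be tested programmatically instead of reading the methods() listing. This is a minimal sketch, assuming base R only; column_means() is a hypothetical helper introduced for illustration, not a base R function.

```r
# NULL is returned when no mean() method for class "data.frame" can be found.
df_method <- getS3method("mean", "data.frame", optional = TRUE)
has_df_method <- !is.null(df_method)

# Hypothetical helper: per-column means that do not depend on mean.data.frame.
# Extra arguments such as na.rm = TRUE are passed through to mean().
column_means <- function(df, ...) {
  vapply(df, mean, numeric(1), ...)
}

print(has_df_method)
print(column_means(mtcars[, c("mpg", "hp")]))
```

Code written this way should give the same answer on old and new versions of R, since it calls mean() only on individual columns.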
peter dalgaard
2016-Apr-15 09:02 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
Books don't rewrite themselves retroactively....

NEWS for 3.0.0 has

  • mean() for data frames and sd() for data frames and matrices are defunct.

and 3.0.0 was released April 3, 2013. A book published in 2012 would likely be based on R 2.13.x or maybe even 2.12.x.

So mean(dataframe) worked in the past. It was changed because of inconsistencies, e.g. mean(as.matrix(dataframe)) is a single number, median.data.frame never existed, var(dataframe) differed from sd(dataframe)^2, etc. The deprecation/defunct process started with 2.14.0-pre in October 2011.

-pd

On 15 Apr 2016, at 10:16 , Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:

> I do not know how could the writers of the book have produced the above
> results by by() function.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
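The inconsistencies mentioned above are easy to see on a small numeric data frame. The following is a sketch under the assumption of a current R, using two columns of mtcars as an arbitrary example.

```r
df <- mtcars[, c("mpg", "wt")]

# Coercing to a matrix first gives ONE number: the mean of all cells.
m_all <- mean(as.matrix(df))

# colMeans() gives one mean per column.
m_col <- colMeans(df)

# var() on a data frame returns the full covariance matrix ...
v <- var(df)

# ... whereas per-column standard deviations form a plain vector.
s <- sapply(df, sd)

print(m_all)
print(m_col)
print(dim(v))
# Only the diagonal of the covariance matrix matches the squared SDs.
print(all.equal(unname(diag(v)), unname(s^2)))
```

So the old mean(dataframe) returned a vector while the closely related calls returned a scalar or a matrix, which is the kind of mismatch the change was meant to remove.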
Duncan Murdoch
2016-Apr-15 09:30 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
On 15/04/2016 4:16 AM, Akhilesh Singh wrote:
> Actually, I was replicating the Example 2.3, using the dataset
> "brainsize.txt" given in Section 2.3.3 ("Summarize by group") at page 55,
> of a famous book "R by Example" written by "Jim Albert and Maria Rizzo"
> published in Springers (2012) in a Use R! Series. The output of the by()
> function printed in the book is being reproduced below for information to
> all:

See their errata page http://personal.bgsu.edu/~mrizzo/Rx/Rx-errata.txt. They corrected "mean" to "colMeans".

Duncan Murdoch
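The corrected call can be tried on any data frame with missing values. Since brainsize.txt does not ship with R, the sketch below substitutes the built-in airquality dataset (an assumption made purely for illustration); the shape of the call follows the book example with FUN=colMeans.

```r
# Stand-in for brain[, -1]: numeric columns, some of which contain NAs.
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Same shape as the book's call, with mean replaced by colMeans.
res <- by(data = aq, INDICES = airquality$Month, FUN = colMeans, na.rm = TRUE)

# Without na.rm = TRUE the columns containing NAs would come back as NA.
res_with_na <- colMeans(aq[airquality$Month == 5, ])

print(res[["5"]])
print(res_with_na)
```

Extra arguments such as na.rm = TRUE are forwarded by by() to FUN, so they reach colMeans() in the same way they used to reach mean().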
Akhilesh Singh
2016-Apr-16 09:03 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
Dear All, I have got your core message, that it is my responsibility to determine whether any particular function in my version of R satisfies the language requirements at the time of your use. Jim Albert and Maria Rizzo must have used their code, which was permitted in the R-code of their time (2012). Therefore, I have now modified my R-code, as per R-3..2.4 version, according to my requirement as follows, which is working for my 'brain' data set, whose output is reproduced below for your information please:> by(brain[,-1], INDICES=list(Gender=brain$Gender), FUN=function(x,na.rm=FALSE) sapply(x, mean, na.rm=na.rm), na.rm=TRUE) Gender: Female FSIQ VIQ PIQ Weight Height MRI_Count 111.900 109.450 110.450 137.200 65.765 862654.600 -------------------------------------------------------------------------------------------------- Gender: Male FSIQ VIQ PIQ Weight Height MRI_Count 115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000 With best regards, Dr. A.K. Singh Head, Department of Agril. Statistics Indira Gandhi Krishi Vishwavidyalaya, Raipur Chhattisgarh, India, PIN-492012 Mobile: +919752620740 Email: akhileshsingh.igkv at gmail.com On Fri, Apr 15, 2016 at 2:24 PM, David Winsemius <dwinsemius at comcast.net> wrote:> > > On Apr 15, 2016, at 1:16 AM, Akhilesh Singh < > akhileshsingh.igkv at gmail.com> wrote: > > > > Dear All, > > > > Thanks for your help. However, I would like to draw your attention to the > > following: > > > > Actually, I was replicating the Example 2.3, using the dataset > > "brainsize.txt" given in Section 2.3.3 ("Summarize by group") at page 55, > > of a famous book "R by Example" written by "Jim Albert and Maria Rizzo" > > published in Springers (2012) in a Use R! Series. 
The output of the by() > > function printed in the book is being reproduced below for information to > > all: > > > >> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE) > > brain$Gender: Female > > FSIQ VIQ PIQ Weight Height MRI_Count > > 111.900 109.450 110.450 137.200 65.765 862654.600 > > ------------------------------------------------------------ > > brain$Gender: Male > > FSIQ VIQ PIQ Weight Height MRI_Count > > 115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000 > > > > > > I do not know how could the writers of the book have produced the above > > results by by() function. > > > There was in the not-so-distant past a function named `mean.data.frame` > which would have "worked" in that instance. That function was removed. I > thought you could find the exact date of that action by searching the NEWS > but failed. Reviewing the citations of `mean.data.frame` in the r-help > archives I see that users were being warned that its use was deprecated in > mid 2012. It's very possible that the authors of a book in 2012 were using > an earlier version of R that had that facility available to them before it > was deprecated. With a more than current version of R 3.3.0 and a modest > number of loaded packages I see this: > > > methods(mean) > [1] mean,ANY-method mean,Matrix-method mean,Raster-method > [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date > [7] mean.default mean.difftime mean.POSIXct > [10] mean.POSIXlt mean.yearmon* mean.yearqtr* > [13] mean.zoo* > > It is your responsibility to determine whether any particular function in > your version of R satisfies the language requirements at the time of your > use. Jim Albert and Maria Rizzo do not set the standards for what is an > evolving piece of software. > > -- > David. > > > > But, when I could not reproduce these results, > > then I thought that probably, this could possibly be due to some missing > > values NA's in Weight and Height variables. 
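
For readers following this thread, here is a minimal sketch of why FUN=mean returns NA inside by() while per-column approaches work. It assumes a recent R (3.x or later), where mean() has no data-frame method and so falls through to mean.default(). The helper name group_means is hypothetical, introduced only for illustration; split(), sapply(), colMeans() and aggregate() are base R functions.

```r
# mean() applied to a whole data frame falls through to mean.default(),
# which warns and returns NA because a data frame is not a numeric vector.
whole <- suppressWarnings(mean(mtcars))

# Hypothetical helper (not part of base R): per-group column means.
group_means <- function(df, group, na.rm = TRUE) {
  # Split the rows by group, then average each column within each piece.
  pieces <- split(df, group)
  # sapply() returns one column per group; transpose to one row per group.
  t(sapply(pieces, function(piece) colMeans(piece, na.rm = na.rm)))
}

res <- group_means(mtcars[, c("mpg", "hp", "wt")], mtcars$am)
print(res)

# The same figures via aggregate(), which applies FUN column by column.
agg <- aggregate(cbind(mpg, hp, wt) ~ am, data = mtcars, FUN = mean)
print(agg)
```

The same pattern should carry over to the 'brain' data set, for example group_means(brain[, -1], brain$Gender), provided every remaining column is numeric; colMeans() stops with an error on non-numeric columns, whereas the sapply(x, mean, na.rm=na.rm) version above handles each column separately.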