Akhilesh Singh
2016-Apr-15 08:16 UTC
[R] Bug in by() function which works for some FUN argument and does not work for others
Dear All,
Thanks for your help. However, I would like to draw your attention to the
following:
I was replicating Example 2.3, using the dataset "brainsize.txt" given in
Section 2.3.3 ("Summarize by group") on page 55 of the well-known book
"R by Example" by Jim Albert and Maria Rizzo, published by Springer (2012)
in the Use R! series. The output of the by() function printed in the book
is reproduced below for everyone's information:
> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
brain$Gender: Female
FSIQ VIQ PIQ Weight Height MRI_Count
111.900 109.450 110.450 137.200 65.765 862654.600
------------------------------------------------------------
brain$Gender: Male
FSIQ VIQ PIQ Weight Height MRI_Count
115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000
I do not know how the authors of the book could have produced the above
results with the by() function. When I could not reproduce these results,
I thought this might be due to missing values (NAs) in the Weight and Height
variables. I then tried the same code on the "mtcars" dataset with
INDICES=mtcars$am. When I got the same failure there too, I reported the
case to "r-help at R-project.org".
With best regards,
Dr. A.K. Singh
Head, Department of Agril. Statistics
Indira Gandhi Krishi Vishwavidyalaya, Raipur
Chhattisgarh, India, PIN-492012
Mobile: +919752620740
Email: akhileshsingh.igkv at gmail.com
On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> I think you are not using the best function for what your intentions are.
> Try:
>
> > by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> : 0
> mpg cyl disp hp drat wt qsec vs
> 17.1473684 6.9473684 290.3789474 160.2631579 3.2863158 3.7688947 18.1831579 0.3684211
> am gear carb
> 0.0000000 3.2105263 2.7368421
>
> ---------------------------------------------------------------------------
> : 1
> mpg cyl disp hp drat wt qsec vs
> 24.3923077 5.0769231 143.5307692 126.8461538 4.0500000 2.4110000 17.3600000 0.5384615
> am gear carb
> 1.0000000 4.3846154 2.9230769
>
> See the difference between colMeans() and mean() in their respective help
> files.
> Hth,
> Adrian
>
> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
>
>> Dear Sirs,
>>
>> I am Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
>> Chhattisgarh, India.
>>
>> While teaching classes, I found that the by() function fails when I use
>> FUN=mean, median, and some other functions, whereas FUN=summary works.
>>
>> Given below is the output of the example I ran on the built-in dataset
>> "mtcars", along with the warning messages:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
>> : 0
>> [1] NA
>> ------------------------------------------------------------
>> : 1
>> [1] NA
>> Warning messages:
>> 1: In mean.default(data[x, , drop = FALSE], ...) :
>> argument is not numeric or logical: returning NA
>> 2: In mean.default(data[x, , drop = FALSE], ...) :
>> argument is not numeric or logical: returning NA
>>
>> However, the same by() call works with FUN=summary; the output is given
>> below:
>>
>> > by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
>> : 0
>> mpg cyl disp hp
>> Min. :10.40 Min. :4.000 Min. :120.1 Min. : 62.0
>> 1st Qu.:14.95 1st Qu.:6.000 1st Qu.:196.3 1st Qu.:116.5
>> Median :17.30 Median :8.000 Median :275.8 Median :175.0
>> Mean :17.15 Mean :6.947 Mean :290.4 Mean :160.3
>> 3rd Qu.:19.20 3rd Qu.:8.000 3rd Qu.:360.0 3rd Qu.:192.5
>> Max. :24.40 Max. :8.000 Max. :472.0 Max. :245.0
>> drat wt qsec vs am
>> Min. :2.760 Min. :2.465 Min. :15.41 Min. :0.0000 Min. :0
>> 1st Qu.:3.070 1st Qu.:3.438 1st Qu.:17.18 1st Qu.:0.0000 1st Qu.:0
>> Median :3.150 Median :3.520 Median :17.82 Median :0.0000 Median :0
>> Mean :3.286 Mean :3.769 Mean :18.18 Mean :0.3684 Mean :0
>> 3rd Qu.:3.695 3rd Qu.:3.842 3rd Qu.:19.17 3rd Qu.:1.0000 3rd Qu.:0
>> Max. :3.920 Max. :5.424 Max. :22.90 Max. :1.0000 Max. :0
>> gear carb
>> Min. :3.000 Min. :1.000
>> 1st Qu.:3.000 1st Qu.:2.000
>> Median :3.000 Median :3.000
>> Mean :3.211 Mean :2.737
>> 3rd Qu.:3.000 3rd Qu.:4.000
>> Max. :4.000 Max. :4.000
>> ------------------------------------------------------------
>> : 1
>> mpg cyl disp hp drat
>> Min. :15.00 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :3.54
>> 1st Qu.:21.00 1st Qu.:4.000 1st Qu.: 79.0 1st Qu.: 66.0 1st Qu.:3.85
>> Median :22.80 Median :4.000 Median :120.3 Median :109.0 Median :4.08
>> Mean :24.39 Mean :5.077 Mean :143.5 Mean :126.8 Mean :4.05
>> 3rd Qu.:30.40 3rd Qu.:6.000 3rd Qu.:160.0 3rd Qu.:113.0 3rd Qu.:4.22
>> Max. :33.90 Max. :8.000 Max. :351.0 Max. :335.0 Max. :4.93
>> wt qsec vs am gear
>> Min. :1.513 Min. :14.50 Min. :0.0000 Min. :1 Min. :4.000
>> 1st Qu.:1.935 1st Qu.:16.46 1st Qu.:0.0000 1st Qu.:1 1st Qu.:4.000
>> Median :2.320 Median :17.02 Median :1.0000 Median :1 Median :4.000
>> Mean :2.411 Mean :17.36 Mean :0.5385 Mean :1 Mean :4.385
>> 3rd Qu.:2.780 3rd Qu.:18.61 3rd Qu.:1.0000 3rd Qu.:1 3rd Qu.:5.000
>> Max. :3.570 Max. :19.90 Max. :1.0000 Max. :1 Max. :5.000
>> carb
>> Min. :1.000
>> 1st Qu.:1.000
>> Median :2.000
>> Mean :2.923
>> 3rd Qu.:4.000
>> Max. :8.000
>> >
>>
>> I am using the latest version, R-3.2.4 on Windows; however, the same
>> problem occurs in the previous version too.
>>
>> I hope this report will get serious attention for debugging.
>>
>> With best regards,
>>
>> Dr. A.K. Singh
>> Head, Department of Agril. Statistics
>> Indira Gandhi Krishi Vishwavidyalaya, Raipur
>> Chhattisgarh, India, PIN-492012
>> Mobile: +919752620740
>> Email: akhileshsingh.igkv at gmail.com
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania
>
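The failure reported above can be reproduced with a short R sketch on the built-in `mtcars` data. by() does not pass each column to FUN separately. It passes each group's rows to FUN as a data frame. In current R there is no mean() method for data frames, as later replies in this thread explain. As a result, mean.default() receives a list, warns, and returns NA. colMeans(), Adrian's suggestion, instead returns one mean per column. The object names (res_mean, grp_means, grp_means2) are illustrative only.

```r
# by() splits the data frame by the grouping variable and calls FUN once per
# group on a data-frame subset, not once per column.
data(mtcars)

# mean() on a whole data frame: mean.default() warns and returns NA
res_mean <- suppressWarnings(
  by(mtcars, INDICES = list(am = mtcars$am), FUN = mean)
)
print(res_mean)

# colMeans() accepts a numeric data frame and returns one mean per column
grp_means <- by(mtcars, INDICES = list(am = mtcars$am), FUN = colMeans)
print(grp_means)

# Equivalent per-group column means with an explicit sapply() over columns
grp_means2 <- by(mtcars, INDICES = list(am = mtcars$am),
                 FUN = function(d) sapply(d, mean))
```

The sapply() variant is useful when a per-column statistic has no ready-made column-wise counterpart such as colMeans().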
David Winsemius
2016-Apr-15 08:54 UTC
> On Apr 15, 2016, at 1:16 AM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
>
> I do not know how could the writers of the book have produced the above
> results by by() function.

There was in the not-so-distant past a function named `mean.data.frame`
which would have "worked" in that instance. That function was removed. I
thought I could find the exact date of that action by searching the NEWS,
but failed. Reviewing the citations of `mean.data.frame` in the r-help
archives, I see that users were being warned that its use was deprecated in
mid-2012. It's very possible that the authors of a book in 2012 were using
an earlier version of R that had that facility available to them before it
was deprecated.

With a more than current version of R 3.3.0 and a modest number of loaded
packages I see this:

> methods(mean)
[1] mean,ANY-method mean,Matrix-method mean,Raster-method
[4] mean,sparseMatrix-method mean,sparseVector-method mean.Date
[7] mean.default mean.difftime mean.POSIXct
[10] mean.POSIXlt mean.yearmon* mean.yearqtr*
[13] mean.zoo*

It is your responsibility to determine whether any particular function in
your version of R satisfies the language requirements at the time of your
use. Jim Albert and Maria Rizzo do not set the standards for what is an
evolving piece of software.

--
David.
David Winsemius
Alameda, CA, USA
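David's point about the missing data-frame method can be checked directly. This is a minimal sketch that assumes a plain R (>= 3.0.0) session in which no add-on package registers its own mean() method for data frames. With optional = TRUE, getS3method() returns NULL when a method does not exist. So if mean.data.frame is absent, mean() on a data frame falls through to mean.default(), which accepts only numeric or logical input.

```r
# Is there an S3 method mean.data.frame? (NULL means "no such method")
has_df_method <- !is.null(getS3method("mean", "data.frame", optional = TRUE))

# The default method always exists
has_default <- !is.null(getS3method("mean", "default", optional = TRUE))

# A data frame is a list of columns, not a numeric vector, which is why
# mean.default() warns "argument is not numeric or logical: returning NA"
is_numeric_df <- is.numeric(mtcars)

cat("mean.data.frame available:", has_df_method, "\n")
cat("mean.default available:   ", has_default, "\n")
cat("is.numeric(mtcars):       ", is_numeric_df, "\n")
```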
peter dalgaard
2016-Apr-15 09:02 UTC
Books don't rewrite themselves retroactively....
NEWS for 3.0.0 has
    • mean() for data frames and sd() for data frames and matrices are
      defunct.
and 3.0.0 was released April 3, 2013.
A book published in 2012 would likely be based on R 2.13.x or maybe even 2.12.x.
So mean(dataframe) worked in the past. It was changed because of
inconsistencies, e.g. mean(as.matrix(dataframe)) is a single number,
median.data.frame never existed, var(dataframe) differed from sd(dataframe)^2,
etc. The deprecation/defunct process started with 2.14.0-pre in October 2011.
-pd
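The inconsistencies Peter mentions can be illustrated in current R with a small subset of mtcars; the choice of columns and the names below are arbitrary. mean(as.matrix(x)) collapses everything into a single grand mean. colMeans() gives the per-column means that the old mean.data.frame returned, as in the book output above. var() on a data frame gives a full covariance matrix, and only its diagonal matches the squared column standard deviations.

```r
cars_sub <- mtcars[, c("mpg", "hp", "wt")]

# Treating the data frame as a matrix gives one grand mean over all cells
grand_mean <- mean(as.matrix(cars_sub))

# Per-column means, the shape of result the book's by() output shows
col_means <- colMeans(cars_sub)

# var() returns a 3 x 3 covariance matrix; its diagonal equals the squared
# per-column standard deviations computed with sapply()
col_vars_from_var <- diag(var(cars_sub))
col_vars_from_sd  <- sapply(cars_sub, sd)^2

print(grand_mean)
print(col_means)
print(all.equal(col_vars_from_var, col_vars_from_sd))
```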
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Duncan Murdoch
2016-Apr-15 09:30 UTC
On 15/04/2016 4:16 AM, Akhilesh Singh wrote:
> Actually, I was replicating the Example 2.3, using the dataset
> "brainsize.txt" given in Section 2.3.3 ("Summarize by group") at page 55,
> of a famous book "R by Example" written by "Jim Albert and Maria Rizzo"
> published in Springers (2012) in a Use R! Series. The output of the by()
> function printed in the book is being reproduced below for information to
> all:

See their errata page http://personal.bgsu.edu/~mrizzo/Rx/Rx-errata.txt.
They corrected "mean" to "colMeans".

Duncan Murdoch
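Applying the errata correction to the book's call might look like the sketch below. brainsize.txt is not loaded here, so brain_demo is a hypothetical stand-in built from mtcars. Its grouping factor is the first column, mirroring the book's brain[, -1] call, which suggests Gender is the first column there. Two values of wt are set to NA. by() forwards na.rm = TRUE to colMeans(), so missing values are dropped column by column.

```r
# Hypothetical stand-in for the book's 'brain' data: grouping factor first,
# then numeric columns, with two missing values in wt
brain_demo <- data.frame(
  Group = factor(mtcars$am, labels = c("auto", "manual")),
  mtcars[, c("mpg", "hp", "wt")]
)
brain_demo$wt[c(1, 5)] <- NA

# Errata form of the book's call: colMeans instead of mean; the extra
# argument na.rm = TRUE is passed through by() to colMeans()
res <- by(data = brain_demo[, -1], INDICES = brain_demo$Group,
          FUN = colMeans, na.rm = TRUE)
print(res)
```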
Akhilesh Singh
2016-Apr-16 09:03 UTC
Dear All,
I have got your core message: it is my responsibility to determine whether
any particular function in my version of R satisfies the language
requirements at the time of use. Jim Albert and Maria Rizzo must have used
code that was valid in the R of their time (2012). I have therefore modified
my R code for R-3.2.4 as follows. It works for my 'brain' data set, and the
output is reproduced below for your information:
> by(brain[,-1], INDICES=list(Gender=brain$Gender), FUN=function(x,na.rm=FALSE) sapply(x, mean, na.rm=na.rm), na.rm=TRUE)
Gender: Female
FSIQ VIQ PIQ Weight Height MRI_Count
111.900 109.450 110.450 137.200 65.765 862654.600
--------------------------------------------------------------------------------------------------
Gender: Male
FSIQ VIQ PIQ Weight Height MRI_Count
115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000
With best regards,
Dr. A.K. Singh
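The sapply() wrapper in the message above and the errata's colMeans() should give the same numbers. aggregate() is another option; it returns a data frame rather than a by object. The sketch below uses the same hypothetical brain_demo stand-in, not the book's data. Note the na.action = na.pass argument. Without it, the formula interface of aggregate() uses its default na.omit and drops every row containing an NA before computing any mean, which would also change the mpg and hp results.

```r
# Hypothetical stand-in data, as in the previous sketch
brain_demo <- data.frame(
  Group = factor(mtcars$am, labels = c("auto", "manual")),
  mtcars[, c("mpg", "hp", "wt")]
)
brain_demo$wt[c(1, 5)] <- NA

# The wrapper from the message above: mean() applied to each column
col_mean_fun <- function(x, na.rm = FALSE) sapply(x, mean, na.rm = na.rm)
by_sapply <- by(brain_demo[, -1], INDICES = list(Group = brain_demo$Group),
                FUN = col_mean_fun, na.rm = TRUE)

# The same numbers with colMeans()
by_colmeans <- by(brain_demo[, -1], INDICES = list(Group = brain_demo$Group),
                  FUN = colMeans, na.rm = TRUE)

# aggregate() with na.pass keeps incomplete rows; na.rm = TRUE is then
# applied separately within each column by mean()
agg <- aggregate(. ~ Group, data = brain_demo, FUN = mean,
                 na.action = na.pass, na.rm = TRUE)
print(agg)
```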
> > The output of the by() function printed in the book is being reproduced
> > below for information to all:
> >
> >> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
> > brain$Gender: Female
> > FSIQ VIQ PIQ Weight Height MRI_Count
> > 111.900 109.450 110.450 137.200 65.765 862654.600
> > ------------------------------------------------------------
> > brain$Gender: Male
> > FSIQ VIQ PIQ Weight Height MRI_Count
> > 115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000
> >
> > I do not know how could the writers of the book have produced the above
> > results by by() function.
>
> There was in the not-so-distant past a function named `mean.data.frame`
> which would have "worked" in that instance. That function was removed. I
> thought you could find the exact date of that action by searching the NEWS
> but failed. Reviewing the citations of `mean.data.frame` in the r-help
> archives I see that users were being warned that its use was deprecated in
> mid 2012. It's very possible that the authors of a book in 2012 were using
> an earlier version of R that had that facility available to them before it
> was deprecated. With a more than current version of R 3.3.0 and a modest
> number of loaded packages I see this:
>
> > methods(mean)
> [1] mean,ANY-method mean,Matrix-method mean,Raster-method
> [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date
> [7] mean.default mean.difftime mean.POSIXct
> [10] mean.POSIXlt mean.yearmon* mean.yearqtr*
> [13] mean.zoo*
>
> It is your responsibility to determine whether any particular function in
> your version of R satisfies the language requirements at the time of your
> use. Jim Albert and Maria Rizzo do not set the standards for what is an
> evolving piece of software.
>
> --
> David.
>
> > But, when I could not reproduce these results,
> > then I thought that probably, this could possibly be due to some missing
> > values NA's in Weight and Height variables. Then I tried the above code for
> > the "mtcars" dataset for INDICES=mtcars$am. When I found the same results
> > here too, then I reported the case in "r-help at R-project.org".
> >
> > With best regards,
> >
> > Dr. A.K. Singh
> > Head, Department of Agril. Statistics
> > Indira Gandhi Krishi Vishwavidyalaya, Raipur
> > Chhattisgarh, India, PIN-492012
> > Mobile: +919752620740
> > Email: akhileshsingh.igkv at gmail.com
> >
> > On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> >
> >> I think you are not using the best function for what your intentions are.
> >> Try:
> >>
> >>> by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> >> : 0
> >> mpg cyl disp hp drat wt qsec vs
> >> 17.1473684 6.9473684 290.3789474 160.2631579 3.2863158 3.7688947 18.1831579 0.3684211
> >> am gear carb
> >> 0.0000000 3.2105263 2.7368421
> >> ---------------------------------------------------------------------------
> >> : 1
> >> mpg cyl disp hp drat wt qsec vs
> >> 24.3923077 5.0769231 143.5307692 126.8461538 4.0500000 2.4110000 17.3600000 0.5384615
> >> am gear carb
> >> 1.0000000 4.3846154 2.9230769
> >>
> >> See the difference between colMeans() and mean() in their respective help
> >> files.
> >> Hth,
> >> Adrian
> >>
> >> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
> >>
> >>> Dear Sirs,
> >>>
> >>> I am Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
> >>> Chhattisgarh, India.
> >>>
> >>> While taking classes, I found the *by() *function producing following error
> >>> when I use FUN=mean or median and some other functions, however,
> >>> FUN=summary works.
> >>>
> >>> Given below is the output of the example I used on a built-in dataset
> >>> "mtcars", along with error message reproduced herewith:
> >>>
> >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
> >>> : 0
> >>> [1] NA
> >>> ------------------------------------------------------------
> >>> : 1
> >>> [1] NA
> >>> Warning messages:
> >>> 1: In mean.default(data[x, , drop = FALSE], ...) :
> >>> argument is not numeric or logical: returning NA
> >>> 2: In mean.default(data[x, , drop = FALSE], ...) :
> >>> argument is not numeric or logical: returning NA
> >>>
> >>> However, the same by() function works for FUN=summary, given below is the
> >>> output:
> >>>
> >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
> >>> : 0
> >>> mpg cyl disp hp
> >>> Min. :10.40 Min. :4.000 Min. :120.1 Min. : 62.0
> >>> 1st Qu.:14.95 1st Qu.:6.000 1st Qu.:196.3 1st Qu.:116.5
> >>> Median :17.30 Median :8.000 Median :275.8 Median :175.0
> >>> Mean :17.15 Mean :6.947 Mean :290.4 Mean :160.3
> >>> 3rd Qu.:19.20 3rd Qu.:8.000 3rd Qu.:360.0 3rd Qu.:192.5
> >>> Max. :24.40 Max. :8.000 Max. :472.0 Max. :245.0
> >>> drat wt qsec vs am
> >>> Min. :2.760 Min. :2.465 Min. :15.41 Min. :0.0000 Min. :0
> >>> 1st Qu.:3.070 1st Qu.:3.438 1st Qu.:17.18 1st Qu.:0.0000 1st Qu.:0
> >>> Median :3.150 Median :3.520 Median :17.82 Median :0.0000 Median :0
> >>> Mean :3.286 Mean :3.769 Mean :18.18 Mean :0.3684 Mean :0
> >>> 3rd Qu.:3.695 3rd Qu.:3.842 3rd Qu.:19.17 3rd Qu.:1.0000 3rd Qu.:0
> >>> Max. :3.920 Max. :5.424 Max. :22.90 Max. :1.0000 Max. :0
> >>> gear carb
> >>> Min. :3.000 Min. :1.000
> >>> 1st Qu.:3.000 1st Qu.:2.000
> >>> Median :3.000 Median :3.000
> >>> Mean :3.211 Mean :2.737
> >>> 3rd Qu.:3.000 3rd Qu.:4.000
> >>> Max. :4.000 Max. :4.000
> >>> ------------------------------------------------------------
> >>> : 1
> >>> mpg cyl disp hp drat
> >>> Min. :15.00 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :3.54
> >>> 1st Qu.:21.00 1st Qu.:4.000 1st Qu.: 79.0 1st Qu.: 66.0 1st Qu.:3.85
> >>> Median :22.80 Median :4.000 Median :120.3 Median :109.0 Median :4.08
> >>> Mean :24.39 Mean :5.077 Mean :143.5 Mean :126.8 Mean :4.05
> >>> 3rd Qu.:30.40 3rd Qu.:6.000 3rd Qu.:160.0 3rd Qu.:113.0 3rd Qu.:4.22
> >>> Max. :33.90 Max. :8.000 Max. :351.0 Max. :335.0 Max. :4.93
> >>> wt qsec vs am gear
> >>> Min. :1.513 Min. :14.50 Min. :0.0000 Min. :1 Min. :4.000
> >>> 1st Qu.:1.935 1st Qu.:16.46 1st Qu.:0.0000 1st Qu.:1 1st Qu.:4.000
> >>> Median :2.320 Median :17.02 Median :1.0000 Median :1 Median :4.000
> >>> Mean :2.411 Mean :17.36 Mean :0.5385 Mean :1 Mean :4.385
> >>> 3rd Qu.:2.780 3rd Qu.:18.61 3rd Qu.:1.0000 3rd Qu.:1 3rd Qu.:5.000
> >>> Max. :3.570 Max. :19.90 Max. :1.0000 Max. :1 Max. :5.000
> >>> carb
> >>> Min. :1.000
> >>> 1st Qu.:1.000
> >>> Median :2.000
> >>> Mean :2.923
> >>> 3rd Qu.:4.000
> >>> Max. :8.000
> >>>>
> >>>
> >>> I am using the latest version of *R-3.2.4 on Windows*, however, this error
> >>> is being generated in the previous version too,
> >>>
> >>> Hope this reporting will get serious attention in debugging.
> >>>
> >>> With best regards,
> >>>
> >>> Dr. A.K. Singh
> >>> Head, Department of Agril. Statistics
> >>> Indira Gandhi Krishi Vishwavidyalaya, Raipur
> >>> Chhattisgarh, India, PIN-492012
> >>> Mobile: +919752620740
> >>> Email: akhileshsingh.igkv at gmail.com
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> --
> >> Adrian Dusa
> >> University of Bucharest
> >> Romanian Social Data Archive
> >> Soseaua Panduri nr.90
> >> 050663 Bucharest sector 5
> >> Romania
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>

[[alternative HTML version deleted]]
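The fix discussed in this thread can be sketched for current R as follows. This is a minimal example, not code posted by any participant: it uses the built-in mtcars data because the book's brainsize.txt file is not assumed to be available, and it assumes a version of R (such as the 3.2.4 used here) with no data-frame method for mean(), so mean() on a whole data frame warns and returns NA as in the original report. colMeans() (Adrian's suggestion) and sapply(x, mean) (the workaround posted above) both compute one mean per column; aggregate() is an extra alternative that nobody in the thread mentioned, and the helper name col_means is hypothetical.

```r
# Minimal sketch using the built-in mtcars data (the book's brain data is not
# assumed to be available). Without a data-frame method, mean(mtcars) falls
# through to mean.default(), which warns and returns NA.
mean_of_whole_df <- suppressWarnings(mean(mtcars))

# Option 1: colMeans() returns one mean per column; by() forwards na.rm = TRUE
# to FUN through its ... argument.
res_colmeans <- by(mtcars, INDICES = list(am = mtcars$am),
                   FUN = colMeans, na.rm = TRUE)

# Option 2: apply mean() to each column with sapply(), as in the workaround
# above. col_means is a hypothetical helper name used only for illustration.
col_means <- function(x, na.rm = FALSE) sapply(x, mean, na.rm = na.rm)
res_sapply <- by(mtcars, INDICES = list(am = mtcars$am),
                 FUN = col_means, na.rm = TRUE)

# Option 3 (not from the thread): aggregate() returns a data frame with one
# row per group instead of a "by" object; the grouping column is excluded.
res_agg <- aggregate(mtcars[, names(mtcars) != "am"],
                     by = list(am = mtcars$am), FUN = mean, na.rm = TRUE)

print(res_colmeans)
print(res_agg)
```

The colMeans() and sapply() versions should agree with the group means in Adrian's reply (for example, mpg of about 17.15 for am = 0 and 24.39 for am = 1). Passing na.rm = TRUE through by() matters only when columns contain NA values, as the Weight and Height columns of the brain data do.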