thr3ads.net - R help - [R] something missing in summary() [Feb 2007]

Gerard Smits

2007-Feb-15 23:46 UTC

[R] something missing in summary()

I just noticed that two key pieces of information are not given by
the summary() command: N and SD. We are given the N missing, but
not the converse. I know these summary values can be obtained easily,
but I can't understand why these two pieces of information are not
provided with the other info.

Thanks,

Gerard

Jari Oksanen

2007-Feb-16 12:53 UTC

[R] something missing in summary()

Gerard Smits g_smits at verizon.net Fri Feb 16 00:46:09 CET 2007:
> just noticed that two key pieces of information are not given by
> the summary() command:  N and SD.  we are given the N missing, but
> not the converse.  I know these summary value can be obtained easy,
> but can't understand why these two pieces of information are not
> provided with the other info.

I assume you mean summary.data.frame?

There has even been an "appeal" on this:
http://tolstoy.newcastle.edu.au/R/help/06/02/20706.html

However, I didn't find any petition you could sign (but I found many
surprising petitions when googling on this). Perhaps somebody will set
up a petition page some day.

With time, I've learnt that if something obvious is missing in the base
R, there is a reason. Probably the Core thinks that you shouldn't use sd
in a summary because it is a poor and misleading statistic (they don't
have skewness and kurtosis either). You may learn to live without sd if
you survive the first impact.

On the other hand, there are things like R-squared and significance
stars in summary.lm, which spoil the image of purity in the Core.

Number of observations may not be very useful in summary.data.frame,
because it varies so little among variables.
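
A minimal sketch of this point, assuming a made-up data frame `df`: every column has `nrow(df)` entries, so the non-missing N is that total minus the NA count that summary() already reports.

```r
# Hypothetical data frame with a few missing values (illustration only)
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80, 1.68),
  weight = c(61, 80, 72, NA, NA)
)

n_total   <- nrow(df)             # identical for every column
n_missing <- colSums(is.na(df))   # what summary() shows as NA's
n_valid   <- colSums(!is.na(df))  # the "N" asked about in the thread

print(n_total)
print(rbind(n_missing, n_valid))
```

The valid and missing counts always add up to the number of rows, which is why a separate N per variable carries little extra information.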

The R-help message cited above and its follow-ups suggest some ways of
locally modifying the code and maintaining the modifications over the
upgrades of R. 

Best wishes, Jari Oksanen

Mihai Nica

2007-Feb-16 14:10 UTC

[R] something missing in summary()

I will take advantage of this thread with a request of my own. Would it be
possible to add an index in the result of str(df)? For large dfs I find myself
counting to find the position of certain variable(s) again and again (for indexing
purposes).
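
Until str() offers something like this, one possible workaround is to build the index separately. The sketch below uses only base R; the data frame `df` and the object names `col_index` and `pos` are made up for illustration.

```r
# Hypothetical data frame standing in for a large one
df <- data.frame(
  id  = 1:3,
  age = c(34, 51, 28),
  sex = factor(c("f", "m", "f"))
)

# One row per column: position, name, and (first) class
col_index <- data.frame(
  index = seq_along(df),
  name  = names(df),
  class = unname(vapply(df, function(col) class(col)[1], character(1)))
)
print(col_index)

# Look up positions by name instead of counting by eye
pos <- match(c("age", "sex"), names(df))
print(pos)
```

Columns can also be selected by name directly, for example `df[, c("age", "sex")]`, which often removes the need for positions altogether.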

Thanks,

m
 
Mihai Nica
170 East Griffith St. G5
Jackson, MS 39201
601-914-0361

----- Original Message ----
From: Douglas Bates <bates@stat.wisc.edu>
To: Jari Oksanen <jarioksa@sun3.oulu.fi>
Cc: r-help@stat.math.ethz.ch
Sent: Friday, February 16, 2007 7:58:01 AM
Subject: Re: [R] something missing in summary()

On 2/16/07, Jari Oksanen <jarioksa@sun3.oulu.fi> wrote:
> Gerard Smits g_smits at verizon.net Fri Feb 16 00:46:09 CET 2007:
> > just noticed that two key pieces of information are not given by
> > the summary() command:  N and SD.  we are given the N missing, but
> > not the converse.  I know these summary value can be obtained easy,
> > but can't understand why these two pieces of information are not
> > provided with the other info.
> >
> I assume you mean summary.data.frame?

Given a data frame, df, I would use

str(df)

before

summary(df)

because I want to see, for example, which columns are factors or
ordered factors or ...  That information is present in the value of
summary(df) but in a more subtle way.  As pointed out below the number
of rows in the data frame is the total number of observations for each
of the variables so putting that information in the summary for each
variable is redundant.

> There has even been an "appeal" on this:
> http://tolstoy.newcastle.edu.au/R/help/06/02/20706.html
>
> However, I didn't find any petition you could sign (but I found many
> surprising petitions when googling on this). Perhaps somebody will set
> up a petition page some day.
>
> With time, I've learnt that if something obvious is missing in the base
> R, there is a reason. Probably the Core thinks that you shouldn't use sd
> in a summary, but it is a poor and misleading statistic (they neither
> have skewness and kurtosis). You may learn to live without sd if you
> survive over the first impact.

I don't think this was an explicit decision by R-core.  It was a case
of S compatibility so the original decision was made at Bell Labs and
that group was highly influenced by John Tukey who worked with them. I
imagine that is why the summary of a numeric is a 'five-number'
summary plus the mean.  I would say the surprising and unconventional
part of that summary is the fact that it includes the mean.
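
To make the comparison concrete, here is a small sketch on a made-up vector `x`, putting Tukey's five-number summary next to what summary() prints for a numeric vector.

```r
# Made-up numeric vector (illustration only)
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 30)

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# summary() of a numeric: minimum, quartiles, median, maximum, plus the mean
s <- summary(x)
print(s)
```

Note that summary() computes its quartiles with quantile(), so they can differ slightly from the hinges returned by fivenum() for some sample sizes.
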
> On the other hand, there are things like R-squared and significance
> stars in summary.lm, which spoils the image of purity in the Core.

However, there is the option show.signif.stars, which can be set to
FALSE and which I always do.
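
A minimal sketch of setting that option for a session, using the `cars` data set that ships with R. The object names `old`, `fit`, and `out` are only for illustration.

```r
# Turn off significance stars in printed coefficient tables,
# keeping the previous setting so it can be restored
old <- options(show.signif.stars = FALSE)

# Simple regression on a built-in data set
fit <- lm(dist ~ speed, data = cars)

# Capture and show the printed summary; it should have no stars or legend
out <- capture.output(print(summary(fit)))
cat(out, sep = "\n")

# Restore the previous option value
options(old)
```

Putting the `options()` call in a startup file such as `.Rprofile` is one way to make the setting permanent.
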
> Number of observations may not be very useful in summary.data.frame,
> because it varies so little among variables.
>
> The R-help message cited above and its follow-ups suggest some ways of
> locally modifying the code and maintaining the modifications over the
> upgrades of R.
>
> Best wishes, Jari Oksanen
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Jim Lemon

2007-Feb-17 10:20 UTC

[R] something missing in summary()

Gerard Smits wrote:
 >
 > just noticed that two key pieces of information are not given by
 > the summary() command:  N and SD.  we are given the N missing, but
 > not the converse.  I know these summary value can be obtained easy,
 > but can't understand why these two pieces of information are not
 > provided with the other info.
 >
This was one reason that I wrote the describe function in the prettyR 
package. You can roll your own summary, and describe makes a reasonable 
attempt to sort out the common data types.
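
For readers who want to roll their own in base R, the following is a rough sketch and not the prettyR implementation. The helper name `my_summary` and the data frame `df` are hypothetical.

```r
# Hypothetical helper: the usual summary values for a numeric vector,
# plus the non-missing N and the SD asked about in this thread
my_summary <- function(x) {
  stopifnot(is.numeric(x))
  c(
    n      = sum(!is.na(x)),
    n_miss = sum(is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE)
  )
}

# Made-up data frame; apply the helper to the numeric columns only
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80, 1.68),
  group  = factor(c("a", "b", "a", "b", "a"))
)
num_cols <- vapply(df, is.numeric, logical(1))
res <- sapply(df[num_cols], my_summary)
print(res)
```

Factors and other non-numeric columns are skipped here; handling them is the kind of type sorting a more complete function would need to do.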

Jim
