I will take advantage of this thread to make a request of my own. Would it be
possible to add a column index to the output of str(df)? For large data frames
I find myself counting to find the position of certain variable(s) again and
again (for indexing purposes).
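
In the meantime, a small lookup table built from names(df) may serve the same purpose. This is only a sketch: the data frame df and the name col_index are made up for illustration, and none of this changes what str() prints.

```r
# Hypothetical example data frame; any data frame works the same way.
df <- data.frame(id = 1:3,
                 age = c(31, 45, 28),
                 group = factor(c("a", "b", "a")))

# One row per column: position, name, and (first) class.
col_index <- data.frame(
  pos   = seq_along(df),
  name  = names(df),
  class = vapply(df, function(x) class(x)[1], character(1)),
  row.names = NULL
)
print(col_index)

# Look up positions by name instead of counting by eye.
match(c("group", "age"), names(df))
```

Indexing by name, as in df[, c("age", "group")], avoids the need for positions altogether when the names are known.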
Thanks,
m
Mihai Nica
170 East Griffith St. G5
Jackson, MS 39201
601-914-0361
----- Original Message ----
From: Douglas Bates <bates@stat.wisc.edu>
To: Jari Oksanen <jarioksa@sun3.oulu.fi>
Cc: r-help@stat.math.ethz.ch
Sent: Friday, February 16, 2007 7:58:01 AM
Subject: Re: [R] something missing in summary()
On 2/16/07, Jari Oksanen <jarioksa@sun3.oulu.fi> wrote:
> Gerard Smits g_smits at verizon.net Fri Feb 16 00:46:09 CET 2007:
> > just noticed that two key pieces of information are not given by
> > the summary() command: N and SD. we are given the N missing, but
> > not the converse. I know these summary values can be obtained easily,
> > but can't understand why these two pieces of information are not
> > provided with the other info.
> >
> I assume you mean summary.data.frame?
Given a data frame, df, I would use
str(df)
before
summary(df)
because I want to see, for example, which columns are factors or
ordered factors or ... That information is present in the value of
summary(df) but in a more subtle way. As pointed out below, the number
of rows in the data frame is the total number of observations for each
of the variables, so putting that information in the summary for each
variable is redundant.
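
A minimal sketch of that workflow, using the built-in airquality data as a stand-in for df, together with one way of getting the non-missing N and the SD that the original poster asked about. The names num, n_obs and sd_obs are made up for illustration.

```r
df <- airquality  # built-in data set with some missing values

str(df)      # column types and the number of rows
summary(df)  # five-number summary plus the mean, and NA counts

# Non-missing N and SD for each numeric column.
num    <- df[vapply(df, is.numeric, logical(1))]
n_obs  <- vapply(num, function(x) sum(!is.na(x)), integer(1))
sd_obs <- vapply(num, sd, numeric(1), na.rm = TRUE)
cbind(N = n_obs, SD = sd_obs)
```

The non-missing N is just nrow(df) minus the NA count that summary(df) already reports for each column.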
> There has even been an "appeal" on this:
> http://tolstoy.newcastle.edu.au/R/help/06/02/20706.html
>
> However, I didn't find any petition you could sign (but I found many
> surprising petitions when googling on this). Perhaps somebody will set
> up a petition page some day.
>
> With time, I've learnt that if something obvious is missing in the base
> R, there is a reason. Probably the Core thinks that you shouldn't use sd
> in a summary, but it is a poor and misleading statistic (they have
> neither skewness nor kurtosis). You may learn to live without sd if you
> survive over the first impact.
I don't think this was an explicit decision by R-core. It was a case
of S compatibility so the original decision was made at Bell Labs and
that group was highly influenced by John Tukey who worked with them. I
imagine that is why the summary of a numeric is a 'five-number'
summary plus the mean. I would say the surprising and unconventional
part of that summary is the fact that it includes the mean.
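
To see the relationship, compare fivenum() with summary() on a small numeric vector. The values below are made up. Note that summary() computes its quartiles with quantile(), so they can differ slightly from the hinges that fivenum() returns.

```r
x <- c(2, 4, 4, 5, 7, 9, 30)  # made-up values with one large observation

fivenum(x)  # Tukey's five numbers: min, lower hinge, median, upper hinge, max
summary(x)  # minimum, quartiles, median and maximum, with the mean added

# The mean is the only entry pulled strongly by the large observation.
mean(x) - median(x)
```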
> On the other hand, there are things like R-squared and significance
> stars in summary.lm, which spoil the image of purity in the Core.
However, there is the option show.signif.stars, which can be set to
FALSE, and I always do so.
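
For example, with a throwaway model fitted to the built-in mtcars data (the model itself is only for illustration):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Turn the stars off, print the summary, then restore the old setting.
old <- options(show.signif.stars = FALSE)
print(summary(fit))
options(old)
```

Putting options(show.signif.stars = FALSE) in a startup file such as .Rprofile would make the setting apply to every session.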
> Number of observations may not be very useful in summary.data.frame,
> because it varies so little among variables.
>
> The R-help message cited above and its follow-ups suggest some ways of
> locally modifying the code and maintaining the modifications over the
> upgrades of R.
>
> Best wishes, Jari Oksanen
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>