Hi everyone,

I'm taking a course in statistics as part of my doctoral program in
education at the U. of Minnesota, USA. I found R via Rpy, a Python
module that makes it possible to use R from Python scripts.

The instructor refers to SPSS a lot and that seems to be the standard
stats tool around here. But being more of a Unix guy and not intimidated
by programming, I'd like to see if I can do some of my course homework
with R (probably using Rpy frequently).

One thing I've noticed about SPSS is that it makes it extremely easy to
generate all sorts of summary statistics including means, various
trimmed means, skewness, etc. I've not had much luck doing similar
things with R yet. For example, I have yet to find out how to calculate
the skewness of a set of data. (This is a vector in R, right?) The same
is true of trimmed means.

I guess what I'm looking for is a reference that would point me to a
particular R library or function for a particular statistic that I'd
like to calculate. For example, I look up skewness and it tells me to
use the foo function. Does such a thing exist?

I'm very impressed with what I see so far and it's likely that my
inexperience with R at this point is making things look harder than they
really are. But my approach has always been to jump in with both feet
and give it a go. I haven't drowned yet.

-Tim

--
Tim Wilson        | Visit Sibley online:   | Check out:
Henry Sibley HS   | http://www.isd197.org  | http://www.zope.com
W. St. Paul, MN   |                        | http://slashdot.org
wilson at visi.com | <dtml-var pithy_quote> | http://linux.com
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Cool, I can actually help out a little here. I'm doing a bit of self
study to learn some statistics myself. I will recommend a text I bought
recently, "Modern Applied Statistics with S-PLUS", 3rd Ed. by Venables &
Ripley. It's helped me tremendously with making sense of R in the
context of solving statistical problems, and I might learn to understand
a little statistics along the way.

Here's how I find out about a function I'm interested in:

    > help.search("skew")
    Help files with alias or title matching `skew',
    type `help(FOO, package = PKG)' to inspect entry `FOO(PKG) TITLE':

    k3.linear(boot)         Linear Skewness Estimate

    > library(boot)
    > ?k3.linear

There is no such function as FOO, but you can type:

    > help(k3.linear, package=boot)

to read the help pages. I add the package/library to my search path,
then invoke help as shown, or

    > help(k3.linear)

then read on to see whether that is really the thing I'm looking for.

Cheers,
Bill

On Thu, 2002-06-27 at 21:12, Tim Wilson wrote:
> I guess what I'm looking for is a reference that would point me to a
> particular R library or function for a particular statistic that I'd
> like to calculate. For example, I look up skewness and it tells me to
> use the foo function. Does such a thing exist?

--
Bill Barnard
bill at barnard-engineering.com
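If help.search() does not turn up a function that fits, skewness is short enough to write by hand. The following is only a sketch: the name skewness_g1 is made up for illustration, and it assumes the simple moment-based definition g1 = m3 / m2^(3/2) with no bias correction, so packaged skewness functions (for example skewness() in e1071 or moments) may return slightly different values depending on the variant they use.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2).
# No small-sample bias correction is applied (an assumption of this sketch).
skewness_g1 <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values if requested
  d  <- x - mean(x)              # deviations from the mean
  m2 <- mean(d^2)                # second central moment
  m3 <- mean(d^3)                # third central moment
  m3 / m2^1.5
}

symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(0, 0, 0, 4)

skewness_g1(symmetric)      # 0 for a symmetric sample
skewness_g1(right_skewed)   # positive: the long tail is on the right
```

A positive value indicates a longer right tail, a negative value a longer left tail.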
If you go to the R web site, http://www.R-project.org/, and follow the
documentation links, you will find a very good set of PDFs that should
help a lot.

If you type

    ?mean

you will get the man page for mean, and see its "trim" argument.

Generally speaking, however, R is not oriented towards generating lots
of summary stats for lots of variables with just a few commands. A
couple of places to start looking are

    ?aggregate
    ?summary

-Don

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
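To make the "trim" argument concrete, here is a small sketch using only base R. The data vector is invented for illustration; trim = 0.1 drops the lowest 10% and the highest 10% of the sorted values before averaging.

```r
# Made-up data with one large outlier.
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 100)

plain_mean   <- mean(x)               # 14.3, pulled up by the outlier
trimmed_mean <- mean(x, trim = 0.1)   # 5.125, the mean of 3,3,4,5,5,6,7,8

plain_mean
trimmed_mean

# summary() gives min, quartiles, median, mean and max in one call.
summary(x)
```

With ten values, a 10% trim removes exactly one observation from each end, which is why the outlier no longer affects the result.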
The code attached creates a function for descriptive statistics called
dstats. Enter the name of the column you want to summarize and dstats will
produce a nice summary. If you have a data frame of numeric variables and
want to summarize by column, you can use something like:

    apply(data.frame.name, 2, dstats)

Wrap t( ) around the above to get the output in a format that I find more
usable.

Brett


dstats <- function(x, na.rm = TRUE, digits = 3) {
    # Collect the statistics in a named vector; na.rm is passed through
    # to every function that accepts it.
    stats <- c(
        mean     = mean(x, na.rm = na.rm),
        sd       = sd(x, na.rm = na.rm),
        variance = var(x, na.rm = na.rm),
        min      = min(x, na.rm = na.rm),
        max      = max(x, na.rm = na.rm),
        unique   = length(unique(x)),   # NA counts as a value if present
        n        = sum(!is.na(x)),      # non-missing observations
        miss     = sum(is.na(x))        # missing observations
    )
    round(stats, digits = digits)
}
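As a usage illustration of the column-wise pattern described above, the sketch below applies a small summary function to every column of a data frame. The helper name desc is hypothetical and is a cut-down stand-in rather than Brett's dstats, so that the example runs on its own; airquality is a data set that ships with R and contains missing values.

```r
# Hypothetical cut-down summary helper (not the dstats function itself).
desc <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n    = sum(!is.na(x)),    # non-missing observations
    miss = sum(is.na(x)))     # missing observations
}

# sapply() visits each column; t() puts one variable per row.
tab <- t(sapply(airquality, desc))
round(tab, 2)
```

sapply() is used here instead of apply() because apply() first converts the data frame to a matrix, which turns every column into character if any column is non-numeric. For an all-numeric data frame the two give the same layout.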
You might also take a look at some functions in the Hmisc library, e.g.:

library(Hmisc)
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4], 1000, TRUE))

describe(x)
x
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
   1000       0    1000  0.5043 0.06128 0.11650 0.26521 0.50441 0.74055 0.90252 0.95984

lowest : 0.003536 0.004208 0.004228 0.006153 0.006443
highest: 0.998321 0.998607 0.998766 0.999014 0.999439

options(digits=3)
s <- function(y) c(Mean=mean(y),Median=median(y),SD=sqrt(var(y)))
summary(x ~ g, fun=s)
x    N=1000

+-------+-+----+-----+------+-----+
|       | |N   |Mean |Median|SD   |
+-------+-+----+-----+------+-----+
|g      |a| 254|0.495|0.469 |0.283|
|       |b| 243|0.523|0.533 |0.294|
|       |c| 249|0.495|0.481 |0.278|
|       |d| 254|0.505|0.514 |0.289|
+-------+-+----+-----+------+-----+
|Overall| |1000|0.504|0.504 |0.286|
+-------+-+----+-----+------+-----+

summarize(x, g, s)   # to cross-classify g -> llist(g1,g2)
  g     x Median    SD    # x column=Mean
1 a 0.495  0.469 0.283
2 b 0.523  0.533 0.294
3 c 0.495  0.481 0.278
4 d 0.505  0.514 0.289

Frank Harrell

--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
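If Hmisc is not installed, a similar by-group table can be sketched with base R alone. This is not Hmisc's implementation, only an approximation of the same idea using split(), sapply() and aggregate(); the variable names by_group and agg are made up for the example. No output is shown because the random draws differ between R versions.

```r
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4], 1000, replace = TRUE))

# Same summary function as in the Hmisc example, using sd() directly.
s <- function(y) c(Mean = mean(y), Median = median(y), SD = sd(y))

# split() cuts x into one vector per level of g; sapply() summarizes each
# piece; t() gives one row per group.
by_group <- t(sapply(split(x, g), s))
round(by_group, 3)

# aggregate() returns a data frame with one row per group instead.
agg <- aggregate(x, by = list(g = g), FUN = mean)
agg
```

The matrix from sapply() is convenient for printing, while the data frame from aggregate() is easier to merge with other data.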