I've got a background in computer science and have been using Linux for nearly a decade. I'm working on a Ph.D. in education and technology, I essentially live in emacs, and I do all of my writing in LaTeX. To me R seems like the perfect stats package. Unfortunately, the learning curve is killing me. I feel that if I'd waded through the pull-down menus in SPSS or SAS I could have gotten a bit more done by now, but I don't want to use those programs.

What I'd like is a cookbook of a few basic procedures. I think I'm more interested in the R code than in statistical explication, though I don't object to the latter. Is Venables and Ripley's "MASS" going to do that for me, or would "S Programming" be more appropriate? In my cursory look through the sample chapter from Nolan and Speed I saw no S-plus/S/R code whatsoever.

One thing I'm trying to do right now is certainly trivial, but I can't quite get it going. Hopefully I'm not sounding too much like I'm asking you to do my homework.

In a perception study, I've got three within-subject conditions, A, B, and C. Each condition has 4 trials with 2 times and an angle (actually an error measurement between the actual angle and the one the subjects pointed to). All I want is to get the stuff that summary() gives split out by condition. It might also be nice to split it out by subject as well, to look at and possibly correct for individual differences (which might be difficult with so few trials?). My data columns are as follows:

    A B C (with 0 or 1 to indicate condition; would a single column with 1-3 be better?)

    t1, t2, angle-error

Surely fewer than 10 lines of R could yield me these results and maybe a couple of pretty graphs.

In another study where I'm looking at motivation and hobbies, which I have almost no idea how to analyze (which suggests I might have chosen a bad design & that a problem like this probably doesn't belong in my "cookbook") I've had people rank a set of 25 characteristics of their activities or motivations (5 in each of 5 categories) and would like to see if any patterns are emerging there. My data start out as an ordered list of these cards (1-25); I futzed in a spreadsheet to get two columns, the motivation number and its rank. If I could avoid using the spreadsheet, that'd be nice.

Thanks.

--
Jay Pfaffman                           pfaffman at relaxpc.com
+1-415-821-7507 (H)                    +1-415-810-2238 (M)
http://relax.ltc.vanderbilt.edu/~pfaffman/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

I think our "Notes on R for psychology..." was written just for you. I believe it has examples much like the ones you describe. (It is a little out of date because it has not been revised to keep up with newer versions of R ... yet.)

Your particular problems might be solved with by(), or apply(), or tapply(). It is a little hard for me to tell because I'm not sure how you've laid out your data. In particular, there is no code for subject, yet you say it is a within-subject design.

For your second example, if you have data for many subjects (not clear from your description), you might try things in the mva package. The biplot function for principal components is particularly nice.

You might want to put the data into a one-row-per-subject matrix or data frame, in both cases (although the layout you seem to have has other advantages, as we explain).

Our notes are in the contributed documents section, and at http://finzi.psych.upenn.edu and there is also a reference card in both places (also a little out of date, mostly needing a second page for graphics).

As for your question about whether to use one code for each condition or 1-3 to indicate conditions, either is good, but you probably want to make the 1-3 code a _factor_, which is what is sometimes called a categorical variable.

Jon Baron

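To make the factor suggestion concrete, here is a minimal sketch. It assumes a hypothetical data frame called dat with the 0/1 columns A, B, and C described in the question, plus a subject column that the original layout does not have. All values are invented for illustration.

```r
# Hypothetical toy data: one row per trial, 0/1 indicator columns A, B, C.
# The subject column is an assumption; the original post does not include one.
dat <- data.frame(
  subject     = rep(1:2, each = 6),
  A           = rep(c(1, 0, 0), times = 4),
  B           = rep(c(0, 1, 0), times = 4),
  C           = rep(c(0, 0, 1), times = 4),
  angle.error = c(30, 15, 0, 25, 10, 5, 40, 20, 10, 35, 15, 5)
)

# Collapse the three indicators into a single factor.
# max.col() returns, for each row, the index of the column holding the
# largest value, which here is the column containing the 1.
ind <- as.matrix(dat[, c("A", "B", "C")])
dat$condition <- factor(c("A", "B", "C")[max.col(ind, ties.method = "first")])

# Number of trials per condition.
print(table(dat$condition))

# summary() of the angle error, split out by condition.
print(by(dat$angle.error, dat$condition, summary))
```

With a single factor column, a row cannot claim two conditions at once, which the three separate 0/1 columns would allow.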
If I understand what you're asking, it's essentially the same thing I asked the list about a week or so ago. First, if the A, B, and C conditions are mutually exclusive, then yes, I would suggest working with a single variable with three values. As a rule of thumb (more about database theory than statistics), you should avoid designing data structures that can hold invalid data.

I quote below the responses from Rossini and Lumley to my original query:

On 9 Jan 2002, A.J. Rossini wrote:

> >>>>> "AP" == Andrew Perrin <andrew_perrin at unc.edu> writes:
>
>     AP> I'd like to get summary statistics (really just a mean would
>     AP> be fine) for a vector in a data frame, but split based on the
>     AP> value of another vector. That is, I have a data frame
>     AP> (hcd.df) with variables datecat (which is always 1 or 2) and
>     AP> auth.sum (-8..+8). I've used xtabs to get chi-square
>     AP> comparisons, but what I need now is a simple mean of auth.sum
>     AP> where datecat is 1 and another where datecat is 2. Thanks for
>     AP> any advice.
>
> Something like :
>
> lapply(split(hcd.df$auth.sum,hcd.df$datecat),mean)
>

Or
    tapply(hcd.df$auth.sum, hcd.df$datecat, mean)
or (in 1.4.0)
    with(hcd.df, tapply(auth.sum, datecat, mean))

    -thomas

Thomas Lumley                   Asst. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle

-----

In your case, I'd say something like:

    tapply(df$angle, df$condition, summary)

is probably right.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

On Tue, 15 Jan 2002, Jay Pfaffman wrote:

> [...]
> All I want is to get the stuff that summary() gives split out by
> condition.
> [...]

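As an illustration only, the split/lapply, tapply, and with idioms from this message can be tried on a small made-up data frame. The names df, subject, condition, and angle are assumptions that follow the tapply() suggestion for the perception study, and the numbers are invented. The last call is a sketch of one way to get the by-subject breakdown that was also asked about.

```r
# Hypothetical data: 2 subjects x 3 conditions x 2 trials (invented values).
df <- data.frame(
  subject   = factor(rep(c("s1", "s2"), each = 6)),
  condition = factor(rep(c("A", "B", "C"), times = 4)),
  angle     = c(30, 15, 0, 25, 10, 5, 40, 20, 10, 35, 15, 5)
)

# Three ways to get the mean of angle within each condition.
m1 <- sapply(split(df$angle, df$condition), mean)
m2 <- tapply(df$angle, df$condition, mean)
m3 <- with(df, tapply(angle, condition, mean))

# summary() per condition; the result holds one summary object per level.
s <- tapply(df$angle, df$condition, summary)
print(s)

# Passing a list of two factors gives a subject-by-condition table of means.
by.subject <- tapply(df$angle, list(df$subject, df$condition), mean)
print(by.subject)
```

The two-factor form returns a matrix with one row per subject and one column per condition, so individual differences can be read off row by row.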
> -----Original Message-----
> From: Jay Pfaffman [mailto:pfaffman at relaxpc.com]
> Sent: Tuesday, January 15, 2002 12:35 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Getting started with R
>
> [...]
> What I'd like is a cookbook of a few basic procedures. I think I'm
> more interested in the R code than I am statistical explication,
> though I don't object to the latter. Is Venables and Ripley "MASS"
> going to do that for me or would "S Programming" be more
> appropriate?

Have you looked at http://cran.r-project.org/doc/manuals/R-intro.pdf ?

> [...]
> In a perception study, I've got three within-subject conditions, A, B,
> and C. Each condition has 4 trials with 2 times and an angle
> (actually an error measurement between the actual angle and the one
> the subjects pointed to). All I want is to get the stuff that
> summary() gives split out by condition. It might also be nice to
> split it out between subjects as well to look at, and possibly correct
> for individual differences, (which might be difficult with so few
> trials?). My data columns are as follows:
>
> A B C (with 0 or 1 to indicate condition, would a single column with
> 1-3 be better?)

It is probably easier to deal with a single 'factor' variable with three levels 'A', 'B', and 'C'.

> t1, t2, angle-error
>
> Surely fewer than 10 lines of R could yield me these results and maybe
> a couple pretty graphs.

If you create a data file named myfile.csv containing:

    Condition,t1,t2,angle.error
    A, 10, 12, 30
    B, 12, 6, 15
    C, 9, 16, 0
    ...

you can read it in using

    # Condition must be a factor for the plots below
    mydata <- read.csv("myfile.csv", stringsAsFactors = TRUE)

Then to get summaries of each of the variables separated by condition, do something like

    by(mydata, mydata$Condition, summary)

Some plots:

    plot(t1 ~ Condition, data=mydata)
    plot(t2 ~ Condition, data=mydata)
    plot(angle.error ~ Condition, data=mydata)

Run a regression model testing if t1 depends on condition:

    summary(lm(t1 ~ Condition, data=mydata))

> In another study where I'm looking at motivation and hobbies, which I
> have almost no idea how to analyze (which suggests I might have chosen
> a bad design & that a problem like this probably doesn't belong in my
> "cookbook") I've had people rank a set of 25 characteristics of their
> activities or motivations (5 in each of 5 categories) and would like
> to see if any patterns are emerging there. My data start out as an
> ordered list of these cards (1-25); I futzed in a spreadsheet to get
> two columns, the motivation number and its rank. If I could avoid
> using the spreadsheet, that'd be nice.

You would need to give quite a bit more information before one could suggest a reasonable method of analyzing this data. Still, it seems that the easiest format to handle the data for statistical analysis would be one row per participant, with one variable (column) for each activity or motivation.

-Greg

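For the ranking study, one way the one-row-per-participant layout might be built directly in R, without the spreadsheet step, is sketched below. It assumes each participant's raw data is the list of card numbers in the order they were ranked, first choice first. The names orderings and ranks and all values are hypothetical, and only 5 cards are used instead of 25 to keep the example short.

```r
# Hypothetical input: each participant's cards, listed from first to last choice.
orderings <- list(
  p1 = c(3, 1, 5, 2, 4),
  p2 = c(1, 3, 2, 5, 4),
  p3 = c(3, 5, 1, 4, 2)
)

# For a permutation, order() gives the inverse permutation:
# element k of the result is the position (rank) at which card k was placed.
ranks <- t(sapply(orderings, order))
colnames(ranks) <- paste0("card", seq_len(ncol(ranks)))

# One row per participant, one column per card, cell = rank given to that card.
print(ranks)

# Mean rank of each card across participants (lower = ranked earlier).
print(colMeans(ranks))
```

A numeric matrix in this shape is the kind of input that functions such as prcomp() accept, should the principal components idea raised earlier in the thread turn out to suit the data.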