Hello,

I am a new user of R. I am coming from SAS and do statistics on stock market data, economic data, and social data. My question is this: How can you get the mean, standard dev, etc. of a variable based on a conditional statement on either the same variable or a different variable in the same data set? So if I had the closing prices of the S&P from 01/01/1990-12/31/1990, how could I get the average price of the S&P from 02/01/1990-03/15/1990? Or the average price of the S&P on Mondays (assuming a dummy var is created for 1 = Monday, 0 = else). I understand that you can create subsets and new data sets based on the conditional statements; but is there an easier way to do this by typing a line into the mean() statement? That was extremely easy in SAS where you could say:

    proc means data=sp500;
    var price;
    where monday = 1;

Thank you for your help.

Joe
You can use the tapply() function to do this. You can't put a condition inside the call to mean() itself (see ?mean for the arguments it accepts). The general approach is to have a vector of data (stock prices) and a categorical variable (day of week). Then break up the data vector according to the levels of the categorical variable, and calculate the mean of each piece:

    Weekmeans <- tapply(data.vector, catvariable, mean)

This will give you the means for all days. If you really just want one mean (just Monday), you could do:

    Monmean <- mean(data.vector[catvariable == "Monday"])

Similarly, if you want the standard deviation for each day of the week, or for Monday only, you would use:

    WeekSD <- tapply(data.vector, catvariable, sd)
    MonSD <- sd(data.vector[catvariable == "Monday"])

You will find that some things that are easy in SAS require a little more thought in R, and vice versa. Certainly, the philosophical approach to data analysis in R is different to that in SAS. There are a couple of books on R for SAS users. They might help you.

Cheers,

Simon.

On 08/01/13 11:17, Joseph Norman Thomson wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
http://www.evolutionarystatistics.org

Policies:
1. I will NOT analyse your data for you.
2. Your deadline is your problem.

Statistics is the grammar of science - Karl Pearson.
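For illustration, here is a minimal sketch of the subsetting and tapply() approaches described above. It also covers the date-range case from the original question. Everything in it is an assumption made for the example: the data frame name sp500 is borrowed from the SAS snippet, the prices are invented, the dates are assumed to be stored as class Date, and the names mon_mean, mon_sd, range_mean and by_monday are hypothetical.

```r
# Made-up data standing in for the real S&P series: 14 consecutive
# calendar days starting 1990-02-01, with invented prices 100, 101, ...
sp500 <- data.frame(
  date  = as.Date("1990-02-01") + 0:13,
  price = 100 + 0:13
)

# Monday dummy as in the question (1 = Monday, 0 = else).
# as.POSIXlt()$wday counts days from Sunday = 0, so Monday is 1;
# unlike weekdays(), it does not depend on the locale.
sp500$monday <- as.integer(as.POSIXlt(sp500$date)$wday == 1)

# Rough equivalent of "where monday = 1": index the price vector
# with a logical condition before passing it to mean() or sd().
mon_mean <- mean(sp500$price[sp500$monday == 1])   # 107.5
mon_sd   <- sd(sp500$price[sp500$monday == 1])

# with() saves retyping the data frame name.
with(sp500, mean(price[monday == 1]))

# Date range: combine two comparisons on the Date column with &.
in_range   <- sp500$date >= as.Date("1990-02-03") &
              sp500$date <= as.Date("1990-02-06")
range_mean <- mean(sp500$price[in_range])          # 103.5

# Means for every level of the grouping variable at once.
by_monday <- tapply(sp500$price, sp500$monday, mean)
by_monday
```

The logical index inside the square brackets plays the role of the SAS where statement, so no separate subset data set has to be created.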
I think Simon has provided a good answer to the actual question, but as a refugee from SAS I'd suggest having a look at www.et.bs.ehu.es/~etptupaf/pub/R/RforSAS&SPSSusers.pdf or getting the book

Muenchen, R. A. (2008). R for SAS and SPSS Users (1st ed.). Springer.

R and SAS approach things very differently at times.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: thomsonj at email.arizona.edu
> Sent: Mon, 7 Jan 2013 19:17:27 -0600
> To: r-help at r-project.org
> Subject: [R] Conditional Statistics
>
> [...]
> ...if I had the closing prices of the
> S&P from 01/01/1990-12/31/1990, how could I get the average price of
> the S&P from 02/01/1990-03/15/1990? Or the average price of the S&P on
> Mondays (assuming a dummy var is created for 1 = Monday, 0 = else).

tapply() has already been referred to. You may also find aggregate() useful, as it gives you back a data frame that includes the conditioning variables if you tell it to. Also ave(), if you want to do something like mean-centring a data set based on group means rather than the grand mean.

S Ellison
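A small sketch of the two functions mentioned above, on invented data. The data frame prices, its columns, and the result names day_means, group_mean and centred are all hypothetical and chosen only for this example.

```r
# Invented data: three Mondays and three Tuesdays with made-up prices.
prices <- data.frame(
  weekday = c("Mon", "Tue", "Mon", "Tue", "Mon", "Tue"),
  price   = c(100, 102, 104, 106, 108, 110)
)

# aggregate() returns a data frame with one row per group,
# keeping the conditioning variable as a column.
day_means <- aggregate(price ~ weekday, data = prices, FUN = mean)
day_means

# ave() returns a vector as long as the input in which every value
# is replaced by its group summary (the default FUN is mean).
prices$group_mean <- ave(prices$price, prices$weekday)

# Mean-centring on group means rather than the grand mean.
prices$centred <- prices$price - prices$group_mean
prices
```

aggregate() is convenient when the summary is to be merged or plotted later, since the result is an ordinary data frame; ave() is the better fit when each original row needs its own group value alongside it.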