Hello all,

I hope I chose the right list, as my question is a beginner question.

I have a data set with 3 columns, "London", "Rome" and "Vienna". The location is indicated by a 1, like this:

    London  Rome  Vienna  q1
    0       0     1       4
    0       1     0       2
    1       0     0       3
    ....
    ....
    ....

I just want to calculate the means of the variable q1.

I tried the following script:

    # calculate the mean of all locations
    results <- subset(results, subset== 1 )
    mean(results$q1)
    # calculate the mean of London
    results <- subset(results, subset== 1 , select=c(London))
    mean(results$q1)
    # calculate the mean of Rome
    results <- subset(results, subset== 1 , select=c(Rome))
    mean(results$q1)
    # calculate the mean of Vienna
    results <- subset(results, subset== 1 , select=c(Vienna))
    mean(results$q1)

As all results are 1.68 and there is definitely a difference between the three locations, I wonder what's going on. I get confused because Rcmdr asks me to overwrite things and there is no "just filter" option.

Any help would be appreciated. Thank you in advance.

Regards
Peter

___CURE - Center for Usability Research & Engineering___

Peter Wolkerstorfer
Usability Engineer
Hauffgasse 3-5, 1110 Wien, Austria

[Tel] +43.1.743 54 51.46
[Fax] +43.1.743 54 51.30

[Mail] wolkerstorfer at cure.at
[Web] http://www.cure.at
Peter,

There is a much easier way to do this. First, you should consider organizing your data as follows:

    set.seed(1) # for replication only

    # Here is a sample dataframe
    tmp <- data.frame(city = gl(3, 10, labels = c("London", "Rome", "Vienna")),
                      q1 = rnorm(30))

    # Compute the means
    with(tmp, tapply(q1, city, mean))

        London       Rome     Vienna
     0.1322028  0.2488450 -0.1336732

I hope this helps.

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter Wolkerstorfer - CURE
> Sent: Monday, September 25, 2006 7:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Beginner question: select cases
>
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
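For data that already exists in the three-indicator layout from the question, the single-column organization can be derived rather than retyped. The following is only a sketch: the six rows of `results` are made-up illustration values, the new column name `city` is an assumption, and the approach assumes every row has exactly one 1 among the three city columns.

```r
# Hypothetical data in the layout described in the question
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

cities <- c("London", "Rome", "Vienna")

# max.col() gives, for each row, the index of the column with the largest
# value, which is the column holding the 1 (assumes exactly one 1 per row)
idx <- max.col(as.matrix(results[cities]), ties.method = "first")
results$city <- factor(cities[idx], levels = cities)

# One mean of q1 per city, computed from the single factor column
city_means <- with(results, tapply(q1, city, mean))
print(city_means)
```

Once `city` exists, the three 0/1 columns are no longer needed for grouping, and any per-city summary can go through `tapply()` or `by()` in one call.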
Your problem would be a lot easier if you coded the location in one variable instead of three variables. Then you could calculate the means with one line of code:

    by(results$q1, results$location, mean)

With your current dataset you could use

    by(results$q1, results$London, mean)
    by(results$q1, results$Rome, mean)
    by(results$q1, results$Vienna, mean)

Each call reports the mean of q1 separately for the rows where the indicator is 0 and the rows where it is 1. See ?by for more information.

And take a good look at your code. You take a subset of results and then assign it to results. This means that you replace the original results dataframe with a subset of it. When you take the subset for the next city, you are not subsetting the original dataset but the previous subset!

Cheers,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
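The overwriting problem described in this reply can be seen on a small example. This is a sketch with made-up values for `results`; the names `london`, `rome`, `vienna` and `overwritten` are chosen here for illustration only.

```r
# Hypothetical data in the layout described in the question
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

# Safe pattern: every subset gets its own name, results is never reassigned
london <- subset(results, London == 1)
rome   <- subset(results, Rome == 1)
vienna <- subset(results, Vienna == 1)

mean_london <- mean(london$q1)
mean_rome   <- mean(rome$q1)
mean_vienna <- mean(vienna$q1)

# What goes wrong when a subset replaces its source: after keeping only the
# London rows, there are no Rome rows left to select
overwritten <- subset(results, London == 1)
overwritten <- subset(overwritten, Rome == 1)
rows_left <- nrow(overwritten)

print(c(London = mean_london, Rome = mean_rome, Vienna = mean_vienna))
print(rows_left)
```

Because `results` still holds all six rows at the end, each city mean is computed from the full data set, while the chained version ends up with an empty data frame.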
I'm new at R also. However, I don't recognize your syntax. I have not seen select used here. Try filtering on the city column itself, and store the result under a new name so that results is not overwritten:

    london <- subset(results, London == 1)
    mean(london$q1)
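On the two arguments of `subset()` for data frames: `subset` takes a logical condition that picks rows, and `select` names the columns to keep. A brief sketch, again using made-up values for `results` and the illustrative name `london_q1`:

```r
# Hypothetical data in the layout described in the question
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

# Rows where London is 1, keeping only the q1 column
london_q1 <- subset(results, subset = London == 1, select = q1)
mean_subset <- mean(london_q1$q1)

# The same mean without creating a new data frame, using logical indexing
mean_indexed <- mean(results$q1[results$London == 1])

print(london_q1)
print(c(mean_subset, mean_indexed))
```

Note that `select = London` would keep only the London column and drop q1, so a later `results$q1` would return NULL.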