Hello all,

I hope I chose the right list, as my question is a beginner question.

I have a data set with 3 columns "London", "Rome" and "Vienna" - the location is indicated by a 1, like this:

London Rome Vienna q1
0      0    1      4
0      1    0      2
1      0    0      3
....
....
....

I just want to calculate the means of a variable q1.

I tried the following script:

# calculate the mean of all locations
results <- subset(results, subset== 1 )
mean(results$q1)
# calculate the mean of London
results <- subset(results, subset== 1 , select=c(London))
mean(results$q1)
# calculate the mean of Rome
results <- subset(results, subset== 1 , select=c(Rome))
mean(results$q1)
# calculate the mean of Vienna
results <- subset(results, subset== 1 , select=c(Vienna))
mean(results$q1)

As all results are 1.68 and there is definitely a difference between the three locations, I wonder what's going on.
I get confused as the Rcmdr asks me to overwrite things and there is no "just filter" option.

Any help would be appreciated. Thank you in advance.

Regards
Peter

___CURE - Center for Usability Research & Engineering___

Peter Wolkerstorfer
Usability Engineer
Hauffgasse 3-5, 1110 Wien, Austria

[Tel] +43.1.743 54 51.46
[Fax] +43.1.743 54 51.30

[Mail] wolkerstorfer at cure.at
[Web] http://www.cure.at

Peter,
There is a much easier way to do this. First, you should consider
organizing your data as follows:
set.seed(1) # for replication only
# Here is a sample dataframe
tmp <- data.frame(city = gl(3, 10, labels = c("London", "Rome", "Vienna")),
                  q1 = rnorm(30))
# Compute the means
with(tmp, tapply(q1,city, mean))
    London       Rome     Vienna
 0.1322028  0.2488450 -0.1336732
I hope this helps
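
If the data already exist in the one-column-per-city layout from the question, they can be converted to the single `city` factor suggested above rather than re-entered. The following is only a minimal sketch: the data values are made up, the names `cities` and `city_means` are illustrative, and it assumes every row has exactly one 1 among the three city columns.

```r
# Hypothetical data in the layout from the question: one 0/1 column per city.
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

cities <- c("London", "Rome", "Vienna")

# max.col() gives, for each row, the index of the column with the largest
# value, which is the column holding the 1 (assumption: one 1 per row).
idx <- max.col(as.matrix(results[, cities]), ties.method = "first")
results$city <- factor(cities[idx], levels = cities)

# Same call as in the reply above, now on the converted data.
city_means <- with(results, tapply(q1, city, mean))
print(city_means)
```

With the city stored in one factor, the same data frame can be passed directly to functions such as `tapply()`, `by()` or `aggregate()` without any further subsetting.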
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter
> Wolkerstorfer - CURE
> Sent: Monday, September 25, 2006 7:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Beginner question: select cases
Your problem would be a lot easier if you coded the location in one variable instead of three variables. Then you could calculate the means with one line of code:

by(results$q1, results$location, mean)

With your dataset you could use

by(results$q1, results$London, mean)
by(results$q1, results$Rome, mean)
by(results$q1, results$Vienna, mean)

In each output, the group labelled 1 is the mean of q1 for that city. See ?by for more information.

And take a good look at your code. You take a subset from results and then assign it to results. This means that you replace the original results data frame with a subset of it. When you take the subset for the next city, you are not subsetting the original dataset but the previous subset!

Cheers,

Thierry

----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

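
The overwriting problem described in this reply can be reproduced on a small example. This is a sketch with made-up data; the names `overwritten`, `london` and `rome` are illustrative only and do not come from the original script.

```r
# Hypothetical data, same layout as in the question.
results <- data.frame(
  London = c(1, 1, 0, 0, 0, 0),
  Rome   = c(0, 0, 1, 1, 0, 0),
  Vienna = c(0, 0, 0, 0, 1, 1),
  q1     = c(1, 3, 2, 4, 5, 5)
)

# Overwriting: each step replaces the data with the previous subset.
overwritten <- results
overwritten <- subset(overwritten, London == 1)  # only the London rows remain
overwritten <- subset(overwritten, Rome == 1)    # no Rome rows are left to find
rows_left <- nrow(overwritten)                   # 0

# Not overwriting: every subset gets its own name, 'results' stays intact.
london <- subset(results, London == 1)
rome   <- subset(results, Rome == 1)
mean_london <- mean(london$q1)   # 2
mean_rome   <- mean(rome$q1)     # 3
```

Because `results` is never reassigned in the second half, it still has all six rows afterwards and can be subset again for Vienna.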
I'm new to R as well. However, I don't recognize your syntax; I have not seen select used here. Try

results <- subset(results, London == 1)
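
For what it is worth, `select` is a documented argument of `subset()` for data frames: it chooses which columns to keep, while the second argument chooses rows. The sketch below uses made-up data to show that `select = c(London)` drops `q1`, and how the per-city means can be computed without reassigning `results` at all. The names `only_london_col` and `means_by_city` are illustrative.

```r
# Hypothetical data, same layout as in the question.
results <- data.frame(
  London = c(1, 1, 0, 0, 0, 0),
  Rome   = c(0, 0, 1, 1, 0, 0),
  Vienna = c(0, 0, 0, 0, 1, 1),
  q1     = c(1, 3, 2, 4, 5, 5)
)

# select= picks columns, so this keeps only the London column and drops q1.
only_london_col <- subset(results, London == 1, select = c(London))
has_q1 <- "q1" %in% names(only_london_col)

# Index q1 directly by each indicator column; 'results' is never overwritten.
cities <- c("London", "Rome", "Vienna")
means_by_city <- sapply(cities, function(city) {
  mean(results$q1[results[[city]] == 1])
})
print(means_by_city)
```

Here `has_q1` is FALSE, which is why taking `mean()` of `q1` from such a subset cannot work, and `means_by_city` is a named vector with one mean per city.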