Dear All,

I have a data frame of 200 observations named myD. It has a column named code (with codes ranging from 1 to 77), a column named "prevalence" containing quantitative measurements, and a column named Pr_mean with no values.

I would like to write a loop that computes the average of the prevalence values for each distinct code and stores the averages in the empty column Pr_mean.

This is what I wrote:

  # Set a cycle
  for (i in 1:nrow(myD)) {
    mycode = myD$code[i]
    mymean[i] = mean(prevalence)
    myD$Pr_mean[i] = mymean[i]
  }

With the loop above I get the average of all 200 observations, which is then written in every cell. I understand that a condition is missing: one indicating that the average has to be computed only among the observations that share the same code value.

Could you please help me?

D.
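A minimal sketch of the loop above with the missing condition added, run on hypothetical toy data (the values are made up; the real myD has 200 rows). As posted, the loop also depends on objects `prevalence` and `mymean` already existing in the workspace; this version reads the column from myD directly and writes straight into Pr_mean.

```r
# Hypothetical toy data standing in for myD (the real data has 200 rows)
myD <- data.frame(code = c(1, 2, 1, 3, 2),
                  prevalence = c(10, 4, 20, 7, 6),
                  Pr_mean = NA_real_)

for (i in seq_len(nrow(myD))) {
  mycode <- myD$code[i]
  # The missing condition: keep only the rows whose code equals this row's code
  same_code <- myD$code == mycode
  myD$Pr_mean[i] <- mean(myD$prevalence[same_code])
}

print(myD$Pr_mean)
```

Each row now receives the mean of its own code group (15, 5, 15, 7, 5 for the toy data). Note that a group's mean is recomputed once per row, which is harmless at 200 rows.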
Try:

  myD <- transform(myD, Pr_mean = ave(prevalence, code))

See ?ave and ?transform for details.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniela Ottaviani
> Sent: Tuesday, July 08, 2008 7:18 AM
> To: r-help at r-project.org
> Subject: [R] Question: Beginner stuck in a R cycle
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
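A quick check of the ave()/transform() one-liner on hypothetical toy data. ave() uses mean as its default FUN and returns a vector as long as its input, so every row is given the mean of its own code group without any explicit loop.

```r
# Hypothetical toy data with the same column names as myD
myD <- data.frame(code = c(1, 2, 1, 3, 2),
                  prevalence = c(10, 4, 20, 7, 6))

# ave() splits prevalence by code, applies mean within each group,
# and puts each group's result back in the original row positions
myD <- transform(myD, Pr_mean = ave(prevalence, code))
print(myD)
```

If prevalence may contain missing values, a FUN such as `function(x) mean(x, na.rm = TRUE)` can be passed to ave() instead of the default.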
On 08-Jul-08 13:18:13, Daniela Ottaviani wrote:
> I would like to set a cycle to compute the average of prevalence values
> for each different code and store the averages under the empty field
> Pr_mean.

I think something on the following lines would do what you want
(I think it is wise to call the final column "Pr.mean", as below,
rather than "Pr_mean"):

  for( Code in unique(myD$code) ){
    ix <- (myD$code == Code )
    myD$Pr.mean[ix] <- mean(myD$prevalence[ix])
  }

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-08                                       Time: 17:01:38
------------------------------ XFMail ------------------------------
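A sketch of Ted's loop on hypothetical toy data. It makes one pass per distinct code (at most 77 in the real data) rather than one per row. Creating the Pr.mean column before the loop is an addition here, not part of Ted's code, so that the column has one slot per row from the start; the last line cross-checks the result against ave().

```r
# Hypothetical toy data with the same column names as myD
myD <- data.frame(code = c(1, 2, 1, 3, 2),
                  prevalence = c(10, 4, 20, 7, 6))
myD$Pr.mean <- NA_real_   # assumption: create the target column up front

for (Code in unique(myD$code)) {
  ix <- myD$code == Code                        # rows belonging to this code
  myD$Pr.mean[ix] <- mean(myD$prevalence[ix])   # one mean, written to all of them
}

# Should print TRUE: same result as the vectorised ave()
all.equal(myD$Pr.mean, ave(myD$prevalence, myD$code))
```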
On Tue, Jul 8, 2008 at 3:18 PM, Daniela Ottaviani <d.ottaviani at yahoo.it> wrote:

The easiest thing to do is to use ?by:

  myD <- data.frame(code = sample(letters[1:5], 200, replace = TRUE), value = rnorm(200))
  by(myD, myD$code, function(d) mean(d$value))

but that won't get you the group means into the empty column without some more lines of code. Another way is to use ?lapply and ?unlist:

  myD$Pr_mean <- unlist(lapply(as.character(myD$code), function(x) mean(myD$value[myD$code == x])))

Regards,

Gustaf

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE
skype: gustaf_rydevik
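One possible version of the "more lines of code" needed to move by() results into a column, sketched on hypothetical toy data. by() returns a one-dimensional array labelled by group; here each row's code is matched against those labels. This lookup step is an illustration, not something given in the thread.

```r
# Hypothetical toy data in the shape of Gustaf's example
myD <- data.frame(code = c("a", "b", "a", "c", "b"),
                  value = c(10, 4, 20, 7, 6))

# by() on the data frame passes each group's rows to FUN as a data frame
grp <- by(myD, myD$code, function(d) mean(d$value))

# Group labels of the result, then a per-row lookup of each row's group mean
labels <- dimnames(grp)[[1]]
myD$Pr_mean <- as.vector(grp)[match(as.character(myD$code), labels)]
print(myD$Pr_mean)
```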
Would ave() do what you want?

Rashid

On Tue, 8 Jul 2008, Daniela Ottaviani wrote:
Dear Daniela,

Try this:

  set.seed(123)
  myD <- data.frame(code = sample(letters[1:5], 200, replace = TRUE), value = rnorm(200))
  tapply(myD$value, myD$code, mean)

            a           b           c           d           e
   0.04401465  0.07813648  0.07018791 -0.14508544 -0.02369875

See ?tapply for more information.

HTH,

Jorge

On Tue, Jul 8, 2008 at 9:18 AM, Daniela Ottaviani <d.ottaviani@yahoo.it> wrote:
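The means printed above depend on the random draw, and the default algorithm used by sample() changed in R 3.6.0, so set.seed(123) may not reproduce those exact numbers in a current R. A deterministic sketch on hypothetical toy data shows what tapply() returns: one mean per level of code, labelled by that level.

```r
# Hypothetical toy data in the shape of Jorge's example
myD <- data.frame(code = c("a", "b", "a", "c", "b"),
                  value = c(10, 4, 20, 7, 6))

# One mean per distinct code; the result's names are the code labels
grp_means <- tapply(myD$value, myD$code, mean)
print(grp_means)
```

Here the group means are 15, 5 and 7 for a, b and c. Note that this is a summary with one entry per code, not a column aligned with the rows of myD.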
Hi Daniela,

There may be other more elegant ways of doing it, but here is one:

  > myD <- data.frame(code = sample(3, 10, rep = T), prev = rnorm(10), Pr_mean = 0)
  > myD
     code        prev Pr_mean
  1     3 -0.06710968       0
  2     2 -1.43422034       0
  3     1  0.22717580       0
  4     3  0.32703754       0
  5     3  1.26254159       0
  6     2  0.65104107       0
  7     1 -0.74293152       0
  8     3  0.45845330       0
  9     2 -0.64206400       0
  10    3 -0.48671646       0
  > m <- tapply(myD$prev, myD$code, mean)
  > myD$Pr_mean <- m[match(myD$code, names(m))]
  > myD
     code        prev    Pr_mean
  1     3 -0.06710968  0.2988413
  2     2 -1.43422034 -0.4750811
  3     1  0.22717580 -0.2578779
  4     3  0.32703754  0.2988413
  5     3  1.26254159  0.2988413
  6     2  0.65104107 -0.4750811
  7     1 -0.74293152 -0.2578779
  8     3  0.45845330  0.2988413
  9     2 -0.64206400 -0.4750811
  10    3 -0.48671646  0.2988413

Hope this helps.

Ciao,
Giovanni

> Date: Tue, 08 Jul 2008 13:18:13 +0000 (GMT)
> From: Daniela Ottaviani <d.ottaviani at yahoo.it>

--
Giovanni Petris <GPetris at uark.edu>
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
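Why the match() step matters when codes are numeric: the names of the tapply() result are the code labels, but a numeric index selects elements by position. With codes that skip values (possible when codes run from 1 to 77 and not every code occurs), indexing `m[myD$code]` directly returns the wrong means or NA. A sketch on hypothetical toy data:

```r
# Hypothetical toy data: codes 2, 5 and 9 only, with gaps between them
myD <- data.frame(code = c(2, 5, 2, 9),
                  prev = c(1, 4, 5, 8))

m <- tapply(myD$prev, myD$code, mean)   # names(m) are "2", "5", "9"

# Correct: find each row's code among the names of m
by_name <- m[match(myD$code, names(m))]

# Pitfall: numeric codes used directly are treated as positions 2, 5, 2, 9
by_pos <- m[myD$code]

print(as.vector(by_name))  # values: 3 4 3 8
print(as.vector(by_pos))   # values: 4 NA 4 NA
```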