Vedula, Satyanarayana
2009-Mar-13 22:43 UTC
[R] Question on summing rows within nested variable
Hi,

I was hoping someone could help me figure out how to write R code to do the following.

My data look like the sample below. The variables sid and pid are strings; slope is numeric. I need R to get me the mean of the slopes for all pids nested within each sid, if more than one pid is nested within that sid.

If there is only one pid for a sid, as for 2.1 below, I want R to write a 0.

In the data below, I want the mean of the slopes for pids 1.1, 4.1, and 5.1, because they are nested within sid 1.1, and so on.

Thanks in advance for any suggestions.
Swaroop

sid   pid   slope
1.1   1.1   2
1.1   4.1   3
1.1   5.1   2
2.1   5.1   3
3.2   1.2   2
3.2   1.7   3
> DF2 <- read.table(textConnection("sid pid slope
+ 1.1 1.1 2
+ 1.1 4.1 3
+ 1.1 5.1 2
+ 2.1 5.1 3
+ 3.2 1.2 2
+ 3.2 1.7 3"), header = TRUE)
> tapply(DF2$slope, as.factor(DF2$pid), mean)
1.1 1.2 1.7 4.1 5.1
2.0 2.0 3.0 3.0 2.5

You could also wrap it in with(DF2, ....) to make it more readable and compact. Testing shows that the as.factor() is not needed.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
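The with() form suggested above might look like the following minimal sketch. It assumes the same six rows as the question, reads sid and pid as character (the question says they are strings), and uses read.table(text = ...) in place of textConnection(). The names by_pid and by_sid are made up for illustration. The second call groups by sid, which is the grouping the original question asks about.

```r
# Rebuild the example data from the thread. sid and pid are read as
# character so the group labels stay exactly as typed.
DF2 <- read.table(text = "sid pid slope
1.1 1.1 2
1.1 4.1 3
1.1 5.1 2
2.1 5.1 3
3.2 1.2 2
3.2 1.7 3", header = TRUE,
  colClasses = c("character", "character", "numeric"))

# Same computation as in the reply, wrapped in with(). tapply() converts
# the grouping vector to a factor itself, so as.factor() is not needed.
by_pid <- with(DF2, tapply(slope, pid, mean))

# Grouping by sid instead gives one mean per sid.
by_sid <- with(DF2, tapply(slope, sid, mean))

print(by_pid)
print(by_sid)
```

Inside with(), the column names are looked up in DF2, so the DF2$ prefix can be dropped from each argument.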
If you want zero when there is only one value for a sid:

> DF2
  sid pid slope
1 1.1 1.1     2
2 1.1 4.1     3
3 1.1 5.1     2
4 2.1 5.1     3
5 3.2 1.2     2
6 3.2 1.7     3
> tapply(DF2$slope, DF2$sid, function(x) if(length(x) == 1) 0 else mean(x))
     1.1      2.1      3.2
2.333333 0.000000 2.500000

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
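If the result needs to sit next to each original row rather than in a separate summary, one possible variation on the tapply() call above is base R's ave(), which returns a vector as long as its input. This is only a sketch under that assumption; the helper mean_or_zero and the column name sid_mean are hypothetical names, not something from the thread.

```r
# Example data as in the thread, with sid and pid kept as character.
DF2 <- read.table(text = "sid pid slope
1.1 1.1 2
1.1 4.1 3
1.1 5.1 2
2.1 5.1 3
3.2 1.2 2
3.2 1.7 3", header = TRUE,
  colClasses = c("character", "character", "numeric"))

# Hypothetical helper: the mean of a group, or 0 when the group has
# a single value (same rule as the anonymous function in the reply).
mean_or_zero <- function(x) if (length(x) == 1) 0 else mean(x)

# ave() applies the helper within each sid and repeats the result
# for every row of that sid.
DF2$sid_mean <- ave(DF2$slope, DF2$sid, FUN = mean_or_zero)

print(DF2)
```

The tapply() version gives one value per sid; the ave() version keeps all six rows, which may be more convenient if the means feed into later row-wise calculations.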